
Update 'README-en.md'

master
JinWang 4 years ago
parent
commit
0866c7818b
1 changed file with 3 additions and 3 deletions

README-en.md

@@ -49,13 +49,13 @@ The text is completely repeated within and between different web pages. Based on


## 1. Language filtering, rule cleaning, sensitive word filtering
```
- python cc_ cleaner.py
+ python cc_cleaner.py
```
## 2. Fasttext garbage classification
```
- python Filter_ run.py
+ python Filter_run.py
```
## 3. Weight removal
```
- python dedup_ simhash.py
+ python dedup_simhash.py
```
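
For readers unfamiliar with the technique behind step 3, here is a minimal sketch of SimHash-based near-duplicate removal. It is not taken from `dedup_simhash.py`: the function names, the regex tokenizer, the 64-bit fingerprint size, and the `max_distance` threshold are all assumptions made for illustration. Chinese text in particular would likely need a proper word segmenter instead of the regex used here.

```python
import hashlib
import re

BITS = 64  # fingerprint width (assumed; not taken from the repository)


def simhash(text: str) -> int:
    """Return a 64-bit SimHash fingerprint of `text`."""
    weights = [0] * BITS
    # Hypothetical tokenizer: runs of word characters, lowercased.
    for token in re.findall(r"\w+", text.lower()):
        digest = hashlib.md5(token.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")  # first 64 bits of the hash
        for i in range(BITS):
            # Each token votes +1 or -1 on every bit position.
            weights[i] += 1 if (h >> i) & 1 else -1
    fingerprint = 0
    for i, w in enumerate(weights):
        if w > 0:
            fingerprint |= 1 << i
    return fingerprint


def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions where two fingerprints differ."""
    return bin(a ^ b).count("1")


def dedup(docs, max_distance=3):
    """Keep a document only if no already-kept document is within max_distance."""
    kept, fingerprints = [], []
    for doc in docs:
        fp = simhash(doc)
        if all(hamming_distance(fp, seen) > max_distance for seen in fingerprints):
            kept.append(doc)
            fingerprints.append(fp)
    return kept
```

This sketch compares every new fingerprint against all kept ones, which is quadratic in the number of documents. A web-scale corpus would normally index fingerprints by bit blocks so that only likely candidates are compared.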
