
Update 'README-en.md'

master
JinWang 4 years ago
parent
commit
0866c7818b
1 changed file with 3 additions and 3 deletions

README-en.md (+3, -3)

@@ -49,13 +49,13 @@ The text is completely repeated within and between different web pages. Based on

## 1. Language filtering, rule cleaning, sensitive word filtering
```
-python cc_ cleaner.py
+python cc_cleaner.py
```
## 2. Fasttext garbage classification
```
-python Filter_ run.py
+python Filter_run.py
```
## 3. Weight removal
```
-python dedup_ simhash.py
+python dedup_simhash.py
```
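
The diff does not show what `dedup_simhash.py` does internally, so the following is only a minimal sketch of SimHash-based near-duplicate removal in general, not the repository's implementation. The whitespace tokenization, the MD5 token hash, the 64-bit fingerprint width, the `threshold` of 3 bits, and the function names `simhash`, `hamming`, and `dedup` are all assumptions made for illustration.

```python
import hashlib


def simhash(text, bits=64):
    """Return a SimHash fingerprint built from whitespace-separated tokens."""
    counts = [0] * bits
    for token in text.split():
        # Assumption: the first 8 bytes of MD5 serve as a 64-bit token hash.
        digest = hashlib.md5(token.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        for i in range(bits):
            counts[i] += 1 if (h >> i) & 1 else -1
    fingerprint = 0
    for i, c in enumerate(counts):
        if c > 0:
            fingerprint |= 1 << i
    return fingerprint


def hamming(a, b):
    """Count the bits in which two fingerprints differ."""
    return bin(a ^ b).count("1")


def dedup(docs, threshold=3):
    """Keep a document only if no kept fingerprint is within `threshold` bits."""
    kept, fingerprints = [], []
    for doc in docs:
        fp = simhash(doc)
        if all(hamming(fp, other) > threshold for other in fingerprints):
            kept.append(doc)
            fingerprints.append(fp)
    return kept
```

This pairwise comparison is quadratic in the number of kept documents; for a web-scale corpus one would typically bucket fingerprints by bit blocks first, which is omitted here for brevity.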
