A BERT-Based Chinese News Classification Example

This example uses a Chinese news dataset to fine-tune a pretrained BERT model for classification, then saves the fine-tuned model and tests it through a REST API served by Flask. The basic approach comes from the blog post BERT Fine-Tuning Tutorial with PyTorch. Both PyTorch and TensorFlow are used in this work, along with a library named pytorch-pretrained-bert, which helps apply pretrained models such as BERT, GPT, and GPT-2 to downstream tasks.

BERT is a popular pretrained model from Google. Some recommended posts:

- The Illustrated Transformer
- Dissecting BERT Part 1: Understanding the Transformer
- BERT Word Embeddings Tutorial

Getting Started

1. Download the Pre-trained BERT Model

Download the BERT-Base, Chinese model and unzip the archive.

2. Download the dataset with the command below and unzip it into the data directory

wget https://github.com/fate233/toutiao-text-classfication-dataset/raw/master/toutiao_cat_data.txt.zip
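The unzip step can also be done from Python. Below is a minimal sketch using only the standard library, assuming the archive has already been downloaded. The function name `extract_dataset` and the default `data` directory are illustrative and not part of the repository.

```python
import zipfile
from pathlib import Path


def extract_dataset(zip_path, data_dir="data"):
    """Extract every member of zip_path into data_dir and return the extracted paths."""
    target = Path(data_dir)
    # Create the data directory if it does not exist yet.
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target)
        names = archive.namelist()
    return [target / name for name in names]
```

For example, `extract_dataset("toutiao_cat_data.txt.zip", "data")` would place `toutiao_cat_data.txt` under `data/`, which can then be passed as `--data_dir`.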

3. Create a Python virtual environment and install the packages listed in requirements.txt

4. Run the command below to fine-tune the model for classification

python bert_for_classification.py --output_dir your/output/dir --data_dir toutiao/dataset/dir --data_name toutiao_cat_data.txt --is_add_key_words True
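To illustrate what `--is_add_key_words` might control, here is a minimal sketch of parsing one dataset line. It assumes each line holds five fields separated by `_!_` (news ID, category code, category name, title, keywords), and that enabling the flag appends the keywords to the title before tokenization. Both points are assumptions and are not confirmed by `bert_for_classification.py`. The names `parse_line` and `NewsExample` and the sample line are hypothetical.

```python
from typing import NamedTuple

FIELD_SEPARATOR = "_!_"  # assumed field delimiter of the Toutiao dataset


class NewsExample(NamedTuple):
    text: str   # text that would be fed to the model
    label: str  # category name used as the class label


def parse_line(line, add_key_words=False):
    """Split one raw line into model input text and label.

    Assumed layout: id, category code, category name, title, keywords.
    """
    fields = line.rstrip("\n").split(FIELD_SEPARATOR)
    if len(fields) != 5:
        raise ValueError(f"expected 5 fields, got {len(fields)}")
    _news_id, _code, category, title, keywords = fields
    text = title
    if add_key_words and keywords:
        # Keywords are assumed to be comma-separated; join them with spaces.
        text = title + " " + " ".join(keywords.split(","))
    return NewsExample(text=text, label=category)


# Hypothetical sample line, for illustration only.
sample = "0001_!_104_!_news_finance_!_珍惜当下 局部新一轮升浪悄然开启_!_升浪,局部"
print(parse_line(sample, add_key_words=True))
```

Raising on a malformed line, rather than skipping it silently, makes it easier to notice when the real file layout differs from the one assumed here.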

5. Set the path of the saved model from the previous step in the API file, then run the command below to start the Flask service

Line 9: model = torch.load('output')

python classification-api.py

6. Use curl to test the REST API

curl -X POST http://xx.xx.xx.xx:8000/predict -H 'Content-Type: application/json' -d '{ "text":"珍惜当下 局部新一轮升浪悄然开启" ,"label":"财经"}' |jq

    {"Predict Label":"财经 财经","True Label":"财经"}

curl -X POST http://xx.xx.xx.xx:8000/predict -H 'Content-Type: application/json' -d '{ "text":"美国要在亚太建导弹基地？普京：给你脸了是不是！" ,"label":"军事"}' |jq

{"Predict Label":"国际 国际","True Label":"军事"}
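The same request can be sent from Python. Below is a minimal standard-library sketch, assuming the service accepts the JSON body shown in the curl commands above. `build_predict_request` and `predict` are illustrative names, and the host remains a placeholder as in the original commands.

```python
import json
import urllib.request


def build_predict_request(base_url, text, label):
    """Build a POST request for the /predict endpoint with a JSON body."""
    # ensure_ascii=False keeps the Chinese text readable in the payload.
    body = json.dumps({"text": text, "label": label}, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(base_url, text, label, timeout=10):
    """Send the request and return the decoded JSON response."""
    request = build_predict_request(base_url, text, label)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

A call such as `predict("http://xx.xx.xx.xx:8000", "珍惜当下 局部新一轮升浪悄然开启", "财经")` should return a dictionary with the same keys as the curl output. Building the request in a separate function allows the payload to be inspected without a running server.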

binnz/chinese-news-classification-example

Folders and files

Latest commit

History

Repository files navigation

基于BERT的中文新闻分类例子

Getting Started

1. Download the Pre-trained BERT Model

2. Download the dataset followed by the command below and unzip to data dir

3. Prepare the virtual python environment and install the package in requirements.txt

4. Run the command below to fine tune for classification

4. Set the output file position above to api file, and run the command below to start the flask service

5. Curl the rest api to test

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages