CN112860889A - BERT-based multi-label classification method


Info

Publication number
CN112860889A
CN112860889A (application CN202110121995.7A)
Authority
CN
China
Prior art keywords
sentence
label
text data
bert
tag
Prior art date
Legal status
Pending
Application number
CN202110121995.7A
Other languages
Chinese (zh)
Inventor
郑文
张和伟
邓丽平
侯凡
Current Assignee
Taiyuan University of Technology
Original Assignee
Taiyuan University of Technology
Priority date
Filing date
Publication date
Application filed by Taiyuan University of Technology
Priority to CN202110121995.7A
Publication of CN112860889A

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 - Information retrieval of unstructured textual data
    • G06F16/35 - Clustering; Classification
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 - Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F18/24 - Classification techniques
    • G06F18/241 - Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411 - Classification techniques based on the proximity to a decision surface, e.g. support vector machines
    • G06F40/00 - Handling natural language data
    • G06F40/20 - Natural language analysis
    • G06F40/279 - Recognition of textual entities
    • G06F40/284 - Lexical analysis, e.g. tokenisation or collocates
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/08 - Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a BERT-based multi-label classification method, in particular a BERT-based sentence-pair classification task that decides whether a sentence should be marked with a label by judging the contextual relationship between the sentence and the label. The method comprises a data preprocessing module, a BERT fine-tuning module and a classifier module. The invention pairs each sentence in the text with every label to form sentence pairs and, exploiting the fact that the BERT model performs strongly on sentence-pair classification tasks across many fields, obtains sentence vectors for the sentence and the label that carry rich contextual semantic information. Finally, the obtained sentence vector is passed to the classifier module to obtain the semantic relationship between the sentence and the label, so as to predict whether the sentence should be marked with that label. The method can greatly reduce the amount of data required for training while still guaranteeing good results.

Description

BERT-based multi-label classification method
Technical Field
The invention relates to the technical field of natural language processing, in particular to a BERT-based multi-label classification method.
Background
Nowadays, the world is in the third wave of artificial intelligence. Fields of every kind generate all sorts of data and urgently need machine learning methods in order to achieve intelligentization, informatization and industrial upgrading. To extract the rich information contained in this data, the traditional approach of manually summarizing, analysing and classifying data is being replaced by machine learning methods, which are already widely used in the internet field. Meanwhile, many traditional industries are even more eager to accelerate their upgrading through machine learning. To learn the large amount of information in data more efficiently, the various branches of machine learning have developed rapidly in recent years, with ever deeper research content and an ever wider research scope. As one of the important research directions in machine learning, the classification problem has high application value and has received wide attention from a large number of researchers and practitioners in many fields.
In the real world, data accumulation is usually a long-term collection process, and the objects of a classification task often belong to multiple categories at once, i.e. they are associated with multiple labels. Even in the early days of applying machine learning methods, data with multiple labels was already the more common situation. In recent years, research on the multi-label learning problem has attracted wide attention and has become a popular research direction in machine learning. The application scenario of a conventional classification learning method is usually set up as a single-label classification problem, in which each instance is associated with only one label describing its attribute characteristics. In the real world, however, an instance typically has a whole set of labels associated with it. For example, when searching a paper database, a single-label classification can be realized by searching on paper titles alone, but this makes paper retrieval inconvenient. In practice, retrieval is usually performed by keywords, and a paper often contains several keywords. When text is classified according to multiple keywords in this way, traditional single-label supervised learning is no longer fully suitable for the multi-label classification task. This highlights the importance of the multi-label classification problem, which better matches real life.
Disclosure of Invention
The technical problem to be solved by the invention is to provide a BERT-based multi-label classification method aiming at the defects in the prior art.
The technical scheme adopted by the invention for solving the technical problems is as follows: constructing a BERT-based multi-label classification method, comprising:
selecting a CAIL-2019 data set as a corpus, combining all text data in the data set with different labels, and marking new labels for sentence combinations according to a label list of sentences;
performing word segmentation on the processed text data, connecting a [CLS] mark at the beginning of the sentence of each text data, and adding an [SEP] mark between the sentence and the label;
vectorizing the text data after word segmentation, and representing each word in the input text data by using a pre-trained word feature vector to obtain a vector of the text data after word segmentation;
extracting the feature word vectors of the text data and the feature word vectors of the labels after word segmentation, and obtaining semantically fused sentence vectors by utilizing self-attention operation;
and inputting the sentence vector into a feedforward neural network model, and predicting the relation of the sentences through the output result of the model.
Wherein, in the step of combining all text data in the data set with different labels, each sentence is combined with each label once to form a sentence pair.
In the step of tokenizing the processed text data, a sequence is obtained by splicing with the predefined symbols [CLS] and [SEP]; the spliced sequence is "[CLS] sentence [SEP] label [SEP]", where [CLS] is the semantic symbol prepended to the input text sequence and [SEP] is the separator between the sentence and the label.
The sentence vector is used to predict the sentence-label relationship through a feedforward neural network, i.e. the probability that the sample y is marked by the label L is computed, where θ represents the model parameters; the network finally outputs a two-dimensional vector V = [v1, v2], in which vi represents the conditional probability under label L.
The obtained two-dimensional vector is normalized, and the final result is obtained by applying an indicator function I to the normalized probabilities, where k1 represents the probability of class 1 and k2 represents the probability of class 2; the sentence pair is assigned the class with the larger probability.
Compared with the prior art, the beneficial effects of the invention mainly include the following aspects:
First, the BERT model used in this method was pre-trained by Google on a large Wikipedia text corpus when it was released, so compared with other models the pre-training step can be omitted and the complex workload it entails is reduced;
Second, the BERT model performs markedly better on sentence-pair tasks than on single-sentence tasks, and taking the semantic information of the labels into account makes full use of this characteristic.
Drawings
The invention will be further described with reference to the accompanying drawings and examples, in which:
fig. 1 is a schematic diagram of an overall framework of a BERT-based multi-label classification method provided by the present invention.
Detailed Description
For a more clear understanding of the technical features, objects and effects of the present invention, embodiments of the present invention will now be described in detail with reference to the accompanying drawings.
As shown in fig. 1, the present invention designs a BERT-based multi-label classification method, which includes:
selecting a CAIL-2019 data set as a corpus, combining all text data in the data set with different labels, and marking new labels for sentence combinations according to a label list of sentences;
and selecting a CAIL-2019 data set as a corpus, forming a sentence combination for each sentence and all the labels, and marking new labels for the sentence combination according to the label list of the sentences. If a certain label exists in the list, the sentence pair formed by the sentence and the label is marked as 1, and if the label does not exist, the sentence pair is marked as 0;
performing word segmentation on the processed text data, connecting a [CLS] mark at the beginning of the sentence of each text data, and adding an [SEP] mark between the sentence and the label;
sequences spliced by predefined symbols [ CLS ] and [ SEP ]; wherein, the spliced sequence is a [ CLS ] original sentence sequence [ SEP ] auxiliary sentence sequence [ SEP ] ", [ CLS ] is a semantic symbol of an input text sequence, and [ SEP ] is a segmentation symbol of a problem sequence and a text segment sequence;
vectorizing the text data after word segmentation, and representing each word in the input text data by using a pre-trained word feature vector to obtain a vector of the text data after word segmentation;
extracting the feature word vectors of the sentence and the feature word vectors of the label from the vectors of the tokenized text data, and obtaining a semantically fused sentence vector by means of self-attention operations;
and inputting the sentence vector into a feedforward neural network model, and predicting the relation of the sentences through the output result of the model.
Wherein, in the step of combining all text data in the data set with different labels, each sentence is combined with each label once to form a sentence pair.
The sentence vector is used to predict the sentence-label relationship through a feedforward neural network, i.e. the probability that the sample y is marked by the label L is computed, where θ represents the model parameters; the network finally outputs a two-dimensional vector V = [v1, v2], in which vi represents the conditional probability under label L.
The obtained two-dimensional vector is normalized, and the final result is obtained by applying an indicator function I to the normalized probabilities, where k1 represents the probability of class 1 and k2 represents the probability of class 2; the sentence pair is assigned the class with the larger probability.
The loss is calculated using a cross entropy loss function and the parameters are updated.
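A minimal PyTorch-style sketch of the fine-tuned BERT encoder, the feedforward head with its two-dimensional output, and one cross-entropy training step is given below; the hidden layer size, optimizer, learning rate and the use of BertModel are assumptions made for illustration, not details fixed by the patent:

```python
# Sketch: BERT sentence-pair encoder + feedforward head (outputs V = [v1, v2]),
# trained with cross-entropy. Architectural details here are assumptions.
import torch
import torch.nn as nn
from transformers import BertModel

class SentenceLabelClassifier(nn.Module):
    def __init__(self, pretrained: str = "bert-base-chinese"):
        super().__init__()
        self.bert = BertModel.from_pretrained(pretrained)
        self.ffn = nn.Sequential(                        # feedforward head
            nn.Linear(self.bert.config.hidden_size, 256),
            nn.ReLU(),
            nn.Linear(256, 2),                           # two-dimensional output V = [v1, v2]
        )

    def forward(self, input_ids, attention_mask, token_type_ids):
        out = self.bert(input_ids=input_ids,
                        attention_mask=attention_mask,
                        token_type_ids=token_type_ids)
        cls_vec = out.last_hidden_state[:, 0]            # semantically fused sentence vector ([CLS])
        return self.ffn(cls_vec)

model = SentenceLabelClassifier()
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
criterion = nn.CrossEntropyLoss()

def train_step(batch):
    """One update over a batch of encoded sentence-label pairs with 0/1 pair labels."""
    logits = model(batch["input_ids"], batch["attention_mask"], batch["token_type_ids"])
    loss = criterion(logits, batch["pair_label"])        # cross-entropy loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    probs = torch.softmax(logits, dim=-1)                # normalized [k1, k2]
    preds = probs.argmax(dim=-1)                         # 1 -> the sentence is marked by the label
    return loss.item(), preds
```

At inference time the same forward pass is run for a sentence paired with every label, and the labels whose pairs are predicted as 1 together form the sentence's multi-label set.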
The present invention uses the CAIL-2019 data set, selected from the legal judgment documents published on China Judgements Online. Each line of data represents the sentence-segmentation result of a paragraph extracted from one judgment document, together with the element label list of each sentence. The judgment documents cover three fields, marriage and family, labor disputes and loan disputes, and comprise 2740 documents in total: 1269 on marriage and family, 836 on labor disputes and 635 on loan disputes. The data were annotated by professionals with a legal background, and each of the three fields has 20 element labels together with the Chinese semantics they represent.
The problem studied by the invention is a classification problem. Common evaluation indexes for classification include the precision P = TP / (TP + FP), the recall R = TP / (TP + FN) and the F1 value F1 = 2 * P * R / (P + R). The F1 values include the micro-average F1 value (Micro_F1), computed from the TP, FP and FN counts summed over all labels, and the macro-average F1 value (Macro_F1), the average of the per-label F1 values. A confusion matrix is needed for the calculation: True Positive (TP) means a positive class is predicted as a positive class; True Negative (TN) means a negative class is predicted as a negative class; False Positive (FP) means a negative class is predicted as a positive class; False Negative (FN) means a positive class is predicted as a negative class. The performance of the model is evaluated with a Score computed from Micro_F1 and Macro_F1.
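For illustration only, these pair-level metrics could be computed with scikit-learn; the library and the averaging of Micro_F1 and Macro_F1 into a single Score are assumptions, since the patent does not give the exact Score formula:

```python
# Sketch: precision, recall and micro/macro F1 over binary sentence-label pair predictions.
# scikit-learn is an assumed tooling choice; the Score combination shown is also an assumption.
from sklearn.metrics import f1_score, precision_score, recall_score

y_true = [1, 0, 1, 1, 0, 0]   # gold pair labels (illustrative)
y_pred = [1, 0, 0, 1, 0, 1]   # predicted pair labels (illustrative)

precision = precision_score(y_true, y_pred)
recall = recall_score(y_true, y_pred)
micro_f1 = f1_score(y_true, y_pred, average="micro")
macro_f1 = f1_score(y_true, y_pred, average="macro")
score = (micro_f1 + macro_f1) / 2   # one common way to combine the two F1 values
print(precision, recall, micro_f1, macro_f1, score)
```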
the invention carries out comparison experiments on the proposed multi-label classification method, a Support Vector Machine (SVM), a TextCNN algorithm and a BERT-based multi-label classification method. The results of the experiment are shown in table 1:
Figure BDA0002922489020000058
TABLE 1 comparative experimental results of different multi-label classification methods
From experimental results, the model provided by the invention has the best effect among the three models.
According to the invention, the BERT model is used to construct a sentence-pair classification task and is fine-tuned, which improves the classification effect, allows multiple labels to be assigned to a sentence, improves model efficiency and reduces the related redundant workload.
While the present invention has been described with reference to the embodiments shown in the drawings, the present invention is not limited to the embodiments, which are illustrative and not restrictive, and it will be apparent to those skilled in the art that various changes and modifications can be made therein without departing from the spirit and scope of the invention as defined in the appended claims.

Claims (4)

1. A BERT-based multi-label classification method is characterized by comprising the following steps:
selecting a CAIL-2019 data set as a corpus, combining all text data in the data set with different labels, and marking new labels for sentence combinations according to a label list of sentences;
performing word segmentation on the processed text data, connecting a [CLS] mark at the beginning of the sentence of each text data, and adding an [SEP] mark between the sentence and the label;
vectorizing the text data after word segmentation, and representing each word in the input text data by using a pre-trained word feature vector to obtain a vector of the text data after word segmentation;
extracting the feature word vectors of the text data and the feature word vectors of the labels after word segmentation, and obtaining semantically fused sentence vectors by utilizing self-attention operation;
and inputting the sentence vector into a feedforward neural network model, and predicting the relation of the sentences through the output result of the model.
2. The BERT-based multi-label classification method of claim 1, wherein, in the step of combining all text data in the dataset with different labels, each sentence in the sentence pair is combined with each label once.
3. The BERT-based multi-label classification method of claim 1, wherein, in the step of tokenizing the processed text data, the sequence is spliced with the predefined symbols [CLS] and [SEP]; the spliced sequence is "[CLS] sentence [SEP] label [SEP]", where [CLS] is the semantic symbol of the input text sequence and [SEP] is the separator between the sentence and the label.
4. The BERT-based multi-label classification method of claim 1, wherein the sentence vector is used to predict the sentence-label relationship through a feedforward neural network, i.e. the probability that the sample y is marked by the label L is computed;
where θ represents the model parameters, and the network finally outputs a two-dimensional vector V = [v1, v2], in which vi represents the conditional probability under label L;
the obtained two-dimensional vector is normalized, and the final result is obtained by using an indicator function I, where k1 represents the probability of class 1 and k2 represents the probability of class 2.
CN202110121995.7A 2021-01-29 2021-01-29 BERT-based multi-label classification method Pending CN112860889A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110121995.7A CN112860889A (en) 2021-01-29 2021-01-29 BERT-based multi-label classification method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110121995.7A CN112860889A (en) 2021-01-29 2021-01-29 BERT-based multi-label classification method

Publications (1)

Publication Number Publication Date
CN112860889A true CN112860889A (en) 2021-05-28

Family

ID=75987942

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110121995.7A Pending CN112860889A (en) 2021-01-29 2021-01-29 BERT-based multi-label classification method

Country Status (1)

Country Link
CN (1) CN112860889A (en)

Patent Citations (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105404632A (en) * 2014-09-15 2016-03-16 深港产学研基地 Deep neural network based biomedical text serialization labeling system and method
CN104615767A (en) * 2015-02-15 2015-05-13 百度在线网络技术(北京)有限公司 Searching-ranking model training method and device and search processing method
WO2019052561A1 (en) * 2017-09-18 2019-03-21 同方威视技术股份有限公司 Check method and check device, and computer-readable medium
CN107748783A (en) * 2017-10-24 2018-03-02 天津大学 A kind of multi-tag company based on sentence vector describes file classification method
CN107798624A (en) * 2017-10-30 2018-03-13 北京航空航天大学 A kind of technical label in software Ask-Answer Community recommends method
CN108334499A (en) * 2018-02-08 2018-07-27 海南云江科技有限公司 A kind of text label tagging equipment, method and computing device
CN109086267A (en) * 2018-07-11 2018-12-25 南京邮电大学 A kind of Chinese word cutting method based on deep learning
CN110263150A (en) * 2019-03-05 2019-09-20 腾讯科技(深圳)有限公司 Document creation method, device, computer equipment and storage medium
CN110096572A (en) * 2019-04-12 2019-08-06 平安普惠企业管理有限公司 A kind of sample generating method, device and computer-readable medium
CN110209822A (en) * 2019-06-11 2019-09-06 中译语通科技股份有限公司 Sphere of learning data dependence prediction technique based on deep learning, computer
CN110210037A (en) * 2019-06-12 2019-09-06 四川大学 Category detection method towards evidence-based medicine EBM field
CN110334080A (en) * 2019-06-26 2019-10-15 广州探迹科技有限公司 A kind of construction of knowledge base method for realizing autonomous learning
CN110413999A (en) * 2019-07-17 2019-11-05 新华三大数据技术有限公司 Entity relation extraction method, model training method and relevant apparatus
CN111104802A (en) * 2019-12-11 2020-05-05 中国平安财产保险股份有限公司 Method for extracting address information text and related equipment
CN111209399A (en) * 2020-01-02 2020-05-29 联想(北京)有限公司 Text classification method and device and electronic equipment
CN111368079A (en) * 2020-02-28 2020-07-03 腾讯科技(深圳)有限公司 Text classification method, model training method, device and storage medium
CN111339260A (en) * 2020-03-02 2020-06-26 北京理工大学 BERT and QA thought-based fine-grained emotion analysis method

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
李洋 et al.: "Text sentiment analysis based on feature fusion of CNN and BiLSTM networks", Journal of Computer Applications (《计算机应用》) *
王廷银 et al.: "Emergency communication method for nuclear radiation monitoring based on BeiDou RDSS", Computer Systems & Applications (《计算机系统应用》) *
青晨 et al.: "Research progress on image semantic segmentation with deep convolutional neural networks", Journal of Image and Graphics (《中国图象图形学报》) *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114091472A (en) * 2022-01-20 2022-02-25 北京零点远景网络科技有限公司 Training method of multi-label classification model
CN115358206A (en) * 2022-10-19 2022-11-18 上海浦东华宇信息技术有限公司 Text typesetting method and system
CN115358206B (en) * 2022-10-19 2023-03-24 上海浦东华宇信息技术有限公司 Text typesetting method and system
CN115470354A (en) * 2022-11-03 2022-12-13 杭州实在智能科技有限公司 Method and system for identifying nested and overlapped risk points based on multi-label classification
CN115470354B (en) * 2022-11-03 2023-08-22 杭州实在智能科技有限公司 Method and system for identifying nested and overlapped risk points based on multi-label classification
CN116304064A (en) * 2023-05-22 2023-06-23 中电云脑(天津)科技有限公司 Text classification method based on extraction

Similar Documents

Publication Publication Date Title
Devika et al. Sentiment analysis: a comparative study on different approaches
CN112860889A (en) BERT-based multi-label classification method
CN112231447B (en) Method and system for extracting Chinese document events
CN110222160A (en) Intelligent semantic document recommendation method, device and computer readable storage medium
CN108090070B (en) Chinese entity attribute extraction method
CN109002473B (en) Emotion analysis method based on word vectors and parts of speech
CN110362819B (en) Text emotion analysis method based on convolutional neural network
CN111104510B (en) Text classification training sample expansion method based on word embedding
CN107168956B (en) Chinese chapter structure analysis method and system based on pipeline
Nasim et al. Sentiment analysis on Urdu tweets using Markov chains
CN112434164B (en) Network public opinion analysis method and system taking topic discovery and emotion analysis into consideration
CN112199501A (en) Scientific and technological information text classification method
CN115952292B (en) Multi-label classification method, apparatus and computer readable medium
CN115203421A (en) Method, device and equipment for generating label of long text and storage medium
CN114970523B (en) Topic prompting type keyword extraction method based on text semantic enhancement
CN114416979A (en) Text query method, text query equipment and storage medium
CN114936277A (en) Similarity problem matching method and user similarity problem matching system
CN110110087A (en) A kind of Feature Engineering method for Law Text classification based on two classifiers
CN114491062B (en) Short text classification method integrating knowledge graph and topic model
CN114491024A (en) Small sample-based specific field multi-label text classification method
CN113590827B (en) Scientific research project text classification device and method based on multiple angles
Dhar et al. Bengali news headline categorization using optimized machine learning pipeline
CN113987175A (en) Text multi-label classification method based on enhanced representation of medical topic word list
CN111368532B (en) Topic word embedding disambiguation method and system based on LDA
CN110888983B (en) Positive and negative emotion analysis method, terminal equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
Application publication date: 20210528