CN114328930A - Text classification method and system based on entity extraction - Google Patents

Text classification method and system based on entity extraction

Info

Publication number
CN114328930A
Authority
CN
China
Prior art keywords
text
similarity
keywords
keyword
layer
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111666994.7A
Other languages
Chinese (zh)
Inventor
章明珠
钟志成
刘宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chengdu Siwei Century Technology Co ltd
Original Assignee
Chengdu Siwei Century Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chengdu Siwei Century Technology Co ltd filed Critical Chengdu Siwei Century Technology Co ltd
Priority to CN202111666994.7A priority Critical patent/CN114328930A/en
Publication of CN114328930A publication Critical patent/CN114328930A/en
Pending legal-status Critical Current

Abstract

The invention discloses a text classification method and system based on entity extraction. Keywords are first customized for each text category, and the following steps are then executed. S100: perform character-based entity labeling of the customized keywords in texts of known categories to obtain training samples. S200: perform entity recognition training on the keyword extraction model using the training samples. S300: extract keywords from the text to be classified using the trained keyword extraction model. S400: use a similarity detection model to calculate the similarity between the extracted keywords and the customized keywords of each text category. S500: classify the text to be classified based on the similarities. The method is highly reusable and more accurate, remains applicable to similarity detection when the texts are numerous or very long, and can be applied to the automatic classification of text data in current enterprise data assets.

Description

Text classification method and system based on entity extraction
Technical Field
The invention relates to the technical field of text processing, in particular to a text classification method and system based on entity extraction.
Background
Most existing text similarity detection methods use Simhash similarity, whose steps are summarized as follows:
(1) Segment the text with a keyword extraction method and take the N words (feature) with the highest weights in the document, together with their weights (weight), to obtain a set (feature: weight) of length N. The larger the weight, the more important the word is to the text.
(2) Apply an ordinary hash to each word in the set (feature: weight) to obtain a corresponding 64-bit binary number (hash), so that the set (feature: weight) is converted into a set (hash: weight) of length N.
(3) For each binary number (hash) in the set (hash: weight), take the positive weight at every position whose bit is 1 and the negative weight at every position whose bit is 0, which yields N lists of length 64. For example, suppose the hashed binary number and weight of a word are (010111: 5); only 6 bits are shown here for illustration, whereas in practice the binary number is usually 64 bits. The list obtained after step (3) is then [-5, 5, -5, 5, 5, 5].
(4) Accumulate the N lists obtained in step (3) position by position to obtain a single list of length 64. For example, accumulating the lists [-5, 5, -5, 5, 5, 5], [-3, -3, -3, 3, -3, 3], and [1, -1, -1, 1, 1, 1] position by position gives the list [-7, 1, -9, 9, 3, 9].
(5) Examine the sign of each value in the list obtained in step (4): a negative value gives 0 at that position and a positive value gives 1. The result is the Simhash value of the text. For example, the list [-7, 1, -9, 9, 3, 9] gives 010111, i.e. the Simhash value of the text is 010111.
(6) XOR the Simhash values of the two texts. If the number of 1s in the result exceeds a user-defined threshold M, the two texts are judged dissimilar; otherwise they are judged similar.
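Steps (3) to (6) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, MD5 is an arbitrary stand-in for the unspecified "ordinary hash", and an accumulated value of exactly zero is mapped to 0 because the description covers only positive and negative values.

```python
import hashlib


def hash_bits(word, n_bits=64):
    """Step (2): map a word to an n_bits-long binary string (MD5 is an assumed choice)."""
    digest = hashlib.md5(word.encode("utf-8")).digest()
    value = int.from_bytes(digest, "big") % (1 << n_bits)
    return format(value, "0{}b".format(n_bits))


def weighted_list(bits, weight):
    """Step (3): +weight where the bit is 1, -weight where the bit is 0."""
    return [weight if b == "1" else -weight for b in bits]


def fingerprint(pairs):
    """Steps (3)-(5): accumulate position by position, then keep only the signs."""
    lists = [weighted_list(bits, weight) for bits, weight in pairs]
    totals = [sum(column) for column in zip(*lists)]
    # Assumption: a total of exactly zero is treated like a negative value.
    return "".join("1" if total > 0 else "0" for total in totals)


def hamming(a, b):
    """Step (6): number of 1 bits in the XOR of two fingerprints."""
    return sum(x != y for x, y in zip(a, b))


def simhash(features, n_bits=64):
    """Full pipeline for a {word: weight} dictionary."""
    return fingerprint([(hash_bits(w, n_bits), wt) for w, wt in features.items()])


# The 6-bit worked example from steps (3)-(5).
example = [("010111", 5), ("000101", 3), ("100111", 1)]
print(fingerprint(example))
```

Two texts would then be compared with `hamming(simhash(a), simhash(b))` against the threshold M.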
This text similarity detection method is conceptually simple: it first extracts the keywords of each text and then measures the similarity of the two texts from how frequently the words occur. Its shortcoming is equally obvious: it cannot recognize near-synonyms and synonyms. In practice, words with similar meanings, such as "compensation" and "indemnity", are treated as two different words, so two texts that are in fact similar are judged dissimilar. Measuring the similarity of two texts accurately therefore also requires semantic analysis.
When text similarity detection is performed, keywords must first be extracted from the text, and similarity is then detected based on how frequently the keywords occur. At present, keyword extraction methods are mainly based either on statistical features (such as the TF-IDF method) or on a word graph model (such as the PageRank and TextRank algorithms). Methods based on statistical features extract keywords using statistical information about the words in a text. Methods based on a word graph model first construct a language network graph of the text and then analyze the graph to find the words or phrases that play an important role; these are taken as the keywords of the text. However, current keyword extraction methods share the following defect: words that occur infrequently in a text but are nonetheless critical are easily ignored. In a resume, for example, words such as "name", "graduation school", and "work experience" are important words that characterize the text even though they occur infrequently. Existing keyword extraction methods often ignore such words and instead select common words that occur repeatedly.
In summary, current text similarity detection methods do not detect similarity based on semantics, so their detection accuracy is not high, and the accuracy of existing keyword extraction methods needs to be improved.
With the rapid development of the internet and big data, text data such as office documents, e-mails, research reports, and laws and regulations have become the main form of enterprise data. How to classify this rapidly growing text data effectively has become one of the major challenges facing enterprises. When a text is too long, the above approach of detecting similarity directly on the text is unsuitable, for two main reasons: the many common words in the text interfere with similarity detection and lower its accuracy; and the amount of computation is large, since texts that are too numerous or too long may not fit into the model in a single pass, which reduces efficiency.
Disclosure of Invention
The invention aims to provide a text classification method and system based on entity extraction, which are efficient and more accurate.
The invention adopts the following technical scheme:
The text classification method based on entity extraction comprises the following steps:
customizing keywords for each text category, and then executing the following steps:
S100: perform character-based entity labeling of the customized keywords in texts of known categories to obtain training samples;
S200: perform entity recognition training on the keyword extraction model using the training samples;
the keyword extraction model comprises an ALBERT layer, a BILSTM layer, and a CRF layer; when a text is input, the keyword extraction model splits the text into single characters and inputs them into the ALBERT layer; the ALBERT layer represents each character as a vector fused with contextual semantic information and inputs the vectors into the BILSTM layer; the BILSTM layer calculates the probability that each character belongs to each label; the CRF layer outputs the label with the maximum probability as the label of each character;
S300: extract keywords from the text to be classified using the trained keyword extraction model;
S400: calculate, using a similarity detection model, the similarity between the extracted keywords and the customized keywords of each text category; the similarity detection model is an ALBERT model;
S500: take the maximum similarity and compare it with a preset similarity threshold; when the maximum similarity is not less than the similarity threshold, classify the text to be classified into the text category corresponding to the maximum similarity; otherwise, the text to be classified does not belong to any of the existing text categories; the similarity threshold is an empirical value.
In some embodiments, step S400 further comprises:
S410: convert the extracted keywords and the customized keywords of each text category into word vectors using the ALBERT model;
S420: splice the word vectors of the extracted keywords into a keyword vector, and splice the word vectors of the customized keywords of each text category into a customized keyword vector for that category;
S430: calculate the similarity between the keyword vector and the customized keyword vector of each text category.
In some embodiments, the similarity in step S430 is cosine similarity.
The text classification system based on entity extraction comprises the following modules:
the entity labeling module, used to perform character-based entity labeling of the customized keywords in texts of known categories to obtain training samples, keywords being customized for each text category before entity labeling;
the entity recognition training module, used to perform entity recognition training on the keyword extraction model using the training samples;
the keyword extraction model comprises an ALBERT layer, a BILSTM layer, and a CRF layer; when a text is input, the keyword extraction model splits the text into single characters and inputs them into the ALBERT layer; the ALBERT layer represents each character as a vector fused with contextual semantic information and inputs the vectors into the BILSTM layer; the BILSTM layer calculates the probability that each character belongs to each label; the CRF layer outputs the label with the maximum probability as the label of each character, and keywords are extracted according to the label of each character;
the keyword extraction module, used to extract keywords from the text to be classified using the trained keyword extraction model;
the similarity calculation module, used to calculate, using a similarity detection model, the similarity between the extracted keywords and the customized keywords of each text category; the similarity detection model is an ALBERT model;
the classification module, used to obtain the maximum similarity and compare it with a preset similarity threshold; when the maximum similarity is not less than the similarity threshold, the text to be classified is classified into the text category corresponding to the maximum similarity; otherwise, the text to be classified does not belong to any of the existing text categories; the similarity threshold is an empirical value.
In some embodiments, the similarity calculation module further comprises the following sub-modules:
the word vector conversion sub-module, used to convert the extracted keywords and the customized keywords of each text category into word vectors using the ALBERT model;
the splicing sub-module, used to splice the word vectors of the extracted keywords into a keyword vector and to splice the word vectors of the customized keywords of each text category into a customized keyword vector for that category;
the calculation sub-module, used to calculate the similarity between the keyword vector and the customized keyword vector of each text category.
Compared with the prior art, the invention has the following characteristics and beneficial effects:
(1) The method applies the ALBERT model in the keyword extraction model and abstracts keywords as entities, so that extracting keywords becomes a process of extracting entities, which can improve the accuracy of keyword extraction. Keywords can be extracted without word segmentation, which avoids the interference that inaccurate word segmentation causes in text similarity detection. The invention is therefore highly reusable and more accurate.
(2) The keyword extraction model is trained starting from the pre-trained ALBERT model, so training requires only a small number of labeled samples, which significantly reduces the manual labeling workload.
(3) The method overcomes the current inability to recognize near-synonyms and synonyms, achieves higher similarity detection accuracy, remains applicable to similarity detection when the texts are numerous or very long, and can be applied to the automatic classification of text data in current enterprise data assets.
Drawings
FIG. 1 is a flow chart of the method of the present invention;
FIG. 2 is a schematic diagram of the principle of the ALBERT + BILSTM + CRF combined model of the present invention.
Detailed Description
Embodiments of the invention are described in detail below with reference to the accompanying drawings. The specific embodiments described are merely some examples of the invention, not all of them. All other embodiments that a person skilled in the art can derive from the described embodiments without inventive effort fall within the scope of protection of the invention.
The invention relates to a text classification method and system based on entity extraction. The underlying idea is as follows: construct a keyword extraction model based on the combined ALBERT + BILSTM + CRF model, train it on texts of known categories, and use the trained model to extract keywords from the text to be classified; construct a similarity detection model from an ALBERT model, use it to convert the extracted keywords into the corresponding word vectors, calculate the similarity between the word vectors of the text to be classified and those of the keywords of each known text category, and classify the text to be classified based on the similarities. In specific applications the text categories include contract texts, patent texts, financial reports, web page content, resume texts, legal texts, and the like, and the categories can be customized to suit the actual situation.
The keyword extraction model consists mainly of an ALBERT layer, a BILSTM layer, and a CRF layer, implemented with an ALBERT model, a BILSTM model, and a CRF model respectively, all of which are existing models. The ALBERT model is a Transformer-based bidirectional encoder representation model that can represent characters and words as vectors. The BILSTM model is a bidirectional long short-term memory recurrent network model, used here for classification. The CRF model is a conditional random field, used here to add constraints.
The ALBERT model is markedly effective for entity recognition. The method introduces the ALBERT model into the keyword extraction model and abstracts keywords as entities to be extracted, so that extracting keywords becomes a process of extracting entities. In the invention, keywords are customized for each category of text. For resume texts, for example, "name", "graduation school", "work experience", and the like can be customized as keywords; for legal texts, "plaintiff", "defendant", and the like can be customized as keywords. The extracted keywords are therefore representative of each category of text and reflect the text category more accurately.
Conventionally, a text must be segmented into words before keywords are extracted, so the accuracy of the word segmentation strongly affects the keyword extraction result. The ALBERT model, however, can be trained on characters and is not constrained by word boundaries, so the keyword extraction model needs no word segmentation, which avoids the interference of inaccurate word segmentation with the text similarity detection result.
In the invention, the detailed flow of extracting keywords with the keyword extraction model is as follows:
(1) Perform entity labeling of the keywords that appear in texts of known categories to obtain the training samples.
The keywords used in the method are customized keywords, defined separately for each text category, and these customized keywords are what the subsequent keyword extraction step extracts. Because the invention trains the model on characters, entity labeling is also performed character by character. The labels comprise a non-keyword label, a keyword start-character label, and a keyword subsequent-character label. For example, a sentence in a legal text whose characters are rendered one by one as "weekly, Hua, plan, and, answer, debate, person, debate, marriage" is labeled "[weekly O, Hua O, plan O, and O, answer B-W, debate I-W, person I-W, debate B-W, marriage I-W]". Here O marks content that does not need to be identified, i.e. the non-keyword label; W denotes a proper-noun entity belonging to legal texts, i.e. a customized keyword; B marks the beginning of an entity, i.e. the first character of a keyword, so B-W is the keyword start-character label; I marks the continuation of an entity, i.e. a subsequent character of a keyword, so I-W is the keyword subsequent-character label. In this sentence the two-character term for "divorce", formed by the last two characters, is a proper-noun entity denoted W, so its first character is labeled "B-W" and its second character is labeled "I-W".
(2) Perform entity recognition training on the keyword extraction model using the training samples.
(3) Extract entities, i.e. keywords, from the text to be classified using the trained keyword extraction model.
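The character-level labeling of step (1) can be illustrated with a short Python sketch. It is not the labeling tool used in the invention: the function name `label_characters`, the longest-keyword-first rule, and the placeholder text are all assumptions made for the example.

```python
def label_characters(text, keywords, entity="W"):
    """Return one BIO label per character of `text`.

    Every occurrence of a customized keyword gets B-<entity> on its first
    character and I-<entity> on the remaining ones; all else is "O".
    """
    labels = ["O"] * len(text)
    # Assumption: longer keywords are matched first so that a short keyword
    # never overwrites part of a longer one.
    for keyword in sorted(keywords, key=len, reverse=True):
        if not keyword:
            continue
        start = text.find(keyword)
        while start != -1:
            end = start + len(keyword)
            if all(label == "O" for label in labels[start:end]):
                labels[start] = "B-" + entity
                for i in range(start + 1, end):
                    labels[i] = "I-" + entity
            start = text.find(keyword, start + 1)
    return labels


# Placeholder text: four ordinary characters, then a three-character keyword
# and a two-character keyword, mirroring the label pattern of the example.
print(label_characters("abcdXYZPQ", ["XYZ", "PQ"]))
```

Pairing each character with its label in this way gives training samples in the same form as the labeled sentence in step (1).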
Fig. 2 shows a schematic diagram of the keyword extraction model, which comprises an ALBERT layer, a BILSTM layer, and a CRF layer. The text is input to the keyword extraction model as single characters W1, W2, W3, …, Wn; in this embodiment the maximum length of each piece of input data is 128. The ALBERT layer initializes the characters as vectors. During model training the ALBERT layer iteratively updates these vectors and finally outputs vector representations that incorporate each character's contextual semantic information. The vectors E1, E2, E3, …, En output by the ALBERT layer thus fuse contextual semantic information and serve as the input to the BILSTM layer. The BILSTM layer outputs the probability that each character belongs to each of the annotated labels. Its output has the format [batch_size, num_steps, num_tags], where batch_size is the number of sentences per training batch, num_steps is the length of each training sentence, and num_tags is the number of labels. Finally, the CRF layer outputs, for each character, the label with the maximum probability, denoted TAG1, TAG2, TAG3, …, TAGn.
For example, if the labels annotated in the training samples include O, B-W, and I-W, each training batch contains 16 sentences, and the maximum sentence length is 128, then batch_size = 16, num_steps = 128, and num_tags, the number of labels, is 5.
Without the CRF layer, if the label with the maximum probability among the 5 labels output by the BILSTM layer were taken directly as the output for each character, a logically invalid sequence such as [B-W, O, I-W, O, I-W] could appear. For the example sentence above, restoring the sentence according to the sequence [B-W, O, I-W, O, I-W] would yield the fragment "answering debate". The CRF layer adds constraints, for example that only "I-W" may follow "B-W" and no other label may appear, which ensures that the final prediction conforms to language logic.
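The effect of such constraints can be shown with a small Viterbi decoder over per-character label scores. This is a simplified stand-in rather than the CRF layer of the invention: a trained CRF learns transition scores, whereas the sketch hard-codes one assumed rule (an "I-W" may not follow "O" and may not start a sequence), and the scores are invented for illustration.

```python
import math

TAGS = ["O", "B-W", "I-W"]
# Assumed hard constraint: "I-W" may only follow "B-W" or "I-W".
FORBIDDEN = {("O", "I-W")}


def greedy(scores, tags=TAGS):
    """Per-position argmax: what happens without any constraint layer."""
    return [tags[max(range(len(tags)), key=lambda k: row[k])] for row in scores]


def viterbi(scores, tags=TAGS, forbidden=FORBIDDEN, bad_start=("I-W",)):
    """Best-scoring label path that never uses a forbidden transition."""
    n = len(tags)
    # Best path score ending in each tag at the first position.
    best = [-math.inf if tags[k] in bad_start else scores[0][k] for k in range(n)]
    back = []
    for t in range(1, len(scores)):
        new_best, pointers = [], []
        for k in range(n):
            candidates = [
                -math.inf if (tags[j], tags[k]) in forbidden else best[j]
                for j in range(n)
            ]
            j_star = max(range(n), key=lambda j: candidates[j])
            new_best.append(candidates[j_star] + scores[t][k])
            pointers.append(j_star)
        best = new_best
        back.append(pointers)
    # Trace back from the best final tag.
    k = max(range(n), key=lambda idx: best[idx])
    path = [k]
    for pointers in reversed(back):
        k = pointers[k]
        path.append(k)
    return [tags[idx] for idx in reversed(path)]


# Invented scores for three characters; columns follow TAGS.
scores = [[0.1, 0.8, 0.1], [0.6, 0.1, 0.3], [0.2, 0.1, 0.7]]
print(greedy(scores))   # ['B-W', 'O', 'I-W'], an invalid sequence
print(viterbi(scores))  # ['B-W', 'I-W', 'I-W']
```

The greedy output contains an "I-W" directly after "O", which is exactly the kind of sequence the constraint layer is meant to rule out.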
The training process of the keyword extraction model in the invention is as follows: in each iteration, the CRF layer outputs a label for each character in the training sample, each output label is compared with the annotated label of that character, and the accuracy is calculated; while the accuracy is below the preset accuracy threshold, iterative training is repeated, and training ends once the accuracy reaches the threshold. The trained keyword extraction model assigns a label to each character in the text to be classified, and keywords can be extracted based on these labels.
During keyword extraction, the ALBERT model is used to train the characters into vectors. During text similarity calculation, the ALBERT model is used to train the word vectors of the keywords, converting the extracted keywords into the corresponding word vectors.
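Extracting keywords from the predicted labels amounts to collecting each run of characters that starts at a B- label and continues through I- labels. A minimal Python sketch follows; the function name `extract_keywords` and the placeholder text are assumptions for the example.

```python
def extract_keywords(chars, labels):
    """Collect spans that start at a B- label and continue through I- labels."""
    keywords, current = [], []
    for ch, label in zip(chars, labels):
        if label.startswith("B-"):
            # A new keyword starts; close the previous one if it is still open.
            if current:
                keywords.append("".join(current))
            current = [ch]
        elif label.startswith("I-") and current:
            current.append(ch)
        else:
            # "O", or a stray I- label with no open keyword: close any open span.
            if current:
                keywords.append("".join(current))
            current = []
    if current:
        keywords.append("".join(current))
    return keywords


labels = ["O", "O", "O", "O", "B-W", "I-W", "I-W", "B-W", "I-W"]
print(extract_keywords("abcdXYZPQ", labels))  # ['XYZ', 'PQ']
```

With the label pattern of the legal-text example, this yields one three-character keyword followed by one two-character keyword.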
The similarity detection model is an ALBERT model, and the detailed flow of text similarity detection is as follows:
(4) Convert the extracted keywords and the customized keywords of each text category into word vectors using the ALBERT model.
Suppose the keywords extracted from the text to be classified are ["contract", "Party A", "Party B", ...]. The ALBERT model converts the keywords into the corresponding word vectors: {"contract": [9.79439989e-02, -3.78220007e-02, 3.56911987e-01, ...], "Party A": [3.87089998e-02, -7.86560029e-02, -6.02289997e-02, ...], "Party B": [2.89862990e-01, -9.67399962e-03, -1.52429000e-01, ...]}. Each keyword's word vector has 300 dimensions.
(5) Splice the word vectors of the keywords extracted from the text to be classified into a keyword vector, and splice the word vectors of the customized keywords of each text category into a customized keyword vector for that category.
Continuing the example above, the word vectors of the keywords ["contract", "Party A", "Party B", ...] extracted from the text to be classified are spliced into [9.79439989e-02, -3.78220007e-02, 3.56911987e-01, ..., 3.87089998e-02, -7.86560029e-02, -6.02289997e-02, ..., 2.89862990e-01, -9.67399962e-03, -1.52429000e-01, ...].
(6) Calculate the cosine similarity between the keyword vector of the text to be classified and the customized keyword vector of each text category, using formula (1):
$$\cos\theta = \frac{V_1 \cdot V_2}{\lVert V_1 \rVert \, \lVert V_2 \rVert} \tag{1}$$
In formula (1), V1 and V2 are the keyword vector of the text to be classified and a customized keyword vector, respectively, and cos θ is the cosine similarity of V1 and V2.
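Steps (5) and (6) can be sketched together in Python. The function names are hypothetical, the vectors are truncated to the first three components quoted in the example (the description gives 300 dimensions per keyword), and the keys stand for the three example keywords. The description does not say how spliced vectors of unequal length are compared, so the sketch simply rejects them; returning 0 for an all-zero vector is likewise an assumed convention.

```python
import math


def splice(word_vectors, keywords):
    """Step (5): concatenate the word vectors of `keywords`, in order."""
    spliced = []
    for keyword in keywords:
        spliced.extend(word_vectors[keyword])
    return spliced


def cosine_similarity(v1, v2):
    """Formula (1): dot product divided by the product of the Euclidean norms."""
    if len(v1) != len(v2):
        # Assumption: unequal lengths are an error; the source is silent on this.
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(v1, v2))
    norm1 = math.sqrt(sum(a * a for a in v1))
    norm2 = math.sqrt(sum(b * b for b in v2))
    if norm1 == 0.0 or norm2 == 0.0:
        return 0.0  # assumed convention for an all-zero vector
    return dot / (norm1 * norm2)


# Toy 3-dimensional vectors taken from the leading components in the example.
vectors = {
    "contract": [9.79439989e-02, -3.78220007e-02, 3.56911987e-01],
    "Party A": [3.87089998e-02, -7.86560029e-02, -6.02289997e-02],
    "Party B": [2.89862990e-01, -9.67399962e-03, -1.52429000e-01],
}
text_vector = splice(vectors, ["contract", "Party A", "Party B"])
category_vector = splice(vectors, ["contract", "Party A", "Party B"])
print(len(text_vector), cosine_similarity(text_vector, category_vector))
```

Because concatenation preserves order, the same keywords spliced in a different order give a different vector, so a fixed keyword order per category is implied.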
The text classification method based on entity extraction is built on the keyword extraction and text similarity detection methods described above, and comprises the following specific steps:
First, customize keywords for each text category.
Texts of each category contain representative words that reflect the category. For resume texts, for example, "name", "graduation school", "work experience", and the like are representative words; for legal texts, "plaintiff", "defendant", and the like are representative words. The representative words of each category are customized as the keywords of that category. Thus "name", "graduation school", "work experience", and the like can be customized as keywords of resume texts, and "plaintiff", "defendant", and the like as keywords of legal texts.
Second, train the keyword extraction model using the training samples, and extract the keywords of the text to be classified using the trained model. The principle and implementation of this step are detailed in steps (1)-(3) and are not repeated here.
Third, use the similarity detection model to detect the similarity between the keywords extracted from the text to be classified and the customized keywords of each text category. The principle and implementation of this step are detailed in steps (4)-(6) and are not repeated here.
Fourth, obtain the maximum similarity and compare it with a preset similarity threshold. When the maximum similarity is not less than the threshold, the text to be classified is classified into the text category corresponding to the maximum similarity; otherwise, the text to be classified does not belong to any of the existing text categories. The similarity threshold is an empirical value and can be tuned over several rounds in the actual application scenario to obtain the best value.
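The decision rule of the fourth step is small enough to write out directly. In this Python sketch the function name `classify`, the category names, and the threshold of 0.8 are illustrative assumptions; the description only says the threshold is an empirical value.

```python
def classify(similarities, threshold):
    """Return the category with the highest similarity, or None.

    `similarities` maps each category name to the similarity between the
    text's keyword vector and that category's customized keyword vector.
    """
    if not similarities:
        return None
    best = max(similarities, key=similarities.get)
    # "Not less than" the threshold, so equality still counts as a match.
    return best if similarities[best] >= threshold else None


print(classify({"contract": 0.91, "resume": 0.40}, threshold=0.8))  # contract
print(classify({"contract": 0.55, "resume": 0.40}, threshold=0.8))  # None
```

A return value of `None` corresponds to the case in which the text belongs to none of the existing categories.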
Those skilled in the art will understand that all or part of the steps of the above method may be implemented by a program instructing the relevant hardware. The program may be stored in a computer-readable storage medium and, when executed, performs the following steps: (steps of the method). The storage medium may be, for example, a ROM/RAM, a magnetic disk, or an optical disk.
The above description is only for the preferred embodiment of the present invention, and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.

Claims (5)

1. The text classification method based on entity extraction is characterized by comprising the following steps:
customizing keywords for each text category, and then executing the following steps:
S100: perform character-based entity labeling of the customized keywords in texts of known categories to obtain training samples;
S200: perform entity recognition training on the keyword extraction model using the training samples;
the keyword extraction model comprises an ALBERT layer, a BILSTM layer, and a CRF layer; when a text is input, the keyword extraction model splits the text into single characters and inputs them into the ALBERT layer; the ALBERT layer represents each character as a vector fused with contextual semantic information and inputs the vectors into the BILSTM layer; the BILSTM layer calculates the probability that each character belongs to each label; the CRF layer outputs the label with the maximum probability as the label of each character;
S300: extract keywords from the text to be classified using the trained keyword extraction model;
S400: calculate, using a similarity detection model, the similarity between the extracted keywords and the customized keywords of each text category; the similarity detection model is an ALBERT model;
S500: take the maximum similarity and compare it with a preset similarity threshold; when the maximum similarity is not less than the similarity threshold, classify the text to be classified into the text category corresponding to the maximum similarity; otherwise, the text to be classified does not belong to any of the existing text categories; the similarity threshold is an empirical value.
2. The text classification method based on entity extraction as claimed in claim 1, characterized in that:
step S400 further comprises:
S410: convert the extracted keywords and the customized keywords of each text category into word vectors using the ALBERT model;
S420: splice the word vectors of the extracted keywords into a keyword vector, and splice the word vectors of the customized keywords of each text category into a customized keyword vector for that category;
S430: calculate the similarity between the keyword vector and the customized keyword vector of each text category.
3. The text classification method based on entity extraction as claimed in claim 2, characterized in that:
in step S430, the similarity is cosine similarity.
4. The text classification system based on entity extraction is characterized by comprising the following modules:
the entity labeling module, used to perform character-based entity labeling of the customized keywords in texts of known categories to obtain training samples, keywords being customized for each text category before entity labeling;
the entity recognition training module, used to perform entity recognition training on the keyword extraction model using the training samples;
the keyword extraction model comprises an ALBERT layer, a BILSTM layer, and a CRF layer; when a text is input, the keyword extraction model splits the text into single characters and inputs them into the ALBERT layer; the ALBERT layer represents each character as a vector fused with contextual semantic information and inputs the vectors into the BILSTM layer; the BILSTM layer calculates the probability that each character belongs to each label; the CRF layer outputs the label with the maximum probability as the label of each character, and keywords are extracted according to the label of each character;
the keyword extraction module, used to extract keywords from the text to be classified using the trained keyword extraction model;
the similarity calculation module, used to calculate, using a similarity detection model, the similarity between the extracted keywords and the customized keywords of each text category; the similarity detection model is an ALBERT model;
the classification module, used to obtain the maximum similarity and compare it with a preset similarity threshold; when the maximum similarity is not less than the similarity threshold, the text to be classified is classified into the text category corresponding to the maximum similarity; otherwise, the text to be classified does not belong to any of the existing text categories; the similarity threshold is an empirical value.
5. The text classification system based on entity extraction of claim 4, wherein:
the similarity calculation module further comprises the following sub-modules:
the word vector conversion sub-module is used for converting the extracted keywords and the user-defined keywords of each text category into word vectors by using an ALBERT model;
the splicing sub-module is used for splicing the word vectors of the extracted keywords into a keyword vector, and splicing the word vectors of the user-defined keywords of each text category into a user-defined keyword vector for that category;
and the calculation sub-module is used for calculating the similarity between the keyword vector and the user-defined keyword vector of each text category.
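The splicing and calculation sub-modules can be sketched as follows. The claim does not name a similarity measure, so cosine similarity is an assumption here, as is zero-padding the shorter vector when the two spliced vectors differ in length. The short numeric lists stand in for ALBERT word vectors, and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def splice(word_vectors: Sequence[Sequence[float]]) -> List[float]:
    """Concatenate per-keyword word vectors into one keyword vector."""
    spliced: List[float] = []
    for vector in word_vectors:
        spliced.extend(vector)
    return spliced


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; the shorter vector is zero-padded (assumption)."""
    length = max(len(a), len(b))
    a_padded = list(a) + [0.0] * (length - len(a))
    b_padded = list(b) + [0.0] * (length - len(b))
    dot = sum(x * y for x, y in zip(a_padded, b_padded))
    norm_a = math.sqrt(sum(x * x for x in a_padded))
    norm_b = math.sqrt(sum(y * y for y in b_padded))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # no direction to compare against
    return dot / (norm_a * norm_b)


# Toy stand-ins for ALBERT word vectors of two extracted keywords
# and of two user-defined keywords of one category.
extracted = splice([[1.0, 0.0], [0.0, 1.0]])
custom = splice([[1.0, 0.0], [0.0, 1.0]])
similarity = cosine_similarity(extracted, custom)
```

Computing one such similarity per text category yields the scores that the classification module compares against the threshold.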
CN202111666994.7A 2021-12-31 2021-12-31 Text classification method and system based on entity extraction Pending CN114328930A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111666994.7A CN114328930A (en) 2021-12-31 2021-12-31 Text classification method and system based on entity extraction

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111666994.7A CN114328930A (en) 2021-12-31 2021-12-31 Text classification method and system based on entity extraction

Publications (1)

Publication Number Publication Date
CN114328930A (en) 2022-04-12

Family

ID=81020355

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111666994.7A Pending CN114328930A (en) 2021-12-31 2021-12-31 Text classification method and system based on entity extraction

Country Status (1)

Country Link
CN (1) CN114328930A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115827815A (en) * 2022-11-17 2023-03-21 西安电子科技大学广州研究院 Keyword extraction method and device based on small sample learning
CN115827815B (en) * 2022-11-17 2023-12-29 西安电子科技大学广州研究院 Keyword extraction method and device based on small sample learning

Similar Documents

Publication Publication Date Title
CN108874776B (en) Junk text recognition method and device
CN111274394B (en) Method, device and equipment for extracting entity relationship and storage medium
CN109460455B (en) Text detection method and device
Zhang et al. Sentiment Classification Based on Piecewise Pooling Convolutional Neural Network.
CN111291566B (en) Event main body recognition method, device and storage medium
CN109271489B (en) Text detection method and device
CN108287911B (en) Relation extraction method based on constrained remote supervision
CN107341143B (en) Sentence continuity judgment method and device and electronic equipment
CN111611807B (en) Keyword extraction method and device based on neural network and electronic equipment
CN113591483A (en) Document-level event argument extraction method based on sequence labeling
CN114297987B (en) Document information extraction method and system based on text classification and reading understanding
CN115470354B (en) Method and system for identifying nested and overlapped risk points based on multi-label classification
CN114416979A (en) Text query method, text query equipment and storage medium
CN113033183B (en) Network new word discovery method and system based on statistics and similarity
CN114218945A (en) Entity identification method, device, server and storage medium
CN112287100A (en) Text recognition method, spelling error correction method and voice recognition method
CN112069312A (en) Text classification method based on entity recognition and electronic device
CN111178080A (en) Named entity identification method and system based on structured information
CN113204956B (en) Multi-model training method, abstract segmentation method, text segmentation method and text segmentation device
CN113486178B (en) Text recognition model training method, text recognition method, device and medium
CN114328930A (en) Text classification method and system based on entity extraction
CN112784580A (en) Financial data analysis method and device based on event extraction
CN116304023A (en) Method, system and storage medium for extracting bidding elements based on NLP technology
CN116976341A (en) Entity identification method, entity identification device, electronic equipment, storage medium and program product
CN115563278A (en) Question classification processing method and device for sentence text

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination