LU504829B1 - Text classification method, computer readable storage medium and system - Google Patents
Text classification method, computer readable storage medium and system
- Publication number
- LU504829B1
- Authority
- LU
- Luxembourg
- Prior art keywords
- text
- training
- neural network
- character
- recurrent neural
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/334—Query execution
- G06F16/3347—Query execution using vector based model
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
- G06V30/18—Extraction of features or characteristics of the image
- G06V30/1801—Detecting partial patterns, e.g. edges or contours, or configurations, e.g. loops, corners, strokes or intersections
- G06V30/18019—Detecting partial patterns, e.g. edges or contours, or configurations, e.g. loops, corners, strokes or intersections by matching or filtering
- G06V30/18038—Biologically-inspired filters, e.g. difference of Gaussians [DoG], Gabor filters
- G06V30/18048—Biologically-inspired filters, e.g. difference of Gaussians [DoG], Gabor filters with interaction between the responses of different filters, e.g. cortical complex cells
- G06V30/18057—Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
- G06V30/19—Recognition using electronic means
- G06V30/191—Design or setup of recognition systems or techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
- G06V30/19147—Obtaining sets of training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
- G06V30/19—Recognition using electronic means
- G06V30/191—Design or setup of recognition systems or techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
- G06V30/19173—Classification techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
- G06V30/26—Techniques for post-processing, e.g. correcting the recognition result
- G06V30/262—Techniques for post-processing, e.g. correcting the recognition result using context analysis, e.g. lexical, syntactic or semantic context
- G06V30/274—Syntactic or semantic context, e.g. balancing
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- General Engineering & Computer Science (AREA)
- Artificial Intelligence (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Data Mining & Analysis (AREA)
- Multimedia (AREA)
- Evolutionary Computation (AREA)
- Molecular Biology (AREA)
- Biomedical Technology (AREA)
- Life Sciences & Earth Sciences (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Computing Systems (AREA)
- Software Systems (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Mathematical Physics (AREA)
- Biophysics (AREA)
- Databases & Information Systems (AREA)
- Biodiversity & Conservation Biology (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention relates to a text classification method, a computer readable storage medium and a system. The method includes: obtaining the text to be classified; obtaining multiple characters and multiple words representing the text to be classified; obtaining multiple character vectors and multiple word vectors; inputting the multiple character vectors into a stacked bidirectional recurrent neural network based on character vector to obtain a classification result based on character vector, and inputting the multiple word vectors into a stacked bidirectional recurrent neural network based on word vector to obtain a classification result based on word vector; and counting the number of characters and the number of words representing the text to be classified: if the relationship between the number of characters and the number of words satisfies the set threshold, the classification result based on character vector is selected; otherwise, the classification result based on word vector is selected.
Description
DESCRIPTION
TEXT CLASSIFICATION METHOD, COMPUTER READABLE
STORAGE MEDIUM AND SYSTEM
The invention relates to the field of natural language processing, in particular to a text classification method, a computer readable storage medium and a system.
With the development of Internet technology, people can use the Internet to publish a wide variety of comments, which produces a large amount of text information. This text information expresses people's selection tendencies and provides a platform for information display and communication. Obtaining selection-tendency information from text has therefore become a research topic. In the process of making the invention, however, the inventor found that existing ways of obtaining such selection information are inefficient and their analysis accuracy is low.
Based on the above problems, the purpose of the invention is to provide a text classification method, which has the advantages of improving accuracy and efficiency.
A text classification method includes the following steps: obtaining the text to be classified; subjecting the text to be classified to character dividing and word dividing to obtain multiple characters and multiple words representing the text to be classified; obtaining multiple character vectors and multiple word vectors by vectorizing the multiple characters and the multiple words respectively; constructing a stacked bidirectional recurrent neural network based on character vector and a stacked bidirectional recurrent neural network based on word vector, inputting the multiple character vectors into the stacked bidirectional recurrent neural network based on character vector to obtain the classification result based on character vector, and inputting the multiple word vectors into the stacked bidirectional recurrent neural network based on word vector to obtain the classification result based on word vector, wherein each stacked bidirectional recurrent neural network includes three BLSTM layers and one Sigmoid layer; each BLSTM layer is stacked from multiple LSTM units, the multiple LSTM units in each layer are distributed hierarchically and are set with corresponding weight parameters, and each LSTM unit takes as input the output of the LSTM unit in the layer above and/or the preceding LSTM unit in the same layer, the output result finally being obtained in the Sigmoid layer; and counting the number of characters and the number of words representing the text to be classified: if the number of characters is less than or equal to half of the number of words, the classification result based on character vector is selected; otherwise, the classification result based on word vector is selected.
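The architecture described above can be sketched as a forward pass. The following NumPy sketch is illustrative only: the layer sizes, random initialization, and helper names (`lstm_step`, `bilstm_layer`, `init`) are our assumptions, not the patented implementation. It shows three stacked bidirectional LSTM layers whose per-position outputs feed a single sigmoid output unit.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h, c, W, U, b):
    # One LSTM unit: gates computed from the current input x and previous state h.
    z = W @ x + U @ h + b
    i, f, o, g = np.split(z, 4)
    i, f, o, g = sigmoid(i), sigmoid(f), sigmoid(o), np.tanh(g)
    c = f * c + i * g
    h = o * np.tanh(c)
    return h, c

def lstm_pass(xs, W, U, b, hidden):
    h, c, out = np.zeros(hidden), np.zeros(hidden), []
    for x in xs:
        h, c = lstm_step(x, h, c, W, U, b)
        out.append(h)
    return out

def bilstm_layer(xs, params_f, params_b, hidden):
    # Bidirectional layer: one LSTM runs forward, one backward; states are concatenated.
    fwd = lstm_pass(xs, *params_f, hidden)
    bwd = lstm_pass(xs[::-1], *params_b, hidden)[::-1]
    return [np.concatenate([f, b]) for f, b in zip(fwd, bwd)]

def init(in_dim, hidden, rng):
    return (rng.standard_normal((4 * hidden, in_dim)) * 0.1,
            rng.standard_normal((4 * hidden, hidden)) * 0.1,
            np.zeros(4 * hidden))

rng = np.random.default_rng(0)
emb, hidden = 8, 16                                  # toy embedding and hidden sizes
xs = [rng.standard_normal(emb) for _ in range(5)]    # 5 character (or word) vectors

# Three stacked BLSTM layers, then one sigmoid output unit.
for layer in range(3):
    in_dim = emb if layer == 0 else 2 * hidden
    xs = bilstm_layer(xs, init(in_dim, hidden, rng), init(in_dim, hidden, rng), hidden)

w_out = rng.standard_normal(2 * hidden) * 0.1
score = sigmoid(w_out @ xs[-1])    # probability-like output for "selected"
print(round(float(score), 3))
```

The score lies in (0, 1) because of the sigmoid output, matching the 1 = selected / 0 = not selected convention used later in the description.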
By using the stack bidirectional recurrent neural network, the high-level features representing the semantics of the text can be obtained by analyzing the content of the context in the text to be classified, the accuracy and efficiency are improved by fusing the character information and word information of the text to be classified.
Furthermore, the steps of constructing the stacked bidirectional recurrent neural network based on character vector include: obtaining multiple training texts and the corresponding selection label for each training text; dividing each training text separately to obtain multiple characters representing each training text; vectorizing the multiple characters representing each training text to obtain multiple character vectors; and inputting the multiple character vectors corresponding to each training text, together with the corresponding selection label, into the stacked bidirectional recurrent neural network based on character vector for training, the parameters of the network being optimized to obtain the stacked bidirectional recurrent neural network based on character vector.
Furthermore, the steps of constructing the stacked bidirectional recurrent neural network based on word vector include: obtaining multiple training texts and the corresponding selection label for each training text; dividing each training text separately to obtain multiple words representing each training text; vectorizing the multiple words representing each training text to obtain multiple word vectors; and inputting the multiple word vectors corresponding to each training text, together with the corresponding selection label, into the stacked bidirectional recurrent neural network based on word vector for training, the parameters of the network being optimized to obtain the stacked bidirectional recurrent neural network based on word vector.
Furthermore, character segmentation and word segmentation are performed on the text to be classified and/or the training text by the hidden Markov model to obtain multiple characters and multiple words, so that fast and accurate character segmentation and word segmentation of the text are achieved through prediction and evaluation of the text.
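A hidden-Markov-model segmenter of the kind referred to above is commonly implemented as Viterbi decoding over B/M/E/S character tags (begin, middle, or end of a multi-character word, or a single-character word). The toy sketch below uses hand-set, purely illustrative probabilities and an uninformative emission model; a real segmenter estimates all of these quantities from an annotated corpus, so this is a sketch of the decoding idea only.

```python
import math

# Toy HMM word segmentation: tag each character B(egin), M(iddle), E(nd), or S(ingle).
STATES = "BMES"
# Hand-set log-probabilities, for illustration only.
START = {"B": math.log(0.6), "S": math.log(0.4), "M": -1e9, "E": -1e9}
TRANS = {
    "B": {"M": math.log(0.3), "E": math.log(0.7)},
    "M": {"M": math.log(0.3), "E": math.log(0.7)},
    "E": {"B": math.log(0.5), "S": math.log(0.5)},
    "S": {"B": math.log(0.5), "S": math.log(0.5)},
}

def viterbi(chars, emit):
    # emit(state, char) -> log P(char | state)
    V = [{s: START[s] + emit(s, chars[0]) for s in STATES}]
    back = []
    for ch in chars[1:]:
        row, ptr = {}, {}
        for s in STATES:
            score, prev = max((V[-1][p] + TRANS[p].get(s, -1e9), p) for p in STATES)
            row[s] = score + emit(s, ch)
            ptr[s] = prev
        V.append(row)
        back.append(ptr)
    tag = max(V[-1], key=V[-1].get)      # best final tag, then trace back
    tags = [tag]
    for ptr in reversed(back):
        tag = ptr[tag]
        tags.append(tag)
    return tags[::-1]

def segment(text, emit):
    # A word ends at every E or S tag.
    tags, words, cur = viterbi(list(text), emit), [], ""
    for ch, t in zip(text, tags):
        cur += ch
        if t in "ES":
            words.append(cur)
            cur = ""
    if cur:
        words.append(cur)
    return words

def uniform(state, ch):
    return 0.0   # uninformative emissions, so only START/TRANS shape the result

print(segment("abcd", uniform))   # → ['ab', 'cd']
```

With these toy transition probabilities the decoder prefers two-character words, which is why the four characters split into two pairs.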
Furthermore, word2vec is used to vectorize the multiple characters and multiple words that represent the text to be classified and/or the training text, obtaining multiple character vectors and multiple word vectors and thereby realizing fast vectorization.
Furthermore, the relationship between the number of characters and the number of words satisfies the set threshold when the number of characters is less than or equal to half of the number of words. The number of characters and the number of words segmented from the text have a great influence on the classification result; therefore, by analyzing these two counts for the text to be classified, the better classification result can be selected and the text classified more accurately.
The invention also provides a computer readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps of the text classification method as described in any of the above content.
The invention also provides a text classification system, including a memory, a processor, and a computer program stored in the memory and executable by the processor. The processor implements the steps of the text classification method as described above when executing the computer program.
The invention is described in detail in the following for a better understanding and implementation.
Fig. 1 is the flow chart of the text classification method in the embodiment of the invention;
Fig. 2 is the flow chart of the stack bidirectional recurrent neural network based on character vector in the embodiment of the invention;
Fig. 3 is the flow chart of the stack bidirectional recurrent neural network based on word vector in the embodiment of the invention;
Fig. 4 is the schematic diagram of the stack bidirectional recurrent neural network based on character vector and word vector in the embodiment of the invention.
Referring to Fig. 1, the flow chart of the text classification method in the embodiment of the invention, the text classification method includes the following steps:
Step S1: obtaining the text to be classified.
In an embodiment, the text to be classified is a text with a selection tendency: for example, a text expressing positive emotions such as preference and approval for a person, event, or product, indicating that the person, event, or product is chosen; or a text expressing negative emotions such as disgust and opposition toward a person, event, or product, indicating that it is not chosen.
Step S2: subjecting the text to be classified to character dividing and word dividing to obtain multiple characters and multiple words representing the text to be classified;
Step S3: obtaining multiple character vectors and multiple word vectors by vectorizing the multiple characters and the multiple words respectively.
In one embodiment, vectorization transforms symbolic information in the form of natural language into digital information in the form of a vector, so that machine learning and processing can be carried out; for example, 'good' may be expressed as [0 0 0 0 0 0 0 1 0 0 ...].
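The bracketed example above corresponds to a one-hot encoding, the simplest way to turn symbols into numeric vectors. The sketch below is illustrative only: the patent itself uses word2vec, which produces dense learned vectors rather than one-hot vectors, and the token list here is hypothetical.

```python
def one_hot_vocab(tokens):
    """Map each distinct token to a one-hot vector over the vocabulary."""
    vocab = sorted(set(tokens))
    index = {tok: i for i, tok in enumerate(vocab)}

    def vectorize(tok):
        v = [0] * len(vocab)
        v[index[tok]] = 1   # a single 1 at the token's vocabulary position
        return v

    return vocab, vectorize

tokens = ["this", "product", "is", "good", "really", "good"]
vocab, vec = one_hot_vocab(tokens)
print(vocab)         # sorted distinct tokens
print(vec("good"))   # → [1, 0, 0, 0, 0]
```

Each vector has exactly one nonzero entry, which is the property the '[0 0 0 ... 1 ...]' example in the text illustrates.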
Step S4: constructing a stack bidirectional recurrent neural network based on character vector and a stack bidirectional recurrent neural network based on word vector, and multiple described character vectors are input into the stack bidirectional recurrent neural network based on character vector to obtain the classification results based on character vector, and multiple described word vectors are input into the stack bidirectional recurrent neural network based on word vector to obtain the classification results based on word vector;
In an embodiment, the classification result can indicate positive emotions such as preference and approval for a person, event, or product, meaning the text chooses that person, event, or product; or negative emotions such as disgust and opposition toward a person, event, or product, meaning the text does not choose it. In machine learning and processing, optionally, '1' denotes the selected text result and '0' denotes the unselected text result.
Step S5: counting the number of characters and the number of words representing the text to be classified; if the number of characters is less than or equal to half of the number of words, the classification result based on character vector is selected; otherwise, the classification result based on word vector is selected.
In one embodiment, the inventor found during the creation process that the number of characters and the number of words segmented from the text have a great influence on the classification result; by analyzing these counts for the text to be classified, the better classification result can be selected. Specifically, the relationship between the counts satisfies the set threshold when the number of characters is less than or equal to half of the number of words: in that case the classification result based on character vector is more accurate, while if the number of characters is greater than half of the number of words, the classification result based on word vector is more accurate.
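The threshold rule of step S5 can be written down directly. The function below is a straightforward sketch of that rule; the counts and the 0/1 results in the usage lines are hypothetical values, not data from the patent.

```python
def choose_result(num_chars, num_words, char_result, word_result):
    """Apply the step-S5 threshold: prefer the character-vector result
    when the character count is at most half the word count."""
    if num_chars <= num_words / 2:
        return char_result
    return word_result

# Hypothetical counts and per-network outputs (1 = selected, 0 = not selected):
print(choose_result(4, 10, 1, 0))   # 4 <= 5, so the character-vector result: 1
print(choose_result(8, 10, 1, 0))   # 8 > 5, so the word-vector result: 0
```

Both classification results are still computed in step S4; the rule only decides which one is reported.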
By using the stacked bidirectional recurrent neural network, the content of the context in the text to be classified can be analyzed and high-level features representing the semantics of the text can be obtained; accuracy and efficiency are improved by fusing the character information and word information of the text to be classified.
In one embodiment, the text to be classified is subjected to character dividing and word dividing through the hidden Markov model to obtain multiple characters and multiple words representing the text to be classified, so as to perform fast and accurate character segmentation and word segmentation on the text through the prediction and evaluation of the text.
In one embodiment, word2vec is used to vectorize the multiple characters and multiple words that represent the text to be classified and/or the training text, obtaining multiple character vectors and multiple word vectors and thereby realizing fast vectorization.
Please refer to Fig. 2, which is the flow chart of constructing the stacked bidirectional recurrent neural network based on character vector in the embodiment of the invention.
In one embodiment, the steps for constructing a stacked bidirectional recurrent neural network based on character vector include:
Step S411: obtaining multiple training texts and the corresponding selection labels for each training text;
In one embodiment, the multiple training texts are training texts with selection labels from the ChnSentiCorp Chinese sentiment analysis corpus, and/or texts with selection labels from a network data set. The selection label can be a label indicating positive emotions such as preference and approval for a person, event, or product, meaning the text chooses that person, event, or product; or a label indicating negative emotions such as disgust and opposition toward a person, event, or product, meaning the text does not choose it. In machine learning and processing, optionally, '1' denotes the selected text and '0' denotes the unselected text.
Step S412: dividing each training text separately to obtain multiple characters representing each training text;
In one embodiment, each training text is character-divided by the hidden Markov model to obtain the multiple characters representing that training text.
Step S413: vectorizing the multiple characters representing each training text to obtain multiple character vectors;
Step S414: inputting the multiple character vectors corresponding to each training text and the corresponding selection labels into the stacked bidirectional recurrent neural network based on character vector for training, and optimizing the parameters of the network to obtain the stacked bidirectional recurrent neural network based on character vector.
In one embodiment, the stacked bidirectional recurrent neural network based on character vector includes three BLSTM layers and one Sigmoid layer; each BLSTM layer is stacked from multiple LSTM units, and the multiple LSTM units in each layer are distributed hierarchically.
The multiple LSTM units in each layer are set with corresponding weight parameters. Each LSTM unit takes as input the output of the LSTM unit in the layer above and/or the preceding LSTM unit in the same layer, and the output result is finally obtained in the Sigmoid layer. For example, the multiple character vectors corresponding to each training text are input into the stacked bidirectional recurrent neural network based on character vector; after the three BLSTM layers, the output result is obtained in the Sigmoid layer. If the output result does not match the corresponding selection label, the stochastic gradient descent algorithm is used to update and iterate each weight parameter, and the multiple character vectors are used as input to recalculate, until the output result matches the corresponding selection label. By repeating this training a large number of times, the stacked bidirectional recurrent neural network based on character vector is obtained. To prevent over-fitting, the dropout strategy is adopted during training: in each training cycle, some units in a neural layer are first randomly selected and temporarily hidden, and the training and optimization of the neural network for that cycle is then carried out; in the next cycle, other neurons are hidden, and so on until training ends. In one embodiment, dropout is set to 0.5.
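The compare-with-label-then-update loop described above can be illustrated at toy scale. The sketch below trains only a single sigmoid output unit with stochastic gradient descent on one hypothetical example; in the patented method the same kind of update is applied to all weight parameters of the stacked network. The input vector, learning rate, and step count are illustrative assumptions.

```python
import numpy as np

# Toy stand-in for the described training loop: one sigmoid output unit is
# trained with stochastic gradient descent until its output matches the label.
rng = np.random.default_rng(1)
x = rng.standard_normal(16)   # stand-in for the network's final hidden state
y = 1.0                       # selection label: 1 = selected
w = np.zeros(16)              # output weights to be optimized
lr = 0.5

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
losses = []
for step in range(200):
    p = sigmoid(w @ x)                                          # current output
    losses.append(-(y * np.log(p) + (1 - y) * np.log(1 - p)))   # cross-entropy
    w -= lr * (p - y) * x                                       # SGD weight update

print(bool(losses[-1] < losses[0]))   # loss shrinks as the update repeats
```

Each update moves the weights against the gradient of the loss, so the output drifts toward the label, which is the "recalculate until the output matches the label" behavior the text describes.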
Please refer to Fig. 3 and Fig. 4 at the same time. Fig. 3 is the flow chart of the stack bidirectional recurrent neural network based on word vector in the embodiment of the invention.
Fig. 4 is the schematic diagram of the stack bidirectional recurrent neural network based on character vector and word vector in the embodiment of the invention.
In one embodiment, the steps of constructing a stacked bidirectional recurrent neural network based on word vector include:
Step S421: obtaining multiple training texts and the corresponding selection labels for each training text;
In one embodiment, the multiple training texts are training texts with selection labels from the ChnSentiCorp Chinese sentiment analysis corpus, and/or texts with selection labels from a network data set; the selection label can be a label indicating positive emotions such as preference and approval for a person, event, or product, meaning the text chooses that person, event, or product, or a label indicating negative emotions such as disgust and opposition toward a person, event, or product, meaning the text does not choose it. In machine learning and processing, optionally, '1' denotes the selected text and '0' denotes the unselected text.
Step S422: dividing each training text separately to obtain multiple words representing each training text;
In one embodiment, each training text is word-divided by the hidden Markov model to obtain the multiple words representing that training text.
Step S423: vectorizing the multiple words representing each training text to obtain multiple word vectors;
Step S424: inputting the multiple word vectors corresponding to each training text and the corresponding selection labels into the stacked bidirectional recurrent neural network based on word vector for training, and optimizing the parameters of the network to obtain the stacked bidirectional recurrent neural network based on word vector.
In one embodiment, the stacked bidirectional recurrent neural network based on word vector includes three BLSTM layers and one Sigmoid layer; each BLSTM layer is stacked from multiple LSTM units, and the multiple LSTM units in each layer are distributed hierarchically.
The multiple LSTM units in each layer are set with corresponding weight parameters. Each LSTM unit takes as input the output of the LSTM unit in the layer above and/or the preceding LSTM unit in the same layer, and the output result is finally obtained in the Sigmoid layer. For example, the multiple word vectors corresponding to each training text are input into the stacked bidirectional recurrent neural network based on word vector; after the three BLSTM layers, the output result is obtained in the Sigmoid layer. If the output result does not conform to the corresponding selection label, the stochastic gradient descent algorithm is used to update and iterate the weight parameters, and the multiple word vectors are used as input to recalculate, until the output result conforms to the corresponding selection label. By repeating this training a large number of times, the stacked bidirectional recurrent neural network based on word vector is obtained.
To prevent over-fitting, the dropout strategy is likewise adopted during training: in each training cycle, some units in a neural layer are first randomly selected and temporarily hidden, and the training and optimization of the neural network for that cycle is then carried out; in the next cycle, other neurons are hidden, and so on until training ends.
In one embodiment, dropout is set to 0.5.
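Temporarily hiding units with rate 0.5, as described, is commonly implemented as an "inverted" dropout mask, sketched below. The rescaling of the surviving activations is standard practice (so the expected activation is unchanged) rather than something the text specifies, and the activations here are random placeholders.

```python
import numpy as np

# Inverted dropout with rate 0.5: each unit is zeroed with probability 0.5
# during a training cycle, and survivors are rescaled by 1 / (1 - rate).
rng = np.random.default_rng(42)
rate = 0.5
h = rng.standard_normal(1000)        # hidden-layer activations (placeholder)
mask = rng.random(1000) >= rate      # units kept this cycle
h_drop = np.where(mask, h / (1 - rate), 0.0)

print(round(float(mask.mean()), 2))  # roughly half the units survive
```

In the next cycle a fresh mask is drawn, so a different subset of neurons is hidden each time, matching the cycle-by-cycle description above.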
In one embodiment, the training text is character-divided and word-divided by the hidden Markov model to obtain the multiple characters and multiple words of the training text; through prediction and evaluation of the text, the text is segmented quickly and accurately.
In one embodiment, word2vec is used to vectorize the multiple characters and the multiple words of the training text respectively to obtain multiple character vectors and multiple word vectors, realizing fast vectorization.
The invention also provides a computer readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps of the text classification method as described in any of the above content.
The invention also provides a text classification system, including a memory, a processor, and a computer program stored in the memory and executable by the processor. The processor implements the steps of the text classification method as described above when executing the computer program.
By using the stacked bidirectional recurrent neural network, the high-level features representing the semantics of the text can be obtained by analyzing the content of the context in the text to be classified. By fusing the character information and word information of the text to be classified, the accuracy and efficiency are improved.
The above embodiments express only several implementation modes of the invention, and their description is relatively specific and detailed, but they cannot be understood as limiting the scope of the invention. It should be pointed out that ordinary technical personnel in this field can make several variations and improvements without departing from the concept of the invention, and those variations and improvements all fall within the protection scope of the invention.
Claims (8)
1. A text classification method, including the following steps: obtaining the text to be classified; subjecting the text to be classified to character dividing and word dividing to obtain multiple characters and multiple words representing the text to be classified; obtaining multiple character vectors and multiple word vectors by vectorizing the multiple characters and the multiple words respectively; constructing a stacked bidirectional recurrent neural network based on character vector and a stacked bidirectional recurrent neural network based on word vector, inputting the multiple character vectors into the stacked bidirectional recurrent neural network based on character vector to obtain the classification result based on character vector, and inputting the multiple word vectors into the stacked bidirectional recurrent neural network based on word vector to obtain the classification result based on word vector, wherein the stacked bidirectional recurrent neural network includes three BLSTM layers and one Sigmoid layer; each BLSTM layer is stacked from multiple LSTM units, the multiple LSTM units in each layer are distributed hierarchically and are set with corresponding weight parameters, and each LSTM unit takes as input the output of the LSTM unit in the layer above and/or the preceding LSTM unit in the same layer, the output result finally being obtained in the Sigmoid layer; and counting the number of characters and the number of words representing the text to be classified, wherein if the number of characters is less than or equal to half of the number of words, the classification result based on character vector is selected; otherwise, the classification result based on word vector is selected.
2. The text classification method according to claim 1, wherein the steps of constructing a stack bidirectional recurrent neural network based on character vector include: obtaining multiple training texts and the corresponding selection labels for each training text; dividing each training text separately to obtain multiple characters representing each training text;
vectorizing the multiple characters representing each training text to obtain multiple character vectors; and inputting the multiple character vectors corresponding to each training text and the corresponding selection labels of each training text into the stacked bidirectional recurrent neural network based on character vector for training, and optimizing the parameters of the stacked bidirectional recurrent neural network to obtain the stacked bidirectional recurrent neural network based on character vector.
3. The text classification method according to claim 2, wherein the steps of constructing a stacked bidirectional recurrent neural network based on word vector include: obtaining multiple training texts and the corresponding selection labels for each training text; dividing each training text separately to obtain multiple words representing each training text; vectorizing multiple described words representing each training text to obtain multiple word vectors; inputting the multiple word vectors corresponding to each training text and the corresponding selection labels of each training text into the stack bidirectional recurrent neural network based on word vector for training, and the parameters of the stack bidirectional recurrent neural network are optimized to obtain the stack bidirectional recurrent neural network based on word vector.
4. The text classification method according to claim 3, wherein character segmentation and word segmentation are performed on the text to be classified and/or the training texts by a hidden Markov model to obtain the multiple characters and the multiple words.
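Hidden-Markov-model segmentation of the kind named in claim 4 typically tags each character as B/M/E/S (begin, middle, end of a word, or single-character word) and decodes the best tag path with the Viterbi algorithm. The sketch below shows that decoding step with illustrative, untrained probabilities and a uniform placeholder emission model; none of the numbers come from the patent.

```python
import math

# Viterbi decoding over a BMES-style hidden Markov model, as a hypothetical
# stand-in for the HMM segmenter of claim 4. All probabilities below are
# illustrative, not trained values.

STATES = ["B", "M", "E", "S"]  # Begin / Middle / End of word, Single char

start_p = {"B": 0.6, "M": 0.0, "E": 0.0, "S": 0.4}
trans_p = {
    "B": {"B": 0.0, "M": 0.3, "E": 0.7, "S": 0.0},
    "M": {"B": 0.0, "M": 0.3, "E": 0.7, "S": 0.0},
    "E": {"B": 0.5, "M": 0.0, "E": 0.0, "S": 0.5},
    "S": {"B": 0.5, "M": 0.0, "E": 0.0, "S": 0.5},
}

def log(p):
    return math.log(p) if p > 0 else float("-inf")

def viterbi(text, emit_p):
    # V[t][s]: best log-probability of any tag path ending in state s at t
    V = [{s: log(start_p[s]) + log(emit_p(s, text[0])) for s in STATES}]
    back = []
    for ch in text[1:]:
        row, ptr = {}, {}
        for s in STATES:
            prev = max(STATES, key=lambda p: V[-1][p] + log(trans_p[p][s]))
            row[s] = V[-1][prev] + log(trans_p[prev][s]) + log(emit_p(s, ch))
            ptr[s] = prev
        V.append(row)
        back.append(ptr)
    # Trace the best final state back to the first character
    state = max(STATES, key=lambda s: V[-1][s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return list(reversed(path))

def segment(text, tags):
    """Turn a BMES tag sequence into a list of words."""
    words, cur = [], ""
    for ch, tag in zip(text, tags):
        cur += ch
        if tag in ("E", "S"):
            words.append(cur)
            cur = ""
    if cur:
        words.append(cur)
    return words

uniform = lambda state, ch: 0.25  # placeholder emission model
tags = viterbi("abcd", uniform)
print(tags)                  # -> ['B', 'E', 'B', 'E']
print(segment("abcd", tags))  # -> ['ab', 'cd']
```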
5. The text classification method according to claim 3, wherein word2vec is used to vectorize the multiple characters and multiple words representing the text to be classified and/or the training texts to obtain the multiple character vectors and multiple word vectors.
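The vectorization step of claim 5 maps each character and each word to a fixed-length vector. To keep the example runnable without a trained word2vec model, the sketch below substitutes a deterministic hash-based embedding; `embed` and its dimension are hypothetical, and a real system would look tokens up in trained word2vec vectors instead.

```python
import hashlib

# Hypothetical stand-in for a word2vec lookup: a deterministic hash-based
# embedding, so the example runs without a trained model. Real word2vec
# vectors carry semantic similarity; these do not.

def embed(token, dim=8):
    """Map a token to a fixed-length vector of floats in [-1, 1)."""
    digest = hashlib.sha256(token.encode("utf-8")).digest()
    return [(b - 128) / 128 for b in digest[:dim]]

# One vector per character and per word, as in the claimed method.
chars = ["好", "看"]
words = ["好看"]
char_vectors = [embed(c) for c in chars]
word_vectors = [embed(w) for w in words]
print(len(char_vectors), len(word_vectors), len(char_vectors[0]))  # -> 2 1 8
```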
6. The text classification method according to claim 2, wherein the multiple training texts are training texts with selection labels from the ChnSentiCorp Chinese sentiment analysis corpus, and/or texts with selection labels from a network data set.
7. A computer readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps of the text classification method according to any one of claims 1-6.
8. A text classification system, comprising a memory, a processor, and a computer program stored in the memory and executable by the processor; the processor implements the steps of the text classification method according to any one of claims 1-6 when executing the computer program.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
LU504829A LU504829B1 (en) | 2023-07-28 | 2023-07-28 | Text classification method, computer readable storage medium and system |
Publications (1)
Publication Number | Publication Date |
---|---|
LU504829B1 (en) | 2024-01-29 |
Family
ID=89808356
Country Status (1)
Country | Link |
---|---|
LU (1) | LU504829B1 (en) |
- 2023-07-28: Application LU504829A filed in Luxembourg; patent LU504829B1 is active
Similar Documents

Publication | Title |
---|---|
CN110245229B (en) | Deep learning theme emotion classification method based on data enhancement |
CN108391446B (en) | Automatic extraction of training corpus for data classifier based on machine learning algorithm |
CN107729309A (en) | Method and device for Chinese semantic analysis based on deep learning |
CN110619044B (en) | Emotion analysis method, system, storage medium and equipment |
CN110263822B (en) | Image emotion analysis method based on multi-task learning mode |
CN109271513B (en) | Text classification method, computer readable storage medium and system |
CN109492105B (en) | Text emotion classification method based on multi-feature ensemble learning |
CN114722805B (en) | Few-shot emotion classification method based on teacher-student knowledge distillation |
CN111506732A (en) | Text multi-level label classification method |
CN111339260A (en) | Fine-grained emotion analysis method based on BERT and question-answering concepts |
CN113987187A (en) | Public opinion text classification method, system, terminal and medium based on multi-label embedding |
KR102403330B1 (en) | Technique for generating and utilizing virtual fingerprint representing text data |
CN111859909B (en) | Semantic scene consistency recognition reading robot |
CN112000778A (en) | Natural language processing method, device and system based on semantic recognition |
CN113515632A (en) | Text classification method based on graph path knowledge extraction |
CN113849653A (en) | Text classification method and device |
CN112364743A (en) | Video classification method based on semi-supervised learning and bullet-screen analysis |
Jishan et al. | Natural language description of images using hybrid recurrent neural network |
CN115391520A (en) | Text emotion classification method, system, device and computer medium |
CN115017879A (en) | Text comparison method, computer device and computer storage medium |
CN113051887A (en) | Method, system and device for extracting announcement information elements |
CN110827797A (en) | Voice response event classification processing method and device |
CN117150436B (en) | Multi-modal adaptive fusion topic identification method and system |
CN114443846A (en) | Classification method and device based on multi-level text abnormal composition and electronic equipment |
CN110263148A (en) | Intelligent resume selection method and device |