LU504829B1 - Text classification method, computer readable storage medium and system - Google Patents

Text classification method, computer readable storage medium and system

Info

Publication number
LU504829B1
Authority
LU
Luxembourg
Prior art keywords
text
training
neural network
character
recurrent neural
Prior art date
Application number
LU504829A
Other languages
French (fr)
Inventor
Biqing Zeng
Original Assignee
Univ South China Normal
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Univ South China Normal filed Critical Univ South China Normal
Priority to LU504829A priority Critical patent/LU504829B1/en
Application granted granted Critical
Publication of LU504829B1 publication Critical patent/LU504829B1/en

Classifications

    • G06F16/35 Information retrieval of unstructured textual data: clustering; classification
    • G06F16/3347 Information retrieval of unstructured textual data: query execution using a vector-based model
    • G06F40/20 Handling natural language data: natural language analysis
    • G06F40/279 Natural language analysis: recognition of textual entities
    • G06F40/30 Handling natural language data: semantic analysis
    • G06N3/044 Neural networks: recurrent networks, e.g. Hopfield networks
    • G06N3/08 Neural networks: learning methods
    • G06V30/18057 Character recognition: integrating filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
    • G06V30/19147 Character recognition: obtaining sets of training patterns; bootstrap methods, e.g. bagging or boosting
    • G06V30/19173 Character recognition: classification techniques
    • G06V30/274 Character recognition post-processing: syntactic or semantic context, e.g. balancing

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Mathematical Physics (AREA)
  • Biophysics (AREA)
  • Databases & Information Systems (AREA)
  • Biodiversity & Conservation Biology (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to a text classification method, a computer readable storage medium and a system. The method includes: obtaining the text to be classified; obtaining multiple characters and multiple words representing the text; obtaining multiple character vectors and multiple word vectors; inputting the character vectors into a stacked bidirectional recurrent neural network based on character vectors to obtain a classification result based on character vectors, and inputting the word vectors into a stacked bidirectional recurrent neural network based on word vectors to obtain a classification result based on word vectors; and counting the number of characters and the number of words representing the text. If the relationship between the two counts satisfies a set threshold, the classification result based on character vectors is selected; otherwise, the classification result based on word vectors is selected.

Description

DESCRIPTION
TEXT CLASSIFICATION METHOD, COMPUTER READABLE
STORAGE MEDIUM AND SYSTEM
TECHNICAL FIELD
The invention relates to the field of natural language processing, in particular to a text classification method, a computer readable storage medium and a system.
BACKGROUND ART
With the development of Internet technology, people can use the Internet to publish all kinds of comments, which produces a large amount of text information. This text information expresses people's selection tendencies and provides a platform for information display and communication. Obtaining selection-tendency information from text has therefore become a research topic. In the process of making the invention, the inventor found that existing ways of obtaining such selection information are inefficient and their analysis accuracy is low.
SUMMARY
Based on the above problems, the purpose of the invention is to provide a text classification method that improves both accuracy and efficiency.
A text classification method includes the following steps: obtaining the text to be classified; subjecting the text to character dividing and word dividing to obtain multiple characters and multiple words representing the text to be classified; vectorizing the multiple characters and multiple words respectively to obtain multiple character vectors and multiple word vectors; constructing a stacked bidirectional recurrent neural network based on character vectors and a stacked bidirectional recurrent neural network based on word vectors, inputting the character vectors into the character-vector network to obtain the classification result based on character vectors, and inputting the word vectors into the word-vector network to obtain the classification result based on word vectors. Each stacked bidirectional recurrent neural network includes three BLSTM layers and one
Sigmoid layer; each BLSTM layer is stacked with multiple LSTM units distributed hierarchically, the LSTM units in each layer are set with corresponding weight parameters, each LSTM unit takes as input the output of the LSTM unit in the layer above and/or the preceding LSTM unit in the same layer, and the output result is finally obtained in the Sigmoid layer. The method then counts the number of characters and the number of words representing the text to be classified: if the number of characters is less than or equal to half of the number of words, the classification result based on character vectors is selected; otherwise, the classification result based on word vectors is selected.
By using the stacked bidirectional recurrent neural network, high-level features representing the semantics of the text can be obtained by analyzing the context within the text to be classified; accuracy and efficiency are improved by fusing the character information and word information of the text to be classified.
Furthermore, the steps of constructing the stacked bidirectional recurrent neural network based on character vectors include: obtaining multiple training texts and the corresponding selection label for each training text; dividing each training text separately to obtain multiple characters representing it; vectorizing those characters to obtain multiple character vectors; and inputting the character vectors corresponding to each training text, together with its selection label, into the stacked bidirectional recurrent neural network based on character vectors for training, optimizing the network's parameters to obtain the trained network.
Furthermore, the steps of constructing the stacked bidirectional recurrent neural network based on word vectors include: obtaining multiple training texts and the corresponding selection label for each training text; dividing each training text separately to obtain multiple words representing it; vectorizing those words to obtain multiple word vectors; and inputting the word vectors corresponding to each training text, together with its selection label, into the stacked bidirectional recurrent neural network based on word vectors for training, optimizing the network's parameters to obtain the trained network.
Furthermore, character segmentation and word segmentation are performed on the text to be classified and/or the training texts by a hidden Markov model to obtain multiple characters and multiple words, so that the text is segmented quickly and accurately through prediction and evaluation.
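The patent specifies a hidden Markov model for segmentation but no concrete implementation. As an illustration only, the following minimal sketch uses the open-source jieba segmenter, whose word segmentation applies an HMM with Viterbi decoding to unseen character spans; character dividing is shown as a plain per-character split. The sample sentence and function name are assumptions, not from the patent.

```python
import jieba  # open-source Chinese segmenter; uses an HMM (Viterbi) for unseen spans


def divide(text):
    """Return (characters, words) for a piece of text, as in the dividing step."""
    characters = [ch for ch in text if not ch.isspace()]
    words = jieba.lcut(text, HMM=True)  # HMM=True enables the hidden-Markov fallback
    return characters, words


characters, words = divide("这个产品非常好用")
print(len(characters), len(words))  # character count vs. word count, used later in step S5
```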
Furthermore, word2vec is used to vectorize the multiple characters and multiple words that represent the text to be analyzed and/or the training texts, so as to obtain the multiple character vectors and multiple word vectors quickly.
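A minimal vectorization sketch using the gensim implementation of word2vec; the toy corpora, vector size, and window are illustrative assumptions:

```python
from gensim.models import Word2Vec

# Toy corpora: one token list per training text, as produced by the dividing step.
char_corpus = [list("这个产品非常好用"), list("质量太差不推荐")]
word_corpus = [["这个", "产品", "非常", "好用"], ["质量", "太差", "不", "推荐"]]

# Train one embedding model over characters and one over words.
char_w2v = Word2Vec(char_corpus, vector_size=100, window=5, min_count=1)
word_w2v = Word2Vec(word_corpus, vector_size=100, window=5, min_count=1)

char_vectors = [char_w2v.wv[c] for c in char_corpus[0]]  # one 100-d vector per character
word_vectors = [word_w2v.wv[w] for w in word_corpus[0]]  # one 100-d vector per word
```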
Furthermore, the relationship between the number of characters and the number of words satisfies the set threshold when the number of characters is less than or equal to half of the number of words. The numbers of characters and words segmented from the text strongly influence the classification results; therefore, by analyzing these counts for the text to be classified, the better classification result can be selected and the text classified more accurately.
The invention also provides a computer readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps of the text classification method described in any of the above content.
The invention also provides a text classification system, including a memory, a processor, and a computer program stored in the memory and executable by the processor. The processor implements the steps of the text classification method described above when executing the computer program.
The invention is described in detail in the following for a better understanding and implementation.
BRIEF DESCRIPTION OF THE DRAWINGS
Fig. 1 is the flow chart of the text classification method in the embodiment of the invention;
Fig. 2 is the flow chart of constructing the stacked bidirectional recurrent neural network based on character vectors in the embodiment of the invention;
Fig. 3 is the flow chart of constructing the stacked bidirectional recurrent neural network based on word vectors in the embodiment of the invention;
Fig. 4 is the schematic diagram of the stacked bidirectional recurrent neural network based on character vectors and word vectors in the embodiment of the invention.
DETAILED DESCRIPTION OF THE EMBODIMENTS
Referring to Fig. 1, which is the flow chart of the text classification method in the embodiment of the invention. The text classification method includes the following steps:
Step S1: obtaining the text to be classified.
In an embodiment, the text to be classified is text with a selection tendency: for example, text expressing positive emotions such as preference for or approval of a person, event, or product indicates choosing that person, event, or product, while text expressing negative emotions such as disgust or opposition indicates not choosing it.
Step S2: subjecting the text to be classified to character dividing and word dividing to obtain multiple characters and multiple words representing the text to be classified.
Step S3: obtaining multiple character vectors and multiple word vectors by vectorizing the multiple characters and multiple words respectively.
In one embodiment, vectorization transforms symbolic information in natural-language form into digital information in vector form, so that machine learning and processing can be applied; for example, 'good' may be expressed as [0 0 0 0 0 0 0 1 0 0 ...].
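As a minimal sketch of the one-hot idea behind the example above (the toy vocabulary and its ordering are assumptions):

```python
# Toy vocabulary; in practice it would be built from the training corpus.
vocab = ["bad", "poor", "slow", "fine", "fast", "cheap", "new", "good", "nice", "great"]


def one_hot(token):
    """Map a token to a one-hot vector over the toy vocabulary."""
    return [1 if v == token else 0 for v in vocab]


print(one_hot("good"))  # [0, 0, 0, 0, 0, 0, 0, 1, 0, 0], matching the example above
```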
Step S4: constructing a stacked bidirectional recurrent neural network based on character vectors and a stacked bidirectional recurrent neural network based on word vectors; the multiple character vectors are input into the character-vector network to obtain the classification result based on character vectors, and the multiple word vectors are input into the word-vector network to obtain the classification result based on word vectors.
In an embodiment, the classification result can be a text result with positive emotions such as preference and approval, indicating that the person, event, or product is chosen, or a text result with negative emotions such as disgust and opposition, indicating that it is not chosen. In machine learning and processing, optionally, '1' denotes the selected text result and '0' denotes the unselected text result.
Step S5: counting the number of characters and the number of words representing the text to be classified; if the number of characters is less than or equal to half of the number of words, the classification result based on character vectors is selected; otherwise, the classification result based on word vectors is selected.
In one embodiment, the inventor found during the creation process that the numbers of characters and words segmented from the text strongly influence the classification results, and that the better classification result can be selected by analyzing these counts for the text to be classified. Specifically, the set threshold is that the number of characters is less than or equal to half of the number of words: when this holds, the classification result based on character vectors is more accurate; when the number of characters is greater than half of the number of words, the classification result based on word vectors is more accurate.
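The selection rule itself reduces to a single comparison; a minimal sketch (names are illustrative):

```python
def select_result(num_chars, num_words, char_result, word_result):
    """Step S5: prefer the character-vector result when the character count
    is at most half the word count; otherwise prefer the word-vector result."""
    if num_chars <= num_words / 2:
        return char_result
    return word_result
```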
By using the stacked bidirectional recurrent neural network, the context of the text to be classified can be analyzed and high-level features representing the semantics of the text can be obtained; accuracy and efficiency are improved by fusing the character information and word information of the text to be classified.
In one embodiment, the text to be classified is subjected to character dividing and word dividing through the hidden Markov model to obtain multiple characters and multiple words representing it, so that the text is segmented quickly and accurately through prediction and evaluation.
In one embodiment, word2vec is used to vectorize the multiple characters and multiple words that represent the text to be analyzed and/or the training texts, so as to obtain the character vectors and word vectors quickly.
Please refer to Fig. 2, which is the flow chart of constructing the stacked bidirectional recurrent neural network based on character vectors in the embodiment of the invention.
In one embodiment, the steps for constructing the stacked bidirectional recurrent neural network based on character vectors include:
Step S411: obtaining multiple training texts and the corresponding selection labels for each training text;
In one embodiment, the multiple training texts are training texts with selection labels from the ChnSentiCorp Chinese sentiment analysis corpus, and/or texts with selection labels from a network data set. A selection label can be a label indicating positive emotions such as preference and approval (the person, event, or product is chosen) or negative emotions such as disgust and opposition (it is not chosen). In machine learning and processing, optionally, '1' denotes the selected text and '0' denotes the unselected text.
Step S412: dividing each training text separately to obtain multiple characters representing each training text;
In one embodiment, each training text is character-divided by the hidden Markov model to obtain multiple characters representing that training text.
Step S413: vectorizing the multiple characters representing each training text to obtain multiple character vectors;
Step S414: inputting the multiple character vectors corresponding to each training text, together with its selection label, into the stacked bidirectional recurrent neural network based on character vectors for training, and optimizing the network's parameters to obtain the trained stacked bidirectional recurrent neural network based on character vectors.
In one embodiment, the character-vector-based stacked bidirectional recurrent neural network includes three BLSTM layers and one Sigmoid layer; each BLSTM layer is stacked with multiple LSTM units, and the LSTM units in each layer are distributed hierarchically.
The LSTM units in each layer are set with corresponding weight parameters. Each LSTM unit takes as input the output of the LSTM unit in the layer above and/or the preceding LSTM unit in the same layer, and the output result is finally obtained in the Sigmoid layer. For example, the multiple character vectors corresponding to each training text are input into the stacked bidirectional recurrent neural network based on character vectors. After the three BLSTM layers, the output result is obtained in the Sigmoid layer. If the output result does not match the corresponding selection label, the stochastic gradient descent algorithm is used to update and iterate the weight parameters, and the character vectors are fed in again until the output result matches the selection label. By repeating this training many times, the stacked bidirectional recurrent neural network based on character vectors is obtained. To prevent over-fitting, the dropout strategy is adopted during training: in each training cycle, some units in a neural layer are first randomly selected and temporarily hidden, and the training and optimization of the network then proceeds for that cycle; in the next cycle, other neurons are hidden, and so on until training ends. In one embodiment, dropout is set to 0.5.
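The patent fixes the topology (three BLSTM layers followed by one Sigmoid layer, dropout 0.5) but not a framework. A minimal Keras sketch under assumed hidden size and input shape; the single sigmoid output unit stands in for the Sigmoid layer:

```python
import tensorflow as tf


def build_stacked_bilstm(seq_len, vec_dim, hidden=128):
    """Three stacked bidirectional LSTM layers and a sigmoid output,
    with dropout 0.5 applied between layers during training."""
    return tf.keras.Sequential([
        tf.keras.layers.Input(shape=(seq_len, vec_dim)),
        tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(hidden, return_sequences=True)),
        tf.keras.layers.Dropout(0.5),
        tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(hidden, return_sequences=True)),
        tf.keras.layers.Dropout(0.5),
        tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(hidden)),  # last layer returns a single state
        tf.keras.layers.Dropout(0.5),
        tf.keras.layers.Dense(1, activation="sigmoid"),  # stands in for the Sigmoid layer
    ])
```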
Please refer to Fig. 3 and Fig. 4 at the same time. Fig. 3 is the flow chart of the stack bidirectional recurrent neural network based on word vector in the embodiment of the invention.
Fig. 4 is the schematic diagram of the stack bidirectional recurrent neural network based on character vector and word vector in the embodiment of the invention.
In one embodiment, the steps of constructing a stacked bidirectional recurrent neural network based on word vector include: LU504829
Step S421: obtaining multiple training texts and the corresponding selection labels for each training text;
In one embodiment, the multiple training texts are training texts with selection labels from the ChnSentiCorp Chinese sentiment analysis corpus, and/or texts with selection labels from a network data set; the selection label can be a label indicating positive emotions such as preference and approval (the person, event, or product is chosen) or negative emotions such as disgust and opposition (it is not chosen). In machine learning and processing, optionally, '1' denotes the selected text and '0' denotes the unselected text.
Step S422: dividing each training text separately to obtain multiple words representing each training text;
In one embodiment, each training text is word-divided by the hidden Markov model to obtain multiple words representing that training text.
Step S423: vectorizing the multiple words representing each training text to obtain multiple word vectors;
Step S424: inputting the multiple word vectors corresponding to each training text, together with its selection label, into the stacked bidirectional recurrent neural network based on word vectors for training, and optimizing the network's parameters to obtain the trained stacked bidirectional recurrent neural network based on word vectors.
In one embodiment, the word-vector-based stacked bidirectional recurrent neural network includes three BLSTM layers and one Sigmoid layer; each BLSTM layer is stacked with multiple LSTM units, and the LSTM units in each layer are distributed hierarchically.
The LSTM units in each layer are set with corresponding weight parameters. Each LSTM unit takes as input the output of the LSTM unit in the layer above and/or the preceding LSTM unit in the same layer, and the output result is finally obtained in the Sigmoid layer. For example, the multiple word vectors corresponding to each training text are input into the word-vector-based stacked bidirectional recurrent neural network; after passing through the three BLSTM layers, the output result is obtained in the Sigmoid layer. If the output result does not conform to the corresponding selection label, the stochastic gradient descent algorithm is used to update and iterate the weight parameters, and the word vectors are fed in again until the output result conforms to the selection label. By repeating this training many times, the word-vector-based stacked bidirectional recurrent neural network is obtained.
To prevent over-fitting, the dropout strategy is adopted during training: in each training cycle, some units in a neural layer are first randomly selected and temporarily hidden, and the training and optimization of the network then proceeds for that cycle; in the next cycle, other neurons are hidden, and so on until training ends.
In one embodiment, dropout is set to 0.5.
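A hedged sketch of this training procedure using stochastic gradient descent against the 0/1 selection labels; the stand-in data, learning rate, and epoch count are assumptions, and build_stacked_bilstm is the sketch from the character-vector section:

```python
import numpy as np
import tensorflow as tf

# Stand-in data: 8 training texts, each padded to 50 word vectors of dimension 100.
X = np.random.rand(8, 50, 100).astype("float32")
y = np.array([1, 0, 1, 1, 0, 0, 1, 0], dtype="float32")  # 0/1 selection labels

model = build_stacked_bilstm(seq_len=50, vec_dim=100)
model.compile(
    optimizer=tf.keras.optimizers.SGD(learning_rate=0.01),  # stochastic gradient descent
    loss="binary_crossentropy",
    metrics=["accuracy"],
)
model.fit(X, y, epochs=10, batch_size=4)  # iterate until outputs match the labels
```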
In one embodiment, the training texts are character-divided and word-divided by the hidden Markov model to obtain multiple characters and multiple words of each training text; through prediction and evaluation of the text, the text is segmented quickly and accurately.
In one embodiment, word2vec is used to vectorize the multiple characters and multiple words of the training texts respectively to obtain multiple character vectors and multiple word vectors, so that the character vectors and word vectors are obtained quickly.
The invention also provides a computer readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps of the text classification method described in any of the above content.
The invention also provides a text classification system, including a memory, a processor, and a computer program stored in the memory and executable by the processor. The processor implements the steps of the text classification method described above when executing the computer program.
By using the stacked bidirectional recurrent neural network, high-level features representing the semantics of the text can be obtained by analyzing the context of the text to be classified; by fusing the character information and word information of the text to be classified, accuracy and efficiency are improved.
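Tying the earlier sketches together, a hedged end-to-end illustration of the method; divide, the two trained networks, and the two word2vec models are the assumed components sketched above, and pad is an illustrative helper:

```python
import numpy as np


def pad(vectors, seq_len=50, dim=100):
    """Zero-pad or truncate a list of vectors into a (1, seq_len, dim) batch."""
    out = np.zeros((1, seq_len, dim), dtype="float32")
    for i, v in enumerate(vectors[:seq_len]):
        out[0, i] = v
    return out


def classify(text, char_model, word_model, char_w2v, word_w2v):
    characters, words = divide(text)                  # character/word dividing (HMM)
    char_vecs = [char_w2v.wv[c] for c in characters]  # character vectors (word2vec)
    word_vecs = [word_w2v.wv[w] for w in words]       # word vectors (word2vec)
    char_score = char_model.predict(pad(char_vecs))[0, 0]
    word_score = word_model.predict(pad(word_vecs))[0, 0]
    # Step S5 rule: use the character result when chars <= half the word count.
    score = char_score if len(characters) <= len(words) / 2 else word_score
    return 1 if score >= 0.5 else 0  # 1 = selected, 0 = not selected
```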
The above embodiments express only several implementation modes of the invention, and their descriptions are specific and detailed, but they should not be understood as restricting the scope of the invention. It should be pointed out that ordinary technical personnel in this field can make certain deformations and improvements without departing from the idea of the invention, and all such deformations and improvements fall within the protection scope of the invention.

Claims (8)

CLAIMS:
1. A text classification method, including the following steps: obtaining the text to be classified; subjecting the text to be classified to character dividing and word dividing to obtain multiple characters and multiple words representing the text to be classified; obtaining multiple character vectors and multiple word vectors by vectorizing the multiple characters and multiple words respectively; constructing a stacked bidirectional recurrent neural network based on character vectors and a stacked bidirectional recurrent neural network based on word vectors, inputting the multiple character vectors into the stacked bidirectional recurrent neural network based on character vectors to obtain the classification result based on character vectors, and inputting the multiple word vectors into the stacked bidirectional recurrent neural network based on word vectors to obtain the classification result based on word vectors, wherein each stacked bidirectional recurrent neural network includes three BLSTM layers and one Sigmoid layer, each BLSTM layer is stacked with multiple LSTM units distributed hierarchically, the LSTM units in each layer are set with corresponding weight parameters, each LSTM unit takes as input the output of the LSTM unit in the layer above and/or the preceding LSTM unit in the same layer, and the output result is finally obtained in the Sigmoid layer; and counting the number of characters and the number of words representing the text to be classified: if the number of characters is less than or equal to half of the number of words, the classification result based on character vectors is selected; otherwise, the classification result based on word vectors is selected.
2. The text classification method according to claim 1, wherein the steps of constructing the stacked bidirectional recurrent neural network based on character vectors include: obtaining multiple training texts and the corresponding selection label for each training text; dividing each training text separately to obtain multiple characters representing each training text; vectorizing the multiple characters representing each training text to obtain multiple character vectors; and inputting the multiple character vectors corresponding to each training text and the corresponding selection label of each training text into the stacked bidirectional recurrent neural network based on character vectors for training, the parameters of the network being optimized to obtain the stacked bidirectional recurrent neural network based on character vectors.
3. The text classification method according to claim 2, wherein the steps of constructing the stacked bidirectional recurrent neural network based on word vectors include: obtaining multiple training texts and the corresponding selection label for each training text; dividing each training text separately to obtain multiple words representing each training text; vectorizing the multiple words representing each training text to obtain multiple word vectors; and inputting the multiple word vectors corresponding to each training text and the corresponding selection label of each training text into the stacked bidirectional recurrent neural network based on word vectors for training, the parameters of the network being optimized to obtain the stacked bidirectional recurrent neural network based on word vectors.
4. The text classification method according to claim 3, wherein character segmentation and word segmentation are performed on the text to be classified and/or the training texts by the hidden Markov model to obtain the multiple characters and multiple words.
5. The text classification method according to claim 3, wherein word2vec is used to vectorize the multiple characters and multiple words that represent the text to be analyzed and/or the training texts to obtain the multiple character vectors and multiple word vectors.
6. The text classification method according to claim 2, wherein the multiple training texts are training texts with selection labels from the ChnSentiCorp Chinese sentiment analysis corpus, and/or texts with selection labels in a network data set.
7. A computer readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps of the text classification method as described in any of claims 1-6.
8. A text classification system, including a memory, a processor, and a computer program stored in the memory and executable by the processor; the processor implements the steps of the text classification method as described in any of claims 1-6 when executing the computer program.

Priority Applications (1)

Application Number: LU504829A
Priority Date: 2023-07-28
Filing Date: 2023-07-28
Title: Text classification method, computer readable storage medium and system


Publications (1)

Publication Number: LU504829B1
Publication Date: 2024-01-29 (granted)

Family

ID: 89808356

Family Applications (1)

Application Number: LU504829A
Title: Text classification method, computer readable storage medium and system
Priority Date / Filing Date: 2023-07-28

Country Status (1)

LU: LU504829B1 (en)
