CN115759088A - Text analysis method and storage medium for comment information - Google Patents
- Publication number
- CN115759088A CN115759088A CN202310033845.XA CN202310033845A CN115759088A CN 115759088 A CN115759088 A CN 115759088A CN 202310033845 A CN202310033845 A CN 202310033845A CN 115759088 A CN115759088 A CN 115759088A
- Authority
- CN
- China
- Prior art keywords
- text
- gate
- output
- data
- analysis method
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
A text analysis method for comment information and a storage medium are disclosed. First, text comment data are preprocessed; text feature vectors are obtained by vectorizing the preprocessed data; data noise reduction is performed with a self-encoder model; and high-level feature vectors of the text comment information are then extracted through a long short-term memory network, realizing text analysis of the comment information. According to the invention, the data are denoised by the AE model and redundant features in the data are eliminated, effectively improving the efficiency of comment information analysis; by adopting the LSTM model, document information is effectively utilized, making the features more discriminative and improving the accuracy of comment information analysis.
Description
Technical Field
The invention relates to the technical field of natural language processing, in particular to a text analysis method and a storage medium for comment information.
Background
With the development of Internet technology, social media platforms have become an important channel for the public to publish opinions and exchange information. By collecting large amounts of social media comment data from the Internet and mining the valuable information they contain, one can learn the public's degree of preference for a certain product, or its degree of attention to and emotional change regarding a certain social phenomenon.
Because social media websites are numerous and the volume of comment information is very large, sorting and analyzing comment information by manpower alone is a difficult task. Text comment analysis therefore needs further exploration: adopting more automatic and intelligent methods that learn complicated language features from large amounts of text data and perform text analysis saves a great deal of manpower and material resources, and improves the efficiency and accuracy of text comment analysis.
Disclosure of Invention
The invention aims to provide a text comment analysis method that addresses the problems of low efficiency, heavy workload, and low accuracy in the manual analysis of text comments.
In order to achieve the purpose, the invention adopts the following technical scheme:
a text analysis method of comment information comprises the following steps:
text comment data preprocessing step S110:
preprocessing the text comment data, filtering out irrelevant information, and performing word segmentation processing on the text comment data;
text comment vector extraction and processing step S120:
text feature vectors are obtained by text vectorization of the preprocessed text comment data, data noise reduction is carried out by using a self-encoder model, and then high-level feature vectors of text comment information are extracted through an LSTM model to represent the text comment data;
calculating emotion prediction results of the text comments S130:
receiving the high-level feature vector of the text comment information extracted in step S120, and calculating an emotion prediction result of the text comment.
Optionally, in step S110, the text comment data preprocessing specifically includes: and deleting punctuation marks and blank spaces by adopting a regular expression, introducing a field dictionary into the text data, and performing word segmentation processing on the data.
Optionally, in step S120, the self-encoder model is an unsupervised learning model, which can eliminate redundant features in data, reduce noise in data, and improve efficiency of comment information analysis.
wherein the encoding output of the self-encoder is calculated as:

h = ReLU(W·x + b)

wherein ReLU is the rectified linear activation function, x is the input text feature vector, W is the weight matrix of h, and b is the bias term of h;
optionally, in step S120, the LSTM model is a bidirectional improved recurrent neural network, and a bidirectional coding structure with stronger semantic ability is used to train the corpus, so as to implement deep bidirectional representation of corpus training.
Optionally, in step S120, the LSTM model is composed of 3 gate structures and 1 state unit, where the 3 gate structures include an input gate, a forgetting gate, and an output gate;
wherein the input gate receives two inputs: the output h_{t-1} of the LSTM model at the previous time and the input x_t at the current time; the output i_t of the input gate at time t is calculated as:

i_t = σ(W_i·[h_{t-1}, x_t] + b_i)

wherein σ is the sigmoid function, W_i is the weight matrix of the input gate, [h_{t-1}, x_t] denotes the concatenation of the two vectors into one longer vector, and b_i is the bias term of the input gate;

the output f_t of the forgetting gate likewise receives the output h_{t-1} of the LSTM model at the previous time and the input x_t at the current time, and determines whether information is discarded from the state unit; it is calculated as:

f_t = σ(W_f·[h_{t-1}, x_t] + b_f)

wherein W_f is the weight matrix of the forgetting gate and b_f is the bias term of the forgetting gate.

The candidate state c̃_t is calculated as:

c̃_t = tanh(W_c·[h_{t-1}, x_t] + b_c)

wherein W_c is the weight matrix of c̃_t, tanh is the hyperbolic tangent activation function, and b_c is the bias term of c̃_t.

The state unit c_t at the current time receives the values of the input gate and the forgetting gate, expressed as:

c_t = f_t ⊙ c_{t-1} + i_t ⊙ c̃_t

wherein c_{t-1} is the cell state at the previous time, initialized to 1, and ⊙ denotes element-wise multiplication.

The output o_t of the output gate is calculated as:

o_t = σ(W_o·[h_{t-1}, x_t] + b_o)

wherein W_o is the weight matrix of the output gate and b_o is the bias term of the output gate; the output of the LSTM model at time t is then h_t = o_t ⊙ tanh(c_t).
Optionally, in step S130, the output h_t of the LSTM model, which is the high-level feature vector of the text comment information extracted in step S120, is received, and the emotion prediction result y of the text comment is obtained through the softmax function:

y = softmax(W_y·h_t + b_y)

wherein W_y is the weight matrix of the emotion prediction result y and b_y is the bias term of y. When y exceeds 0.5, the emotion prediction is positive.
Further, the present invention also discloses a storage medium for storing computer-executable instructions, which, when executed by a processor, perform the above-mentioned text analysis method for comment information.
Compared with the prior art, the invention has the following advantages:
1) Because the AE model is adopted for data noise reduction, redundant features in data can be eliminated, and text analysis efficiency of comment information is improved.
2) The invention adopts the LSTM model and effectively utilizes the document information, thereby making the features more discriminative and improving the accuracy of text analysis of comment information.
Drawings
Fig. 1 is a flowchart of a text analysis method of comment information according to a specific embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not to be construed as limiting the invention. It should be further noted that, for the convenience of description, only some of the structures related to the present invention are shown in the drawings, not all of the structures.
The invention is characterized in that: data noise reduction is performed with an auto-encoder model (AE model) to eliminate redundant features in the data, and an LSTM model extracts high-level feature vectors of the text comment information to realize text analysis of the comment information.
Referring to fig. 1, a flowchart of a text analysis method of comment information according to an embodiment of the present invention is shown, including the following steps:
text comment data preprocessing step S110:
and preprocessing the text comment data, filtering out irrelevant information, and performing word segmentation processing on the text comment data.
Specifically, in step S110, the text comment data preprocessing specifically includes: and deleting punctuation marks and blank spaces by adopting a regular expression, introducing a field dictionary into the text data, and performing word segmentation processing on the data.
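As an illustrative sketch of step S110 (not the patent's specified implementation), the regular-expression cleanup and dictionary-based word segmentation might look as follows; the sample field dictionary and the forward-maximum-matching strategy are assumptions, since the patent names only a regular expression and a field dictionary:

```python
import re

# Hypothetical field dictionary; the patent does not specify its contents.
DOMAIN_DICT = {"电池", "续航", "屏幕"}

def clean(text: str) -> str:
    # Delete punctuation marks and blank spaces with a regular expression.
    return re.sub(r"[\s\W]+", "", text)

def segment(text: str, dictionary: set, max_len: int = 4) -> list:
    # Forward maximum matching: take the longest dictionary word at each
    # position, falling back to a single character when nothing matches.
    words, i = [], 0
    while i < len(text):
        for j in range(min(max_len, len(text) - i), 0, -1):
            cand = text[i:i + j]
            if j == 1 or cand in dictionary:
                words.append(cand)
                i += j
                break
    return words
```

For example, `segment(clean("电池 续航, 很好!"), DOMAIN_DICT)` splits the cleaned comment into `["电池", "续航", "很", "好"]`.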
Text comment vector extraction and processing step S120:
and for the preprocessed text comment data, text feature vectors are obtained by text vectorization, data noise reduction is carried out by using an Auto-Encoder model (Auto-Encoder), and then high-level feature vectors of text comment information are extracted through a Long Short-Term Memory network (LSTM) model to represent the text comment data.
Specifically, in step S120, the self-encoder model is an unsupervised learning model, which can eliminate redundant features in data, reduce noise in data, and improve efficiency of comment information analysis.
wherein the encoding output of the self-encoder is calculated as:

h = ReLU(W·x + b)

wherein ReLU is the rectified linear activation function, x is the input text feature vector, W is the weight matrix of h, and b is the bias term of h.
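A minimal NumPy sketch of the encoder computation h = ReLU(W·x + b); the weights here are illustrative placeholders, and a trained AE would learn W and b by minimizing reconstruction error:

```python
import numpy as np

def relu(z):
    # Rectified linear activation used by the encoder.
    return np.maximum(0.0, z)

def encode(x, W, b):
    # Encoder half of the self-encoder model: h = ReLU(W @ x + b).
    return relu(W @ x + b)

def reconstruction_error(x, x_hat):
    # Squared error the AE minimizes during training.
    return float(np.sum((x - x_hat) ** 2))
```

With `W = [[1, -1], [0.5, 0.5]]`, `x = [1, 2]`, and zero bias, `encode` returns `[0.0, 1.5]`: the negative pre-activation is zeroed out.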
The LSTM model is a bidirectional improved recurrent neural network, and a bidirectional coding structure with stronger semantic ability is adopted to train the corpus so as to realize deep bidirectional representation of corpus training.
Specifically, in step S120, the LSTM model is composed of 3 gate structures and 1 state unit, where the 3 gate structures include an input gate, a forgetting gate, and an output gate;
wherein the input gate receives two inputs: the output h_{t-1} of the LSTM model at the previous time and the input x_t at the current time; the output i_t of the input gate at time t is calculated as:

i_t = σ(W_i·[h_{t-1}, x_t] + b_i)

wherein σ is the sigmoid function, W_i is the weight matrix of the input gate, [h_{t-1}, x_t] denotes the concatenation of the two vectors into one longer vector, and b_i is the bias term of the input gate;

the output f_t of the forgetting gate likewise receives the output h_{t-1} of the LSTM model at the previous time and the input x_t at the current time, and determines whether information is discarded from the state unit; it is calculated as:

f_t = σ(W_f·[h_{t-1}, x_t] + b_f)

wherein W_f is the weight matrix of the forgetting gate and b_f is the bias term of the forgetting gate.

The candidate state c̃_t is calculated as:

c̃_t = tanh(W_c·[h_{t-1}, x_t] + b_c)

wherein W_c is the weight matrix of c̃_t, tanh is the hyperbolic tangent activation function, and b_c is the bias term of c̃_t.

The state unit c_t at the current time receives the values of the input gate and the forgetting gate, expressed as:

c_t = f_t ⊙ c_{t-1} + i_t ⊙ c̃_t

wherein c_{t-1} is the cell state at the previous time, initialized to 1, and ⊙ denotes element-wise multiplication.

The output o_t of the output gate is calculated as:

o_t = σ(W_o·[h_{t-1}, x_t] + b_o)

wherein W_o is the weight matrix of the output gate and b_o is the bias term of the output gate; the output of the LSTM model at time t is then h_t = o_t ⊙ tanh(c_t).
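The gate equations above can be sketched as one NumPy time step; the shapes are illustrative, and following the patent's formulation every weight matrix acts on the concatenation [h_{t-1}, x_t] while the initial cell state is 1:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W_i, b_i, W_f, b_f, W_c, b_c, W_o, b_o):
    # One LSTM time step following the patent's gate equations.
    z = np.concatenate([h_prev, x_t])    # [h_{t-1}, x_t]
    i_t = sigmoid(W_i @ z + b_i)         # input gate
    f_t = sigmoid(W_f @ z + b_f)         # forgetting gate
    c_tilde = np.tanh(W_c @ z + b_c)     # candidate cell state
    c_t = f_t * c_prev + i_t * c_tilde   # state unit update
    o_t = sigmoid(W_o @ z + b_o)         # output gate
    h_t = o_t * np.tanh(c_t)             # output of the LSTM model at time t
    return h_t, c_t
```

With all-zero weights every gate opens halfway (σ(0) = 0.5), so starting from the patent's initial cell state of 1 the updated state is 0.5.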
calculating emotion prediction results of the text comments S130:
receiving the high-level feature vector of the text comment information extracted in step S120, and calculating an emotion prediction result of the text comment.
Specifically, in step S130, the output h_t of the LSTM model, which is the high-level feature vector of the text comment information extracted in step S120, is received, and the emotion prediction result y of the text comment is obtained through the softmax function:

y = softmax(W_y·h_t + b_y)

wherein W_y is the weight matrix of the emotion prediction result y and b_y is the bias term of y.
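A sketch of step S130's prediction head, assuming a two-class softmax in which index 1 is a hypothetical "positive" class; the 0.5 threshold mirrors the rule that a result above 0.5 is read as a positive emotion:

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over class logits.
    e = np.exp(z - np.max(z))
    return e / e.sum()

def predict_sentiment(h_t, W_y, b_y):
    # Emotion prediction: y = softmax(W_y @ h_t + b_y).
    y = softmax(W_y @ h_t + b_y)
    return y, bool(y[1] > 0.5)  # True when the positive class dominates
```

For example, with logits [0, 2] the positive-class probability is about 0.88, so the comment is predicted positive.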
Further, the present invention also discloses a storage medium for storing computer-executable instructions, which, when executed by a processor, perform the above-mentioned text analysis method of comment information.
Compared with the prior art, the text analysis method of the comment information provided by the invention has the following advantages:
1) According to the invention, the AE model is adopted for data noise reduction, so that redundant features in the data can be eliminated, and the text analysis efficiency of comment information is improved.
2) The invention adopts the LSTM model and effectively utilizes the document information, thereby making the features more discriminative and improving the accuracy of text analysis of comment information.
It will be apparent to those skilled in the art that the various elements or steps of the invention described above may be implemented using a general purpose computing device, they may be centralized on a single computing device, or alternatively, they may be implemented using program code that is executable by a computing device, such that they may be stored in a memory device and executed by a computing device, or they may be separately fabricated into various integrated circuit modules, or multiple ones of them may be fabricated into a single integrated circuit module. Thus, the present invention is not limited to any specific combination of hardware and software.
While the invention has been described in further detail with reference to specific preferred embodiments thereof, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.
Claims (8)
1. A text analysis method for comment information is characterized by comprising the following steps:
text comment data preprocessing step S110:
preprocessing the text comment data, filtering out irrelevant information, and performing word segmentation processing on the text comment data;
text comment vector extraction and processing step S120:
text feature vectors are obtained by text vectorization on the preprocessed text comment data, data noise reduction is carried out by using a self-encoder model, and then high-level feature vectors of text comment information are extracted through an LSTM model to represent the text comment data;
calculating emotion prediction results of the text comments S130:
receiving the high-level feature vector of the text comment information extracted in step S120, and calculating an emotion prediction result of the text comment.
2. The text analysis method according to claim 1, wherein:
in step S110, the text comment data preprocessing specifically includes: and deleting punctuation marks and blank spaces by adopting a regular expression, introducing a field dictionary into the text data, and performing word segmentation processing on the data.
3. The text analysis method of claim 1, wherein:
in step S120, the self-encoder model is an unsupervised learning model, which eliminates redundant features in the data, reduces noise in the data, and improves the efficiency of comment information analysis.
4. The text analysis method of claim 3, wherein:
in step S120, the LSTM model is a bidirectional improved recurrent neural network, and a bidirectional coding structure with a stronger semantic ability is used to train the corpus, so as to implement deep bidirectional representation of corpus training.
5. The text analysis method of claim 4, wherein:
in step S120, the LSTM model is composed of 3 gate structures and 1 state unit, where the 3 gate structures include an input gate, a forgetting gate, and an output gate;
wherein the input gate receives two inputs: the output h_{t-1} of the LSTM model at the previous time and the input x_t at the current time; the output i_t of the input gate at time t is calculated as:

i_t = σ(W_i·[h_{t-1}, x_t] + b_i)

wherein σ is the sigmoid function, W_i is the weight matrix of the input gate, [h_{t-1}, x_t] denotes the concatenation of the two vectors into one longer vector, and b_i is the bias term of the input gate;

the output f_t of the forgetting gate likewise receives the output h_{t-1} of the LSTM model at the previous time and the input x_t at the current time, and determines whether information is discarded from the state unit; it is calculated as:

f_t = σ(W_f·[h_{t-1}, x_t] + b_f)

wherein W_f is the weight matrix of the forgetting gate and b_f is the bias term of the forgetting gate;

the candidate state c̃_t is calculated as:

c̃_t = tanh(W_c·[h_{t-1}, x_t] + b_c)

wherein W_c is the weight matrix of c̃_t, tanh is the hyperbolic tangent activation function, and b_c is the bias term of c̃_t;

the state unit c_t at the current time receives the values of the input gate and the forgetting gate, expressed as:

c_t = f_t ⊙ c_{t-1} + i_t ⊙ c̃_t

wherein c_{t-1} is the cell state at the previous time, initialized to 1, and ⊙ denotes element-wise multiplication;

the output o_t of the output gate is calculated as:

o_t = σ(W_o·[h_{t-1}, x_t] + b_o)

wherein W_o is the weight matrix of the output gate, b_o is the bias term of the output gate, and the output of the LSTM model at time t is h_t = o_t ⊙ tanh(c_t).
6. The text analysis method of claim 5, wherein:
in step S130, the output h_t of the LSTM model, which is the high-level feature vector of the text comment information extracted in step S120, is received, and the emotion prediction result y of the text comment is obtained through the softmax function:

y = softmax(W_y·h_t + b_y).
8. A storage medium storing computer-executable instructions, characterized in that:
the computer-executable instructions, when executed by a processor, perform a method of text analysis of review information as recited in any of claims 1-7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310033845.XA CN115759088A (en) | 2023-01-10 | 2023-01-10 | Text analysis method and storage medium for comment information |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115759088A true CN115759088A (en) | 2023-03-07 |
Family
ID=85348879
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310033845.XA Pending CN115759088A (en) | 2023-01-10 | 2023-01-10 | Text analysis method and storage medium for comment information |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115759088A (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107153642A (en) * | 2017-05-16 | 2017-09-12 | 华北电力大学 | An analysis method for recognizing the sentiment orientation of text comments based on a neural network
CN110737952A (en) * | 2019-09-17 | 2020-01-31 | 太原理工大学 | A prediction method for the residual life of key parts of mechanical equipment combining AE and bi-LSTM
CN111127146A (en) * | 2019-12-19 | 2020-05-08 | 江西财经大学 | An information recommendation method and system based on a convolutional neural network and a noise-reduction self-encoder
CN114138942A (en) * | 2021-12-09 | 2022-03-04 | 南京审计大学 | A violation detection method based on text emotional tendency
- 2023-01-10: application CN202310033845.XA filed (status: Pending)
Non-Patent Citations (1)
Title |
---|
Tao Zhiyong (陶志勇) et al.: "An improved attention short-text classification method based on a bidirectional long short-term memory network" *
Legal Events

Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20230307 |