CN107818173B - Vector space model-based Chinese false comment filtering method - Google Patents

Vector space model-based Chinese false comment filtering method

Info

Publication number
CN107818173B
CN107818173B (application CN201711129611.6A)
Authority
CN
China
Prior art keywords
comments
comment
neural network
false
word
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201711129611.6A
Other languages
Chinese (zh)
Other versions
CN107818173A (en)
Inventor
刘珊
杨波
郑文锋
蔡礼高
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Electronic Science and Technology of China
Original Assignee
University of Electronic Science and Technology of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Electronic Science and Technology of China filed Critical University of Electronic Science and Technology of China
Priority to CN201711129611.6A priority Critical patent/CN107818173B/en
Publication of CN107818173A publication Critical patent/CN107818173A/en
Application granted granted Critical
Publication of CN107818173B publication Critical patent/CN107818173B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/216 Parsing using statistical methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Machine Translation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a vector space model-based Chinese false comment filtering method. The similarity between comments is judged with an improved vector space model, and highly similar comments are taken as one part of the false comments. Another part of the false comments is screened out by combining each comment's sentiment polarity with its user score. A set of genuine comment samples is also introduced, and a BP neural network is trained with the two types of samples. The trained network is then used to judge unlabeled comments.

Description

Vector space model-based Chinese false comment filtering method
Technical Field
The invention belongs to the technical field of machine learning, and particularly relates to a Chinese false comment filtering method based on a vector space model.
Background
With the increasing maturity of internet technology, consumers' enthusiasm for online reviewing has grown, and a large amount of comment data is generated on the network. Users can draw on this comment information to support purchase decisions, but they are also troubled by uneven comment quality, information overload, and similar problems.
While the network brings a convenient experience to consumers, its lack of regional limits also brings drawbacks such as a missing basis for purchase decisions and product descriptions that do not match reality. More and more consumers therefore want to know previous buyers' evaluations of and attitudes toward a product before purchasing it, so as to make a reliable decision. However, with the rapid growth in the number of reviews and the wide variety of their content, it is increasingly difficult for users to obtain valuable evaluation information.
It is difficult to identify truly valuable information from massive numbers of comments by manual methods alone, and an automatic method is urgently needed to assist screening; the evaluation and screening of text content therefore has important research value.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provides a vector space model-based Chinese false comment filtering method, which identifies false comments on a film review website based on a BP neural network so as to provide users with a genuine consumption reference.
To achieve the above purpose, the invention provides a vector space model-based method for filtering Chinese false comments, characterized by comprising the following steps:
(1) Simulating website login and capturing comments;
(2) removing comments shorter than the set comment length L;
(3) segmenting the comments into words to obtain the sentence component structure;
(3.1) firstly establishing an interference word bank comprising connecting words, subjects, and objects; then calculating the proportion of interference words in each comment, comparing it with a preset proportion threshold, and rejecting comments whose proportion exceeds the threshold;
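The filters of steps (2) and (3.1) can be sketched in Python as below; the length threshold of 15 and ratio threshold of 50% follow the embodiment later in the text, but the interference lexicon itself is an illustrative stand-in, since the patent does not list its actual word bank:

```python
# Sketch of steps (2) and (3.1): length filtering plus interference-word
# ratio filtering. The lexicon below is a tiny illustrative stand-in for
# the patent's interference word bank (connectives, subjects, objects).

MIN_LENGTH = 15          # comment-length threshold L from the embodiment
RATIO_THRESHOLD = 0.50   # interference-word proportion threshold

INTERFERENCE_WORDS = {"的", "了", "我", "你", "他", "和", "是", "就", "都"}

def keep_comment(tokens):
    """Return True if the segmented comment survives both filters."""
    if len("".join(tokens)) < MIN_LENGTH:
        return False  # step (2): too short
    ratio = sum(t in INTERFERENCE_WORDS for t in tokens) / len(tokens)
    return ratio <= RATIO_THRESHOLD  # step (3.1): too many function words
```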
(3.2) performing word segmentation on the comments obtained in step (3.1) with the Chinese Academy of Sciences NLPIR Chinese word segmentation Java tool, deleting punctuation, encoding each segmented comment by part of speech, and building a comment structure code library; if an identical code already exists in the library, the comment template's feature value is incremented by 1, otherwise nothing is modified;
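Step (3.2) can be sketched as follows; since the NLPIR Java segmenter is not reproduced here, a toy part-of-speech lexicon stands in for its tagger (the tags and lexicon entries are hypothetical):

```python
# Sketch of step (3.2): map each segmented comment to a part-of-speech
# code string and count duplicate structures. POS_LEXICON is a toy
# stand-in for the NLPIR tagger ('n' noun, 'd' adverb, 'a' adjective,
# 'v' verb, 'x' unknown).
from collections import Counter

POS_LEXICON = {"电影": "n", "剧情": "n", "非常": "d", "很": "d",
               "精彩": "a", "好看": "a", "喜欢": "v", "推荐": "v"}

def structure_code(tokens):
    """Part-of-speech code string of a segmented comment."""
    return "".join(POS_LEXICON.get(t, "x") for t in tokens)

def template_counts(comments):
    """How many comments share each structure code: the template
    feature incremented when an identical code already exists."""
    return Counter(structure_code(c) for c in comments)
```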
(4) sorting the comments obtained in step (3) by the number of "useful" votes from users, selecting the top 5% as real comments, and marking them as positive samples;
(5) constructing an improved vector space model from the unmarked comments in step (4);
(5.1) carrying out word frequency TF and inverse word frequency IDF statistics on the unmarked comments in step (4):
TF = f/m, where the TF value lies between 0 and 1, f represents the number of occurrences of the current word in the current comment, and m represents the number of occurrences of the most frequent word in the current comment;
IDF = log(n / n_w)

where n represents the total number of comments in the entire corpus and n_w represents the number of comments containing the current word;
(5.2) constructing an improved version vector space model
sim(d_i, d_j) = (Σ_{k=1}^{N} w_ik · w_jk) / (√(Σ_{k=1}^{N} w_ik²) · √(Σ_{k=1}^{N} w_jk²))

wherein d_i and d_j respectively represent the ith and jth comments, N represents the total number of words, and w_ik represents the product of the word frequency TF and the inverse word frequency IDF of the kth word in the ith comment:

w_ik = TF_ik × IDF_k;
(5.3) calculating the similarity of any two comments with the improved vector space model, screening out identical or near-identical comments, and marking them as false comments, taken as negative example sample I;
(6) performing sentiment scoring on the unmarked comments from step (4) according to the BosonNLP sentiment dictionary data and the Hopkinson sentiment analysis word data, then judging sentiment polarity from the score: Score > 0 is judged positive and Score < 0 is judged negative;
marking as false comments those with positive sentiment polarity but a user score below the average judgment standard, or with negative sentiment polarity but a user score above the average judgment standard, taken as negative example sample II;
(7) sorting the users of the unmarked comments from step (4) in descending order of comment count, and marking all comments of the top 1% of users as false comments, taken as negative example sample III;
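Step (7) can be sketched as below; the patent does not state how the top-1% cutoff is rounded, so rounding up is assumed here so that at least one user is always selected:

```python
# Sketch of step (7): rank users by comment count and flag the comments
# of the top 1% most prolific users as false. Rounding up with ceil is
# an assumption; the patent gives no rounding rule.
from math import ceil
from collections import Counter

def flag_prolific_users(comment_authors, fraction=0.01):
    """Return the set of user ids in the top `fraction` by comment count."""
    counts = Counter(comment_authors)
    k = ceil(len(counts) * fraction)
    ranked = sorted(counts, key=counts.get, reverse=True)
    return set(ranked[:k])
```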
(8) forming positive example vectors from the positive samples of step (4) and negative example vectors from the negative samples of steps (5), (6), and (7); inputting the positive vectors into the BP neural network and iteratively modifying the inter-layer weights with forward and backward propagation so that the network outputs "1"; inputting the negative vectors and likewise iteratively modifying the weights so that the network outputs "0", thereby training the BP neural network;
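A minimal sketch of the BP network of steps (8) and (9), trained with forward and backward propagation so that positive vectors map toward "1" and negative vectors toward "0"; the layer sizes, learning rate, epoch count, and the absence of bias terms are illustrative simplifications, not values from the patent:

```python
# Sketch of step (8): one-hidden-layer BP network trained with plain
# gradient descent (sigmoid activations, squared error, no bias terms
# for brevity). All hyperparameters are illustrative.
import random
from math import exp

def sigmoid(x):
    return 1.0 / (1.0 + exp(-x))

class BPNet:
    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
                   for _ in range(n_hidden)]
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]

    def forward(self, x):
        # forward propagation: input -> hidden -> single output unit
        self.h = [sigmoid(sum(w * xi for w, xi in zip(row, x)))
                  for row in self.w1]
        self.o = sigmoid(sum(w * h for w, h in zip(self.w2, self.h)))
        return self.o

    def train(self, samples, lr=0.5, epochs=200):
        # backward propagation: output delta first, then hidden deltas
        # (computed with the old w2 before the weight update)
        for _ in range(epochs):
            for x, t in samples:
                o = self.forward(x)
                d_o = (o - t) * o * (1 - o)
                d_h = [d_o * w * h * (1 - h)
                       for w, h in zip(self.w2, self.h)]
                self.w2 = [w - lr * d_o * h
                           for w, h in zip(self.w2, self.h)]
                self.w1 = [[w - lr * dh * xi for w, xi in zip(row, x)]
                           for row, dh in zip(self.w1, d_h)]
```

As in step (9), a trained net's output can be thresholded at 0.5 to decide "1" (real) versus "0" (false).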
(9) inputting comments captured in real time into the trained BP neural network; if the network outputs "1" the comment is a real comment, and if it outputs "0" the comment is a false comment.
The objects of the invention are realized as follows:
The vector space model-based Chinese false comment filtering method of the invention judges the similarity between comments with an improved vector space model and takes highly similar comments as one part of the false comments. Another part of the false comments is screened out by combining each comment's sentiment polarity with its user score. A set of genuine comment samples is also introduced, and a BP neural network is trained with the two types of samples. The trained network is then used to judge unlabeled comments.
Meanwhile, the vector space model-based Chinese false comment filtering method also has the following beneficial effects:
(1) Positive and negative samples are combined to train the BP neural network, which improves the reliability of the training samples. The BP neural network is chosen because it handles both relatively large feature vectors and relatively large training sets, and is less limited than logistic regression or a support vector machine.
(2) The vectorization of the training samples integrates hidden influence factors such as the structure encoding, the vector space model, sentiment polarity, and comment time.
Drawings
FIG. 1 is a flow chart of a method for filtering false Chinese comments based on a vector space model according to the present invention;
Detailed Description
The following description of embodiments of the invention, with reference to the accompanying drawings, is provided so that those skilled in the art can better understand the invention. Note that in the following description, detailed descriptions of known functions and designs are omitted where they would obscure the subject matter of the invention.
Examples
FIG. 1 is a flow chart of a method for filtering Chinese false comments based on a vector space model according to the present invention.
In this embodiment, as shown in FIG. 1, the vector space model-based Chinese false comment filtering method of the invention comprises the following steps:
S1, using Python to implement a simulated login to the website, and using regular expressions to capture each comment's publication time and text content, and the comment publisher's nickname, id, homepage address, and so on;
S2, removing comments shorter than the set comment length L; in this embodiment the threshold is set to 15, and comments shorter than 15 characters are rejected;
S3, segmenting the comments to obtain the sentence component structure;
S3.1, establishing an interference word bank containing function words such as connectives, subjects, and objects; calculating the proportion of interference words in each comment, comparing it with the preset proportion threshold of 50%, and rejecting comments whose proportion exceeds 50%;
S3.2, performing word segmentation on the comments obtained in step S3.1 with the Chinese Academy of Sciences NLPIR Chinese word segmentation Java tool, deleting punctuation, encoding each segmented comment by part of speech (nouns, verbs, adverbs, adjectives, and so on), and building a comment structure code library; if an identical code already exists in the library, the comment template's feature value is incremented by 1, otherwise nothing is modified;
The encoding process is illustrated by an example (presented as an image in the original patent), whose third line is the comment structure code;
S4, sorting the comments obtained in step S3 by the number of "useful" votes from users, selecting the top 5% as real comments, and marking them as positive samples;
s5, constructing an improved version vector space model by using the unmarked comments in the step S4
The vector space model (VSM) is the most commonly used similarity calculation model and is widely applied in natural language processing. The traditional vector space model follows the principle below:
Assume there are ten words in total, w_1, w_2, ..., w_10, and three comments, d_1, d_2, and d_3. The word frequency table obtained by statistics is shown in Table 1:
      w1   w2   w3   w4   w5   w6   w7   w8   w9   w10
d1     1    2    5    7    9
d2     3    4    6    8
d3    10   11   12   13   14   15

TABLE 1 (the assignment of counts to columns was lost in extraction)
The vector space formula commonly used is as follows:
sim(d_i, d_j) = (Σ_{k=1}^{N} a_ik · a_jk) / (√(Σ_{k=1}^{N} a_ik²) · √(Σ_{k=1}^{N} a_jk²))

wherein d_i and d_j respectively represent the ith and jth comments, N represents the total number of words, and a_ik represents the number of times the kth word appears in the ith comment.
Suppose the similarity of d_1 and d_2 is to be calculated; then:

sim(d_1, d_2) = (Σ_{k=1}^{10} a_1k · a_2k) / (√(Σ_{k=1}^{10} a_1k²) · √(Σ_{k=1}^{10} a_2k²)),

evaluated with the word counts of Table 1.
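The traditional similarity above can be sketched directly from raw count vectors; the counts in the test below are illustrative, since Table 1's cell alignment is ambiguous after extraction:

```python
# Sketch of the traditional vector-space similarity: cosine similarity
# of two raw word-count vectors a_ik, a_jk of equal length N.
from math import sqrt

def cosine_sim(a, b):
    """Cosine similarity of two equal-length count vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0
```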
the above formula is computationally intensive, and here a dimension reduction method is used to reduce the computational complexity. The adoption of the dimension reduction strategy can not only improve the efficiency, but also improve the precision. For example, the following two phrases:
1. this is my meal.
2. That is your meal.
If "this", "that", "your", "my", "is", and "of" are all treated as function words and removed, the similarity is 100%; if none is removed, the similarity may be only 60%, even though the two sentences express the same subject matter.
Using raw word counts directly also causes a problem when comparing a document with many words against one with few. For example, document I contains 10,000 words and word a appears 10 times in it, while document II contains 100 words and a appears 5 times. In the similarity calculation, a in document I then influences the final result more than a in document II. This is clearly unreasonable, since a accounts for only 0.1% of document I but 5% of document II.
To solve these problems, the two concepts of word frequency TF and inverse word frequency IDF are introduced. The specific method is as follows:
S5.1, performing word frequency TF and inverse word frequency IDF statistics on the unmarked comments from step S4:
TF = f/m, where the TF value lies between 0 and 1, f represents the number of occurrences of the current word in the current comment, and m represents the number of occurrences of the most frequent word in the current comment, which reduces errors caused by the unreasonable distribution of word frequencies within a comment;
IDF = log(n / n_w)

where n represents the total number of comments in the entire corpus and n_w represents the number of comments containing the current word, which reduces the similarity error caused by the uneven distribution of word frequencies across the corpus;
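The statistics of step S5.1 can be sketched as follows; a natural logarithm is assumed, since the patent does not specify the base:

```python
# Sketch of S5.1: TF divides each word's count f by the count m of the
# most frequent word in the comment; IDF is log(n / n_w), with n the
# corpus size and n_w the number of comments containing the word.
from math import log
from collections import Counter

def tf(comment_tokens):
    """{word: f/m} for one segmented comment."""
    counts = Counter(comment_tokens)
    m = max(counts.values())
    return {w: f / m for w, f in counts.items()}

def idf(corpus):
    """{word: log(n / n_w)} over a list of segmented comments."""
    n = len(corpus)
    df = Counter(w for doc in corpus for w in set(doc))
    return {w: log(n / n_w) for w, n_w in df.items()}
```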
s5.2, constructing an improved version vector space model
sim(d_i, d_j) = (Σ_{k=1}^{N} w_ik · w_jk) / (√(Σ_{k=1}^{N} w_ik²) · √(Σ_{k=1}^{N} w_jk²))

wherein d_i and d_j respectively represent the ith and jth comments and w_ik represents the product of the word frequency TF and the inverse word frequency IDF of the kth word in the ith comment:

w_ik = TF_ik × IDF_k;
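The improved model of S5.2 and S5.3 then compares TF-IDF weight vectors instead of raw counts; the near-duplicate threshold of 0.9 below is an assumption, as the patent gives no concrete value:

```python
# Sketch of S5.2-S5.3: cosine similarity over {word: tf*idf} weight
# dicts, flagging comment pairs above a similarity threshold as the
# identical or near-identical comments of negative example sample I.
# The 0.9 threshold is an illustrative assumption.
from math import sqrt

def weighted_sim(wi, wj):
    """Cosine similarity of two {word: tf*idf} weight dicts."""
    shared = set(wi) & set(wj)
    dot = sum(wi[w] * wj[w] for w in shared)
    ni = sqrt(sum(v * v for v in wi.values()))
    nj = sqrt(sum(v * v for v in wj.values()))
    return dot / (ni * nj) if ni and nj else 0.0

def near_duplicates(weight_vectors, threshold=0.9):
    """Index pairs of comments flagged as identical or near-identical."""
    pairs = []
    for i in range(len(weight_vectors)):
        for j in range(i + 1, len(weight_vectors)):
            if weighted_sim(weight_vectors[i], weight_vectors[j]) >= threshold:
                pairs.append((i, j))
    return pairs
```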
s5.3, calculating the similarity of any two comments by using an improved version vector space model, screening out the same or similar comments, marking the same or similar comments as false comments, and marking the comments as a negative example sample I;
s6, according to BosonNLP emotion dictionary data and Hopkinson emotion analysis word data, carrying out emotion scoring on the comments which are not marked in the step S4, and then carrying out emotion polarity judgment according to emotion scores, wherein the judgment result is that Score >0 is positive, and Score <0 is negative;
Comparing the sentiment tendency with the user score: if the sentiment is positive but the score is below 3 stars (on a 5-star scale), i.e., positive sentiment polarity with a user score below the average judgment standard, or the sentiment is negative but the score is above 3 stars, i.e., negative sentiment polarity with a user score above the average judgment standard, the comment is marked as false and taken as negative example sample II;
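The mismatch rule of step S6 can be sketched as a small predicate; the 3-star midpoint on a 5-star scale follows the text above:

```python
# Sketch of step S6's rule: positive sentiment with a star rating below
# the 3-star midpoint, or negative sentiment above it, marks the comment
# as false (negative example sample II).

def is_mismatch(sentiment_score, star_rating, midpoint=3):
    """True when sentiment polarity contradicts the user's star rating."""
    if sentiment_score > 0 and star_rating < midpoint:
        return True
    if sentiment_score < 0 and star_rating > midpoint:
        return True
    return False
```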
s7, sorting the users in the step S4 according to the number of the comments of each user in a descending order, marking all the comments of the first 1% of users as false comments, and taking the false comments as a negative example sample III;
s8, respectively forming positive example vectors and negative example vectors by the positive example samples and the negative example samples obtained in the steps S4, S5, S6 and S7, wherein each comment forms one vector no matter whether the positive example sample or the negative example sample, then all the positive example vectors are input into the BP neural network, and through iteration, forward propagation and backward propagation are used for modifying the weight between each layer of the BP neural network, so that the BP neural network outputs '1'; inputting all negative example vectors into a BP neural network, modifying the weight between each layer of the BP neural network by using forward propagation and backward propagation through iteration, and enabling the BP neural network to output '0', thereby training the BP neural network;
s9, inputting the comments grabbed in real time into the trained BP neural network, wherein if the output of the BP neural network is '1', the comments are real comments; if the BP neural network output is "0", then the comment is a false comment.
Although illustrative embodiments of the invention have been described above to help those skilled in the art understand it, the invention is not limited to the scope of these embodiments. Various changes will be apparent to those skilled in the art within the spirit and scope of the invention as defined by the appended claims, and all inventions that make use of the inventive concept are protected.

Claims (1)

1. A Chinese false comment filtering method based on a vector space model is characterized by comprising the following steps
(1) Simulating website login and capturing comments;
(2) removing comments shorter than the set comment length L;
(3) dividing the comments into words to obtain a sentence component structure;
(3.1) firstly establishing an interference word bank comprising connecting words, subjects, and objects; then calculating the proportion of interference words in each comment, comparing it with a preset proportion threshold, and rejecting comments whose proportion exceeds the threshold;
(3.2) performing word segmentation on the comments obtained in step (3.1) with the Chinese Academy of Sciences NLPIR Chinese word segmentation Java tool, deleting punctuation, encoding each segmented comment by part of speech, and building a comment structure code library; if an identical code already exists in the library, the comment template's feature value is incremented by 1, otherwise nothing is modified;
(4) sorting the comments obtained in step (3) by the number of "useful" votes from users, selecting the top 5% as real comments, and marking them as positive samples;
(5) constructing an improved version vector space model by using the unmarked comments in the step (4);
(5.1) carrying out word frequency TF and inverse word frequency IDF statistics on the unmarked comments in the step (4);
TF = f/m, where the TF value lies between 0 and 1, f represents the number of occurrences of the current word in the current comment, and m represents the number of occurrences of the most frequent word in the current comment;
IDF = log(n / n_w)

where n represents the total number of comments in the entire corpus and n_w represents the number of comments containing the current word;
(5.2) constructing an improved version vector space model;
sim(d_i, d_j) = (Σ_{k=1}^{N} w_ik · w_jk) / (√(Σ_{k=1}^{N} w_ik²) · √(Σ_{k=1}^{N} w_jk²))

wherein d_i and d_j respectively represent the ith and jth comments, N represents the total number of words, and w_ik represents the product of the word frequency TF and the inverse word frequency IDF of the kth word in the ith comment:

w_ik = TF_ik × IDF_k;
(5.3) calculating the similarity of any two comments with the improved vector space model, screening out identical or near-identical comments, and marking them as false comments, taken as negative example sample I;
(6) performing sentiment scoring on the unmarked comments from step (4) according to the BosonNLP sentiment dictionary data and the Hopkinson sentiment analysis word data, then judging sentiment polarity from the score: Score > 0 is judged positive and Score < 0 is judged negative;
marking as false comments those with positive sentiment polarity but a user score below the average judgment standard, or with negative sentiment polarity but a user score above the average judgment standard, taken as negative example sample II;
(7) sorting the users of the unmarked comments from step (4) in descending order of comment count, and marking all comments of the top 1% of users as false comments, taken as negative example sample III;
(8) forming positive example vectors from the positive samples of step (4) and negative example vectors from the negative samples of steps (5), (6), and (7); inputting the positive vectors into the BP neural network and iteratively modifying the inter-layer weights with forward and backward propagation so that the network outputs "1"; inputting the negative vectors and likewise iteratively modifying the weights so that the network outputs "0", thereby training the BP neural network;
(9) inputting comments captured in real time into the trained BP neural network; if the network outputs "1" the comment is a real comment, and if it outputs "0" the comment is a false comment.
CN201711129611.6A 2017-11-15 2017-11-15 Vector space model-based Chinese false comment filtering method Active CN107818173B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711129611.6A CN107818173B (en) 2017-11-15 2017-11-15 Vector space model-based Chinese false comment filtering method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711129611.6A CN107818173B (en) 2017-11-15 2017-11-15 Vector space model-based Chinese false comment filtering method

Publications (2)

Publication Number Publication Date
CN107818173A CN107818173A (en) 2018-03-20
CN107818173B true CN107818173B (en) 2021-05-14

Family

ID=61609112

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711129611.6A Active CN107818173B (en) 2017-11-15 2017-11-15 Vector space model-based Chinese false comment filtering method

Country Status (1)

Country Link
CN (1) CN107818173B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109189922B (en) * 2018-08-07 2021-06-29 创新先进技术有限公司 Comment evaluation model training method and device
CN109670542A (en) * 2018-12-11 2019-04-23 田刚 A kind of false comment detection method based on comment external information
CN110941953B (en) * 2019-11-26 2023-08-01 华中师范大学 Automatic identification method and system for network false comments considering interpretability
CN116385029B (en) * 2023-04-20 2024-01-30 深圳市天下房仓科技有限公司 Hotel bill detection method, system, electronic equipment and storage medium


Patent Citations (5)

Publication number Priority date Publication date Assignee Title
CN102682120A (en) * 2012-05-15 2012-09-19 合一网络技术(北京)有限公司 Method,device and system for acquiring essential article commented on network
CN103745001A (en) * 2014-01-24 2014-04-23 福州大学 System for detecting reviewers of negative comments on products
CN107229608A (en) * 2016-03-23 2017-10-03 阿里巴巴集团控股有限公司 Comment spam recognition methods and device
CN106708966A (en) * 2016-11-29 2017-05-24 中国计量大学 Similarity calculation-based junk comment detection method
CN107025284A (en) * 2017-04-06 2017-08-08 中南大学 The recognition methods of network comment text emotion tendency and convolutional neural networks model

Non-Patent Citations (3)

Title
Opinion spam and analysis; Nitin Jindal et al.; In Proceedings of the International Conference on Web Search and Web Data Mining; 2008-12-31; pp. 219-230 *
Credibility analysis of virtual community comments from the perspective of text similarity; Xia Huosong et al.; Modern Information; 2011-09-15; pp. 33-36 *
Research on spam review identification methods for product reviews; Liu Lijia; China Masters' Theses Full-text Database; 2015-08-15; pp. 14-28 *

Also Published As

Publication number Publication date
CN107818173A (en) 2018-03-20

Similar Documents

Publication Publication Date Title
CN110175325B (en) Comment analysis method based on word vector and syntactic characteristics and visual interaction interface
CN109933664B (en) Fine-grained emotion analysis improvement method based on emotion word embedding
CN107193959B (en) Pure text-oriented enterprise entity classification method
CN106708966B (en) Junk comment detection method based on similarity calculation
CN106599032B (en) Text event extraction method combining sparse coding and structure sensing machine
CN108563638B (en) Microblog emotion analysis method based on topic identification and integrated learning
CN107818173B (en) Vector space model-based Chinese false comment filtering method
CN106776713A (en) It is a kind of based on this clustering method of the Massive short documents of term vector semantic analysis
CN107239439A (en) Public sentiment sentiment classification method based on word2vec
CN110489523B (en) Fine-grained emotion analysis method based on online shopping evaluation
CN112989802B (en) Bullet screen keyword extraction method, bullet screen keyword extraction device, bullet screen keyword extraction equipment and bullet screen keyword extraction medium
CN108090099B (en) Text processing method and device
CN109101490B (en) Factual implicit emotion recognition method and system based on fusion feature representation
CN112069312B (en) Text classification method based on entity recognition and electronic device
CN106446147A (en) Emotion analysis method based on structuring features
CN111538828A (en) Text emotion analysis method and device, computer device and readable storage medium
CN110705247A χ²-C-based text similarity calculation method
CN110706028A (en) Commodity evaluation emotion analysis system based on attribute characteristics
CN111462752A Client intention identification method based on attention mechanism, feature embedding and BiLSTM
CN111339772B (en) Russian text emotion analysis method, electronic device and storage medium
CN107451116B (en) Statistical analysis method for mobile application endogenous big data
CN110866087B (en) Entity-oriented text emotion analysis method based on topic model
CN107291686B (en) Method and system for identifying emotion identification
CN111191029B (en) AC construction method based on supervised learning and text classification
CN113159831A (en) Comment text sentiment analysis method based on improved capsule network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant