CN109558587B - Method for classifying public opinion tendency recognition aiming at category distribution imbalance - Google Patents

Publication number
CN109558587B
CN109558587B CN201811325887.6A CN201811325887A CN109558587B CN 109558587 B CN109558587 B CN 109558587B CN 201811325887 A CN201811325887 A CN 201811325887A CN 109558587 B CN109558587 B CN 109558587B
Authority
CN
China
Prior art keywords
public opinion
comment
word
category
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811325887.6A
Other languages
Chinese (zh)
Other versions
CN109558587A (en)
Inventor
彭蓉
王卓
洪涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan University WHU
Original Assignee
Wuhan University WHU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan University WHU
Priority to CN201811325887.6A
Publication of CN109558587A
Application granted
Publication of CN109558587B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G06F18/24155 Bayesian classification

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a public opinion tendency identification method for training samples with unbalanced class distribution. First, vocabulary related to the public opinion field of interest is collected as public opinion hot words to build a word library, and a comment data set is crawled from a public opinion information source and divided into a training set and a test set. Next, the public opinion tendency of the training set is labeled manually, and any class imbalance is handled by supplementing the under-represented classes with a bootstrap learning method. Features are then extracted from each class of training samples, a model is trained with an algorithm such as naive Bayes, a support vector machine, or a decision tree, the test set data are classified with the trained model, and the public opinion tendency is identified from the classification result. The bootstrap learning method, the feature vector construction method, and the classification model training method all apply time-sensitive weighting, so that the identified public opinion tendency better reflects current opinion. The method addresses the inaccurate classification caused by unbalanced training data and improves both the accuracy of public opinion tendency identification and the timeliness of public opinion analysis.

Description

Method for classifying public opinion tendency recognition aiming at category distribution imbalance
Technical Field
The invention belongs to the technical field of natural language processing and machine learning, relates to a method for performing public opinion tendency analysis by using a machine learning algorithm, and particularly relates to a public opinion tendency identification method aiming at unbalanced training sample class distribution.
Background
As internet use has grown rapidly, the volume of news updated online has become huge, and so has its influence on public opinion. Public opinion tendency analysis emerged in this context. It aims to identify, in a timely manner, the attitudes held by online commenters and how those attitudes change, helping supervisory departments detect shifts in public opinion promptly and foster a civil and harmonious public opinion environment.
When a general machine learning algorithm is used for public opinion tendency analysis, problems such as class imbalance in the training data, the timeliness of text publication, and the timeliness of public opinion cause the recognized tendency to deviate substantially from the actual tendency. At present, no effective solution has been proposed.
Disclosure of Invention
To solve the above technical problems, the invention provides a public opinion tendency recognition method for training samples with unbalanced class distribution. Building on common machine learning algorithms, it introduces a semi-supervised training set expansion method and a feature weighting method that is sensitive to time and to public opinion high-frequency words, which can improve recognition accuracy under class imbalance.
1. A public opinion tendency recognition method for unbalanced training sample class distribution, characterized by comprising the following steps:
Step 1: Collect high-frequency words related to the public opinion field as public opinion hot words, create a public opinion high-frequency word library, and update it every day;
Step 2: Crawl the comment data set to be analyzed from a public opinion information source, and divide it into a training set and a test set;
Step 3: Manually label the public opinion tendency of the training set, group the training samples by tendency category, and count the samples in each category. If the category distribution is unbalanced, it is handled with a bootstrap learning method: taking the sample size of the largest category as the standard, more comment data are crawled from the public opinion information source for each category with fewer samples, comment data similar to that category's feature text are found with a semi-supervised similarity calculation method, and these data are added to the category's training set until all category training sets are the same size. The comment feature vectors used in the similarity calculation are extracted as in step 4.
Step 4: For all comments in the training set and the test set, taking the comment publisher as the unit, weight the comment features with a time-sensitive weighting function and a public opinion hot-word-sensitive weighting function to form weighted feature vectors that reflect the timeliness of the comments;
Step 5: Train a model on the weighted feature vectors of each class of training samples using a machine learning algorithm such as naive Bayes, a support vector machine, a decision tree, or a multilayer perceptron classifier; then classify the comment data in the test set with the trained model and determine the public opinion tendency of the comment publishers from the classification.
Preferably, in step 1, the public opinion high-frequency word library records not only the high-frequency words themselves but also the time each word appeared, its frequency, and how that frequency changes over time. The frequency of a public opinion high-frequency word is calculated from the number of relevant results returned for it by the Baidu search engine at a specific point in time.
Preferably, in step 2, the training set and the test set are split by comment publisher: the comments published by one group of publishers form the training set, and the comments published by the remaining publishers form the test set. It is suggested to use the comments of 90% of the publishers in the data set as the training set and those of the remaining 10% as the test set. This ratio can be adjusted dynamically as needed.
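The publisher-based split described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `split_by_publisher`, the `(publisher_id, text)` tuple layout, and the fixed random seed are assumptions made for the example.

```python
import random


def split_by_publisher(comments, train_ratio=0.9, seed=0):
    """Split comments so that each publisher falls entirely in one set.

    comments: list of (publisher_id, text) tuples (hypothetical layout).
    train_ratio: fraction of publishers (not comments) assigned to training.
    """
    publishers = sorted({publisher for publisher, _ in comments})
    random.Random(seed).shuffle(publishers)
    cut = int(round(len(publishers) * train_ratio))
    train_publishers = set(publishers[:cut])
    train = [c for c in comments if c[0] in train_publishers]
    test = [c for c in comments if c[0] not in train_publishers]
    return train, test
```

Splitting on publishers rather than on individual comments keeps all comments of one publisher on the same side, which matters later because the tendency is determined per publisher.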
Preferably, in step 3, if the class distribution is unbalanced, it is handled with a bootstrap learning method. Unbalanced class distribution means that the numbers of samples in different classes differ by more than K%. The value K is determined in relation to the true class proportions of the current classification problem. Generally, the smaller K is, the better the trained classification model performs; the larger K is, the more the trained model tends to assign data to the class with the most samples. The sensitivity of the classification model to K can therefore be determined by analysis.
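A small sketch of the K% imbalance check may help. The text does not state what the percentage difference is measured against, so the example assumes it is the gap between the largest and smallest class relative to the largest class; the function names are hypothetical.

```python
from collections import Counter


def is_imbalanced(labels, k_percent):
    """Return True if the largest and smallest classes differ by more than K%.

    Assumption: the gap is expressed as a percentage of the largest class.
    """
    counts = Counter(labels)
    largest = max(counts.values())
    smallest = min(counts.values())
    gap_percent = 100.0 * (largest - smallest) / largest
    return gap_percent > k_percent


def class_deficits(labels):
    """Number of samples each class needs to match the largest class."""
    counts = Counter(labels)
    largest = max(counts.values())
    return {label: largest - count for label, count in counts.items()}
```

The deficits give the number of comments that the supplementing step would need to add to each smaller class.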
Taking the sample size of the largest category as the standard, more comment data are crawled from the public opinion information source for each category with a smaller sample size, and a semi-supervised similarity calculation method is used to find comments whose similarity to the category's samples exceeds a threshold T; these comments are added to the category's samples. Taking the VSM-based similarity calculation method as an example, the similarity Sim(o1, o2) is shown in formula (1):
[Formula (1): equation image not reproduced in the text]
where o1 is the feature vector of a class sample of the training set, o2 is the feature vector of a newly crawled comment text from the public opinion source, o1_i is the i-th feature of the class feature text, o2_i is the i-th feature of the comment text, and x is the total dimension of the feature vectors o1 and o2. The feature vector is constructed as in step 4.
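Formula (1) itself is not visible in this text. The sketch below assumes it is the cosine similarity commonly used with the vector space model (VSM), which matches the symbols defined above (per-dimension features o1_i and o2_i, total dimension x); treat that choice as an assumption rather than the patent's confirmed formula.

```python
import math


def cosine_similarity(o1, o2):
    """Cosine similarity of two equal-length feature vectors (assumed form of Sim)."""
    if len(o1) != len(o2):
        raise ValueError("feature vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(o1, o2))
    norm1 = math.sqrt(sum(a * a for a in o1))
    norm2 = math.sqrt(sum(b * b for b in o2))
    if norm1 == 0.0 or norm2 == 0.0:
        return 0.0  # an all-zero vector shares no features with anything
    return dot / (norm1 * norm2)
```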
Preferably, in step 4, a time-sensitive weighting function is used to weight the feature vector extracted from the comments, reflecting the idea that the more recent a comment is, the better it reflects the commenter's current public opinion tendency and the higher its feature weight should be. For example, the comment weight TimeWeight(c) can be calculated as shown in formula (2):
[Formula (2): equation image not reproduced in the text]
where c is a comment, Tn is the current date, Tc is the date on which comment c was published, and Tn-Tc is measured in days. All feature words appearing in the same comment receive the same feature weight according to formula (2).
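The exact form of formula (2) is not visible in this text. Purely for illustration, the sketch below assumes a reciprocal decay, 1 / (1 + (Tn - Tc)), with the difference in days; this is a stand-in that satisfies the stated idea (newer comments weigh more) and should not be read as the patent's formula.

```python
from datetime import date


def time_weight(published, today):
    """Illustrative time weight: 1 / (1 + age in days). Assumed form, not formula (2)."""
    days = (today - published).days
    if days < 0:
        raise ValueError("comment is dated after the current date")
    return 1.0 / (1.0 + days)
```

Under this assumed form, a comment published today has weight 1.0 and a comment published seven days ago has weight 0.125.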
Preferably, in step 4, a weighting function sensitive to public opinion hot words is used to weight the feature vectors extracted from the comments, so that comments more relevant to current public opinion hot spots, which better reflect the commenter's current tendency, receive higher feature weights. For example, the public opinion high-frequency word weight HotWordWeight(c) can be calculated as shown in formula (3):
[Formula (3): equation image not reproduced in the text]
where D is the current date, Dc is the date on which hot word c was added to the public opinion high-frequency word library, Wt(c) is the current number of search results for hot word c, and Wb(c) is the number of search results for hot word c when it was added to the library. When c is not a high-frequency word, HotWordWeight(c) is 0.
Preferably, in step 4, the comment features are weighted with both the time-sensitive weighting function and the hot-word-sensitive weighting function to form a weighted feature vector.
For example, the weighted TF-IDF value WeightTFIDF(c) of the comment feature word c may be calculated as shown in formula (4):
WeightTFIDF(c)=(HotWordWeight(c)+TimeWeight(Sc))×TFIDF(c) (4)
where HotWordWeight(c) is the public opinion high-frequency word weight of word c, TimeWeight(Sc) is the time weight of the comment sentence Sc in which word c appears, and TFIDF(c) is the TF-IDF value of word c.
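Formula (4) combines the two weights additively and then scales the TF-IDF value. The sketch below applies it to every word of one comment; the dictionary-based layout and the function name are assumptions, and the hot-word and time weights are taken as inputs rather than computed here.

```python
def weight_comment_features(tfidf_by_word, hot_word_weights, comment_time_weight):
    """Apply formula (4) to each word of a single comment.

    tfidf_by_word: {word: TFIDF(c)} for the words of the comment.
    hot_word_weights: {word: HotWordWeight(c)}; words not listed get 0,
        matching the rule that non-hot words have weight 0.
    comment_time_weight: TimeWeight(Sc), shared by all words of the comment.
    """
    return {
        word: (hot_word_weights.get(word, 0.0) + comment_time_weight) * tfidf
        for word, tfidf in tfidf_by_word.items()
    }
```

Because the weights are added rather than multiplied, a word that is not a hot word still keeps a nonzero weight through the time term.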
The TF-IDF algorithm is formulated as follows:
[Formula (5): equation image not reproduced in the text]
In formula (5), TF(c) is the word frequency of the word c in the current text, N is the total number of texts in the corpus, and N(c) is the number of texts in the corpus that contain the word c.
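Formula (5) is not visible in this text. The sketch below assumes the common variant TFIDF(c) = TF(c) × log(N / N(c)), with TF(c) taken as the relative frequency of c in the current text; both the variant and the token-list input format are assumptions.

```python
import math
from collections import Counter


def tf_idf(text_tokens, corpus):
    """TF-IDF scores for one tokenized text against a tokenized corpus.

    Assumed form: (count of c / length of text) * log(N / N(c)).
    Words that never occur in the corpus are skipped, since N(c) would be 0.
    """
    n_texts = len(corpus)
    doc_freq = Counter()
    for tokens in corpus:
        doc_freq.update(set(tokens))  # count each text at most once per word
    counts = Counter(text_tokens)
    total = len(text_tokens)
    scores = {}
    for word, count in counts.items():
        n_c = doc_freq.get(word, 0)
        if n_c == 0:
            continue
        scores[word] = (count / total) * math.log(n_texts / n_c)
    return scores
```

With this variant, a word that appears in every text of the corpus scores 0, so it contributes nothing to the feature vector.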
Then, all feature words of the class sample are sorted in descending order of weighted TF-IDF value, and the first L words, those most strongly correlated with the class, are selected as the feature text vector of the class sample. The value L is determined by the required classification accuracy and recall of the classification model and by the acceptable time complexity. Generally, the larger L is, the higher the time complexity of the classification model; as L increases, the classification accuracy and recall first rise and then fall after reaching a peak. The optimal value of L can therefore be determined through repeated iterative analysis.
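The descending sort and top-L selection can be sketched as below. The function names and the alphabetical tie-break are assumptions added so that the example is deterministic.

```python
def top_l_features(weighted_scores, num_features):
    """Return the L words with the highest weighted TF-IDF values.

    Ties are broken alphabetically (an assumption, for reproducibility).
    """
    ranked = sorted(weighted_scores.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, _ in ranked[:num_features]]


def to_vector(scores, feature_words):
    """Project a {word: score} mapping onto a fixed, ordered feature word list."""
    return [scores.get(word, 0.0) for word in feature_words]
```

For example, with scores `{"a": 0.1, "b": 0.9, "c": 0.5}` and L = 2, the selected feature words are `["b", "c"]`.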
Compared with the prior art, the invention has the following beneficial technical effects. Building on existing machine learning classification algorithms, it introduces a semi-supervised training set extension method to supplement unbalanced training data, addressing the inaccurate classification that such data cause. It also introduces the concept of public opinion timeliness and a public opinion high-frequency word library to improve classification precision for real-time hot topics. In addition, the invention applies time-sensitive weighting to all comments of a single user, which allows the user's current tendency to be identified more accurately.
Drawings
FIG. 1 is a block flow diagram of an embodiment of the present invention.
Detailed Description
In order to facilitate the understanding and implementation of the present invention for those of ordinary skill in the art, the present invention is further described in detail with reference to the accompanying drawings and examples, it is to be understood that the embodiments described herein are merely illustrative and explanatory of the present invention and are not restrictive thereof.
Referring to fig. 1, the present invention provides a method for identifying a public opinion tendency of unbalanced distribution of training sample classes, which includes the following steps:
Step 1: Current public opinion hot spots are tracked and labeled manually, high-frequency words related to the public opinion field of interest are selected as public opinion hot words, and a public opinion high-frequency word library is created and updated every day;
In this embodiment, public opinion hot words may be drawn from the microblog hot search list or from the front-page headlines of major portal websites, for example: "Jin Yong passes away" and "twelve great curtains of Chinese women".
In this embodiment, the public opinion high-frequency word library is stored as a text file so that public opinion hot words can easily be added by hand.
In this embodiment, each entry in the public opinion high-frequency word library contains the word, the time at which the word was added to the library, and the number of results returned by the Baidu search engine for the word at that time.
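The three fields of a library entry can be modeled as below. The patent only says that the library is a text file; the tab-separated layout, the `#` comment convention, the ISO date format, and all names in this sketch are hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class HotWordEntry:
    word: str
    added_on: date          # time the word was added to the library
    baseline_results: int   # search result count at the time of adding


def parse_hot_word_library(text):
    """Parse lines of 'word<TAB>YYYY-MM-DD<TAB>result_count' (assumed layout)."""
    entries = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        word, added, count = line.split("\t")
        entries[word] = HotWordEntry(word, date.fromisoformat(added), int(count))
    return entries
```

A plain one-line-per-word layout like this keeps manual additions simple, which is the reason the embodiment gives for using a text file.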
Public opinion timeliness affects the final tendency judgment. If the labeled comments in the training set and the comments in the test set come from different time periods, the public opinion hot words may differ between them, which can degrade model training and in turn affect the tendency judgment.
Step 2: Crawl the comment data set to be analyzed from a public opinion information source, and divide it into a training set and a test set;
In this embodiment, posts or comments crawled from social network accounts, such as microblogs and Twitter, are used as the comment data set, and the crawled data are organized by posting account.
Step 3: Count the samples of each class in the training set. If the class distribution is unbalanced, it is handled with a bootstrap learning method. Unbalanced class distribution means that the numbers of samples in different classes differ by more than K%. The value K is determined in relation to the true class proportions of the current classification problem. Generally, the smaller K is, the better the trained classification model performs; the larger K is, the more the trained model tends to assign data to the class with the most samples. The sensitivity of the classification model to K can therefore be determined by analysis.
For each category with a small sample size, more comment data are crawled from the public opinion information source, a similarity calculation algorithm is used to find data in the crawled comment data set that are similar to the category's feature text, and these data are added to the category's training set until the class imbalance is resolved.
In this embodiment, for a category with a small sample size, a VSM-based semi-supervised training set extension method is used to calculate the similarity between the category's feature sample and a newly crawled text from the public opinion source; the similarity is calculated as shown in formula (1).
o1 is the feature text of a given category of the training set, selected by the TF-IDF algorithm: after the category's text is segmented into words, the TF-IDF values of all words are calculated, and the first L words most strongly correlated with the category are selected as the feature text vector of the category text. The value L is determined by the required classification accuracy and recall of the classification model and by the acceptable time complexity. Generally, the larger L is, the higher the time complexity of the classification model; as L increases, the classification accuracy and recall first rise and then fall after reaching a peak. The optimal value of L can therefore be determined through repeated iterative analysis.
When the similarity Sim(o1, o2) > 0.7, the two texts o1 and o2 are considered similar.
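The supplementing loop of this embodiment can be sketched as follows: candidates whose similarity to the class feature vector exceeds 0.7 are accepted until the class deficit is filled. The sketch assumes cosine similarity for Sim, since formula (1) is not visible in this text, and the function names and the index-based return value are hypothetical.

```python
import math


def _cosine(u, v):
    """Cosine similarity; assumed form of Sim(o1, o2)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def select_supplements(class_vector, candidate_vectors, deficit, threshold=0.7):
    """Indices of crawled candidates to add to an under-represented class.

    Stops once `deficit` candidates have been accepted, so the class never
    grows beyond the size of the largest class.
    """
    selected = []
    for index, vector in enumerate(candidate_vectors):
        if len(selected) >= deficit:
            break
        if _cosine(class_vector, vector) > threshold:
            selected.append(index)
    return selected
```

If too few candidates pass the threshold, the function returns fewer than `deficit` indices, which corresponds to crawling more data and repeating the step.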
In this embodiment, the feature vector is constructed with a method based on the TF-IDF algorithm. Specifically:
First, after word segmentation, the TF-IDF value TFIDF(c) of each word in each category sample of the training set is calculated in turn.
The TF-IDF algorithm is formulated as follows:
[Formula (5): equation image not reproduced in the text]
In formula (5), TF(c) is the word frequency of the word c in the current text, N is the total number of texts in the corpus, and N(c) is the number of texts in the corpus that contain the word c.
Second, the weighted TF-IDF value WeightTFIDF(c) of each comment feature word c is calculated according to formula (4), using the time-sensitive weighting function and the public opinion hot-word-sensitive weighting function.
Third, once the weighted TF-IDF values of all words in a category sample have been calculated, they are sorted in descending order, and the first L words are selected as the feature vector of the category text.
Step 4: For all comments of a single user, weight the comments with a time-sensitive weighting function to reflect their timeliness;
When the tendency of a single user is calculated, the weights TimeWeight(c) are accumulated directly, so that the contribution of earlier comments is reduced and the contribution of recent comments is increased, and the user's current tendency can be judged as time passes.
Step 5: Classify the comment data with a machine learning algorithm such as naive Bayes, a support vector machine, a decision tree, or a multilayer perceptron classifier.
In this embodiment, the current user's comment tendency is classified with a corresponding machine learning algorithm, such as a naive Bayes classifier. For a binary classification problem, the positive and negative tendency values can be taken as 1 and -1; for a three-class problem, the neutral, positive, and negative tendency values can be taken as 0, 1, and -1. The user tendency value Sum(A) is as follows:
[Formula (6): equation image not reproduced in the text]
In formula (6), N is the total number of comments made by the current user A, tend(c_i) is the tendency value of the i-th comment posted by the user, and TimeWeight(c_i) is the weight of the i-th comment, calculated as shown in formula (2).
The user's tendency Tendency(A) is as follows:
[Formula (7): equation image not reproduced in the text]
In the present embodiment, t is 5.
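Formulas (6) and (7) are not visible in this text. Based on the surrounding definitions, the sketch below assumes that Sum(A) is the sum over the user's N comments of tend(c_i) × TimeWeight(c_i), and that Tendency(A) compares Sum(A) against the threshold t symmetrically; the second assumption in particular is a guess about formula (7), and the function names are hypothetical.

```python
def user_tendency_sum(weighted_comments):
    """Assumed form of formula (6): sum of tend(c_i) * TimeWeight(c_i).

    weighted_comments: iterable of (tendency_value, time_weight) pairs,
    with tendency values such as 1, 0, and -1.
    """
    return sum(tend * weight for tend, weight in weighted_comments)


def user_tendency(total, t):
    """Assumed form of formula (7): threshold the weighted sum at +t and -t."""
    if total > t:
        return 1    # positive tendency
    if total < -t:
        return -1   # negative tendency
    return 0        # neutral or undecided
```

For instance, comments with (tendency, weight) pairs (1, 1.0), (1, 0.5), and (-1, 0.25) give a weighted sum of 1.25, so the older negative comment only slightly offsets the two recent positive ones.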
In the method, values of various parameters such as a category distribution imbalance determination index K, a characteristic dimension L, a similarity threshold T, a tendency determination threshold Tt and the like need to be optimized through tests so as to obtain a better public opinion tendency identification effect.
The method introduces a semi-supervised training set extension method on the basis of the traditional machine learning algorithm, and solves the problem of inaccurate classification caused by unbalanced training data to a certain extent. Meanwhile, the concepts of public opinion hot words and public opinion high-frequency word libraries are added, and the public opinion tendency recognition efficiency aiming at specific public opinions or major events is improved by introducing the public opinion hot word sensitive weighting function, and the classification accuracy under the conditions is also improved; the introduction of a time sensitive weighting function can reflect changes in the user's tendency over time.
It should be understood that parts of the specification not set forth in detail are well within the prior art.
It should be understood that the above description of the preferred embodiments is given for clarity and not for any purpose of limitation, and that various changes, substitutions and alterations can be made herein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (5)

1. A classification method for public opinion tendency identification aiming at category distribution imbalance is characterized by comprising the following steps:
step 1: collecting high-frequency words related to the public opinion field as public opinion hot words, creating a public opinion high-frequency word library, and updating every day;
step 2: crawling a comment data set to be analyzed from a public opinion information source, and dividing the comment data set into a training set and a testing set;
the basis for segmenting the training set and the test set is a comment publisher, namely, comments published by one part of comment publishers are used as the training set, and comments published by the other part of comment publishers are used as the test set;
step 3: manually marking the public opinion tendency in the training set, classifying training samples according to tendency categories, and counting the sample amount under different tendency categories in the training set; if the category distribution is unbalanced, processing by adopting a similarity calculation method; taking the sample size of the category with the largest sample size as a standard, more comment data are crawled from a public opinion information source for each category with a smaller data size, comment data similar to the characteristic text of the category are searched, and the comment data are supplemented into the category training set until the data sizes of all the category training sets are the same;
wherein the unbalanced class distribution means that the numbers of samples of different classes differ by more than K%; the determination of the value K is related to the true category proportion of the current classification problem; taking the sample size of the category with the largest sample size as a standard, for each category with a smaller sample size, more comment data are crawled from the public opinion information source by using a crawler, comments whose similarity to the category sample exceeds a threshold value T are searched by using a semi-supervised similarity calculation method, and the comments are supplemented into the category sample; the similarity Sim(o1, o2) is shown in formula (1):
[Formula (1): equation image not reproduced in the text]
wherein o1 is the feature vector of a class sample of the training set, o2 is the feature vector of a newly crawled comment text from the public opinion source, o1_i is the i-th feature of the class feature text, o2_i is the i-th feature of the comment text, and x is the total dimension of the feature vectors o1 and o2;
step 4: for all comments in the training set and the test set, taking a comment publisher as a unit, weighting the comment features by using a time-sensitive weighting function and a public opinion hot word-sensitive weighting function to form weighted feature vectors so as to reflect the timeliness of the comments;
step 5: training an algorithm model by using the weighted feature vector of each type of training sample and adopting a machine learning algorithm; and then, classifying the comment data in the test set by using the trained model, and determining the public opinion tendency of the comment publishers according to the classification.
2. The method as claimed in claim 1, characterized in that: in step 1, the public opinion high-frequency word library records not only the high-frequency words but also the time each word appeared, its frequency, and how that frequency changes over time; the frequency of a public opinion high-frequency word is calculated according to the number of relevant results returned for it by a search engine.
3. The method as claimed in claim 1, characterized in that: in step 4, a time-sensitive weighting function is adopted to weight the feature vectors extracted from the comments, and the comment weight TimeWeight(Sc) is calculated as follows:
[Formula (2): equation image not reproduced in the text]
wherein Sc is a comment, Tn is the current date, Tc is the date on which the comment Sc was published, and Tn-Tc is measured in days; the feature words appearing in the same comment are given the same feature weight according to formula (2).
4. The method as claimed in claim 1, characterized in that: in step 4, the feature vectors extracted from the comments are weighted by using a weighting function sensitive to public opinion hot words, wherein the public opinion high-frequency word weight HotWordWeight(c) is calculated as follows:
[Formula (3): equation image not reproduced in the text]
wherein D is the current date, Dc is the date on which the hot word c was added to the public opinion high-frequency word library, Wt(c) is the current number of search results for the hot word c, and Wb(c) is the number of search results for the hot word c when it was added to the public opinion high-frequency word library; when c is not a high-frequency word, HotWordWeight(c) is 0.
5. The method as claimed in claim 1, characterized in that: in step 4, the weighted TF-IDF value WeightTFIDF(c) of the comment feature word c is calculated by adopting formula (4);
WeightTFIDF(c)=(HotWordWeight(c)+TimeWeight(Sc))×TFIDF(c) (4)
wherein HotWordWeight(c) is the public opinion high-frequency word weight of word c, TimeWeight(Sc) is the time weight of the comment sentence Sc in which word c appears, and TFIDF(c) is the TF-IDF value of the word c;
the TF-IDF algorithm is formulated as follows:
[Formula (5): equation image not reproduced in the text]
in formula (5), TF(c) refers to the word frequency of the word c in the current text; N represents the total number of texts in the corpus, and N(c) represents the number of texts in the corpus containing the word c;
and then, all the feature words of the class sample are arranged in descending order according to the weighted TF-IDF value, and the first L words with the highest degree of correlation with the class are selected as the feature text vector of the class sample.
CN201811325887.6A 2018-11-08 2018-11-08 Method for classifying public opinion tendency recognition aiming at category distribution imbalance Active CN109558587B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811325887.6A CN109558587B (en) 2018-11-08 2018-11-08 Method for classifying public opinion tendency recognition aiming at category distribution imbalance

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811325887.6A CN109558587B (en) 2018-11-08 2018-11-08 Method for classifying public opinion tendency recognition aiming at category distribution imbalance

Publications (2)

Publication Number Publication Date
CN109558587A (en) 2019-04-02
CN109558587B (en) 2021-04-16

Family

ID=65866157

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811325887.6A Active CN109558587B (en) 2018-11-08 2018-11-08 Method for classifying public opinion tendency recognition aiming at category distribution imbalance

Country Status (1)

Country Link
CN (1) CN109558587B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110738591A (en) * 2019-09-20 2020-01-31 哈尔滨工业大学(威海) Method for calculating traffic safety benefit of climbing lane based on tendency value matching
CN111400499A (en) * 2020-03-24 2020-07-10 网易(杭州)网络有限公司 Training method of document classification model, document classification method, device and equipment
CN111461002B (en) * 2020-03-31 2023-05-26 华南理工大学 Sample processing method for thermal imaging pedestrian detection
CN111966875B (en) * 2020-08-18 2023-08-22 中国银行股份有限公司 Sensitive information identification method and device
CN113157872B (en) * 2021-05-27 2021-12-28 西藏凯美信息科技有限公司 Online interactive topic intention analysis method based on cloud computing, server and medium
CN114511345B (en) * 2021-12-20 2024-06-04 武汉理工大学 Sales prediction method based on policy-public opinion-purchase two-stage deep learning

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101408883A (en) * 2008-11-24 2009-04-15 电子科技大学 Method for collecting network public feelings viewpoint
CN103793503A (en) * 2014-01-24 2014-05-14 北京理工大学 Opinion mining and classification method based on web texts
CN104951548A (en) * 2015-06-24 2015-09-30 烟台中科网络技术研究所 Method and system for calculating negative public opinion index
WO2017149540A1 (en) * 2016-03-02 2017-09-08 Feelter Sales Tools Ltd Sentiment rating system and method
CN108228612A (en) * 2016-12-14 2018-06-29 北京国双科技有限公司 A kind of method and device for extracting network event keyword and mood tendency

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
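Method for discovering potential evolutionary requirements based on user comments (基于用户评论的潜在演化需求发现方法); 倪瑜泽 et al.; 《武汉大学学报》 (Journal of Wuhan University); 2018-08-31; Vol. 61, No. 4; full text *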

Also Published As

Publication number Publication date
CN109558587A (en) 2019-04-02

Similar Documents

Publication Publication Date Title
CN109558587B (en) Method for classifying public opinion tendency recognition aiming at category distribution imbalance
CN107861939B (en) Domain entity disambiguation method fusing word vector and topic model
CN107609132B (en) Semantic ontology base based Chinese text sentiment analysis method
CN107193959B (en) Pure text-oriented enterprise entity classification method
CN108197117B (en) Chinese text keyword extraction method based on document theme structure and semantics
CN112347244B (en) Yellow-based and gambling-based website detection method based on mixed feature analysis
CN107844559A (en) A kind of file classifying method, device and electronic equipment
CN110543595B (en) In-station searching system and method
CN101609450A (en) Web page classification method based on training set
CN105512285B (en) Adaptive network reptile method based on machine learning
CN103294681B (en) Method and device for generating search result
CN103678576A (en) Full-text retrieval system based on dynamic semantic analysis
CN102194013A (en) Domain-knowledge-based short text classification method and text classification system
CN102227724A (en) Machine learning for transliteration
US10387805B2 (en) System and method for ranking news feeds
CN107463616B (en) Enterprise information analysis method and system
CN111324801B (en) Hot event discovery method in judicial field based on hot words
CN106933800A (en) A kind of event sentence abstracting method of financial field
CN111444304A (en) Search ranking method and device
CN103309862A (en) Webpage type recognition method and system
CN105512333A (en) Product comment theme searching method based on emotional tendency
CN108228612B (en) Method and device for extracting network event keywords and emotional tendency
KR101059557B1 (en) Computer-readable recording media containing information retrieval methods and programs capable of performing the information
CN110196910A (en) A kind of method and device of corpus classification
CN103778206A (en) Method for providing network service resources

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant