CN108563647A - A car sales forecasting method based on review sentiment analysis - Google Patents

A car sales forecasting method based on review sentiment analysis

Info

Publication number
CN108563647A
Authority
CN
China
Prior art keywords
word
car
indicates
comment
text
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201711229414.1A
Other languages
Chinese (zh)
Inventor
周应华
商楠
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing University of Post and Telecommunications
Original Assignee
Chongqing University of Post and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing University of Post and Telecommunications
Priority to CN201711229414.1A
Publication of CN108563647A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201 Market modelling; Market analysis; Collecting market data
    • G06Q30/0202 Market predictions or forecasting for commercial activities

Landscapes

  • Business, Economics & Management (AREA)
  • Strategic Management (AREA)
  • Engineering & Computer Science (AREA)
  • Accounting & Taxation (AREA)
  • Development Economics (AREA)
  • Finance (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Game Theory and Decision Science (AREA)
  • Data Mining & Analysis (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention claims a car sales forecasting method based on sentiment analysis. Review data is collected from car review websites and preprocessed, and a multi-label classification method assigns the reviews to six aspects of the user's experience: safety, comfort, handling, power, economy, and service. The sentiment factor for each aspect is incorporated separately into the model to build a sentiment-based forecasting model. The model forecasts car sales and identifies which aspect of car performance consumers value most, which can guide future production. In operation, the user inputs past sales data, feeds the data into the model, and obtains the sales forecast for the next quarter. The method improves forecasting accuracy.

Description

A car sales forecasting method based on review sentiment analysis

Technical field

The invention belongs to the field of car sales analysis and forecasting, and specifically relates to a car sales forecasting method that uses sentiment analysis of reviews.

Background

Car sales forecasting estimates the sales of an upcoming period from past sales data and other data. Existing techniques rely mainly on past sales data and use autoregressive models or grey models. Their limitation is that they concentrate on past sales data and ignore the influence of user review data. Research indicates that online review data helps improve the accuracy of sales forecasting models.

Forecasting from car review data is an active research direction, but it poses difficulties, for example in natural language processing: review language is varied, informal, and full of internet slang.

Summary of the invention

The invention aims to solve the above problems of the prior art. It proposes a car sales forecasting method based on review sentiment analysis that improves forecasting accuracy. The technical scheme of the invention is as follows:

A car sales forecasting method based on review sentiment analysis, comprising the following steps:

1) Preprocess the car review data, including unifying the format and removing duplicate words;

2) Use the Chinese lexical analysis system of the Chinese Academy of Sciences to segment the preprocessed car review data into words, and remove stop words;

3) Apply multi-label classification to the review data set segmented in step 2;

4) Quantify sentiment using mutual information to obtain the sentiment value of the review text set;

5) Incorporate the sentiment value into a regression model to forecast car sales for the next period.

Further, step 1) divides the car review data into six aspects: comfort, power, handling, service, economy, and safety. The relationship between a review word and a class label is computed first; the formula is as follows:

where n is the total number of documents; $\overline{word}$ indicates that the word is not in document $D_i$; $\chi^2$ measures the correlation between a word and an aspect $l_j$ of the car; $\overline{l_j}$ indicates that aspect $l_j$ is absent; and $p(word, l_j)$ is the number of times the word occurs in documents $D_i$ for which $l_{ij} = 1$. $l_j$ denotes one aspect of the car's performance, and $L = \{l_1, l_2, \ldots, l_j, \ldots, l_6\}$ denotes the label set made up of the six labels, that is, the set of performance aspects covered by the document collection D: comfort, power, handling, service, economy, and safety. j indexes one of these aspects (1 ≤ j ≤ 6), and i indexes the i-th document. $p(word)$ is the number of times the word occurs in the documents $D_i$; $p(l_j)$ is the number of times $l_j$ occurs in the text set; and $p(\overline{word})$ is the number of documents $D_i$ in which the word does not occur.
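The formula itself is not reproduced in this text. A reconstruction that is consistent with the definitions above, assuming the standard chi-square term-category statistic (the overline notation for "absent" is also an assumption), would read:

```latex
\chi^2(word, l_j) =
  \frac{n \left[\, p(word, l_j)\, p(\overline{word}, \overline{l_j})
              - p(word, \overline{l_j})\, p(\overline{word}, l_j) \,\right]^2}
       {p(word)\; p(\overline{word})\; p(l_j)\; p(\overline{l_j})}
```

Under this reading, a word that occurs equally often with and without aspect $l_j$ scores near zero, while a word concentrated in documents labelled $l_j$ scores high.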

Further, step 1) uses ICTCLAS3, the Chinese lexical analysis system of the Institute of Computing Technology, Chinese Academy of Sciences. The Sogou Input Method cell lexicons related to the automotive industry are first imported into the lexical analysis system; the UltraEdit editor is used to parse the lexicons out of their non-text format, after which the format is unified and duplicate words are removed.

Further, step 2) treats numerals, pronouns, measure words, onomatopoeia, locative words, conjunctions, interjections, suffixes, and particles as stop words.

Further, an average $\chi^2$ aggregation strategy is used to measure the $\chi^2$ value of each word; the formula is as follows:

The words are sorted by $\chi^2$ value from high to low and a subset of them is selected as feature terms, with term frequency as the weight of each feature term. The text is represented with a vector space model to obtain the feature vector $d_i$ of each review document, and an SVM is used to classify the documents.
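The averaging formula is not reproduced in this text. One common form of average chi-square aggregation, assumed here rather than taken from the patent, weights each aspect's score by the share of documents carrying that aspect:

```latex
\chi^2_{avg}(word) = \sum_{j=1}^{6} P(l_j)\, \chi^2(word, l_j),
\qquad P(l_j) = \frac{p(l_j)}{n}
```

Because a review can carry several labels, the weights $P(l_j)$ need not sum to 1 under this assumption; only the ranking of words matters for feature selection.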

Further, quantifying the sentiment value in step 4) specifically comprises:

When the rating is less than or equal to 2, the text is considered negative and is assigned to the negative text set; when the rating is 5, the text is considered positive and is added to the positive text set. The sentiment value S(word) of each word in the text is computed as:

S(word)=P(word,pos)-P(word,neg)

where f(word,pos) is the frequency with which the word occurs in the positive text set only; f(word) is the number of times the word occurs in the whole text set; f(pos) is the number of positive documents; and M is the size of the whole text set. P(word,neg), the mutual information between the word and the negative documents, is computed in the same way.

The formula for S(word) can be simplified to

where f(neg) is the number of negative documents. The sentiment value $S_{rev}(r_k)$ of a review $r_k$ is then:

where q is the number of sentiment-dictionary words contained in the review; that is, the sentiment value of each review text is the sum of the sentiment values of its words.
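The formulas for P, the simplified S(word), and $S_{rev}$ are not reproduced in this text. A reconstruction consistent with the symbols defined above, assuming P is pointwise mutual information and that the logarithm is base 2 (both assumptions), would be:

```latex
P(word, pos) = \log_2 \frac{f(word, pos)\, M}{f(word)\, f(pos)},
\qquad
P(word, neg) = \log_2 \frac{f(word, neg)\, M}{f(word)\, f(neg)}

S(word) = P(word, pos) - P(word, neg)
        = \log_2 \frac{f(word, pos)\, f(neg)}{f(word, neg)\, f(pos)}

S_{rev}(r_k) = \sum_{m=1}^{q} S(word_m)
```

The terms M and f(word) cancel in the difference, which matches the statement that the simplified form depends on f(neg).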

Further, step 5) uses a modified autoregressive (AR) model for forecasting. Let $y_t$ denote the sales volume in month t, t = 1, 2, …, n, where n denotes a future month.

where q is the number of months before month t whose sentiment factors are taken into account; $w_t$ is the sentiment influence in month t; $\alpha_i$ are the model parameters obtained by least squares; P is the number of months before month t under consideration; i indexes one of those P months; $\alpha_0$ is the constant term; and $\varepsilon_t$ is the error term. The sentiment factor under each label is substituted into the model separately, and comparing the results on the training set reveals which aspect of car performance consumers value most.
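The model equation is not reproduced in this text. One form that fits the symbols defined above is an AR(P) model extended with q lagged sentiment terms; the coefficient name $\beta_j$ is hypothetical and introduced only for illustration:

```latex
y_t = \alpha_0 + \sum_{i=1}^{P} \alpha_i\, y_{t-i}
               + \sum_{j=1}^{q} \beta_j\, w_{t-j} + \varepsilon_t
```

Setting every $\beta_j$ to zero recovers a plain AR(P) model, so the sentiment terms can be read as the "modification" the text refers to.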

The advantages and beneficial effects of the invention are as follows:

1. Unlike traditional forecasting, it uses review data and takes into account how much users like the product, which avoids wasting data.

2. Review data about a single aspect of car performance can be used for forecasting on its own, to find out which aspect of performance consumers value most.

3. It makes forecasts more accurate.

Description of the drawings

Fig. 1 is the operation flow chart of the preferred embodiment of the invention;

Fig. 2 shows the multi-label classification results of the invention.

Detailed description

The technical solutions in the embodiments of the invention are described clearly and in detail below with reference to the drawings of the embodiments. The described embodiments are only some of the embodiments of the invention.

The technical scheme by which the invention solves the above technical problems is:

Preprocess the online reviews using ICTCLAS3, the Chinese lexical analysis system of the Institute of Computing Technology, Chinese Academy of Sciences. First, the Sogou Input Method cell lexicons related to the automotive industry are imported into the lexical analysis system; the UltraEdit editor is used to parse the lexicons out of their non-text format, after which the format is unified and duplicate words are removed. Stop words are then removed based on the segmentation results: numerals, pronouns, measure words, onomatopoeia, locative words, conjunctions, interjections, suffixes, and particles are treated as stop words.
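The lexicon clean-up and stop-word filtering can be sketched as follows. This is a minimal illustration, not the patent's implementation: it assumes the segmenter returns (token, tag) pairs and that the part-of-speech tags follow the ICTCLAS convention (m, r, q, o, f, c, e, k, u for the nine stop-word categories). The function names and the sample data are hypothetical.

```python
# Assumed ICTCLAS-style tag prefixes for the nine stop-word categories:
# m numeral, r pronoun, q measure word, o onomatopoeia, f locative word,
# c conjunction, e interjection, k suffix, u particle.
STOP_POS_PREFIXES = ("m", "r", "q", "o", "f", "c", "e", "k", "u")


def dedupe_lexicon(entries):
    """Trim whitespace and drop empty or repeated entries, keeping first-seen order."""
    seen = set()
    result = []
    for entry in entries:
        word = entry.strip()
        if word and word not in seen:
            seen.add(word)
            result.append(word)
    return result


def remove_stop_words(tagged_tokens):
    """Keep tokens whose tag does not start with a stop-word prefix."""
    return [tok for tok, tag in tagged_tokens
            if not tag.startswith(STOP_POS_PREFIXES)]


# Hypothetical lexicon entries and segmenter output.
lexicon = dedupe_lexicon(["涡轮增压", " 底盘 ", "涡轮增压", ""])
tagged = [("这", "r"), ("车", "n"), ("的", "ude1"),
          ("底盘", "n"), ("很", "d"), ("稳", "a")]
tokens = remove_stop_words(tagged)
```

Matching on the tag prefix lets sub-tags such as a particle sub-type fall under the same category without listing each one.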

1) Multi-label classification

The multi-label training data set built from car review texts is denoted (D, T, L). $D = \{D_1, D_2, \ldots, D_n\} = \{(d_1, y_1), (d_2, y_2), \ldots, (d_n, y_n)\}$ is the multi-label data set made up of n review documents about cars; each document $D_i$ consists of a feature vector $d_i$ and a label vector $y_i$ (1 ≤ i ≤ n). $T = (t_1, t_2, \ldots, t_p)$ is the feature set of p keywords selected from the n review documents. $L = \{l_1, l_2, \ldots, l_6\}$ is the label set made up of six labels (comfort, power, handling, service, economy, and safety). In the feature vector $d_i = \{w_{1i}, w_{2i}, \ldots, w_{ji}, \ldots, w_{pi}\}$, $w_{ji}$ is the weight of keyword $t_j$ in document $D_i$. Each document corresponds to one or more performance labels in the label set L, encoded as a binary vector $y_i$ of 0s and 1s: $y_{ji} = 1$ if $D_i$ contains category $l_j$, and 0 otherwise.
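The (D, T, L) representation can be made concrete with a short sketch. The English aspect names, keyword list, and sample review below are hypothetical stand-ins; the weights are raw term counts, in line with the term-frequency weighting the text describes.

```python
# The six aspect labels l_1..l_6 (order is an arbitrary choice for this sketch).
ASPECTS = ["comfort", "power", "handling", "service", "economy", "safety"]


def label_vector(doc_aspects):
    """Binary vector y_i: 1 where the document carries the aspect, else 0."""
    return [1 if aspect in doc_aspects else 0 for aspect in ASPECTS]


def feature_vector(tokens, keywords):
    """Vector d_i: weight of each keyword t_j, here its count in the document."""
    return [tokens.count(keyword) for keyword in keywords]


# Hypothetical keyword set T and one segmented review.
keywords = ["seat", "engine", "fuel"]
doc_tokens = ["seat", "soft", "engine", "strong", "engine"]

d_i = feature_vector(doc_tokens, keywords)
y_i = label_vector({"comfort", "power"})
document = (d_i, y_i)  # one element D_i of the data set D
```

A review that discusses both seating and acceleration thus receives two 1s in its label vector, which is what distinguishes the multi-label setting from ordinary single-class classification.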

a) The $\chi^2$ statistic measures the correlation between a word and a label; the formula is as follows:

where n is the total number of documents and $p(word, l_j)$ is the number of times the word occurs in documents $D_i$ (with $l_{ij} = 1$); similarly, $\overline{word}$ indicates that the word is not in document $D_i$.
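A minimal sketch of the word-label chi-square score follows. It assumes the standard 2x2 contingency form and counts documents rather than individual word occurrences, which is a simplification of the counts described above. The function name and the toy corpus are hypothetical.

```python
def chi_square(docs, word, j):
    """Chi-square score between `word` and label index `j`.

    `docs` is a list of (token_set, label_vector) pairs.
    """
    n = len(docs)
    # Contingency counts: word present/absent crossed with label present/absent.
    a = sum(1 for tokens, y in docs if word in tokens and y[j] == 1)
    b = sum(1 for tokens, y in docs if word in tokens and y[j] == 0)
    c = sum(1 for tokens, y in docs if word not in tokens and y[j] == 1)
    d = n - a - b - c
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # word or label never (or always) occurs: no evidence
    return n * (a * d - b * c) ** 2 / denom


# Toy corpus with two labels (index 0 and 1) for brevity.
docs = [
    ({"seat", "soft"}, [1, 0]),
    ({"seat", "quiet"}, [1, 0]),
    ({"engine"}, [0, 1]),
    ({"engine", "fast"}, [0, 1]),
]
seat_score = chi_square(docs, "seat", 0)
soft_score = chi_square(docs, "soft", 0)
```

In this toy corpus "seat" appears in every label-0 document and nowhere else, so it scores higher than "soft", which appears in only one of them.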

b) An average $\chi^2$ aggregation strategy is used to measure the $\chi^2$ value of each word; the formula is as follows:

The words are sorted by $\chi^2$ value from high to low and a subset of them is selected as feature terms, with term frequency as the weight of each feature term. The text is represented with a vector space model to obtain the feature vector $d_i$ of each review document.

c) An SVM is used to classify the documents.
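Ranking and selecting feature terms can be sketched as below. The weighted-sum form of the average and the top-k cut-off are assumptions, since the text gives neither the averaging formula nor how many words are kept. All names and numbers are hypothetical.

```python
from collections import Counter


def average_chi_square(per_aspect_scores, aspect_probs):
    """Weighted sum of per-aspect chi-square scores (assumed averaging form)."""
    return sum(p * s for p, s in zip(aspect_probs, per_aspect_scores))


def select_features(chi2_by_word, aspect_probs, k):
    """Sort words by averaged score, high to low, and keep the first k."""
    ranked = sorted(
        chi2_by_word,
        key=lambda w: average_chi_square(chi2_by_word[w], aspect_probs),
        reverse=True,
    )
    return ranked[:k]


def tf_vector(tokens, features):
    """Vector space representation: term frequency of each selected feature."""
    counts = Counter(tokens)
    return [counts[f] for f in features]


# Hypothetical per-aspect scores for two aspects.
chi2_by_word = {
    "seat": [4.0, 0.0],
    "engine": [0.0, 3.0],
    "nice": [0.5, 0.5],
    "car": [0.0, 0.0],
}
aspect_probs = [0.5, 0.5]

features = select_features(chi2_by_word, aspect_probs, k=2)
vector = tf_vector(["seat", "engine", "engine", "nice"], features)
```

The resulting vectors, paired with the binary label vectors, are what a classifier such as an SVM would be trained on.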

3) Determining the sentiment value

In the Sina Auto rating system, a rating of 1 or 2 on an item means the consumer is very dissatisfied with it, while a rating of 5 means the consumer is satisfied with it. For a review text, when the rating is less than or equal to 2, the text is considered negative and is assigned to the negative text set; when the rating is 5, the text is considered positive and is added to the positive text set. The sentiment value S(word) of each word in the text is computed as:

S(word)=P(word,pos)-P(word,neg)

where f(word,pos) is the frequency with which the word occurs in the positive text set only; f(word) is the number of times the word occurs in the whole text set; f(pos) is the number of positive documents; and M is the size of the whole text set. The value of P(word,neg) is computed in the same way. The formula for S(word) can be simplified to

The sentiment value $S_{rev}(r_k)$ of a review $r_k$ is then:

where q is the number of sentiment-dictionary words contained in the review; that is, the sentiment value of each review text is the sum of the sentiment values of its words.

The review sentiment value of a given car model is then:

that is, the sum of the sentiment values of all of its review texts. For the classified texts (divided into the six aspects of comfort, power, handling, service, economy, and safety), the sentiment value of each aspect and the overall sentiment value are computed separately.
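The sentiment computation can be sketched as follows, under stated assumptions: P is taken to be pointwise mutual information with a base-2 logarithm, so that S(word) reduces to a log ratio, and add-one smoothing is introduced (it is not in the source) so that words seen on only one side do not produce a division by zero. Function names and the sample reviews are hypothetical.

```python
import math
from collections import Counter


def split_by_rating(reviews):
    """Rating 5 -> positive set; rating <= 2 -> negative set; others unused."""
    pos = [tokens for tokens, score in reviews if score == 5]
    neg = [tokens for tokens, score in reviews if score <= 2]
    return pos, neg


def word_sentiment(pos_docs, neg_docs, smoothing=1.0):
    """S(word) = log2(f(word,pos) * f(neg) / (f(word,neg) * f(pos))), smoothed."""
    if not pos_docs or not neg_docs:
        raise ValueError("need at least one positive and one negative review")
    f_word_pos = Counter(t for doc in pos_docs for t in doc)
    f_word_neg = Counter(t for doc in neg_docs for t in doc)
    f_pos, f_neg = len(pos_docs), len(neg_docs)
    scores = {}
    for word in set(f_word_pos) | set(f_word_neg):
        numerator = (f_word_pos[word] + smoothing) * f_neg
        denominator = (f_word_neg[word] + smoothing) * f_pos
        scores[word] = math.log2(numerator / denominator)
    return scores


def review_sentiment(tokens, scores):
    """Sum of S(word) over the review's tokens found in the sentiment dictionary."""
    return sum(scores[t] for t in tokens if t in scores)


def model_sentiment(reviews_tokens, scores):
    """Sentiment of one car model: sum over all of its reviews."""
    return sum(review_sentiment(tokens, scores) for tokens in reviews_tokens)


reviews = [
    (["quiet", "smooth"], 5), (["smooth", "quick"], 5),
    (["noisy", "slow"], 1), (["noisy"], 2), (["fine"], 3),
]
pos_docs, neg_docs = split_by_rating(reviews)
scores = word_sentiment(pos_docs, neg_docs)
total = model_sentiment([tokens for tokens, _ in reviews], scores)
```

Running the same functions on the reviews of a single aspect gives the per-aspect sentiment value, and running them on all reviews gives the overall value.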

4) Forecasting

A modified AR model is used for forecasting. Let $y_t$ denote the sales volume in month t (t = 1, 2, …, n).

where q is the number of months before month t whose sentiment factors are taken into account, and $w_t$ is the sentiment influence in month t. The sentiment factor under each label is substituted into the model separately, and comparing the results on the training set reveals which aspect of car performance consumers value most. This guides the next stage of car production.

The above embodiments should be understood as merely illustrating the invention and not as limiting its scope of protection. After reading this disclosure, a skilled person may make various changes or modifications to the invention, and such equivalent changes and modifications likewise fall within the scope defined by the claims of the invention.
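Fitting the sentiment-augmented AR model by least squares can be sketched in plain Python. The model form (constant, p sales lags, q sentiment lags) is an assumption consistent with the symbols in the text; the solver, function names, and synthetic data are hypothetical and chosen only to keep the example dependency-free.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system; not enough independent data")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def design_row(sales, sentiment, t, p, q):
    """Regressors for month t: constant, p sales lags, q sentiment lags."""
    return ([1.0]
            + [sales[t - i] for i in range(1, p + 1)]
            + [sentiment[t - j] for j in range(1, q + 1)])


def fit(sales, sentiment, p, q):
    """Ordinary least squares via the normal equations (X^T X) beta = X^T y."""
    start = max(p, q)
    rows = [design_row(sales, sentiment, t, p, q)
            for t in range(start, len(sales))]
    targets = sales[start:]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(k)]
    return solve(xtx, xty)


def forecast_next(sales, sentiment, coef, p, q):
    """One-step-ahead forecast for the month after the last observation."""
    row = design_row(sales, sentiment, len(sales), p, q)
    return sum(c * x for c, x in zip(coef, row))


# Synthetic data generated from y_t = 10 + 0.5*y_{t-1} + 2*w_{t-1} (no noise).
sentiment = [1.0, 3.0, 2.0, 5.0, 4.0, 6.0, 2.0, 7.0]
sales = [100.0]
for t in range(1, len(sentiment)):
    sales.append(10.0 + 0.5 * sales[t - 1] + 2.0 * sentiment[t - 1])

coef = fit(sales, sentiment, p=1, q=1)
next_month = forecast_next(sales, sentiment, coef, p=1, q=1)
```

To compare aspects as the text describes, the same fit could be repeated with each aspect's sentiment series in place of `sentiment`, and the fits compared on held-back months.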

Claims (7)

1. A car sales forecasting method based on review sentiment analysis, characterized by comprising the following steps:

1) Preprocess the car review data, including unifying the format and removing duplicate words;

2) Use the Chinese lexical analysis system of the Chinese Academy of Sciences to segment the preprocessed car review data into words, and remove stop words;

3) Apply multi-label classification to the review data set segmented in step 2;

4) Quantify sentiment using mutual information to obtain the sentiment value of the review text set;

5) Incorporate the sentiment value into a regression model to forecast car sales for the next period.

2. The car sales forecasting method based on review sentiment analysis according to claim 1, characterized in that step 1) divides the car review data into six aspects: comfort, power, handling, service, economy, and safety, and the relationship between a review word and a class label is computed first; the formula is as follows:

where n is the total number of documents; $\overline{word}$ indicates that the word is not in document $D_i$; $\chi^2$ measures the correlation between a word and an aspect $l_j$ of the car; $\overline{l_j}$ indicates that aspect $l_j$ is absent; $p(word, l_j)$ is the number of times the word occurs in documents $D_i$ for which $l_{ij} = 1$; $l_j$ denotes one aspect of the car's performance; j is the index of that aspect (1 ≤ j ≤ 6); and i indexes the i-th document. $p(word)$ is the number of times the word occurs in the documents $D_i$; $p(l_j)$ is the number of times $l_j$ occurs in the text set; and $p(\overline{word})$ is the number of documents $D_i$ in which the word does not occur.

3. The car sales forecasting method based on review sentiment analysis according to claim 1 or 2, characterized in that step 1) uses ICTCLAS3, the Chinese lexical analysis system of the Institute of Computing Technology, Chinese Academy of Sciences; the Sogou Input Method cell lexicons related to the automotive industry are first imported into the lexical analysis system, the UltraEdit editor is used to parse the lexicons out of their non-text format, and the format is unified and duplicate words are removed.

4. The car sales forecasting method based on review sentiment analysis according to claim 3, characterized in that step 2) treats numerals, pronouns, measure words, onomatopoeia, locative words, conjunctions, interjections, suffixes, and particles as stop words.

5. The car sales forecasting method based on review sentiment analysis according to claim 2, characterized in that an average $\chi^2$ aggregation strategy is used to measure the $\chi^2$ value; the formula is as follows:

The words are sorted by $\chi^2$ value from high to low and a subset of them is selected as feature terms, with term frequency as the weight of each feature term; the text is represented with a vector space model to obtain the feature vector $d_i$ of each review document, and an SVM is used to classify the documents.

6. The car sales forecasting method based on review sentiment analysis according to claim 5, characterized in that quantifying the sentiment value in step 4) specifically comprises:

When the rating is less than or equal to 2, the text is considered negative and is assigned to the negative text set; when the rating is 5, the text is considered positive and is added to the positive text set. The sentiment value S(word) of each word in the text is computed as:

S(word)=P(word,pos)-P(word,neg)

where f(word,pos) is the frequency with which the word occurs in the positive text set only; f(word) is the number of times the word occurs in the whole text set; f(pos) is the number of positive documents; and M is the size of the whole text set. The value of P(word,neg) is computed in the same way; P(word,neg) denotes the pointwise mutual relation between the word and the negative documents;

The formula for S(word) can be simplified to

where f(neg) is the number of negative documents. The sentiment value $S_{rev}(r_k)$ of a review $r_k$ is then:

where q is the number of sentiment-dictionary words contained in the review; that is, the sentiment value of each review text is the sum of the sentiment values of its words.

7. The car sales forecasting method based on review sentiment analysis according to claim 6, characterized in that step 5) uses a modified autoregressive (AR) model for forecasting, with $y_t$ denoting the sales volume in month t, t = 1, 2, …, n, where n denotes a future month;

where q is the number of months before month t whose sentiment factors are taken into account; $w_t$ is the sentiment influence in month t; $\alpha_i$ are the model parameters obtained by least squares; P is the number of months before month t under consideration; i indexes one of those P months; $\alpha_0$ is the constant term; and $\varepsilon_t$ is the error term. The sentiment factor under each label is substituted into the model separately, and comparing the results on the training set reveals which aspect of car performance consumers value most.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711229414.1A CN108563647A (en) 2017-11-29 2017-11-29 A car sales forecasting method based on review sentiment analysis

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711229414.1A CN108563647A (en) 2017-11-29 2017-11-29 A car sales forecasting method based on review sentiment analysis

Publications (1)

Publication Number Publication Date
CN108563647A (en) 2018-09-21

Family

ID=63529172

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711229414.1A Pending CN108563647A (en) 2017-11-29 2017-11-29 A car sales forecasting method based on review sentiment analysis

Country Status (1)

Country Link
CN (1) CN108563647A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110442717A (en) * 2019-08-08 2019-11-12 深巨科技(北京)有限公司 An adaptive sentiment analysis system and method
CN111242671A (en) * 2019-12-30 2020-06-05 上海锐嘉科智能科技有限公司 Data acquisition and analysis system and method
CN111242679A (en) * 2020-01-08 2020-06-05 北京工业大学 A sales forecast method based on product review opinion mining
CN113393279A (en) * 2021-07-08 2021-09-14 北京沃东天骏信息技术有限公司 Order quantity estimation method and system
CN114022176A (en) * 2021-09-26 2022-02-08 上海电信工程有限公司 Method for predicting commodity sales on e-commerce platform and electronic equipment

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9633538B1 (en) * 2015-12-09 2017-04-25 International Business Machines Corporation System and method for wearable indication of personal risk within a workplace
CN106227756A (en) * 2016-07-14 2016-12-14 苏州大学 A stock index forecasting method and system based on sentiment classification
CN106951514A (en) * 2017-03-17 2017-07-14 合肥工业大学 A car sales forecasting method considering brand sentiment

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
马那那: "Research on sentiment text classification for product reviews", China Master's Theses Full-text Database, Information Science and Technology *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20180921