CN108563647A - Automobile sales forecasting method based on review sentiment analysis - Google Patents

Automobile sales forecasting method based on review sentiment analysis

Info

Publication number
CN108563647A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201711229414.1A
Other languages
Chinese (zh)
Inventor
周应华
商楠
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing University of Post and Telecommunications
Original Assignee
Chongqing University of Post and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2017-11-29
Filing date
2017-11-29
Publication date
2018-09-21
Application filed by Chongqing University of Post and Telecommunications
Priority to CN201711229414.1A
Publication of CN108563647A


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00: Commerce
    • G06Q30/02: Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201: Market modelling; Market analysis; Collecting market data
    • G06Q30/0202: Market predictions or forecasting for commercial activities

Landscapes

  • Business, Economics & Management (AREA)
  • Strategic Management (AREA)
  • Engineering & Computer Science (AREA)
  • Accounting & Taxation (AREA)
  • Development Economics (AREA)
  • Finance (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Game Theory and Decision Science (AREA)
  • Data Mining & Analysis (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention claims an automobile sales forecasting method based on sentiment analysis. Review data are collected from car review websites and preprocessed. A multi-label classification technique then divides the reviews, according to users' driving experience, into six aspects: safety, comfort, handling, power, economy, and service. The sentiment factor of each aspect is incorporated into the model separately to build a sentiment-aware forecasting model. The model forecasts automobile sales and identifies which aspects of vehicle performance consumers care about most, providing guidance for future production. Operating procedure: the user inputs historical sales data, the data are fed into the model, and a sales forecast for the next quarter is obtained. The method improves forecasting accuracy.

Description

Automobile sales forecasting method based on review sentiment analysis
Technical field
The invention belongs to the field of automobile sales analysis and forecasting, and in particular relates to automobile sales forecasting based on sentiment analysis of reviews.
Background art
Automobile sales forecasting estimates the sales volume of a future period from historical sales data and other data. Existing techniques rely mainly on historical sales data and use autoregressive models or grey models. Their limitation is that they consider only historical sales and ignore the influence of user review data, although research indicates that online review data help improve the accuracy of sales forecasting models.
Forecasting based on car review data is a popular research direction, but it faces difficulties, particularly in natural language processing: reviews are written in many styles, the wording is highly informal, and internet slang is common.
Summary of the invention
The invention seeks to address the above problems of the prior art by proposing an automobile sales forecasting method based on review sentiment analysis that improves forecasting accuracy. The technical scheme is as follows:
An automobile sales forecasting method based on review sentiment analysis comprises the following steps:
1) preprocessing the car review data, including unifying the format and removing duplicate vocabulary;
2) segmenting the preprocessed car review data into words with the Chinese lexical analysis system of the Chinese Academy of Sciences, and removing stop words;
3) performing multi-label classification on the review data set segmented in step 2) using a multi-label classification technique;
4) quantifying sentiment values using mutual information to obtain the sentiment values of the review text set;
5) fusing the sentiment values into a regression model to forecast automobile sales in the next period.
Further, in step 1) the car review data are divided into six aspects: comfort, power, handling, service, economy, and safety. First, the relationship between a review word and a class label is determined with the following formula:
where $n$ is the total number of documents; $\overline{word}$ denotes that word $word$ does not appear in document $D_i$; $\chi^2$ measures the correlation between a word $word$ and an aspect $l_j$ of the vehicle; $\overline{l_j}$ denotes not belonging to aspect $l_j$ (i.e., $l_{ij} = 0$); $p(word, l_j)$ is the number of documents $D_i$ in which $word$ occurs and $l_{ij} = 1$; $l_j$ denotes one aspect of vehicle performance, and $L = \{l_1, l_2, \ldots, l_j, \ldots, l_6\}$ is the label set made up of the 6 labels, namely the set of performance aspects covered by the document collection $D$: comfort, power, handling, service, economy, and safety; $j$ indexes the aspect ($1 \le j \le 6$) and $i$ indexes the $i$-th document; $p(word)$ is the number of documents $D_i$ in which $word$ occurs; $p(l_j)$ is the number of documents in the text set labeled $l_j$; and $p(\overline{word})$ is the number of documents in which $word$ does not occur.
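A sketch of the formula, assuming the standard 2×2 contingency-table form of the $\chi^2$ statistic on document counts and written with the symbols defined below (this is a reconstruction under that assumption, not a verbatim copy of the patent's formula):

```latex
\chi^2(word, l_j) = \frac{n \left[ p(word, l_j)\, p(\overline{word}, \overline{l_j}) - p(word, \overline{l_j})\, p(\overline{word}, l_j) \right]^2}{p(word)\, p(\overline{word})\, p(l_j)\, p(\overline{l_j})}
```

Here every $p(\cdot)$ is a document count, so $p(word) = p(word, l_j) + p(word, \overline{l_j})$ and $p(\overline{l_j}) = n - p(l_j)$.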
Further, step 1) uses ICTCLAS3, the Chinese lexical analysis system of the Institute of Computing Technology, Chinese Academy of Sciences. First, the automobile-industry cell dictionaries of the Sogou input method are imported into the lexical analysis system: the dictionaries, which are not in plain-text format, are parsed with the UltraEdit editor, their format is unified, and duplicate vocabulary is removed.
Further, step 2) treats numerals, pronouns, quantifiers, onomatopoeia, locative nouns, conjunctions, interjections, suffix components, and auxiliary words as stop words.
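The dictionary clean-up and stop-word removal above can be sketched as follows. This is a minimal sketch, assuming the segmenter has already returned `(word, tag)` pairs; the letters in `STOP_TAGS` are our assumed mapping of the listed word classes onto ICTCLAS-style part-of-speech tags, NFKC normalization is our stand-in for "unifying the format", and the function names are hypothetical.

```python
import unicodedata

# Assumed ICTCLAS/PKU-style POS tag prefixes for the stop-word classes:
# m numeral, r pronoun, q quantifier, o onomatopoeia, f locative noun,
# c conjunction, e interjection, k suffix component, u auxiliary word.
STOP_TAGS = {"m", "r", "q", "o", "f", "c", "e", "k", "u"}


def clean_dictionary(entries):
    """Unify the format of dictionary entries and drop duplicates (order kept).

    NFKC folds full-width characters to their half-width forms; this is an
    assumed stand-in for the "unified format" step, not the patent's procedure.
    """
    seen = set()
    cleaned = []
    for entry in entries:
        word = unicodedata.normalize("NFKC", entry).strip()
        if word and word not in seen:
            seen.add(word)
            cleaned.append(word)
    return cleaned


def remove_stop_words(tagged_tokens):
    """Keep words whose POS tag does not start with a stop-word class letter.

    Fine-grained tags such as "ude1" or "rz" are matched by their first letter.
    """
    return [word for word, tag in tagged_tokens if tag[:1] not in STOP_TAGS]
```

For example, `clean_dictionary(["油耗", "ＳＵＶ", "油耗 ", "SUV"])` keeps only `["油耗", "SUV"]`, because the full-width "ＳＵＶ" and the padded "油耗 " normalize to entries already seen.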
Further, the average $\chi^2$ aggregation strategy is used to measure the $\chi^2$ value of each word, with the following formula:
The words are sorted by $\chi^2$ value in descending order and the top-ranked words are selected as feature items, with term frequency as the feature weight. Each text is represented with the vector space model, the feature vector $d_i$ of every review document is obtained, and the documents are classified with an SVM.
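A sketch of the aggregation, assuming the common category-weighted average used in $\chi^2$ feature selection, with each label weighted by the share of documents carrying it (the weights are an assumption; a plain mean $\frac{1}{6}\sum_{j}$ is the other common choice):

```latex
\chi^2_{avg}(word) = \sum_{j=1}^{6} \frac{p(l_j)}{n}\, \chi^2(word, l_j)
```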
Further, quantifying the sentiment values in step 4) specifically comprises:
When the rating score is less than or equal to 2, the text is regarded as negative and assigned to the negative text set; when the rating score is 5, the text is regarded as positive and assigned to the positive text set. The sentiment value $S(word)$ of each word $word$ in the text is calculated as:
$S(word) = P(word, pos) - P(word, neg)$
where $f(word, pos)$ is the number of times $word$ occurs in the positive text set, $f(word)$ is the number of times $word$ occurs in the whole text set, $f(pos)$ is the number of positive documents, and $M$ is the number of documents in the whole text set. $P(word, neg)$, the mutual information between word $word$ and the negative documents, is computed analogously.
The formula for $S(word)$ can be simplified as follows, where $f(neg)$ is the number of negative documents:
Then the sentiment value $S_{rev}(r_k)$ of the $i$-th review is:
where $q$ is the number of sentiment-dictionary words contained in the $i$-th review document; that is, the sentiment value of each review is the sum of the sentiment values of its words.
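A sketch of the missing formulas, assuming $P(word, pos)$ and $P(word, neg)$ are the usual pointwise mutual information estimated from the counts defined above (log base 2 is our choice; the base does not change the sign of $S$). The simplified form follows by subtraction and is consistent with $f(neg)$ appearing in it; the summation index $s$ is ours.

```latex
P(word, pos) = \log_2 \frac{f(word, pos)\, M}{f(word)\, f(pos)}, \qquad
P(word, neg) = \log_2 \frac{f(word, neg)\, M}{f(word)\, f(neg)}

S(word) = P(word, pos) - P(word, neg) = \log_2 \frac{f(word, pos)\, f(neg)}{f(word, neg)\, f(pos)}

S_{rev}(r_k) = \sum_{s=1}^{q} S(word_s)
```

In practice $f(word, neg) = 0$ makes the ratio undefined, so some smoothing of the counts is usually needed.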
Further, step 5) uses a modified autoregressive (AR) regression model for prediction, where $y_t$ is the sales volume in month $t$, $t = 1, 2, \ldots, n$, and $n$ denotes a future month.
$q$ denotes the influence of the sentiment factors of the $q$ months before month $t$; $w_t$ is the sentiment influence in month $t$; $\alpha_i$ are model parameters estimated by least squares; $P$ means that the $P$ months before month $t$ are examined, with $i$ indexing a month among those $P$ months; $\alpha_0$ is the constant term; and $\varepsilon_t$ is the error term. The sentiment factors under each label are substituted into the model separately, and comparing the results on the training set reveals which aspect of vehicle performance consumers value most.
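One plausible reading of the modified AR model, written with the symbols defined below and assuming the lagged sentiment terms enter linearly with their own coefficients $\beta_i$ (a symbol the text does not define):

```latex
y_t = \alpha_0 + \sum_{i=1}^{P} \alpha_i\, y_{t-i} + \sum_{i=1}^{q} \beta_i\, w_{t-i} + \varepsilon_t
```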
Advantages and beneficial effects of the invention:
1. Unlike traditional forecasting, the method uses review data and takes into account how much users like the product, so that these data are not wasted.
2. Forecasts can be made separately from the review data for each aspect of vehicle performance, revealing which aspect consumers value most.
3. The forecasts are more accurate.
Brief description of the drawings
Fig. 1 is an operational flowchart of a preferred embodiment of the invention;
Fig. 2 shows the multi-label classification results of the invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the invention are described clearly and in detail below with reference to the drawings. The described embodiments are only some of the embodiments of the invention.
The technical solution by which the invention solves the above technical problem is as follows:
The online reviews are preprocessed with ICTCLAS3, the Chinese lexical analysis system of the Institute of Computing Technology, Chinese Academy of Sciences. First, the automobile-industry cell dictionaries of the Sogou input method are imported into the lexical analysis system: the dictionaries, which are not in plain-text format, are parsed with the UltraEdit editor, their format is unified, and duplicate vocabulary is removed. Stop words are then removed from the segmentation results: numerals, pronouns, quantifiers, onomatopoeia, locative nouns, conjunctions, interjections, suffix components, and auxiliary words are treated as stop words.
1) Multi-label classification
The multi-label training data set built from car review texts is denoted $(D, T, L)$. $D = \{D_1, D_2, \ldots, D_n\} = \{(d_1, y_1), (d_2, y_2), \ldots, (d_n, y_n)\}$ is the multi-label data set formed by $n$ car review documents, where each document $D_i$ consists of a feature vector $d_i$ and a label vector $y_i$ ($1 \le i \le n$). $T = (t_1, t_2, \ldots, t_p)$ is the feature set formed by the $p$ keywords selected from the $n$ review documents. $L = \{l_1, l_2, \ldots, l_6\}$ is the label set of 6 labels (comfort, power, handling, service, economy, and safety). The feature vector is $d_i = \{w_{1i}, w_{2i}, \ldots, w_{ji}, \ldots, w_{pi}\}$, where $w_{ji}$ is the weight of keyword $t_j$ in document $D_i$. Each document corresponds to one or more performance labels in $L$, represented by a binary vector $y_i$ of 0s and 1s: if $D_i$ contains class $l_j$, then $y_{ji} = 1$; otherwise $y_{ji} = 0$.
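A minimal sketch of the $(D, T, L)$ representation described above: the label vector $y_i$ as a 0/1 list over the six aspects and the feature vector $d_i$ with term-frequency weights over the keyword set $T$. The English label names and the function names are illustrative assumptions.

```python
from collections import Counter

# The six aspect labels l_1..l_6 (English names are illustrative).
LABELS = ["comfort", "power", "handling", "service", "economy", "safety"]


def label_vector(aspects):
    """Binary label vector y_i: 1 if the review covers aspect l_j, else 0."""
    return [1 if label in aspects else 0 for label in LABELS]


def feature_vector(tokens, keywords):
    """Term-frequency feature vector d_i over the keyword set T = (t_1, ..., t_p)."""
    counts = Counter(tokens)
    return [counts[t] for t in keywords]


def build_dataset(reviews, keywords):
    """Turn (tokens, aspects) pairs into the list of (d_i, y_i) pairs, i.e. D."""
    return [(feature_vector(tokens, keywords), label_vector(aspects))
            for tokens, aspects in reviews]
```

With keywords `["油耗", "空间", "加速"]`, a review `["油耗", "低", "油耗"]` labeled `{"economy"}` becomes `([2, 0, 0], [0, 0, 0, 0, 1, 0])`.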
A) The correlation between a word and a label is measured with the $\chi^2$ statistic, using the following formula:
where $n$ is the total number of documents, $p(word, l_j)$ is the number of documents $D_i$ in which $word$ occurs (with $l_{ij} = 1$), and, similarly, $\overline{word}$ denotes that $word$ does not occur in document $D_i$.
B) The average $\chi^2$ aggregation strategy is used to measure the $\chi^2$ value, with the following formula:
The words are sorted by $\chi^2$ value in descending order and the top-ranked words are selected as feature items, with term frequency as the feature weight. Each text is represented with the vector space model, and the feature vector $d_i$ of every review document is obtained.
C) The documents are classified with an SVM.
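Steps A) and B) can be sketched as follows, assuming the 2×2 contingency form of $\chi^2$ on document counts and a category-weighted average with weights $p(l_j)/n$; both choices are assumptions, since the formulas are not reproduced in this text. The function names and the cut-off `k` are hypothetical.

```python
def chi_square(docs, word, j):
    """chi^2 between a word and label l_j from document counts.

    docs: list of (set_of_words, label_vector) pairs.
    """
    n = len(docs)
    a = sum(1 for words, y in docs if word in words and y[j] == 1)      # p(word, l_j)
    b = sum(1 for words, y in docs if word in words and y[j] == 0)      # p(word, not l_j)
    c = sum(1 for words, y in docs if word not in words and y[j] == 1)  # p(not word, l_j)
    d = n - a - b - c                                                   # p(not word, not l_j)
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0


def average_chi_square(docs, word):
    """Average chi^2 over all labels, weighted by each label's document share."""
    n = len(docs)
    num_labels = len(docs[0][1])
    total = 0.0
    for j in range(num_labels):
        share = sum(y[j] for _, y in docs) / n  # assumed weight p(l_j) / n
        total += share * chi_square(docs, word, j)
    return total


def select_features(docs, k):
    """Rank all words by average chi^2 (descending, ties alphabetical) and keep k."""
    vocab = sorted({w for words, _ in docs for w in words})
    ranked = sorted(vocab, key=lambda w: average_chi_square(docs, w), reverse=True)
    return ranked[:k]
```

A word that appears in every document of one label and in no other document gets the maximum score, while a word spread evenly across labels scores 0 and is dropped first.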
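The text does not say how the SVM handles several labels per review. A common choice, assumed here, is binary relevance: one linear SVM per aspect label, each deciding whether a review covers that aspect. To stay dependency-free, the sketch trains each SVM with the Pegasos stochastic sub-gradient method; a real system would more likely use LIBSVM or scikit-learn. All names and hyperparameters are hypothetical.

```python
import random


def train_linear_svm(xs, ys, lam=0.01, epochs=200, seed=0):
    """Pegasos sub-gradient training of a linear SVM (hinge loss + L2 penalty).

    xs: feature vectors; ys: labels in {-1, +1}. A constant 1.0 feature is
    appended so the last weight acts as a (regularized) bias term.
    """
    rng = random.Random(seed)
    data = [(list(x) + [1.0], y) for x, y in zip(xs, ys)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            t += 1
            eta = 1.0 / (lam * t)
            margin = y * sum(wi * xi for wi, xi in zip(w, x))
            w = [(1 - eta * lam) * wi for wi in w]  # shrink: regularization step
            if margin < 1:  # hinge loss active: move toward this example
                w = [wi + eta * y * xi for wi, xi in zip(w, x)]
    return w


def predict(w, x):
    """Return +1 or -1 for feature vector x under weights w."""
    score = sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1


def train_binary_relevance(dataset):
    """One SVM per label; dataset is a list of (d_i, y_i) with 0/1 label vectors."""
    num_labels = len(dataset[0][1])
    xs = [d for d, _ in dataset]
    return [train_linear_svm(xs, [1 if y[j] else -1 for _, y in dataset])
            for j in range(num_labels)]


def predict_labels(models, x):
    """Predicted 0/1 label vector for a new feature vector x."""
    return [1 if predict(w, x) == 1 else 0 for w in models]
```

Because each label gets its own classifier, a review can receive several aspect labels at once, which matches the multi-label setting described above.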
3) Determination of sentiment values
According to the rating system of Sina Auto, a score of 1 or 2 on an item means the consumer is very dissatisfied with it, and a score of 5 means the consumer is satisfied. For a review text, when the rating score is less than or equal to 2, the text is regarded as negative and assigned to the negative text set; when the rating score is 5, it is regarded as positive and assigned to the positive text set. The sentiment value $S(word)$ of each word $word$ in the text is calculated as:
$S(word) = P(word, pos) - P(word, neg)$
where $f(word, pos)$ is the number of times $word$ occurs in the positive text set, $f(word)$ is the number of times $word$ occurs in the whole text set, $f(pos)$ is the number of positive documents, and $M$ is the number of documents in the whole text set. $P(word, neg)$ is computed analogously. The formula for $S(word)$ can be simplified to
Then the sentiment value $S_{rev}(r_k)$ of the $i$-th review is:
where $q$ is the number of sentiment-dictionary words contained in the $i$-th review document; that is, the sentiment value of each review is the sum of the sentiment values of its words.
The review sentiment value of a given car model is then:
that is, the sum of the sentiment values of all its reviews. Using the classification results (six aspects: comfort, power, handling, service, economy, and safety), the sentiment value of each aspect and the overall sentiment value are calculated separately.
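A sketch of the sentiment scoring in this subsection, assuming $P(word, pos)$ is pointwise mutual information, so that $S(word)$ reduces to $\log_2 \frac{f(word,pos)\,f(neg)}{f(word,neg)\,f(pos)}$, with counts taken as document frequencies. The add-one smoothing (which avoids $\log 0$ for words seen in only one polarity) and all function names are our own assumptions.

```python
import math
from collections import Counter


def split_by_rating(reviews):
    """Ratings <= 2 go to the negative set, rating 5 to the positive set; 3-4 are unused."""
    pos = [tokens for tokens, score in reviews if score == 5]
    neg = [tokens for tokens, score in reviews if score <= 2]
    return pos, neg


def word_sentiment(pos_docs, neg_docs, smoothing=1.0):
    """S(word) = log2(f(word,pos) * f(neg) / (f(word,neg) * f(pos))).

    Uses add-one smoothing on the word counts (an assumption) and requires
    both document sets to be non-empty.
    """
    f_pos, f_neg = len(pos_docs), len(neg_docs)
    in_pos = Counter(w for doc in pos_docs for w in set(doc))
    in_neg = Counter(w for doc in neg_docs for w in set(doc))
    scores = {}
    for w in set(in_pos) | set(in_neg):
        ratio = ((in_pos[w] + smoothing) * f_neg) / ((in_neg[w] + smoothing) * f_pos)
        scores[w] = math.log2(ratio)
    return scores


def review_sentiment(tokens, lexicon):
    """S_rev(r_k): sum of S(word) over the review's words found in the lexicon."""
    return sum(lexicon[w] for w in tokens if w in lexicon)


def model_sentiment(labeled_reviews, lexicon):
    """Overall sentiment of one car model (sum over all its reviews) plus
    per-aspect sums over the reviews the classifier assigned to each aspect."""
    overall = 0.0
    per_aspect = {}
    for tokens, aspects in labeled_reviews:
        s = review_sentiment(tokens, lexicon)
        overall += s
        for aspect in aspects:
            per_aspect[aspect] = per_aspect.get(aspect, 0.0) + s
    return overall, per_aspect
```

Counting each review once in `overall` avoids double-counting reviews that the classifier assigns to several aspects.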
4) Prediction
Prediction uses the modified AR model, with $y_t$ denoting the sales volume in month $t$ ($t = 1, 2, \ldots, n$).
$q$ denotes the influence of the sentiment factors of the $q$ months before month $t$, and $w_t$ is the sentiment influence in month $t$. The sentiment factors under each label are substituted into the model separately; comparing the results on the training set reveals which aspect of vehicle performance consumers value most, which guides production of the vehicle in the next period.
The above embodiments should be understood as merely illustrating the invention, not as limiting its scope. After reading the description of the invention, a skilled person may make various changes or modifications to it, and such equivalent changes and modifications also fall within the scope of the claims of the invention.
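A sketch of fitting the modified AR model by least squares, assuming the linear form $y_t = \alpha_0 + \sum_{i=1}^{P}\alpha_i y_{t-i} + \sum_{i=1}^{q}\beta_i w_{t-i} + \varepsilon_t$, where the sentiment coefficients $\beta_i$ are our notation rather than the patent's. It solves the normal equations with plain Gaussian elimination so that no third-party package is needed; `numpy.linalg.lstsq` would be the usual choice in practice. All names are hypothetical.

```python
def design_rows(sales, sentiment, p, q):
    """Build regression rows [1, y_{t-1..t-p}, w_{t-1..t-q}] and targets y_t."""
    start = max(p, q)
    rows, targets = [], []
    for t in range(start, len(sales)):
        lags_y = [sales[t - i] for i in range(1, p + 1)]
        lags_w = [sentiment[t - i] for i in range(1, q + 1)]
        rows.append([1.0] + lags_y + lags_w)
        targets.append(sales[t])
    return rows, targets


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_ar_with_sentiment(sales, sentiment, p, q):
    """Least-squares estimate of [alpha_0, alpha_1..alpha_p, beta_1..beta_q]."""
    rows, targets = design_rows(sales, sentiment, p, q)
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(k)]
    return solve(xtx, xty)


def forecast_next(sales, sentiment, coef, p, q):
    """One-step-ahead forecast of the next month's sales."""
    row = ([1.0] + [sales[-i] for i in range(1, p + 1)]
           + [sentiment[-i] for i in range(1, q + 1)])
    return sum(c * x for c, x in zip(coef, row))
```

Fitting the model once per aspect, with that aspect's monthly sentiment series as `sentiment`, and comparing the fits on the training set is one way to carry out the per-label comparison described above.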

Claims (7)

1. An automobile sales forecasting method based on review sentiment analysis, characterized by comprising the following steps:
1) preprocessing the car review data, including unifying the format and removing duplicate vocabulary;
2) segmenting the preprocessed car review data into words with the Chinese lexical analysis system of the Chinese Academy of Sciences, and removing stop words;
3) performing multi-label classification on the review data set segmented in step 2) using a multi-label classification technique;
4) quantifying sentiment values using mutual information to obtain the sentiment values of the review text set;
5) fusing the sentiment values into a regression model to forecast automobile sales in the next period.
2. The automobile sales forecasting method based on review sentiment analysis according to claim 1, characterized in that step 1) divides the car review data into six aspects, namely comfort, power, handling, service, economy, and safety, and first determines the relationship between a review word and a class label with the following formula:
where $n$ is the total number of documents; $\overline{word}$ denotes that word $word$ does not appear in document $D_i$; $\chi^2$ measures the correlation between a word $word$ and an aspect $l_j$ of the vehicle; $\overline{l_j}$ denotes not belonging to aspect $l_j$; $p(word, l_j)$ is the number of documents $D_i$ in which $word$ occurs and $l_{ij} = 1$; $l_j$ denotes one aspect of vehicle performance, $j$ indexes the aspect ($1 \le j \le 6$), and $i$ indexes the $i$-th document; $p(word)$ is the number of documents $D_i$ in which $word$ occurs; $p(l_j)$ is the number of documents in the text set labeled $l_j$; and $p(\overline{word})$ is the number of documents in which $word$ does not occur.
3. The automobile sales forecasting method based on review sentiment analysis according to claim 1 or 2, characterized in that step 1) uses ICTCLAS3, the Chinese lexical analysis system of the Institute of Computing Technology, Chinese Academy of Sciences: the automobile-industry cell dictionaries of the Sogou input method are first imported into the Chinese lexical analysis system, the dictionaries, which are not in plain-text format, are parsed with the UltraEdit editor, their format is unified, and duplicate vocabulary is removed.
4. The automobile sales forecasting method based on review sentiment analysis according to claim 3, characterized in that step 2) treats numerals, pronouns, quantifiers, onomatopoeia, locative nouns, conjunctions, interjections, suffix components, and auxiliary words as stop words.
5. The automobile sales forecasting method based on review sentiment analysis according to claim 2, characterized in that the average $\chi^2$ aggregation strategy is used to measure the $\chi^2$ value, with the following formula:
the words are sorted by $\chi^2$ value in descending order and the top-ranked words are selected as feature items, with term frequency as the feature weight; each text is represented with the vector space model, the feature vector $d_i$ of every review document is obtained, and the documents are classified with an SVM.
6. The automobile sales forecasting method based on review sentiment analysis according to claim 5, characterized in that quantifying the sentiment values in step 4) specifically comprises:
when the rating score is less than or equal to 2, the text is regarded as negative and assigned to the negative text set; when the rating score is 5, the text is regarded as positive and assigned to the positive text set; the sentiment value $S(word)$ of each word $word$ in the text is calculated as:
$S(word) = P(word, pos) - P(word, neg)$
where $f(word, pos)$ is the number of times $word$ occurs in the positive text set, $f(word)$ is the number of times $word$ occurs in the whole text set, $f(pos)$ is the number of positive documents, and $M$ is the number of documents in the whole text set; $P(word, neg)$, the pointwise mutual information between word $word$ and the negative documents, is computed analogously;
the formula for $S(word)$ can be simplified as follows, where $f(neg)$ is the number of negative documents:
then the sentiment value $S_{rev}(r_k)$ of the $i$-th review is:
where $q$ is the number of sentiment-dictionary words contained in the $i$-th review document; that is, the sentiment value of each review is the sum of the sentiment values of its words.
7. The automobile sales forecasting method based on review sentiment analysis according to claim 6, characterized in that step 5) uses a modified autoregressive (AR) regression model for prediction, where $y_t$ is the sales volume in month $t$, $t = 1, 2, \ldots, n$, and $n$ denotes a future month;
$q$ denotes the influence of the sentiment factors of the $q$ months before month $t$; $w_t$ is the sentiment influence in month $t$; $\alpha_i$ are model parameters estimated by least squares; $P$ means that the $P$ months before month $t$ are examined, with $i$ indexing a month among those $P$ months; $\alpha_0$ is the constant term; and $\varepsilon_t$ is the error term; the sentiment factors under each label are substituted into the model separately, and comparing the results on the training set reveals which aspect of vehicle performance consumers value most.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711229414.1A CN108563647A (en) 2017-11-29 2017-11-29 Automobile sales forecasting method based on review sentiment analysis

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711229414.1A CN108563647A (en) 2017-11-29 2017-11-29 Automobile sales forecasting method based on review sentiment analysis

Publications (1)

Publication Number Publication Date
CN108563647A (en) 2018-09-21

Family

ID=63529172

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711229414.1A Pending CN108563647A (en) 2017-11-29 2017-11-29 Automobile sales forecasting method based on review sentiment analysis

Country Status (1)

Country Link
CN (1) CN108563647A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110442717A (en) * 2019-08-08 2019-11-12 深巨科技(北京)有限公司 A kind of adaptability sentiment analysis system and method
CN111242671A (en) * 2019-12-30 2020-06-05 上海锐嘉科智能科技有限公司 Data acquisition and analysis system and method
CN111242679A (en) * 2020-01-08 2020-06-05 北京工业大学 Sales forecasting method based on product review viewpoint mining
CN113393279A (en) * 2021-07-08 2021-09-14 北京沃东天骏信息技术有限公司 Order quantity estimation method and system
CN114022176A (en) * 2021-09-26 2022-02-08 上海电信工程有限公司 Method for predicting commodity sales on e-commerce platform and electronic equipment

Citations (3)

Publication number Priority date Publication date Assignee Title
CN106227756A (en) * 2016-07-14 2016-12-14 苏州大学 A kind of stock index forecasting method based on emotional semantic classification and system
US9633538B1 (en) * 2015-12-09 2017-04-25 International Business Machines Corporation System and method for wearable indication of personal risk within a workplace
CN106951514A (en) * 2017-03-17 2017-07-14 合肥工业大学 A kind of automobile Method for Sales Forecast method for considering brand emotion

Patent Citations (3)

Publication number Priority date Publication date Assignee Title
US9633538B1 (en) * 2015-12-09 2017-04-25 International Business Machines Corporation System and method for wearable indication of personal risk within a workplace
CN106227756A (en) * 2016-07-14 2016-12-14 苏州大学 A kind of stock index forecasting method based on emotional semantic classification and system
CN106951514A (en) * 2017-03-17 2017-07-14 合肥工业大学 A kind of automobile Method for Sales Forecast method for considering brand emotion

Non-Patent Citations (1)

Title
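马那那 (Ma Nana): "Research on sentiment text classification for product reviews", China Master's Theses Full-text Database, Information Science and Technology *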

Similar Documents

Publication Publication Date Title
Alaparthi et al. Bidirectional Encoder Representations from Transformers (BERT): A sentiment analysis odyssey
Taj et al. Sentiment analysis of news articles: a lexicon based approach
CN107491531B (en) Chinese network comment sensibility classification method based on integrated study frame
CN108563647A (en) Automobile sales forecasting method based on review sentiment analysis
Yadav et al. Sentiment analysis of financial news using unsupervised approach
CN107256494B (en) Article recommendation method and device
CN103646088B (en) Product comment fine-grained emotional element extraction method based on CRFs and SVM
CN107704892A (en) A kind of commodity code sorting technique and system based on Bayesian model
US20150032518A1 (en) Systems and methods for managing publication of online advertisements
CN109492105B (en) Text emotion classification method based on multi-feature ensemble learning
CN103034626A (en) Emotion analyzing system and method
CN110888983B (en) Positive and negative emotion analysis method, terminal equipment and storage medium
CN111538828A (en) Text emotion analysis method and device, computer device and readable storage medium
CN112862569A (en) Product appearance style evaluation method and system based on image and text multi-modal data
CN104462408A (en) Topic modeling based multi-granularity sentiment analysis method
KR20110044112A (en) Semi-automatic building of pattern database for mining review of product attributes
CN101556580A (en) Stock comment classification system based on analysis of discourse structure and method
Yang et al. Microblog sentiment analysis algorithm research and implementation based on classification
Lei et al. Automatically classify chinese judgment documents utilizing machine learning algorithms
Dann et al. Reconstructing the giant: Automating the categorization of scientific articles with deep learning techniques
Háva et al. Supervised two-step feature extraction for structured representation of text data
Sun Research on product attribute extraction and classification method for online review
CN107967260B (en) Data processing method, device, system and computer readable medium
Wei et al. The instructional design of Chinese text classification based on SVM
CN102622405B (en) Method for computing text distance between short texts based on language content unit number evaluation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20180921