CN108563647A - Automobile sales forecasting method based on review sentiment analysis - Google Patents
Automobile sales forecasting method based on review sentiment analysis
- Publication number
- CN108563647A
- Authority
- CN
- China
- Prior art keywords
- word
- comment
- document
- automobile
- indicate
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/02—Marketing; Price estimation or determination; Fundraising
- G06Q30/0201—Market modelling; Market analysis; Collecting market data
- G06Q30/0202—Market predictions or forecasting for commercial activities
Landscapes
- Business, Economics & Management (AREA)
- Strategic Management (AREA)
- Engineering & Computer Science (AREA)
- Accounting & Taxation (AREA)
- Development Economics (AREA)
- Finance (AREA)
- Entrepreneurship & Innovation (AREA)
- Game Theory and Decision Science (AREA)
- Data Mining & Analysis (AREA)
- Economics (AREA)
- Marketing (AREA)
- Physics & Mathematics (AREA)
- General Business, Economics & Management (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses an automobile sales forecasting method based on sentiment analysis. Review data are collected from automobile review websites and preprocessed. Using a multi-label classification technique, the reviews are divided according to the users' experience into six aspects: safety, comfort, handling, power, economy and service. The sentiment factor of each aspect is incorporated separately into the model to build a sentiment-based forecasting model. The model forecasts automobile sales and identifies which aspect of automobile performance consumers value most, which can guide later production. In operation, the user inputs historical sales data into the model and obtains the sales forecast for the next quarter. The method improves forecasting accuracy.
Description
Technical field
The invention belongs to the field of automobile sales analysis and forecasting, and in particular relates to an automobile sales forecasting method based on review sentiment analysis.
Background art
Automobile sales forecasting estimates the sales volume of a coming period from historical sales data and other data. Existing automobile sales forecasting techniques rely mainly on historical sales data and use autoregressive models or grey forecasting models. The limitation of these methods is that they are confined to historical sales data and ignore the influence of user review data. Studies indicate that online review data help to improve the accuracy of sales forecasting models.
Forecasting based on automobile review data is an active research direction, but difficulties remain, for example in natural language processing: current reviews are written in many language styles, are highly informal, and contain a great deal of Internet slang.
Summary of the invention
The present invention aims to solve the above problems of the prior art and proposes an automobile sales forecasting method based on review sentiment analysis that improves forecasting accuracy. The technical solution of the invention is as follows:
An automobile sales forecasting method based on review sentiment analysis comprises the following steps:
1) preprocessing the automobile review data, including unifying the format and removing duplicate words;
2) segmenting the preprocessed automobile review data into words with the Chinese Academy of Sciences Chinese word segmentation system, and removing stop words;
3) performing multi-label classification on the review data set segmented in step 2) using a multi-label classification technique;
4) quantifying sentiment values using the mutual information technique, and obtaining the sentiment value of the review text set;
5) incorporating the sentiment values into a regression model to forecast automobile sales for the next period.
Further, step 1) divides the automobile review data into six aspects: comfort, power, handling, service, economy and safety. The relationship between a review word and a class label is found first, using the χ² statistic:
$$\chi^2(word, l_j) = \frac{n\,\left[\,p(word, l_j)\,p(\neg word, \neg l_j) - p(word, \neg l_j)\,p(\neg word, l_j)\,\right]^2}{p(word)\,p(\neg word)\,p(l_j)\,p(\neg l_j)}$$
where n denotes the total number of documents; ¬word denotes that the word does not occur in document D_i; χ² denotes the correlation between a word and an aspect l_j of the automobile; ¬l_j denotes that the document does not carry aspect l_j, i.e. l_ij = 0; and p(word, l_j) denotes the number of documents D_i in which the word occurs and l_ij = 1. l_j denotes one performance aspect of the automobile, and L = {l_1, l_2, ..., l_j, ..., l_6} denotes the label set consisting of the 6 labels, i.e. the set of performance aspects covered by the document collection D: comfort, power, handling, service, economy and safety. j indexes the performance aspect (1 ≤ j ≤ 6) and i indexes the i-th document. p(word) denotes the number of documents in which the word occurs, p(l_j) denotes the number of documents in the text set labelled l_j, and p(¬word) denotes the number of documents in which the word does not occur.
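The word-to-aspect association described above can be sketched in a few lines. This is a minimal illustration that assumes the standard χ² statistic computed from a 2×2 contingency table of document counts; the function name `chi_square` and the representation of documents as sets of words are hypothetical choices for the example, not part of the patent text.

```python
def chi_square(docs, labels, word, aspect):
    """Chi-square association between `word` and `aspect`.

    docs:   list of sets of words, one set per review document
    labels: list of sets of aspect names, one set per review document
    """
    n = len(docs)
    pairs = list(zip(docs, labels))
    # 2x2 contingency counts over documents
    a = sum(1 for d, l in pairs if word in d and aspect in l)       # word, aspect
    b = sum(1 for d, l in pairs if word in d and aspect not in l)   # word, no aspect
    c = sum(1 for d, l in pairs if word not in d and aspect in l)   # no word, aspect
    d_ = n - a - b - c                                              # neither
    denom = (a + b) * (c + d_) * (a + c) * (b + d_)
    if denom == 0:
        # The word or the aspect occurs in all or none of the documents,
        # so it carries no discriminating information.
        return 0.0
    return n * (a * d_ - b * c) ** 2 / denom
```

A word that occurs in every document of one aspect and in no other document receives the maximum score n, while a word absent from the collection scores 0.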
Further, step 1) uses the Chinese lexical analysis system ICTCLAS3 of the Institute of Computing Technology, Chinese Academy of Sciences. The cell dictionaries related to the automobile industry in the Sogou input method are first imported into the lexical analysis system; the dictionaries, which are not in plain-text format, are parsed with the UltraEdit editor, their format is unified, and duplicate words are removed.
Further, step 2) treats numerals, pronouns, measure words, onomatopoeia, nouns of locality, conjunctions, interjections, suffix components and auxiliary words as stop words.
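Removing stop words by part of speech can be sketched as a filter over (word, tag) pairs produced by the segmenter. The single-letter tag codes below follow the ICTCLAS/PKU-style tag set and are an assumption of this sketch, as are the names `STOP_POS` and `remove_stop_words`; the segmenter itself is not called here.

```python
# POS classes treated as stop words. The codes are assumed to follow the
# ICTCLAS/PKU-style tag set; adjust them to the tagger actually used.
STOP_POS = {
    "m",  # numeral
    "r",  # pronoun
    "q",  # measure word
    "o",  # onomatopoeia
    "f",  # noun of locality
    "c",  # conjunction
    "e",  # interjection
    "k",  # suffix component
    "u",  # auxiliary word
}


def remove_stop_words(tagged_tokens):
    """Keep tokens whose tag does not start with a stop-word class letter.

    tagged_tokens: list of (word, pos_tag) pairs, e.g. ("的", "ude1").
    Only the first letter is compared so that sub-tags such as "ude1"
    or "rzv" fall under their top-level class.
    """
    return [word for word, pos in tagged_tokens if pos[:1] not in STOP_POS]
```

For example, `remove_stop_words([("这", "rzv"), ("车", "n"), ("的", "ude1"), ("动力", "n")])` keeps only the two nouns.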
Further, the χ² values are combined using the average-χ² aggregation strategy:
$$\chi^2_{avg}(word) = \sum_{j=1}^{6} \frac{p(l_j)}{n}\,\chi^2(word, l_j)$$
The words are sorted by χ² value from high to low and the top-ranked words are selected as feature items, with word frequency as the weight of each feature item. Each text is represented with the vector space model, giving the feature vector d_i of every review document, and the documents are classified using an SVM.
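The feature-selection step can be illustrated as follows. The sketch assumes that "average χ²" means weighting each aspect's χ² score by that aspect's share of documents, which is the common form of this aggregation; the function names and the tie-breaking rule (alphabetical) are hypothetical. The SVM itself is left out; the resulting vectors would be passed to whatever SVM implementation is used, for instance one binary classifier per aspect.

```python
from collections import Counter


def average_chi_square(chi2_by_aspect, aspect_doc_counts, n_docs):
    """Weight each aspect's chi-square score by the aspect's share of documents."""
    return sum(
        (aspect_doc_counts[aspect] / n_docs) * score
        for aspect, score in chi2_by_aspect.items()
    )


def select_features(word_scores, k):
    """Return the k words with the highest aggregated score (ties: alphabetical)."""
    ranked = sorted(word_scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:k]]


def tf_vector(tokens, features):
    """Term-frequency vector of one document over the selected feature words."""
    counts = Counter(tokens)
    return [counts[f] for f in features]
```

Using raw term frequency as the weight mirrors the text; other weightings such as TF-IDF would be a separate design choice.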
Further, quantifying the sentiment value in step 4) specifically includes:
When the rating score is less than or equal to 2, the text is considered negative and is assigned to the negative text set; when the rating score is 5, the text is considered positive and is assigned to the positive text set. The sentiment value S(word) of each word in the text is calculated as:
S(word) = P(word, pos) - P(word, neg)
$$P(word, pos) = \log\frac{f(word, pos)\,M}{f(word)\,f(pos)}$$
where f(word, pos) denotes the frequency with which the word occurs in the positive text set, f(word) denotes the number of times the word occurs in the whole text set, f(pos) denotes the number of positive documents, and M denotes the size of the whole text set. P(word, neg), the pointwise mutual information between the word and the negative documents, is calculated in the same way.
The formula for S(word) simplifies to
$$S(word) = \log\frac{f(word, pos)\,f(neg)}{f(word, neg)\,f(pos)}$$
where f(neg) denotes the number of negative documents.
The sentiment value S_rev(r_k) of a review r_k is then:
$$S_{rev}(r_k) = \sum_{i=1}^{q} S(word_i)$$
where q denotes the number of sentiment-dictionary words contained in the review; that is, the sentiment value of each review text is the sum of the sentiment values of its words.
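The word-level and review-level sentiment values can be sketched as below, assuming S(word) is the difference of two pointwise mutual information terms, which reduces to a single log-ratio of counts. The additive `smoothing` parameter is an assumption added for this example so that a word seen in only one of the two text sets does not produce log(0); the function names are hypothetical.

```python
import math


def word_sentiment(f_word_pos, f_word_neg, f_pos, f_neg, smoothing=1.0):
    """S(word) = PMI(word, pos) - PMI(word, neg), in simplified form.

    f_word_pos / f_word_neg: occurrences of the word in the positive /
                             negative text set
    f_pos / f_neg:           number of positive / negative documents
    smoothing:               added to both word counts (assumption of this
                             sketch) to avoid taking the log of zero
    """
    return math.log(
        ((f_word_pos + smoothing) * f_neg) / ((f_word_neg + smoothing) * f_pos)
    )


def review_sentiment(tokens, lexicon):
    """Sum the sentiment values of the review's words found in the lexicon."""
    return sum(lexicon[t] for t in tokens if t in lexicon)
```

A word that is equally frequent in both sets, with equally many positive and negative documents, scores 0; words outside the sentiment dictionary contribute nothing to the review score.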
Further, step 5) forecasts using a modified autoregressive (AR) model. y_t denotes the sales volume of month t, t = 1, 2, ..., n, where n denotes some future month. In the model, q denotes the number of preceding months whose sentiment factors influence month t; w_t denotes the sentiment influence of month t; α_i are the model parameters, obtained by the least squares method; P denotes the number of months before month t that are considered, and i indexes a month among those P months; α_0 denotes the constant term; and ε_t denotes the error term. The sentiment factor under each label is substituted into the model separately, and comparison on the training set reveals which aspect of automobile performance consumers value most.
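A forecasting step of this kind can be sketched as an autoregression with lagged sentiment terms, fitted by ordinary least squares. The model form assumed here is y_t = α_0 + Σ_{i=1..P} α_i·y_{t-i} + Σ_{j=1..q} β_j·w_{t-j} + ε_t; the coefficient name β_j and all function names are assumptions of this sketch, since the text names only α_i. The normal equations are solved with a small Gaussian elimination so that the example needs no third-party library.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting.

    Raises ZeroDivisionError if the system is singular.
    """
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def design_row(sales, sentiment, t, p, q):
    """Regressors for month t: constant, p sales lags, q sentiment lags."""
    return ([1.0]
            + [sales[t - i] for i in range(1, p + 1)]
            + [sentiment[t - j] for j in range(1, q + 1)])


def fit(sales, sentiment, p, q):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    start = max(p, q)
    rows = [design_row(sales, sentiment, t, p, q) for t in range(start, len(sales))]
    ys = sales[start:]
    k = 1 + p + q
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(k)]
    return solve_linear(xtx, xty)


def forecast_next(sales, sentiment, coef, p, q):
    """One-step-ahead forecast for the month after the last observation."""
    row = design_row(sales, sentiment, len(sales), p, q)
    return sum(c * x for c, x in zip(coef, row))
```

To compare aspects, the same fit would be run once per aspect with that aspect's monthly sentiment series as `sentiment`, and the fits compared on held-out months; how the comparison is scored is left open here.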
Advantages and beneficial effects of the invention:
1. Unlike traditional forecasting, the method uses review data and takes into account how much users like the product, so that available data are not wasted.
2. Forecasts can be made separately from the review data on each performance aspect of the automobile, identifying which aspect of performance consumers value most.
3. The forecast is more accurate.
Brief description of the drawings
Fig. 1 is an operational flowchart of a preferred embodiment of the invention;
Fig. 2 shows the multi-label classification results of the invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the invention are described clearly and in detail below with reference to the drawings. The described embodiments are only some of the embodiments of the invention.
The technical solution by which the invention solves the above technical problem is:
Online reviews are preprocessed using the Chinese lexical analysis system (ICTCLAS3) of the Institute of Computing Technology, Chinese Academy of Sciences. The cell dictionaries related to the automobile industry in the Sogou input method are first imported into the word segmentation system; the dictionaries, which are not in plain-text format, are parsed with the UltraEdit editor, their format is unified, and duplicate words are removed. Stop words are then removed from the segmentation result: numerals, pronouns, measure words, onomatopoeia, nouns of locality, conjunctions, interjections, suffix components and auxiliary words are treated as stop words.
1) Multi-label classification
The multi-label training data set built from the automobile review texts is denoted (D, T, L). D = {D_1, D_2, ..., D_n} = {(d_1, y_1), (d_2, y_2), ..., (d_n, y_n)} denotes the multi-label data set consisting of n review documents about automobiles; every document D_i consists of a feature vector d_i and a label vector y_i (1 ≤ i ≤ n). T = (t_1, t_2, ..., t_p) denotes the feature set consisting of the p keywords selected from the n review documents. L = {l_1, l_2, ..., l_6} denotes the label set consisting of the 6 labels (comfort, power, handling, service, economy and safety). In the feature vector d_i = {w_1i, w_2i, ..., w_ji, ..., w_pi}, w_ji denotes the weight of keyword t_j in document D_i. Every document corresponds to one or more performance labels in the label set L, encoded as a binary vector y_i of 0s and 1s: if D_i belongs to category l_j, then y_ji = 1; otherwise y_ji = 0.
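The data representation above can be made concrete with a small sketch. The aspect order in `ASPECTS` follows the order in which the six labels are listed in the text; the function names and the use of plain Python lists for the vectors are hypothetical choices for the example.

```python
# The six performance aspects, in the order l_1 ... l_6 given in the text.
ASPECTS = ["comfort", "power", "handling", "service", "economy", "safety"]


def label_vector(aspects_of_doc):
    """Binary vector y_i: 1 where the document carries the aspect, else 0."""
    return [1 if aspect in aspects_of_doc else 0 for aspect in ASPECTS]


def build_dataset(feature_vectors, doc_aspects):
    """Pair each feature vector d_i with its label vector y_i."""
    return [(d, label_vector(a)) for d, a in zip(feature_vectors, doc_aspects)]
```

A review that discusses both power and economy would therefore carry the label vector `[0, 1, 0, 0, 1, 0]`.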
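a) The χ² statistic measures the correlation between a word and a label:
$$\chi^2(word, l_j) = \frac{n\,\left[\,p(word, l_j)\,p(\neg word, \neg l_j) - p(word, \neg l_j)\,p(\neg word, l_j)\,\right]^2}{p(word)\,p(\neg word)\,p(l_j)\,p(\neg l_j)}$$
where n denotes the total number of documents and p(word, l_j) denotes the number of documents D_i in which the word occurs and l_ij = 1. The terms p(¬word, l_j), p(word, ¬l_j) and p(¬word, ¬l_j) are defined similarly, with ¬word meaning that the word does not occur in document D_i.
b) The χ² values are combined using the average-χ² aggregation strategy:
$$\chi^2_{avg}(word) = \sum_{j=1}^{6} \frac{p(l_j)}{n}\,\chi^2(word, l_j)$$
The words are sorted by χ² value from high to low and the top-ranked words are selected as feature items, with word frequency as the weight of each feature item. Each text is represented with the vector space model, and the feature vector d_i of every review document is obtained.
c) The documents are classified using an SVM.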
3) Determining the sentiment value
According to the rating system of Sina Auto, a score of 1 or 2 points given by a consumer for an item indicates that the consumer is very dissatisfied with that item, while a score of 5 points indicates that the consumer is satisfied with it. For a review text, when the rating score is less than or equal to 2, the text is considered negative and is assigned to the negative text set; when the rating score is 5, the text is considered positive and is assigned to the positive text set. The sentiment value S(word) of each word in the text is calculated as:
S(word) = P(word, pos) - P(word, neg)
$$P(word, pos) = \log\frac{f(word, pos)\,M}{f(word)\,f(pos)}$$
where f(word, pos) denotes the frequency with which the word occurs in the positive text set, f(word) denotes the number of times the word occurs in the whole text set, f(pos) denotes the number of positive documents, and M denotes the size of the whole text set. P(word, neg) is calculated in the same way. The formula for S(word) simplifies to
$$S(word) = \log\frac{f(word, pos)\,f(neg)}{f(word, neg)\,f(pos)}$$
where f(neg) denotes the number of negative documents.
The sentiment value S_rev(r_k) of a review r_k is then:
$$S_{rev}(r_k) = \sum_{i=1}^{q} S(word_i)$$
where q denotes the number of sentiment-dictionary words contained in the review; that is, the sentiment value of each review text is the sum of the sentiment values of its words.
The review sentiment value of a car model is then the sum of the sentiment values of all of its review texts. For the classified texts (divided into six aspects: comfort, power, handling, service, economy and safety), the sentiment value of each aspect and the overall sentiment value are calculated separately.
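Aggregating review-level sentiment into per-aspect and overall values for one car model can be sketched as follows. The input format, the function name `aggregate_sentiment`, and the choice that a review labelled with several aspects contributes its full value to each of them are assumptions of this sketch.

```python
from collections import defaultdict


def aggregate_sentiment(reviews):
    """Sum review sentiment values per aspect and overall.

    reviews: iterable of (sentiment_value, set_of_aspects) pairs for one
             car model. A review carrying several aspects is counted under
             each of them (assumption of this sketch).
    """
    per_aspect = defaultdict(float)
    overall = 0.0
    for value, aspects in reviews:
        overall += value
        for aspect in aspects:
            per_aspect[aspect] += value
    return dict(per_aspect), overall
```

Computing the sums per month would yield the monthly sentiment series that a forecasting model can consume.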
4) Forecasting
Forecasting uses a modified AR model. y_t denotes the sales volume of month t (t = 1, 2, ..., n). In the model, q denotes the number of preceding months whose sentiment factors influence month t, and w_t denotes the sentiment influence of month t. The sentiment factor under each label is substituted into the model separately; comparison on the training set reveals which aspect of automobile performance consumers value most, which guides production of the automobile in the next period.
The above embodiments should be understood as merely illustrating the invention and not limiting its scope of protection. After reading the disclosure of the invention, a person skilled in the art can make various changes or modifications to it, and such equivalent changes and modifications likewise fall within the scope of the claims of the invention.
Claims (7)
1. An automobile sales forecasting method based on review sentiment analysis, characterized by comprising the following steps:
1) preprocessing the automobile review data, including unifying the format and removing duplicate words;
2) segmenting the preprocessed automobile review data into words with the Chinese Academy of Sciences Chinese word segmentation system, and removing stop words;
3) performing multi-label classification on the review data set segmented in step 2) using a multi-label classification technique;
4) quantifying sentiment values using the mutual information technique, and obtaining the sentiment value of the review text set;
5) incorporating the sentiment values into a regression model to forecast automobile sales for the next period.
2. The automobile sales forecasting method based on review sentiment analysis according to claim 1, characterized in that step 1) divides the automobile review data into six aspects: comfort, power, handling, service, economy and safety, and the relationship between a review word and a class label is found first, using the formula:
$$\chi^2(word, l_j) = \frac{n\,\left[\,p(word, l_j)\,p(\neg word, \neg l_j) - p(word, \neg l_j)\,p(\neg word, l_j)\,\right]^2}{p(word)\,p(\neg word)\,p(l_j)\,p(\neg l_j)}$$
where n denotes the total number of documents; ¬word denotes that the word does not occur in document D_i; χ² denotes the correlation between a word and an aspect l_j of the automobile; ¬l_j denotes that the document does not carry aspect l_j, i.e. l_ij = 0; p(word, l_j) denotes the number of documents D_i in which the word occurs and l_ij = 1; l_j denotes one performance aspect of the automobile; j indexes the performance aspect (1 ≤ j ≤ 6) and i indexes the i-th document; p(word) denotes the number of documents in which the word occurs; p(l_j) denotes the number of documents in the text set labelled l_j; and p(¬word) denotes the number of documents in which the word does not occur.
3. The automobile sales forecasting method based on review sentiment analysis according to claim 1 or 2, characterized in that step 1) uses the Chinese lexical analysis system ICTCLAS3 of the Institute of Computing Technology, Chinese Academy of Sciences; the cell dictionaries related to the automobile industry in the Sogou input method are first imported into the Chinese lexical analysis system, the dictionaries that are not in plain-text format are parsed with the UltraEdit editor, their format is unified, and duplicate words are removed.
4. The automobile sales forecasting method based on review sentiment analysis according to claim 3, characterized in that step 2) treats numerals, pronouns, measure words, onomatopoeia, nouns of locality, conjunctions, interjections, suffix components and auxiliary words as stop words.
5. The automobile sales forecasting method based on review sentiment analysis according to claim 2, characterized in that the χ² values are combined using the average-χ² aggregation strategy, with the formula:
$$\chi^2_{avg}(word) = \sum_{j=1}^{6} \frac{p(l_j)}{n}\,\chi^2(word, l_j)$$
The words are sorted by χ² value from high to low and the top-ranked words are selected as feature items, with word frequency as the weight of each feature item; each text is represented with the vector space model, the feature vector d_i of every review document is obtained, and the documents are classified using an SVM.
6. The automobile sales forecasting method based on review sentiment analysis according to claim 5, characterized in that quantifying the sentiment value in step 4) specifically includes:
when the rating score is less than or equal to 2, the text is considered negative and is assigned to the negative text set; when the rating score is 5, the text is considered positive and is assigned to the positive text set; the sentiment value S(word) of each word in the text is calculated as:
S(word) = P(word, pos) - P(word, neg)
$$P(word, pos) = \log\frac{f(word, pos)\,M}{f(word)\,f(pos)}$$
where f(word, pos) denotes the frequency with which the word occurs in the positive text set; f(word) denotes the number of times the word occurs in the whole text set; f(pos) denotes the number of positive documents; M denotes the size of the whole text set; the value of P(word, neg) is calculated in the same way, P(word, neg) denoting the pointwise mutual information between the word and the negative documents;
the formula for S(word) simplifies to
$$S(word) = \log\frac{f(word, pos)\,f(neg)}{f(word, neg)\,f(pos)}$$
where f(neg) denotes the number of negative documents;
the sentiment value S_rev(r_k) of a review r_k is then:
$$S_{rev}(r_k) = \sum_{i=1}^{q} S(word_i)$$
where q denotes the number of sentiment-dictionary words contained in the review, that is, the sentiment value of each review text is the sum of the sentiment values of its words.
7. The automobile sales forecasting method based on review sentiment analysis according to claim 6, characterized in that step 5) forecasts using a modified autoregressive (AR) model, in which y_t denotes the sales volume of month t, t = 1, 2, ..., n, and n denotes some future month;
q denotes the number of preceding months whose sentiment factors influence month t; w_t denotes the sentiment influence of month t; α_i are the model parameters obtained by the least squares method; P denotes the number of months before month t that are considered, and i indexes a month among those P months; α_0 denotes the constant term; ε_t denotes the error term; the sentiment factor under each label is substituted into the model separately, and comparison on the training set reveals which aspect of automobile performance consumers value most.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711229414.1A CN108563647A (en) | 2017-11-29 | 2017-11-29 | Automobile sales forecasting method based on review sentiment analysis |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711229414.1A CN108563647A (en) | 2017-11-29 | 2017-11-29 | Automobile sales forecasting method based on review sentiment analysis |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108563647A (en) | 2018-09-21 |
Family
ID=63529172
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201711229414.1A Pending CN108563647A (en) | Automobile sales forecasting method based on review sentiment analysis | 2017-11-29 | 2017-11-29 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108563647A (en) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110442717A (en) * | 2019-08-08 | 2019-11-12 | 深巨科技(北京)有限公司 | A kind of adaptability sentiment analysis system and method |
CN111242671A (en) * | 2019-12-30 | 2020-06-05 | 上海锐嘉科智能科技有限公司 | Data acquisition and analysis system and method |
CN111242679A (en) * | 2020-01-08 | 2020-06-05 | 北京工业大学 | Sales forecasting method based on product review viewpoint mining |
CN113393279A (en) * | 2021-07-08 | 2021-09-14 | 北京沃东天骏信息技术有限公司 | Order quantity estimation method and system |
CN114022176A (en) * | 2021-09-26 | 2022-02-08 | 上海电信工程有限公司 | Method for predicting commodity sales on e-commerce platform and electronic equipment |
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9633538B1 (en) * | 2015-12-09 | 2017-04-25 | International Business Machines Corporation | System and method for wearable indication of personal risk within a workplace |
CN106227756A (en) * | 2016-07-14 | 2016-12-14 | 苏州大学 | A kind of stock index forecasting method based on emotional semantic classification and system |
CN106951514A (en) * | 2017-03-17 | 2017-07-14 | 合肥工业大学 | A kind of automobile Method for Sales Forecast method for considering brand emotion |
Non-Patent Citations (1)
Title |
---|
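Ma Nana (马那那): "Research on sentiment text classification for product reviews" (面向产品评论的情感文本分类研究), China Master's Theses Full-text Database, Information Science and Technology (《中国优秀硕士学位论文全文数据库 信息科技辑》) * |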
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Alaparthi et al. | Bidirectional Encoder Representations from Transformers (BERT): A sentiment analysis odyssey | |
Taj et al. | Sentiment analysis of news articles: a lexicon based approach | |
CN107491531B (en) | Chinese network comment sensibility classification method based on integrated study frame | |
CN108563647A (en) | Automobile sales forecasting method based on review sentiment analysis | |
Yadav et al. | Sentiment analysis of financial news using unsupervised approach | |
CN107256494B (en) | Article recommendation method and device | |
CN103646088B (en) | Product comment fine-grained emotional element extraction method based on CRFs and SVM | |
CN107704892A (en) | A kind of commodity code sorting technique and system based on Bayesian model | |
US20150032518A1 (en) | Systems and methods for managing publication of online advertisements | |
CN109492105B (en) | Text emotion classification method based on multi-feature ensemble learning | |
CN103034626A (en) | Emotion analyzing system and method | |
CN110888983B (en) | Positive and negative emotion analysis method, terminal equipment and storage medium | |
CN111538828A (en) | Text emotion analysis method and device, computer device and readable storage medium | |
CN112862569A (en) | Product appearance style evaluation method and system based on image and text multi-modal data | |
CN104462408A (en) | Topic modeling based multi-granularity sentiment analysis method | |
KR20110044112A (en) | Semi-automatic building of pattern database for mining review of product attributes | |
CN101556580A (en) | Stock comment classification system based on analysis of discourse structure and method | |
Yang et al. | Microblog sentiment analysis algorithm research and implementation based on classification | |
Lei et al. | Automatically classify chinese judgment documents utilizing machine learning algorithms | |
Dann et al. | Reconstructing the giant: Automating the categorization of scientific articles with deep learning techniques | |
Háva et al. | Supervised two-step feature extraction for structured representation of text data | |
Sun | Research on product attribute extraction and classification method for online review | |
CN107967260B (en) | Data processing method, device, system and computer readable medium | |
Wei et al. | The instructional design of Chinese text classification based on SVM | |
CN102622405B (en) | Method for computing text distance between short texts based on language content unit number evaluation |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20180921 |