CN108491377A

CN108491377A - A comprehensive scoring method for e-commerce products based on multi-dimensional information fusion

Info

Publication number: CN108491377A
Application number: CN201810181878.8A
Authority: CN
Inventors: 徐新胜; 余建浙
Original assignee: China Jiliang University
Current assignee: China Jiliang University
Priority date: 2018-03-06
Filing date: 2018-03-06
Publication date: 2018-09-04
Anticipated expiration: 2038-03-06
Also published as: CN108491377B

Abstract

The invention discloses a comprehensive scoring method for e-commerce products based on multi-dimensional information fusion. The method includes: acquiring multi-dimensional information about e-commerce products, mainly shop information, sales volume information and review text; data preprocessing, in which numeric data undergoes data cleaning and data transformation, and review text undergoes word segmentation, part-of-speech tagging and similar processing; mining the multi-dimensional information, in which a shop information index and a commodity sales index are obtained from the shop information and sales volume information through data induction and principal component regression analysis, while the review text undergoes sentiment analysis and a radar chart of product feature scores is obtained through quantification and clustering; and calculating the overall commodity score, for which a fusion function is designed. The method can be applied in commodity recommendation systems based on commodity information, identifies high-quality commodities efficiently and simply, and makes the resulting recommendation system faster and more accurate.

Description

A comprehensive scoring method for e-commerce products based on multi-dimensional information fusion

Technical field

The present invention relates to the fields of natural language processing and data mining, and in particular to a commodity evaluation method based on multi-dimensional information.

Background art

With the continuous advance of Internet information technology, the e-commerce industry has grown rapidly, and e-commerce platforms have become an important channel for online shopping. At the same time, consumers often face difficulties when buying goods online, such as counterfeit goods, false advertising and difficulty in choosing. Although many e-commerce platforms provide consumer feedback mechanisms, two problems remain critical given the feedback accumulated online: how to identify valuable reference information quickly and effectively within a vast amount of feedback, and how to use that high-value information to assess commodity quality concisely and efficiently. Some related research already exists. Tian Bo et al. introduced perceived trust and a trust system, fused the data associated with e-commerce products to describe the degree of trust in a commodity, and proposed an e-commerce recommendation trust evaluation model. Li et al. combined historical transactions with the current transaction value and proposed a quantitative commodity risk assessment method based on credit value, credit grade and commodity price. Building on an early-warning indicator system for e-commerce credit risk and considering both current trading behaviour and transaction history, Wei et al. proposed an e-commerce transaction evaluation model that combines trust and risk. Pang classified the sentiment orientation of film review texts to obtain the sentiment category of each film. Shi Wei et al. mined the sentiment polarity and sentiment intensity of Weibo comments based on HowNet and a TF-IDF weighting method. Lin et al., considering the semantic differences caused by the position of modifiers in review text, proposed a product review analysis system based on sentiment analysis.

However, the analyses above either consider only the structured data in consumer feedback, computing a numeric value from a model of the structured data to measure commodity quality or purchase risk, or mine only the unstructured data, quantifying the sentiment orientation of review text as the sentiment score of the evaluated object. The present invention analyses structured and unstructured information together: by quantifying shop reputation, commodity sales volume and review text sentiment, it builds a multi-dimensional comprehensive commodity scoring model that gives consumers, and even manufacturers, a more accurate and objective comprehensive score.

Summary of the invention

The technical problem to be solved by the present invention is to provide a comprehensive scoring method for e-commerce products based on multi-dimensional information fusion. The method crawls the shop information, commodity sales volume information and review text related to e-commerce products. For the shop information and sales volume information, a shop information index and a commodity sales index are calculated through data induction and principal component regression analysis. For sentiment mining of the review text, product features are extracted using Chinese chunking; the candidate product features are filtered using the frequent itemsets generated by the Apriori algorithm and a TF-IDF threshold to obtain the product feature set; the candidate feature set is clustered and the product feature scores are quantified to obtain a radar chart of product feature scores; finally, the multi-dimensional information is fused to give an overall commodity score. The method identifies high-quality commodities efficiently and simply and makes the resulting recommendation system faster and more accurate.

To this end, the machine-learning-based e-commerce product satisfaction analysis method proposed by the present invention comprises the following steps:

Step S1: Multi-dimensional information acquisition. A web crawler tool first crawls the shop information, commodity sales volume information and review text of the relevant commodities; the multi-dimensional information is parsed and persisted to a database by a program;

Step S2: Data preprocessing. A program written in Java performs de-duplication, data transformation, data reduction and similar operations on the structured data, while the review text is processed with the Chinese Academy of Sciences NLPIR word segmentation tool for word segmentation, part-of-speech tagging and stop-word removal;

Step S3: Mining of the multi-dimensional information. The shop information and commodity sales volume information are analysed, and a shop credit index and a commodity sales index are calculated through data induction and principal component regression analysis. For the review text, product feature word pairs are extracted, a sentiment dictionary is constructed, the product features are clustered, and the sentiment score of the product features is calculated. To analyse the sentiment score of a commodity more comprehensively and objectively, the product feature weights are also included in the sentiment score calculation, in addition to the influence of degree adverbs and negation words on the sentiment score of an evaluation phrase;

Step S4: Overall commodity score calculation. Each of the shop credit index, commodity sales index and commodity sentiment score obtained in the mining stage is given a weight, and the comprehensive score of the commodity is calculated by linear weighting.

Compared with the prior art, the beneficial effects of the present invention are as follows. The invention proposes a comprehensive scoring method for e-commerce products based on multi-dimensional information fusion, which fuses and analyses multi-dimensional e-commerce product data and studies the comprehensive score of e-commerce products more fully and at a finer granularity. It is based on multi-dimensional product data, namely a shop reputation model, a commodity sales index model and a review text sentiment score. The review text sentiment score is the main component and is combined with the credit index and the sales index, so that commodities are scored more accurately, comprehensively and objectively. In addition, a feature word set and a feature-based sentiment word set are extracted at the product feature level. Combining product feature weights, sentiment degree words and negation words, the product features are clustered with an improved k-means++ clustering algorithm, the score of each cluster is calculated from the clustering result, and the review text sentiment score is calculated from the weights of the clusters. Finally, the multi-dimensional scores are fused by weighting to give the final product evaluation score, which identifies high-quality commodities efficiently and simply and makes the resulting recommendation system faster and more accurate.

Description of the drawings

Fig. 1 is a flow diagram of a comprehensive scoring method for e-commerce products based on multi-dimensional information fusion in a specific embodiment of the present invention.

Detailed description of the embodiments

To make the objects, technical solutions and advantages of the present invention clear, specific embodiments of the invention are described clearly and completely below.

Fig. 1 shows the flow chart of the comprehensive scoring method for e-commerce products based on multi-dimensional information fusion in this embodiment.

The method includes: step S1, multi-dimensional information acquisition, in which a web crawler tool first crawls the shop information, commodity sales volume information and review text of the relevant commodities, and the multi-dimensional information is parsed and persisted to a database by a program; step S2, data preprocessing, in which a program written in Java performs de-duplication, data transformation, data reduction and similar operations on the structured data, while the review text is processed with the Chinese Academy of Sciences NLPIR word segmentation tool for word segmentation, part-of-speech tagging and stop-word removal; step S3, mining of the multi-dimensional information, in which the shop information and commodity sales volume information are analysed, a shop credit index and a commodity sales index are calculated through data induction and principal component regression analysis, and, for the review text, product feature word pairs are extracted, a sentiment dictionary is constructed, the product features are clustered and the product feature sentiment score is calculated; to analyse the sentiment score of a commodity more comprehensively and objectively, the product feature weights are also included in the sentiment score calculation, in addition to the influence of degree adverbs and negation words on the sentiment score of an evaluation phrase; step S4, overall commodity score calculation, in which each of the shop credit index, commodity sales index and commodity sentiment score obtained in the mining stage is given a weight and the comprehensive score of the commodity is calculated by linear weighting.

In a specific embodiment, the method can be carried out as follows (the description below takes the score analysis of several mobile phone models on mainstream e-commerce websites as an example, and a concrete example is given after each operating step):

Step S1: Using Python's Scrapy crawler framework, product information for several mobile phone models is crawled at random from the mainstream e-commerce platforms JD.com, Suning and Tmall, including phone review text, sales volume information and the corresponding shop information. The crawled information contains duplicates and missing values, which are handled by de-duplication and missing-value filling. The final data set contains 18,666 reviews covering 53 phone models sold by 14 shops. JD.com and Suning rely on platform-operated sales, so their phone sales model is one-to-many. Tmall is a marketplace that shops join, which is a many-to-many sales model: several phones may be crawled from one shop, and one phone may correspond to several shops. The data is then persisted to a MySQL database.

Step S2: The obtained raw text is processed, mainly through word segmentation, part-of-speech tagging and word frequency statistics; the segmentation result is then filtered using stop words and low-frequency words. The sub-steps are as follows:

1) Word segmentation and part-of-speech tagging: In written English, words are separated by spaces as natural delimiters. In Chinese, only characters, sentences and paragraphs can be delimited simply by explicit delimiters; words have no formal delimiter. English also has the problem of delimiting phrases, but at the word level Chinese is far more complex and difficult than English. Chinese word segmentation cuts a sequence of Chinese characters into individual words. Part-of-speech tagging marks the part of speech of each word in the segmentation result; the words of modern Chinese fall into two classes and 14 parts of speech. Many Chinese word segmentation and part-of-speech tagging tools are available, for example: ICTCLAS, a Chinese lexical analysis system and one of the earliest open-source Chinese word segmentation projects, which took first place in the evaluation organized by the domestic 973 expert group and several first places in the first (2003) evaluation organized by SIGHAN, the international Chinese language processing research body; Language Cloud (Language Technology Platform Cloud, LTP-Cloud), a cloud natural language processing service platform developed by the Research Center for Social Computing and Information Retrieval at Harbin Institute of Technology, which relies on the Language Technology Platform as its back end and provides users with rich and efficient natural language processing services including word segmentation, part-of-speech tagging, dependency parsing, named entity recognition and semantic role labelling; and "jieba" Chinese word segmentation, which aims to be the best Python Chinese word segmentation component. Weighing segmentation accuracy, efficiency and simplicity, the "jieba" Chinese word segmentation tool is selected (tool website: http://www.oschina.net/p/jieba).

2) Word frequency statistics on the segmentation result: A dictionary container is created, with each word of the segmentation result as a key and its frequency of occurrence as the value. Such a container stores key-value pairs, and the stored keys must be unique. The segmentation result is traversed and the words are stored in the dictionary container, giving the word frequency of the whole segmentation result.

3) Filtering of low-frequency words and stop words: Low-frequency words are words that occur rarely in the word frequency statistics; words occurring fewer than 3 times are generally filtered out. Stop words are words or characters that are automatically filtered out before or after processing natural language data (or text) in information retrieval, in order to save storage space and improve search efficiency, such as the particle "de" and the pronoun "I". These stop words are entered manually rather than generated automatically, and together they form a stop-word list.

4) Filtering the segmentation result: The low-frequency words and stop words that appear in the segmentation result are filtered out.

The following reviews, selected from the review text of a mobile phone on Taobao, serve as examples:

1. "A very good phone; the workmanship and feel are excellent, and the looks are off the charts."

2. "JD.com's logistics are superb. The phone is already in use and works normally; it is good value for money and worth recommending."

3. "The phone is fine, it runs fast, and the call sound quality is good."

The official "jieba" description of segmentation with part-of-speech tagging is: jieba.posseg.POSTokenizer(tokenizer=None) creates a custom tokenizer, where the tokenizer parameter specifies the jieba.Tokenizer used internally; jieba.posseg.dt is the default part-of-speech tagging tokenizer. It tags the part of speech of each word after sentence segmentation, using a notation compatible with ICTCLAS. Usage is as follows:

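import jieba.posseg as pseg

# Sample text 1; the Chinese sentence is reconstructed here from the tagged output reported for it
sentence = '很不错的手机，做工质感极好的，颜值爆表。'

# pseg.cut yields word/tag pairs; str() renders each pair as "word/tag"
result = [str(a) for a in pseg.cut(sentence)]

# Join with spaces so that the tagged words are space-separated
print(" ".join(result))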

Applying the segmentation and part-of-speech tagging step to sample text 1 gives output in which the words are separated by spaces and each word is followed by a slash and its part of speech. The final result is:

"很/d 不错/a 的/uj 手机/n ，/x 做工/v 质感/n 极好/d 的/uj ，/x 颜值/n 爆表/v 。/x", where v denotes a verb, n a noun, a an adjective, d an adverb, uj an auxiliary word and x a non-morpheme word. In order, the words mean: very, good, (particle), mobile phone, workmanship, texture, excellent, (particle), looks, off the charts.

Word frequency statistics are then computed on the segmentation result above.

The result of the word frequency count is: {'很': 1, '不错': 1, '的': 2, '手机': 1, '做工': 1, '质感': 1, '极好': 1, '颜值': 1, '爆表': 1}. Each word and its frequency are stored as a key-value pair in the dictionary container. Given a threshold, words whose frequency is below the threshold are treated as low-frequency words.
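The counting and filtering described in sub-steps 2) to 4) can be sketched as follows. This is a minimal illustration, not the patent's own program: the function name `filter_tokens`, the romanized sample tokens and the stop-word set are hypothetical, and the default threshold of 3 follows the "fewer than 3 times" rule stated above.

```python
from collections import Counter


def filter_tokens(tokens, stop_words, min_count=3):
    """Count word frequencies, then drop stop words and low-frequency words.

    tokens: list of segmented words; stop_words: set of words to discard;
    min_count: words occurring fewer than this many times are low-frequency.
    """
    freq = Counter(tokens)  # dictionary container: word -> number of occurrences
    kept = [w for w in tokens if w not in stop_words and freq[w] >= min_count]
    return freq, kept


# Hypothetical segmentation result (romanized for readability)
tokens = ["phone", "good", "de", "phone", "screen", "de", "phone", "de"]
freq, kept = filter_tokens(tokens, stop_words={"de"}, min_count=3)
print(dict(freq))
print(kept)
```

A `Counter` is used because it is a dictionary subclass, which matches the key-value container with unique keys described in sub-step 2).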

Step S3: Mining of the multi-dimensional information, which mainly includes calculating the shop reputation and sales index, extracting and filtering product feature words, extracting product feature word pairs, clustering product feature words, and quantifying the sentiment score of the review text. The sub-steps are as follows:

1) Calculation of the shop reputation and sales index: By consolidating online shop information and questionnaire surveys, the indicators finally used to evaluate commodity reputation are determined, as shown in Table 1.

Table 1 Commodity reputation indicators

As Table 1 shows, the commodity reputation score consists mainly of the shop's basic score, the shop's half-year dynamic score and the commodity's one-month service score. By analysing the business meaning of each indicator, the following formula for the commodity reputation score is derived:

$STORE\_reputation = \alpha \times BIS + \beta \times SDS, \quad \alpha + \beta = 1$ (3)

Formula (1) involves the industry average level of the guarantee deposit; in formula (3), $\alpha$ and $\beta$ are weight parameters. The remaining parameters are explained in Table 1.

The principal component (PCA) regression for the commodity sales index is described mathematically as follows. Suppose n factors influence commodity sales, denoted

$X = \{x_1, x_2, \ldots, x_i, \ldots, x_n\}, \quad i = 1, 2, 3, \ldots, n$

Let $\theta_1, \theta_2, \ldots, \theta_m$ denote the rotation factors of the principal components and $P_1, P_2, \ldots, P_m$ the principal components obtained as the product of the rotation factors and the influence factors. The contribution of each principal component is then calculated to determine the number of principal components. Suppose M principal components are selected; a regression model is established with the M principal components as independent variables and the commodity sales index as the dependent variable.

In formula (4), $w_0$ is the bias parameter, M is the number of principal components selected, and $\Phi_j(P)$ is the basis function; taking $\Phi_j(P) = P_j$ here gives simple multiple linear regression.

2) Extraction and filtering of product feature words: Chunk parsing is a form of syntactic analysis. It can serve either as a subtask of syntactic analysis in a natural language processing system or as a bridge from lexical analysis to syntactic analysis. Based on the segmentation result from step S2 and the context of each word, each word is manually labelled with a Chinese chunk tag to form training samples. With the manually labelled Chinese chunks divided into a training set and a test set in a given proportion, a product feature extraction model is trained; once trained, the model extracts product features from the whole review data set. The extracted features, however, contain a certain number of non-product features. A computer cannot automatically tell whether a candidate feature word is a genuine product feature. Based on the assumption that product features recur in review text, the Apriori algorithm can be used to find the features that form frequent itemsets, which serve as candidate product features. Inspection of the candidate feature set still reveals many nouns that are not product features; these nouns are defined as stop words. To obtain the product feature set more accurately, the candidate product features are filtered again with a corresponding filtering algorithm.
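Regarding the regression model of formula (4) in sub-step 1): the text describes it only through its parameters. Assuming the usual linear basis-function form, with $y$ standing for the commodity sales index and $w_j$ for the regression coefficients (both symbols are assumptions introduced here, not taken from the source), the model would read:

```latex
y(P, w) = w_0 + \sum_{j=1}^{M} w_j \, \Phi_j(P), \qquad \Phi_j(P) = P_j
```

Under this assumed form, the choice $\Phi_j(P) = P_j$ reduces the model to ordinary multiple linear regression on the M selected principal components, which agrees with the description of formula (4) above.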

The detailed steps of product feature extraction are as follows:

1. Determine the itemsets and support count for the Apriori algorithm. The itemset X is defined as the initial set obtained after Chinese chunk analysis. The transaction set T is defined as the set of user reviews downloaded from the network, where a single user review is denoted $t_i$ ($1 \le i \le n$), so that $T = \{t_1, t_2, \ldots, t_n\}$.

The support count is expressed as:

$\sigma(X) = |\{t_i \mid X \subseteq t_i,\ t_i \in T\}|$

The support is expressed as:

$s(X \rightarrow Y) = \dfrac{\sigma(X \cup Y)}{N}$

where X and Y are mutually disjoint itemsets (that is, $X \cap Y = \emptyset$) and N is the number of user reviews $t_i$.

Finally, the minimum support is set to 1%, the frequent itemsets in the transaction set are found, and the resulting frequent itemsets are taken as candidate product features.
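The frequent-itemset search can be sketched as a small level-wise Apriori pass. This is an illustrative sketch under stated assumptions: each review is treated as a set of candidate feature words, the function name `frequent_itemsets` and the `max_size` cap are hypothetical, and the sample reviews are invented for the example.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support=0.01, max_size=2):
    """Return {itemset: support} for itemsets whose support >= min_support.

    transactions: list of reviews, each an iterable of candidate feature words.
    Support of an itemset = (number of reviews containing it) / (number of reviews).
    """
    n = len(transactions)
    sets = [set(t) for t in transactions]
    result = {}
    # Level 1 candidates: every distinct word as a one-element itemset
    current = [frozenset([item]) for item in sorted({i for t in sets for i in t})]
    size = 1
    while current and size <= max_size:
        counts = {c: sum(1 for t in sets if c <= t) for c in current}
        frequent = {c: cnt / n for c, cnt in counts.items() if cnt / n >= min_support}
        result.update(frequent)
        # Build candidates one item larger; keep only those whose
        # every sub-itemset of the current size is itself frequent
        size += 1
        candidates = set()
        for a, b in combinations(frequent, 2):
            union = a | b
            if len(union) == size and all(
                frozenset(s) in frequent for s in combinations(union, size - 1)
            ):
                candidates.add(union)
        current = list(candidates)
    return result


# Invented example reviews, already reduced to candidate feature words
reviews = [["screen", "battery"], ["screen", "camera"], ["screen", "battery"], ["price"]]
itemsets = frequent_itemsets(reviews, min_support=0.5)
```

With the 1% minimum support from the text, `min_support=0.01` would be passed instead of the 0.5 used for this tiny example.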

2. Filter stop words. A product feature stop-word list is built by inspecting the candidate product features and combining them with existing stop-word lists available online. The stop words fall mainly into three classes: product names, such as "Xiaomi", "Meizu" and "Huawei"; personal nouns, such as "aunt", "colleague" and "friend"; and words of place and time, such as "inside", "morning" and "evening". A simple program matches the candidate product features against the stop words and filters them; the product features that remain form the preliminary product feature set.

3. Filter the preliminary product features with the TF-IDF (Term Frequency-Inverse Document Frequency) algorithm.

TF-IDF is calculated as follows:

$TF_{i,j} = \dfrac{n_{i,j}}{\sum_k n_{k,j}}$ (5)

$IDF_i = \log \dfrac{|D|}{|\{j : t_i \in d_j\}|}$ (6)

$TF\text{-}IDF = TF_{i,j} \times IDF_i$ (7)

In formula (5), $n_{i,j}$ is the number of times a product feature word appears in review $d_j$, and $\sum_k n_{k,j}$ is the total number of words appearing in that review. In formula (6), $|D|$ is the total number of review texts and $|\{j : t_i \in d_j\}|$ is the number of reviews containing the product feature word $t_i$.

Repeated cross-validation experiments showed that the TF-IDF values of most non-product feature words are 0.005 or above, so the filtering threshold is set to 0.005, and the final product feature set is obtained after filtering.
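A TF-IDF filter along these lines can be sketched as below. Several points are assumptions rather than details from the source: the per-review values are averaged over the reviews that contain the word, the natural logarithm is used, and words are kept when their score is below the threshold (following the statement that non-product feature words score 0.005 or above). The function names and sample data are hypothetical.

```python
import math
from collections import Counter


def tf_idf_scores(comments, candidates):
    """Mean TF-IDF of each candidate word over the reviews that contain it.

    comments: list of token lists; candidates: iterable of candidate feature words.
    """
    n_docs = len(comments)
    counts = [Counter(c) for c in comments]
    scores = {}
    for word in candidates:
        containing = [c for c in counts if word in c]
        if not containing:
            continue  # word never occurs, so no score is defined
        idf = math.log(n_docs / len(containing))
        tfs = [c[word] / sum(c.values()) for c in containing]
        scores[word] = idf * sum(tfs) / len(tfs)
    return scores


def filter_features(scores, threshold=0.005):
    """Keep words scoring below the threshold (comparison direction is an assumption)."""
    return {w for w, s in scores.items() if s < threshold}


# Invented example: "screen" occurs in three of four reviews, "battery" in one
comments = [["screen", "good"], ["screen", "bright"], ["battery", "good"], ["screen", "nice"]]
scores = tf_idf_scores(comments, ["screen", "battery"])
kept = filter_features(scores, threshold=0.5)
```

The threshold of 0.5 is used only because the toy reviews are two words long; the text's value of 0.005 applies to the real review corpus.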

3) Extraction of product feature word pairs: Hu et al. assume that, in a review sentence, a feature appears together with a sentiment word. Based on this assumption, once the product features in a review have been obtained, a string of a certain length before and after each product feature is selected, and the sentiment word chunk near the feature is extracted as the sentiment word of that feature, forming a product feature-sentiment pair of the form (feature, sentiment word). A distance-window method is used here with a window size of 6; that is, sentiment words are sought within 6 strings before and after the product feature. The sentiment words are then evaluated with the membership algorithm proposed by Raymond Y.K. Lau et al. to extract the product feature-sentiment pairs. The core formula of the algorithm uses the following quantities.

Pr(f) and Pr(m) denote the probabilities that the feature and the opinion word, respectively, appear in the window, and their complements denote the probabilities that the feature and the opinion word, respectively, do not appear in the window. Pr(f, m) denotes the probability that the feature and the opinion word appear together, and its counterpart denotes the probability that neither appears. ω is the weight that balances positive and negative membership. To ensure that the membership value lies in [0, 1], the membership is normalized.
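The distance-window step described above can be sketched as follows. This covers only the window search, not the membership scoring of Lau et al. The window is measured in tokens here, which is an assumption about what "6 strings" means, and the function name and sample data are hypothetical.

```python
def feature_sentiment_pairs(tokens, features, sentiment_words, window=6):
    """Pair each product feature with the sentiment words found near it.

    tokens: segmented review; features / sentiment_words: sets of words;
    window: number of tokens searched before and after each feature.
    """
    pairs = []
    for i, token in enumerate(tokens):
        if token not in features:
            continue
        start = max(0, i - window)
        end = min(len(tokens), i + window + 1)
        for j in range(start, end):
            if j != i and tokens[j] in sentiment_words:
                pairs.append((token, tokens[j]))  # (feature, sentiment word)
    return pairs


# Invented example review, searched with a narrow window of 2 tokens
tokens = ["screen", "very", "clear", "but", "battery", "weak"]
pairs = feature_sentiment_pairs(
    tokens, features={"screen", "battery"}, sentiment_words={"clear", "weak"}, window=2
)
```

Note that a plain window also pairs "battery" with "clear" in this example; weeding out such weak pairs is the role the text assigns to the membership evaluation.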

4) Clustering of product feature words: Because the product features are too fine-grained, all product features need to be clustered. The traditional K-Means algorithm is simple and easy to implement and clusters well in many applications, but the number of cluster centres K must be specified in advance. For product feature clustering, the appropriate K changes with the commodity category, so K-Means has significant limitations here. An improved K-Means++ algorithm is therefore used for clustering. The basic principle of K-Means++ when initializing the cluster centres is to make the initial cluster centres as far apart from each other as possible, which avoids the problem above. The improved K-Means++ product feature word clustering algorithm is described as follows:

Input: the product feature set $\{F_1, F_2, \ldots, F_n\}$; the similarity (distance) matrix of the product feature words, where $D_{i,j} = WSim(F_i, F_j)$; and the 100-dimensional word vectors of the product features.

Output: the product feature clustering result.

Step 1: Randomly select a feature word $F_i$ from the product feature set as the initial cluster centre $C_1$;

Step 2: First calculate the distance $D_{i,j}$ between each product feature word and $F_i$; then calculate the probability that each feature word is chosen as the next cluster centre; finally, determine the K cluster centres by roulette-wheel selection;

Step 3: For each feature word $F_k$ in the product feature set, calculate its distance to the K centres and assign it to the cluster whose centre is nearest;

Step 4: For each feature word cluster $C_i$, recalculate its cluster centre (that is, the centroid of the cluster);

Step 5: Repeat Steps 3 and 4 until the positions of the cluster centres no longer change.
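Steps 1 to 5 can be sketched in plain Python as below. This is standard K-Means++ rather than the patent's improved variant: Euclidean distance on the word vectors stands in for the WSim-based distance, and the selection probability is taken proportional to the squared distance to the nearest chosen centre, which is the usual K-Means++ rule and an assumption here. The function name and sample vectors are hypothetical.

```python
import math
import random


def kmeans_pp(vectors, k, rng=None, max_iter=100):
    """Cluster vectors into k groups with K-Means++ seeding; returns (labels, centers)."""
    rng = rng or random.Random(0)

    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    # Step 1: pick the first centre uniformly at random
    centers = [list(rng.choice(vectors))]
    # Step 2: roulette-wheel selection weighted by squared distance to nearest centre
    # (needs at least k distinct vectors, otherwise all weights become zero)
    while len(centers) < k:
        weights = [min(dist(v, c) for c in centers) ** 2 for v in vectors]
        centers.append(list(rng.choices(vectors, weights=weights, k=1)[0]))

    labels = None
    for _ in range(max_iter):
        # Step 3: assign every vector to its nearest centre
        new_labels = [min(range(k), key=lambda c: dist(v, centers[c])) for v in vectors]
        if new_labels == labels:
            break  # Step 5: assignments, and hence centres, no longer change
        labels = new_labels
        # Step 4: move each centre to the centroid of its cluster
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


# Invented 2-dimensional stand-ins for the 100-dimensional feature word vectors
points = [[0.0, 0.1], [0.2, 0.0], [5.0, 5.1], [5.2, 4.9]]
labels, centers = kmeans_pp(points, k=2, rng=random.Random(42))
```

Passing an explicit `random.Random` instance keeps runs reproducible, which helps when comparing cluster assignments across experiments.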

Combining Zhongguancun Online's classification of mobile phone parameters with the characteristics of the review information, 6 product attribute feature classes are determined for the mobile phone as the evaluated object: screen, hardware, network, camera, appearance, and function and service. The weight in the last column is the sum of the weights of the product features in each cluster. The results are shown in Table 2:

Table 2 Product feature clustering results

5) Quantifying the sentiment score of the review text: The coefficients of sentiment modifiers are set as follows. The 219 degree adverbs in HowNet and the degree adverbs filtered from the review set are combined into a degree adverb set, which is divided into 5 levels with degree coefficients of 0.6, 0.8, 1.2, 1.4 and 1.6 in turn. If a review contains no degree adverb, the degree coefficient is set to 1. The degree coefficient of negation words is uniformly set to -1. The sentiment dictionaries used are HowNet, NTUSD and the Chinese Emotion Vocabulary Ontology Library, as shown in Table 3:

Table 3 Sentiment dictionaries

Sentiment dictionary	Positive words	Neutral words	Negative words	Total words
HowNet	4566	/	4370	8851
Chinese Emotion Vocabulary Ontology Library	11229	5375	10783	27466
NTUSD	2846	/	8325	10027

Through the extraction above and the quantification with the sentiment dictionaries, the review text sentiment score of each phone model can be calculated. The product features are first clustered into six dimensions; drawing a radar chart shows how the phone performs in each respect, and the size of the hexagon gives an intuitive indication of which phone performs better. The final review text sentiment score of the phone is then the weighted average of the scores on the six dimensions.
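The coefficient scheme and the weighted average can be sketched as follows. The degree coefficients and the negation coefficient of -1 come from the text; the polarity values of +1 and -1, the multiplicative way in which the coefficients are combined, and all function and variable names are assumptions made for illustration.

```python
# Degree coefficients for the 5 levels of degree adverbs, as listed in the text
DEGREE_LEVELS = (0.6, 0.8, 1.2, 1.4, 1.6)


def phrase_score(polarity, degree_level=None, negated=False):
    """Score one evaluation phrase.

    polarity: +1 for a positive sentiment word, -1 for a negative one (assumed scale);
    degree_level: index 0..4 into DEGREE_LEVELS, or None when no degree adverb occurs;
    negated: True when a negation word modifies the phrase.
    """
    coefficient = 1.0 if degree_level is None else DEGREE_LEVELS[degree_level]
    if negated:
        coefficient *= -1  # negation words carry a coefficient of -1
    return polarity * coefficient


def review_score(cluster_scores, cluster_weights):
    """Weighted average of the per-cluster scores (six clusters in the text)."""
    total_weight = sum(cluster_weights.values())
    weighted = sum(cluster_scores[c] * w for c, w in cluster_weights.items())
    return weighted / total_weight


# Invented example with two of the six clusters
overall = review_score({"screen": 0.8, "camera": 0.4}, {"screen": 3, "camera": 1})
```

Dividing by the total weight means the cluster weights do not need to sum to 1 beforehand.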

Step S4: Fusing the three dimensions gives a more comprehensive evaluation of a commodity and supports the consumer's purchase decision. The scores of the three dimensions, $Score(P_{Credit})$, $Score(P_{Sales})$ and $Score(P_{Reviews})$, differ in scale, so the scores must first be normalized before fusion. The simple Min-Max normalization is used here, with the minimum and maximum taken over all commodities within each dimension:

$NScore(P_x) = \dfrac{Score(P_x) - \min Score(P_x)}{\max Score(P_x) - \min Score(P_x)}, \quad x \in \{Credit, Sales, Reviews\}$

The comprehensive commodity scoring model is as follows:

$FinalScore(P) = \alpha\, NScore(P_{Credit}) + \beta\, NScore(P_{Sales}) + \gamma\, NScore(P_{Reviews})$

In the formula, $\alpha + \beta + \gamma = 1$, where $\alpha$, $\beta$ and $\gamma$ are the weights of the respective dimensions; repeated experiments gave $\alpha = 0.23$, $\beta = 0.14$ and $\gamma = 0.63$.
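The normalization and linear fusion can be sketched as follows. The weights are the values reported in the text; the function names, the handling of a dimension in which all scores are equal, and the sample scores are assumptions.

```python
def min_max(values):
    """Min-Max normalize a list of scores to the range [0, 1]."""
    low, high = min(values), max(values)
    if high == low:
        return [0.0 for _ in values]  # assumed convention when all scores are equal
    return [(v - low) / (high - low) for v in values]


def final_scores(credit, sales, reviews, alpha=0.23, beta=0.14, gamma=0.63):
    """Fuse the three normalized dimensions by linear weighting.

    credit, sales, reviews: per-commodity scores, in the same commodity order.
    """
    n_credit, n_sales, n_reviews = min_max(credit), min_max(sales), min_max(reviews)
    return [
        alpha * c + beta * s + gamma * r
        for c, s, r in zip(n_credit, n_sales, n_reviews)
    ]


# Invented scores for three commodities
fused = final_scores(credit=[3.0, 4.0, 5.0], sales=[100, 300, 200], reviews=[0.2, 0.9, 0.5])
```

Because the review dimension carries a weight of 0.63, the ranking produced by `final_scores` is driven mainly by the review text sentiment score, consistent with the description of the method.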

To verify the effect of the product feature clustering sentiment score and the multi-dimensional fusion algorithm proposed by the present invention, the following are compared for accuracy: the review text sentiment score calculation from the literature (Old-Score), the clustered review text sentiment score (New-Score), the clustered review text sentiment score combined with the sales index (New-Sales-Score), the clustered review text sentiment score combined with shop reputation (New-Credit-Score), the multi-dimensional fusion score (Mix-Score), and the manually labelled comprehensive phone score (Lable-Score). The comparison is shown in Table 4:

Table 4 Experimental comparison results (Accuracy)

As the table shows, the comprehensive scoring of e-commerce products based on multi-dimensional information fusion according to the present invention has the highest accuracy, identifies high-quality commodities efficiently and simply, and makes the resulting recommendation system faster and more accurate.

Claims

1. a kind of electric business product comprehensive score method based on multi-dimension information fusion, it is characterized in that including the following steps：

Step S1：Various dimensions acquisition of information uses web crawlers tool, crawls retail shop's information, the commodity pin of dependent merchandise first Information and the comment text information of commodity are measured, various dimensions information is parsed, is persisted in database by programming；

Step S2: The data obtained in step S1 is preprocessed. A program written in Java performs deduplication, data transformation, data reduction, and similar operations on the structured data, while the review text is processed with the Chinese Academy of Sciences NLPIR word segmentation tool for word segmentation, part-of-speech tagging, and stop-word removal;

Step S3: Multi-dimensional information mining. Using the data preprocessed in step S2, the shop information and commodity sales volume information are analyzed, and the shop credit index and the commodity sales volume index are calculated through data induction and principal component regression analysis, respectively. For the review text, product feature word pairs are extracted, a sentiment dictionary is constructed, the product features are clustered, and the sentiment score of each product feature is calculated. To analyze the sentiment score of a commodity more comprehensively and objectively, product feature weights are also included in the sentiment score calculation, in addition to the effect of degree adverbs and negation words on the sentiment score of an evaluation phrase;

Step S4: Commodity total score calculation. The three index scores obtained by the analysis in step S3, namely the shop credit index, the commodity sales volume index, and the commodity sentiment score, are each given a weight, and the comprehensive score of the commodity is calculated by linear weighting.

2. The e-commerce product comprehensive scoring method based on multi-dimensional information fusion according to claim 1, characterized in that, in step S1, data is acquired by using a web crawler tool to automatically crawl the shop information, commodity sales volume information, and commodity review text of the relevant commodities; the multi-dimensional information is parsed and then persisted to the corresponding database programmatically.

3. The e-commerce product comprehensive scoring method based on multi-dimensional information fusion according to claim 1, characterized in that, in step S2, data preprocessing uses a program written in Java to perform deduplication, data transformation, data reduction, and similar operations on the structured data, while the review text is processed with the Chinese Academy of Sciences NLPIR word segmentation tool for word segmentation, part-of-speech tagging, and stop-word removal.

4. The e-commerce product comprehensive scoring method based on multi-dimensional information fusion according to claim 1, characterized in that, in step S3, the shop credit index and the commodity sales volume index are obtained as follows: the shop credit index is composed of the shop's basic score, the shop's dynamic score over the last half year, and the commodity's service score over one month, and is derived by induction after analyzing the business meaning of each indicator; the commodity sales volume index is obtained by applying PCA dimensionality reduction to the crawled information on factors influencing sales volume to construct principal components, and then determining the sales volume regression index through principal component regression analysis.

5. The e-commerce product comprehensive scoring method based on multi-dimensional information fusion according to claim 1, characterized in that, in step S3, the review-text sentiment score is quantified as follows: product feature word pairs are extracted from the review text, a sentiment dictionary is constructed, the product features are clustered, and the sentiment score of each product feature is calculated. To analyze the sentiment score of a commodity more comprehensively and objectively, product feature weights are also included in the sentiment score calculation, in addition to the effect of degree adverbs and negation words on the sentiment score of an evaluation phrase.

6. The e-commerce product comprehensive scoring method based on multi-dimensional information fusion according to claim 1, characterized in that, in step S4, the multi-dimensional commodity information is fused, including: the shop credit score, which reflects the shop's basic condition and reputation; the commodity sales volume index, which reflects the commodity's popularity on the market; and the commodity review-text sentiment score, where the reviews record consumers' experience after buying and using the commodity, and the sentiment orientation of these reviews is analyzed and quantified as a scalar value by which commodity performance is judged. Each score is given a weight, and the comprehensive score of the commodity is calculated by linear weighting.