CN108491377A - Comprehensive scoring method for e-commerce products based on multi-dimensional information fusion - Google Patents

Info

Publication number
CN108491377A
Authority
CN
China
Prior art keywords
information
commodity
score
index
product
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810181878.8A
Other languages
Chinese (zh)
Other versions
CN108491377B (en
Inventor
徐新胜
余建浙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Jiliang University
Original Assignee
China Jiliang University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Jiliang University
Priority to CN201810181878.8A
Publication of CN108491377A
Application granted
Publication of CN108491377B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/216 Parsing using statistical methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/18 Complex mathematical operations for evaluating statistical data, e.g. average values, frequency distributions, probability functions, regression analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0282 Rating or review of business operators or products

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Data Mining & Analysis (AREA)
  • Accounting & Taxation (AREA)
  • Health & Medical Sciences (AREA)
  • Strategic Management (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Development Economics (AREA)
  • Finance (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Mathematical Physics (AREA)
  • Computational Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Pure & Applied Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Marketing (AREA)
  • Operations Research (AREA)
  • Evolutionary Biology (AREA)
  • Algebra (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Databases & Information Systems (AREA)
  • Software Systems (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Business, Economics & Management (AREA)
  • Economics (AREA)
  • Game Theory and Decision Science (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a comprehensive scoring method for e-commerce products based on multi-dimensional information fusion. The method comprises: acquisition of multi-dimensional information about the e-commerce product, mainly shop information, sales information and review text; data preprocessing, in which numerical data undergo data cleaning and data transformation, and review text undergoes word segmentation, part-of-speech tagging and similar processing; multi-dimensional information mining, in which the shop information and product sales information are processed by data induction and principal component regression analysis to obtain a shop information index and a product sales index, and the review text undergoes sentiment analysis, with a product feature score radar chart obtained by quantification and clustering; and calculation of the overall product score, in which a fusion function is designed to compute the overall score. The method can be applied in product recommendation systems based on product information; it identifies the best products efficiently and simply, and makes the resulting recommendation system faster and more accurate.

Description

A comprehensive scoring method for e-commerce products based on multi-dimensional information fusion
Technical field
The present invention relates to the fields of natural language processing and data mining, and in particular to a product evaluation method based on multi-dimensional information.
Background art
With the continuous advance of Internet information technology, the e-commerce industry has grown rapidly and e-commerce platforms have become an important channel for online shopping. At the same time, consumers often face difficulties when buying goods online, such as counterfeit and shoddy products, false advertising and difficulty in choosing. Although many e-commerce platforms provide consumer feedback mechanisms, two problems remain crucial: how to identify valuable reference information quickly and effectively within the vast amount of feedback accumulated online, and how to use that high-value information to assess product quality concisely and efficiently. Some related research already exists. Tian Bo et al. introduced perceived trust and a trust system and fused the data associated with e-commerce products to describe the degree of trust in a product, proposing an e-commerce recommendation trust evaluation model. Li et al. combined historical transactions with the current transaction value to propose a quantitative product risk assessment method based on credit value, credit rating and product price. Wei et al., building on an early-warning index system for e-commerce credit risk and considering both current trading behavior and transaction history, proposed an e-commerce transaction evaluation model that integrates trust and risk. Pang classified the sentiment orientation of film review texts to obtain the sentiment category of each film. Shi Wei et al. used HowNet and TF-IDF weighting to mine the sentiment polarity and sentiment intensity of microblog comments. Lin et al., considering the semantic differences caused by the different positions at which modifiers appear in a review, proposed a product review analysis system based on sentiment analysis.
However, the scholars above either consider only the structured data in consumer feedback, computing a numerical value from a model of the structured data to measure product quality or purchase risk, or mine only the unstructured data, quantifying the sentiment orientation of the review text as the sentiment score of the evaluated object. The present method analyzes structured and unstructured information together: by quantifying shop reputation, product sales and review text sentiment, it builds a multi-dimensional comprehensive product rating model that gives consumers, and even manufacturers, a more accurate and objective overall score for a product.
Summary of the invention
The technical problem to be solved by the present invention is to provide a comprehensive scoring method for e-commerce products based on multi-dimensional information fusion. The method crawls the shop information, product sales information and review text related to an e-commerce product. For the shop information and product sales information, a shop information index and a product sales index are calculated through data induction and principal component regression analysis. For sentiment mining of the review text, product features are extracted using Chinese chunking; candidate product features are filtered using the frequent itemsets generated by the Apriori algorithm and a TF-IDF threshold to obtain the product feature set; the candidate feature set is clustered and product feature scores are quantified to obtain a product feature score radar chart; finally, the multi-dimensional information is fused to give an overall product score. The method identifies the best products efficiently and simply and makes the resulting recommendation system faster and more accurate.
To this end, the machine-learning-based method for analyzing customer satisfaction with e-commerce products proposed by the present invention comprises the following steps:
Step S1: Multi-dimensional information acquisition. A web crawler tool is first used to crawl the shop information, product sales information and review text of the relevant products; the multi-dimensional information is parsed and persisted to a database by a program;
Step S2: Data preprocessing. A program written in Java performs deduplication, data transformation, data reduction and similar operations on the structured data, while the review text is segmented, part-of-speech tagged and stripped of stop words using the Chinese Academy of Sciences NLPIR word segmentation tool;
Step S3: Multi-dimensional information mining. The shop information and product sales information are analyzed, and a shop credit index and a product sales index are calculated through data induction and principal component regression analysis. For the review text, product feature word pairs are extracted, a sentiment dictionary is constructed, product features are clustered and product feature sentiment scores are calculated. To analyze the sentiment score of a product more comprehensively and objectively, product feature weights are also included in the sentiment score calculation, in addition to the effect of degree adverbs and negation words on the sentiment score of an evaluation phrase;
Step S4: Overall product score calculation. Each of the shop credit index, product sales index and product sentiment score obtained in the multi-dimensional information mining stage is given a weight, and the comprehensive product score is calculated by linear weighting.
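The linear weighting in Step S4 can be illustrated with a minimal Python sketch. The function name `fuse_scores`, the example weights and the requirement that the weights sum to 1 are assumptions made for illustration; the text does not state the weights actually used.

```python
def fuse_scores(credit_index, sales_index, sentiment_score,
                weights=(0.2, 0.2, 0.6)):
    """Linear weighted fusion of the three dimension scores.

    The default weights are hypothetical; they are not taken from the patent.
    """
    w_credit, w_sales, w_sentiment = weights
    if abs(w_credit + w_sales + w_sentiment - 1.0) > 1e-9:
        raise ValueError("weights are assumed to sum to 1")
    return (w_credit * credit_index
            + w_sales * sales_index
            + w_sentiment * sentiment_score)


# Hypothetical scores, all on a 0-1 scale
total = fuse_scores(credit_index=0.8, sales_index=0.5, sentiment_score=0.9)
```

Keeping the three scores on a common scale before fusion makes the weights directly comparable.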
Compared with the prior art, the beneficial effects of the present invention are as follows. The invention proposes a comprehensive scoring method for e-commerce products based on multi-dimensional information fusion, which fuses and analyzes multi-dimensional data about e-commerce products and studies the comprehensive score more fully and at a finer granularity. First, the score is based on multi-dimensional product data, namely a shop reputation model, a product sales index model and a review text sentiment score; it relies mainly on the review text sentiment score, combined with the credit index and the sales index, so that products are scored more accurately, comprehensively and objectively. Second, at the level of product features, a feature word set and a feature-based sentiment word set are extracted; combining product feature weights, sentiment degree words and negation words, the product features are clustered with an improved k-means++ clustering algorithm, the score of each cluster is calculated from the clustering result, and the review text sentiment score is calculated using the weight of each cluster. Finally, the multi-dimensional scores are fused by weighting to give the final product evaluation score, so that the best products can be identified efficiently and simply and the resulting recommendation system is faster and more accurate.
Description of the drawings
Fig. 1 is a flow diagram of the comprehensive scoring method for e-commerce products based on multi-dimensional information fusion in a specific embodiment of the invention.
Detailed description of the embodiments
To make the objects, technical solutions and advantages of the present invention clear, specific embodiments of the invention are described clearly and completely below.
Fig. 1 shows the flow chart of the comprehensive scoring method for e-commerce products based on multi-dimensional information fusion in this embodiment.
The method comprises: Step S1, multi-dimensional information acquisition, in which a web crawler tool is first used to crawl the shop information, product sales information and review text of the relevant products, and the multi-dimensional information is parsed and persisted to a database by a program; Step S2, data preprocessing, in which a program written in Java performs deduplication, data transformation, data reduction and similar operations on the structured data, while the review text is segmented, part-of-speech tagged and stripped of stop words using the Chinese Academy of Sciences NLPIR word segmentation tool; Step S3, multi-dimensional information mining, in which the shop information and product sales information are analyzed and a shop credit index and a product sales index are calculated through data induction and principal component regression analysis, after which product feature word pairs are extracted from the review text, a sentiment dictionary is constructed, product features are clustered and product feature sentiment scores are calculated; to analyze the sentiment score of a product more comprehensively and objectively, product feature weights are also included in the sentiment score calculation, in addition to the effect of degree adverbs and negation words on the sentiment score of an evaluation phrase; Step S4, overall product score calculation, in which each of the shop credit index, product sales index and product sentiment score obtained in the multi-dimensional information mining stage is given a weight, and the comprehensive product score is calculated by linear weighting.
In a specific embodiment, the method can be carried out as follows (the description below uses the score analysis of several mobile phone models on mainstream e-commerce websites as an example, and a concrete example is given after each operating step):
Step S1: Using Python's Scrapy crawler framework, product information for several mobile phone models is crawled at random from the mainstream e-commerce platforms JD.com, Suning and Tmall, including review text, sales information and the corresponding shop information. The crawled information contains duplicates and missing values, which are handled by deduplication and missing-value filling. The final data set contains 18,666 reviews covering 53 mobile phone models sold by 14 shops. JD.com and Suning.com rely on self-operated sales, so their sales model is one-to-many; Tmall is a platform that shops join, so its sales model is many-to-many: several phones may be crawled from one shop, and one phone may correspond to several shops. The data are then persisted to a MySQL database.
Step S2: The acquired text is processed, mainly by word segmentation, part-of-speech tagging and word frequency statistics, after which the segmentation result is filtered using stop words and low-frequency words. The sub-steps are as follows:
1) Word segmentation and part-of-speech tagging: In written English, words are separated by spaces as natural delimiters. In Chinese, only characters, sentences and paragraphs can be delimited simply by explicit delimiters; words have no formal delimiter. English also has the problem of dividing phrases, but at the word level Chinese is far more complex and difficult than English. Chinese word segmentation means cutting a sequence of Chinese characters into individual words. Part-of-speech tagging marks the part of speech of each word in the segmentation result; the words of Modern Chinese can be divided into two broad classes and 14 parts of speech. Many Chinese word segmentation and part-of-speech tagging tools are available. ICTCLAS, a Chinese lexical analysis system, is one of the earliest open-source Chinese word segmentation projects; it took first place in the evaluation organized by the domestic 973 expert group and several first places in the first (2003) international evaluation organized by SIGHAN, the Chinese language processing research body. Language Cloud (Language Technology Platform Cloud, LTP-Cloud) is a cloud natural language processing service platform developed by the Research Center for Social Computing and Information Retrieval at Harbin Institute of Technology; backed by the Language Technology Platform, it offers rich and efficient natural language processing services including word segmentation, part-of-speech tagging, dependency parsing, named entity recognition and semantic role labeling. The "jieba" Chinese word segmentation project aims to be the best Python Chinese word segmentation component. Considering segmentation accuracy, efficiency and simplicity, we selected the jieba Chinese word segmentation tool (tool website: http://www.oschina.net/p/jieba).
2) Word frequency statistics on the segmentation result: A dictionary container is created, with each word of the segmentation result as the key and the frequency of that word as the value. A dictionary stores key-value pairs, and its keys must be unique. The segmentation result is traversed and stored in the dictionary container to obtain the word frequencies of the whole result.
3) Filtering of low-frequency words and stop words: Low-frequency words are words that occur rarely in the word frequency statistics; generally, words occurring fewer than 3 times are filtered out. Stop words are words that are automatically filtered out before or after processing natural language data (or text) in information retrieval, in order to save storage space and improve search efficiency, for example words such as "I". These stop words are entered manually rather than generated automatically, and together they form a stop word list.
4) Filtering of the segmentation result: The low-frequency words and stop words that appear in the segmentation result are filtered out.
We selected the following reviews of a mobile phone on Taobao as examples:
1 "A very good phone; the workmanship and texture are excellent and the looks are off the charts."
2 "JD's logistics is great. The phone is already in use and works normally; good quality at a low price, worth recommending."
3 "The phone is fine; it runs fast and the call sound quality is good."
The official jieba documentation describes part-of-speech tagging as follows: jieba.posseg.POSTokenizer(tokenizer=None) creates a custom tokenizer, where the tokenizer parameter specifies the jieba.Tokenizer used internally. jieba.posseg.dt is the default part-of-speech tagging tokenizer. It tags the part of speech of each word after sentence segmentation, using a notation compatible with ICTCLAS. It is used as follows:
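import jieba.posseg as pseg
# Sample review 1, shown in English translation; the method is applied to the original Chinese text
sentence = 'A very good phone; the workmanship and texture are excellent and the looks are off the charts.'
# Each pair returned by pseg.cut prints as "word/flag"
result = [str(a) for a in pseg.cut(sentence)]
print(" ".join(result))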
Applying the segmentation and part-of-speech tagging step above to sample text 1 gives output in which words are separated by spaces and each word is followed by a slash and its part of speech. The result is as follows:
"very/d good/a 的/uj phone/n ,/x workmanship/v texture/n excellent/d 的/uj ,/x looks/n off-the-charts/v ./x", where v denotes a verb, n a noun, a an adjective, d an adverb, uj an auxiliary particle and x a non-morpheme token.
Word frequency statistics are then carried out on the segmentation result above.
The result of the word frequency count is: {'very': 1, 'good': 1, '的': 2, 'phone': 1, 'workmanship': 1, 'texture': 1, 'excellent': 1, 'looks': 1, 'off-the-charts': 1}. Each word and its frequency are stored in the dictionary container as a key-value pair; given a threshold, words whose frequency is below the threshold are treated as low-frequency words.
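The word frequency count and the subsequent filtering can be sketched in Python as below. This is only an illustration: the helper name `filter_tokens`, the toy token list and the stop word set are hypothetical, and the default threshold of 3 follows the low-frequency rule described in Step S2.

```python
from collections import Counter


def filter_tokens(tokens, stop_words, min_count=3):
    """Count word frequencies, then drop stop words and low-frequency words."""
    freq = Counter(tokens)  # dictionary container: word -> number of occurrences
    kept = [t for t in tokens
            if t not in stop_words and freq[t] >= min_count]
    return freq, kept


# Toy segmentation result (hypothetical data)
tokens = ["phone", "good", "phone", "screen", "the", "phone", "the", "the"]
freq, kept = filter_tokens(tokens, stop_words={"the"}, min_count=3)
```

`Counter` is a dictionary subclass, so it matches the key-value container described above while avoiding a manual counting loop.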
Step S3: Multi-dimensional information mining mainly comprises the calculation of the shop reputation and sales indices, the extraction and filtering of product feature words, the extraction of product feature word pairs, the clustering of product feature words and the quantified sentiment score of the review text. The sub-steps are as follows:
1) Calculation of the shop reputation and sales indices: By integrating online shop information with a questionnaire survey, the indicators finally chosen for evaluating product reputation are shown in Table 1.
Table 1 Product reputation indicators
As Table 1 shows, the product reputation score is composed mainly of the shop's basic score, the shop's half-year dynamic score and the product's one-month service score. By analyzing the business meaning of each indicator, the product reputation score is then summarized in the following formula:
$STORE_{reputation} = \alpha \times BIS + \beta \times SDS, \quad \alpha + \beta = 1$ (3)
In formula (1), one term represents the industry average level of the guarantee deposit; in formula (3), $\alpha$ and $\beta$ are weight parameters; the remaining parameters are explained in Table 1.
The PCA regression for the product sales index is described mathematically as follows. Suppose n factors influence product sales, denoted
$X = \{x_1, x_2, \ldots, x_i, \ldots, x_n\}, \quad i = 1, 2, 3, \ldots, n$
and
In the formula, $\theta_1, \theta_2, \ldots, \theta_m$ denote the rotation factors of the principal components, and $P_1, P_2, \ldots, P_m$ denote the principal components obtained as the product of the rotation factors and the influencing factors. The contribution of each principal component is then calculated to determine the number of principal components. Suppose M principal components are chosen; taking the M principal components as independent variables and the product sales index as the dependent variable, the following regression model is established:
In formula (4), $w_0$ is the bias parameter, M is the number of principal components chosen and $\Phi_j(P)$ is a basis function; here $\Phi_j(P) = P_j$ is taken, which gives a simple multiple linear regression.
2) Extraction and filtering of product feature words: Chunk parsing is a kind of syntactic analysis. In a natural language processing system it can serve either as a subtask of syntactic analysis or as a bridge from lexical analysis to syntactic analysis. Based on the segmentation result from step S2 and the context of each word, each word is manually labelled with a Chinese chunk tag to form the training samples. With the manual chunk annotations split into a training set and a test set in a given proportion, a product feature extraction model is trained; once trained, the model extracts product features from the whole review data set, but the extracted features include a certain number of non-product features. A computer cannot automatically tell whether a candidate feature word is a real product feature. Based on the assumption that product features recur in review text, the Apriori algorithm can be used to find the product features that form frequent itemsets and take them as candidate product features. Inspection of the candidate feature set, however, reveals many nouns that are not product features; these nouns are defined as stop words. To obtain the product feature set more accurately, the candidate product features must be filtered again with a suitable filtering algorithm.
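Returning to the sales index, the principal component regression described above can be sketched with NumPy. This is a minimal sketch under stated assumptions: it uses plain PCA via SVD without any factor rotation, takes the regression as the linear form implied by $\Phi_j(P) = P_j$, and runs on synthetic data. The names `pcr_fit` and `pcr_predict` are hypothetical.

```python
import numpy as np


def pcr_fit(X, y, n_components):
    """Principal component regression: PCA on X, then least squares on the components."""
    mean = X.mean(axis=0)
    Xc = X - mean
    # Rows of vt are the principal directions, ordered by explained variance.
    _, s, vt = np.linalg.svd(Xc, full_matrices=False)
    explained = s ** 2 / np.sum(s ** 2)      # contribution of each component
    components = vt[:n_components]           # shape (M, n)
    P = Xc @ components.T                    # component scores, shape (samples, M)
    A = np.column_stack([np.ones(len(P)), P])
    w, *_ = np.linalg.lstsq(A, y, rcond=None)  # w[0] plays the role of w_0
    return mean, components, w, explained


def pcr_predict(X, mean, components, w):
    """Project new data onto the components and apply the linear model."""
    P = (X - mean) @ components.T
    return w[0] + P @ w[1:]


# Synthetic influencing factors and sales index (not data from the patent)
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = 1.5 + X @ np.array([2.0, -1.0, 0.5])
mean, components, w, explained = pcr_fit(X, y, n_components=3)
pred = pcr_predict(X, mean, components, w)
```

In practice `n_components` would be chosen from the cumulative values in `explained`, which corresponds to determining the number of principal components from their contributions.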
The detailed steps of product feature extraction are as follows:
1. Determine the itemsets and support count for the Apriori algorithm. The itemset X is defined as the initial set obtained after Chinese chunk analysis. The transaction set T is defined as the set of user reviews downloaded from the web, where one user review is denoted $t_i$ ($1 \le i \le n$). Therefore $T = \{t_1, t_2, \ldots, t_n\}$.
The support count is expressed as:
The support is expressed as:
where X and Y are mutually disjoint itemsets (i.e. $X \cap Y = \emptyset$) and N is the number of user reviews $t_i$.
Finally, the minimum support is set to 1% and the frequent itemsets in the transaction set are found; the resulting frequent itemsets are taken as candidate product features.
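The frequent itemset search can be illustrated with a small level-wise Apriori sketch in Python. It assumes support is the fraction of reviews that contain an itemset, which is the usual definition; the function name `apriori` and the toy reviews are hypothetical, and a higher minimum support than 1% is used so that the tiny example filters something.

```python
from itertools import combinations


def apriori(transactions, min_support=0.01):
    """Return {itemset (frozenset): support} for all frequent itemsets."""
    n = len(transactions)
    sets = [frozenset(t) for t in transactions]

    def support(itemset):
        # Fraction of transactions (reviews) that contain the itemset
        return sum(itemset <= t for t in sets) / n

    level = {frozenset([item]) for t in sets for item in t}
    frequent = {}
    k = 1
    while level:
        current = {c: support(c) for c in level}
        current = {c: s for c, s in current.items() if s >= min_support}
        frequent.update(current)
        k += 1
        # Join step: unions of two frequent itemsets that have exactly k items
        candidates = {a | b for a, b in combinations(current, 2)
                      if len(a | b) == k}
        # Prune step: every (k-1)-subset must itself be frequent
        level = {c for c in candidates
                 if all(frozenset(sub) in current
                        for sub in combinations(c, k - 1))}
    return frequent


# Toy reviews, each reduced to its candidate feature words (hypothetical data)
reviews = [{"screen", "battery"}, {"screen", "camera"},
           {"screen", "battery", "camera"}, {"price"}]
frequent = apriori(reviews, min_support=0.5)
```

Here "price" is dropped because it appears in only one of four reviews, while "screen" and the pairs containing it survive.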
2. Filter stop words. Product feature stop words are constructed by inspecting the candidate product features and combining existing stop words available online. There are three main classes of stop words: product names, such as "Xiaomi", "Meizu" and "Huawei"; personal nouns, such as "auntie", "colleague" and "friend"; and locative and time words, such as "inside", "morning" and "evening". A simple computer program matches the candidate product features against the stop words and filters them; the remaining product features form the preliminary product feature set.
3. Filter the preliminary product features with the TF-IDF (Term Frequency-Inverse Document Frequency) algorithm.
The TF-IDF algorithm is calculated as follows:
$\text{TF-IDF} = TF_{i,j} \times IDF_i$ (7)
In formula (5), $n_{i,j}$ is the number of times a product feature word appears in review $d_j$, and $\sum_k n_{k,j}$ is the total number of words in that review. In formula (6), $|D|$ is the total number of reviews and $|\{j : t_i \in d_j\}|$ is the number of reviews containing the product feature word $t_i$.
After repeated verification experiments, the TF-IDF values of most non-product feature words were found to be 0.005 or above, so the filtering threshold is set to 0.005; the final product feature set is obtained after filtering.
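A short Python sketch of the TF-IDF computation follows. It assumes that formulas (5) and (6) are the usual term frequency and inverse document frequency definitions, which is consistent with the symbol descriptions above; the natural logarithm, the function name `tf_idf` and the toy reviews are assumptions.

```python
import math


def tf_idf(term, doc, docs):
    """TF-IDF of `term` in the tokenised review `doc`, given all reviews `docs`."""
    tf = doc.count(term) / len(doc)                 # n_ij / sum_k n_kj
    containing = sum(1 for d in docs if term in d)  # |{j : t_i in d_j}|
    if containing == 0:
        return 0.0
    idf = math.log(len(docs) / containing)          # log base is an assumption
    return tf * idf


# Toy tokenised reviews (hypothetical data)
docs = [["screen", "is", "clear"],
        ["battery", "lasts"],
        ["screen", "cracked", "fast", "shipping"]]
score = tf_idf("battery", docs[1], docs)
```

A candidate feature's score could then be compared against the 0.005 threshold from the text.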
3) Extraction of product feature word pairs: Hu et al. assume that a feature and a sentiment word occur together in a review sentence. Under this assumption, once the product features in a review have been obtained, a string of a certain length before and after each product feature is selected, and the sentiment word chunk near the feature is extracted as the sentiment word of that feature, forming a product feature-sentiment pair of the form (feature, sentiment word). A distance window method is used here with a window size of 6, i.e. sentiment words are sought within 6 strings before and after the product feature, and the membership degree algorithm proposed by Raymond Y.K. Lau et al. is used to evaluate the sentiment words and so extract the product feature-sentiment pairs. The core formula of the algorithm is as follows:
where Pr(f) and Pr(m) denote the probabilities that the feature and the opinion word, respectively, appear in the window; two further terms denote the probabilities that the feature and the opinion word, respectively, do not appear in the window; Pr(f, m) denotes the probability that the feature and the opinion word appear together; a final term denotes the probability that neither appears; and ω is the weight that adjusts positive and negative membership. To ensure that the membership value lies in [0, 1], the membership degree is normalized to give:
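The distance window step can be sketched in Python as below. The sketch only generates candidate (feature, sentiment word) pairs and does not implement the membership degree scoring. It assumes the window is measured in tokens; the function name `candidate_pairs` and the toy data are hypothetical.

```python
def candidate_pairs(tokens, features, sentiment_words, window=6):
    """Pair each feature with every sentiment word within `window` tokens of it."""
    pairs = []
    for i, tok in enumerate(tokens):
        if tok not in features:
            continue
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i and tokens[j] in sentiment_words:
                pairs.append((tok, tokens[j]))
    return pairs


# Toy tokenised review (hypothetical data); a small window keeps the example short
tokens = ["phone", "screen", "very", "clear", "but", "battery", "weak"]
pairs = candidate_pairs(tokens, {"screen", "battery"}, {"clear", "weak"}, window=2)
```

The spurious pair ("battery", "clear") in the result shows why a scoring step such as the membership degree is needed to rank the candidates.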
4) cluster of product features word:Since product feature fine granularity is excessive, need to cluster all product features, Traditional K-Means clustering algorithms are simple and are easily achieved, and good Clustering Effect is obtained in many application scenarios, but from K- It is found during Means algorithms, the number K of the cluster centre in K-Means algorithms needs to specify in advance, for product feature Cluster, due to merchandise classification difference choose K values certainly be change, significant limitation is had based on this K-Means algorithm. Therefore, it is clustered herein using improved K-Means++ algorithms, initialization procedure of the K-Means++ algorithms in cluster centre In basic principle be so that the mutual distance between initial cluster centre as far as possible, can avoid the occurrence of above-mentioned in this way Problem.Improved K-Means++ product features term clustering algorithm description:
Input: product feature set {F1, F2, …, Fn}; the similarity matrix (that is, the distance matrix) D of the product feature words, where Di,j = WSim(Fi, Fj); and the 100-dimensional word vectors of the product features.
Output: product feature clustering result.
Step1: Randomly select a feature word Fi from the product feature set as the initial cluster centre C1.
Step2: First compute the distance between each product feature word and Fi, namely Di,j; then compute the probability that each feature word is chosen as the next cluster centre; finally, determine the K cluster centres by roulette-wheel selection.
Step3: For each feature word Fk in the product feature set, compute its distance to each of the K centres and assign it to the cluster of the nearest centre.
Step4: For each feature word class Ci, recompute its cluster centre (that is, the centroid of the cluster).
Step5: Repeat Step3 and Step4 until the positions of the cluster centres no longer change.
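Step1 to Step5 can be sketched in a few lines of Python. This is a minimal illustration, not the patent's implementation: the selection probability formula did not survive in the text, so the sketch assumes the standard K-Means++ rule (probability proportional to the squared distance to the nearest centre already chosen), uses squared Euclidean distance on the word vectors in place of WSim, and the function names are hypothetical.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeanspp_init(vectors, k, rng):
    """Step1 and Step2: choose k initial centres by roulette-wheel selection."""
    centers = [vectors[rng.randrange(len(vectors))]]  # Step1: random first centre
    while len(centers) < k:
        # Weight of each point: squared distance to its nearest chosen centre
        # (assumed rule; the patent's own formula is missing from the text).
        weights = [min(squared_distance(v, c) for c in centers) for v in vectors]
        total = sum(weights)
        if total == 0:  # every point coincides with a centre already
            centers.append(vectors[rng.randrange(len(vectors))])
            continue
        # Roulette wheel: walk the cumulative weights until they pass a random mark.
        mark = rng.random() * total
        cumulative = 0.0
        for v, w in zip(vectors, weights):
            cumulative += w
            if cumulative > mark:
                centers.append(v)
                break
        else:  # guard against floating-point rounding at the end of the wheel
            centers.append(vectors[-1])
    return centers


def kmeans(vectors, k, seed=0, max_iter=100):
    """Step3 to Step5: assign, recompute centroids, repeat until stable."""
    rng = random.Random(seed)
    centers = kmeanspp_init(vectors, k, rng)
    labels = None
    for _ in range(max_iter):
        # Step3: assign each vector to its nearest centre.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(v, centers[j]))
            for v in vectors
        ]
        if new_labels == labels:  # Step5: unchanged assignments, so centres are stable
            break
        labels = new_labels
        # Step4: move each centre to the centroid of its members.
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers
```

In practice the inputs would be the 100-dimensional word vectors of the feature words; K still has to be supplied by the caller.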
Combining Zhongguancun's classification of mobile phone parameters with the characteristics of the comment information, the mobile phone evaluation objects are grouped into 6 product attribute feature classes: screen, hardware, network, camera, appearance, and function and service. The weight of each class is the sum of the weights of the product features in its cluster. The results are shown in Table 2:
Table 2 Product feature clustering results
5) Quantifying the sentiment score of comment text: The sentiment modifier coefficients are set as follows. The 219 degree adverbs in HowNet and the degree adverbs screened from the comment set are combined into a degree adverb set, which is divided into 5 levels with degree coefficients of 0.6, 0.8, 1.2, 1.4, and 1.6 in turn. If a comment contains no degree adverb, the degree coefficient is set to 1. The degree coefficient of negation words is uniformly set to -1. The sentiment dictionaries used are HowNet, NTUSD, and the Chinese Emotion Vocabulary Ontology, as shown in Table 3:
Table 3 Sentiment dictionaries
Sentiment dictionary         Positive words   Neutral words   Negative words   Total words
HowNet                       4566             /               4370             8851
Chinese emotion dictionary   11229            5375            10783            27466
NTUSD                        2846             /               8325             10027
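The coefficient scheme described before Table 3 can be illustrated with a short Python sketch. The five degree coefficients, the default of 1, and the negation coefficient of -1 come from the text; how they are combined is an assumption here (the coefficients are multiplied with a word polarity of +1, 0, or -1), and the function name and its parameters are hypothetical.

```python
# Degree coefficients for the five adverb levels, as listed in the text.
DEGREE_COEFFICIENTS = [0.6, 0.8, 1.2, 1.4, 1.6]
NEGATION_COEFFICIENT = -1  # uniform coefficient for negation words (from the text)


def phrase_score(polarity, degree_level=None, negations=0):
    """Score one evaluation phrase.

    polarity:     +1, 0 or -1 from a sentiment dictionary lookup (assumed encoding)
    degree_level: 1..5 for the degree adverb level, or None if no adverb is present
    negations:    number of negation words attached to the phrase
    """
    # Without a degree adverb the coefficient is 1, as stated in the text.
    degree = 1.0 if degree_level is None else DEGREE_COEFFICIENTS[degree_level - 1]
    # Assumption: coefficients combine multiplicatively, one factor per negation.
    return polarity * degree * NEGATION_COEFFICIENT ** negations
```

For example, a positive opinion word with a level-3 adverb scores 1.2, and the same word preceded by one negation and no adverb scores -1.0.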
With the extraction and sentiment dictionary quantization described above, the comment text sentiment score of each mobile phone can be calculated. The product features are first clustered into six dimensions. Drawing a radar chart shows how the phone performs in each aspect, and the size of the hexagon gives an intuitive indication of which phone performs better overall. The weighted average of the scores on the six dimensions is then taken as the phone's final comment text sentiment score.
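The weighted average over the six dimensions might look like the following Python sketch. The class weights of Table 2 are not reproduced in the text, so the caller must supply them; the function name is hypothetical, and dividing by the sum of the weights is an assumption that keeps the result a true average even if the weights do not sum to 1.

```python
def review_sentiment_score(dimension_scores, dimension_weights):
    """Weighted average of per-dimension sentiment scores.

    Both arguments are dicts keyed by feature class, for example
    "screen", "hardware", "network", "camera", "appearance",
    "function and service".
    """
    total_weight = sum(dimension_weights[d] for d in dimension_scores)
    if total_weight == 0:
        raise ValueError("dimension weights must not sum to zero")
    weighted_sum = sum(
        dimension_scores[d] * dimension_weights[d] for d in dimension_scores
    )
    return weighted_sum / total_weight
```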
Step 4: Fusing the three dimensions allows a commodity to be evaluated comprehensively and helps consumers make purchasing decisions. The scores of the three dimensions, Score(P_Credit), Score(P_Sales), and Score(P_Reviews), differ in scale, so the scores must be normalized before fusion. The relatively simple Min-Max normalization is used here, with the following formula:
The comprehensive commodity scoring model is as follows:
FinalScore(P) = α·NScore(P_Credit) + β·NScore(P_Sales) + γ·NScore(P_Reviews)
In the formula, α, β, and γ are the weights of the respective dimensions, with α + β + γ = 1. Repeated experiments gave α = 0.23, β = 0.14, and γ = 0.63.
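The fusion step can be sketched in Python as follows. The weights are the values reported in the text. The normalization formula itself is missing from the text, so the sketch assumes the usual Min-Max definition, (x - min) / (max - min), applied to each dimension across the set of commodities being compared; the function names are hypothetical.

```python
ALPHA, BETA, GAMMA = 0.23, 0.14, 0.63  # weights reported in the text


def min_max(values):
    """Scale a list of scores to [0, 1] (assumed standard Min-Max definition)."""
    lo, hi = min(values), max(values)
    if hi == lo:  # all scores equal: no spread to normalize
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def final_scores(credit, sales, reviews):
    """FinalScore(P) for each commodity, given three parallel score lists."""
    n_credit, n_sales, n_reviews = min_max(credit), min_max(sales), min_max(reviews)
    return [
        ALPHA * c + BETA * s + GAMMA * r
        for c, s, r in zip(n_credit, n_sales, n_reviews)
    ]
```

Because γ is by far the largest weight, the ranking produced by such a model is dominated by the comment text sentiment score, with shop credit and sales volume acting as corrections.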
To verify the effectiveness of the proposed product feature clustering sentiment score and the multi-dimensional fusion algorithm, the following are compared: the comment text sentiment score calculation from the literature (Old-Score); the clustered comment text sentiment score calculation (New-Score); the clustered comment text sentiment score combined with the sales volume index (New-Sales-Score); the clustered comment text sentiment score combined with shop credit (New-Credit-Score); the multi-dimensional fusion score (Mix-Score); and the manually labelled comprehensive mobile phone score (Lable-Score). The accuracy comparison is shown in Table 4:
Table 4 Experimental comparison results (Accuracy)
As the table shows, the e-commerce product comprehensive scoring method of the invention, based on multi-dimensional information fusion, achieves the highest accuracy. It can identify the best purchase simply and efficiently, making the designed recommender system faster and more accurate.

Claims (6)

1. An e-commerce product comprehensive scoring method based on multi-dimensional information fusion, characterized by comprising the following steps:
Step S1: Multi-dimensional information acquisition. A web crawler tool is first used to crawl the shop information, commodity sales volume information, and commodity comment text information of the relevant commodities; the multi-dimensional information is parsed programmatically and persisted to a database;
Step S2: The data obtained in step S1 are preprocessed. A program written in Java performs operations such as deduplication, data conversion, and data reduction on the structured data, while the comment text information is processed with the Chinese Academy of Sciences NLPIR word segmentation tool for word segmentation, part-of-speech tagging, and stop-word removal;
Step S3: Multi-dimensional information mining. Using the data preprocessed in step S2, the shop information and commodity sales volume information are analysed, and the shop credit index and commodity sales volume index are calculated by data induction and principal component regression analysis, respectively; then, for the comment text information, product feature word pairs are extracted, a sentiment dictionary is constructed, product features are clustered, and the sentiment score of the product features is calculated. To analyse the sentiment score of a commodity more comprehensively and objectively, after the influence of degree adverbs and negation words on the sentiment score of evaluation phrases is considered, the product feature weights are also included in the calculation of the sentiment score;
Step S4: Comprehensive commodity score calculation. The three index scores obtained from the analysis in step S3, namely the shop credit index, the commodity sales volume index, and the commodity sentiment score, are each given a weight, and the comprehensive score of the commodity is calculated by linear weighting.
2. The e-commerce product comprehensive scoring method based on multi-dimensional information fusion according to claim 1, characterized in that, in step S1, data are acquired by using a web crawler tool to automatically crawl the shop information, commodity sales volume information, and commodity comment text information of the relevant commodities; the multi-dimensional information is parsed programmatically and then persisted to the corresponding database.
3. The e-commerce product comprehensive scoring method based on multi-dimensional information fusion according to claim 1, characterized in that, in step S2, data preprocessing uses a program written in Java to perform operations such as deduplication, data conversion, and data reduction on the structured data, while the comment text information is processed with the Chinese Academy of Sciences NLPIR word segmentation tool for word segmentation, part-of-speech tagging, and stop-word removal.
4. The e-commerce product comprehensive scoring method based on multi-dimensional information fusion according to claim 1, characterized in that, in step S3, the shop credit index and the commodity sales volume index are obtained as follows: the shop credit index is composed of the shop's basic score, the shop's dynamic score over the past half year, and the service score of the commodity within one month, and the credit index is then derived by analysing and summarizing the business meaning of each indicator; the commodity sales volume index uses PCA dimensionality reduction, in which principal components are constructed from the crawled information on factors influencing sales volume and the sales volume regression index is determined by principal component regression analysis.
5. The e-commerce product comprehensive scoring method based on multi-dimensional information fusion according to claim 1, characterized in that, in step S3, the sentiment score of the comment text is quantified as follows: for the comment text information, product feature word pairs are extracted, a sentiment dictionary is constructed, product features are clustered, and the sentiment score of the product features is calculated. To analyse the sentiment score of a commodity more comprehensively and objectively, after the influence of degree adverbs and negation words on the sentiment score of evaluation phrases is considered, the product feature weights are also included in the calculation of the sentiment score.
6. The e-commerce product comprehensive scoring method based on multi-dimensional information fusion according to claim 1, characterized in that, in step S4, the multi-dimensional commodity information is fused, including: the shop credit score, which reflects the shop's basic condition and reputation; the commodity sales volume index, which reflects the commodity's popularity on the market; and the commodity comment text sentiment score. Comments record consumers' experience after purchasing and using the commodity; by analysing the sentiment orientation of these comment texts, sentiment is quantified as a scalar value, and commodity performance is judged by that value. Each score is given a weight, and the comprehensive score of the commodity is calculated by linear weighting.
CN201810181878.8A 2018-03-06 2018-03-06 E-commerce product comprehensive scoring method based on multi-dimensional information fusion Active CN108491377B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810181878.8A CN108491377B (en) 2018-03-06 2018-03-06 E-commerce product comprehensive scoring method based on multi-dimensional information fusion

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810181878.8A CN108491377B (en) 2018-03-06 2018-03-06 E-commerce product comprehensive scoring method based on multi-dimensional information fusion

Publications (2)

Publication Number Publication Date
CN108491377A true CN108491377A (en) 2018-09-04
CN108491377B CN108491377B (en) 2021-10-08

Family

ID=63341434

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810181878.8A Active CN108491377B (en) 2018-03-06 2018-03-06 E-commerce product comprehensive scoring method based on multi-dimensional information fusion

Country Status (1)

Country Link
CN (1) CN108491377B (en)

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2010020745A (en) * 2008-06-10 2010-01-28 Yahoo Japan Corp Method of outputting reputation index and reputation index output device
CN103489108A (en) * 2013-08-22 2014-01-01 浙江工商大学 Large-scale order form matching method in community commerce cloud
US20160098738A1 (en) * 2014-10-06 2016-04-07 Chunghwa Telecom Co., Ltd. Issue-manage-style internet public opinion information evaluation management system and method thereof
CN104462333A (en) * 2014-12-03 2015-03-25 上海耀肖电子商务有限公司 Shopping search recommending and alarming method and system
CN107146122A (en) * 2016-03-01 2017-09-08 阿里巴巴集团控股有限公司 Data processing method and device
CN106447388A (en) * 2016-08-31 2017-02-22 广东华邦云计算股份有限公司 Method and system for recommending dishes
CN107369075A (en) * 2017-07-26 2017-11-21 万帮充电设备有限公司 Methods of exhibiting, device and the electronic equipment of commodity

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
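Ying Kefu et al.: "A virtual enterprise credit evaluation model based on PCA and GRA", Statistics & Decision *
Hu Zhongkai et al.: "Product recommendation algorithm based on user review mining", Journal of Zhejiang University (Engineering Science) *
Li Song et al.: "Research on the relationship between reputation and sales volume in online shopping: an empirical analysis based on Taobao", Modern Management Science *
Li Yongsheng: "Design and implementation of a commodity recommendation system based on Taobao user reviews", China Master's Theses Full-text Database, Information Science and Technology *
Yang Xingshou: "Research on credit and trust mechanisms in the e-commerce environment", China Doctoral Dissertations Full-text Database, Economics and Management Science *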

Also Published As

Publication number Publication date
CN108491377B (en) 2021-10-08

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant