CN109800307A - Product evaluation analysis method, apparatus, computer device and storage medium

Info

Publication number: CN109800307A
Application number: CN201910046997.7A
Authority: CN
Inventors: 胡芹
Original assignee: OneConnect Smart Technology Co Ltd
Current assignee: OneConnect Smart Technology Co Ltd
Priority date: 2019-01-18
Filing date: 2019-01-18
Publication date: 2019-05-24
Anticipated expiration: 2039-01-18
Also published as: CN109800307B

Abstract

The invention discloses a product evaluation analysis method, apparatus, computer device and storage medium. The method comprises: obtaining data to be analyzed for an application product; performing word segmentation on the data to be analyzed to obtain target word segments; constructing a target word vector for each target word segment and tagging each target word segment with its part of speech, thereby obtaining the part of speech of each target word vector; constructing sub-sentence vectors and a sentence vector from the target word vectors; calculating the weight coefficient of each target word vector from the clusters corresponding to each part of speech in a preset semantic analysis model; calculating, according to the weight coefficients of the target word vectors, the score of each sub-sentence vector and the scoring result of the sentence vector; and looking up, in a score mapping table, the user preference level corresponding to the scoring result, thereby determining how much the user likes the application product. Embodiments of the invention can quickly and accurately determine the degree of approval of the application product expressed in a user's comment, improving the accuracy of analyzing comments on application products.

Description

Product evaluation analysis method, apparatus, computer device and storage medium

Technical field

The present invention relates to the technical field of data analysis, and in particular to a product evaluation analysis method, apparatus, computer device and storage medium.

Background art

In the application markets for application software, users can rate and review an application after downloading and using it. User reviews reflect the strengths and weaknesses of the application and are a direct source of user feedback: from them, a vendor can learn which aspects of the application's design work well and which shortcomings need to be improved. At present, user comments are usually analyzed with a sentiment classification model. However, the accuracy of such models is not high, and they can only judge whether a user holds a positive or a negative attitude toward the application; they cannot express how positive or how negative the user is. As a result, current analysis of user comments in application markets performs poorly.

Summary of the invention

Embodiments of the present invention provide a product evaluation analysis method, apparatus, computer device and storage medium, to solve the problem that comments on application software are currently analyzed with low accuracy.

A product evaluation analysis method, comprising:

obtaining data to be analyzed for an application product;

performing word segmentation on the data to be analyzed to obtain target word segments;

constructing a target word vector for each target word segment, and tagging the target word segments with parts of speech to obtain the part of speech of each target word vector;

constructing n sub-sentence vectors and a sentence vector from the target word vectors corresponding to the target word segments, according to the order in which the target word segments appear in the data to be analyzed, where n is a positive integer;

according to the K clusters corresponding to each part of speech in a preset semantic analysis model and the weight coefficient corresponding to each cluster, calculating, based on the part of speech of each target word vector, the similarity between the target word vector and the clusters, and taking the weight coefficient of the cluster with the greatest similarity to the target word vector as the weight coefficient of that target word vector, where K is a positive integer;

calculating the score of each sub-sentence vector according to the weight coefficients of the target word vectors, and calculating the scoring result of the sentence vector according to the scores of the n sub-sentence vectors;

according to the scoring result of the sentence vector, looking up the user preference level corresponding to the scoring result in a preset score mapping table, and taking the user preference level as the analysis result of the data to be analyzed.

A product evaluation analysis apparatus, comprising:

a data acquisition module, configured to obtain data to be analyzed for an application product;

a word segmentation module, configured to perform word segmentation on the data to be analyzed to obtain target word segments;

a word vector construction module, configured to construct a target word vector for each target word segment, and to tag the target word segments with parts of speech to obtain the part of speech of each target word vector;

a sentence vector construction module, configured to construct n sub-sentence vectors and a sentence vector from the target word vectors corresponding to the target word segments, according to the order in which the target word segments appear in the data to be analyzed, where n is a positive integer;

a weight acquisition module, configured to calculate, according to the K clusters corresponding to each part of speech in a preset semantic analysis model and the weight coefficient corresponding to each cluster, and based on the part of speech of each target word vector, the similarity between the target word vector and the clusters, and to take the weight coefficient of the cluster with the greatest similarity to the target word vector as the weight coefficient of that target word vector, where K is a positive integer;

a score calculation module, configured to calculate the score of each sub-sentence vector according to the weight coefficients of the target word vectors, and to calculate the scoring result of the sentence vector according to the scores of the n sub-sentence vectors;

a result output module, configured to look up, according to the scoring result of the sentence vector, the user preference level corresponding to the scoring result in a preset score mapping table, and to take the user preference level as the analysis result of the data to be analyzed.

A computer device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the above product evaluation analysis method when executing the computer program.

A computer-readable storage medium storing a computer program, wherein the computer program implements the above product evaluation analysis method when executed by a processor.

The above product evaluation analysis method, apparatus, computer device and storage medium obtain data to be analyzed for an application product; perform word segmentation on it to obtain target word segments; construct the target word vector of each target word segment and tag the target word segments with parts of speech to obtain the part of speech of each target word vector; construct the sub-sentence vectors and the sentence vector of the data to be analyzed from the target word vectors; calculate the weight coefficient of each target word vector from the clusters corresponding to each part of speech in a preset semantic analysis model; and finally, according to the weight coefficients of the target word vectors, calculate the score of each sub-sentence vector and the scoring result of the sentence vector, and look up the user preference level corresponding to the scoring result in a score mapping table, thereby determining how much the user likes the application product. Because a trained semantic analysis model is used to calculate the weight coefficient of each target word segment in the data to be analyzed, and the scoring result of the data to be analyzed is derived from those weight coefficients, the degree of approval of the application product expressed in a user's comment can be obtained quickly and accurately. The result reflects how negative or how positive the comment is, and improves both the efficiency and the accuracy of analyzing comments on application products.

Brief description of the drawings

To illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.

Fig. 1 is a schematic diagram of an application environment of the product evaluation analysis method in an embodiment of the present invention;

Fig. 2 is a flow chart of the product evaluation analysis method in an embodiment of the present invention;

Fig. 3 is a detailed flow chart of scoring the data to be analyzed in the product evaluation analysis method in an embodiment of the present invention;

Fig. 4 is a detailed flow chart of saving data to a database in the product evaluation analysis method in an embodiment of the present invention;

Fig. 5 is a detailed flow chart of constructing the semantic analysis model in the product evaluation analysis method in an embodiment of the present invention;

Fig. 6 is a detailed flow chart of clustering word vectors in the product evaluation analysis method in an embodiment of the present invention;

Fig. 7 is a functional block diagram of the product evaluation analysis apparatus in an embodiment of the present invention;

Fig. 8 is a schematic diagram of a computer device in an embodiment of the present invention.

Detailed description of the embodiments

The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings in the embodiments. Obviously, the described embodiments are some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.

The product evaluation analysis method provided by the embodiments of the present application can be applied in the application environment shown in Fig. 1, which includes a server and a client connected over a network. A user enters an evaluation of the application product through the client; the server takes the user's evaluation as the data to be analyzed and scores it with a preset semantic analysis model to obtain the user's preference level for the application product. The client may be, but is not limited to, a personal computer, laptop, smartphone, tablet or portable wearable device. The server may be implemented as a standalone server or as a cluster of multiple servers. The product evaluation analysis method provided by the embodiments of the present invention is applied on the server.

In one embodiment, Fig. 2 shows a flow chart of the product evaluation analysis method. The method is applied on the server in Fig. 1 and is used to analyze the data to be analyzed for an application product and obtain the user's preference level for the application product. As shown in Fig. 2, the product evaluation analysis method includes steps S1 to S7, detailed as follows:

S1: obtain the data to be analyzed for the application product.

In this embodiment, the data to be analyzed is users' evaluation data for the application product. After downloading the application product from an application platform such as Wandoujia or Android Market, a user can comment on the application product based on the experience of downloading and using it. The service provider needs to obtain and analyze this evaluation data in order to further improve the application product.

Specifically, the data to be analyzed may be evaluation data that a user enters on the client after downloading the application product, or evaluation data that the server collects from the Internet. The data source can be chosen according to actual needs and is not limited here.

S2: perform word segmentation on the data to be analyzed to obtain target word segments.

Specifically, the server performs word segmentation on the data to be analyzed based on a preset corpus, cutting the data into at least one target word segment. The preset corpus is a database set up in advance for storing Chinese words; it contains standard words together with their parts of speech. Its content may be obtained from Chinese lexicons such as the open-source "Million-Scale Giant Chinese Lexicon", the "jieba Chinese word segmentation library" or the "new Peking University annotation set".

It should be understood that segmenting the data to be analyzed helps the machine model recognize and analyze it: the resulting target word segments have a finer granularity, which makes the model's analysis more accurate.
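By way of non-limiting illustration, dictionary-based segmentation of the kind described in step S2 may be sketched in Python as follows. The description does not name a segmentation algorithm, so forward maximum matching is used here purely as an assumed stand-in, and the lexicon `LEXICON` with its part-of-speech labels is a hypothetical toy corpus rather than any of the lexicons mentioned above.

```python
# Hypothetical toy corpus: each standard word is stored with its part of speech.
LEXICON = {
    "我": "pronoun",
    "爱": "verb",
    "中国": "noun",
    "产品": "noun",
    "很": "adverb",
    "好用": "adjective",
}
MAX_WORD_LEN = max(len(word) for word in LEXICON)


def segment(text):
    """Forward maximum matching (assumed technique): at each position,
    take the longest lexicon word; fall back to a single character."""
    segments = []
    i = 0
    while i < len(text):
        for size in range(min(MAX_WORD_LEN, len(text) - i), 0, -1):
            word = text[i:i + size]
            if word in LEXICON or size == 1:
                # Characters missing from the lexicon are tagged "unknown".
                segments.append((word, LEXICON.get(word, "unknown")))
                i += size
                break
    return segments


if __name__ == "__main__":
    for word, pos in segment("产品很好用"):
        print(word, pos)
```

Returning each word together with the part of speech stored in the corpus keeps the tagging of step S3 a simple lookup; a production system would more likely delegate both tasks to an established segmentation library.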

S3: construct the target word vector of each target word segment, and tag the target word segments with parts of speech to obtain the part of speech of each target word vector.

Here, a word vector is a representation of a Chinese word in mathematical or machine-language form, so that language can be represented in a computer and recognized and processed by computer programs. In this embodiment, a target word vector represents a target word segment in vector form.

Specifically, the server may use a word vector generation technique to convert each target word segment into a target word vector, constructing one word vector per target word segment. For example, the target word vector constructed for the target word segment "product" is (0.3, 0.5, 0.1, 0.1). The word vector generation technique may be one-hot encoding, or a natural language processing technique such as word2vec (word to vector).

Each target word segment is tagged with a part of speech according to the parts of speech of the words in the preset corpus, so that each target word segment carries its part of speech. The target word vector generated from a target word segment carries the same part of speech, which gives the part of speech of each target word vector. Parts of speech include adverb, adjective, noun, pronoun and so on; for example, the part of speech of the target word segment "beautiful" is "adjective".

Constructing the target word vector of each target word segment by means of the preset corpus converts text that a computer cannot accurately understand into word vectors that a machine can easily recognize and operate on. Tagging the target word vectors with parts of speech makes it possible to analyze the semantics of words by part of speech, to obtain the degree of approval of the application product contained in the data to be analyzed, and to accurately identify the opinion it expresses.
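As a minimal sketch of step S3, the following Python fragment builds one-hot vectors, one of the two generation techniques named above, and attaches the part of speech to each vector. The function name `build_one_hot_vectors` and the dictionary layout are illustrative assumptions; the vocabulary is derived from the input only to keep the example self-contained.

```python
def build_one_hot_vectors(tagged_segments):
    """tagged_segments: list of (word, part_of_speech) pairs.

    Returns one record per segment holding the word, its part of speech
    and a one-hot vector over the sorted vocabulary of the input.
    """
    vocabulary = sorted({word for word, _ in tagged_segments})
    position = {word: i for i, word in enumerate(vocabulary)}
    records = []
    for word, pos in tagged_segments:
        vector = [0.0] * len(vocabulary)
        vector[position[word]] = 1.0  # a single 1 marks this word
        records.append({"word": word, "pos": pos, "vector": vector})
    return records


if __name__ == "__main__":
    tagged = [("product", "noun"), ("beautiful", "adjective")]
    for record in build_one_hot_vectors(tagged):
        print(record)
```

One-hot vectors carry no notion of semantic closeness, so an embodiment that relies on similarity to clusters would in practice favour dense vectors such as those produced by word2vec.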

S4: according to the order in which the target word segments appear in the data to be analyzed, construct n sub-sentence vectors from the target word vectors corresponding to the target word segments, and construct the sentence vector from the n sub-sentence vectors, where n is a positive integer.

In this embodiment, the data to be analyzed is evaluation data in which a user comments on the application product, and each sentence in the evaluation data corresponds to one sub-sentence. The data to be analyzed is divided into n sub-sentences at its pause punctuation marks, i.e. the punctuation marks that indicate pauses between sentences: the period (。), question mark (？), exclamation mark (！), comma (，), semicolon (；) and colon (：). The value of n equals the number of sentences in the user's evaluation data; for example, n = 2 means the data to be analyzed contains 2 sentences, and n = 3 means it contains 3 sentences.

Specifically, n sub-sentence vectors are constructed from the target word vectors corresponding to the target word segments, following the word order of the target word segments in the data to be analyzed. A sub-sentence vector can be expressed as SubStenP = {[SubStenId][W₁][W₂]…[W_j]}, where SubStenP is the sub-sentence vector, SubStenId is an identifier that uniquely identifies the sub-sentence within the data to be analyzed and is used to distinguish the sub-sentences, and W_j denotes the target word vector corresponding to the j-th target word segment.

The sentence vector corresponding to the data to be analyzed is then constructed from the n sub-sentence vectors, following the order of the n sub-sentences in the data to be analyzed. The data to be analyzed is thus represented in vector form and corresponds to one sentence vector, which can be expressed as SenP = {[SenPId][SubStenP₁][SubStenP₂]…[SubStenP_n]}, where SenP is the sentence vector, SenPId is an identifier that uniquely identifies the data to be analyzed and is used to distinguish different data to be analyzed, and SubStenP_n is the sub-sentence vector corresponding to the n-th sub-sentence in the data to be analyzed.

For example, in one embodiment, the data to be analyzed is "Good, fine to use!". Segmenting it yields the target word segments "good", "fine" and "use". Target word vectors are constructed from the target word segments, and they are respectively: "good" = (0.2, 0.5, 1.1, 0.7, 1.3), "fine" = (1.2, 2.5, 1.8, 0.8, 0.3) and "use" = (1, 0.4, 0.8, 1.1, 0.9).

According to the pause punctuation in the data to be analyzed, the data can be divided into 2 sub-sentences. Following the word order of the target word segments, the sub-sentence vectors constructed from the corresponding target word vectors are respectively: "good" = SubStenP₁ = {[SubSten1](0.2, 0.5, 1.1, 0.7, 1.3)} and "fine to use" = SubStenP₂ = {[SubSten2](1.2, 2.5, 1.8, 0.8, 0.3)(1, 0.4, 0.8, 1.1, 0.9)}.

Following the order of the sub-sentences in the data to be analyzed, the sentence vector constructed from the 2 sub-sentence vectors is: "Good, fine to use!" = SenP = {[SenP1][SubStenP₁][SubStenP₂]}.
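The construction in step S4 can be sketched as follows, under stated assumptions. The word vectors are the three from the example above; the English words stand in for the target word segments, and whitespace splitting stands in for the real segmentation step. The names `split_sub_sentences` and `build_sentence_vector`, and the dictionary layout used for SenP and SubStenP, are hypothetical.

```python
import re

# Pause punctuation from step S4, in full-width and ASCII forms.
PAUSE_MARKS = "。？！，；：.?!,;:"


def split_sub_sentences(text):
    """Split text at pause punctuation and drop empty pieces."""
    pattern = "[" + re.escape(PAUSE_MARKS) + "]"
    return [part.strip() for part in re.split(pattern, text) if part.strip()]


def build_sentence_vector(sentence_id, text, word_vectors):
    """Build SenP = {[SenPId][SubStenP1]...[SubStenPn]} as nested dicts."""
    sub_sentence_vectors = []
    for k, sub_sentence in enumerate(split_sub_sentences(text), start=1):
        words = sub_sentence.split()  # stand-in for word segmentation
        sub_sentence_vectors.append({
            "id": "SubSten%d" % k,
            "vectors": [word_vectors[word] for word in words],
        })
    return {"id": sentence_id, "sub_sentences": sub_sentence_vectors}


WORD_VECTORS = {
    "good": (0.2, 0.5, 1.1, 0.7, 1.3),
    "fine": (1.2, 2.5, 1.8, 0.8, 0.3),
    "use": (1, 0.4, 0.8, 1.1, 0.9),
}

if __name__ == "__main__":
    print(build_sentence_vector("SenP1", "good, fine use!", WORD_VECTORS))
```

Keeping the identifiers alongside the vectors preserves both the order of the sub-sentences and the order of the words inside them, which the later scoring steps depend on.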

S5: according to the K clusters corresponding to each part of speech in the preset semantic analysis model and the weight coefficient corresponding to each cluster, calculate, based on the part of speech of each target word vector, the similarity between the target word vector and the clusters, and take the weight coefficient of the cluster with the greatest similarity to the target word vector as the weight coefficient of that target word vector, where K is a positive integer.

In this embodiment, the preset semantic analysis model is a machine learning model trained in advance. It contains K clusters for each part of speech and a weight coefficient set for each cluster. The parts of speech in the preset semantic analysis model include, but are not limited to, adverb, adjective, noun and pronoun. K is a positive integer, for example 5 or 6, and can be set according to the actual situation.

Specifically, according to the part of speech of a target word vector, the similarity between the target word vector and each cluster corresponding to that part of speech in the preset semantic analysis model is calculated, and the weight coefficient of the cluster with the greatest similarity is taken as the weight coefficient of the target word vector. The weight coefficient of every target word vector is obtained in this way. The greater the similarity, the closer the two objects are in space and the more similar the words are semantically.
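The lookup in step S5 may be sketched as below. Two points are assumptions rather than statements of the described method: each cluster is represented by a single centroid vector, and cosine similarity is used as the similarity measure. The model contents in `MODEL` (here with K = 2 for one part of speech) are invented for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector is treated as similar to nothing
    return dot / (norm_a * norm_b)


def weight_for(vector, part_of_speech, model):
    """Return the weight of the most similar cluster for this part of speech."""
    clusters = model[part_of_speech]
    best = max(clusters, key=lambda c: cosine_similarity(vector, c["centroid"]))
    return best["weight"]


# Hypothetical model: part of speech -> K clusters, each with a weight.
MODEL = {
    "adjective": [
        {"centroid": (1.0, 0.0), "weight": 2.0},   # commendatory cluster
        {"centroid": (0.0, 1.0), "weight": -1.5},  # derogatory cluster
    ],
}

if __name__ == "__main__":
    print(weight_for((0.9, 0.1), "adjective", MODEL))
```

Restricting the search to the clusters of the vector's own part of speech keeps the comparison small and prevents, for instance, an adjective from inheriting the weight of a noun cluster.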

S6: calculate the score of each sub-sentence vector according to the weight coefficients of the target word vectors, and calculate the scoring result of the sentence vector according to the scores of the n sub-sentence vectors.

Specifically, using the weight coefficients of the target word vectors obtained in step S5, the weight coefficients of the target word vectors in each sub-sentence vector are multiplied together to give the score of that sub-sentence vector, and the scores of the n sub-sentence vectors are summed to give the scoring result of the sentence vector. In this way the data to be analyzed is divided into finer-grained word segments, the semantics of each word segment are analyzed to identify its semantic features, and the semantic features of all word segments are combined to obtain the scoring result of the sentence vector.

The weight coefficients include positive and negative numbers. A positive number denotes a commendatory word, i.e. a word in the user's comment that expresses a positive attitude; a negative number denotes a derogatory word, i.e. a word that expresses a negative attitude. The scoring result of the sentence vector represents the semantic features of the data to be analyzed: a positive scoring result indicates a positive comment, and a negative scoring result indicates a negative comment. The larger the magnitude of the scoring result, the stronger the user's positive or negative attitude toward the application product.
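A minimal sketch of the multiply-then-sum scoring described in step S6 follows. It implements only the plain product and sum stated above; the formula used in the more detailed embodiment also involves the absolute value of each weight coefficient and may therefore treat signs differently, which this sketch does not attempt to reproduce. The function names are hypothetical.

```python
def sub_sentence_score(weights):
    """Multiply the weight coefficients of one sub-sentence, starting from 1."""
    score = 1.0
    for weight in weights:
        score *= weight
    return score


def sentence_score(weights_per_sub_sentence):
    """Sum the scores of all sub-sentences of one piece of data."""
    return sum(sub_sentence_score(ws) for ws in weights_per_sub_sentence)


if __name__ == "__main__":
    # Two sub-sentences: one positive, one containing a derogatory word.
    print(sentence_score([[2.0, 1.5], [-1.0, 2.0]]))
```

Multiplying within a sub-sentence lets modifiers amplify or flip the word they attach to, while summing across sub-sentences lets independent clauses offset one another.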

S7: according to the scoring result of the sentence vector, look up the user preference level corresponding to the scoring result in the preset score mapping table, and take the user preference level as the analysis result of the data to be analyzed.

Specifically, the preset score mapping table is a data table that stores the correspondence between scoring results and users' preference levels for the application product. According to the scoring result of the sentence vector, the user's preference level for the application product can be looked up in the preset score mapping table and taken as the analysis result of the data to be analyzed. The mapping stored in the preset score mapping table can be set according to actual needs and is not limited here.

For example, in the preset score mapping table, a scoring result in (-20, -10) indicates that the user is extremely dissatisfied with the application product, whose defects urgently need to be corrected; a scoring result in (-10, 0) indicates that the user is not very satisfied with the application product, which urgently needs improvement; a scoring result in (0, 10) indicates that the user has a good impression of the application product, which needs further refinement; a scoring result in (10, 20) indicates that the user thinks highly of the application product, and promotion efforts can be increased; a scoring result in (20, 30) indicates that the user likes the application product very much, and its outstanding design style can be carried forward.
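The example mapping above can be held in a small lookup structure, sketched here in Python. The intervals and their meanings come from the example; how boundary values such as exactly 0 or 10 are classified is not stated, so treating each interval as including its lower bound and excluding its upper bound is an assumption, as are the label strings and the name `preference_level`.

```python
# (lower bound, upper bound, preference level) taken from the example table.
SCORE_MAP = [
    (-20, -10, "extremely dissatisfied"),
    (-10, 0, "not very satisfied"),
    (0, 10, "good impression"),
    (10, 20, "thinks highly"),
    (20, 30, "likes very much"),
]


def preference_level(score):
    """Return the preference level for a score, or None if out of range."""
    for lower, upper, level in SCORE_MAP:
        # Assumed boundary handling: lower bound inclusive, upper exclusive.
        if lower <= score < upper:
            return level
    return None


if __name__ == "__main__":
    print(preference_level(25))
```

Returning `None` for scores outside the table makes gaps in the configured mapping visible instead of silently forcing them into the nearest level.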

In the embodiment corresponding to Fig. 2, the data to be analyzed for the application product is obtained and segmented into target word segments; the target word vector of each target word segment is constructed and the target word segments are tagged with parts of speech, giving the part of speech of each target word vector; the sub-sentence vectors and the sentence vector of the data to be analyzed are constructed from the target word vectors; the weight coefficient of each target word vector is calculated from the clusters corresponding to each part of speech in the preset semantic analysis model; and finally, according to the weight coefficients of the target word vectors, the score of each sub-sentence vector and the scoring result of the sentence vector are calculated, and the user preference level corresponding to the scoring result is looked up in the score mapping table, thereby determining how much the user likes the application product. Because a trained semantic analysis model is used to calculate the weight coefficient of each target word segment in the data to be analyzed, and the scoring result of the data to be analyzed is derived from those weight coefficients, the degree of approval of the application product expressed in a user's comment can be obtained quickly and accurately. The result reflects how negative or how positive the comment is, and improves both the efficiency and the accuracy of analyzing comments on application products.

In one embodiment, a specific implementation of step S6, namely calculating the score of each sub-sentence vector according to the weight coefficients of the target word vectors and calculating the scoring result of the sentence vector according to the scores of the n sub-sentence vectors, is described in detail.

Referring to Fig. 3, which shows a detailed flow chart of step S6, the details are as follows:

S61: according to the weight coefficients of the target word vectors, calculate the score of each sub-sentence vector according to formula (1):

where D_x is the weight coefficient of the x-th target word vector, |D_x| is the absolute value of the weight coefficient of the x-th target word vector, S_{x-1} is the score of the prefix sentence of the target word vector corresponding to D_x, S_x is the score of the sub-sentence vector, x is a positive integer, and S₀ equals 1.

Specifically, for each sub-sentence vector, formula (1) is applied successively, starting from the first target word vector of that sub-sentence vector, to compute the score of the whole sub-sentence vector. The prefix sentence of the target word vector corresponding to D_x is the combination of words preceding the x-th target word vector. The score obtained for the last target word vector is taken as the score of the sub-sentence vector.

For example, the sub-sentence "I love China" can be segmented into three target word segments, "I", "love" and "China", whose target word vectors have weight coefficients 2, 2.5 and 1 respectively. The prefix sentence of "love" is "I", and the prefix sentence of "China" is "I love". The sub-sentence vector corresponding to this sub-sentence is evaluated according to formula (1), and the score of the third target word vector is taken as the score of the sub-sentence vector; that is, the score S₃ = 25 is taken, so the sub-sentence vector corresponding to "I love China" has a score of 25.

S62: according to the scores of the n sub-sentence vectors, calculate the scoring result of the sentence vector according to formula (2):

$$S = \sum_{i=1}^{n} S_i \tag{2}$$

where S is the scoring result of the data to be analyzed, n is the number of sub-sentence vectors, S_i is the score of the i-th sub-sentence vector, and i is a positive integer with i ∈ [1, n].

Specifically, formula (2) adds up the scores of the n sub-sentence vectors, combining the semantic features expressed by each sub-sentence vector to obtain the scoring result of the sentence vector. The scoring result represents the semantic features of the data to be analyzed: a positive scoring result indicates a positive comment, and a negative scoring result indicates a negative comment. The larger the magnitude of the scoring result, the stronger the user's positive or negative attitude toward the application product.

In the embodiment corresponding to Fig. 3, the score of each sub-sentence vector is calculated using formula (1), and the scoring result of the sentence vector is calculated from the scores of all the sub-sentence vectors using formula (2). This enables the server to derive the semantic features of each sub-sentence from the semantics of its word segments and then to combine the semantic features of all sub-sentences into the semantic features of the data to be analyzed, improving the accuracy of the analysis.

In one embodiment, as shown in Fig. 4, after step S7, in which the user preference level corresponding to the scoring result of the sentence vector is looked up in the preset score mapping table and taken as the analysis result of the data to be analyzed, the product evaluation analysis method may further save the scored data to be analyzed to a database, detailed as follows:

S7: if the scoring result of the data to be analyzed is greater than a preset first threshold, save the data to be analyzed to a preset first database.

In this embodiment, the preset first threshold is a score value set in advance for selecting positive evaluation data. If the scoring result of the data to be analyzed is greater than the preset first threshold, the data to be analyzed is positive evaluation data. For example, the first threshold may be preset to 10 points or to 15 points; it can be set according to the data that actually needs to be obtained and is not limited here.

The server saves data to be analyzed whose scoring result is greater than the preset first threshold to the preset first database, which is a database for storing users' positive comments, so that staff can further analyze the evaluation data in the first database.

Optionally, staff can assess the application product based on the evaluation data in the first database to identify its strengths, and thereby decide on later product design and iteration plans that carry forward the core ideas that make the application product excellent. The data can also be used to estimate users' consumption preferences, which can be applied to application products under development and to planning related operational campaigns, increasing the usage of application products.

S8: If the appraisal result of the data to be analyzed is less than a preset second threshold, the data to be analyzed is saved in a preset second database, wherein the second threshold is less than the first threshold.

In this embodiment, the preset second threshold is a score value for identifying negative evaluation data. If the appraisal result of the data to be analyzed is less than the preset second threshold, the data to be analyzed is negative evaluation data. For example, the second threshold can be preset to 0 points or to -5 points; it can be set according to the data that actually needs to be obtained, and is not limited here.

The server saves the data to be analyzed whose appraisal result is less than the preset second threshold in the preset second database. The second database stores users' negative comment sentences, so that staff can further analyze the evaluation data in the second database.

Optionally, staff can analyze the application product according to the evaluation data in the second database, trace the source of the problems, and identify the shortcomings of the application product, so as to formulate a reasonable solution for improving the application product and increase its consumption.

It can be understood that, for data to be analyzed whose appraisal result is less than the preset first threshold and greater than the preset second threshold, the user may hold a neutral attitude toward the application product, or may have filled in comment content unrelated to the application product in order to inflate the download count. Screening out this part of the data can improve the working efficiency of staff.

In the embodiment corresponding to Fig. 4, the data to be analyzed whose appraisal result is greater than the preset first threshold is saved in the preset first database, and the data to be analyzed whose appraisal result is less than the preset second threshold is saved in the preset second database. Staff can thus quickly and conveniently obtain the positive or negative evaluation data of the application product for analysis, and can improve the application product based on the analysis results, increasing the usage of the application product.
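The routing of scored comments described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `route_review` and the returned labels are hypothetical, and the default thresholds (10 and 0) are simply the example values mentioned in the text.

```python
from typing import Optional


def route_review(score: float,
                 first_threshold: float = 10.0,
                 second_threshold: float = 0.0) -> Optional[str]:
    """Return the (hypothetical) name of the database a scored comment goes to.

    Scores above the first threshold are treated as positive evaluations,
    scores below the second threshold as negative evaluations, and anything
    in between is screened out (None).
    """
    if second_threshold >= first_threshold:
        # The text requires the second threshold to be less than the first.
        raise ValueError("second_threshold must be less than first_threshold")
    if score > first_threshold:
        return "first_database"
    if score < second_threshold:
        return "second_database"
    return None
```

A score exactly equal to either threshold falls into the screened-out middle band, because the text uses strict "greater than" and "less than" comparisons.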

In one embodiment, as shown in Fig. 5, before the data to be analyzed of the application product is obtained in step S1, the analysis method of the product evaluation further includes building the preset semantic analysis model. The specific implementation of building the preset semantic analysis model is detailed as follows:

S101: the history evaluation data of application product are obtained.

In this embodiment, an application product is an application software product provided by an application product platform, for example a platform such as Wandoujia ("Pea Pod") or Android Market. History evaluation data is users' evaluation data on the application product: after downloading an application product from the platform, a user can comment on it based on how the download and use of the application product went.

Specifically, the history evaluation data can be collected through a data interface provided by the platform of the application software product, or the evaluation data of the application product can be fetched from the platform over a network protocol. The specific acquisition mode can be set according to the actual situation and is not limited here.

Preferably, the history evaluation data of the application product can be crawled with a crawler tool, for example the Octopus crawler tool, the Boston Ivy crawler tool, or the GooSeeker (Jisouke) crawler tool. The URL (Uniform Resource Locator) of the application product platform is set as the acquisition address of the history evaluation data; a URL is a uniquely identified network address on the Internet. The crawler tool determines the web page to crawl from the URL and crawls the comment data corresponding to a preset keyword. The preset keyword is a keyword related to history evaluation data, such as "comment" or "evaluation". The crawler tool splices the keyword onto the link of the URL, then searches for and downloads the history evaluation data corresponding to the preset keyword.
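The keyword-splicing step can be illustrated with a short sketch. The text does not say how a crawler tool attaches the keyword to the URL, so the query-parameter name `keyword`, the function name `build_crawl_url` and the example address are all assumptions made for illustration only.

```python
from urllib.parse import urlencode


def build_crawl_url(platform_url: str, keyword: str) -> str:
    """Append a preset keyword to a platform URL as a query parameter.

    The parameter name "keyword" is hypothetical; a real platform defines
    its own search parameters.
    """
    # Use "&" if the URL already carries a query string, otherwise "?".
    separator = "&" if "?" in platform_url else "?"
    return platform_url + separator + urlencode({"keyword": keyword})
```

`urlencode` percent-encodes the keyword, so non-ASCII keywords can be passed directly.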

S102: word segmentation processing is carried out to history evaluation data, obtains the corresponding participle of history evaluation data.

Specifically, the server can perform word segmentation on the history evaluation data using a third-party word segmentation tool or a word segmentation algorithm, obtaining the participles corresponding to the history evaluation data.

Common third-party word segmentation tools include, but are not limited to, the Stanford NLP segmenter, the ICTCLAS word segmentation system, the ansj word segmentation tool, and the HanLP Chinese word segmentation tool.

For example, performing word segmentation on the history evaluation data "the function of this product is very powerful" can yield participles such as "this", the measure word for the product, "product", the possessive particle, "function", "very", and "powerful".

S103: Based on a preset corpus, construct the term vectors of the participles, and perform part-of-speech tagging on the participles to obtain the part of speech of each term vector.

Specifically, the preset corpus is a database set up in advance for storing Chinese words. It contains standard words and their parts of speech. The content of the preset corpus can be obtained from open-source Chinese vocabulary resources such as the "million-scale Chinese word bank", the "Jieba Chinese word segmentation library", or the "new Peking University tag set".

Based on the preset corpus, the server converts each participle into a term vector using a term vector generation technique, building the term vector corresponding to each participle. According to the parts of speech of the words in the preset corpus, it performs part-of-speech tagging on each participle so that each participle carries its part of speech. The term vector generated from a participle therefore carries the corresponding part of speech, which gives the part of speech of each term vector. Parts of speech include, but are not limited to, adverb, adjective, noun, and pronoun; for example, the part of speech of the participle "beautiful" is "adjective".

S104: According to the part of speech of the term vectors, cluster the term vectors using a preset clustering algorithm, obtaining the K clustering clusters corresponding to each part of speech.

Specifically, cluster analysis is performed on the term vectors that share a part of speech, obtaining the K clustering clusters corresponding to each part of speech. Cluster analysis is a statistical analysis method based on similarity: the similarity between data in the same clustering cluster is higher than the similarity between data in different clustering clusters. The preset clustering algorithm can be the K-Means algorithm or the CLARA algorithm; it can be set according to the needs of the practical application and is not limited here.

Preferably, this embodiment performs cluster analysis on the term vectors using the K-Means algorithm, with distance as the similarity measure: the closer two term vectors are, the greater their similarity. Each clustering cluster consists of term vectors that are close to one another, so K compact and mutually independent clustering clusters are obtained as the clustering result of the term vectors.

S105: Obtain the cluster central point of each clustering cluster, and assign a corresponding weight coefficient to each clustering cluster, obtaining the preset semantic analysis model after clustering.

Specifically, within a clustering cluster, if the sum of the distances from all term vectors to a given term vector is the smallest, that term vector is the cluster central point of the clustering cluster. The participle corresponding to the cluster central point is taken as a keyword, and the keyword represents the semantics of the clustering cluster. A weight coefficient is assigned to this participle according to its part-of-speech features; that is, a corresponding weight coefficient is set for each clustering cluster. This gives the weight set of each part of speech, $H = [H_1, H_2, \ldots, H_K] = [H_{11}, H_{22}, \ldots, H_{KK}]$, where the part of speech H contains K clustering clusters, $H_k$ denotes the cluster central point of the k-th clustering cluster, and $H_{kk}$ denotes the weight coefficient of the k-th clustering cluster. The result is a semantic analysis model containing the K clustering clusters of each part of speech and the weight coefficient of each clustering cluster.
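The definition of the cluster central point given above (the term vector whose summed distance to all other term vectors in the cluster is smallest) can be sketched as below. The function name is hypothetical, Euclidean distance is assumed as the distance, and ties are resolved by taking the first candidate, which the text does not specify.

```python
import math
from typing import Sequence


def cluster_center_index(vectors: Sequence[Sequence[float]]) -> int:
    """Return the index of the vector with the smallest total distance to all others."""
    if not vectors:
        raise ValueError("a clustering cluster must contain at least one term vector")
    best_index, best_total = 0, math.inf
    for index, candidate in enumerate(vectors):
        # Sum of Euclidean distances from every vector in the cluster to the candidate.
        total = sum(math.dist(candidate, other) for other in vectors)
        if total < best_total:
            best_index, best_total = index, total
    return best_index
```

`math.dist` requires Python 3.8 or later.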

For example, for the part of speech adverb, the weight coefficients set according to the semantics of each cluster central point are: [extremely, very, very, very, fairly, a bit, ordinary] = [2, 1.6, 1.5, 1.3, 1.1, 1]; for the part of speech adjective, the weight coefficients set according to the semantics of each cluster central point are: [excellent, good, good, medium, poor, bad] = [2, 1.5, 1.2, 1, -1.5, -2]; for the part of speech pronoun, the weight coefficient of every cluster central point can be set to 1.

It should be noted that weight coefficients can be positive or negative. A positive number indicates a commendatory term, that is, a word in a user comment expressing the user's positive attitude; a negative number indicates a derogatory term, that is, a word in a user comment expressing the user's negative attitude.
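One possible in-memory shape for such a semantic analysis model, together with a weight lookup, is sketched below. The class and function names are hypothetical, the two-dimensional central-point vectors are toy values, and similarity is assumed to be inverse Euclidean distance (the nearest central point counts as the most similar cluster). The adjective weights are a subset of the example values above; the "neutral" label for a zero weight is an added assumption.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple


@dataclass(frozen=True)
class Cluster:
    center_word: str                   # keyword at the cluster central point
    center_vector: Tuple[float, ...]   # toy term vector of that keyword
    weight: float                      # weight coefficient of the cluster


# Part of speech -> its clustering clusters (toy vectors, example weights).
MODEL: Dict[str, List[Cluster]] = {
    "adjective": [
        Cluster("excellent", (1.0, 1.0), 2.0),
        Cluster("medium", (0.0, 0.0), 1.0),
        Cluster("bad", (-1.0, -1.0), -2.0),
    ],
}


def weight_for(model: Dict[str, List[Cluster]], part_of_speech: str,
               target_vector: Sequence[float]) -> float:
    """Return the weight of the cluster whose central point is nearest to the target."""
    clusters = model[part_of_speech]
    nearest = min(clusters, key=lambda c: math.dist(c.center_vector, target_vector))
    return nearest.weight


def polarity(weight: float) -> str:
    """Positive weights mark commendatory terms, negative weights derogatory terms."""
    if weight > 0:
        return "commendatory"
    if weight < 0:
        return "derogatory"
    return "neutral"  # assumption: the text does not discuss a zero weight
```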

In the embodiment corresponding to Fig. 5, the history evaluation data of the application product is obtained and segmented into participles. Based on the preset corpus, the term vectors of the participles are constructed and the participles are tagged with parts of speech, giving the part of speech of each term vector. According to the part of speech of the term vectors, the term vectors are clustered using the preset clustering algorithm to obtain the clustering clusters corresponding to each part of speech. The cluster central point of each clustering cluster is then obtained and a corresponding weight coefficient is assigned to each clustering cluster, giving the semantic analysis model. The semantic analysis model can subsequently be used to analyze comments on the application product automatically, so that users' degree of approval of the application product can be obtained quickly and accurately from their comments, the degree of negativity or positivity of the evaluation data is reflected, and the efficiency and accuracy of analyzing comments on the application product are improved.

In one embodiment, the specific implementation of step S104, in which the term vectors are clustered using the preset clustering algorithm according to their part of speech to obtain the K clustering clusters corresponding to each part of speech, is described in detail.

Referring to Fig. 6, Fig. 6 shows a specific flow chart of step S104, details are as follows:

S1041: Classify the term vectors according to their part of speech, storing term vectors with the same part of speech in the same category.

Specifically, the term vectors corresponding to the participles of the history evaluation data are classified by part of speech: term vectors with the same part of speech are grouped and stored in the same category, giving categories such as an adjective set, a noun set, a pronoun set, and an adverb set. Words with different parts of speech are thereby separated, which improves the efficiency of clustering.
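The classification in step S1041 amounts to grouping tagged term vectors by part of speech. A minimal sketch, assuming each input item is a hypothetical `(word, part_of_speech, vector)` triple:

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Sequence, Tuple

TaggedVector = Tuple[str, str, Sequence[float]]  # (word, part of speech, term vector)


def group_by_part_of_speech(
        tagged_vectors: Iterable[TaggedVector]
) -> Dict[str, List[Tuple[str, Sequence[float]]]]:
    """Store term vectors with the same part of speech in the same category."""
    categories = defaultdict(list)
    for word, part_of_speech, vector in tagged_vectors:
        categories[part_of_speech].append((word, vector))
    return dict(categories)
```

Each resulting category can then be clustered independently, as described in step S1042.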

S1042: For the term vectors of each category, perform clustering using the K-means clustering algorithm according to the Euclidean distance between every two term vectors in the category, obtaining the K clustering clusters corresponding to each part of speech.

Specifically, for the term vectors of each category, K term vectors are randomly selected as initial cluster centers, and the Euclidean distance between each unselected term vector and each initial cluster center is calculated by formula (3):

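$$D = \sqrt{\sum_{i=1}^{n} (X_i - Y_i)^2} \qquad \text{formula (3)}$$

Here p is an initial cluster center and q is a term vector, with $p = (X_1, X_2, \ldots, X_n)$ and $q = (Y_1, Y_2, \ldots, Y_n)$; the square root is taken of the sum of the squared component differences, and D is the Euclidean distance between p and q.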
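Formula (3) translates directly into code. The sketch below is illustrative; the function name is hypothetical, and the length check is an added safeguard not mentioned in the text.

```python
import math
from typing import Sequence


def euclidean_distance(p: Sequence[float], q: Sequence[float]) -> float:
    """Euclidean distance between a cluster center p and a term vector q."""
    if len(p) != len(q):
        raise ValueError("p and q must have the same number of components")
    # Square root of the sum of squared component differences.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(p, q)))
```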

For each unselected term vector, according to its Euclidean distances to all initial cluster centers, the term vector is assigned to the initial cluster center with the smallest Euclidean distance to it, forming K temporary clustering clusters.

The mean of each temporary clustering cluster is then calculated. Based on that mean, the Euclidean distance between each term vector in the temporary clustering cluster and the mean is calculated, and the term vector with the smallest Euclidean distance is chosen as the new cluster center of the temporary clustering cluster. Each term vector is then assigned to the new cluster center with the smallest Euclidean distance to it, giving K updated temporary clustering clusters.

The standard deviation of each updated interim clustering cluster is calculated according to formula (4):

Here σ is the standard deviation, $A_t$ is the t-th term vector in the updated clustering cluster, n is the number of term vectors in the updated clustering cluster, μ is the mean of the updated temporary clustering cluster in which the term vector $A_t$ is located, t ∈ [1, m], and t and m are positive integers.
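Under the symbol definitions above, one plausible form of formula (4) is the population standard deviation of the term vectors about the cluster mean. This is a reconstruction under assumptions, not the formula as filed: it assumes division by n rather than n - 1, takes the deviation of a vector from the mean as their Euclidean distance, and assumes the index bound m equals n.

```latex
\sigma = \sqrt{\frac{1}{n} \sum_{t=1}^{n} \lVert A_t - \mu \rVert^{2}}
```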

If at least one of the standard deviations of the K updated temporary clustering clusters is greater than or equal to a preset standard deviation threshold, the method returns to the step of calculating the mean of each temporary clustering cluster and continues to update the K temporary clustering clusters, until the standard deviations of the K updated temporary clustering clusters are all less than the preset standard deviation threshold.

If the standard deviations of the K updated temporary clustering clusters are all less than the preset standard deviation threshold, the cluster centers of the K updated temporary clustering clusters are taken as the cluster analysis result, giving the K clustering clusters corresponding to each part of speech.
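The clustering loop described above can be sketched as follows. This is a simplified illustration under several assumptions that are not in the text: the function and parameter names are hypothetical, the standard deviation is the population form about the cluster mean, an iteration cap (`max_iterations`) is added so the loop always terminates, and initial centers may be passed in explicitly to make a run reproducible.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple

Vector = Tuple[float, ...]


def mean_vector(cluster: Sequence[Vector]) -> Vector:
    """Component-wise mean of the vectors in a cluster."""
    return tuple(sum(component) / len(cluster) for component in zip(*cluster))


def assign(vectors: Sequence[Vector], centers: Sequence[Vector]) -> List[List[Vector]]:
    """Assign each vector to the center with the smallest Euclidean distance."""
    clusters: List[List[Vector]] = [[] for _ in centers]
    for vector in vectors:
        nearest = min(range(len(centers)), key=lambda k: math.dist(vector, centers[k]))
        clusters[nearest].append(vector)
    return clusters


def std_dev(cluster: Sequence[Vector]) -> float:
    """Population standard deviation of the vectors about the cluster mean."""
    mu = mean_vector(cluster)
    return math.sqrt(sum(math.dist(v, mu) ** 2 for v in cluster) / len(cluster))


def cluster_term_vectors(vectors: Sequence[Vector], k: int, std_threshold: float,
                         initial_centers: Optional[Sequence[Vector]] = None,
                         max_iterations: int = 100, seed: Optional[int] = None):
    """Return (centers, clusters) following the update loop described in the text."""
    if initial_centers is None:
        initial_centers = random.Random(seed).sample(list(vectors), k)
    centers = list(initial_centers)
    clusters = assign(vectors, centers)
    for _ in range(max_iterations):
        new_centers = []
        for center, cluster in zip(centers, clusters):
            if not cluster:                # an empty cluster keeps its old center
                new_centers.append(center)
                continue
            mu = mean_vector(cluster)
            # The member closest to the mean becomes the new cluster center.
            new_centers.append(min(cluster, key=lambda v: math.dist(v, mu)))
        centers = new_centers
        clusters = assign(vectors, centers)
        # Stop once every non-empty cluster is below the standard deviation threshold.
        if all(std_dev(c) < std_threshold for c in clusters if c):
            break
    return centers, clusters
```

Because each new center is an actual member of the cluster rather than the mean itself, the centers remain real term vectors, which matches the later use of a cluster central point as a keyword.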

In the embodiment corresponding to Fig. 6, term vectors with the same part of speech are stored in the same category, and the term vectors of each category are clustered using the K-means clustering algorithm according to the Euclidean distance between every two term vectors in the category, giving the multiple clustering clusters corresponding to each part of speech. Clustering participles with the same semantics in evaluations of the application product lets the machine learn the semantic features of such participles, so that, when analyzing evaluation data, it can identify the semantics of each participle from what it has learned, which improves the accuracy with which the semantic analysis model recognizes evaluation data.

It should be understood that the magnitude of the step numbers in the above embodiments does not imply an order of execution. The execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation of the embodiments of the present invention.

In one embodiment, an analysis device for product evaluation is provided, which corresponds one-to-one to the analysis method of product evaluation in the above embodiments. As shown in Fig. 7, the analysis device for product evaluation includes: a data acquisition module 71, a word segmentation processing module 72, a term vector building module 73, a sentence vector building module 74, a weight acquisition module 75, a scoring computing module 76, and a result output module 77.

Detailed description are as follows for each functional module:

Data acquisition module 71, for obtaining the data to be analyzed of the application product;

Word segmentation processing module 72, for performing word segmentation on the data to be analyzed to obtain target participles;

Term vector building module 73, for constructing the target term vectors of the target participles and performing part-of-speech tagging on the target participles to obtain the part of speech of each target term vector;

Sentence vector building module 74, for constructing n substatement vectors and a sentence vector from the target term vectors corresponding to the target participles, according to the order of the target participles in the data to be analyzed, wherein n is a positive integer;

Weight acquisition module 75, for calculating, according to the K clustering clusters corresponding to each part of speech in the preset semantic analysis model and the weight coefficient corresponding to each clustering cluster, and according to the part of speech of the target term vector, the similarity between the target term vector and the clustering clusters, and for taking the weight coefficient of the clustering cluster with the greatest similarity to the target term vector as the weight coefficient of the target term vector, wherein K is a positive integer;

Scoring computing module 76, for calculating the score of each substatement vector according to the weight coefficients of the target term vectors, and calculating the appraisal result of the sentence vector according to the scores of the n substatement vectors;

Result output module 77, for looking up, in the preset score mapping table, the user preference degree corresponding to the appraisal result of the sentence vector, and taking the user preference degree as the analysis result of the data to be analyzed.
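The lookup performed by the result output module can be sketched with a range-based table. The contents of the score mapping table are not given in this section, so the score boundaries and preference labels below are purely hypothetical placeholders, as is the function name.

```python
from bisect import bisect_right

# Hypothetical score mapping table: boundaries split the score axis into ranges,
# and each range maps to one user preference degree.
SCORE_BOUNDARIES = [0.0, 5.0, 10.0]
PREFERENCE_DEGREES = ["dislike", "neutral", "like", "strongly like"]


def user_preference_degree(appraisal_result: float) -> str:
    """Look up the preference degree whose score range contains the appraisal result."""
    return PREFERENCE_DEGREES[bisect_right(SCORE_BOUNDARIES, appraisal_result)]
```

With `bisect_right`, a score equal to a boundary falls into the range above it; a real mapping table would define its own boundary convention.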

Further, scoring computing module 76 includes:

First computing unit 761, for calculating the score of each substatement vector from the weight coefficients of the target term vectors according to the following formula:

Here $D_x$ is the weight coefficient of the x-th target term vector, $|D_x|$ is the absolute value of the weight coefficient of the x-th target term vector, $S_{x-1}$ is the score of the prefix sentence of the target term vector corresponding to $D_x$, $S_x$ is the score of the substatement vector, x is a positive integer greater than 0, and $S_0$ equals 1;

Second computing unit 762, for calculating the appraisal result of the sentence vector from the scores of the n substatement vectors according to the following formula:

Further, the analytical equipment of the product evaluation further include:

First storage module 78, for saving the data to be analyzed in the preset first database if the appraisal result of the data to be analyzed is greater than the preset first threshold;

Second storage module 79, for saving the data to be analyzed in the preset second database if the appraisal result of the data to be analyzed is less than the preset second threshold, wherein the second threshold is less than the first threshold.

Further, the analytical equipment of the product evaluation further include:

Data acquisition module 701, for obtaining the history evaluation data of the application product;

Data word segmentation module 702, for performing word segmentation on the history evaluation data to obtain the participles corresponding to the history evaluation data;

Part-of-speech tagging module 703, for constructing the term vectors of the participles based on the preset corpus, and performing part-of-speech tagging on the participles to obtain the part of speech of each term vector;

Vector clustering module 704, for clustering the term vectors using the preset clustering algorithm according to the part of speech of the term vectors, obtaining the K clustering clusters corresponding to each part of speech;

Model building module 705, for obtaining the cluster central point of each clustering cluster and assigning a corresponding weight coefficient to each clustering cluster, obtaining the preset semantic analysis model after clustering.

Further, vector clusters module 704 includes:

Classification unit 7041, for classifying the term vectors according to their part of speech, storing term vectors with the same part of speech in the same category;

Clustering unit 7042, for clustering the term vectors of each category using the K-means clustering algorithm according to the Euclidean distance between every two term vectors in the category, obtaining the K clustering clusters corresponding to each part of speech.

For the specific limitations of the analysis device for product evaluation, refer to the limitations of the analysis method of product evaluation above; they are not repeated here. Each module in the above analysis device for product evaluation can be implemented wholly or partly in software, hardware, or a combination thereof. The modules can be embedded in, or independent of, the processor of the computer equipment in hardware form, or stored in the memory of the computer equipment in software form, so that the processor can call them and perform the operations corresponding to each module.

In one embodiment, a computer equipment is provided. The computer equipment can be a server, and its internal structure can be as shown in Fig. 8. The computer equipment includes a processor, a memory, a network interface, and a database connected by a system bus. The processor of the computer equipment provides computing and control capability. The memory of the computer equipment includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The network interface of the computer equipment is used to communicate with external terminals over a network connection. The computer program, when executed by the processor, implements an analysis method of product evaluation.

In one embodiment, a computer equipment is provided, including a memory, a processor, and a computer program stored in the memory and runnable on the processor. When executing the computer program, the processor implements the steps of the analysis method of product evaluation in the above embodiments, for example steps S1 to S7 shown in Fig. 2; alternatively, when executing the computer program, the processor implements the functions of each module/unit of the analysis device for product evaluation in the above embodiments, for example the functions of modules 71 to 77 shown in Fig. 7. To avoid repetition, details are not described again here.

In one embodiment, a computer-readable storage medium is provided, on which a computer program is stored. When executed by a processor, the computer program implements the steps of the analysis method of product evaluation in the above embodiments, for example steps S1 to S7 shown in Fig. 2; alternatively, when executed by a processor, the computer program implements the functions of each module/unit of the analysis device for product evaluation in the above embodiments, for example the functions of modules 71 to 77 shown in Fig. 7. To avoid repetition, details are not described again here.

Those of ordinary skill in the art will appreciate that all or part of the processes in the methods of the above embodiments can be completed by a computer program instructing the relevant hardware. The computer program can be stored in a non-volatile computer-readable storage medium and, when executed, may include the processes of the embodiments of the above methods. Any reference to memory, storage, a database, or other media used in the embodiments provided in this application may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).

It will be apparent to those skilled in the art that, for convenience and brevity of description, the division into the functional units and modules above is only an example. In practical applications, the above functions can be allocated to different functional units or modules as needed; that is, the internal structure of the device can be divided into different functional units or modules to complete all or part of the functions described above.

The above embodiments are merely intended to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that the technical solutions described in the foregoing embodiments can still be modified, or some of their technical features can be replaced by equivalents. Such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention, and should all fall within the protection scope of the present invention.

Claims

1. An analysis method of product evaluation, characterized in that the analysis method of product evaluation comprises:

obtaining the data to be analyzed of an application product;

constructing the target term vectors of the target participles, and performing part-of-speech tagging on the target participles to obtain the part of speech of each target term vector;

constructing n substatement vectors from the target term vectors corresponding to the target participles, according to the order of the target participles in the data to be analyzed, and constructing a sentence vector from the n substatement vectors, wherein n is a positive integer;

calculating, according to the K clustering clusters corresponding to each part of speech in a preset semantic analysis model and the weight coefficient corresponding to each clustering cluster, and according to the part of speech of the target term vector, the similarity between the target term vector and the clustering clusters, and taking the weight coefficient of the clustering cluster with the greatest similarity to the target term vector as the weight coefficient of the target term vector, wherein K is a positive integer;

calculating the score of each substatement vector according to the weight coefficients of the target term vectors, and calculating the appraisal result of the sentence vector according to the scores of the n substatement vectors;

looking up, in a preset score mapping table, the user preference degree corresponding to the appraisal result of the sentence vector, and taking the user preference degree as the analysis result of the data to be analyzed.

2. The analysis method of product evaluation according to claim 1, characterized in that calculating the score of each substatement vector according to the weight coefficients of the target term vectors, and calculating the appraisal result of the sentence vector according to the scores of the substatement vectors, comprises:

calculating the score of each substatement vector from the weight coefficients of the target term vectors according to the following formula:

wherein $D_x$ is the weight coefficient of the x-th target term vector, $|D_x|$ is the absolute value of the weight coefficient of the x-th target term vector, $S_{x-1}$ is the score of the prefix sentence of the target term vector corresponding to $D_x$, $S_x$ is the score of the substatement vector, x is a positive integer greater than 0, and $S_0$ equals 1;

calculating the appraisal result of the sentence vector from the scores of the n substatement vectors according to the following formula:

wherein S is the appraisal result of the data to be analyzed, n is the number of substatement vectors, $S_i$ is the score of the i-th substatement vector, i is a positive integer, and i ∈ [1, n].

3. The analysis method of product evaluation according to claim 1, characterized in that, after the user's preference degree for the application product is looked up in the preset score mapping table according to the appraisal result of the data to be analyzed and the user preference degree is taken as the analysis result, the analysis method of product evaluation further comprises:

if the appraisal result of the data to be analyzed is greater than a preset first threshold, saving the data to be analyzed in a preset first database;

if the appraisal result of the data to be analyzed is less than a preset second threshold, saving the data to be analyzed in a preset second database, wherein the second threshold is less than the first threshold.

4. The analysis method of product evaluation according to claim 1, characterized in that, before the data to be analyzed is obtained, the analysis method of product evaluation further comprises:

obtaining the history evaluation data of the application product;

performing word segmentation on the history evaluation data to obtain the participles corresponding to the history evaluation data;

constructing the term vectors of the participles based on a preset corpus, and performing part-of-speech tagging on the participles to obtain the part of speech of each term vector;

clustering the term vectors using a preset clustering algorithm according to the part of speech of the term vectors, obtaining the K clustering clusters corresponding to each part of speech;

obtaining the cluster central point of each clustering cluster, and assigning a corresponding weight coefficient to each clustering cluster, obtaining the preset semantic analysis model after clustering.

5. The analysis method of product evaluation according to claim 4, characterized in that clustering the term vectors using the preset clustering algorithm according to the part of speech of the term vectors, obtaining the K clustering clusters corresponding to each part of speech, comprises:

classifying the term vectors according to their part of speech, storing the term vectors with the same part of speech in the same category;

for the term vectors of each category, performing clustering using the K-means clustering algorithm according to the Euclidean distance between every two term vectors in the category, obtaining the K clustering clusters corresponding to each part of speech.

6. a kind of analytical equipment of product evaluation, which is characterized in that the analytical equipment of the product evaluation includes:

a term vector construction module, configured to construct the target term vectors of the target participles, and to perform part-of-speech tagging on the target participles to obtain the part of speech of each target term vector;

a sentence vector construction module, configured to construct n substatement vectors and a sentence vector from the target term vectors corresponding to the target participles, according to the order of the target participles in the data to be analyzed, wherein n is a positive integer;

a weight acquisition module, configured to calculate, according to the part of speech of each target term vector, the similarity between the target term vector and each of the K clustering clusters corresponding to that part of speech in a preset semantic analysis model, each clustering cluster having a corresponding weight coefficient, and to take the weight coefficient of the clustering cluster having the greatest similarity to the target term vector as the weight coefficient of the target term vector, wherein K is a positive integer;

a score computing module, configured to calculate the score of each substatement vector according to the weight coefficients of the target term vectors, and to calculate the appraisal result of the sentence vector according to the scores of the n substatement vectors;

a result output module, configured to query, according to the appraisal result of the sentence vector, the user preference degree corresponding to the appraisal result from a preset score mapping table, and to take the user preference degree as the analysis result of the data to be analyzed.
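
The behaviour of the weight acquisition module can be sketched as follows. This is only an illustration under stated assumptions: the claim does not name the similarity measure, so cosine similarity against each cluster center is assumed here, and the model layout (a dictionary mapping a part of speech to a list of `(center, weight)` pairs), the function names, and the sample weights are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def lookup_weight(vector, pos, model):
    """Return the weight of the cluster most similar to `vector`.

    Only the clusters stored for the vector's own part of speech are compared.
    """
    clusters = model[pos]  # list of (center, weight) pairs (assumed layout)
    _center, weight = max(
        clusters, key=lambda pair: cosine_similarity(vector, pair[0])
    )
    return weight


# Hypothetical model: K = 2 clusters for adjectives, K = 1 for nouns.
example_model = {
    "adjective": [([1.0, 0.0], 0.8), ([0.0, 1.0], -0.5)],
    "noun": [([1.0, 1.0], 0.1)],
}

print(lookup_weight([0.9, 0.1], "adjective", example_model))  # prints 0.8
```

The weight returned for each target term vector would then feed the score computing module; the scoring formulas themselves are not reproduced in this section, so they are not sketched here.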

7. The analysis device for product evaluation according to claim 6, characterized in that the score computing module comprises:

a first computing unit, configured to calculate the score of each substatement vector from the weight coefficients of the target term vectors according to the following formula:

a second computing unit, configured to calculate the appraisal result of the sentence vector from the scores of the n substatement vectors according to the following formula:

8. The analysis device for product evaluation according to claim 6, characterized in that the analysis device for product evaluation further comprises:

a data acquisition module, configured to obtain the history evaluation data of the application product;

a data word segmentation module, configured to perform word segmentation processing on the history evaluation data to obtain the participles corresponding to the history evaluation data;

a part-of-speech tagging module, configured to construct term vectors of the participles based on a preset corpus, and to perform part-of-speech tagging on the participles to obtain the part of speech of each term vector;

a vector clustering module, configured to cluster the term vectors using a preset clustering algorithm according to the parts of speech of the term vectors, to obtain K clustering clusters corresponding to each part of speech;

a model construction module, configured to obtain the cluster center point of each clustering cluster and to assign a corresponding weight coefficient to each clustering cluster, to obtain the preset semantic analysis model after clustering.

9. Computer equipment, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor, when executing the computer program, implements the analysis method of product evaluation according to any one of claims 1 to 5.

10. A computer-readable storage medium storing a computer program, characterized in that the computer program, when executed by a processor, implements the analysis method of product evaluation according to any one of claims 1 to 5.