CN108776940A

CN108776940A - An intelligent restaurant recommendation algorithm based on text review mining

Info

Publication number: CN108776940A
Application number: CN201810566021.8A
Authority: CN
Inventors: 郎非; 赵志斌; 苗栋晨
Original assignee: Nupt Institute Of Big Data Research At Yancheng Co Ltd
Current assignee: Nupt Institute Of Big Data Research At Yancheng Co Ltd
Priority date: 2018-06-04
Filing date: 2018-06-04
Publication date: 2018-11-09

Abstract

The present invention relates to an intelligent restaurant recommendation algorithm based on text review mining and belongs to the field of computer algorithms. The invention collects users' dining data, builds a user-restaurant rating matrix, and calculates the rating similarity between users; extracts feature-sentiment word pairs; merges features of the same kind and quantifies a score for each feature; builds a user-preference rating matrix and calculates the preference similarity between users; calculates the comprehensive similarity between users; calculates each user's trust value; calculates the degree of trust between users; scores restaurants by a weighted evaluation value based on the degree of trust between users; and recommends the top N restaurants. Using text review mining, the invention formulates a series of extraction rules to form review summaries, uses dependency parsing to extract from those summaries the user's sentiment toward feature attributes of a restaurant such as environment and service, and combines the quantified feature scores with the user's overall restaurant rating to make recommendations. Compared with traditional restaurant recommendation, the accuracy of recommendation is improved.

Description

An intelligent restaurant recommendation algorithm based on text review mining

Technical field

The present invention relates to an intelligent restaurant recommendation algorithm based on text review mining and belongs to the field of computer algorithms.

Background art

A recommender system is an application that recommends to a user information or items the user may like (for example films, music, books, news, and pictures). The recommendation algorithm is the core of a recommender system, and its performance determines the quality of the system; research on recommender systems has therefore consistently focused on the recommendation algorithm. The recommendation algorithms commonly used today are mainly the following: collaborative filtering, content-based recommendation, association-rule-based recommendation, utility-based recommendation, knowledge-based recommendation, and hybrid recommendation. Each of these algorithms has its own strengths and weaknesses and plays an important role in certain fields, but their accuracy for restaurant recommendation leaves room for improvement.

Recommendation based on review text mining is a currently popular approach. Text mining is the application of data mining to Web text, and mining text information has become a mainstream method for personalized recommendation. In the field of intelligent restaurant recommendation, consumers' reviews of restaurants contain a large amount of valuable consumption information, such as spending advice, dining impressions, and service experience, which can accurately characterize a restaurant's features. Traditional restaurant recommender systems do not take these features into account and provide only the consumer's overall rating of the restaurant, which lowers recommendation accuracy. The present invention therefore performs text mining on users' restaurant reviews and uses the mined feature information to improve recommendation accuracy.

Summary of the invention

In view of the above problem, the present invention provides an intelligent restaurant recommendation algorithm based on text review mining.

The present invention adopts the following technical solution:

The steps of the intelligent restaurant recommendation algorithm based on text review mining of the present invention are as follows:

1) Collect users' dining data, whose fields are defined as user U, restaurant G, user rating M, and user review, and build from the collected data the user-restaurant rating matrix given by the following formula;

where M_i,j denotes the rating given by user U_i to restaurant G_j.

The Euclidean distance is then calculated from the user-restaurant rating matrix by the following formula:

where D denotes distance and d() denotes an operator that calculates the Euclidean distance between two vectors;

The rating similarity is calculated from the Euclidean distance obtained above by the following formula:

where D_i,j is the Euclidean distance between users U_i and U_j; M_i,a and M_j,a are the ratings given by users U_i and U_j to restaurant a; A is the number of restaurants rated by both U_i and U_j; and the result is the rating similarity of users U_i and U_j;
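The formulas for this step did not survive in the text. A possible reconstruction of the rating matrix and the Euclidean distance, based only on the symbol definitions above, is sketched here. The matrix name R and the dimensions |U| (number of users) and |G| (number of restaurants) are assumptions. The exact form of the similarity formula cannot be recovered from the text and is not reconstructed.

```latex
R =
\begin{pmatrix}
M_{1,1} & \cdots & M_{1,|G|} \\
\vdots  & \ddots & \vdots    \\
M_{|U|,1} & \cdots & M_{|U|,|G|}
\end{pmatrix},
\qquad
D_{i,j} = d(U_i, U_j) = \sqrt{\sum_{a=1}^{A} \left( M_{i,a} - M_{j,a} \right)^{2}}
```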

2) Perform word segmentation and part-of-speech tagging on the user reviews from step 1); use LTP to perform dependency parsing on the preprocessed review clauses to obtain the dependency relation types between the constituents of each sentence; and formulate summary extraction rules to extract feature-sentiment word pairs from the dependency syntax tree;

A feature-sentiment word pair has the form W=(w_c,w_e), where w_c is the feature attribute word and w_e is the modifier of that feature;

3) Take service, environment, hygiene, and dishes as the features; merge features of the same kind and quantify each feature as follows:

G_j={w_c1:[w_e11,w_e12,…w_e1n]; w_c2:[w_e21,w_e22,…w_e2n]; … w_c4:[w_e41,w_e42,…w_e4n]}

where w_c1, w_c2, w_c3, w_c4 correspond to service, environment, hygiene, and dishes respectively, and w_em1, w_em2 … w_emn are all the modifiers under feature m; a positive sentiment dictionary, a negative sentiment dictionary, a negation word dictionary, and a degree adverb dictionary are built and used to score each feature;

4) From the scores obtained in step 3), build the following user-preference rating matrix:

where each entry denotes the rating given by user U_j to one of the 4 features c_m (service, environment, hygiene, dishes);

The user preference similarity is then calculated by the following formula:

where D_i,j is the Euclidean distance between users U_i and U_j; the two rating terms are the ratings given by users U_i and U_j to feature b; B is the number of features rated by both U_i and U_j; and the result is the preference similarity of users U_i and U_j.

5) Combine, with weights, the user rating similarity obtained in step 1) and the user preference similarity obtained in step 4) by the following formula:

where β is equal to 0.5;

6) Calculate the user activity and the user's effective review rate separately; the user activity is obtained by the following formula:

where H_U is the activity of the user, e is the base of the natural logarithm, n is the number of rating operations, t_i is the duration, and υ is the activity increment;

Whether a review is an effective review is judged by the following formula:

where N_favour is the number of approvals of the review and N_against is the number of objections to the review;

The user's effective review rate is then calculated by the following formula:

where E_U is the user's effective review rate, N_effective is the number of effective reviews, and N_comment is the total number of the user's reviews;

7) From the user activity and the user's effective review rate obtained in step 6), calculate the user trust value by the following formula:

T_U=H_U+E_U

The higher the user activity H_U, the higher the user trust value T_U; the higher the user's effective review rate E_U, the higher the user trust value T_U;

8) Calculate the degree of trust between users from the user trust value:

TD_i,j=Sim_i,j×T_j

where TD_i,j denotes the degree of trust of user U_i in U_j, Sim_i,j is the comprehensive similarity of users U_i and U_j, and T_j is the trust value of user U_j. The higher the trust value of user U_j, the higher the degree of trust of U_i in U_j;

9) Based on the degree of trust between users from step 8), score each restaurant by a weighted evaluation value using the following formula:

where Score_i,g denotes the weighted evaluation value of user U_i for restaurant G_g, TD_i,j denotes the degree of trust of user U_i in U_j, M_j,g is the rating given by user U_j to restaurant G_g, and U is the set of users;

10) After scoring all restaurants the user has not rated, sort them from highest to lowest score and recommend the top N restaurants.

In the intelligent restaurant recommendation algorithm based on text review mining of the present invention, word segmentation and part-of-speech tagging are performed on the review data from step 1, LTP is used to perform dependency parsing on the preprocessed review clauses to obtain the dependency relation types between the constituents of each sentence, and extraction rules are formulated to extract feature-sentiment word pairs.

In the intelligent restaurant recommendation algorithm based on text review mining of the present invention, the scoring rules for the sentiment dictionaries built are as follows:

(1) Each positive sentiment word is assigned a weight of 1 and each negative sentiment word a weight of -1, and sentiment values are assumed to obey linear superposition;

(2) If a modifier under a feature contains a word from a dictionary, the corresponding weight is added; a negation word reverses the sign of the weight, and a degree adverb doubles the weight;

(3) If the total weight is positive, the sentiment is positive; if the total weight is negative, it is negative; otherwise it is neutral. Features are scored on a five-point scale: positive is 5 points, negative is 1 point, and neutral is 3 points;

These are the quantified results for the features service, environment, hygiene, and dishes, and are used to build the subsequent user-preference rating matrix.

Advantageous effects

Existing restaurant recommender systems consider only the overall rating of a restaurant and ignore attribute features such as its dishes, environment, and service quality, which leads to poor recommendations. To address this problem, the present invention proposes an intelligent restaurant recommendation algorithm based on text review mining. Using text review mining, the invention formulates a series of extraction rules to form review summaries, uses dependency parsing to extract from those summaries the user's sentiment toward feature attributes of a restaurant such as environment and service, and combines the quantified feature scores with the user's overall restaurant rating to make recommendations. Compared with traditional restaurant recommendation, the accuracy of recommendation is improved.

Detailed description of the embodiments

To make the purpose and technical solution of the embodiments of the present invention clearer, the technical solution is described below clearly and completely. The described embodiments are evidently only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art from the described embodiments without creative effort fall within the scope of protection of the present invention.

The intelligent restaurant recommendation algorithm of the present invention based on text review mining proceeds as follows.

Step 1: Crawl users' dining data, build the user-restaurant rating matrix, and calculate the rating similarity between users. The dining data consists of user U, restaurant G, user rating M, and user review; these fields can be crawled from restaurant review websites.

The user-restaurant matrix is as follows:

where M_i,j denotes the rating given by user U_i to restaurant G_j.

where D_i,j is the Euclidean distance between users U_i and U_j; M_i,a and M_j,a are the ratings given by users U_i and U_j to restaurant a; A is the number of restaurants rated by both U_i and U_j; the result is the rating similarity of users U_i and U_j; D denotes distance; and d() denotes an operator that here calculates the Euclidean distance between two vectors.
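A minimal Python sketch of Step 1 follows, assuming each user's ratings are held in a dictionary keyed by restaurant. The distance is taken over the co-rated restaurants only, as the definition of A suggests. The mapping from distance to similarity, 1/(1+D), is an assumption chosen for illustration because the similarity formula is not available in the text; the function names are likewise hypothetical.

```python
import math


def euclidean_distance(ratings_i, ratings_j):
    """Return (D, A): the Euclidean distance over co-rated restaurants
    and the number A of restaurants rated by both users."""
    common = set(ratings_i) & set(ratings_j)
    d = math.sqrt(sum((ratings_i[a] - ratings_j[a]) ** 2 for a in common))
    return d, len(common)


def rating_similarity(ratings_i, ratings_j):
    """Map the distance to a similarity in (0, 1].

    ASSUMPTION: 1 / (1 + D) is a common choice, not the patent's formula.
    ASSUMPTION: users with no co-rated restaurant get similarity 0.
    """
    d, a = euclidean_distance(ratings_i, ratings_j)
    if a == 0:
        return 0.0
    return 1.0 / (1.0 + d)


u1 = {"g1": 5, "g2": 3, "g3": 4}
u2 = {"g1": 4, "g2": 3, "g4": 2}
print(rating_similarity(u1, u2))
```

The same two functions could be reused for the preference similarity of Step 4 by passing per-feature scores instead of per-restaurant ratings.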

Step 2: Perform word segmentation and part-of-speech tagging on the user reviews, use LTP (Language Technology Platform) to perform dependency parsing on the preprocessed review clauses to obtain the dependency relation types between the constituents of each sentence, and formulate summary extraction rules to extract feature-sentiment word pairs from the dependency syntax tree. A feature-sentiment word pair has the form W=(w_c,w_e), where w_c is the feature attribute word and w_e is the modifier of that feature.

The feature-sentiment word pairs are extracted as follows: word segmentation and part-of-speech tagging are performed on the restaurant review data, which can be done with tools such as jieba. LTP is then used to perform dependency parsing on the preprocessed review clauses to obtain the dependency relation types between the constituents of each sentence, and summary extraction rules are formulated to extract feature-sentiment word pairs from the dependency syntax tree. The specific extraction rules are as follows:

(1) The part of speech of the core word is adjective:

Step1: Store the core word in the sentiment word list;

Step2: Traverse all words whose grammatical relations originate from the core word;

Step3: If the grammatical relation between the word and the core word is COO (coordination) and its part of speech is a (adjective), the word is a sentiment word and is stored in the sentiment word list;

Step4: If the grammatical relation between the word and the core word is ADV (adverbial), the word is an adverb and is stored in the adverb list;

Step5: If the grammatical relation between the word and the core word is SBV (subject-verb) and its part of speech is n (noun), the word is an attribute feature word and is stored in the attribute feature word list;

Step6: Traverse all words whose grammatical relations originate from the feature word;

Step7: If the grammatical relation between the word and the feature word is COO (coordination), the word is also an attribute feature word and is stored in the attribute feature word list.

(2) The part of speech of the core word is idiom:

Step1: Store the word in the sentiment word list;

Step2: Traverse all words whose grammatical relations originate from the sentiment word;

Step3: If the grammatical relation between the word and the sentiment word is SBV (subject-verb) and its part of speech is n (noun), the word is an attribute feature word and is stored in the attribute feature word list.

If the part of speech of the core word is verb, the sentence patterns are more complex and varied, so the analysis is divided into 4 cases:

(a) The core word is "like" or "love":

Step1: Store the core word in the verb list;

Step3: If the grammatical relation between the word and the core word is ADV (adverbial), the word is an adverb and is stored in adverb list L1;

Step4: If the grammatical relation between the word and the core word is VOB (verb-object) and its part of speech is n (noun), the word is an attribute feature word and is stored in the attribute feature word list.

(b) The core word is "is", "is exactly", or either of two verbs meaning "feel":

Step1: Traverse all words whose grammatical relations originate from the core word;

Step2: If the grammatical relation between the word and the core word is VOB (verb-object) and its part of speech is a (adjective), the word is a sentiment word and is stored in the sentiment word list;

Step3: Traverse all words whose grammatical relations originate from the sentiment word;

Step4: If the relation between the word and the sentiment word is ADV (adverbial), the word is an adverb and is stored in adverb list L2;

Step5: If the relation between the word and the sentiment word is SBV (subject-verb) and its part of speech is n (noun), the word is an attribute feature word and is stored in the attribute feature word list.

(c) The core word is "serve", "away from", "eat", or "drink":

Step1: Store the core word in the attribute feature word list; if the core word is "away from", save "distance" to the attribute feature word list;

Step3: If the relation between the word and the core word is CMP (complement) and its part of speech is a (adjective), the word is a sentiment word and is stored in the sentiment word list;

Step4: Traverse all words whose grammatical relations originate from the sentiment word;

Step5: If the relation between the word and the sentiment word is ADV (adverbial), the word is an adverb and is stored in adverb list L2.

(d) The core word is "need":

Step1: Store the core word in the verb list;

Step3: If the relation between the word and the core word is SBV (subject-verb), the word is an attribute feature word and is stored in the attribute feature word list;

Step4: If the relation between the word and the core word is ADV (adverbial), the word is an adverb and is stored in adverb list L1;

Step5: If the relation between the word and the core word is VOB (verb-object) and its part of speech is v (verb), the word is a sentiment word and is stored in the sentiment word list.
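Rule (1) above (adjective core word) can be sketched in Python as follows. The `Token` record and the hand-written parse are hypothetical stand-ins for the output of a dependency parser; no LTP API is called here. Marking the core word with `head == 0` and the relation label `HED` is an assumption of this sketch, and English glosses replace the Chinese words.

```python
from dataclasses import dataclass


@dataclass
class Token:
    idx: int   # 1-based position in the clause
    word: str
    pos: str   # part-of-speech tag, e.g. "a" (adjective), "n" (noun)
    head: int  # idx of the governing token; 0 marks the core word (assumption)
    rel: str   # dependency relation to the head, e.g. "SBV", "ADV", "COO"


def children(tokens, head_idx):
    """All tokens whose grammatical relation originates from head_idx."""
    return [t for t in tokens if t.head == head_idx]


def extract_rule_adjective_core(tokens):
    """Apply rule (1); return None if the core word is not an adjective."""
    core = next(t for t in tokens if t.head == 0)
    if core.pos != "a":
        return None
    sentiments, adverbs, features = [core.word], [], []   # Step1
    for t in children(tokens, core.idx):                  # Step2
        if t.rel == "COO" and t.pos == "a":               # Step3
            sentiments.append(t.word)
        elif t.rel == "ADV":                              # Step4
            adverbs.append(t.word)
        elif t.rel == "SBV" and t.pos == "n":             # Step5
            features.append(t.word)
            for c in children(tokens, t.idx):             # Step6
                if c.rel == "COO":                        # Step7
                    features.append(c.word)
    return {"features": features, "sentiments": sentiments, "adverbs": adverbs}


# Hand-written parse of a clause glossed as "environment and service very good".
clause = [
    Token(1, "environment", "n", 5, "SBV"),
    Token(2, "and", "c", 3, "LAD"),
    Token(3, "service", "n", 1, "COO"),
    Token(4, "very", "d", 5, "ADV"),
    Token(5, "good", "a", 0, "HED"),
]
print(extract_rule_adjective_core(clause))
```

Each extracted feature word can then be paired with the sentiment words to form the pairs W=(w_c,w_e). The other rules follow the same traversal pattern with different relation labels.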

Step 3: Merge features of the same kind and quantify the score of each feature. Extracted features that belong to the same kind are merged, so that the features a user cares about are represented as:

G_j={w_c1:[w_e11,w_e12,…w_e1n]; w_c2:[w_e21,w_e22,…w_e2n]; … w_c4:[w_e41,w_e42,…w_e4n]}

where w_c1, w_c2, w_c3, w_c4 are the 4 features, corresponding to service, environment, hygiene, and dishes, and w_em1, w_em2 … w_emn are all the modifiers under feature m.
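The merge into G_j can be illustrated with a short Python sketch. The table `CATEGORY_OF`, which maps an extracted feature word to one of the four features, is hypothetical: the text does not say how words are assigned to a feature, so a hand-made lookup is assumed here, and words outside the table are simply dropped.

```python
FEATURES = ("service", "environment", "hygiene", "dishes")

# ASSUMPTION: a hand-made lookup from feature words to the four features.
CATEGORY_OF = {
    "service": "service",
    "waiter": "service",
    "environment": "environment",
    "decor": "environment",
    "cleanliness": "hygiene",
    "dishes": "dishes",
    "taste": "dishes",
}


def merge_features(pairs):
    """Group modifiers w_e under the feature that their w_c belongs to."""
    merged = {feature: [] for feature in FEATURES}
    for feature_word, modifier in pairs:
        category = CATEGORY_OF.get(feature_word)
        if category is not None:
            merged[category].append(modifier)
    return merged


pairs = [("waiter", "friendly"), ("taste", "good"),
         ("service", "slow"), ("parking", "hard")]
print(merge_features(pairs))
```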

Sentiment dictionaries (a positive sentiment dictionary, a negative sentiment dictionary, a negation word dictionary, and a degree adverb dictionary) are built to score each feature. The scoring rules are as follows:

(1) Each positive sentiment word is assigned a weight of 1 and each negative sentiment word a weight of -1, and sentiment values are assumed to obey linear superposition;

(2) If a modifier under a feature contains a word from a dictionary, the corresponding weight is added. In addition, a negation word reverses the sign of the weight, and a degree adverb doubles the weight;

(3) If the total weight is positive, the sentiment is positive; if the total weight is negative, it is negative; otherwise it is neutral. Features are scored on a five-point scale: positive is 5 points, negative is 1 point, and neutral is 3 points.
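The three scoring rules can be sketched in Python as follows. The tiny English dictionaries are placeholders for the four sentiment dictionaries. Representing each modifier as a list of words, and applying negation and degree adverbs to the whole modifier, are assumptions of this sketch.

```python
# Placeholder dictionaries (assumptions); the real ones would be far larger.
POSITIVE = {"good", "friendly", "clean"}
NEGATIVE = {"bad", "slow", "dirty"}
NEGATION = {"not"}
DEGREE = {"very", "extremely"}


def modifier_weight(words):
    """Weight of one modifier: +1 / -1 per sentiment word (rule 1),
    sign flipped per negation word, doubled per degree adverb (rule 2)."""
    weight = sum((w in POSITIVE) - (w in NEGATIVE) for w in words)
    for w in words:
        if w in NEGATION:
            weight = -weight
        elif w in DEGREE:
            weight *= 2
    return weight


def feature_score(modifiers):
    """Sum the modifier weights (linear superposition) and map the
    total to the five-point scale of rule 3."""
    total = sum(modifier_weight(m) for m in modifiers)
    if total > 0:
        return 5
    if total < 0:
        return 1
    return 3


print(feature_score([["very", "good"], ["slow"]]))
```

Applying `feature_score` to each of the four lists in G_j gives one row of the user-preference rating matrix.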

Step 4: Build the user-preference rating matrix and calculate the preference similarity between users. The scores of the features a user cares about are the user's preference ratings. The user-preference rating matrix is as follows:

where each entry denotes the rating given by user U_j to one of the 4 features c_m (service, environment, hygiene, dishes).

Calculate the preference similarity between users: the similarity is calculated in the same way as in Step 1, giving the user preference similarity by the following formula:

where D_i,j is the Euclidean distance between users U_i and U_j; the two rating terms are the ratings given by users U_i and U_j to feature b; B is the number of features rated by both U_i and U_j; and the result is the preference similarity of users U_i and U_j.

Step 5: Calculate the comprehensive similarity between users. The user rating similarity and the user preference similarity are combined using appropriate weights, by the following formula:

where β is equal to 0.5.
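The weighted combination most likely takes the usual convex form sketched below. This is an assumption: the symbols Sim^M (rating similarity) and Sim^P (preference similarity) are introduced here for illustration, and which of the two terms carries β cannot be determined from the text. With β = 0.5 the two placements coincide.

```latex
Sim_{i,j} = \beta \, Sim^{M}_{i,j} + (1 - \beta) \, Sim^{P}_{i,j}, \qquad \beta = 0.5
```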

Step 6: Calculate the user trust value from the user activity and the user's effective review rate. The user activity is calculated as follows: each time the user performs a rating operation at a restaurant, the activity increases by υ; at most one operation per day is counted for the same restaurant, while operations at different restaurants accumulate. A rating operation requires the user's location. While the account stays active, activity grows linearly; once the account's rating operations decrease, activity declines as a function of the elapsed time t. The formula is:

where H_U is the activity of the user, e is the base of the natural logarithm, n is the number of rating operations, and t_i is the duration.

The user's effective review rate is calculated as follows: a user's restaurant reviews are visible to other users, who can evaluate each review by approving (favour) or opposing (against) it. If a review satisfies the validity condition on N_favour and N_against (N_favour is the number of approvals of the review and N_against is the number of objections), the review is an effective review. The effective review rate is calculated as:

where E_U is the user's effective review rate, N_effective is the number of effective reviews, and N_comment is the total number of the user's reviews.
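A minimal Python sketch of the effective review rate follows, reading E_U as the ratio of effective reviews to all reviews of the user, as the definitions of N_effective and N_comment suggest. The validity test `n_favour > n_against` is an assumption, since the actual condition is not available in the text, and both function names are hypothetical.

```python
def is_effective(n_favour, n_against):
    """ASSUMPTION: a review counts as effective when approvals outnumber
    objections; the patent's actual condition is not available."""
    return n_favour > n_against


def effective_rate(votes):
    """votes: list of (N_favour, N_against) pairs, one per review.
    Returns E_U = N_effective / N_comment, or 0.0 for a user with no reviews."""
    if not votes:
        return 0.0
    n_effective = sum(1 for favour, against in votes if is_effective(favour, against))
    return n_effective / len(votes)


print(effective_rate([(10, 2), (1, 5), (3, 3), (7, 0)]))
```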

The user trust value is calculated from the user activity and the user's effective review rate by the formula:

T_U=H_U+E_U

The higher the user activity H_U, the higher the user trust value T_U; the higher the user's effective review rate E_U, the higher the user trust value T_U.

Step 7: Calculate the degree of trust between users. The formula is as follows:

TD_i,j=Sim_i,j×T_j

where TD_i,j denotes the degree of trust of user U_i in U_j, Sim_i,j is the comprehensive similarity of users U_i and U_j, and T_j is the trust value of user U_j. The higher the trust value of user U_j, the higher the degree of trust of U_i in U_j.

Step 8: Score restaurants by a weighted evaluation value based on the degree of trust between users. The scoring formula is as follows:

where Score_i,g denotes the weighted evaluation value of user U_i for restaurant G_g, TD_i,j denotes the degree of trust of user U_i in U_j, M_j,g is the rating given by user U_j to restaurant G_g, and U is the set of users.

Step 9: Recommend the top N restaurants. After all restaurants the user has not rated have been scored, sort them by score from highest to lowest and recommend the top N.
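Steps 7 to 9 can be sketched together in Python. `trust_degree` follows the stated formula TD_i,j=Sim_i,j×T_j. The scoring formula is not available in the text, so the trust-weighted average used here (sum of TD_i,j×M_j,g divided by the sum of TD_i,j over the users who rated G_g) is an assumption, as are the data layout and function names.

```python
def trust_degree(sim_ij, trust_j):
    """TD_i,j = Sim_i,j x T_j, as stated in Step 7."""
    return sim_ij * trust_j


def recommend(user, ratings, sim, trust, top_n):
    """ratings: {user: {restaurant: rating}}; sim: {(i, j): Sim_i,j};
    trust: {user: T_U}. Returns the top_n (restaurant, score) pairs
    among restaurants that `user` has not rated.

    ASSUMPTION: Score_i,g is a TD-weighted average of other users' ratings.
    """
    rated = set(ratings[user])
    candidates = {g for u, r in ratings.items() if u != user for g in r} - rated
    scores = {}
    for g in candidates:
        numerator = denominator = 0.0
        for j, r in ratings.items():
            if j == user or g not in r:
                continue
            td = trust_degree(sim[(user, j)], trust[j])
            numerator += td * r[g]
            denominator += td
        if denominator > 0:
            scores[g] = numerator / denominator
    # Sort from highest to lowest score; ties broken by restaurant name.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_n]


ratings = {
    "u1": {"g1": 5},
    "u2": {"g1": 4, "g2": 5, "g3": 2},
    "u3": {"g2": 3, "g3": 4},
}
sim = {("u1", "u2"): 0.5, ("u1", "u3"): 0.25}
trust = {"u2": 1.0, "u3": 2.0}  # T_U = H_U + E_U, precomputed
print(recommend("u1", ratings, sim, trust, top_n=2))
```

In this toy data both neighbours end up with the same degree of trust (0.5), so each score is the plain mean of their ratings.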

The foregoing is only a preferred embodiment of the present invention, but the scope of protection of the present invention is not limited to it. Any change or replacement that a person skilled in the art could readily conceive within the technical scope disclosed by the present invention shall be covered by the scope of protection of the present invention. The scope of protection of the present invention shall therefore be determined by the scope of the claims.

Claims

1. An intelligent restaurant recommendation algorithm based on text review mining, characterized in that the algorithm steps are as follows:

1) Collect users' dining data, whose fields are defined as user U, restaurant G, user rating M, and user review, and build from the collected data the user-restaurant rating matrix given by the following formula,

where M_i,j denotes the rating given by user U_i to restaurant G_j;

where D_i,j is the Euclidean distance between users U_i and U_j; M_i,a and M_j,a are the ratings given by users U_i and U_j to restaurant a; A is the number of restaurants rated by both U_i and U_j; and the result is the rating similarity of users U_i and U_j;

2) Perform word segmentation and part-of-speech tagging on the user reviews from step 1); use LTP to perform dependency parsing on the preprocessed review clauses to obtain the dependency relation types between the constituents of each sentence; and formulate summary extraction rules to extract feature-sentiment word pairs from the dependency syntax tree;

3) Take service, environment, hygiene, and dishes as the features; merge features of the same kind and quantify each feature as follows:

G_j={w_c1:[w_e11,w_e12,…w_e1n]; w_c2:[w_e21,w_e22,…w_e2n]; … w_c4:[w_e41,w_e42,…w_e4n]}, where w_c1, w_c2, w_c3, w_c4 correspond to service, environment, hygiene, and dishes respectively, and w_em1, w_em2 … w_emn are all the modifiers under feature m; a positive sentiment dictionary, a negative sentiment dictionary, a negation word dictionary, and a degree adverb dictionary are built and used to score each feature;

The user preference similarity is then calculated by the following formula:

5) Combine, with weights, the user rating similarity obtained in step 1) and the user preference similarity obtained in step 4) by the following formula:

where β is equal to 0.5;

where H_U is the activity of the user, e is the base of the natural logarithm, n is the number of rating operations, t_i is the duration, and υ is the activity increment;

Whether a review is an effective review is judged by the following formula,

T_U=H_U+E_U

The higher the user activity H_U, the higher the user trust value T_U; the higher the user's effective review rate E_U, the higher the user trust value T_U;

TD_i,j=Sim_i,j×T_j

where TD_i,j denotes the degree of trust of user U_i in U_j, Sim_i,j is the comprehensive similarity of users U_i and U_j, and T_j is the trust value of user U_j. The higher the trust value of user U_j, the higher the degree of trust of U_i in U_j;

2. The intelligent restaurant recommendation algorithm based on text review mining according to claim 1, characterized in that: in step 1, word segmentation and part-of-speech tagging are performed on the review data, LTP is used to perform dependency parsing on the preprocessed review clauses to obtain the dependency relation types between the constituents of each sentence, and extraction rules are formulated to extract feature-sentiment word pairs.

3. The intelligent restaurant recommendation algorithm based on text review mining according to claim 1, characterized in that the scoring rules for the sentiment dictionaries built are as follows:

(1) Each positive sentiment word is assigned a weight of 1 and each negative sentiment word a weight of -1, and sentiment values are assumed to obey linear superposition;

(2) If a modifier under a feature contains a word from a dictionary, the corresponding weight is added; a negation word reverses the sign of the weight, and a degree adverb doubles the weight;

(3) If the total weight is positive, the sentiment is positive; if the total weight is negative, it is negative; otherwise it is neutral. Features are scored on a five-point scale: positive is 5 points, negative is 1 point, and neutral is 3 points;

These are the quantified results for the features service, environment, hygiene, and dishes, and are used to build the subsequent user-preference rating matrix.