CN105005552A - Information processing method and apparatus - Google Patents

Publication number: CN105005552A
Authority: CN (China)
Prior art keywords: clause, training sample, mark, sentiment orientation
Legal status: Granted; Active (the legal status is an assumption and is not a legal conclusion)
Application number: CN201410162861.XA
Priority: CN201410162861.XA
Other languages: Chinese (zh)
Other versions: CN105005552B (en)
Inventors: 杨海军, 安涛, 安华明, 叶强, 赵月
Current assignee: Navinfo Co Ltd (the listed assignees may be inaccurate)
Original assignee: Navinfo Co Ltd

Abstract

The invention provides an information processing method and apparatus and relates to the field of information processing. The method comprises the following steps: preliminarily screening online comment data of a product according to a preset criterion; splitting the online comment data retained after screening into clauses, using symbols as split points, to establish a clause set; randomly extracting a preset number of clauses from the clause set to establish a training sample set, labeling the sentiment orientation of the clauses in the training sample set, and labeling all clauses in the clause set according to the labels of the training sample set; and deleting the clauses labeled with a first value from the clause set and combining the remaining clauses in a preset manner to obtain a recommendation reason. The scheme solves the problem that conventionally generated recommendation reasons fail to reflect the actual value of user comment data, express only weak sentiment, and do not attract users well.

Description

Information processing method and apparatus
Technical field
The present invention relates to the field of information processing, and in particular to an information processing method and apparatus.
Background
With the innovation and development of electronic information technology, the network has become an indispensable technical service platform, and network-based tools and products emerge in an endless stream. To increase the credibility of products in e-commerce and to improve product service, online word-of-mouth systems built on user interaction design and user comments have gradually been introduced. Users' online word-of-mouth largely reflects their real experience and objective needs; compared with a merchant's subjective comments and feature descriptions, word-of-mouth data more readily convinces users.
At present, applications of online word-of-mouth focus on quantitative analysis of the data. Conventionally, online comments are extracted directly as recommendation reasons, which fails to reflect the actual value of user comment data; the sentiment expressed by the generated recommendation reason is weak and does not attract users well.
Summary of the invention
The technical problem to be solved by the present invention is to provide an information processing method and apparatus such that the sentiment expressed by a recommendation reason is more authentic and attracts users well.
To achieve the above object, embodiments of the present invention provide an information processing method comprising the following steps:
Preliminarily screening the online comment data of a product according to a preset standard;
Splitting the online comment data retained after screening into clauses, using symbols as split points, to establish a clause set;
Randomly extracting a preset number of clauses from the clause set to establish a training sample set, labeling the sentiment orientation of the clauses in the training sample set, and labeling all clauses in the clause set according to the labels of the training sample set;
Deleting the clauses labeled with a first value from the clause set, and combining the remaining clauses in the clause set in a preset manner to obtain a recommendation reason.
Wherein the step of randomly extracting a preset number of clauses from the clause set to establish a training sample set, labeling the sentiment orientation of the clauses in the training sample set, and labeling all clauses in the clause set according to the labels of the training sample set is specifically:
Randomly extracting a preset number of clauses from the clause set for each of a test sample set and a training sample set;
Obtaining the result of a first labeling of the clauses in the test sample set and the training sample set, and performing a second labeling of the clauses in the test sample set according to the result of the first labeling of the training sample set;
Comparing the results of the two labelings of the test sample set to obtain the error rate of the second labeling of the test sample set; if the error rate is less than a preset threshold, labeling all clauses in the clause set according to the current labeling result of the training sample set; if the error rate is greater than the preset threshold, correcting the training sample set until the error rate is less than the preset threshold.
Wherein the step of obtaining the result of the first labeling of the clauses in the test sample set and the training sample set, and performing the second labeling of the clauses in the test sample set according to the result of the first labeling of the training sample set, comprises:
After obtaining the result of the first labeling of the clauses in the test sample set and the training sample set, dividing the clauses in the training sample set, according to their labels, into sets of different sentiment orientation;
Selecting the clauses in the test sample set one by one, and obtaining, by the following formulas, the probability P(T) of the currently selected clause with respect to each of the sets of different sentiment orientation in the training sample set:
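$$P(T) = P(w_1 w_2 w_3 \dots w_n) = P(w_1)\,P(w_2 \mid w_1)\,P(w_3 \mid w_1 w_2) \dots P(w_n \mid w_1 w_2 \dots w_{n-1}) \approx P(w_1)\,P(w_2 \mid w_1)\,P(w_3 \mid w_1 w_2) \dots P(w_n \mid w_{n-3} w_{n-2} w_{n-1});$$
$$P(w_n \mid w_1 w_2 \dots w_{n-1}) = C(w_1 w_2 \dots w_n) / C(w_1 w_2 \dots w_{n-1});$$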
Wherein T denotes a clause in the test sample set, w_n denotes the n-th word in clause T, the probabilities are calculated by the maximum likelihood method, and C(w_{n-i-1} w_{n-i}) denotes the number of times that word sequence occurs in each of the sets of different sentiment orientation in the training sample set;
Comparing the probabilities of the currently selected clause with respect to the sets of different sentiment orientation in the training sample set, and labeling the current clause with the sentiment orientation of the set that yields the maximum probability.
Wherein the step of labeling all clauses in the clause set according to the current labeling result of the training sample set comprises:
Dividing the clauses in the training sample set, according to the current labeling result of the training sample set, into sets of different sentiment orientation;
Selecting the clauses in the clause set one by one, and obtaining, by the following formulas, the probability P(t) of the currently selected clause with respect to each of the sets of different sentiment orientation in the training sample set:
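$$P(t) = P(w_1 w_2 w_3 \dots w_n) = P(w_1)\,P(w_2 \mid w_1)\,P(w_3 \mid w_1 w_2) \dots P(w_n \mid w_1 w_2 \dots w_{n-1}) \approx P(w_1)\,P(w_2 \mid w_1)\,P(w_3 \mid w_1 w_2) \dots P(w_n \mid w_{n-3} w_{n-2} w_{n-1});$$
$$P(w_n \mid w_1 w_2 \dots w_{n-1}) = C(w_1 w_2 \dots w_n) / C(w_1 w_2 \dots w_{n-1});$$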
Wherein t denotes a clause in the clause set, w_n denotes the n-th word in clause t, the probabilities are calculated by the maximum likelihood method, and C(w_{n-i-1} w_{n-i}) denotes the number of times that word sequence occurs in each of the sets of different sentiment orientation in the training sample set;
Comparing the probabilities of the currently selected clause with respect to the sets of different sentiment orientation in the training sample set, and labeling the current clause with the sentiment orientation of the set that yields the maximum probability.
Wherein the step of correcting the training sample set comprises:
Extracting the clauses in the test sample set whose two labels differ;
After obtaining the result of a third labeling of those clauses, adding the labeled clauses to the training sample set;
Performing a fourth labeling of the clauses in the test sample set according to the re-labeled training sample set;
Comparing the results of the two labelings of the test sample set to obtain the error rate of the fourth labeling of the test sample set; if the error rate is less than the preset threshold, labeling all clauses in the clause set according to the labels of the training sample set; if the error rate is greater than the preset threshold, continuing to correct the training sample set until the error rate is less than the preset threshold.
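By way of illustration only, the correction steps above can be sketched in Python as follows. The names `correct_training_set`, `label_with` and `relabel` are hypothetical: `label_with` stands in for whatever labeling procedure is based on the training sample set, and `relabel` stands in for the third labeling of a differing clause. The `max_rounds` guard is an added assumption so that the sketch always terminates, and the case where the error rate exactly equals the threshold, which the text leaves open, is treated here as requiring correction.

```python
def correct_training_set(train_set, test_set, label_with, relabel,
                         threshold=0.05, max_rounds=10):
    """Grow the training set until the test-set error rate drops below threshold.

    train_set:  list of (clause, label) pairs
    test_set:   list of (clause, first_label) pairs
    label_with: function (train_set, clause) -> label based on the training set
    relabel:    function (clause) -> label, standing in for the third labeling
    """
    train_set = list(train_set)  # work on a copy
    for _ in range(max_rounds):
        # Label the test clauses again using the current training set.
        predicted = [label_with(train_set, clause) for clause, _ in test_set]
        # Extract the test clauses whose two labels differ.
        differing = [clause for (clause, first), second in zip(test_set, predicted)
                     if first != second]
        rate = len(differing) / len(test_set)
        if rate < threshold:
            return train_set, rate
        # Re-label the differing clauses and add them to the training set.
        train_set.extend((clause, relabel(clause)) for clause in differing)
    raise RuntimeError("error rate did not fall below the threshold")


# Toy stand-ins: look the clause up in the training set, defaulting to neutral.
def lookup_label(train_set, clause):
    return dict(train_set).get(clause, 0)

test_set = [("good", 1), ("bad", -1), ("ok", 0)]
first_labels = dict(test_set)
corrected, rate = correct_training_set(
    [("good", 1)], test_set, lookup_label, first_labels.__getitem__)
```

In this toy run the first round mislabels "bad", which is then re-labeled and added to the training set, so the second round reaches an error rate of zero.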
Embodiments of the present invention further provide an information processing apparatus, comprising:
A comment preliminary screening module, configured to preliminarily screen the online comment data of a product according to a preset standard;
A clause splitting module, configured to split the online comment data retained after screening into clauses, using symbols as split points, to establish a clause set;
A sentiment classification module, configured to randomly extract a preset number of clauses from the clause set to establish a training sample set, label the sentiment orientation of the clauses in the training sample set, and label all clauses in the clause set according to the labels of the training sample set;
A recommendation reason combination module, configured to delete the clauses labeled with a first value from the clause set, and combine the remaining clauses in the clause set in a preset manner to obtain a recommendation reason.
Wherein the sentiment classification module comprises:
A sample set establishing submodule, configured to randomly extract a preset number of clauses from the clause set for each of a test sample set and a training sample set;
A sentiment labeling submodule, configured to obtain the result of a first labeling of the clauses in the test sample set and the training sample set, and perform a second labeling of the clauses in the test sample set according to the result of the first labeling of the training sample set;
A training sample set optimization submodule, configured to compare the results of the two labelings of the test sample set to obtain the error rate of the second labeling of the test sample set; if the error rate is less than a preset threshold, label all clauses in the clause set according to the current labeling result of the training sample set; if the error rate is greater than the preset threshold, correct the training sample set until the error rate is less than the preset threshold.
Wherein the sentiment labeling submodule comprises:
A first sentiment classification unit, configured to, after the result of the first labeling of the clauses in the test sample set and the training sample set is obtained, divide the clauses in the training sample set, according to their labels, into sets of different sentiment orientation;
A first probability acquiring unit, configured to select the clauses in the test sample set one by one and obtain, by the following formulas, the probability P(T) of the currently selected clause with respect to each of the sets of different sentiment orientation in the training sample set:
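$$P(T) = P(w_1 w_2 w_3 \dots w_n) = P(w_1)\,P(w_2 \mid w_1)\,P(w_3 \mid w_1 w_2) \dots P(w_n \mid w_1 w_2 \dots w_{n-1}) \approx P(w_1)\,P(w_2 \mid w_1)\,P(w_3 \mid w_1 w_2) \dots P(w_n \mid w_{n-3} w_{n-2} w_{n-1});$$
$$P(w_n \mid w_1 w_2 \dots w_{n-1}) = C(w_1 w_2 \dots w_n) / C(w_1 w_2 \dots w_{n-1});$$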
Wherein T denotes a clause in the test sample set, w_n denotes the n-th word in clause T, the probabilities are calculated by the maximum likelihood method, and C(w_{n-i-1} w_{n-i}) denotes the number of times that word sequence occurs in each of the sets of different sentiment orientation in the training sample set;
A first sentiment labeling unit, configured to compare the probabilities of the currently selected clause with respect to the sets of different sentiment orientation in the training sample set, and label the current clause with the sentiment orientation of the set that yields the maximum probability.
Wherein the training sample set optimization submodule comprises:
A second sentiment classification unit, configured to divide the clauses in the training sample set, according to the current labeling result of the training sample set, into sets of different sentiment orientation;
A second probability acquiring unit, configured to select the clauses in the clause set one by one and obtain, by the following formulas, the probability P(t) of the currently selected clause with respect to each of the sets of different sentiment orientation in the training sample set:
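$$P(t) = P(w_1 w_2 w_3 \dots w_n) = P(w_1)\,P(w_2 \mid w_1)\,P(w_3 \mid w_1 w_2) \dots P(w_n \mid w_1 w_2 \dots w_{n-1}) \approx P(w_1)\,P(w_2 \mid w_1)\,P(w_3 \mid w_1 w_2) \dots P(w_n \mid w_{n-3} w_{n-2} w_{n-1});$$
$$P(w_n \mid w_1 w_2 \dots w_{n-1}) = C(w_1 w_2 \dots w_n) / C(w_1 w_2 \dots w_{n-1});$$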
Wherein t denotes a clause in the clause set, w_n denotes the n-th word in clause t, the probabilities are calculated by the maximum likelihood method, and C(w_{n-i-1} w_{n-i}) denotes the number of times that word sequence occurs in each of the sets of different sentiment orientation in the training sample set;
A second sentiment labeling unit, configured to compare the probabilities of the currently selected clause with respect to the sets of different sentiment orientation in the training sample set, and label the current clause with the sentiment orientation of the set that yields the maximum probability.
Wherein the training sample set optimization submodule further comprises:
An extraction unit, configured to extract the clauses in the test sample set whose two labels differ;
An adding unit, configured to, after the result of a third labeling of those clauses is obtained, add the labeled clauses to the training sample set;
A third sentiment labeling unit, configured to perform a fourth labeling of the clauses in the test sample set according to the re-labeled training sample set;
A training sample set optimization unit, configured to compare the results of the two labelings of the test sample set to obtain the error rate of the fourth labeling of the test sample set; if the error rate is less than the preset threshold, label all clauses in the clause set according to the labels of the training sample set; if the error rate is greater than the preset threshold, continue to correct the training sample set until the error rate is less than the preset threshold.
The beneficial effects of the above technical solution of the present invention are as follows:
In the information processing method of the embodiments of the present invention, users' online comment data for a product is first preliminarily screened according to a preset standard; the retained online comment data is split into clauses, using symbols as split points, to establish a clause set; a preset number of clauses is randomly extracted from the clause set to establish a training sample set; once the sentiment orientation of the clauses in the training sample set has been labeled, all clauses in the clause set are labeled according to the labels of the training sample set; finally, the clauses labeled with the first value are deleted from the clause set, and the remaining clauses are combined in a preset manner to obtain a recommendation reason. Here the first value indicates that the clause tends toward negative sentiment. The sentiment expressed by the resulting recommendation reason is therefore more authentic and, being composed of content with positive and neutral sentiment orientation, attracts users well.
Brief description of the drawings
Fig. 1 is a flow chart of the steps of the information processing method of an embodiment of the present invention;
Fig. 2 is a detailed flow chart of step 13 of the information processing method of an embodiment of the present invention;
Fig. 3 is a detailed flow chart of step 132 of the information processing method of an embodiment of the present invention;
Fig. 4 is a first detailed flow chart of step 133 of the information processing method of an embodiment of the present invention;
Fig. 5 is a second detailed flow chart of step 133 of the information processing method of an embodiment of the present invention;
Fig. 6 is a structural diagram of the information processing apparatus of an embodiment of the present invention;
Fig. 7 is a structural diagram of the sentiment classification module of the information processing apparatus of an embodiment of the present invention;
Fig. 8 is a structural diagram of the sentiment labeling submodule of the information processing apparatus of an embodiment of the present invention;
Fig. 9 is a first structural diagram of the training sample set optimization submodule of the information processing apparatus of an embodiment of the present invention;
Fig. 10 is a second structural diagram of the training sample set optimization submodule of the information processing apparatus of an embodiment of the present invention.
Detailed description
To make the technical problem to be solved, the technical solution and the advantages of the present invention clearer, a detailed description is given below with reference to the accompanying drawings and specific embodiments.
To address the problem that existing generated recommendation reasons fail to reflect the actual value of user comment data, express only weak sentiment and do not attract users well, the present invention provides an information processing method and apparatus such that the sentiment expressed by the recommendation reason is more authentic and, being composed of content with positive and neutral sentiment orientation, attracts users well.
As shown in Fig. 1, an information processing method of an embodiment of the present invention comprises the following steps:
Step 11, preliminarily screening the online comment data of a product according to a preset standard;
Step 12, splitting the online comment data retained after screening into clauses, using symbols as split points, to establish a clause set;
Step 13, randomly extracting a preset number of clauses from the clause set to establish a training sample set, labeling the sentiment orientation of the clauses in the training sample set, and labeling all clauses in the clause set according to the labels of the training sample set;
Step 14, deleting the clauses labeled with a first value from the clause set, and combining the remaining clauses in the clause set in a preset manner to obtain a recommendation reason.
Through the above steps, users' online comment data for a product is first preliminarily screened according to a preset standard; the retained online comment data is split into clauses, using symbols as split points, to establish a clause set; a preset number of clauses is randomly extracted from the clause set to establish a training sample set; once the sentiment orientation of the clauses in the training sample set has been labeled, all clauses in the clause set are labeled according to the labels of the training sample set; finally, the clauses labeled with the first value are deleted from the clause set, and the remaining clauses are combined in a preset manner to obtain a recommendation reason. Here the first value indicates that the clause tends toward negative sentiment. The sentiment expressed by the resulting recommendation reason is therefore more authentic and, being composed of content with positive and neutral sentiment orientation, attracts users well.
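By way of illustration only, step 14 might be sketched in Python as below. The function name `build_recommendation` is hypothetical. The value -1 is used for the first value, following the 1, 0, -1 labeling described elsewhere in this description, and joining the remaining clauses in their original order with a comma is merely an assumed stand-in for the unspecified preset manner.

```python
NEGATIVE = -1  # the "first value": the clause tends toward negative sentiment


def build_recommendation(labelled_clauses, separator=", "):
    """Drop negatively labelled clauses and join the rest in their original order.

    The joining rule is an assumption; the description only says "a preset manner".
    """
    kept = [clause for clause, label in labelled_clauses if label != NEGATIVE]
    return separator.join(kept)


labelled = [
    ("thing is very many", 1),
    ("first time go", 0),
    ("this price is somewhat expensive", -1),
]
reason = build_recommendation(labelled)
```

Only the positive and neutral clauses survive, which matches the stated aim of composing the recommendation reason from positive and neutral content.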
In an embodiment of the present invention, the online word-of-mouth data of a product is stored in the format: product serial number, commenter name, quantified star rating, comment content. In step 11, the preset standard for preliminary screening may be a custom star-rating standard. Of course, screening may also be based on reputation or other content.
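As a minimal sketch of the storage format and star-rating screening just described, assuming Python and hypothetical names throughout (`Comment`, `screen_comments`, the field names, and the threshold of 4 stars are illustrative and not taken from this description):

```python
from dataclasses import dataclass


@dataclass
class Comment:
    product_id: str   # product serial number
    commenter: str    # commenter name
    stars: int        # quantified star rating
    content: str      # comment content


def screen_comments(comments, min_stars=4):
    """Keep comments whose star rating meets an assumed custom standard."""
    return [c for c in comments if c.stars >= min_stars]


comments = [
    Comment("P001", "user_a", 5, "thing is very many, taste is pretty good"),
    Comment("P001", "user_b", 2, "this price is somewhat expensive"),
    Comment("P001", "user_c", 4, "first time go, sensation can also"),
]
kept = screen_comments(comments)
```

A different screening rule, for example one based on commenter reputation, would replace only the condition inside `screen_comments`.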
After preliminary screening is complete, according to step 12 the comment paragraphs obtained from screening are split into clauses, using symbols as split points, which completes the data processing. For example, the comment paragraph is "first time goes, and feels all right, and thing is very many, and I am not fastidious, so think that taste is pretty good; What certainly have is to one's taste, and what have is not to one's taste yet. Feeling of seafood is not so well ... crab does not all have meat & ghost, this price is somewhat expensive"; splitting by symbols yields 11 clauses, namely "first time go", "sensation can also", "thing is very many", "not fastidious in person", "so thinking that taste is pretty good", "what certainly have is to one's taste", "what have is not to one's taste", "feeling of seafood is not so well", "crab does not all have meat", "ghost" and "this price is somewhat expensive".
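The symbol-based splitting can be sketched as follows, assuming Python and a regular expression. The function name `split_clauses` and the exact set of delimiter symbols are assumptions; the description only says that symbols serve as split points.

```python
import re

# Assumed delimiter set: common ASCII and full-width punctuation plus "&".
_DELIMITERS = re.compile(r"[,.;:!?&…，。；：！？、]+")


def split_clauses(paragraph):
    """Split a comment paragraph at symbols and drop empty fragments."""
    return [part.strip() for part in _DELIMITERS.split(paragraph) if part.strip()]


paragraph = (
    "first time goes, and feels all right, and thing is very many, "
    "and I am not fastidious, so think that taste is pretty good; "
    "What certainly have is to one's taste, and what have is not to one's taste yet."
    "Feeling of seafood is not so well ... crab does not all have meat & ghost, "
    "this price is somewhat expensive"
)
clauses = split_clauses(paragraph)
```

Applied to the example paragraph as rendered here, the sketch yields 11 fragments, in line with the 11 clauses listed above.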
After the comment paragraphs obtained from preliminary screening have been split into clauses, a preset number of clauses is randomly extracted from the resulting clause set to establish a training sample set; once the sentiment orientation of the clauses in the training sample set has been labeled, all clauses in the clause set are labeled according to the labels of the training sample set. This is step 13, which specifically comprises:
Step 131, randomly extracting a preset number of clauses from the clause set for each of a test sample set and a training sample set;
Step 132, obtaining the result of a first labeling of the clauses in the test sample set and the training sample set, and performing a second labeling of the clauses in the test sample set according to the result of the first labeling of the training sample set;
Step 133, comparing the results of the two labelings of the test sample set to obtain the error rate of the second labeling of the test sample set; if the error rate is less than a preset threshold, labeling all clauses in the clause set according to the current labeling result of the training sample set; if the error rate is greater than the preset threshold, correcting the training sample set until the error rate is less than the preset threshold.
In embodiments of the present invention, the training sample set is the basis and template for sentiment labeling: the training sample set as a whole serves as the basis for sentiment matching, by which the actual sentiment orientation of all clauses is determined. The test sample set is used to judge how accurate the sentiment orientation of the training sample set is, ensuring that the training sample set reaches a preset precision. The optimized training sample set is then used to label the sentiment orientation of all clauses obtained from the initial splitting.
In embodiments of the present invention, clauses are divided by sentiment category into three grades, commendatory, neutral and derogatory, labeled 1, 0 and -1 respectively; of course, finer grades may be used as needed.
Following steps 131, 132 and 133, a specific embodiment is given below. From the clause set obtained after screening and splitting, 1%-5% of the clauses are randomly extracted for each of the training sample set and the test sample set. The result of the first labeling of the clauses in the test sample set and the training sample set is then obtained, and a second labeling of the clauses in the test sample set is performed according to the result of the first labeling of the training sample set. Here the result of the first labeling is the manually labeled sentiment orientation of each clause; the test set clauses are then labeled on the basis of the training sample set, so that every clause in the test sample set has two labels: the manual label and the label based on the training set. The differences between the two labelings and the error probability are then obtained. If the error probability is greater than 5%, the training sample set needs to be corrected and repeatedly optimized until the error rate stays below 5%, after which the optimized training sample set is used to label the sentiment orientation of all clauses obtained from the initial splitting; if the error probability is less than 5%, the training sample set is used directly to label the sentiment orientation of all clauses obtained from the initial splitting.
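The sampling and error-rate check of this embodiment might look like the following Python sketch. The names `draw_sample_sets`, `error_rate` and `needs_correction` are hypothetical; drawing two disjoint sets of equal size and fixing the random seed are assumptions made only so the example is reproducible.

```python
import random


def draw_sample_sets(clauses, fraction=0.05, seed=0):
    """Randomly draw two disjoint sets, each holding `fraction` of the clauses.

    Assumes the clause set is large enough to supply both sets.
    """
    rng = random.Random(seed)
    size = max(1, int(len(clauses) * fraction))
    picked = rng.sample(clauses, 2 * size)
    return picked[:size], picked[size:]  # (training set, test set)


def error_rate(manual_labels, training_based_labels):
    """Share of test clauses whose two labels disagree."""
    if not manual_labels or len(manual_labels) != len(training_based_labels):
        raise ValueError("label lists must be non-empty and of equal length")
    differing = sum(a != b for a, b in zip(manual_labels, training_based_labels))
    return differing / len(manual_labels)


def needs_correction(manual_labels, training_based_labels, threshold=0.05):
    """True if the training sample set should be corrected before further use."""
    return error_rate(manual_labels, training_based_labels) > threshold


all_clauses = [f"clause {i}" for i in range(200)]
training_set, test_set = draw_sample_sets(all_clauses, fraction=0.05)
rate = error_rate([1, 0, -1, 1], [1, 0, 1, 1])
```

With four test clauses and one disagreement the error rate is 25%, well above the 5% threshold of this embodiment, so the training sample set would be corrected.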
Wherein the preset number of clauses randomly extracted to establish the training sample set and the test sample set, and the error rate threshold, can be customized according to actual needs.
In an embodiment of the present invention, step 132 comprises:
Step 1321, after obtaining the result of the first labeling of the clauses in the test sample set and the training sample set, dividing the clauses in the training sample set, according to their labels, into sets of different sentiment orientation;
Step 1322, selecting the clauses in the test sample set one by one, and obtaining, by the following formulas, the probability P(T) of the currently selected clause with respect to each of the sets of different sentiment orientation in the training sample set:
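$$P(T) = P(w_1 w_2 w_3 \dots w_n) = P(w_1)\,P(w_2 \mid w_1)\,P(w_3 \mid w_1 w_2) \dots P(w_n \mid w_1 w_2 \dots w_{n-1}) \approx P(w_1)\,P(w_2 \mid w_1)\,P(w_3 \mid w_1 w_2) \dots P(w_n \mid w_{n-3} w_{n-2} w_{n-1});$$
$$P(w_n \mid w_1 w_2 \dots w_{n-1}) = C(w_1 w_2 \dots w_n) / C(w_1 w_2 \dots w_{n-1});$$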
Wherein T denotes a clause in the test sample set, w_n denotes the n-th word in clause T, the probabilities are calculated by the maximum likelihood method, and C(w_{n-i-1} w_{n-i}) denotes the number of times that word sequence occurs in each of the sets of different sentiment orientation in the training sample set;
Step 1323, comparing the probabilities of the currently selected clause with respect to the sets of different sentiment orientation in the training sample set, and labeling the current clause with the sentiment orientation of the set that yields the maximum probability.
The above steps are described below with reference to a specific example:
In an embodiment of the present invention, sentiment classification applies the supervised-learning N-gram algorithm, in which the value N means that N adjacent clauses are assumed by default to be associated; for example, N = 4 means that 4 adjacent clauses readily co-occur. Commonly used values of N are 4 or 8; considering efficiency and actual needs, N is preferably 4 in embodiments of the present invention. According to step 1321, and continuing the division of clauses by sentiment category into the three grades commendatory, neutral and derogatory, the clauses in the training sample set are divided according to their sentiment orientation into three sets: commendatory, neutral and derogatory. In these sets, entries are stored as words and as concatenations of adjacent words within a span of 4. For example, if a sentence in the training sample set is "this family's steamed bun restaurant is eaten very well" and is labeled 1, then the occurrence counts stored in the commendatory set for "this family", "this family's steamed bun restaurant", "this family's steamed bun restaurant very", "this family's steamed bun restaurant is fine" and "steamed bun restaurant is eaten very well" are each increased by 1.
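One possible way to maintain such occurrence counts is sketched below in Python. The names `count_sequences` and `build_sentiment_counts` are hypothetical, clauses are assumed to be already segmented into words, and the word tokens in the usage example are illustrative stand-ins. The sketch counts every contiguous word sequence of up to 4 words, which is a superset of the sequences listed in the example above.

```python
from collections import Counter


def count_sequences(words, max_len=4):
    """Count every contiguous word sequence of length 1..max_len in one clause."""
    counts = Counter()
    for start in range(len(words)):
        for size in range(1, max_len + 1):
            if start + size <= len(words):
                counts[tuple(words[start:start + size])] += 1
    return counts


def build_sentiment_counts(labelled_clauses, max_len=4):
    """Keep one count table per sentiment label: 1, 0 and -1."""
    tables = {1: Counter(), 0: Counter(), -1: Counter()}
    for words, label in labelled_clauses:
        tables[label].update(count_sequences(words, max_len))
    return tables


# Illustrative tokens for a five-word clause labelled commendatory (1).
words = ["this", "bun-shop", "very", "good", "eat"]
tables = build_sentiment_counts([(words, 1)])
```

Here the prefixes of the clause and its final 4-word sequence each receive a count of 1 in the commendatory table, while the neutral and derogatory tables stay empty.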
When labeling the clauses of the test sample set, T denotes a clause in the test sample set, w_n denotes the n-th word in clause T, P(T) denotes the probability that the sentence occurs, and P(w_n) denotes the probability that the word occurs. Continuing the clause-splitting example above, in which the comment paragraph is split by symbols into 11 clauses, take the clause "feeling of seafood is not so well" as the clause to be labeled; w_1, w_2, ..., w_6 are "seafood", "", "sensation", "no", "how" and "good" respectively, and substituting into the formula of step 1322 gives:
P("seafood ... what is better") = P("seafood" "" ... "how" "good") = P("seafood") P("" | "seafood") ... P("good" | "sensation" "no" "how")
P("good" | "sensation" "no" "how") = C("sensation" "no" "how" "good") / C("sensation" "no" "how")
Here, the values of C("sensation" "no" "how" "good") and C("sensation" "no" "how") can each be read from the positive set, the neutral set and the negative set of the training set. The frequency with which the clause occurs can therefore be calculated separately for each of these three sets. The label of the clause to be labeled is then determined according to step 1323. For example, if the frequencies of the sentence "feeling of seafood is not so well" in the three sets (positive, neutral and negative) are P(T) = 0.5, P(T) = 0.34 and P(T) = 0.67 respectively, the frequency in the negative set of the training sample set is the highest, so the sentence is labeled -1.
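The labeling rule described above, computing the clause probability against each sentiment set and taking the set with the highest value, can be sketched as follows. This is a hedged illustration under several assumptions that the text does not state: the helper names (`ngram_counts`, `clause_probability`, `label_clause`) and the toy training data are hypothetical, $P(w_1)$ is estimated as the count of $w_1$ divided by the total word count of the set, and unseen sequences simply give probability 0 (no smoothing).

```python
from collections import Counter


def ngram_counts(clauses, n=4):
    """Count every contiguous word sequence of length 1..n in the given clauses."""
    counts = Counter()
    for words in clauses:
        for i in range(len(words)):
            for k in range(1, min(n, len(words) - i) + 1):
                counts[tuple(words[i:i + k])] += 1
    return counts


def clause_probability(words, counts, n=4):
    """Product of count ratios C(history + word) / C(history), history <= n-1 words."""
    total_words = sum(c for gram, c in counts.items() if len(gram) == 1)
    probability = 1.0
    for i, word in enumerate(words):
        history = tuple(words[max(0, i - (n - 1)):i])
        numerator = counts[history + (word,)]
        denominator = counts[history] if history else total_words
        if numerator == 0 or denominator == 0:
            return 0.0  # assumption: no smoothing for unseen sequences
        probability *= numerator / denominator
    return probability


def label_clause(words, sets):
    """Return the label of the sentiment set in which the clause is most probable."""
    return max(sets, key=lambda label: clause_probability(words, sets[label]))


# Toy training data (hypothetical), keyed by label: 1, 0, -1.
training = {
    1: [["taste", "very", "good"]],
    0: [["first", "time", "go"]],
    -1: [["seafood", "not", "good"], ["price", "bit", "expensive"]],
}
sets = {label: ngram_counts(clauses) for label, clauses in training.items()}
label = label_clause(["seafood", "not", "good"], sets)
```

With a realistic corpus some form of smoothing would probably be needed, since a single unseen word sequence drives the whole product to zero in this sketch.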
Similarly, the step in step 133 of labeling all clauses in the clause set according to the current labeling result of the training sample set comprises:
Step 1331, dividing the clauses in the training sample set, according to their labels in the current labeling result of the training sample set, into sets of different sentiment orientation;
Step 1332, selecting the clauses in the clause set one by one, and obtaining, by the following formulas, the probability P(t) of the currently selected clause with respect to each of the sets of different sentiment orientation in the training sample set:

$$P(t) = P(w_1 w_2 w_3 \cdots w_n) = P(w_1)\,P(w_2 \mid w_1)\,P(w_3 \mid w_1 w_2) \cdots P(w_n \mid w_1 w_2 \cdots w_{n-1}) \approx P(w_1)\,P(w_2 \mid w_1)\,P(w_3 \mid w_1 w_2) \cdots P(w_n \mid w_{n-3} w_{n-2} w_{n-1});$$

$$P(w_n \mid w_1 w_2 \cdots w_{n-1}) = \frac{C(w_1 w_2 \cdots w_n)}{C(w_1 w_2 \cdots w_{n-1})};$$

where t denotes a clause in the clause set, $w_n$ denotes the n-th word of the clause t, the probabilities are calculated by the maximum likelihood method, and $C(w_{n-i-1} w_{n-i})$ denotes the number of times that word sequence occurs in the respective set of different sentiment orientation in the training sample set;
Step 1333, comparing the probabilities of the currently selected clause with respect to the sets of different sentiment orientation in the training sample set, and labeling the current clause with the sentiment orientation of the set that yields the maximum probability.
Steps 1331, 1332 and 1333 use the same labeling method as steps 1321, 1322 and 1323; only the objects being labeled differ, in that steps 1331, 1332 and 1333 label the clauses in the clause set obtained by the splitting in step 12. They are therefore not described in detail again here.
In an embodiment of the present invention, the training sample set is the basis and template of the sentiment labeling; only when the accuracy of the training sample set is ensured can more accurate sentiment labels be obtained for the other clauses. Therefore, in step 133, the results of the two labelings of the test sample set are compared to obtain the error rate of the second labeling of the test sample set; if the error rate is greater than a preset threshold, the training sample set must be corrected until the error rate is less than the preset threshold. The step of correcting the training sample set comprises:
Step 1334, extracting the clauses in the test sample set whose labels differ;
Step 1335, after obtaining the result of a third labeling of these clauses, adding the labeled clauses to the training sample set;
Step 1336, performing a fourth labeling of the clauses in the test sample set according to the re-labeled result of the training sample set;
Step 1337, comparing the results of the two labelings of the test sample set to obtain the error rate of the fourth labeling of the test sample set; if the error rate is less than the preset threshold, labeling all clauses in the clause set according to the labels of the training sample set; if the error rate is greater than the preset threshold, continuing to correct the training sample set until the error rate is less than the preset threshold.
The result of the third labeling of these clauses is obtained by manual labeling. First, the clauses in the test sample set whose labels differ are extracted and their manually confirmed labels are obtained; the confirmed clauses and their labels are added to the training sample set, and the test sample set is then tested again with the newly generated training sample set. If the error rate is less than the preset threshold, all clauses in the clause set are labeled according to the labels of the training sample set; if the error rate is greater than the preset threshold, the correction of the training sample set continues, repeating the above steps, until the error rate is less than the preset threshold.
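The correction procedure of steps 1334 to 1337 can be sketched as a loop. This is only an illustration under stated assumptions: `classify` and `confirm` are hypothetical callables standing in for the N-gram labeling and the manual confirmation, clauses are represented as strings, the same non-empty test sample set is reused in every round, and `max_rounds` is an added safety limit that the text does not mention.

```python
def error_rate(reference, predicted):
    """Share of test clauses whose predicted label differs from the reference label."""
    differing = sum(1 for clause in reference if predicted[clause] != reference[clause])
    return differing / len(reference)


def correct_training_set(train, test, classify, confirm, threshold, max_rounds=10):
    """Repeat labeling and correction until the error rate drops below the threshold.

    train, test: dicts mapping clause -> label from the first (manual) labeling.
    classify(train, clause) -> label predicted from the training sample set.
    confirm(clause) -> manually confirmed label (the "third labeling").
    """
    train = dict(train)  # work on a copy so the caller's data is left unchanged
    rate = 1.0
    for _ in range(max_rounds):
        predicted = {clause: classify(train, clause) for clause in test}
        rate = error_rate(test, predicted)
        if rate < threshold:
            break
        # Steps 1334-1335: add the disagreeing clauses, with confirmed labels, to the training set.
        for clause, label in test.items():
            if predicted[clause] != label:
                train[clause] = confirm(clause)
    return train, rate


# Toy usage with a lookup "classifier" (purely illustrative).
corrected, final_rate = correct_training_set(
    train={"good": 1},
    test={"bad": -1, "good": 1},
    classify=lambda tr, clause: tr.get(clause, 0),
    confirm=lambda clause: -1,
    threshold=0.1,
)
```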
The final goal of the information processing method of the embodiment of the present invention is a recommendation reason that meets the requirements. Therefore, after all clauses to be labeled have been labeled, according to step 14 the clauses in the clause set that are labeled with the first value, namely the clauses with a negative sentiment orientation, are deleted, and the remaining clauses in the clause set are combined according to a preset mode into a combination of clauses with positive and neutral sentiment orientation, to obtain the recommendation reason.
The preset mode may be set as follows: a neutral clause preceding a positive clause is joined to it with a comma; adjacent positive clauses are joined with commas; neutral clauses and negative clauses following a positive clause are omitted and replaced with an ellipsis. For example, if the structure of the original comment paragraph is 1, 1, 0, 1, -1, 0, 0, 1, 1, -1, the generated sentence is "1,1 ... 1 ... 1,1 ..." (in this example each number stands for the content of a clause of that sentiment type). The length of the finally extracted recommendation reason can also be customized; the default length is in the range of 25 to 30. Of course, the above setting is not the only possible combination mode, and the others are not enumerated here.
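The combination pattern described above can be sketched as follows. This is a minimal interpretation, not the patent's implementation: the function name `combine` is hypothetical, each run of omitted clauses is assumed to produce a single ellipsis, leading negative clauses are assumed to be dropped silently, and the length limit is not applied.

```python
def combine(clauses, labels, ellipsis="…"):
    """Join clauses into a recommendation reason.

    labels: 1 = positive, 0 = neutral, -1 = negative, one per clause.
    """
    pieces = []
    seen_positive = False  # has a positive clause been emitted yet?
    in_gap = False         # was the last thing emitted an ellipsis?
    for text, label in zip(clauses, labels):
        if label == 1:
            if pieces and not in_gap:
                pieces.append(",")
            pieces.append(text)
            seen_positive = True
            in_gap = False
        elif seen_positive:
            # Neutral and negative clauses after a positive clause are omitted.
            if not in_gap:
                pieces.append(ellipsis)
                in_gap = True
        elif label == 0:
            # Neutral clauses before the first positive clause are kept.
            if pieces:
                pieces.append(",")
            pieces.append(text)
    return "".join(pieces)


# The structure 1, 1, 0, 1, -1, 0, 0, 1, 1, -1 from the text, with placeholder clause contents.
reason = combine(
    ["A", "B", "c", "D", "e", "f", "g", "H", "I", "j"],
    [1, 1, 0, 1, -1, 0, 0, 1, 1, -1],
)
```

For the placeholder input, `reason` follows the same shape as the "1,1 ... 1 ... 1,1 ..." example in the text.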
In summary, the information processing method of the embodiment of the present invention has the following advantages: 1) customization allows recommendation reasons of various forms to be produced; 2) recommendation reasons can be updated dynamically from the database of eligible comments: for each product, every comment paragraph that meets the star-rating requirement can generate a recommendation reason, so each product can have many recommendation reasons, and an update consists of selecting one of them for presentation; 3) when the word-of-mouth data of a product grows substantially, the characteristics and advantages of the product itself do not change much, so generating a new batch of recommendation reasons does not require changing the training sample set; it is only necessary to rerun the method on the newly obtained online comment data, based on the original training sample set; 4) multithreaded processing is adopted, which greatly improves the efficiency of information processing.
As shown in Figure 6, the embodiment of the present invention further provides an information processing apparatus, comprising:
a comment preliminary screening module 10, configured to perform preliminary screening on online comment data of a product according to a preset standard;
a clause splitting module 20, configured to split the online comment data retained after the screening into clauses, using symbols as nodes, to establish a clause set;
a sentiment classification module 30, configured to randomly extract a preset number of clauses from the clause set to establish a training sample set, to label the sentiment orientation of the clauses in the training sample set, and to label all clauses in the clause set according to the labels of the training sample set;
a recommendation reason combination module 40, configured to delete the clauses labeled with the first value from the clause set, and to combine the remaining clauses in the clause set according to a preset mode, to obtain a recommendation reason.
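The first two modules listed above, preliminary screening and clause splitting, can be sketched together. This is an illustration under assumptions: the star-rating threshold `min_stars` is only one hypothetical form of the preset standard, comments are assumed to arrive as (text, stars) pairs, and the set of delimiter symbols in `_DELIMITERS` is a guess.

```python
import re

# Hypothetical set of delimiter symbols (ASCII and full-width punctuation).
_DELIMITERS = re.compile(r"[,.;:!?&…,。;:!?、]+")


def split_clauses(comment):
    """Split one comment paragraph into clauses at punctuation symbols."""
    parts = (part.strip() for part in _DELIMITERS.split(comment))
    return [part for part in parts if part]


def screen_comments(comments, min_stars=4):
    """Keep the text of comments whose star rating meets the assumed standard."""
    return [text for text, stars in comments if stars >= min_stars]


def build_clause_set(comments, min_stars=4):
    """Preliminary screening followed by clause splitting."""
    clause_set = []
    for text in screen_comments(comments, min_stars):
        clause_set.extend(split_clauses(text))
    return clause_set


clause_set = build_clause_set([("tasty, cheap!", 5), ("awful.", 1)])
```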
As shown in Figure 7, the sentiment classification module 30 comprises:
a sample set establishing submodule 301, configured to randomly extract a preset number of clauses from the clause set for each of a test sample set and a training sample set, to establish the test sample set and the training sample set;
a sentiment labeling submodule 302, configured to obtain the result of a first labeling of the clauses in the test sample set and the training sample set, and to perform a second labeling of the clauses in the test sample set according to the result of the first labeling of the training sample set;
a training sample set optimization submodule 303, configured to compare the results of the two labelings of the test sample set to obtain the error rate of the second labeling of the test sample set; if the error rate is less than a preset threshold, to label all clauses in the clause set according to the current labeling result of the training sample set; and if the error rate is greater than the preset threshold, to correct the training sample set until the error rate is less than the preset threshold.
As shown in Figure 8, the sentiment labeling submodule 302 comprises:
a first sentiment classification unit 3021, configured to, after the result of the first labeling of the clauses in the test sample set and the training sample set is obtained, divide the clauses in the training sample set into sets of different sentiment orientation according to their labels;
a first probability acquiring unit 3022, configured to select the clauses in the test sample set one by one, and to obtain, by the following formulas, the probability P(T) of the currently selected clause with respect to each of the sets of different sentiment orientation in the training sample set:

$$P(T) = P(w_1 w_2 w_3 \cdots w_n) = P(w_1)\,P(w_2 \mid w_1)\,P(w_3 \mid w_1 w_2) \cdots P(w_n \mid w_1 w_2 \cdots w_{n-1}) \approx P(w_1)\,P(w_2 \mid w_1)\,P(w_3 \mid w_1 w_2) \cdots P(w_n \mid w_{n-3} w_{n-2} w_{n-1});$$

$$P(w_n \mid w_1 w_2 \cdots w_{n-1}) = \frac{C(w_1 w_2 \cdots w_n)}{C(w_1 w_2 \cdots w_{n-1})};$$

where T denotes a clause in the test sample set, $w_n$ denotes the n-th word of the clause T, the probabilities are calculated by the maximum likelihood method, and $C(w_{n-i-1} w_{n-i})$ denotes the number of times that word sequence occurs in the respective set of different sentiment orientation in the training sample set;
a first sentiment labeling unit 3023, configured to compare the probabilities of the currently selected clause with respect to the sets of different sentiment orientation in the training sample set, and to label the current clause with the sentiment orientation of the set that yields the maximum probability.
As shown in Figure 9, the training sample set optimization submodule 303 comprises:
a second sentiment classification unit 3031, configured to divide the clauses in the training sample set, according to their labels in the current labeling result of the training sample set, into sets of different sentiment orientation;
a second probability acquiring unit 3032, configured to select the clauses in the clause set one by one, and to obtain, by the following formulas, the probability P(t) of the currently selected clause with respect to each of the sets of different sentiment orientation in the training sample set:

$$P(t) = P(w_1 w_2 w_3 \cdots w_n) = P(w_1)\,P(w_2 \mid w_1)\,P(w_3 \mid w_1 w_2) \cdots P(w_n \mid w_1 w_2 \cdots w_{n-1}) \approx P(w_1)\,P(w_2 \mid w_1)\,P(w_3 \mid w_1 w_2) \cdots P(w_n \mid w_{n-3} w_{n-2} w_{n-1});$$

$$P(w_n \mid w_1 w_2 \cdots w_{n-1}) = \frac{C(w_1 w_2 \cdots w_n)}{C(w_1 w_2 \cdots w_{n-1})};$$

where t denotes a clause in the clause set, $w_n$ denotes the n-th word of the clause t, the probabilities are calculated by the maximum likelihood method, and $C(w_{n-i-1} w_{n-i})$ denotes the number of times that word sequence occurs in the respective set of different sentiment orientation in the training sample set;
a second sentiment labeling unit 3033, configured to compare the probabilities of the currently selected clause with respect to the sets of different sentiment orientation in the training sample set, and to label the current clause with the sentiment orientation of the set that yields the maximum probability.
As shown in Figure 10, the training sample set optimization submodule 303 further comprises:
an extraction unit 3034, configured to extract the clauses in the test sample set whose labels differ;
an adding unit 3035, configured to, after the result of a third labeling of these clauses is obtained, add the labeled clauses to the training sample set;
a third sentiment labeling unit 3036, configured to perform a fourth labeling of the clauses in the test sample set according to the re-labeled result of the training sample set;
a training sample set optimization unit 3037, configured to compare the results of the two labelings of the test sample set to obtain the error rate of the fourth labeling of the test sample set; if the error rate is less than the preset threshold, to label all clauses in the clause set according to the labels of the training sample set; and if the error rate is greater than the preset threshold, to continue correcting the training sample set until the error rate is less than the preset threshold.
Of course, the information processing apparatus of the embodiment of the present invention further comprises a customization module. The customization module is used to set the preset standard of the preliminary screening, the preset numbers of clauses for the training sample set and the test sample set, the preset mode of the recommendation reason, the preset threshold of the error rate, and so on.
It should be noted that this information processing apparatus is an apparatus that applies the above information processing method; the implementations of the above information processing method apply to this apparatus and achieve the same technical effects.
The above are preferred embodiments of the present invention. It should be pointed out that those skilled in the art can make further improvements and modifications without departing from the principle of the present invention, and these improvements and modifications shall also be regarded as falling within the protection scope of the present invention.
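The settings handled by the customization module could be grouped as in the following sketch. All field names are hypothetical, and expressing the preset screening standard as a minimum star rating is an assumption. Only the defaults for N (4) and for the recommendation reason length (25 to 30) come from the description; the other values have no stated defaults and are therefore required arguments.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    min_stars: int          # assumed form of the preset screening standard
    train_size: int         # preset number of clauses in the training sample set
    test_size: int          # preset number of clauses in the test sample set
    error_threshold: float  # preset threshold of the error rate
    ngram_n: int = 4        # preferred N in the description
    min_length: int = 25    # default lower bound of the recommendation reason length
    max_length: int = 30    # default upper bound of the recommendation reason length

    def __post_init__(self):
        # Basic sanity checks on the configured values.
        if not 0.0 <= self.error_threshold <= 1.0:
            raise ValueError("error_threshold must lie between 0 and 1")
        if self.min_length > self.max_length:
            raise ValueError("min_length must not exceed max_length")


# Example values are placeholders, not taken from the description.
settings = Settings(min_stars=4, train_size=100, test_size=50, error_threshold=0.1)
```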

Claims (10)

1. An information processing method, characterized by comprising the following steps:
performing preliminary screening on online comment data of a product according to a preset standard;
splitting the online comment data retained after the screening into clauses, using symbols as nodes, to establish a clause set;
randomly extracting a preset number of clauses from the clause set to establish a training sample set, labeling the sentiment orientation of the clauses in the training sample set, and labeling all clauses in the clause set according to the labels of the training sample set;
deleting the clauses labeled with a first value from the clause set, and combining the remaining clauses in the clause set according to a preset mode, to obtain a recommendation reason.
2. The information processing method according to claim 1, characterized in that the step of randomly extracting a preset number of clauses from the clause set to establish a training sample set, labeling the sentiment orientation of the clauses in the training sample set, and labeling all clauses in the clause set according to the labels of the training sample set is specifically:
randomly extracting a preset number of clauses from the clause set for each of a test sample set and a training sample set, to establish the test sample set and the training sample set;
obtaining the result of a first labeling of the clauses in the test sample set and the training sample set, and performing a second labeling of the clauses in the test sample set according to the result of the first labeling of the training sample set;
comparing the results of the two labelings of the test sample set to obtain the error rate of the second labeling of the test sample set; if the error rate is less than a preset threshold, labeling all clauses in the clause set according to the current labeling result of the training sample set; if the error rate is greater than the preset threshold, correcting the training sample set until the error rate is less than the preset threshold.
3. The information processing method according to claim 2, characterized in that the step of obtaining the result of the first labeling of the clauses in the test sample set and the training sample set, and performing the second labeling of the clauses in the test sample set according to the result of the first labeling of the training sample set comprises:
after the result of the first labeling of the clauses in the test sample set and the training sample set is obtained, dividing the clauses in the training sample set into sets of different sentiment orientation according to their labels;
selecting the clauses in the test sample set one by one, and obtaining, by the following formulas, the probability P(T) of the currently selected clause with respect to each of the sets of different sentiment orientation in the training sample set:

$$P(T) = P(w_1 w_2 w_3 \cdots w_n) = P(w_1)\,P(w_2 \mid w_1)\,P(w_3 \mid w_1 w_2) \cdots P(w_n \mid w_1 w_2 \cdots w_{n-1}) \approx P(w_1)\,P(w_2 \mid w_1)\,P(w_3 \mid w_1 w_2) \cdots P(w_n \mid w_{n-3} w_{n-2} w_{n-1});$$

$$P(w_n \mid w_1 w_2 \cdots w_{n-1}) = \frac{C(w_1 w_2 \cdots w_n)}{C(w_1 w_2 \cdots w_{n-1})};$$

where T denotes a clause in the test sample set, $w_n$ denotes the n-th word of the clause T, the probabilities are calculated by the maximum likelihood method, and $C(w_{n-i-1} w_{n-i})$ denotes the number of times that word sequence occurs in the respective set of different sentiment orientation in the training sample set;
comparing the probabilities of the currently selected clause with respect to the sets of different sentiment orientation in the training sample set, and labeling the current clause with the sentiment orientation of the set that yields the maximum probability.
4. The information processing method according to claim 2, characterized in that the step of labeling all clauses in the clause set according to the current labeling result of the training sample set comprises:
dividing the clauses in the training sample set, according to their labels in the current labeling result of the training sample set, into sets of different sentiment orientation;
selecting the clauses in the clause set one by one, and obtaining, by the following formulas, the probability P(t) of the currently selected clause with respect to each of the sets of different sentiment orientation in the training sample set:

$$P(t) = P(w_1 w_2 w_3 \cdots w_n) = P(w_1)\,P(w_2 \mid w_1)\,P(w_3 \mid w_1 w_2) \cdots P(w_n \mid w_1 w_2 \cdots w_{n-1}) \approx P(w_1)\,P(w_2 \mid w_1)\,P(w_3 \mid w_1 w_2) \cdots P(w_n \mid w_{n-3} w_{n-2} w_{n-1});$$

$$P(w_n \mid w_1 w_2 \cdots w_{n-1}) = \frac{C(w_1 w_2 \cdots w_n)}{C(w_1 w_2 \cdots w_{n-1})};$$

where t denotes a clause in the clause set, $w_n$ denotes the n-th word of the clause t, the probabilities are calculated by the maximum likelihood method, and $C(w_{n-i-1} w_{n-i})$ denotes the number of times that word sequence occurs in the respective set of different sentiment orientation in the training sample set;
comparing the probabilities of the currently selected clause with respect to the sets of different sentiment orientation in the training sample set, and labeling the current clause with the sentiment orientation of the set that yields the maximum probability.
5. The information processing method according to claim 2, characterized in that the step of correcting the training sample set comprises:
extracting the clauses in the test sample set whose labels differ;
after obtaining the result of a third labeling of these clauses, adding the labeled clauses to the training sample set;
performing a fourth labeling of the clauses in the test sample set according to the re-labeled result of the training sample set;
comparing the results of the two labelings of the test sample set to obtain the error rate of the fourth labeling of the test sample set; if the error rate is less than the preset threshold, labeling all clauses in the clause set according to the labels of the training sample set; if the error rate is greater than the preset threshold, continuing to correct the training sample set until the error rate is less than the preset threshold.
6. An information processing apparatus, characterized by comprising:
a comment preliminary screening module, configured to perform preliminary screening on online comment data of a product according to a preset standard;
a clause splitting module, configured to split the online comment data retained after the screening into clauses, using symbols as nodes, to establish a clause set;
a sentiment classification module, configured to randomly extract a preset number of clauses from the clause set to establish a training sample set, to label the sentiment orientation of the clauses in the training sample set, and to label all clauses in the clause set according to the labels of the training sample set;
a recommendation reason combination module, configured to delete the clauses labeled with a first value from the clause set, and to combine the remaining clauses in the clause set according to a preset mode, to obtain a recommendation reason.
7. The information processing apparatus according to claim 6, characterized in that the sentiment classification module comprises:
a sample set establishing submodule, configured to randomly extract a preset number of clauses from the clause set for each of a test sample set and a training sample set, to establish the test sample set and the training sample set;
a sentiment labeling submodule, configured to obtain the result of a first labeling of the clauses in the test sample set and the training sample set, and to perform a second labeling of the clauses in the test sample set according to the result of the first labeling of the training sample set;
a training sample set optimization submodule, configured to compare the results of the two labelings of the test sample set to obtain the error rate of the second labeling of the test sample set; if the error rate is less than a preset threshold, to label all clauses in the clause set according to the current labeling result of the training sample set; and if the error rate is greater than the preset threshold, to correct the training sample set until the error rate is less than the preset threshold.
8. The information processing apparatus according to claim 7, characterized in that the sentiment labeling submodule comprises:
a first sentiment classification unit, configured to, after the result of the first labeling of the clauses in the test sample set and the training sample set is obtained, divide the clauses in the training sample set into sets of different sentiment orientation according to their labels;
a first probability acquiring unit, configured to select the clauses in the test sample set one by one, and to obtain, by the following formulas, the probability P(T) of the currently selected clause with respect to each of the sets of different sentiment orientation in the training sample set:

$$P(T) = P(w_1 w_2 w_3 \cdots w_n) = P(w_1)\,P(w_2 \mid w_1)\,P(w_3 \mid w_1 w_2) \cdots P(w_n \mid w_1 w_2 \cdots w_{n-1}) \approx P(w_1)\,P(w_2 \mid w_1)\,P(w_3 \mid w_1 w_2) \cdots P(w_n \mid w_{n-3} w_{n-2} w_{n-1});$$

$$P(w_n \mid w_1 w_2 \cdots w_{n-1}) = \frac{C(w_1 w_2 \cdots w_n)}{C(w_1 w_2 \cdots w_{n-1})};$$

where T denotes a clause in the test sample set, $w_n$ denotes the n-th word of the clause T, the probabilities are calculated by the maximum likelihood method, and $C(w_{n-i-1} w_{n-i})$ denotes the number of times that word sequence occurs in the respective set of different sentiment orientation in the training sample set;
a first sentiment labeling unit, configured to compare the probabilities of the currently selected clause with respect to the sets of different sentiment orientation in the training sample set, and to label the current clause with the sentiment orientation of the set that yields the maximum probability.
9. The information processing apparatus according to claim 7, characterized in that the training sample set optimization submodule comprises:
a second sentiment classification unit, configured to divide the clauses in the training sample set, according to their labels in the current labeling result of the training sample set, into sets of different sentiment orientation;
a second probability acquiring unit, configured to select the clauses in the clause set one by one, and to obtain, by the following formulas, the probability P(t) of the currently selected clause with respect to each of the sets of different sentiment orientation in the training sample set:

$$P(t) = P(w_1 w_2 w_3 \cdots w_n) = P(w_1)\,P(w_2 \mid w_1)\,P(w_3 \mid w_1 w_2) \cdots P(w_n \mid w_1 w_2 \cdots w_{n-1}) \approx P(w_1)\,P(w_2 \mid w_1)\,P(w_3 \mid w_1 w_2) \cdots P(w_n \mid w_{n-3} w_{n-2} w_{n-1});$$

$$P(w_n \mid w_1 w_2 \cdots w_{n-1}) = \frac{C(w_1 w_2 \cdots w_n)}{C(w_1 w_2 \cdots w_{n-1})};$$

where t denotes a clause in the clause set, $w_n$ denotes the n-th word of the clause t, the probabilities are calculated by the maximum likelihood method, and $C(w_{n-i-1} w_{n-i})$ denotes the number of times that word sequence occurs in the respective set of different sentiment orientation in the training sample set;
a second sentiment labeling unit, configured to compare the probabilities of the currently selected clause with respect to the sets of different sentiment orientation in the training sample set, and to label the current clause with the sentiment orientation of the set that yields the maximum probability.
10. The information processing apparatus according to claim 7, characterized in that the training sample set optimization submodule further comprises:
an extraction unit, configured to extract the clauses in the test sample set whose labels differ;
an adding unit, configured to, after the result of a third labeling of these clauses is obtained, add the labeled clauses to the training sample set;
a third sentiment labeling unit, configured to perform a fourth labeling of the clauses in the test sample set according to the re-labeled result of the training sample set;
a training sample set optimization unit, configured to compare the results of the two labelings of the test sample set to obtain the error rate of the fourth labeling of the test sample set; if the error rate is less than the preset threshold, to label all clauses in the clause set according to the labels of the training sample set; and if the error rate is greater than the preset threshold, to continue correcting the training sample set until the error rate is less than the preset threshold.
CN201410162861.XA 2014-04-22 2014-04-22 A kind of information processing method and device Active CN105005552B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410162861.XA CN105005552B (en) 2014-04-22 2014-04-22 A kind of information processing method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410162861.XA CN105005552B (en) 2014-04-22 2014-04-22 A kind of information processing method and device

Publications (2)

Publication Number Publication Date
CN105005552A true CN105005552A (en) 2015-10-28
CN105005552B CN105005552B (en) 2019-01-08

Family

ID=54378228

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410162861.XA Active CN105005552B (en) 2014-04-22 2014-04-22 A kind of information processing method and device

Country Status (1)

Country Link
CN (1) CN105005552B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106776568A (en) * 2016-12-26 2017-05-31 成都康赛信息技术有限公司 Based on the rationale for the recommendation generation method that user evaluates
CN107609960A (en) * 2017-10-18 2018-01-19 口碑(上海)信息技术有限公司 Rationale for the recommendation generation method and device

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101876985A (en) * 2009-11-26 2010-11-03 西北工业大学 WEB text sentiment theme recognizing method based on mixed model
CN101901230A (en) * 2009-05-31 2010-12-01 国际商业机器公司 Information retrieval method, user comment processing method and system thereof
CN102023967A (en) * 2010-11-11 2011-04-20 清华大学 Text emotion classifying method in stock field
CN102929860A (en) * 2012-10-12 2013-02-13 浙江理工大学 Chinese clause emotion polarity distinguishing method based on context
CN103034626A (en) * 2012-12-26 2013-04-10 上海交通大学 Emotion analyzing system and method

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
刘鸿宇: "情感标签抽取相关技术研究", 《中国优秀硕士学位论文全文数据库》 *
李素科等: "基于情感特征聚类的半监督情感分类", 《计算机研究与发展》 *
王兰成等: "基于情感本体的主题网络舆情倾向性分析", 《信息与控制》 *
王海雷等: "基于用户生成内容的产品搜索模型", 《中文信息学报》 *

Also Published As

Publication number Publication date
CN105005552B (en) 2019-01-08


Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant