CN103425635A - Method and device for recommending answers - Google Patents
- Publication number
- CN103425635A CN103425635A CN2012101510445A CN201210151044A CN103425635A CN 103425635 A CN103425635 A CN 103425635A CN 2012101510445 A CN2012101510445 A CN 2012101510445A CN 201210151044 A CN201210151044 A CN 201210151044A CN 103425635 A CN103425635 A CN 103425635A
- Authority
- CN
- China
- Prior art keywords
- answer
- weight
- classification
- semantic primitive
- dictionary
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention provides a method and a device for recommending answers. The method includes: acquiring a question and the text content of its corresponding answers, and segmenting them to obtain the semantic units of the question and of each answer; looking up the weights of the question's semantic units in each category in a pre-built question domain dictionary to compute the question's topic weight in each category; looking up the weights of each answer's semantic units in each category in a pre-built answer domain dictionary to compute each answer's topic weight in each category; computing the topic similarity between each answer and the question from these topic weights; and finally recommending answers according to the topic-similarity results. Compared with the prior art, generating a question domain dictionary and an answer domain dictionary separately effectively improves the accuracy of judging the semantic similarity between questions and answers and increases the recall rate.
Description
[Technical field]
The present invention relates to the field of Internet information processing technology, and in particular to an answer recommendation method and device.
[Background art]
With the development of communication technology and networks, interactive online Q&A communities such as Baidu Knows, Sina iAsk, Google Answers, Soso Wenwen and Yahoo! Answers have attracted growing attention. These communities provide a platform on which netizens can interact: users can freely ask questions, browse questions, answer questions, help one another and share knowledge. As the number of participating users grows, so does the number of candidate answers, and Q&A communities therefore usually rank answers automatically in order to recommend preferred answers to users.
In automatic answer ranking, text topic analysis is currently the dominant technique: the semantic relatedness of a question-answer pair is analyzed to judge how satisfactory the pair is, and the answers are ranked accordingly. Text topic analysis is mainly based on topic models: a text is mapped to a topic vector, and each topic is in turn represented by a distribution over words. The topic similarity between two texts can therefore be reduced to a similarity computation between their topic vectors, which can be measured by cosine similarity.
Existing text topic analysis methods mostly rest on one assumption: all texts belong to the same topic space, and each topic draws on the same word distribution. In a question-answer pair, however, the question and the answer may be phrased differently, so their vocabularies may not match. For example, in the computer domain, the words in questions tend to be common or colloquial, such as "computer" or "operating system", while the words in answers tend to be more technical, such as "PC" or "win7". Likewise, a user may ask about a skill in a certain game while the answer describes the skill concretely without using any word from the question. In such cases the semantic relatedness between answer and question computed by existing methods is low, so the answer that actually matches the question cannot be recalled or is ranked low, degrading the accuracy of question-answer quality judgment and preventing users from finding preferred answers.
[summary of the invention]
In view of this, the invention provides an answer recommendation method and device that generate a question domain dictionary and an answer domain dictionary separately, so as to expand the domain-mapped expressions of the question and the answers in a question-answer pair, effectively improving the accuracy of judging the semantic similarity between questions and answers and increasing the recall rate.
The concrete technical solution is as follows:
An answer recommendation method, comprising the following steps:
S1: acquiring a question and the text content of the answers corresponding to the question, and segmenting them to obtain the semantic units of the question and of each answer;
S2: looking up, in a pre-built question domain dictionary, the weight of each semantic unit of the question in each category, and computing the question's topic weight in each category;
and
looking up, in a pre-built answer domain dictionary, the weight of each semantic unit of each answer in each category, and computing each answer's topic weight in each category;
S3: using the obtained topic weights of the question and of each answer to compute the topic similarity between each answer and the question, and recommending answers according to the topic-similarity results.
According to a preferred embodiment of the present invention, building the question domain dictionary specifically comprises:
acquiring the question content of the question-answer pairs in a corpus, and segmenting it to obtain the semantic units of the questions;
computing the weight of each semantic unit of the questions in each category;
forming the question domain dictionary from the semantic units and their weights in each category.
According to a preferred embodiment of the present invention, building the answer domain dictionary specifically comprises:
acquiring the answer content of the question-answer pairs in a corpus, and segmenting it to obtain the semantic units of the answers;
computing the weight of each semantic unit of the answers in each category;
forming the answer domain dictionary from the semantic units and their weights in each category.
According to a preferred embodiment of the present invention, after obtaining the semantic units of the question or of the answer, the method further comprises:
filtering out semantic units whose word frequency is below a preset frequency threshold;
computing per-category weights only for the semantic units that remain after filtering.
According to a preferred embodiment of the present invention, the weight of a semantic unit in each category is computed from one of the following, or any combination thereof:
the difference of the semantic unit's word frequency between categories, the semantic unit's word frequency within each category, or the semantic unit's inverse word-frequency rate.
According to a preferred embodiment of the present invention, the weight of a semantic unit in each category is computed as follows (the formula itself is given as an image in the original):
wherein w(token_i, C_j) denotes the weight of semantic unit token_i in category C_j;
P_ij = T_ij / L_j, where L_j denotes the total number of occurrences of all semantic units contained in category C_j, and T_ij denotes the number of occurrences of token_i in category C_j;
a further term (an image in the original) denotes the word frequency of token_i appearing in category C_j, and n is the word-frequency influence factor;
N denotes the total number of occurrences of all semantic units in the corpus, and N(token_i) denotes the number of occurrences of token_i.
According to a preferred embodiment of the present invention, before forming the question domain dictionary or answer domain dictionary from the semantic units and their per-category weights, the method further comprises:
applying similar-weight filtering to each semantic unit's weights across categories: for a given semantic unit, filtering out weights that fall into the same weight interval more often than a predetermined threshold;
using only the weights in the remaining categories to form the question domain dictionary or answer domain dictionary.
According to a preferred embodiment of the present invention, the weight intervals are arranged according to the magnitude of the semantic unit's weights in each category.
According to a preferred embodiment of the present invention, before forming the question domain dictionary from the semantic units and their per-category weights, the method further comprises:
filtering out semantic units that are single characters, repeated digit strings, or numeric strings longer than a preset length threshold;
using only the semantic units remaining after filtering to form the question domain dictionary or answer domain dictionary.
According to a preferred embodiment of the present invention, computing the topic similarity between an answer and the question comprises:
computing the topic similarity of the answer and the question under each category;
taking the maximum of the computed topic similarities as the topic similarity of the answer and the question.
According to a preferred embodiment of the present invention, the topic similarity between an answer and the question is computed as:
sim(query, ans) = Max_j { weight(query, C_j) × weight(ans, C_j) }
wherein sim(query, ans) denotes the topic similarity of the answer and the question, weight(query, C_j) denotes the topic weight of the question in category C_j, and weight(ans, C_j) denotes the topic weight of the answer in category C_j.
An answer recommendation device, comprising:
a text acquisition module, configured to acquire a question and the text content of its corresponding answers, and to segment them to obtain the semantic units of the question and of each answer;
a topic weight computation module, configured to look up, in a pre-built question domain dictionary, the weight of each semantic unit of the question in each category and compute the question's topic weight in each category;
and
to look up, in a pre-built answer domain dictionary, the weight of each semantic unit of each answer in each category and compute each answer's topic weight in each category;
a similarity computation module, configured to use the topic weights of the question and of each answer obtained by the topic weight computation module to compute the topic similarity between each answer and the question, and to recommend answers according to the topic-similarity results.
According to a preferred embodiment of the present invention, the question domain dictionary is built in advance by a question dictionary building module, which specifically comprises:
a question acquisition submodule, configured to acquire the question content of the question-answer pairs in the corpus and segment it to obtain the questions' semantic units;
a first weight computation submodule, configured to compute the weight of each semantic unit of the questions in each category;
a first integration submodule, configured to form the question domain dictionary from the semantic units and their weights in each category.
According to a preferred embodiment of the present invention, the answer domain dictionary is built in advance by an answer dictionary building module, which specifically comprises:
an answer acquisition submodule, configured to acquire the answer content of the question-answer pairs in the corpus and segment it to obtain the answers' semantic units;
a second weight computation submodule, configured to compute the weight of each semantic unit of the answers in each category;
a second integration submodule, configured to form the answer domain dictionary from the semantic units and their weights in each category.
According to a preferred embodiment of the present invention, the question dictionary building module or the answer dictionary building module further comprises:
a word-frequency filtering submodule, configured to filter out semantic units whose word frequency is below a preset frequency threshold;
the semantic units remaining after filtering are provided to the first weight computation submodule or the second weight computation submodule.
According to a preferred embodiment of the present invention, the first or second weight computation submodule computes the weight of a semantic unit in each category from one of the following, or any combination thereof:
the difference of the semantic unit's word frequency between categories, the semantic unit's word frequency within each category, or the semantic unit's inverse word-frequency rate.
According to a preferred embodiment of the present invention, the first or second weight computation submodule computes the weight of a semantic unit in each category as follows (the formula itself is given as an image in the original):
wherein w(token_i, C_j) denotes the weight of semantic unit token_i in category C_j;
P_ij = T_ij / L_j, where L_j denotes the total number of occurrences of all semantic units contained in category C_j, and T_ij denotes the number of occurrences of token_i in category C_j;
a further term (an image in the original) denotes the word frequency of token_i appearing in category C_j, and n is the word-frequency influence factor;
N denotes the total number of occurrences of all semantic units in the corpus, and N(token_i) denotes the number of occurrences of token_i.
According to a preferred embodiment of the present invention, the question dictionary building module or the answer dictionary building module further comprises:
a weight filtering submodule, configured to apply similar-weight filtering to each semantic unit's weights across categories: for a given semantic unit, weights that fall into the same weight interval more often than a predetermined threshold are filtered out;
only the weights in the remaining categories are provided to the first or second integration submodule to form the question domain dictionary or answer domain dictionary.
According to a preferred embodiment of the present invention, the weight intervals are arranged according to the magnitude of the semantic unit's weights in each category.
According to a preferred embodiment of the present invention, the question dictionary building module or the answer dictionary building module further comprises:
a semantic unit filtering submodule, configured to filter out semantic units that are single characters, repeated digit strings, or numeric strings longer than a preset length threshold;
only the semantic units remaining after filtering are provided to the first or second integration submodule to form the question domain dictionary or answer domain dictionary.
According to a preferred embodiment of the present invention, the similarity computation module computes the topic similarity of the answer and the question under each category, and takes the maximum of the computed topic similarities as the topic similarity of the answer and the question.
According to a preferred embodiment of the present invention, the similarity computation module computes the topic similarity of the answer and the question as:
sim(query, ans) = Max_j { weight(query, C_j) × weight(ans, C_j) }
wherein sim(query, ans) denotes the topic similarity of the answer and the question, weight(query, C_j) denotes the topic weight of the question in category C_j, and weight(ans, C_j) denotes the topic weight of the answer in category C_j.
As can be seen from the above technical solutions, the answer recommendation method and device provided by the invention use the question-answer corpus to generate a question domain dictionary and an answer domain dictionary separately, thereby expanding the domain-mapped expressions of question-answer pairs. This effectively improves the accuracy of question-answer semantic similarity, mitigates inaccurate matching when question and answer describe the same topic with inconsistent wording, and increases the recall rate.
[Description of the drawings]
Fig. 1 is a flowchart of the answer recommendation method provided by Embodiment 1 of the present invention;
Fig. 2 is a flowchart of the method for building the question domain dictionary provided by Embodiment 1 of the present invention;
Fig. 3 is a flowchart of the method for building the answer domain dictionary provided by Embodiment 1 of the present invention;
Fig. 4 is a schematic diagram of the answer recommendation device provided by Embodiment 2 of the present invention;
Fig. 5 is a schematic diagram of the question dictionary building module provided by Embodiment 2 of the present invention;
Fig. 6 is a schematic diagram of the answer dictionary building module provided by Embodiment 2 of the present invention.
[Detailed description of the embodiments]
To make the objects, technical solutions and advantages of the present invention clearer, the present invention is described below with reference to the drawings and specific embodiments.
In the question-answering process of an interactive online Q&A community, the way the same topic is phrased in the question and in the answers varies with the respondents' knowledge backgrounds, for example <compression software, winrar>, <slide show, PPT>, <system software, win7>. Although such phrasings differ in wording, they have high semantic similarity under a specific domain background.
The present invention exploits this characteristic: a question domain dictionary and an answer domain dictionary are built from the words of questions and answers in different categories respectively, the semantic similarity between question and answer is computed per domain, and answers are recommended according to the similarity results.
Embodiment 1
Fig. 1 is the flowchart of the answer recommendation method provided by this embodiment. As shown in Fig. 1, the method comprises:
Step S10: acquire a question and the text content of its corresponding answers, and segment them to obtain the semantic units of the question and of each answer.
A question may have several corresponding answers. The text content of the question and of each answer is processed by segmentation, filtering and the like to obtain the semantic units contained in the question and in each answer.
The present invention can segment the text content of a question or answer with an existing segmentation method, such as N-gram segmentation, forward maximum matching or backward maximum matching. Taking N-gram segmentation as an example: a unigram split yields unigram semantic units such as "text", "data" and "form"; a bigram split yields bigram semantic units such as "text box", "data packet" and "new form"; a trigram split yields trigram semantic units such as "multiline text box", "data packet capture" and "new form download"; and so on for N-gram semantic units. An N-gram semantic unit consists of N contextually adjacent lexical items in the question or answer, with no separator such as a stop word, punctuation mark or space between the N consecutive items.
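The N-gram split described above can be sketched as follows; a minimal illustration assuming the text has already been split into word tokens by an upstream segmenter, using translated stand-ins for the "new form download" example:

```python
def ngram_units(tokens, max_n=3):
    """Collect all 1..max_n-gram semantic units from a token list.

    Assumes separators (stop words, punctuation, spaces) were already
    removed by the upstream segmenter, as the text requires.
    """
    units = []
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            units.append(" ".join(tokens[i:i + n]))
    return units

# Unigrams, bigrams and the single trigram of a three-token title.
print(ngram_units(["new", "form", "download"]))
```

Running this prints the three unigrams, two bigrams and one trigram, mirroring the enumeration in the text.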
A question or answer may contain content in several fields. For example, a question may comprise three fields (title, body and supplementary note); the text content of each field is extracted and segmented to obtain the corresponding semantic units. The N-gram semantic units of a question or answer are thus acquired separately for the title, the body and the supplementary content.
For example, a user asks:
"Asking the computer experts for advice
The things I downloaded before have disappeared after my computer restarted, but I never deleted them. What should I do?"
This question comprises the title "Asking the computer experts for advice" and the body "The things I downloaded before have disappeared after my computer restarted, but I never deleted them. What should I do?" Taking the title as an example, its segmentation result includes: the unigram semantic units "asking", "computer" and "expert", the bigram semantic units "asking computer" and "computer expert", and the trigram semantic unit "asking computer expert".
Step S20: using the pre-built question domain dictionary, look up the weight of each semantic unit of the question in each category and compute the question's topic weight in each category; and, using the pre-built answer domain dictionary, look up the weight of each semantic unit of each answer in each category and compute each answer's topic weight in each category.
The question domain dictionary and the answer domain dictionary contain semantic units together with each unit's weight in each category. The categories are several preset domain categories, for which an encyclopedia taxonomy can be adopted, e.g. computing, medicine, education, maps, songs, films and so on.
The concrete process of building the question domain dictionary and the answer domain dictionary in advance from an existing question-answer corpus is described in detail later.
Using the question domain dictionary, the weight of each semantic unit of the question in each category is looked up, and the weights of all semantic units contained in the question are summed per category to obtain the question's topic weight in each category. For example, looking up the semantic unit "computer" in the question domain dictionary may yield a weight of 15 in the computing category, 30 in the education category and 10 in the medicine category. The weight of each of the question's semantic units obtained in step S10 is looked up in turn.
The weights of the semantic units under each category are summed to obtain the question's topic weight under that category. If a semantic unit's weight under some category cannot be found, its weight under that category is zero. For example, if among the question's semantic units only "computer" and "expert" have weights in the medicine category, the question's topic weight in the medicine category is the sum of the weights of "computer" and "expert".
Likewise, using the answer domain dictionary, the weight of each semantic unit of each answer in each category is looked up, and the weights of all semantic units contained in the answer are summed per category to obtain the answer's topic weight in each category.
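The per-category summation in step S20 can be sketched as follows. This is a minimal sketch in which a domain dictionary is a plain nested mapping; the 15/30/10 weights for "computer" follow the example in the text, while the "expert" entry is a hypothetical value added for illustration:

```python
def topic_weights(units, domain_dict):
    """Sum each semantic unit's per-category weight; missing entries count as zero."""
    totals = {}
    for u in units:
        for category, w in domain_dict.get(u, {}).items():
            totals[category] = totals.get(category, 0.0) + w
    return totals

# Hypothetical question domain dictionary (nested mapping: unit -> category -> weight).
qdict = {
    "computer": {"computing": 15.0, "education": 30.0, "medicine": 10.0},
    "expert": {"medicine": 4.0},
}
print(topic_weights(["computer", "expert"], qdict))
```

In the medicine category the two units' weights are added, matching the "computer" plus "expert" example above; units absent from the dictionary contribute nothing.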
Step S30: using the obtained topic weights of the question and of each answer, compute the topic similarity between each answer and the question, and recommend answers according to the topic-similarity results.
Using the per-category topic weights of the question and of each answer computed in step S20, the topic similarity of each answer and the question is computed.
The topic similarity of an answer and the question can be, but is not limited to being, computed as the product of the question's topic weight and the answer's topic weight. Specifically, the topic similarity of the answer and the question under each category is computed first, and the maximum over the categories is then taken as the topic similarity of the answer and the question, that is:
sim(query, ans) = Max_j { weight(query, C_j) × weight(ans, C_j) }
wherein sim(query, ans) denotes the topic similarity of the answer and the question, weight(query, C_j) denotes the topic weight of the question in category C_j, and weight(ans, C_j) denotes the topic weight of the answer in category C_j.
After the per-category topic weights of the question and the answers have been computed, only the topic weights of the top five categories of the question and of the answer are used in the similarity computation.
If the question's highest topic weight is 0, no clear topic can be determined for the question and the topic similarity of the pair cannot be computed; in that case an existing semantic-relatedness measure is used to gauge the relatedness of the question-answer pair.
Likewise, if an answer's highest topic weight is 0, no clear topic can be determined for that answer and its topic similarity with the question cannot be computed; an existing semantic-relatedness measure is used instead.
The weights of the question and the answer in each corresponding category are multiplied, the product serving as the topic relatedness for that category, and the maximum product is taken as the topic relatedness of the answer and the question.
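The max-over-categories product described above can be sketched as follows; a sketch rather than the patent's exact implementation, in which returning None mirrors the fallback to a conventional semantic-relatedness measure when either side lacks a clear topic:

```python
def topic_similarity(q_weights, a_weights):
    """sim(query, ans) = max over categories of weight(query, C_j) * weight(ans, C_j).

    Returns None when either side has no non-zero topic weight, signalling
    that an existing semantic-relatedness measure should be used instead.
    """
    if max(q_weights.values(), default=0.0) == 0.0:
        return None
    if max(a_weights.values(), default=0.0) == 0.0:
        return None
    shared = set(q_weights) & set(a_weights)
    if not shared:
        return 0.0  # every product involves a missing (zero) weight
    return max(q_weights[c] * a_weights[c] for c in shared)
```

Only categories present on both sides matter: a missing weight is zero, so its product can never be the maximum of a pair that shares at least one category.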
The topic relatedness of a question-answer pair can be computed by the above method, as shown in Table 1 below:
Table 1
Based on the topic relatedness of the question and the answers in a pair, question-answer pairs with the same topic can be identified well, and topic-similarity judgments with fairly high weights can be produced. This provides an effective means of judging question-answer quality from the related domains of the text content, so answers can be recommended more accurately.
The methods for building the question domain dictionary and the answer domain dictionary in advance are described below with reference to Fig. 2 and Fig. 3.
Fig. 2 is the flowchart of the method for building the question domain dictionary provided by this embodiment. As shown in Fig. 2, the method specifically comprises:
Step S401: acquire the question content of the question-answer pairs in the corpus, and segment it to obtain the semantic units of the questions.
The text content of all questions of the question-answer pairs in the corpus is acquired and segmented, and the resulting lexical items are filtered to remove stop words, punctuation and the like, yielding the questions' semantic units. The concrete process is similar to step S10 and is not repeated here.
Step S402: filter out semantic units whose word frequency is below a preset frequency threshold.
To improve efficiency, the semantic units are first filtered by word frequency: units whose frequency is below the preset threshold are removed, e.g. units occurring fewer than 5 times.
Of course, this step is not mandatory and can be skipped when processing efficiency is not critical.
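Step S402 amounts to a frequency cutoff over the segmented units; a minimal sketch using the threshold of 5 mentioned above:

```python
from collections import Counter

def filter_by_frequency(units, min_freq=5):
    """Drop semantic units whose corpus frequency is below min_freq (step S402)."""
    freq = Counter(units)
    return [u for u in units if freq[u] >= min_freq]

# "rare" occurs only twice, so it is filtered out; "pc" survives.
print(filter_by_frequency(["pc"] * 5 + ["rare"] * 2))
```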
Step S403: compute the weight of each semantic unit of the questions in each category.
The weight of a semantic unit in each category is computed from one of the following, or any combination thereof:
the difference of the semantic unit's word frequency between categories, the semantic unit's word frequency within each category, or the semantic unit's inverse word-frequency rate.
Taking the combination of all three as an example, the weight of a semantic unit in a category can be, but is not limited to being, computed as the product of the three factors, namely the cross-category difference of the unit's word frequency, the unit's word frequency within the category, and the unit's inverse word-frequency rate, that is:
Wherein, w(token_i, C_j) denotes the weight of semantic unit token_i in category C_j.
P_ij = T_ij / L_j, where L_j denotes the total number of occurrences of all semantic units contained in category C_j and T_ij denotes the number of times token_i appears in C_j, so that P_ij is the word frequency of token_i in category C_j. n is the word-frequency influence factor; it can be set according to actual conditions to adjust the degree of influence of the word frequency, for example n = 5.
N denotes the total number of occurrences of all semantic units in the corpus, N(token_i) denotes the number of occurrences of token_i, and log(N / N(token_i)) denotes the inverse word frequency of token_i. The inverse document frequency over the natural-language-processing corpus may also be used directly in place of this inverse word frequency.
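As a rough sketch of the per-category weight just described: the patent gives the exact combining formula only as a figure, so the placement of the influence factor n (here as an exponent on P_ij) and the omission of the cross-category difference term are assumptions, and `category_weights` is a hypothetical helper name.

```python
import math
from collections import Counter, defaultdict

def category_weights(category_tokens, n=5):
    """Sketch of w(token_i, C_j) for every token and category.

    category_tokens maps a category id C_j to the list of semantic-unit
    occurrences observed in that category. The weight combines the
    in-category word frequency P_ij = T_ij / L_j with the inverse word
    frequency log(N / N(token_i)); using P_ij ** n for the influence
    factor n is an assumption, since the original formula is a figure.
    """
    N = sum(len(toks) for toks in category_tokens.values())   # total occurrences in corpus
    global_count = Counter(t for toks in category_tokens.values() for t in toks)
    weights = defaultdict(dict)
    for cj, toks in category_tokens.items():
        Lj = len(toks)                                        # occurrences of all units in C_j
        for token, Tij in Counter(toks).items():
            Pij = Tij / Lj                                    # word frequency of token in C_j
            iwf = math.log(N / global_count[token])           # inverse word frequency
            weights[token][cj] = (Pij ** n) * iwf
    return dict(weights)
```

In practice the low-frequency filtering of step S402 would run before this computation, so `category_tokens` would already exclude rare units.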
Step S404: filter similar weights of each semantic unit across categories.
To preserve a semantic unit's ability to distinguish importance between categories, after its weight in each category has been computed, weights that recur in the same weight interval need to be filtered out. That is, for a given semantic unit, if the number of its weights falling into the same weight interval exceeds a preset threshold, those weights are filtered out.
The weight intervals (e.g. the interval [0, 10)) are set according to the magnitude of the semantic unit's weights in each category. Specifically, the following method may be, but is not limited to being, adopted:
dividing the difference between the maximum and minimum of the semantic unit's weights across all categories by the number of weight intervals to determine each weight interval of the semantic unit.
For example, the weight intervals can be determined by a heuristic rule: if the highest weight score of a semantic unit across the categories is Score_max and the lowest is Score_min, the interval length can be defined as (Score_max - Score_min)/L, where L is the preset number of weight intervals (L = 6 in this embodiment). The similar-weight-count threshold is M/2, where M is the number of categories in which the semantic unit carries a weight.
For example, suppose the weights of the semantic unit "stock" in the categories are: category 1: 1.65, category 2: 2.32, category 3: 58.62, category 4: 3.12, category 5: 3.62, category 7: 14.82, category 8: 24.31, category 11: 14.85. The interval length is (58.62 - 0)/6 ≈ 10, so the weight intervals are [0, 10), [10, 20), and so on. "stock" carries weights in 8 categories altogether, so the similar-weight-count threshold is 4. Since the weights of "stock" in categories 1, 2, 4, and 5 all fall into the interval [0, 10), these four weights are filtered out, leaving the weights of categories 3 (58.62), 7 (14.82), 8 (24.31), and 11 (14.85).
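A minimal sketch of this interval-based filtering for a single semantic unit; `filter_similar_weights` is a hypothetical name. The worked "stock" example filters an interval whose count equals M/2 and measures intervals from 0, while the stated rule uses max - min; the code follows the stated rule, which yields the same kept set on this data.

```python
def filter_similar_weights(weights, L=6):
    """Sketch of the similar-weight filtering for one semantic unit.

    weights maps category id -> weight. Interval length is
    (max - min) / L; intervals holding at least M/2 weights
    (M = number of categories carrying a weight, as in the
    "stock" example) have all their weights filtered out.
    """
    M = len(weights)
    threshold = M / 2
    lo, hi = min(weights.values()), max(weights.values())
    span = (hi - lo) / L or 1.0            # guard against all-equal weights
    buckets = {}
    for cat, w in weights.items():
        idx = min(int((w - lo) / span), L - 1)
        buckets.setdefault(idx, []).append(cat)
    kept = {}
    for cats in buckets.values():
        if len(cats) < threshold:          # interval not over-populated: keep
            for cat in cats:
                kept[cat] = weights[cat]
    return kept
```

Running it on the "stock" distribution keeps exactly the weights of categories 3, 7, 8, and 11.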
It is worth mentioning that this step may also be omitted when requirements on processing efficiency and accuracy are not high.
Step S405: filter out semantic units that are single characters, repeated digit strings, or digit strings whose length exceeds the preset length threshold.
After computing the weights of the semantic units in each category, the semantic units themselves are also filtered, including:
filtering out single-character semantic units, i.e. Chinese characters or words of length 1;
filtering out semantic units whose digit-string length exceeds the preset length threshold; for example, digit strings longer than 10 are meaningless and are filtered out;
filtering out repeated-digit-string semantic units; for example, digit strings with heavy repetition (such as "00001", where one digit repeats 4 or more times) are meaningless and are filtered out.
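The three unit-level filters above can be sketched as follows; `filter_semantic_units` is a hypothetical name, and treating a run of 4 or more identical digits as "repeated" is an assumption inferred from the "00001" example.

```python
import re

def filter_semantic_units(units, max_digit_len=10, repeat_len=4):
    """Sketch of step S405: drop single-character units, digit strings
    longer than max_digit_len, and digit strings in which one digit
    repeats repeat_len or more times in a row (as in "00001")."""
    kept = []
    for u in units:
        if len(u) <= 1:                     # single character or word of length 1
            continue
        if u.isdigit():
            if len(u) > max_digit_len:      # e.g. digit strings longer than 10
                continue
            runs = re.finditer(r"(\d)\1*", u)   # maximal runs of one repeated digit
            if any(len(m.group(0)) >= repeat_len for m in runs):
                continue
        kept.append(u)
    return kept
```

Ordinary multi-character units and short, non-repetitive digit strings pass through unchanged.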
It is worth mentioning that the filtering of this step may also be performed before computing the weights of the semantic units in each category, specifically either before or after step S402.
Step S406: form the problem-domain dictionary from the semantic units and their weights in each category.
That is, the problem-domain dictionary contains at least the semantic units and the weight of each semantic unit in each category.
Similarly, Fig. 3 is a flowchart of the method for building the answer-domain dictionary provided by this embodiment. As shown in Fig. 3, the method specifically comprises:
Step S501: obtain the text content of the answers in the question-answer corpus, wherein word segmentation yields the semantic units of the answers.
Step S502: filter out semantic units whose word frequency is below the preset word-frequency threshold.
Step S503: respectively calculate the weight of each of the answers' semantic units in each category.
Step S504: filter similar weights of each semantic unit across categories: for a given semantic unit, weights whose count in the same weight interval exceeds the preset threshold are filtered out.
Step S505: filter out semantic units that are single characters, repeated digit strings, or digit strings whose length exceeds the preset length threshold.
Step S506: form the answer-domain dictionary from the semantic units and their weights in each category.
The processing of the above steps S501 to S506 is similar to that of steps S401 to S406 and is not repeated here.
Through the above building method, the problem-domain dictionary and the answer-domain dictionary for each category are formed, as shown in Tables 2 and 3 below.
Table 2

| Problem-domain bigram unit | Weight | Answer-domain bigram unit | Weight |
| --- | --- | --- | --- |
| Text box | 45.226 | Control end | 51.5122 |
| Share online | 45.2149 | Mitnick | 51.3074 |
| Default gateway | 45.1803 | Stop message | 50.968 |
| Packet | 45.1551 | Click cancel | 50.8755 |
| In Java | 45.1044 | Partition table | 50.8634 |
| Excel form | 45.0597 | Robot dog | 50.7862 |
| Enter DOS | 45.004 | Grey Pigeon | 50.533 |
Table 2 shows the distribution of bigram semantic units in the problem domain and the answer domain for the computer category. As can be seen from Table 2, the problem domain mainly contains bigram units about realizing functions or achieving effects, while the answer domain mainly contains bigram units about performing actions or applying techniques.
Table 3

| Problem-domain bigram unit | Weight | Answer-domain bigram unit | Weight |
| --- | --- | --- | --- |
| Normal value | 45.4417 | HBV antibody | 46.8926 |
| Each menstruation | 45.4238 | Superficial suggestion | 46.6657 |
| Ovarian cyst | 45.4168 | Liver function test | 46.468 |
| Pleurisy | 45.3994 | Vaccine booster | 46.3076 |
| Hepatitis B core | 45.3889 | Fish contain | 46.2249 |
Table 3 shows the distribution of bigram semantic units in the problem domain and the answer domain for the medical category. As can be seen from Table 3, the problem domain mainly contains bigram units inquiring about illnesses, while the answer domain mainly contains bigram units about treatments and suggestions.
By computing weights for questions and answers separately, the present invention can better capture the semantic units that question-answer pairs in a given field have in common. At the same time, it fully accounts for the uneven distribution of N-gram semantic units across categories, reasonably achieving the intended goal.
The above is a detailed description of the method provided by the present invention; the answer recommendation device provided by the present invention is described in detail below.
Embodiment 2
Fig. 4 is a schematic diagram of the answer recommendation device provided by this embodiment. As shown in Fig. 4, the device comprises:
Text acquisition module 10, for obtaining the text content of a question and the answers corresponding to the question, wherein word segmentation yields the semantic units of the question and the semantic units of the answers.
A question may have multiple corresponding answers. The text content of the question and of each answer undergoes word segmentation, filtering, and similar processing to obtain the semantic units contained in the question and in each answer.
Text acquisition module 10 may use existing segmentation methods, such as N-gram segmentation, forward maximum matching, or reverse maximum matching, to segment the text content of a question or answer. Taking N-gram segmentation as an example: unigram division yields unigram semantic units such as "text", "data", "form"; bigram division yields bigram semantic units such as "text box", "packet", "new form"; trigram division yields trigram semantic units such as "multi-line text box", "packet capture", "new form download"; and so on for N-gram semantic units. An N-gram semantic unit is N contextually adjacent lexical items in the question or answer, occurring consecutively with no separator (character, punctuation, or space) between them.
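The unigram/bigram/trigram division just described can be sketched as follows; `ngram_units` is a hypothetical helper name, joining with spaces stands in for the Chinese original's direct concatenation, and separator handling is assumed done upstream.

```python
def ngram_units(tokens, max_n=3):
    """Sketch of N-gram semantic-unit extraction: every run of 1..max_n
    contextually adjacent lexical items, joined in order."""
    units = []
    for n in range(1, max_n + 1):                 # unigrams, bigrams, ..., N-grams
        for i in range(len(tokens) - n + 1):
            units.append(" ".join(tokens[i:i + n]))
    return units
```

For three input tokens with max_n=3 this yields three unigrams, two bigrams, and one trigram.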
A question or answer may contain content in multiple fields. For example, a question may comprise a title, a body, and supplementary remarks; the text content of these three fields is extracted and segmented separately to obtain the corresponding semantic units. The N-gram semantic units of a question or answer are thus acquired separately from its title, body, and supplementary content.
Topic-weight computing module 20, for using the pre-built problem-domain dictionary to look up the weight in each category of the semantic units of the question obtained by text acquisition module 10, and computing the topic weight of the question in each category;
and for using the pre-built answer-domain dictionary to look up the weight in each category of the semantic units of each answer obtained by text acquisition module 10, and respectively computing the topic weight of each answer in each category.
The problem-domain dictionary and the answer-domain dictionary contain semantic units and the weight of each semantic unit in each category. The categories are several preset domain categories; an encyclopedia taxonomy may be adopted, for example categories such as computer, medicine, education, maps, songs, and films.
The building devices that build the problem-domain dictionary and the answer-domain dictionary in advance from an existing question-answer corpus will be described in detail later.
Using the problem-domain dictionary, the weight of each of the question's semantic units in each category is looked up, and the weights of all the semantic units the question contains are summed per category to obtain the question's topic weight in each category. For example, looking up the semantic unit "computer" in the problem-domain dictionary may yield a weight of 15 in the computer category, 30 in the education category, and 10 in the medicine category. The weight in each category of each semantic unit of the question obtained by text acquisition module 10 is looked up in turn.
Per category, the weights of the semantic units in that category are then summed to obtain the question's topic weight in each category. If a semantic unit's weight in some category cannot be found, its weight in that category is zero. For example, if among the semantic units obtained by segmenting the question only "computer" and "master-hand" carry weights in the medicine category, the sum of the weights of "computer" and "master-hand" is taken as the question's topic weight in the medicine category.
Similarly, using the answer-domain dictionary, the weight of each of an answer's semantic units in each category is looked up, and the weights of all the answer's semantic units are summed per category to obtain the answer's topic weight in each category.
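The per-category summation described above can be sketched as follows; `topic_weights` is a hypothetical helper name, and the category names and weight values in the test mirror the illustrative "computer"/"master-hand" example rather than a real dictionary.

```python
def topic_weights(units, domain_dict):
    """Sketch of the topic-weight computation: per category, sum the
    dictionary weights of the question's (or answer's) semantic units;
    a unit with no weight in a category contributes zero there."""
    totals = {}
    for u in units:
        for cat, w in domain_dict.get(u, {}).items():
            totals[cat] = totals.get(cat, 0.0) + w
    return totals
```

The same function serves for questions against the problem-domain dictionary and for answers against the answer-domain dictionary.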
Similarity computing module 30, for using the topic weight of the question and the topic weights of the answers obtained by topic-weight computing module 20 to respectively compute the topic similarity between each answer and the question, and recommending answers according to the result of the topic-similarity computation.
The topic similarity of an answer and the question is computed from the question's per-category topic weights and the answer's per-category topic weights computed by topic-weight computing module 20.
The topic similarity between an answer and the question may be, but is not limited to being, computed from the product of the question's and the answer's topic weights. Specifically, the topic similarity of the answer and the question is first computed under each category, and the maximum computed value is then chosen as the topic similarity of the answer and the question, that is:
sim(query, ans) = Max_j{weight(query, C_j) × weight(ans, C_j)}
wherein sim(query, ans) denotes the topic similarity of the answer and the question, weight(query, C_j) denotes the topic weight of the question in category C_j, and weight(ans, C_j) denotes the topic weight of the answer in category C_j.
Similarity computing module 30 may compute similarity using only the topic weights of the question and the answer in the top 5 categories computed by topic-weight computing module 20.
If the question's highest topic weight is 0, no clear topic can be determined for the question and the topic similarity of the question-answer pair cannot be computed; in that case, an existing semantic-relevance measure is used to measure the relevance of the question-answer pair.
Likewise, if an answer's highest topic weight is 0, no clear topic can be determined for that answer and its topic similarity with the question cannot be computed; an existing semantic-relevance measure is used in the same way.
The product of the question's and the answer's weights in a given category serves as the topic relevance for that category, and the maximum of the products is chosen as the topic relevance of the answer and the question.
Based on the topic relevance between the question and its answers, question-answer pairs sharing the same topic can be identified well and comparatively high-weight topic-similarity judgments produced. This provides an effective means of judging answer quality from the topical aspect of the text content, enabling more accurate answer recommendation.
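The max-of-products similarity, restricted to the question's top categories as the text suggests, can be sketched as follows; `topic_similarity` is a hypothetical name, and the fallback to an existing semantic-relevance measure when the highest topic weight is 0 is not shown.

```python
def topic_similarity(q_weights, a_weights, top_k=5):
    """Sketch of sim(query, ans) = max_j weight(query, C_j) * weight(ans, C_j),
    computed over the question's top_k categories only."""
    top = sorted(q_weights, key=q_weights.get, reverse=True)[:top_k]
    products = [q_weights[c] * a_weights.get(c, 0.0) for c in top]
    return max(products, default=0.0)
```

An answer whose weight is concentrated in the same category as the question scores highest, even when that category is not the question's single strongest one.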
The building devices for the pre-built problem-domain dictionary and answer-domain dictionary are described below with reference to Figs. 5 and 6.
Fig. 5 is a schematic diagram of the building device for the problem-domain dictionary provided by this embodiment. As shown in Fig. 5, it specifically comprises:
Question acquisition submodule 401, for obtaining the text content of the questions in the question-answer corpus, wherein word segmentation yields the semantic units of the questions.
The text content of all the questions in the question-answer corpus is obtained and segmented, and the resulting lexical items are filtered to remove stop words, punctuation, and the like, yielding the questions' semantic units. The concrete processing is similar to that of text acquisition module 10 and is not repeated here.
Word-frequency filtering submodule 402, for filtering out semantic units whose word frequency is below the preset word-frequency threshold.
To improve efficiency, the semantic units are first filtered by word frequency: those below the preset threshold are discarded, for example semantic units occurring fewer than 5 times.
Of course, this submodule is not essential and may be omitted when processing-efficiency requirements are low.
First weight computing submodule 403, for respectively calculating the weight of each of the questions' semantic units in each category.
The weight of a semantic unit in each category is calculated from one of, or any combination of, the following factors:
the difference of the semantic unit's word frequency across categories, the word frequency of the semantic unit within each category, and the inverse word frequency of the semantic unit.
Taking the combination of all three factors as an example, the weight of a semantic unit in each category may be, but is not limited to being, calculated as the product of the three: the cross-category word-frequency difference, the in-category word frequency, and the inverse word frequency, that is:
Wherein, w(token_i, C_j) denotes the weight of semantic unit token_i in category C_j.
P_ij = T_ij / L_j, where L_j denotes the total number of occurrences of all semantic units contained in category C_j and T_ij denotes the number of times token_i appears in C_j, so that P_ij is the word frequency of token_i in category C_j. n is the word-frequency influence factor; it can be set according to actual conditions to adjust the degree of influence of the word frequency, for example n = 5.
N denotes the total number of occurrences of all semantic units in the corpus, N(token_i) denotes the number of occurrences of token_i, and log(N / N(token_i)) denotes the inverse word frequency of token_i. The inverse document frequency over the natural-language-processing corpus may also be used directly in place of this inverse word frequency.
Weight filtering submodule 404, for filtering similar weights of each semantic unit across categories.
To preserve a semantic unit's ability to distinguish importance between categories, after its weight in each category has been computed, weights that recur in the same weight interval need to be filtered out. That is, for a given semantic unit, if the number of its weights falling into the same weight interval exceeds a preset threshold, those weights are filtered out.
The weight intervals (e.g. the interval [0, 10)) are set according to the magnitude of the semantic unit's weights in each category. Specifically, the following method may be, but is not limited to being, adopted:
dividing the difference between the maximum and minimum of the semantic unit's weights across all categories by the number of weight intervals to determine each weight interval of the semantic unit.
For example, the weight intervals can be determined by a heuristic rule: if the highest weight score of a semantic unit across the categories is Score_max and the lowest is Score_min, the interval length can be defined as (Score_max - Score_min)/L, where L is the preset number of weight intervals (L = 6 in this embodiment). The similar-weight-count threshold is M/2, where M is the number of categories in which the semantic unit carries a weight.
For example, suppose the weights of the semantic unit "stock" in the categories are: category 1: 1.65, category 2: 2.32, category 3: 58.62, category 4: 3.12, category 5: 3.62, category 7: 14.82, category 8: 24.31, category 11: 14.85. The interval length is (58.62 - 0)/6 ≈ 10, so the weight intervals are [0, 10), [10, 20), and so on. "stock" carries weights in 8 categories altogether, so the similar-weight-count threshold is 4. Since the weights of "stock" in categories 1, 2, 4, and 5 all fall into the interval [0, 10), these four weights are filtered out, leaving the weights of categories 3 (58.62), 7 (14.82), 8 (24.31), and 11 (14.85).
It is worth mentioning that this submodule may also be omitted when requirements on processing efficiency and accuracy are not high.
Semantic-unit filtering submodule 405, for filtering out semantic units that are single characters, repeated digit strings, or digit strings whose length exceeds the preset length threshold.
Semantic-unit filtering submodule 405 filters the semantic units, including:
filtering out single-character semantic units, i.e. Chinese characters or words of length 1;
filtering out semantic units whose digit-string length exceeds the preset length threshold; for example, digit strings longer than 10 are meaningless and are filtered out;
filtering out repeated-digit-string semantic units; for example, digit strings with heavy repetition (such as "00001", where one digit repeats 4 or more times) are meaningless and are filtered out.
It is worth mentioning that this submodule may also be arranged before first weight computing submodule 403, specifically either before or after word-frequency filtering submodule 402.
First integration submodule 406, for forming the problem-domain dictionary from the semantic units and their weights in each category. That is, the problem-domain dictionary contains at least the semantic units and the weight of each semantic unit in each category.
Similarly, Fig. 6 is a schematic diagram of the building device for the answer-domain dictionary provided by this embodiment. As shown in Fig. 6, it specifically comprises:
Answer acquisition submodule 501, for obtaining the text content of the answers in the question-answer corpus, wherein word segmentation yields the semantic units of the answers.
Word-frequency filtering submodule 502, for filtering out semantic units whose word frequency is below the preset word-frequency threshold.
Second weight computing submodule 503, for respectively calculating the weight of each of the answers' semantic units in each category.
Weight filtering submodule 504, for filtering similar weights of each semantic unit across categories: for a given semantic unit, weights whose count in the same weight interval exceeds the preset threshold are filtered out.
Semantic-unit filtering submodule 505, for filtering out semantic units that are single characters, repeated digit strings, or digit strings whose length exceeds the preset length threshold.
Second integration submodule 506, for forming the answer-domain dictionary from the semantic units and their weights in each category.
The arrangement of the above submodules 501 to 506 is similar to that of submodules 401 to 406 and is not repeated here.
Through the above building devices, the problem-domain dictionary and the answer-domain dictionary for each category are formed, as shown in Tables 2 and 3 above.
The answer recommendation method and device provided by the present invention build, from a question-answer corpus, a problem-domain dictionary and an answer-domain dictionary covering each category, thereby expanding the domain expressions mapped by question-answer pairs. This effectively improves the accuracy of question-answer semantic similarity, resolves poor matching when a question and its answer describe the same topic with different wording, and improves recall. The present invention can be used for answer recommendation in online interactive Q&A communities, domain-relevance content recommendation, search-result recommendation, and the like.
The foregoing are merely preferred embodiments of the present invention and are not intended to limit it; any modification, equivalent replacement, improvement, etc. made within the spirit and principles of the present invention shall be included within its scope of protection.
Claims (22)
1. An answer recommendation method, characterized by comprising:
S1, obtaining the text content of a question and the answers corresponding to the question, wherein word segmentation yields the semantic units of the question and the semantic units of the answers;
S2, using a pre-built problem-domain dictionary, looking up the weight of the question's semantic units in each category and computing the topic weight of the question in each category;
and
using a pre-built answer-domain dictionary, looking up the weight of each answer's semantic units in each category and respectively computing the topic weight of each answer in each category;
S3, using the obtained topic weight of the question and the topic weights of the answers, respectively computing the topic similarity between each answer and the question, and recommending answers according to the result of the topic-similarity computation.
2. The method according to claim 1, characterized in that the method for building the problem-domain dictionary specifically comprises:
obtaining the text content of the questions in a question-answer corpus, wherein word segmentation yields the semantic units of the questions;
respectively calculating the weight of each of the questions' semantic units in each category;
forming the problem-domain dictionary from the semantic units and their weights in each category.
3. The method according to claim 1, characterized in that the method for building the answer-domain dictionary specifically comprises:
obtaining the text content of the answers in a question-answer corpus, wherein word segmentation yields the semantic units of the answers;
respectively calculating the weight of each of the answers' semantic units in each category;
forming the answer-domain dictionary from the semantic units and their weights in each category.
4. The method according to claim 2 or 3, characterized in that, after obtaining the semantic units of the question or of the answer, the method further comprises:
filtering out semantic units whose word frequency is below a preset word-frequency threshold;
respectively calculating the weight in each category only for the semantic units remaining after the filtering.
5. The method according to claim 2 or 3, characterized in that the weight of a semantic unit in each category is calculated from one of, or any combination of, the following:
the difference of the semantic unit's word frequency across categories, the word frequency of the semantic unit within each category, or the inverse word frequency of the semantic unit.
6. The method according to claim 5, characterized in that the weight of a semantic unit in each category is calculated as follows:
w(token_i, C_j) denotes the weight of semantic unit token_i in category C_j;
P_ij = T_ij / L_j, where L_j denotes the total number of occurrences of all semantic units contained in category C_j and T_ij denotes the number of times token_i appears in C_j, so that P_ij is the word frequency of token_i in category C_j, and n is the word-frequency influence factor;
N denotes the total number of occurrences of all semantic units in the corpus and N(token_i) denotes the number of occurrences of token_i.
7. The method according to claim 2 or 3, characterized in that, before forming the problem-domain dictionary or the answer-domain dictionary from the semantic units and their weights in each category, the method further comprises:
filtering similar weights of each semantic unit across categories: for a given semantic unit, weights whose count in the same weight interval exceeds a preset threshold are filtered out;
using only the semantic units' weights in the remaining categories to form the problem-domain dictionary or the answer-domain dictionary.
8. The method according to claim 7, characterized in that the weight intervals are set according to the magnitude of the semantic unit's weights in each category.
9. The method according to claim 2 or 3, characterized in that, before forming the problem-domain dictionary from the semantic units and their weights in each category, the method further comprises:
filtering out semantic units that are single characters, repeated digit strings, or digit strings whose length exceeds a preset length threshold;
using only the semantic units remaining after the filtering to form the problem-domain dictionary or the answer-domain dictionary.
10. The method according to claim 1, characterized in that computing the topic similarity between the answer and the question comprises:
respectively computing the topic similarity of the answer and the question under each category;
choosing the maximum computed topic similarity as the topic similarity of the answer and the question.
11. The method according to claim 10, characterized in that the topic similarity of the answer and the question is computed as:
sim(query, ans) = Max_j{weight(query, C_j) × weight(ans, C_j)}
wherein sim(query, ans) denotes the topic similarity of the answer and the question, weight(query, C_j) denotes the topic weight of the question in category C_j, and weight(ans, C_j) denotes the topic weight of the answer in category C_j.
12. An answer recommendation device, characterized by comprising:
a text acquisition module, for obtaining a question and the text content of the answers corresponding to the question, and segmenting the text to obtain the semantic units of the question and the semantic units of the answers;
a topic weight computation module, for looking up, in a pre-built question domain dictionary, the weight of each semantic unit of the question in each category, and computing the topic weight of the question in each category;
and
for looking up, in a pre-built answer domain dictionary, the weight of each semantic unit of each answer in each category, and computing the topic weight of each answer in each category;
a similarity computation module, for computing, from the topic weights of the question and of each answer obtained by the topic weight computation module, the topic similarity between each answer and the question, and recommending answers according to the computed topic similarity.
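A minimal sketch of the topic weight computation module's lookup step, assuming the domain dictionary maps each semantic unit to its per-category weights and that the per-category topic weights are aggregated by summation (the aggregation rule is an assumption; this excerpt of the claims does not fix it):

```python
def topic_weights(semantic_units, domain_dict):
    """Aggregate per-category weights of a text's semantic units.

    domain_dict maps semantic unit -> {category: weight}; units absent
    from the dictionary are skipped. Summation is one plausible way to
    turn unit weights into a per-category topic weight.
    """
    totals = {}
    for unit in semantic_units:
        for category, w in domain_dict.get(unit, {}).items():
            totals[category] = totals.get(category, 0.0) + w
    return totals
```

The same routine serves both the question side (with the question domain dictionary) and the answer side (with the answer domain dictionary).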
13. The device according to claim 12, characterized in that the question domain dictionary is built in advance by a question dictionary building module, the question dictionary building module specifically comprising:
a question acquisition submodule, for obtaining the question content of the question-answer pairs in a corpus, and segmenting it to obtain the semantic units of the questions;
a first weight computation submodule, for computing the weight of each semantic unit of the questions in each category;
a first integration submodule, for forming the question domain dictionary from each semantic unit and its weight in each category.
14. The device according to claim 12, characterized in that the answer domain dictionary is built in advance by an answer dictionary building module, the answer dictionary building module specifically comprising:
an answer acquisition submodule, for obtaining the answer content of the question-answer pairs in a corpus, and segmenting it to obtain the semantic units of the answers;
a second weight computation submodule, for computing the weight of each semantic unit of the answers in each category;
a second integration submodule, for forming the answer domain dictionary from each semantic unit and its weight in each category.
15. The device according to claim 13 or 14, characterized in that the question dictionary building module or the answer dictionary building module further comprises:
a word-frequency filtering submodule, for filtering out semantic units whose word frequency is below a preset word-frequency threshold;
the semantic units remaining after filtering are provided to the first weight computation submodule or the second weight computation submodule.
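The word-frequency filter of claim 15 reduces to a single threshold pass over the corpus counts. A sketch, assuming the counts are held in a plain dictionary:

```python
def filter_by_word_frequency(unit_counts, min_count):
    """Drop semantic units whose corpus word frequency falls below a
    preset threshold; the surviving units are what would be handed on
    to the weight computation submodule."""
    return {unit: c for unit, c in unit_counts.items() if c >= min_count}
```

Pruning rare units before weight computation keeps noisy one-off tokens out of the domain dictionaries.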
16. The device according to claim 13 or 14, characterized in that the first weight computation submodule or the second weight computation submodule computes the weight of a semantic unit in each category according to one or any combination of the following:
the difference of the semantic unit's word frequency across categories, the word frequency of the semantic unit within each category, or the inverse word-frequency rate of the semantic unit.
17. The device according to claim 16, characterized in that the first weight computation submodule or the second weight computation submodule computes the weight of a semantic unit in each category by the following method [the weight formula itself is not reproduced in this text]:
wherein w(token_i, C_j) denotes the weight of semantic unit token_i in category C_j;
P_ij = T_ij / L_j, where L_j denotes the total number of occurrences of all semantic units contained in category C_j, and T_ij denotes the number of occurrences of semantic unit token_i in category C_j;
a further term [not reproduced in this text] denotes the word frequency of semantic unit token_i in category C_j, and n is the word-frequency influence factor;
N denotes the total number of occurrences of all semantic units in the corpus, and N(token_i) denotes the number of occurrences of semantic unit token_i.
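Since the claim's weight formula is not reproduced in the text, only the quantities it defines are known. A hedged sketch of one TF-IDF-style combination built from those quantities — the category word frequency P_ij = T_ij / L_j and an inverse word-frequency term log(N / N(token_i)) — follows; the exact combination and the role of the influence factor n are assumptions:

```python
import math

def unit_weight(t_ij, l_j, big_n, n_i, n=1.0):
    """Illustrative weight of semantic unit i in category j.

    t_ij  - occurrences of the unit in category j (T_ij)
    l_j   - total occurrences of all units in category j (L_j)
    big_n - total occurrences of all units in the corpus (N)
    n_i   - occurrences of this unit in the corpus (N(token_i))
    n     - word-frequency influence factor (its exact role is assumed)

    NOTE: this particular combination is an assumption; the claim's
    actual formula is not reproduced in the source text.
    """
    p_ij = t_ij / l_j                   # category word frequency P_ij
    idf = math.log(big_n / n_i)         # inverse word-frequency term
    return (p_ij ** (1.0 / n)) * idf
```

Any formula of this shape rewards units that are frequent inside one category but rare in the corpus overall, which matches the criteria listed in claim 16.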
18. The device according to claim 13 or 14, characterized in that the question dictionary building module or the answer dictionary building module further comprises:
a weight filtering submodule, for filtering similar weights of each semantic unit across categories: for a given semantic unit, the weights whose occurrence count within the same weight interval exceeds a preset threshold are filtered out;
only the semantic unit's weights in the remaining categories are provided to the first integration submodule or the second integration submodule to form the question domain dictionary or the answer domain dictionary.
19. The device according to claim 18, characterized in that the weight intervals are arranged according to the magnitudes of the semantic unit's weights in each category.
20. The device according to claim 13 or 14, characterized in that the question dictionary building module or the answer dictionary building module further comprises:
a semantic unit filtering submodule, for filtering out semantic units that are single characters, repeated digit strings, or numeric strings whose length exceeds a preset length threshold;
only the semantic units remaining after filtering are provided to the first integration submodule or the second integration submodule to form the question domain dictionary or the answer domain dictionary.
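The semantic unit filter of claims 9 and 20 can be sketched as a small predicate pass; the regex used for "repeated digit string" is an assumption about what that phrase covers:

```python
import re

def filter_semantic_units(units, max_num_len):
    """Drop single characters, repeated digit strings, and numeric
    strings longer than a preset length threshold; only the remaining
    units go on to form the domain dictionary."""
    kept = []
    for u in units:
        if len(u) <= 1:
            continue  # single character
        if re.fullmatch(r"(\d)\1+", u):
            continue  # repeated digit string such as "1111"
        if u.isdigit() and len(u) > max_num_len:
            continue  # over-long numeric string
        kept.append(u)
    return kept
```

Such units carry little topical information, so removing them shrinks the dictionaries without hurting the category weights.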
21. The device according to claim 12, characterized in that the similarity computation module computes the topic similarity between each answer and the question under each category respectively, and takes the maximum of the computed topic similarities as the topic similarity between the answer and the question.
22. The device according to claim 21, characterized in that the similarity computation module computes the topic similarity between the answer and the question as:
sim(query, ans) = Max_j { weight(query, C_j) × weight(ans, C_j) }
wherein sim(query, ans) denotes the topic similarity between the answer and the question, weight(query, C_j) denotes the topic weight of the question in category C_j, and weight(ans, C_j) denotes the topic weight of the answer in category C_j.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201210151044.5A CN103425635B (en) | 2012-05-15 | 2012-05-15 | Method and apparatus for recommending answers |
Publications (2)
Publication Number | Publication Date |
---|---|
CN103425635A true CN103425635A (en) | 2013-12-04 |
CN103425635B CN103425635B (en) | 2018-02-02 |
Family
ID=49650400
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201210151044.5A Active CN103425635B (en) | 2012-05-15 | 2012-05-15 | Method and apparatus for recommending answers |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN103425635B (en) |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1489089A (en) * | 2002-08-19 | 2004-04-14 | 松下电器产业株式会社 | Document search system and question answer system |
CN1790332A (en) * | 2005-12-28 | 2006-06-21 | 刘文印 | Display method and system for reading and browsing problem answers |
CN1928864A (en) * | 2006-09-22 | 2007-03-14 | 浙江大学 | FAQ based Chinese natural language ask and answer method |
CN101174259A (en) * | 2007-09-17 | 2008-05-07 | 张琰亮 | Intelligent interactive request-answering system |
US20080126319A1 (en) * | 2006-08-25 | 2008-05-29 | Ohad Lisral Bukai | Automated short free-text scoring method and system |
CN101286161A (en) * | 2008-05-28 | 2008-10-15 | 华中科技大学 | Intelligent Chinese request-answering system based on concept |
US20090089876A1 (en) * | 2007-09-28 | 2009-04-02 | Jamie Lynn Finamore | Apparatus system and method for validating users based on fuzzy logic |
CN101520802A (en) * | 2009-04-13 | 2009-09-02 | 腾讯科技(深圳)有限公司 | Question-answer pair quality evaluation method and system |
Cited By (34)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103714488A (en) * | 2014-01-03 | 2014-04-09 | 无锡清华信息科学与技术国家实验室物联网技术中心 | Method for optimizing question answering platform in social network |
CN105005564B (en) * | 2014-04-17 | 2019-09-03 | 北京搜狗科技发展有限公司 | A kind of data processing method and device based on answer platform |
CN105005564A (en) * | 2014-04-17 | 2015-10-28 | 北京搜狗科技发展有限公司 | Data processing method and apparatus based on question-and-answer platform |
CN104298735B (en) * | 2014-09-30 | 2018-06-05 | 北京金山安全软件有限公司 | Method and device for identifying application program type |
CN104298735A (en) * | 2014-09-30 | 2015-01-21 | 北京金山安全软件有限公司 | Method and device for identifying application program type |
CN105786874A (en) * | 2014-12-23 | 2016-07-20 | 北京奇虎科技有限公司 | Method and device for constructing question-answer knowledge base data items based on encyclopedic entries |
CN106294505B (en) * | 2015-06-10 | 2020-07-07 | 华中师范大学 | Answer feedback method and device |
CN106294505A (en) * | 2015-06-10 | 2017-01-04 | 华中师范大学 | A kind of method and apparatus feeding back answer |
CN106610932A (en) * | 2015-10-27 | 2017-05-03 | 中兴通讯股份有限公司 | Corpus processing method and device and corpus analyzing method and device |
CN105740310B (en) * | 2015-12-21 | 2019-08-02 | 哈尔滨工业大学 | A kind of automatic answer method of abstracting and system in question answering system |
CN105740310A (en) * | 2015-12-21 | 2016-07-06 | 哈尔滨工业大学 | Automatic answer summarizing method and system for question answering system |
CN105653840A (en) * | 2015-12-21 | 2016-06-08 | 青岛中科慧康科技有限公司 | Similar case recommendation system based on word and phrase distributed representation, and corresponding method |
CN105786793A (en) * | 2015-12-23 | 2016-07-20 | 百度在线网络技术(北京)有限公司 | Method and device for analyzing semanteme of spoken language text information |
CN105786793B (en) * | 2015-12-23 | 2019-05-28 | 百度在线网络技术(北京)有限公司 | Parse the semantic method and apparatus of spoken language text information |
CN107168967B (en) * | 2016-03-07 | 2020-12-04 | 创新先进技术有限公司 | Target knowledge point acquisition method and device |
CN107168967A (en) * | 2016-03-07 | 2017-09-15 | 阿里巴巴集团控股有限公司 | The acquisition methods and device of object knowledge point |
CN106844686A (en) * | 2017-01-26 | 2017-06-13 | 武汉奇米网络科技有限公司 | Intelligent customer service question and answer robot and its implementation based on SOLR |
CN106997375B (en) * | 2017-02-28 | 2020-08-18 | 浙江大学 | Customer service reply recommendation method based on deep learning |
CN106997375A (en) * | 2017-02-28 | 2017-08-01 | 浙江大学 | Recommendation method is replied in customer service based on deep learning |
CN106997342A (en) * | 2017-03-27 | 2017-08-01 | 上海奔影网络科技有限公司 | Intension recognizing method and device based on many wheel interactions |
CN107145573A (en) * | 2017-05-05 | 2017-09-08 | 上海携程国际旅行社有限公司 | The problem of artificial intelligence customer service robot, answers method and system |
CN107329995A (en) * | 2017-06-08 | 2017-11-07 | 北京神州泰岳软件股份有限公司 | A kind of controlled answer generation method of semanteme, apparatus and system |
CN107844531A (en) * | 2017-10-17 | 2018-03-27 | 东软集团股份有限公司 | Answer output intent, device and computer equipment |
CN107844531B (en) * | 2017-10-17 | 2020-05-22 | 东软集团股份有限公司 | Answer output method and device and computer equipment |
CN108446320A (en) * | 2018-02-09 | 2018-08-24 | 北京搜狗科技发展有限公司 | A kind of data processing method, device and the device for data processing |
CN108345672A (en) * | 2018-02-09 | 2018-07-31 | 平安科技(深圳)有限公司 | Intelligent response method, electronic device and storage medium |
WO2019153607A1 (en) * | 2018-02-09 | 2019-08-15 | 平安科技(深圳)有限公司 | Intelligent response method, electronic device and storage medium |
CN109033318A (en) * | 2018-07-18 | 2018-12-18 | 北京市农林科学院 | Intelligent answer method and device |
CN109033318B (en) * | 2018-07-18 | 2020-11-27 | 北京市农林科学院 | Intelligent question and answer method and device |
CN110852094B (en) * | 2018-08-01 | 2023-11-03 | 北京京东尚科信息技术有限公司 | Method, apparatus and computer readable storage medium for searching target |
CN110852094A (en) * | 2018-08-01 | 2020-02-28 | 北京京东尚科信息技术有限公司 | Method, apparatus and computer-readable storage medium for retrieving a target |
CN109299478A (en) * | 2018-12-05 | 2019-02-01 | 长春理工大学 | Intelligent automatic question-answering method and system based on two-way shot and long term Memory Neural Networks |
CN113342950A (en) * | 2021-06-04 | 2021-09-03 | 北京信息科技大学 | Answer selection method and system based on semantic union |
CN113342950B (en) * | 2021-06-04 | 2023-04-21 | 北京信息科技大学 | Answer selection method and system based on semantic association |
Also Published As
Publication number | Publication date |
---|---|
CN103425635B (en) | 2018-02-02 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN103425635A (en) | Method and device for recommending answers | |
Waitelonis et al. | Linked data enabled generalized vector space model to improve document retrieval | |
CN103778214B (en) | A kind of item property clustering method based on user comment | |
CN107122413A (en) | A kind of keyword extracting method and device based on graph model | |
CN105528437B (en) | A kind of question answering system construction method extracted based on structured text knowledge | |
Hartawan et al. | Using vector space model in question answering system | |
CN105843897A (en) | Vertical domain-oriented intelligent question and answer system | |
CN106294744A (en) | Interest recognition methods and system | |
CN106970910A (en) | A kind of keyword extracting method and device based on graph model | |
CN111143672B (en) | Knowledge graph-based professional speciality scholars recommendation method | |
CN103886034A (en) | Method and equipment for building indexes and matching inquiry input information of user | |
CN103885937A (en) | Method for judging repetition of enterprise Chinese names on basis of core word similarity | |
Wu et al. | Using relation selection to improve value propagation in a conceptnet-based sentiment dictionary | |
CN106126619A (en) | A kind of video retrieval method based on video content and system | |
CN109992674B (en) | Recommendation method fusing automatic encoder and knowledge graph semantic information | |
CN108681574A (en) | A kind of non-true class quiz answers selection method and system based on text snippet | |
CN110362678A (en) | A kind of method and apparatus automatically extracting Chinese text keyword | |
CN107193883B (en) | Data processing method and system | |
KR20060122276A (en) | Relation extraction from documents for the automatic construction of ontologies | |
CN110633464A (en) | Semantic recognition method, device, medium and electronic equipment | |
CN103646099A (en) | Thesis recommendation method based on multilayer drawing | |
CN108804595A (en) | A kind of short text representation method based on word2vec | |
CN109522396B (en) | Knowledge processing method and system for national defense science and technology field | |
CN105630890A (en) | Neologism discovery method and system based on intelligent question-answering system session history | |
Hasanati et al. | Implementation of support vector machine with lexicon based for sentiment analysis on Twitter
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||