CN105786794A - Question-answer pair search method and community question-answer search system - Google Patents


Info

Publication number
CN105786794A
CN105786794A (application CN201610082304.6A; granted as CN105786794B)
Authority
CN
China
Prior art keywords: lexical item, question sentence, weight, word, expansion
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201610082304.6A
Other languages
Chinese (zh)
Other versions
CN105786794B (en)
Inventor
王金龙
董日壮
Current Assignee
Qingdao University of Technology
Original Assignee
Qingdao University of Technology
Priority date
Filing date
Publication date
Application filed by Qingdao University of Technology filed Critical Qingdao University of Technology
Priority to CN201610082304.6A
Publication of CN105786794A
Application granted
Publication of CN105786794B
Legal status: Active


Classifications

    • G: Physics
    • G06: Computing; Calculating or Counting
    • G06F: Electric Digital Data Processing
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90: Details of database functions independent of the retrieved data types
    • G06F16/95: Retrieval from the web
    • G06F16/951: Indexing; Web crawling techniques
    • G: Physics
    • G06: Computing; Calculating or Counting
    • G06F: Electric Digital Data Processing
    • G06F40/00: Handling natural language data
    • G06F40/20: Natural language analysis
    • G06F40/237: Lexical tools
    • G06F40/247: Thesauruses; Synonyms

Abstract

The invention discloses a question-answer pair search method comprising the following steps: extracting at least one keyword from a question sentence and obtaining, for each keyword, its expansion words and the final similarity between each expansion word and the corresponding keyword; analyzing the dependency relationship between every two grammatically related terms in the question sentence; determining, according to importance weights preset for the dependency relationships, a distance weight that reflects how tightly every two terms in the question sentence are connected; determining the degree of association between every two terms in the question sentence according to the distance weights; determining a term weight for each term in the question sentence according to the degrees of association, and retrieving question-answer pairs relevant to the question sentence according to those term weights; computing a term weight for each expansion word according to its final similarity with the corresponding keyword, and retrieving question-answer pairs relevant to the question sentence according to those term weights; and sorting and displaying all retrieved question-answer pairs according to a preset rule. The invention also discloses a community question-answer search system.

Description

Question-answer pair search method and community question-answer search system
Technical field
The present invention relates to the field of information retrieval, and in particular to a question-answer pair search method and a community question-answer search system.
Background art
In recent years, community question-answering systems have become a popular and practical class of Internet applications. Unlike traditional question-answering systems, a community question-answering system lets users ask and answer questions of any kind in any field, evaluate and vote on other users' answers, and even search directly for similar questions in the historical question-answer repository the system has accumulated, greatly enriching and satisfying users' information needs.
When a user wants to use a community question-answering system to retrieve questions identical or similar to the one the user is posing, together with their answers, the user's input is a question sentence expressed in natural language. Its complex structure and long-winded phrasing make it difficult to extract the important keyword terms, and because the core keywords of the question sentence cannot be identified accurately, the retrieval results are not accurate enough.
Summary of the invention
In view of this, the main purpose of the embodiments of the present invention is to provide a question-answer pair search method and a community question-answer search system that improve the accuracy of question-answer pair retrieval.
To achieve the above object, an embodiment of the present invention provides a question-answer pair search method, including:
extracting at least one keyword from a question sentence, and obtaining the expansion words of each keyword and the final similarity between each expansion word and the corresponding keyword;
analyzing the dependency relationship between every two grammatically related terms in the question sentence;
determining, according to importance weights preset for the dependency relationships, a distance weight reflecting the tightness between every two terms in the question sentence;
determining the degree of association between every two terms in the question sentence according to the distance weights;
determining the term weight of each term in the question sentence according to the degrees of association, and retrieving question-answer pairs relevant to the question sentence according to those term weights;
computing the term weight of each expansion word according to its final similarity with the corresponding keyword, and retrieving question-answer pairs relevant to the question sentence according to those term weights;
and sorting and displaying all retrieved question-answer pairs according to a preset rule.
Optionally, obtaining the expansion words of each keyword and the final similarity between each expansion word and the corresponding keyword includes:
using HowNet to obtain at least one expansion word for each keyword, and defining the initial similarity between each such expansion word and the corresponding keyword as 1;
using a Chinese synonym thesaurus to obtain at least one expansion word for each keyword, and defining the initial similarity between each such expansion word and the corresponding keyword as 1;
using a trained text deep-representation model word2vec to obtain at least one expansion word for each keyword together with the initial similarity between each expansion word and the corresponding keyword;
merging identical expansion words, and computing for each merged expansion word the final similarity S_R with the corresponding keyword, where S_R = S_sum / 3 and S_sum is the sum of all initial similarities corresponding to that expansion word.
Optionally, determining, according to importance weights preset for the dependency relationships, a distance weight reflecting the tightness between every two terms in the question sentence includes:
computing the distance weight D between a first term and each second term, where the first term is any term in the question sentence and a second term is a term that has a dependency relationship with the first term;
where D = α / y, y is the importance weight preset for the dependency relationship between the first term and the second term, and α is a reference value;
and computing the distance weight Dis between the first term and each third term, where a third term is any term in the question sentence other than the first term, and Dis is the sum of the distance weights D corresponding to the dependency relationships on the path between the first term and the third term.
Optionally, determining the degree of association between every two terms in the question sentence according to the distance weights includes:
computing the degree of association w_rel(i,j) between term t_i and term t_j in the question sentence according to the following formula:
w_rel(i,j) = λ·Dep(t_i, t_j) + (1-λ)·Close_pmi(t_i, t_j);
where Dep(t_i, t_j) = 1 / b^(Dis(t_i, t_j)) and Close_pmi(t_i, t_j) = log( p(t_i, t_j) / (p(t_i)·p(t_j)) );
t_i denotes the i-th term and t_j the j-th term in the question sentence, i = 1, 2, ..., n, j = 1, 2, ..., n, and n is the total number of terms in the question sentence;
λ is an adjustment factor;
b is a constant greater than 1;
Dis(t_i, t_j) is the distance weight between term t_i and term t_j;
p(t_i, t_j) = N_d(t_i, t_j) / N_D is the probability that t_i and t_j occur together in the question set, where N_d(t_i, t_j) is the number of questions in the set in which t_i and t_j occur simultaneously and N_D is the total number of questions in the set;
p(t_i) = N_d(t_i) / N_D and p(t_j) = N_d(t_j) / N_D are the probabilities that t_i and t_j each occur in the question set, where N_d(t_i) and N_d(t_j) are the numbers of questions in the set containing t_i and t_j respectively.
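The association formula above can be sketched in Python as follows. The parameter values λ = 0.5 and b = 2 used as defaults are illustrative choices only, since the text leaves both open:

```python
import math

def dep_score(dis, b=2.0):
    # Dep(t_i, t_j) = 1 / b**Dis(t_i, t_j): a smaller distance weight
    # (a tighter dependency) yields a larger dependency score
    return 1.0 / (b ** dis)

def pmi_score(n_ij, n_i, n_j, n_d):
    # Close_pmi(t_i, t_j) = log( p(t_i,t_j) / (p(t_i) * p(t_j)) ), with the
    # probabilities estimated from question-set counts as in the text:
    # p(t_i,t_j) = N_d(t_i,t_j)/N_D, p(t_i) = N_d(t_i)/N_D
    return math.log((n_ij / n_d) / ((n_i / n_d) * (n_j / n_d)))

def w_rel(dis, n_ij, n_i, n_j, n_d, lam=0.5, b=2.0):
    # w_rel(i,j) = lambda * Dep + (1 - lambda) * Close_pmi
    return lam * dep_score(dis, b) + (1 - lam) * pmi_score(n_ij, n_i, n_j, n_d)
```

For instance, with Dis(t_i, t_j) = 1 and counts N_d(t_i, t_j) = 10, N_d(t_i) = 20, N_d(t_j) = 25, N_D = 100, the sketch gives Dep = 0.5 and Close_pmi = log 2.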
Optionally, determining the term weight of each term in the question sentence according to the degrees of association includes:
computing the weight matrix W_q* formed by the final weights of the terms in the question sentence according to the following formula:
W_q* = (1-α)(I - αE)^(-1) W_q^0;
where α is a given constant;
E is the random matrix obtained after a transformation of the association matrix M, M being the symmetric matrix formed by the degrees of association between every two terms in the question sentence;
and W_q^0 is the weight matrix formed by the original weights of the terms in the question sentence.
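The closed form above can be sketched with NumPy. Reading the transformation of M as row normalization into a stochastic ("random") matrix is our assumption, and α = 0.5 is an illustrative value:

```python
import numpy as np

def final_term_weights(M, w0, alpha=0.5):
    # Turn the symmetric association matrix M into a row-stochastic
    # matrix E; this normalization is an assumed reading of the
    # transformation the text applies to M
    E = M / M.sum(axis=1, keepdims=True)
    n = M.shape[0]
    # W_q* = (1 - alpha) * (I - alpha*E)^(-1) * W_q^0
    return (1 - alpha) * np.linalg.inv(np.eye(n) - alpha * E) @ w0
```

The result satisfies the fixed-point form W* = (1-α)·W0 + α·E·W*, so each term's final weight mixes its original weight with the association-weighted final weights of the other terms.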
Optionally, computing the term weight of each expansion word according to its final similarity with the corresponding keyword includes:
obtaining the original weight of the keyword corresponding to the expansion word;
and taking the product of that original weight and the final similarity between the expansion word and the corresponding keyword as the term weight of the expansion word.
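A minimal sketch of this expansion-word weighting (the dictionary layout used here is a hypothetical shape, not prescribed by the text):

```python
def expansion_term_weights(keyword_weights, expansions):
    # expansions maps each keyword to (expansion_word, final_similarity)
    # pairs; an expansion word's term weight is the original weight of its
    # keyword multiplied by their final similarity
    weights = {}
    for keyword, pairs in expansions.items():
        for word, similarity in pairs:
            weights[word] = keyword_weights[keyword] * similarity
    return weights
```

For example, if the keyword "screen" has original weight 0.6 and "display" has final similarity 0.5 with it, the term weight of "display" is 0.6 × 0.5 = 0.3.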
An embodiment of the present invention further provides a community question-answer search system, including:
a keyword extraction unit for extracting at least one keyword from a question sentence;
a keyword expansion unit for obtaining the expansion words of each keyword obtained by the keyword extraction unit and the final similarity between each expansion word and the corresponding keyword;
a relation analysis unit for analyzing the dependency relationship between every two grammatically related terms in the question sentence;
a weight determination unit for determining, according to importance weights preset for the dependency relationships obtained by the relation analysis unit, a distance weight reflecting the tightness between every two terms in the question sentence;
a degree-of-association determination unit for determining the degree of association between every two terms in the question sentence according to the distance weights determined by the weight determination unit;
a first weight determination unit for determining the term weight of each term in the question sentence according to the degrees of association determined by the degree-of-association determination unit;
a first retrieval unit for retrieving question-answer pairs relevant to the question sentence according to the term weights of the terms in the question sentence determined by the first weight determination unit;
a second weight determination unit for computing the term weight of each expansion word obtained by the keyword expansion unit according to its final similarity with the corresponding keyword;
a second retrieval unit for retrieving question-answer pairs relevant to the question sentence according to the term weights of the expansion words determined by the second weight determination unit;
and a retrieval result display unit for sorting and displaying, according to a preset rule, all question-answer pairs retrieved by the first retrieval unit and the second retrieval unit.
Optionally, the keyword expansion unit includes:
a HowNet expansion module for using HowNet to obtain at least one expansion word for each keyword, and defining the initial similarity between each such expansion word and the corresponding keyword as 1;
a thesaurus expansion module for using a Chinese synonym thesaurus to obtain at least one expansion word for each keyword, and defining the initial similarity between each such expansion word and the corresponding keyword as 1;
a model expansion module for using a trained text deep-representation model word2vec to obtain at least one expansion word for each keyword together with the initial similarity between each expansion word and the corresponding keyword;
and a similarity computation module for merging identical expansion words obtained by the HowNet expansion module, the thesaurus expansion module and the model expansion module, and computing for each merged expansion word the final similarity S_R with the corresponding keyword, where S_R = S_sum / 3 and S_sum is the sum of all initial similarities corresponding to that expansion word.
Optionally, the weight determination unit includes:
a first weight computation module for computing the distance weight D between a first term and each second term, where the first term is any term in the question sentence and a second term is a term that has a dependency relationship with the first term;
where D = α / y, y is the importance weight preset for the dependency relationship between the first term and the second term, and α is a reference value;
and a second weight computation module for computing the distance weight Dis between the first term and each third term, where a third term is any term in the question sentence other than the first term, and Dis is the sum of the distance weights D computed by the first weight computation module for the dependency relationships on the path between the first term and the third term.
Optionally, the degree-of-association determination unit is specifically configured to compute the degree of association w_rel(i,j) between term t_i and term t_j in the question sentence according to the following formula:
w_rel(i,j) = λ·Dep(t_i, t_j) + (1-λ)·Close_pmi(t_i, t_j);
where Dep(t_i, t_j) = 1 / b^(Dis(t_i, t_j)) and Close_pmi(t_i, t_j) = log( p(t_i, t_j) / (p(t_i)·p(t_j)) );
t_i denotes the i-th term and t_j the j-th term in the question sentence, i = 1, 2, ..., n, j = 1, 2, ..., n, and n is the total number of terms in the question sentence;
λ is an adjustment factor;
b is a constant greater than 1;
Dis(t_i, t_j) is the distance weight between term t_i and term t_j;
p(t_i, t_j) = N_d(t_i, t_j) / N_D is the probability that t_i and t_j occur together in the question set, where N_d(t_i, t_j) is the number of questions in the set in which t_i and t_j occur simultaneously and N_D is the total number of questions in the set;
p(t_i) = N_d(t_i) / N_D and p(t_j) = N_d(t_j) / N_D are the probabilities that t_i and t_j each occur in the question set, where N_d(t_i) and N_d(t_j) are the numbers of questions in the set containing t_i and t_j respectively.
Optionally, the first weight determination unit is specifically configured to compute the weight matrix W_q* formed by the final weights of the terms in the question sentence according to the following formula:
W_q* = (1-α)(I - αE)^(-1) W_q^0;
where α is a given constant;
E is the random matrix obtained after a transformation of the association matrix M, M being the symmetric matrix formed by the degrees of association between every two terms in the question sentence;
and W_q^0 is the weight matrix formed by the original weights of the terms in the question sentence.
Optionally, the second weight determination unit includes:
an original weight acquisition module for obtaining the original weight of the keyword corresponding to the expansion word;
and a second weight determination module for taking the product of the original weight obtained by the original weight acquisition module and the final similarity between the expansion word and the corresponding keyword as the term weight of the expansion word.
With the question-answer pair search method and community question-answer search system provided by the embodiments of the present invention, the importance weights set for the different dependency relationships in a question sentence determine how tightly its terms are associated, and from the resulting degrees of association the term weight of each term can be determined. The term weights obtained by fusing the importance weights reveal the important terms of the question sentence, so that question-answer retrieval results more relevant to the question sentence are obtained. This overcomes the shortcoming of existing community question-answer search systems, which do not account for the complex structure and long-winded phrasing of question sentences and therefore cannot find their important terms, and thus improves the accuracy of the retrieval results.
Brief description of the drawings
To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the accompanying drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below show some embodiments of the present invention, and those of ordinary skill in the art may derive other drawings from them without creative effort.
Fig. 1 is a schematic flowchart of a question-answer pair search method according to an embodiment of the present invention;
Fig. 2 is a block diagram of a method for obtaining expansion words and similarities according to an embodiment of the present invention;
Fig. 3 is a block diagram of question-sentence retrieval according to an embodiment of the present invention;
Fig. 4 is a schematic diagram of dependency relationships according to an embodiment of the present invention;
Fig. 5 is a schematic diagram of the composition of a community question-answer search system according to an embodiment of the present invention.
Detailed description of the invention
To make the objectives, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described below clearly and completely with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
When a user poses a problem in the form of a question sentence and inputs it into the community question-answer search system, the system retrieves questions that are semantically identical or similar to the question sentence together with their answers, that is, it retrieves question-answer pairs, and displays them sorted by their degree of similarity to the question sentence. The search method for question-answer pairs is described in detail below.
Referring to Fig. 1, a schematic flowchart of the question-answer pair search method provided by an embodiment of the present invention, the method includes:
Step 101: extract at least one keyword from the question sentence, and obtain the expansion words of each keyword and the final similarity between each expansion word and the keyword.
See Fig. 2 for a block diagram of the method for obtaining expansion words and similarities.
First, the keywords in the question sentence are extracted according to part of speech and a stop-word list. Specifically, the stop words belonging to the stop-word list are removed from the question sentence, and then the terms of specific parts of speech are extracted from the remaining terms. For example, if the preset parts of speech are nouns and adjectives (other parts of speech are of course possible), the noun terms and adjective terms are extracted from the terms left after stop-word removal and used as keywords. The purpose of extracting keywords is to limit their number and thereby prevent the expansion from propagating through too many terms and hurting retrieval efficiency.
Then, each keyword undergoes conventional expansion and word2vec (text deep-representation model) expansion. Conventional expansion consists of expanding each keyword with HowNet to obtain one expansion term set, and expanding each keyword with a Chinese synonym thesaurus to obtain another expansion term set. Word2vec expansion means using a trained word2vec model to expand each keyword, obtaining an expansion term set together with the initial similarity between each expansion word and the corresponding keyword. Since the conventional expansion methods cannot produce a similarity, the initial similarity between an expansion word obtained by a conventional method and its corresponding keyword is defined as 1.
Finally, the expansion words and initial similarities obtained by the three expansion methods are merged to obtain the final expansion term and similarity set. Specifically, a given expansion word may be produced by all three expansion methods, by two of them, or by only one. Identical expansion words are merged into a single expansion word; for the merged expansion word, the corresponding three, two or one initial similarities are added up, and the sum is divided by 3 to give the final similarity between the merged expansion word and the corresponding keyword.
In summary, step 101 obtains the expansion words of each keyword and the final similarity between each expansion word and the keyword as follows:
use HowNet to obtain at least one expansion word for each keyword, defining the initial similarity between each such expansion word and the corresponding keyword as 1; use a Chinese synonym thesaurus to obtain at least one expansion word for each keyword, defining the initial similarity between each such expansion word and the corresponding keyword as 1; use a trained text deep-representation model word2vec to obtain at least one expansion word for each keyword together with the initial similarity between each expansion word and the corresponding keyword; and merge identical expansion words, computing for each merged expansion word the final similarity S_R with the corresponding keyword, where S_R = S_sum / 3 and S_sum is the sum of all initial similarities corresponding to that expansion word.
To make the concrete implementation of step 101 easier to understand, an example follows:
Assume the keywords extracted from the question sentence include the word "screen".
For the keyword "screen", HowNet, the Chinese synonym thesaurus and the trained word2vec model are each used to perform synonym expansion, yielding each method's expansion terms and similarities.
Assume the expansion words obtained by expanding "screen" with HowNet are:

Expansion word    Defined similarity
screen            1
display screen    1
display           1
Assume the expansion words obtained by expanding "screen" with the Chinese synonym thesaurus are:

Expansion word    Defined similarity
screen            1
display screen    1
Assume the expansion words and initial similarities obtained by expanding "screen" with the trained word2vec model are:

Expansion word    Initial similarity
screen            0.788869
display screen    0.775589
display           0.654054
touch screen      0.649287
Merging the expansion word "screen", its final similarity with the keyword "screen" is (1 + 1 + 0.788869)/3;
merging the expansion word "display screen", its final similarity with "screen" is (1 + 1 + 0.775589)/3;
merging the expansion word "display", its final similarity with "screen" is (1 + 0.654054)/3;
merging the expansion word "touch screen", its final similarity with "screen" is 0.649287/3.
For the other keywords in the question sentence, the expansion words and the final similarities between expansion words and keywords are obtained in the same manner.
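The three-way merge and the S_R = S_sum / 3 computation above can be sketched as follows, a minimal version that assumes the word2vec similarities are supplied as a dictionary:

```python
def merge_expansions(hownet_words, cilin_words, w2v_sims):
    # hownet_words / cilin_words: expansion words whose initial similarity
    # is defined as 1; w2v_sims: {expansion_word: initial_similarity}.
    # Identical expansion words are merged and S_R = S_sum / 3.
    sums = {}
    for word in hownet_words:
        sums[word] = sums.get(word, 0.0) + 1.0
    for word in cilin_words:
        sums[word] = sums.get(word, 0.0) + 1.0
    for word, sim in w2v_sims.items():
        sums[word] = sums.get(word, 0.0) + sim
    return {word: s / 3.0 for word, s in sums.items()}
```

Running it on the "screen" example reproduces the final similarities listed above.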
The prior art mainly uses HowNet expansion and thesaurus expansion, but the expansion words obtained in this way often do not match the semantics of the keywords in the question sentence. The embodiment of the present invention fuses word2vec synonym expansion into the existing synonym expansion methods, solving the problem of expansion words that do not match the sense of the corresponding keyword.
Step 102: analyze the dependency relationship between every two grammatically related terms in the question sentence.
Steps 102 to 107 are explained below with reference to the question-sentence retrieval block diagram shown in Fig. 3.
For ease of understanding step 102, consider an example. Assume the question sentence is "how can an iphone5s recover the mobile phone through lost mode and other means" (a Chinese sentence in the original). The result of the grammatical relation analysis of this question sentence is:
See the dependency diagram in Fig. 4. The question sentence consists of the terms "iphone5s", "how", "through", "lost", "mode", "etc.", "way", "recover" and "mobile phone". The term "recover", pointed to by the root, is the starting point of the syntactic analysis, and the dependency relationships between the terms are as follows:
1. the dependency between "recover" and "iphone5s" is the nominal subject relation (nsubj);
2. the dependency between "recover" and "how" is the adverbial modifier relation (advmod);
3. the dependency between "recover" and "lost" is the prepositional modifier relation (prep);
4. the dependency between "recover" and "mobile phone" is the direct object relation (dobj);
5. the dependency between "lost" and "through" is the case relation (case);
6. the dependency between "lost" and "mode" is the direct object relation (dobj);
7. the dependency between "mode" and "etc." is the etc relation (etc);
8. the dependency between "mode" and "way" is the direct object relation (dobj).
Step 103: determine, according to the importance weights preset for the dependency relationships, a distance weight reflecting the tightness between every two terms in the question sentence.
In the embodiment of the present invention, step 103 is implemented as follows:
Before the steps of the embodiment are performed, the community question-answer search system presets a different importance weight for each kind of dependency relationship according to its significance. For example, the subject relation and the object relation can be given larger weights, because these two relations occupy the major part of sentence structure and composition; the importance weight of the subject relation can be set to 5 and that of the object relation to 4, and importance weights are likewise set for the other dependency relationships. The importance weight of a dependency relationship without particular significance is set to 1.
After the dependency relationships of the question sentence are obtained by the dependency syntactic analysis in step 102, the different distance weights of the different dependency relationships in the question sentence are computed according to the importance weights set for them. The distance weight reflects the tightness of the dependency between two terms: the smaller the distance weight, the tighter the connection; conversely, the larger the distance weight, the looser the connection.
For example, if the importance weight of the subject relation "subj" is 5, the distance weight of the two terms it relates is D = α/5; if the importance weight of the direct object relation "dobj" is 4, the distance weight of the two terms it relates is D = α/4. Here α is a reference value that can be obtained by parameter tuning. The different distance weights corresponding to the different dependency relationships can be obtained in this way.
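A minimal sketch of the D = α/y computation. The relation-to-importance table below holds only the two example settings from the text; the full table is a design choice the patent leaves open:

```python
# Importance weights per dependency relation; only the example settings
# from the text are filled in, all other relations default to 1
RELATION_IMPORTANCE = {"nsubj": 5, "dobj": 4}

def distance_weight(relation, alpha=1.0, default_importance=1):
    # D = alpha / y: a more important dependency (larger y) gives a
    # smaller distance weight, i.e. a tighter connection between terms
    y = RELATION_IMPORTANCE.get(relation, default_importance)
    return alpha / y
```

With α = 1, the subject relation gives D = 0.2 and the direct object relation gives D = 0.25, so the subject pair is the tighter one.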
Next, ignoring which term is the head and which the dependent, a weighted undirected graph G = (V, E, W) can be built from the distance weights D between every two terms linked by a dependency relation; in this way, any two terms in the question are connected.
Here, the n terms v_i of the question form the vertex set V = {v_i | i = 1, …, n};
the edges connecting every two terms v_i and v_j that have a concrete dependency relation form the edge set E = {<v_i, v_j> | i ≠ j, 1 ≤ i ≤ n, 1 ≤ j ≤ n};
the distance weights D corresponding to every two terms v_i and v_j that have a concrete dependency relation form the weight set W = {D(v_i, v_j) | v_i ∈ V, v_j ∈ V, <v_i, v_j> ∈ E}.
Based on this weighted undirected graph, the distance weight Dis between every two terms in the question can be computed, as illustrated below:
For example, for the question "how can an iphone5s be recovered through lost mode or similar means", each term (e.g., "iphone5s") needs a distance weight Dis to each of the other 8 terms. Since the importance weights give the distance weight D between a term (called the first term) and each term that has a dependency relation with it (called a second term), the distance weight Dis between the first term and every other term of the question (called a third term) can then be computed from these D values.
For example, suppose the first term is "recover". It has a dependency relation with each of the second terms "iphone5s", "how", "lost" and "phone". By the formula D = α/y, the four distance weights D between "recover" and these four second terms can be computed; for instance, when computing the distance weight between "recover" and "iphone5s", y is the importance weight of the nominal-subject relation between "recover" and "iphone5s". The other three distance weights D are computed in the same way.
Based on these four distance weights D, the distance weights Dis between "recover" and the other 8 terms are as follows:
1. The distance weight Dis between "recover" and each of "iphone5s", "how", "lost" and "phone" is simply the distance weight D of the corresponding dependency relation. The distance weight Dis between "recover" and "through" is A + B, where A is the distance weight D between "recover" and "lost" and B is the distance weight D between "lost" and "through".
2. The distance weight Dis between "recover" and "mode" is A + C, where A is the distance weight D between "recover" and "lost" and C is the distance weight D between "lost" and "mode".
3. The distance weight Dis between "recover" and "etc." is A + C + E, where A is the distance weight D between "recover" and "lost", C is the distance weight D between "lost" and "mode", and E is the distance weight D between "mode" and "etc.".
4. The distance weight Dis between "recover" and "means" is A + C + F, where A is the distance weight D between "recover" and "lost", C is the distance weight D between "lost" and "mode", and F is the distance weight D between "mode" and "means".
When the first term is any other term (e.g., "how"), the distance weight Dis between it and each of the other terms is computed in the same way, which is not repeated here.
In summary, step 103 is performed concretely as follows:
First, the distance weight D between the first term and each second term is computed, where the first term is any term in the question and a second term is a term that has a dependency relation with the first term; here D = α/y, where y is the importance weight set in advance for the dependency relation between the first term and the second term, and α is a reference value. Then, the distance weight Dis between the first term and each third term is computed, where a third term is any term in the question other than the first term, and Dis is the sum of the distance weights D corresponding to the dependency relations on the path between the first term and the third term.
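The path-summing step above can be sketched as a shortest-path computation over the weighted undirected dependency graph. The edge weights below, standing in for A, B, C, E and F from the worked example, are hypothetical values, not from the disclosure.

```python
import heapq
from collections import defaultdict

def dis_from(source, edges):
    """Dijkstra over the weighted undirected dependency graph:
    Dis(first, third) is the sum of the D weights along the path."""
    graph = defaultdict(list)
    for (u, v), d in edges.items():
        graph[u].append((v, d))
        graph[v].append((u, d))
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        du, u = heapq.heappop(heap)
        if du > dist.get(u, float("inf")):
            continue
        for v, d in graph[u]:
            nd = du + d
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

# Hypothetical D values for the edges of the worked example above.
edges = {
    ("recover", "lost"): 0.2,    # A
    ("lost", "through"): 0.25,   # B
    ("lost", "mode"): 0.25,      # C
    ("mode", "etc."): 1.0,       # E
    ("mode", "means"): 0.5,      # F
}
dis = dis_from("recover", edges)
print(dis["through"])  # A + B = 0.45
print(dis["etc."])     # A + C + E = 1.45
```

Since a dependency parse is a tree, the path between any two terms is unique, so any shortest-path routine recovers exactly the sums enumerated in items 1 through 4 above.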
Step 104: determine the degree of association between each pair of terms in the question from the distance weights.
From the distance weights Dis computed in step 103 for each pair of terms, an n × n association matrix M can be built. Since the direction of the dependencies is ignored, M is a symmetric matrix; the element M_ij in row i and column j of M represents the degree of association between terms t_i and t_j of the question.
The degree of association w_rel(i,j) between terms t_i and t_j of the question is computed by the formula:
w_rel(i,j) = λ·Dep(t_i, t_j) + (1 − λ)·Close_pmi(t_i, t_j);
where Dep(t_i, t_j) = 1 / b^Dis(t_i, t_j) and Close_pmi(t_i, t_j) = log [ p(t_i, t_j) / (p(t_i)·p(t_j)) ];
t_i denotes the i-th term of the question and t_j the j-th term, i = 1, 2, …, n, j = 1, 2, …, n, and n is the total number of terms in the question;
λ is an adjustment factor;
b is a constant greater than 1;
Dis(t_i, t_j) is the distance weight between terms t_i and t_j;
p(t_i, t_j) = N_d(t_i, t_j) / N_D is the probability that terms t_i and t_j occur together in the question set, where N_d(t_i, t_j) is the number of questions in which t_i and t_j occur simultaneously and N_D is the total number of questions in the question set;
p(t_i) = N_d(t_i) / N_D and p(t_j) = N_d(t_j) / N_D are the probabilities that t_i and t_j each occur in the question set, where N_d(t_i) is the number of questions containing t_i and N_d(t_j) is the number of questions containing t_j.
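Under the definitions above, the degree of association can be sketched as follows. The corpus counts and the parameter choices λ = 0.5 and b = 2 are illustrative assumptions, not values fixed by the disclosure.

```python
import math

def w_rel(dis_ij, n_ij, n_i, n_j, n_total, lam=0.5, b=2.0):
    """w_rel = lam * Dep + (1 - lam) * Close_pmi, with
    Dep = 1 / b**Dis and Close_pmi = log p(ti,tj) / (p(ti) p(tj))."""
    dep = 1.0 / (b ** dis_ij)
    p_ij = n_ij / n_total
    p_i = n_i / n_total
    p_j = n_j / n_total
    close_pmi = math.log(p_ij / (p_i * p_j))
    return lam * dep + (1 - lam) * close_pmi

# Toy counts: a question set of 1000 questions; ti appears in 100,
# tj in 50, and both co-occur in 20; Dis(ti, tj) = 0.45.
print(round(w_rel(0.45, 20, 100, 50, 1000), 3))  # 1.059
```

The first term rewards pairs that are close in the dependency graph; the second is pointwise mutual information, rewarding pairs that co-occur in the question set more often than chance.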
Step 105: determine the term weight of each term in the question from the degrees of association, and retrieve the question-answer pairs relevant to the question according to the term weight of each term.
In an embodiment of the present invention, step 105 determines the term weight of each term as follows, namely the weight matrix W_q* composed of the final weights of the terms in the question is computed by the formula:
W_q* = (1 − α)(I − αE)^(−1) W_q^0;
where α is a given constant; E is the stochastic matrix obtained by transforming the association matrix M, the association matrix M being the symmetric matrix formed by the degrees of association between each pair of terms in the question; and W_q^0 is the weight matrix composed of the original weights of the terms in the question.
The original weights W_q^0 are determined by the retrieval model of the community question-answering retrieval system, which is the vector space model (VSM), the probabilistic model Okapi BM25, the language model (LM), or another model; the retrieval model includes but is not limited to the above models.
It should be noted that the stochastic matrix E is obtained by transforming the association matrix M so as to guarantee that the system of equations corresponding to M always has an analytic solution, i.e. E = D^(−1)M, where D is the diagonal matrix of the row sums of M. Every element of E lies in [0, 1), and each row of E sums to 1. Therefore E necessarily has an eigenvalue equal to 1, and the eigenvector corresponding to this eigenvalue is the solution vector of the system of equations corresponding to E, so that system always has an analytic solution. After the association matrix M is converted into the stochastic matrix E, the term weights W_q* to be computed could be obtained by analytically solving the system of equations for E; but the weights obtained in that way bear no relation to the original term weights W_q^0, which are produced by the original retrieval model (e.g., VSM, Okapi BM25 or LM). Using W_q^0 together with the association matrix M, more accurate term weights W_q* can be computed.
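A minimal numerical sketch of this weighting step follows. The 3 × 3 association matrix and the original weights are invented toy values, and α = 0.15 is an assumed constant; I is written as `np.eye(n)`.

```python
import numpy as np

def final_weights(M, w0, alpha=0.15):
    """W* = (1 - alpha) * (I - alpha*E)^-1 * W0, with E the
    row-normalized (row-stochastic) form of the association matrix M."""
    M = np.asarray(M, dtype=float)
    E = M / M.sum(axis=1, keepdims=True)  # E = D^-1 M, each row sums to 1
    n = M.shape[0]
    return (1 - alpha) * np.linalg.solve(np.eye(n) - alpha * E, w0)

# Toy 3-term question: symmetric association matrix and VSM-style weights.
M = [[1.0, 0.6, 0.2],
     [0.6, 1.0, 0.4],
     [0.2, 0.4, 1.0]]
w0 = np.array([0.5, 0.3, 0.2])
print(final_weights(M, w0))
```

Two sanity checks: with α = 0 the formula returns W_q^0 unchanged, and because each row of E sums to 1, a constant original-weight vector is left unchanged for any α, so only the differences induced by the association structure move the weights.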
Step 106: compute the term weight of each expansion word from its final similarity with the corresponding keyword, and retrieve the question-answer pairs relevant to the question according to the term weights of the expansion words.
In an embodiment of the present invention, for each expansion word, the original weight of the keyword corresponding to that expansion word is first determined by the retrieval model of the community question-answering retrieval system; the expansion-term weight is then computed as: expansion-term weight = original weight of the keyword term × similarity between the expansion term and the keyword term. The retrieval model then uses the expansion-term weights to obtain a retrieval result, which includes not only the question-answer pairs but also a score for each question-answer pair.
Here, the retrieval model is the vector space model (VSM), the probabilistic model Okapi BM25, the language model (LM), or another model; the retrieval model includes but is not limited to the above models.
In summary, step 106 computes the term weight of an expansion word concretely as follows:
the original weight of the keyword corresponding to the expansion word is obtained; the product of this original weight and the final similarity between the expansion word and the keyword is taken as the term weight of the expansion word.
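This product can be sketched in one line; the keyword weight 2.4 (as if from BM25) and the merged similarity 0.67 below are hypothetical values.

```python
def expansion_weight(keyword_original_weight: float,
                     final_similarity: float) -> float:
    """Term weight of an expansion word = original weight of its
    keyword (from VSM / Okapi BM25 / LM) * final similarity S_R."""
    return keyword_original_weight * final_similarity

# Hypothetical values: the retrieval model gives the keyword weight 2.4
# and the expansion word's merged similarity S_R is 0.67.
print(expansion_weight(2.4, 0.67))
```

Because S_R ≤ 1, an expansion word can never outweigh its source keyword; the closer the synonym, the closer its weight gets to the keyword's own.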
Step 107: sort and display, according to a preset rule, all the question-answer pairs retrieved in steps 105 and 106.
While obtaining the question-answer pairs, steps 105 and 106 can further obtain a score for each question-answer pair, and the embodiment of the present invention can sort the retrieved question-answer pairs by these scores for display, i.e. question-answer pairs with high scores are displayed first and question-answer pairs with low scores are displayed later.
It should be noted that steps 101 through 106 may be performed in any order before step 107, provided step 101 precedes step 106; for example, step 106 may be moved to directly after step 101, or step 101 may be moved to directly before step 106; the present invention is not limited in this respect.
In the question-answer pair retrieval method provided by the embodiment of the present invention, importance weights are set for the different dependency relations in the question, from which the closeness of association between the terms of the question can be determined; the term weight of each term can then be determined from the degrees of association. The term weights obtained by fusing in the importance weights reveal the important terms of the question, so that a retrieval result of question-answer pairs more relevant to the question is obtained. This overcomes the shortcoming of existing community question-answering retrieval systems, which do not take into account that questions may be structurally complex and long-winded and therefore cannot find the important terms of a question, and thereby improves the accuracy of the retrieval result.
Referring to Fig. 5, a schematic diagram of the composition of the community question-answering retrieval system provided by the embodiment of the present invention, the system includes:
a keyword extraction unit 501 for extracting at least one keyword from a question;
a keyword expansion unit 502 for obtaining the expansion words of each keyword obtained by the keyword extraction unit 501 and the final similarity of each expansion word with its corresponding keyword;
a relation analysis unit 503 for analyzing the dependency relation between each pair of grammatically associated terms in the question;
a weight determination unit 504 for determining, according to the importance weights set in advance for the dependency relations obtained by the relation analysis unit 503, distance weights reflecting the closeness between each pair of terms in the question;
an association determination unit 505 for determining the degree of association between each pair of terms in the question from the distance weights determined by the weight determination unit 504;
a first weight determination unit 506 for determining the term weight of each term in the question from the degrees of association determined by the association determination unit 505;
a first retrieval unit 507 for retrieving the question-answer pairs relevant to the question according to the term weights of the terms in the question determined by the first weight determination unit 506;
a second weight determination unit 508 for computing the term weight of each expansion word from the expansion words obtained by the keyword expansion unit 502 and the final similarity of each expansion word with its corresponding keyword;
a second retrieval unit 509 for retrieving the question-answer pairs relevant to the question according to the term weights of the expansion words determined by the second weight determination unit 508;
a retrieval result display unit 510 for sorting and displaying, according to a preset rule, all the question-answer pairs retrieved by the first retrieval unit 507 and the second retrieval unit 509.
In an embodiment of the present invention, the keyword expansion unit 502 includes:
a HowNet expansion module for obtaining at least one expansion word of each keyword from HowNet, and defining the initial similarity of each such expansion word with its corresponding keyword as 1;
a thesaurus expansion module for obtaining at least one expansion word of each keyword from the Chinese synonym thesaurus (Tongyici Cilin), and defining the initial similarity of each such expansion word with its corresponding keyword as 1;
a model expansion module for obtaining, using a trained text deep-representation model word2vec, at least one expansion word of each keyword together with the initial similarity of each expansion word with its corresponding keyword;
a similarity computation module for merging the identical expansion words obtained by the HowNet expansion module, the thesaurus expansion module and the model expansion module, and computing the final similarity S_R of each merged expansion word with its corresponding keyword as S_R = S_sum / 3, where S_sum is the sum of all initial similarities corresponding to the expansion word.
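The three-source merge can be sketched as follows; the expansion words and word2vec scores for the keyword "recover" are invented examples, not from the disclosure.

```python
from collections import defaultdict

def merge_expansions(hownet, cilin, w2v):
    """Merge expansion words from the three sources. Initial similarity
    is 1 for HowNet/Cilin entries and the model score for word2vec;
    final similarity S_R = S_sum / 3."""
    s_sum = defaultdict(float)
    for word in hownet:
        s_sum[word] += 1.0
    for word in cilin:
        s_sum[word] += 1.0
    for word, sim in w2v.items():
        s_sum[word] += sim
    return {word: s / 3 for word, s in s_sum.items()}

# Hypothetical expansions for the keyword "recover".
final = merge_expansions(
    hownet={"retrieve"},
    cilin={"retrieve", "regain"},
    w2v={"retrieve": 0.7, "find back": 0.4},
)
print(round(final["retrieve"], 2))  # (1 + 1 + 0.7) / 3 = 0.9
print(round(final["regain"], 2))    # 1 / 3 = 0.33
```

An expansion word confirmed by all three sources thus approaches S_R = 1, while a word proposed by a single source is discounted by the division by 3.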
In an embodiment of the present invention, the weight determination unit 504 includes:
a first weight computation module for computing the distance weight D between the first term and each second term, the first term being any term in the question and a second term being a term that has a dependency relation with the first term;
where D = α/y, y being the importance weight set in advance for the dependency relation between the first term and the second term, and α being a reference value;
a second weight computation module for computing the distance weight Dis between the first term and each third term, a third term being any term in the question other than the first term, and Dis being the sum of the distance weights D, computed by the first weight computation module, that correspond to the dependency relations on the path between the first term and the third term.
In an embodiment of the present invention, the association determination unit 505 is specifically configured to compute the degree of association w_rel(i,j) between terms t_i and t_j of the question by the formula:
w_rel(i,j) = λ·Dep(t_i, t_j) + (1 − λ)·Close_pmi(t_i, t_j);
where Dep(t_i, t_j) = 1 / b^Dis(t_i, t_j) and Close_pmi(t_i, t_j) = log [ p(t_i, t_j) / (p(t_i)·p(t_j)) ];
t_i denotes the i-th term of the question and t_j the j-th term, i = 1, 2, …, n, j = 1, 2, …, n, and n is the total number of terms in the question;
λ is an adjustment factor;
b is a constant greater than 1;
Dis(t_i, t_j) is the distance weight between terms t_i and t_j;
p(t_i, t_j) = N_d(t_i, t_j) / N_D is the probability that terms t_i and t_j occur together in the question set, where N_d(t_i, t_j) is the number of questions in which t_i and t_j occur simultaneously and N_D is the total number of questions in the question set;
p(t_i) = N_d(t_i) / N_D and p(t_j) = N_d(t_j) / N_D are the probabilities that t_i and t_j each occur in the question set, where N_d(t_i) is the number of questions containing t_i and N_d(t_j) is the number of questions containing t_j.
In an embodiment of the present invention, the first weight determination unit 506 is specifically configured to compute the weight matrix W_q* composed of the final weights of the terms in the question by the formula:
W_q* = (1 − α)(I − αE)^(−1) W_q^0;
where α is a given constant;
E is the stochastic matrix obtained by transforming the association matrix M, the association matrix M being the symmetric matrix formed by the degrees of association between each pair of terms in the question;
and W_q^0 is the weight matrix composed of the original weights of the terms in the question.
In an embodiment of the present invention, the second weight determination unit 508 includes:
an original weight acquisition module for obtaining the original weight of the keyword corresponding to the expansion word;
a second weight determination module for taking the product of the original weight obtained by the original weight acquisition module and the final similarity between the expansion word and its corresponding keyword as the term weight of the expansion word.
In the community question-answering retrieval system provided by the embodiment of the present invention, importance weights are set for the different dependency relations in the question, from which the closeness of association between the terms of the question can be determined; the term weight of each term can then be determined from the degrees of association. The term weights obtained by fusing in the importance weights reveal the important terms of the question, so that a retrieval result of question-answer pairs more relevant to the question is obtained. This overcomes the shortcoming of existing community question-answering retrieval systems, which do not take into account that questions may be structurally complex and long-winded and therefore cannot find the important terms of a question, and thereby improves the accuracy of the retrieval result.
From the above description of the embodiments, those skilled in the art can clearly understand that all or part of the steps of the methods in the above embodiments may be implemented by software plus a necessary general-purpose hardware platform. Based on this understanding, the technical solution of the present invention, in essence or in the part contributing over the prior art, may be embodied in the form of a software product. This computer software product can be stored on a storage medium such as ROM/RAM, a magnetic disk or an optical disc, and includes instructions for causing a computer device (which may be a personal computer, a server, or a network communication device such as a media gateway) to perform the methods described in the embodiments, or in parts of the embodiments, of the present invention.
It should be noted that, for the system disclosed in the embodiments, since it corresponds to the method disclosed in the embodiments, its description is relatively brief; for the relevant parts, refer to the description of the method.
It should further be noted that, herein, relational terms such as first and second are used only to distinguish one entity or operation from another, and do not necessarily require or imply any such actual relation or order between these entities or operations. Moreover, the terms "include", "comprise" and any variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article or device including a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article or device. In the absence of further limitation, an element qualified by the phrase "including a ..." does not exclude the presence of other identical elements in the process, method, article or device that includes the element.
The above description of the disclosed embodiments enables those skilled in the art to implement or use the present invention. Various modifications to these embodiments will be apparent to those skilled in the art, and the generic principles defined herein may be implemented in other embodiments without departing from the spirit or scope of the present invention. Therefore, the present invention is not to be limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (12)

1. A question-answer pair retrieval method, characterized by comprising:
extracting at least one keyword from a question, and obtaining the expansion words of each keyword and the final similarity of each expansion word with its corresponding keyword;
analyzing the dependency relation between each pair of grammatically associated terms in the question;
determining, according to importance weights set in advance for the dependency relations, distance weights reflecting the closeness between each pair of terms in the question;
determining the degree of association between each pair of terms in the question from the distance weights;
determining the term weight of each term in the question from the degrees of association, and retrieving the question-answer pairs relevant to the question according to the term weight of each term in the question;
computing the term weight of each expansion word from its final similarity with the corresponding keyword, and retrieving the question-answer pairs relevant to the question according to the term weights of the expansion words;
sorting and displaying all the retrieved question-answer pairs according to a preset rule.
2. The method according to claim 1, characterized in that obtaining the expansion words of each keyword and the final similarity of each expansion word with its corresponding keyword comprises:
obtaining at least one expansion word of each keyword from HowNet, and defining the initial similarity of each such expansion word with its corresponding keyword as 1;
obtaining at least one expansion word of each keyword from the Chinese synonym thesaurus (Tongyici Cilin), and defining the initial similarity of each such expansion word with its corresponding keyword as 1;
obtaining, using a trained text deep-representation model word2vec, at least one expansion word of each keyword together with the initial similarity of each expansion word with its corresponding keyword;
merging the identical expansion words obtained, and computing the final similarity S_R of each merged expansion word with its corresponding keyword as S_R = S_sum / 3, where S_sum is the sum of all initial similarities corresponding to the expansion word.
3. The method according to claim 1, characterized in that determining, according to the importance weights set in advance for the dependency relations, the distance weights reflecting the closeness between each pair of terms in the question comprises:
computing the distance weight D between the first term and each second term, the first term being any term in the question and a second term being a term that has a dependency relation with the first term;
where D = α/y, y being the importance weight set in advance for the dependency relation between the first term and the second term, and α being a reference value;
computing the distance weight Dis between the first term and each third term, a third term being any term in the question other than the first term, and Dis being the sum of the distance weights D corresponding to the dependency relations on the path between the first term and the third term.
4. The method according to claim 3, characterized in that determining the degree of association between each pair of terms in the question from the distance weights comprises:
computing the degree of association w_rel(i,j) between terms t_i and t_j of the question by the formula:
w_rel(i,j) = λ·Dep(t_i, t_j) + (1 − λ)·Close_pmi(t_i, t_j);
where Dep(t_i, t_j) = 1 / b^Dis(t_i, t_j) and Close_pmi(t_i, t_j) = log [ p(t_i, t_j) / (p(t_i)·p(t_j)) ];
t_i denotes the i-th term of the question and t_j the j-th term, i = 1, 2, …, n, j = 1, 2, …, n, and n is the total number of terms in the question;
λ is an adjustment factor;
b is a constant greater than 1;
Dis(t_i, t_j) is the distance weight between terms t_i and t_j;
p(t_i, t_j) = N_d(t_i, t_j) / N_D is the probability that terms t_i and t_j occur together in the question set, where N_d(t_i, t_j) is the number of questions in which t_i and t_j occur simultaneously and N_D is the total number of questions in the question set;
p(t_i) = N_d(t_i) / N_D and p(t_j) = N_d(t_j) / N_D are the probabilities that t_i and t_j each occur in the question set, where N_d(t_i) is the number of questions containing t_i and N_d(t_j) is the number of questions containing t_j.
5. The method according to any one of claims 1 to 4, characterized in that determining the term weight of each term in the question from the degrees of association comprises:
computing the weight matrix W_q* composed of the final weights of the terms in the question by the formula:
W_q* = (1 − α)(I − αE)^(−1) W_q^0;
where α is a given constant;
E is the stochastic matrix obtained by transforming the association matrix M, the association matrix M being the symmetric matrix formed by the degrees of association between each pair of terms in the question;
and W_q^0 is the weight matrix composed of the original weights of the terms in the question.
6. The method according to claim 1, characterized in that computing the term weight of each expansion word from its final similarity with the corresponding keyword comprises:
obtaining the original weight of the keyword corresponding to the expansion word;
taking the product of the original weight and the final similarity between the expansion word and its corresponding keyword as the term weight of the expansion word.
7. A community question-answering retrieval system, characterized by comprising:
a keyword extraction unit for extracting at least one keyword from a question;
a keyword expansion unit for obtaining the expansion words of each keyword obtained by the keyword extraction unit and the final similarity of each expansion word with its corresponding keyword;
a relation analysis unit for analyzing the dependency relation between each pair of grammatically associated terms in the question;
a weight determination unit for determining, according to the importance weights set in advance for the dependency relations obtained by the relation analysis unit, distance weights reflecting the closeness between each pair of terms in the question;
an association determination unit for determining the degree of association between each pair of terms in the question from the distance weights determined by the weight determination unit;
a first weight determination unit for determining the term weight of each term in the question from the degrees of association determined by the association determination unit;
a first retrieval unit for retrieving the question-answer pairs relevant to the question according to the term weights of the terms in the question determined by the first weight determination unit;
a second weight determination unit for computing the term weight of each expansion word from the expansion words obtained by the keyword expansion unit and the final similarity of each expansion word with its corresponding keyword;
a second retrieval unit for retrieving the question-answer pairs relevant to the question according to the term weights of the expansion words determined by the second weight determination unit;
a retrieval result display unit for sorting and displaying, according to a preset rule, all the question-answer pairs retrieved by the first retrieval unit and the second retrieval unit.
8. The system according to claim 7, characterized in that the keyword expansion unit comprises:
a HowNet expansion module, configured to obtain at least one expansion word for each keyword using HowNet, and to define the initial similarity between each expansion word and its corresponding keyword as 1;
a thesaurus expansion module, configured to obtain at least one expansion word for each keyword using the Chinese synonym thesaurus (Tongyici Cilin), and to define the initial similarity between each expansion word and its corresponding keyword as 1;
a model expansion module, configured to obtain, using a trained text deep-representation model word2vec, at least one expansion word for each keyword and the initial similarity between each expansion word and its corresponding keyword; and
a similarity calculation module, configured to merge the identical expansion words obtained by the HowNet expansion module, the thesaurus expansion module and the model expansion module, and to calculate the final similarity S_R between each merged expansion word and its corresponding keyword, where S_R = S_sum / 3 and S_sum is the sum of all initial similarities corresponding to the expansion word.
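The merging rule in claim 8 (initial similarity fixed at 1 for HowNet and thesaurus candidates, a model-derived similarity from word2vec, and final similarity S_R = S_sum / 3 after merging identical expansion words) can be sketched as follows; the function name and input shapes are illustrative, not from the patent.

```python
def merge_expansions(hownet, cilin, w2v):
    """Merge expansion words from three sources (claim 8).

    hownet, cilin : iterables of expansion words; each occurrence contributes
                    an initial similarity of 1.
    w2v           : dict mapping expansion word -> initial similarity from
                    the trained word2vec model.
    Returns a dict mapping each merged expansion word to its final
    similarity S_R = S_sum / 3.
    """
    s_sum = {}
    for w in hownet:
        s_sum[w] = s_sum.get(w, 0.0) + 1.0
    for w in cilin:
        s_sum[w] = s_sum.get(w, 0.0) + 1.0
    for w, sim in w2v.items():
        s_sum[w] = s_sum.get(w, 0.0) + sim
    return {w: s / 3.0 for w, s in s_sum.items()}
```

An expansion word proposed by all three sources with a word2vec similarity of 0.9 would receive S_R = (1 + 1 + 0.9) / 3 ≈ 0.97, while a word found only in the thesaurus receives 1/3.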
9. The system according to claim 7, characterized in that the weight determination unit comprises:
a first weight calculation module, configured to calculate the distance weight D between a first term and each second term, where the first term is any term in the question sentence and a second term is a term having a dependency relation with the first term;
wherein Y is the importance weight preset for the dependency relation between the first term and the second term, and α is a reference value;
a second weight calculation module, configured to calculate the distance weight Dis between the first term and each third term, where a third term is any term in the question sentence other than the first term, and Dis is the sum of the distance weights D, calculated by the first weight calculation module, corresponding to the at least one dependency relation existing between the first term and the third term.
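The summation performed by the second weight calculation module in claim 9 can be sketched as follows. The per-relation formula for D (a function of the preset importance weight Y and the reference value α) is not reproduced in this text, so it is passed in as a caller-supplied function; all names here are illustrative.

```python
def dis(first_term, third_term, dep_relations, d_from_y):
    """Distance weight Dis between two terms (claim 9).

    dep_relations : list of (term_a, term_b, Y) tuples, one per dependency
                    relation in the question sentence, with Y the importance
                    weight preset for that relation type.
    d_from_y      : function computing the per-relation distance weight D
                    from Y; the exact formula is not given in this text,
                    so it is injected by the caller.
    Dis is the sum of D over all dependency relations linking the two terms.
    """
    return sum(d_from_y(y)
               for a, b, y in dep_relations
               if {a, b} == {first_term, third_term})
```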
10. The system according to claim 9, characterized in that the relevance determination unit is specifically configured to calculate the relevance w_rel(i,j) between term t_i and term t_j in the question sentence according to the following formula:
w_rel(i,j) = λ·Dep(t_i, t_j) + (1 − λ)·Close_pmi(t_i, t_j);
where Dep(t_i, t_j) = 1 / b^Dis(t_i, t_j) and Close_pmi(t_i, t_j) = log( p(t_i, t_j) / (p(t_i)·p(t_j)) );
t_i denotes the i-th term and t_j the j-th term in the question sentence,
i = 1, 2, …, n, j = 1, 2, …, n, where n is the total number of terms in the question sentence;
λ is an adjustment factor;
b is a constant greater than 1;
Dis(t_i, t_j) is the distance weight between term t_i and term t_j;
p(t_i, t_j) = N_d(t_i, t_j) / N_D is the probability that terms t_i and t_j co-occur in the question-sentence set, where N_d(t_i, t_j) is the number of question sentences in which t_i and t_j occur together;
p(t_i) = N_d(t_i) / N_D and p(t_j) = N_d(t_j) / N_D are the probabilities that t_i and t_j each occur in the question-sentence set, where N_d(t_i) and N_d(t_j) are the numbers of question sentences containing t_i and t_j respectively, and N_D is the total number of question sentences in the set.
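The relevance formula of claim 10 can be sketched directly from the counts it defines; the parameter values below (λ, b) are illustrative, and all names are chosen here for readability.

```python
import math

def relevance(dis_ij, n_d_ij, n_d_i, n_d_j, n_total, lam=0.5, b=2.0):
    """w_rel(i,j) = lam * Dep + (1 - lam) * Close_pmi (claim 10).

    dis_ij  : distance weight Dis(t_i, t_j)
    n_d_ij  : number of question sentences containing both terms
    n_d_i,
    n_d_j   : number of question sentences containing each term alone
    n_total : total number of question sentences in the set (N_D)
    lam, b  : adjustment factor and constant b > 1
    """
    dep = 1.0 / (b ** dis_ij)                 # Dep(t_i,t_j) = 1 / b^Dis(t_i,t_j)
    p_ij = n_d_ij / n_total
    p_i, p_j = n_d_i / n_total, n_d_j / n_total
    close = math.log(p_ij / (p_i * p_j))      # pointwise mutual information
    return lam * dep + (1 - lam) * close
```

Dep decays exponentially with the syntactic distance between the two terms, while Close_pmi rewards pairs that co-occur more often than their independent frequencies would predict.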
11. The system according to any one of claims 7 to 10, characterized in that the first weight determination unit is specifically configured to calculate, according to the following formula, the weight matrix W_q* composed of the final weights of the terms in the question sentence:
W_q* = (1 − α)(1 − αE)^(−1)·W_q0;
where α is a given constant;
E is the stochastic matrix obtained by transforming the incidence matrix M, the incidence matrix M being the symmetric matrix formed by the relevance between every two terms in the question sentence; and
W_q0 is the weight matrix composed of the original weights of the terms in the question sentence.
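A sketch of the final-weight computation in claim 11 using NumPy. The claim does not spell out how the stochastic matrix E is derived from the incidence matrix M, so row normalization is used below purely as a stand-in, and the scalar 1 in (1 − αE)^(−1) is read as the identity matrix; both choices are assumptions of this sketch.

```python
import numpy as np

def final_weights(M, w0, alpha=0.15):
    """W_q* = (1 - alpha) * (I - alpha * E)^(-1) * W_q0 (claim 11).

    M     : symmetric relevance matrix between every two terms.
    w0    : vector of original term weights (W_q0).
    alpha : the given constant; 0.15 is an illustrative default.
    """
    # Stand-in for the patent's transformation of M into a stochastic
    # matrix E: normalize each row to sum to 1 (zero rows stay zero).
    row_sums = M.sum(axis=1, keepdims=True)
    E = np.divide(M, row_sums, out=np.zeros_like(M), where=row_sums != 0)
    n = M.shape[0]
    # Solve (I - alpha*E) x = w0 instead of forming an explicit inverse.
    return (1 - alpha) * np.linalg.solve(np.eye(n) - alpha * E, w0)
```

Solving the linear system rather than inverting the matrix is the usual numerically stable way to evaluate an expression of the form (I − αE)^(−1)·w.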
12. The system according to claim 7, characterized in that the second weight determination unit comprises:
an original weight acquisition module, configured to obtain the original weight of the keyword corresponding to the expansion word; and
a second weight determination module, configured to take the product of the original weight obtained by the original weight acquisition module and the final similarity between the expansion word and its corresponding keyword as the term weight of the expansion word.
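The second weight determination module of claim 12 reduces to a single product; a trivial sketch, with illustrative names:

```python
def expansion_term_weight(keyword_original_weight, final_similarity):
    """Term weight of an expansion word (claim 12): the product of the
    corresponding keyword's original weight and the expansion word's
    final similarity with that keyword."""
    return keyword_original_weight * final_similarity
```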
CN201610082304.6A 2016-02-05 2016-02-05 Question-answer pair search method and community question-answer search system Active CN105786794B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610082304.6A CN105786794B (en) 2016-02-05 2016-02-05 Question-answer pair search method and community question-answer search system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610082304.6A CN105786794B (en) 2016-02-05 2016-02-05 Question-answer pair search method and community question-answer search system

Publications (2)

Publication Number Publication Date
CN105786794A true CN105786794A (en) 2016-07-20
CN105786794B CN105786794B (en) 2018-09-04

Family

ID=56403313

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610082304.6A Active CN105786794B (en) Question-answer pair search method and community question-answer search system

Country Status (1)

Country Link
CN (1) CN105786794B (en)


Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2007099812A1 (en) * 2006-03-01 2007-09-07 Nec Corporation Question answering device, question answering method, and question answering program
US8543565B2 (en) * 2007-09-07 2013-09-24 At&T Intellectual Property Ii, L.P. System and method using a discriminative learning approach for question answering
CN103440287A (en) * 2013-08-14 2013-12-11 广东工业大学 Web question-answering retrieval system based on product information structuring
CN104573028A (en) * 2015-01-14 2015-04-29 百度在线网络技术(北京)有限公司 Intelligent question-answer implementing method and system
CN104899188A (en) * 2015-03-11 2015-09-09 浙江大学 Problem similarity calculation method based on subjects and focuses of problems
WO2016027714A1 (en) * 2014-08-21 2016-02-25 国立研究開発法人情報通信研究機構 Question sentence generation device and computer program


Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106294656A (en) * 2016-08-04 2017-01-04 武汉大学 A kind of map locating keyword is to the method for relevant issues
CN106294656B (en) * 2016-08-04 2019-03-19 武汉大学 A kind of method of map locating keyword to relevant issues
WO2019052261A1 (en) * 2017-09-18 2019-03-21 京东方科技集团股份有限公司 Method for question answering service, question answering system and storage medium
US11651236B2 (en) 2017-09-18 2023-05-16 Boe Technology Group Co., Ltd. Method for question-and-answer service, question-and-answer service system and storage medium
CN108932326A (en) * 2018-06-29 2018-12-04 北京百度网讯科技有限公司 A kind of example extended method, device, equipment and medium

Also Published As

Publication number Publication date
CN105786794B (en) 2018-09-04

Similar Documents

Publication Publication Date Title
CN107993724B (en) Medical intelligent question and answer data processing method and device
Mitra et al. A dual embedding space model for document ranking
Singh et al. Relevance feedback based query expansion model using Borda count and semantic similarity approach
Zheng et al. Interactive natural language question answering over knowledge graphs
US9710547B2 (en) Natural language semantic search system and method using weighted global semantic representations
CN103377226B (en) A kind of intelligent search method and system thereof
CN102253982B (en) Query suggestion method based on query semantics and click-through data
US20160224622A1 (en) Method for detecting the similarity of the patent documents on the basis of new kernel function luke kernel
CN110020189A (en) A kind of article recommended method based on Chinese Similarity measures
CN101980196A (en) Article comparison method and device
Li et al. Context-based diversification for keyword queries over XML data
CN104331449A (en) Method and device for determining similarity between inquiry sentence and webpage, terminal and server
US20190266286A1 (en) Method and system for a semantic search engine using an underlying knowledge base
Freitas et al. Treo: combining entity-search, spreading activation and semantic relatedness for querying linked data
CN109885813A (en) A kind of operation method, system, server and the storage medium of the text similarity based on word coverage
Líška et al. Similarity search for mathematics: Masaryk university team at the ntcir-10 math task
CN105786794A (en) Question-answer pair search method and community question-answer search system
CN102799586B (en) A kind of escape degree defining method for search results ranking and device
Li et al. Comparison of current semantic similarity methods in wordnet
Breja et al. Analyzing linguistic features for answer re-ranking of why-questions
US20190012388A1 (en) Method and system for a semantic search engine using an underlying knowledge base
Juan An effective similarity measurement for FAQ question answering system
Gupta et al. Document summarisation based on sentence ranking using vector space model
CN106021346A (en) A retrieval processing method and device
CN110929501A (en) Text analysis method and device

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant