CN105786794B

CN105786794B - A kind of question and answer are to search method and community's question and answer searching system

Info

Publication number: CN105786794B
Application number: CN201610082304.6A
Authority: CN
Inventors: 王金龙; 董日壮
Original assignee: Qingdao University of Technology
Current assignee: Qingdao University of Technology
Priority date: 2016-02-05
Filing date: 2016-02-05
Publication date: 2018-09-04
Anticipated expiration: 2036-02-05
Also published as: CN105786794A

Abstract

The invention discloses a kind of question and answer to search method, including：At least one keyword is extracted from question sentence, and obtains the expansion word of each keyword and the final similarity of each expansion word and corresponding keyword；Analyze the dependence between each two lexical item with grammar association in question sentence；According to being in advance the importance weight of dependence setting, the distance weighting of tightness degree between each two lexical item in reflection question sentence is determined；The degree of association in question sentence between each two lexical item is determined according to distance weighting；The lexical item weight of each lexical item in question sentence is determined according to the degree of association, and according to the lexical item weight retrieval of each lexical item in question sentence and the relevant question and answer pair of question sentence；According to the lexical item weight of expansion word and the final similarity calculation expansion word of corresponding keyword, and according to the retrieval of the lexical item weight of expansion word and the relevant question and answer pair of question sentence；By all question and answer retrieved to being ranked up display according to preset rules.The invention also discloses a kind of community's question and answer searching systems.

Description

A kind of question and answer are to search method and community's question and answer searching system

Technical field

The present invention relates to technical field of information retrieval more particularly to a kind of question and answer to search method and community's question and answer retrieval system System.

Background technology

In recent years, community's question answering system has been increasingly becoming a kind of very popular and practical the Internet, applications, is asked with tradition It answers unlike system, in community's question answering system, user not only can put question to and answer any field, any kind of ask Topic, but also can make the answer of other users evaluation and ballot, or even can also directly search that system accumulated go through Similar Problems in history problem answers library, greatly enrich and meet the information requirement of user.

When user wants to utilize the retrieval of community's question answering system and oneself proposition problem same or similar problem and its answer When, due to it is input by user be the question sentence using natural language description, complicated structure and interminable clause so that from question sentence The middle important key word item of extraction can be relatively difficult.Because can not accurately obtain the kernel keyword in question sentence, lead to retrieval result not It is enough accurate.

Invention content

Search method and community's question and answer are examined in view of this, the main purpose of the embodiment of the present invention is to provide a kind of question and answer Cable system, to realize the purpose for improving question and answer to the accuracy of retrieval result.

To achieve the above object, an embodiment of the present invention provides a kind of question and answer to search method, including：

At least one keyword is extracted from question sentence, and obtain each keyword expansion word and each expansion word with it is right Answer the final similarity of keyword；

Analyze the dependence between each two lexical item with grammar association in the question sentence；

According to the importance weight for being in advance dependence setting, determine reflect in the question sentence each two lexical item it Between tightness degree distance weighting；

The degree of association in the question sentence between each two lexical item is determined according to the distance weighting；

The lexical item weight of each lexical item in the question sentence is determined according to the degree of association, and according to each word in the question sentence The retrieval of lexical item weight and the relevant question and answer pair of the question sentence of item；

According to the lexical item weight of the expansion word and expansion word described in the final similarity calculation of corresponding keyword, and according to The retrieval of lexical item weight and the relevant question and answer pair of the question sentence of the expansion word；

By all question and answer retrieved to being ranked up display according to preset rules.

Optionally, the expansion word for obtaining each keyword and each expansion word are final similar with corresponding keyword Degree, including：

Obtain at least one expansion word of each keyword respectively using Hownet HowNet, and define each expansion word with it is right It is 1 to answer the initial similarity of keyword；

Obtain at least one expansion word of each keyword respectively using Chinese thesaurus, and define each expansion word with it is right It is 1 to answer the initial similarity of keyword；

Using the text depth representing model word2vec after trained, at least one expansion of each keyword is obtained respectively Open up the initial similarity of word and each expansion word and corresponding keyword；

Merge the identical expansion word got, calculates separately the most last phase of each expansion word and corresponding keyword after merging Like degree S_R, wherein S_R=S_sum/ 3, S_sumFor the sum of corresponding all initial similarities of the expansion word.

Optionally, the basis is the importance weight of dependence setting in advance, determines and reflects in the question sentence The distance weighting of tightness degree between each two lexical item, including：

The distance between the first lexical item and each second lexical item weight D are calculated separately, first lexical item is the question sentence In any one lexical item, second lexical item be there are the lexical items of the dependence with first lexical item；

Wherein,Y be that dependence between first lexical item and second lexical item is arranged in advance Importance weight is worth on the basis of α；

The distance between first lexical item and each third lexical item weight Dis are calculated separately, the third lexical item is institute Any one lexical item in addition to first word in question sentence is stated, Dis is deposited between first lexical item and the third lexical item The sum of the corresponding at least one distance weighting D of at least one dependence.

Optionally, the degree of association determined according to the distance weighting in the question sentence between each two lexical item, including：

Lexical item t in the question sentence is calculated according to the following equation_iWith lexical item t_jBetween degree of association w_rel(i,j)：

w_rel(i,j)=λ Dep (t_i,t_j)+(1-λ)Close_pmi(t_i,t_j)；

t_iIndicate i-th of lexical item in the question sentence, t_jIndicate j-th of lexical item in the question sentence, i=1,2 ... n, j =1,2 ... n, n are the lexical item sum in the question sentence；

λ is regulatory factor；

B is a constant more than 1；

Dis(t_i,t_j) it is lexical item t_iWith lexical item t_jThe distance between weight；

Lexical item t is concentrated for question sentence_iWith lexical item t_jThe probability occurred jointly, N_d(t_i,t_j) it is question sentence Concentrate lexical item t_iWith lexical item t_jThe number of the question sentence occurred simultaneously, N_DThe question sentence sum concentrated for question sentence；

WithLexical item t is indicated respectively_iWith lexical item t_jEach leisure question sentence concentrates appearance Probability, N_d(t_i) it is that question sentence is concentrated comprising lexical item t_iQuestion sentence sum, N_d(t_j) it is that question sentence is concentrated comprising lexical item t_jQuestion sentence it is total Number, N_DThe question sentence sum concentrated for question sentence.

Optionally, the lexical item weight that each lexical item in the question sentence is determined according to the degree of association, including：

The weight matrix of each lexical item final weight composition in the question sentence is calculated according to the following equation

Wherein, α is given constant；

E is that incidence matrix M is carried out the random matrix after orthogonal transformation, and the incidence matrix M is every two in the question sentence The symmetrical matrix that the degree of association between a lexical item is formed；

For the weight matrix of the original weight composition of each lexical item in the question sentence.

Optionally, the lexical item according to the expansion word and expansion word described in the final similarity calculation of corresponding keyword Weight, including：

Obtain the original weight of the corresponding keyword of the expansion word；

By the product of the original weight and the expansion word and the final similarity of corresponding keyword, as the extension The lexical item weight of word.

The embodiment of the present invention additionally provides a kind of community's question and answer searching system, including：

Keyword extracting unit, for extracting at least one keyword from question sentence；

Keyword expansion unit, expansion word for obtaining each keyword that the keyword extracting unit obtains and The final similarity of each expansion word and corresponding keyword；

Relationship analysis unit, for analyzing the interdependent pass in the question sentence between each two lexical item with grammar association System；

Weights determination unit, for according to the weight for being in advance the dependence setting that the relationship analysis unit is analyzed Weight is spent, determines the distance weighting for reflecting tightness degree between each two lexical item in the question sentence；

Degree of association determination unit, the distance weighting for being determined according to the weights determination unit determine every in the question sentence The degree of association between two lexical items；

First weight determining unit, the degree of association for being determined according to the degree of association determination unit determine in the question sentence The lexical item weight of each lexical item；

First retrieval unit, the lexical item power of each lexical item in the question sentence for being determined according to first weight determining unit Retrieval and the relevant question and answer pair of the question sentence again；

Second weight determining unit, the expansion word for being obtained according to the keyword expansion unit extensions and corresponding key The lexical item weight of expansion word described in the final similarity calculation of word；

Second retrieval unit, the lexical item weight retrieval of the expansion word for being determined according to second weight determining unit and The relevant question and answer pair of question sentence；

Retrieval result display unit, it is all for retrieving first retrieval unit and second retrieval unit Question and answer according to preset rules to being ranked up display.

Optionally, the keyword expansion unit, including：

Hownet expansion module, at least one expansion word for obtaining each keyword respectively using Hownet HowNet, and It is 1 that each expansion word, which is defined, with the initial similarity of corresponding keyword；

Word woods expansion module, at least one expansion word for obtaining each keyword respectively using Chinese thesaurus, and It is 1 that each expansion word, which is defined, with the initial similarity of corresponding keyword；

Model extension module obtains each respectively for text depth representing model word2vec of the utilization after trained The initial similarity of at least one expansion word and each expansion word and corresponding keyword of keyword；

Similarity calculation module expands for merging the Hownet expansion module, institute's predicate woods expansion module and the model The identical expansion word that exhibition module is got, calculates separately the final similarity of each expansion word and corresponding keyword after merging S_R, wherein S_R=S_sum/ 3, S_sumFor the sum of corresponding all initial similarities of the expansion word.

Optionally, the weights determination unit, including：

First weight computation module, for calculating separately the distance between the first lexical item and each second lexical item weight D, institute It is any one lexical item in the question sentence to state the first lexical item, and second lexical item is that there are described interdependent with first lexical item The lexical item of relationship；

Wherein,Y be in advance between first lexical item and second lexical item dependence setting Importance weight, be worth on the basis of α；

Second weight computation module, for calculating separately the distance between first lexical item and each third lexical item weight Dis, the third lexical item are any one lexical item in addition to first word in the question sentence, and Dis is first lexical item Corresponding at least one the first weight computation module meter of existing at least one dependence between the third lexical item The sum of obtained distance weighting D.

Optionally, the degree of association determination unit, specifically for calculating lexical item t in the question sentence according to the following equation_iWith Lexical item t_jBetween degree of association w_rel(i,j)：

w_rel(i,j)=λ Dep (t_i,t_j)+(1-λ)Close_pmi(t_i,t_j)；

λ is regulatory factor；

B is a constant more than 1；

Optionally, first weight determining unit, specifically for calculating each lexical item in the question sentence according to the following equation The weight matrix of final weight composition

Wherein, α is given constant；

Optionally, second weight determining unit, including：

Original Weight Acquisition module, the original weight for obtaining the corresponding keyword of the expansion word；

Second weight determination module, the original weight for obtaining the original Weight Acquisition module and the expansion word With the product of the final similarity of corresponding keyword, the lexical item weight as the expansion word.

Question and answer provided in an embodiment of the present invention are to search method and community's question and answer searching system, by being different in question sentence The importance weight of dependence setting, it may be determined that the association tight ness rating in question sentence between lexical item, it can be further according to the degree of association The lexical item weight for determining each lexical item in question sentence is can be found that by merging the lexical item weight that importance weight obtains in question sentence Important lexical item, to obtain solving existing community's question and answer searching system to retrieval result with the more relevant question and answer of question sentence Not the shortcomings that not accounting for that question sentence is complicated and clause is tediously long and can not finding question sentence important lexical item, and then improve retrieval result The purpose of accuracy.

Description of the drawings

In order to more clearly explain the embodiment of the invention or the technical proposal in the existing technology, to embodiment or will show below There is attached drawing needed in technology description to be briefly described, it should be apparent that, the accompanying drawings in the following description is the present invention Some embodiments for those of ordinary skill in the art without creative efforts, can also basis These attached drawings obtain other attached drawings.

Fig. 1 is flow diagram of the question and answer of the embodiment of the present invention to search method；

Fig. 2 is expansion word of the embodiment of the present invention and the acquisition methods block diagram of similarity；

Fig. 3 is question sentence retrieving block diagram of the embodiment of the present invention；

Fig. 4 is dependence schematic diagram of the embodiment of the present invention；

Fig. 5 is the composition schematic diagram of community of embodiment of the present invention question and answer searching system.

Specific implementation mode

In order to make the object, technical scheme and advantages of the embodiment of the invention clearer, below in conjunction with the embodiment of the present invention In attached drawing, technical scheme in the embodiment of the invention is clearly and completely described, it is clear that described embodiment is A part of the embodiment of the present invention, instead of all the embodiments.Based on the embodiments of the present invention, those of ordinary skill in the art The every other embodiment obtained without making creative work, shall fall within the protection scope of the present invention.

When the problem that user is proposed in the form of question sentence, by the way that the question sentence is input to community's question and answer searching system In, community's question and answer searching system can retrieve the problem same or similar with the question semanteme, while retrieve and the question sentence The semantic same or similar corresponding answer of problem, that is, retrieve question and answer pair, and according to the degree of approximation with question sentence, by question and answer into Row sequencing display.Lower mask body introduces the search method of question and answer pair.

It is flow diagram of the question and answer provided in an embodiment of the present invention to search method referring to Fig. 1, this method includes：

Step 101：At least one keyword is extracted from question sentence, and obtains the expansion word of the keyword and each expansion Open up the final similarity of word and the keyword.

The acquisition methods block diagram of expansion word and similarity shown in Figure 2.

First, the keyword in the question sentence is extracted according to part of speech and deactivated vocabulary.Specifically, it removes to belong in the question sentence and stop With the stop words in vocabulary, then extraction belongs to the lexical item of specific part of speech from the lexical item after removal stop words, such as default specific Part of speech is respectively nouns and adjectives (can certainly be other types of part of speech), from each lexical item after removal stop words It is the lexical item of noun and part of speech is adjectival part of speech to extract part of speech, using the noun lexical item extracted and adjective lexical item as Keyword.The purpose of extraction keyword is that the quantity by limiting keyword, which can play, prevents lexical item from propagating through mostly and influencing The problem of recall precision.

Then, conventional Extension and the word2vec extensions of text depth representing model are carried out to each keyword.Wherein, described Conventional Extension includes：Each keyword is extended using Hownet HowNet, to obtain an expansion word item collection, and, make Each keyword is extended with Chinese thesaurus, to obtain an expansion word item collection；The word2vec extensions refer to making Each keyword is extended with the word2vec models after trained, to obtain an expansion word item collection and extension lexical item Concentrate each expansion word respectively with the initial similarity of corresponding keyword.Since conventional Extension method cannot obtain similarity, For the expansion word obtained using conventional Extension method, the initial similarity for defining its keyword that is corresponding to it is 1.

Finally, it by merging the expansion word and its initial similarity that three kinds of extended methods obtain, obtains finally extending lexical item With similarity collection.Specifically, for an expansion word, above-mentioned three kinds of extended methods can may obtain the expansion word, it is also possible to Two of which extended method, which can obtain the expansion word or only one of which extended method, can obtain the expansion word, at this point, will Identical expansion word merges into an expansion word, will be at the beginning of corresponding three or two or one for the expansion word after merging Beginning similarity is added, and the sum of will add up divided by 3 after final similarity of the result as the expansion word and corresponding keyword.

To sum up, the specific expansion word for obtaining the keyword in the following manner of step 101 and each expansion word and institute The final similarity of keyword is stated, including：

Obtain at least one expansion word of each keyword respectively using Hownet HowNet, and define each expansion word with it is right It is 1 to answer the initial similarity of keyword；Obtain at least one expansion word of each keyword respectively using Chinese thesaurus, and It is 1 that each expansion word, which is defined, with the initial similarity of corresponding keyword；Utilize the text depth representing model after trained Word2vec obtains at least one expansion word of each keyword and the initial phase of each expansion word and corresponding keyword respectively Like degree；Merge the identical expansion word got, each expansion word calculated separately after merging is final similar with corresponding keyword Spend S_R, wherein S_R=S_sum/ 3, S_sumFor the sum of corresponding all initial similarities of the expansion word.

In order to more easily understand the concrete methods of realizing of step 101, it is exemplified below：

Assuming that the keyword extracted from question sentence includes " screen " word.

For keyword " screen ", HowNet, Chinese thesaurus and trained word2vec is used to carry out respectively same Adopted word extension, respectively obtains the extension lexical item and its similarity of respective method.

Assuming that including using the HowNet expansion words being extended to " screen "：

Expansion word	Define similarity
		Screen	1
Display screen	1
		Display	1

Assuming that including using the expansion word that Chinese thesaurus is extended " screen "：

Expansion word	Define similarity
		Screen	1
Display screen	1

Assuming that the expansion word and its similarity that are extended to " screen " using the word2vec models after trained Including：

Merge expansion word " screen ", the final similarity with " screen " is (1+1+0.788869)/3；

Merge expansion word " display screen ", the final similarity with " screen " is (1+1+0.775589)/3；

Merge expansion word " display ", the final similarity with " screen " is (1+0.654054)/3；

Merge expansion word " touch screen ", the final similarity with " screen " is 0.649287/3.

For other keywords in question sentence, expansion word and expansion word and corresponding key are obtained also according to aforesaid way The final similarity of word.

Since the prior art is mainly using HowNet extensions and Chinese thesaurus extension, but obtained using existing way Expansion word is not often inconsistent with the keywords semantics in question sentence, and the embodiment of the present invention merges word2vec synonym extended modes Into existing synonym extended method, solve the problems, such as that expansion word is not inconsistent with the corresponding keyword meaning of a word.

Step 102：Analyze the dependence between each two lexical item with grammar association in the question sentence.

With reference to question sentence retrieving block diagram shown in Fig. 3, understand step 102 to step 107.

Step 102 for ease of understanding now illustrates, it is assumed that question sentence is：" how iphone5s passes through the side such as missing mode Formula getting back mobile phone ", the result that grammatical relation analysis is carried out to the question sentence are：

Dependence schematic diagram shown in Figure 4, above-mentioned question sentence by " iphone5s ", " how ", " passing through ", " lose Lose ", " pattern ", " etc. ", " mode ", " giving for change ", " mobile phone " these lexical items composition, wherein root be directed toward lexical item " giving for change " be The starting point of syntactic analysis, the dependence between lexical item are respectively：

1, it is nominal subject relationship (nsubj) " to give " dependence between " iphone5s " for change；

2, " give for change " and " how " between dependence be adverbial modified relationship (advmod)；

3, it is premodification relationship (prep) " to give " dependence between " loss " for change；

4, it is direct object relationship (dobj) " to give " dependence between " mobile phone " for change；

5, the dependence between " loss " and " passing through " is example relationship (case)；

6, the dependence between " loss " and " pattern " is direct object relationship (dobj)；

7, " pattern " and " etc. " between dependence be etc. relationship (etc)；

8, the dependence between " pattern " and " mode " is direct object relationship (dobj).

Step 103：According to being in advance the importance weight of dependence setting, determines and reflect every two in the question sentence The distance weighting of tightness degree between a lexical item.

In embodiments of the present invention, step 103 is realized as steps described below：

Before executing each step of the embodiment of the present invention, according to the significance level of different dependences, examined in community's question and answer It is provided with different importance weights for each dependence in advance in cable system, such as can be to " subject relationship " and " object Larger weight is arranged in relationship ", because this two kinds of relationships occupy major part in sentence structure and ingredient, such as can be arranged The importance weight of " subject relationship " is 5, and the importance weight of " object relationship " is 4, and the important of other dependences is arranged Spend weight.Wherein, the importance weight of dependence of the setting without importance is 1.

After obtaining the dependence of question sentence to the question sentence progress interdependent syntactic analysis of question sentence by step 102, next, root According to the importance weight being arranged for dependence, the different distance weight of different dependences in question sentence, the distance power are calculated The tightness degree of dependence between lexical item is reflected again, and distance weighting is smaller, then tightness degree is higher, conversely, distance weighting is got over Greatly, then tightness degree is lower.

Such as：If the importance weight of subject relationship " subj " is " 5 ", the distance for two lexical items which is related to is weighed Weight isIf the importance weight of direct object relationship " dobj " be " 4 ", two lexical items which is related to away from It is from weightWherein, it is worth on the basis of α, α can be obtained by parameter regulation.Difference can be obtained in this manner The corresponding different distance weight of dependence.

Next, between ignoring lexical item it is interdependent with by interdependent points relationship, can be according to the every of dependence The distance between two lexical items weight D structure weighted-graph G=(V, E, W), in this manner, between any two lexical item It is connection.

Wherein, n lexical item v in question sentence_i, composition vertex set V={ v_i| i=1 ..., n }；

Any two lexical item (the lexical item v of specific dependence_iWith lexical item v_j) be connected the side to be formed, composition side collection E=< v_i,v_j>_i|i≠j,1≤i≤n,1≤j≤n}；

Any two lexical item (the lexical item v of specific dependence_iWith lexical item v_j) corresponding distance weighting D, composition weights collection W =<v_i,v_j>|v_i∈V,v_j∈V,<v_i,v_j>∈E}。

Based on weighted-graph, the distance weighting Dis of each two lexical item in question sentence can be calculated in the following manner, below It illustrates：

Such as：For question sentence " how iphone5s passes through the modes such as missing mode getting back mobile phone ", need to calculate each word 8 distance weightings of (such as " iphone5s ") and other 8 lexical items, totally 64 distance weightings.Due to being based on importance weight Can obtain a lexical item (being defined as the first lexical item) and with its another lexical item (defining the second lexical item) with dependence The distance between weight D, and then the first lexical item and each word in addition to the first lexical item can be further calculated based on distance weighting D The distance between item (defining third lexical item) weight Dis.

Such as：For question sentence " how iphone5s passes through the modes such as missing mode getting back mobile phone ", it is assumed that first word Be " giving for change " when, respectively from the second lexical item " iphone5s ", " how ", " loss ", " mobile phone " there are different interdependent passes System, according to formulaFour distance weighting D of " giving for change " and this four the second lexical items can be calculated, for example, when meter When calculating " giving for change " and the distance weighting of " iphone5s ", y is " giving for change " and nominal subject relationship possessed by " iphone5s " Importance weight, other three distance weighting D are equally according to said method calculated.

Based on this four distance weighting D, the distance between " giving for change " and other 8 lexical items weight Dis are respectively：

1, " give for change " with " iphone5s ", " how ", " loss ", the distance between " mobile phone " weight Dis be exactly that its is interdependent The distance weighting D of relationship；The distance between " giving for change " and " passing through " weight Dis=A+B, A is between " giving for change " and " loss " Distance weighting D, B are the distance between " loss " and " passing through " weight D；

2, the distance between " pattern " weight Dis=A+C, A " is given " for change to weigh for the distance between " giving for change " and " loss " Weight D, C are the distance between " loss " and " pattern " weight D；

3, " give for change " with " etc. " the distance between weight Dis=A+C+E, A be the distance between " giving for change " and " loss " power Weight D, C are the distance between " losss " and " pattern " weight D, E for " pattern " and " etc. " the distance between weight D；

4, it is the distance between " giving for change " and " loss " " to give " the distance between " mode " weight Dis=A+C+F, A for change Weight D, C are the distance between " loss " and " pattern " weight D, and F is the distance between " pattern " and " mode " weight D.

When the first lexical item is other lexical items (such as " how "), also according to the above method calculate the first lexical item with it is other The distance between each lexical item weight Dis, details are not described herein.

To sum up, the specific step 103 in the following manner：

First, the distance between the first lexical item and each second lexical item weight D are calculated separately, first lexical item is described Any one lexical item in question sentence, second lexical item are that there are the lexical items of the dependence with first lexical item；Wherein,Y be in advance between first lexical item and second lexical item dependence setting importance weight, α On the basis of be worth；Then, the distance between first lexical item and each third lexical item weight Dis, the third word are calculated separately Item is any one lexical item in addition to first word in the question sentence, and Dis is first lexical item and the third lexical item Between the sum of the corresponding at least one distance weighting D of existing at least one dependence.

Step 104：The degree of association in the question sentence between each two lexical item is determined according to the distance weighting.

By the corresponding distance weighting Dis of the calculated each two lexical item of step 103, it is n × n's that can build a dimension Incidence matrix M, due to having ignored the interdependent direction between lexical item, matrix M is symmetrical matrix, the element that the i-th row jth arranges in M M_ijIndicate lexical item t in question sentence_iWith lexical item t_jThe degree of association.

w_rel(i,j)=λ Dep (t_i,t_j)+(1-λ)Close_pmi(t_i,t_j)；

λ is regulatory factor；

B is a constant more than 1；

Step 105：The lexical item weight of each lexical item in the question sentence is determined according to the degree of association, and according to the question sentence In each lexical item the retrieval of lexical item weight and the relevant question and answer pair of the question sentence.

In embodiments of the present invention, the specific lexical item weight for determining each lexical item in question sentence in the following manner of step 105, The weight matrix of each lexical item final weight composition in the question sentence is calculated according to the following equation

Wherein, α is given constant；E is that incidence matrix M is carried out the random matrix after orthogonal transformation, the incidence matrix M The symmetrical matrix that the degree of association between each two lexical item in the question sentence is formed；For each original power of lexical item in the question sentence The weight matrix reassembled into.

Wherein, original weightIt is to be determined by the retrieval model of community's question and answer searching system, the retrieval model For：Vector space model (Vector space model, VSM) or probabilistic model okapi BM25 or language model (Language Model, LM) or other models.The retrieval model includes but not limited to above-mentioned model.

It should be noted that incidence matrix M is converted to obtain by random matrix E using orthogonal transformation, to ensure the pass The corresponding equation groups of connection matrix M certainly exist analytic solutions, i.e. E=D^-1M(D^-1For an orthogonal matrix).For every in matrix E One element, value all [0,1) between, and the sum of each row element of E be equal to 1.Therefore, E must there are one characteristic values etc. It is the corresponding solution of equations vectors of E in 1, and equal to the feature vector corresponding to 1 this characteristic value, thus E is corresponded to Equation group certainly exist analytic solutions.Incidence matrix is converted into after a random matrix E for M gusts, so that it may to be somebody's turn to do by solving The corresponding solution of equations of matrix analyses solution to obtain lexical item weight to be calculatedBut the weight and lexical item which obtains Original valueIt is not related, and original weightTo be obtained by original retrieval model, as VSM, okapi BM25, LM models etc. utilizeAnd more accurate lexical item weight can be calculated in incidence matrix M

Step 106：It is weighed with the lexical item of expansion word described in the final similarity calculation of corresponding keyword according to the expansion word Weight, and according to the retrieval of the lexical item weight of the expansion word and the relevant question and answer pair of the question sentence.

In embodiments of the present invention, for each expansion word, first, the retrieval model by community's question and answer searching system is true Then the original weight of fixed keyword corresponding with the expansion word calculates extension lexical item weight：Extend lexical item weight=key word item The similarity of original weight * extensions lexical item and key word item；Then, retrieval model obtains retrieval knot using extension lexical item weight Fruit, the retrieval result may include not only question and answer to that can also include the marking value to question and answer pair.

Wherein, the retrieval model is：Vector space model (Vector space model, VSM) or probabilistic model Okapi BM25 or language model (Language Model, LM) or other models.The retrieval model includes but not limited to Above-mentioned model.

To sum up, the specific lexical item weight for calculating the expansion word in the following manner of step 106：

Obtain the original weight of the corresponding keyword of the expansion word；By the original weight and the expansion word with it is corresponding The product of the final similarity of keyword, the lexical item weight as the expansion word.

Step 107：By all question and answer that step 105 and step 106 retrieve to being ranked up display according to preset rules.

Step 105 and step 106 obtain question and answer to while, can also further obtain the marking to each question and answer pair Value, the embodiment of the present invention can show the question and answer that step 105 and step 106 obtain to being ranked up according to marking value, that is, give a mark It is worth high question and answer in preceding display, marking is worth low question and answer in rear display.

It should be noted that step 101 and step 106 can be executed in any position before step 107, step 101 In preceding step 106 after, for example step 106 is moved to after step 101 and is executed, or by step 101 move to step 106 it Preceding execution etc. is not limited this present invention.

Question and answer provided in an embodiment of the present invention are important by being arranged for dependence different in question sentence to search method Spend weight, it may be determined that the association tight ness rating in question sentence between lexical item can further determine that each lexical item in question sentence according to the degree of association Lexical item weight, the important lexical item being can be found that in question sentence by merging the obtained lexical item weight of importance weight, to obtain With the more relevant question and answer of question sentence to retrieval result, solving existing community's question and answer searching system, not account for question sentence structure multiple Miscellaneous and clause is tediously long and the shortcomings that can not find question sentence important lexical item, and then improve the purpose of retrieval result accuracy.

It is the composition schematic diagram of question and answer searching system in community's provided in an embodiment of the present invention referring to Fig. 5, including：

Keyword extracting unit 501, for extracting at least one keyword from question sentence；

Keyword expansion unit 502, the extension of each keyword for obtaining the acquisition of the keyword extracting unit 501 The final similarity of word and each expansion word and corresponding keyword；

Relationship analysis unit 503, it is interdependent between each two lexical item with grammar association in the question sentence for analyzing Relationship；

Weights determination unit 504, for being set according to the dependence obtained in advance for the relationship analysis unit 503 analysis Fixed importance weight determines the distance weighting for reflecting tightness degree between each two lexical item in the question sentence；

Degree of association determination unit 505, distance weighting for being determined according to the weights determination unit 504 determine described in ask The degree of association in sentence between each two lexical item；

First weight determining unit 506, described in the degree of association determination for being determined according to the degree of association determination unit 505 The lexical item weight of each lexical item in question sentence；

First retrieval unit 507, each lexical item in the question sentence for being determined according to first weight determining unit 506 Lexical item weight is retrieved and the relevant question and answer pair of the question sentence；

Second weight determining unit 508, for according to the keyword expansion unit 501 obtained expansion word of extension with it is right Answer the lexical item weight of expansion word described in the final similarity calculation of keyword；

Second retrieval unit 509, the lexical item weight of the expansion word for being determined according to second weight determining unit 508 Retrieval and the relevant question and answer pair of the question sentence；

Retrieval result display unit 510, for examining first retrieval unit 507 and second retrieval unit 509 All question and answer that rope goes out according to preset rules to being ranked up display.

In embodiments of the present invention, the keyword expansion unit 502, including：

In embodiments of the present invention, the weights determination unit 504, including：

In embodiments of the present invention, the degree of association determination unit 505, specifically for calculating described ask according to the following equation Lexical item t in sentence_iWith lexical item t_jBetween degree of association w_rel(i,j)：

w_rel(i,j)=λ Dep (t_i,t_j)+(1-λ)Close_pmi(t_i,t_j)；

λ is regulatory factor；

B is a constant more than 1；

In embodiments of the present invention, first weight determining unit 506 is specifically used for described in calculating according to the following equation The weight matrix of each lexical item final weight composition in question sentence

Wherein, α is given constant；

In embodiments of the present invention, second weight determining unit 508, including：

Question and answer searching system in community's provided in an embodiment of the present invention passes through the weight for dependence setting different in question sentence Spend weight, it may be determined that the association tight ness rating in question sentence between lexical item can further determine that each word in question sentence according to the degree of association The lexical item weight of item, the important lexical item being can be found that in question sentence by merging the lexical item weight that importance weight obtains, to It has arrived with the more relevant question and answer of question sentence to retrieval result, has solved existing community's question and answer searching system and do not account for question sentence structure Complicated and clause is tediously long and the shortcomings that can not find question sentence important lexical item, and then improve the purpose of retrieval result accuracy.

As seen through the above description of the embodiments, those skilled in the art can be understood that above-mentioned implementation All or part of step in example method can add the mode of required general hardware platform to realize by software.Based on such Understand, substantially the part that contributes to existing technology can be in the form of software products in other words for technical scheme of the present invention It embodies, which can be stored in a storage medium, such as ROM/RAM, magnetic disc, CD, including several Instruction is used so that a computer equipment (can be the network communications such as personal computer, server, or Media Gateway Equipment, etc.) execute method described in certain parts of each embodiment of the present invention or embodiment.

It should be noted that for system disclosed in embodiment, since it is corresponded to the methods disclosed in the examples, So description is fairly simple, reference may be made to the description of the method.

It should also be noted that, herein, relational terms such as first and second and the like are used merely to one Entity or operation are distinguished with another entity or operation, without necessarily requiring or implying between these entities or operation There are any actual relationship or orders.Moreover, the terms "include", "comprise" or its any other variant are intended to contain Lid non-exclusive inclusion, so that the process, method, article or equipment including a series of elements is not only wanted including those Element, but also include other elements that are not explicitly listed, or further include for this process, method, article or equipment Intrinsic element.In the absence of more restrictions, the element limited by sentence "including a ...", it is not excluded that There is also other identical elements in process, method, article or equipment including the element.

The foregoing description of the disclosed embodiments enables those skilled in the art to implement or use the present invention. Various modifications to these embodiments will be apparent to those skilled in the art, as defined herein General Principle can be realized in other embodiments without departing from the spirit or scope of the present invention.Therefore, of the invention It is not intended to be limited to the embodiments shown herein, and is to fit to and the principles and novel features disclosed herein phase one The widest range caused.

Claims

1. a kind of question and answer are to search method, which is characterized in that including：

At least one keyword is extracted from question sentence, and obtain each keyword expansion word and each expansion word and it is corresponding close The final similarity of keyword；

According to the importance weight for being in advance dependence setting, determine tight between each two lexical item in the reflection question sentence The distance weighting of close degree；

The lexical item weight of each lexical item in the question sentence is determined according to the degree of association, and according to each lexical item in the question sentence Lexical item weight is retrieved and the relevant question and answer pair of the question sentence；

According to the lexical item weight of the expansion word and expansion word described in the final similarity calculation of corresponding keyword, and according to described The retrieval of lexical item weight and the relevant question and answer pair of the question sentence of expansion word；

2. according to the method described in claim 1, it is characterized in that, the expansion word for obtaining each keyword and each expansion The final similarity of word and corresponding keyword is opened up, including：

It obtains at least one expansion word of each keyword respectively using Hownet HowNet, and defines each expansion word and closed with corresponding The initial similarity of keyword is 1；

It obtains at least one expansion word of each keyword respectively using Chinese thesaurus, and defines each expansion word and closed with corresponding The initial similarity of keyword is 1；

Using the text depth representing model word2vec after trained, at least one expansion word of each keyword is obtained respectively And the initial similarity of each expansion word and corresponding keyword；

Merge the identical expansion word got, calculates separately the final similarity of each expansion word and corresponding keyword after merging S_R, wherein S_R=S_sum/ 3, S_sumFor the sum of corresponding all initial similarities of the expansion word.

3. according to the method described in claim 1, it is characterized in that, the basis is the important of dependence setting in advance Weight is spent, determines the distance weighting for reflecting tightness degree between each two lexical item in the question sentence, including：

The distance between the first lexical item and each second lexical item weight D are calculated separately, first lexical item is in the question sentence Any one lexical item, second lexical item are that there are the lexical items of the dependence with first lexical item；

Wherein,Y be in advance between first lexical item and second lexical item dependence setting it is important Weight is spent, is worth on the basis of α；

The distance between first lexical item and each third lexical item weight Dis are calculated separately, the third lexical item is described asks Any one lexical item in sentence in addition to first lexical item, Dis are existing between first lexical item and the third lexical item The sum of corresponding at least one distance weighting D of at least one dependence.

4. according to the method described in claim 3, it is characterized in that, described determine in the question sentence often according to the distance weighting The degree of association between two lexical items, including：

w_rel(i,j)=λ Dep (t_i,t_j)+(1-λ)Close_pmi(t_i,t_j)；

Wherein,

t_iIndicate i-th of lexical item in the question sentence, t_jIndicate j-th of lexical item in the question sentence, i=1,2 ... n, j=1, 2 ... n, n are the lexical item sum in the question sentence；

λ is regulatory factor；

B is a constant more than 1；

Lexical item t is concentrated for question sentence_iWith lexical item t_jThe probability occurred jointly, N_d(t_i,t_j) it is that question sentence is concentrated Lexical item t_iWith lexical item t_jThe number of the question sentence occurred simultaneously, N_DThe question sentence sum concentrated for question sentence；

WithLexical item t is indicated respectively_iWith lexical item t_jEach leisure question sentence concentrates the probability occurred, N_d (t_i) it is that question sentence is concentrated comprising lexical item t_iQuestion sentence sum, N_d(t_j) it is that question sentence is concentrated comprising lexical item t_jQuestion sentence sum, N_DFor The question sentence sum that question sentence is concentrated.

5. method according to any one of claims 1 to 4, which is characterized in that described according to described in degree of association determination The lexical item weight of each lexical item in question sentence, including：

The weight matrix W of each lexical item final weight composition in the question sentence is calculated according to the following equation_q ^*：

Wherein, α is given constant；

E is that incidence matrix M is carried out the random matrix after orthogonal transformation, and the incidence matrix M is each two word in the question sentence The symmetrical matrix that the degree of association between is formed；

W_q ⁰For the weight matrix of the original weight composition of each lexical item in the question sentence.

6. according to the method described in claim 1, it is characterized in that, described according to the final of the expansion word and corresponding keyword The lexical item weight of expansion word described in similarity calculation, including：

By the product of the original weight and the expansion word and the final similarity of corresponding keyword, as the expansion word Lexical item weight.

7. a kind of community's question and answer searching system, which is characterized in that including：

Keyword expansion unit, expansion word for obtaining each keyword that the keyword extracting unit obtains and each The final similarity of expansion word and corresponding keyword；

Relationship analysis unit, for analyzing the dependence in the question sentence between each two lexical item with grammar association；

Weights determination unit, for according to the importance for being in advance the dependence setting that the relationship analysis unit is analyzed Weight determines the distance weighting for reflecting tightness degree between each two lexical item in the question sentence；

Degree of association determination unit, the distance weighting for being determined according to the weights determination unit determine each two in the question sentence The degree of association between lexical item；

First weight determining unit, the degree of association for being determined according to the degree of association determination unit determine each in the question sentence The lexical item weight of lexical item；

First retrieval unit, the lexical item weight inspection of each lexical item in the question sentence for being determined according to first weight determining unit Rope and the relevant question and answer pair of the question sentence；

Second weight determining unit, the expansion word for being obtained according to the keyword expansion unit extensions and corresponding keyword The lexical item weight of expansion word described in final similarity calculation；

Second retrieval unit, the lexical item weight of the expansion word for being determined according to second weight determining unit retrieve with it is described The relevant question and answer pair of question sentence；

Retrieval result display unit, all question and answer for retrieving first retrieval unit and second retrieval unit To being ranked up display according to preset rules.

8. system according to claim 7, which is characterized in that the keyword expansion unit, including：

Hownet expansion module, at least one expansion word for obtaining each keyword respectively using Hownet HowNet, and define Each expansion word is 1 with the initial similarity of corresponding keyword；

Word woods expansion module, at least one expansion word for obtaining each keyword respectively using Chinese thesaurus, and define Each expansion word is 1 with the initial similarity of corresponding keyword；

Model extension module, it is each crucial for using the text depth representing model word2vec after trained, obtaining respectively The initial similarity of at least one expansion word and each expansion word and corresponding keyword of word；

Similarity calculation module, for merging the Hownet expansion module, institute's predicate woods expansion module and the model extension mould The identical expansion word that block is got calculates separately the final similarity S of each expansion word and corresponding keyword after merging_R, In, S_R=S_sum/ 3, S_sumFor the sum of corresponding all initial similarities of the expansion word.

9. system according to claim 7, which is characterized in that the weights determination unit, including：

First weight computation module, for calculating separately the distance between the first lexical item and each second lexical item weight D, described One lexical item is any one lexical item in the question sentence, and second lexical item is that there are the dependences with first lexical item Lexical item；

Second weight computation module, for calculating separately the distance between first lexical item and each third lexical item weight Dis, The third lexical item is any one lexical item in addition to first lexical item in the question sentence, and Dis is first lexical item and institute Corresponding at least one first weight computation module of existing at least one dependence between third lexical item is stated to calculate The sum of distance weighting D arrived.

10. system according to claim 9, which is characterized in that the degree of association determination unit is specifically used for according to following Formula calculates lexical item t in the question sentence_iWith lexical item t_jBetween degree of association w_rel(i,j)：

w_rel(i,j)=λ Dep (t_i,t_j)+(1-λ)Close_pmi(t_i,t_j)；

Wherein,

λ is regulatory factor；

B is a constant more than 1；

11. according to claim 7 to 10 any one of them system, which is characterized in that first weight determining unit, specifically Weight matrix W for calculating each lexical item final weight composition in the question sentence according to the following equation_q ^*：

Wherein, α is given constant；

12. system according to claim 7, which is characterized in that second weight determining unit, including：

Second weight determination module, original weight and the expansion word for obtaining the original Weight Acquisition module with it is right The product for answering the final similarity of keyword, the lexical item weight as the expansion word.