CN103577556A

CN103577556A - Device and method for obtaining association degree of question and answer pair

Info

Publication number: CN103577556A
Application number: CN201310495641.4A
Authority: CN
Inventors: 孙林; 陈培军; 秦吉胜
Original assignee: Beijing Qihoo Technology Co Ltd; Qizhi Software Beijing Co Ltd
Current assignee: Beijing Qihoo Technology Co Ltd
Priority date: 2013-10-21
Filing date: 2013-10-21
Publication date: 2014-02-12
Anticipated expiration: 2033-10-21
Also published as: CN103577556B

Abstract

The invention discloses a device and a method for obtaining the association degree of a question and answer pair. The method comprises the following steps: carrying out word extraction on question content and answer content of a question and answer pair to be analyzed to obtain at least one question word to be analyzed and at least one answer word to be analyzed; selecting at least one question and answer knowledge record from a question and answer knowledge library including a plurality of question and answer knowledge records according to the question words to be analyzed and the answer words to be analyzed, and calculating the association degree of the question and answer pair to be analyzed according to the selected question and answer knowledge records. According to the device and the method, the quality of the question and answer pair can be semantically evaluated, and the evaluation effect is good; in addition, the method is easy to implement and good in universality.

Description

Device and method for obtaining the association degree of a question-answer pair

Technical field

The present invention relates to the field of network data communication, and in particular to a device and a method for obtaining the association degree of a question-answer pair.

Background art

A question-answering community is a network application built on user-generated content. In its basic form, a user asks a question according to his or her own needs, and other users provide answers. This form gives users a new channel for obtaining information on the network. However, because any user can create content freely, information quality in question-answering communities varies widely, and a large number of low-quality question-answer pairs have appeared. This not only makes it inconvenient for users to find information but also lowers the quality of the community. Moreover, prior-art methods rely heavily on the non-text features of question-answer pairs to evaluate their quality, which limits the generality of those methods.

Summary of the invention

In view of the above problems, the present invention is proposed to provide a device for obtaining the association degree of a question-answer pair, and a corresponding method, that overcome the above problems or at least partially solve them.

According to one aspect of the present invention, a device for obtaining the association degree of a question-answer pair is provided, the device comprising:

a question-answer knowledge base, adapted to store a plurality of question-answer knowledge records;

a word extraction unit, adapted to perform a word extraction operation on the question content and the answer content of a question-answer pair to be analyzed, obtaining at least one question word to be analyzed and at least one answer word to be analyzed; and

an association degree calculation unit, adapted to select at least one question-answer knowledge record from the question-answer knowledge base according to the question words to be analyzed and the answer words to be analyzed, and to calculate the association degree of the question-answer pair to be analyzed according to the selected question-answer knowledge records.

Optionally, the device further comprises a question-answer knowledge base construction unit, adapted to extract a plurality of question-answer pairs in advance from web pages containing question-answer pairs and to build, from the extracted pairs, the question-answer knowledge base comprising a plurality of question-answer knowledge records. The construction unit is further adapted to capture, when extracting the question-answer pairs from the web pages, the categories corresponding to those pairs, and, when building the knowledge base, to build the question-answer knowledge records from the pairs and their corresponding categories. Each question-answer knowledge record corresponds to one category and comprises one question word, one answer word, and the semantic relevance between that question word and that answer word.

Optionally, the association degree calculation unit is adapted to: select the question-answer knowledge records whose question word matches a question word to be analyzed and whose answer word matches an answer word to be analyzed; obtain, from the selected records that correspond to the same category, the association degree of the question-answer pair to be analyzed for each category; and take the maximum of these per-category association degrees as the association degree of the question-answer pair to be analyzed.

Optionally, the association degree calculation unit is adapted to compute a weighted sum of the semantic relevance values of the selected question-answer knowledge records that correspond to the same category, obtaining the association degree of the question-answer pair to be analyzed for each category.

Optionally, the word extraction unit is adapted to perform word segmentation, stop-word removal, word merging, and entity word extraction on the question content and the answer content of the question-answer pair to be analyzed.

Optionally, the construction unit is adapted to perform the following operations on each question-answer pair: perform the word extraction operation on the question content and the answer content of the pair, obtaining a question word set and an answer word set; and have each question word in the question word set and each answer word in the answer word set form one information record for each category corresponding to the pair. The construction unit is further adapted to perform the following operations on each information record: calculate the probability that the answer word belongs to the category; calculate the specificity with which the answer word explains the question word within the category; calculate the strength with which the question word is explained by the answer word within the category; multiply the probability, the specificity, and the strength, the product being the semantic relevance between the answer word and the question word; and have the question word, the answer word, and their semantic relevance form one question-answer knowledge record corresponding to the category.

Optionally, the construction unit is adapted to calculate the probability that the answer word belongs to the category as follows:

$$P(C_k \mid AW_j) = P(AW_j \mid C_k) \cdot \frac{P(C_k)}{P(AW_j)};$$

The construction unit is adapted to calculate the specificity with which each answer word explains the question word within the category as follows:

$$\mathrm{specific}(QW_i, AW_j \mid C = C_k) = P(QW_i \mid AW_j, C = C_k) = \left.\frac{\#(QW_i, AW_j)}{\#(AW_j)}\right|_{C = C_k};$$

The construction unit is adapted to calculate the strength with which the question word is explained by each answer word within the category as follows:

$$\mathrm{interpret}(QW_i, AW_j \mid C = C_k) = P(AW_j \mid QW_i, C = C_k) = \left.\frac{\#(QW_i, AW_j)}{\sum_{j=1}^{x} \#(QW_i, AW_j)}\right|_{C = C_k};$$

The construction unit is adapted to multiply the probability, the specificity, and the strength as follows:

$$\mathrm{weight}(QW_i, AW_j \mid C = C_k) = P(C_k \mid AW_j) \cdot \mathrm{specific}(QW_i, AW_j \mid C = C_k) \cdot \mathrm{interpret}(QW_i, AW_j \mid C = C_k);$$

where P(C_k) denotes the probability that category C_k occurs; P(AW_j) denotes the probability that the answer word is AW_j; P(AW_j | C_k) denotes the probability that the answer word is AW_j within category C_k;

#(QW_i, AW_j) denotes the number of times the question word is QW_i and the answer word is AW_j; and

#(AW_j) denotes the number of times the answer word is AW_j.

According to another aspect of the present invention, a method for obtaining the association degree of a question-answer pair is provided, the method comprising the following steps:

performing a word extraction operation on the question content and the answer content of a question-answer pair to be analyzed, obtaining at least one question word to be analyzed and at least one answer word to be analyzed; and

selecting, according to the question words to be analyzed and the answer words to be analyzed, at least one question-answer knowledge record from a question-answer knowledge base comprising a plurality of question-answer knowledge records, and calculating the association degree of the question-answer pair to be analyzed according to the selected question-answer knowledge records.

Optionally, the method further comprises: extracting a plurality of question-answer pairs in advance from web pages containing question-answer pairs, and building, from the extracted pairs, the question-answer knowledge base comprising a plurality of question-answer knowledge records; capturing, when extracting the question-answer pairs from the web pages, the categories corresponding to those pairs; and, when building the knowledge base, building the question-answer knowledge records from the pairs and their corresponding categories. Each question-answer knowledge record corresponds to one category and comprises one question word, one answer word, and the semantic relevance between that question word and that answer word.

Optionally, selecting at least one question-answer knowledge record from the knowledge base according to the question words and answer words to be analyzed, and calculating the association degree of the question-answer pair to be analyzed according to the selected records, specifically comprises: selecting the question-answer knowledge records whose question word matches a question word to be analyzed and whose answer word matches an answer word to be analyzed; obtaining, from the selected records that correspond to the same category, the association degree of the question-answer pair to be analyzed for each category; and taking the maximum of these per-category association degrees as the association degree of the question-answer pair to be analyzed.

Optionally, obtaining the association degree of the question-answer pair to be analyzed for each category from the selected records that correspond to the same category specifically comprises: computing a weighted sum of the semantic relevance values of the selected question-answer knowledge records that correspond to the same category.

Optionally, performing the word extraction operation on the question content and the answer content of the question-answer pair to be analyzed specifically comprises: performing word segmentation, stop-word removal, word merging, and entity word extraction on the question content and the answer content.

Optionally, building the question-answer knowledge base from the question-answer pairs and their corresponding categories specifically comprises: for each question-answer pair, performing the word extraction operation on the question content and the answer content of the pair, obtaining a question word set and an answer word set; having each question word in the question word set and each answer word in the answer word set form one information record for each category corresponding to the pair; and performing the following operations on each information record: calculating the probability that the answer word belongs to the category; calculating the specificity with which the answer word explains the question word within the category; calculating the strength with which the question word is explained by the answer word within the category; multiplying the probability, the specificity, and the strength, the product being the semantic relevance between the answer word and the question word; and having the question word, the answer word, and their semantic relevance form one question-answer knowledge record corresponding to the category.

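Optionally, calculating the probability that the answer word belongs to the category specifically comprises:

$$P(C_k \mid AW_j) = P(AW_j \mid C_k) \cdot \frac{P(C_k)}{P(AW_j)};$$

Calculating the specificity with which each answer word explains the question word within the category specifically comprises:

$$\mathrm{specific}(QW_i, AW_j \mid C = C_k) = P(QW_i \mid AW_j, C = C_k) = \left.\frac{\#(QW_i, AW_j)}{\#(AW_j)}\right|_{C = C_k};$$

Calculating the strength with which the question word is explained by each answer word within the category specifically comprises:

$$\mathrm{interpret}(QW_i, AW_j \mid C = C_k) = P(AW_j \mid QW_i, C = C_k) = \left.\frac{\#(QW_i, AW_j)}{\sum_{j=1}^{x} \#(QW_i, AW_j)}\right|_{C = C_k};$$

Multiplying the probability, the specificity, and the strength specifically comprises:

$$\mathrm{weight}(QW_i, AW_j \mid C = C_k) = P(C_k \mid AW_j) \cdot \mathrm{specific}(QW_i, AW_j \mid C = C_k) \cdot \mathrm{interpret}(QW_i, AW_j \mid C = C_k);$$

where P(C_k) denotes the probability that category C_k occurs; P(AW_j) denotes the probability that the answer word is AW_j; P(AW_j | C_k) denotes the probability that the answer word is AW_j within category C_k; #(QW_i, AW_j) denotes the number of times the question word is QW_i and the answer word is AW_j; and #(AW_j) denotes the number of times the answer word is AW_j.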

According to the technical solution of the present invention, a plurality of question-answer pairs are extracted from web pages containing question-answer pairs, and a question-answer knowledge base comprising a plurality of question-answer knowledge records is built from the extracted pairs. A word extraction operation is performed on the question content and the answer content of the question-answer pair to be analyzed, yielding at least one question word to be analyzed and at least one answer word to be analyzed. At least one question-answer knowledge record is then selected from the knowledge base according to those words, and the association degree of the pair is calculated from the selected records. In this way the quality of a question-answer pair can be evaluated at the semantic level, which solves the prior-art problem of poor evaluation caused by assessing quality only at the lexical level. The solution is also easy to implement and broadly applicable.

Brief description of the drawings

Various other advantages and benefits will become clear to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings serve only to illustrate the preferred embodiments and are not to be regarded as limiting the present invention. Throughout the drawings, the same reference symbols denote the same parts. In the drawings:

Fig. 1 shows a flowchart of a method for obtaining the association degree of a question-answer pair according to an embodiment of the present invention;

Fig. 2 shows a detailed flowchart of building the question-answer knowledge base;

Fig. 3 shows a schematic diagram of an interpretation model of the question-answer knowledge base obtained with the steps shown in Fig. 2;

Fig. 4 shows a detailed flowchart of step S200 in Fig. 1;

Fig. 5 shows a block diagram of a device for obtaining the association degree of a question-answer pair according to an embodiment of the present invention; and

Fig. 6 shows a block diagram of a device for obtaining the association degree of a question-answer pair according to another embodiment of the present invention.

Detailed description of the embodiments

Existing methods for obtaining the association degree of a question-answer pair describe the question and the answer of the pair with text features and non-text features. Text features mainly comprise visual features of the text (for example punctuation density, average word length, and text entropy) and content features of the text (for example the ratio of content words, interrogative-word density, features widely used in automatic extraction for Chinese text such as single-character density, and related-word coverage). Non-text features comprise the user's authority index, the state of the answered question, the answer response time, user-relationship interaction features, and so on. After features are extracted from the question and the answer separately, a question quality prediction model and an answer quality prediction model are each learned on a training set, and the quality of the question-answer pair is evaluated from the outputs of the two models. However, when such a method evaluates answer quality, it uses only the related-word coverage feature to describe the semantic matching degree between question and answer. This stays at the lexical level and does not actually consider the semantic match between question and answer, even though that match is precisely the core of question-answer pair quality. For example, suppose the question is "Where is the capital of China?", answer 1 is "Beijing", and answer 2 is "The capital of China is Shanghai". After word segmentation and stop-word removal, the question becomes "China / capital / where", answer 1 becomes "Beijing", and answer 2 becomes "China / capital / Shanghai". In the prior art, the semantic matching degree may be defined as the number of words that occur in both the question and the answer divided by the number of all words in the question and the answer. The semantic matching degree of the question and answer 1 is then 0/4 = 0, and that of the question and answer 2 is 2/4 = 0.5. The prior art would therefore conclude that answer 2 matches the question better, which is obviously wrong.

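The arithmetic in this example can be reproduced with a short sketch. It assumes the prior-art measure counts distinct words, that is, shared words divided by distinct words across question and answer, which is the reading consistent with the 0/4 and 2/4 figures. The function name `lexical_overlap` and the English glosses of the segmented words are illustrative only.

```python
def lexical_overlap(question_words, answer_words):
    """Shared distinct words divided by all distinct words (assumed reading)."""
    q, a = set(question_words), set(answer_words)
    union = q | a
    if not union:
        return 0.0
    return len(q & a) / len(union)


question = ["China", "capital", "where"]
answer_1 = ["Beijing"]                       # correct answer
answer_2 = ["China", "capital", "Shanghai"]  # wrong answer

score_1 = lexical_overlap(question, answer_1)
score_2 = lexical_overlap(question, answer_2)
print(score_1, score_2)
```

The wrong answer scores higher than the correct one, which is the failure mode the description attributes to purely lexical matching.
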
Exemplary embodiments of the present disclosure are described in more detail below with reference to the accompanying drawings. Although the drawings show exemplary embodiments of the present disclosure, it should be understood that the disclosure may be implemented in various forms and should not be limited by the embodiments set forth here. Rather, these embodiments are provided so that the disclosure can be understood more thoroughly and its scope can be fully conveyed to those skilled in the art.

Fig. 1 shows a flowchart of a method for obtaining the association degree of a question-answer pair according to an embodiment of the present invention. The method comprises the following steps S100 and S200:

S100: perform a word extraction operation on the question content and the answer content of the question-answer pair to be analyzed, obtaining at least one question word to be analyzed and at least one answer word to be analyzed.

In one embodiment of the present invention, the word extraction operation on the question content and the answer content of the question-answer pair to be analyzed specifically comprises word segmentation, stop-word removal, word merging (word join), and extraction of entity words (such as nouns and verbs). At least one question word to be analyzed is obtained from the question content of the pair, and at least one answer word to be analyzed is obtained from the answer content of the pair.
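
As a rough illustration of step S100, the sketch below covers only the segmentation and stop-word removal parts of the operation. It is a toy under stated assumptions: it splits on word characters rather than using a real Chinese word segmenter, the stop-word list `STOP_WORDS` is a hypothetical handful of English words, and word merging and entity word filtering are omitted.

```python
import re

# Hypothetical toy stop-word list; a real system would use a full list.
STOP_WORDS = {"the", "of", "is", "a", "an"}


def extract_words(text, stop_words=STOP_WORDS):
    """Split text into lowercase tokens, drop stop words, keep first occurrences."""
    tokens = re.findall(r"\w+", text.lower())
    words = []
    for token in tokens:
        if token not in stop_words and token not in words:
            words.append(token)
    return words


question_words = extract_words("Where is the capital of China?")
answer_words = extract_words("Beijing")
print(question_words, answer_words)
```
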

S200: according to the question words to be analyzed and the answer words to be analyzed, select at least one question-answer knowledge record from a question-answer knowledge base comprising a plurality of question-answer knowledge records, and calculate the association degree of the question-answer pair to be analyzed according to the selected records.

In step S200 of this embodiment, the question-answer knowledge base is used to analyze the question content and the answer content of the pair at the semantic level, yielding the association degree of the pair. The evaluation effect is better, and the step is easy to implement.

Further, the question-answer knowledge base comprising a plurality of question-answer knowledge records is built from a plurality of question-answer pairs extracted in advance from web pages containing question-answer pairs. In one embodiment of the present invention, the categories corresponding to the question-answer pairs are captured when the pairs are extracted from the web pages, and the question-answer knowledge records are built from the pairs and their corresponding categories. Each question-answer knowledge record in the resulting knowledge base corresponds to one category and comprises one question word (QW), one answer word (AW), and the semantic relevance between that question word and that answer word.

Because the knowledge base is built from a massive number of high-quality question-answer pairs extracted from web pages, the semantic relevance between the question word and the answer word of each knowledge record can be learned from a massive amount of information. And because the knowledge base is built from information extracted from web pages, the method applies more widely and is more general.

Fig. 2 shows a detailed flowchart of building the question-answer knowledge base, which specifically comprises the following steps S310, S320, and S330:

S310: extract a plurality of question-answer pairs in advance from web pages containing question-answer pairs, and capture the categories corresponding to those pairs.

In this embodiment, a web crawler may be used to capture data from web pages on the Internet that contain high-quality question-answer pairs and to extract question-answer pairs from that data, so as to ensure the quality of the extracted pairs. Such web pages include cQA communities, major professional forums, and the like. Floor recognition may be used to extract question-answer pairs, treating the thread starter's post as the question and the posts on the 1st floor, 2nd floor, and so on as answers. Because these web pages carry category information for each question-answer pair, the corresponding categories can be captured together with the pairs.
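
The floor-based pairing described for step S310 might look like the following sketch. It assumes the crawler has already parsed a thread into a dictionary with a `category` and an ordered list of `posts`; that structure and the function name `extract_qa_pairs` are hypothetical, not something the description specifies.

```python
def extract_qa_pairs(thread):
    """Pair the thread starter's post with each later floor as an answer."""
    posts = thread["posts"]
    if len(posts) < 2:
        return []  # a question without replies yields no pairs
    question = posts[0]
    category = thread["category"]
    return [(question, answer, category) for answer in posts[1:]]


# Hypothetical parsed thread.
thread = {
    "category": "geography",
    "posts": [
        "Where is the capital of China?",  # thread starter: the question
        "Beijing",                         # 1st floor
        "It is Beijing.",                  # 2nd floor
    ],
}
pairs = extract_qa_pairs(thread)
print(pairs)
```
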

S320: for each question-answer pair, perform the word extraction operation on the question content and the answer content of the pair, obtaining a question word set and an answer word set; have each question word in the question word set and each answer word in the answer word set form one information record for each category corresponding to the pair.

In one embodiment of the present invention, the word extraction operation performed on the question content and the answer content of each question-answer pair extracted in step S310 specifically comprises word segmentation, stop-word removal, word merging, and entity word extraction.

At least one question word is obtained from the question content of each pair, and at least one answer word is obtained from the answer content of each pair. For the pair, this yields a category set <C_1, ..., C_k, ..., C_p>, a question word set <QW_1, ..., QW_i, ..., QW_m>, and an answer word set <AW_1, ..., AW_j, ..., AW_n>.

Each question word QW_i in the question word set and each answer word AW_j in the answer word set form one information record, for example <QW_i, AW_j, C_k>, for each category C_k corresponding to the pair, so that m*n*p information records are formed.
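
The m*n*p count follows from taking every combination of one question word, one answer word, and one category. A minimal sketch, with the hypothetical helper name `make_information_records` and made-up example words:

```python
from itertools import product


def make_information_records(question_words, answer_words, categories):
    """Form one (QW, AW, C) record per combination of the three inputs."""
    return list(product(question_words, answer_words, categories))


records = make_information_records(
    ["capital", "China"],       # m = 2 question words
    ["Beijing"],                # n = 1 answer word
    ["geography", "travel"],    # p = 2 categories
)
print(len(records))  # m * n * p records
```
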

S330: perform the following operations on each information record: calculate the probability that the answer word belongs to the category; calculate the specificity with which the answer word explains the question word within the category; calculate the strength with which the question word is explained by the answer word within the category; multiply the probability, the specificity, and the strength, the product being the semantic relevance between the answer word and the question word; and have the question word, the answer word, and their semantic relevance form one question-answer knowledge record corresponding to the category, written <QW_i, AW_j, weight(QW_i, AW_j)> or <QW_i, AW_j, C_k, weight(QW_i, AW_j)>. In this embodiment, step S330 may be carried out after the word extraction operation of step S320 has been applied to the massive number of question-answer pairs captured from web pages and a massive number of information records has been obtained. Semantic relevance computed from a massive number of information records is more accurate.

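Preferably, calculating the probability that the answer word belongs to the category specifically comprises:

$$P(C_k \mid AW_j) = P(AW_j \mid C_k) \cdot \frac{P(C_k)}{P(AW_j)};$$

Calculating the specificity with which each answer word explains the question word within the category specifically comprises:

$$\mathrm{specific}(QW_i, AW_j \mid C = C_k) = P(QW_i \mid AW_j, C = C_k) = \left.\frac{\#(QW_i, AW_j)}{\#(AW_j)}\right|_{C = C_k};$$

Calculating the strength with which the question word is explained by each answer word within the category specifically comprises:

$$\mathrm{interpret}(QW_i, AW_j \mid C = C_k) = P(AW_j \mid QW_i, C = C_k) = \left.\frac{\#(QW_i, AW_j)}{\sum_{j=1}^{x} \#(QW_i, AW_j)}\right|_{C = C_k};$$

Multiplying the probability, the specificity, and the strength specifically comprises:

$$\mathrm{weight}(QW_i, AW_j \mid C = C_k) = P(C_k \mid AW_j) \cdot \mathrm{specific}(QW_i, AW_j \mid C = C_k) \cdot \mathrm{interpret}(QW_i, AW_j \mid C = C_k);$$

where P(C_k) denotes the probability that category C_k occurs; P(AW_j) denotes the probability that the answer word is AW_j; P(AW_j | C_k) denotes the probability that the answer word is AW_j within category C_k; #(QW_i, AW_j) denotes the number of times the question word is QW_i and the answer word is AW_j; and #(AW_j) denotes the number of times the answer word is AW_j.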
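
One way to compute these quantities from a list of information records is sketched below. It assumes every probability is estimated as a relative frequency over the information records, which the description does not spell out, and the name `build_knowledge_records` and the sample records are hypothetical.

```python
from collections import Counter


def build_knowledge_records(info_records):
    """Map each (QW, AW, C) triple to weight = P(C|AW) * specific * interpret."""
    records = [tuple(r) for r in info_records]
    total = len(records)
    cat_count = Counter(c for _, _, c in records)          # records per category
    aw_count = Counter(a for _, a, _ in records)           # records per answer word
    aw_cat_count = Counter((a, c) for _, a, c in records)  # #(AW) within C
    qw_cat_count = Counter((q, c) for q, _, c in records)  # sum over AW of #(QW, AW) within C
    triple_count = Counter(records)                        # #(QW, AW) within C

    knowledge = {}
    for (qw, aw, cat), n_qa in triple_count.items():
        p_c = cat_count[cat] / total
        p_aw = aw_count[aw] / total
        p_aw_given_c = aw_cat_count[(aw, cat)] / cat_count[cat]
        p_c_given_aw = p_aw_given_c * p_c / p_aw           # Bayes' rule
        specific = n_qa / aw_cat_count[(aw, cat)]
        interpret = n_qa / qw_cat_count[(qw, cat)]
        knowledge[(qw, aw, cat)] = p_c_given_aw * specific * interpret
    return knowledge


# Hypothetical information records.
sample = [
    ("capital", "Beijing", "geography"),
    ("capital", "Beijing", "geography"),
    ("capital", "Shanghai", "geography"),
    ("capital", "Beijing", "finance"),
]
knowledge = build_knowledge_records(sample)
```

With these sample records, ("capital", "Beijing", "geography") gets P(C|AW) = 2/3, specificity 1 and strength 2/3, so its weight is 4/9.
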

Steps S310, S320, and S330 yield the question-answer knowledge records and build the question-answer knowledge base. Fig. 3 shows a schematic diagram of an interpretation model of the knowledge base obtained with the steps shown in Fig. 2. As can be seen, for each question word QW_i, n question-answer knowledge records are obtained for each category in the category set <C_1, ..., C_k, ..., C_p>. Those skilled in the art will appreciate that a knowledge record whose calculated semantic relevance is 0 can be deleted. Furthermore, if the number of knowledge records is so large that storing them in the knowledge base and calculating the association degree of a question-answer pair to be analyzed become too costly, a threshold can be preset, and knowledge records whose semantic relevance is below the threshold can be deleted to reduce that cost.
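
The pruning described here could be sketched as a simple filter. The dictionary layout, which maps a (QW, AW, C) triple to its semantic relevance, and the name `prune_records` are assumptions made for illustration.

```python
def prune_records(knowledge, threshold=None):
    """Drop records with zero relevance and, if a threshold is set, those below it."""
    kept = {}
    for key, relevance in knowledge.items():
        if relevance == 0:
            continue
        if threshold is not None and relevance < threshold:
            continue
        kept[key] = relevance
    return kept


# Hypothetical knowledge records.
example = {
    ("capital", "Beijing", "geography"): 0.4,
    ("capital", "noodles", "geography"): 0.02,
    ("capital", "Tuesday", "geography"): 0.0,
}
print(prune_records(example))                  # only the zero record is dropped
print(prune_records(example, threshold=0.05))  # the low-relevance record is dropped too
```
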

Fig. 4 shows a detailed flowchart of step S200 in Fig. 1. After at least one question word to be analyzed and at least one answer word to be analyzed have been obtained in step S100, step S200 specifically comprises the following steps S210, S220, and S230:

S210: select the question-answer knowledge records whose question word matches a question word to be analyzed and whose answer word matches an answer word to be analyzed. In this embodiment, a question word matches a question word to be analyzed when the question word to be analyzed is identical to the question word or is a substring of it; likewise, an answer word matches an answer word to be analyzed when the answer word to be analyzed is identical to the answer word or is a substring of it. In step S210, field matching or field search is used to select from the knowledge base the knowledge records relevant to the question-answer pair to be analyzed.
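
The matching rule of step S210 can be sketched as follows, assuming the knowledge base is held as a dictionary from (QW, AW, C) triples to semantic relevance and is scanned linearly. The names `word_matches` and `select_records` are hypothetical, and a production system would more likely query an index.

```python
def word_matches(record_word, query_word):
    """True if the word to be analyzed equals the record word or is a substring of it."""
    return query_word == record_word or query_word in record_word


def select_records(knowledge, question_words, answer_words):
    """Keep records whose QW and AW both match some word to be analyzed."""
    selected = {}
    for (qw, aw, cat), relevance in knowledge.items():
        qw_hit = any(word_matches(qw, q) for q in question_words)
        aw_hit = any(word_matches(aw, a) for a in answer_words)
        if qw_hit and aw_hit:
            selected[(qw, aw, cat)] = relevance
    return selected


# Hypothetical knowledge records.
kb = {
    ("capital", "Beijing", "geography"): 0.4,
    ("capital city", "Beijing", "travel"): 0.2,
    ("price", "Beijing", "finance"): 0.3,
}
hits = select_records(kb, ["capital"], ["Beijing"])
```
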

S220, according to described in the question and answer knowledge record chosen corresponding to the question and answer knowledge record of identical category, obtain these question and answer to be analyzed to respectively for the degree that is associated of each classification, specifically comprise: by the question and answer knowledge record of choosing corresponding to the semantic relevancy weighting summation of the question and answer knowledge record of identical category, obtain these question and answer to be analyzed to respectively for the degree that is associated of each classification.

The present embodiment, divides into groups the question and answer knowledge record of selecting by step S210 according to its corresponding classification, corresponding to the question and answer knowledge record of identical category, be one group; The semantic relevancy weighting of the question and answer knowledge record of each group (for example, weights are 1 or 100) is added, obtains these question and answer to be analyzed to the degree that is associated for such other; Degree is associated to obtain thus at least one (number of the degree that is associated in the present embodiment is the numbers of question and answer to be analyzed to corresponding classification).

S230, choose the maximal value of above-mentioned these question and answer to be analyzed to the degree that is associated for each classification, using this maximal value as the right degree that is associated of question and answer to be analyzed.
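Steps S210 to S230 can be illustrated with a minimal Python sketch. The record layout `KnowledgeRecord`, the function names, the single uniform `weight`, and the return value of 0.0 when no record matches are assumptions made for illustration; the source does not prescribe them.

```python
from collections import defaultdict
from typing import NamedTuple


class KnowledgeRecord(NamedTuple):
    # Hypothetical layout of one question-answer knowledge record.
    category: str
    question_word: str
    answer_word: str
    relevance: float  # semantic relevance stored in the knowledge base


def matches(word_to_analyze: str, record_word: str) -> bool:
    # True when the two words are identical or when the word to be
    # analyzed is a substring of the word stored in the record.
    return word_to_analyze in record_word


def association_degree(question_words, answer_words, knowledge_base, weight=1.0):
    per_category = defaultdict(float)
    for rec in knowledge_base:
        # S210: keep records whose question word and answer word both match.
        question_ok = any(matches(q, rec.question_word) for q in question_words)
        answer_ok = any(matches(a, rec.answer_word) for a in answer_words)
        if question_ok and answer_ok:
            # S220: weighted sum of semantic relevance within each category.
            per_category[rec.category] += weight * rec.relevance
    # S230: the maximum over all categories is the association degree.
    # Returning 0.0 when nothing matches is an assumption of this sketch.
    return max(per_category.values(), default=0.0)
```

A linear scan over the knowledge base keeps the sketch short; a real knowledge base would more likely use an index for the field matching or field search mentioned in step S210.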

Fig. 5 shows a block diagram of a device for obtaining the association degree of a question-answer pair according to an embodiment of the present invention. The device comprises a question-answer knowledge base 100, a word extraction unit 200 and an association degree calculation unit 300.

The question-answer knowledge base 100 is adapted to store a plurality of question-answer knowledge records. In this embodiment, the question-answer knowledge base 100 can be built from a large number of question-answer pairs crawled from web pages.

The word extraction unit 200 is adapted to perform a word extraction operation on the question content and the answer content of the question-answer pair to be analyzed, obtaining at least one question word to be analyzed and at least one answer word to be analyzed.

In one embodiment of the present invention, the word extraction unit 200 is adapted to perform word segmentation, stop-word removal, word merging (word join) and entity word extraction (for example, nouns and verbs) on the question content and the answer content of the question-answer pair to be analyzed, so as to obtain at least one question word to be analyzed and at least one answer word to be analyzed.
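The stop-word removal and entity word extraction performed by the word extraction unit 200 might look roughly like the following Python sketch. It assumes that some segmenter (not specified in the source) has already produced `(word, tag)` pairs; the stop-word list, the tag names and the function name are hypothetical, and word merging is omitted for brevity.

```python
# Hypothetical stop-word list and part-of-speech tags; neither comes from the source.
STOP_WORDS = {"the", "a", "my", "has", "and"}
ENTITY_TAGS = {"noun", "verb"}


def extract_words(tagged_tokens):
    """Return entity words from (word, tag) pairs produced by an assumed segmenter."""
    words = []
    for word, tag in tagged_tokens:
        if word in STOP_WORDS:        # stop-word removal
            continue
        if tag not in ENTITY_TAGS:    # keep only entity words such as nouns and verbs
            continue
        if word not in words:         # keep each word once, in first-seen order
            words.append(word)
    return words
```

The same function would be applied once to the question content and once to the answer content, giving the question words to be analyzed and the answer words to be analyzed.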

The association degree calculation unit 300 is adapted to select at least one question-answer knowledge record from the question-answer knowledge base according to the question words to be analyzed and the answer words to be analyzed, and to calculate the association degree of the question-answer pair to be analyzed according to the selected question-answer knowledge records.

In one embodiment of the present invention, the association degree calculation unit 300 is adapted to select the question-answer knowledge records whose question word matches a question word to be analyzed and whose answer word matches an answer word to be analyzed. In this embodiment, a question word matches a question word to be analyzed when the two are identical or when the question word to be analyzed is a substring of the question word, and an answer word matches an answer word to be analyzed when the two are identical or when the answer word to be analyzed is a substring of the answer word. From the selected records, the unit obtains the association degree of the question-answer pair to be analyzed for each category, using the records that correspond to the same category. More specifically, the semantic relevance values of the selected records that correspond to the same category are weighted (for example, with a weight of 1 or 100) and added, which gives the association degree of the question-answer pair to be analyzed for that category. At least one association degree is obtained in this way (in this embodiment, the number of association degrees equals the number of categories to which the question-answer pair to be analyzed corresponds). The unit then takes the maximum of these association degrees over all categories and uses it as the association degree of the question-answer pair to be analyzed.

With the question-answer knowledge base 100, the word extraction unit 200 and the association degree calculation unit 300, at least one question-answer knowledge record is selected from the question-answer knowledge base using the question words to be analyzed and the answer words to be analyzed, and the association degree of the question-answer pair to be analyzed is calculated from the selected records. The pair can thus be analyzed at the semantic level, the evaluation effect is good, and the approach is easy to implement. Because the question-answer knowledge base is built from information extracted from web pages, the approach is widely applicable and highly general.

Fig. 6 shows a block diagram of a device for obtaining the association degree of a question-answer pair according to another embodiment of the present invention. In this embodiment, the device further comprises a question-answer knowledge base construction unit 400, which is adapted to extract a plurality of question-answer pairs in advance from web pages containing question-answer pairs and to build, from the extracted pairs, a question-answer knowledge base comprising a plurality of question-answer knowledge records. In the device shown in Fig. 5, the question-answer knowledge base already exists. Because the amount of information on real networks keeps growing and its content changes quickly, the content of the knowledge base often needs to be updated. By adding the construction unit 400 to build (or update) the knowledge base, this embodiment can keep the content of the knowledge base timely and reliable.

Preferably, when extracting a plurality of question-answer pairs from web pages containing question-answer pairs, the question-answer knowledge base construction unit 400 also captures the category corresponding to each question-answer pair. In this embodiment, a web crawler can be used to capture data from Internet web pages that contain high-quality question-answer pairs and to extract the pairs from that data, so as to ensure the quality of the extracted pairs. Such web pages include cQA communities, major professional forums and the like. Because these web pages include category information for each question-answer pair, the construction unit 400 can capture the corresponding category together with each pair.

In this embodiment, the question-answer knowledge base construction unit 400 is adapted to perform the following operations on each question-answer pair. A word extraction operation is performed on the question content and the answer content of the pair to obtain a question word set and an answer word set; specifically, the construction unit 400 performs word segmentation, stop-word removal, word merging and entity word extraction on the question content and the answer content of each extracted pair to obtain the question words and the answer words. Each question word in the question word set and each answer word in the answer word set are then combined, for each category to which the pair corresponds, into one information record. The construction unit 400 is further adapted to perform the following operations on each information record: calculate the probability that the answer word belongs to the category, calculate the specificity with which the answer word explains the question word in the category, and calculate the strength with which the question word is explained by the answer word in the category; multiply this probability, specificity and strength, the resulting product being the semantic relevance of the answer word and the question word; and combine the question word, the answer word and their semantic relevance into one question-answer knowledge record corresponding to the category.
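The formation of information records described above can be sketched in Python as a Cartesian product. The function name and the tuple layout `(category, question_word, answer_word)` are assumptions made for illustration only.

```python
from itertools import product


def build_information_records(question_words, answer_words, categories):
    """Combine every question word with every answer word for each category."""
    return [
        (category, question_word, answer_word)
        for category, question_word, answer_word
        in product(categories, question_words, answer_words)
    ]
```

A question-answer pair with m question words, n answer words and c categories therefore contributes m × n × c information records.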

More specifically, the question-answer knowledge base construction unit 400 is adapted to calculate the probability that the answer word belongs to the category as follows:

P(C_k \mid AW_j) = P(AW_j \mid C_k) \cdot \frac{P(C_k)}{P(AW_j)};

More specifically, the question-answer knowledge base construction unit 400 is adapted to calculate the specificity with which each answer word explains the question word in the category as follows:

\mathrm{specific}(QW_i, AW_j \mid C = C_k) = P(QW_i \mid AW_j, C = C_k) = \left. \frac{\#(QW_i, AW_j)}{\#(AW_j)} \right|_{C = C_k};

More specifically, the question-answer knowledge base construction unit 400 is adapted to calculate the strength with which the question word is explained by each answer word in the category as follows:

\mathrm{interpret}(QW_i, AW_j \mid C = C_k) = P(AW_j \mid QW_i, C = C_k) = \left. \frac{\#(QW_i, AW_j)}{\sum_{j = 1}^{x} \#(QW_i, AW_j)} \right|_{C = C_k};

More specifically, the question-answer knowledge base construction unit 400 is adapted to multiply the above probability, specificity and strength, that is, to form the product

P(C_k \mid AW_j) \cdot \mathrm{specific}(QW_i, AW_j \mid C = C_k) \cdot \mathrm{interpret}(QW_i, AW_j \mid C = C_k),

which is the semantic relevance of the answer word AW_j and the question word QW_i in the category C_k.

Here \#(AW_j) denotes the number of occurrences of the answer word AW_j.
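As an illustration of the three quantities and their product, the following Python sketch estimates them from counts over a list of information records. Estimating the probabilities by relative frequency of records, reading #(QWi, AWj) as the number of records in category Ck that contain both words, and the function name are all assumptions of this sketch; the source gives only the formulas.

```python
from collections import Counter


def semantic_relevance(records):
    """Map each (category, question_word, answer_word) to its semantic relevance."""
    records = list(records)
    n = len(records)
    pair = Counter(records)                                      # #(QWi, AWj) within Ck
    answer_in_cat = Counter((c, a) for c, _, a in records)       # #(AWj) within Ck
    question_in_cat = Counter((c, q) for c, q, _ in records)     # sum over j of #(QWi, AWj) within Ck
    answer_total = Counter(a for _, _, a in records)             # #(AWj) over all categories
    cat_total = Counter(c for c, _, _ in records)                # records per category

    relevance = {}
    for (c, q, a), n_qa in pair.items():
        # Bayes' rule: P(Ck | AWj) = P(AWj | Ck) * P(Ck) / P(AWj)
        p_a_given_c = answer_in_cat[c, a] / cat_total[c]
        p_c = cat_total[c] / n
        p_a = answer_total[a] / n
        p_c_given_a = p_a_given_c * p_c / p_a
        specific = n_qa / answer_in_cat[c, a]      # P(QWi | AWj, C = Ck)
        interpret = n_qa / question_in_cat[c, q]   # P(AWj | QWi, C = Ck)
        relevance[c, q, a] = p_c_given_a * specific * interpret
    return relevance
```

Each key of the returned dictionary, together with its value, corresponds to one question-answer knowledge record: a category, a question word, an answer word and their semantic relevance.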

The effect achieved by the embodiments of the present invention is illustrated below with an example. Consider the following question-answer pair, whose category is "medical treatment & health":

After word segmentation, the question words to be analyzed and the answer words to be analyzed are as follows:

The segmentation result shows that no related words overlap between the question and the answer. With the prior art, this question-answer pair would therefore easily be judged to have a low association degree and low quality. By manual judgment, however, it is clearly a high-quality question-answer pair.

When the method and device of the present invention are used to process the above question-answer pair, the first step is to load an existing question-answer knowledge base, or to build one by crawling question-answer pairs from cQA communities and major professional forums;

In the second step, the word extraction operation is performed on the above question-answer pair to be analyzed, which yields the question word set to be analyzed <child, cough, nasal mucus> and the answer word set to be analyzed <symptom, medicine, treatment, antiviral, xiao'er ganmao granules, explanation, dosage, cough-relieving, Chinese medicine, electuary, antibiotic, Amoxicillin, amoxicillin granules, particle, oral, Roxithromycin, curative effect>; the category of the question-answer pair to be analyzed is obtained as "medical treatment & health";

In the third step, according to each question word to be analyzed and this category, the question-answer knowledge records whose question word matches a question word to be analyzed are selected from the question-answer knowledge base, which yields the following answer words and semantic relevance values (for readability, the semantic relevance values in the following table have been suitably normalized):

In the fourth step, according to the answer words to be analyzed in the answer word set to be analyzed, the question-answer knowledge records whose answer word matches an answer word to be analyzed are filtered out of the records selected in the third step, and the semantic relevance of the filtered records is obtained. Analysis shows that, in this example, the answer words to be analyzed that match an answer word in the question-answer knowledge records include: <oral, cough and wheeze, xiao'er ganmao granules, check, cough-relieving, treatment, flu-like symptom, cold granules>.

The association degree of the above question-answer pair to be analyzed can then be calculated; it reaches 0.9 (where the association degree ranges from 0 to 1).

It should be noted that:

The algorithms and displays provided herein are not inherently related to any particular computer, virtual system or other apparatus. Various general-purpose systems may also be used with the teachings herein. From the description above, the structure required to construct such systems is apparent. Moreover, the present invention is not directed to any particular programming language. It should be understood that the content of the present invention described herein may be implemented in a variety of programming languages, and that the above description of a specific language is given to disclose the best mode of the invention.

Numerous specific details are set forth in the specification provided herein. It is understood, however, that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail so as not to obscure the understanding of this description.

Similarly, it should be understood that, in order to streamline the disclosure and aid the understanding of one or more of the various inventive aspects, the features of the invention are sometimes grouped together in a single embodiment, figure or description thereof in the above description of exemplary embodiments of the invention. The method of the disclosure, however, should not be interpreted as reflecting an intention that the claimed invention requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single embodiment disclosed above. The claims following the detailed description are therefore expressly incorporated into the detailed description, with each claim standing on its own as a separate embodiment of the invention.

Those skilled in the art will appreciate that the modules in the device of an embodiment can be adaptively changed and arranged in one or more devices different from that embodiment. The modules, units or components of an embodiment can be combined into one module, unit or component, and can furthermore be divided into a plurality of sub-modules, sub-units or sub-components. Except where at least some of such features and/or processes or units are mutually exclusive, all features disclosed in this specification (including the accompanying claims, abstract and drawings), and all processes or units of any method or device so disclosed, may be combined in any combination. Unless expressly stated otherwise, each feature disclosed in this specification (including the accompanying claims, abstract and drawings) may be replaced by an alternative feature serving the same, an equivalent or a similar purpose.

Furthermore, those skilled in the art will understand that, although some embodiments described herein include some features but not other features included in other embodiments, combinations of features of different embodiments are meant to be within the scope of the invention and to form different embodiments. For example, in the following claims, any of the claimed embodiments can be used in any combination.

The various component embodiments of the invention may be implemented in hardware, in software modules running on one or more processors, or in a combination thereof. Those skilled in the art will understand that a microprocessor or a digital signal processor (DSP) may be used in practice to implement some or all of the functions of some or all of the components of the device for obtaining the association degree of a question-answer pair according to embodiments of the invention. The invention may also be implemented as a device or apparatus program (for example, a computer program and a computer program product) for performing part or all of the method described herein. Such a program implementing the invention may be stored on a computer-readable medium, or may take the form of one or more signals. Such a signal may be downloaded from an Internet website, provided on a carrier signal, or provided in any other form.

It should be noted that the above embodiments illustrate rather than limit the invention, and that those skilled in the art can design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference sign placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In a unit claim enumerating several devices, several of these devices may be embodied by one and the same item of hardware. The use of the words first, second, third and so on does not indicate any order; these words may be interpreted as names.

Claims

1. A device for obtaining the association degree of a question-answer pair, the device comprising:

2. The device according to claim 1, wherein the device further comprises a question-answer knowledge base construction unit;

the question-answer knowledge base construction unit is adapted to extract a plurality of question-answer pairs in advance from web pages containing question-answer pairs, and to build, from the extracted question-answer pairs, a question-answer knowledge base comprising a plurality of question-answer knowledge records;

the question-answer knowledge base construction unit is further adapted to capture, when extracting the plurality of question-answer pairs from the web pages containing question-answer pairs, the category corresponding to each question-answer pair;

the question-answer knowledge base construction unit is further adapted to build, when building the question-answer knowledge base from the extracted question-answer pairs, question-answer knowledge records from the question-answer pairs and the categories corresponding to them; each question-answer knowledge record corresponds to one category and comprises a question word, an answer word, and the semantic relevance between the question word and the answer word.

3. The device according to claim 1 or 2, wherein

the association degree calculation unit is adapted to select the question-answer knowledge records whose question word matches a question word to be analyzed and whose answer word matches an answer word to be analyzed; to obtain, from the selected question-answer knowledge records that correspond to the same category, the association degree of the question-answer pair to be analyzed for each category; and to take the maximum of the association degrees of the question-answer pair to be analyzed over all categories, using this maximum as the association degree of the question-answer pair to be analyzed.

4. The device according to claim 2, wherein

the question-answer knowledge base construction unit is adapted to perform the following operations on each question-answer pair:

performing a word extraction operation on the question content and the answer content of the question-answer pair to obtain a question word set and an answer word set; and combining each question word in the question word set and each answer word in the answer word set, for each category to which the question-answer pair corresponds, into one information record;

the question-answer knowledge base construction unit is adapted to perform the following operations on each information record:

calculating the probability that the answer word belongs to the category, calculating the specificity with which the answer word explains the question word in the category, and calculating the strength with which the question word is explained by the answer word in the category; multiplying the above probability, specificity and strength, the resulting product being the semantic relevance of the answer word and the question word; and combining the question word, the answer word and their semantic relevance into one question-answer knowledge record corresponding to the category.

5. A method for obtaining the association degree of a question-answer pair, the method comprising the following steps:

6. The method according to claim 5, wherein the method further comprises:

extracting a plurality of question-answer pairs in advance from web pages containing question-answer pairs, and building, from the extracted question-answer pairs, a question-answer knowledge base comprising a plurality of question-answer knowledge records;

capturing, when extracting the plurality of question-answer pairs from the web pages containing question-answer pairs, the category corresponding to each question-answer pair;

building, when building the question-answer knowledge base from the extracted question-answer pairs, question-answer knowledge records from the question-answer pairs and the categories corresponding to them;

wherein each question-answer knowledge record corresponds to one category and comprises a question word, an answer word, and the semantic relevance between the question word and the answer word.

7. The method according to claim 5 or 6, wherein

said selecting at least one question-answer knowledge record from the question-answer knowledge base according to the question words to be analyzed and the answer words to be analyzed, and calculating the association degree of the question-answer pair to be analyzed according to the selected question-answer knowledge records, specifically comprises:

selecting the question-answer knowledge records whose question word matches a question word to be analyzed and whose answer word matches an answer word to be analyzed;

obtaining, from the selected question-answer knowledge records that correspond to the same category, the association degree of the question-answer pair to be analyzed for each category;

taking the maximum of the association degrees of the question-answer pair to be analyzed over all categories, and using this maximum as the association degree of the question-answer pair to be analyzed.

8. The method according to claim 7, wherein

said obtaining, from the selected question-answer knowledge records that correspond to the same category, the association degree of the question-answer pair to be analyzed for each category specifically comprises:

weighting and summing the semantic relevance values of the selected question-answer knowledge records that correspond to the same category, to obtain the association degree of the question-answer pair to be analyzed for each category.

9. The method according to claim 6, wherein said building the question-answer knowledge base from the question-answer pairs and the categories corresponding to them specifically comprises:

for each question-answer pair, performing a word extraction operation on the question content and the answer content of the question-answer pair to obtain a question word set and an answer word set;

combining each question word in the question word set and each answer word in the answer word set, for each category to which the question-answer pair corresponds, into one information record;

performing the following operations on each information record:

calculating the probability that the answer word belongs to the category, calculating the specificity with which the answer word explains the question word in the category, and calculating the strength with which the question word is explained by the answer word in the category;

multiplying the above probability, specificity and strength, the resulting product being the semantic relevance of the answer word and the question word;

combining the question word, the answer word and their semantic relevance into one question-answer knowledge record corresponding to the category.

10. The method according to claim 9, wherein

said calculating the probability that the answer word belongs to the category specifically comprises:

P(C_k \mid AW_j) = P(AW_j \mid C_k) \cdot \frac{P(C_k)}{P(AW_j)};

\mathrm{specific}(QW_i, AW_j \mid C = C_k) = P(QW_i \mid AW_j, C = C_k) = \left. \frac{\#(QW_i, AW_j)}{\#(AW_j)} \right|_{C = C_k};

\mathrm{interpret}(QW_i, AW_j \mid C = C_k) = P(AW_j \mid QW_i, C = C_k) = \left. \frac{\#(QW_i, AW_j)}{\sum_{j = 1}^{x} \#(QW_i, AW_j)} \right|_{C = C_k};

where \#(AW_j) denotes the number of occurrences of the answer word AW_j.