CN106909572A - Method and device for constructing a question-and-answer knowledge base - Google Patents

Method and device for constructing a question-and-answer knowledge base


Publication number
CN106909572A
Authority
CN
China
Prior art keywords
answer
word
question
words
probability
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201510981420.7A
Other languages
Chinese (zh)
Inventor
孙林
陈培军
秦吉胜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Qihoo Technology Co Ltd
Qizhi Software Beijing Co Ltd
Original Assignee
Beijing Qihoo Technology Co Ltd
Qizhi Software Beijing Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Qihoo Technology Co Ltd, Qizhi Software Beijing Co Ltd
Priority to CN201510981420.7A
Publication of CN106909572A
Legal status: Pending


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90: Details of database functions independent of the retrieved data types
    • G06F16/903: Querying
    • G06F16/9032: Query formulation
    • G06F16/90332: Natural language query formulation or dialogue systems

Landscapes

  • Engineering & Computer Science
  • Physics & Mathematics
  • Mathematical Physics
  • Databases & Information Systems
  • Theoretical Computer Science
  • Artificial Intelligence
  • Computational Linguistics
  • Data Mining & Analysis
  • General Engineering & Computer Science
  • General Physics & Mathematics
  • Machine Translation

Abstract

This application discloses a method and device for constructing a question-and-answer (Q&A) knowledge base, so that the completed knowledge base can be used to evaluate Q&A pairs and the accuracy of that evaluation is improved. The Q&A knowledge base consists of multiple Q&A records. The method includes: obtaining the content of a Q&A pair and the category to which the pair belongs; extracting the words in the question content and the words in the answer content of the pair to obtain a question word set and an answer word set; forming one information record from each question word in the question word set paired with each answer word in the answer word set, and, for each information record, calculating the semantic relevance between the answer word and the question word under the category to which the Q&A pair belongs; and forming one Q&A record from a question word, the multiple answer words in the answer word set, and the semantic relevance between each of those answer words and the question word.

Description

Method and device for constructing a question-and-answer knowledge base
Technical field
The present application relates to the field of computer technology, and in particular to a method and device for constructing a question-and-answer (Q&A) knowledge base.
Background
With the rapid development of Internet technology, users who run into problems in life or at work increasingly turn to Q&A communities or other web pages to search for answers. In the basic form of a Q&A community, a user poses a question according to his or her own needs and other users provide answers. This form gives users a new channel for obtaining answers on the network. However, because any user can freely create content, that is, both questions and answers, the quality of information in a Q&A community varies widely. The quality of Q&A pairs therefore needs to be evaluated, so that higher-quality pairs can be ranked higher or lower-quality pairs can be deleted.
At present, the quality of a Q&A pair is evaluated simply by using a related-word coverage feature to describe the degree of semantic matching between the question and the answer. This approach stays at the lexical level. Moreover, for many questions and answers no related words are covered at all, so the semantic matching degree between question and answer comes out as 0, even though the semantic matching between question and answer is exactly the core of Q&A pair quality. For example, suppose a question in a Q&A community is "Which city is the provincial capital of Shandong?" and it has two answers, "Jinan" and "The provincial capital of Shandong is Beijing". When the prior art evaluates quality with the related-word coverage feature, the pair formed by the question and "The provincial capital of Shandong is Beijing" is judged to be a high-quality Q&A pair, whereas the semantic matching degree between the question and "Jinan" is 0, so that pair is judged to be low quality. This clearly does not match reality. It is therefore necessary to construct a Q&A knowledge base in advance and then use it to evaluate Q&A pairs.
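The failure described above can be reproduced with a few lines of Python. This is only a minimal sketch: the source does not define the related-word coverage feature precisely, so the overlap ratio used here (`overlap_score`) and the hand-segmented word lists are illustrative assumptions.

```python
def overlap_score(question_words, answer_words):
    """Fraction of question words that also appear in the answer.

    Hypothetical stand-in for a related-word coverage feature:
    it only looks at shared surface words, not at meaning.
    """
    q = set(question_words)
    a = set(answer_words)
    if not q:
        return 0.0
    return len(q & a) / len(q)


# Hand-segmented words for the Shandong example (assumed segmentation).
question = ["Shandong", "provincial capital", "city"]
correct_answer = ["Jinan"]
wrong_answer = ["Shandong", "provincial capital", "Beijing"]

score_correct = overlap_score(question, correct_answer)  # no shared words
score_wrong = overlap_score(question, wrong_answer)      # two shared words
```

Under this assumed feature the correct answer "Jinan" scores lower than the wrong answer that repeats the question's wording, which is the mismatch the knowledge base is meant to correct.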
Summary of the invention
To solve the above technical problem, embodiments of the present application provide a method and device for constructing a Q&A knowledge base, so that the completed Q&A knowledge base can be used to evaluate Q&A pairs, thereby improving the accuracy of the evaluation.
The embodiments of the present application adopt the following technical solutions:
A method for constructing a Q&A knowledge base, where the Q&A knowledge base consists of multiple Q&A records, the method including:
obtaining the content of a Q&A pair and the category to which the Q&A pair belongs;
extracting the words in the question content and the words in the answer content of the Q&A pair to obtain a question word set and an answer word set;
forming one information record from each question word in the question word set paired with each answer word in the answer word set, and, for each information record, calculating the semantic relevance between the answer word and the question word under the category to which the Q&A pair belongs;
forming one Q&A record from a question word, the multiple answer words in the answer word set, and the semantic relevance between each of the multiple answer words and the question word.
Preferably, calculating the semantic relevance between the answer word and the question word under the category to which the Q&A pair belongs specifically includes:
calculating the probability that the answer word belongs to the category, calculating the specificity with which the answer word explains the question word in the category, and calculating the strength with which the question word is explained by the answer word in the category;
multiplying the probability, the specificity and the strength, the resulting product being the semantic relevance between the answer word and the question word under the category to which the Q&A pair belongs.
Preferably, calculating the probability that the answer word belongs to the category specifically includes:
$$P(C_k \mid AW_j) = \frac{P(AW_j \mid C_k)\, P(C_k)}{P(AW_j)}$$
Calculating the specificity with which the answer word explains the question word in the category specifically includes:
$$\mathrm{specific}(QW_i, AW_j \mid C = C_k) = P(QW_i \mid AW_j, C = C_k) = \left.\frac{\#(QW_i, AW_j)}{\#(AW_j)}\right|_{C = C_k}$$
Calculating the strength with which the question word is explained by the answer word in the category specifically includes:
Multiplying the probability, the specificity and the strength specifically includes:
$$\mathrm{weight}(QW_i, AW_j \mid C = C_k) = P(C_k \mid AW_j) \cdot \mathrm{specific}(QW_i, AW_j \mid C = C_k) \cdot \mathrm{interpret}(QW_i, AW_j \mid C = C_k)$$
where:
$P(C_k \mid AW_j)$ is the probability that answer word $AW_j$ belongs to category $C_k$;
$\mathrm{specific}(QW_i, AW_j \mid C = C_k)$ is the specificity with which answer word $AW_j$ explains question word $QW_i$ in category $C_k$;
$\mathrm{interpret}(QW_i, AW_j \mid C = C_k)$ is the strength with which question word $QW_i$ is explained by answer word $AW_j$ in category $C_k$;
$P(C_k)$ is the probability that category $C_k$ occurs; $P(AW_j)$ is the probability that the answer word is $AW_j$; $P(AW_j \mid C_k)$ is the probability that the answer word is $AW_j$ within category $C_k$; $\#(QW_i, AW_j)$ is the number of times the question word is $QW_i$ and the answer word is $AW_j$; $\#(AW_j)$ is the number of times the answer word is $AW_j$.
Preferably, the Q&A pair is a high-quality Q&A pair, where Q&A pairs are divided into high-quality Q&A pairs and low-quality Q&A pairs.
Preferably, extracting the words in the question content and the words in the answer content of the Q&A pair to obtain the question word set and the answer word set specifically includes:
processing the question content and the answer content separately by word segmentation, stop-word removal, word joining and entity-word extraction, to obtain the question word set and the answer word set as follows:
$$(\langle QW_1, QW_2, \ldots, QW_i, \ldots, QW_m \rangle, \langle AW_1, AW_2, \ldots, AW_i, \ldots, AW_n \rangle)$$
where $QW_i$ denotes a question word and $AW_i$ denotes an answer word.
The embodiments of the present application also provide a device for constructing a Q&A knowledge base, where the Q&A knowledge base consists of multiple Q&A records and the device includes an acquisition module, an extraction module, a computing module and a Q&A record creation module, where:
the acquisition module is configured to obtain the content of a Q&A pair and the category to which the Q&A pair belongs;
the extraction module is configured to extract the words in the question content and the words in the answer content of the Q&A pair to obtain a question word set and an answer word set;
the computing module is configured to form one information record from each question word in the question word set paired with each answer word in the answer word set, and, for each information record, to calculate the semantic relevance between the answer word and the question word under the category to which the Q&A pair belongs;
the Q&A record creation module is configured to form one Q&A record from a question word, the multiple answer words in the answer word set, and the semantic relevance between each of the multiple answer words and the question word.
Preferably, the computing module specifically includes a computing unit, where:
the computing unit is configured to calculate the probability that the answer word belongs to the category, to calculate the specificity with which the answer word explains the question word in the category, and to calculate the strength with which the question word is explained by the answer word in the category;
and to multiply the probability, the specificity and the strength, the resulting product being the semantic relevance between the answer word and the question word under the category to which the Q&A pair belongs.
Preferably, the processing performed by the computing unit includes:
Calculating the probability that the answer word belongs to the category, which specifically includes:
$$P(C_k \mid AW_j) = \frac{P(AW_j \mid C_k)\, P(C_k)}{P(AW_j)}$$
Calculating the specificity with which the answer word explains the question word in the category, which specifically includes:
$$\mathrm{specific}(QW_i, AW_j \mid C = C_k) = P(QW_i \mid AW_j, C = C_k) = \left.\frac{\#(QW_i, AW_j)}{\#(AW_j)}\right|_{C = C_k}$$
Calculating the strength with which the question word is explained by the answer word in the category, which specifically includes:
Multiplying the probability, the specificity and the strength, which specifically includes:
$$\mathrm{weight}(QW_i, AW_j \mid C = C_k) = P(C_k \mid AW_j) \cdot \mathrm{specific}(QW_i, AW_j \mid C = C_k) \cdot \mathrm{interpret}(QW_i, AW_j \mid C = C_k)$$
where:
$P(C_k \mid AW_j)$ is the probability that answer word $AW_j$ belongs to category $C_k$;
$\mathrm{specific}(QW_i, AW_j \mid C = C_k)$ is the specificity with which answer word $AW_j$ explains question word $QW_i$ in category $C_k$;
$\mathrm{interpret}(QW_i, AW_j \mid C = C_k)$ is the strength with which question word $QW_i$ is explained by answer word $AW_j$ in category $C_k$;
$P(C_k)$ is the probability that category $C_k$ occurs; $P(AW_j)$ is the probability that the answer word is $AW_j$; $P(AW_j \mid C_k)$ is the probability that the answer word is $AW_j$ within category $C_k$; $\#(QW_i, AW_j)$ is the number of times the question word is $QW_i$ and the answer word is $AW_j$; $\#(AW_j)$ is the number of times the answer word is $AW_j$.
Preferably, the acquisition module specifically includes a selection unit, where the selection unit is configured to select high-quality Q&A pairs, Q&A pairs being divided into high-quality Q&A pairs and low-quality Q&A pairs.
Preferably, the extraction module is specifically configured to process the question content and the answer content separately by word segmentation, stop-word removal, word joining and entity-word extraction, to obtain the question word set and the answer word set as follows:
$$(\langle QW_1, QW_2, \ldots, QW_i, \ldots, QW_m \rangle, \langle AW_1, AW_2, \ldots, AW_i, \ldots, AW_n \rangle)$$
where $QW_i$ denotes a question word and $AW_i$ denotes an answer word.
At least one of the above technical solutions adopted by the embodiments of the present application can achieve the following beneficial effects: after the content of a Q&A pair and the category to which it belongs are obtained, a question word set and an answer word set are obtained; each question word in the question word set is paired with each answer word in the answer word set to form one information record, and for each information record the semantic relevance between the answer word and the question word under the category to which the Q&A pair belongs is calculated; a question word, the multiple answer words in the answer word set, and the semantic relevance between each of those answer words and the question word form one Q&A record; the completed Q&A knowledge base containing multiple Q&A records is then used to evaluate Q&A pairs, which ultimately improves the accuracy of the evaluation.
Brief description of the drawings
The drawings described here are provided for further understanding of the present application and form part of it. The illustrative embodiments of the application and their descriptions are used to explain the application and do not unduly limit it. In the drawings:
Fig. 1 is a schematic flowchart of an implementation of the method for constructing a Q&A knowledge base provided by an embodiment of the present application;
Fig. 2 is a detailed schematic diagram of Q&A records provided by an embodiment of the present application;
Fig. 3 is a schematic flowchart of building a Q&A knowledge base provided by an embodiment of the present application;
Fig. 4 is a schematic diagram of a device for constructing a Q&A knowledge base provided by an embodiment of the present application.
Detailed description of the embodiments
To make the purpose, technical solutions and advantages of the present application clearer, the technical solutions are described clearly and completely below with reference to specific embodiments of the application and the corresponding drawings. Obviously, the described embodiments are only some of the embodiments of the application, not all of them. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in the application without creative effort fall within the scope of protection of the application.
Fig. 1 is a schematic flowchart of an implementation of the method for constructing a Q&A knowledge base provided by an embodiment of the present application, which includes the following steps:
Step 11: Obtain the content of a Q&A pair and the category to which the Q&A pair belongs.
A Q&A pair usually arises in a network application when a user poses question content according to his or her own needs and other users provide answer content. One piece of question content may correspond to multiple pieces of answer content; here, one piece of question content together with one piece of answer content is treated as one Q&A pair. To obtain Q&A pairs, a crawler can be used to fetch web pages containing high-quality Q&A pairs from the Internet. High quality is relative to low quality: in a high-quality Q&A pair, the answer content answers the question content well. The web pages containing high-quality Q&A pairs may come from cQA communities and major professional forums, and parsing these pages yields a large amount of Q&A pair content.
In addition, while the Q&A pairs are obtained, the category information of each pair can also be extracted. These categories may be the classification that the Q&A community or web page applies to Q&A pairs as a whole; for example, all Q&A pairs may be divided into categories such as games, health care, sports, reading and business.
Step 12: Extract the words in the question content and the words in the answer content of the Q&A pair to obtain a question word set and an answer word set.
To extract the words in the question content and the answer content of the Q&A pair, the question content and the answer content can each be processed by steps such as word segmentation, stop-word removal, word joining and entity-word extraction. The resulting question word set and answer word set can take the following form:
$$(\langle QW_1, QW_2, \ldots, QW_i, \ldots, QW_m \rangle, \langle AW_1, AW_2, \ldots, AW_i, \ldots, AW_n \rangle),$$
where QW is a question word, AW is an answer word, and the subscript is the number of the question word or answer word. For example, if the question content of a Q&A pair is "Which city is the provincial capital of Shandong?" and the corresponding answer content is "The provincial capital of Shandong is Jinan", the above processing may yield the question word set and answer word set $(\langle \text{Shandong}_1, \text{provincial capital}_2, \text{city}_3 \rangle, \langle \text{Shandong}_1, \text{provincial capital}_2, \text{Jinan}_3 \rangle)$.
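The extraction step can be sketched in Python as follows. This is a simplified illustration, not the patented procedure: it assumes the text has already been segmented into tokens (a real system would need a Chinese word segmenter and entity-word extraction), and the stop-word list and the helper `extract_words` are hypothetical.

```python
# Hypothetical stop-word list for the translated example.
STOP_WORDS = {"the", "of", "is", "which"}


def extract_words(tokens, stop_words=STOP_WORDS):
    """Drop stop words and duplicates while keeping first-seen order."""
    seen = set()
    result = []
    for tok in tokens:
        if tok in stop_words or tok in seen:
            continue
        seen.add(tok)
        result.append(tok)
    return result


# Pre-segmented tokens (assumed output of a word segmenter).
question_tokens = ["Shandong", "of", "provincial capital", "is", "which", "city"]
answer_tokens = ["Shandong", "of", "provincial capital", "is", "Jinan"]

question_words = extract_words(question_tokens)
answer_words = extract_words(answer_tokens)
```

With these inputs the two lists correspond to the question word set and answer word set of the Shandong example.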
Step 13: Form one information record from each question word in the question word set paired with each answer word in the answer word set, and, for each information record, calculate the semantic relevance between the answer word and the question word under the category to which the Q&A pair belongs.
Taking the previous example again, when each question word in the question word set is paired with each answer word in the answer word set to form an information record, each question word in the question word set $\langle \text{Shandong}_1, \text{provincial capital}_2, \text{city}_3 \rangle$ is combined with each answer word in the answer word set $\langle \text{Shandong}_1, \text{provincial capital}_2, \text{Jinan}_3 \rangle$, giving nine information records in total. The information records can take the form $(\text{Shandong}_1, \text{Shandong}_1)$, $(\text{Shandong}_1, \text{provincial capital}_2)$, $(\text{Shandong}_1, \text{Jinan}_3)$, $(\text{provincial capital}_2, \text{Shandong}_1)$ and so on, nine records in all.
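Pairing every question word with every answer word is a Cartesian product, which can be sketched with Python's standard `itertools.product`. The variable names are illustrative only.

```python
from itertools import product

question_words = ["Shandong", "provincial capital", "city"]
answer_words = ["Shandong", "provincial capital", "Jinan"]

# One information record per (question word, answer word) combination.
info_records = list(product(question_words, answer_words))
```

For three question words and three answer words this produces the nine information records of the example, starting with ("Shandong", "Shandong").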
For each information record, the semantic relevance between the answer word and the question word under the category to which the Q&A pair belongs is calculated, giving a specific semantic relevance value.
Step 14: Form one Q&A record from a question word, the multiple answer words in the answer word set, and the semantic relevance between each of the multiple answer words and the question word.
In general, a Q&A pair yields multiple question words and multiple answer words after the processing of step 11. Here, one of the question words, the multiple answer words in the answer word set, and the semantic relevance between each of those answer words and that question word form one Q&A record, so that after the processing of step 13 a single Q&A pair ultimately produces multiple Q&A records.
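Grouping scored information records by question word can be sketched as below. The nested-dictionary layout, the helper `build_qa_records` and the relevance values are assumptions made for illustration; the source does not prescribe a storage structure or give these numbers.

```python
from collections import defaultdict


def build_qa_records(scored_records):
    """Group (question_word, answer_word, relevance) triples by question word.

    Returns {question_word: {answer_word: relevance}}, i.e. one record
    per question word listing all answer words with their relevance.
    """
    grouped = defaultdict(dict)
    for question_word, answer_word, relevance in scored_records:
        grouped[question_word][answer_word] = relevance
    return dict(grouped)


# Hypothetical relevance values, for illustration only.
scored = [
    ("provincial capital", "Jinan", 0.8),
    ("provincial capital", "Shandong", 0.3),
    ("city", "Jinan", 0.6),
]
qa_records = build_qa_records(scored)
```

Each key of `qa_records` then plays the role of one record: a question word together with its answer words and their relevance values.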
It should be noted that steps 11, 12 and 13 above describe the processing of only one Q&A pair; the Q&A knowledge base is ultimately built once a large number of high-quality Q&A pairs have gone through these steps.
With the above embodiment, a Q&A knowledge base is built after a large number of Q&A pairs are processed, and the completed knowledge base containing multiple Q&A records is then used to evaluate the Q&A pairs to be evaluated, which ultimately improves the accuracy of the evaluation.
In step 13, calculating the semantic relevance between the answer word and the question word under the category to which the Q&A pair belongs can specifically include: calculating the probability that the answer word belongs to the category, calculating the specificity with which the answer word explains the question word in the category, and calculating the strength with which the question word is explained by the answer word in the category; then multiplying the probability, the specificity and the strength, the resulting product being the semantic relevance between the answer word and the question word under the category of the Q&A pair.
The probability that the answer word belongs to the category can specifically be calculated as:
$$P(C_k \mid AW_j) = \frac{P(AW_j \mid C_k)\, P(C_k)}{P(AW_j)}$$
The specificity with which the answer word explains the question word in the category can specifically be calculated as:
$$\mathrm{specific}(QW_i, AW_j \mid C = C_k) = P(QW_i \mid AW_j, C = C_k) = \left.\frac{\#(QW_i, AW_j)}{\#(AW_j)}\right|_{C = C_k}$$
The strength with which the question word is explained by the answer word in the category can specifically be calculated as:
The product of the probability, the specificity and the strength can specifically be calculated as:
$$\mathrm{weight}(QW_i, AW_j \mid C = C_k) = P(C_k \mid AW_j) \cdot \mathrm{specific}(QW_i, AW_j \mid C = C_k) \cdot \mathrm{interpret}(QW_i, AW_j \mid C = C_k)$$
In the above formulas, $P(C_k \mid AW_j)$ is the probability that answer word $AW_j$ belongs to category $C_k$;
$\mathrm{specific}(QW_i, AW_j \mid C = C_k)$ is the specificity with which answer word $AW_j$ explains question word $QW_i$ in category $C_k$;
$\mathrm{interpret}(QW_i, AW_j \mid C = C_k)$ is the strength with which question word $QW_i$ is explained by answer word $AW_j$ in category $C_k$;
$P(C_k)$ is the probability that category $C_k$ occurs; $P(AW_j)$ is the probability that the answer word is $AW_j$; $P(AW_j \mid C_k)$ is the probability that the answer word is $AW_j$ within category $C_k$; $\#(QW_i, AW_j)$ is the number of times the question word is $QW_i$ and the answer word is $AW_j$; $\#(AW_j)$ is the number of times the answer word is $AW_j$.
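The product of the three factors can be sketched in Python as follows. The Bayes-rule form of the category probability and the count ratio for the specificity follow the formulas stated in claim 3 of this document; the formula for the `interpret` factor is not reproduced in this part of the text, so it is passed in as a precomputed number. All function names and the numeric inputs are hypothetical.

```python
def category_posterior(p_aw_given_c, p_c, p_aw):
    """P(C_k | AW_j) = P(AW_j | C_k) * P(C_k) / P(AW_j), by Bayes' rule."""
    if p_aw == 0:
        return 0.0
    return p_aw_given_c * p_c / p_aw


def specificity(pair_count, answer_count):
    """#(QW_i, AW_j) / #(AW_j), both counted within category C_k."""
    if answer_count == 0:
        return 0.0
    return pair_count / answer_count


def semantic_weight(posterior, specific, interpret):
    """weight = P(C_k | AW_j) * specific * interpret.

    `interpret` is supplied by the caller because its formula is not
    given in this part of the document.
    """
    return posterior * specific * interpret


# Made-up inputs, for illustration only.
posterior = category_posterior(p_aw_given_c=0.02, p_c=0.25, p_aw=0.01)
spec = specificity(pair_count=30, answer_count=60)
weight = semantic_weight(posterior, spec, interpret=0.8)
```

With these made-up inputs the posterior and the specificity are each about 0.5, so the weight is about 0.2.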
The question word set and answer word set obtained in step 12 can be stored in the following format:
$$(\langle QW_1, QW_2, \ldots, QW_i, \ldots, QW_m \rangle, \langle AW_1, AW_2, \ldots, AW_i, \ldots, AW_n \rangle, \mathrm{cate1})$$
Then, using the above calculation, for each question word $QW_i$ ($i = 1, 2, \ldots, m$), the semantic relevance between $AW_j$ ($j = 1, \ldots, n$) and $QW_i$ in category cate1 is calculated. Finally, for each $QW_i$, the multiple answer words in the resulting answer word set and the semantic relevance between each of those answer words and $QW_i$ form one Q&A record; the details of such Q&A records are shown in Fig. 2, which shows only three Q&A records.
It should be noted that a Q&A pair can yield multiple Q&A records after processing, where one Q&A record includes one question word, multiple answer words, and the semantic relevance between each answer word and the question word. It should also be noted that the semantic relevance above is the semantic relevance of a Q&A record under one category; the semantic relevance between answer words and question words under different categories can also be calculated. Multiple such Q&A records ultimately make up the Q&A knowledge base, and the detailed steps for building the Q&A knowledge base can be as shown in Fig. 3.
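The per-category counts used in the relevance calculation can be gathered from the stored tuples roughly as follows. This is a sketch under an assumption: the source does not say what counts as one occurrence, so here each Q&A pair contributes at most one count per word or word pair. The helper `count_cooccurrences` and the tiny corpus are hypothetical.

```python
from collections import Counter
from itertools import product


def count_cooccurrences(corpus, category):
    """Count #(QW_i, AW_j) and #(AW_j) over Q&A pairs of one category.

    `corpus` is an iterable of (question_words, answer_words, category)
    tuples. Counting once per Q&A pair is an assumption of this sketch.
    """
    pair_counts = Counter()
    answer_counts = Counter()
    for question_words, answer_words, cat in corpus:
        if cat != category:
            continue
        for aw in set(answer_words):
            answer_counts[aw] += 1
        for qw, aw in product(set(question_words), set(answer_words)):
            pair_counts[(qw, aw)] += 1
    return pair_counts, answer_counts


# Tiny made-up corpus with two categories.
corpus = [
    (["cough", "child"], ["syrup", "oral"], "health"),
    (["cough"], ["syrup"], "health"),
    (["cough"], ["level"], "games"),
]
pair_counts, answer_counts = count_cooccurrences(corpus, "health")
```

Running the same function with a different category name gives the counts needed to compute relevance for that category, which keeps each category's statistics separate.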
To explain the technical solutions provided by the embodiments of the present application more clearly, a local detail of a Q&A knowledge base is used as an illustration. As shown in Table 1, it includes three Q&A records, where the value after each answer word in Table 1 is the semantic relevance between that answer word and the question word under the category "health care".
Table 1: Example Q&A records
When the above Q&A knowledge base is used to evaluate the Q&A pair shown in Table 2, the words in the question content and the words in the answer content of the Q&A pair to be evaluated are obtained. The answer words selected from the Q&A knowledge base are [oral, cough and asthma, xiao'er ganmao granules, check, cough-relieving, treatment, flu-like symptom, cold granules]. The semantic relevance between these answer words and the question words in Table 1 is calculated, and the resulting semantic relevance value is finally used to evaluate the quality of the Q&A pair.
Calculating the semantic relevance with the Q&A knowledge base shows that the semantic relevance of this Q&A pair reaches 0.9 (semantic relevance ranges from 0 to 1). The present application therefore handles well Q&A pairs of this kind, which have no related-word coverage but very high semantic similarity. By contrast, when the prior art evaluates the Q&A pair shown in Table 2, it relies only on the fact that no related words are covered between the question and the answer, concludes that the pair has no semantic relevance, and finally classifies it as a low-quality Q&A pair. The present application therefore provides a method for constructing a Q&A knowledge base, and the completed knowledge base containing multiple Q&A records is used to evaluate Q&A pairs, which ultimately improves the accuracy of the evaluation.
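A lookup-based evaluation along these lines might be sketched as follows. The source does not state how the per-word relevance values are combined into one score for a Q&A pair, so the rule used here (take the best-matching answer word for each known question word, then average) is purely an assumption, as are the function name `score_qa_pair` and the sample values.

```python
def score_qa_pair(knowledge_base, question_words, answer_words):
    """Score a Q&A pair from {question_word: {answer_word: relevance}}.

    Assumed aggregation: for every question word found in the knowledge
    base, take the highest relevance among the pair's answer words, then
    average over those question words. Returns 0.0 if nothing matches.
    """
    best_per_question = []
    for qw in question_words:
        related = knowledge_base.get(qw)
        if not related:
            continue
        best = max((related.get(aw, 0.0) for aw in answer_words), default=0.0)
        best_per_question.append(best)
    if not best_per_question:
        return 0.0
    return sum(best_per_question) / len(best_per_question)


# Hypothetical knowledge-base fragment; the values are not from the source.
knowledge_base = {
    "cough": {"cough-relieving": 0.9, "oral": 0.4},
    "cold": {"cold granules": 0.8},
}
good = score_qa_pair(knowledge_base, ["cough", "cold"],
                     ["cough-relieving", "cold granules"])
bad = score_qa_pair(knowledge_base, ["cough"], ["Beijing"])
```

The point of the sketch is that an answer sharing no words with the question can still score highly, because the score comes from stored question-to-answer relevance rather than from word overlap.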
Table 2: Q&A pair to be evaluated
The embodiments above are all method embodiments of the present application. Correspondingly, the present application also provides an embodiment of a device for constructing a Q&A knowledge base, where the Q&A knowledge base consists of multiple Q&A records. As shown in Fig. 4, the device includes an acquisition module 21, an extraction module 22, a computing module 23 and a Q&A record creation module 24, where:
the acquisition module 21 can be used to obtain the content of a Q&A pair and the category to which the Q&A pair belongs;
the extraction module 22 can be used to extract the words in the question content and the words in the answer content of the Q&A pair to obtain a question word set and an answer word set;
the computing module 23 can be used to form one information record from each question word in the question word set paired with each answer word in the answer word set, and, for each information record, to calculate the semantic relevance between the answer word and the question word under the category to which the Q&A pair belongs;
the Q&A record creation module 24 can be used to form one Q&A record from a question word, the multiple answer words in the answer word set, and the semantic relevance between each of the multiple answer words and the question word.
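One way to picture how the four modules cooperate is the small Python class below. It is a structural sketch only: the class name `KnowledgeBaseBuilder`, the callables passed in for extraction and relevance calculation, and the dictionary keyed by (category, question word) are all hypothetical choices, not the device defined by the application.

```python
from itertools import product


class KnowledgeBaseBuilder:
    """Wires extraction, relevance computation and record creation together."""

    def __init__(self, extract, relevance):
        self.extract = extract      # stands in for the extraction module
        self.relevance = relevance  # stands in for the computing module
        # (category, question word) -> {answer word: relevance}
        self.records = {}

    def add_pair(self, question, answer, category):
        """Process one acquired Q&A pair and update the stored records."""
        question_words = self.extract(question)
        answer_words = self.extract(answer)
        for qw, aw in product(question_words, answer_words):
            record = self.records.setdefault((category, qw), {})
            record[aw] = self.relevance(qw, aw, category)
        return self.records


# Toy stand-ins: whitespace splitting and a constant relevance value.
builder = KnowledgeBaseBuilder(str.split, lambda qw, aw, cat: 1.0)
records = builder.add_pair("a b", "c", "x")
```

Passing the extraction and relevance steps in as callables mirrors the modular split of the device and lets each step be swapped or tested separately.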
The computing module 23 specifically includes a computing unit, where the computing unit is configured to calculate the probability that the answer word belongs to the category, to calculate the specificity with which the answer word explains the question word in the category, and to calculate the strength with which the question word is explained by the answer word in the category; and to multiply the probability, the specificity and the strength, the resulting product being the semantic relevance between the answer word and the question word under the category to which the Q&A pair belongs.
The processing performed by the computing unit includes: calculating the probability that the answer word belongs to the category, which specifically includes:
$$P(C_k \mid AW_j) = \frac{P(AW_j \mid C_k)\, P(C_k)}{P(AW_j)}$$
Calculating the specificity with which the answer word explains the question word in the category, which specifically includes:
$$\mathrm{specific}(QW_i, AW_j \mid C = C_k) = P(QW_i \mid AW_j, C = C_k) = \left.\frac{\#(QW_i, AW_j)}{\#(AW_j)}\right|_{C = C_k}$$
Calculating the strength with which the question word is explained by the answer word in the category, which specifically includes:
Multiplying the probability, the specificity and the strength, which specifically includes:
$$\mathrm{weight}(QW_i, AW_j \mid C = C_k) = P(C_k \mid AW_j) \cdot \mathrm{specific}(QW_i, AW_j \mid C = C_k) \cdot \mathrm{interpret}(QW_i, AW_j \mid C = C_k)$$
where:
$P(C_k \mid AW_j)$ is the probability that answer word $AW_j$ belongs to category $C_k$;
$\mathrm{specific}(QW_i, AW_j \mid C = C_k)$ is the specificity with which answer word $AW_j$ explains question word $QW_i$ in category $C_k$;
$\mathrm{interpret}(QW_i, AW_j \mid C = C_k)$ is the strength with which question word $QW_i$ is explained by answer word $AW_j$ in category $C_k$;
$P(C_k)$ is the probability that category $C_k$ occurs; $P(AW_j)$ is the probability that the answer word is $AW_j$; $P(AW_j \mid C_k)$ is the probability that the answer word is $AW_j$ within category $C_k$; $\#(QW_i, AW_j)$ is the number of times the question word is $QW_i$ and the answer word is $AW_j$; $\#(AW_j)$ is the number of times the answer word is $AW_j$.
The acquisition module 21 can specifically include a selection unit, where the selection unit is configured to select high-quality Q&A pairs, Q&A pairs being divided into high-quality Q&A pairs and low-quality Q&A pairs.
The extraction module 22 can specifically be configured to process the question content and the answer content separately by word segmentation, stop-word removal, word joining and entity-word extraction, to obtain the question word set and the answer word set as follows:
$$(\langle QW_1, QW_2, \ldots, QW_i, \ldots, QW_m \rangle, \langle AW_1, AW_2, \ldots, AW_i, \ldots, AW_n \rangle)$$
where $QW_i$ denotes a question word and $AW_i$ denotes an answer word.
The above are only embodiments of the present application and are not intended to limit it. For a person skilled in the art, the application may have various modifications and variations. Any modification, equivalent substitution or improvement made within the spirit and principle of the application shall fall within the scope of the claims of the application.

Claims (10)

1. A method for constructing a Q&A knowledge base, characterized in that the Q&A knowledge base consists of multiple Q&A records, the method comprising:
obtaining the content of a Q&A pair and the category to which the Q&A pair belongs;
extracting the words in the question content and the words in the answer content of the Q&A pair to obtain a question word set and an answer word set;
forming one information record from each question word in the question word set paired with each answer word in the answer word set, and, for each information record, calculating the semantic relevance between the answer word and the question word under the category to which the Q&A pair belongs;
forming one Q&A record from a question word, the multiple answer words in the answer word set, and the semantic relevance between each of the multiple answer words and the question word.
2. The method according to claim 1, characterized in that calculating the semantic relevance between the answer word and the question word under the category to which the question-answer pair belongs specifically comprises:
calculating the probability that the answer word belongs to the category, calculating the specificity with which the answer word explains the question word under the category, and calculating the strength with which the question word is explained by the answer word under the category;
multiplying the probability, the specificity, and the strength, the resulting product being the semantic relevance between the answer word and the question word under the category to which the question-answer pair belongs.
3. The method according to claim 2, characterized in that:
the probability that the answer word belongs to the category is calculated as:
$$P(C_k \mid AW_j) = \frac{P(AW_j \mid C_k)\, P(C_k)}{P(AW_j)}$$
the specificity with which the answer word explains the question word under the category is calculated as:
$$\mathrm{specific}(QW_i, AW_j \mid C = C_k) = P(QW_i \mid AW_j, C = C_k) = \left.\frac{\#(QW_i, AW_j)}{\#(AW_j)}\right|_{C = C_k}$$
the strength with which the question word is explained by the answer word under the category is calculated as:
$$\mathrm{interpret}(QW_i, AW_j \mid C = C_k) = P(AW_j \mid QW_i, C = C_k) = \left.\frac{\#(QW_i, AW_j)}{\sum_{j=1}^{x} \#(QW_i, AW_j)}\right|_{C = C_k}$$
the probability, the specificity, and the strength are multiplied as:
$$\mathrm{weight}(QW_i, AW_j \mid C = C_k) = P(C_k \mid AW_j) \cdot \mathrm{specific}(QW_i, AW_j \mid C = C_k) \cdot \mathrm{interpret}(QW_i, AW_j \mid C = C_k)$$
where:
$P(C_k \mid AW_j)$ is the probability that answer word $AW_j$ belongs to category $C_k$;
$\mathrm{specific}(QW_i, AW_j \mid C = C_k)$ is the specificity with which answer word $AW_j$ explains question word $QW_i$ under category $C_k$;
$\mathrm{interpret}(QW_i, AW_j \mid C = C_k)$ is the strength with which question word $QW_i$ is explained by answer word $AW_j$ under category $C_k$;
$P(C_k)$ is the probability that category $C_k$ occurs; $P(AW_j)$ is the probability that the answer word is $AW_j$; $P(AW_j \mid C_k)$ is the probability that the answer word is $AW_j$ given category $C_k$; $\#(QW_i, AW_j)$ is the number of times the question word is $QW_i$ and the answer word is $AW_j$; $\#(AW_j)$ is the number of times the answer word is $AW_j$.
4. The method according to claim 1, characterized in that the question-answer pair is a high-quality question-answer pair, question-answer pairs being divided into high-quality question-answer pairs and low-quality question-answer pairs.
5. The method according to any one of claims 1 to 4, characterized in that extracting the words in the question content and the words in the answer content of the question-answer pair, to obtain the question word set and the answer word set, specifically comprises:
applying word segmentation, stop-word removal, and word joining to the question content and to the answer content respectively, and extracting entity words, to obtain the question word set and the answer word set as follows:
$$(\langle QW_1, QW_2, \ldots, QW_i, \ldots, QW_m \rangle, \langle AW_1, AW_2, \ldots, AW_i, \ldots, AW_n \rangle)$$
where $QW_i$ denotes a question word and $AW_i$ denotes an answer word.
6. A device for constructing a question-and-answer knowledge base, characterized in that the question-and-answer knowledge base consists of a plurality of question-and-answer records, the device comprising an acquisition module, an extraction module, a computing module, and a question record creation module, wherein:
the acquisition module is used to obtain the content of a question-answer pair and the category to which the question-answer pair belongs;
the extraction module is used to extract the words in the question content and the words in the answer content of the question-answer pair, to obtain a question word set and an answer word set;
the computing module is used to form one information record from each question word in the question word set together with each answer word in the answer word set, and, for each information record, to calculate the semantic relevance between the answer word and the question word under the category to which the question-answer pair belongs;
the question record creation module is used to form one question record from a question word, a plurality of answer words in the answer word set, and the semantic relevance between each of the plurality of answer words and the question word.
7. The device according to claim 6, characterized in that the computing module specifically includes a computing unit, wherein:
the computing unit is used to calculate the probability that the answer word belongs to the category, to calculate the specificity with which the answer word explains the question word under the category, and to calculate the strength with which the question word is explained by the answer word under the category;
the probability, the specificity, and the strength are multiplied, the resulting product being the semantic relevance between the answer word and the question word under the category to which the question-answer pair belongs.
8. The device according to claim 7, characterized in that the processing performed by the computing unit includes:
calculating the probability that the answer word belongs to the category as:
$$P(C_k \mid AW_j) = \frac{P(AW_j \mid C_k)\, P(C_k)}{P(AW_j)}$$
calculating the specificity with which the answer word explains the question word under the category as:
$$\mathrm{specific}(QW_i, AW_j \mid C = C_k) = P(QW_i \mid AW_j, C = C_k) = \left.\frac{\#(QW_i, AW_j)}{\#(AW_j)}\right|_{C = C_k}$$
calculating the strength with which the question word is explained by the answer word under the category as:
$$\mathrm{interpret}(QW_i, AW_j \mid C = C_k) = P(AW_j \mid QW_i, C = C_k) = \left.\frac{\#(QW_i, AW_j)}{\sum_{j=1}^{x} \#(QW_i, AW_j)}\right|_{C = C_k}$$
multiplying the probability, the specificity, and the strength as:
$$\mathrm{weight}(QW_i, AW_j \mid C = C_k) = P(C_k \mid AW_j) \cdot \mathrm{specific}(QW_i, AW_j \mid C = C_k) \cdot \mathrm{interpret}(QW_i, AW_j \mid C = C_k)$$
where:
$P(C_k \mid AW_j)$ is the probability that answer word $AW_j$ belongs to category $C_k$;
$\mathrm{specific}(QW_i, AW_j \mid C = C_k)$ is the specificity with which answer word $AW_j$ explains question word $QW_i$ under category $C_k$;
$\mathrm{interpret}(QW_i, AW_j \mid C = C_k)$ is the strength with which question word $QW_i$ is explained by answer word $AW_j$ under category $C_k$;
$P(C_k)$ is the probability that category $C_k$ occurs; $P(AW_j)$ is the probability that the answer word is $AW_j$; $P(AW_j \mid C_k)$ is the probability that the answer word is $AW_j$ given category $C_k$; $\#(QW_i, AW_j)$ is the number of times the question word is $QW_i$ and the answer word is $AW_j$; $\#(AW_j)$ is the number of times the answer word is $AW_j$.
9. The device according to claim 6, characterized in that the acquisition module specifically includes a selection unit, wherein the selection unit is used to select high-quality question-answer pairs, question-answer pairs being divided into high-quality question-answer pairs and low-quality question-answer pairs.
10. The device according to any one of claims 6 to 9, characterized in that:
the extraction module is specifically used to apply word segmentation, stop-word removal, and word joining to the question content and to the answer content respectively, and to extract entity words, to obtain the question word set and the answer word set as follows:
$$(\langle QW_1, QW_2, \ldots, QW_i, \ldots, QW_m \rangle, \langle AW_1, AW_2, \ldots, AW_i, \ldots, AW_n \rangle)$$
where $QW_i$ denotes a question word and $AW_i$ denotes an answer word.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510981420.7A CN106909572A (en) 2015-12-23 2015-12-23 A kind of construction method and device of question and answer knowledge base

Publications (1)

Publication Number Publication Date
CN106909572A true CN106909572A (en) 2017-06-30

Family

ID=59200005

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510981420.7A Pending CN106909572A (en) 2015-12-23 2015-12-23 A kind of construction method and device of question and answer knowledge base

Country Status (1)

Country Link
CN (1) CN106909572A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103577556A (en) * 2013-10-21 2014-02-12 北京奇虎科技有限公司 Device and method for obtaining association degree of question and answer pair
CN103577558A (en) * 2013-10-21 2014-02-12 北京奇虎科技有限公司 Device and method for optimizing search ranking of frequently asked question and answer pairs
CN103810218A (en) * 2012-11-14 2014-05-21 北京百度网讯科技有限公司 Problem cluster-based automatic asking and answering method and device

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109947905B (en) * 2017-08-15 2023-02-21 富士通株式会社 Method and equipment for generating question and answer pairs
CN109947905A (en) * 2017-08-15 2019-06-28 富士通株式会社 Generate the method and apparatus for puing question to answer pair
CN109785698A (en) * 2017-11-13 2019-05-21 上海流利说信息技术有限公司 Method, apparatus, electronic equipment and medium for spoken language proficiency evaluation and test
CN110019739A (en) * 2017-11-30 2019-07-16 上海智臻智能网络科技股份有限公司 Answering method and device, computer equipment and storage medium based on necessary condition
WO2019153612A1 (en) * 2018-02-09 2019-08-15 平安科技(深圳)有限公司 Question and answer data processing method, electronic device and storage medium
CN108664644A (en) * 2018-05-16 2018-10-16 微梦创科网络科技(中国)有限公司 A kind of question answering system construction method, question and answer processing method and processing device
CN108846138B (en) * 2018-07-10 2022-06-07 苏州大学 Question classification model construction method, device and medium fusing answer information
CN108846138A (en) * 2018-07-10 2018-11-20 苏州大学 A kind of the problem of fusion answer information disaggregated model construction method, device and medium
CN109460453A (en) * 2018-10-09 2019-03-12 北京来也网络科技有限公司 Data processing method and device for positive negative sample
CN109284383A (en) * 2018-10-09 2019-01-29 北京来也网络科技有限公司 Text handling method and device
CN110175241B (en) * 2019-05-23 2021-08-03 腾讯科技(深圳)有限公司 Question and answer library construction method and device, electronic equipment and computer readable medium
CN110175241A (en) * 2019-05-23 2019-08-27 三角兽(北京)科技有限公司 Question and answer base construction method, device, electronic equipment and computer-readable medium
CN111046133A (en) * 2019-10-29 2020-04-21 平安科技(深圳)有限公司 Question-answering method, question-answering equipment, storage medium and device based on atlas knowledge base
CN111046133B (en) * 2019-10-29 2023-07-25 平安科技(深圳)有限公司 Question and answer method, equipment, storage medium and device based on mapping knowledge base
CN111984775A (en) * 2020-08-12 2020-11-24 北京百度网讯科技有限公司 Question and answer quality determination method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20170630