CN106909572A - Method and device for constructing a question-answer knowledge base
- Publication number
- CN106909572A (CN201510981420.7A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/903—Querying
- G06F16/9032—Query formulation
- G06F16/90332—Natural language query formulation or dialogue systems
Abstract
This application discloses a method and device for constructing a question-answer knowledge base, so that the completed knowledge base can be used to evaluate question-answer pairs and improve the accuracy of that evaluation. The knowledge base consists of a plurality of question-answer records, and the method includes: obtaining the content of a question-answer pair and the category to which the pair belongs; extracting the words in the question content and the words in the answer content of the pair to obtain a question word set and an answer word set; forming an information record from each question word in the question word set paired with each answer word in the answer word set and, for each information record, calculating the semantic relatedness between the answer word and the question word under the category to which the pair belongs; and forming one question-answer record from one question word, the answer words in the answer word set, and the semantic relatedness between each of those answer words and the question word.
Description
Technical field
This application relates to the field of computer technology, and in particular to a method and device for constructing a question-answer knowledge base.
Background
With the rapid development of Internet technology, users who run into problems in daily life or at work increasingly look for answers in question-answer communities or through other web searches. In its basic form, a question-answer community lets a user post a question according to his or her own needs, and other users provide answers. This gives users a new channel for obtaining answers online. However, because any user can freely create content, that is, both questions and answers, the quality of information in a question-answer community varies widely. The quality of question-answer pairs therefore needs to be evaluated, so that higher-quality pairs can be ranked higher or lower-quality pairs deleted.
At present, the quality of a question-answer pair is evaluated simply by using related-word coverage features to describe the degree of semantic matching between the question and the answer. This approach stays at the lexical level. Moreover, many questions and answers share no related words, which yields a semantic matching degree of 0, even though the semantic matching between question and answer is precisely the core of a pair's quality. For example, suppose a question in a question-answer community is "Which city is the provincial capital of Shandong?" and it has two answers: "Jinan" and "The provincial capital of Shandong is Beijing". When the prior art evaluates quality using related-word coverage features, the pair formed by the question and "The provincial capital of Shandong is Beijing" is considered high quality, whereas the semantic matching degree between the question and "Jinan" is 0, so that pair is considered low quality. This clearly does not match reality. It is therefore necessary to construct a question-answer knowledge base in advance and then use it to evaluate question-answer pairs.
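The failure described above can be illustrated with a minimal sketch. The text does not give the exact coverage formula used by the prior art, so the definition below (the fraction of question words that reappear in the answer) is an assumption, and the word lists are segmented by hand for illustration.

```python
def coverage_score(question_words, answer_words):
    """Assumed coverage feature: share of question words that also occur in the answer."""
    q = set(question_words)
    if not q:
        return 0.0
    return len(q & set(answer_words)) / len(q)


question = ["Shandong", "provincial capital", "city"]
correct_answer = ["Jinan"]                                   # right, but no shared words
wrong_answer = ["Shandong", "provincial capital", "Beijing"]  # wrong, but high overlap

print(coverage_score(question, correct_answer))  # 0.0
print(coverage_score(question, wrong_answer))    # 2 of 3 question words are covered
```

Under this assumed feature the wrong answer outranks the correct one, which is the behavior the background section criticizes.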
Summary of the invention
To solve the above technical problem, the embodiments of this application provide a method and device for constructing a question-answer knowledge base, so that the completed knowledge base can be used to evaluate question-answer pairs and thereby improve the accuracy of that evaluation.
The embodiments of this application adopt the following technical solutions:
A method for constructing a question-answer knowledge base, the knowledge base consisting of a plurality of question-answer records, the method including:
obtaining the content of a question-answer pair and the category to which the pair belongs;
extracting the words in the question content and the words in the answer content of the pair to obtain a question word set and an answer word set;
forming an information record from each question word in the question word set paired with each answer word in the answer word set and, for each information record, calculating the semantic relatedness between the answer word and the question word under the category to which the pair belongs;
forming one question-answer record from one question word, the answer words in the answer word set, and the semantic relatedness between each of those answer words and the question word.
Preferably, calculating the semantic relatedness between the answer word and the question word under the category to which the pair belongs specifically includes:
calculating the probability that the answer word belongs to the category, calculating the specificity with which the answer word explains the question word in the category, and calculating the strength with which the question word is explained by the answer word in the category;
multiplying the probability, the specificity, and the strength, the resulting product being the semantic relatedness between the answer word and the question word under the category to which the pair belongs.
Preferably, the probability that the answer word belongs to the category is calculated as:
[formula not reproduced]
The specificity with which the answer word explains the question word in the category is calculated as:
[formula not reproduced]
The strength with which the question word is explained by the answer word in the category is calculated as:
[formula not reproduced]
The probability, the specificity, and the strength are multiplied as:
weight(QW_i, AW_j | C = C_k) = P(C_k | AW_j) * specific(QW_i, AW_j | C = C_k) * interpret(QW_i, AW_j | C = C_k)
where:
P(C_k | AW_j) is the probability that answer word AW_j belongs to category C_k;
specific(QW_i, AW_j | C = C_k) is the specificity with which answer word AW_j explains question word QW_i in category C_k;
interpret(QW_i, AW_j | C = C_k) is the strength with which question word QW_i is explained by answer word AW_j in category C_k;
P(C_k) denotes the probability that category C_k occurs; P(AW_j) denotes the probability that the answer word is AW_j; P(AW_j | C_k) denotes the probability of answer word AW_j given category C_k; #(QW_i, AW_j) denotes the number of times the question word is QW_i and the answer word is AW_j; #(AW_j) denotes the number of times the answer word is AW_j.
Preferably, the question-answer pair is a high-quality question-answer pair, question-answer pairs being divided into high-quality pairs and low-quality pairs.
Preferably, extracting the words in the question content and the words in the answer content of the pair to obtain the question word set and the answer word set specifically includes:
performing word segmentation, stop-word removal, word merging, and entity-word extraction on the question content and the answer content respectively, to obtain the question word set and the answer word set as follows:
(<QW_1, QW_2, …, QW_i, …, QW_m>, <AW_1, AW_2, …, AW_j, …, AW_n>)
where QW_i denotes a question word and AW_j denotes an answer word.
The embodiments of this application also provide a device for constructing a question-answer knowledge base, the knowledge base consisting of a plurality of question-answer records, the device including an acquisition module, an extraction module, a computing module, and a question-answer record creation module, wherein:
the acquisition module is configured to obtain the content of a question-answer pair and the category to which the pair belongs;
the extraction module is configured to extract the words in the question content and the words in the answer content of the pair to obtain a question word set and an answer word set;
the computing module is configured to form an information record from each question word in the question word set paired with each answer word in the answer word set and, for each information record, to calculate the semantic relatedness between the answer word and the question word under the category to which the pair belongs;
the question-answer record creation module is configured to form one question-answer record from one question word, the answer words in the answer word set, and the semantic relatedness between each of those answer words and the question word.
Preferably, the computing module specifically includes a computing unit, wherein:
the computing unit is configured to calculate the probability that the answer word belongs to the category, to calculate the specificity with which the answer word explains the question word in the category, and to calculate the strength with which the question word is explained by the answer word in the category;
and to multiply the probability, the specificity, and the strength, the resulting product being the semantic relatedness between the answer word and the question word under the category to which the pair belongs.
Preferably, the processing of the computing unit includes:
calculating the probability that the answer word belongs to the category as:
[formula not reproduced]
calculating the specificity with which the answer word explains the question word in the category as:
[formula not reproduced]
calculating the strength with which the question word is explained by the answer word in the category as:
[formula not reproduced]
multiplying the probability, the specificity, and the strength as:
weight(QW_i, AW_j | C = C_k) = P(C_k | AW_j) * specific(QW_i, AW_j | C = C_k) * interpret(QW_i, AW_j | C = C_k)
where:
P(C_k | AW_j) is the probability that answer word AW_j belongs to category C_k;
specific(QW_i, AW_j | C = C_k) is the specificity with which answer word AW_j explains question word QW_i in category C_k;
interpret(QW_i, AW_j | C = C_k) is the strength with which question word QW_i is explained by answer word AW_j in category C_k;
P(C_k) denotes the probability that category C_k occurs; P(AW_j) denotes the probability that the answer word is AW_j; P(AW_j | C_k) denotes the probability of answer word AW_j given category C_k; #(QW_i, AW_j) denotes the number of times the question word is QW_i and the answer word is AW_j; #(AW_j) denotes the number of times the answer word is AW_j.
Preferably, the acquisition module specifically includes a selection unit, wherein the selection unit is configured to select high-quality question-answer pairs, question-answer pairs being divided into high-quality pairs and low-quality pairs.
Preferably, the extraction module is specifically configured to perform word segmentation, stop-word removal, word merging, and entity-word extraction on the question content and the answer content respectively, to obtain the question word set and the answer word set as follows:
(<QW_1, QW_2, …, QW_i, …, QW_m>, <AW_1, AW_2, …, AW_j, …, AW_n>)
where QW_i denotes a question word and AW_j denotes an answer word.
At least one of the above technical solutions adopted by the embodiments of this application can achieve the following beneficial effects. After the content of a question-answer pair and the category to which the pair belongs are obtained, a question word set and an answer word set are derived. Each question word in the question word set is paired with each answer word in the answer word set to form an information record, and for each information record the semantic relatedness between the answer word and the question word under the category to which the pair belongs is calculated. One question word, the answer words in the answer word set, and the semantic relatedness between each of those answer words and the question word form one question-answer record. The completed knowledge base, which contains a plurality of question-answer records, is then used to evaluate question-answer pairs, ultimately improving the accuracy of that evaluation.
Brief description of the drawings
The accompanying drawings described here provide a further understanding of this application and form a part of it. The illustrative embodiments of this application and their descriptions are used to explain the application and do not unduly limit it. In the drawings:
Fig. 1 is a flowchart of a method for constructing a question-answer knowledge base provided by an embodiment of this application;
Fig. 2 is a detailed schematic diagram of question-answer records provided by an embodiment of this application;
Fig. 3 is a flowchart of the construction of a question-answer knowledge base provided by an embodiment of this application;
Fig. 4 is a schematic diagram of a device for constructing a question-answer knowledge base provided by an embodiment of this application.
Detailed description of the embodiments
To make the purpose, technical solutions, and advantages of this application clearer, the technical solutions of this application are described clearly and completely below with reference to specific embodiments and the corresponding drawings. The described embodiments are only some, not all, of the embodiments of this application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of this application without creative effort fall within the scope of protection of this application.
Fig. 1 is a flowchart of a method for constructing a question-answer knowledge base provided by an embodiment of this application, which includes the following steps:
Step 11: Obtain the content of a question-answer pair and the category to which the pair belongs.
A question-answer pair usually arises in a network application when one user poses question content according to his or her own needs and other users provide answer content. One question content may correspond to multiple answer contents; here, one question content together with one answer content is treated as one question-answer pair. To obtain question-answer pairs, a crawler can be used to fetch web pages containing high-quality question-answer pairs from the Internet. High quality is defined relative to low-quality pairs: the answer content of a high-quality pair answers the question content of the pair well. The pages containing high-quality pairs may come from cQA communities and major professional forums, and parsing these pages yields a large amount of question-answer pair content.
In addition, while the question-answer pairs are obtained, the category information of each pair can also be extracted. These categories may be the classification that the question-answer community or the web page applies to its question-answer pairs as a whole; for example, all pairs may be divided into categories such as games, medical and health, sports, reading, and business.
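As a rough illustration of the output of step 11, the sketch below turns one question with several answers into separate question-answer pairs that share a category. The `QAPair` type, the `to_pairs` helper, and the category label "geography" are hypothetical names introduced here; the crawling and parsing stage itself is not shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QAPair:
    question: str   # question content
    answer: str     # one answer content
    category: str   # category assigned by the community or page


def to_pairs(question, answers, category):
    """Pair one question with each of its answers, keeping the shared category."""
    return [QAPair(question, answer, category) for answer in answers]


pairs = to_pairs(
    "Which city is the provincial capital of Shandong?",
    ["Jinan", "The provincial capital of Shandong is Jinan"],
    "geography",  # hypothetical category label
)
```

Each element of `pairs` is then processed independently by the later steps.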
Step 12: Extract the words in the question content and the words in the answer content of the pair to obtain a question word set and an answer word set.
Specifically, the question content and the answer content can each be processed by word segmentation, stop-word removal, word merging, and entity-word extraction, and the resulting question word set and answer word set can take the following form:
(<QW_1, QW_2, …, QW_i, …, QW_m>, <AW_1, AW_2, …, AW_j, …, AW_n>),
where QW is a question word, AW is an answer word, and the subscript is the index of the question word or answer word. For example, if the question content of a pair is "Which city is the provincial capital of Shandong?" and the corresponding answer content is "The provincial capital of Shandong is Jinan", the processing above can yield the question word set and answer word set (<Shandong_1, provincial capital_2, city_3>, <Shandong_1, provincial capital_2, Jinan_3>).
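The extraction step can be sketched on the English rendering of the example. This is a simplification under stated assumptions: the original operates on Chinese text and needs a real word segmenter, the stop-word list and the phrase table are hypothetical, and entity-word extraction is omitted.

```python
import re

STOP_WORDS = {"which", "is", "the", "of"}                      # hypothetical stop-word list
PHRASES = {("provincial", "capital"): "provincial capital"}    # hypothetical merge table


def extract_words(text):
    """Tokenize, drop stop words, then merge known two-word phrases."""
    tokens = re.findall(r"[A-Za-z]+", text)
    tokens = [t for t in tokens if t.lower() not in STOP_WORDS]
    words, i = [], 0
    while i < len(tokens):
        pair = tuple(t.lower() for t in tokens[i:i + 2])
        if pair in PHRASES:
            words.append(PHRASES[pair])  # merge the two tokens into one word
            i += 2
        else:
            words.append(tokens[i])
            i += 1
    return words


question_words = extract_words("Which city is the provincial capital of Shandong")
answer_words = extract_words("The provincial capital of Shandong is Jinan")
```

The resulting lists contain the same words as the example sets above; their order follows English word order rather than the order in the original text.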
Step 13: Form an information record from each question word in the question word set paired with each answer word in the answer word set and, for each information record, calculate the semantic relatedness between the answer word and the question word under the category to which the pair belongs.
Continuing the previous example, each question word in the question word set <Shandong_1, provincial capital_2, city_3> is paired with each answer word in the answer word set <Shandong_1, provincial capital_2, Jinan_3>, giving nine information records in total, of the form (Shandong_1, Shandong_1), (Shandong_1, provincial capital_2), (Shandong_1, Jinan_3), (provincial capital_2, Shandong_1), and so on.
For each information record, the semantic relatedness between the answer word and the question word under the category to which the pair belongs is calculated, yielding a specific numerical value.
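Pairing every question word with every answer word is a Cartesian product. A minimal sketch using the example word sets, with each information record represented as a plain tuple (a representation chosen here for illustration):

```python
from itertools import product

question_words = ["Shandong", "provincial capital", "city"]
answer_words = ["Shandong", "provincial capital", "Jinan"]

# One information record per (question word, answer word) combination.
records = list(product(question_words, answer_words))

print(len(records))  # 9
print(records[:4])
```

The first four tuples correspond to the four records listed in the text.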
Step 14: Form one question-answer record from one question word, the answer words in the answer word set, and the semantic relatedness between each of those answer words and the question word.
In general, the processing of step 12 yields multiple question words and multiple answer words for a question-answer pair. Here, one of the question words, the answer words in the answer word set, and the semantic relatedness between each answer word and that question word form one question-answer record, so that after the processing of step 13 one question-answer pair ultimately gives rise to multiple question-answer records.
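The grouping in step 14 can be sketched as follows. The dictionary layout, the `build_records` helper, and the category label are assumptions made for illustration, and `toy_relatedness` is a placeholder score, not the relatedness formula of this application.

```python
def build_records(question_words, answer_words, category, relatedness):
    """Return one question-answer record per question word."""
    return [
        {
            "question_word": qw,
            "category": category,
            "answers": {aw: relatedness(qw, aw, category) for aw in answer_words},
        }
        for qw in question_words
    ]


def toy_relatedness(qw, aw, category):
    # Placeholder only: 1.0 for identical words, 0.5 otherwise.
    return 1.0 if qw == aw else 0.5


records = build_records(
    ["Shandong", "provincial capital", "city"],
    ["Shandong", "provincial capital", "Jinan"],
    "geography",  # hypothetical category label
    toy_relatedness,
)
```

With three question words the pair yields three records, each holding all three answer words and their scores.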
It should be noted that steps 11, 12, and 13 above describe the processing of a single question-answer pair; the question-answer knowledge base is ultimately built by applying these steps to a large number of high-quality question-answer pairs.
In the above embodiment, a question-answer knowledge base is constructed by processing a large number of question-answer pairs, and the completed knowledge base, which contains a plurality of question-answer records, is then used to evaluate the question-answer pairs to be evaluated, ultimately improving the accuracy of that evaluation.
In step 13, calculating the semantic relatedness between the answer word and the question word under the category to which the pair belongs can specifically include: calculating the probability that the answer word belongs to the category, calculating the specificity with which the answer word explains the question word in the category, and calculating the strength with which the question word is explained by the answer word in the category; and multiplying the probability, the specificity, and the strength, the resulting product being the semantic relatedness between the answer word and the question word under the category of the pair.
The probability that the answer word belongs to the category can specifically be calculated as:
[formula not reproduced]
The specificity with which the answer word explains the question word in the category can specifically be calculated as:
[formula not reproduced]
The strength with which the question word is explained by the answer word in the category can specifically be calculated as:
[formula not reproduced]
The probability, the specificity, and the strength can specifically be multiplied as:
weight(QW_i, AW_j | C = C_k) = P(C_k | AW_j) * specific(QW_i, AW_j | C = C_k) * interpret(QW_i, AW_j | C = C_k)
In the above formulas, P(C_k | AW_j) is the probability that answer word AW_j belongs to category C_k;
specific(QW_i, AW_j | C = C_k) is the specificity with which answer word AW_j explains question word QW_i in category C_k;
interpret(QW_i, AW_j | C = C_k) is the strength with which question word QW_i is explained by answer word AW_j in category C_k;
P(C_k) denotes the probability that category C_k occurs; P(AW_j) denotes the probability that the answer word is AW_j; P(AW_j | C_k) denotes the probability of answer word AW_j given category C_k; #(QW_i, AW_j) denotes the number of times the question word is QW_i and the answer word is AW_j; #(AW_j) denotes the number of times the answer word is AW_j.
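The product form of the weight can be sketched numerically. The individual formulas for the three factors are not reproduced in this text, so the sketch makes two assumptions: P(C_k | AW_j) is obtained from the listed quantities P(AW_j | C_k), P(C_k), and P(AW_j) by Bayes' rule, and the specificity and strength are supplied as already-computed numbers. All input values are invented for illustration.

```python
def p_category_given_word(p_word_given_cat, p_cat, p_word):
    """Assumed Bayes' rule: P(C_k | AW_j) = P(AW_j | C_k) * P(C_k) / P(AW_j)."""
    return p_word_given_cat * p_cat / p_word


def weight(p_cat_given_word, specific, interpret):
    """Semantic relatedness as the product of the three factors."""
    return p_cat_given_word * specific * interpret


# Made-up inputs, not taken from the application.
p = p_category_given_word(p_word_given_cat=0.02, p_cat=0.25, p_word=0.01)  # roughly 0.5
w = weight(p, specific=0.8, interpret=0.5)                                  # roughly 0.2
```

Because the three factors are multiplied, a low value for any one of them pulls the relatedness toward zero.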
The question word set and answer word set obtained in step 12 can be stored in the following format:
(<QW_1, QW_2, …, QW_i, …, QW_m>, <AW_1, AW_2, …, AW_j, …, AW_n>, cate1)
After the above calculation, for each question word QW_i (i = 1, 2, …, m), the semantic relatedness between AW_j (j = 1, …, n) and QW_i on category cate1 is calculated. Finally, for each QW_i, the answer words in the answer word set and the semantic relatedness between each answer word and QW_i form one question-answer record; the details of such records are shown in Fig. 2, which shows only three question-answer records.
It should be noted that one question-answer pair can yield multiple question-answer records after processing. Each question-answer record includes one question word, multiple answer words, and the semantic relatedness between each answer word and the question word. It should also be noted that the semantic relatedness in a question-answer record is the relatedness under one category; the semantic relatedness of the answer words and the question word can also be calculated under different categories. Multiple such question-answer records ultimately make up the question-answer knowledge base. The steps and details of building the knowledge base can be as shown in Fig. 3.
To illustrate the technical solution provided by the embodiments of this application more clearly, part of a question-answer knowledge base is described below. As shown in Table 1, it includes three question-answer records, where the value after each answer word is the semantic relatedness between that answer word and the question word under the category medical and health.
Table 1: Example question-answer records
When the above knowledge base is used to evaluate the question-answer pair shown in Table 2, the words in the question content and the words in the answer content of the pair to be evaluated are obtained. The answer words selected from the knowledge base are [oral, cough and asthma, xiao'er ganmao granules, check, cough-relieving, treatment, flu-like symptom, cold granules]. The semantic relatedness between these answer words and the question words in Table 1 is calculated, and the resulting value is finally used to evaluate the quality of the pair.
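How a finished knowledge base might score a pair can be sketched as a lookup. The text does not state how the per-word relatedness values are combined, so the aggregation below (best match per question word, averaged over question words) is an assumption, as are the question words and all numeric values; the figures of Table 1 and the content of Table 2 are not reproduced in this text.

```python
def score_pair(question_words, answer_words, category, knowledge_base):
    """Average, over question words, of the best relatedness to any answer word."""
    if not question_words:
        return 0.0
    total = 0.0
    for qw in question_words:
        related = knowledge_base.get((qw, category), {})
        total += max((related.get(aw, 0.0) for aw in answer_words), default=0.0)
    return total / len(question_words)


# Illustrative records only, keyed by (question word, category).
kb = {
    ("cold", "medical and health"): {"cold granules": 0.9, "treatment": 0.6},
    ("child", "medical and health"): {"xiao'er ganmao granules": 0.8},
}

score = score_pair(["cold", "child"], ["cold granules", "oral"], "medical and health", kb)
```

In this sketch the question and the answer share no word, yet the score is nonzero because the knowledge base links "cold" to "cold granules".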
The calculation shows that the semantic relatedness of this question-answer pair reaches 0.9 (semantic relatedness ranges from 0 to 1). The application can therefore handle well question-answer pairs that have no related-word coverage but high semantic similarity. When the prior art evaluates the pair shown in Table 2, it relies only on the absence of related-word coverage between question and answer, concludes that the pair has no semantic relatedness, and classifies it as a low-quality pair. The method for building a question-answer knowledge base provided by this application, together with the completed knowledge base containing a plurality of question-answer records, is used to evaluate question-answer pairs and ultimately improves the accuracy of that evaluation.
Table 2: Question-answer pair to be evaluated
The embodiments above are all method embodiments of this application. Correspondingly, this application also provides an embodiment of a device for constructing a question-answer knowledge base, the knowledge base consisting of a plurality of question-answer records. As shown in Fig. 4, the device includes an acquisition module 21, an extraction module 22, a computing module 23, and a question-answer record creation module 24, wherein:
the acquisition module 21 may be configured to obtain the content of a question-answer pair and the category to which the pair belongs;
the extraction module 22 may be configured to extract the words in the question content and the words in the answer content of the pair to obtain a question word set and an answer word set;
the computing module 23 may be configured to form an information record from each question word in the question word set paired with each answer word in the answer word set and, for each information record, to calculate the semantic relatedness between the answer word and the question word under the category to which the pair belongs;
the question-answer record creation module 24 may be configured to form one question-answer record from one question word, the answer words in the answer word set, and the semantic relatedness between each of those answer words and the question word.
The computing module 23 specifically includes a computing unit, wherein the computing unit is configured to calculate the probability that the answer word belongs to the category, to calculate the specificity with which the answer word explains the question word in the category, and to calculate the strength with which the question word is explained by the answer word in the category; and to multiply the probability, the specificity, and the strength, the resulting product being the semantic relatedness between the answer word and the question word under the category to which the pair belongs.
The processing of the computing unit includes:
calculating the probability that the answer word belongs to the category as:
[formula not reproduced]
calculating the specificity with which the answer word explains the question word in the category as:
[formula not reproduced]
calculating the strength with which the question word is explained by the answer word in the category as:
[formula not reproduced]
multiplying the probability, the specificity, and the strength as:
weight(QW_i, AW_j | C = C_k) = P(C_k | AW_j) * specific(QW_i, AW_j | C = C_k) * interpret(QW_i, AW_j | C = C_k)
where:
P(C_k | AW_j) is the probability that answer word AW_j belongs to category C_k;
specific(QW_i, AW_j | C = C_k) is the specificity with which answer word AW_j explains question word QW_i in category C_k;
interpret(QW_i, AW_j | C = C_k) is the strength with which question word QW_i is explained by answer word AW_j in category C_k;
P(C_k) denotes the probability that category C_k occurs; P(AW_j) denotes the probability that the answer word is AW_j; P(AW_j | C_k) denotes the probability of answer word AW_j given category C_k; #(QW_i, AW_j) denotes the number of times the question word is QW_i and the answer word is AW_j; #(AW_j) denotes the number of times the answer word is AW_j.
The acquisition module 21 may specifically include a selection unit, wherein the selection unit is configured to select high-quality question-answer pairs, question-answer pairs being divided into high-quality pairs and low-quality pairs.
The extraction module 22 may be specifically configured to perform word segmentation, stop-word removal, word merging, and entity-word extraction on the question content and the answer content respectively, to obtain the question word set and the answer word set as follows:
(<QW_1, QW_2, …, QW_i, …, QW_m>, <AW_1, AW_2, …, AW_j, …, AW_n>)
where QW_i denotes a question word and AW_j denotes an answer word.
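One way to picture how the four modules cooperate is a small class that wires them together. This is a hypothetical arrangement, not the device of Fig. 4: the class name, the tuple layout of a record, the whitespace tokenizer, and the placeholder relatedness function are all assumptions made for illustration.

```python
from itertools import product


class KnowledgeBaseBuilder:
    """Hypothetical wiring of the four modules; all names are illustrative."""

    def __init__(self, extract, relatedness):
        self.extract = extract          # extraction module: text -> list of words
        self.relatedness = relatedness  # computing module: (qw, aw, category) -> float
        self.records = []               # knowledge base: list of question-answer records

    def add_pair(self, question, answer, category):
        # The arguments stand for the acquisition module's output for one pair.
        q_words, a_words = self.extract(question), self.extract(answer)
        scores = {
            (qw, aw): self.relatedness(qw, aw, category)
            for qw, aw in product(q_words, a_words)
        }
        # Record creation module: one record per question word.
        for qw in q_words:
            self.records.append(
                (qw, category, {aw: scores[(qw, aw)] for aw in a_words})
            )


builder = KnowledgeBaseBuilder(
    extract=str.split,                                        # stand-in tokenizer
    relatedness=lambda qw, aw, c: 1.0 if qw == aw else 0.5,   # placeholder score
)
builder.add_pair("Shandong capital", "Jinan", "geography")
```

Calling `add_pair` repeatedly over many high-quality pairs would accumulate the records that make up the knowledge base.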
The above are only embodiments of this application and are not intended to limit it. For those skilled in the art, this application may have various modifications and variations. Any modification, equivalent substitution, improvement, and the like made within the spirit and principles of this application shall fall within the scope of its claims.
Claims (10)
1. A method for constructing a question-answer knowledge base, characterized in that the knowledge base consists of a plurality of question-answer records, the method including:
obtaining the content of a question-answer pair and the category to which the pair belongs;
extracting the words in the question content and the words in the answer content of the pair to obtain a question word set and an answer word set;
forming an information record from each question word in the question word set paired with each answer word in the answer word set and, for each information record, calculating the semantic relatedness between the answer word and the question word under the category to which the pair belongs;
forming one question-answer record from one question word, the answer words in the answer word set, and the semantic relatedness between each of those answer words and the question word.
2. The method according to claim 1, characterized in that calculating the semantic relatedness between the answer word and the question word under the category to which the pair belongs specifically includes:
calculating the probability that the answer word belongs to the category, calculating the specificity with which the answer word explains the question word in the category, and calculating the strength with which the question word is explained by the answer word in the category;
multiplying the probability, the specificity, and the strength, the resulting product being the semantic relatedness between the answer word and the question word under the category to which the pair belongs.
3. The method according to claim 2, characterized in that:
calculating the probability that the answer word belongs to the category specifically comprises: [formula not reproduced in the source text]
calculating the specificity with which the answer word explains the question word in the category specifically comprises: [formula not reproduced in the source text]
calculating the strength with which the question word is explained by the answer word in the category specifically comprises: [formula not reproduced in the source text]
multiplying the probability, the specificity, and the strength specifically comprises:
weight(QWi, AWj | C = Ck) = P(Ck | AWj) * specific(QWi, AWj | C = Ck) * interpret(QWi, AWj | C = Ck)
where:
P(Ck | AWj) is the probability that answer word AWj belongs to category Ck;
specific(QWi, AWj | C = Ck) is the specificity with which answer word AWj explains question word QWi in category Ck;
interpret(QWi, AWj | C = Ck) is the strength with which question word QWi is explained by answer word AWj in category Ck;
P(Ck) denotes the probability that category Ck occurs; P(AWj) denotes the probability that the answer word is AWj; P(AWj | Ck) denotes the probability of answer word AWj within category Ck; #(QWi, AWj) denotes the number of times the question word is QWi and the answer word is AWj; #(AWj) denotes the number of times the answer word is AWj.
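The product in the weight formula can be sketched in Python. The individual formulas for the three factors are not present in this text, so the sketch makes one loudly stated assumption: that P(Ck | AWj) is obtained from P(AWj | Ck), P(Ck), and P(AWj) by Bayes' rule, since those are exactly the quantities the claim defines. The specificity and strength values are passed in as plain numbers, and the function names are hypothetical.

```python
def category_posterior(p_answer_given_category, p_category, p_answer):
    """Assumed Bayes' rule: P(Ck | AWj) = P(AWj | Ck) * P(Ck) / P(AWj)."""
    if p_answer <= 0:
        raise ValueError("P(AWj) must be positive")
    return p_answer_given_category * p_category / p_answer

def weight(p_category_given_answer, specific, interpret):
    """Semantic relevance as the product of the three factors named in the claim."""
    return p_category_given_answer * specific * interpret

# Illustrative numbers only; they do not come from the patent.
posterior = category_posterior(
    p_answer_given_category=0.02, p_category=0.25, p_answer=0.01
)
relevance = weight(posterior, specific=0.8, interpret=0.5)
```

Because the three factors are multiplied, a value near zero for any one of them drives the overall relevance toward zero.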
4. The method according to claim 1, characterized in that the question-answer pair is a high-quality question-answer pair, where question-answer pairs are divided into high-quality pairs and low-quality pairs.
5. The method according to any one of claims 1 to 4, characterized in that extracting the words in the question content and the words in the answer content of the question-answer pair to obtain a question word set and an answer word set specifically comprises:
extracting entity words from the question content and the answer content respectively, by word segmentation, stop-word removal, and word joining, obtaining the question word set and the answer word set as follows:
(<QW1, QW2, …, QWi, …, QWm>, <AW1, AW2, …, AWi, …, AWn>)
where QWi denotes a question word and AWi denotes an answer word.
6. A device for constructing a question-answer knowledge base, characterized in that the question-answer knowledge base consists of a plurality of question-answer records, the device comprising an acquisition module, an extraction module, a computing module, and a question-answer record creation module, wherein:
the acquisition module is configured to obtain the content of a question-answer pair and the category to which the question-answer pair belongs;
the extraction module is configured to extract the words in the question content and the words in the answer content of the question-answer pair to obtain a question word set and an answer word set;
the computing module is configured to form an information record from each question word in the question word set paired with each answer word in the answer word set, and, for each information record, to calculate the semantic relevance of the answer word and the question word under the category to which the question-answer pair belongs;
the question-answer record creation module is configured to form a question-answer record from one question word, the plurality of answer words in the answer word set, and the semantic relevance between each of those answer words and the question word.
7. The device according to claim 6, characterized in that the computing module specifically comprises a computing unit, wherein:
the computing unit is configured to calculate the probability that the answer word belongs to the category, to calculate the specificity with which the answer word explains the question word in the category, and to calculate the strength with which the question word is explained by the answer word in the category;
and to multiply the probability, the specificity, and the strength, the resulting product being the semantic relevance of the answer word and the question word under the category to which the question-answer pair belongs.
8. The device according to claim 7, characterized in that the processing performed by the computing unit comprises:
calculating the probability that the answer word belongs to the category, which specifically comprises: [formula not reproduced in the source text]
calculating the specificity with which the answer word explains the question word in the category, which specifically comprises: [formula not reproduced in the source text]
calculating the strength with which the question word is explained by the answer word in the category, which specifically comprises: [formula not reproduced in the source text]
multiplying the probability, the specificity, and the strength, which specifically comprises:
weight(QWi, AWj | C = Ck) = P(Ck | AWj) * specific(QWi, AWj | C = Ck) * interpret(QWi, AWj | C = Ck)
where:
P(Ck | AWj) is the probability that answer word AWj belongs to category Ck;
specific(QWi, AWj | C = Ck) is the specificity with which answer word AWj explains question word QWi in category Ck;
interpret(QWi, AWj | C = Ck) is the strength with which question word QWi is explained by answer word AWj in category Ck;
P(Ck) denotes the probability that category Ck occurs; P(AWj) denotes the probability that the answer word is AWj; P(AWj | Ck) denotes the probability of answer word AWj within category Ck; #(QWi, AWj) denotes the number of times the question word is QWi and the answer word is AWj; #(AWj) denotes the number of times the answer word is AWj.
9. The device according to claim 6, characterized in that the acquisition module specifically comprises a selection unit, wherein the selection unit is configured to select high-quality question-answer pairs, where question-answer pairs are divided into high-quality pairs and low-quality pairs.
10. The device according to any one of claims 6 to 9, characterized in that:
the extraction module is specifically configured to extract entity words from the question content and the answer content respectively, by word segmentation, stop-word removal, and word joining, obtaining the question word set and the answer word set as follows:
(<QW1, QW2, …, QWi, …, QWm>, <AW1, AW2, …, AWi, …, AWn>)
where QWi denotes a question word and AWi denotes an answer word.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510981420.7A CN106909572A (en) | 2015-12-23 | 2015-12-23 | A kind of construction method and device of question and answer knowledge base |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510981420.7A CN106909572A (en) | 2015-12-23 | 2015-12-23 | A kind of construction method and device of question and answer knowledge base |
Publications (1)
Publication Number | Publication Date |
---|---|
CN106909572A (en) | 2017-06-30 |
Family
ID=59200005
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201510981420.7A Pending CN106909572A (en) | 2015-12-23 | 2015-12-23 | A kind of construction method and device of question and answer knowledge base |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106909572A (en) |
Cited By (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108664644A (en) * | 2018-05-16 | 2018-10-16 | 微梦创科网络科技(中国)有限公司 | A kind of question answering system construction method, question and answer processing method and processing device |
CN108846138A (en) * | 2018-07-10 | 2018-11-20 | 苏州大学 | A kind of the problem of fusion answer information disaggregated model construction method, device and medium |
CN109284383A (en) * | 2018-10-09 | 2019-01-29 | 北京来也网络科技有限公司 | Text handling method and device |
CN109460453A (en) * | 2018-10-09 | 2019-03-12 | 北京来也网络科技有限公司 | Data processing method and device for positive negative sample |
CN109785698A (en) * | 2017-11-13 | 2019-05-21 | 上海流利说信息技术有限公司 | Method, apparatus, electronic equipment and medium for spoken language proficiency evaluation and test |
CN109947905A (en) * | 2017-08-15 | 2019-06-28 | 富士通株式会社 | Generate the method and apparatus for puing question to answer pair |
CN110019739A (en) * | 2017-11-30 | 2019-07-16 | 上海智臻智能网络科技股份有限公司 | Answering method and device, computer equipment and storage medium based on necessary condition |
WO2019153612A1 (en) * | 2018-02-09 | 2019-08-15 | 平安科技(深圳)有限公司 | Question and answer data processing method, electronic device and storage medium |
CN110175241A (en) * | 2019-05-23 | 2019-08-27 | 三角兽(北京)科技有限公司 | Question and answer base construction method, device, electronic equipment and computer-readable medium |
CN111046133A (en) * | 2019-10-29 | 2020-04-21 | 平安科技(深圳)有限公司 | Question-answering method, question-answering equipment, storage medium and device based on atlas knowledge base |
CN111984775A (en) * | 2020-08-12 | 2020-11-24 | 北京百度网讯科技有限公司 | Question and answer quality determination method, device, equipment and storage medium |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103577556A (en) * | 2013-10-21 | 2014-02-12 | 北京奇虎科技有限公司 | Device and method for obtaining association degree of question and answer pair |
CN103577558A (en) * | 2013-10-21 | 2014-02-12 | 北京奇虎科技有限公司 | Device and method for optimizing search ranking of frequently asked question and answer pairs |
CN103810218A (en) * | 2012-11-14 | 2014-05-21 | 北京百度网讯科技有限公司 | Problem cluster-based automatic asking and answering method and device |
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103810218A (en) * | 2012-11-14 | 2014-05-21 | 北京百度网讯科技有限公司 | Problem cluster-based automatic asking and answering method and device |
CN103577556A (en) * | 2013-10-21 | 2014-02-12 | 北京奇虎科技有限公司 | Device and method for obtaining association degree of question and answer pair |
CN103577558A (en) * | 2013-10-21 | 2014-02-12 | 北京奇虎科技有限公司 | Device and method for optimizing search ranking of frequently asked question and answer pairs |
Cited By (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109947905B (en) * | 2017-08-15 | 2023-02-21 | 富士通株式会社 | Method and equipment for generating question and answer pairs |
CN109947905A (en) * | 2017-08-15 | 2019-06-28 | 富士通株式会社 | Generate the method and apparatus for puing question to answer pair |
CN109785698A (en) * | 2017-11-13 | 2019-05-21 | 上海流利说信息技术有限公司 | Method, apparatus, electronic equipment and medium for spoken language proficiency evaluation and test |
CN110019739A (en) * | 2017-11-30 | 2019-07-16 | 上海智臻智能网络科技股份有限公司 | Answering method and device, computer equipment and storage medium based on necessary condition |
WO2019153612A1 (en) * | 2018-02-09 | 2019-08-15 | 平安科技(深圳)有限公司 | Question and answer data processing method, electronic device and storage medium |
CN108664644A (en) * | 2018-05-16 | 2018-10-16 | 微梦创科网络科技(中国)有限公司 | A kind of question answering system construction method, question and answer processing method and processing device |
CN108846138B (en) * | 2018-07-10 | 2022-06-07 | 苏州大学 | Question classification model construction method, device and medium fusing answer information |
CN108846138A (en) * | 2018-07-10 | 2018-11-20 | 苏州大学 | A kind of the problem of fusion answer information disaggregated model construction method, device and medium |
CN109460453A (en) * | 2018-10-09 | 2019-03-12 | 北京来也网络科技有限公司 | Data processing method and device for positive negative sample |
CN109284383A (en) * | 2018-10-09 | 2019-01-29 | 北京来也网络科技有限公司 | Text handling method and device |
CN110175241B (en) * | 2019-05-23 | 2021-08-03 | 腾讯科技(深圳)有限公司 | Question and answer library construction method and device, electronic equipment and computer readable medium |
CN110175241A (en) * | 2019-05-23 | 2019-08-27 | 三角兽(北京)科技有限公司 | Question and answer base construction method, device, electronic equipment and computer-readable medium |
CN111046133A (en) * | 2019-10-29 | 2020-04-21 | 平安科技(深圳)有限公司 | Question-answering method, question-answering equipment, storage medium and device based on atlas knowledge base |
CN111046133B (en) * | 2019-10-29 | 2023-07-25 | 平安科技(深圳)有限公司 | Question and answer method, equipment, storage medium and device based on mapping knowledge base |
CN111984775A (en) * | 2020-08-12 | 2020-11-24 | 北京百度网讯科技有限公司 | Question and answer quality determination method, device, equipment and storage medium |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CN106909572A (en) | A kind of construction method and device of question and answer knowledge base | |
CN105447206B (en) | New comment object identifying method and system based on word2vec algorithms | |
CN110175325A (en) | The comment and analysis method and Visual Intelligent Interface Model of word-based vector sum syntactic feature | |
CN104268160B (en) | A kind of OpinionTargetsExtraction Identification method based on domain lexicon and semantic role | |
CN102236722B (en) | Method and system for generating user comment summaries based on triples | |
CN103699626B (en) | Method and system for analysing individual emotion tendency of microblog user | |
CN106776711A (en) | A kind of Chinese medical knowledge mapping construction method based on deep learning | |
CN107368547A (en) | A kind of intelligent medical automatic question-answering method based on deep learning | |
CN106980692A (en) | A kind of influence power computational methods based on microblogging particular event | |
CN106909573A (en) | A kind of method and apparatus for evaluating question and answer to quality | |
CN103577556A (en) | Device and method for obtaining association degree of question and answer pair | |
CN105843897A (en) | Vertical domain-oriented intelligent question and answer system | |
CN107305539A (en) | A kind of text tendency analysis method based on Word2Vec network sentiment new word discoveries | |
CN109033166B (en) | Character attribute extraction training data set construction method | |
CN104951518B (en) | One kind recommends method based on the newer context of dynamic increment | |
CN102682120B (en) | Method and device for acquiring essential article commented on network | |
CN105528437A (en) | Question-answering system construction method based on structured text knowledge extraction | |
CN104484380A (en) | Personalized search method and personalized search device | |
CN107357785A (en) | Theme feature word abstracting method and system, feeling polarities determination methods and system | |
CN106407235A (en) | A semantic dictionary establishing method based on comment data | |
CN106446072A (en) | Webpage content processing method and apparatus | |
CN107402912A (en) | Parse semantic method and apparatus | |
CN106446147A (en) | Emotion analysis method based on structuring features | |
CN108363699A (en) | A kind of netizen's school work mood analysis method based on Baidu's mhkc | |
CN114547293A (en) | Cross-platform false news detection method and system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20170630 |