CN108959529A - Method, apparatus, device and storage medium for determining a question answer type - Google Patents

Method, apparatus, device and storage medium for determining a question answer type

Info

Publication number
CN108959529A
Authority
CN
China
Prior art keywords
lat
result
query statement
question
answer class
Legal status
Pending
Application number
CN201810695686.9A
Other languages
Chinese (zh)
Inventor
郑俊强
时迎超
丁宇辰
佘俏俏
Current Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN201810695686.9A
Publication of CN108959529A

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiments of the invention disclose a method, apparatus, device and storage medium for determining a question answer type. The method comprises: extracting feature information of an input question-and-answer (Q&A) query; inputting the feature information into a pre-established sequence labeling model and a pre-established classification model to obtain a first lexical answer type (LAT) result output by the sequence labeling model and a second LAT result output by the classification model; and determining the LAT result corresponding to the Q&A query according to the first LAT result and the second LAT result. Because the final LAT result of the Q&A query is determined from the LAT results output separately by the sequence labeling model and the classification model, the method can improve the accuracy of the determined LAT result.

Description

Method, apparatus, device and storage medium for determining a question answer type
Technical field
The embodiments of the present invention relate to the technical field of information processing, and in particular to a method, apparatus, device and storage medium for determining a question answer type.
Background
When users search for information on the Internet with a search engine, the query sentences they enter can be divided into Q&A queries and non-Q&A queries. For a Q&A query, the lexical answer type (LAT) is in most cases an entity type. An entity is something that exists in the objective world and can be distinguished from other things; it may be a person, a physical object, or an abstract concept. For example, for the query "At what age do children lose their baby teeth?", the corresponding LAT result is "age"; for the query "What fruits can pregnant women eat?", the corresponding LAT result is "fruit".
In search technology, the LAT result of a Q&A query can be used to locate entity answers in the search results, that is, to select from the search results those that contain an entity corresponding to the LAT result and then present them to the user.
At present, the LAT result of a Q&A query is determined with a classification model. The algorithms used by such classification models include machine learning algorithms such as support vector machines (SVM), maximum entropy and logistic regression, as well as deep learning algorithms such as convolutional neural networks (CNN).
When the LAT of a Q&A query is determined with a classification model, the granularity of the result depends on a pre-built LAT taxonomy, which tends to make the results too coarse-grained and less accurate.
Summary of the invention
The embodiments of the present invention provide a method, apparatus, device and storage medium for determining a question answer type, which can improve the accuracy of the lexical answer type determined for a Q&A query.
In a first aspect, an embodiment of the present invention provides a method for determining a question answer type, the method comprising:
extracting feature information of an input Q&A query;
inputting the feature information into a pre-established sequence labeling model and a pre-established classification model to obtain a first lexical answer type (LAT) result output by the sequence labeling model and a second LAT result output by the classification model; and
determining the LAT result corresponding to the Q&A query according to the first LAT result and the second LAT result.
In a second aspect, an embodiment of the present invention further provides an apparatus for determining a question answer type, the apparatus comprising:
a feature information extraction module, configured to extract feature information of an input Q&A query;
a model input module, configured to input the feature information into a pre-established sequence labeling model and a pre-established classification model to obtain a first lexical answer type (LAT) result output by the sequence labeling model and a second LAT result output by the classification model; and
an LAT result determination module, configured to determine the LAT result corresponding to the Q&A query according to the first LAT result and the second LAT result.
In a third aspect, an embodiment of the present invention further provides a computer device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the method according to the embodiments of the present invention.
In a fourth aspect, an embodiment of the present invention further provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the method according to the embodiments of the present invention.
In the embodiments of the present invention, feature information of the input Q&A query is first extracted; the feature information is then input into the pre-established sequence labeling model and classification model to obtain the first lexical answer type (LAT) result output by the sequence labeling model and the second LAT result output by the classification model; finally, the LAT result of the Q&A query is determined according to the first and second LAT results. Because the final LAT result is determined from the LAT results output separately by the two models, the method can improve the accuracy of the determined LAT result.
Brief description of the drawings
Fig. 1 is a flowchart of a method for determining a question answer type according to Embodiment one of the present invention;
Fig. 2 is a flowchart of a method for determining a question answer type according to Embodiment two of the present invention;
Fig. 3 is a flowchart of a method for determining a question answer type according to Embodiment three of the present invention;
Fig. 4 is a flowchart of a method for determining a question answer type according to Embodiment four of the present invention;
Fig. 5a is a flowchart of another method for determining a question answer type according to Embodiment four of the present invention;
Fig. 5b is a flowchart of a usage scenario of the method for determining a question answer type according to Embodiment four of the present invention;
Fig. 6 is a schematic structural diagram of an apparatus for determining a question answer type according to Embodiment five of the present invention;
Fig. 7 is a schematic structural diagram of a computer device according to Embodiment six of the present invention.
Detailed description of the embodiments
The present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the present invention and do not limit it. It should also be noted that, for ease of description, the drawings show only the parts related to the present invention rather than the entire structure.
Embodiment one
Fig. 1 is a flowchart of a method for determining a question answer type according to Embodiment one of the present invention. This embodiment is applicable to determining the answer type of a question. The method may be executed by an apparatus for determining an answer type, which may be implemented in hardware and/or software and may generally be integrated in a computer, a server, or any terminal that has an answer-type determination function. As shown in Fig. 1, the method includes the following steps.
Step 110: extract the feature information of the input Q&A query.
The feature information may include at least one of word segmentation information, part-of-speech information and dependency information. Word segmentation information may be obtained by splitting the Q&A query into individual words with a word segmentation algorithm; part-of-speech information may include nouns, verbs, content words, function words, and so on; dependency information may include the word each word depends on and that word's part of speech. The dependency relations between the words of the Q&A query may be extracted by syntactic analysis based on dependency grammar, for example subject-verb-object relations, quantity relations, appositive relations, left and right adjunct relations, and comparison relations.
In this embodiment, after the user enters a Q&A query in the search box, the query is first split into individual words with a word segmentation algorithm to obtain the word segmentation information; the part of speech of each word is then analyzed to obtain the part-of-speech information; finally, syntactic analysis based on dependency grammar is used to extract the dependency relations between the words, yielding the dependency information.
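The step above can be pictured as turning one parsed query into one feature record per word. This is a minimal sketch that assumes segmentation, POS tagging and dependency parsing have already been done by some external tool; the `Token` record, the tag names (`n`, `v`, `r`, `SBV`, `ADV`, `HED`, `ATT`, `VOB`) and the feature keys are illustrative assumptions in the style of common Chinese NLP toolkits, not the format used in the patent.

```python
from dataclasses import dataclass


@dataclass
class Token:
    word: str  # segmented word
    pos: str   # part-of-speech tag (hypothetical tag set)
    head: int  # index of the word this word depends on, -1 for the root
    rel: str   # dependency relation label (hypothetical label set)


def extract_features(tokens):
    """Build one feature dict per word from segmentation, POS and dependency info."""
    features = []
    for i, tok in enumerate(tokens):
        head = tokens[tok.head] if tok.head >= 0 else None
        features.append({
            "word": tok.word,
            "pos": tok.pos,
            "rel": tok.rel,
            # The governing word and its part of speech (the dependency information).
            "head_word": head.word if head else "<ROOT>",
            "head_pos": head.pos if head else "<ROOT>",
            # Neighbouring words, a common extra context feature for sequence labeling.
            "prev_word": tokens[i - 1].word if i > 0 else "<BOS>",
            "next_word": tokens[i + 1].word if i + 1 < len(tokens) else "<EOS>",
        })
    return features


# "What fruits can pregnant women eat?" (孕妇可以吃什么水果), hand-parsed for illustration.
query = [
    Token("孕妇", "n", 2, "SBV"),  # pregnant woman
    Token("可以", "v", 2, "ADV"),  # can
    Token("吃", "v", -1, "HED"),   # eat (root)
    Token("什么", "r", 4, "ATT"),  # what
    Token("水果", "n", 2, "VOB"),  # fruit
]
features = extract_features(query)
```

For the last word, `features[4]` records that "水果" is the object of "吃" and ends the sentence; these per-word dicts are the kind of input a CRF toolkit typically consumes.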
Step 120: input the feature information into the pre-established sequence labeling model and classification model to obtain the first LAT result output by the sequence labeling model and the second LAT result output by the classification model.
Here, a lexical answer type (LAT) result may be a word that indicates the answer type of the Q&A query. For example, for the Q&A query "What fruits can pregnant women eat?", the LAT result is "fruit", a word that indicates the answer type. The word indicating the answer type may or may not appear in the Q&A query itself.
The sequence labeling model may be a conditional random field (CRF) model, obtained by training with a CRF algorithm on a first training data source. The first training data source may include: multiple randomly sampled Q&A queries, the feature information of those queries, and the LAT results annotated for them. In this embodiment, the sequence labeling model works by labeling and classifying the input feature information to obtain and output the first LAT result of the Q&A query.
The classification model may be obtained by training with a convolutional neural network (CNN) algorithm on a second training data source. The second training data source may include multiple data pairs, each consisting of a randomly sampled Q&A query and its LAT result, together with the feature information of the query in each pair. In this embodiment, the classification model works by classifying the input feature information to obtain and output the second LAT result of the Q&A query.
Specifically, after the feature information of the Q&A query is obtained, it is input into both the pre-established sequence labeling model and the pre-established classification model. After each model analyzes the feature information, the sequence labeling model outputs the first LAT result and the classification model outputs the second LAT result.
Step 130: determine the LAT result corresponding to the Q&A query according to the first LAT result and the second LAT result.
Specifically, the LAT result of the Q&A query may be determined from the first and second LAT results as follows. If the first LAT result is empty, the second LAT result is used as the LAT result of the query; if the second LAT result is empty, the first LAT result is used; if neither is empty, both are input into a decision tree model, and the integrated LAT result output by the decision tree model is used as the LAT result of the query. Alternatively, regardless of whether either result is empty, both the first and second LAT results are always input into the decision tree model, and its integrated LAT result is used as the LAT result of the query.
The integrated result may be one of the first and second LAT results, the intersection of the first and second LAT results, or their union.
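The first fusion strategy above (fall back to whichever result is non-empty, otherwise ask the decision tree) can be sketched as follows. This is a minimal sketch assuming LAT results are represented as sets of strings, so that "intersection" and "union" are well defined; `decide` is a hypothetical stand-in for the trained decision tree model and is assumed to return one of four choice labels.

```python
def combine_lat(first, second, decide):
    """Fuse the sequence-labeling LAT result (first) with the classifier's (second).

    first, second: sets of LAT strings; an empty set means "no result".
    decide: hypothetical stand-in for the decision tree model; returns
            "first", "second", "intersection" or "union".
    """
    if not first:   # sequence labeling produced nothing: use the classifier result
        return set(second)
    if not second:  # classifier produced nothing: use the sequence labeling result
        return set(first)
    choice = decide(first, second)
    if choice == "first":
        return set(first)
    if choice == "second":
        return set(second)
    if choice == "intersection":
        return first & second
    if choice == "union":
        return first | second
    raise ValueError(f"unknown choice: {choice!r}")
```

The second strategy in the text (always consult the decision tree) corresponds to dropping the two early returns.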
A decision tree model is a kind of classifier. In this embodiment, the decision tree model may be obtained by training with the gradient boosting decision tree (GBDT) algorithm on a third training data source. The third training data source may include multiple data pairs, each consisting of two different LAT results for a randomly sampled Q&A query, together with the correct LAT result annotated for the query corresponding to each pair. The decision tree model works by analyzing the input first and second LAT results to obtain the LAT result of the Q&A query.
Optionally, after the LAT result of the Q&A query is determined according to the first and second LAT results, the method further includes: determining whether the LAT result satisfies preset rules; if it does, outputting the LAT result; if it does not, deleting it.
An LAT result fails the preset rules if it belongs to a type in a preset blacklist, or if it is predefined pornographic or reactionary content. Specifically, after the LAT result of the Q&A query is determined, it is checked against the preset blacklist and the predefined pornographic or reactionary content; if it matches, it is deleted, otherwise it is output. This helps keep the Internet environment clean.
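The preset-rule check can be sketched as a simple filter. This is a minimal sketch assuming the blacklist is a set of whole LAT types and the prohibited content is a set of substrings; both the data structures and the function name `apply_preset_rules` are hypothetical, since the patent does not describe how the rules are stored or matched.

```python
def apply_preset_rules(lat, blacklist_types, banned_terms):
    """Return the LAT result if it passes the preset rules, otherwise None (deleted).

    blacklist_types: hypothetical set of LAT types that must never be output.
    banned_terms: hypothetical set of substrings marking prohibited content.
    """
    if lat in blacklist_types:
        return None
    if any(term in lat for term in banned_terms):
        return None
    return lat
```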
In the technical solution of this embodiment, feature information of the input Q&A query is first extracted; the feature information is then input into the pre-established sequence labeling model and classification model to obtain the first lexical answer type (LAT) result output by the sequence labeling model and the second LAT result output by the classification model; finally, the LAT result of the Q&A query is determined according to the first and second LAT results. Because the final LAT result is determined from the LAT results output separately by the two models, the method can improve the accuracy of the determined LAT result.
Embodiment two
Fig. 2 is a flowchart of a method for determining a question answer type according to Embodiment two of the present invention, which further explains the above embodiment. As shown in Fig. 2, the method includes the following steps.
Step 210: obtain the first training data source.
The first training data source may include: multiple randomly sampled Q&A queries, the feature information of those queries, and the LAT results annotated for them.
In this embodiment, the first training data source may contain hundreds of thousands of samples, for example 800,000. The LAT results of the sampled queries may be annotated manually, or the LAT results annotated for the sampled queries may be tagged with BIE labels.
BIE labels are Begin/In/End tags. An LAT result is tagged by attaching a label to each of its words. For example, suppose the Q&A query is "What public institutions can nursing majors apply to?" (护理专业可以报考什么事业单位) and the corresponding LAT result is "public institution" (事业单位). Tagging "事业单位" then attaches a "B" label to "事业" and an "E" label to "单位". Specifically, BIE labels are added to the LAT result annotated for every sampled Q&A query in the first training data source.
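The tagging of the nursing-major example can be sketched as below. This is a minimal sketch under two stated assumptions: words outside the LAT get an "O" (outside) tag, and a single-word LAT gets an "S" tag borrowed from the common BIES scheme; the patent itself only names B, I and E. `bie_tags` is a hypothetical helper that tags the first occurrence of the LAT in the segmented query.

```python
def bie_tags(tokens, lat_tokens):
    """Tag the first occurrence of lat_tokens inside tokens.

    Multi-word LAT: B (first word), I (middle words), E (last word).
    Assumptions: other words get "O"; a single-word LAT gets "S".
    If the LAT does not occur in the query, every word stays "O".
    """
    n, m = len(tokens), len(lat_tokens)
    tags = ["O"] * n
    if m == 0:
        return tags
    for start in range(n - m + 1):
        if tokens[start:start + m] == lat_tokens:
            if m == 1:
                tags[start] = "S"
            else:
                tags[start] = "B"
                for k in range(start + 1, start + m - 1):
                    tags[k] = "I"
                tags[start + m - 1] = "E"
            break
    return tags


query = ["护理", "专业", "可以", "报考", "什么", "事业", "单位"]
tags = bie_tags(query, ["事业", "单位"])  # public institution -> ..., "B", "E"
```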
Step 220: based on the first training data source, perform model training with the CRF algorithm to obtain the sequence labeling model.
Specifically, after the first training data source is obtained, the model is trained iteratively with the CRF algorithm, and the CRF parameters are continuously adjusted during training until the model can accurately output the first LAT result, yielding the sequence labeling model.
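For reference, the textbook linear-chain CRF that such training would fit is shown below. The patent does not give its feature functions or training objective, so the regularized log-likelihood is an assumption about what "adjusting the parameters" amounts to. Here $\mathbf{x}$ is the feature sequence of a query, $\mathbf{y}$ its tag sequence (B/I/E and any outside tag), $f_k$ are feature functions built from the word, part-of-speech and dependency features, $\lambda_k$ are their weights, and $\sigma$ is a Gaussian-prior width.

```latex
P(\mathbf{y}\mid\mathbf{x}) = \frac{1}{Z(\mathbf{x})}
  \exp\Big(\sum_{t=1}^{T}\sum_{k}\lambda_k\, f_k(y_{t-1}, y_t, \mathbf{x}, t)\Big),
\qquad
Z(\mathbf{x}) = \sum_{\mathbf{y}'}\exp\Big(\sum_{t=1}^{T}\sum_{k}\lambda_k\, f_k(y'_{t-1}, y'_t, \mathbf{x}, t)\Big)

\hat{\lambda} = \arg\max_{\lambda}\ \sum_{i=1}^{N}\log P\big(\mathbf{y}^{(i)}\mid\mathbf{x}^{(i)}\big)
  - \frac{\lVert\lambda\rVert^{2}}{2\sigma^{2}}
```

At prediction time the most probable tag sequence $\arg\max_{\mathbf{y}} P(\mathbf{y}\mid\mathbf{x})$ is typically found with Viterbi decoding.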
Step 230: extract the feature information of the input Q&A query.
Step 240: input the feature information into the pre-established sequence labeling model and classification model to obtain the first LAT result output by the sequence labeling model and the second LAT result output by the classification model.
Optionally, the first LAT result may be obtained from the sequence labeling model as follows: the BIE-tagged words output by the sequence labeling model are spliced into candidate LAT results, the candidates are ranked by score, and the highest-scoring candidate is selected as the first LAT result.
BIE-tagged words may be spliced by joining adjacent words tagged "B", "I" and "E" into one LAT result. The score of an LAT result may be the highest score among its words tagged "B", "I" and "E". For example, if in an LAT result the word tagged "B" scores 3, the word tagged "I" scores 3.5 and the word tagged "E" scores 5, the LAT result scores 5.
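The splicing and max-score ranking just described can be sketched as follows. This is a minimal sketch assuming the model emits one tag and one score per word; the "S" (single-word LAT) case and skipping of malformed runs (a "B" not closed by an "E") are assumptions, not something the patent specifies. Words are joined without spaces, as is usual for Chinese.

```python
def splice_candidates(tokens, tags, scores):
    """Join adjacent B, I..., E runs into LAT candidates scored by their best word."""
    candidates = []
    i = 0
    while i < len(tags):
        if tags[i] == "S":  # assumption: single-word LAT
            candidates.append((tokens[i], scores[i]))
        elif tags[i] == "B":
            j = i + 1
            while j < len(tags) and tags[j] == "I":
                j += 1
            if j < len(tags) and tags[j] == "E":
                words = tokens[i:j + 1]
                # Candidate score = highest score among its B/I/E words.
                candidates.append(("".join(words), max(scores[i:j + 1])))
                i = j + 1
                continue
            # Malformed run (no closing E): skip this B and keep scanning.
        i += 1
    return candidates


def best_lat(tokens, tags, scores):
    """Rank candidates by score and return the top one, or None if there are none."""
    ranked = sorted(splice_candidates(tokens, tags, scores),
                    key=lambda c: c[1], reverse=True)
    return ranked[0] if ranked else None
```

With the scores from the example in the text (B = 3, I = 3.5, E = 5), the spliced candidate gets score 5.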
Step 250: determine the LAT result corresponding to the Q&A query according to the first LAT result and the second LAT result.
In the technical solution of this embodiment, the sequence labeling model is obtained by training with the conditional random field (CRF) algorithm on the first training data source, which can improve the accuracy with which the sequence labeling model determines the first LAT result.
Embodiment three
Fig. 3 is a flowchart of a method for determining a question answer type according to Embodiment three of the present invention, which further explains the above embodiments. As shown in Fig. 3, the method includes the following steps.
Step 310: obtain the second training data source.
The second training data source may include multiple data pairs, each consisting of a randomly sampled Q&A query and its LAT result, together with the feature information of the query in each pair.
Optionally, the second training data source may be obtained as follows: obtain multiple randomly sampled Q&A queries; for each sampled query, input the current query into the sequence labeling model to obtain the LAT result of the current query output by that model; search the search log for queries that have the same clicked search results as the current query; and take, as the second training data source, the data pair formed by the current query and its LAT result, together with the data pairs formed by each query found and the LAT result of the current query. Each data pair includes the feature information of its query.
Specifically, the current sampled Q&A query is input into the sequence labeling model to obtain its LAT result, and the search log is searched for other queries that satisfy the following condition: the other query has the same user-clicked search results as the current query. If such queries are found, the data pair formed by the current query and its LAT result, and the data pairs formed by each query found and the current query's LAT result, are used as the second training data source. For example, suppose that after sampled query A is input into the sequence labeling model, the output LAT result is a1, and the search log shows that queries B and C have the same user-clicked search results as A. Then A and a1, B and a1, and C and a1 each form a data pair, and these pairs are used as the second training data source.
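The co-click expansion in the A/B/C example can be sketched as follows. This is a minimal sketch assuming the search log is available as (query, clicked URL) pairs and that "the same clicked search results" means sharing at least one clicked URL; the matching criterion, the log format and the function name `build_second_source` are assumptions, and the feature information that each pair would also carry is omitted.

```python
from collections import defaultdict


def build_second_source(query, lat, click_log):
    """Expand one labeled query into (query, LAT) pairs via co-clicked results.

    query, lat: a sampled Q&A query and the LAT produced for it by the
                sequence labeling model.
    click_log: iterable of (query, clicked_url) pairs (hypothetical log format).
    """
    clicked_by_query = defaultdict(set)
    for q, url in click_log:
        clicked_by_query[q].add(url)

    pairs = [(query, lat)]
    target = clicked_by_query.get(query, set())
    if not target:
        return pairs
    for other, urls in clicked_by_query.items():
        # Assumption: sharing at least one clicked result counts as "the same".
        if other != query and urls & target:
            pairs.append((other, lat))  # the other query inherits A's LAT
    return pairs
```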
Optionally, the LAT result in each data pair is mapped to a category in a pre-built LAT taxonomy, so that the category corresponding to the mapped LAT result exists in the taxonomy. The LAT taxonomy includes first-level or second-level categories.
For example, suppose an LAT result is "public institution" and the LAT taxonomy has no "public institution" category. The category in the taxonomy closest to "public institution" is "institution", so "public institution" is mapped to "institution".
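The patent does not say how the "closest" category is found. One simple possibility, shown here purely as an assumption, is to walk up a hypernym dictionary until a word that exists in the taxonomy is reached; `map_to_taxonomy`, the `hypernym_of` table and the hop limit are all hypothetical.

```python
def map_to_taxonomy(lat, taxonomy, hypernym_of, max_hops=5):
    """Map an LAT result to a taxonomy category by following hypernym links.

    taxonomy: set of category names in the pre-built LAT taxonomy.
    hypernym_of: hypothetical dict from a word to its broader term.
    Returns None if no category is reached within max_hops steps.
    """
    node = lat
    for _ in range(max_hops + 1):
        if node in taxonomy:
            return node
        node = hypernym_of.get(node)
        if node is None:
            return None
    return None


taxonomy = {"institution", "fruit", "age"}
hypernym_of = {"public institution": "institution"}
```

A bounded hop count keeps the walk safe even if the hypernym table accidentally contains a cycle.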
Step 320: based on the second training data source, perform model training with the CNN algorithm to obtain the classification model.
A CNN is a feedforward neural network that includes convolutional layers and pooling layers. Specifically, after the second training data source is obtained, the model is trained iteratively with the CNN algorithm, and its parameters are continuously adjusted during training until the model can accurately output the second LAT result, yielding the classification model.
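To make "convolutional layer plus pooling layer" concrete, here is a forward pass of a tiny text CNN in plain Python. This is a minimal sketch, not the patent's network: word embeddings, kernel weights, the ReLU activation, max-over-time pooling and the final linear layer are all assumptions typical of text CNNs, and training (back-propagation to adjust the kernels and class weights) is omitted.

```python
def conv1d(seq, kernels, width):
    """Slide each kernel over windows of `width` word vectors, then apply ReLU.

    seq: list of d-dimensional word vectors.
    kernels: list of (weights, bias), weights being `width` rows of d floats.
    Returns one row per window position, one column per kernel.
    """
    outputs = []
    for t in range(len(seq) - width + 1):
        window = seq[t:t + width]
        row = []
        for weights, bias in kernels:
            s = bias + sum(w * x
                           for w_row, vec in zip(weights, window)
                           for w, x in zip(w_row, vec))
            row.append(max(0.0, s))  # ReLU
        outputs.append(row)
    return outputs


def max_pool(feature_maps):
    """Max-over-time pooling: keep each kernel's strongest response."""
    return [max(column) for column in zip(*feature_maps)]


def classify(seq, kernels, width, class_weights, labels):
    """Linear layer over the pooled features; return the highest-scoring label."""
    pooled = max_pool(conv1d(seq, kernels, width))
    logits = [sum(w * p for w, p in zip(row, pooled)) for row in class_weights]
    return labels[logits.index(max(logits))]


seq = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # three 2-d word vectors (toy values)
kernels = [([[1.0, 0.0], [0.0, 1.0]], 0.0), ([[0.0, 1.0], [1.0, 0.0]], 0.0)]
```

Pooling makes the classifier output independent of query length, which is one reason this design is common for short search queries.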
Step 330: extract the feature information of the input Q&A query.
Step 340: input the feature information into the pre-established sequence labeling model and classification model to obtain the first LAT result output by the sequence labeling model and the second LAT result output by the classification model.
Step 350: determine the LAT result corresponding to the Q&A query according to the first LAT result and the second LAT result.
In the technical solution of this embodiment, the classification model is obtained by training with the CNN algorithm on the second training data source, which can improve the accuracy with which the classification model determines the LAT result.
Embodiment four
Fig. 4 is a flowchart of a method for determining a question answer type according to Embodiment four of the present invention, which further explains the above embodiments. As shown in Fig. 4, the method includes the following steps.
Step 410: obtain the third training data source.
The third training data source includes multiple data pairs, each consisting of two different LAT results for a randomly sampled Q&A query, together with the correct LAT result annotated for the query corresponding to each pair.
Optionally, the third training data source may be obtained as follows: obtain multiple randomly sampled Q&A queries; for each sampled query, input the current query into the sequence labeling model and the classification model separately; if the LAT results output by the two models are inconsistent, take the LAT results output by the two models, together with the correct LAT result annotated in advance for the current query, as part of the third training data source.
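The disagreement-driven construction above can be sketched as follows. This is a minimal sketch in which `seq_model` and `cls_model` are hypothetical callables standing in for the two trained models and `gold` is a dict of pre-annotated correct LAT results; only queries on which the two models disagree become training samples.

```python
def build_third_source(queries, seq_model, cls_model, gold):
    """Collect ((first LAT, second LAT), correct LAT) samples where the models disagree.

    seq_model, cls_model: hypothetical callables mapping a query to an LAT result.
    gold: dict mapping each query to its pre-annotated correct LAT result.
    """
    samples = []
    for q in queries:
        first, second = seq_model(q), cls_model(q)
        if first != second:
            samples.append(((first, second), gold[q]))
    return samples
```

Restricting the samples to disagreements focuses the decision tree on exactly the cases where it has to arbitrate.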
Step 420: based on the third training data source, perform model training with the GBDT algorithm to obtain the decision tree model.
GBDT is an iterative decision tree algorithm: the model consists of multiple decision trees, and the outputs of all the trees are summed to produce the final output. Specifically, after the third training data source is obtained, the model is trained iteratively with the GBDT algorithm, and its parameters are continuously adjusted during training until the model can accurately output the LAT result, yielding the decision tree model.
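The additive principle ("the outputs of all trees are summed") can be shown with a toy gradient boosting regressor built from one-split trees (stumps) under squared loss. This is a minimal sketch on a single numeric feature, not the patent's model: how the two LAT results would be encoded as input features, and the learning rate and tree count, are assumptions.

```python
def fit_stump(xs, residuals):
    """Fit a one-split regression tree to the residuals by least squares."""
    best = None
    for thr in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= thr]
        right = [r for x, r in zip(xs, residuals) if x > thr]
        left_val = sum(left) / len(left)
        right_val = sum(right) / len(right) if right else left_val
        err = (sum((r - left_val) ** 2 for r in left)
               + sum((r - right_val) ** 2 for r in right))
        if best is None or err < best[0]:
            best = (err, thr, left_val, right_val)
    _, thr, left_val, right_val = best
    return lambda x: left_val if x <= thr else right_val


def fit_gbdt(xs, ys, n_trees=50, learning_rate=0.1):
    """Gradient boosting with squared loss: each stump fits the current residuals."""
    base = sum(ys) / len(ys)
    trees = []
    preds = [base] * len(xs)
    for _ in range(n_trees):
        # Residuals are the negative gradient of the squared loss.
        residuals = [y - p for y, p in zip(ys, preds)]
        tree = fit_stump(xs, residuals)
        trees.append(tree)
        preds = [p + learning_rate * tree(x) for p, x in zip(preds, xs)]

    def predict(x):
        # Final output = base value + shrunken sum of every tree's output.
        return base + learning_rate * sum(tree(x) for tree in trees)

    return predict


model = fit_gbdt([1, 2, 3, 4], [0.0, 0.0, 1.0, 1.0])
```

Each round shrinks the remaining error by a factor of (1 - learning_rate) on this toy data, so after 50 trees the predictions sit within about 0.003 of the targets.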
Step 430: extract the feature information of the input Q&A query.
Step 440: input the feature information into the pre-established sequence labeling model and classification model to obtain the first LAT result output by the sequence labeling model and the second LAT result output by the classification model.
Step 450: input the first LAT result and the second LAT result into the decision tree model to obtain the integrated LAT result output by the decision tree model, and use the integrated LAT result as the LAT result of the Q&A query.
In the technical solution of this embodiment, the decision tree model is obtained by training with the GBDT algorithm on the third training data source, which can improve the accuracy with which the decision tree model determines the LAT result.
Optionally, Fig. 5a is a flowchart of another method for determining a question answer type according to Embodiment four of the present invention, which further explains the above embodiments. As shown in Fig. 5a, the method includes the following steps.
Step 510: extract the feature information of the input Q&A query.
Step 520: input the feature information into the pre-established sequence labeling model to obtain the first LAT result output by the sequence labeling model.
Step 530: input the feature information into the pre-established classification model to obtain the second LAT result output by the classification model.
Step 540: input the first LAT result and the second LAT result into the decision tree model to obtain the integrated LAT result output by the decision tree model, and use the integrated LAT result as the LAT result of the Q&A query.
Step 550: determine whether the LAT result of the Q&A query satisfies the preset rules; if it does, output the LAT result; if it does not, delete it.
Fig. 5b is a flowchart of a usage scenario of the method for determining a question answer type according to Embodiment four of the present invention. The answer type of a Q&A query is determined using techniques such as feature extraction, sequence labeling and classification of the query. For a Q&A query, sequence labeling can mark the LAT result in the query, and the marked LAT result is then mapped into the LAT taxonomy. When the LAT result does not appear in the Q&A query, classification can map the LAT result to the second-level and first-level layers of a three-level LAT taxonomy; the second level contains 3,000 classes and the first level contains 34 classes. As shown in Fig. 5b, the scenario includes the following steps.
Step 501: extract the feature information of the input question-answering query using feature extraction techniques;
Step 502: obtain the search results corresponding to the question-answering query;
Step 503: obtain the LAT result of the question-answering query using sequence labeling and classification techniques;
Step 504: determine the target search results among the search results according to the LAT result;
Step 505: output and display the target search results.
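Steps 503 and 504 can be sketched as follows, assuming (hypothetically) that the three-level LAT taxonomy is stored as a child-to-parent map and that each search result carries a `type` field. Expanding a LAT to its ancestors lets a fine-grained label produced by sequence labeling still match results that are typed only at the second or first level. The labels, the `PARENT` map, and the result format are invented for illustration; the patent does not specify them.

```python
from typing import Dict, List

# Hypothetical fragment of a three-level LAT taxonomy (child -> parent).
PARENT: Dict[str, str] = {
    "capital": "city",      # third level -> second level
    "city": "location",     # second level -> first level
    "novelist": "writer",
    "writer": "person",
}


def expand_lat(lat: str) -> List[str]:
    """Return the LAT followed by its ancestors, most specific first."""
    chain = [lat]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def select_target_results(lat: str, results: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Step 504 sketch: keep results whose type matches the LAT at any taxonomy level."""
    accepted = set(expand_lat(lat))
    return [r for r in results if r.get("type") in accepted]


if __name__ == "__main__":
    results = [
        {"title": "Paris", "type": "city"},
        {"title": "France", "type": "country"},
    ]
    print(select_target_results("capital", results))  # [{'title': 'Paris', 'type': 'city'}]
```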
Embodiment five
Fig. 6 is a schematic structural diagram of an apparatus for determining a question answer type provided by Embodiment five of the present invention. As shown in Fig. 6, the apparatus comprises: a feature information extraction module 610, a model input module 620, and an LAT result determination module 630.
The feature information extraction module 610 is configured to extract feature information of an input question-answering query;
The model input module 620 is configured to input the feature information into a pre-established sequence labeling model and a pre-established classification model, to obtain a first question answer type (LAT) result output by the sequence labeling model and a second LAT result output by the classification model;
The LAT result determination module 630 is configured to determine the LAT result corresponding to the question-answering query according to the first LAT result and the second LAT result.
Optionally, the LAT result determination module 630 is further configured to:
input the first LAT result and the second LAT result into a decision tree model to obtain an integrated LAT result output by the decision tree model, and take the integrated LAT result as the LAT result corresponding to the question-answering query;
wherein the integrated LAT result is one of the first LAT result and the second LAT result, the intersection of the first LAT result and the second LAT result, or the union of the first LAT result and the second LAT result.
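The four possible forms of the integrated LAT result can be made concrete as follows. The sketch assumes, as one possible reading, that the decision tree model's output is treated as a choice among these four modes; the `Integration` enum and the `apply_integration` helper are hypothetical names, and the trained decision tree that would select the mode is not shown.

```python
from enum import Enum
from typing import Set


class Integration(Enum):
    FIRST = "first"                # keep the sequence labeling result
    SECOND = "second"              # keep the classification result
    INTERSECTION = "intersection"  # labels both models agree on
    UNION = "union"                # labels proposed by either model


def apply_integration(mode: Integration, first: Set[str], second: Set[str]) -> Set[str]:
    """Combine the first and second LAT results according to the chosen mode."""
    if mode is Integration.FIRST:
        return set(first)
    if mode is Integration.SECOND:
        return set(second)
    if mode is Integration.INTERSECTION:
        return first & second
    return first | second
```

Returning new sets in every branch keeps callers from accidentally mutating either model's output.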
Optionally, the apparatus further comprises:
a preset rule determination module, configured to determine whether the LAT result corresponding to the question-answering query satisfies a preset rule; if so, output the LAT result; if not, delete the LAT result.
Optionally, the apparatus further comprises:
a first training data source acquisition module, configured to acquire a first training data source, the first training data source comprising: a plurality of random question-answering queries, feature information of the plurality of random question-answering queries, and LAT results annotated for the plurality of random question-answering queries;
a sequence labeling model acquisition module, configured to perform model training with a conditional random field (CRF) algorithm based on the first training data source to obtain the sequence labeling model.
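Training a CRF (for example, with a CRFsuite-based library) is not reproduced here. The sketch below shows only the inference step a linear-chain CRF relies on: Viterbi decoding of the highest-scoring tag sequence from per-token emission scores and tag-to-tag transition scores. The scores are hand-set placeholders standing in for learned feature weights, and the function name `viterbi` is our own.

```python
from typing import Dict, List, Tuple


def viterbi(
    emissions: List[Dict[str, float]],
    transitions: Dict[Tuple[str, str], float],
    labels: List[str],
) -> Tuple[List[str], float]:
    """Highest-scoring label sequence under a linear-chain CRF.

    emissions[t][y]: score of label y at token t (from weighted features).
    transitions[(p, y)]: score of label p followed by y; missing pairs score 0.
    Assumes at least one token.
    """
    # best[t][y] = (best score of any path ending in y at t, previous label)
    best: List[Dict[str, Tuple[float, str]]] = [
        {y: (emissions[0][y], "") for y in labels}
    ]
    for t in range(1, len(emissions)):
        row: Dict[str, Tuple[float, str]] = {}
        for y in labels:
            prev, score = max(
                ((p, best[t - 1][p][0] + transitions.get((p, y), 0.0)) for p in labels),
                key=lambda item: item[1],
            )
            row[y] = (score + emissions[t][y], prev)
        best.append(row)
    # Pick the best final label, then follow back-pointers to the start.
    last = max(labels, key=lambda y: best[-1][y][0])
    path = [last]
    for t in range(len(emissions) - 1, 0, -1):
        path.append(best[t][path[-1]][1])
    path.reverse()
    return path, best[-1][last][0]
```

Decoding the whole sequence jointly, rather than picking each token's best tag independently, is what lets transition scores penalize invalid tag orders such as an `E` that follows an `O`.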
Optionally, the apparatus further comprises:
a labeling processing module, configured to tag, with BIE labels, the LAT results annotated for the plurality of random question-answering queries;
correspondingly, obtaining the first LAT result output by the sequence labeling model comprises:
splicing the BIE-labeled LAT results output by the sequence labeling model, ranking the resulting LAT results by score, and selecting the highest-scoring LAT result as the first LAT result.
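The BIE labeling and splicing steps can be sketched as below. Two points are assumptions not stated in the text: tokens outside any LAT are tagged `O`, and a single-token LAT is tagged with a lone `B`. The per-token confidence used for ranking is also hypothetical (it could be, for example, a CRF marginal probability); here a span's score is simply the mean of its tokens' scores.

```python
from typing import List, Optional, Tuple


def encode_bie(n_tokens: int, span: Tuple[int, int]) -> List[str]:
    """Tag tokens in [start, end) as B/I/E; all other tokens get 'O' (assumed)."""
    start, end = span
    tags = ["O"] * n_tokens
    tags[start] = "B"
    for i in range(start + 1, end - 1):
        tags[i] = "I"
    if end - start > 1:
        tags[end - 1] = "E"
    return tags


def decode_spans(tags: List[str]) -> List[Tuple[int, int]]:
    """Recover [start, end) spans; a B..E run or a lone B forms a span."""
    spans: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for i, tag in enumerate(tags + ["O"]):  # sentinel flushes an open span
        if tag == "E" and start is not None:
            spans.append((start, i + 1))
            start = None
        elif tag == "I" and start is not None:
            continue
        else:  # B, O, or a stray I/E
            if start is not None and i == start + 1:
                spans.append((start, start + 1))  # lone B: single-token LAT
            start = i if tag == "B" else None
    return spans


def best_lat(tokens: List[str], tags: List[str], token_scores: List[float],
             joiner: str = "") -> Optional[str]:
    """Splice each decoded span into a string and return the highest-scoring one."""
    candidates = []
    for start, end in decode_spans(tags):
        text = joiner.join(tokens[start:end])  # "" suits segmented Chinese text
        score = sum(token_scores[start:end]) / (end - start)
        candidates.append((score, text))
    if not candidates:
        return None
    return sorted(candidates, key=lambda c: c[0], reverse=True)[0][1]
```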
Optionally, the apparatus further comprises:
a second training data source acquisition module, configured to acquire a second training data source, the second training data source comprising: a plurality of data pairs, each consisting of a random question-answering query and the LAT result of that query, and the feature information of the random question-answering query in each data pair;
a classification model acquisition module, configured to perform model training with a convolutional neural network (CNN) algorithm based on the second training data source to obtain the classification model.
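For intuition about the classification model, the sketch below runs the forward pass of a minimal text CNN in pure Python: embed tokens, apply 1-D convolution filters with ReLU and max-over-time pooling, then a linear layer and softmax over LAT classes. All weights are hand-set placeholders; training by backpropagation (normally done in a deep-learning framework) is omitted, and the function names and label set are hypothetical. The patent does not describe its network architecture.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]


def conv_max_pool(embs: List[Vector], filt: List[Vector], bias: float) -> float:
    """Slide one filter over the token embeddings, apply ReLU, and keep the
    maximum activation (max-over-time pooling)."""
    width = len(filt)
    best = 0.0  # ReLU floor; also the result when the text is shorter than the filter
    for t in range(len(embs) - width + 1):
        s = bias + sum(
            w * x for k in range(width) for w, x in zip(filt[k], embs[t + k])
        )
        best = max(best, s)
    return best


def cnn_classify(
    tokens: List[str],
    emb: Dict[str, Vector],
    dim: int,
    filters: List[Tuple[List[Vector], float]],
    out_w: List[Vector],
    out_b: List[float],
    labels: List[str],
) -> Tuple[str, Dict[str, float]]:
    """Forward pass: embed -> convolve + max-pool -> linear layer -> softmax."""
    x = [emb.get(tok, [0.0] * dim) for tok in tokens]  # unknown words -> zero vector
    pooled = [conv_max_pool(x, f, b) for f, b in filters]
    logits = [sum(w * h for w, h in zip(row, pooled)) + b for row, b in zip(out_w, out_b)]
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    probs = {lab: e / total for lab, e in zip(labels, exps)}
    return max(probs, key=probs.get), probs
```

Because the classifier scores a fixed label set rather than tagging query tokens, it can output a LAT that never appears in the query text, which is the case Fig. 5b assigns to classification.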
Optionally, the second training data source acquisition module is further configured to:
acquire a plurality of random question-answering queries;
for each random question-answering query: input the current random question-answering query into the sequence labeling model to obtain the LAT result of the current query output by the sequence labeling model; search the search log for queries whose clicked search results are the same as those of the current query; and take the data pair consisting of the current query and its LAT result, together with the data pairs each consisting of a found query and the LAT result of the current query, as the second training data source, wherein each data pair includes the feature information of the corresponding query.
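The click-based expansion of the second training data source can be sketched as follows. The search log is modeled, hypothetically, as a map from each query to the set of result IDs users clicked; a query that shares at least one clicked result with the current query inherits the current query's LAT. Attaching feature information to each pair is omitted for brevity, and all names are illustrative.

```python
from typing import Callable, Dict, List, Set, Tuple


def build_second_source(
    queries: List[str],
    sequence_model: Callable[[str], str],
    click_log: Dict[str, Set[str]],
) -> List[Tuple[str, str]]:
    """Return (query, LAT) pairs, propagating each query's LAT to queries
    that clicked the same search result."""
    # Invert the log: clicked result -> queries that clicked it.
    queries_by_result: Dict[str, Set[str]] = {}
    for q, clicked in click_log.items():
        for result in clicked:
            queries_by_result.setdefault(result, set()).add(q)

    pairs: List[Tuple[str, str]] = []
    seen: Set[Tuple[str, str]] = set()
    for q in queries:
        lat = sequence_model(q)
        if not lat:  # no LAT labeled: nothing to propagate
            continue
        related = {q}
        for result in click_log.get(q, set()):
            related |= queries_by_result[result]
        for other in sorted(related):  # sorted for deterministic output
            if (other, lat) not in seen:
                seen.add((other, lat))
                pairs.append((other, lat))
    return pairs
```

The shared-click signal lets the classifier learn from paraphrased queries that the sequence labeler alone would not cover.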
Optionally, the apparatus further comprises:
a third training data source acquisition module, configured to acquire a third training data source, the third training data source comprising: a plurality of data pairs, each consisting of two different LAT results of a random question-answering query, and, for each data pair, the correct LAT result annotated for the corresponding random question-answering query;
a decision tree model acquisition module, configured to perform model training with a gradient boosting decision tree (GBDT) algorithm based on the third training data source to obtain the decision tree model.
Optionally, the third training data source acquisition module is further configured to:
acquire a plurality of random question-answering queries;
for each random question-answering query: input the current query into the sequence labeling model and the classification model respectively; if the LAT results respectively output by the sequence labeling model and the classification model are inconsistent, take those two LAT results, together with the correct LAT result annotated in advance for the current query, as the third training data source.
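A sketch of collecting the third training data source is shown below. Only queries on which the two models disagree, and for which a pre-annotated correct LAT exists, produce a row. The extra `target` column (which model's output matches the annotation) is a hypothetical training label added for illustration; the GBDT training itself (for example, with a gradient-boosting classifier from a machine-learning library) and the encoding of LAT strings into numeric features are not shown.

```python
from typing import Callable, Dict, List, Tuple


def build_third_source(
    queries: List[str],
    sequence_model: Callable[[str], str],
    classification_model: Callable[[str], str],
    gold: Dict[str, str],
) -> List[Tuple[str, str, str, str]]:
    """Collect (first LAT, second LAT, correct LAT, target) rows for queries
    on which the two models disagree and a pre-annotated answer exists."""
    rows = []
    for q in queries:
        first, second = sequence_model(q), classification_model(q)
        if first == second or q not in gold:
            continue
        correct = gold[q]
        # Hypothetical training target: which model's output matches the annotation.
        if correct == first:
            target = "first"
        elif correct == second:
            target = "second"
        else:
            target = "neither"
        rows.append((first, second, correct, target))
    return rows
```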
Optionally, the feature information comprises:
at least one of word segmentation information, part-of-speech information, and dependency information, wherein the dependency information comprises the dependency word and the part of speech of the dependency word.
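Per-token feature dictionaries built from these three kinds of information might look like the sketch below; this list-of-dicts format is the style accepted by CRFsuite wrappers such as sklearn-crfsuite. The text does not say whether the "dependency word" is a token's syntactic head or one of its dependents, so the sketch assumes the head; the head-index convention (`-1` for the root) and the feature key names are also assumptions.

```python
from typing import Dict, List


def token_features(tokens: List[str], pos_tags: List[str], heads: List[int]) -> List[Dict[str, str]]:
    """Build one feature dict per token from word segmentation, POS tags and
    dependency heads (heads[i] = index of token i's head, -1 for the root)."""
    features = []
    for i, (word, pos) in enumerate(zip(tokens, pos_tags)):
        head = heads[i]
        features.append({
            "word": word,                                          # word segmentation
            "pos": pos,                                            # part of speech
            "dep_word": tokens[head] if head >= 0 else "<ROOT>",   # dependency word
            "dep_pos": pos_tags[head] if head >= 0 else "<ROOT>",  # its part of speech
        })
    return features
```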
Embodiment six
Fig. 7 is a schematic structural diagram of a computer device provided by Embodiment six of the present invention. As shown in Fig. 7, the computer device provided by this embodiment comprises a processor 71 and a memory 72. The computer device may have one or more processors; Fig. 7 takes one processor 71 as an example. The processor 71 and the memory 72 may be connected by a bus or in other ways; Fig. 7 takes a bus connection as an example.
In this embodiment, the processor 71 of the computer device integrates the apparatus for determining a question answer type provided by the above embodiment. In addition, the memory 72 of the computer device, as a computer-readable storage medium, can be used to store one or more programs, which may be software programs, computer-executable programs, and modules, such as the program instructions/modules corresponding to the method for determining a question answer type in the embodiments of the present invention. By running the software programs, instructions, and modules stored in the memory 72, the processor 71 executes the various functional applications and data processing of the device, that is, implements the method for determining a question answer type in the above method embodiments.
The memory 72 may include a program storage area and a data storage area, wherein the program storage area may store an operating system and an application program required by at least one function, and the data storage area may store data created through use of the device. In addition, the memory 72 may include high-speed random access memory and may also include non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device. In some examples, the memory 72 may further include memory located remotely from the processor 71, and such remote memory may be connected to the device through a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
By running the programs stored in the memory 72, the processor 71 executes various functional applications and data processing, for example, implementing the method for determining a question answer type provided by the embodiments of the present invention.
Embodiment seven
Embodiment seven of the present invention further provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the method for determining a question answer type provided by the embodiments of the present invention.
Of course, in the computer-readable storage medium provided by the embodiments of the present invention, the stored computer program is not limited to the method operations described above and may also perform related operations in the method for determining a question answer type provided by any embodiment of the present invention.
The computer storage medium of the embodiments of the present invention may be any combination of one or more computer-readable media. A computer-readable medium may be a computer-readable signal medium or a computer-readable storage medium. A computer-readable storage medium may be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples (a non-exhaustive list) of computer-readable storage media include: an electrical connection having one or more wires, a portable computer diskette, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), an optical fiber, portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In this document, a computer-readable storage medium may be any tangible medium that contains or stores a program for use by or in connection with an instruction execution system, apparatus, or device.
A computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code. Such a propagated data signal may take various forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium that can send, propagate, or transmit a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code contained on a computer-readable medium may be transmitted over any suitable medium, including but not limited to wireless, wire, optical cable, RF, or any suitable combination of the above.
Computer program code for performing the operations of the present invention may be written in one or more programming languages or combinations thereof, including object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a standalone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. Where a remote computer is involved, it may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or it may be connected to an external computer (for example, through the Internet using an Internet service provider).
Note that the above are only preferred embodiments of the present invention and the technical principles applied. Those skilled in the art will understand that the present invention is not limited to the specific embodiments described herein, and that various obvious changes, readjustments, and substitutions can be made without departing from the protection scope of the present invention. Therefore, although the present invention has been described in further detail through the above embodiments, it is not limited to those embodiments and may include other equivalent embodiments without departing from the inventive concept; the scope of the present invention is determined by the scope of the appended claims.

Claims (13)

1. A method for determining a question answer type, characterized by comprising:
extracting feature information of an input question-answering query;
inputting the feature information into a pre-established sequence labeling model and a pre-established classification model to obtain a first question answer type (LAT) result output by the sequence labeling model and a second LAT result output by the classification model;
determining the LAT result corresponding to the question-answering query according to the first LAT result and the second LAT result.
2. The method according to claim 1, wherein determining the LAT result corresponding to the question-answering query according to the first LAT result and the second LAT result comprises:
inputting the first LAT result and the second LAT result into a decision tree model to obtain an integrated LAT result output by the decision tree model, and taking the integrated LAT result as the LAT result corresponding to the question-answering query;
wherein the integrated LAT result is one of the first LAT result and the second LAT result, the intersection of the first LAT result and the second LAT result, or the union of the first LAT result and the second LAT result.
3. The method according to claim 1, wherein after determining the LAT result corresponding to the question-answering query, the method further comprises:
determining whether the LAT result corresponding to the question-answering query satisfies a preset rule;
if so, outputting the LAT result; if not, deleting the LAT result.
4. The method according to claim 1, wherein before extracting the feature information of the input question-answering query, the method further comprises:
acquiring a first training data source, the first training data source comprising: a plurality of random question-answering queries, feature information of the plurality of random question-answering queries, and LAT results annotated for the plurality of random question-answering queries;
performing model training with a conditional random field (CRF) algorithm based on the first training data source to obtain the sequence labeling model.
5. The method according to claim 4, wherein before performing model training with the conditional random field (CRF) algorithm based on the first training data source, the method further comprises:
tagging, with BIE labels, the LAT results annotated for the plurality of random question-answering queries;
correspondingly, obtaining the first LAT result output by the sequence labeling model comprises:
splicing the BIE-labeled LAT results output by the sequence labeling model, ranking the resulting LAT results by score, and selecting the highest-scoring LAT result as the first LAT result.
6. The method according to claim 1, wherein before extracting the feature information of the input question-answering query, the method further comprises:
acquiring a second training data source, the second training data source comprising: a plurality of data pairs, each consisting of a random question-answering query and the LAT result of that query, and the feature information of the random question-answering query in each data pair;
performing model training with a convolutional neural network (CNN) algorithm based on the second training data source to obtain the classification model.
7. The method according to claim 6, wherein acquiring the second training data source comprises:
acquiring a plurality of random question-answering queries;
for each random question-answering query: inputting the current random question-answering query into the sequence labeling model to obtain the LAT result of the current query output by the sequence labeling model; searching the search log for queries whose clicked search results are the same as those of the current query; and taking the data pair consisting of the current query and its LAT result, together with the data pairs each consisting of a found query and the LAT result of the current query, as the second training data source, wherein each data pair includes the feature information of the corresponding query.
8. The method according to claim 2, wherein before extracting the feature information of the input question-answering query, the method further comprises:
acquiring a third training data source, the third training data source comprising: a plurality of data pairs, each consisting of two different LAT results of a random question-answering query, and, for each data pair, the correct LAT result annotated for the corresponding random question-answering query;
performing model training with a gradient boosting decision tree (GBDT) algorithm based on the third training data source to obtain the decision tree model.
9. The method according to claim 8, wherein acquiring the third training data source comprises:
acquiring a plurality of random question-answering queries;
for each random question-answering query: inputting the current query into the sequence labeling model and the classification model respectively; if the LAT results respectively output by the sequence labeling model and the classification model are inconsistent, taking those two LAT results, together with the correct LAT result annotated in advance for the current query, as the third training data source.
10. The method according to any one of claims 1 to 9, wherein the feature information comprises:
at least one of word segmentation information, part-of-speech information, and dependency information, wherein the dependency information comprises the dependency word and the part of speech of the dependency word.
11. An apparatus for determining a question answer type, characterized by comprising:
a feature information extraction module, configured to extract feature information of an input question-answering query;
a model input module, configured to input the feature information into a pre-established sequence labeling model and a pre-established classification model, to obtain a first question answer type (LAT) result output by the sequence labeling model and a second LAT result output by the classification model;
an LAT result determination module, configured to determine the LAT result corresponding to the question-answering query according to the first LAT result and the second LAT result.
12. A computer device, comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, characterized in that the processor, when executing the program, implements the method according to any one of claims 1 to 10.
13. A computer-readable storage medium storing a computer program, characterized in that the program, when executed by a processor, implements the method according to any one of claims 1 to 10.
CN201810695686.9A 2018-06-29 2018-06-29 Determination method, apparatus, equipment and the storage medium of problem answers type Pending CN108959529A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810695686.9A CN108959529A (en) 2018-06-29 2018-06-29 Determination method, apparatus, equipment and the storage medium of problem answers type

Publications (1)

Publication Number Publication Date
CN108959529A true CN108959529A (en) 2018-12-07

Family

ID=64487933

Country Status (1)

Country Link
CN (1) CN108959529A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103309926A (en) * 2013-03-12 2013-09-18 中国科学院声学研究所 Chinese and English-named entity identification method and system based on conditional random field (CRF)
CN103914543A (en) * 2014-04-03 2014-07-09 北京百度网讯科技有限公司 Search result displaying method and device
CN104471568A (en) * 2012-07-02 2015-03-25 微软公司 Learning-based processing of natural language questions
CN105468713A (en) * 2015-11-19 2016-04-06 西安交通大学 Multi-model fused short text classification method
CN106649786A (en) * 2016-12-28 2017-05-10 北京百度网讯科技有限公司 Deep question answer-based answer retrieval method and device

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109800293A (en) * 2018-12-20 2019-05-24 出门问问信息科技有限公司 A kind of method, apparatus and electronic equipment obtaining answer based on Question Classification
CN111382247A (en) * 2018-12-29 2020-07-07 深圳市优必选科技有限公司 Content pushing optimization method, content pushing optimization device and electronic equipment
CN111382247B (en) * 2018-12-29 2023-07-14 深圳市优必选科技有限公司 Content pushing optimization method, content pushing optimization device and electronic equipment
CN111858899A (en) * 2020-07-31 2020-10-30 中国工商银行股份有限公司 Statement processing method, device, system and medium
CN111858899B (en) * 2020-07-31 2023-09-15 中国工商银行股份有限公司 Statement processing method, device, system and medium
CN112015876A (en) * 2020-08-27 2020-12-01 北京智通云联科技有限公司 Time analysis method and device, electronic equipment and storage medium
CN113779205A (en) * 2020-09-03 2021-12-10 北京沃东天骏信息技术有限公司 Intelligent response method and device
CN113779205B (en) * 2020-09-03 2024-05-24 北京沃东天骏信息技术有限公司 Intelligent response method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination