CN108959529A - Method, apparatus, device and storage medium for determining the answer type of a question
- Publication number
- CN108959529A (CN201810695686.9A)
- Authority
- CN
- China
- Prior art keywords
- lat
- result
- query statement
- question
- answer class
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
Embodiments of the invention disclose a method, apparatus, device and storage medium for determining the answer type of a question. The method comprises: extracting feature information from an input question-answering query statement; inputting the feature information into a pre-established sequence labeling model and a pre-established classification model, and obtaining a first lexical answer type (LAT) result output by the sequence labeling model and a second LAT result output by the classification model; and determining the LAT result corresponding to the question-answering query statement according to the first LAT result and the second LAT result. Because the final LAT result of the query statement is determined from the LAT results output separately by the sequence labeling model and the classification model, the accuracy of the determined LAT result can be improved.
Description
Technical field
Embodiments of the present invention relate to the technical field of information processing, and in particular to a method, apparatus, device and storage medium for determining the answer type of a question.
Background
When a search engine is used to search for information on the Internet, the query statements entered by users can be divided into question-answering queries and non-question-answering queries. For a question-answering query, the answer type (Lexical Answer Type, LAT) is in most cases an entity type. An entity is a thing that exists in the objective world and can be distinguished from other things; it may be a person, a physical object, or an abstract concept. For example, one possible query is "at what age do children lose their baby teeth", whose LAT result is "age"; another possible query is "what fruit can pregnant women eat", whose LAT result is "fruit".
In search technology, the LAT result of a question-answering query can be used to locate entity answers in the search results, that is, to filter out the results that contain an entity corresponding to the LAT result and then present them to the user.
At present, the LAT result of a question-answering query is determined with a classification model. The algorithms used by such a model include machine learning algorithms such as support vector machines (Support Vector Machine, SVM), maximum entropy and logistic regression, and deep learning algorithms such as convolutional neural networks (Convolutional Neural Network, CNN).
When the LAT result of a question-answering query is determined with a classification model alone, the granularity of the result depends on the pre-built LAT classification system, which tends to make the granularity too coarse and the accuracy low.
Summary of the invention
Embodiments of the invention provide a method, apparatus, device and storage medium for determining the answer type of a question, which can improve the accuracy with which the answer type of a question-answering query statement is determined.
In a first aspect, an embodiment of the invention provides a method for determining the answer type of a question, the method comprising:
extracting the feature information of an input question-answering query statement;
inputting the feature information into a pre-established sequence labeling model and a pre-established classification model, and obtaining a first lexical answer type (LAT) result output by the sequence labeling model and a second LAT result output by the classification model;
determining the LAT result corresponding to the question-answering query statement according to the first LAT result and the second LAT result.
In a second aspect, an embodiment of the invention further provides an apparatus for determining the answer type of a question, the apparatus comprising:
a feature information extraction module, configured to extract the feature information of an input question-answering query statement;
a model input module, configured to input the feature information into a pre-established sequence labeling model and a pre-established classification model, and to obtain a first lexical answer type (LAT) result output by the sequence labeling model and a second LAT result output by the classification model;
a LAT result determination module, configured to determine the LAT result corresponding to the question-answering query statement according to the first LAT result and the second LAT result.
In a third aspect, an embodiment of the invention further provides a computer device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the method described in the embodiments of the invention when executing the program.
In a fourth aspect, an embodiment of the invention further provides a computer-readable storage medium on which a computer program is stored, wherein the program implements the method described in the embodiments of the invention when executed by a processor.
In the embodiments of the invention, the feature information of the input question-answering query statement is first extracted; the feature information is then input into the pre-established sequence labeling model and classification model to obtain the first LAT result output by the sequence labeling model and the second LAT result output by the classification model; finally, the LAT result corresponding to the query statement is determined according to the first LAT result and the second LAT result. Because the final LAT result of the query statement is determined from the LAT results output separately by the sequence labeling model and the classification model, the accuracy of the determined LAT result can be improved.
Brief description of the drawings
Fig. 1 is a flow chart of a method for determining the answer type of a question according to Embodiment one of the invention;
Fig. 2 is a flow chart of a method for determining the answer type of a question according to Embodiment two of the invention;
Fig. 3 is a flow chart of a method for determining the answer type of a question according to Embodiment three of the invention;
Fig. 4 is a flow chart of a method for determining the answer type of a question according to Embodiment four of the invention;
Fig. 5a is a flow chart of another method for determining the answer type of a question according to Embodiment four of the invention;
Fig. 5b is a usage scenario flow chart of a method for determining the answer type of a question according to Embodiment four of the invention;
Fig. 6 is a schematic structural diagram of an apparatus for determining the answer type of a question according to Embodiment five of the invention;
Fig. 7 is a schematic structural diagram of a computer device according to Embodiment six of the invention.
Detailed description
The invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain the invention and not to limit it. It should also be noted that, for ease of description, the drawings show only the parts related to the invention rather than the entire structure.
Embodiment one
Fig. 1 is a flow chart of a method for determining the answer type of a question according to Embodiment one of the invention. This embodiment is applicable to determining the answer type of a question. The method can be executed by an apparatus for determining the answer type, which can be implemented in hardware and/or software and can generally be integrated into a computer, a server, or any terminal that provides an answer type determination function. As shown in Fig. 1, the method includes the following steps.
Step 110: extract the feature information of the input question-answering query statement.
The feature information may include at least one of word segmentation information, part-of-speech information and dependency information. The word segmentation information may be obtained by cutting the query statement into individual words with a word segmentation algorithm. The part-of-speech information may include noun, verb, content word, function word and so on. The dependency information may include the dependent word and the part of speech of the dependent word. Dependencies may be extracted by applying syntactic analysis based on dependency grammar to the words of the query statement, yielding relations such as subject-predicate-object, quantity, apposition, front and rear attachment, and analogy.
In this embodiment, after the user enters a query statement into the search box, a word segmentation algorithm first cuts the question-answering query statement into individual words to obtain the word segmentation information; the part of speech of each word is then analyzed to obtain the part-of-speech information; finally, syntactic analysis based on dependency grammar extracts the dependencies between the words to obtain the dependency information.
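The three extraction steps of step 110 can be sketched as follows. This is a minimal illustration rather than the patented implementation: the `TokenFeature` record, the `extract_features` function and the toy analyzers are all hypothetical names, and a real system would plug in an actual word segmenter, part-of-speech tagger and dependency parser.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class TokenFeature:
    word: str      # word segmentation information
    pos: str       # part-of-speech information
    head: str      # dependency information: the word this token depends on
    head_pos: str  # part of speech of that word


def extract_features(
    query: str,
    segment: Callable[[str], List[str]],
    tag_pos: Callable[[List[str]], List[str]],
    parse_heads: Callable[[List[str]], List[int]],
) -> List[TokenFeature]:
    """Run segmentation, tagging and parsing in order and merge them per word."""
    words = segment(query)
    tags = tag_pos(words)
    heads = parse_heads(words)  # index of each word's head, -1 for the root
    features = []
    for word, tag, head in zip(words, tags, heads):
        head_word = words[head] if head >= 0 else "ROOT"
        head_pos = tags[head] if head >= 0 else "ROOT"
        features.append(TokenFeature(word, tag, head_word, head_pos))
    return features


# Toy stand-ins for real analyzers, valid only for this one example query.
TOY_POS = {"what": "det", "fruit": "noun", "can": "aux",
           "pregnant": "adj", "women": "noun", "eat": "verb"}
TOY_HEADS = [1, 5, 5, 4, 5, -1]  # hand-written dependency heads

features = extract_features(
    "what fruit can pregnant women eat",
    segment=str.split,
    tag_pos=lambda ws: [TOY_POS[w] for w in ws],
    parse_heads=lambda ws: TOY_HEADS,
)
```

Passing the analyzers in as callables keeps the sketch independent of any particular NLP toolkit; only the order of the three steps comes from the text.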
Step 120: input the feature information into the pre-established sequence labeling model and classification model, and obtain the first LAT result output by the sequence labeling model and the second LAT result output by the classification model.
An answer type (Lexical Answer Type, LAT) result may be a word that indicates the answer type of the question-answering query statement. For example, for the query "what fruit can pregnant women eat", the LAT result is "fruit", because "fruit" is the word that indicates the answer type. The word indicating the answer type may or may not appear in the query statement itself.
The sequence labeling model may be a conditional random field (Conditional Random Fields, CRF) model, that is, a model obtained by training with the CRF algorithm on a first training data source. The first training data source may include: a plurality of random question-answering query statements, the feature information of those query statements, and the LAT results annotated for those query statements. In this embodiment, the sequence labeling model may work by labeling and classifying the input feature information to obtain and output the first LAT result of the query statement.
The classification model may be a model obtained by training with a convolutional neural network (Convolutional Neural Network, CNN) algorithm on a second training data source. The second training data source may include: a plurality of data pairs, each consisting of a random question-answering query statement and the LAT result of that query statement, together with the feature information of the query statement in each data pair. In this embodiment, the classification model may work by classifying the input feature information to obtain and output the second LAT result of the query statement.
Specifically, after the feature information of the question-answering query statement is obtained, it is input separately into the pre-established sequence labeling model and the pre-established classification model. After each model analyzes the feature information, the sequence labeling model outputs the first LAT result and the classification model outputs the second LAT result.
Step 130: determine the LAT result corresponding to the question-answering query statement according to the first LAT result and the second LAT result.
Specifically, the LAT result may be determined as follows. If the first LAT result is empty, the second LAT result is taken as the LAT result of the query statement. If the second LAT result is empty, the first LAT result is taken as the LAT result of the query statement. If neither is empty, the first and second LAT results are input into a decision tree model, and the integrated LAT result output by the decision tree model is taken as the LAT result of the query statement. Alternatively, the first and second LAT results are input into the decision tree model regardless of whether either is empty, and the integrated LAT result output by the decision tree model is taken as the LAT result of the query statement.
The integrated result may be one of the first LAT result and the second LAT result, the intersection of the two, or the union of the two.
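The first branching variant of step 130 can be sketched as below. The function name `determine_lat` and the `integrate` callable are hypothetical; `integrate` stands in for the trained decision tree model, and the toy integrator used in the example is only a placeholder, not the behavior of a real trained model.

```python
from typing import Callable, Optional


def determine_lat(
    first: Optional[str],
    second: Optional[str],
    integrate: Callable[[str, str], str],
) -> Optional[str]:
    """Combine the two model outputs following the empty/non-empty branches."""
    if not first:
        # First result empty: fall back to the second (None if both are empty).
        return second or None
    if not second:
        # Second result empty: use the first.
        return first
    # Both non-empty: defer to the integration model.
    return integrate(first, second)


# Placeholder integrator (assumption): always prefers the first result.
def prefer_first(first: str, second: str) -> str:
    return first


only_second = determine_lat(None, "organization", prefer_first)
only_first = determine_lat("fruit", None, prefer_first)
both = determine_lat("public institution", "organization", prefer_first)
neither = determine_lat(None, None, prefer_first)
```

The alternative variant in the text, which always consults the decision tree, would simply call `integrate` unconditionally.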
The decision tree model is a classifier. In this embodiment, it may be a model obtained by training with the gradient boosting decision tree (Gradient Boosting Decision Tree, GBDT) algorithm on a third training data source. The third training data source may include: a plurality of data pairs, each consisting of two different LAT results of a random question-answering query statement, and the correct LAT result annotated for the query statement corresponding to each data pair. The decision tree model may work by analyzing the input first and second LAT results to obtain the LAT result corresponding to the query statement.
Optionally, after the LAT result corresponding to the question-answering query statement is determined according to the first LAT result and the second LAT result, the method further includes: determining whether the LAT result corresponding to the query statement satisfies a preset rule; if so, outputting the LAT result; if not, deleting the LAT result.
The LAT result does not satisfy the preset rule if it belongs to a type in a preset blacklist or to preset pornographic or reactionary content. Specifically, after the LAT result corresponding to the query statement has been determined, it is analyzed to judge whether it belongs to a type in the preset blacklist or to preset pornographic or reactionary content; if it does, it is deleted, and if not, it is output. The advantage of doing so is that the Internet environment is kept clean.
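A minimal sketch of the preset-rule check follows. The function name `apply_preset_rule` and the blacklist entry are hypothetical, and the sketch treats the rule as a simple set membership test, which is an assumption; the text does not say how blacklist membership is decided.

```python
from typing import Optional, Set


def apply_preset_rule(lat: Optional[str], blacklist: Set[str]) -> Optional[str]:
    """Return the LAT result if it passes the rule, otherwise None (deleted)."""
    if lat is None or lat in blacklist:
        return None
    return lat


BLACKLIST = {"blocked_type"}  # hypothetical blacklist entry

kept = apply_preset_rule("fruit", BLACKLIST)
dropped = apply_preset_rule("blocked_type", BLACKLIST)
```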
In the technical solution of this embodiment, the feature information of the input question-answering query statement is first extracted; the feature information is then input into the pre-established sequence labeling model and classification model to obtain the first LAT result output by the sequence labeling model and the second LAT result output by the classification model; finally, the LAT result corresponding to the query statement is determined according to the first LAT result and the second LAT result. Because the final LAT result is determined from the LAT results output separately by the sequence labeling model and the classification model, the accuracy of the determined LAT result can be improved.
Embodiment two
Fig. 2 is a flow chart of a method for determining the answer type of a question according to Embodiment two of the invention, which further explains the above embodiment. As shown in Fig. 2, the method includes the following steps.
Step 210: obtain the first training data source.
The first training data source may include: a plurality of random question-answering query statements, the feature information of those query statements, and the LAT results annotated for those query statements.
In this embodiment, the first training data source may contain several hundred thousand samples, for example 800,000. The LAT results of the random query statements may be annotated manually, or the LAT results annotated for the query statements may be labeled with BIE tags.
BIE tags are Begin, In and End tags. Labeling a LAT result means adding tags to the words of the LAT result. For example, suppose the question-answering query statement is "what public institutions can nursing majors apply to" and the corresponding LAT result is "public institution". Labeling "public institution" then adds the "B" tag to its first word (rendered literally as "cause") and the "E" tag to its last word (rendered literally as "unit"). Specifically, BIE tags are added to the LAT result annotated for each random query statement in the first training data source.
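The BIE labeling described above can be sketched as follows. The function name `bie_tags` is hypothetical, and two choices are assumptions not stated in the text: words outside the LAT result receive an "O" tag, and a single-word LAT result receives only a "B" tag. The example uses English tokens in place of the original segmented words.

```python
from typing import List


def bie_tags(words: List[str], lat_words: List[str]) -> List[str]:
    """Tag the first occurrence of lat_words inside words with B / I / E."""
    tags = ["O"] * len(words)  # "O" for non-LAT words is an assumption
    m = len(lat_words)
    if m == 0:
        return tags
    for start in range(len(words) - m + 1):
        if words[start:start + m] == lat_words:
            tags[start] = "B"                      # first word of the LAT result
            for i in range(start + 1, start + m - 1):
                tags[i] = "I"                      # inner words, if any
            if m > 1:
                tags[start + m - 1] = "E"          # last word of the LAT result
            break
    return tags


words = ["nursing", "majors", "can", "apply", "to", "what", "public", "institution"]
tags = bie_tags(words, ["public", "institution"])
```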
Step 220: based on the first training data source, train a model with the CRF algorithm to obtain the sequence labeling model.
Specifically, after the first training data source is obtained, the model is trained iteratively with the CRF algorithm, and the parameters of the CRF algorithm are adjusted continually during training until the model can accurately output the first LAT result, thereby obtaining the sequence labeling model.
Step 230: extract the feature information of the input question-answering query statement.
Step 240: input the feature information into the pre-established sequence labeling model and classification model, and obtain the first LAT result output by the sequence labeling model and the second LAT result output by the classification model.
Optionally, the first LAT result output by the sequence labeling model may be obtained as follows: splice the BIE-tagged output of the sequence labeling model into LAT results, sort the spliced LAT results by score, and select the highest-scoring LAT result as the first LAT result.
The BIE-tagged output may be spliced by joining adjacent words tagged "B", "I" and "E" into one LAT result. The score of a LAT result may be taken as the highest score among its words tagged "B", "I" and "E". For example, if in some LAT result the word tagged "B" scores 3 points, the word tagged "I" scores 3.5 points and the word tagged "E" scores 5 points, then the LAT result scores 5 points.
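The splicing and scoring rule can be sketched as below. The names `splice_candidates` and `first_lat_result` are hypothetical, each model output is assumed to be a (word, tag, score) triple, and joining words with a space is an assumption made for the English example.

```python
from typing import List, Optional, Tuple

Tagged = Tuple[str, str, float]  # (word, tag, score); this layout is an assumption


def splice_candidates(tagged: List[Tagged]) -> List[Tuple[str, float]]:
    """Join adjacent B/I/E words into candidates scored by their best word."""
    candidates: List[Tuple[str, float]] = []
    words: List[str] = []
    scores: List[float] = []
    for word, tag, score in tagged:
        if tag == "B":
            words, scores = [word], [score]      # start a new candidate
        elif tag == "I" and words:
            words.append(word)
            scores.append(score)
        elif tag == "E" and words:
            words.append(word)
            scores.append(score)
            candidates.append((" ".join(words), max(scores)))
            words, scores = [], []
        else:
            words, scores = [], []               # any other tag breaks the run
    return candidates


def first_lat_result(tagged: List[Tagged]) -> Optional[str]:
    """Pick the highest-scoring spliced candidate, or None if there is none."""
    candidates = splice_candidates(tagged)
    if not candidates:
        return None
    return max(candidates, key=lambda c: c[1])[0]


tagged = [
    ("red", "B", 1.0), ("fruit", "E", 2.0), ("or", "O", 0.1),
    ("public", "B", 3.0), ("service", "I", 3.5), ("institution", "E", 5.0),
]
candidates = splice_candidates(tagged)
best = first_lat_result(tagged)
```

The second candidate reproduces the worked example in the text: word scores of 3, 3.5 and 5 give a candidate score of 5.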
Step 250: determine the LAT result corresponding to the question-answering query statement according to the first LAT result and the second LAT result.
In the technical solution of this embodiment, the sequence labeling model is obtained by model training with the conditional random field (CRF) algorithm on the first training data source, which can improve the accuracy with which the sequence labeling model determines the first LAT result.
Embodiment three
Fig. 3 is a flow chart of a method for determining the answer type of a question according to Embodiment three of the invention, which further explains the above embodiment. As shown in Fig. 3, the method includes the following steps.
Step 310: obtain the second training data source.
The second training data source may include: a plurality of data pairs, each consisting of a random question-answering query statement and the LAT result of that query statement, together with the feature information of the query statement in each data pair.
Optionally, the second training data source may be obtained as follows. Obtain a plurality of random question-answering query statements. For each random query statement: input the current query statement into the sequence labeling model and obtain the LAT result of the current query statement output by the sequence labeling model; according to the search log, look up query statements whose clicked search results are the same as those of the current query statement; and take, as the second training data source, the data pair formed by the current query statement and its LAT result together with the data pairs formed by each query statement found and the LAT result of the current query statement. Each data pair includes the feature information of the corresponding query statement.
Specifically, the current random query statement is input into the sequence labeling model to obtain its LAT result. In addition, the search log is searched for other query statements that have the same user-clicked search results as the current query statement. If any are found, the data pair formed by the current query statement and its LAT result, and the data pairs formed by each query statement found and the LAT result of the current query statement, are taken as the second training data source. For example, suppose that after random query statement A is input into the sequence labeling model the output LAT result is a1, and that the search log shows that query statements B and C have the same user-clicked search results as A. Then the data pairs (A, a1), (B, a1) and (C, a1) are taken as the second training data source.
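The construction of the second training data source can be sketched as follows. The function name `build_second_training_pairs`, the `label` callable (standing in for the sequence labeling model) and the layout of the click log are hypothetical. Two readings are assumed: two query statements count as related when they share at least one clicked result, and the current query statement's own pair is kept even when no related query statement is found.

```python
from typing import Callable, Dict, List, Optional, Set, Tuple


def build_second_training_pairs(
    queries: List[str],
    label: Callable[[str], Optional[str]],
    clicked: Dict[str, Set[str]],
) -> List[Tuple[str, str]]:
    """Propagate each query's LAT result to queries sharing a clicked result."""
    pairs: List[Tuple[str, str]] = []
    for query in queries:
        lat = label(query)
        if lat is None:
            continue  # nothing to propagate without a LAT result
        pairs.append((query, lat))
        own_clicks = clicked.get(query, set())
        for other, results in clicked.items():
            if other != query and results & own_clicks:
                pairs.append((other, lat))
    return pairs


# Toy click log mirroring the A / B / C example; document ids are made up.
CLICK_LOG = {
    "A": {"doc1"},
    "B": {"doc1"},
    "C": {"doc1", "doc2"},
    "D": {"doc3"},
}
pairs = build_second_training_pairs(["A"], lambda q: "a1", CLICK_LOG)
```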
Optionally, the LAT results in the above data pairs are mapped to categories in a pre-built LAT system, so that the category corresponding to each mapped LAT result exists in the LAT system. The LAT system includes first-level categories or second-level categories.
For example, suppose some LAT result is "public institution" and the LAT system has no "public institution" category, but the category in the LAT system closest to "public institution" is "organization"; then "public institution" is mapped to "organization".
Step 320: based on the second training data source, train a model with the CNN algorithm to obtain the classification model.
A CNN is a feedforward neural network that includes convolutional layers and pooling layers. Specifically, after the second training data source is obtained, the model is trained iteratively with the CNN algorithm, and the parameters of the CNN algorithm are adjusted continually during training until the model can accurately output the second LAT result, thereby obtaining the classification model.
Step 330: extract the feature information of the input question-answering query statement.
Step 340: input the feature information into the pre-established sequence labeling model and classification model, and obtain the first LAT result output by the sequence labeling model and the second LAT result output by the classification model.
Step 350: determine the LAT result corresponding to the question-answering query statement according to the first LAT result and the second LAT result.
In the technical solution of this embodiment, the classification model is obtained by model training with the CNN algorithm on the second training data source, which can improve the accuracy with which the classification model determines the LAT result.
Embodiment four
Fig. 4 is a flow chart of a method for determining the answer type of a question according to Embodiment four of the invention, which further explains the above embodiment. As shown in Fig. 4, the method includes the following steps.
Step 410: obtain the third training data source.
The third training data source includes: a plurality of data pairs, each consisting of two different LAT results of a random question-answering query statement, and the correct LAT result annotated for the query statement corresponding to each data pair.
Optionally, the third training data source may be obtained as follows. Obtain a plurality of random question-answering query statements. For each random query statement: input the current query statement into the sequence labeling model and the classification model separately; if the LAT results output by the two models are inconsistent, take the LAT results output by the sequence labeling model and the classification model, together with the correct LAT result annotated in advance for the current query statement, as the third training data source.
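Collecting the disagreement cases can be sketched as below. The function name `build_third_training_data` is hypothetical, the two models are represented by plain callables, and `gold` is an assumed dictionary holding the correct LAT result annotated in advance for each query statement.

```python
from typing import Callable, Dict, List, Tuple


def build_third_training_data(
    queries: List[str],
    seq_model: Callable[[str], str],
    cls_model: Callable[[str], str],
    gold: Dict[str, str],
) -> List[Tuple[str, str, str]]:
    """Keep (first LAT, second LAT, correct LAT) only where the models disagree."""
    samples: List[Tuple[str, str, str]] = []
    for query in queries:
        first, second = seq_model(query), cls_model(query)
        if first != second:
            samples.append((first, second, gold[query]))
    return samples


# Toy model outputs and gold labels for two example queries.
SEQ_OUT = {"q1": "public institution", "q2": "fruit"}
CLS_OUT = {"q1": "organization", "q2": "fruit"}
GOLD = {"q1": "public institution", "q2": "fruit"}

samples = build_third_training_data(
    ["q1", "q2"], SEQ_OUT.__getitem__, CLS_OUT.__getitem__, GOLD
)
```

Only `q1` yields a sample, because the two models already agree on `q2`.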
Step 420: based on the third training data source, train a model with the GBDT algorithm to obtain the decision tree model.
GBDT is an iterative decision tree algorithm composed of multiple decision trees; the results of all the trees are summed to give the final output. Specifically, after the third training data source is obtained, the model is trained iteratively with the GBDT algorithm, and the parameters of the GBDT algorithm are adjusted continually during training until the model can accurately output the LAT result, thereby obtaining the decision tree model.
Step 430: extract the feature information of the input question-answering query statement.
Step 440: input the feature information into the pre-established sequence labeling model and classification model, and obtain the first LAT result output by the sequence labeling model and the second LAT result output by the classification model.
Step 450: input the first LAT result and the second LAT result into the decision tree model, obtain the integrated LAT result output by the decision tree model, and take the integrated LAT result as the LAT result corresponding to the question-answering query statement.
In the technical solution of this embodiment, the decision tree model is obtained by model training with the GBDT algorithm on the third training data source, which can improve the accuracy with which the decision tree model determines the LAT result.
Optionally, Fig. 5a is a flow chart of another method for determining the answer type of a question according to Embodiment four of the invention, which further explains the above embodiment. As shown in Fig. 5a, the method includes the following steps.
Step 510: extract the feature information of the input question-answering query statement.
Step 520: input the feature information into the pre-established sequence labeling model, and obtain the first LAT result output by the sequence labeling model.
Step 530: input the feature information into the pre-established classification model, and obtain the second LAT result output by the classification model.
Step 540: input the first LAT result and the second LAT result into the decision tree model, obtain the integrated LAT result output by the decision tree model, and take the integrated LAT result as the LAT result corresponding to the question-answering query statement.
Step 550: determine whether the LAT result corresponding to the question-answering query statement satisfies the preset rule; if so, output the LAT result; if not, delete the LAT result.
Fig. 5b is a usage scenario flow chart of a method for determining the answer type of a question according to Embodiment four of the invention. When the answer type of a question-answering query statement is determined on the basis of techniques such as feature extraction, sequence labeling and classification, sequence labeling can be used to mark out the LAT result of a query statement, and the marked LAT result is then mapped into the LAT system. When the LAT result does not appear in the query statement, classification can be used to map the LAT result to the second-level system and the first-level system of a three-level LAT system. The second-level system includes 3,000 classes and the first-level system includes 34 classes. As shown in Fig. 5b, the method includes the following steps.
Step 501: extract the feature information of the input question-answering query statement using feature extraction;
Step 502: obtain the search results corresponding to the question-answering query statement;
Step 503: obtain the LAT result of the question-answering query statement using sequence labeling and classification;
Step 504: determine the target search results among the search results according to the LAT result;
Step 505: output and display the target search results.
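Step 504 can be sketched as a filter that keeps the search results containing an entity of the determined answer type. The function name `select_target_results`, the `entity_types` callable and the toy entity lexicon are all hypothetical; the text does not specify how entities in a search result are recognized or typed.

```python
from typing import Callable, List, Set


def select_target_results(
    results: List[str],
    lat: str,
    entity_types: Callable[[str], Set[str]],
) -> List[str]:
    """Keep the results that mention at least one entity whose type is lat."""
    return [text for text in results if lat in entity_types(text)]


# Toy entity lexicon (assumption): maps a surface word to its entity type.
TOY_ENTITY_TYPES = {"apple": "fruit", "banana": "fruit", "milk": "drink"}


def toy_entity_types(text: str) -> Set[str]:
    return {
        TOY_ENTITY_TYPES[word]
        for word in text.lower().split()
        if word in TOY_ENTITY_TYPES
    }


results = [
    "Pregnant women can eat apple and banana",
    "Milk is rich in calcium",
    "Exercise tips for pregnancy",
]
targets = select_target_results(results, "fruit", toy_entity_types)
```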
Embodiment five
Fig. 6 is a schematic structural diagram of an apparatus for determining the answer type of a question according to Embodiment five of the invention. As shown in Fig. 6, the apparatus includes a feature information extraction module 610, a model input module 620 and a LAT result determination module 630.
Feature information extraction module 610, configured to extract the feature information of the input question-answering query statement;
Model input module 620, configured to input the feature information into the pre-established sequence labeling model and classification model, and to obtain the first lexical answer type (LAT) result output by the sequence labeling model and the second LAT result output by the classification model;
LAT result determination module 630, configured to determine the LAT result corresponding to the question-answering query statement according to the first LAT result and the second LAT result.
Optionally, the LAT result determination module 630 is further configured to:
input the first LAT result and the second LAT result into the decision tree model to obtain the integrated LAT result output by the decision tree model, and take the integrated LAT result as the LAT result corresponding to the question-answering query statement;
wherein the integrated LAT result is one of the first LAT result and the second LAT result, or the intersection of the first LAT result and the second LAT result, or the union of the first LAT result and the second LAT result.
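The four possible forms of the integrated LAT result can be illustrated with a short sketch. It assumes that each LAT result is represented as a set of type labels and that the decision tree model's output has already been reduced to a mode name; the function `integrate_lat` and the mode names are hypothetical and are not part of the described apparatus.

```python
def integrate_lat(first, second, mode):
    """Combine two LAT results (sets of type labels) according to `mode`.

    `mode` stands in for the decision tree model's output; the names
    "first", "second", "intersection", and "union" are illustrative only.
    """
    if mode == "first":
        return set(first)
    if mode == "second":
        return set(second)
    if mode == "intersection":
        return set(first) & set(second)
    if mode == "union":
        return set(first) | set(second)
    raise ValueError(f"unknown integration mode: {mode}")


first_lat = {"poet", "person"}
second_lat = {"person"}
print(integrate_lat(first_lat, second_lat, "intersection"))
```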
Optionally, the apparatus further includes:
a preset rule determination module, configured to determine whether the LAT result corresponding to the question-answering query statement satisfies the preset rules; if so, the LAT result is output; if not, the LAT result is deleted.
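The preset rule check may be sketched as below. This passage does not list the preset rules themselves, so the three rules used here (non-empty result, a cap on the number of types, and membership in a known type list or literal appearance in the query) are purely hypothetical placeholders, as are the function names and the `max_types` parameter.

```python
def passes_preset_rules(lat_result, query, known_types, max_types=3):
    """Return True if the LAT result satisfies every (hypothetical) rule."""
    if not lat_result:
        return False  # assumed rule 1: the result must be non-empty
    if len(lat_result) > max_types:
        return False  # assumed rule 2: too many types is treated as unreliable
    # assumed rule 3: each type is a known type or appears literally in the query
    return all(t in known_types or t in query for t in lat_result)


def filter_lat(lat_result, query, known_types):
    """Output the LAT result if it passes the rules; otherwise delete it (None)."""
    if passes_preset_rules(lat_result, query, known_types):
        return lat_result
    return None


known = {"person", "place"}
print(filter_lat({"poet"}, "which poet wrote this", known))
```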
Optionally, the apparatus further includes:
a first training data source acquisition module, configured to obtain a first training data source; the first training data source includes a plurality of random question-answering query statements, the feature information of the plurality of random question-answering query statements, and the LAT results annotated for the plurality of random question-answering query statements;
a sequence labeling model acquisition module, configured to perform model training based on the first training data source using the conditional random field (CRF) algorithm, to obtain the sequence labeling model.
Optionally, the apparatus further includes:
a labeling processing module, configured to convert the LAT results annotated for the plurality of random question-answering query statements into BIE labels;
Correspondingly, obtaining the first LAT result output by the sequence labeling model includes:
splicing the BIE-labeled LAT results output by the sequence labeling model, ranking the multiple spliced LAT results by score, and selecting the LAT result with the highest score as the first LAT result.
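The splicing and ranking step may be sketched as follows. The text does not define how a spliced LAT result is scored, so this sketch assumes one score per token and uses the mean token score of a span; it also assumes that any tag other than B, I, or E marks a token outside a LAT, and that a single-token LAT is tagged E alone. The function names `splice_bie` and `best_lat` are hypothetical.

```python
def splice_bie(tokens, tags, scores, joiner=""):
    """Splice B/I/E-tagged tokens into candidate LAT spans with a mean score.

    Assumptions: tags other than B, I, E (for example "O") are outside a span,
    and an "E" with no preceding "B" is a single-token span.
    """
    candidates = []
    current, current_scores = [], []
    for token, tag, score in zip(tokens, tags, scores):
        if tag == "B":
            current, current_scores = [token], [score]  # start a new span
        elif tag == "I" and current:
            current.append(token)
            current_scores.append(score)
        elif tag == "E":
            current.append(token)
            current_scores.append(score)
            mean = sum(current_scores) / len(current_scores)
            candidates.append((joiner.join(current), mean))
            current, current_scores = [], []
        else:
            current, current_scores = [], []  # outside tag or stray "I" ends the span
    return candidates


def best_lat(tokens, tags, scores, joiner=""):
    """Rank the spliced candidates by score and return the top text, or None."""
    candidates = splice_bie(tokens, tags, scores, joiner)
    if not candidates:
        return None
    return max(candidates, key=lambda c: c[1])[0]


tokens = ["which", "movie", "star", "won"]
tags = ["O", "B", "E", "O"]
scores = [0.9, 0.8, 0.6, 0.9]
print(best_lat(tokens, tags, scores, joiner=" "))
```

The `joiner` parameter defaults to the empty string because segmented Chinese words are concatenated without spaces; the English example passes a space instead.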
Optionally, the apparatus further includes:
a second training data source acquisition module, configured to obtain a second training data source; the second training data source includes a plurality of data pairs, each consisting of a random question-answering query statement and the LAT result of that random question-answering query statement, and the feature information of the random question-answering query statement in each data pair;
a classification model acquisition module, configured to perform model training based on the second training data source using the convolutional neural network (CNN) algorithm, to obtain the classification model.
Optionally, the second training data source acquisition module is further configured to:
obtain a plurality of random question-answering query statements;
for each random question-answering query statement: input the current random question-answering query statement into the sequence labeling model to obtain the LAT result of the current random question-answering query statement output by the sequence labeling model; search the search log for query statements whose clicked search result is the same as that of the current random question-answering query statement; and take, as the second training data source, the data pair consisting of the current random question-answering query statement and its LAT result, together with the data pairs each consisting of a found query statement and the LAT result of the current random question-answering query statement; wherein each data pair includes the feature information of the corresponding query statement.
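The construction of the second training data source may be sketched as follows. The sketch assumes the search log is available as `(query, clicked_result_id)` tuples and that the sequence labeling model can be called as a function returning one LAT label or `None`; both are assumptions, as are the function name `build_classifier_pairs` and the choice to skip queries with no LAT. Feature information is omitted for brevity.

```python
from collections import defaultdict


def build_classifier_pairs(seed_queries, label_lat, click_log):
    """Build (query, LAT) pairs, expanding each seed query through shared clicks.

    seed_queries: iterable of query strings.
    label_lat:    callable standing in for the sequence labeling model.
    click_log:    iterable of (query, clicked_result_id) tuples (assumed format).
    """
    queries_by_result = defaultdict(set)
    results_by_query = defaultdict(set)
    for query, result_id in click_log:
        queries_by_result[result_id].add(query)
        results_by_query[query].add(result_id)

    pairs = set()
    for seed in seed_queries:
        lat = label_lat(seed)
        if lat is None:
            continue  # assumption: skip queries in which no LAT was labeled
        pairs.add((seed, lat))
        # Queries that led to the same clicked result inherit the seed's LAT.
        for result_id in results_by_query[seed]:
            for other in queries_by_result[result_id]:
                pairs.add((other, lat))
    return sorted(pairs)


click_log = [
    ("who wrote hamlet", "doc1"),
    ("author of hamlet", "doc1"),
    ("capital of france", "doc2"),
]
pairs = build_classifier_pairs(
    ["who wrote hamlet"],
    lambda q: "person" if q.startswith("who") else None,
    click_log,
)
print(pairs)
```

In this toy log, "author of hamlet" contains no explicit type word, yet it receives the label "person" because it shares a clicked result with the labeled query. This is how the classification model can obtain training examples for queries in which the LAT does not appear literally.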
Optionally, the apparatus further includes:
a third training data source acquisition module, configured to obtain a third training data source; the third training data source includes a plurality of data pairs, each consisting of two different LAT results of a random question-answering query statement, and the correct LAT result annotated for the random question-answering query statement corresponding to each data pair;
a decision tree model acquisition module, configured to perform model training based on the third training data source using the gradient boosting decision tree (GBDT) algorithm, to obtain the decision tree model.
Optionally, the third training data source acquisition module is further configured to:
obtain a plurality of random question-answering query statements;
for each random question-answering query statement: input the current random question-answering query statement into the sequence labeling model and the classification model respectively; if the LAT results output by the sequence labeling model and the classification model are inconsistent, take the LAT results output by the sequence labeling model and the classification model, together with the correct LAT result annotated in advance for the current random question-answering query statement, as the third training data source.
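A minimal sketch of collecting the third training data source is given below. It assumes the two models can be called as plain functions and that the pre-annotated correct LAT results are available in a dictionary keyed by query; the function name `build_decision_samples` and the sample field names are hypothetical.

```python
def build_decision_samples(queries, sequence_model, classifier, gold_lat):
    """Keep only the queries on which the two models disagree.

    sequence_model, classifier: callables standing in for the two models.
    gold_lat: dict mapping each query to its pre-annotated correct LAT result.
    """
    samples = []
    for query in queries:
        first = sequence_model(query)
        second = classifier(query)
        if first != second:
            samples.append(
                {"query": query, "first": first, "second": second, "gold": gold_lat[query]}
            )
    return samples


queries = ["which poet wrote this", "who wrote this"]
gold = {"which poet wrote this": "poet", "who wrote this": "person"}
samples = build_decision_samples(
    queries,
    lambda q: "poet" if "poet" in q else "person",  # toy sequence labeling model
    lambda q: "person",                              # toy classification model
    gold,
)
print(samples)
```

Only disagreements are kept because, when both models agree, there is nothing for the decision tree model to arbitrate.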
Optionally, the feature information includes:
at least one of word segmentation information, part-of-speech information, and dependency information; wherein the dependency information includes the dependency word and the part of speech of the dependency word.
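One possible in-memory representation of this feature information is sketched below. The class `TokenFeatures`, its field names, and the part-of-speech tag values are hypothetical; the text only states which kinds of information are included, not how they are encoded.

```python
from dataclasses import dataclass


@dataclass
class TokenFeatures:
    word: str      # word segmentation information: one segmented word
    pos: str       # part-of-speech information for that word
    dep_word: str  # dependency information: the dependency word
    dep_pos: str   # dependency information: part of speech of the dependency word


def to_feature_dicts(tokens):
    """Flatten token features into one dict per token."""
    return [
        {"word": t.word, "pos": t.pos, "dep_word": t.dep_word, "dep_pos": t.dep_pos}
        for t in tokens
    ]


# Illustrative values only; the tag set ("r", "n", "v") is an assumption.
query_features = [
    TokenFeatures("which", "r", "poet", "n"),
    TokenFeatures("poet", "n", "wrote", "v"),
    TokenFeatures("wrote", "v", "wrote", "v"),
]
print(to_feature_dicts(query_features)[0])
```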
Embodiment six
Fig. 7 is a structural schematic diagram of a computer device provided by Embodiment 6 of the present invention. As shown in Fig. 7, the computer device provided by this embodiment includes a processor 71 and a memory 72. The computer device may have one or more processors; Fig. 7 takes one processor 71 as an example. The processor 71 and the memory 72 in the computer device may be connected by a bus or in other ways; Fig. 7 takes connection by a bus as an example.
In this embodiment, the apparatus for determining the question answer type provided by the above embodiment is integrated in the processor 71 of the computer device. In addition, the memory 72 in the computer device serves as a computer-readable storage medium and can be used to store one or more programs. The programs may be software programs, computer-executable programs, and modules, such as the program instructions/modules corresponding to the method for determining the question answer type in the embodiments of the present invention. By running the software programs, instructions, and modules stored in the memory 72, the processor 71 executes the various functional applications and data processing of the device, that is, implements the method for determining the question answer type in the above method embodiments.
The memory 72 may include a program storage area and a data storage area. The program storage area can store an operating system and the application program required by at least one function; the data storage area can store data created through use of the device, and the like. In addition, the memory 72 may include high-speed random access memory and may also include non-volatile memory, for example at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. In some examples, the memory 72 may further include memory located remotely from the processor 71, and such remote memory may be connected to the device through a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
By running the programs stored in the memory 72, the processor 71 executes various functional applications and data processing, for example implementing the method for determining the question answer type provided by the embodiments of the present invention.
Embodiment seven
Embodiment 7 of the present invention further provides a computer-readable storage medium on which a computer program is stored. When executed by a processor, the program implements the method for determining the question answer type provided by the embodiments of the present invention.
Of course, the computer program stored on the computer-readable storage medium provided by the embodiments of the present invention is not limited to the method operations described above, and can also perform related operations in the method for determining the question answer type provided by any embodiment of the present invention.
The computer storage medium of the embodiments of the present invention may use any combination of one or more computer-readable media. A computer-readable medium may be a computer-readable signal medium or a computer-readable storage medium. A computer-readable storage medium may be, for example but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples (a non-exhaustive list) of computer-readable storage media include: an electrical connection with one or more wires, a portable computer diskette, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In this document, a computer-readable storage medium may be any tangible medium that contains or stores a program, and the program may be used by or in connection with an instruction execution system, apparatus, or device.
A computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, in which computer-readable program code is carried. Such a propagated data signal may take many forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium, and that medium can send, propagate, or transmit a program for use by or in connection with an instruction execution system, apparatus, or device.
The program code contained on a computer-readable medium may be transmitted using any suitable medium, including but not limited to wireless, wire, optical cable, RF, and the like, or any suitable combination of the above.
The computer program code for performing the operations of the present invention may be written in one or more programming languages or a combination thereof. The programming languages include object-oriented programming languages such as Java, Smalltalk, and C++, and also include conventional procedural programming languages such as the "C" language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. Where a remote computer is involved, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
Note that the above are only preferred embodiments of the present invention and the technical principles applied. Those skilled in the art will understand that the present invention is not limited to the specific embodiments described herein, and that various obvious changes, readjustments, and substitutions can be made by those skilled in the art without departing from the protection scope of the present invention. Therefore, although the present invention has been described in further detail through the above embodiments, it is not limited to those embodiments and may include other equivalent embodiments without departing from the inventive concept; the scope of the present invention is determined by the scope of the appended claims.
Claims (13)
1. A method for determining a question answer type, characterized by comprising:
extracting feature information of an input question-answering query statement;
inputting the feature information into a pre-established sequence labeling model and a pre-established classification model, to obtain a first question answer type (LAT) result output by the sequence labeling model and a second LAT result output by the classification model;
determining the LAT result corresponding to the question-answering query statement according to the first LAT result and the second LAT result.
2. The method according to claim 1, characterized in that determining the LAT result corresponding to the question-answering query statement according to the first LAT result and the second LAT result comprises:
inputting the first LAT result and the second LAT result into a decision tree model to obtain an integrated LAT result output by the decision tree model, and taking the integrated LAT result as the LAT result corresponding to the question-answering query statement;
wherein the integrated LAT result is one of the first LAT result and the second LAT result, or the intersection of the first LAT result and the second LAT result, or the union of the first LAT result and the second LAT result.
3. The method according to claim 1, characterized in that, after determining the LAT result corresponding to the question-answering query statement, the method further comprises:
determining whether the LAT result corresponding to the question-answering query statement satisfies preset rules;
if so, outputting the LAT result; if not, deleting the LAT result.
4. The method according to claim 1, characterized in that, before extracting the feature information of the input question-answering query statement, the method further comprises:
obtaining a first training data source; the first training data source includes a plurality of random question-answering query statements, the feature information of the plurality of random question-answering query statements, and the LAT results annotated for the plurality of random question-answering query statements;
performing model training based on the first training data source using the conditional random field (CRF) algorithm, to obtain the sequence labeling model.
5. The method according to claim 4, characterized in that, before performing model training based on the first training data source using the conditional random field (CRF) algorithm, the method further comprises:
converting the LAT results annotated for the plurality of random question-answering query statements into BIE labels;
correspondingly, obtaining the first LAT result output by the sequence labeling model comprises:
splicing the BIE-labeled LAT results output by the sequence labeling model, ranking the multiple spliced LAT results by score, and selecting the LAT result with the highest score as the first LAT result.
6. The method according to claim 1, characterized in that, before extracting the feature information of the input question-answering query statement, the method further comprises:
obtaining a second training data source; the second training data source includes a plurality of data pairs, each consisting of a random question-answering query statement and the LAT result of that random question-answering query statement, and the feature information of the random question-answering query statement in each data pair;
performing model training based on the second training data source using the convolutional neural network (CNN) algorithm, to obtain the classification model.
7. The method according to claim 6, characterized in that obtaining the second training data source comprises:
obtaining a plurality of random question-answering query statements;
for each random question-answering query statement: inputting the current random question-answering query statement into the sequence labeling model to obtain the LAT result of the current random question-answering query statement output by the sequence labeling model; searching the search log for query statements whose clicked search result is the same as that of the current random question-answering query statement; and taking, as the second training data source, the data pair consisting of the current random question-answering query statement and its LAT result, together with the data pairs each consisting of a found query statement and the LAT result of the current random question-answering query statement; wherein each data pair includes the feature information of the corresponding query statement.
8. The method according to claim 2, characterized in that, before extracting the feature information of the input question-answering query statement, the method further comprises:
obtaining a third training data source; the third training data source includes a plurality of data pairs, each consisting of two different LAT results of a random question-answering query statement, and the correct LAT result annotated for the random question-answering query statement corresponding to each data pair;
performing model training based on the third training data source using the gradient boosting decision tree (GBDT) algorithm, to obtain the decision tree model.
9. The method according to claim 8, characterized in that obtaining the third training data source comprises:
obtaining a plurality of random question-answering query statements;
for each random question-answering query statement: inputting the current random question-answering query statement into the sequence labeling model and the classification model respectively; if the LAT results output by the sequence labeling model and the classification model are inconsistent, taking the LAT results output by the sequence labeling model and the classification model, together with the correct LAT result annotated in advance for the current random question-answering query statement, as the third training data source.
10. The method according to any one of claims 1-9, characterized in that the feature information includes:
at least one of word segmentation information, part-of-speech information, and dependency information; wherein the dependency information includes the dependency word and the part of speech of the dependency word.
11. An apparatus for determining a question answer type, characterized by comprising:
a feature information extraction module, configured to extract feature information of an input question-answering query statement;
a model input module, configured to input the feature information into a pre-established sequence labeling model and a pre-established classification model, to obtain a first question answer type (LAT) result output by the sequence labeling model and a second LAT result output by the classification model;
an LAT result determination module, configured to determine the LAT result corresponding to the question-answering query statement according to the first LAT result and the second LAT result.
12. A computer device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the method according to any one of claims 1-10 when executing the program.
13. A computer-readable storage medium on which a computer program is stored, characterized in that the program implements the method according to any one of claims 1-10 when executed by a processor.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810695686.9A CN108959529A (en) | 2018-06-29 | 2018-06-29 | Determination method, apparatus, equipment and the storage medium of problem answers type |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108959529A (en) | 2018-12-07 |
Family
ID=64487933
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810695686.9A Pending CN108959529A (en) | 2018-06-29 | 2018-06-29 | Determination method, apparatus, equipment and the storage medium of problem answers type |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108959529A (en) |
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104471568A (en) * | 2012-07-02 | 2015-03-25 | 微软公司 | Learning-based processing of natural language questions |
CN103309926A (en) * | 2013-03-12 | 2013-09-18 | 中国科学院声学研究所 | Chinese and English-named entity identification method and system based on conditional random field (CRF) |
CN103914543A (en) * | 2014-04-03 | 2014-07-09 | 北京百度网讯科技有限公司 | Search result displaying method and device |
CN105468713A (en) * | 2015-11-19 | 2016-04-06 | 西安交通大学 | Multi-model fused short text classification method |
CN106649786A (en) * | 2016-12-28 | 2017-05-10 | 北京百度网讯科技有限公司 | Deep question answer-based answer retrieval method and device |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109800293A (en) * | 2018-12-20 | 2019-05-24 | 出门问问信息科技有限公司 | A kind of method, apparatus and electronic equipment obtaining answer based on Question Classification |
CN111382247A (en) * | 2018-12-29 | 2020-07-07 | 深圳市优必选科技有限公司 | Content pushing optimization method, content pushing optimization device and electronic equipment |
CN111382247B (en) * | 2018-12-29 | 2023-07-14 | 深圳市优必选科技有限公司 | Content pushing optimization method, content pushing optimization device and electronic equipment |
CN111858899A (en) * | 2020-07-31 | 2020-10-30 | 中国工商银行股份有限公司 | Statement processing method, device, system and medium |
CN111858899B (en) * | 2020-07-31 | 2023-09-15 | 中国工商银行股份有限公司 | Statement processing method, device, system and medium |
CN112015876A (en) * | 2020-08-27 | 2020-12-01 | 北京智通云联科技有限公司 | Time analysis method and device, electronic equipment and storage medium |
CN113779205A (en) * | 2020-09-03 | 2021-12-10 | 北京沃东天骏信息技术有限公司 | Intelligent response method and device |
CN113779205B (en) * | 2020-09-03 | 2024-05-24 | 北京沃东天骏信息技术有限公司 | Intelligent response method and device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||