CN110019736A - Question-answer matching method, system, equipment and storage medium based on language model - Google Patents

Question-answer matching method, system, equipment and storage medium based on language model

Info

Publication number
CN110019736A
CN110019736A (application CN201711482842.5A)
Authority
CN
China
Prior art keywords
answer
question
language model
model
language
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201711482842.5A
Other languages
Chinese (zh)
Other versions
CN110019736B (en)
Inventor
王颖帅
李晓霞
苗诗雨
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jingdong Century Trading Co Ltd
Beijing Jingdong Shangke Information Technology Co Ltd
Original Assignee
Beijing Jingdong Century Trading Co Ltd
Beijing Jingdong Shangke Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Jingdong Century Trading Co Ltd, Beijing Jingdong Shangke Information Technology Co Ltd filed Critical Beijing Jingdong Century Trading Co Ltd
Priority to CN201711482842.5A priority Critical patent/CN110019736B/en
Publication of CN110019736A publication Critical patent/CN110019736A/en
Application granted granted Critical
Publication of CN110019736B publication Critical patent/CN110019736B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/332Query formulation
    • G06F16/3329Natural language query formulation or dialogue systems

Abstract

The invention discloses a question-answer matching method, system, equipment and storage medium based on a language model. The method comprises the steps of: S1, after a question is received, obtaining from a question database a target question that matches the received question, and then obtaining from an answer database each item of answer data corresponding to the target question; S2, processing the answer data with a language model to generate corresponding text features and behavioral features, the behavioral features being used to characterize the state and attributes of the answer data; S3, computing over the text features and the behavioral features with a decision-tree model, and predicting the ranking of the answer data from the result. Through the language model and the decision-tree model, the present invention can quickly and accurately locate the user's need and rank the answer data intelligently, so as to surface the answers the user most wants to see and improve the user experience.

Description

Question-answer matching method, system, equipment and storage medium based on language model
Technical field
The present invention relates to the field of computer technology, and in particular to a question-answer matching method, system, equipment and storage medium based on a language model.
Background art
In today's information age, computers have gradually become a universal medium of information. With the development of artificial intelligence, it has become possible to make computers understand language and produce a useful ordering of replies to a user's dialogue. In the prior art, question-answering systems generally use one of the following two methods to match questions with answers:
(1) Rule-based matching of questions and answers
This method mainly relies on string-matching lookups and regular expressions. Complicated regular-expression templates are written to simulate the keywords of each context and perform matching. If no matching rule exists for a given question sentence, an unreliable ordering is obtained;
(2) Retrieval-based corpus systems that compute word vectors
This method forms word vectors by segmenting sentences and then orders the answers by similarity. Because the algorithm does not really learn the internal logical relationships of the language, an answerer may simply copy the question into the answer and still obtain a very high similarity score, which clearly cannot guarantee a good user experience.
Summary of the invention
The technical problem to be solved by the present invention is to overcome the defects of the prior art that rule-based answering methods require complicated regular expressions and may give unreliable answers, and that retrieval-based corpus systems, which compute word-vector similarity between question and answer sentences, do not exploit the internal logical relationships of the language and may likewise give unreliable answers. To this end, a question-answer matching method, system, equipment and storage medium based on a language model are provided.
The present invention solves the above technical problem through the following technical solutions:
The present invention provides a question-answer matching method based on a language model, characterized in that it comprises:
S1, after a question is received, obtaining from a question database a target question that matches the received question, and then obtaining from an answer database each item of answer data corresponding to the target question;
S2, processing the answer data with a language model to generate corresponding text features and behavioral features, the behavioral features being used to characterize the state and attributes of the answer data, wherein the behavioral features may be at least one of: the answer state, the client type of the answering user, whether the question was answered anonymously, the answer type, the time elapsed since the question was created, the time elapsed since the answer was last modified, the number of likes on the answer, the number of questions the user has accepted, the number of question messages the user has accepted, the number of question messages the user has clicked, the number of answers the user has given, the number of likes the user has received, and the number of the user's best answers.
S3, computing over the text features and the behavioral features with a decision-tree model, and predicting the ranking of the answer data from the result.
Preferably, the decision-tree model comprises a GBDT (Gradient Boosting Decision Tree) model.
Preferably, the language model comprises an N-Gram model (an N-gram statistical model), a neural network language model and a recurrent neural network, and step S2 specifically comprises:
generating corresponding answer word vectors from the answer data using the N-Gram model;
training the answer word vectors with the neural network language model;
training the output of the neural network language model with the recurrent neural network to obtain the text features and the behavioral features.
Preferably, the N-Gram model comprises a Skip-Gram model (a language-processing model that predicts the context given an input word).
Preferably, after step S3 the question-answer matching method further comprises: assessing the AUC (Area Under Curve, the area under the ROC (receiver operating characteristic) curve) index of the ranking results and/or assessing the exposure click-through rate of the ranking results.
Preferably, after step S3 the question-answer matching method further comprises: pushing to the user several items of answer data that rank near the top of the ranking results.
Preferably, before the step of pushing the several top-ranked items of answer data to the user, the question-answer matching method further comprises: subjecting the ranking results to an ABTest (A/B test).
Preferably, after the step of pushing the several top-ranked items of answer data to the user, the question-answer matching method further comprises:
receiving a selection instruction from the user;
choosing a best answer from the ranking results according to the selection instruction;
marking the best answer with a label.
Preferably, step S1 specifically comprises:
after the question is received, processing the question with the language model to generate a question-sentence word vector for the received question;
calculating the similarity between the question-sentence word vector and the word vector corresponding to each question in the question database;
taking the question in the question database whose word vector has the largest similarity to the question-sentence word vector as the target question;
returning each item of answer data in the answer database corresponding to the target question.
The present invention also provides a question-answer matching system based on a language model, characterized in that it comprises: an acquisition module, a language module and a decision-tree module;
the acquisition module is configured, after a question is received, to obtain from the question database a target question that matches the received question, and is further configured to obtain from the answer database each item of answer data corresponding to the target question;
the language module is configured to process the answer data with the language model and generate corresponding text features and behavioral features, the behavioral features being used to characterize the state and attributes of the answer data;
the decision-tree module is configured to compute over the text features and the behavioral features with the decision-tree model and to predict the ranking of the answer data from the result.
Preferably, the decision-tree model comprises a GBDT model.
Preferably, the language model comprises an N-Gram model, a neural network language model and a recurrent neural network;
the N-Gram model is configured to generate corresponding answer word vectors from the answer data;
the neural network language model is configured to train the answer word vectors;
the recurrent neural network is configured to train the output of the neural network language model to obtain the text features and the behavioral features.
Preferably, the N-Gram model comprises a Skip-Gram model.
Preferably, the question-answer matching system based on the language model further comprises an evaluation module; the evaluation module is configured, after the decision-tree model predicts the ranking results, to assess the AUC index of the ranking results and/or to assess the exposure click-through rate of the ranking results.
Preferably, the question-answer matching system based on the language model further comprises a push module; the push module is configured, after the decision-tree model predicts the ranking results, to push to the user several items of answer data that rank near the top of the ranking results.
Preferably, the question-answer matching system based on the language model further comprises an ABTest module; the ABTest module is configured to subject the ranking results to an ABTest and then to call the push module.
Preferably, the question-answer matching system based on the language model further comprises a label module; after the push module has been called, the label module is configured to receive a selection instruction from the user, choose a best answer from the ranking results according to the selection instruction, and mark the best answer with a label.
Preferably, the acquisition module specifically comprises: a word-vector generation unit, a similarity calculation unit and a data return unit;
the word-vector generation unit is configured, after the question is received, to process the question with the language model and generate a question-sentence word vector for the received question;
the similarity calculation unit is configured to calculate the similarity between the question-sentence word vector and the word vector corresponding to each question in the question database;
the data return unit is configured to take the question in the question database whose word vector has the largest similarity to the question-sentence word vector as the target question, and is further configured to return each item of answer data in the answer database corresponding to the target question.
The present invention also provides an electronic device comprising a memory, a processor and a computer program stored in the memory and runnable on the processor, characterized in that the processor, when executing the program, implements the above question-answer matching method based on the language model.
The present invention also provides a computer-readable storage medium on which a computer program is stored, characterized in that the program, when executed by a processor, implements the steps of the above question-answer matching method based on the language model.
The positive effect of the present invention is that: the present invention proposes a question-answer matching method, system, equipment and storage medium based on a language model, which extract features with a language model and rank by prediction with a decision-tree model according to those features. The user's need can thus be located quickly and accurately, the answer data can be ranked intelligently, the answers the user most wants to see can be surfaced, and the user experience is improved.
Brief description of the drawings
Fig. 1 is a flowchart of the question-answer matching method based on the language model of Embodiment 1 of the present invention.
Fig. 2 is a flowchart of step S101 of the question-answer matching method based on the language model of Embodiment 1 of the present invention.
Fig. 3 is a flowchart of step S102 of the question-answer matching method based on the language model of Embodiment 1 of the present invention.
Fig. 4 is a schematic flow diagram of the language model of the question-answer matching method based on the language model of Embodiment 1 of the present invention.
Fig. 5 is a flowchart of the question-answer matching method based on the language model of Embodiment 2 of the present invention.
Fig. 6 is a flowchart of the question-answer matching method based on the language model of Embodiment 3 of the present invention.
Fig. 7 is a schematic composition diagram of the question-answer matching system based on the language model of Embodiment 4 of the present invention.
Fig. 8 is a schematic composition diagram of the question-answer matching system based on the language model of Embodiment 5 of the present invention.
Fig. 9 is a schematic composition diagram of the question-answer matching system based on the language model of Embodiment 6 of the present invention.
Fig. 10 is a schematic diagram of the hardware structure of the electronic device of Embodiment 7 of the present invention.
Detailed description of the embodiments
The present invention is further illustrated below by way of embodiments, but the present invention is not thereby limited to the scope of these embodiments.
Embodiment 1
As shown in Fig. 1, the question-answer matching method based on the language model of this embodiment comprises:
Step S101: after a question is received, a target question that matches the received question is obtained from the question database, and each item of answer data corresponding to the target question is then obtained from the answer database.
Given the development of information technology and artificial intelligence, user needs can be met through databases. For example, a user who wants to buy a product on an e-commerce platform and wants to learn something about it can ask a question about that product; users who have already bought the product receive the question and can answer it, and the answer is finally returned to the asker, who can then make a purchase decision based on the answerer's description. As question data and answer data accumulate, they are stored correspondingly in a question database and an answer database, so that the questions form a question-database content list and the answers form an answer-database content list. The question-database content list contains fields related to the product, such as the user account, the product number, the latest question turnaround time, the number of answers to the question, and whether the question was answered anonymously; the answer-database content list contains fields describing the answers for the product, such as the product number answered by the user, the user's answer state, whether the user answered anonymously, the user's answer type, whether the user's answer was set as the best answer, and the number of likes on the user's answer. As the databases keep accumulating, answers relevant to a user's question can be returned to the asker.
In a specific implementation, as shown in Fig. 2, step S101 may specifically comprise:
Step S101-1: after the question is received, the question is processed with the language model to generate a question-sentence word vector for the received question.
Here, an N-Gram model such as the Skip-Gram model is preferably used as the language model to generate word vectors, because the N-gram model is based on the hidden Markov assumption and therefore preserves the internal logical relationships of the language: it assumes that, within a piece of text, the probability of the n-th word depends only on the limited n-1 preceding words. Based on this assumption, the received question can be turned into a vector that characterizes its content, so that text such as the question posed by the user in natural language and the answerers' answers is converted into computable word vectors.
Step S101-2: the similarity between the question-sentence word vector and the word vector corresponding to each question in the question database is calculated;
Step S101-3: the question in the question database whose word vector has the largest similarity to the question-sentence word vector is taken as the target question;
Step S101-4: each item of answer data corresponding to the target question in the answer database is returned (a sketch of this retrieval is given below).
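By way of illustration only (not part of the original disclosure), the following Python sketch shows steps S101-2 to S101-4 under the assumption that every stored question has already been embedded into a vector and that cosine similarity is used as the similarity measure; the names embed-related inputs, question_vectors and answer_db are hypothetical placeholders.

```python
import numpy as np

def cosine_similarity(a, b):
    # Cosine similarity between two word vectors.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def match_target_question(asked_vec, question_vectors):
    """Steps S101-2/S101-3: pick the stored question whose vector is most similar."""
    best_id, best_sim = None, -1.0
    for q_id, q_vec in question_vectors.items():
        sim = cosine_similarity(asked_vec, q_vec)
        if sim > best_sim:
            best_id, best_sim = q_id, sim
    return best_id

def retrieve_answers(asked_vec, question_vectors, answer_db):
    """Step S101-4: return every item of answer data stored for the target question."""
    target_id = match_target_question(asked_vec, question_vectors)
    return answer_db.get(target_id, [])
```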
Step S102: the answer data is processed with the language model to generate corresponding text features and behavioral features. The behavioral features are used to characterize the state and attributes of the answer data and may include at least one of the following: the answer state, the client type of the answering user, whether the question was answered anonymously, the answer type, the time elapsed since the question was created, the time elapsed since the answer was last modified, the number of likes on the answer, the number of questions the user has accepted, the number of question messages the user has accepted, the number of question messages the user has clicked, the number of answers the user has given, the number of likes the user has received, and the number of the user's best answers. In a concrete application scenario, the behavioral features can be selected and configured according to the scenario, and the weight of each behavioral feature can be determined according to its importance in practice. A sketch of assembling such features is given below.
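For illustration only, the following sketch assembles a few of the behavioral features listed above into a numeric feature vector for one item of answer data; all field names are hypothetical, and a real deployment would select whichever features the application scenario requires, as the description notes.

```python
from datetime import datetime, timezone

def behavioral_features(answer, now=None):
    """Turn one answer record (a dict with hypothetical field names) into feature values."""
    now = now or datetime.now(timezone.utc)

    def hours_since(dt):
        return (now - dt).total_seconds() / 3600.0

    return [
        1.0 if answer.get("is_anonymous") else 0.0,        # answered anonymously or not
        float(answer.get("like_count", 0)),                 # likes on this answer
        float(answer.get("author_answer_count", 0)),        # answers the user has given
        float(answer.get("author_best_answer_count", 0)),   # the user's best answers
        hours_since(answer["question_created_at"]),         # time since the question was created
        hours_since(answer["answer_modified_at"]),          # time since the answer was modified
    ]
```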
In a specific implementation, the degree of association between the answer data and the target question must be obtained. The language module is used to extract features from the answer data, for example generating text features and extracting the behavioral features contained in the answer data; these features are then trained so as to obtain features that better characterize the answer data. As shown in Fig. 3, step S102 may specifically comprise:
Step S102-1: the answer data is turned into corresponding answer word vectors using the N-Gram model, here preferably the Skip-Gram model;
Step S102-2: the answer word vectors are trained with the neural network language model;
Step S102-3: the output of the neural network language model is trained with the recurrent neural network to obtain the text features and the behavioral features.
Here, the processing flow of the language model is shown in Fig. 4. After the text is split, the bottom layer consists of individual words; the layer above the words is the word-vector layer of the neural network, computed from the co-occurrence matrix; the layer above the word vectors applies a non-linear transformation with an activation function such as the Sigmoid function (a threshold-like function commonly used in neural networks); finally, the output values are normalized into probabilities by Softmax (the Softmax function, commonly used for classification/regression outputs). Such a model can be optimized with gradient descent like an ordinary neural network, and the optimal parameters are obtained through training. A sketch combining this flow with the parameters listed below follows the parameter list.
In a specific implementation, the main parameters of the language model include:
(1) valid_size: the number of words in the validation set for which near-synonyms (nearest neighbours) are computed; the value is 16;
(2) batch_size: the number of training samples needed in each batch when the neural network is trained by mini-batch gradient descent; the value is 128;
(3) embedding_size: the dimensionality of the word vectors after the words are converted; the value is 128;
(4) skip_window: the number of context words considered on either side of each word; the value is 1;
(5) num_skips: the number of times each word is reused to predict a context word; the value is 2;
(6) the learning rate of the neural network is set to 0.05.
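As a rough illustration only, the following numpy sketch wires the listed hyper-parameters into a small Skip-Gram-style network in the spirit of Fig. 4 (embedding layer, sigmoid non-linearity, softmax output, trained by gradient descent). It is not the patent's implementation; the vocabulary size and the data pipeline are assumptions, and valid_size is only shown as a constant since it matters for nearest-neighbour evaluation rather than training.

```python
import numpy as np

# Hyper-parameters as listed above.
valid_size = 16        # words sampled for nearest-neighbour (near-synonym) checks
batch_size = 128       # training samples per mini-batch
embedding_size = 128   # dimensionality of the word vectors
skip_window = 1        # context words on either side of the centre word
num_skips = 2          # context words predicted per centre word
learning_rate = 0.05
vocab_size = 50000     # assumed vocabulary size

rng = np.random.default_rng(0)
E = rng.normal(scale=0.1, size=(vocab_size, embedding_size))  # word-vector (embedding) layer
W = rng.normal(scale=0.1, size=(embedding_size, vocab_size))  # softmax output layer

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    x = x - x.max(axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)

def train_step(centre_ids, context_ids):
    """One gradient-descent step on a batch of (centre word -> context word) pairs."""
    global E, W
    h = sigmoid(E[centre_ids])                    # non-linear transform of looked-up word vectors
    probs = softmax(h @ W)                        # normalised probabilities over the vocabulary
    grad_logits = probs.copy()
    grad_logits[np.arange(len(context_ids)), context_ids] -= 1.0
    grad_logits /= len(centre_ids)                # average cross-entropy gradient
    grad_W = h.T @ grad_logits
    grad_h = (grad_logits @ W.T) * h * (1.0 - h)  # back-propagate through the sigmoid
    W -= learning_rate * grad_W
    np.subtract.at(E, centre_ids, learning_rate * grad_h)
    return float(-np.log(probs[np.arange(len(context_ids)), context_ids] + 1e-12).mean())

# Toy usage: one batch of random (centre, context) word ids.
centres = rng.integers(0, vocab_size, size=batch_size)
contexts = rng.integers(0, vocab_size, size=batch_size)
print(train_step(centres, contexts))
```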
Step S103: the text features and the behavioral features are computed with the decision-tree model, and the ranking of the answer data is predicted from the result.
In a specific implementation, the decision-tree model is preferably a GBDT model, whose main parameters include:
(1) the ranking logic of the algorithm is cast as classification (Classification);
(2) the learning rate of the optimizer is LearningRate = 0.05;
(3) the number of iterations is set to NumIterations = 50;
(4) the maximum depth is set to MaxDepth = 6;
(5) the number of bins for continuous features is MaxBins = 32.
In this way, the GBDT model builds multiple trees that jointly predict the ranking of the answer data, as sketched below.
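For illustration only, here is a minimal sketch of such a GBDT ranker using scikit-learn's GradientBoostingClassifier, mapping the listed parameters onto that library's arguments (n_estimators for NumIterations, max_depth for MaxDepth, learning_rate for LearningRate; MaxBins is a Spark-ML-style binning parameter with no direct scikit-learn equivalent and is omitted). The feature matrix and click/best-answer labels are synthetic stand-ins, not the patent's data.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)
# Synthetic stand-ins: 500 answer records, 128-dim text features + 6 behavioral features,
# label 1 if the answer was clicked / chosen as best, 0 otherwise.
X = rng.normal(size=(500, 128 + 6))
y = (rng.random(500) < 0.3).astype(int)

gbdt = GradientBoostingClassifier(
    learning_rate=0.05,  # LearningRate = 0.05
    n_estimators=50,     # NumIterations = 50
    max_depth=6,         # MaxDepth = 6
)
gbdt.fit(X, y)

# Rank the candidate answers of one target question by predicted relevance.
candidates = rng.normal(size=(10, 128 + 6))
scores = gbdt.predict_proba(candidates)[:, 1]
ranking = np.argsort(-scores)  # candidate indices, most relevant first
```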
In this embodiment, the text features and the behavioral features are combined through the language model, and the ranking results are then predicted by the GBDT model, so that the ranking better matches the user's question. The user can thus see the answers they most want to see, which improves the user experience.
Embodiment 2
As shown in Fig. 5, on the basis of Embodiment 1, the question-answer matching method based on the language model of this embodiment further comprises:
Step S104: assessing the AUC index of the ranking results and/or assessing the exposure click-through rate of the ranking results.
On the one hand, by performing an algorithmic evaluation of the ranking results, i.e. an AUC evaluation, the effectiveness of the question-answer matching method based on the language model can be known, so that the method can be tuned and retrained. In one evaluation the AUC reached 0.836, showing that the question-answer matching method based on the language model achieved a very good effect.
On the other hand, a business evaluation of the ranking results can also be performed: the users' exposure click-through rate on the ranking results can be measured in order to improve the ranking results output by the question-answer matching method based on the language model and improve the user experience. A sketch of both metrics is given below.
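For illustration only, a minimal sketch of the two evaluations of step S104, assuming click logs are available that record, for each exposed answer, the model score and whether it was clicked; roc_auc_score from scikit-learn computes the AUC, and the exposure click-through rate is simply clicks divided by exposures.

```python
from sklearn.metrics import roc_auc_score

def offline_auc(labels, scores):
    """Algorithmic evaluation: AUC of the model scores against observed clicks."""
    return roc_auc_score(labels, scores)

def exposure_ctr(click_count, exposure_count):
    """Business evaluation: clicks per exposure of the ranked answers."""
    return click_count / exposure_count if exposure_count else 0.0

# Tiny synthetic example.
labels = [1, 0, 1, 0, 0, 1]               # 1 = the exposed answer was clicked
scores = [0.9, 0.2, 0.7, 0.4, 0.1, 0.8]   # model scores for the same exposures
print(offline_auc(labels, scores))         # 1.0 for this toy data
print(exposure_ctr(click_count=3, exposure_count=6))
```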
Embodiment 3
As shown in Fig. 6, on the basis of Embodiment 1, the question-answer matching method based on the language model of this embodiment further comprises:
Step S105: subjecting the ranking results to an ABTest;
Step S106: pushing to the user several items of answer data that rank near the top of the ranking results;
Step S107-1: receiving a selection instruction from the user;
Step S107-2: choosing a best answer from the ranking results according to the selection instruction;
Step S107-3: marking the best answer with a label.
By running an ABTest and collecting the users' feedback (see the bucketing sketch below), the question-answer matching method based on the language model can be improved and ranking results of greater reference value can be provided to the user, which increases the click-through conversion rate of the answer data and thus improves the user experience.
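For illustration only, a minimal sketch of how such an ABTest might split traffic, assigning each user deterministically to either the new language-model ranking or a baseline ranking by hashing the user id; the bucket names and the 50/50 split are assumptions, not part of the patent.

```python
import hashlib

def ab_bucket(user_id: str, experiment: str = "lm_ranking", treatment_share: float = 0.5) -> str:
    """Deterministically assign a user to 'treatment' (new ranking) or 'control' (baseline)."""
    digest = hashlib.md5(f"{experiment}:{user_id}".encode()).hexdigest()
    fraction = int(digest[:8], 16) / 0xFFFFFFFF
    return "treatment" if fraction < treatment_share else "control"

def answers_to_push(user_id, lm_ranked_answers, baseline_answers, top_k=3):
    """Steps S105/S106: pick which ranking this user sees, then push the top answers."""
    chosen = lm_ranked_answers if ab_bucket(user_id) == "treatment" else baseline_answers
    return chosen[:top_k]
```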
Embodiment 4
As shown in Fig. 7, the question-answer matching system based on the language model of this embodiment comprises an acquisition module 1, a language module 2 and a decision-tree module 3. The acquisition module 1 is configured, after a question is received, to obtain from the question database a target question that matches the received question, and is further configured to obtain from the answer database each item of answer data corresponding to the target question. The language module 2 is configured to process the answer data with the language model and generate corresponding text features and behavioral features, the behavioral features being used to characterize the state and attributes of the answer data. The decision-tree module 3 is configured to compute over the text features and the behavioral features with the decision-tree model and to predict the ranking of the answer data from the result.
In a specific implementation, the acquisition module 1 comprises a word-vector generation unit 11, a similarity calculation unit 12 and a data return unit 13. The word-vector generation unit 11 is configured, after the question is received, to process the question with the language model and generate a question-sentence word vector for the received question. The similarity calculation unit 12 is configured to calculate the similarity between the question-sentence word vector and the word vector corresponding to each question in the question database. The data return unit 13 is configured to take the question in the question database whose word vector has the largest similarity to the question-sentence word vector as the target question, and is further configured to return each item of answer data in the answer database corresponding to the target question.
In a specific implementation, the decision-tree model in the decision-tree module 3 comprises a GBDT model.
In a specific implementation, the language model in the language module 2 comprises an N-Gram model, a neural network language model and a recurrent neural network, wherein the N-Gram model is configured to generate corresponding answer word vectors from the answer data, here preferably a Skip-Gram model; the neural network language model is configured to train the answer word vectors; and the recurrent neural network is configured to train the output of the neural network language model to obtain the text features and the behavioral features.
Embodiment 5
As shown in Fig. 8, on the basis of Embodiment 4, the question-answer matching system based on the language model of this embodiment further comprises an evaluation module 4. The evaluation module 4 is configured, after the decision-tree model predicts the ranking results, to assess the AUC index of the ranking results and/or to assess the exposure click-through rate of the ranking results. On the one hand, by performing an algorithmic evaluation of the ranking results, i.e. an AUC evaluation, the effectiveness of the question-answer matching method based on the language model can be known, so that the method can be tuned and retrained. On the other hand, a business evaluation of the ranking results can also be performed: the users' exposure click-through rate on the ranking results can be measured in order to improve the ranking results output by the question-answer matching method based on the language model and improve the user experience.
Embodiment 6
As shown in Fig. 9, on the basis of Embodiment 4, the question-answer matching system based on the language model of this embodiment further comprises an ABTest module 5, a push module 6 and a label module 7. The ABTest module 5 is configured to subject the ranking results to an ABTest; the push module 6 is configured to push to the user several items of answer data that rank near the top of the ranking results; and the label module 7 is configured to receive a selection instruction from the user, choose a best answer from the ranking results according to the selection instruction, and mark the best answer with a label. By running an ABTest, recommending answer data to users and collecting the users' label feedback on the recommended answer data, the system can exploit these labels as an important behavioral feature of the answer data, so that such answer data has greater reference value. This makes it easier to improve the question-answer matching system based on the language model, provide users with answer data of greater reference value, increase the click-through conversion rate of the answer data, and improve the user experience.
Embodiment 7
The electronic device of this embodiment comprises a memory, a processor and a computer program stored in the memory and runnable on the processor; when executing the computer program, the processor implements the question-answer matching method based on the language model of Embodiment 1, Embodiment 2 or Embodiment 3.
Fig. 10 is a schematic structural diagram of the electronic device of this embodiment. Fig. 10 shows a block diagram of an exemplary electronic device 50 suitable for implementing an embodiment of the present invention. The electronic device 50 shown in Fig. 10 is only an example and should not impose any restriction on the function and scope of use of the embodiments of the present invention.
As shown in Fig. 10, the electronic device 50 may take the form of a general-purpose computing device; it may, for example, be a server device. Components of the electronic device 50 may include, but are not limited to: at least one processor 51, at least one memory 52, and a bus 53 connecting the different system components (including the memory 52 and the processor 51).
The bus 53 includes a data bus, an address bus and a control bus.
The memory 52 may include volatile memory, such as a random access memory (RAM) 521 and/or a cache memory 522, and may further include a read-only memory (ROM) 523.
The memory 52 may also include a program utility 525 having a set of (at least one) program modules 524. Such program modules 524 include, but are not limited to: an operating system, one or more application programs, other program modules and program data; each of these examples, or some combination of them, may include an implementation of a network environment.
The processor 51 executes various functional applications and data processing, such as the question-answer matching method based on the language model provided by Embodiment 1 of the present invention, by running the computer program stored in the memory 52.
The electronic device 50 may also communicate with one or more external devices 54 (such as a keyboard or a pointing device). This communication may take place through an input/output (I/O) interface 55. Moreover, the electronic device 50 may communicate, through a network adapter 56, with one or more networks (such as a local area network (LAN), a wide area network (WAN) and/or a public network such as the Internet); the network adapter 56 communicates with the other modules of the electronic device 50 through the bus 53. It should be understood that, although not shown in the drawings, other hardware and/or software modules may be used in conjunction with the electronic device 50, including but not limited to: microcode, device drivers, redundant processors, external disk drive arrays, RAID (disk array) systems, tape drives and data backup storage systems.
It should be noted that, although several units/modules or sub-units/sub-modules of the electronic device are mentioned in the detailed description above, such a division is merely exemplary and not mandatory. In fact, according to the embodiments of the present application, the features and functions of two or more of the units/modules described above may be embodied in a single unit/module; conversely, the features and functions of one unit/module described above may be further divided and embodied in multiple units/modules.
Embodiment 8
This embodiment relates to a computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, the steps of the question-answer matching method based on the language model described in Embodiment 1, Embodiment 2 or Embodiment 3 are implemented.
More specifically, the readable storage medium may include, but is not limited to: a portable disk, a hard disk, a random access memory, a read-only memory, an erasable programmable read-only memory, an optical storage device, a magnetic storage device, or any suitable combination of the above.
In a possible embodiment, the present invention may also be implemented in the form of a program product comprising program code; when the program product is run on a terminal device, the program code causes the terminal device to carry out the steps of the question-answer matching method based on the language model described in Embodiment 1, Embodiment 2 or Embodiment 3.
The program code for carrying out the present invention may be written in any combination of one or more programming languages. The program code may execute entirely on the user device, partly on the user device, as a stand-alone software package, partly on the user device and partly on a remote device, or entirely on the remote device.
Although specific embodiments of the present invention have been described above, it will be appreciated by those skilled in the art that this is only by way of example and that the protection scope of the present invention is defined by the appended claims. Those skilled in the art may make many changes and modifications without departing from the principle and substance of the present invention, but these changes and modifications all fall within the protection scope of the present invention.

Claims (20)

1. A question-answer matching method based on a language model, characterized by comprising:
S1, after a question is received, obtaining from a question database a target question that matches the received question, and then obtaining from an answer database each item of answer data corresponding to the target question;
S2, processing the answer data with a language model to generate corresponding text features and behavioral features, the behavioral features being used to characterize the state and attributes of the answer data;
S3, computing over the text features and the behavioral features with a decision-tree model, and predicting the ranking of the answer data from the result.
2. The question-answer matching method based on the language model of claim 1, characterized in that the decision-tree model comprises a GBDT model.
3. The question-answer matching method based on the language model of claim 1, characterized in that the language model comprises an N-Gram model, a neural network language model and a recurrent neural network, and step S2 specifically comprises:
generating corresponding answer word vectors from the answer data using the N-Gram model;
training the answer word vectors with the neural network language model;
training the output of the neural network language model with the recurrent neural network to obtain the text features and the behavioral features.
4. The question-answer matching method based on the language model of claim 3, characterized in that the N-Gram model comprises a Skip-Gram model.
5. The question-answer matching method based on the language model of claim 1, characterized in that, after step S3, the question-answer matching method further comprises: assessing the AUC index of the ranking results and/or assessing the exposure click-through rate of the ranking results.
6. The question-answer matching method based on the language model of claim 1, characterized in that, after step S3, the question-answer matching method further comprises: pushing to the user several items of answer data that rank near the top of the ranking results.
7. The question-answer matching method based on the language model of claim 6, characterized in that, before the step of pushing the several top-ranked items of answer data to the user, the question-answer matching method further comprises: subjecting the ranking results to an ABTest.
8. The question-answer matching method based on the language model of claim 6, characterized in that, after the step of pushing the several top-ranked items of answer data to the user, the question-answer matching method further comprises:
receiving a selection instruction from the user;
choosing a best answer from the ranking results according to the selection instruction;
marking the best answer with a label.
9. The question-answer matching method based on the language model of claim 1, characterized in that step S1 specifically comprises:
after the question is received, processing the question with the language model to generate a question-sentence word vector for the received question;
calculating the similarity between the question-sentence word vector and the word vector corresponding to each question in the question database;
taking the question in the question database whose word vector has the largest similarity to the question-sentence word vector as the target question;
returning each item of answer data in the answer database corresponding to the target question.
10. A question-answer matching system based on a language model, characterized by comprising: an acquisition module, a language module and a decision-tree module;
the acquisition module is configured, after a question is received, to obtain from a question database a target question that matches the received question, and is further configured to obtain from an answer database each item of answer data corresponding to the target question;
the language module is configured to process the answer data with a language model and generate corresponding text features and behavioral features, the behavioral features being used to characterize the state and attributes of the answer data;
the decision-tree module is configured to compute over the text features and the behavioral features with a decision-tree model and to predict the ranking of the answer data from the result.
11. The question-answer matching system based on the language model of claim 10, characterized in that the decision-tree model comprises a GBDT model.
12. The question-answer matching system based on the language model of claim 10, characterized in that the language model comprises an N-Gram model, a neural network language model and a recurrent neural network;
the N-Gram model is configured to generate corresponding answer word vectors from the answer data;
the neural network language model is configured to train the answer word vectors;
the recurrent neural network is configured to train the output of the neural network language model to obtain the text features and the behavioral features.
13. The question-answer matching system based on the language model of claim 12, characterized in that the N-Gram model comprises a Skip-Gram model.
14. The question-answer matching system based on the language model of claim 10, characterized in that the question-answer matching system based on the language model further comprises an evaluation module, the evaluation module being configured, after the decision-tree model predicts the ranking results, to assess the AUC index of the ranking results and/or to assess the exposure click-through rate of the ranking results.
15. The question-answer matching system based on the language model of claim 10, characterized in that the question-answer matching system based on the language model further comprises a push module, the push module being configured, after the decision-tree model predicts the ranking results, to push to the user several items of answer data that rank near the top of the ranking results.
16. The question-answer matching system based on the language model of claim 15, characterized in that the question-answer matching system based on the language model further comprises an ABTest module, the ABTest module being configured to subject the ranking results to an ABTest and then to call the push module.
17. The question-answer matching system based on the language model of claim 15, characterized in that the question-answer matching system based on the language model further comprises a label module; after the push module has been called, the label module is configured to receive a selection instruction from the user, choose a best answer from the ranking results according to the selection instruction, and mark the best answer with a label.
18. The question-answer matching system based on the language model of claim 10, characterized in that the acquisition module specifically comprises: a word-vector generation unit, a similarity calculation unit and a data return unit;
the word-vector generation unit is configured, after the question is received, to process the question with the language model and generate a question-sentence word vector for the received question;
the similarity calculation unit is configured to calculate the similarity between the question-sentence word vector and the word vector corresponding to each question in the question database;
the data return unit is configured to take the question in the question database whose word vector has the largest similarity to the question-sentence word vector as the target question, and is further configured to return each item of answer data in the answer database corresponding to the target question.
19. An electronic device comprising a memory, a processor and a computer program stored in the memory and runnable on the processor, characterized in that, when executing the computer program, the processor implements the question-answer matching method based on the language model of any one of claims 1 to 9.
20. A computer-readable storage medium on which a computer program is stored, characterized in that, when the program is executed by a processor, the steps of the question-answer matching method based on the language model of any one of claims 1 to 9 are implemented.
CN201711482842.5A 2017-12-29 2017-12-29 Question-answer matching method, system, equipment and storage medium based on language model Active CN110019736B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711482842.5A CN110019736B (en) 2017-12-29 2017-12-29 Question-answer matching method, system, equipment and storage medium based on language model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711482842.5A CN110019736B (en) 2017-12-29 2017-12-29 Question-answer matching method, system, equipment and storage medium based on language model

Publications (2)

Publication Number Publication Date
CN110019736A true CN110019736A (en) 2019-07-16
CN110019736B CN110019736B (en) 2021-10-01

Family

ID=67187166

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711482842.5A Active CN110019736B (en) 2017-12-29 2017-12-29 Question-answer matching method, system, equipment and storage medium based on language model

Country Status (1)

Country Link
CN (1) CN110019736B (en)

Patent Citations (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050060662A1 (en) * 2003-08-22 2005-03-17 Thomas Soares Process for creating service action data structures
CN101377777A (en) * 2007-09-03 2009-03-04 北京百问百答网络技术有限公司 Automatic inquiring and answering method and system
CN101118554A (en) * 2007-09-14 2008-02-06 中兴通讯股份有限公司 Intelligent interactive request-answering system and processing method thereof
CN101257512A (en) * 2008-02-02 2008-09-03 黄伟才 Inquiry answer matching method used for inquiry answer system as well as inquiry answer method and system
CN101656799A (en) * 2008-08-20 2010-02-24 阿鲁策株式会社 Automatic conversation system and conversation scenario editing device
CN101740024A (en) * 2008-11-19 2010-06-16 中国科学院自动化研究所 Method for automatic evaluation based on generalized fluent spoken language fluency
CN101566998A (en) * 2009-05-26 2009-10-28 华中师范大学 Chinese question-answering system based on neural network
CN102255788A (en) * 2010-05-19 2011-11-23 北京启明星辰信息技术股份有限公司 Message classification decision establishing system and method and message classification system and method
CN102034475A (en) * 2010-12-08 2011-04-27 中国科学院自动化研究所 Method for interactively scoring open short conversation by using computer
CN102760128A (en) * 2011-04-26 2012-10-31 华东师范大学 Telecommunication field package recommending method based on intelligent customer service robot interaction
CN102456073A (en) * 2011-11-03 2012-05-16 中国人民解放军国防科学技术大学 Partial extremum inquiry method
US20160042275A1 (en) * 2014-08-11 2016-02-11 International Business Machines Corporation Debugging Code Using a Question and Answer System Based on Documentation and Code Change Records
CN104572868A (en) * 2014-12-18 2015-04-29 清华大学 Method and device for information matching based on questioning and answering system
CN104636456A (en) * 2015-02-03 2015-05-20 大连理工大学 Question routing method based on word vectors
CN107463699A (en) * 2017-08-15 2017-12-12 济南浪潮高新科技投资发展有限公司 A kind of method for realizing question and answer robot based on seq2seq models

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110570838A (en) * 2019-08-02 2019-12-13 北京葡萄智学科技有限公司 Voice stream processing method and device
CN110570838B (en) * 2019-08-02 2022-06-07 北京葡萄智学科技有限公司 Voice stream processing method and device
CN110569350B (en) * 2019-08-08 2022-08-09 河北省讯飞人工智能研究院 Legal recommendation method, equipment and storage medium
CN110569350A (en) * 2019-08-08 2019-12-13 河北省讯飞人工智能研究院 Legal recommendation method, equipment and storage medium
WO2021051404A1 (en) * 2019-09-20 2021-03-25 Beijing Didi Infinity Technology And Development Co., Ltd. Systems and methods for auxiliary reply
CN111078972A (en) * 2019-11-29 2020-04-28 支付宝(杭州)信息技术有限公司 Method and device for acquiring questioning behavior data and server
CN111078972B (en) * 2019-11-29 2023-06-16 支付宝(杭州)信息技术有限公司 Questioning behavior data acquisition method, questioning behavior data acquisition device and server
CN111026854A (en) * 2019-12-05 2020-04-17 电子科技大学广东电子信息工程研究院 Answer quality assessment method
CN111368064A (en) * 2020-03-26 2020-07-03 平安医疗健康管理股份有限公司 Survey information processing method, device, equipment and storage medium
CN111368064B (en) * 2020-03-26 2023-04-07 深圳平安医疗健康科技服务有限公司 Survey information processing method, device, equipment and storage medium
CN117290694A (en) * 2023-11-24 2023-12-26 北京并行科技股份有限公司 Question-answering system evaluation method, device, computing equipment and storage medium
CN117290694B (en) * 2023-11-24 2024-03-15 北京并行科技股份有限公司 Question-answering system evaluation method, device, computing equipment and storage medium
CN117473071A (en) * 2023-12-27 2024-01-30 珠海格力电器股份有限公司 Data retrieval method, device, equipment and computer readable medium
CN117473071B (en) * 2023-12-27 2024-04-05 珠海格力电器股份有限公司 Data retrieval method, device, equipment and computer readable medium

Also Published As

Publication number Publication date
CN110019736B (en) 2021-10-01

Similar Documents

Publication Publication Date Title
CN110019736A (en) Question and answer matching process, system, equipment and storage medium based on language model
CN109493166B (en) Construction method for task type dialogue system aiming at e-commerce shopping guide scene
CN111078836B (en) Machine reading understanding method, system and device based on external knowledge enhancement
CN111078844B (en) Task-based dialog system and method for software crowdsourcing
CN117033608B (en) Knowledge graph generation type question-answering method and system based on large language model
CN110188272B (en) Community question-answering website label recommendation method based on user background
CN111444709A (en) Text classification method, device, storage medium and equipment
US11631338B2 (en) Deep knowledge tracing with transformers
CN113377936B (en) Intelligent question and answer method, device and equipment
CN108763535A (en) Information acquisition method and device
CN114896386A (en) Film comment semantic emotion analysis method and system based on BilSTM
CN112069329A (en) Text corpus processing method, device, equipment and storage medium
CN116029273A (en) Text processing method, device, computer equipment and storage medium
CN115455189A (en) Policy text classification method based on prompt learning
Dutt et al. PerKGQA: Question answering over personalized knowledge graphs
CN113869034B (en) Aspect emotion classification method based on reinforced dependency graph
Soni et al. Deep learning, wordnet, and spacy based hybrid method for detection of implicit aspects for sentiment analysis
Grigorev Machine Learning Bookcamp: Build a Portfolio of Real-life Projects
CN113407704A (en) Text matching method, device and equipment and computer readable storage medium
Deshmukh et al. Open domain conversational chatbot
Thahira et al. Comparative Study of Personality Prediction From Social Media by using Machine Learning and Deep Learning Method
Ackerman et al. Theory and Practice of Quality Assurance for Machine Learning Systems An Experiment Driven Approach
Jareño García Machine Learning Techniques for Natural Language Processing
Airlangga Investigating deep learning approach for automated software requirement classification
Jain A Duplicate Question Detection System

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant