CN112035629A - Method for implementing question-answer model based on symbolized knowledge and neural network - Google Patents

Method for implementing question-answer model based on symbolized knowledge and neural network

Info

Publication number
CN112035629A
Authority
CN
China
Prior art keywords
logit
answer
knowledge
input
question
Prior art date
Legal status
Granted
Application number
CN202010826838.1A
Other languages
Chinese (zh)
Other versions
CN112035629B (en)
Inventor
何钺
吴昊
黄河燕
Current Assignee
Beijing Institute of Technology BIT
Original Assignee
Beijing Institute of Technology BIT
Priority date
Filing date
Publication date
Application filed by Beijing Institute of Technology BIT filed Critical Beijing Institute of Technology BIT
Priority to CN202010826838.1A priority Critical patent/CN112035629B/en
Publication of CN112035629A publication Critical patent/CN112035629A/en
Application granted granted Critical
Publication of CN112035629B publication Critical patent/CN112035629B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30: Information retrieval of unstructured textual data
    • G06F 16/33: Querying
    • G06F 16/332: Query formulation
    • G06F 16/3329: Natural language query formulation or dialogue systems
    • G06F 16/3331: Query processing
    • G06F 16/334: Query execution
    • G06F 16/3344: Query execution using natural language analysis
    • G06F 16/3346: Query execution using probabilistic model
    • G06F 16/35: Clustering; Classification
    • G06F 16/355: Class or cluster creation or modification
    • G06F 40/00: Handling natural language data
    • G06F 40/10: Text processing
    • G06F 40/12: Use of codes for handling textual entities
    • G06F 40/126: Character encoding
    • G06F 40/20: Natural language analysis
    • G06F 40/205: Parsing
    • G06F 40/216: Parsing using statistical methods
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/045: Combinations of networks
    • G06N 3/08: Learning methods
    • G06N 3/084: Backpropagation, e.g. using gradient descent

Abstract

The invention relates to a method for implementing a question-answering model based on symbolized knowledge and a neural network, and belongs to the technical field of extractive question answering. First, knowledge expressed in natural language is converted into symbolized knowledge in first-order-logic form, and regular expressions are then used to generate features that a neural network can recognize, so that the information carried by the symbolized knowledge can be incorporated into the neural network. Meanwhile, to address the reduced generalization that results from relying on regular expressions alone, an attention-based method is provided that uses the association between the symbolized knowledge and the input text to improve the generalization of the symbolized knowledge during question answering. Compared with the prior art, the method combines the advantages of rule-based and deep-learning-based question-answering models, making the reasoning process of the model more interpretable while improving the robustness and accuracy of the question-answering model.

Description

Method for implementing question-answer model based on symbolized knowledge and neural network
Technical Field
The invention relates to a method for implementing a question-answering model, in particular to a method for implementing a simple extractive question-answering model by combining symbolized knowledge with a conventional neural-network-based question-answering model, and belongs to the technical field of deep-learning-based question answering.
Background
People have long been accustomed to obtaining the information they want by consulting documents, using search engines, and so on. Consulting documents usually costs a great deal of time before useful knowledge is found; and although a search engine greatly reduces retrieval time, the results it returns are often of uneven quality and redundant in content, so people still have to spend time filtering out the truly useful information. It would therefore be desirable to have a question-answering system that can answer questions posed by a user in natural language with accurate and concise natural language.
Based on this need, question-answering systems have become a research hotspot in both industry and academia. A typical question-answering system works as follows: according to the question entered by the user, the system retrieves information and processes it (for example by ranking the retrieved documents), and the core question-answering model then answers the question on the basis of the user's input and the processed information.
From the point of view of answering questions, there are two types of common question-answering systems: one is an extraction type question-answering system, i.e. after the user puts forward the question, the system will search the relevant document in the database, finally extract a text fragment in the document as the final answer; the other is a generating question-answering system, which generates answer text as the final answer after referring to the relevant document. The extraction type question-answering system is easier to construct than the generation type question-answering system, so the extraction type question-answering system is more widely applied.
In terms of structure, common question-answering systems include rule-based systems, neural-network-based systems, and others. A rule-based question-answering system requires a large amount of manual labor to construct rules and templates and is usually applicable only to a certain vertical domain, but its reasoning process is highly interpretable and logically strong. A neural-network-based question-answering system depends on training data; its reasoning process is a black box and poorly interpretable, but its generalization is strong. Other question-answering systems include, for example, those based on knowledge graphs or knowledge bases.
Currently, in the field of natural language processing, neural-network-based question-answering systems are more widely used than rule-based ones; rule-based systems are gradually being replaced because of their poor generalization and high construction cost. However, because rule-based systems can compensate for the poor interpretability and weak reasoning logic of neural-network-based systems, researchers have in recent years begun to study how to combine rules and neural networks to build question-answering systems.
A complete question-answering system is structurally complex and involves steps such as information retrieval and document-relevance ranking; this patent only concerns the core question-answering model within such a system. The existing framework of an extractive question-answering model is as follows:
A user poses a question q in natural-language form, and the question-answering system retrieves a related context c, also recorded in natural language, according to q; the context c may contain the text segment needed to answer q. The question q and the context c are both non-empty strings. The goal of the question-answering model is to receive the question q and the context c as input and to judge whether c contains a segment needed to answer q. If it does, a text segment capable of answering q is extracted from c and returned as the answer a*; otherwise the answer a* is set to the empty string and returned. Written as a formula, the model is:
a* = a text segment a ∈ C(c) that answers q, if contain(q, c) = true; otherwise a* = the empty string
where C(c) denotes the set of all non-empty substrings of the string c (a non-empty substring being a subsequence of consecutive characters of non-zero length), and contain(q, c) is a function that judges whether the document c contains a segment needed to answer the question q, returning true if it does and false otherwise.
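For concreteness, this framework can be written as the following minimal Python sketch; the span-scoring model and its predict method are hypothetical placeholders standing in for the BERT-based network developed in the rest of this description.

def answer(q: str, c: str, model) -> str:
    """Extractive question answering: return a text segment of c that answers q,
    or the empty string if c contains no such segment (sketch)."""
    answerable, start, end = model.predict(q, c)  # hypothetical model interface
    if not answerable:                 # corresponds to contain(q, c) = false
        return ""                      # a* is the empty string
    return c[start:end + 1]            # a* is a non-empty substring of c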
Disclosure of Invention
The invention aims to solve the above problems, in part or in whole; it improves on the BERT model and provides a more efficient and more robust method for implementing a question-answering model than a conventional question-answering model based on BERT alone.
The object of the present invention is achieved by the following technical means.
The principle of the method is that firstly, knowledge composed of natural language is converted into symbolized knowledge in a first-order logic mode, and then characteristics which can be identified by a neural network are generated by combining a regular expression, so that information of the symbolized knowledge can be combined into the neural network; meanwhile, aiming at the problem that the generalization of the symbolized knowledge is reduced only by using the regular expression, a method based on an attention mechanism is provided, and the generalization of the symbolized knowledge in the question and answer process can be improved by utilizing the associated information between the symbolized knowledge and the input text.
The invention provides a method for realizing a question-answer model based on symbolic knowledge and a neural network, which comprises the following contents:
1: constructing a symbolized knowledge base;
2: constructing an input sequence input from the question q and the context c and feeding it into a BERT model to obtain an encoded vector sequence H ∈ R^{|input| × hidden_size}, where hidden_size is a hyper-parameter of the model;
3: passing h_i through a fully-connected layer to obtain, for each position of input, the logit values of that position being the answer start position and the answer end position, where h_i ∈ R^{hidden_size} is the ith row vector of H and 1 ≤ i ≤ |input|; let logit_{s,i} denote the logit value that the ith position of input is the answer start position and logit_{e,i} the logit value that the ith position of input is the answer end position;
4: judging from the logit values whether the question q can be answered from the context c; if it cannot, returning an empty answer and, during model training, computing the loss function Loss_0 from the labeled answer of the training sample q-c and back-propagating it to update the neural network parameters involved in steps 2-4, then ending the process; otherwise continuing with step 5;
5: matching q and c with the kth piece of knowledge in the symbolized knowledge base to generate the feature information m_k and n_k of the symbolized knowledge, where k is a natural number, 1 ≤ k ≤ z, and z is the number of pieces of knowledge in the symbolized knowledge base;
6: using an attention mechanism and the feature information of the symbolized knowledge, computing logit'_i, the logit value with which the symbolized knowledge judges whether the ith position of the input sequence is covered by the answer;
7: predicting and outputting the answer according to logit and logit'_i; during model training, computing the loss function Loss_3 from the labeled answer of the training sample q-c and the predicted answer and back-propagating it to update the neural network parameters involved in steps 2-7.
Preferably, the construction process of step 1 is as follows:
first, various knowledge related to the problem is collected and constructed, such as:
knowledge 1: if the question is question length, then the answer should be in the form of a number + units of length
Knowledge 2: if the question is asking time, then the answer should be in the form of month + date
……
These collected and constructed pieces of knowledge are in natural-language form, following the pattern "if the question is ..., then the answer should be in the form of ...". Each piece is then symbolized using first-order logic: a piece of knowledge is written as P → Q, where the condition P is "the question is ..." and the conclusion Q is "the answer should be in the form of ..."; P and Q have different meanings depending on the specific knowledge. For the symbolized representation P → Q, two regular expressions RE_P and RE_Q are then constructed, corresponding to the condition P and the conclusion Q respectively. Taking knowledge 1 as an example, the condition P is "the question asks about a length" and the conclusion Q is "the answer should be in the form of a number + a length unit". Since a question asking about a length usually starts with "How long", RE_P can be set to match question sentences beginning with "How long" and is used to judge whether the question starts with "How long"; for a segment of the form number + length unit, RE_Q can be set to a pattern such as "[1-9][0-9]*(kilometers|..." that matches a number followed by a length unit.
After all collected knowledge has undergone these operations, it is organized into a symbolized knowledge base. The symbolized knowledge base contains multiple pieces of knowledge, each consisting of a natural-language form, a symbolized form, and the two corresponding regular expressions RE_P and RE_Q.
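As an illustration only, such a symbolized knowledge base could be organized as follows in Python; the concrete regular expressions are assumptions standing in for RE_P and RE_Q, not the exact expressions of the invention.

import re
from dataclasses import dataclass

@dataclass
class SymbolizedKnowledge:
    natural_form: str    # natural-language form of the knowledge
    condition_P: str     # condition P in the symbolized form P -> Q
    conclusion_Q: str    # conclusion Q in the symbolized form P -> Q
    re_P: re.Pattern     # regular expression for P, matched against the question
    re_Q: re.Pattern     # regular expression for Q, matched against the context

# Illustrative knowledge base with the two example entries above.
KNOWLEDGE_BASE = [
    SymbolizedKnowledge(
        natural_form="If the question asks about a length, the answer should be a number + a length unit",
        condition_P="the question asks about a length",
        conclusion_Q="the answer is of the form number + length unit",
        re_P=re.compile(r"^How long"),
        re_Q=re.compile(r"[1-9][0-9]*\s*(kilometers|km|meters)"),
    ),
    SymbolizedKnowledge(
        natural_form="If the question asks about a time, the answer should be a month + a date",
        condition_P="the question asks about a time",
        conclusion_Q="the answer is of the form month + date",
        re_P=re.compile(r"^When"),
        re_Q=re.compile(r"(January|February|July)\s*[0-9]{1,2}"),
    ),
]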
Preferably, the specific steps of step 2 are as follows: the input question q and context c are preprocessed (word segmentation, etc.) and spliced into an input sequence input = [<CLS>, q_1, …, q_i, …, q_n, <SEP>, c_1, c_2, …, c_j, …, c_m, <SEP>, <PAD>, …, <PAD>], where n is the length of the question, m is the length of the context, |input| = max_seq_length, |input| denotes the length of the input sequence, and max_seq_length is a hyper-parameter of the model and a positive integer. <CLS> marks the beginning of the sequence, q_i is the ith word of the question q (1 ≤ i ≤ n), <SEP> is a separator mark, c_j is the jth word of the context c (1 ≤ j ≤ m), and <PAD> is a padding mark used to pad the sequence so that |input| = max_seq_length.
The input sequence is fed into a BERT model, and the encoded vector sequence H ∈ R^{|input| × hidden_size} is obtained at the output layer of the BERT model, where hidden_size is a hyper-parameter of the model.
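A minimal sketch of this step, assuming the HuggingFace transformers library and the bert-base-uncased checkpoint (the method only requires some BERT encoder); the tokenizer produces exactly the <CLS> q <SEP> c <SEP> <PAD>... sequence described above.

import torch
from transformers import BertModel, BertTokenizerFast

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
bert = BertModel.from_pretrained("bert-base-uncased")

max_seq_length = 384  # hyper-parameter max_seq_length (value assumed)
q = "How long is the Yellow River?"
c = "The Yellow River, a large river in northern China, has a total length of about 5464 kilometers."

# Build [CLS] q [SEP] c [SEP] padded to max_seq_length, i.e. the input sequence "input".
enc = tokenizer(q, c, padding="max_length", truncation=True,
                max_length=max_seq_length, return_tensors="pt")
with torch.no_grad():
    out = bert(**enc)
H = out.last_hidden_state[0]  # H has shape (|input|, hidden_size); row i is h_i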
Preferably, the specific steps of step 3 are as follows: h_i is passed through a fully-connected layer to obtain the logit values that each position of input is the answer start position and the answer end position. Here h_i ∈ R^{hidden_size} is the ith row vector of H; after processing by the BERT model, h_i contains the semantic information of the ith word of the input sequence input. A logit value can be regarded as an unnormalized probability: the larger the value, the higher the probability. The formulas are:
logit_{s,i} = W_s h_i + b_s
logit_{e,i} = W_e h_i + b_e
where W_s, W_e ∈ R^{1 × hidden_size} are weight parameters and b_s, b_e ∈ R are bias terms. logit_{s,i}, the ith element of logit_s ∈ R^{|input|}, is a real number giving the logit value that the ith position of input is the answer start position; logit_{e,i}, the ith element of logit_e ∈ R^{|input|}, is likewise a real number giving the logit value that the ith position of input is the answer end position.
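In PyTorch terms, this fully-connected layer could be sketched as follows; the class name and hidden size are assumptions, and each nn.Linear holds the corresponding weight and bias (W_s, b_s and W_e, b_e).

import torch
import torch.nn as nn

class SpanHead(nn.Module):
    """Produces start/end logits for every position of the input sequence (sketch)."""
    def __init__(self, hidden_size: int = 768):
        super().__init__()
        self.start = nn.Linear(hidden_size, 1)  # W_s, b_s
        self.end = nn.Linear(hidden_size, 1)    # W_e, b_e

    def forward(self, H: torch.Tensor):
        # H: (|input|, hidden_size) -> logit_s, logit_e: (|input|,)
        logit_s = self.start(H).squeeze(-1)  # logit_{s,i} = W_s h_i + b_s
        logit_e = self.end(H).squeeze(-1)    # logit_{e,i} = W_e h_i + b_e
        return logit_s, logit_e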
Preferably, the specific steps of step 4 are as follows: the model first assumes that the question q can be answered and computes the answer start position pos_s and end position pos_e in input:
(pos_s, pos_e) = argmax_{i,j} (logit_{s,i} + logit_{e,j})
s.t. i ≤ j
n+3 ≤ i < n+3+m
n+3 ≤ j < n+3+m
where n is the length of the question q and m is the length of the context c. The constraints ensure that the answer start position does not come after the end position and that both positions lie within the context-c part of the input sequence input.
The model then computes the logit value logit_answerable that the question can be answered and the logit value logit_unanswerable that the question cannot be answered:
logit_answerable = logit_{s,pos_s} + logit_{e,pos_e}
logit_unanswerable = logit_{s,1} + logit_{e,1}
that is, the model treats the sum of logit_{s,1} and logit_{e,1} at the first word of the input sequence (the <CLS> mark) as logit_unanswerable, where logit_{s,1} and logit_{e,1} are the first elements of logit_s and logit_e respectively.
After logit_answerable and logit_unanswerable have been computed, the next step is decided by comparing them. If logit_unanswerable > logit_answerable, the model judges that the question cannot be answered: the answer is set to empty and returned, the loss function Loss_0 is computed, and back-propagation is used to update the model parameters involved in steps 2-4 (including the parameters of the BERT model in step 2); the processing of the current sample then ends. Otherwise the model judges that the question can be answered; Loss_0 is not computed, no parameters are updated, no answer is returned yet, and the procedure goes to step 5.
When logit_unanswerable > logit_answerable, the loss function Loss_0 used by the model is a cross entropy, computed as follows:
p_s = softmax(logit_s)
p_e = softmax(logit_e)
Loss_0 = -(log p_{s,y_s} + log p_{e,y_e}) / 2
where y_s and y_e are the start and end positions in input of the correct answer labeled for the training sample; they are positive integers satisfying 1 ≤ y_s ≤ y_e ≤ |input|. If the sample is labeled as unanswerable, y_s = y_e = 1.
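A sketch of step 4 in PyTorch, using 0-based tensor indexing (the description uses 1-based positions, so the <CLS> position 1 becomes index 0 here); the averaged cross-entropy form of Loss_0 follows the reconstruction above, and y_s, y_e are the 0-based labeled positions.

import torch
import torch.nn.functional as F

def best_span(logit_s, logit_e, n, m):
    """Arg-max of logit_{s,i} + logit_{e,j} over i <= j inside the context part of input."""
    lo, hi = n + 2, n + 2 + m          # 0-based positions of the context words
    best, best_score = (lo, lo), float("-inf")
    for i in range(lo, hi):
        for j in range(i, hi):
            score = (logit_s[i] + logit_e[j]).item()
            if score > best_score:
                best_score, best = score, (i, j)
    return best, best_score

def answerability(logit_s, logit_e, n, m, y_s=None, y_e=None):
    (pos_s, pos_e), logit_answerable = best_span(logit_s, logit_e, n, m)
    logit_unanswerable = (logit_s[0] + logit_e[0]).item()   # the <CLS> position
    if logit_unanswerable > logit_answerable:
        loss0 = None
        if y_s is not None:                                  # training: compute Loss_0
            loss0 = 0.5 * (F.cross_entropy(logit_s.unsqueeze(0), torch.tensor([y_s]))
                           + F.cross_entropy(logit_e.unsqueeze(0), torch.tensor([y_e])))
        return "", loss0                                     # question judged unanswerable
    return (pos_s, pos_e), None                              # go on to step 5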
Preferably, the specific steps of step 5 are as follows: for the kth piece of symbolized knowledge, its regular expressions RE_P and RE_Q are used to match the question q and the context c respectively, generating the features m_k and n_k.
Here m_k records whether RE_P of the kth piece of symbolized knowledge matches the question q: if it matches, m_k = 1; otherwise m_k = 0.
n_k ∈ {0, 1}^{|input|} is obtained from all the segments matched by RE_Q in the context part of input: if the ith position of input is "covered" by a matched segment, the ith element of n_k is n_{k,i} = 1, otherwise n_{k,i} = 0. Let p and p' be the start and end positions of a matched text segment in input; if p ≤ i ≤ p', the ith position of input is said to be "covered" by that text segment.
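A sketch of this feature generation, reusing the SymbolizedKnowledge entries sketched earlier and assuming a simple whitespace word segmentation; a real implementation would map regular-expression matches onto BERT token positions via the tokenizer's offset mapping.

from typing import List, Tuple

def knowledge_features(know, q: str, c_words: List[str], c_offset: int, input_len: int) -> Tuple[int, List[int]]:
    """Compute m_k and n_k for one piece of symbolized knowledge (sketch).
    c_words occupy positions c_offset .. c_offset + len(c_words) - 1 of the input sequence."""
    m_k = 1 if know.re_P.search(q) else 0

    n_k = [0] * input_len
    c_text = " ".join(c_words)
    spans, pos = [], 0
    for w in c_words:                       # character span of each context word
        start = c_text.index(w, pos)
        spans.append((start, start + len(w)))
        pos = start + len(w)
    for match in know.re_Q.finditer(c_text):
        for idx, (ws, we) in enumerate(spans):
            if ws < match.end() and we > match.start():   # word overlaps the matched segment
                n_k[c_offset + idx] = 1                   # position is "covered"
    return m_k, n_k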
Preferably, the specific steps of step 6 are as follows: logit'_i is computed from logit'_{att,i} and logit'_{re,i}, where logit'_i ∈ R is the logit value with which the symbolized knowledge considers the ith position of the input sequence to be "covered" by the answer text segment; logit'_{att,i} is computed by the attention mechanism (step 6.1) and logit'_{re,i} is computed from m_k and n_k (step 6.2).
Step 6.1: logit'_{att,i} is obtained by the following process. First compute p_{P,k}:
att_{P,k,i} = softmax_i(v_{P,k} W_a h_i)
s_{P,k} = Σ_i att_{P,k,i} h_i
logit_{P,k} = w_P s_{P,k} + b_P
p_{P,k} = sigmoid(logit_{P,k})
where W_a is a parameter matrix, v_{P,k} is the trainable embedding vector corresponding to the condition P of the kth piece of symbolized knowledge, att_{P,k,i} is the attention score of the condition P of the kth piece of symbolized knowledge on the ith word of the input sequence, s_{P,k} stores the information in the input sequence input related to the kth piece of symbolized knowledge, gathered according to the attention scores, w_P and b_P are a trainable weight vector and a bias term respectively, logit_{P,k} is the logit value that the condition P of the kth piece of symbolized knowledge holds for the question q, and p_{P,k} is the probability that the condition P of the kth piece of symbolized knowledge holds for the question q.
The idea behind this computation is: the condition P of the kth piece of symbolized knowledge is mapped to an embedding vector v_{P,k}; an attention operation between this vector and the input sequence yields the attention scores att_{P,k,i}; the attention scores are then used as weights to sum the h_i, giving s_{P,k}. This can be viewed as gathering, into s_{P,k}, the information about the words of the input sequence input that are related to the kth piece of symbolized knowledge. A linear transformation of s_{P,k} then gives logit_{P,k}, from which p_{P,k} is computed; p_{P,k} indicates the probability that the kth piece of symbolized knowledge applies to the current question q.
After the probability that the condition P holds has been computed, the logit values logit_{Q,k,i} of the conclusion Q and logit'_{att,i} are computed as follows:
logit_{Q,k,i} = v_{Q,k} W_Q h_i
logit'_{att,i} = Σ_k p_{P,k} logit_{Q,k,i}
where W_Q is a parameter matrix, v_{Q,k} is the trainable embedding vector corresponding to the conclusion Q of the kth piece of symbolized knowledge, and logit_{Q,k,i} is the logit value that the ith word of the input sequence is selected by the conclusion Q. The previously computed p_{P,k} is then used as a weight to sum the logit_{Q,k,i}, giving logit'_{att,i} ∈ R, which represents the logit value that the ith position of the input sequence is "covered" by the answer after all the symbolized knowledge has been used in combination with the attention mechanism.
The idea behind this computation is: the conclusion Q of the kth piece of symbolized knowledge is mapped to an embedding vector v_{Q,k}; an operation similar to the attention mechanism between this vector and the input sequence yields logit_{Q,k,i}, the logit value with which the conclusion Q of the kth piece of symbolized knowledge considers the ith position of the input sequence "covered" by the answer; the logit_{Q,k,i} of all pieces of symbolized knowledge are then weighted by p_{P,k} and summed to give logit'_{att,i}.
Step 6.2: logit'_{re,i} is obtained by the following process:
t_i = Σ_k m_k n_{k,i}
logit'_re = tanh(W_t t + b_t)
where t_i is the ith element of t ∈ R^{|input|}, W_t is a parameter matrix, b_t is a bias vector, and logit'_{re,i} is the ith element of logit'_re ∈ R^{|input|}.
Finally:
logit'_i = sigmoid(logit'_{att,i} + logit'_{re,i})
where the outermost sigmoid limits logit'_i to the interval [0, 1], preventing logit'_i from becoming too large and exerting an excessive influence on the final answer prediction.
Step 6.1 is indispensable within step 6: if logit'_i were computed from step 6.2 alone, it would depend entirely on the features m_k and n_k generated by the regular expressions, and regular expressions generalize poorly, so step 6.1 is needed as a complement to improve generalization. Step 6.1 represents the condition P and the conclusion Q of each piece of symbolized knowledge by trainable vectors and computes logit'_{att,i} with an attention mechanism; in this way the information provided by the symbolized knowledge can still be used while the poor generalization caused by the regular expressions is avoided.
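The whole of step 6 could be sketched as one PyTorch module as follows. The bilinear attention scoring and the per-position form of the transformation W_t are simplifying assumptions standing in for the attention computation of step 6.1 and the linear layer of step 6.2; m and n are expected as float tensors of shapes (z,) and (z, |input|).

import torch
import torch.nn as nn

class KnowledgeLogits(nn.Module):
    """Combines attention-based (6.1) and regex-based (6.2) knowledge signals into logit'_i (sketch)."""
    def __init__(self, num_knowledge: int, hidden_size: int, emb_size: int = 64):
        super().__init__()
        self.v_P = nn.Embedding(num_knowledge, emb_size)          # v_{P,k}
        self.v_Q = nn.Embedding(num_knowledge, emb_size)          # v_{Q,k}
        self.W_a = nn.Linear(hidden_size, emb_size, bias=False)   # parameter matrix for attention scores
        self.W_Q = nn.Linear(hidden_size, emb_size, bias=False)   # parameter matrix for logit_{Q,k,i}
        self.w_P = nn.Linear(hidden_size, 1)                      # w_P, b_P
        self.W_t = nn.Linear(1, 1)                                # stands in for W_t, b_t

    def forward(self, H, m, n):
        # H: (|input|, hidden_size), m: (z,), n: (z, |input|)
        ks = torch.arange(m.shape[0])
        vP, vQ = self.v_P(ks), self.v_Q(ks)                       # (z, emb_size)
        att = torch.softmax(vP @ self.W_a(H).T, dim=-1)           # att_{P,k,i}
        s_P = att @ H                                             # s_{P,k}
        p_P = torch.sigmoid(self.w_P(s_P)).squeeze(-1)            # p_{P,k}
        logit_Q = vQ @ self.W_Q(H).T                              # logit_{Q,k,i}
        logit_att = (p_P.unsqueeze(-1) * logit_Q).sum(dim=0)      # logit'_{att,i}
        t = (m.unsqueeze(-1) * n).sum(dim=0)                      # t_i = sum_k m_k n_{k,i}
        logit_re = torch.tanh(self.W_t(t.unsqueeze(-1))).squeeze(-1)  # logit'_{re,i}
        return torch.sigmoid(logit_att + logit_re)                # logit'_i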
Preferably, the specific steps of step 7 are as follows: first compute the logit values logit'_{s,i} and logit'_{e,i} that the ith position of the input sequence is the answer start position and end position respectively:
logit'_{s,i} = logit_{s,i} + α logit'_i
logit'_{e,i} = logit_{e,i} + α logit'_i
where logit_{s,i} and logit_{e,i} were computed in step 3, and α, the weight of logit'_i, is a hyper-parameter. The answer start position pos*_s and end position pos*_e in the input sequence are then predicted:
(pos*_s, pos*_e) = argmax_{i,j} (logit'_{s,i} + logit'_{e,j})
s.t. i ≤ j
n+3 ≤ i < n+3+m
n+3 ≤ j < n+3+m
After the start and end positions have been obtained, the words from the pos*_s-th to the pos*_e-th position of the input sequence are extracted and output as the answer segment.
During model training the loss function Loss_3 also has to be computed; it consists of two parts, Loss_1 and Loss_2. Loss_1 is the cross entropy between the prediction and the answer labeled for the sample, computed as follows:
p'_s = softmax(logit'_s)
p'_e = softmax(logit'_e)
Loss_1 = -(log p'_{s,y_s} + log p'_{e,y_e}) / 2
where y_s and y_e are the start and end positions in input of the correct answer labeled for the training sample; they are positive integers satisfying 1 ≤ y_s ≤ y_e ≤ |input|. If the sample is labeled as unanswerable, y_s = y_e = 1.
Loss_2 is a loss function constructed for logit'_i and consists of a binary cross entropy. First construct label ∈ {0, 1}^{|input|}: label_i = 1 when the ith position of the input sequence is "covered" by the answer labeled for the sample, otherwise label_i = 0. Every position of the input sequence can thus be divided into two classes according to whether it is covered by the labeled answer, and Loss_2 is computed with the binary cross entropy of these two classes:
Loss_2 = -(1/|input|) Σ_i [label_i log(logit'_i) + (1 - label_i) log(1 - logit'_i)]
Finally the sum of the two is used as the final loss function Loss_3, and back-propagation according to Loss_3 updates the model parameters involved in steps 2-7 (including the parameters of the BERT model in step 2):
Loss_3 = Loss_1 + β Loss_2
where β is a hyper-parameter, the weight of Loss_2.
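A sketch of step 7 in PyTorch, continuing the 0-based indexing used in the earlier sketches; alpha and beta are the hyper-parameters α and β, y_s and y_e are the 0-based labeled answer positions, and label is the 0/1 coverage vector.

import torch
import torch.nn.functional as F

def predict_and_loss3(logit_s, logit_e, logit_prime, n, m,
                      y_s=None, y_e=None, label=None, alpha=1.0, beta=1.0):
    logit_s2 = logit_s + alpha * logit_prime          # logit'_{s,i}
    logit_e2 = logit_e + alpha * logit_prime          # logit'_{e,i}

    # arg-max over spans (i, j), i <= j, restricted to the context part of the input
    lo, hi = n + 2, n + 2 + m
    scores = logit_s2[lo:hi].unsqueeze(1) + logit_e2[lo:hi].unsqueeze(0)
    scores = scores.masked_fill(torch.ones_like(scores).tril(-1).bool(), float("-inf"))
    flat = scores.argmax()
    pos_s = lo + (flat // (hi - lo)).item()
    pos_e = lo + (flat % (hi - lo)).item()

    loss3 = None
    if y_s is not None:                               # training: Loss_3 = Loss_1 + beta * Loss_2
        loss1 = 0.5 * (F.cross_entropy(logit_s2.unsqueeze(0), torch.tensor([y_s]))
                       + F.cross_entropy(logit_e2.unsqueeze(0), torch.tensor([y_e])))
        loss2 = F.binary_cross_entropy(logit_prime, label.float())
        loss3 = loss1 + beta * loss2
    return (pos_s, pos_e), loss3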
In another aspect, the present invention further provides an electronic device, including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method for implementing a question-answering model based on symbolized knowledge and a neural network as described above.
In another aspect, the present invention also provides a non-transitory computer-readable storage medium storing computer instructions for causing a computer to execute the method for implementing a question-answering model based on symbolized knowledge and a neural network as described above.
In another aspect, the present invention also provides a computer program product comprising a computer program stored on a non-transitory computer-readable storage medium, the computer program comprising program instructions which, when executed by a computer, cause the computer to perform the aforementioned method for implementing a question-answering model based on symbolized knowledge and a neural network.
Advantageous effects:
compared with the prior art, the invention has the following characteristics:
A combination scheme of symbolized knowledge and a neural network is designed, so that when predicting answers the model can rely both on the neural network and on the manually constructed symbolized knowledge; the reasoning process of the model thus becomes more interpretable, while the robustness and accuracy of the question-answering model are improved.
Drawings
FIG. 1 is a flowchart illustrating a general extraction-type question answering system;
FIG. 2 is a schematic diagram of the main workflow within the model of the present invention (assuming that the problem can be solved);
FIG. 3 is a flow chart of a process embodying (training) the present invention;
fig. 4 is a flow chart of a specific use (prediction) process of the present invention.
Detailed Description
The invention is described in detail below with reference to the figures and examples.
FIG. 1 shows the workflow of a general extraction-type question-answering system; FIG. 2 illustrates (under conditions where the problem can be solved) the main workflow within the model of the present invention; FIG. 3 is a flow chart of a specific implementation (training) of a question-answering model based on symbolic knowledge and neural networks; fig. 4 is a specific use (prediction) flow of a question-answering model based on symbolic knowledge and neural network.
Example 1:
There is a training sample containing the question:
q="How long is the Yellow River?"
the context:
c="The Yellow River, a large river in northern China, has a total length of about 5464 kilometers and a watershed area of about 752443 square kilometers."
and the correct answer labeled for the sample: "5464 kilometers".
The above example illustrates a method for implementing (training) a question-answering model based on symbolic knowledge and a neural network, which includes the following steps:
1: a symbolized knowledge base is constructed; assume that it contains two pieces of knowledge:
Knowledge 1: if the question asks about a length, the answer should be in the form of a number + a length unit, symbolized as P → Q with regular expressions RE_P and RE_Q;
Knowledge 2: if the question asks about a time, the answer should be in the form of a month + a date, symbolized as P → Q with regular expressions RE_P and RE_Q.
For knowledge 1, the regular expression RE_P checks whether the input question text starts with "How long", and RE_Q extracts text segments of the context whose form is "number + length unit", such as "1 kilometers", "3 kilometers", "203 km"; for knowledge 2, RE_P checks whether the input question text starts with "When", and RE_Q extracts text segments of the context whose form is "month + date", such as "January 13", "Feb.29", "July 2".
2: construct the input sequence input of the question-context pair and feed it into the BERT model to obtain the encoded vector sequence H.
The input question q and the context c are preprocessed (word segmentation, etc.) and spliced into the input sequence [<CLS>, how, long, …, ?, <SEP>, the, yellow, …, kilometers, <SEP>, <PAD>, …, <PAD>]. The input sequence is fed into the BERT model (Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pp. 4171-4186, Minneapolis, Minnesota, June 2019. Association for Computational Linguistics. doi: 10.18653/v1/N19-1423), and the encoded vector sequence H is obtained.
3: pass h_i through the fully-connected layer and, computing with the formulas in step 3 of the Disclosure, obtain for each position of input the logit values logit_{s,i} and logit_{e,i} of being the answer start position and the answer end position.
4: determine whether the question q can be answered from the context c by computing logit_unanswerable and logit_answerable. If it cannot be answered, set the answer to empty, return it, compute the loss function Loss_0, and update the model parameters involved in steps 2-4 (including the parameters of the BERT model) by back-propagation; the processing of the current sample then ends. Otherwise the model judges that the question can be answered; Loss_0 is not computed, no parameters are updated, no answer is returned yet, and the procedure goes to 5.
5: generate the feature information m_k and n_k of the symbolized knowledge.
For symbolized knowledge 1, RE_P matches the question q successfully, so m_1 = 1; the segment of the context c matched by RE_Q is "5464 kilometers", whose start and end positions in input are 23 and 24 respectively, so n_1 = [0, 0, …, 0, 1, 1, 0, …, 0], where the 23rd and 24th elements n_{1,23} and n_{1,24} are 1 and the rest are 0. Similarly, for symbolized knowledge 2, m_2 = 0 and n_2 = [0, 0, …, 0, 0], i.e. all elements of n_2 are 0.
6: using the attention mechanism and the feature information of the symbolized knowledge, compute logit'_i, the logit value with which the symbolized knowledge judges whether each position of the input sequence is covered by the answer, following the method of step 6 of the Disclosure.
7: predict the answer position and output the answer, compute the loss function Loss_3 following the method of step 7 of the Disclosure, and update the model parameters involved in steps 2-7 (including the parameters of the BERT model in step 2) by back-propagation. The processing of the current sample then ends.
8: the model is trained by repeatedly applying steps 2-7 of the Disclosure to new samples, finally yielding a trained question-answering model based on symbolized knowledge and a neural network.
Example 2:
An existing user inputs the question:
q="How long is the Yellow River?"
and the question-answering system retrieves the context related to the question:
c="The Yellow River, a large river in northern China, has a total length of about 5464 kilometers and a watershed area of about 752443 square kilometers."
the above example illustrates a method for using a question-and-answer model based on symbolic knowledge and a neural network, which includes the following steps:
1: construct the input sequence input of the question-context pair and feed it into the BERT model to obtain the encoded vector sequence H.
The input question q and the context c are preprocessed (word segmentation, etc.) and spliced into the input sequence [<CLS>, how, long, …, ?, <SEP>, the, yellow, …, kilometers, <SEP>, <PAD>, …, <PAD>]. After the input sequence has been fed into the BERT model, the encoded vector sequence H is obtained.
2: pass h_i through the fully-connected layer and, computing with the formulas in step 3 of the Disclosure, obtain for each position of input the logit values logit_{s,i} and logit_{e,i} of being the answer start position and the answer end position.
3: determine whether the question q can be answered from the context c by computing logit_unanswerable and logit_answerable. If it cannot be answered, set the answer to empty, return it, and end the processing; otherwise the model judges that the question can be answered, does not return an answer yet, and goes to 4.
4: generate the feature information m_k and n_k of the symbolized knowledge.
As in example 1, for knowledge 1 of the symbolized knowledge base constructed during the training phase, RE_P matches the question q successfully, so m_1 = 1; the segment of the context c matched by RE_Q is "5464 kilometers", whose start and end positions in input are 23 and 24 respectively, so n_1 = [0, 0, …, 0, 1, 1, 0, …, 0], where the 23rd and 24th elements n_{1,23} and n_{1,24} are 1 and the rest are 0. Similarly, for symbolized knowledge 2, m_2 = 0 and n_2 = [0, 0, …, 0, 0], i.e. all elements of n_2 are 0.
5: using the attention mechanism and the feature information of the symbolized knowledge, compute logit'_i, the logit value with which the symbolized knowledge judges whether each position of the input sequence is covered by the answer, following the method of step 6 of the Disclosure.
6: predict the answer position following the method of step 7 of the Disclosure, extract the corresponding text segment from input as the answer according to that position, and output the answer.
Performance comparison experiment between the question-answering model based on symbolized knowledge and a neural network and the reference model
Evaluation metrics:
The question-answering model based on symbolized knowledge and a neural network of the invention is evaluated with two metrics that measure the accuracy of the model's answers:
exact match (hereinafter abbreviated as EM):
the EM calculation is as follows:
Let a test sample consist of a question q, a context c containing the answer to the question, and the correct answer a to the question, and let F be the model under evaluation, whose prediction result is a'. Then:
a′=F(q,c)
EM = 1 if a′ = a; otherwise EM = 0.
F1-score (hereinafter referred to as F1):
F1 is typically used to measure model performance on binary classification problems; here, each word in the model's prediction can be assigned to one of two classes according to whether it is "covered" by the answer. Since the calculation of F1 is more involved, a few concepts are introduced first:
TP (True Positive): the number of positive samples predicted as positive.
FP (False Positive): the number of negative samples predicted as positive.
FN (False Negative): the number of positive samples predicted as negative.
Precision: the proportion of true positive samples among the samples judged positive by the classifier:
precision = TP / (TP + FP)
Recall: the proportion of positive samples that are judged positive by the classifier:
recall = TP / (TP + FN)
Finally F1 is computed as:
F1 = 2 × precision × recall / (precision + recall)
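The two metrics can be computed as in the following sketch; it is a standard SQuAD-style computation, without the answer normalization (lower-casing, punctuation and article stripping) that the official evaluation script applies.

def exact_match(pred: str, gold: str) -> int:
    """EM = 1 if the predicted answer equals the labeled answer, otherwise 0."""
    return int(pred.strip() == gold.strip())

def f1_score(pred: str, gold: str) -> float:
    """Token-level F1 between the predicted and the labeled answer."""
    pred_tokens, gold_tokens = pred.split(), gold.split()
    if not pred_tokens or not gold_tokens:
        return float(pred_tokens == gold_tokens)
    tp, remaining = 0, list(gold_tokens)
    for tok in pred_tokens:               # count overlapping tokens (true positives)
        if tok in remaining:
            remaining.remove(tok)
            tp += 1
    if tp == 0:
        return 0.0
    precision = tp / len(pred_tokens)     # TP / (TP + FP)
    recall = tp / len(gold_tokens)        # TP / (TP + FN)
    return 2 * precision * recall / (precision + recall)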
data set:
The invention was trained and evaluated using the training set and validation set of the Stanford question-answering dataset SQuAD v2.0 respectively. SQuAD v2.0 adds over fifty thousand unanswerable questions to SQuAD v1.1, giving roughly 150,000 question-answer samples in total. Although it only adds unanswerable questions to SQuAD v1.1, this greatly increases the difficulty of the dataset: a model whose F1 reaches 86% on SQuAD v1.1 reaches only 66% on SQuAD v2.0.
Reference model:
In this experiment, the native BERT model (BERT-Base, Uncased) published by Google on GitHub was used as the reference model; the source code adapting the BERT model to the SQuAD v2.0 task has also been published on GitHub. Its answer-prediction process is highly similar to steps 2-4 above, except that when the comparison of logit_answerable and logit_unanswerable in step 4 finds the question answerable (i.e. logit_unanswerable ≤ logit_answerable), it directly returns the answer span from position pos_s to position pos_e, and it updates the neural network parameters through the loss function Loss_0.
Hyper-parameters:
Besides common hyper-parameters such as batch_size and the learning rate, the experiment also involves the hyper-parameters carried by the models themselves, such as the maximum input-sequence length max_seq_length, α, and β. Preferably, the hyper-parameter values are set as follows.
TABLE 1 values of the hyper-parameters and description thereof
The experimental results are as follows:
In the experiment, the learning rate learning_rate, the number of training epochs, and the batch size batch_size were varied, and the reference model and the model of the invention were each tested several times.
TABLE 2 Performance comparison test
Performance comparison analysis:
In the above experiments, the reference model achieved its best performance in trial 10 and the model of the invention achieved its best performance in trial 12. Compared with the reference model, the model of the invention improves F1-score by 1.06% absolute and EM by 0.96% absolute.
Examining the experimental data, the performance of the invention exceeds the best performance of the reference model in trials 6, 9 and 12. The reason is that the invention builds on the BERT model: the BERT model is pre-trained, while the network parameters newly added on top of it are not, so the model of the invention needs a higher learning rate and a longer training time than the reference model. With an appropriate learning rate and sufficient training time, the method achieves higher performance than the reference model.
In conclusion, the performance of the question-answering model based on symbolized knowledge and a neural network provided by the invention is superior to that of the comparison model; integrating symbolized knowledge into the neural network effectively improves the performance of the question-answering model, which demonstrates the effectiveness of the approach.
In another aspect, the present invention further provides an electronic device, including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method for implementing a question-answering model based on symbolized knowledge and a neural network as described above.
In another aspect, the present invention also provides a non-transitory computer-readable storage medium storing computer instructions for causing a computer to execute the method for implementing a question-answering model based on symbolized knowledge and a neural network as described above.
In another aspect, the present invention also provides a computer program product comprising a computer program stored on a non-transitory computer-readable storage medium, the computer program comprising program instructions which, when executed by a computer, cause the computer to perform the aforementioned method for implementing a question-answering model based on symbolized knowledge and a neural network.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present disclosure may be implemented by software or hardware. Where the name of an element does not constitute a limitation on the element itself.
It should be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof.
The above description is only for the specific embodiment of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present invention are included in the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (10)

1. A method for implementing a question-answering model based on symbolized knowledge and a neural network, characterized in that the method comprises the following steps:
1: constructing a symbolized knowledge base;
2: constructing an input sequence input of a question q and a context c and feeding it into a BERT model to obtain an encoded vector sequence H ∈ R^{|input| × hidden_size}, wherein hidden_size is a hyper-parameter of the model;
3: passing h_i through a fully-connected layer to obtain, for each position of the input, the logit values of that position being the answer start position and the answer end position, wherein h_i ∈ R^{hidden_size} is the ith row vector of H, 1 ≤ i ≤ |input|, logit_{s,i} denotes the logit value that the ith position of the input is the answer start position, and logit_{e,i} denotes the logit value that the ith position of the input is the answer end position;
4: judging from the logit values whether q can be answered from c; if it cannot, returning an empty result and, during model training, computing a loss function Loss_0 from the labeled answer of the training sample q-c and back-propagating it to update the neural network parameters involved in steps 2-4, then ending the process; otherwise continuing with step 5;
5: matching q and c with the kth piece of knowledge in the symbolized knowledge base to generate the feature information m_k and n_k of the symbolized knowledge, wherein k is a natural number, 1 ≤ k ≤ z, and z is the number of pieces of knowledge in the symbolized knowledge base;
6: using an attention mechanism and the feature information of the symbolized knowledge, computing logit'_i, the logit value with which the symbolized knowledge judges whether a position of the input sequence is covered by the answer;
7: predicting and outputting the answer according to logit and logit'_i, and, during model training, computing a loss function Loss_3 from the labeled answer of the training sample q-c and the predicted answer and back-propagating it to update the neural network parameters involved in steps 2-7.
2. The method of claim 1, wherein each piece of knowledge in the symbolized knowledge base consists of a natural-language form, a symbolized form and two corresponding regular expressions RE_P and RE_Q, wherein RE_P is the regular expression of the condition and RE_Q is the regular expression of the conclusion.
3. The method of claim 1, wherein the specific steps of step 2 are: the input question q and context c are word-segmented and spliced into an input sequence input = [<CLS>, q_1, …, q_i, …, q_n, <SEP>, c_1, c_2, …, c_j, …, c_m, <SEP>, <PAD>, …, <PAD>], where n is the length of the question q, m is the length of the context c, |input| = max_seq_length, |input| denotes the length of the input sequence, max_seq_length is a hyper-parameter of the model and a positive integer, <CLS> marks the beginning of the sequence, q_i is the ith word of the question q, 1 ≤ i ≤ n, <SEP> is a separator mark, c_j is the jth word of the context c, 1 ≤ j ≤ m, and <PAD> is a padding mark used to pad the sequence so that |input| = max_seq_length;
the input sequence is fed into a BERT model, and the encoded vector sequence H ∈ R^{|input| × hidden_size} is obtained at the output layer of the BERT model, where hidden_size is a hyper-parameter of the model.
4. The method of claim 1, wherein the specific steps of step 3 are: the logit values that the ith word of the input, i.e. h_i, is the answer start position and the answer end position are computed as:
logit_{s,i} = W_s h_i + b_s
logit_{e,i} = W_e h_i + b_e
wherein W_s, W_e ∈ R^{1 × hidden_size} are weight parameters and b_s, b_e ∈ R are bias terms; logit_{s,i}, the ith element of logit_s ∈ R^{|input|}, is a real number denoting the logit value that the ith position of the input is the answer start position; logit_{e,i}, the ith element of logit_e ∈ R^{|input|}, is likewise a real number denoting the logit value that the ith position of the input is the answer end position.
5. The method of claim 1, wherein the specific steps of step 4 are: the model first assumes that the question q can be answered and computes the answer start position pos_s and end position pos_e in input as:
(pos_s, pos_e) = argmax_{i,j} (logit_{s,i} + logit_{e,j})
s.t. i ≤ j
n+3 ≤ i < n+3+m
n+3 ≤ j < n+3+m
where n is the length of the question q and m is the length of the context c;
then the logit value logit_answerable that the question can be answered and the logit value logit_unanswerable that the question cannot be answered are computed as:
logit_answerable = logit_{s,pos_s} + logit_{e,pos_e}
logit_unanswerable = logit_{s,1} + logit_{e,1}
wherein logit_{s,1} and logit_{e,1} are the first elements of logit_s and logit_e respectively;
if logit_unanswerable > logit_answerable, the model judges that the question cannot be answered, the answer is set to empty and returned, and during model training the loss function Loss_0 is computed and back-propagated to update the model parameters involved in steps 2-4; otherwise the model judges that the question can be answered and the method goes to step 5;
the Loss_0 is a cross entropy, computed as:
p_s = softmax(logit_s)
p_e = softmax(logit_e)
Loss_0 = -(log p_{s,y_s} + log p_{e,y_e}) / 2
wherein y_s and y_e are the start and end positions in the input of the correct answer labeled for the training sample; they are positive integers satisfying 1 ≤ y_s ≤ y_e ≤ |input|; if the sample is labeled as unanswerable, y_s = y_e = 1.
6. The method of claim 1, wherein the specific steps of step 5 are: for the kth piece of symbolized knowledge, its regular expressions RE_P and RE_Q are used to match the question q and the context c respectively and to generate the features m_k and n_k; RE_P is the regular expression of the condition and RE_Q is the regular expression of the conclusion; 1 ≤ k ≤ z, and z is the number of pieces of knowledge in the symbolized knowledge base;
wherein m_k records whether RE_P of the kth piece of symbolized knowledge matches the question q: if it matches, m_k = 1, otherwise m_k = 0;
n_k ∈ {0, 1}^{|input|} is obtained from all the segments matched by RE_Q in the context part of the input: if the ith position of the input is "covered" by a matched segment, the ith element of n_k is n_{k,i} = 1, otherwise n_{k,i} = 0; let p and p' be the start and end positions of a matched text segment in the input; if p ≤ i ≤ p', the ith position of the input is said to be "covered" by the text segment.
7. The method of claim 1, wherein: the concrete steps of the step 6 are as follows: through logit'att,i、logit′re,iTo calculate logit'i(ii) a Wherein
Figure FDA0002636515350000042
Representing the logarithm of the ith position of the input considered to be covered by the answer text segment by symbolized knowledge;
Figure FDA0002636515350000043
as calculated from the attention mechanism, the attention-oriented mechanism,
Figure FDA0002636515350000044
from said mk、nkCalculating to obtain;
step 6.1: logit'att,iObtained by the following process:
first calculate pP,k
Figure FDA0002636515350000045
sP,k=∑iattP,k,ihi
logitP,k=wPsP,k+bP
Figure FDA0002636515350000046
Wherein the content of the first and second substances,
Figure FDA0002636515350000047
is a matrix of parameters that is,
Figure FDA0002636515350000048
is the representation of the symbolized form of the k-th knowledge in the symbolized knowledge base, namely the trainable embedded vector corresponding to the condition P in the k-th symbolized knowledge,
Figure FDA0002636515350000049
represents the attention score of the condition P on the ith word in the input,
Figure FDA00026365153500000410
the information which is collected in the input and is related to the kth symbolized knowledge after the attention score is combined is stored in the input;
Figure FDA00026365153500000411
trainable weight vectors and bias terms, respectively;
Figure FDA00026365153500000412
a logarithm of the problem q representing the condition P in the kth piece of symbolized knowledge;
Figure FDA00026365153500000413
representing the probability that the condition P is satisfied in the k-th symbolized knowledge on the question q;
second, the logit of conclusion Q is calculatedQ,k,iAnd logit'att,iThe calculation method is as follows:
Figure FDA00026365153500000414
logit′att,i=∑kpP,klogitQ,k,i
wherein the content of the first and second substances,
Figure FDA0002636515350000051
is a matrix of parameters that is,
Figure FDA0002636515350000052
is the trainable embedded vector corresponding to the conclusion Q in the kth symbolized knowledge,
Figure FDA0002636515350000053
the ith word representing the input sequence is selected as the logarithm of the conclusion Q; then using the previously calculated pP,kAs weight pairs logitQ,k,iAre weighted and summed to obtain
Figure FDA0002636515350000054
It represents: after all symbolized knowledge is used and an attention mechanism is combined, the logarithms of the ith positions of the input are covered by the answers;
Step 6.2: logit'_{re,i} is obtained by the following process:
t_i = Σ_k m_k · n_{k,i}
logit'_re = tanh(W_t t + b_t)
where logit'_{re,i} is the i-th element of logit'_re, W_t is a parameter matrix, b_t is a bias vector, and t_i is the i-th element of the vector t;
Finally:
logit'_i = sigmoid(logit'_{att,i} + logit'_{re,i})
where the outermost sigmoid limits logit'_i to the interval [0, 1], preventing logit'_i from becoming too large and exerting an excessive influence on the final answer prediction.
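Step 6.2 and the final combination can be sketched in the same PyTorch style. Because the claim does not fix the shape of W_t, the sketch below treats it as a per-position scalar weight w_t; this, together with the tensor names and shapes, is an illustrative assumption.

```python
import torch

def knowledge_prior(m, n, w_t, b_t, logit_att):
    """Sketch of step 6.2 and the final combination of claim 7.

    m        : (z,)   m_k flags (1 if RE_P of knowledge k matched the question)
    n        : (z, L) coverage features n_{k,i} from the conclusion regex RE_Q
    w_t, b_t : scalars standing in for the claim's W_t and b_t (assumed shape)
    logit_att: (L,)   logit'_{att,i} from step 6.1
    """
    # t_i = sum_k m_k * n_{k,i}
    t = (m.unsqueeze(-1) * n).sum(dim=0)        # (L,)

    # logit'_re = tanh(W_t t + b_t), applied position-wise here
    logit_re = torch.tanh(w_t * t + b_t)        # (L,)

    # logit'_i = sigmoid(logit'_att,i + logit'_re,i), squashed into [0, 1]
    return torch.sigmoid(logit_att + logit_re)
```

The returned vector plays the role of logit'_i in claim 8, where it is added to the reader's start and end logits.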
8. The method according to any one of claims 1 to 7, wherein the specific steps of step 7 are as follows: first, calculate the logits logit'_{s,i} and logit'_{e,i} that the i-th position of the input sequence is the answer start position and end position, respectively, as follows:
logit'_{s,i} = logit_{s,i} + α · logit'_i
logit'_{e,i} = logit_{e,i} + α · logit'_i
where α is the weight of logit'_i and is a hyperparameter;
Then predict the start position î_s and end position î_e of the answer in the input sequence:
(î_s, î_e) = argmax_{i,j} ( logit'_{s,i} + logit'_{e,j} )
s.t. i ≤ j
n+3 ≤ i < n+3+m
n+3 ≤ j < n+3+m
After obtaining the start and end positions, the words from position î_s to position î_e of the input sequence are extracted and output as the answer segment;
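A minimal sketch of this constrained span search follows, in the same PyTorch style as the earlier sketches; the brute-force enumeration over context positions and the argument names are illustrative choices rather than part of the claim.

```python
import torch

def predict_span(logit_s: torch.Tensor,
                 logit_e: torch.Tensor,
                 logit_prior: torch.Tensor,
                 n: int, m: int, alpha: float = 1.0):
    """Sketch of the span prediction of claim 8 (step 7).

    logit_s, logit_e : (L,) start / end logits of the base reader
    logit_prior      : (L,) logit'_i from the symbolized-knowledge branch
    n, m             : question length and context length; context tokens
                       are assumed to occupy positions n+3 .. n+3+m-1
    alpha            : hyperparameter weight of logit'_i
    """
    # logit'_s,i = logit_s,i + alpha * logit'_i (and likewise for the end)
    logit_s = logit_s + alpha * logit_prior
    logit_e = logit_e + alpha * logit_prior

    lo, hi = n + 3, n + 3 + m
    best_score, best_span = float("-inf"), (lo, lo)
    # Enumerate (i, j) with i <= j, both restricted to context positions.
    for i in range(lo, hi):
        for j in range(i, hi):
            score = (logit_s[i] + logit_e[j]).item()
            if score > best_score:
                best_score, best_span = score, (i, j)
    return best_span
```

Given best_span = (i_s, i_e), the answer text is simply input_tokens[i_s : i_e + 1]; a cap on the span length j - i is often added in practice but is not part of the claim.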
During model training, the loss function Loss_3 also needs to be calculated; it consists of two parts, Loss_1 and Loss_2; Loss_1 is the cross entropy between the prediction and the answer labeled in the sample, computed as follows:
p'_s = softmax(logit'_s)
p'_e = softmax(logit'_e)
Loss_1 = -( log p'_{s, y_s} + log p'_{e, y_e} ) / 2
where y_s and y_e are the start and end positions in the input of the correct answer labeled for the training sample, positive integers satisfying n+3 ≤ y_s ≤ y_e < n+3+m; if the sample is labeled as having no answer, y_s and y_e are set to the position of the special token reserved for unanswerable samples;
Loss_2 is a loss function constructed for logit'_i and consists of a binary cross entropy: first construct label_i, where label_i = 1 when the i-th position of the input sequence is "covered" by the answer labeled in the sample, and label_i = 0 otherwise; each position of the input sequence is thus divided into two classes according to whether it is covered by the labeled answer, and Loss_2 is computed as the binary cross entropy over these two classes:
Loss_2 = -Σ_i [ label_i · log(logit'_i) + (1 - label_i) · log(1 - logit'_i) ]
Finally, the weighted sum of the two is used as the final loss function Loss_3, and back propagation is performed according to Loss_3 to update the model parameters involved in steps 2-7:
Loss_3 = Loss_1 + β · Loss_2
where β is a hyperparameter, the weight of Loss_2.
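The training objective of claim 8 can be sketched in the same PyTorch style. The division by 2 in Loss_1, the mean reduction of the binary cross entropy, and the argument names are assumptions, since the claim fixes only the ingredients of the two loss terms.

```python
import torch
import torch.nn.functional as F

def training_loss(logit_s, logit_e, logit_prior, y_s, y_e, label, beta=1.0):
    """Sketch of the training loss of claim 8.

    logit_s, logit_e : (L,) knowledge-adjusted start / end logits logit'_s, logit'_e
    logit_prior      : (L,) logit'_i, already squashed into [0, 1] by the sigmoid
    y_s, y_e         : gold start / end positions of the labeled answer
    label            : (L,) label_i, 1 where the position is covered by the gold answer
    beta             : hyperparameter weight of Loss_2
    """
    # Loss_1: cross entropy between predicted and labeled start / end positions
    # (averaged over the two ends; the exact normalization is an assumption).
    log_p_s = torch.log_softmax(logit_s, dim=-1)
    log_p_e = torch.log_softmax(logit_e, dim=-1)
    loss_1 = -(log_p_s[y_s] + log_p_e[y_e]) / 2

    # Loss_2: binary cross entropy between logit'_i and label_i over all positions.
    loss_2 = F.binary_cross_entropy(logit_prior, label.float())

    # Loss_3 = Loss_1 + beta * Loss_2
    return loss_1 + beta * loss_2
```

Calling backward() on the returned value propagates gradients through both the reader logits and the symbolized-knowledge branch, which is how the parameters of steps 2-7 are updated jointly.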
9. An electronic device, characterized in that the electronic device comprises:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method for implementing a question-answer model based on symbolized knowledge and a neural network according to any one of claims 1-8.
10. A computer program product comprising a computer program stored on a non-transitory computer readable storage medium, the computer program comprising program instructions which, when executed by a computer, cause the computer to perform the method for implementing a question-answer model based on symbolized knowledge and a neural network according to any one of claims 1 to 8.
CN202010826838.1A 2020-08-17 2020-08-17 Method for implementing question-answer model based on symbolized knowledge and neural network Active CN112035629B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010826838.1A CN112035629B (en) 2020-08-17 2020-08-17 Method for implementing question-answer model based on symbolized knowledge and neural network

Publications (2)

Publication Number Publication Date
CN112035629A true CN112035629A (en) 2020-12-04
CN112035629B CN112035629B (en) 2023-02-17

Family

ID=73577377

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010826838.1A Active CN112035629B (en) 2020-08-17 2020-08-17 Method for implementing question-answer model based on symbolized knowledge and neural network

Country Status (1)

Country Link
CN (1) CN112035629B (en)



Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020188586A1 (en) * 2001-03-01 2002-12-12 Veale Richard A. Multi-layered semiotic mechanism for answering natural language questions using document retrieval combined with information extraction
CN110083692A (en) * 2019-04-22 2019-08-02 齐鲁工业大学 A kind of the text interaction matching process and device of finance knowledge question
CN110647619A (en) * 2019-08-01 2020-01-03 中山大学 Common sense question-answering method based on question generation and convolutional neural network
CN110909864A (en) * 2019-10-22 2020-03-24 北京大学 Natural language task processing method and device combining regular expression and neural network

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113408279A (en) * 2021-06-23 2021-09-17 平安科技(深圳)有限公司 Training method, device and equipment of sequence labeling model and storage medium
CN113408279B (en) * 2021-06-23 2022-05-20 平安科技(深圳)有限公司 Training method, device and equipment of sequence labeling model and storage medium
CN116842155A (en) * 2023-06-30 2023-10-03 北京百度网讯科技有限公司 Text generation method, training method and device of text generation model

Also Published As

Publication number Publication date
CN112035629B (en) 2023-02-17

Similar Documents

Publication Publication Date Title
CN109992782B (en) Legal document named entity identification method and device and computer equipment
CN111611361B (en) Intelligent reading, understanding, question answering system of extraction type machine
CN109885672B (en) Question-answering type intelligent retrieval system and method for online education
US11501182B2 (en) Method and apparatus for generating model
CN109271505B (en) Question-answering system implementation method based on question-answer pairs
US10534863B2 (en) Systems and methods for automatic semantic token tagging
CN109871538A (en) A kind of Chinese electronic health record name entity recognition method
CN111460092B (en) Multi-document-based automatic complex problem solving method
CN112667794A (en) Intelligent question-answer matching method and system based on twin network BERT model
CN112800170A (en) Question matching method and device and question reply method and device
CN114943230B (en) Method for linking entities in Chinese specific field by fusing common sense knowledge
CN110390049B (en) Automatic answer generation method for software development questions
CN112328800A (en) System and method for automatically generating programming specification question answers
CN111966812A (en) Automatic question answering method based on dynamic word vector and storage medium
CN111476038A (en) Long text generation method and device, computer equipment and storage medium
CN115599899B (en) Intelligent question-answering method, system, equipment and medium based on aircraft knowledge graph
CN113196277A (en) System for retrieving natural language documents
CN112035629B (en) Method for implementing question-answer model based on symbolized knowledge and neural network
CN111552773A (en) Method and system for searching key sentence of question or not in reading and understanding task
US20210374276A1 (en) Smart document migration and entity detection
CN113282711A (en) Internet of vehicles text matching method and device, electronic equipment and storage medium
CN111581364B (en) Chinese intelligent question-answer short text similarity calculation method oriented to medical field
CN115905487A (en) Document question and answer method, system, electronic equipment and storage medium
CN117648429B (en) Question-answering method and system based on multi-mode self-adaptive search type enhanced large model
CN114356990A (en) Base named entity recognition system and method based on transfer learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant