CN112035629A - Method for implementing question-answer model based on symbolized knowledge and neural network - Google Patents

Method for implementing question-answer model based on symbolized knowledge and neural network

Info

Publication number
CN112035629A
Authority
CN
China
Prior art keywords
logit
answer
knowledge
input
question
Prior art date
Legal status
Granted
Application number
CN202010826838.1A
Other languages
Chinese (zh)
Other versions
CN112035629B (en)
Inventor
何钺
吴昊
黄河燕
Current Assignee
Beijing Institute of Technology BIT
Original Assignee
Beijing Institute of Technology BIT
Priority date
Filing date
Publication date
Application filed by Beijing Institute of Technology BIT filed Critical Beijing Institute of Technology BIT
Priority to CN202010826838.1A priority Critical patent/CN112035629B/en
Publication of CN112035629A publication Critical patent/CN112035629A/en
Application granted granted Critical
Publication of CN112035629B publication Critical patent/CN112035629B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30: Information retrieval of unstructured textual data
    • G06F 16/33: Querying
    • G06F 16/332: Query formulation
    • G06F 16/3329: Natural language query formulation or dialogue systems
    • G06F 16/3331: Query processing
    • G06F 16/334: Query execution
    • G06F 16/3344: Query execution using natural language analysis
    • G06F 16/3346: Query execution using probabilistic model
    • G06F 16/35: Clustering; Classification
    • G06F 16/355: Class or cluster creation or modification
    • G06F 40/00: Handling natural language data
    • G06F 40/10: Text processing
    • G06F 40/12: Use of codes for handling textual entities
    • G06F 40/126: Character encoding
    • G06F 40/20: Natural language analysis
    • G06F 40/205: Parsing
    • G06F 40/216: Parsing using statistical methods
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/045: Combinations of networks
    • G06N 3/08: Learning methods
    • G06N 3/084: Backpropagation, e.g. using gradient descent

Abstract

The invention relates to a method for implementing a question-answering model based on symbolized knowledge and a neural network, and belongs to the technical field of extractive question answering. First, knowledge expressed in natural language is converted into symbolized knowledge in first-order-logic form, and regular expressions are then used to generate features that a neural network can recognize, so that the information carried by the symbolized knowledge can be incorporated into the neural network. Meanwhile, to address the reduced generalization that results from relying on regular expressions alone, an attention-based method is provided that uses the association between the symbolized knowledge and the input text to improve the generalization of the symbolized knowledge during question answering. Compared with the prior art, the method combines the advantages of rule-based and deep-learning-based question-answering models, making the reasoning process of the model more interpretable while improving the robustness and accuracy of the question-answering model.

Description

Method for implementing question-answer model based on symbolized knowledge and neural network
Technical Field
The invention relates to a method for implementing a question-answering model, in particular to a method for implementing a simple extractive question-answering model by combining symbolized knowledge with a conventional neural-network-based question-answering model, and belongs to the technical field of deep-learning-based question answering.
Background
People have long been accustomed to obtaining the information they want by consulting documents, using search engines, and so on. Consulting documents usually costs a great deal of time before useful knowledge is found; and although a search engine greatly reduces retrieval time, the results it returns are often of uneven quality and redundant in content, so people still have to spend time filtering out the truly useful information. It would therefore be desirable to have a question-answering system that can answer questions posed by a user in natural language with accurate and concise natural language.
Based on this need, question-answering systems have become a research hotspot in both industry and academia. A typical question-answering system works as follows: according to the question entered by the user, the system retrieves information and processes it (for example by ranking the retrieved documents), and the core question-answering model then answers the question on the basis of the user's input and the processed information.
From the point of view of answering questions, there are two types of common question-answering systems: one is an extraction type question-answering system, i.e. after the user puts forward the question, the system will search the relevant document in the database, finally extract a text fragment in the document as the final answer; the other is a generating question-answering system, which generates answer text as the final answer after referring to the relevant document. The extraction type question-answering system is easier to construct than the generation type question-answering system, so the extraction type question-answering system is more widely applied.
In terms of structure, common question-answering systems include rule-based systems, neural-network-based systems, and others. A rule-based question-answering system requires a large amount of manual labor to construct rules and templates and is usually applicable only to a certain vertical domain, but its reasoning process is highly interpretable and logically strong. A neural-network-based question-answering system depends on training data; its reasoning process is a black box and poorly interpretable, but its generalization is strong. Other question-answering systems include, for example, those based on knowledge graphs or knowledge bases.
Currently, in the field of natural language processing, neural-network-based question-answering systems are more widely used than rule-based ones; rule-based systems are gradually being replaced because of their poor generalization and high construction cost. However, because rule-based systems can compensate for the poor interpretability and weak reasoning logic of neural-network-based systems, researchers have in recent years begun to study how to combine rules and neural networks to build question-answering systems.
A complete question-answering system is structurally complex and involves steps such as information retrieval and document-relevance ranking; this patent only concerns the core question-answering model within such a system. The existing framework of an extractive question-answering model is as follows:
A user poses a question q in natural-language form, and the question-answering system retrieves a related context c, also recorded in natural language, according to q; the context c may contain the text segment needed to answer q. The question q and the context c are both non-empty strings. The goal of the question-answering model is to receive the question q and the context c as input and to judge whether c contains a segment needed to answer q. If it does, a text segment capable of answering q is extracted from c and returned as the answer a*; otherwise the answer a* is set to the empty string and returned. Written as a formula, the model is:
a* = a text segment a ∈ C(c) that answers q, if contain(q, c) = true; otherwise a* = the empty string
where C(c) denotes the set of all non-empty substrings of the string c (a non-empty substring being a subsequence of consecutive characters of non-zero length), and contain(q, c) is a function that judges whether the document c contains a segment needed to answer the question q, returning true if it does and false otherwise.
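For concreteness, this framework can be written as the following minimal Python sketch; the span-scoring model and its predict method are hypothetical placeholders standing in for the BERT-based network developed in the rest of this description.

def answer(q: str, c: str, model) -> str:
    """Extractive question answering: return a text segment of c that answers q,
    or the empty string if c contains no such segment (sketch)."""
    answerable, start, end = model.predict(q, c)  # hypothetical model interface
    if not answerable:                 # corresponds to contain(q, c) = false
        return ""                      # a* is the empty string
    return c[start:end + 1]            # a* is a non-empty substring of c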
Disclosure of Invention
The invention aims to solve the above problems, in part or in whole; it improves on the BERT model and provides a more efficient and more robust method for implementing a question-answering model than a conventional question-answering model based on BERT alone.
The object of the present invention is achieved by the following technical means.
The principle of the method is that firstly, knowledge composed of natural language is converted into symbolized knowledge in a first-order logic mode, and then characteristics which can be identified by a neural network are generated by combining a regular expression, so that information of the symbolized knowledge can be combined into the neural network; meanwhile, aiming at the problem that the generalization of the symbolized knowledge is reduced only by using the regular expression, a method based on an attention mechanism is provided, and the generalization of the symbolized knowledge in the question and answer process can be improved by utilizing the associated information between the symbolized knowledge and the input text.
The invention provides a method for realizing a question-answer model based on symbolic knowledge and a neural network, which comprises the following contents:
1: constructing a symbolized knowledge base;
2: constructing an input sequence input from the question q and the context c and feeding it into a BERT model to obtain an encoded vector sequence H ∈ R^{|input| × hidden_size}, where hidden_size is a hyper-parameter of the model;
3: passing h_i through a fully-connected layer to obtain, for each position of input, the logit values of that position being the answer start position and the answer end position, where h_i ∈ R^{hidden_size} is the ith row vector of H and 1 ≤ i ≤ |input|; let logit_{s,i} denote the logit value that the ith position of input is the answer start position and logit_{e,i} the logit value that the ith position of input is the answer end position;
4: judging from the logit values whether the question q can be answered from the context c; if it cannot, returning an empty answer and, during model training, computing the loss function Loss_0 from the labeled answer of the training sample q-c and back-propagating it to update the neural network parameters involved in steps 2-4, then ending the process; otherwise continuing with step 5;
5: matching q and c with the kth piece of knowledge in the symbolized knowledge base to generate the feature information m_k and n_k of the symbolized knowledge, where k is a natural number, 1 ≤ k ≤ z, and z is the number of pieces of knowledge in the symbolized knowledge base;
6: using an attention mechanism and the feature information of the symbolized knowledge, computing logit'_i, the logit value with which the symbolized knowledge judges whether the ith position of the input sequence is covered by the answer;
7: predicting and outputting the answer according to logit and logit'_i; during model training, computing the loss function Loss_3 from the labeled answer of the training sample q-c and the predicted answer and back-propagating it to update the neural network parameters involved in steps 2-7.
Preferably, the construction process of step 1 is as follows:
first, various knowledge related to the problem is collected and constructed, such as:
knowledge 1: if the question is question length, then the answer should be in the form of a number + units of length
Knowledge 2: if the question is asking time, then the answer should be in the form of month + date
……
These collected and constructed pieces of knowledge are in natural-language form, following the pattern "if the question is ..., then the answer should be in the form of ...". Each piece is then symbolized using first-order logic: a piece of knowledge is written as P → Q, where the condition P is "the question is ..." and the conclusion Q is "the answer should be in the form of ..."; P and Q have different meanings depending on the specific knowledge. For the symbolized representation P → Q, two regular expressions RE_P and RE_Q are then constructed, corresponding to the condition P and the conclusion Q respectively. Taking knowledge 1 as an example, the condition P is "the question asks about a length" and the conclusion Q is "the answer should be in the form of a number + a length unit". Since a question asking about a length usually starts with "How long", RE_P can be set to match question sentences beginning with "How long" and is used to judge whether the question starts with "How long"; for a segment of the form number + length unit, RE_Q can be set to a pattern such as "[1-9][0-9]*(kilometers|..." that matches a number followed by a length unit.
After all collected knowledge has undergone these operations, it is organized into a symbolized knowledge base. The symbolized knowledge base contains multiple pieces of knowledge, each consisting of a natural-language form, a symbolized form, and the two corresponding regular expressions RE_P and RE_Q.
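As an illustration only, such a symbolized knowledge base could be organized as follows in Python; the concrete regular expressions are assumptions standing in for RE_P and RE_Q, not the exact expressions of the invention.

import re
from dataclasses import dataclass

@dataclass
class SymbolizedKnowledge:
    natural_form: str    # natural-language form of the knowledge
    condition_P: str     # condition P in the symbolized form P -> Q
    conclusion_Q: str    # conclusion Q in the symbolized form P -> Q
    re_P: re.Pattern     # regular expression for P, matched against the question
    re_Q: re.Pattern     # regular expression for Q, matched against the context

# Illustrative knowledge base with the two example entries above.
KNOWLEDGE_BASE = [
    SymbolizedKnowledge(
        natural_form="If the question asks about a length, the answer should be a number + a length unit",
        condition_P="the question asks about a length",
        conclusion_Q="the answer is of the form number + length unit",
        re_P=re.compile(r"^How long"),
        re_Q=re.compile(r"[1-9][0-9]*\s*(kilometers|km|meters)"),
    ),
    SymbolizedKnowledge(
        natural_form="If the question asks about a time, the answer should be a month + a date",
        condition_P="the question asks about a time",
        conclusion_Q="the answer is of the form month + date",
        re_P=re.compile(r"^When"),
        re_Q=re.compile(r"(January|February|July)\s*[0-9]{1,2}"),
    ),
]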
Preferably, the specific steps of step 2 are as follows: the input question q and context c are preprocessed (word segmentation, etc.) and spliced into an input sequence input = [<CLS>, q_1, …, q_i, …, q_n, <SEP>, c_1, c_2, …, c_j, …, c_m, <SEP>, <PAD>, …, <PAD>], where n is the length of the question, m is the length of the context, |input| = max_seq_length, |input| denotes the length of the input sequence, and max_seq_length is a hyper-parameter of the model and a positive integer. <CLS> marks the beginning of the sequence, q_i is the ith word of the question q (1 ≤ i ≤ n), <SEP> is a separator mark, c_j is the jth word of the context c (1 ≤ j ≤ m), and <PAD> is a padding mark used to pad the sequence so that |input| = max_seq_length.
The input sequence is fed into a BERT model, and the encoded vector sequence H ∈ R^{|input| × hidden_size} is obtained at the output layer of the BERT model, where hidden_size is a hyper-parameter of the model.
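A minimal sketch of this step, assuming the HuggingFace transformers library and the bert-base-uncased checkpoint (the method only requires some BERT encoder); the tokenizer produces exactly the <CLS> q <SEP> c <SEP> <PAD>... sequence described above.

import torch
from transformers import BertModel, BertTokenizerFast

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
bert = BertModel.from_pretrained("bert-base-uncased")

max_seq_length = 384  # hyper-parameter max_seq_length (value assumed)
q = "How long is the Yellow River?"
c = "The Yellow River, a large river in northern China, has a total length of about 5464 kilometers."

# Build [CLS] q [SEP] c [SEP] padded to max_seq_length, i.e. the input sequence "input".
enc = tokenizer(q, c, padding="max_length", truncation=True,
                max_length=max_seq_length, return_tensors="pt")
with torch.no_grad():
    out = bert(**enc)
H = out.last_hidden_state[0]  # H has shape (|input|, hidden_size); row i is h_i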
Preferably, the specific steps of step 3 are as follows: h_i is passed through a fully-connected layer to obtain the logit values that each position of input is the answer start position and the answer end position. Here h_i ∈ R^{hidden_size} is the ith row vector of H; after processing by the BERT model, h_i contains the semantic information of the ith word of the input sequence input. A logit value can be regarded as an unnormalized probability: the larger the value, the higher the probability. The formulas are:
logit_{s,i} = W_s h_i + b_s
logit_{e,i} = W_e h_i + b_e
where W_s, W_e ∈ R^{1 × hidden_size} are weight parameters and b_s, b_e ∈ R are bias terms. logit_{s,i}, the ith element of logit_s ∈ R^{|input|}, is a real number giving the logit value that the ith position of input is the answer start position; logit_{e,i}, the ith element of logit_e ∈ R^{|input|}, is likewise a real number giving the logit value that the ith position of input is the answer end position.
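In PyTorch terms, this fully-connected layer could be sketched as follows; the class name and hidden size are assumptions, and each nn.Linear holds the corresponding weight and bias (W_s, b_s and W_e, b_e).

import torch
import torch.nn as nn

class SpanHead(nn.Module):
    """Produces start/end logits for every position of the input sequence (sketch)."""
    def __init__(self, hidden_size: int = 768):
        super().__init__()
        self.start = nn.Linear(hidden_size, 1)  # W_s, b_s
        self.end = nn.Linear(hidden_size, 1)    # W_e, b_e

    def forward(self, H: torch.Tensor):
        # H: (|input|, hidden_size) -> logit_s, logit_e: (|input|,)
        logit_s = self.start(H).squeeze(-1)  # logit_{s,i} = W_s h_i + b_s
        logit_e = self.end(H).squeeze(-1)    # logit_{e,i} = W_e h_i + b_e
        return logit_s, logit_e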
Preferably, the specific steps of step 4 are as follows: the model first assumes that the question q can be answered and computes the answer start position pos_s and end position pos_e in input:
(pos_s, pos_e) = argmax_{i,j} (logit_{s,i} + logit_{e,j})
s.t. i ≤ j
n+3 ≤ i < n+3+m
n+3 ≤ j < n+3+m
where n is the length of the question q and m is the length of the context c. The constraints ensure that the answer start position does not come after the end position and that both positions lie within the context-c part of the input sequence input.
The model then computes the logit value logit_answerable that the question can be answered and the logit value logit_unanswerable that the question cannot be answered:
logit_answerable = logit_{s,pos_s} + logit_{e,pos_e}
logit_unanswerable = logit_{s,1} + logit_{e,1}
that is, the model treats the sum of logit_{s,1} and logit_{e,1} at the first word of the input sequence (the <CLS> mark) as logit_unanswerable, where logit_{s,1} and logit_{e,1} are the first elements of logit_s and logit_e respectively.
After logit_answerable and logit_unanswerable have been computed, the next step is decided by comparing them. If logit_unanswerable > logit_answerable, the model judges that the question cannot be answered: the answer is set to empty and returned, the loss function Loss_0 is computed, and back-propagation is used to update the model parameters involved in steps 2-4 (including the parameters of the BERT model in step 2); the processing of the current sample then ends. Otherwise the model judges that the question can be answered; Loss_0 is not computed, no parameters are updated, no answer is returned yet, and the procedure goes to step 5.
When logit_unanswerable > logit_answerable, the loss function Loss_0 used by the model is a cross entropy, computed as follows:
p_s = softmax(logit_s)
p_e = softmax(logit_e)
Loss_0 = -(log p_{s,y_s} + log p_{e,y_e}) / 2
where y_s and y_e are the start and end positions in input of the correct answer labeled for the training sample; they are positive integers satisfying 1 ≤ y_s ≤ y_e ≤ |input|. If the sample is labeled as unanswerable, y_s = y_e = 1.
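A sketch of step 4 in PyTorch, using 0-based tensor indexing (the description uses 1-based positions, so the <CLS> position 1 becomes index 0 here); the averaged cross-entropy form of Loss_0 follows the reconstruction above, and y_s, y_e are the 0-based labeled positions.

import torch
import torch.nn.functional as F

def best_span(logit_s, logit_e, n, m):
    """Arg-max of logit_{s,i} + logit_{e,j} over i <= j inside the context part of input."""
    lo, hi = n + 2, n + 2 + m          # 0-based positions of the context words
    best, best_score = (lo, lo), float("-inf")
    for i in range(lo, hi):
        for j in range(i, hi):
            score = (logit_s[i] + logit_e[j]).item()
            if score > best_score:
                best_score, best = score, (i, j)
    return best, best_score

def answerability(logit_s, logit_e, n, m, y_s=None, y_e=None):
    (pos_s, pos_e), logit_answerable = best_span(logit_s, logit_e, n, m)
    logit_unanswerable = (logit_s[0] + logit_e[0]).item()   # the <CLS> position
    if logit_unanswerable > logit_answerable:
        loss0 = None
        if y_s is not None:                                  # training: compute Loss_0
            loss0 = 0.5 * (F.cross_entropy(logit_s.unsqueeze(0), torch.tensor([y_s]))
                           + F.cross_entropy(logit_e.unsqueeze(0), torch.tensor([y_e])))
        return "", loss0                                     # question judged unanswerable
    return (pos_s, pos_e), None                              # go on to step 5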
Preferably, the specific steps of step 5 are as follows: for the kth piece of symbolized knowledge, its regular expressions RE_P and RE_Q are used to match the question q and the context c respectively, generating the features m_k and n_k.
Here m_k records whether RE_P of the kth piece of symbolized knowledge matches the question q: if it matches, m_k = 1; otherwise m_k = 0.
n_k ∈ {0, 1}^{|input|} is obtained from all the segments matched by RE_Q in the context part of input: if the ith position of input is "covered" by a matched segment, the ith element of n_k is n_{k,i} = 1, otherwise n_{k,i} = 0. Let p and p' be the start and end positions of a matched text segment in input; if p ≤ i ≤ p', the ith position of input is said to be "covered" by that text segment.
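A sketch of this feature generation, reusing the SymbolizedKnowledge entries sketched earlier and assuming a simple whitespace word segmentation; a real implementation would map regular-expression matches onto BERT token positions via the tokenizer's offset mapping.

from typing import List, Tuple

def knowledge_features(know, q: str, c_words: List[str], c_offset: int, input_len: int) -> Tuple[int, List[int]]:
    """Compute m_k and n_k for one piece of symbolized knowledge (sketch).
    c_words occupy positions c_offset .. c_offset + len(c_words) - 1 of the input sequence."""
    m_k = 1 if know.re_P.search(q) else 0

    n_k = [0] * input_len
    c_text = " ".join(c_words)
    spans, pos = [], 0
    for w in c_words:                       # character span of each context word
        start = c_text.index(w, pos)
        spans.append((start, start + len(w)))
        pos = start + len(w)
    for match in know.re_Q.finditer(c_text):
        for idx, (ws, we) in enumerate(spans):
            if ws < match.end() and we > match.start():   # word overlaps the matched segment
                n_k[c_offset + idx] = 1                   # position is "covered"
    return m_k, n_k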
Preferably, the specific steps of step 6 are as follows: logit'_i is computed from logit'_{att,i} and logit'_{re,i}, where logit'_i ∈ R is the logit value with which the symbolized knowledge considers the ith position of the input sequence to be "covered" by the answer text segment; logit'_{att,i} is computed by the attention mechanism (step 6.1) and logit'_{re,i} is computed from m_k and n_k (step 6.2).
Step 6.1: logit'_{att,i} is obtained by the following process. First compute p_{P,k}:
att_{P,k,i} = softmax_i(v_{P,k} W_a h_i)
s_{P,k} = Σ_i att_{P,k,i} h_i
logit_{P,k} = w_P s_{P,k} + b_P
p_{P,k} = sigmoid(logit_{P,k})
where W_a is a parameter matrix, v_{P,k} is the trainable embedding vector corresponding to the condition P of the kth piece of symbolized knowledge, att_{P,k,i} is the attention score of the condition P of the kth piece of symbolized knowledge on the ith word of the input sequence, s_{P,k} stores the information in the input sequence input related to the kth piece of symbolized knowledge, gathered according to the attention scores, w_P and b_P are a trainable weight vector and a bias term respectively, logit_{P,k} is the logit value that the condition P of the kth piece of symbolized knowledge holds for the question q, and p_{P,k} is the probability that the condition P of the kth piece of symbolized knowledge holds for the question q.
The idea behind this computation is: the condition P of the kth piece of symbolized knowledge is mapped to an embedding vector v_{P,k}; an attention operation between this vector and the input sequence yields the attention scores att_{P,k,i}; the attention scores are then used as weights to sum the h_i, giving s_{P,k}. This can be viewed as gathering, into s_{P,k}, the information about the words of the input sequence input that are related to the kth piece of symbolized knowledge. A linear transformation of s_{P,k} then gives logit_{P,k}, from which p_{P,k} is computed; p_{P,k} indicates the probability that the kth piece of symbolized knowledge applies to the current question q.
After the probability that the condition P holds has been computed, the logit values logit_{Q,k,i} of the conclusion Q and logit'_{att,i} are computed as follows:
logit_{Q,k,i} = v_{Q,k} W_Q h_i
logit'_{att,i} = Σ_k p_{P,k} logit_{Q,k,i}
where W_Q is a parameter matrix, v_{Q,k} is the trainable embedding vector corresponding to the conclusion Q of the kth piece of symbolized knowledge, and logit_{Q,k,i} is the logit value that the ith word of the input sequence is selected by the conclusion Q. The previously computed p_{P,k} is then used as a weight to sum the logit_{Q,k,i}, giving logit'_{att,i} ∈ R, which represents the logit value that the ith position of the input sequence is "covered" by the answer after all the symbolized knowledge has been used in combination with the attention mechanism.
The idea behind this computation is: the conclusion Q of the kth piece of symbolized knowledge is mapped to an embedding vector v_{Q,k}; an operation similar to the attention mechanism between this vector and the input sequence yields logit_{Q,k,i}, the logit value with which the conclusion Q of the kth piece of symbolized knowledge considers the ith position of the input sequence "covered" by the answer; the logit_{Q,k,i} of all pieces of symbolized knowledge are then weighted by p_{P,k} and summed to give logit'_{att,i}.
Step 6.2: logit'_{re,i} is obtained by the following process:
t_i = Σ_k m_k n_{k,i}
logit'_re = tanh(W_t t + b_t)
where t_i is the ith element of t ∈ R^{|input|}, W_t is a parameter matrix, b_t is a bias vector, and logit'_{re,i} is the ith element of logit'_re ∈ R^{|input|}.
Finally:
logit'_i = sigmoid(logit'_{att,i} + logit'_{re,i})
where the outermost sigmoid limits logit'_i to the interval [0, 1], preventing logit'_i from becoming too large and exerting an excessive influence on the final answer prediction.
Step 6.1 is indispensable within step 6: if logit'_i were computed from step 6.2 alone, it would depend entirely on the features m_k and n_k generated by the regular expressions, and regular expressions generalize poorly, so step 6.1 is needed as a complement to improve generalization. Step 6.1 represents the condition P and the conclusion Q of each piece of symbolized knowledge by trainable vectors and computes logit'_{att,i} with an attention mechanism; in this way the information provided by the symbolized knowledge can still be used while the poor generalization caused by the regular expressions is avoided.
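The whole of step 6 could be sketched as one PyTorch module as follows. The bilinear attention scoring and the per-position form of the transformation W_t are simplifying assumptions standing in for the attention computation of step 6.1 and the linear layer of step 6.2; m and n are expected as float tensors of shapes (z,) and (z, |input|).

import torch
import torch.nn as nn

class KnowledgeLogits(nn.Module):
    """Combines attention-based (6.1) and regex-based (6.2) knowledge signals into logit'_i (sketch)."""
    def __init__(self, num_knowledge: int, hidden_size: int, emb_size: int = 64):
        super().__init__()
        self.v_P = nn.Embedding(num_knowledge, emb_size)          # v_{P,k}
        self.v_Q = nn.Embedding(num_knowledge, emb_size)          # v_{Q,k}
        self.W_a = nn.Linear(hidden_size, emb_size, bias=False)   # parameter matrix for attention scores
        self.W_Q = nn.Linear(hidden_size, emb_size, bias=False)   # parameter matrix for logit_{Q,k,i}
        self.w_P = nn.Linear(hidden_size, 1)                      # w_P, b_P
        self.W_t = nn.Linear(1, 1)                                # stands in for W_t, b_t

    def forward(self, H, m, n):
        # H: (|input|, hidden_size), m: (z,), n: (z, |input|)
        ks = torch.arange(m.shape[0])
        vP, vQ = self.v_P(ks), self.v_Q(ks)                       # (z, emb_size)
        att = torch.softmax(vP @ self.W_a(H).T, dim=-1)           # att_{P,k,i}
        s_P = att @ H                                             # s_{P,k}
        p_P = torch.sigmoid(self.w_P(s_P)).squeeze(-1)            # p_{P,k}
        logit_Q = vQ @ self.W_Q(H).T                              # logit_{Q,k,i}
        logit_att = (p_P.unsqueeze(-1) * logit_Q).sum(dim=0)      # logit'_{att,i}
        t = (m.unsqueeze(-1) * n).sum(dim=0)                      # t_i = sum_k m_k n_{k,i}
        logit_re = torch.tanh(self.W_t(t.unsqueeze(-1))).squeeze(-1)  # logit'_{re,i}
        return torch.sigmoid(logit_att + logit_re)                # logit'_i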
Preferably, the specific steps of step 7 are as follows: first compute the logit values logit'_{s,i} and logit'_{e,i} that the ith position of the input sequence is the answer start position and end position respectively:
logit'_{s,i} = logit_{s,i} + α logit'_i
logit'_{e,i} = logit_{e,i} + α logit'_i
where logit_{s,i} and logit_{e,i} were computed in step 3, and α, the weight of logit'_i, is a hyper-parameter. The answer start position pos*_s and end position pos*_e in the input sequence are then predicted:
(pos*_s, pos*_e) = argmax_{i,j} (logit'_{s,i} + logit'_{e,j})
s.t. i ≤ j
n+3 ≤ i < n+3+m
n+3 ≤ j < n+3+m
After the start and end positions have been obtained, the words from the pos*_s-th to the pos*_e-th position of the input sequence are extracted and output as the answer segment.
During model training the loss function Loss_3 also has to be computed; it consists of two parts, Loss_1 and Loss_2. Loss_1 is the cross entropy between the prediction and the answer labeled for the sample, computed as follows:
p'_s = softmax(logit'_s)
p'_e = softmax(logit'_e)
Loss_1 = -(log p'_{s,y_s} + log p'_{e,y_e}) / 2
where y_s and y_e are the start and end positions in input of the correct answer labeled for the training sample; they are positive integers satisfying 1 ≤ y_s ≤ y_e ≤ |input|. If the sample is labeled as unanswerable, y_s = y_e = 1.
Loss_2 is a loss function constructed for logit'_i and consists of a binary cross entropy. First construct label ∈ {0, 1}^{|input|}: label_i = 1 when the ith position of the input sequence is "covered" by the answer labeled for the sample, otherwise label_i = 0. Every position of the input sequence can thus be divided into two classes according to whether it is covered by the labeled answer, and Loss_2 is computed with the binary cross entropy of these two classes:
Loss_2 = -(1/|input|) Σ_i [label_i log(logit'_i) + (1 - label_i) log(1 - logit'_i)]
Finally the sum of the two is used as the final loss function Loss_3, and back-propagation according to Loss_3 updates the model parameters involved in steps 2-7 (including the parameters of the BERT model in step 2):
Loss_3 = Loss_1 + β Loss_2
where β is a hyper-parameter, the weight of Loss_2.
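A sketch of step 7 in PyTorch, continuing the 0-based indexing used in the earlier sketches; alpha and beta are the hyper-parameters α and β, y_s and y_e are the 0-based labeled answer positions, and label is the 0/1 coverage vector.

import torch
import torch.nn.functional as F

def predict_and_loss3(logit_s, logit_e, logit_prime, n, m,
                      y_s=None, y_e=None, label=None, alpha=1.0, beta=1.0):
    logit_s2 = logit_s + alpha * logit_prime          # logit'_{s,i}
    logit_e2 = logit_e + alpha * logit_prime          # logit'_{e,i}

    # arg-max over spans (i, j), i <= j, restricted to the context part of the input
    lo, hi = n + 2, n + 2 + m
    scores = logit_s2[lo:hi].unsqueeze(1) + logit_e2[lo:hi].unsqueeze(0)
    scores = scores.masked_fill(torch.ones_like(scores).tril(-1).bool(), float("-inf"))
    flat = scores.argmax()
    pos_s = lo + (flat // (hi - lo)).item()
    pos_e = lo + (flat % (hi - lo)).item()

    loss3 = None
    if y_s is not None:                               # training: Loss_3 = Loss_1 + beta * Loss_2
        loss1 = 0.5 * (F.cross_entropy(logit_s2.unsqueeze(0), torch.tensor([y_s]))
                       + F.cross_entropy(logit_e2.unsqueeze(0), torch.tensor([y_e])))
        loss2 = F.binary_cross_entropy(logit_prime, label.float())
        loss3 = loss1 + beta * loss2
    return (pos_s, pos_e), loss3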
In another aspect, the present invention further provides an electronic device, including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method for implementing a question-answering model based on symbolized knowledge and a neural network as described above.
In another aspect, the present invention also provides a non-transitory computer-readable storage medium storing computer instructions for causing a computer to execute the method for implementing a question-answering model based on symbolized knowledge and a neural network as described above.
In another aspect, the present invention also provides a computer program product comprising a computer program stored on a non-transitory computer-readable storage medium, the computer program comprising program instructions which, when executed by a computer, cause the computer to perform the aforementioned method for implementing a question-answering model based on symbolized knowledge and a neural network.
Advantageous effects:
compared with the prior art, the invention has the following characteristics:
A combination scheme of symbolized knowledge and a neural network is designed, so that when predicting answers the model can rely both on the neural network and on the manually constructed symbolized knowledge; the reasoning process of the model thus becomes more interpretable, while the robustness and accuracy of the question-answering model are improved.
Drawings
FIG. 1 is a flowchart illustrating a general extraction-type question answering system;
FIG. 2 is a schematic diagram of the main workflow within the model of the present invention (assuming that the problem can be solved);
FIG. 3 is a flow chart of a process embodying (training) the present invention;
fig. 4 is a flow chart of a specific use (prediction) process of the present invention.
Detailed Description
The invention is described in detail below with reference to the figures and examples.
FIG. 1 shows the workflow of a general extraction-type question-answering system; FIG. 2 illustrates (under conditions where the problem can be solved) the main workflow within the model of the present invention; FIG. 3 is a flow chart of a specific implementation (training) of a question-answering model based on symbolic knowledge and neural networks; fig. 4 is a specific use (prediction) flow of a question-answering model based on symbolic knowledge and neural network.
Example 1:
There is a training sample containing the question:
q="How long is the Yellow River?"
the context:
c="The Yellow River, a large river in northern China, has a total length of about 5464 kilometers and a watershed area of about 752443 square kilometers."
and the correct answer labeled for the sample: "5464 kilometers".
The above example illustrates a method for implementing (training) a question-answering model based on symbolic knowledge and a neural network, which includes the following steps:
1: a symbolized knowledge base is constructed; assume that it contains two pieces of knowledge:
Knowledge 1: if the question asks about a length, the answer should be in the form of a number + a length unit, symbolized as P → Q with regular expressions RE_P and RE_Q;
Knowledge 2: if the question asks about a time, the answer should be in the form of a month + a date, symbolized as P → Q with regular expressions RE_P and RE_Q.
For knowledge 1, the regular expression RE_P checks whether the input question text starts with "How long", and RE_Q extracts text segments of the context whose form is "number + length unit", such as "1 kilometers", "3 kilometers", "203 km"; for knowledge 2, RE_P checks whether the input question text starts with "When", and RE_Q extracts text segments of the context whose form is "month + date", such as "January 13", "Feb.29", "July 2".
2: construct the input sequence input of the question-context pair and feed it into the BERT model to obtain the encoded vector sequence H.
The input question q and the context c are preprocessed (word segmentation, etc.) and spliced into the input sequence [<CLS>, how, long, …, ?, <SEP>, the, yellow, …, kilometers, <SEP>, <PAD>, …, <PAD>]. The input sequence is fed into the BERT model (Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pp. 4171-4186, Minneapolis, Minnesota, June 2019. Association for Computational Linguistics. doi: 10.18653/v1/N19-1423), and the encoded vector sequence H is obtained.
3: pass h_i through the fully-connected layer and, computing with the formulas in step 3 of the Disclosure, obtain for each position of input the logit values logit_{s,i} and logit_{e,i} of being the answer start position and the answer end position.
4: determine whether the question q can be answered from the context c by computing logit_unanswerable and logit_answerable. If it cannot be answered, set the answer to empty, return it, compute the loss function Loss_0, and update the model parameters involved in steps 2-4 (including the parameters of the BERT model) by back-propagation; the processing of the current sample then ends. Otherwise the model judges that the question can be answered; Loss_0 is not computed, no parameters are updated, no answer is returned yet, and the procedure goes to 5.
5: generate the feature information m_k and n_k of the symbolized knowledge.
For symbolized knowledge 1, RE_P matches the question q successfully, so m_1 = 1; the segment of the context c matched by RE_Q is "5464 kilometers", whose start and end positions in input are 23 and 24 respectively, so n_1 = [0, 0, …, 0, 1, 1, 0, …, 0], where the 23rd and 24th elements n_{1,23} and n_{1,24} are 1 and the rest are 0. Similarly, for symbolized knowledge 2, m_2 = 0 and n_2 = [0, 0, …, 0, 0], i.e. all elements of n_2 are 0.
6: using the attention mechanism and the feature information of the symbolized knowledge, compute logit'_i, the logit value with which the symbolized knowledge judges whether each position of the input sequence is covered by the answer, following the method of step 6 of the Disclosure.
7: predict the answer position and output the answer, compute the loss function Loss_3 following the method of step 7 of the Disclosure, and update the model parameters involved in steps 2-7 (including the parameters of the BERT model in step 2) by back-propagation. The processing of the current sample then ends.
8: the model is trained by repeatedly applying steps 2-7 of the Disclosure to new samples, finally yielding a trained question-answering model based on symbolized knowledge and a neural network.
Example 2:
An existing user inputs the question:
q="How long is the Yellow River?"
and the question-answering system retrieves the context related to the question:
c="The Yellow River, a large river in northern China, has a total length of about 5464 kilometers and a watershed area of about 752443 square kilometers."
the above example illustrates a method for using a question-and-answer model based on symbolic knowledge and a neural network, which includes the following steps:
1: construct the input sequence input of the question-context pair and feed it into the BERT model to obtain the encoded vector sequence H.
The input question q and the context c are preprocessed (word segmentation, etc.) and spliced into the input sequence [<CLS>, how, long, …, ?, <SEP>, the, yellow, …, kilometers, <SEP>, <PAD>, …, <PAD>]. After the input sequence has been fed into the BERT model, the encoded vector sequence H is obtained.
2: pass h_i through the fully-connected layer and, computing with the formulas in step 3 of the Disclosure, obtain for each position of input the logit values logit_{s,i} and logit_{e,i} of being the answer start position and the answer end position.
3: determine whether the question q can be answered from the context c by computing logit_unanswerable and logit_answerable. If it cannot be answered, set the answer to empty, return it, and end the processing; otherwise the model judges that the question can be answered, does not return an answer yet, and goes to 4.
4: generate the feature information m_k and n_k of the symbolized knowledge.
As in example 1, for knowledge 1 of the symbolized knowledge base constructed during the training phase, RE_P matches the question q successfully, so m_1 = 1; the segment of the context c matched by RE_Q is "5464 kilometers", whose start and end positions in input are 23 and 24 respectively, so n_1 = [0, 0, …, 0, 1, 1, 0, …, 0], where the 23rd and 24th elements n_{1,23} and n_{1,24} are 1 and the rest are 0. Similarly, for symbolized knowledge 2, m_2 = 0 and n_2 = [0, 0, …, 0, 0], i.e. all elements of n_2 are 0.
5: using the attention mechanism and the feature information of the symbolized knowledge, compute logit'_i, the logit value with which the symbolized knowledge judges whether each position of the input sequence is covered by the answer, following the method of step 6 of the Disclosure.
6: predict the answer position following the method of step 7 of the Disclosure, extract the corresponding text segment from input as the answer according to that position, and output the answer.
Performance comparison experiment between the question-answering model based on symbolized knowledge and a neural network and the reference model
Evaluation metrics:
The question-answering model based on symbolized knowledge and a neural network of the invention is evaluated with two metrics that measure the accuracy of the model's answers:
exact match (hereinafter abbreviated as EM):
the EM calculation is as follows:
Let a test sample consist of a question q, a context c containing the answer to the question, and the correct answer a to the question, and let F be the model under evaluation, whose prediction result is a'. Then:
a′=F(q,c)
EM = 1 if a′ = a; otherwise EM = 0.
F1-score (hereinafter referred to as F1):
F1 is typically used to measure model performance on binary classification problems; here, each word in the model's prediction can be assigned to one of two classes according to whether it is "covered" by the answer. Since the calculation of F1 is more involved, a few concepts are introduced first:
TP (True Positive): the number of positive samples predicted as positive.
FP (False Positive): the number of negative samples predicted as positive.
FN (False Negative): the number of positive samples predicted as negative.
Precision: the proportion of true positive samples among the samples judged positive by the classifier:
precision = TP / (TP + FP)
Recall: the proportion of positive samples that are judged positive by the classifier:
recall = TP / (TP + FN)
Finally F1 is computed as:
F1 = 2 × precision × recall / (precision + recall)
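The two metrics can be computed as in the following sketch; it is a standard SQuAD-style computation, without the answer normalization (lower-casing, punctuation and article stripping) that the official evaluation script applies.

def exact_match(pred: str, gold: str) -> int:
    """EM = 1 if the predicted answer equals the labeled answer, otherwise 0."""
    return int(pred.strip() == gold.strip())

def f1_score(pred: str, gold: str) -> float:
    """Token-level F1 between the predicted and the labeled answer."""
    pred_tokens, gold_tokens = pred.split(), gold.split()
    if not pred_tokens or not gold_tokens:
        return float(pred_tokens == gold_tokens)
    tp, remaining = 0, list(gold_tokens)
    for tok in pred_tokens:               # count overlapping tokens (true positives)
        if tok in remaining:
            remaining.remove(tok)
            tp += 1
    if tp == 0:
        return 0.0
    precision = tp / len(pred_tokens)     # TP / (TP + FP)
    recall = tp / len(gold_tokens)        # TP / (TP + FN)
    return 2 * precision * recall / (precision + recall)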
data set:
The invention was trained and evaluated using the training set and validation set of the Stanford question-answering dataset SQuAD v2.0 respectively. SQuAD v2.0 adds over fifty thousand unanswerable questions to SQuAD v1.1, giving roughly 150,000 question-answer samples in total. Although it only adds unanswerable questions to SQuAD v1.1, this greatly increases the difficulty of the dataset: a model whose F1 reaches 86% on SQuAD v1.1 reaches only 66% on SQuAD v2.0.
Reference model:
In this experiment, the native BERT model (BERT-Base, Uncased) published by Google on GitHub was used as the reference model; the source code adapting the BERT model to the SQuAD v2.0 task has also been published on GitHub. Its answer-prediction process is highly similar to steps 2-4 above, except that when the comparison of logit_answerable and logit_unanswerable in step 4 finds the question answerable (i.e. logit_unanswerable ≤ logit_answerable), it directly returns the answer span from position pos_s to position pos_e, and it updates the neural network parameters through the loss function Loss_0.
Hyper-parameters:
Besides common hyper-parameters such as batch_size and the learning rate, the experiment also involves the hyper-parameters carried by the models themselves, such as the maximum input-sequence length max_seq_length, α, and β. Preferably, the hyper-parameter values are set as follows.
TABLE 1 values of the hyper-parameters and description thereof
The experimental results are as follows:
In the experiment, the learning rate learning_rate, the number of training epochs, and the batch size batch_size were varied, and the reference model and the model of the invention were each tested several times.
TABLE 2 Performance comparison test
Performance comparison analysis:
In the above experiments, the reference model achieved its best performance in trial 10 and the model of the invention achieved its best performance in trial 12. Compared with the reference model, the model of the invention improves F1-score by 1.06% absolute and EM by 0.96% absolute.
Examining the experimental data, the performance of the invention exceeds the best performance of the reference model in trials 6, 9 and 12. The reason is that the invention builds on the BERT model: the BERT model is pre-trained, while the network parameters newly added on top of it are not, so the model of the invention needs a higher learning rate and a longer training time than the reference model. With an appropriate learning rate and sufficient training time, the method achieves higher performance than the reference model.
In conclusion, the performance of the question-answering model based on symbolized knowledge and a neural network provided by the invention is superior to that of the comparison model; integrating symbolized knowledge into the neural network effectively improves the performance of the question-answering model, which demonstrates the effectiveness of the approach.
In another aspect, the present invention further provides an electronic device, including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method for implementing a question-answering model based on symbolized knowledge and a neural network as described above.
In another aspect, the present invention also provides a non-transitory computer-readable storage medium storing computer instructions for causing a computer to execute the method for implementing a question-answering model based on symbolized knowledge and a neural network as described above.
In another aspect, the present invention also provides a computer program product comprising a computer program stored on a non-transitory computer-readable storage medium, the computer program comprising program instructions which, when executed by a computer, cause the computer to perform the aforementioned method for implementing a question-answering model based on symbolized knowledge and a neural network.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present disclosure may be implemented by software or hardware. Where the name of an element does not constitute a limitation on the element itself.
It should be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof.
The above description is only for the specific embodiment of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present invention are included in the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (10)

1. A method for implementing a question-answering model based on symbolized knowledge and a neural network, characterized in that the method comprises the following steps:
1: constructing a symbolized knowledge base;
2: constructing an input sequence input of a question q and a context c and feeding it into a BERT model to obtain an encoded vector sequence H ∈ R^{|input| × hidden_size}, wherein hidden_size is a hyper-parameter of the model;
3: passing h_i through a fully-connected layer to obtain, for each position of the input, the logit values of that position being the answer start position and the answer end position, wherein h_i ∈ R^{hidden_size} is the ith row vector of H, 1 ≤ i ≤ |input|, logit_{s,i} denotes the logit value that the ith position of the input is the answer start position, and logit_{e,i} denotes the logit value that the ith position of the input is the answer end position;
4: judging from the logit values whether q can be answered from c; if it cannot, returning an empty result and, during model training, computing a loss function Loss_0 from the labeled answer of the training sample q-c and back-propagating it to update the neural network parameters involved in steps 2-4, then ending the process; otherwise continuing with step 5;
5: matching q and c with the kth piece of knowledge in the symbolized knowledge base to generate the feature information m_k and n_k of the symbolized knowledge, wherein k is a natural number, 1 ≤ k ≤ z, and z is the number of pieces of knowledge in the symbolized knowledge base;
6: using an attention mechanism and the feature information of the symbolized knowledge, computing logit'_i, the logit value with which the symbolized knowledge judges whether a position of the input sequence is covered by the answer;
7: predicting and outputting the answer according to logit and logit'_i, and, during model training, computing a loss function Loss_3 from the labeled answer of the training sample q-c and the predicted answer and back-propagating it to update the neural network parameters involved in steps 2-7.
2. The method of claim 1, wherein each piece of knowledge in the symbolized knowledge base consists of a natural-language form, a symbolized form and two corresponding regular expressions RE_P and RE_Q, wherein RE_P is the regular expression of the condition and RE_Q is the regular expression of the conclusion.
3. The method of claim 1, wherein the specific steps of step 2 are: the input question q and context c are word-segmented and spliced into an input sequence input = [<CLS>, q_1, …, q_i, …, q_n, <SEP>, c_1, c_2, …, c_j, …, c_m, <SEP>, <PAD>, …, <PAD>], where n is the length of the question q, m is the length of the context c, |input| = max_seq_length, |input| denotes the length of the input sequence, max_seq_length is a hyper-parameter of the model and a positive integer, <CLS> marks the beginning of the sequence, q_i is the ith word of the question q, 1 ≤ i ≤ n, <SEP> is a separator mark, c_j is the jth word of the context c, 1 ≤ j ≤ m, and <PAD> is a padding mark used to pad the sequence so that |input| = max_seq_length;
the input sequence is fed into a BERT model, and the encoded vector sequence H ∈ R^{|input| × hidden_size} is obtained at the output layer of the BERT model, where hidden_size is a hyper-parameter of the model.
4. The method of claim 1, wherein the specific steps of step 3 are: the logit values that the ith word of the input, i.e. h_i, is the answer start position and the answer end position are computed as:
logit_{s,i} = W_s h_i + b_s
logit_{e,i} = W_e h_i + b_e
wherein W_s, W_e ∈ R^{1 × hidden_size} are weight parameters and b_s, b_e ∈ R are bias terms; logit_{s,i}, the ith element of logit_s ∈ R^{|input|}, is a real number denoting the logit value that the ith position of the input is the answer start position; logit_{e,i}, the ith element of logit_e ∈ R^{|input|}, is likewise a real number denoting the logit value that the ith position of the input is the answer end position.
5. The method of claim 1, wherein the specific steps of step 4 are: the model first assumes that the question q can be answered and computes the answer start position pos_s and end position pos_e in input as:
(pos_s, pos_e) = argmax_{i,j} (logit_{s,i} + logit_{e,j})
s.t. i ≤ j
n+3 ≤ i < n+3+m
n+3 ≤ j < n+3+m
where n is the length of the question q and m is the length of the context c;
then the logit value logit_answerable that the question can be answered and the logit value logit_unanswerable that the question cannot be answered are computed as:
logit_answerable = logit_{s,pos_s} + logit_{e,pos_e}
logit_unanswerable = logit_{s,1} + logit_{e,1}
wherein logit_{s,1} and logit_{e,1} are the first elements of logit_s and logit_e respectively;
if logit_unanswerable > logit_answerable, the model judges that the question cannot be answered, the answer is set to empty and returned, and during model training the loss function Loss_0 is computed and back-propagated to update the model parameters involved in steps 2-4; otherwise the model judges that the question can be answered and the method goes to step 5;
the Loss_0 is a cross entropy, computed as:
p_s = softmax(logit_s)
p_e = softmax(logit_e)
Loss_0 = -(log p_{s,y_s} + log p_{e,y_e}) / 2
wherein y_s and y_e are the start and end positions in the input of the correct answer labeled for the training sample; they are positive integers satisfying 1 ≤ y_s ≤ y_e ≤ |input|; if the sample is labeled as unanswerable, y_s = y_e = 1.
6. The method of claim 1, wherein the specific steps of step 5 are: for the kth piece of symbolized knowledge, its regular expressions RE_P and RE_Q are used to match the question q and the context c respectively and to generate the features m_k and n_k; RE_P is the regular expression of the condition and RE_Q is the regular expression of the conclusion; 1 ≤ k ≤ z, and z is the number of pieces of knowledge in the symbolized knowledge base;
wherein m_k records whether RE_P of the kth piece of symbolized knowledge matches the question q: if it matches, m_k = 1, otherwise m_k = 0;
n_k ∈ {0, 1}^{|input|} is obtained from all the segments matched by RE_Q in the context part of the input: if the ith position of the input is "covered" by a matched segment, the ith element of n_k is n_{k,i} = 1, otherwise n_{k,i} = 0; let p and p' be the start and end positions of a matched text segment in the input; if p ≤ i ≤ p', the ith position of the input is said to be "covered" by the text segment.
7. The method of claim 1, wherein: the concrete steps of the step 6 are as follows: through logit'att,i、logit′re,iTo calculate logit'i(ii) a Wherein
Figure FDA0002636515350000042
Representing the logarithm of the ith position of the input considered to be covered by the answer text segment by symbolized knowledge;
Figure FDA0002636515350000043
as calculated from the attention mechanism, the attention-oriented mechanism,
Figure FDA0002636515350000044
from said mk、nkCalculating to obtain;
step 6.1: logit'att,iObtained by the following process:
first calculate pP,k
Figure FDA0002636515350000045
sP,k=∑iattP,k,ihi
logitP,k=wPsP,k+bP
Figure FDA0002636515350000046
Wherein the content of the first and second substances,
Figure FDA0002636515350000047
is a matrix of parameters that is,
Figure FDA0002636515350000048
is the representation of the symbolized form of the k-th knowledge in the symbolized knowledge base, namely the trainable embedded vector corresponding to the condition P in the k-th symbolized knowledge,
Figure FDA0002636515350000049
represents the attention score of the condition P on the ith word in the input,
Figure FDA00026365153500000410
the information which is collected in the input and is related to the kth symbolized knowledge after the attention score is combined is stored in the input;
Figure FDA00026365153500000411
trainable weight vectors and bias terms, respectively;
Figure FDA00026365153500000412
a logarithm of the problem q representing the condition P in the kth piece of symbolized knowledge;
Figure FDA00026365153500000413
representing the probability that the condition P is satisfied in the k-th symbolized knowledge on the question q;
second, the logit of conclusion Q is calculatedQ,k,iAnd logit'att,iThe calculation method is as follows:
Figure FDA00026365153500000414
logit′att,i=∑kpP,klogitQ,k,i
wherein the content of the first and second substances,
Figure FDA0002636515350000051
is a matrix of parameters that is,
Figure FDA0002636515350000052
is the trainable embedded vector corresponding to the conclusion Q in the kth symbolized knowledge,
Figure FDA0002636515350000053
the ith word representing the input sequence is selected as the logarithm of the conclusion Q; then using the previously calculated pP,kAs weight pairs logitQ,k,iAre weighted and summed to obtain
Figure FDA0002636515350000054
It represents: after all symbolized knowledge is used and an attention mechanism is combined, the logarithms of the ith positions of the input are covered by the answers;
Step 6.2: logit'_{re,i} is obtained by the following process:
t_i = Σ_k m_k · n_{k,i}
logit'_re = tanh(W_t t + b_t)
where logit'_{re,i} is the i-th element of logit'_re, W_t is a parameter matrix, b_t is a bias vector, and t_i is the i-th element of the vector t;
Finally:
logit'_i = sigmoid(logit'_{att,i} + logit'_{re,i})
where the outermost sigmoid limits logit'_i to the interval [0, 1], preventing logit'_i from becoming too large and exerting an excessive influence on the final answer prediction.
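Step 6.2 and the final combination can be sketched in the same PyTorch style. Because the claim does not fix the shape of W_t, the sketch below treats it as a per-position scalar weight w_t; this, together with the tensor names and shapes, is an illustrative assumption.

```python
import torch

def knowledge_prior(m, n, w_t, b_t, logit_att):
    """Sketch of step 6.2 and the final combination of claim 7.

    m        : (z,)   m_k flags (1 if RE_P of knowledge k matched the question)
    n        : (z, L) coverage features n_{k,i} from the conclusion regex RE_Q
    w_t, b_t : scalars standing in for the claim's W_t and b_t (assumed shape)
    logit_att: (L,)   logit'_{att,i} from step 6.1
    """
    # t_i = sum_k m_k * n_{k,i}
    t = (m.unsqueeze(-1) * n).sum(dim=0)        # (L,)

    # logit'_re = tanh(W_t t + b_t), applied position-wise here
    logit_re = torch.tanh(w_t * t + b_t)        # (L,)

    # logit'_i = sigmoid(logit'_att,i + logit'_re,i), squashed into [0, 1]
    return torch.sigmoid(logit_att + logit_re)
```

The returned vector plays the role of logit'_i in claim 8, where it is added to the reader's start and end logits.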
8. The method according to any one of claims 1 to 7, wherein the specific steps of step 7 are as follows: first, calculate the logits logit'_{s,i} and logit'_{e,i} that the i-th position of the input sequence is the answer start position and end position, respectively, as follows:
logit'_{s,i} = logit_{s,i} + α · logit'_i
logit'_{e,i} = logit_{e,i} + α · logit'_i
where α is the weight of logit'_i and is a hyperparameter;
Then predict the start position î_s and end position î_e of the answer in the input sequence:
(î_s, î_e) = argmax_{i,j} ( logit'_{s,i} + logit'_{e,j} )
s.t. i ≤ j
n+3 ≤ i < n+3+m
n+3 ≤ j < n+3+m
After obtaining the start and end positions, the words from position î_s to position î_e of the input sequence are extracted and output as the answer segment;
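A minimal sketch of this constrained span search follows, in the same PyTorch style as the earlier sketches; the brute-force enumeration over context positions and the argument names are illustrative choices rather than part of the claim.

```python
import torch

def predict_span(logit_s: torch.Tensor,
                 logit_e: torch.Tensor,
                 logit_prior: torch.Tensor,
                 n: int, m: int, alpha: float = 1.0):
    """Sketch of the span prediction of claim 8 (step 7).

    logit_s, logit_e : (L,) start / end logits of the base reader
    logit_prior      : (L,) logit'_i from the symbolized-knowledge branch
    n, m             : question length and context length; context tokens
                       are assumed to occupy positions n+3 .. n+3+m-1
    alpha            : hyperparameter weight of logit'_i
    """
    # logit'_s,i = logit_s,i + alpha * logit'_i (and likewise for the end)
    logit_s = logit_s + alpha * logit_prior
    logit_e = logit_e + alpha * logit_prior

    lo, hi = n + 3, n + 3 + m
    best_score, best_span = float("-inf"), (lo, lo)
    # Enumerate (i, j) with i <= j, both restricted to context positions.
    for i in range(lo, hi):
        for j in range(i, hi):
            score = (logit_s[i] + logit_e[j]).item()
            if score > best_score:
                best_score, best_span = score, (i, j)
    return best_span
```

Given best_span = (i_s, i_e), the answer text is simply input_tokens[i_s : i_e + 1]; a cap on the span length j - i is often added in practice but is not part of the claim.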
During model training, the loss function Loss_3 also needs to be calculated; it consists of two parts, Loss_1 and Loss_2; Loss_1 is the cross entropy between the prediction and the answer labeled in the sample, computed as follows:
p'_s = softmax(logit'_s)
p'_e = softmax(logit'_e)
Loss_1 = -( log p'_{s, y_s} + log p'_{e, y_e} ) / 2
where y_s and y_e are the start and end positions in the input of the correct answer labeled for the training sample, positive integers satisfying n+3 ≤ y_s ≤ y_e < n+3+m; if the sample is labeled as having no answer, y_s and y_e are set to the position of the special token reserved for unanswerable samples;
Loss_2 is a loss function constructed for logit'_i and consists of a binary cross entropy: first construct label_i, where label_i = 1 when the i-th position of the input sequence is "covered" by the answer labeled in the sample, and label_i = 0 otherwise; each position of the input sequence is thus divided into two classes according to whether it is covered by the labeled answer, and Loss_2 is computed as the binary cross entropy over these two classes:
Loss_2 = -Σ_i [ label_i · log(logit'_i) + (1 - label_i) · log(1 - logit'_i) ]
Finally, the weighted sum of the two is used as the final loss function Loss_3, and back propagation is performed according to Loss_3 to update the model parameters involved in steps 2-7:
Loss_3 = Loss_1 + β · Loss_2
where β is a hyperparameter, the weight of Loss_2.
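The training objective of claim 8 can be sketched in the same PyTorch style. The division by 2 in Loss_1, the mean reduction of the binary cross entropy, and the argument names are assumptions, since the claim fixes only the ingredients of the two loss terms.

```python
import torch
import torch.nn.functional as F

def training_loss(logit_s, logit_e, logit_prior, y_s, y_e, label, beta=1.0):
    """Sketch of the training loss of claim 8.

    logit_s, logit_e : (L,) knowledge-adjusted start / end logits logit'_s, logit'_e
    logit_prior      : (L,) logit'_i, already squashed into [0, 1] by the sigmoid
    y_s, y_e         : gold start / end positions of the labeled answer
    label            : (L,) label_i, 1 where the position is covered by the gold answer
    beta             : hyperparameter weight of Loss_2
    """
    # Loss_1: cross entropy between predicted and labeled start / end positions
    # (averaged over the two ends; the exact normalization is an assumption).
    log_p_s = torch.log_softmax(logit_s, dim=-1)
    log_p_e = torch.log_softmax(logit_e, dim=-1)
    loss_1 = -(log_p_s[y_s] + log_p_e[y_e]) / 2

    # Loss_2: binary cross entropy between logit'_i and label_i over all positions.
    loss_2 = F.binary_cross_entropy(logit_prior, label.float())

    # Loss_3 = Loss_1 + beta * Loss_2
    return loss_1 + beta * loss_2
```

Calling backward() on the returned value propagates gradients through both the reader logits and the symbolized-knowledge branch, which is how the parameters of steps 2-7 are updated jointly.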
9. An electronic device, characterized in that the electronic device comprises:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method for implementing a question-answer model based on symbolized knowledge and a neural network according to any one of claims 1-8.
10. A computer program product comprising a computer program stored on a non-transitory computer readable storage medium, the computer program comprising program instructions which, when executed by a computer, cause the computer to perform the method for implementing a question-answer model based on symbolized knowledge and a neural network according to any one of claims 1 to 8.
CN202010826838.1A 2020-08-17 2020-08-17 Method for implementing question-answer model based on symbolized knowledge and neural network Active CN112035629B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010826838.1A CN112035629B (en) 2020-08-17 2020-08-17 Method for implementing question-answer model based on symbolized knowledge and neural network

Publications (2)

Publication Number Publication Date
CN112035629A true CN112035629A (en) 2020-12-04
CN112035629B CN112035629B (en) 2023-02-17

Family

ID=73577377

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010826838.1A Active CN112035629B (en) 2020-08-17 2020-08-17 Method for implementing question-answer model based on symbolized knowledge and neural network

Country Status (1)

Country Link
CN (1) CN112035629B (en)



Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020188586A1 (en) * 2001-03-01 2002-12-12 Veale Richard A. Multi-layered semiotic mechanism for answering natural language questions using document retrieval combined with information extraction
CN110083692A (en) * 2019-04-22 2019-08-02 齐鲁工业大学 A kind of the text interaction matching process and device of finance knowledge question
CN110647619A (en) * 2019-08-01 2020-01-03 中山大学 Common sense question-answering method based on question generation and convolutional neural network
CN110909864A (en) * 2019-10-22 2020-03-24 北京大学 Natural language task processing method and device combining regular expression and neural network

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113408279A (en) * 2021-06-23 2021-09-17 平安科技(深圳)有限公司 Training method, device and equipment of sequence labeling model and storage medium
CN113408279B (en) * 2021-06-23 2022-05-20 平安科技(深圳)有限公司 Training method, device and equipment of sequence labeling model and storage medium
CN116842155A (en) * 2023-06-30 2023-10-03 北京百度网讯科技有限公司 Text generation method, training method and device of text generation model

Also Published As

Publication number Publication date
CN112035629B (en) 2023-02-17

Similar Documents

Publication Publication Date Title
CN109992782B (en) Legal document named entity identification method and device and computer equipment
CN111611361B (en) Intelligent reading, understanding, question answering system of extraction type machine
CN109885672B (en) Question-answering type intelligent retrieval system and method for online education
US11501182B2 (en) Method and apparatus for generating model
CN109271505B (en) Question-answering system implementation method based on question-answer pairs
US10534863B2 (en) Systems and methods for automatic semantic token tagging
CN109871538A (en) A kind of Chinese electronic health record name entity recognition method
CN111460092B (en) Multi-document-based automatic complex problem solving method
CN112667794A (en) Intelligent question-answer matching method and system based on twin network BERT model
CN112800170A (en) Question matching method and device and question reply method and device
CN114943230B (en) Method for linking entities in Chinese specific field by fusing common sense knowledge
CN110390049B (en) Automatic answer generation method for software development questions
CN112328800A (en) System and method for automatically generating programming specification question answers
CN111966812A (en) Automatic question answering method based on dynamic word vector and storage medium
CN111476038A (en) Long text generation method and device, computer equipment and storage medium
CN115599899B (en) Intelligent question-answering method, system, equipment and medium based on aircraft knowledge graph
CN113196277A (en) System for retrieving natural language documents
CN112035629B (en) Method for implementing question-answer model based on symbolized knowledge and neural network
CN111552773A (en) Method and system for searching key sentence of question or not in reading and understanding task
US20210374276A1 (en) Smart document migration and entity detection
CN113282711A (en) Internet of vehicles text matching method and device, electronic equipment and storage medium
CN111581364B (en) Chinese intelligent question-answer short text similarity calculation method oriented to medical field
CN115905487A (en) Document question and answer method, system, electronic equipment and storage medium
CN117648429B (en) Question-answering method and system based on multi-mode self-adaptive search type enhanced large model
CN114356990A (en) Base named entity recognition system and method based on transfer learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant