CN113641809A - XLNET-BiGRU-CRF-based intelligent question answering method

XLNET-BiGRU-CRF-based intelligent question answering method

Info

Publication number
CN113641809A
CN113641809A (application CN202110913182.1A)
Authority
CN
China
Prior art keywords
xlnet
model
bigru
crf
vector
Prior art date
Legal status
Granted
Application number
CN202110913182.1A
Other languages
Chinese (zh)
Other versions
CN113641809B (en)
Inventor
刘大伟
胡笳
车少帅
张邱鸣
张玮
Current Assignee
Clp Hongxin Information Technology Co ltd
Original Assignee
Clp Hongxin Information Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Clp Hongxin Information Technology Co ltd
Priority to CN202110913182.1A
Publication of CN113641809A
Application granted
Publication of CN113641809B
Status: Active (granted)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G06F16/3329 Natural language query formulation or dialogue systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/166 Editing, e.g. inserting or deleting
    • G06F40/169 Annotation, e.g. comment data or footnotes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 Named entity recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Human Computer Interaction (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses an intelligent question-answering method based on XLNet-BiGRU-CRF, comprising the following steps: training an XLNet Chinese model; obtaining corpus data; constructing and training an XLNet-BiGRU-CRF neural network model; and performing entity recognition on the text of the user question to be recognized. According to the entity-recognition result, several related questions sharing the recognized entity are extracted from the database; the user question is compared with each related question by the cosine similarity of their Embedding sentence vectors; the answer of the related question with the highest similarity score is taken as the target result, while the related questions ranked second and third in similarity are also provided to the user as similar questions for reference. The invention processes the text of the user question with the trained model and, combined with knowledge-graph retrieval, obtains answers to questions more quickly and accurately.

Description

XLNET-BiGRU-CRF-based intelligent question answering method
Technical Field
The invention belongs to the technical field of intelligent question answering, and particularly relates to an intelligent question answering method based on XLNet-BiGRU-CRF.
Background
In recent years, with the development of big data and artificial-intelligence technology, question-answering systems have been applied in many industries; the question-answering system has become a key component of intelligent robots and an important link in the communication between robots and people.
The traditional question-answering system is generally based on keyword retrieval and does not consider the semantic information of the question. A question-answering system based on a knowledge graph can analyze online the text of the specific question asked by the questioner, then retrieve and output the best-matching answer, so an accurate answer to the question can be obtained quickly. Knowledge graphs typically store data in triple format, for example <Advanced Mathematics, publisher, Wuhan University Press>, where "Advanced Mathematics" and "Wuhan University Press" are two entities and "publisher" is the relation between them. The input to such a question-answering system is a text query; the system then finds the triple or set of triples most relevant to the query in the knowledge base and returns the corresponding entity from those triples.
The current mainstream methods are: methods based on relation classification, methods based on retrieval, and methods based on semantic parsing. Taking relation classification as an example, such a method first predicts an entity and a relation from the question, and then finds the answer entity from the two. What these methods have in common is that they require questions and corresponding logical-expression data to train a prediction model; compared with constructing the knowledge graph, the cost of assembling such specially labeled data is higher, and the annotators must possess certain expertise, including domain knowledge and query-language knowledge. Semantic-parsing-based methods, meanwhile, face the gap between logical expressions and natural-language semantics. At the same time, compared with state-of-the-art models such as BERT (Bidirectional Encoder Representations from Transformers) and XLNet (Generalized Autoregressive Pretraining for Language Understanding), common models such as CNN and LSTM train less effectively, are less accurate, and lack correlation analysis of the characters or words in the question text.
Disclosure of Invention
Aiming at the defects of the prior art, the invention provides an intelligent question-answering method based on XLNet-BiGRU-CRF. To achieve this purpose, the invention adopts the following technical scheme:
an intelligent question-answering method based on XLNet-BiGRU-CRF comprises the following steps:
step 1: training an XLNET Chinese model based on large-scale unmarked corpora, wherein the XLNET model comprises an arrangement language model, a double-flow attention mechanism and a Transformer-XL core component;
step 2: acquiring corpus data for constructing a knowledge graph and a named entity recognition model, preprocessing and labeling the corpus data, storing triple data obtained by preprocessing the corpus data into a Neo4j database, respectively extracting Embedding sentence vectors of a plurality of problems corresponding to the triple data according to the XLNET Chinese model trained in the step 1, and storing the Embedding sentence vectors into a Neo4j database; the triple consists of a question entity, a question attribute and an answer;
and step 3: constructing an XLNet-BiGRU-CRF neural network model based on the XLNet Chinese model trained in the step 1, and training the XLNet-BiGRU-CRF model by using the training corpus data labeled in the step 2;
and 4, step 4: performing entity recognition on the text content of the user problem to be recognized by using the trained XLNET-BiGRU-CRF model to obtain an entity recognition result;
and 5: extracting a plurality of related triad data with corresponding entities in the Neo4j database according to the entity identification result in the step 4, extracting an Embedding sentence vector of a user problem to be identified by using an XLNT Chinese model, respectively comparing the Embedding sentence vector with the extracted Embedding sentence vectors of a plurality of problems corresponding to the related triad data in cosine similarity, taking an answer corresponding to the problem with the maximum similarity score as a target result, and simultaneously providing the problems and answers corresponding to the related triads with the second and third similarity scores for the user to serve as similar problems for the user to refer to.
Further, the permutation language model in step 1 is used to randomly shuffle the order of the Chinese characters in a text sentence. Given a text sequence of length T, let $A_T$ denote the set of permutations of the character order, and let $a \in A_T$ be one such permutation. The modeling objective of the permutation language model is expressed as

$$\max_{\theta}\ \mathbb{E}_{a \sim A_T}\left[\sum_{t=1}^{T}\log p_{\theta}\left(x_{a_t}\mid x_{a_{<t}}\right)\right]$$

where $\mathbb{E}_{a \sim A_T}$ denotes the expectation over all permutations, $x_{a_t}$ is the t-th element of the text sequence under permutation $a$, $x_{a_{<t}}$ denotes the 1st through (t-1)-th elements under permutation $a$, $\theta$ is the set of model parameters to be trained, and $p_{\theta}$ denotes the conditional probability.
Further, the dual-stream attention mechanism in step 1 comprises a content attention stream and a query attention stream. The content attention stream is a self-attention mechanism that carries both position information and content information; the query attention stream is an input stream carrying only position information, so that no content information of the current position is leaked when that position is to be predicted. The two streams are combined to extract context-related features. The dual-stream attention mechanism is expressed as

$$g_{a_t}^{(m)} \leftarrow \operatorname{Attention}\left(Q = g_{a_t}^{(m-1)},\ KV = h_{a_{<t}}^{(m-1)};\ \theta\right)\quad\text{(query stream)}$$
$$h_{a_t}^{(m)} \leftarrow \operatorname{Attention}\left(Q = h_{a_t}^{(m-1)},\ KV = h_{a_{\le t}}^{(m-1)};\ \theta\right)\quad\text{(content stream)}$$

where $g_{a_t}^{(m)}$ and $g_{a_t}^{(m-1)}$ are the query attention-stream matrix vectors of the m-th and (m-1)-th layers, containing only the position information of the input text; $h_{a_t}^{(m)}$ and $h_{a_t}^{(m-1)}$ are the content attention-stream matrix vectors of the m-th and (m-1)-th layers; and $h_{a_{<t}}^{(m-1)}$ is the content attention-stream matrix vector of the (m-1)-th layer for the 1st through (t-1)-th elements under permutation $a$. Attention denotes the classic self-attention mechanism, computed as

$$\operatorname{Attention}(Q, K, V) = \operatorname{softmax}\left(\frac{QK^{\top}}{\sqrt{\dim}}\right)V$$

where Q, K and V are the input word-vector matrices and dim is the dimension of the input vectors.
Further, the XLNet Chinese model in step 1 takes the Transformer-XL framework as its core, introducing a recurrence mechanism and a relative-position-coding mechanism to exploit contextual semantic information and mine the latent relations in text vectors.
Further, in step 3, the feature vectors output by the XLNet Chinese model are input to the BiGRU network. The BiGRU network controls the passing and blocking of information through gates; the state update equations are

$$z_t = \sigma\left(w_z \cdot [h_{t-1}, x_t]\right),\qquad r_t = \sigma\left(w_r \cdot [h_{t-1}, x_t]\right)$$
$$\tilde{h}_t = \tanh\left(w_{\tilde{h}} \cdot [r_t \circ h_{t-1}, x_t]\right),\qquad h_t = (1 - z_t) \circ h_{t-1} + z_t \circ \tilde{h}_t$$

where $x_t$ is the input vector at the current time t, i.e., the feature vector of the t-th word in the text; $h_t$ and $h_{t-1}$ are the hidden-state vectors at the current and previous time steps; $\tilde{h}_t$ is the candidate hidden state at time t, i.e., the new memory at the current time; $z_t$ is the update gate, controlling how much state information from the previous time step is carried into the current state (the larger $z_t$, the more previous state is kept); $r_t$ is the reset gate, controlling how much state information from the previous time step is ignored (the smaller $r_t$, the more is discarded); $w_z$, $w_r$ and $w_{\tilde{h}}$ are the weight matrices of the update gate, the reset gate and the candidate hidden state, respectively; $\sigma$ denotes the sigmoid nonlinear activation function, tanh the tanh activation function, and $\circ$ the element-wise product of vectors.
The output of the BiGRU encoding unit is the matrix Z; after softmax probability normalization, Z is input to the CRF layer. For a given input sequence X, the score of a predicted output tag sequence $y = (y_1, y_2, \dots, y_n)$, where n is the number of words in the sentence, is defined as

$$S(X, y) = \sum_{t=1}^{n}\left(Z_{t, y_t} + A_{y_{t-1}, y_t}\right)$$

where $Z_{t, y_t}$ is an element of the BiGRU output matrix Z, and $A_{y_{t-1}, y_t}$ is an element of the transition matrix output by the CRF layer, representing the probability of transitioning from tag $y_{t-1}$ to tag $y_t$; exploiting the dependencies between tags in this way yields more reasonable tag sequences. The score of the whole tag sequence y is thus a sum of per-position scores, each consisting of two parts: one from the output probability matrix of the BiGRU encoding unit and one from the transition matrix of the CRF layer. Normalizing this score gives the final prediction probability of the tag sequence y:

$$p(y \mid X) = \frac{\exp\left(S(X, y)\right)}{\sum_{\tilde{y} \in Y}\exp\left(S(X, \tilde{y})\right)}$$

where Y denotes the set of all possible tag sequences and $\tilde{y}$ is one of them.

The loss function L of the CRF layer is the negative log-likelihood:

$$L = -\log p(y \mid X) = -S(X, y) + \log\sum_{\tilde{y} \in Y}\exp\left(S(X, \tilde{y})\right)$$

The parameters of the whole named-entity-recognition model, namely the model parameters of the BiGRU network and the CRF layer, are trained and updated with the Adam algorithm using the CRF-layer loss function, while the parameters of the XLNet Chinese model are kept unchanged.
Further, in step 4, the text of the user question to be recognized is input into the trained XLNet-BiGRU-CRF model: the text is converted into feature vectors by the XLNet Chinese model, features are extracted from these vectors by the BiGRU network, and finally the most probable tag sequence for the text is obtained by the Viterbi algorithm in the CRF layer as the result of named entity recognition.
Further, the cosine similarity in step 5 is computed as

$$\text{score} = \frac{V_{\text{query}} \cdot V_{\text{corpus}}}{\left\lVert V_{\text{query}}\right\rVert\,\left\lVert V_{\text{corpus}}\right\rVert}$$

where score is the similarity value, $V_{\text{query}}$ is the Embedding sentence vector of the user question, and $V_{\text{corpus}}$ is the Embedding sentence vector of the related question.
The advantages and beneficial effects of the invention are: (1) the XLNet model used by the invention is pretrained unsupervised on large-scale unlabeled data; thanks to the permutation language model, the pretraining combines contextual semantic information well, giving the model strong text-feature expression capability. (2) Building on the knowledge graph and the Neo4j database makes the stored data set easier to visualize while improving retrieval speed. (3) The XLNet model has strong text-feature expression capability; introducing the bidirectional GRU recurrent structure achieves better joint encoding of context, and attaching the CRF layer effectively resolves the traditional entity-recognition problem of ignoring dependencies between tags. Combining the three further improves the accuracy of the recognition results.
Drawings
FIG. 1 is a flow chart of the method of the present invention.
FIG. 2 is a sample pre-processed corpus used to build a named entity recognition model.
Detailed Description
To facilitate an understanding of the structure, features and technical content of the present invention, the invention is further described below with reference to the accompanying drawings.
As shown in fig. 1, the method of the present invention comprises the following steps:
S1: training the XLNet Chinese model on large-scale unlabeled corpora.
The XLNet Chinese model mainly comprises a permutation language model, a dual-stream attention mechanism and a Transformer-XL core component. The purpose of the permutation language model is to randomly shuffle the Chinese characters in a text sentence: for a character $x_i$, the characters $\{x_{i+1}, \dots, x_n\}$ that originally appear after it may also appear before it. Suppose the set of all permutations of a text sequence of length T is $A_T$, and let $a \in A_T$ be one such permutation; then the modeling objective of the permutation language model is expressed as

$$\max_{\theta}\ \mathbb{E}_{a \sim A_T}\left[\sum_{t=1}^{T}\log p_{\theta}\left(x_{a_t}\mid x_{a_{<t}}\right)\right]$$

where $x_{a_t}$ is the t-th element of the text sequence under permutation $a$, $x_{a_{<t}}$ denotes the 1st through (t-1)-th elements under permutation $a$, $\theta$ is the set of model parameters to be trained, and $p_{\theta}$ denotes the conditional probability.
XLNet adopts a dual-stream attention mechanism: the content attention stream is a self-attention mechanism carrying both position and content information, while the query attention stream is an input stream carrying only position information, so that no content information of the current position is leaked when that position is to be predicted. The two streams complement each other and thus extract context-related features better. The dual-stream attention mechanism is expressed as

$$g_{a_t}^{(m)} \leftarrow \operatorname{Attention}\left(Q = g_{a_t}^{(m-1)},\ KV = h_{a_{<t}}^{(m-1)};\ \theta\right)\quad\text{(query stream)}$$
$$h_{a_t}^{(m)} \leftarrow \operatorname{Attention}\left(Q = h_{a_t}^{(m-1)},\ KV = h_{a_{\le t}}^{(m-1)};\ \theta\right)\quad\text{(content stream)}$$

where $g_{a_t}^{(m)}$ and $g_{a_t}^{(m-1)}$ are the query attention-stream matrix vectors of the m-th and (m-1)-th layers, containing only the position information of the input text, and $h_{a_t}^{(m)}$ and $h_{a_t}^{(m-1)}$ are the content attention-stream matrix vectors of the m-th and (m-1)-th layers, containing both the content information and the position information of the input text. Attention denotes the classic self-attention mechanism, computed as

$$\operatorname{Attention}(Q, K, V) = \operatorname{softmax}\left(\frac{QK^{\top}}{\sqrt{\dim}}\right)V$$

where Q, K and V are the input word-vector matrices and dim is the dimension of the input vectors.
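As an illustration of the formula above, the following is a minimal NumPy sketch of scaled dot-product self-attention; the toy shapes are assumptions chosen for demonstration:

```python
import numpy as np

def attention(Q, K, V):
    """Classic self-attention: softmax(Q K^T / sqrt(dim)) V."""
    dim = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(dim)                 # similarity of each query to each key
    scores -= scores.max(axis=-1, keepdims=True)    # for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V

# toy input: a sentence of 4 tokens with 8-dimensional word vectors
X = np.random.default_rng(0).normal(size=(4, 8))
print(attention(X, X, X).shape)  # (4, 8)
```

In the dual-stream setting, the query stream calls this with Q taken from g and K, V taken from h, while the content stream uses h for all three inputs.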
The XLNet Chinese model takes the Transformer-XL framework as its core, introducing a recurrence mechanism and relative position coding, so that contextual semantic information can be better exploited to mine the latent relations in text vectors.
The XLNet Chinese model is trained on large-scale unlabeled data to obtain the corresponding model parameters; the feature-vector representation of an input sequence can then be obtained by inference.
S2: acquiring corpus data for constructing the knowledge graph and the named-entity-recognition model, then preprocessing and labeling the data.
The triple data obtained by preprocessing the corpus data is stored in a Neo4j database to construct the knowledge graph; the data set generally consists of triples of question entity, question attribute and answer (e.g., <Advanced Mathematics, publisher, Wuhan University Press>). At the same time, the Embedding sentence vector of the question corresponding to each triple is extracted with the XLNet Chinese model trained in S1 and stored in the Neo4j database.
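As an illustration of this storage step, the following is a minimal sketch using the official neo4j Python driver; the connection details and the node labels and relationship names (`Entity`, `Answer`, `Question`, `ATTRIBUTE`, `ASKS_ABOUT`) are hypothetical, since the patent does not specify a graph schema:

```python
from neo4j import GraphDatabase

# hypothetical local Neo4j instance and credentials
driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

def store_triple(entity, attribute, answer, question, embedding):
    """Store one <entity, attribute, answer> triple plus its question Embedding."""
    query = (
        "MERGE (e:Entity {name: $entity}) "
        "MERGE (a:Answer {text: $answer}) "
        "MERGE (e)-[:ATTRIBUTE {name: $attribute}]->(a) "
        "MERGE (q:Question {text: $question}) "
        "SET q.embedding = $embedding "
        "MERGE (q)-[:ASKS_ABOUT]->(e)"
    )
    with driver.session() as session:
        session.run(query, entity=entity, attribute=attribute,
                    answer=answer, question=question, embedding=embedding)

store_triple("Advanced Mathematics", "publisher", "Wuhan University Press",
             "Who published Advanced Mathematics?", [0.12, -0.05, 0.33])
```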
The question texts are labeled according to the triple entities to build the labeled corpus of the named-entity-recognition model, as shown in FIG. 2. Only the entity needs to be recognized, so labeling uses the tag set ["O", "B-LOC", "I-LOC"], where O marks non-entity characters, B-LOC marks the beginning character of an entity, and I-LOC marks the non-initial characters of an entity; a small worked example follows.
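The sentence below is an invented example, not taken from the patent's corpus, showing what one labeled question looks like under this scheme:

```python
# "高等数学的出版社是什么" -- "What is the publisher of Advanced Mathematics?"
chars = list("高等数学的出版社是什么")
tags = ["B-LOC", "I-LOC", "I-LOC", "I-LOC",  # entity: 高等数学 (Advanced Mathematics)
        "O", "O", "O", "O", "O", "O", "O"]   # all remaining characters are non-entities

assert len(chars) == len(tags)
for ch, tag in zip(chars, tags):
    print(ch, tag, sep="\t")
```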
S3: constructing the XLNet-BiGRU-CRF neural network model on the basis of the XLNet Chinese model trained in S1, and training the model with the data labeled in S2.
First, the labeled corpus is input into the trained XLNet Chinese model, which outputs feature vectors; the feature vectors are then input into the BiGRU neural network model. The GRU is in fact a simplification of the LSTM network and controls the passing and blocking of information through gates; the state update equations are

$$z_t = \sigma\left(w_z \cdot [h_{t-1}, x_t]\right),\qquad r_t = \sigma\left(w_r \cdot [h_{t-1}, x_t]\right)$$
$$\tilde{h}_t = \tanh\left(w_{\tilde{h}} \cdot [r_t \circ h_{t-1}, x_t]\right),\qquad h_t = (1 - z_t) \circ h_{t-1} + z_t \circ \tilde{h}_t$$

where $x_t$ is the input vector at the current time t, i.e., the feature vector of the t-th word in the text; $h_t$ and $h_{t-1}$ are the hidden-state vectors at the current and previous time steps; $\tilde{h}_t$ is the candidate hidden state at time t, i.e., the new memory at the current time; $z_t$ is the update gate, controlling how much state information from the previous time step is carried into the current state (the larger $z_t$, the more previous state is kept); $r_t$ is the reset gate, controlling how much state information from the previous time step is ignored (the smaller $r_t$, the more is discarded); $w_z$, $w_r$ and $w_{\tilde{h}}$ are the weight matrices of the update gate, the reset gate and the candidate hidden state, respectively; $\sigma$ denotes the sigmoid nonlinear activation function, tanh the tanh activation function, and $\circ$ the element-wise product of vectors.
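The following is a minimal NumPy sketch of a single GRU step following the equations above; the random weights and toy dimensions are stand-ins, not trained parameters:

```python
import numpy as np

rng = np.random.default_rng(0)
hidden, inputs = 16, 8

w_z = rng.normal(size=(hidden, hidden + inputs))  # update-gate weights
w_r = rng.normal(size=(hidden, hidden + inputs))  # reset-gate weights
w_h = rng.normal(size=(hidden, hidden + inputs))  # candidate-state weights

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(h_prev, x_t):
    hx = np.concatenate([h_prev, x_t])
    z_t = sigmoid(w_z @ hx)                                      # update gate
    r_t = sigmoid(w_r @ hx)                                      # reset gate
    h_cand = np.tanh(w_h @ np.concatenate([r_t * h_prev, x_t]))  # candidate state
    return (1 - z_t) * h_prev + z_t * h_cand                     # new hidden state

h = np.zeros(hidden)
for x in rng.normal(size=(5, inputs)):  # run over a toy 5-token sentence
    h = gru_step(h, x)
print(h.shape)  # (16,)
```

A BiGRU simply runs one such GRU over the sentence left-to-right and another right-to-left, concatenating the two hidden states at each position.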
The output of the BiGRU encoding unit is the matrix Z; after softmax probability normalization, Z is input to the CRF layer. For a given input sequence X, the score of a predicted output tag sequence $y = (y_1, y_2, \dots, y_n)$, where n is the number of words in the sentence, is defined as

$$S(X, y) = \sum_{t=1}^{n}\left(Z_{t, y_t} + A_{y_{t-1}, y_t}\right)$$

where $Z_{t, y_t}$ is an element of the BiGRU output matrix Z, and $A_{y_{t-1}, y_t}$ is an element of the transition matrix output by the CRF layer, representing the probability of transitioning from tag $y_{t-1}$ to tag $y_t$, so that more reasonable tag sequences are obtained by exploiting the dependencies between tags. The score of the whole tag sequence y is thus a sum of per-position scores, each consisting of two parts: one from the output probability matrix of the BiGRU encoding unit and one from the transition matrix of the CRF layer. Normalizing this score gives the final prediction probability of the tag sequence y:

$$p(y \mid X) = \frac{\exp\left(S(X, y)\right)}{\sum_{\tilde{y} \in Y}\exp\left(S(X, \tilde{y})\right)}$$

where Y denotes the set of all possible tag sequences. The loss function of the CRF layer is the negative log-likelihood:

$$L = -\log p(y \mid X) = -S(X, y) + \log\sum_{\tilde{y} \in Y}\exp\left(S(X, \tilde{y})\right)$$
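For concreteness, the following is a brute-force NumPy sketch of S(X, y) and the negative log-likelihood above, enumerating every tag sequence; this is feasible only for toy sizes (real CRF layers use the forward algorithm instead), and the random scores are stand-ins:

```python
import itertools

import numpy as np

rng = np.random.default_rng(0)
n_tags, seq_len = 3, 4                  # e.g. the tag set O / B-LOC / I-LOC

Z = rng.normal(size=(seq_len, n_tags))  # emission scores from the BiGRU encoder
A = rng.normal(size=(n_tags, n_tags))   # CRF transition scores A[y_{t-1}, y_t]

def score(y):
    """S(X, y): sum over positions of emission plus transition scores."""
    s = Z[0, y[0]]
    for t in range(1, len(y)):
        s += Z[t, y[t]] + A[y[t - 1], y[t]]
    return s

def neg_log_likelihood(y):
    """L = -S(X, y) + log sum over all tag sequences y' of exp(S(X, y'))."""
    all_scores = [score(c) for c in itertools.product(range(n_tags), repeat=seq_len)]
    log_partition = np.log(np.sum(np.exp(all_scores)))
    return -score(y) + log_partition

print(neg_log_likelihood((0, 1, 2, 0)))
```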
The parameters of the whole named-entity-recognition model, namely the model parameters of the BiGRU neural network and the CRF layer, are updated with the Adam algorithm using the CRF-layer loss function, while the parameters of the XLNet Chinese model are kept unchanged. Training terminates when the loss produced by the model meets the set requirement or the set maximum number of iterations is reached.
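A hedged PyTorch sketch of this training arrangement follows; the `xlnet` placeholder, the module shapes and the loss function are assumptions standing in for the actual pretrained encoder and CRF loss:

```python
import torch

xlnet = torch.nn.Linear(32, 32)  # placeholder for the pretrained XLNet Chinese encoder
bigru = torch.nn.GRU(32, 16, bidirectional=True, batch_first=True)
crf_transitions = torch.nn.Parameter(torch.randn(3, 3))  # placeholder CRF transition matrix

for p in xlnet.parameters():
    p.requires_grad = False  # XLNet parameters are frozen, i.e., kept unchanged

# only the BiGRU and CRF parameters are handed to Adam
optimizer = torch.optim.Adam(list(bigru.parameters()) + [crf_transitions], lr=1e-3)

def loss_fn(emissions):
    """Placeholder for the CRF negative log-likelihood L."""
    return emissions.pow(2).mean() + crf_transitions.pow(2).mean()

x = torch.randn(2, 5, 32)        # toy batch: 2 sentences, 5 tokens, dimension 32
emissions, _ = bigru(xlnet(x))   # XLNet features -> BiGRU emissions
loss = loss_fn(emissions)
optimizer.zero_grad()
loss.backward()
optimizer.step()                 # updates BiGRU/CRF only; XLNet stays frozen
```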
S4: performing entity recognition on the text of the user question to be recognized with the XLNet-BiGRU-CRF model trained in S3 to obtain the recognition result, which mainly comprises the following steps:
S4-1: inputting the text data to be recognized into the trained XLNet-BiGRU-CRF neural network model;
S4-2: the text data is converted into feature vectors by the XLNet Chinese model, features are extracted from these vectors by the BiGRU network, and finally the most probable tag sequence for the text is solved by the Viterbi algorithm in the CRF layer; this is the result of named entity recognition, as sketched below.
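The following is a minimal NumPy sketch of Viterbi decoding over the emission matrix Z and transition matrix A from the CRF section; the toy shapes and random scores are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
n_tags, seq_len = 3, 6
Z = rng.normal(size=(seq_len, n_tags))  # per-position tag scores from the BiGRU
A = rng.normal(size=(n_tags, n_tags))   # tag-to-tag transition scores

def viterbi(Z, A):
    """Return the tag sequence maximizing S(X, y) by dynamic programming."""
    seq_len, n_tags = Z.shape
    dp = Z[0].copy()                            # best score ending in each tag so far
    back = np.zeros((seq_len, n_tags), dtype=int)
    for t in range(1, seq_len):
        cand = dp[:, None] + A + Z[t][None, :]  # score of (previous tag -> current tag)
        back[t] = cand.argmax(axis=0)           # best previous tag for each current tag
        dp = cand.max(axis=0)
    path = [int(dp.argmax())]
    for t in range(seq_len - 1, 0, -1):         # follow the back-pointers
        path.append(int(back[t, path[-1]]))
    return path[::-1]

print(viterbi(Z, A))  # a list of 6 tag indices
```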
S5: according to the named-entity-recognition result, several related questions containing the corresponding entity are extracted from the Neo4j database; the Embedding sentence vector of the user question to be recognized is extracted with the XLNet Chinese model and compared, by cosine similarity, with the Embedding sentence vectors of the related questions stored in S2; the answer of the question with the highest similarity is taken as the target result, while the questions ranked second and third are also provided to the user as similar questions for reference. The corresponding cosine similarity is computed as follows:
$$\text{score} = \frac{V_{\text{query}} \cdot V_{\text{corpus}}}{\left\lVert V_{\text{query}}\right\rVert\,\left\lVert V_{\text{corpus}}\right\rVert}$$

where score is the similarity value, $V_{\text{query}}$ is the Embedding sentence vector of the user question, and $V_{\text{corpus}}$ is the Embedding sentence vector of the related question.
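A minimal sketch of this ranking step follows; the vectors and candidate questions are invented placeholders rather than real corpus data:

```python
import numpy as np

def cosine(v_query, v_corpus):
    """score = (V_query . V_corpus) / (||V_query|| * ||V_corpus||)."""
    return float(v_query @ v_corpus /
                 (np.linalg.norm(v_query) * np.linalg.norm(v_corpus)))

rng = np.random.default_rng(0)
v_query = rng.normal(size=128)            # Embedding sentence vector of the user question
candidates = {f"related question {i}": rng.normal(size=128) for i in range(5)}

ranked = sorted(candidates, key=lambda q: cosine(v_query, candidates[q]), reverse=True)
target, similar = ranked[0], ranked[1:3]  # answer source plus two reference questions
print(target, similar)
```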
The above is only a preferred embodiment of the present invention, and the scope of protection of the present invention is not limited to the above embodiment; all technical solutions falling under the idea of the present invention belong to its scope of protection. It should be noted that those skilled in the art may make modifications and refinements without departing from the principle of the invention, and these should also be regarded as within the scope of protection of the present invention.

Claims (7)

1. An intelligent question-answering method based on XLNet-BiGRU-CRF, characterized by comprising the following steps:
step 1: training an XLNet Chinese model on large-scale unlabeled corpora, wherein the XLNet Chinese model comprises a permutation language model, a dual-stream attention mechanism and a Transformer-XL core component;
step 2: acquiring corpus data for constructing a knowledge graph and a named-entity-recognition model, preprocessing and labeling the corpus data, storing the triple data obtained by the preprocessing into a Neo4j database, extracting, with the XLNet Chinese model trained in step 1, the Embedding sentence vector of the question corresponding to each triple, and storing the Embedding sentence vectors into the Neo4j database, wherein each triple consists of a question entity, a question attribute and an answer;
step 3: constructing an XLNet-BiGRU-CRF neural network model based on the XLNet Chinese model trained in step 1, and training the XLNet-BiGRU-CRF model with the corpus data labeled in step 2;
step 4: performing entity recognition on the text of the user question to be recognized with the trained XLNet-BiGRU-CRF model to obtain an entity-recognition result;
step 5: extracting from the Neo4j database, according to the entity-recognition result of step 4, the related triple data containing the corresponding entity, extracting the Embedding sentence vector of the user question to be recognized with the XLNet Chinese model, comparing it by cosine similarity with the Embedding sentence vectors of the questions corresponding to the related triples, taking the answer of the question with the highest similarity score as the target result, and meanwhile providing the questions and answers of the related triples ranked second and third in similarity to the user as similar questions for reference.
2. The XLNet-BiGRU-CRF-based intelligent question-answering method of claim 1, wherein the permutation language model in step 1 is used to randomly shuffle the order of the Chinese characters in a text sentence; given a text sequence of length T, the set of permutations of the character order is $A_T$, with $a \in A_T$ one such permutation, and the modeling objective of the permutation language model is expressed as

$$\max_{\theta}\ \mathbb{E}_{a \sim A_T}\left[\sum_{t=1}^{T}\log p_{\theta}\left(x_{a_t}\mid x_{a_{<t}}\right)\right]$$

wherein $\mathbb{E}_{a \sim A_T}$ denotes the expectation over all permutations, $x_{a_t}$ is the t-th element of the text sequence under permutation $a$, $x_{a_{<t}}$ denotes the 1st through (t-1)-th elements under permutation $a$, $\theta$ is the set of model parameters to be trained, and $p_{\theta}$ denotes the conditional probability.
3. The XLNet-BiGRU-CRF-based intelligent question-answering method of claim 1, wherein the dual-stream attention mechanism in step 1 comprises a content attention stream and a query attention stream, the content attention stream being a self-attention mechanism carrying position information and content information, and the query attention stream being an input stream carrying only position information, so that the content information of the current position is not revealed when that position is to be predicted; the two streams are combined to extract context-related features; the dual-stream attention mechanism is expressed as

$$g_{a_t}^{(m)} \leftarrow \operatorname{Attention}\left(Q = g_{a_t}^{(m-1)},\ KV = h_{a_{<t}}^{(m-1)};\ \theta\right)$$
$$h_{a_t}^{(m)} \leftarrow \operatorname{Attention}\left(Q = h_{a_t}^{(m-1)},\ KV = h_{a_{\le t}}^{(m-1)};\ \theta\right)$$

wherein $g_{a_t}^{(m)}$ and $g_{a_t}^{(m-1)}$ are the query attention-stream matrix vectors of the m-th and (m-1)-th layers, containing only the position information of the input text; $h_{a_t}^{(m)}$ and $h_{a_t}^{(m-1)}$ are the content attention-stream matrix vectors of the m-th and (m-1)-th layers, containing the content information and position information of the input text; $h_{a_{<t}}^{(m-1)}$ is the content attention-stream matrix vector of the (m-1)-th layer for the 1st through (t-1)-th elements under permutation $a$; and Attention denotes the classic self-attention mechanism, computed as

$$\operatorname{Attention}(Q, K, V) = \operatorname{softmax}\left(\frac{QK^{\top}}{\sqrt{\dim}}\right)V$$

wherein Q, K and V are the input word-vector matrices and dim is the dimension of the input vectors.
4. The XLNet-BiGRU-CRF-based intelligent question-answering method of claim 1, wherein in step 1 the XLNet Chinese model takes the Transformer-XL framework as its core and introduces a recurrence mechanism and a relative-position-coding mechanism to exploit contextual semantic information and extract the latent relations in text vectors.
5. The XLNet-BiGRU-CRF-based intelligent question-answering method of claim 1, wherein in step 3 the feature vectors output by the XLNet Chinese model are input to the BiGRU network, and the BiGRU network controls the passing and blocking of information through gates, with the state update equations

$$z_t = \sigma\left(w_z \cdot [h_{t-1}, x_t]\right),\qquad r_t = \sigma\left(w_r \cdot [h_{t-1}, x_t]\right)$$
$$\tilde{h}_t = \tanh\left(w_{\tilde{h}} \cdot [r_t \circ h_{t-1}, x_t]\right),\qquad h_t = (1 - z_t) \circ h_{t-1} + z_t \circ \tilde{h}_t$$

wherein $x_t$ is the input vector at the current time t, i.e., the feature vector of the t-th word in the text; $h_t$ and $h_{t-1}$ are the hidden-state vectors at the current and previous time steps; $\tilde{h}_t$ is the candidate hidden state at the current time, i.e., the new memory at the current time; $z_t$ is the update gate controlling how much state information from the previous time step is carried into the current state, with larger $z_t$ keeping more; $r_t$ is the reset gate controlling how much state information from the previous time step is ignored, with smaller $r_t$ discarding more; $w_z$, $w_r$ and $w_{\tilde{h}}$ are the weight matrices of the update gate, the reset gate and the candidate hidden state, respectively; $\sigma$ denotes the sigmoid nonlinear activation function, tanh the tanh activation function, and $\circ$ the element-wise product of vectors;

the output of the BiGRU encoding unit is the matrix Z, and after softmax probability normalization Z is input to the CRF layer; for a given input sequence X, the score of a predicted output tag sequence $y = (y_1, y_2, \dots, y_n)$, n being the number of words in the sentence, is defined as

$$S(X, y) = \sum_{t=1}^{n}\left(Z_{t, y_t} + A_{y_{t-1}, y_t}\right)$$

wherein $Z_{t, y_t}$ is an element of the BiGRU output matrix Z and $A_{y_{t-1}, y_t}$ is an element of the transition matrix output by the CRF layer, representing the probability of transitioning from tag $y_{t-1}$ to tag $y_t$; the above formula is normalized to obtain the final prediction probability p(y|X) of the tag sequence y,

$$p(y \mid X) = \frac{\exp\left(S(X, y)\right)}{\sum_{\tilde{y} \in Y}\exp\left(S(X, \tilde{y})\right)}$$

wherein Y denotes the set of all possible tag sequences and $\tilde{y}$ is one of them; the loss function L of the CRF layer adopts the negative log-likelihood,

$$L = -\log p(y \mid X) = -S(X, y) + \log\sum_{\tilde{y} \in Y}\exp\left(S(X, \tilde{y})\right)$$

and the parameters of the whole named-entity-recognition model, namely the model parameters of the BiGRU network and the CRF layer, are trained and updated with the Adam algorithm using the CRF-layer loss function, while the parameters of the XLNet Chinese model are kept unchanged.
6. The XLNet-BiGRU-CRF-based intelligent question-answering method of claim 1, wherein in step 4 the text of the user question to be recognized is input into the trained XLNet-BiGRU-CRF model, the text is converted into feature vectors by the XLNet Chinese model, features are extracted from these vectors by the BiGRU network, and finally the most probable tag sequence for the text is obtained by the Viterbi algorithm in the CRF layer as the result of named entity recognition.
7. The XLNet-BiGRU-CRF-based intelligent question-answering method of claim 1, wherein the cosine similarity in step 5 is computed as

$$\text{score} = \frac{V_{\text{query}} \cdot V_{\text{corpus}}}{\left\lVert V_{\text{query}}\right\rVert\,\left\lVert V_{\text{corpus}}\right\rVert}$$

wherein score is the similarity value, $V_{\text{query}}$ is the Embedding sentence vector of the user question, and $V_{\text{corpus}}$ is the Embedding sentence vector of the related question.
CN202110913182.1A 2021-08-10 2021-08-10 Intelligent question-answering method based on XLnet model and knowledge graph Active CN113641809B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110913182.1A CN113641809B (en) 2021-08-10 2021-08-10 Intelligent question-answering method based on XLnet model and knowledge graph


Publications (2)

Publication Number Publication Date
CN113641809A (en) 2021-11-12
CN113641809B (en) 2023-12-08

Family

ID=78420446

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110913182.1A Active CN113641809B (en) 2021-08-10 2021-08-10 Intelligent question-answering method based on XLnet model and knowledge graph

Country Status (1)

Country Link
CN (1) CN113641809B (en)


Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160357855A1 (en) * 2015-06-02 2016-12-08 International Business Machines Corporation Utilizing Word Embeddings for Term Matching in Question Answering Systems
CN110083831A (en) * 2019-04-16 2019-08-02 武汉大学 A kind of Chinese name entity recognition method based on BERT-BiGRU-CRF
CN112650845A (en) * 2020-12-30 2021-04-13 西安交通大学 Question-answering system and method based on BERT and knowledge representation learning
CN113010693A (en) * 2021-04-09 2021-06-22 大连民族大学 Intelligent knowledge graph question-answering method fusing pointer to generate network

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
梁淑蓉 et al.: "XLNet-based sentiment analysis model" (基于XLNet的情感分析模型), Science Technology and Engineering (科学技术与工程), vol. 21, no. 17, pp. 7200-7207 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114582449A (en) * 2022-01-17 2022-06-03 内蒙古大学 Electronic medical record named entity standardization method and system based on XLNet-BiGRU-CRF model
CN114970563A (en) * 2022-07-28 2022-08-30 山东大学 Chinese question generation method and system fusing content and form diversity
CN114970563B (en) * 2022-07-28 2022-10-25 山东大学 Chinese question generation method and system fusing content and form diversity

Also Published As

Publication number Publication date
CN113641809B (en) 2023-12-08

Similar Documents

Publication Publication Date Title
CN110298037B (en) Convolutional neural network matching text recognition method based on enhanced attention mechanism
CN112115238B (en) Question-answering method and system based on BERT and knowledge base
CN111985239B (en) Entity identification method, entity identification device, electronic equipment and storage medium
CN110609891A (en) Visual dialog generation method based on context awareness graph neural network
CN109508459B (en) Method for extracting theme and key information from news
CN111046179B (en) Text classification method for open network question in specific field
CN108363743A (en) A kind of intelligence questions generation method, device and computer readable storage medium
CN106569998A (en) Text named entity recognition method based on Bi-LSTM, CNN and CRF
CN112487820B (en) Chinese medical named entity recognition method
CN110222163A (en) A kind of intelligent answer method and system merging CNN and two-way LSTM
CN114943230B (en) Method for linking entities in Chinese specific field by fusing common sense knowledge
CN111914556B (en) Emotion guiding method and system based on emotion semantic transfer pattern
CN111858896B (en) Knowledge base question-answering method based on deep learning
CN112749562A (en) Named entity identification method, device, storage medium and electronic equipment
CN111159345B (en) Chinese knowledge base answer acquisition method and device
CN113297364A (en) Natural language understanding method and device for dialog system
CN113641809B (en) Intelligent question-answering method based on XLnet model and knowledge graph
CN113051922A (en) Triple extraction method and system based on deep learning
CN114443813A (en) Intelligent online teaching resource knowledge point concept entity linking method
CN115169349A (en) Chinese electronic resume named entity recognition method based on ALBERT
CN115391520A (en) Text emotion classification method, system, device and computer medium
CN111444720A (en) Named entity recognition method for English text
CN113011196B (en) Concept-enhanced representation and one-way attention-containing subjective question automatic scoring neural network model
CN114548106A (en) Method for recognizing science collaborative activity named entity based on ALBERT
CN116522165B (en) Public opinion text matching system and method based on twin structure

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant