CN113641809A - XLNET-BiGRU-CRF-based intelligent question answering method

XLNET-BiGRU-CRF-based intelligent question answering method

Info

Publication number
CN113641809A
CN113641809A (application CN202110913182.1A)
Authority
CN
China
Prior art keywords
xlnet
model
bigru
crf
vector
Prior art date
Legal status
Granted
Application number
CN202110913182.1A
Other languages
Chinese (zh)
Other versions
CN113641809B (en)
Inventor
刘大伟
胡笳
车少帅
张邱鸣
张玮
Current Assignee
Clp Hongxin Information Technology Co ltd
Original Assignee
Clp Hongxin Information Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Clp Hongxin Information Technology Co ltd
Priority to CN202110913182.1A
Publication of CN113641809A
Application granted
Publication of CN113641809B
Status: Active (granted)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G06F16/3329 Natural language query formulation or dialogue systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/166 Editing, e.g. inserting or deleting
    • G06F40/169 Annotation, e.g. comment data or footnotes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 Named entity recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Human Computer Interaction (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses an intelligent question-answering method based on XLNet-BiGRU-CRF, comprising the following steps: training an XLNet Chinese model; obtaining corpus data; constructing and training an XLNet-BiGRU-CRF neural network model; and performing entity recognition on the text of the user question to be recognized. According to the entity-recognition result, several related questions sharing the recognized entity are extracted from the database; the user question is compared with each related question by the cosine similarity of their Embedding sentence vectors; the answer of the related question with the highest similarity score is taken as the target result, while the related questions ranked second and third in similarity are also provided to the user as similar questions for reference. The invention processes the text of the user question with the trained model and, combined with knowledge-graph retrieval, obtains answers to questions more quickly and accurately.

Description

XLNET-BiGRU-CRF-based intelligent question answering method
Technical Field
The invention belongs to the technical field of intelligent question answering, and particularly relates to an intelligent question answering method based on XLNet-BiGRU-CRF.
Background
In recent years, with the development of big data and artificial-intelligence technology, question-answering systems have been applied in many industries; the question-answering system has become a key component of intelligent robots and an important link in the communication between robots and people.
The traditional question-answering system is generally based on keyword retrieval and does not consider the semantic information of the question. A question-answering system based on a knowledge graph can analyze online the text of the specific question asked by the questioner, then retrieve and output the best-matching answer, so an accurate answer to the question can be obtained quickly. Knowledge graphs typically store data in triple format, for example <Advanced Mathematics, publisher, Wuhan University Press>, where "Advanced Mathematics" and "Wuhan University Press" are two entities and "publisher" is the relation between them. The input to such a question-answering system is a text query; the system then finds the triple or set of triples most relevant to the query in the knowledge base and returns the corresponding entity from those triples.
The current mainstream methods are: methods based on relation classification, methods based on retrieval, and methods based on semantic parsing. Taking relation classification as an example, such a method first predicts an entity and a relation from the question, and then finds the answer entity from the two. What these methods have in common is that they require questions and corresponding logical-expression data to train a prediction model; compared with constructing the knowledge graph, the cost of assembling such specially labeled data is higher, and the annotators must possess certain expertise, including domain knowledge and query-language knowledge. Semantic-parsing-based methods, meanwhile, face the gap between logical expressions and natural-language semantics. At the same time, compared with state-of-the-art models such as BERT (Bidirectional Encoder Representations from Transformers) and XLNet (Generalized Autoregressive Pretraining for Language Understanding), common models such as CNN and LSTM train less effectively, are less accurate, and lack correlation analysis of the characters or words in the question text.
Disclosure of Invention
Aiming at the defects of the prior art, the invention provides an intelligent question-answering method based on XLNet-BiGRU-CRF. To achieve this purpose, the invention adopts the following technical scheme:
an intelligent question-answering method based on XLNet-BiGRU-CRF comprises the following steps:
step 1: training an XLNET Chinese model based on large-scale unmarked corpora, wherein the XLNET model comprises an arrangement language model, a double-flow attention mechanism and a Transformer-XL core component;
step 2: acquiring corpus data for constructing a knowledge graph and a named entity recognition model, preprocessing and labeling the corpus data, storing triple data obtained by preprocessing the corpus data into a Neo4j database, respectively extracting Embedding sentence vectors of a plurality of problems corresponding to the triple data according to the XLNET Chinese model trained in the step 1, and storing the Embedding sentence vectors into a Neo4j database; the triple consists of a question entity, a question attribute and an answer;
and step 3: constructing an XLNet-BiGRU-CRF neural network model based on the XLNet Chinese model trained in the step 1, and training the XLNet-BiGRU-CRF model by using the training corpus data labeled in the step 2;
and 4, step 4: performing entity recognition on the text content of the user problem to be recognized by using the trained XLNET-BiGRU-CRF model to obtain an entity recognition result;
and 5: extracting a plurality of related triad data with corresponding entities in the Neo4j database according to the entity identification result in the step 4, extracting an Embedding sentence vector of a user problem to be identified by using an XLNT Chinese model, respectively comparing the Embedding sentence vector with the extracted Embedding sentence vectors of a plurality of problems corresponding to the related triad data in cosine similarity, taking an answer corresponding to the problem with the maximum similarity score as a target result, and simultaneously providing the problems and answers corresponding to the related triads with the second and third similarity scores for the user to serve as similar problems for the user to refer to.
Further, the permutation language model in step 1 is used to randomly shuffle the order of the Chinese characters in a text sentence. Given a text sequence of length T, let $A_T$ denote the set of permutations of the character order, and let $a \in A_T$ be one such permutation. The modeling objective of the permutation language model is expressed as

$$\max_{\theta}\ \mathbb{E}_{a \sim A_T}\left[\sum_{t=1}^{T}\log p_{\theta}\left(x_{a_t}\mid x_{a_{<t}}\right)\right]$$

where $\mathbb{E}_{a \sim A_T}$ denotes the expectation over all permutations, $x_{a_t}$ is the t-th element of the text sequence under permutation $a$, $x_{a_{<t}}$ denotes the 1st through (t-1)-th elements under permutation $a$, $\theta$ is the set of model parameters to be trained, and $p_{\theta}$ denotes the conditional probability.
Further, the dual-stream attention mechanism in step 1 comprises a content attention stream and a query attention stream. The content attention stream is a self-attention mechanism that carries both position information and content information; the query attention stream is an input stream carrying only position information, so that no content information of the current position is leaked when that position is to be predicted. The two streams are combined to extract context-related features. The dual-stream attention mechanism is expressed as

$$g_{a_t}^{(m)} \leftarrow \operatorname{Attention}\left(Q = g_{a_t}^{(m-1)},\ KV = h_{a_{<t}}^{(m-1)};\ \theta\right)\quad\text{(query stream)}$$
$$h_{a_t}^{(m)} \leftarrow \operatorname{Attention}\left(Q = h_{a_t}^{(m-1)},\ KV = h_{a_{\le t}}^{(m-1)};\ \theta\right)\quad\text{(content stream)}$$

where $g_{a_t}^{(m)}$ and $g_{a_t}^{(m-1)}$ are the query attention-stream matrix vectors of the m-th and (m-1)-th layers, containing only the position information of the input text; $h_{a_t}^{(m)}$ and $h_{a_t}^{(m-1)}$ are the content attention-stream matrix vectors of the m-th and (m-1)-th layers; and $h_{a_{<t}}^{(m-1)}$ is the content attention-stream matrix vector of the (m-1)-th layer for the 1st through (t-1)-th elements under permutation $a$. Attention denotes the classic self-attention mechanism, computed as

$$\operatorname{Attention}(Q, K, V) = \operatorname{softmax}\left(\frac{QK^{\top}}{\sqrt{\dim}}\right)V$$

where Q, K and V are the input word-vector matrices and dim is the dimension of the input vectors.
Further, the XLNet Chinese model in step 1 takes the Transformer-XL framework as its core, introducing a recurrence mechanism and a relative-position-coding mechanism to exploit contextual semantic information and mine the latent relations in text vectors.
Further, in step 3, the feature vectors output by the XLNet Chinese model are input to the BiGRU network. The BiGRU network controls the passing and blocking of information through gates; the state update equations are

$$z_t = \sigma\left(w_z \cdot [h_{t-1}, x_t]\right),\qquad r_t = \sigma\left(w_r \cdot [h_{t-1}, x_t]\right)$$
$$\tilde{h}_t = \tanh\left(w_{\tilde{h}} \cdot [r_t \circ h_{t-1}, x_t]\right),\qquad h_t = (1 - z_t) \circ h_{t-1} + z_t \circ \tilde{h}_t$$

where $x_t$ is the input vector at the current time t, i.e., the feature vector of the t-th word in the text; $h_t$ and $h_{t-1}$ are the hidden-state vectors at the current and previous time steps; $\tilde{h}_t$ is the candidate hidden state at time t, i.e., the new memory at the current time; $z_t$ is the update gate, controlling how much state information from the previous time step is carried into the current state (the larger $z_t$, the more previous state is kept); $r_t$ is the reset gate, controlling how much state information from the previous time step is ignored (the smaller $r_t$, the more is discarded); $w_z$, $w_r$ and $w_{\tilde{h}}$ are the weight matrices of the update gate, the reset gate and the candidate hidden state, respectively; $\sigma$ denotes the sigmoid nonlinear activation function, tanh the tanh activation function, and $\circ$ the element-wise product of vectors.
The output of the BiGRU encoding unit is the matrix Z; after softmax probability normalization, Z is input to the CRF layer. For a given input sequence X, the score of a predicted output tag sequence $y = (y_1, y_2, \dots, y_n)$, where n is the number of words in the sentence, is defined as

$$S(X, y) = \sum_{t=1}^{n}\left(Z_{t, y_t} + A_{y_{t-1}, y_t}\right)$$

where $Z_{t, y_t}$ is an element of the BiGRU output matrix Z, and $A_{y_{t-1}, y_t}$ is an element of the transition matrix output by the CRF layer, representing the probability of transitioning from tag $y_{t-1}$ to tag $y_t$; exploiting the dependencies between tags in this way yields more reasonable tag sequences. The score of the whole tag sequence y is thus a sum of per-position scores, each consisting of two parts: one from the output probability matrix of the BiGRU encoding unit and one from the transition matrix of the CRF layer. Normalizing this score gives the final prediction probability of the tag sequence y:

$$p(y \mid X) = \frac{\exp\left(S(X, y)\right)}{\sum_{\tilde{y} \in Y}\exp\left(S(X, \tilde{y})\right)}$$

where Y denotes the set of all possible tag sequences and $\tilde{y}$ is one of them.

The loss function L of the CRF layer is the negative log-likelihood:

$$L = -\log p(y \mid X) = -S(X, y) + \log\sum_{\tilde{y} \in Y}\exp\left(S(X, \tilde{y})\right)$$

The parameters of the whole named-entity-recognition model, namely the model parameters of the BiGRU network and the CRF layer, are trained and updated with the Adam algorithm using the CRF-layer loss function, while the parameters of the XLNet Chinese model are kept unchanged.
Further, in step 4, the text of the user question to be recognized is input into the trained XLNet-BiGRU-CRF model: the text is converted into feature vectors by the XLNet Chinese model, features are extracted from these vectors by the BiGRU network, and finally the most probable tag sequence for the text is obtained by the Viterbi algorithm in the CRF layer as the result of named entity recognition.
Further, the cosine similarity in step 5 is computed as

$$\text{score} = \frac{V_{\text{query}} \cdot V_{\text{corpus}}}{\left\lVert V_{\text{query}}\right\rVert\,\left\lVert V_{\text{corpus}}\right\rVert}$$

where score is the similarity value, $V_{\text{query}}$ is the Embedding sentence vector of the user question, and $V_{\text{corpus}}$ is the Embedding sentence vector of the related question.
The advantages and beneficial effects of the invention are: (1) the XLNet model used by the invention is pretrained unsupervised on large-scale unlabeled data; thanks to the permutation language model, the pretraining combines contextual semantic information well, giving the model strong text-feature expression capability. (2) Building on the knowledge graph and the Neo4j database makes the stored data set easier to visualize while improving retrieval speed. (3) The XLNet model has strong text-feature expression capability; introducing the bidirectional GRU recurrent structure achieves better joint encoding of context, and attaching the CRF layer effectively resolves the traditional entity-recognition problem of ignoring dependencies between tags. Combining the three further improves the accuracy of the recognition results.
Drawings
FIG. 1 is a flow chart of the method of the present invention.
FIG. 2 is a sample pre-processed corpus used to build a named entity recognition model.
Detailed Description
To facilitate an understanding of the structure, features and technical content of the present invention, the invention is further described below with reference to the accompanying drawings.
As shown in fig. 1, the method of the present invention comprises the following steps:
S1: training the XLNet Chinese model on large-scale unlabeled corpora.
The XLNet Chinese model mainly comprises a permutation language model, a dual-stream attention mechanism and a Transformer-XL core component. The purpose of the permutation language model is to randomly shuffle the Chinese characters in a text sentence: for a character $x_i$, the characters $\{x_{i+1}, \dots, x_n\}$ that originally appear after it may also appear before it. Suppose the set of all permutations of a text sequence of length T is $A_T$, and let $a \in A_T$ be one such permutation; then the modeling objective of the permutation language model is expressed as

$$\max_{\theta}\ \mathbb{E}_{a \sim A_T}\left[\sum_{t=1}^{T}\log p_{\theta}\left(x_{a_t}\mid x_{a_{<t}}\right)\right]$$

where $x_{a_t}$ is the t-th element of the text sequence under permutation $a$, $x_{a_{<t}}$ denotes the 1st through (t-1)-th elements under permutation $a$, $\theta$ is the set of model parameters to be trained, and $p_{\theta}$ denotes the conditional probability.
XLNet adopts a dual-stream attention mechanism: the content attention stream is a self-attention mechanism carrying both position and content information, while the query attention stream is an input stream carrying only position information, so that no content information of the current position is leaked when that position is to be predicted. The two streams complement each other and thus extract context-related features better. The dual-stream attention mechanism is expressed as

$$g_{a_t}^{(m)} \leftarrow \operatorname{Attention}\left(Q = g_{a_t}^{(m-1)},\ KV = h_{a_{<t}}^{(m-1)};\ \theta\right)\quad\text{(query stream)}$$
$$h_{a_t}^{(m)} \leftarrow \operatorname{Attention}\left(Q = h_{a_t}^{(m-1)},\ KV = h_{a_{\le t}}^{(m-1)};\ \theta\right)\quad\text{(content stream)}$$

where $g_{a_t}^{(m)}$ and $g_{a_t}^{(m-1)}$ are the query attention-stream matrix vectors of the m-th and (m-1)-th layers, containing only the position information of the input text, and $h_{a_t}^{(m)}$ and $h_{a_t}^{(m-1)}$ are the content attention-stream matrix vectors of the m-th and (m-1)-th layers, containing both the content information and the position information of the input text. Attention denotes the classic self-attention mechanism, computed as

$$\operatorname{Attention}(Q, K, V) = \operatorname{softmax}\left(\frac{QK^{\top}}{\sqrt{\dim}}\right)V$$

where Q, K and V are the input word-vector matrices and dim is the dimension of the input vectors.
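As an illustration of the formula above, the following is a minimal NumPy sketch of scaled dot-product self-attention; the toy shapes are assumptions chosen for demonstration:

```python
import numpy as np

def attention(Q, K, V):
    """Classic self-attention: softmax(Q K^T / sqrt(dim)) V."""
    dim = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(dim)                 # similarity of each query to each key
    scores -= scores.max(axis=-1, keepdims=True)    # for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V

# toy input: a sentence of 4 tokens with 8-dimensional word vectors
X = np.random.default_rng(0).normal(size=(4, 8))
print(attention(X, X, X).shape)  # (4, 8)
```

In the dual-stream setting, the query stream calls this with Q taken from g and K, V taken from h, while the content stream uses h for all three inputs.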
The XLNet Chinese model takes the Transformer-XL framework as its core, introducing a recurrence mechanism and relative position coding, so that contextual semantic information can be better exploited to mine the latent relations in text vectors.
The XLNet Chinese model is trained on large-scale unlabeled data to obtain the corresponding model parameters; the feature-vector representation of an input sequence can then be obtained by inference.
S2: acquiring corpus data for constructing the knowledge graph and the named-entity-recognition model, then preprocessing and labeling the data.
The triple data obtained by preprocessing the corpus data is stored in a Neo4j database to construct the knowledge graph; the data set generally consists of triples of question entity, question attribute and answer (e.g., <Advanced Mathematics, publisher, Wuhan University Press>). At the same time, the Embedding sentence vector of the question corresponding to each triple is extracted with the XLNet Chinese model trained in S1 and stored in the Neo4j database.
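As an illustration of this storage step, the following is a minimal sketch using the official neo4j Python driver; the connection details and the node labels and relationship names (`Entity`, `Answer`, `Question`, `ATTRIBUTE`, `ASKS_ABOUT`) are hypothetical, since the patent does not specify a graph schema:

```python
from neo4j import GraphDatabase

# hypothetical local Neo4j instance and credentials
driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

def store_triple(entity, attribute, answer, question, embedding):
    """Store one <entity, attribute, answer> triple plus its question Embedding."""
    query = (
        "MERGE (e:Entity {name: $entity}) "
        "MERGE (a:Answer {text: $answer}) "
        "MERGE (e)-[:ATTRIBUTE {name: $attribute}]->(a) "
        "MERGE (q:Question {text: $question}) "
        "SET q.embedding = $embedding "
        "MERGE (q)-[:ASKS_ABOUT]->(e)"
    )
    with driver.session() as session:
        session.run(query, entity=entity, attribute=attribute,
                    answer=answer, question=question, embedding=embedding)

store_triple("Advanced Mathematics", "publisher", "Wuhan University Press",
             "Who published Advanced Mathematics?", [0.12, -0.05, 0.33])
```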
The question texts are labeled according to the triple entities to build the labeled corpus of the named-entity-recognition model, as shown in FIG. 2. Only the entity needs to be recognized, so labeling uses the tag set ["O", "B-LOC", "I-LOC"], where O marks non-entity characters, B-LOC marks the beginning character of an entity, and I-LOC marks the non-initial characters of an entity; a small worked example follows.
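The sentence below is an invented example, not taken from the patent's corpus, showing what one labeled question looks like under this scheme:

```python
# "高等数学的出版社是什么" -- "What is the publisher of Advanced Mathematics?"
chars = list("高等数学的出版社是什么")
tags = ["B-LOC", "I-LOC", "I-LOC", "I-LOC",  # entity: 高等数学 (Advanced Mathematics)
        "O", "O", "O", "O", "O", "O", "O"]   # all remaining characters are non-entities

assert len(chars) == len(tags)
for ch, tag in zip(chars, tags):
    print(ch, tag, sep="\t")
```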
S3: constructing the XLNet-BiGRU-CRF neural network model on the basis of the XLNet Chinese model trained in S1, and training the model with the data labeled in S2.
First, the labeled corpus is input into the trained XLNet Chinese model, which outputs feature vectors; the feature vectors are then input into the BiGRU neural network model. The GRU is in fact a simplification of the LSTM network and controls the passing and blocking of information through gates; the state update equations are

$$z_t = \sigma\left(w_z \cdot [h_{t-1}, x_t]\right),\qquad r_t = \sigma\left(w_r \cdot [h_{t-1}, x_t]\right)$$
$$\tilde{h}_t = \tanh\left(w_{\tilde{h}} \cdot [r_t \circ h_{t-1}, x_t]\right),\qquad h_t = (1 - z_t) \circ h_{t-1} + z_t \circ \tilde{h}_t$$

where $x_t$ is the input vector at the current time t, i.e., the feature vector of the t-th word in the text; $h_t$ and $h_{t-1}$ are the hidden-state vectors at the current and previous time steps; $\tilde{h}_t$ is the candidate hidden state at time t, i.e., the new memory at the current time; $z_t$ is the update gate, controlling how much state information from the previous time step is carried into the current state (the larger $z_t$, the more previous state is kept); $r_t$ is the reset gate, controlling how much state information from the previous time step is ignored (the smaller $r_t$, the more is discarded); $w_z$, $w_r$ and $w_{\tilde{h}}$ are the weight matrices of the update gate, the reset gate and the candidate hidden state, respectively; $\sigma$ denotes the sigmoid nonlinear activation function, tanh the tanh activation function, and $\circ$ the element-wise product of vectors.
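The following is a minimal NumPy sketch of a single GRU step following the equations above; the random weights and toy dimensions are stand-ins, not trained parameters:

```python
import numpy as np

rng = np.random.default_rng(0)
hidden, inputs = 16, 8

w_z = rng.normal(size=(hidden, hidden + inputs))  # update-gate weights
w_r = rng.normal(size=(hidden, hidden + inputs))  # reset-gate weights
w_h = rng.normal(size=(hidden, hidden + inputs))  # candidate-state weights

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(h_prev, x_t):
    hx = np.concatenate([h_prev, x_t])
    z_t = sigmoid(w_z @ hx)                                      # update gate
    r_t = sigmoid(w_r @ hx)                                      # reset gate
    h_cand = np.tanh(w_h @ np.concatenate([r_t * h_prev, x_t]))  # candidate state
    return (1 - z_t) * h_prev + z_t * h_cand                     # new hidden state

h = np.zeros(hidden)
for x in rng.normal(size=(5, inputs)):  # run over a toy 5-token sentence
    h = gru_step(h, x)
print(h.shape)  # (16,)
```

A BiGRU simply runs one such GRU over the sentence left-to-right and another right-to-left, concatenating the two hidden states at each position.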
The output of the BiGRU encoding unit is the matrix Z; after softmax probability normalization, Z is input to the CRF layer. For a given input sequence X, the score of a predicted output tag sequence $y = (y_1, y_2, \dots, y_n)$, where n is the number of words in the sentence, is defined as

$$S(X, y) = \sum_{t=1}^{n}\left(Z_{t, y_t} + A_{y_{t-1}, y_t}\right)$$

where $Z_{t, y_t}$ is an element of the BiGRU output matrix Z, and $A_{y_{t-1}, y_t}$ is an element of the transition matrix output by the CRF layer, representing the probability of transitioning from tag $y_{t-1}$ to tag $y_t$, so that more reasonable tag sequences are obtained by exploiting the dependencies between tags. The score of the whole tag sequence y is thus a sum of per-position scores, each consisting of two parts: one from the output probability matrix of the BiGRU encoding unit and one from the transition matrix of the CRF layer. Normalizing this score gives the final prediction probability of the tag sequence y:

$$p(y \mid X) = \frac{\exp\left(S(X, y)\right)}{\sum_{\tilde{y} \in Y}\exp\left(S(X, \tilde{y})\right)}$$

where Y denotes the set of all possible tag sequences. The loss function of the CRF layer is the negative log-likelihood:

$$L = -\log p(y \mid X) = -S(X, y) + \log\sum_{\tilde{y} \in Y}\exp\left(S(X, \tilde{y})\right)$$
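For concreteness, the following is a brute-force NumPy sketch of S(X, y) and the negative log-likelihood above, enumerating every tag sequence; this is feasible only for toy sizes (real CRF layers use the forward algorithm instead), and the random scores are stand-ins:

```python
import itertools

import numpy as np

rng = np.random.default_rng(0)
n_tags, seq_len = 3, 4                  # e.g. the tag set O / B-LOC / I-LOC

Z = rng.normal(size=(seq_len, n_tags))  # emission scores from the BiGRU encoder
A = rng.normal(size=(n_tags, n_tags))   # CRF transition scores A[y_{t-1}, y_t]

def score(y):
    """S(X, y): sum over positions of emission plus transition scores."""
    s = Z[0, y[0]]
    for t in range(1, len(y)):
        s += Z[t, y[t]] + A[y[t - 1], y[t]]
    return s

def neg_log_likelihood(y):
    """L = -S(X, y) + log sum over all tag sequences y' of exp(S(X, y'))."""
    all_scores = [score(c) for c in itertools.product(range(n_tags), repeat=seq_len)]
    log_partition = np.log(np.sum(np.exp(all_scores)))
    return -score(y) + log_partition

print(neg_log_likelihood((0, 1, 2, 0)))
```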
The parameters of the whole named-entity-recognition model, namely the model parameters of the BiGRU neural network and the CRF layer, are updated with the Adam algorithm using the CRF-layer loss function, while the parameters of the XLNet Chinese model are kept unchanged. Training terminates when the loss produced by the model meets the set requirement or the set maximum number of iterations is reached.
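A hedged PyTorch sketch of this training arrangement follows; the `xlnet` placeholder, the module shapes and the loss function are assumptions standing in for the actual pretrained encoder and CRF loss:

```python
import torch

xlnet = torch.nn.Linear(32, 32)  # placeholder for the pretrained XLNet Chinese encoder
bigru = torch.nn.GRU(32, 16, bidirectional=True, batch_first=True)
crf_transitions = torch.nn.Parameter(torch.randn(3, 3))  # placeholder CRF transition matrix

for p in xlnet.parameters():
    p.requires_grad = False  # XLNet parameters are frozen, i.e., kept unchanged

# only the BiGRU and CRF parameters are handed to Adam
optimizer = torch.optim.Adam(list(bigru.parameters()) + [crf_transitions], lr=1e-3)

def loss_fn(emissions):
    """Placeholder for the CRF negative log-likelihood L."""
    return emissions.pow(2).mean() + crf_transitions.pow(2).mean()

x = torch.randn(2, 5, 32)        # toy batch: 2 sentences, 5 tokens, dimension 32
emissions, _ = bigru(xlnet(x))   # XLNet features -> BiGRU emissions
loss = loss_fn(emissions)
optimizer.zero_grad()
loss.backward()
optimizer.step()                 # updates BiGRU/CRF only; XLNet stays frozen
```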
S4: performing entity recognition on the text of the user question to be recognized with the XLNet-BiGRU-CRF model trained in S3 to obtain the recognition result, which mainly comprises the following steps:
S4-1: inputting the text data to be recognized into the trained XLNet-BiGRU-CRF neural network model;
S4-2: the text data is converted into feature vectors by the XLNet Chinese model, features are extracted from these vectors by the BiGRU network, and finally the most probable tag sequence for the text is solved by the Viterbi algorithm in the CRF layer; this is the result of named entity recognition, as sketched below.
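The following is a minimal NumPy sketch of Viterbi decoding over the emission matrix Z and transition matrix A from the CRF section; the toy shapes and random scores are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
n_tags, seq_len = 3, 6
Z = rng.normal(size=(seq_len, n_tags))  # per-position tag scores from the BiGRU
A = rng.normal(size=(n_tags, n_tags))   # tag-to-tag transition scores

def viterbi(Z, A):
    """Return the tag sequence maximizing S(X, y) by dynamic programming."""
    seq_len, n_tags = Z.shape
    dp = Z[0].copy()                            # best score ending in each tag so far
    back = np.zeros((seq_len, n_tags), dtype=int)
    for t in range(1, seq_len):
        cand = dp[:, None] + A + Z[t][None, :]  # score of (previous tag -> current tag)
        back[t] = cand.argmax(axis=0)           # best previous tag for each current tag
        dp = cand.max(axis=0)
    path = [int(dp.argmax())]
    for t in range(seq_len - 1, 0, -1):         # follow the back-pointers
        path.append(int(back[t, path[-1]]))
    return path[::-1]

print(viterbi(Z, A))  # a list of 6 tag indices
```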
S5: according to the named-entity-recognition result, several related questions containing the corresponding entity are extracted from the Neo4j database; the Embedding sentence vector of the user question to be recognized is extracted with the XLNet Chinese model and compared, by cosine similarity, with the Embedding sentence vectors of the related questions stored in S2; the answer of the question with the highest similarity is taken as the target result, while the questions ranked second and third are also provided to the user as similar questions for reference. The corresponding cosine similarity is computed as follows:
$$\text{score} = \frac{V_{\text{query}} \cdot V_{\text{corpus}}}{\left\lVert V_{\text{query}}\right\rVert\,\left\lVert V_{\text{corpus}}\right\rVert}$$

where score is the similarity value, $V_{\text{query}}$ is the Embedding sentence vector of the user question, and $V_{\text{corpus}}$ is the Embedding sentence vector of the related question.
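A minimal sketch of this ranking step follows; the vectors and candidate questions are invented placeholders rather than real corpus data:

```python
import numpy as np

def cosine(v_query, v_corpus):
    """score = (V_query . V_corpus) / (||V_query|| * ||V_corpus||)."""
    return float(v_query @ v_corpus /
                 (np.linalg.norm(v_query) * np.linalg.norm(v_corpus)))

rng = np.random.default_rng(0)
v_query = rng.normal(size=128)            # Embedding sentence vector of the user question
candidates = {f"related question {i}": rng.normal(size=128) for i in range(5)}

ranked = sorted(candidates, key=lambda q: cosine(v_query, candidates[q]), reverse=True)
target, similar = ranked[0], ranked[1:3]  # answer source plus two reference questions
print(target, similar)
```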
The above is only a preferred embodiment of the present invention, and the scope of protection of the present invention is not limited to the above embodiment; all technical solutions falling under the idea of the present invention belong to its scope of protection. It should be noted that those skilled in the art may make modifications and refinements without departing from the principle of the invention, and these should also be regarded as within the scope of protection of the present invention.

Claims (7)

1. An intelligent question-answering method based on XLNet-BiGRU-CRF, characterized by comprising the following steps:
step 1: training an XLNet Chinese model on large-scale unlabeled corpora, wherein the XLNet Chinese model comprises a permutation language model, a dual-stream attention mechanism and a Transformer-XL core component;
step 2: acquiring corpus data for constructing a knowledge graph and a named-entity-recognition model, preprocessing and labeling the corpus data, storing the triple data obtained by the preprocessing into a Neo4j database, extracting, with the XLNet Chinese model trained in step 1, the Embedding sentence vector of the question corresponding to each triple, and storing the Embedding sentence vectors into the Neo4j database, wherein each triple consists of a question entity, a question attribute and an answer;
step 3: constructing an XLNet-BiGRU-CRF neural network model based on the XLNet Chinese model trained in step 1, and training the XLNet-BiGRU-CRF model with the corpus data labeled in step 2;
step 4: performing entity recognition on the text of the user question to be recognized with the trained XLNet-BiGRU-CRF model to obtain an entity-recognition result;
step 5: extracting from the Neo4j database, according to the entity-recognition result of step 4, the related triple data containing the corresponding entity, extracting the Embedding sentence vector of the user question to be recognized with the XLNet Chinese model, comparing it by cosine similarity with the Embedding sentence vectors of the questions corresponding to the related triples, taking the answer of the question with the highest similarity score as the target result, and meanwhile providing the questions and answers of the related triples ranked second and third in similarity to the user as similar questions for reference.
2. The XLNet-BiGRU-CRF-based intelligent question-answering method of claim 1, wherein the permutation language model in step 1 is used to randomly shuffle the order of the Chinese characters in a text sentence; given a text sequence of length T, the set of permutations of the character order is $A_T$, with $a \in A_T$ one such permutation, and the modeling objective of the permutation language model is expressed as

$$\max_{\theta}\ \mathbb{E}_{a \sim A_T}\left[\sum_{t=1}^{T}\log p_{\theta}\left(x_{a_t}\mid x_{a_{<t}}\right)\right]$$

wherein $\mathbb{E}_{a \sim A_T}$ denotes the expectation over all permutations, $x_{a_t}$ is the t-th element of the text sequence under permutation $a$, $x_{a_{<t}}$ denotes the 1st through (t-1)-th elements under permutation $a$, $\theta$ is the set of model parameters to be trained, and $p_{\theta}$ denotes the conditional probability.
3. The XLNet-BiGRU-CRF-based intelligent question-answering method of claim 1, wherein the dual-stream attention mechanism in step 1 comprises a content attention stream and a query attention stream, the content attention stream being a self-attention mechanism carrying position information and content information, and the query attention stream being an input stream carrying only position information, so that the content information of the current position is not revealed when that position is to be predicted; the two streams are combined to extract context-related features; the dual-stream attention mechanism is expressed as

$$g_{a_t}^{(m)} \leftarrow \operatorname{Attention}\left(Q = g_{a_t}^{(m-1)},\ KV = h_{a_{<t}}^{(m-1)};\ \theta\right)$$
$$h_{a_t}^{(m)} \leftarrow \operatorname{Attention}\left(Q = h_{a_t}^{(m-1)},\ KV = h_{a_{\le t}}^{(m-1)};\ \theta\right)$$

wherein $g_{a_t}^{(m)}$ and $g_{a_t}^{(m-1)}$ are the query attention-stream matrix vectors of the m-th and (m-1)-th layers, containing only the position information of the input text; $h_{a_t}^{(m)}$ and $h_{a_t}^{(m-1)}$ are the content attention-stream matrix vectors of the m-th and (m-1)-th layers, containing the content information and position information of the input text; $h_{a_{<t}}^{(m-1)}$ is the content attention-stream matrix vector of the (m-1)-th layer for the 1st through (t-1)-th elements under permutation $a$; and Attention denotes the classic self-attention mechanism, computed as

$$\operatorname{Attention}(Q, K, V) = \operatorname{softmax}\left(\frac{QK^{\top}}{\sqrt{\dim}}\right)V$$

wherein Q, K and V are the input word-vector matrices and dim is the dimension of the input vectors.
4. The XLNet-BiGRU-CRF-based intelligent question-answering method of claim 1, wherein in step 1 the XLNet Chinese model takes the Transformer-XL framework as its core and introduces a recurrence mechanism and a relative-position-coding mechanism to exploit contextual semantic information and extract the latent relations in text vectors.
5. The XLNet-BiGRU-CRF-based intelligent question-answering method of claim 1, wherein in step 3 the feature vectors output by the XLNet Chinese model are input to the BiGRU network, and the BiGRU network controls the passing and blocking of information through gates, with the state update equations

$$z_t = \sigma\left(w_z \cdot [h_{t-1}, x_t]\right),\qquad r_t = \sigma\left(w_r \cdot [h_{t-1}, x_t]\right)$$
$$\tilde{h}_t = \tanh\left(w_{\tilde{h}} \cdot [r_t \circ h_{t-1}, x_t]\right),\qquad h_t = (1 - z_t) \circ h_{t-1} + z_t \circ \tilde{h}_t$$

wherein $x_t$ is the input vector at the current time t, i.e., the feature vector of the t-th word in the text; $h_t$ and $h_{t-1}$ are the hidden-state vectors at the current and previous time steps; $\tilde{h}_t$ is the candidate hidden state at the current time, i.e., the new memory at the current time; $z_t$ is the update gate controlling how much state information from the previous time step is carried into the current state, with larger $z_t$ keeping more; $r_t$ is the reset gate controlling how much state information from the previous time step is ignored, with smaller $r_t$ discarding more; $w_z$, $w_r$ and $w_{\tilde{h}}$ are the weight matrices of the update gate, the reset gate and the candidate hidden state, respectively; $\sigma$ denotes the sigmoid nonlinear activation function, tanh the tanh activation function, and $\circ$ the element-wise product of vectors;

the output of the BiGRU encoding unit is the matrix Z, and after softmax probability normalization Z is input to the CRF layer; for a given input sequence X, the score of a predicted output tag sequence $y = (y_1, y_2, \dots, y_n)$, n being the number of words in the sentence, is defined as

$$S(X, y) = \sum_{t=1}^{n}\left(Z_{t, y_t} + A_{y_{t-1}, y_t}\right)$$

wherein $Z_{t, y_t}$ is an element of the BiGRU output matrix Z and $A_{y_{t-1}, y_t}$ is an element of the transition matrix output by the CRF layer, representing the probability of transitioning from tag $y_{t-1}$ to tag $y_t$; the above formula is normalized to obtain the final prediction probability p(y|X) of the tag sequence y,

$$p(y \mid X) = \frac{\exp\left(S(X, y)\right)}{\sum_{\tilde{y} \in Y}\exp\left(S(X, \tilde{y})\right)}$$

wherein Y denotes the set of all possible tag sequences and $\tilde{y}$ is one of them; the loss function L of the CRF layer adopts the negative log-likelihood,

$$L = -\log p(y \mid X) = -S(X, y) + \log\sum_{\tilde{y} \in Y}\exp\left(S(X, \tilde{y})\right)$$

and the parameters of the whole named-entity-recognition model, namely the model parameters of the BiGRU network and the CRF layer, are trained and updated with the Adam algorithm using the CRF-layer loss function, while the parameters of the XLNet Chinese model are kept unchanged.
6. The XLNet-BiGRU-CRF-based intelligent question-answering method of claim 1, wherein in step 4 the text of the user question to be recognized is input into the trained XLNet-BiGRU-CRF model, the text is converted into feature vectors by the XLNet Chinese model, features are extracted from these vectors by the BiGRU network, and finally the most probable tag sequence for the text is obtained by the Viterbi algorithm in the CRF layer as the result of named entity recognition.
7. The XLNet-BiGRU-CRF-based intelligent question-answering method of claim 1, wherein the cosine similarity in step 5 is computed as

$$\text{score} = \frac{V_{\text{query}} \cdot V_{\text{corpus}}}{\left\lVert V_{\text{query}}\right\rVert\,\left\lVert V_{\text{corpus}}\right\rVert}$$

wherein score is the similarity value, $V_{\text{query}}$ is the Embedding sentence vector of the user question, and $V_{\text{corpus}}$ is the Embedding sentence vector of the related question.
CN202110913182.1A 2021-08-10 2021-08-10 Intelligent question-answering method based on XLnet model and knowledge graph Active CN113641809B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110913182.1A CN113641809B (en) 2021-08-10 2021-08-10 Intelligent question-answering method based on XLnet model and knowledge graph


Publications (2)

Publication Number Publication Date
CN113641809A (en) 2021-11-12
CN113641809B (en) 2023-12-08

Family

ID=78420446

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110913182.1A Active CN113641809B (en) 2021-08-10 2021-08-10 Intelligent question-answering method based on XLnet model and knowledge graph

Country Status (1)

Country Link
CN (1) CN113641809B (en)


Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160357855A1 (en) * 2015-06-02 2016-12-08 International Business Machines Corporation Utilizing Word Embeddings for Term Matching in Question Answering Systems
CN110083831A (en) * 2019-04-16 2019-08-02 武汉大学 A kind of Chinese name entity recognition method based on BERT-BiGRU-CRF
CN112650845A (en) * 2020-12-30 2021-04-13 西安交通大学 Question-answering system and method based on BERT and knowledge representation learning
CN113010693A (en) * 2021-04-09 2021-06-22 大连民族大学 Intelligent knowledge graph question-answering method fusing pointer to generate network

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
梁淑蓉 et al.: "XLNet-based sentiment analysis model" (基于XLNet的情感分析模型), Science Technology and Engineering (科学技术与工程), vol. 21, no. 17, pp. 7200-7207 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114582449A (en) * 2022-01-17 2022-06-03 内蒙古大学 Electronic medical record named entity standardization method and system based on XLNet-BiGRU-CRF model
CN114970563A (en) * 2022-07-28 2022-08-30 山东大学 Chinese question generation method and system fusing content and form diversity
CN114970563B (en) * 2022-07-28 2022-10-25 山东大学 Chinese question generation method and system fusing content and form diversity

Also Published As

Publication number Publication date
CN113641809B (en) 2023-12-08

Similar Documents

Publication Publication Date Title
CN110298037B (en) Convolutional neural network matching text recognition method based on enhanced attention mechanism
CN112115238B (en) Question-answering method and system based on BERT and knowledge base
CN111985239B (en) Entity identification method, entity identification device, electronic equipment and storage medium
CN110609891A (en) Visual dialog generation method based on context awareness graph neural network
CN109508459B (en) Method for extracting theme and key information from news
CN111046179B (en) Text classification method for open network question in specific field
CN108363743A (en) A kind of intelligence questions generation method, device and computer readable storage medium
CN106569998A (en) Text named entity recognition method based on Bi-LSTM, CNN and CRF
CN112487820B (en) Chinese medical named entity recognition method
CN110222163A (en) A kind of intelligent answer method and system merging CNN and two-way LSTM
CN114943230B (en) Method for linking entities in Chinese specific field by fusing common sense knowledge
CN111914556B (en) Emotion guiding method and system based on emotion semantic transfer pattern
CN111858896B (en) Knowledge base question-answering method based on deep learning
CN112749562A (en) Named entity identification method, device, storage medium and electronic equipment
CN111159345B (en) Chinese knowledge base answer acquisition method and device
CN113297364A (en) Natural language understanding method and device for dialog system
CN113641809B (en) Intelligent question-answering method based on XLnet model and knowledge graph
CN113051922A (en) Triple extraction method and system based on deep learning
CN114443813A (en) Intelligent online teaching resource knowledge point concept entity linking method
CN115169349A (en) Chinese electronic resume named entity recognition method based on ALBERT
CN115391520A (en) Text emotion classification method, system, device and computer medium
CN111444720A (en) Named entity recognition method for English text
CN113011196B (en) Concept-enhanced representation and one-way attention-containing subjective question automatic scoring neural network model
CN114548106A (en) Method for recognizing science collaborative activity named entity based on ALBERT
CN116522165B (en) Public opinion text matching system and method based on twin structure

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant