CN113641809B - Intelligent question-answering method based on XLnet model and knowledge graph - Google Patents

Intelligent question-answering method based on XLnet model and knowledge graph Download PDF

Info

Publication number
CN113641809B
CN113641809B (application CN202110913182.1A)
Authority
CN
China
Prior art keywords
model
xlnet
text
vector
attention
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110913182.1A
Other languages
Chinese (zh)
Other versions
CN113641809A (en)
Inventor
刘大伟
胡笳
车少帅
张邱鸣
张玮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
CLP Hongxin Information Technology Co., Ltd.
Original Assignee
CLP Hongxin Information Technology Co., Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by CLP Hongxin Information Technology Co., Ltd.
Priority to CN202110913182.1A
Publication of CN113641809A
Application granted
Publication of CN113641809B
Legal status: Active
Anticipated expiration

Links

Classifications

    • G PHYSICS
        • G06 COMPUTING; CALCULATING OR COUNTING
            • G06F ELECTRIC DIGITAL DATA PROCESSING
                • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
                    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
                        • G06F16/33 Querying
                            • G06F16/332 Query formulation
                                • G06F16/3329 Natural language query formulation or dialogue systems
                • G06F40/00 Handling natural language data
                    • G06F40/10 Text processing
                        • G06F40/166 Editing, e.g. inserting or deleting
                            • G06F40/169 Annotation, e.g. comment data or footnotes
                    • G06F40/20 Natural language analysis
                        • G06F40/205 Parsing
                        • G06F40/279 Recognition of textual entities
                            • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
                                • G06F40/295 Named entity recognition
                    • G06F40/30 Semantic analysis
            • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
                • G06N3/00 Computing arrangements based on biological models
                    • G06N3/02 Neural networks
                        • G06N3/04 Architecture, e.g. interconnection topology
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
        • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
            • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
                • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Human Computer Interaction (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses an intelligent question-answering method based on the XLNet model and a knowledge graph, comprising the following steps: training an XLNet Chinese model; acquiring corpus data; constructing and training an XLNet-BiGRU-CRF neural network model; performing entity recognition on the text of the user question to be processed; and extracting, according to the entity recognition result, several related questions containing the corresponding entity from the database, comparing the user question against each related question by cosine similarity of their Embedding sentence vectors, taking the answer of the related question with the highest similarity score as the target result, and at the same time offering the related questions ranked second and third by similarity score to the user as similar questions for reference. The method processes the text of the user question with the trained models and, combined with knowledge-graph retrieval, obtains answers to questions more quickly and accurately.

Description

Intelligent question-answering method based on XLnet model and knowledge graph
Technical Field
The invention belongs to the technical field of intelligent question answering, and in particular relates to an intelligent question-answering method based on the XLNet model and a knowledge graph.
Background
In recent years, with the development of big data and artificial intelligence technology, question-answering systems have been applied across many industries and have become a key component of intelligent robots, forming an important link in communication between robots and humans.
Conventional question-answering systems are generally based on keyword retrieval and do not consider the semantic information of the question. A question-answering system based on a knowledge graph can analyze the text of a specific question posed by a questioner online, then retrieve and output the best-matching answer, so an accurate answer to the question can be obtained quickly. A knowledge graph typically stores data in triple format, for example "<Higher Mathematics> <publisher> <Wuhan University Press>", where "Higher Mathematics" and "Wuhan University Press" are two entities and "publisher" is the relation between them. The input to such a question-answering system is a sentence-level text query; the system then finds the triple or set of triples most relevant to the query in the knowledge base and returns the corresponding entities in those triples.
The current mainstream methods are relation-classification-based methods, retrieval-based methods and semantic-parsing-based methods. Taking relation classification as an example, such a method predicts the entity and the relation from the question, then finds the answer entity according to that entity and relation. These methods share the drawback that the prediction model must be trained on questions paired with corresponding logical-expression data; compared with constructing a knowledge graph, labeling such a specialized data set costs more and requires annotators to master certain expertise, including domain knowledge and query-language knowledge. Semantic-parsing-based methods, meanwhile, face a gap between logical expressions and natural-language semantics. At the same time, compared with cutting-edge models such as BERT and XLNet (Generalized Autoregressive Pretraining for Language Understanding), common models such as CNN and LSTM train less effectively, are less accurate, and lack correlation analysis between the characters or words in the question text.
Disclosure of Invention
Aiming at the above defects in the prior art, the invention provides an intelligent question-answering method based on the XLNet model and a knowledge graph. To achieve this purpose, the invention adopts the following technical scheme:
an intelligent question-answering method based on an XLnet model and a knowledge graph comprises the following steps:
step 1: training an XLNet Chinese model on a large-scale unlabeled corpus, wherein the XLNet model comprises a permutation language model, a dual-stream attention mechanism and the Transformer-XL core component;
step 2: acquiring training corpus data for constructing the knowledge graph and the named entity recognition model, preprocessing and labeling the training corpus data, storing the triple data obtained by preprocessing in a Neo4j database, and extracting, with the XLNet Chinese model trained in step 1, the Embedding sentence vectors of the questions corresponding to the triple data and storing them in the Neo4j database; each triple consists of a question entity, a question attribute and an answer;
step 3: constructing an XLNet-BiGRU-CRF neural network model on the basis of the XLNet Chinese model trained in step 1, and training the XLNet-BiGRU-CRF model with the training corpus data labeled in step 2;
step 4: performing entity recognition on the text of the user question to be recognized with the trained XLNet-BiGRU-CRF model to obtain the entity recognition result;
step 5: extracting from the Neo4j database, according to the entity recognition result of step 4, the related triple data containing the corresponding entity; extracting the Embedding sentence vector of the user question to be recognized with the XLNet Chinese model and comparing it by cosine similarity against the Embedding sentence vectors of the questions corresponding to the extracted related triple data; taking the answer corresponding to the question with the highest similarity score as the target result, while providing the questions and answers of the related triples ranked second and third by similarity score to the user as similar questions for reference.
Further, the permutation language model described in step 1 is used to randomly shuffle the order of the Chinese characters in text sentences. For a given text sequence of length T, the permutations of the Chinese characters in all different orders form the set $A_T$, and $a \in A_T$ denotes one such permutation. The modeling process of the permutation language model is expressed as

$$\max_{\theta}\; \mathbb{E}_{a \sim A_T}\left[ \sum_{t=1}^{T} \log p_{\theta}\left( x_{a_t} \mid x_{a_{<t}} \right) \right]$$

where $\mathbb{E}_{a \sim A_T}$ denotes the expectation over all permutations, $x_{a_t}$ is the t-th element of the text sequence under permutation a, $x_{a_{<t}}$ denotes the 1st to (t-1)-th elements of the text sequence under permutation a, $\theta$ is the model parameter to be trained, and $p_{\theta}$ denotes the conditional probability.
Further, the dual-stream attention mechanism in step 1 includes a text content attention stream and a query attention stream. The text content attention stream is a self-attention mechanism containing both position information and content information; the query attention stream is an input stream containing only position information, so that the content information of the current position is not revealed when the required position is predicted. The two streams are combined to extract the features of the relevant context. The dual-stream attention mechanism is expressed as follows:

query attention stream: $g_{a_t}^{(m)} = \mathrm{Attention}\left( Q = g_{a_t}^{(m-1)},\; KV = h_{a_{<t}}^{(m-1)};\; \theta \right)$

content attention stream: $h_{a_t}^{(m)} = \mathrm{Attention}\left( Q = h_{a_t}^{(m-1)},\; KV = h_{a_{\le t}}^{(m-1)};\; \theta \right)$

where $g_{a_t}^{(m)}$ and $g_{a_t}^{(m-1)}$ are the query attention stream matrix vectors of the m-th and (m-1)-th layers, containing only the position information of the input text; $h_{a_t}^{(m)}$ and $h_{a_t}^{(m-1)}$ are the content attention stream matrix vectors of the m-th and (m-1)-th layers; and $h_{a_{<t}}^{(m-1)}$ is the (m-1)-th-layer content attention stream matrix vector over the 1st to (t-1)-th elements of the text sequence under permutation a. Attention denotes the classical self-attention mechanism, calculated as

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left( \frac{Q K^{\top}}{\sqrt{dim}} \right) V$$

where Q, K, V are the input word vector matrices and dim is the input vector dimension.
Further, in step 1, the XLNet Chinese model takes the Transformer-XL framework as its core and introduces a recurrence mechanism and a relative position encoding mechanism to exploit the semantic information of the context and mine the latent relations in the text vectors.
In step 3, the feature vectors output by the XLNet Chinese model are input to the BiGRU network, which transmits and truncates information through gates. The state calculation formulas are

$$z_t = \sigma\left( w_z \cdot [h_{t-1}, x_t] \right)$$
$$r_t = \sigma\left( w_r \cdot [h_{t-1}, x_t] \right)$$
$$\tilde{h}_t = \tanh\left( w_{\tilde{h}} \cdot [r_t * h_{t-1}, x_t] \right)$$
$$h_t = (1 - z_t) * h_{t-1} + z_t * \tilde{h}_t$$

where $x_t$ is the input vector at the current time t, i.e. the feature vector of the t-th word in the text; $h_t$ and $h_{t-1}$ are the hidden-layer state matrix vectors at the current time t and the previous time; $\tilde{h}_t$ is the candidate hidden-layer state at time t, which is also the new memory at the current time; $z_t$ is the update gate, controlling how much of the previous state is carried over versus replaced by the candidate state; $r_t$ is the reset gate, controlling the degree to which the state information of the previous time is ignored when forming the candidate state, a smaller $r_t$ indicating that more is discarded; $w_z$, $w_r$ and $w_{\tilde{h}}$ are the weight matrices of the update gate, the reset gate and the candidate hidden state respectively; $\sigma$ denotes the sigmoid nonlinear activation function, tanh the tanh activation function, and $*$ the element-wise (point) multiplication of vectors.
The output vector of the BiGRU network encoding unit is Z; after softmax probability normalization, the output vector Z is input to the CRF layer. For a given input sequence X, the score of a predicted output tag sequence y is defined as S(X, y), where y = (y_1, y_2, …, y_n) denotes the tag sequence of a sentence containing n words. S(X, y) is calculated as

$$S(X, y) = \sum_{t=1}^{n} P_{t, y_t} + \sum_{t=0}^{n} A_{y_t, y_{t+1}}$$

where $P_{t, y_t}$ is an element of the output vector Z of the BiGRU network encoding unit, and $A_{y_{t-1}, y_t}$ is an element of the probability transition matrix output by the CRF layer, representing the transition probability from tag $y_{t-1}$ to tag $y_t$; exploiting the dependencies between tags yields more reasonable tag sequences. The score of the whole tag sequence y is thus a sum of per-position scores, each formed from two parts: the output probability matrix of the BiGRU network encoding unit and the transition probability matrix of the CRF layer. The final predicted probability of the tag sequence y is obtained by normalizing this score:

$$p(y \mid X) = \frac{e^{S(X, y)}}{\sum_{\tilde{y} \in Y} e^{S(X, \tilde{y})}}$$

where Y represents all possible tag sequences and $\tilde{y}$ is one of them.

The loss function L of the CRF layer adopts the negative log-likelihood:

$$L = -\log p(y \mid X) = -S(X, y) + \log \sum_{\tilde{y} \in Y} e^{S(X, \tilde{y})}$$

The Adam algorithm is adopted with the loss function of the CRF layer to train and update the parameters of the whole named entity recognition model, including the model parameters of the BiGRU network and the CRF layer, while the parameters of the XLNet Chinese model are kept unchanged.
Further, in step 4, the text of the user question to be recognized is input into the trained XLNet-BiGRU-CRF model; the text is converted into feature vectors by the XLNet Chinese model, the feature vectors undergo feature extraction through the BiGRU network, and finally the most probable labeling sequence of the text is obtained at the CRF layer with the Viterbi algorithm as the result of named entity recognition.
Further, the cosine similarity in step 5 is calculated as:

$$score = \frac{V_{query} \cdot V_{corpus}}{\lVert V_{query} \rVert \, \lVert V_{corpus} \rVert}$$

where score is the similarity value, $V_{query}$ is the Embedding sentence vector of the user question, and $V_{corpus}$ is the Embedding sentence vector of a related question.
The invention has the following advantages and beneficial effects: (1) the XLNet model used in the invention is obtained by unsupervised training on large-scale unlabeled data, and training based on the permutation language model better integrates contextual semantic information, giving strong text-feature expression capability; (2) based on the knowledge graph and the Neo4j database, the stored data set can be visualized more conveniently and retrieval speed is improved; (3) the XLNet model has strong text-feature expression capability, the introduction of the bidirectional GRU recurrent structure better achieves joint encoding of contextual information, and attaching the CRF layer effectively remedies the failure of traditional entity recognition to consider the dependencies between tags; combining the three further improves the accuracy of the recognition result.
Drawings
FIG. 1 is a flow chart of the method of the present invention.
FIG. 2 is a sample of a preprocessed corpus used to construct a named entity recognition model.
Detailed Description
The present invention is further described below with reference to the accompanying drawings, to facilitate understanding of its structure, features and technical content.
As shown in fig. 1, the method of the present invention comprises the steps of:
s1, training an XLNet Chinese model based on a large-scale non-labeling corpus.
The XLNet Chinese model mainly comprises a permutation language model, a dual-stream attention mechanism and the Transformer-XL core component. The purpose of the permutation language model is to randomly shuffle the Chinese characters in a text sentence: for a Chinese character $x_i$, the sequence of characters $\{x_{i+1}, \dots, x_n\}$ originally appearing after it may also appear before it. Assume that all permutations of a text sequence of length T form the set $A_T$, and let $a \in A_T$ be one such permutation. The modeling process of the permutation language model is expressed as

$$\max_{\theta}\; \mathbb{E}_{a \sim A_T}\left[ \sum_{t=1}^{T} \log p_{\theta}\left( x_{a_t} \mid x_{a_{<t}} \right) \right]$$

where $x_{a_t}$ is the t-th element of the text sequence under permutation a, $x_{a_{<t}}$ denotes the 1st to (t-1)-th elements of the text sequence under permutation a, $\theta$ is the model parameter to be trained, and $p_{\theta}$ denotes the conditional probability.
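To make the objective above concrete, here is a minimal Python sketch of how one sampled permutation $a \in A_T$ determines the context $x_{a_{<t}}$ available for each prediction; this is a toy illustration, not the patent's implementation:

```python
import random

def permutation_steps(tokens):
    """For one sampled factorization order a, pair each prediction target
    x_{a_t} with the context x_{a<t} it is allowed to condition on."""
    order = list(range(len(tokens)))
    random.shuffle(order)                          # a ~ A_T: one random permutation
    steps = []
    for t, pos in enumerate(order):
        context = [tokens[p] for p in order[:t]]   # x_{a<t}
        steps.append((tokens[pos], context))
    return steps

# Toy run: every character is predicted exactly once, but the context it sees
# mixes characters that originally appeared before and after it.
for target, context in permutation_steps(["高", "等", "数", "学"]):
    print(f"predict {target!r} given {context}")
```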
XLNet adopts a dual-stream attention mechanism, in which the text content attention stream is a self-attention mechanism containing both position information and content information, while the query attention stream is an input stream containing only position information, so that no content information of the current position is revealed when the query attention stream is used to predict the required position. The two streams complement each other and better extract the features of the relevant context. The dual-stream attention mechanism is expressed as follows:

query attention stream: $g_{a_t}^{(m)} = \mathrm{Attention}\left( Q = g_{a_t}^{(m-1)},\; KV = h_{a_{<t}}^{(m-1)};\; \theta \right)$

content attention stream: $h_{a_t}^{(m)} = \mathrm{Attention}\left( Q = h_{a_t}^{(m-1)},\; KV = h_{a_{\le t}}^{(m-1)};\; \theta \right)$

where $g_{a_t}^{(m)}$ and $g_{a_t}^{(m-1)}$ are the query attention stream matrix vectors of the m-th and (m-1)-th layers, containing only the position information of the input text, and $h_{a_t}^{(m)}$ and $h_{a_t}^{(m-1)}$ are the content attention stream matrix vectors of the m-th and (m-1)-th layers, containing both the content information and the position information of the input text. Attention denotes the classical self-attention mechanism, calculated as

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left( \frac{Q K^{\top}}{\sqrt{dim}} \right) V$$

where Q, K, V are the input word vector matrices and dim is the input vector dimension.
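The classical self-attention formula above can be reproduced in a few lines of NumPy; the following is an illustrative sketch of the computation, not code from the patent:

```python
import numpy as np

def attention(Q, K, V):
    """Classical scaled dot-product attention: softmax(Q K^T / sqrt(dim)) V."""
    dim = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(dim)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ V

rng = np.random.default_rng(0)
Q = K = V = rng.normal(size=(4, 8))   # 4 tokens, 8-dimensional word vectors
print(attention(Q, K, V).shape)       # (4, 8): one context vector per token
```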
The XLNet Chinese model takes the Transformer-XL framework as its core, introducing a recurrence mechanism and relative position encoding, and can thus better exploit contextual semantic information to mine latent relations in text vectors.
The XLNet Chinese model is trained on large-scale unlabeled data to obtain the corresponding model parameters; the feature vector representation of an input sequence can then be obtained by inference.
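As a sketch of this inference step, the snippet below obtains per-token feature vectors and a mean-pooled Embedding sentence vector. It assumes the Hugging Face transformers library and the public hfl/chinese-xlnet-base checkpoint; the patent names neither, so treat both as stand-ins for the trained XLNet Chinese model:

```python
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("hfl/chinese-xlnet-base")
model = AutoModel.from_pretrained("hfl/chinese-xlnet-base")

def sentence_embedding(text: str) -> torch.Tensor:
    """Mean-pool the last hidden states into one Embedding sentence vector."""
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state   # (1, seq_len, 768)
    return hidden.mean(dim=1).squeeze(0)

vec = sentence_embedding("高等数学的出版社是哪个?")
print(vec.shape)   # torch.Size([768])
```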
S2, training corpus data for constructing a knowledge graph and a named entity recognition model are obtained, and the data are preprocessed and labeled.
The triple data obtained by preprocessing the corpus data is stored in a Neo4j database to construct the knowledge graph; the data set generally consists of triples composed of a question entity, a question attribute and an answer (for example <Higher Mathematics> <publisher> <Wuhan University Press>). Meanwhile, the Embedding sentence vector of each triple's question is extracted with the XLNet Chinese model trained in S1 and stored in the Neo4j database.
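A minimal sketch of this storage step with the official neo4j Python driver follows; the node labels, relationship type, property names and credentials are illustrative choices, as the patent does not prescribe a schema:

```python
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

def store_triple(tx, entity, attribute, answer, question, embedding):
    """MERGE one <question entity, attribute, answer> triple and attach the
    question text and its Embedding sentence vector to the relationship."""
    tx.run(
        "MERGE (e:Entity {name: $entity}) "
        "MERGE (ans:Answer {text: $answer}) "
        "MERGE (e)-[r:ATTRIBUTE {name: $attribute}]->(ans) "
        "SET r.question = $question, r.embedding = $embedding",
        entity=entity, attribute=attribute, answer=answer,
        question=question, embedding=embedding,
    )

with driver.session() as session:
    session.execute_write(
        store_triple, "高等数学", "出版社", "武汉大学出版社",
        "高等数学的出版社是哪个?",
        [0.12, -0.33, 0.05],   # truncated stand-in for the real sentence vector
    )
driver.close()
```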
The question texts are labeled according to the triple entities to construct the labeled corpus for the named entity recognition model. Since only entities need to be recognized, the labels ["O", "B-LOC", "I-LOC"] are used, where O marks other, non-entity characters, B-LOC marks the start of an entity, and I-LOC marks a non-initial character of an entity, as shown in fig. 2.
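A small sketch of the resulting character-level labeling, with an invented sentence in the spirit of fig. 2:

```python
# 高等数学 is the entity: B-LOC on its first character, I-LOC on the rest;
# every other character is labeled O.
chars = list("高等数学的出版社是哪个")
labels = ["B-LOC", "I-LOC", "I-LOC", "I-LOC"] + ["O"] * 7
for c, l in zip(chars, labels):
    print(c, l)
```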
S3, constructing an XLNet-BiGRU-CRF neural network model on the basis of the XLNet Chinese model trained in S1, and training the model with the data labeled in S2.
Firstly, the labeled corpus is input into the trained XLNet Chinese model, which outputs feature vectors; the feature vectors are then input into the BiGRU neural network model. The BiGRU network is in fact a simplification of the LSTM network; it transmits and truncates information through gates, and its state calculation formulas are as follows:
wherein x is t An input vector representing the current t moment, and a feature vector representing the t word in the text; h is a t 、h t-1 The hidden layer state matrix vectors at the current time t and the previous time t are respectively represented;the candidate hidden layer state at the current time t is also the new memory at the current time. z t Indicating the update gate for controlling the extent to which the state information of the previous moment is brought into the current state, z t The larger the value of (c) indicates the more state information remains at the previous time; r is (r) t Represents a reset gate for controlling the degree of ignoring state information of the previous moment, r t Smaller values of (c) indicate more rejection. w (w) z 、w r 、/>Respectively representing the weight matrix of the update gate, the reset gate and the candidate hidden states. Sigma represents a sigmoid nonlinear activation function, tanh represents a tanh activation function, and x represents a point multiplication of a vector.
The output vector of the BiGRU network encoding unit is Z; after softmax probability normalization, the output vector Z is input to the CRF layer. For a given input sequence X, the score of a predicted output tag sequence y is defined as S(X, y), where y = (y_1, y_2, …, y_n) denotes the tag sequence of a sentence containing n words. S(X, y) is calculated as

$$S(X, y) = \sum_{t=1}^{n} P_{t, y_t} + \sum_{t=0}^{n} A_{y_t, y_{t+1}}$$

where $P_{t, y_t}$ is an element of the output vector Z of the BiGRU network encoding unit, and $A_{y_{t-1}, y_t}$ is an element of the probability transition matrix output by the CRF layer, representing the transition probability from tag $y_{t-1}$ to tag $y_t$; exploiting the dependencies between tags yields more reasonable tag sequences. The score of the whole tag sequence y is thus a sum of per-position scores, each formed from two parts: the output probability matrix of the BiGRU network encoding unit and the transition probability matrix of the CRF layer. Normalizing this score gives the final predicted probability of the tag sequence y:

$$p(y \mid X) = \frac{e^{S(X, y)}}{\sum_{\tilde{y} \in Y} e^{S(X, \tilde{y})}}$$

where Y represents all possible tag sequences and $\tilde{y}$ is one of them. The loss function of the CRF layer adopts the negative log-likelihood:

$$L = -\log p(y \mid X) = -S(X, y) + \log \sum_{\tilde{y} \in Y} e^{S(X, \tilde{y})}$$

The parameters of the whole named entity recognition model are trained and updated with the Adam algorithm using the loss function of the CRF layer; these include the model parameters of the BiGRU neural network and the CRF layer, while the parameters of the XLNet Chinese model are kept unchanged. Training stops when the loss value generated by the model meets the set requirement or the set maximum number of iterations is reached.
S4, performing entity recognition on the text of the user question with the XLNet-BiGRU-CRF model trained in S3 to obtain the recognition result, mainly comprising the following steps:
s4-1, inputting text data to be identified into a trained XLnet-BiGRU-CRF neural network model;
s4-2, converting the text data into feature vectors after passing through an XLNet Chinese model, carrying out feature extraction on the feature vectors through a BiGRU network, and finally solving the maximum possible labeling sequence in the text by adopting a Viterbi algorithm in a CRF layer, namely, obtaining a named entity recognition result.
S5, extracting the related questions containing the corresponding entity from the Neo4j database according to the named entity recognition result, extracting the Embedding sentence vector of the user question to be recognized with the XLNet Chinese model, and then comparing it by cosine similarity against the Embedding sentence vectors of those related questions already stored in the Neo4j database in S2; the answer of the question with the highest similarity is taken as the target result, while the questions ranked second and third are provided to the user as similar questions for reference. The corresponding cosine similarity is calculated as follows:
$$score = \frac{V_{query} \cdot V_{corpus}}{\lVert V_{query} \rVert \, \lVert V_{corpus} \rVert}$$

where score is the similarity value, $V_{query}$ is the Embedding sentence vector of the user question, and $V_{corpus}$ is the Embedding sentence vector of a related question.
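A sketch of this S5 ranking step with NumPy; the candidate questions and all vectors are toy stand-ins for the Embedding sentence vectors stored in Neo4j:

```python
import numpy as np

def cosine(v_query, v_corpus):
    """score = (V_query . V_corpus) / (|V_query| |V_corpus|)"""
    return float(np.dot(v_query, v_corpus) /
                 (np.linalg.norm(v_query) * np.linalg.norm(v_corpus)))

v_query = np.array([0.2, 0.7, 0.1])
candidates = {                              # related question -> stored vector
    "高等数学的出版社是哪个?": np.array([0.2, 0.6, 0.2]),
    "高等数学的作者是谁?": np.array([0.9, 0.1, 0.0]),
    "高等数学的出版年份是哪年?": np.array([0.3, 0.5, 0.2]),
}
ranked = sorted(candidates, key=lambda q: cosine(v_query, candidates[q]),
                reverse=True)
print("target result:", ranked[0])          # answer of the best match is returned
print("similar questions:", ranked[1:3])    # second and third offered for reference
```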
The above is only a preferred embodiment of the present invention, and the protection scope of the present invention is not limited to the above examples; all technical solutions falling under the concept of the present invention belong to its protection scope. It should be noted that modifications and adaptations that do not depart from the principles of the invention are also intended to fall within the protection scope of the claims.

Claims (2)

1. An intelligent question-answering method based on the XLNet model and a knowledge graph, characterized by comprising the following steps:
step 1: training an XLNet Chinese model on a large-scale unlabeled corpus, wherein the XLNet Chinese model comprises a permutation language model, a dual-stream attention mechanism and the Transformer-XL core component;
step 2: acquiring training corpus data for constructing the knowledge graph and the named entity recognition model, preprocessing and labeling the training corpus data, storing the triple data obtained by preprocessing in a Neo4j database, and extracting, with the XLNet Chinese model trained in step 1, the Embedding sentence vector of each question corresponding to the triple data and storing it in the Neo4j database; each triple consists of a question entity, a question attribute and an answer;
step 3: constructing an XLNet-BiGRU-CRF neural network model on the basis of the XLNet Chinese model trained in step 1, and training the XLNet-BiGRU-CRF model with the training corpus data labeled in step 2;
step 4: performing entity recognition on the text of the user question to be recognized with the trained XLNet-BiGRU-CRF model to obtain the entity recognition result;
step 5: extracting from the Neo4j database, according to the entity recognition result of step 4, the related triple data containing the corresponding entity; extracting the Embedding sentence vector of the user question to be recognized with the XLNet Chinese model and comparing it by cosine similarity against the Embedding sentence vector of each question corresponding to the extracted related triple data; taking the answer corresponding to the question with the highest similarity score as the target result, while providing the questions and answers of the related triples ranked second and third by similarity score to the user as similar questions for reference;
the permutation language model described in step 1 is used to randomly shuffle the order of the Chinese characters in text sentences; for a given text sequence of length T, the permutations of the Chinese characters in all different orders form the set $A_T$, and $a \in A_T$ denotes one such permutation; the modeling process of the permutation language model is expressed as

$$\max_{\theta}\; \mathbb{E}_{a \sim A_T}\left[ \sum_{t=1}^{T} \log p_{\theta}\left( x_{a_t} \mid x_{a_{<t}} \right) \right]$$

where $\mathbb{E}_{a \sim A_T}$ denotes the expectation over all permutations, $x_{a_t}$ is the t-th element of the text sequence under permutation a, $x_{a_{<t}}$ denotes the 1st to (t-1)-th elements of the text sequence under permutation a, $\theta$ is the model parameter to be trained, and $p_{\theta}$ denotes the conditional probability;
the dual-stream attention mechanism in step 1 includes a text content attention stream and a query attention stream; the text content attention stream is a self-attention mechanism containing both position information and content information, while the query attention stream is an input stream containing only position information, so that the content information of the current position is not revealed when the required position is predicted; the two streams are combined to extract the features of the relevant context; the dual-stream attention mechanism is expressed as follows:

the query attention stream is: $g_{a_t}^{(m)} = \mathrm{Attention}\left( Q = g_{a_t}^{(m-1)},\; KV = h_{a_{<t}}^{(m-1)};\; \theta \right)$

the text content attention stream is: $h_{a_t}^{(m)} = \mathrm{Attention}\left( Q = h_{a_t}^{(m-1)},\; KV = h_{a_{\le t}}^{(m-1)};\; \theta \right)$

where $g_{a_t}^{(m)}$ and $g_{a_t}^{(m-1)}$ are the query attention stream matrix vectors of the m-th and (m-1)-th layers, containing only the position information of the input text; $h_{a_t}^{(m)}$ and $h_{a_t}^{(m-1)}$ are the content attention stream matrix vectors of the m-th and (m-1)-th layers, containing both the content information and the position information of the input text; and $h_{a_{<t}}^{(m-1)}$ is the (m-1)-th-layer content attention stream matrix vector over the 1st to (t-1)-th elements of the text sequence under permutation a; Attention denotes the classical self-attention mechanism, calculated as

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left( \frac{Q K^{\top}}{\sqrt{dim}} \right) V$$

where Q, K, V are the input word vector matrices and dim is the input vector dimension;
in step 1, the XLNet Chinese model takes the Transformer-XL framework as its core and introduces a recurrence mechanism and a relative position encoding mechanism to exploit the semantic information of the context and mine the latent relations in the text vectors;
in step 3, the feature vectors output by the XLNet Chinese model are input to the BiGRU network, which transmits and truncates information through gates; the state calculation formulas are

$$z_t = \sigma\left( w_z \cdot [h_{t-1}, x_t] \right)$$
$$r_t = \sigma\left( w_r \cdot [h_{t-1}, x_t] \right)$$
$$\tilde{h}_t = \tanh\left( w_{\tilde{h}} \cdot [r_t * h_{t-1}, x_t] \right)$$
$$h_t = (1 - z_t) * h_{t-1} + z_t * \tilde{h}_t$$

where $x_t$ is the input vector at the current time t, i.e. the feature vector of the t-th word in the text; $h_t$ and $h_{t-1}$ are the hidden-layer state matrix vectors at the current time t and the previous time; $\tilde{h}_t$ is the candidate hidden-layer state at time t, which is also the new memory at the current time; $z_t$ is the update gate, controlling how much of the previous state is carried over versus replaced by the candidate state; $r_t$ is the reset gate, controlling the degree to which the state information of the previous time is ignored when forming the candidate state, a smaller $r_t$ indicating that more is discarded; $w_z$, $w_r$ and $w_{\tilde{h}}$ are the weight matrices of the update gate, the reset gate and the candidate hidden state respectively; $\sigma$ denotes the sigmoid nonlinear activation function and tanh the tanh activation function;
the output vector passing through the BiGRU network coding unit is Z, the output vector Z is input to the CRF layer after being subjected to softmax probability normalization, and for a given input sequence X, the probability of a predicted output tag sequence y is defined as S (X, y), wherein y= (y) 1 ,y 2 ,……y n ) The calculation formula of S (X, y) representing the tag sequence with n words contained in the sentence is as follows:
wherein,element with output vector Z for BiGRU network coding unit, < >>Is an element of the probability transition matrix output by the CRF layer, representing the output from the tag y t-1 To y t The transition probability of the tag sequence y is obtained after normalization processing of the formula,
wherein Y represents all possible tag sequences,one of all possible tag sequences;
the loss function L of the CRF layer uses a negative log-likelihood function,
training and updating the parameters of the whole named entity recognition model with the Adam algorithm using the loss function of the CRF layer, the parameters comprising the model parameters of the BiGRU network and the CRF layer, while the parameters of the XLNet Chinese model are kept unchanged;
in step 4, the text of the user question to be recognized is input into the trained XLNet-BiGRU-CRF model and converted into feature vectors by the XLNet Chinese model; the feature vectors undergo feature extraction through the BiGRU network, and finally the most probable labeling sequence of the text is solved at the CRF layer with the Viterbi algorithm as the result of named entity recognition.
2. The intelligent question-answering method based on the XLNet model and knowledge graph according to claim 1, wherein the cosine similarity in step 5 is calculated as:
$$score = \frac{V_{query} \cdot V_{corpus}}{\lVert V_{query} \rVert \, \lVert V_{corpus} \rVert}$$

where score is the similarity value, $V_{query}$ is the Embedding sentence vector of the user question, and $V_{corpus}$ is the Embedding sentence vector of a related question.
CN202110913182.1A 2021-08-10 2021-08-10 Intelligent question-answering method based on XLnet model and knowledge graph Active CN113641809B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110913182.1A CN113641809B (en) 2021-08-10 2021-08-10 Intelligent question-answering method based on XLnet model and knowledge graph

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110913182.1A CN113641809B (en) 2021-08-10 2021-08-10 Intelligent question-answering method based on XLnet model and knowledge graph

Publications (2)

Publication Number Publication Date
CN113641809A CN113641809A (en) 2021-11-12
CN113641809B (en) 2023-12-08

Family

ID=78420446

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110913182.1A Active CN113641809B (en) 2021-08-10 2021-08-10 Intelligent question-answering method based on XLnet model and knowledge graph

Country Status (1)

Country Link
CN (1) CN113641809B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114582449A (en) * 2022-01-17 2022-06-03 内蒙古大学 Electronic medical record named entity standardization method and system based on XLNet-BiGRU-CRF model
CN114970563B (en) * 2022-07-28 2022-10-25 山东大学 Chinese question generation method and system fusing content and form diversity

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110083831A (en) * 2019-04-16 2019-08-02 武汉大学 A kind of Chinese name entity recognition method based on BERT-BiGRU-CRF
CN112650845A (en) * 2020-12-30 2021-04-13 西安交通大学 Question-answering system and method based on BERT and knowledge representation learning
CN113010693A (en) * 2021-04-09 2021-06-22 大连民族大学 Intelligent knowledge graph question-answering method fusing pointer to generate network

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10467268B2 (en) * 2015-06-02 2019-11-05 International Business Machines Corporation Utilizing word embeddings for term matching in question answering systems

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110083831A (en) * 2019-04-16 2019-08-02 武汉大学 A kind of Chinese name entity recognition method based on BERT-BiGRU-CRF
CN112650845A (en) * 2020-12-30 2021-04-13 西安交通大学 Question-answering system and method based on BERT and knowledge representation learning
CN113010693A (en) * 2021-04-09 2021-06-22 大连民族大学 Intelligent knowledge graph question-answering method fusing pointer to generate network

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Liang Shurong et al. "Sentiment analysis model based on XLNet." Science Technology and Engineering, 2021, Vol. 21, No. 17, pp. 7200-7207. *

Also Published As

Publication number Publication date
CN113641809A (en) 2021-11-12

Similar Documents

Publication Publication Date Title
CN110298037B (en) Convolutional neural network matching text recognition method based on enhanced attention mechanism
CN109657239B (en) Chinese named entity recognition method based on attention mechanism and language model learning
CN109472024B (en) Text classification method based on bidirectional circulation attention neural network
CN112115238B (en) Question-answering method and system based on BERT and knowledge base
CN113239700A (en) Text semantic matching device, system, method and storage medium for improving BERT
CN110413986A (en) A kind of text cluster multi-document auto-abstracting method and system improving term vector model
CN109508459B (en) Method for extracting theme and key information from news
CN111914556B (en) Emotion guiding method and system based on emotion semantic transfer pattern
CN110263325A (en) Chinese automatic word-cut
CN113641809B (en) Intelligent question-answering method based on XLnet model and knowledge graph
CN111274790B (en) Chapter-level event embedding method and device based on syntactic dependency graph
Zhang et al. n-BiLSTM: BiLSTM with n-gram Features for Text Classification
CN114756681B (en) Evaluation and education text fine granularity suggestion mining method based on multi-attention fusion
CN111476024A (en) Text word segmentation method and device and model training method
Wu et al. An effective approach of named entity recognition for cyber threat intelligence
CN114781375A (en) Military equipment relation extraction method based on BERT and attention mechanism
CN113535897A (en) Fine-grained emotion analysis method based on syntactic relation and opinion word distribution
CN115223021A (en) Visual question-answering-based fruit tree full-growth period farm work decision-making method
CN115169349A (en) Chinese electronic resume named entity recognition method based on ALBERT
CN114969269A (en) False news detection method and system based on entity identification and relation extraction
CN116522165B (en) Public opinion text matching system and method based on twin structure
CN117171333A (en) Electric power file question-answering type intelligent retrieval method and system
CN114372454A (en) Text information extraction method, model training method, device and storage medium
CN117033423A (en) SQL generating method for injecting optimal mode item and historical interaction information
US20230376828A1 (en) Systems and methods for product retrieval

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant