CN112183085A - Machine reading understanding method and device, electronic equipment and computer storage medium - Google Patents

Machine reading understanding method and device, electronic equipment and computer storage medium

Info

Publication number
CN112183085A
Authority
CN
China
Prior art keywords
vector
article
attention
question
word
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010955175.3A
Other languages
Chinese (zh)
Inventor
嵇望
安毫亿
王伟凯
钱艳
朱鹏飞
梁青
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Yuanchuan New Technology Co ltd
Original Assignee
Hangzhou Yuanchuan New Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Yuanchuan New Technology Co ltd filed Critical Hangzhou Yuanchuan New Technology Co ltd
Priority to CN202010955175.3A priority Critical patent/CN112183085A/en
Publication of CN112183085A publication Critical patent/CN112183085A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G06F16/3329 Natural language query formulation or dialogue systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/12 Use of codes for handling textual entities
    • G06F40/126 Character encoding
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/12 Use of codes for handling textual entities
    • G06F40/151 Transformation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Software Systems (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Databases & Information Systems (AREA)
  • Machine Translation (AREA)

Abstract

The invention relates to a machine reading understanding method, a device, equipment and a medium. The machine reading understanding method comprises the following steps: receiving a question and an article; converting the question into a question vector and the article into an article vector through calculation of a coding layer; calculating the question vector and the article vector through a multiple attention layer to obtain an interaction information vector, wherein the multiple attention layer comprises a trained self-attention model and a plurality of attention matching models; calculating the article through a trained BTM topic model to obtain topic words, and encoding the topic words to obtain topic feature vectors; and calculating the interaction information vector and the topic feature vectors through a nonlinear output layer to obtain answers related to the question. The method and the device solve the problem of low answer accuracy when machine reading understanding relies on a single word matching method, and improve the accuracy of answers obtained by machine reading understanding.

Description

Machine reading understanding method and device, electronic equipment and computer storage medium
Technical Field
The present invention relates to the field of natural language processing, and in particular, to a machine reading and understanding method and apparatus, an electronic device, and a computer storage medium.
Background
Machine reading understanding is a technique that uses algorithms to enable a computer to parse the semantics of an article and answer related questions. In the related art, a question is matched against the words in an article by a word matching method, and answers related to the question are then extracted from the article. However, a single word matching method makes the interaction information between the question and the words in the article too simple, so that the obtained answers are poorly related to the question and the accuracy of the answers is low.
For the problem in the related art that the accuracy of answers is low when a single word matching method is used for machine reading understanding, no effective solution has been proposed at present.
Disclosure of Invention
The embodiment of the application provides a machine reading understanding method, a machine reading understanding device, electronic equipment and a computer storage medium, so as to at least solve the problem that the accuracy of answers is low when a single word matching method is used for machine reading understanding in the related art.
In a first aspect, an embodiment of the present application provides a machine reading understanding method, where the method includes:
receiving a question and an article, converting the question into a question vector through calculation of a coding layer, and converting the article into an article vector through calculation of the coding layer;
calculating the question vector and the article vector through a multiple attention layer to obtain an interaction information vector, wherein the multiple attention layer comprises a trained self-attention model and a plurality of attention matching models;
calculating the article through a trained BTM topic model to obtain topic words, and coding the topic words to obtain topic feature vectors;
and calculating the interaction information vector and the theme characteristic vector through a nonlinear output layer to obtain an answer related to the question.
In some embodiments, the coding layer comprises a BERT model and a preset gated hole convolution layer, and the converting the question into a question vector through coding layer calculation comprises:
learning the question based on the BERT model to obtain a first word vector of each single word in the question, and forming a first intermediate vector from the first word vectors of the single words;
calculating the first intermediate vector through the gated hole convolution layer to obtain the question vector;
the converting the article into an article vector through the coding layer calculation comprises:
learning the article based on the BERT model to obtain a second word vector of each single word in the article, and forming a second intermediate vector from the second word vectors of the single words;
and calculating the second intermediate vector through the gated hole convolution layer to obtain the article vector.
In some embodiments, the gated hole convolution layer includes a plurality of layers of sequentially connected hole convolution gate units, each hole convolution gate unit is configured to sample an input vector at intervals to obtain an output vector, and use the output vector as an input vector of a next layer of hole convolution gate units.
In some embodiments, where the plurality of attention matching models includes a dot product attention model and a Concat-based attention model, the calculating the question vector and the article vector through multiple attention layers to obtain an interaction information vector includes:
calculating the question vector and the article vector through a trained dot product attention model to obtain a first vector and a second vector, wherein the first vector is the dot product attention vector of the question about the article, and the second vector is the dot product attention vector of the article about the question;
calculating the question vector and the article vector through a trained Concat-based attention model to obtain a third vector and a fourth vector, wherein the third vector is a Concat attention vector of the question about the article, and the fourth vector is a Concat attention vector of the article about the question;
merging the first vector, the second vector, the third vector and the fourth vector to obtain a merged vector;
and calculating the merged vector through the trained self-attention model to obtain the interactive information vector.
In some of these embodiments, the trained BTM topic model is obtained by:
acquiring an article corpus training set;
sampling the theme distribution on the article corpus training set through a Dirichlet distribution function;
sampling the distribution of terms under a plurality of subjects through the Dirichlet distribution function;
and extracting a target theme from the theme distribution, and extracting word pairs from the target theme to make the word pairs obey the term distribution under the target theme.
In some embodiments, the encoding the topic word to obtain a topic feature vector includes:
acquiring a training corpus, learning the training corpus based on a BERT model to obtain a third word vector of each single word in the training corpus, and forming a word vector library by the third word vector of each single word;
and obtaining a third word vector of each single word in the topic words from the word vector library, and forming the topic feature vector by the third word vector of each single word of the topic words.
In some embodiments, the nonlinear output layer includes a preset hyperbolic tangent function, and the calculating the interaction information vector and the topic feature vector through the nonlinear output layer to obtain the answer related to the question includes:
and according to the interaction information vector and the theme characteristic vector, carrying out nonlinear mapping through the hyperbolic tangent function to obtain the answer extracted from the article.
In a second aspect, an embodiment of the present application provides a machine reading and understanding device, including: the system comprises a coding module, a multi-attention module, a theme acquisition module and an answer calculation module;
the encoding module is used for receiving questions and articles, converting the questions into question vectors through encoding layer calculation, and converting the articles into article vectors through the encoding layer calculation;
the multi-attention module is used for calculating the question vector and the article vector through a multi-attention layer to obtain an interactive information vector, wherein the multi-attention layer comprises a trained self-attention model and a plurality of attention matching models;
the topic acquisition module is used for calculating the article through a trained BTM topic model to obtain topic words, and coding the topic words to obtain topic feature vectors;
and the answer calculation module is used for calculating the interaction information vector and the theme characteristic vector through a nonlinear output layer to obtain an answer related to the question.
In a third aspect, an embodiment of the present application provides an electronic device, which includes a memory, a processor, and a computer program stored on the memory and executable on the processor, and when the processor executes the computer program, the processor implements the machine reading understanding method according to the first aspect.
In a fourth aspect, embodiments of the present application provide a computer storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the machine reading understanding method according to the first aspect.
Compared with the related art, the machine reading understanding method provided by the embodiment of the application receives a question and an article, converts the question into a question vector and the article into an article vector through calculation of a coding layer, and calculates the question vector and the article vector through a multiple attention layer to obtain an interaction information vector, wherein the multiple attention layer comprises a trained self-attention model and a plurality of attention matching models. The article is calculated through a trained BTM topic model to obtain topic words, the topic words are encoded to obtain a topic feature vector, and the interaction information vector and the topic feature vector are calculated through a nonlinear output layer to obtain an answer related to the question. The method thereby solves the problem of low answer accuracy in machine reading understanding with a single word matching method and improves the accuracy of answers obtained by machine reading understanding.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. In the drawings:
FIG. 1 is a flow chart of a machine reading understanding method according to an embodiment of the present application;
FIG. 2 is a schematic diagram of a structure of a void convolution gating cell according to an embodiment of the present application;
FIG. 3 is a flow chart of obtaining an interaction information vector according to an embodiment of the application;
FIG. 4 is a flow diagram of a machine reading understanding method in accordance with a preferred embodiment of the present application;
FIG. 5 is a schematic diagram of a machine-readable understanding apparatus according to an embodiment of the present application;
fig. 6 is an internal structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application will be described and illustrated below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments provided in the present application without any inventive step are within the scope of protection of the present application.
It is obvious that the drawings in the following description are only examples or embodiments of the present application, and that it is also possible for a person skilled in the art to apply the present application to other similar contexts on the basis of these drawings without inventive effort. Moreover, it should be appreciated that in the development of any such actual implementation, as in any engineering or design project, numerous implementation-specific decisions must be made to achieve the developers' specific goals, such as compliance with system-related and business-related constraints, which may vary from one implementation to another.
Reference in the specification to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the specification. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Those of ordinary skill in the art will explicitly and implicitly appreciate that the embodiments described herein may be combined with other embodiments without conflict.
Unless defined otherwise, technical or scientific terms referred to herein shall have the ordinary meaning as understood by those of ordinary skill in the art to which this application belongs. Reference to "a," "an," "the," and similar terms in this application does not denote a limitation of quantity and may be singular or plural. The terms "including," "comprising," "having," and any variations thereof used in this application are intended to cover non-exclusive inclusions; for example, a process, method, system, article, or apparatus that comprises a list of steps or modules (units) is not limited to the listed steps or units, but may include other steps or units not expressly listed or inherent to such process, method, article, or apparatus. Reference to "connected," "coupled," and the like in this application is not limited to physical or mechanical connections, but may include electrical connections, whether direct or indirect. The term "plurality" as referred to herein means two or more. "And/or" describes the association relationship of associated objects and indicates that three relationships may exist; for example, "A and/or B" may mean that A exists alone, A and B exist simultaneously, or B exists alone. The character "/" generally indicates that the associated objects before and after it are in an "or" relationship. The terms "first," "second," "third," and the like herein merely distinguish similar objects and do not denote a particular ordering of the objects.
The embodiment provides a machine reading understanding method. Fig. 1 is a flowchart of a machine reading understanding method according to an embodiment of the present application, as shown in fig. 1, the flowchart includes the following steps:
s110, receiving the question and the article, converting the question into a question vector through calculation of a coding layer, and converting the article into an article vector through calculation of the coding layer. The coding layer is used for respectively carrying out bottom layer processing on the articles and the problems, carrying out digital coding on the texts of the articles and the problems and converting the articles and the problems into information units which can be processed by a computer. The coding layer is used for coding each single word, phrase and sentence in the article and the question on the basis of understanding the context so as to keep the semantic meaning of the original sentence in the article. In the coding layer, the problem and the article are respectively segmented to obtain a plurality of single characters in respective texts, then each single character is converted into a corresponding word vector, and the word vector of each single character is further coded to obtain word level vector representation of the problem and the article. The conversion of each Word into a Word vector may use any one of Word2vec model, glove (global Vectors for Word representation) model, and elmo (embedding from Language models). The word-level vector representation of the questions and articles can be Bi-directionally encoded using a BiGRU (Bi-directional Gated RNN), also known as a Bi-directional Gated round-robin network.
S120, calculating the question vector and the article vector through the multiple attention layer to obtain an interaction information vector. The multiple attention layer includes a trained self-attention model and a plurality of attention matching models. Because the article and the question are related, a connection between them needs to be established so that answers related to the question can be obtained from the article according to the interaction information between them. By considering the semantics of the article and the question together, the semantic analysis of the article deepens the understanding of the question and the semantic analysis of the question deepens the understanding of the article, so that the semantic relation between the article and the question is brought into focus. The attention mechanism in deep learning is similar in nature to the selective visual attention mechanism of human beings; its core goal is to select, from many pieces of information, those that are more critical to the current task. The multiple attention layer integrates several attention mechanisms: using a plurality of attention matching models enhances word-level interaction within sentences and enriches the interaction information among words, while the self-attention mechanism can establish long-distance dependency relationships and find important word features in a sentence, which suits establishing the relation between a long article and the question. The interaction information vector obtained through the multiple attention layer can therefore contain richer and more effective interaction information between the question and the article.
S130, calculating the article through the trained BTM topic model to obtain topic words, and encoding the topic words to obtain topic feature vectors. After determining the topic distribution and the word distribution, the BTM (Biterm Topic Model) takes words two at a time: after segmenting a text, any two words within the window length are paired, and such a pair of words is called a biterm; the topic words of the document are then obtained according to the probability that each biterm belongs to each topic. By relaxing the constraint that the entire document must belong to one topic to the constraint that two words within the window length belong to one topic, the BTM topic model can effectively alleviate the sparsity problem. The topic words reflect the topic content of the article, so the topic feature vector contains the topic information of the article.
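The following is a minimal sketch of the biterm construction described above: after word segmentation, any two words co-occurring within the window length form a pair. The window size and the toy token list are illustrative assumptions; repeated pairs from overlapping windows simply raise their counts.

```python
# Minimal sketch of biterm extraction for a BTM-style topic model.
from itertools import combinations

def extract_biterms(tokens, window=3):
    """Return all word pairs (biterms) that co-occur within the given window length."""
    biterms = []
    for start in range(len(tokens)):
        span = tokens[start:start + window]
        biterms.extend(combinations(span, 2))   # pair any two words inside the window
    return biterms

tokens = ["machine", "reading", "comprehension", "answers", "questions"]
print(extract_biterms(tokens, window=3))
```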
And S140, calculating the interactive information vector and the theme characteristic vector through a nonlinear output layer to obtain answers related to the questions. By integrating the topic information of the article into the interactive information between the article and the question, the extraction of answers related to the question from the article can be better guided.
Through the above steps, multiple attention mechanisms are fused: a plurality of attention matching models enhance the word-level interaction information between the question and the article and strengthen the relevance between them, while the topic information of the article is fused into the interaction information to guide the extraction of the answer from the article. This improves the degree of relevance between the answer and the question and solves the problem of low answer accuracy in machine reading understanding with a single word matching method.
In some embodiments, the coding layer includes a BERT model and preset gated hole convolution layers. Converting the question into a question vector through coding layer calculation specifically comprises: learning the question based on the BERT model to obtain a first word vector of each single word in the question, and forming a first intermediate vector from the first word vectors of the single words, denoted Query:

$Query = \{E_q^1, E_q^2, \dots, E_q^m\}$

wherein $m$ represents the number of single words in the question $q$, and $E_q^i$, $i = 1, \dots, m$, represents the first word vector of the $i$-th single word in the question $q$. The first intermediate vector is then calculated through the gated hole convolution layer to obtain the question vector, denoted $V_q$:

$V_q = \mathrm{Conv}_{res}(Query)$

wherein $\mathrm{Conv}_{res}$ denotes the gated hole convolution layer. Similarly, converting the article into an article vector through coding layer calculation specifically comprises: learning the article based on the BERT model to obtain a second word vector of each single word in the article, and forming a second intermediate vector from the second word vectors of the single words, denoted Document:

$Document = \{E_d^1, E_d^2, \dots, E_d^n\}$

wherein $n$ represents the number of single words in the article $d$, and $E_d^j$, $j = 1, \dots, n$, represents the second word vector of the $j$-th single word in the article $d$. The second intermediate vector is then calculated through the gated hole convolution layer to obtain the article vector, denoted $V_d$:

$V_d = \mathrm{Conv}_{res}(Document)$

wherein $\mathrm{Conv}_{res}$ denotes the gated hole convolution layer operation.
The BERT (Bidirectional Encoder Representations from Transformers) model is a deep bidirectional pre-trained language understanding model that uses the Transformer model as a feature extractor; it essentially learns good feature representations for words by running a self-supervised learning method on massive corpora. The Transformer model is a classical NLP model proposed by the Google team; because it uses a self-attention mechanism rather than the sequential structure of a recurrent neural network, it can be trained in parallel and can capture global information. Therefore, the first intermediate vector obtained from BERT learning can effectively express the semantics among the single words in the question, and likewise the second intermediate vector can effectively express the semantics among the single words in the article. Hole convolution (also known as dilated convolution) is a variant of convolution: unlike ordinary convolution, it samples the input text at intervals, skipping across text segments according to the hole rate, and by stacking hole convolutions with exponentially increasing hole rates, coverage of most sentence lengths can be achieved with fewer layers. Through the BERT model and the gated hole convolution layer, the representation capability of the question vector and the article vector is improved, a more accurate interaction information vector can be obtained from them, and the accuracy of the answer is further improved.
In some embodiments, the gated hole convolution layer includes a plurality of layers of sequentially connected hole convolution gating units. FIG. 2 is a schematic structural diagram of a hole convolution gating unit according to an embodiment of the present application. As shown in FIG. 2, each hole convolution gating unit includes a hole convolution network and a residual network. Each hole convolution gating unit samples its input vector at intervals to obtain an output vector, and the output vector serves as the input vector of the next layer of hole convolution gating units. The operation of each hole convolution gating unit can be represented by the following formula:

$Y = X \otimes (1 - \sigma) + \mathrm{Conv1}(X) \otimes \sigma, \quad \sigma = \mathrm{Sig}(\mathrm{Conv2}(X))$

wherein $X$ represents the input vector of the hole convolution gating unit, $Y$ represents its output vector, $\mathrm{Conv1}$ denotes convolution operation 1, $\mathrm{Conv2}$ denotes convolution operation 2, Conv1 and Conv2 are both hole convolutions whose filter number and window size are set identically but whose parameters are not shared, and $\mathrm{Sig}$ denotes the sigmoid activation function.
An ordinary convolution operation can process text in parallel and noticeably shortens model training time, but obtaining a good effect on long-distance dependencies requires stacking many convolution layers, which increases the risk of vanishing gradients. The gated hole convolution layer introduces a residual mechanism on top of the hole convolution network, so that information is transmitted through multiple channels; effective information can be strengthened, useless information suppressed, and the vanishing-gradient problem brought by deep networks is alleviated.
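As an illustration, the following is a minimal sketch of a gated hole (dilated) convolution unit of the kind described above: two dilated convolutions with the same filter number and window size but unshared parameters, a sigmoid gate, and a residual path. It assumes PyTorch, and the exact gating formula is an assumption based on the description, not taken verbatim from the patent figure.

```python
# Minimal sketch (PyTorch assumed) of a gated dilated convolution unit with residual gating.
import torch
import torch.nn as nn

class GatedDilatedConv(nn.Module):
    def __init__(self, dim: int, kernel_size: int = 3, dilation: int = 1):
        super().__init__()
        pad = dilation * (kernel_size - 1) // 2                   # keep sequence length unchanged
        self.conv1 = nn.Conv1d(dim, dim, kernel_size, padding=pad, dilation=dilation)
        self.conv2 = nn.Conv1d(dim, dim, kernel_size, padding=pad, dilation=dilation)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, dim) -> Conv1d expects (batch, dim, seq_len)
        h = x.transpose(1, 2)
        gate = torch.sigmoid(self.conv2(h))                       # Sig(Conv2(X))
        y = h * (1 - gate) + self.conv1(h) * gate                 # assumed gated residual combination
        return y.transpose(1, 2)

# Stack units with exponentially increasing dilation rates, as described above.
layer = nn.Sequential(*[GatedDilatedConv(256, dilation=d) for d in (1, 2, 4)])
print(layer(torch.randn(1, 20, 256)).shape)  # torch.Size([1, 20, 256])
```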
In some embodiments, fig. 3 is a flowchart of obtaining an interaction information vector according to an embodiment of the present application, and as shown in fig. 3, in a case that the plurality of attention matching models include a dot product attention model and a Concat-based attention model, calculating a question vector and an article vector through multiple attention layers to obtain the interaction information vector includes the following steps:
s310, calculating the problem vector and the article vector through the trained dot product attention model to obtain a first vector and a second vector. The first vector is question q the dot product attention vector for article d is denoted as Vm1The second vector is the dot product attention vector of article d about question q and is denoted as Vm2. In particular, a first vector Vm1Obtained by the following method: two vectors are combined
Figure BDA0002678359460000091
And
Figure BDA0002678359460000092
after dot product, the dimension is consistent with the original dimension, then nonlinear change is carried out through hyperbolic tangent function, and then v is carried outdAfter linear variation of (A) to obtain the original weight
Figure BDA0002678359460000093
Finally obtaining the normalized weighted vector expression V of the question q about the article dm1I.e. the first vector Vm1Obtained by the following formula:
Figure BDA0002678359460000094
Figure BDA0002678359460000095
Figure BDA0002678359460000096
wherein the content of the first and second substances,
Figure BDA0002678359460000097
a vector representing the jth word in the problem vector,
Figure BDA0002678359460000098
vector, v, representing the t-th word in the article vectordFor a trained parameter, tanh (-) represents a hyperbolic tangent function. The second vector V can be obtained by the same methodm2
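The following is a minimal sketch of this dot-product attention matching step, assuming PyTorch; shapes, the class name and the use of softmax for normalization are illustrative assumptions consistent with the description above.

```python
# Minimal sketch (PyTorch assumed) of dot-product attention matching: element-wise
# product of question and article word vectors, tanh, linear mapping by v_d,
# normalisation, and weighted sum over the article word vectors.
import torch
import torch.nn as nn

class DotProductAttention(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.v_d = nn.Linear(dim, 1, bias=False)            # trained parameter v_d

    def forward(self, V_q: torch.Tensor, V_d: torch.Tensor) -> torch.Tensor:
        # V_q: (m, dim) question vector, V_d: (n, dim) article vector
        prod = V_q.unsqueeze(1) * V_d.unsqueeze(0)           # (m, n, dim) element-wise product
        scores = self.v_d(torch.tanh(prod)).squeeze(-1)      # (m, n) raw weights
        weights = torch.softmax(scores, dim=-1)              # normalise over article words
        return weights @ V_d                                 # (m, dim) attention vector of q about d

attn = DotProductAttention(256)
V_m1 = attn(torch.randn(6, 256), torch.randn(20, 256))       # question about article
V_m2 = attn(torch.randn(20, 256), torch.randn(6, 256))       # article about question (same form)
print(V_m1.shape, V_m2.shape)
```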
S320, calculating the question vector and the article vector through the trained Concat-based attention model to obtain a third vector and a fourth vector. The third vector is the Concat attention vector of the question q about the article d, denoted $V_{m3}$; the fourth vector is the Concat attention vector of the article d about the question q, denoted $V_{m4}$. Specifically, the third vector $V_{m3}$ is obtained as follows: the vector obtained by mapping $V_q^j$ through the matrix $W_q$ and the vector obtained by mapping $V_d^t$ through the matrix $W_d$ are combined; to ensure nonlinearity, a nonlinear change is applied through the hyperbolic tangent function, and the result is mapped by $v_c$ to obtain the raw weight $s_{jt}$; finally the normalized weighted vector expression $V_{m3}$ of the question q about the article d is obtained. That is, the third vector $V_{m3}$ can be obtained by the following formulas:

$s_{jt} = v_c^{\top} \tanh(W_q V_q^j + W_d V_d^t)$

$a_{jt} = \dfrac{\exp(s_{jt})}{\sum_{t'=1}^{n} \exp(s_{jt'})}$

$V_{m3} = \sum_{t=1}^{n} a_{jt} V_d^t$

wherein $V_q^j$ represents the vector of the j-th single word in the question vector, $V_d^t$ represents the vector of the t-th single word in the article vector, $W_q$, $W_d$ and $v_c$ are trained parameters, and $\tanh(\cdot)$ represents the hyperbolic tangent function. The fourth vector $V_{m4}$ can be obtained in the same way.
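A corresponding minimal sketch of the Concat-based (additive) attention matching step follows, again assuming PyTorch; the class name, shapes and softmax normalization are illustrative assumptions consistent with the description above.

```python
# Minimal sketch (PyTorch assumed) of Concat-based attention matching: map the question
# and article word vectors with W_q and W_d, combine, apply tanh, map with v_c to get
# raw weights, normalise, and take the weighted sum over article word vectors.
import torch
import torch.nn as nn

class ConcatAttention(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.W_q = nn.Linear(dim, dim, bias=False)
        self.W_d = nn.Linear(dim, dim, bias=False)
        self.v_c = nn.Linear(dim, 1, bias=False)

    def forward(self, V_q: torch.Tensor, V_d: torch.Tensor) -> torch.Tensor:
        # V_q: (m, dim), V_d: (n, dim)
        combined = self.W_q(V_q).unsqueeze(1) + self.W_d(V_d).unsqueeze(0)  # (m, n, dim)
        scores = self.v_c(torch.tanh(combined)).squeeze(-1)                 # (m, n) raw weights
        weights = torch.softmax(scores, dim=-1)
        return weights @ V_d                                                # (m, dim) Concat attention of q about d

attn = ConcatAttention(256)
print(attn(torch.randn(6, 256), torch.randn(20, 256)).shape)  # torch.Size([6, 256])
```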
S330, merging the first vector, the second vector, the third vector and the fourth vector to obtain a merged vector: $V_c = \mathrm{Concat}(V_{m1}, V_{m2}, V_{m3}, V_{m4})$, wherein $V_c$ represents the merged vector, $V_{m1}$, $V_{m2}$, $V_{m3}$ and $V_{m4}$ respectively represent the first, second, third and fourth vectors, and $\mathrm{Concat}(\cdot)$ represents the merge operation.
S340, calculating the merged vector through the trained self-attention model to obtain the interaction information vector. The interaction information vector is obtained by the following formula:

$V_s = \mathrm{Attention}(V_c W_c, V_s W_s)$

wherein $V_c$ represents the merged vector, $V_s$ represents the interaction information vector, $W_c$ is the weight of the merged vector, $W_s$ is the weight of the interaction information vector, and $\mathrm{Attention}(\cdot)$ represents the self-attention operation. The self-attention mechanism can find important word features in a sentence, and the weights are continuously adjusted during back propagation. Meanwhile, the self-attention calculation can be parallelized through matrix multiplication, which speeds up model training.
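The sketch below illustrates merging the four attention vectors and applying self-attention to obtain an interaction information vector. It assumes PyTorch, assumes the four vectors have been aligned to the same sequence length, and uses scaled dot-product self-attention as one possible instantiation of the trained self-attention model; these choices are assumptions, not the patent's exact formulation.

```python
# Minimal sketch (PyTorch assumed) of merging V_m1..V_m4 and applying self-attention.
import torch
import torch.nn as nn

class MergeSelfAttention(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.W_c = nn.Linear(4 * dim, dim, bias=False)   # weight applied to the merged vector
        self.W_s = nn.Linear(4 * dim, dim, bias=False)   # weight for the interaction side

    def forward(self, V_m1, V_m2, V_m3, V_m4):
        V_c = torch.cat([V_m1, V_m2, V_m3, V_m4], dim=-1)        # merge: (seq, 4*dim)
        q = self.W_c(V_c)
        v = self.W_s(V_c)
        scores = q @ q.transpose(0, 1) / q.size(-1) ** 0.5       # self-attention scores
        return torch.softmax(scores, dim=-1) @ v                 # interaction information vector V_s

layer = MergeSelfAttention(256)
xs = [torch.randn(20, 256) for _ in range(4)]                    # four aligned attention vectors
print(layer(*xs).shape)  # torch.Size([20, 256])
```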
In some embodiments, the trained BTM topic model is obtained by: acquiring an article corpus training set; sampling the theme distribution on the article corpus training set through a Dirichlet distribution function; the lexical item distribution under a plurality of subjects is sampled through a Dirichlet distribution function; and extracting a target theme from the theme distribution, and extracting word pairs from the target theme to make the word pairs obey the distribution of terms under the target theme.
Specifically, a large-scale article corpus is used to construct the article corpus training set. The word distribution under each topic is sampled from the parameter $\beta$ through the Dirichlet distribution, $\phi_k \sim \mathrm{Dir}(\beta)$, $k = 1, \dots, K$, where $K$ is the number of topics and $\mathrm{Dir}(\cdot)$ denotes the Dirichlet distribution function. From the Dirichlet distribution with parameter $\alpha$, the topic distribution of the article corpus training set is sampled, $\theta \sim \mathrm{Dir}(\alpha)$. A target topic $z$ is extracted from the corpus-level parameter $\theta$; $z$ obeys $z \sim \mathrm{Mult}(\theta)$, where $\mathrm{Mult}(\cdot)$ denotes the multinomial distribution function. A word pair in the corpus is denoted $b = (w_i, w_j)$; the two words $w_i$ and $w_j$ are drawn from the extracted topic $z$ so that they obey $w_i, w_j \sim \mathrm{Mult}(\phi_z)$.
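For illustration, the following is a minimal sketch of the generative process just described, assuming NumPy; the vocabulary size, hyperparameter values and random seed are toy assumptions.

```python
# Minimal sketch (NumPy assumed) of the BTM generative process: sample topic-word
# distributions phi_k ~ Dir(beta), a corpus-level topic distribution theta ~ Dir(alpha),
# then for each biterm draw a topic z ~ Mult(theta) and two words w_i, w_j ~ Mult(phi_z).
import numpy as np

rng = np.random.default_rng(0)
K, V = 3, 8                      # number of topics, vocabulary size (toy values)
alpha, beta = 2.0, 0.5

phi = rng.dirichlet([beta] * V, size=K)     # word distribution under each topic
theta = rng.dirichlet([alpha] * K)          # topic distribution of the corpus

def sample_biterm():
    z = rng.choice(K, p=theta)                       # target topic z ~ Mult(theta)
    w_i, w_j = rng.choice(V, size=2, p=phi[z])       # word pair obeys Mult(phi_z)
    return z, (int(w_i), int(w_j))

print([sample_biterm() for _ in range(3)])
```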
In some embodiments, the topic terms are encoded to obtain topic feature vectors by: obtaining a training corpus, learning the training corpus based on a BERT model to obtain a third word vector of each single word in the training corpus, and forming a word vector library by the third word vector of each single word. And obtaining a third word vector of each single word in the topic words from the word vector library, and forming topic feature vectors by the third word vectors of each single word in the topic words. The topic feature vector obtained based on the BERT model can effectively express the semantics among the single characters in the topic words, the representation capability of the topic feature vector is improved, the topic feature vector is integrated into the interactive information vector, and the accuracy of the answer can be further improved.
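The sketch below illustrates this step: building a word-vector library from BERT outputs and looking up the vectors of each single word of a topic word. It assumes the HuggingFace transformers library and the bert-base-chinese checkpoint, and it assumes the tokenizer splits Chinese text into single characters; these are assumptions for illustration, not the patent's implementation.

```python
# Minimal sketch (HuggingFace transformers assumed) of encoding topic words with
# BERT-derived character vectors: build a {character: vector} library from a corpus,
# then stack the vectors of the topic word's single words into a topic feature vector.
import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
model = BertModel.from_pretrained("bert-base-chinese")

def char_vectors(text: str) -> dict:
    """Return a {character: vector} library from BERT's last hidden states."""
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state[0]            # (seq_len, hidden)
    chars = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
    return {c: h for c, h in zip(chars, hidden) if c not in ("[CLS]", "[SEP]")}

library = char_vectors("机器阅读理解")                             # toy training corpus
topic_word = "阅读"
topic_feature = torch.stack([library[c] for c in topic_word])     # topic feature vector
print(topic_feature.shape)
```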
In some embodiments, the non-linear output layer includes a preset hyperbolic tangent function. Calculating the interactive information vector and the topic feature vector through a nonlinear output layer to obtain answers related to the questions, wherein the answers include: and carrying out nonlinear mapping through a hyperbolic tangent function according to the interactive information vector and the theme characteristic vector to obtain an answer extracted from the article. The answer is predicted by the following formula:
$g_{IR}(d, q) = \tanh(W_s V_s + W_t V_t)$, wherein $V_s$ represents the interaction information vector, $V_t$ represents the topic feature vector, $W_s$ and $W_t$ are trained parameters, $\tanh(\cdot)$ represents the hyperbolic tangent function, and $g_{IR}(d, q)$ represents the answer related to the question q extracted from the article d.
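The following is a minimal sketch of this nonlinear output layer, assuming PyTorch; the class name and dimensions are illustrative, and how the tanh output is decoded into answer span scores is left open, as the patent does not specify it at this point.

```python
# Minimal sketch (PyTorch assumed) of the nonlinear output layer:
# g_IR(d, q) = tanh(W_s V_s + W_t V_t).
import torch
import torch.nn as nn

class NonlinearOutputLayer(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.W_s = nn.Linear(dim, dim, bias=False)   # trained parameter W_s
        self.W_t = nn.Linear(dim, dim, bias=False)   # trained parameter W_t

    def forward(self, V_s: torch.Tensor, V_t: torch.Tensor) -> torch.Tensor:
        return torch.tanh(self.W_s(V_s) + self.W_t(V_t))   # g_IR(d, q)

layer = NonlinearOutputLayer(256)
print(layer(torch.randn(20, 256), torch.randn(20, 256)).shape)  # torch.Size([20, 256])
```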
The embodiments of the present application are described and illustrated below by means of preferred embodiments.
Fig. 4 is a flow chart of a machine reading understanding method according to the preferred embodiment of the present application. As shown in fig. 4, the parameters of the models and functions in the coding layer, the multiple attention layer and the nonlinear output layer first need to be trained. The training process comprises the following steps. Product documents in a specific field are obtained from a large number of customer-service question-answer corpora; after the corpora are cleaned and segmented, word vectors are pre-trained with a BERT model to generate word vectors for the specific vertical field. The training parameters of the word vector learning model are set, including the word vector dimension, batch parameters, window size, initial learning rate, word vector matrices, auxiliary vector matrices and the like. For large-scale training data comprising a training set and a development set, the training questions and training articles in the training set are respectively segmented and encoded to obtain word-level vector representations of the training questions and of the training articles. The word-level vector representations of the training questions and training articles are then respectively calculated through a gated hole convolution layer, which comprises a plurality of layers of sequentially connected hole convolution gating units, to obtain the training question vectors and training article vectors. The training question vector and the training article vector are then input into the dot product attention model and the Concat-based attention model to obtain the training dot product attention vector $V_1$ and training Concat attention vector $V_3$ of the training question about the training article, and the training dot product attention vector $V_2$ and training Concat attention vector $V_4$ of the training article about the training question. $V_1$, $V_2$, $V_3$ and $V_4$ are merged to obtain a training merged vector $V_{c\text{-}train}$, and $V_{c\text{-}train}$ is calculated through the self-attention model to obtain the training interaction information vector. The training articles are calculated through the trained BTM topic model to obtain training topic words, and the training topic words are encoded to obtain training topic feature vectors. The training interaction information vector and the training topic feature vector are combined nonlinearly for answer prediction. Finally, the parameters of the models and functions in the coding layer, the multiple attention layer and the nonlinear output layer are trained on the training set, the parameters with the best F1 score on the development set are selected and saved, and training is complete.
Then, the received question and article are input into the models and functions with trained parameters, so that the correct answer to the question can be extracted from the article. The prediction process comprises the following steps. The received question and article are respectively converted into a question vector $V_q$ and an article vector $V_d$ through the BERT model and the gated hole convolution layer with trained parameters in the coding layer. The question vector $V_q$ and the article vector $V_d$ are input into the trained dot product attention model and Concat-based attention model in the multiple attention layer to obtain the first vector $V_{m1}$, second vector $V_{m2}$, third vector $V_{m3}$ and fourth vector $V_{m4}$; after $V_{m1}$, $V_{m2}$, $V_{m3}$ and $V_{m4}$ are merged, the merged vector is input into the trained self-attention model to obtain the interaction information vector $V_s$. The article is calculated through the trained BTM topic model and then encoded to obtain the topic feature vector $V_t$. From the topic feature vector $V_t$ and the interaction information vector $V_s$, the answer related to the question is obtained from the article through the hyperbolic tangent function with trained parameters in the nonlinear output layer.
The embodiment of the application provides a machine reading understanding device. Fig. 5 is a schematic structural diagram of a machine-readable understanding apparatus according to an embodiment of the present application, and as shown in fig. 5, the apparatus includes an encoding module 510, a multi-attention module 520, a topic obtaining module 530, and an answer calculating module 540: the encoding module 510 is configured to receive a question and an article, convert the question into a question vector through encoding layer calculation, and convert the article into an article vector through the encoding layer calculation; the multiple attention module 520 is configured to calculate the question vector and the article vector through multiple attention layers to obtain an interaction information vector, where the multiple attention layers include a trained self-attention model and multiple attention matching models; the topic acquisition module 530 is configured to calculate the article through a trained BTM topic model to obtain topic words, and encode the topic words to obtain topic feature vectors; the answer calculating module 540 is configured to calculate the interaction information vector and the topic feature vector through a non-linear output layer to obtain an answer related to the question.
For specific limitations of the machine reading understanding apparatus, reference may be made to the above limitations of the machine reading understanding method, which are not described herein again. The various modules in the machine reading and understanding apparatus described above may be implemented in whole or in part by software, hardware, and combinations thereof. The modules can be embedded in a hardware form or independent from a processor in the computer device, and can also be stored in a memory in the computer device in a software form, so that the processor can call and execute operations corresponding to the modules.
An embodiment of the present application further provides an electronic device, which includes a memory and a processor, where the memory stores a computer program, and the processor is configured to execute the computer program to perform the steps in any of the method embodiments described above.
Optionally, the electronic device may further include a transmission device and an input/output device, wherein the transmission device is connected to the processor, and the input/output device is connected to the processor.
It should be noted that, for specific examples in this embodiment, reference may be made to examples described in the foregoing embodiments and optional implementations, and details of this embodiment are not described herein again.
In addition, in combination with the machine reading understanding method in the foregoing embodiments, the embodiments of the present application may provide a storage medium to implement. The storage medium having stored thereon a computer program; the computer program, when executed by a processor, implements any of the machine-readable understanding methods of the above embodiments.
In an embodiment, fig. 6 is a schematic internal structure diagram of an electronic device according to an embodiment of the present application, and as shown in fig. 6, there is provided an electronic device, which may be a server, and its internal structure diagram may be as shown in fig. 6. The electronic device includes a processor, a memory, a network interface, and a database connected by a system bus. Wherein the processor of the electronic device is configured to provide computing and control capabilities. The memory of the electronic equipment comprises a nonvolatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for the operation of an operating system and computer programs in the non-volatile storage medium. The database of the electronic device is used for storing data. The network interface of the electronic device is used for connecting and communicating with an external terminal through a network. The computer program is executed by a processor to implement a machine-readable understanding method.
Those skilled in the art will appreciate that the configuration shown in fig. 6 is a block diagram of only a portion of the configuration associated with the present application, and does not constitute a limitation on the electronic device to which the present application is applied, and a particular electronic device may include more or less components than those shown in the drawings, or may combine certain components, or have a different arrangement of components.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by hardware instructions of a computer program, which can be stored in a non-volatile computer-readable storage medium, and when executed, can include the processes of the embodiments of the methods described above. Any reference to memory, storage, database, or other medium used in the embodiments provided herein may include non-volatile and/or volatile memory, among others. Non-volatile memory can include read-only memory (ROM), Programmable ROM (PROM), Electrically Programmable ROM (EPROM), Electrically Erasable Programmable ROM (EEPROM), or flash memory. Volatile memory can include Random Access Memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as Static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), Double Data Rate SDRAM (DDRSDRAM), Enhanced SDRAM (ESDRAM), Synchronous Link DRAM (SLDRAM), Rambus Direct RAM (RDRAM), direct bus dynamic RAM (DRDRAM), and memory bus dynamic RAM (RDRAM).
It should be understood by those skilled in the art that various features of the above-described embodiments can be combined in any combination, and for the sake of brevity, all possible combinations of features in the above-described embodiments are not described in detail, but rather, all combinations of features which are not inconsistent with each other should be construed as being within the scope of the present disclosure.
The above-mentioned embodiments only express several embodiments of the present application, and the description thereof is more specific and detailed, but not construed as limiting the scope of the invention. It should be noted that, for a person skilled in the art, several variations and modifications can be made without departing from the concept of the present application, which falls within the scope of protection of the present application. Therefore, the protection scope of the present patent shall be subject to the appended claims.

Claims (10)

1. A machine-readable understanding method, the method comprising:
receiving a question and an article, converting the question into a question vector through calculation of a coding layer, and converting the article into an article vector through calculation of the coding layer;
calculating the question vector and the article vector through a multiple attention layer to obtain an interaction information vector, wherein the multiple attention layer comprises a trained self-attention model and a plurality of attention matching models;
calculating the article through a trained BTM topic model to obtain topic words, and coding the topic words to obtain topic feature vectors;
and calculating the interaction information vector and the theme characteristic vector through a nonlinear output layer to obtain an answer related to the question.
2. The method of claim 1, wherein the coding layer comprises a BERT model and a preset gated hole convolution layer, and wherein converting the question into a question vector through coding layer calculation comprises:
learning the question based on the BERT model to obtain a first word vector of each single word in the question, and forming a first intermediate vector from the first word vectors of the single words;
calculating the first intermediate vector through the gated hole convolution layer to obtain the question vector;
the converting the article into an article vector through the coding layer calculation comprises:
learning the article based on the BERT model to obtain a second word vector of each single word in the article, and forming a second intermediate vector from the second word vectors of the single words;
and calculating the second intermediate vector through the gated hole convolution layer to obtain the article vector.
3. The method according to claim 2, wherein the gated hole convolution layer comprises a plurality of layers of sequentially connected hole convolution gate units, each hole convolution gate unit is configured to sample an input vector at intervals to obtain an output vector, and use the output vector as an input vector of a next layer of hole convolution gate units.
4. The method of claim 1, wherein in a case where the plurality of attention matching models includes a dot product attention model and a Concat-based attention model, the calculating the question vector and the article vector through multiple attention layers to obtain an interaction information vector comprises:
calculating the question vector and the article vector through a trained dot product attention model to obtain a first vector and a second vector, wherein the first vector is the dot product attention vector of the question about the article, and the second vector is the dot product attention vector of the article about the question;
calculating the question vector and the article vector through a trained Concat-based attention model to obtain a third vector and a fourth vector, wherein the third vector is a Concat attention vector of the question about the article, and the fourth vector is a Concat attention vector of the article about the question;
merging the first vector, the second vector, the third vector and the fourth vector to obtain a merged vector;
and calculating the merged vector through the trained self-attention model to obtain the interactive information vector.
5. The method of claim 1, wherein the trained BTM topic model is obtained by:
acquiring an article corpus training set;
sampling the theme distribution on the article corpus training set through a Dirichlet distribution function;
sampling the distribution of terms under a plurality of subjects through the Dirichlet distribution function;
and extracting a target theme from the theme distribution, and extracting word pairs from the target theme to make the word pairs obey the term distribution under the target theme.
6. The method of claim 1, wherein encoding the topic terms into topic feature vectors comprises:
acquiring a training corpus, learning the training corpus based on a BERT model to obtain a third word vector of each single word in the training corpus, and forming a word vector library by the third word vector of each single word;
and obtaining a third word vector of each single word in the topic words from the word vector library, and forming the topic feature vector by the third word vector of each single word of the topic words.
7. The method of claim 1, wherein the nonlinear output layer comprises a preset hyperbolic tangent function, and wherein the calculating the interaction information vector and the topic feature vector through the nonlinear output layer to obtain the answer related to the question comprises:
and according to the interaction information vector and the theme characteristic vector, carrying out nonlinear mapping through the hyperbolic tangent function to obtain the answer extracted from the article.
8. A machine reading understanding apparatus, the apparatus comprising: the system comprises a coding module, a multi-attention module, a theme acquisition module and an answer calculation module;
the encoding module is used for receiving questions and articles, converting the questions into question vectors through encoding layer calculation, and converting the articles into article vectors through the encoding layer calculation;
the multi-attention module is used for calculating the question vector and the article vector through a multi-attention layer to obtain an interactive information vector, wherein the multi-attention layer comprises a trained self-attention model and a plurality of attention matching models;
the topic acquisition module is used for calculating the article through a trained BTM topic model to obtain topic words, and coding the topic words to obtain topic feature vectors;
and the answer calculation module is used for calculating the interaction information vector and the theme characteristic vector through a nonlinear output layer to obtain an answer related to the question.
9. An electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor implements the machine reading understanding method of any of claims 1 to 7 when executing the computer program.
10. A computer storage medium on which a computer program is stored, the program, when executed by a processor, implementing a machine reading understanding method according to any one of claims 1 to 7.
CN202010955175.3A 2020-09-11 2020-09-11 Machine reading understanding method and device, electronic equipment and computer storage medium Pending CN112183085A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010955175.3A CN112183085A (en) 2020-09-11 2020-09-11 Machine reading understanding method and device, electronic equipment and computer storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010955175.3A CN112183085A (en) 2020-09-11 2020-09-11 Machine reading understanding method and device, electronic equipment and computer storage medium

Publications (1)

Publication Number Publication Date
CN112183085A true CN112183085A (en) 2021-01-05

Family

ID=73920607

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010955175.3A Pending CN112183085A (en) 2020-09-11 2020-09-11 Machine reading understanding method and device, electronic equipment and computer storage medium

Country Status (1)

Country Link
CN (1) CN112183085A (en)

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109657226A (en) * 2018-09-20 2019-04-19 北京信息科技大学 The reading of multi-joint knot attention understands model, system and method
CN109460553A (en) * 2018-11-05 2019-03-12 中山大学 A kind of machine reading understanding method based on thresholding convolutional neural networks
CN109492227A (en) * 2018-11-16 2019-03-19 大连理工大学 It is a kind of that understanding method is read based on the machine of bull attention mechanism and Dynamic iterations
CN109657246A (en) * 2018-12-19 2019-04-19 中山大学 A kind of extraction-type machine reading based on deep learning understands the method for building up of model
CN109739986A (en) * 2018-12-28 2019-05-10 合肥工业大学 A kind of complaint short text classification method based on Deep integrating study
CN110134771A (en) * 2019-04-09 2019-08-16 广东工业大学 A kind of implementation method based on more attention mechanism converged network question answering systems
CN110309305A (en) * 2019-06-14 2019-10-08 中国电子科技集团公司第二十八研究所 Machine based on multitask joint training reads understanding method and computer storage medium
CN110334184A (en) * 2019-07-04 2019-10-15 河海大学常州校区 The intelligent Answer System understood is read based on machine
CN110619123A (en) * 2019-09-19 2019-12-27 电子科技大学 Machine reading understanding method
CN110633472A (en) * 2019-09-19 2019-12-31 电子科技大学 Article and question fusion method based on attention and aggregation mechanism

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112966499A (en) * 2021-03-17 2021-06-15 中山大学 Question and answer matching method based on self-adaptive fusion multi-attention network
CN113705664A (en) * 2021-08-26 2021-11-26 南通大学 Model, training method and surface electromyographic signal gesture recognition method
CN113705664B (en) * 2021-08-26 2023-10-24 南通大学 Model, training method and surface electromyographic signal gesture recognition method
CN114492451A (en) * 2021-12-22 2022-05-13 马上消费金融股份有限公司 Text matching method and device, electronic equipment and computer readable storage medium
CN114492451B (en) * 2021-12-22 2023-10-24 马上消费金融股份有限公司 Text matching method, device, electronic equipment and computer readable storage medium
CN114398976A (en) * 2022-01-13 2022-04-26 福州大学 Machine reading understanding method based on BERT and gate control type attention enhancement network
CN114398976B (en) * 2022-01-13 2024-06-07 福州大学 Machine reading and understanding method based on BERT and gating type attention enhancement network
CN114564562A (en) * 2022-02-22 2022-05-31 平安科技(深圳)有限公司 Question generation method, device and equipment based on answer guidance and storage medium
CN114564562B (en) * 2022-02-22 2024-05-14 平安科技(深圳)有限公司 Question generation method, device, equipment and storage medium based on answer guidance
CN115169367A (en) * 2022-09-06 2022-10-11 杭州远传新业科技股份有限公司 Dialogue generating method and device, and storage medium
CN115169367B (en) * 2022-09-06 2022-12-09 杭州远传新业科技股份有限公司 Dialogue generating method and device, and storage medium

Similar Documents

Publication Publication Date Title
CN109947912B (en) Model method based on intra-paragraph reasoning and joint question answer matching
CN112183085A (en) Machine reading understanding method and device, electronic equipment and computer storage medium
CN108427771B (en) Abstract text generation method and device and computer equipment
CN109597891B (en) Text emotion analysis method based on bidirectional long-and-short-term memory neural network
CN111783474B (en) Comment text viewpoint information processing method and device and storage medium
CN107798140B (en) Dialog system construction method, semantic controlled response method and device
CN110928997A (en) Intention recognition method and device, electronic equipment and readable storage medium
Ferrer-i-Cancho et al. Optimal coding and the origins of Zipfian laws
CN110598779A (en) Abstract description generation method and device, computer equipment and storage medium
CN110969020A (en) CNN and attention mechanism-based Chinese named entity identification method, system and medium
CN111966812B (en) Automatic question answering method based on dynamic word vector and storage medium
CN111680494A (en) Similar text generation method and device
CN112560502B (en) Semantic similarity matching method and device and storage medium
CN110795944A (en) Recommended content processing method and device, and emotion attribute determining method and device
CN112528637A (en) Text processing model training method and device, computer equipment and storage medium
CN111259113A (en) Text matching method and device, computer readable storage medium and computer equipment
CN114707005B (en) Knowledge graph construction method and system for ship equipment
CN113536795A (en) Method, system, electronic device and storage medium for entity relation extraction
CN115795044A (en) Knowledge injection-based user relationship mining method and device
CN116050352A (en) Text encoding method and device, computer equipment and storage medium
CN110298046B (en) Translation model training method, text translation method and related device
CN111723572B (en) Chinese short text correlation measurement method based on CNN convolutional layer and BilSTM
CN113806646A (en) Sequence labeling system and training system of sequence labeling model
CN111783430A (en) Sentence pair matching rate determination method and device, computer equipment and storage medium
CN116109980A (en) Action recognition method based on video text matching

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination