CN112183085A - Machine reading understanding method and device, electronic equipment and computer storage medium - Google Patents

Machine reading understanding method and device, electronic equipment and computer storage medium

Info

Publication number
CN112183085A
Authority
CN
China
Prior art keywords
vector
article
attention
question
word
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010955175.3A
Other languages
Chinese (zh)
Inventor
嵇望
安毫亿
王伟凯
钱艳
朱鹏飞
梁青
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Yuanchuan New Technology Co ltd
Original Assignee
Hangzhou Yuanchuan New Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Yuanchuan New Technology Co ltd filed Critical Hangzhou Yuanchuan New Technology Co ltd
Priority to CN202010955175.3A priority Critical patent/CN112183085A/en
Publication of CN112183085A publication Critical patent/CN112183085A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G06F16/3329 Natural language query formulation or dialogue systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/12 Use of codes for handling textual entities
    • G06F40/126 Character encoding
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/12 Use of codes for handling textual entities
    • G06F40/151 Transformation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Software Systems (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Databases & Information Systems (AREA)
  • Machine Translation (AREA)

Abstract

The invention relates to a machine reading understanding method, a device, equipment and a medium. The machine reading understanding method comprises the following steps: receiving a question and an article; converting the question into a question vector and the article into an article vector through calculation of a coding layer; calculating the question vector and the article vector through a multiple attention layer to obtain an interaction information vector, wherein the multiple attention layer comprises a trained self-attention model and a plurality of attention matching models; calculating the article through a trained BTM topic model to obtain topic words, and encoding the topic words to obtain topic feature vectors; and calculating the interaction information vector and the topic feature vectors through a nonlinear output layer to obtain answers related to the question. The method and the device solve the problem of low answer accuracy when machine reading understanding relies on a single word matching method, and improve the accuracy of answers obtained by machine reading understanding.

Description

Machine reading understanding method and device, electronic equipment and computer storage medium
Technical Field
The present invention relates to the field of natural language processing, and in particular, to a machine reading and understanding method and apparatus, an electronic device, and a computer storage medium.
Background
Machine reading understanding is a technique that uses algorithms to enable a computer to parse the semantics of an article and answer related questions. In the related art, a question is matched against the words in an article by a word matching method, and answers related to the question are then extracted from the article. However, a single word matching method makes the interaction information between the question and the words in the article too simple, so that the obtained answers are poorly related to the question and the accuracy of the answers is low.
For the problem in the related art that the accuracy of answers is low when a single word matching method is used for machine reading understanding, no effective solution has been proposed at present.
Disclosure of Invention
The embodiment of the application provides a machine reading understanding method, a machine reading understanding device, electronic equipment and a computer storage medium, so as to at least solve the problem that the accuracy of answers is low when a single word matching method is used for machine reading understanding in the related art.
In a first aspect, an embodiment of the present application provides a machine reading understanding method, where the method includes:
receiving a question and an article, converting the question into a question vector through calculation of a coding layer, and converting the article into an article vector through calculation of the coding layer;
calculating the question vector and the article vector through a multiple attention layer to obtain an interaction information vector, wherein the multiple attention layer comprises a trained self-attention model and a plurality of attention matching models;
calculating the article through a trained BTM topic model to obtain topic words, and coding the topic words to obtain topic feature vectors;
and calculating the interaction information vector and the theme characteristic vector through a nonlinear output layer to obtain an answer related to the question.
In some embodiments, the coding layer comprises a BERT model and a preset gated hole convolution layer, and the converting the question into a question vector through coding layer calculation comprises:
learning the question based on the BERT model to obtain a first word vector of each single word in the question, and forming a first intermediate vector from the first word vectors of the single words;
calculating the first intermediate vector through the gated hole convolution layer to obtain the question vector;
the converting the article into an article vector through the coding layer calculation comprises:
learning the article based on the BERT model to obtain a second word vector of each single word in the article, and forming a second intermediate vector from the second word vectors of the single words;
and calculating the second intermediate vector through the gated hole convolution layer to obtain the article vector.
In some embodiments, the gated hole convolution layer includes a plurality of layers of sequentially connected hole convolution gate units, each hole convolution gate unit is configured to sample an input vector at intervals to obtain an output vector, and use the output vector as an input vector of a next layer of hole convolution gate units.
In some embodiments, where the plurality of attention matching models includes a dot product attention model and a Concat-based attention model, the calculating the question vector and the article vector through multiple attention layers to obtain an interaction information vector includes:
calculating the question vector and the article vector through a trained dot product attention model to obtain a first vector and a second vector, wherein the first vector is the dot product attention vector of the question about the article, and the second vector is the dot product attention vector of the article about the question;
calculating the question vector and the article vector through a trained Concat-based attention model to obtain a third vector and a fourth vector, wherein the third vector is a Concat attention vector of the question about the article, and the fourth vector is a Concat attention vector of the article about the question;
merging the first vector, the second vector, the third vector and the fourth vector to obtain a merged vector;
and calculating the merged vector through the trained self-attention model to obtain the interactive information vector.
In some of these embodiments, the trained BTM topic model is obtained by:
acquiring an article corpus training set;
sampling the theme distribution on the article corpus training set through a Dirichlet distribution function;
sampling the distribution of terms under a plurality of subjects through the Dirichlet distribution function;
and extracting a target theme from the theme distribution, and extracting word pairs from the target theme to make the word pairs obey the term distribution under the target theme.
In some embodiments, the encoding the topic word to obtain a topic feature vector includes:
acquiring a training corpus, learning the training corpus based on a BERT model to obtain a third word vector of each single word in the training corpus, and forming a word vector library by the third word vector of each single word;
and obtaining a third word vector of each single word in the topic words from the word vector library, and forming the topic feature vector by the third word vector of each single word of the topic words.
In some embodiments, the nonlinear output layer includes a preset hyperbolic tangent function, and the calculating the interaction information vector and the topic feature vector through the nonlinear output layer to obtain the answer related to the question includes:
and according to the interaction information vector and the theme characteristic vector, carrying out nonlinear mapping through the hyperbolic tangent function to obtain the answer extracted from the article.
In a second aspect, an embodiment of the present application provides a machine reading and understanding device, including: the system comprises a coding module, a multi-attention module, a theme acquisition module and an answer calculation module;
the encoding module is used for receiving questions and articles, converting the questions into question vectors through encoding layer calculation, and converting the articles into article vectors through the encoding layer calculation;
the multi-attention module is used for calculating the question vector and the article vector through a multi-attention layer to obtain an interactive information vector, wherein the multi-attention layer comprises a trained self-attention model and a plurality of attention matching models;
the topic acquisition module is used for calculating the article through a trained BTM topic model to obtain topic words, and coding the topic words to obtain topic feature vectors;
and the answer calculation module is used for calculating the interaction information vector and the theme characteristic vector through a nonlinear output layer to obtain an answer related to the question.
In a third aspect, an embodiment of the present application provides an electronic device, which includes a memory, a processor, and a computer program stored on the memory and executable on the processor, and when the processor executes the computer program, the processor implements the machine reading understanding method according to the first aspect.
In a fourth aspect, embodiments of the present application provide a computer storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the machine reading understanding method according to the first aspect.
Compared with the related art, the machine reading understanding method provided by the embodiment of the application receives a question and an article, converts the question into a question vector and the article into an article vector through calculation of a coding layer, and calculates the question vector and the article vector through a multiple attention layer to obtain an interaction information vector, wherein the multiple attention layer comprises a trained self-attention model and a plurality of attention matching models. The article is calculated through a trained BTM topic model to obtain topic words, the topic words are encoded to obtain a topic feature vector, and the interaction information vector and the topic feature vector are calculated through a nonlinear output layer to obtain an answer related to the question. The method thereby solves the problem of low answer accuracy in machine reading understanding with a single word matching method and improves the accuracy of answers obtained by machine reading understanding.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. In the drawings:
FIG. 1 is a flow chart of a machine reading understanding method according to an embodiment of the present application;
FIG. 2 is a schematic diagram of a structure of a void convolution gating cell according to an embodiment of the present application;
FIG. 3 is a flow chart of obtaining an interaction information vector according to an embodiment of the application;
FIG. 4 is a flow diagram of a machine reading understanding method in accordance with a preferred embodiment of the present application;
FIG. 5 is a schematic diagram of a machine-readable understanding apparatus according to an embodiment of the present application;
fig. 6 is an internal structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application will be described and illustrated below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments provided in the present application without any inventive step are within the scope of protection of the present application.
It is obvious that the drawings in the following description are only examples or embodiments of the present application, and that it is also possible for a person skilled in the art to apply the present application to other similar contexts on the basis of these drawings without inventive effort. Moreover, it should be appreciated that in the development of any such actual implementation, as in any engineering or design project, numerous implementation-specific decisions must be made to achieve the developers' specific goals, such as compliance with system-related and business-related constraints, which may vary from one implementation to another.
Reference in the specification to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the specification. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Those of ordinary skill in the art will explicitly and implicitly appreciate that the embodiments described herein may be combined with other embodiments without conflict.
Unless defined otherwise, technical or scientific terms referred to herein shall have the ordinary meaning as understood by those of ordinary skill in the art to which this application belongs. Reference to "a," "an," "the," and similar terms in this application does not denote a limitation of quantity and may be singular or plural. The terms "including," "comprising," "having," and any variations thereof used in this application are intended to cover non-exclusive inclusions; for example, a process, method, system, article, or apparatus that comprises a list of steps or modules (units) is not limited to the listed steps or units, but may include other steps or units not expressly listed or inherent to such process, method, article, or apparatus. Reference to "connected," "coupled," and the like in this application is not limited to physical or mechanical connections, but may include electrical connections, whether direct or indirect. The term "plurality" as referred to herein means two or more. "And/or" describes the association relationship of associated objects and indicates that three relationships may exist; for example, "A and/or B" may mean that A exists alone, A and B exist simultaneously, or B exists alone. The character "/" generally indicates that the associated objects before and after it are in an "or" relationship. The terms "first," "second," "third," and the like herein merely distinguish similar objects and do not denote a particular ordering of the objects.
The embodiment provides a machine reading understanding method. Fig. 1 is a flowchart of a machine reading understanding method according to an embodiment of the present application, as shown in fig. 1, the flowchart includes the following steps:
s110, receiving the question and the article, converting the question into a question vector through calculation of a coding layer, and converting the article into an article vector through calculation of the coding layer. The coding layer is used for respectively carrying out bottom layer processing on the articles and the problems, carrying out digital coding on the texts of the articles and the problems and converting the articles and the problems into information units which can be processed by a computer. The coding layer is used for coding each single word, phrase and sentence in the article and the question on the basis of understanding the context so as to keep the semantic meaning of the original sentence in the article. In the coding layer, the problem and the article are respectively segmented to obtain a plurality of single characters in respective texts, then each single character is converted into a corresponding word vector, and the word vector of each single character is further coded to obtain word level vector representation of the problem and the article. The conversion of each Word into a Word vector may use any one of Word2vec model, glove (global Vectors for Word representation) model, and elmo (embedding from Language models). The word-level vector representation of the questions and articles can be Bi-directionally encoded using a BiGRU (Bi-directional Gated RNN), also known as a Bi-directional Gated round-robin network.
S120, calculating the question vector and the article vector through the multiple attention layer to obtain an interaction information vector. The multiple attention layer includes a trained self-attention model and a plurality of attention matching models. Because the article and the question are related, a connection between them needs to be established so that answers related to the question can be obtained from the article according to the interaction information between them. By considering the semantics of the article and the question together, the semantic analysis of the article deepens the understanding of the question and the semantic analysis of the question deepens the understanding of the article, so that the semantic relation between the article and the question is brought into focus. The attention mechanism in deep learning is similar in nature to the selective visual attention mechanism of human beings; its core goal is to select, from many pieces of information, those that are more critical to the current task. The multiple attention layer integrates several attention mechanisms: using a plurality of attention matching models enhances word-level interaction within sentences and enriches the interaction information among words, while the self-attention mechanism can establish long-distance dependency relationships and find important word features in a sentence, which suits establishing the relation between a long article and the question. The interaction information vector obtained through the multiple attention layer can therefore contain richer and more effective interaction information between the question and the article.
S130, calculating the article through the trained BTM topic model to obtain topic words, and encoding the topic words to obtain topic feature vectors. After determining the topic distribution and the word distribution, the BTM (Biterm Topic Model) takes words two at a time: after segmenting a text, any two words within the window length are paired, and such a pair of words is called a biterm; the topic words of the document are then obtained according to the probability that each biterm belongs to each topic. By relaxing the constraint that the entire document must belong to one topic to the constraint that two words within the window length belong to one topic, the BTM topic model can effectively alleviate the sparsity problem. The topic words reflect the topic content of the article, so the topic feature vector contains the topic information of the article.
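The following is a minimal sketch of the biterm construction described above: after word segmentation, any two words co-occurring within the window length form a pair. The window size and the toy token list are illustrative assumptions; repeated pairs from overlapping windows simply raise their counts.

```python
# Minimal sketch of biterm extraction for a BTM-style topic model.
from itertools import combinations

def extract_biterms(tokens, window=3):
    """Return all word pairs (biterms) that co-occur within the given window length."""
    biterms = []
    for start in range(len(tokens)):
        span = tokens[start:start + window]
        biterms.extend(combinations(span, 2))   # pair any two words inside the window
    return biterms

tokens = ["machine", "reading", "comprehension", "answers", "questions"]
print(extract_biterms(tokens, window=3))
```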
And S140, calculating the interactive information vector and the theme characteristic vector through a nonlinear output layer to obtain answers related to the questions. By integrating the topic information of the article into the interactive information between the article and the question, the extraction of answers related to the question from the article can be better guided.
Through the above steps, multiple attention mechanisms are fused: a plurality of attention matching models enhance the word-level interaction information between the question and the article and strengthen the relevance between them, while the topic information of the article is fused into the interaction information to guide the extraction of the answer from the article. This improves the degree of relevance between the answer and the question and solves the problem of low answer accuracy in machine reading understanding with a single word matching method.
In some embodiments, the coding layer includes a BERT model and preset gated hole convolution layers. Converting the question into a question vector through coding layer calculation specifically comprises: learning the question based on the BERT model to obtain a first word vector of each single word in the question, and forming a first intermediate vector from the first word vectors of the single words, denoted Query:

$Query = \{E_q^1, E_q^2, \dots, E_q^m\}$

wherein $m$ represents the number of single words in the question $q$, and $E_q^i$, $i = 1, \dots, m$, represents the first word vector of the $i$-th single word in the question $q$. The first intermediate vector is then calculated through the gated hole convolution layer to obtain the question vector, denoted $V_q$:

$V_q = \mathrm{Conv}_{res}(Query)$

wherein $\mathrm{Conv}_{res}$ denotes the gated hole convolution layer. Similarly, converting the article into an article vector through coding layer calculation specifically comprises: learning the article based on the BERT model to obtain a second word vector of each single word in the article, and forming a second intermediate vector from the second word vectors of the single words, denoted Document:

$Document = \{E_d^1, E_d^2, \dots, E_d^n\}$

wherein $n$ represents the number of single words in the article $d$, and $E_d^j$, $j = 1, \dots, n$, represents the second word vector of the $j$-th single word in the article $d$. The second intermediate vector is then calculated through the gated hole convolution layer to obtain the article vector, denoted $V_d$:

$V_d = \mathrm{Conv}_{res}(Document)$

wherein $\mathrm{Conv}_{res}$ denotes the gated hole convolution layer operation.
The BERT (Bidirectional Encoder Representations from Transformers) model is a deep bidirectional pre-trained language understanding model that uses the Transformer model as a feature extractor; it essentially learns good feature representations for words by running a self-supervised learning method on massive corpora. The Transformer model is a classical NLP model proposed by the Google team; because it uses a self-attention mechanism rather than the sequential structure of a recurrent neural network, it can be trained in parallel and can capture global information. Therefore, the first intermediate vector obtained from BERT learning can effectively express the semantics among the single words in the question, and likewise the second intermediate vector can effectively express the semantics among the single words in the article. Hole convolution (also known as dilated convolution) is a variant of convolution: unlike ordinary convolution, it samples the input text at intervals, skipping across text segments according to the hole rate, and by stacking hole convolutions with exponentially increasing hole rates, coverage of most sentence lengths can be achieved with fewer layers. Through the BERT model and the gated hole convolution layer, the representation capability of the question vector and the article vector is improved, a more accurate interaction information vector can be obtained from them, and the accuracy of the answer is further improved.
In some embodiments, the gated hole convolution layer includes a plurality of layers of sequentially connected hole convolution gating units. FIG. 2 is a schematic structural diagram of a hole convolution gating unit according to an embodiment of the present application. As shown in FIG. 2, each hole convolution gating unit includes a hole convolution network and a residual network. Each hole convolution gating unit samples its input vector at intervals to obtain an output vector, and the output vector serves as the input vector of the next layer of hole convolution gating units. The operation of each hole convolution gating unit can be represented by the following formula:

$Y = X \otimes (1 - \sigma) + \mathrm{Conv1}(X) \otimes \sigma, \quad \sigma = \mathrm{Sig}(\mathrm{Conv2}(X))$

wherein $X$ represents the input vector of the hole convolution gating unit, $Y$ represents its output vector, $\mathrm{Conv1}$ denotes convolution operation 1, $\mathrm{Conv2}$ denotes convolution operation 2, Conv1 and Conv2 are both hole convolutions whose filter number and window size are set identically but whose parameters are not shared, and $\mathrm{Sig}$ denotes the sigmoid activation function.
An ordinary convolution operation can process text in parallel and noticeably shortens model training time, but obtaining a good effect on long-distance dependencies requires stacking many convolution layers, which increases the risk of vanishing gradients. The gated hole convolution layer introduces a residual mechanism on top of the hole convolution network, so that information is transmitted through multiple channels; effective information can be strengthened, useless information suppressed, and the vanishing-gradient problem brought by deep networks is alleviated.
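As an illustration, the following is a minimal sketch of a gated hole (dilated) convolution unit of the kind described above: two dilated convolutions with the same filter number and window size but unshared parameters, a sigmoid gate, and a residual path. It assumes PyTorch, and the exact gating formula is an assumption based on the description, not taken verbatim from the patent figure.

```python
# Minimal sketch (PyTorch assumed) of a gated dilated convolution unit with residual gating.
import torch
import torch.nn as nn

class GatedDilatedConv(nn.Module):
    def __init__(self, dim: int, kernel_size: int = 3, dilation: int = 1):
        super().__init__()
        pad = dilation * (kernel_size - 1) // 2                   # keep sequence length unchanged
        self.conv1 = nn.Conv1d(dim, dim, kernel_size, padding=pad, dilation=dilation)
        self.conv2 = nn.Conv1d(dim, dim, kernel_size, padding=pad, dilation=dilation)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, dim) -> Conv1d expects (batch, dim, seq_len)
        h = x.transpose(1, 2)
        gate = torch.sigmoid(self.conv2(h))                       # Sig(Conv2(X))
        y = h * (1 - gate) + self.conv1(h) * gate                 # assumed gated residual combination
        return y.transpose(1, 2)

# Stack units with exponentially increasing dilation rates, as described above.
layer = nn.Sequential(*[GatedDilatedConv(256, dilation=d) for d in (1, 2, 4)])
print(layer(torch.randn(1, 20, 256)).shape)  # torch.Size([1, 20, 256])
```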
In some embodiments, fig. 3 is a flowchart of obtaining an interaction information vector according to an embodiment of the present application, and as shown in fig. 3, in a case that the plurality of attention matching models include a dot product attention model and a Concat-based attention model, calculating a question vector and an article vector through multiple attention layers to obtain the interaction information vector includes the following steps:
s310, calculating the problem vector and the article vector through the trained dot product attention model to obtain a first vector and a second vector. The first vector is question q the dot product attention vector for article d is denoted as Vm1The second vector is the dot product attention vector of article d about question q and is denoted as Vm2. In particular, a first vector Vm1Obtained by the following method: two vectors are combined
Figure BDA0002678359460000091
And
Figure BDA0002678359460000092
after dot product, the dimension is consistent with the original dimension, then nonlinear change is carried out through hyperbolic tangent function, and then v is carried outdAfter linear variation of (A) to obtain the original weight
Figure BDA0002678359460000093
Finally obtaining the normalized weighted vector expression V of the question q about the article dm1I.e. the first vector Vm1Obtained by the following formula:
Figure BDA0002678359460000094
Figure BDA0002678359460000095
Figure BDA0002678359460000096
wherein the content of the first and second substances,
Figure BDA0002678359460000097
a vector representing the jth word in the problem vector,
Figure BDA0002678359460000098
vector, v, representing the t-th word in the article vectordFor a trained parameter, tanh (-) represents a hyperbolic tangent function. The second vector V can be obtained by the same methodm2
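The following is a minimal sketch of this dot-product attention matching step, assuming PyTorch; shapes, the class name and the use of softmax for normalization are illustrative assumptions consistent with the description above.

```python
# Minimal sketch (PyTorch assumed) of dot-product attention matching: element-wise
# product of question and article word vectors, tanh, linear mapping by v_d,
# normalisation, and weighted sum over the article word vectors.
import torch
import torch.nn as nn

class DotProductAttention(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.v_d = nn.Linear(dim, 1, bias=False)            # trained parameter v_d

    def forward(self, V_q: torch.Tensor, V_d: torch.Tensor) -> torch.Tensor:
        # V_q: (m, dim) question vector, V_d: (n, dim) article vector
        prod = V_q.unsqueeze(1) * V_d.unsqueeze(0)           # (m, n, dim) element-wise product
        scores = self.v_d(torch.tanh(prod)).squeeze(-1)      # (m, n) raw weights
        weights = torch.softmax(scores, dim=-1)              # normalise over article words
        return weights @ V_d                                 # (m, dim) attention vector of q about d

attn = DotProductAttention(256)
V_m1 = attn(torch.randn(6, 256), torch.randn(20, 256))       # question about article
V_m2 = attn(torch.randn(20, 256), torch.randn(6, 256))       # article about question (same form)
print(V_m1.shape, V_m2.shape)
```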
S320, calculating the question vector and the article vector through the trained Concat-based attention model to obtain a third vector and a fourth vector. The third vector is the Concat attention vector of the question q about the article d, denoted $V_{m3}$; the fourth vector is the Concat attention vector of the article d about the question q, denoted $V_{m4}$. Specifically, the third vector $V_{m3}$ is obtained as follows: the vector obtained by mapping $V_q^j$ through the matrix $W_q$ and the vector obtained by mapping $V_d^t$ through the matrix $W_d$ are combined; to ensure nonlinearity, a nonlinear change is applied through the hyperbolic tangent function, and the result is mapped by $v_c$ to obtain the raw weight $s_{jt}$; finally the normalized weighted vector expression $V_{m3}$ of the question q about the article d is obtained. That is, the third vector $V_{m3}$ can be obtained by the following formulas:

$s_{jt} = v_c^{\top} \tanh(W_q V_q^j + W_d V_d^t)$

$a_{jt} = \dfrac{\exp(s_{jt})}{\sum_{t'=1}^{n} \exp(s_{jt'})}$

$V_{m3} = \sum_{t=1}^{n} a_{jt} V_d^t$

wherein $V_q^j$ represents the vector of the j-th single word in the question vector, $V_d^t$ represents the vector of the t-th single word in the article vector, $W_q$, $W_d$ and $v_c$ are trained parameters, and $\tanh(\cdot)$ represents the hyperbolic tangent function. The fourth vector $V_{m4}$ can be obtained in the same way.
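A corresponding minimal sketch of the Concat-based (additive) attention matching step follows, again assuming PyTorch; the class name, shapes and softmax normalization are illustrative assumptions consistent with the description above.

```python
# Minimal sketch (PyTorch assumed) of Concat-based attention matching: map the question
# and article word vectors with W_q and W_d, combine, apply tanh, map with v_c to get
# raw weights, normalise, and take the weighted sum over article word vectors.
import torch
import torch.nn as nn

class ConcatAttention(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.W_q = nn.Linear(dim, dim, bias=False)
        self.W_d = nn.Linear(dim, dim, bias=False)
        self.v_c = nn.Linear(dim, 1, bias=False)

    def forward(self, V_q: torch.Tensor, V_d: torch.Tensor) -> torch.Tensor:
        # V_q: (m, dim), V_d: (n, dim)
        combined = self.W_q(V_q).unsqueeze(1) + self.W_d(V_d).unsqueeze(0)  # (m, n, dim)
        scores = self.v_c(torch.tanh(combined)).squeeze(-1)                 # (m, n) raw weights
        weights = torch.softmax(scores, dim=-1)
        return weights @ V_d                                                # (m, dim) Concat attention of q about d

attn = ConcatAttention(256)
print(attn(torch.randn(6, 256), torch.randn(20, 256)).shape)  # torch.Size([6, 256])
```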
S330, merging the first vector, the second vector, the third vector and the fourth vector to obtain a merged vector: $V_c = \mathrm{Concat}(V_{m1}, V_{m2}, V_{m3}, V_{m4})$, wherein $V_c$ represents the merged vector, $V_{m1}$, $V_{m2}$, $V_{m3}$ and $V_{m4}$ respectively represent the first, second, third and fourth vectors, and $\mathrm{Concat}(\cdot)$ represents the merge operation.
S340, calculating the merged vector through the trained self-attention model to obtain the interaction information vector. The interaction information vector is obtained by the following formula:

$V_s = \mathrm{Attention}(V_c W_c, V_s W_s)$

wherein $V_c$ represents the merged vector, $V_s$ represents the interaction information vector, $W_c$ is the weight of the merged vector, $W_s$ is the weight of the interaction information vector, and $\mathrm{Attention}(\cdot)$ represents the self-attention operation. The self-attention mechanism can find important word features in a sentence, and the weights are continuously adjusted during back propagation. Meanwhile, the self-attention calculation can be parallelized through matrix multiplication, which speeds up model training.
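The sketch below illustrates merging the four attention vectors and applying self-attention to obtain an interaction information vector. It assumes PyTorch, assumes the four vectors have been aligned to the same sequence length, and uses scaled dot-product self-attention as one possible instantiation of the trained self-attention model; these choices are assumptions, not the patent's exact formulation.

```python
# Minimal sketch (PyTorch assumed) of merging V_m1..V_m4 and applying self-attention.
import torch
import torch.nn as nn

class MergeSelfAttention(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.W_c = nn.Linear(4 * dim, dim, bias=False)   # weight applied to the merged vector
        self.W_s = nn.Linear(4 * dim, dim, bias=False)   # weight for the interaction side

    def forward(self, V_m1, V_m2, V_m3, V_m4):
        V_c = torch.cat([V_m1, V_m2, V_m3, V_m4], dim=-1)        # merge: (seq, 4*dim)
        q = self.W_c(V_c)
        v = self.W_s(V_c)
        scores = q @ q.transpose(0, 1) / q.size(-1) ** 0.5       # self-attention scores
        return torch.softmax(scores, dim=-1) @ v                 # interaction information vector V_s

layer = MergeSelfAttention(256)
xs = [torch.randn(20, 256) for _ in range(4)]                    # four aligned attention vectors
print(layer(*xs).shape)  # torch.Size([20, 256])
```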
In some embodiments, the trained BTM topic model is obtained by: acquiring an article corpus training set; sampling the theme distribution on the article corpus training set through a Dirichlet distribution function; the lexical item distribution under a plurality of subjects is sampled through a Dirichlet distribution function; and extracting a target theme from the theme distribution, and extracting word pairs from the target theme to make the word pairs obey the distribution of terms under the target theme.
Specifically, a large-scale article corpus is used to construct the article corpus training set. The word distribution under each topic is sampled from the parameter $\beta$ through the Dirichlet distribution, $\phi_k \sim \mathrm{Dir}(\beta)$, $k = 1, \dots, K$, where $K$ is the number of topics and $\mathrm{Dir}(\cdot)$ denotes the Dirichlet distribution function. From the Dirichlet distribution with parameter $\alpha$, the topic distribution of the article corpus training set is sampled, $\theta \sim \mathrm{Dir}(\alpha)$. A target topic $z$ is extracted from the corpus-level parameter $\theta$; $z$ obeys $z \sim \mathrm{Mult}(\theta)$, where $\mathrm{Mult}(\cdot)$ denotes the multinomial distribution function. A word pair in the corpus is denoted $b = (w_i, w_j)$; the two words $w_i$ and $w_j$ are drawn from the extracted topic $z$ so that they obey $w_i, w_j \sim \mathrm{Mult}(\phi_z)$.
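For illustration, the following is a minimal sketch of the generative process just described, assuming NumPy; the vocabulary size, hyperparameter values and random seed are toy assumptions.

```python
# Minimal sketch (NumPy assumed) of the BTM generative process: sample topic-word
# distributions phi_k ~ Dir(beta), a corpus-level topic distribution theta ~ Dir(alpha),
# then for each biterm draw a topic z ~ Mult(theta) and two words w_i, w_j ~ Mult(phi_z).
import numpy as np

rng = np.random.default_rng(0)
K, V = 3, 8                      # number of topics, vocabulary size (toy values)
alpha, beta = 2.0, 0.5

phi = rng.dirichlet([beta] * V, size=K)     # word distribution under each topic
theta = rng.dirichlet([alpha] * K)          # topic distribution of the corpus

def sample_biterm():
    z = rng.choice(K, p=theta)                       # target topic z ~ Mult(theta)
    w_i, w_j = rng.choice(V, size=2, p=phi[z])       # word pair obeys Mult(phi_z)
    return z, (int(w_i), int(w_j))

print([sample_biterm() for _ in range(3)])
```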
In some embodiments, the topic terms are encoded to obtain topic feature vectors by: obtaining a training corpus, learning the training corpus based on a BERT model to obtain a third word vector of each single word in the training corpus, and forming a word vector library by the third word vector of each single word. And obtaining a third word vector of each single word in the topic words from the word vector library, and forming topic feature vectors by the third word vectors of each single word in the topic words. The topic feature vector obtained based on the BERT model can effectively express the semantics among the single characters in the topic words, the representation capability of the topic feature vector is improved, the topic feature vector is integrated into the interactive information vector, and the accuracy of the answer can be further improved.
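The sketch below illustrates this step: building a word-vector library from BERT outputs and looking up the vectors of each single word of a topic word. It assumes the HuggingFace transformers library and the bert-base-chinese checkpoint, and it assumes the tokenizer splits Chinese text into single characters; these are assumptions for illustration, not the patent's implementation.

```python
# Minimal sketch (HuggingFace transformers assumed) of encoding topic words with
# BERT-derived character vectors: build a {character: vector} library from a corpus,
# then stack the vectors of the topic word's single words into a topic feature vector.
import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
model = BertModel.from_pretrained("bert-base-chinese")

def char_vectors(text: str) -> dict:
    """Return a {character: vector} library from BERT's last hidden states."""
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state[0]            # (seq_len, hidden)
    chars = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
    return {c: h for c, h in zip(chars, hidden) if c not in ("[CLS]", "[SEP]")}

library = char_vectors("机器阅读理解")                             # toy training corpus
topic_word = "阅读"
topic_feature = torch.stack([library[c] for c in topic_word])     # topic feature vector
print(topic_feature.shape)
```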
In some embodiments, the non-linear output layer includes a preset hyperbolic tangent function. Calculating the interactive information vector and the topic feature vector through a nonlinear output layer to obtain answers related to the questions, wherein the answers include: and carrying out nonlinear mapping through a hyperbolic tangent function according to the interactive information vector and the theme characteristic vector to obtain an answer extracted from the article. The answer is predicted by the following formula:
$g_{IR}(d, q) = \tanh(W_s V_s + W_t V_t)$, wherein $V_s$ represents the interaction information vector, $V_t$ represents the topic feature vector, $W_s$ and $W_t$ are trained parameters, $\tanh(\cdot)$ represents the hyperbolic tangent function, and $g_{IR}(d, q)$ represents the answer related to the question q extracted from the article d.
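The following is a minimal sketch of this nonlinear output layer, assuming PyTorch; the class name and dimensions are illustrative, and how the tanh output is decoded into answer span scores is left open, as the patent does not specify it at this point.

```python
# Minimal sketch (PyTorch assumed) of the nonlinear output layer:
# g_IR(d, q) = tanh(W_s V_s + W_t V_t).
import torch
import torch.nn as nn

class NonlinearOutputLayer(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.W_s = nn.Linear(dim, dim, bias=False)   # trained parameter W_s
        self.W_t = nn.Linear(dim, dim, bias=False)   # trained parameter W_t

    def forward(self, V_s: torch.Tensor, V_t: torch.Tensor) -> torch.Tensor:
        return torch.tanh(self.W_s(V_s) + self.W_t(V_t))   # g_IR(d, q)

layer = NonlinearOutputLayer(256)
print(layer(torch.randn(20, 256), torch.randn(20, 256)).shape)  # torch.Size([20, 256])
```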
The embodiments of the present application are described and illustrated below by means of preferred embodiments.
Fig. 4 is a flow chart of a machine reading understanding method according to the preferred embodiment of the present application. As shown in fig. 4, the parameters of the models and functions in the coding layer, the multiple attention layer and the nonlinear output layer first need to be trained. The training process comprises the following steps. Product documents in a specific field are obtained from a large number of customer-service question-answer corpora; after the corpora are cleaned and segmented, word vectors are pre-trained with a BERT model to generate word vectors for the specific vertical field. The training parameters of the word vector learning model are set, including the word vector dimension, batch parameters, window size, initial learning rate, word vector matrices, auxiliary vector matrices and the like. For large-scale training data comprising a training set and a development set, the training questions and training articles in the training set are respectively segmented and encoded to obtain word-level vector representations of the training questions and of the training articles. The word-level vector representations of the training questions and training articles are then respectively calculated through a gated hole convolution layer, which comprises a plurality of layers of sequentially connected hole convolution gating units, to obtain the training question vectors and training article vectors. The training question vector and the training article vector are then input into the dot product attention model and the Concat-based attention model to obtain the training dot product attention vector $V_1$ and training Concat attention vector $V_3$ of the training question about the training article, and the training dot product attention vector $V_2$ and training Concat attention vector $V_4$ of the training article about the training question. $V_1$, $V_2$, $V_3$ and $V_4$ are merged to obtain a training merged vector $V_{c\text{-}train}$, and $V_{c\text{-}train}$ is calculated through the self-attention model to obtain the training interaction information vector. The training articles are calculated through the trained BTM topic model to obtain training topic words, and the training topic words are encoded to obtain training topic feature vectors. The training interaction information vector and the training topic feature vector are combined nonlinearly for answer prediction. Finally, the parameters of the models and functions in the coding layer, the multiple attention layer and the nonlinear output layer are trained on the training set, the parameters with the best F1 score on the development set are selected and saved, and training is complete.
Then, the received question and article are input into the models and functions with trained parameters, so that the correct answer to the question can be extracted from the article. The prediction process comprises the following steps. The received question and article are respectively converted into a question vector $V_q$ and an article vector $V_d$ through the BERT model and the gated hole convolution layer with trained parameters in the coding layer. The question vector $V_q$ and the article vector $V_d$ are input into the trained dot product attention model and Concat-based attention model in the multiple attention layer to obtain the first vector $V_{m1}$, second vector $V_{m2}$, third vector $V_{m3}$ and fourth vector $V_{m4}$; after $V_{m1}$, $V_{m2}$, $V_{m3}$ and $V_{m4}$ are merged, the merged vector is input into the trained self-attention model to obtain the interaction information vector $V_s$. The article is calculated through the trained BTM topic model and then encoded to obtain the topic feature vector $V_t$. From the topic feature vector $V_t$ and the interaction information vector $V_s$, the answer related to the question is obtained from the article through the hyperbolic tangent function with trained parameters in the nonlinear output layer.
The embodiment of the application provides a machine reading understanding device. Fig. 5 is a schematic structural diagram of a machine-readable understanding apparatus according to an embodiment of the present application, and as shown in fig. 5, the apparatus includes an encoding module 510, a multi-attention module 520, a topic obtaining module 530, and an answer calculating module 540: the encoding module 510 is configured to receive a question and an article, convert the question into a question vector through encoding layer calculation, and convert the article into an article vector through the encoding layer calculation; the multiple attention module 520 is configured to calculate the question vector and the article vector through multiple attention layers to obtain an interaction information vector, where the multiple attention layers include a trained self-attention model and multiple attention matching models; the topic acquisition module 530 is configured to calculate the article through a trained BTM topic model to obtain topic words, and encode the topic words to obtain topic feature vectors; the answer calculating module 540 is configured to calculate the interaction information vector and the topic feature vector through a non-linear output layer to obtain an answer related to the question.
For specific limitations of the machine reading understanding apparatus, reference may be made to the above limitations of the machine reading understanding method, which are not described herein again. The various modules in the machine reading and understanding apparatus described above may be implemented in whole or in part by software, hardware, and combinations thereof. The modules can be embedded in a hardware form or independent from a processor in the computer device, and can also be stored in a memory in the computer device in a software form, so that the processor can call and execute operations corresponding to the modules.
An embodiment of the present application further provides an electronic device, which includes a memory and a processor, where the memory stores a computer program, and the processor is configured to execute the computer program to perform the steps in any of the method embodiments described above.
Optionally, the electronic device may further include a transmission device and an input/output device, wherein the transmission device is connected to the processor, and the input/output device is connected to the processor.
It should be noted that, for specific examples in this embodiment, reference may be made to examples described in the foregoing embodiments and optional implementations, and details of this embodiment are not described herein again.
In addition, in combination with the machine reading understanding method in the foregoing embodiments, the embodiments of the present application may provide a storage medium to implement. The storage medium having stored thereon a computer program; the computer program, when executed by a processor, implements any of the machine-readable understanding methods of the above embodiments.
In an embodiment, fig. 6 is a schematic internal structure diagram of an electronic device according to an embodiment of the present application, and as shown in fig. 6, there is provided an electronic device, which may be a server, and its internal structure diagram may be as shown in fig. 6. The electronic device includes a processor, a memory, a network interface, and a database connected by a system bus. Wherein the processor of the electronic device is configured to provide computing and control capabilities. The memory of the electronic equipment comprises a nonvolatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for the operation of an operating system and computer programs in the non-volatile storage medium. The database of the electronic device is used for storing data. The network interface of the electronic device is used for connecting and communicating with an external terminal through a network. The computer program is executed by a processor to implement a machine-readable understanding method.
Those skilled in the art will appreciate that the configuration shown in fig. 6 is a block diagram of only a portion of the configuration associated with the present application, and does not constitute a limitation on the electronic device to which the present application is applied, and a particular electronic device may include more or less components than those shown in the drawings, or may combine certain components, or have a different arrangement of components.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by hardware instructions of a computer program, which can be stored in a non-volatile computer-readable storage medium, and when executed, can include the processes of the embodiments of the methods described above. Any reference to memory, storage, database, or other medium used in the embodiments provided herein may include non-volatile and/or volatile memory, among others. Non-volatile memory can include read-only memory (ROM), Programmable ROM (PROM), Electrically Programmable ROM (EPROM), Electrically Erasable Programmable ROM (EEPROM), or flash memory. Volatile memory can include Random Access Memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as Static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), Double Data Rate SDRAM (DDRSDRAM), Enhanced SDRAM (ESDRAM), Synchronous Link DRAM (SLDRAM), Rambus Direct RAM (RDRAM), direct bus dynamic RAM (DRDRAM), and memory bus dynamic RAM (RDRAM).
It should be understood by those skilled in the art that various features of the above-described embodiments can be combined in any combination, and for the sake of brevity, all possible combinations of features in the above-described embodiments are not described in detail, but rather, all combinations of features which are not inconsistent with each other should be construed as being within the scope of the present disclosure.
The above-mentioned embodiments only express several embodiments of the present application, and the description thereof is more specific and detailed, but not construed as limiting the scope of the invention. It should be noted that, for a person skilled in the art, several variations and modifications can be made without departing from the concept of the present application, which falls within the scope of protection of the present application. Therefore, the protection scope of the present patent shall be subject to the appended claims.

Claims (10)

1. A machine-readable understanding method, the method comprising:
receiving a question and an article, converting the question into a question vector through calculation of a coding layer, and converting the article into an article vector through calculation of the coding layer;
calculating the question vector and the article vector through a multiple attention layer to obtain an interaction information vector, wherein the multiple attention layer comprises a trained self-attention model and a plurality of attention matching models;
calculating the article through a trained BTM topic model to obtain topic words, and coding the topic words to obtain topic feature vectors;
and calculating the interaction information vector and the theme characteristic vector through a nonlinear output layer to obtain an answer related to the question.
2. The method of claim 1, wherein the coding layer comprises a BERT model and a preset gated hole convolution layer, and wherein converting the question into a question vector through coding layer calculation comprises:
learning the question based on the BERT model to obtain a first word vector of each single word in the question, and forming a first intermediate vector from the first word vectors of the single words;
calculating the first intermediate vector through the gated hole convolution layer to obtain the question vector;
the converting the article into an article vector through the coding layer calculation comprises:
learning the article based on the BERT model to obtain a second word vector of each single word in the article, and forming a second intermediate vector from the second word vectors of the single words;
and calculating the second intermediate vector through the gated hole convolution layer to obtain the article vector.
3. The method according to claim 2, wherein the gated hole convolution layer comprises a plurality of layers of sequentially connected hole convolution gate units, each hole convolution gate unit is configured to sample an input vector at intervals to obtain an output vector, and use the output vector as an input vector of a next layer of hole convolution gate units.
4. The method of claim 1, wherein in a case where the plurality of attention matching models includes a dot product attention model and a Concat-based attention model, the calculating the question vector and the article vector through multiple attention layers to obtain an interaction information vector comprises:
calculating the question vector and the article vector through a trained dot product attention model to obtain a first vector and a second vector, wherein the first vector is the dot product attention vector of the question about the article, and the second vector is the dot product attention vector of the article about the question;
calculating the question vector and the article vector through a trained Concat-based attention model to obtain a third vector and a fourth vector, wherein the third vector is a Concat attention vector of the question about the article, and the fourth vector is a Concat attention vector of the article about the question;
merging the first vector, the second vector, the third vector and the fourth vector to obtain a merged vector;
and calculating the merged vector through the trained self-attention model to obtain the interactive information vector.
5. The method of claim 1, wherein the trained BTM topic model is obtained by:
acquiring an article corpus training set;
sampling the theme distribution on the article corpus training set through a Dirichlet distribution function;
sampling the distribution of terms under a plurality of subjects through the Dirichlet distribution function;
and extracting a target theme from the theme distribution, and extracting word pairs from the target theme to make the word pairs obey the term distribution under the target theme.
6. The method of claim 1, wherein encoding the topic terms into topic feature vectors comprises:
acquiring a training corpus, learning the training corpus based on a BERT model to obtain a third word vector of each single word in the training corpus, and forming a word vector library by the third word vector of each single word;
and obtaining a third word vector of each single word in the topic words from the word vector library, and forming the topic feature vector by the third word vector of each single word of the topic words.
7. The method of claim 1, wherein the nonlinear output layer comprises a preset hyperbolic tangent function, and wherein the calculating the interaction information vector and the topic feature vector through the nonlinear output layer to obtain the answer related to the question comprises:
and according to the interaction information vector and the theme characteristic vector, carrying out nonlinear mapping through the hyperbolic tangent function to obtain the answer extracted from the article.
8. A machine reading understanding apparatus, the apparatus comprising: the system comprises a coding module, a multi-attention module, a theme acquisition module and an answer calculation module;
the encoding module is used for receiving questions and articles, converting the questions into question vectors through encoding layer calculation, and converting the articles into article vectors through the encoding layer calculation;
the multi-attention module is used for calculating the question vector and the article vector through a multi-attention layer to obtain an interactive information vector, wherein the multi-attention layer comprises a trained self-attention model and a plurality of attention matching models;
the topic acquisition module is used for calculating the article through a trained BTM topic model to obtain topic words, and coding the topic words to obtain topic feature vectors;
and the answer calculation module is used for calculating the interaction information vector and the theme characteristic vector through a nonlinear output layer to obtain an answer related to the question.
9. An electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor implements the machine reading understanding method of any of claims 1 to 7 when executing the computer program.
10. A computer storage medium on which a computer program is stored, the program, when executed by a processor, implementing a machine reading understanding method according to any one of claims 1 to 7.
CN202010955175.3A 2020-09-11 2020-09-11 Machine reading understanding method and device, electronic equipment and computer storage medium Pending CN112183085A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010955175.3A CN112183085A (en) 2020-09-11 2020-09-11 Machine reading understanding method and device, electronic equipment and computer storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010955175.3A CN112183085A (en) 2020-09-11 2020-09-11 Machine reading understanding method and device, electronic equipment and computer storage medium

Publications (1)

Publication Number Publication Date
CN112183085A true CN112183085A (en) 2021-01-05

Family

ID=73920607

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010955175.3A Pending CN112183085A (en) 2020-09-11 2020-09-11 Machine reading understanding method and device, electronic equipment and computer storage medium

Country Status (1)

Country Link
CN (1) CN112183085A (en)

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109657226A (en) * 2018-09-20 2019-04-19 北京信息科技大学 The reading of multi-joint knot attention understands model, system and method
CN109460553A (en) * 2018-11-05 2019-03-12 中山大学 A kind of machine reading understanding method based on thresholding convolutional neural networks
CN109492227A (en) * 2018-11-16 2019-03-19 大连理工大学 It is a kind of that understanding method is read based on the machine of bull attention mechanism and Dynamic iterations
CN109657246A (en) * 2018-12-19 2019-04-19 中山大学 A kind of extraction-type machine reading based on deep learning understands the method for building up of model
CN109739986A (en) * 2018-12-28 2019-05-10 合肥工业大学 A kind of complaint short text classification method based on Deep integrating study
CN110134771A (en) * 2019-04-09 2019-08-16 广东工业大学 A kind of implementation method based on more attention mechanism converged network question answering systems
CN110309305A (en) * 2019-06-14 2019-10-08 中国电子科技集团公司第二十八研究所 Machine based on multitask joint training reads understanding method and computer storage medium
CN110334184A (en) * 2019-07-04 2019-10-15 河海大学常州校区 The intelligent Answer System understood is read based on machine
CN110619123A (en) * 2019-09-19 2019-12-27 电子科技大学 Machine reading understanding method
CN110633472A (en) * 2019-09-19 2019-12-31 电子科技大学 Article and question fusion method based on attention and aggregation mechanism

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112966499A (en) * 2021-03-17 2021-06-15 中山大学 Question and answer matching method based on self-adaptive fusion multi-attention network
CN113705664A (en) * 2021-08-26 2021-11-26 南通大学 Model, training method and surface electromyographic signal gesture recognition method
CN113705664B (en) * 2021-08-26 2023-10-24 南通大学 Model, training method and surface electromyographic signal gesture recognition method
CN114492451A (en) * 2021-12-22 2022-05-13 马上消费金融股份有限公司 Text matching method and device, electronic equipment and computer readable storage medium
CN114492451B (en) * 2021-12-22 2023-10-24 马上消费金融股份有限公司 Text matching method, device, electronic equipment and computer readable storage medium
CN114398976A (en) * 2022-01-13 2022-04-26 福州大学 Machine reading understanding method based on BERT and gate control type attention enhancement network
CN114398976B (en) * 2022-01-13 2024-06-07 福州大学 Machine reading and understanding method based on BERT and gating type attention enhancement network
CN114564562A (en) * 2022-02-22 2022-05-31 平安科技(深圳)有限公司 Question generation method, device and equipment based on answer guidance and storage medium
CN114564562B (en) * 2022-02-22 2024-05-14 平安科技(深圳)有限公司 Question generation method, device, equipment and storage medium based on answer guidance
CN115169367A (en) * 2022-09-06 2022-10-11 杭州远传新业科技股份有限公司 Dialogue generating method and device, and storage medium
CN115169367B (en) * 2022-09-06 2022-12-09 杭州远传新业科技股份有限公司 Dialogue generating method and device, and storage medium

Similar Documents

Publication Publication Date Title
CN109947912B (en) Model method based on intra-paragraph reasoning and joint question answer matching
CN112183085A (en) Machine reading understanding method and device, electronic equipment and computer storage medium
CN108427771B (en) Abstract text generation method and device and computer equipment
CN109597891B (en) Text emotion analysis method based on bidirectional long-and-short-term memory neural network
CN111783474B (en) Comment text viewpoint information processing method and device and storage medium
CN107798140B (en) Dialog system construction method, semantic controlled response method and device
CN110928997A (en) Intention recognition method and device, electronic equipment and readable storage medium
Ferrer-i-Cancho et al. Optimal coding and the origins of Zipfian laws
CN110598779A (en) Abstract description generation method and device, computer equipment and storage medium
CN110969020A (en) CNN and attention mechanism-based Chinese named entity identification method, system and medium
CN111966812B (en) Automatic question answering method based on dynamic word vector and storage medium
CN111680494A (en) Similar text generation method and device
CN112560502B (en) Semantic similarity matching method and device and storage medium
CN110795944A (en) Recommended content processing method and device, and emotion attribute determining method and device
CN112528637A (en) Text processing model training method and device, computer equipment and storage medium
CN111259113A (en) Text matching method and device, computer readable storage medium and computer equipment
CN114707005B (en) Knowledge graph construction method and system for ship equipment
CN113536795A (en) Method, system, electronic device and storage medium for entity relation extraction
CN115795044A (en) Knowledge injection-based user relationship mining method and device
CN116050352A (en) Text encoding method and device, computer equipment and storage medium
CN110298046B (en) Translation model training method, text translation method and related device
CN111723572B (en) Chinese short text correlation measurement method based on CNN convolutional layer and BilSTM
CN113806646A (en) Sequence labeling system and training system of sequence labeling model
CN111783430A (en) Sentence pair matching rate determination method and device, computer equipment and storage medium
CN116109980A (en) Action recognition method based on video text matching

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination