CN113033213A - Method and device for analyzing text information by using attention model and electronic equipment

Info

Publication number
CN113033213A
Authority
CN
China
Prior art keywords
context
layer
feature
coding
word
Prior art date
Legal status
Pending
Application number
CN202110445332.0A
Other languages
Chinese (zh)
Inventor
焦勇博
于洋
杨丝雨
李钰
Current Assignee
Industrial and Commercial Bank of China Ltd ICBC
Original Assignee
Industrial and Commercial Bank of China Ltd ICBC
Priority date
Filing date
Publication date
Application filed by Industrial and Commercial Bank of China Ltd (ICBC)
Priority to CN202110445332.0A
Publication of CN113033213A
Legal status: Pending

Classifications

    • G06F 40/30: Handling natural language data; semantic analysis
    • G06F 18/22: Pattern recognition; matching criteria, e.g. proximity measures
    • G06F 18/2415: Pattern recognition; classification techniques based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G06N 3/044: Neural networks; recurrent networks, e.g. Hopfield networks
    • G06N 3/045: Neural networks; combinations of networks
    • G06N 3/048: Neural networks; activation functions
    • G06N 3/084: Neural network learning methods; backpropagation, e.g. using gradient descent
    • G06N 5/04: Computing arrangements using knowledge-based models; inference or reasoning models

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Biomedical Technology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Machine Translation (AREA)

Abstract

The present disclosure provides a method, an apparatus, and an electronic device for analyzing text information using an attention model, which can be applied to artificial intelligence, finance, and other fields. The attention model comprises an input layer, a context coding layer, an information interaction layer, an information inference layer, and an output layer. The method comprises the following steps: converting the question word and the context sequence included in the input text information into a question vector Q and a context vector P, respectively, using the input layer; encoding and extracting features from the question vector and the context vector, respectively, using the context coding layer to obtain an encoded question feature and an encoded context feature; fusing the encoded question feature U and the encoded context feature H using the information interaction layer to obtain a context feature Z for the question word; obtaining an encoded context paragraph feature M for the context feature Z using the information inference layer; and processing the encoded context paragraph feature using the output layer to obtain and output the position of the question word in the context sequence.

Description

Method and device for analyzing text information by using attention model and electronic equipment
Technical Field
The present disclosure relates to the field of artificial intelligence technology, and more particularly, to a method, apparatus, electronic device, computer-readable storage medium, and computer program product for analyzing textual information using an attention model.
Background
With the development of deep learning technology, the attention mechanism has become a mainstream method for designing machine reading comprehension models. In attention models of the related art, an attention mechanism is used to calculate the similarity between a context and question pair, and feature representations of the context paragraph are constructed on the basis of the similarity matrix, realizing semantic interaction between the context and the question and effectively improving the text representation capability of the model. However, in these attention-based model architectures, the attention operations of the multiple layers are relatively independent, and the attention operation of each layer does not consider the influence of the attention of the previous layer. Therefore, in a multi-layer attention framework, the distributions of multiple layers of attention over the same text lead to attention redundancy. In addition, the multiple layers of attention may fail to focus on the important content of the text, causing a lack of attention to important information.
Disclosure of Invention
In view of the above, the present disclosure provides a method and apparatus for analyzing text information using an attention model, an electronic device, and a computer-readable storage medium. Based on the question word and the context sequence of the input text information, a multi-layer attention model realizes the information interaction between the question word and the context sequence, and the position in the context sequence where the question word is located is obtained and output.
One aspect of the present disclosure provides a method of analyzing text information using an attention model including an input layer, a context coding layer, an information interaction layer, an information inference layer, and an output layer, the method including: converting a question word and a context sequence included in input text information into a question vector Q and a context vector P, respectively, using the input layer; encoding and extracting features from the question vector Q and the context vector P, respectively, using the context coding layer to obtain an encoded question feature U and an encoded context feature H; fusing the encoded question feature U and the encoded context feature H using the information interaction layer to obtain a context feature Z for the question word; obtaining an encoded context paragraph feature M for the context feature Z using the information inference layer; and processing the encoded context paragraph feature M using the output layer to obtain and output the position of the question word in the context sequence.
According to an embodiment of the present disclosure, the context coding layer includes a plurality of bidirectional Transformer modules, each bidirectional Transformer module including a self-attention module, a feedforward neural network, and an output module.
According to an embodiment of the present disclosure, the information interaction layer includes a plurality of stacked layers of attention modules, each layer of attention module including a question and context alignment module, a context self-alignment module, and a context inference module; fusing the encoded question feature U and the encoded context feature H using the information interaction layer to obtain the context feature Z for the question word includes: obtaining a context paragraph representation V for the question word according to the encoded question feature U and the encoded context feature H using the question and context alignment module; obtaining a feature representation H̃ of each word in the context sequence according to the encoded context feature H and the context paragraph representation V using the context self-alignment module; and obtaining the context feature representation Z for the question word from the per-word feature representation H̃ using the context inference module.
According to an embodiment of the present disclosure, obtaining the context paragraph representation V for the question word according to the encoded question feature U and the encoded context feature H using the question and context alignment module includes: calculating a first similarity matrix E between the encoded question feature U and the encoded context feature H; calculating an attention distribution Softmax(E) of each word in the context sequence with respect to the question word according to the similarity matrix E; and calculating the context paragraph representation V for the question word according to the attention distribution Softmax(E).
According to an embodiment of the present disclosure, calculating the context paragraph representation V for the question word according to the attention distribution includes: calculating a context paragraph representation matrix Ṽ of each word in the context sequence with respect to the question word according to the encoded question feature U and the attention distribution Softmax(E); and calculating the context paragraph representation V for the question word by weighted summation over the elements of the context paragraph representation matrix Ṽ.
According to an embodiment of the present disclosure, computing the feature representation H̃ of each word in the context sequence based on the encoded context feature H and the context paragraph representation V includes: calculating a similarity matrix S between the words in the context sequence; calculating an attention distribution Softmax(S) of each word in the context sequence with respect to the context sequence according to the similarity matrix S; and calculating the feature representation H̃ of each word in the context sequence from the attention distribution Softmax(S).
According to an embodiment of the present disclosure, for the multi-layer attention module, the similarity matrix E_j and the similarity matrix S_j of the attention module at the j-th layer are calculated according to the similarity matrix E_{j-1} and the similarity matrix S_{j-1} of the attention at the (j-1)-th layer, where j is greater than or equal to 1 and less than or equal to J, and J is the total number of layers of the attention module.
According to an embodiment of the present disclosure, the method further comprises: determining at least one question corresponding to the question word using the position of the question word in the sequence of contexts to provide a response to the at least one question.
Another aspect of the present disclosure provides an apparatus for analyzing text information using an attention model including an input layer, a context coding layer, an information interaction layer, an information inference layer, and an output layer, the apparatus including:
the system comprises a first module, a second module and a third module, wherein the first module is used for converting a question word and a context sequence included in input text information into a question vector Q and a context vector P respectively by using an input layer; and
the second module is used for respectively coding the problem vector and the context vector by using the context coding layer and extracting the features to obtain coding problem features and coding context features;
a third module, configured to fuse the coding question feature U and the coding context feature H using the information interaction layer to obtain a context feature Z for the question word;
a fourth module for obtaining an encoded context paragraph feature M for the context feature Z using the information inference layer; and
a fifth module, configured to process the encoded context paragraph feature using the output layer to obtain and output a position of the question word in the context sequence.
Another aspect of the present disclosure provides an electronic device including: one or more processors; storage means for storing one or more programs, wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to perform the method as described above.
Another aspect of the disclosure provides a computer-readable storage medium having stored thereon executable instructions that, when executed by a processor, cause the processor to perform a method as described above.
Another aspect of the disclosure provides a computer program product, wherein the product stores a computer program, which when executed is capable of implementing the method according to the above.
According to the embodiments of the present disclosure, the problems of attention deficiency and attention redundancy of reading comprehension models in the related art can be at least partially solved: the position in the context sequence of the content related to the question word generated according to the method is more accurate, and text semantic features can be extracted more precisely.
Drawings
The above and other objects, features and advantages of the present disclosure will become more apparent from the following description of embodiments of the present disclosure with reference to the accompanying drawings, in which:
FIG. 1 schematically illustrates an application scenario of a method and apparatus for analyzing textual information using an attention model according to an embodiment of the present disclosure;
FIG. 2A schematically illustrates a flow diagram of a method of analyzing textual information using an attention model, in accordance with an embodiment of the present disclosure;
FIG. 2B schematically illustrates a structural schematic of an attention model according to an embodiment of the disclosure;
FIG. 3A schematically illustrates a structural diagram of an attention module in an attention model according to an embodiment of the present disclosure;
FIG. 3B schematically illustrates a flow diagram for fusing an encoding problem feature and an encoding context feature to obtain a problem-specific context feature using an information interaction layer in accordance with an embodiment of the present disclosure;
FIG. 3C schematically illustrates a flow chart for obtaining a context paragraph representation for a problem word according to a method of analyzing textual information using an attention model according to an embodiment of the present disclosure;
FIG. 3D schematically illustrates a flow chart for computing a context paragraph representation for a problem word from an attention distribution according to an embodiment of the present disclosure;
FIG. 3E schematically illustrates a flow diagram for computing a feature representation for each word in a context sequence from an encoded context feature and a context paragraph representation, according to an embodiment of the disclosure;
FIG. 4 schematically shows a structural diagram of a bidirectional Transformer module in an attention model according to an embodiment of the disclosure;
FIG. 5 schematically illustrates a block diagram of an apparatus for analyzing textual information using an attention model, in accordance with an embodiment of the present disclosure;
fig. 6 schematically shows a block diagram of an electronic device according to an embodiment of the disclosure.
Detailed Description
Hereinafter, embodiments of the present disclosure will be described with reference to the accompanying drawings. It should be understood that the description is illustrative only and is not intended to limit the scope of the present disclosure. In the following detailed description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the embodiments of the disclosure. It may be evident, however, that one or more embodiments may be practiced without these specific details. Moreover, in the following description, descriptions of well-known structures and techniques are omitted so as to not unnecessarily obscure the concepts of the present disclosure.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. The terms "comprises," "comprising," and the like, as used herein, specify the presence of stated features, steps, operations, and/or components, but do not preclude the presence or addition of one or more other features, steps, operations, or components.
All terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art unless otherwise defined. It is noted that the terms used herein should be interpreted as having a meaning that is consistent with the context of this specification and should not be interpreted in an idealized or overly formal sense.
Where a convention analogous to "at least one of A, B and C, etc." is used, in general such a construction is intended in the sense one having skill in the art would understand the convention (e.g., "a system having at least one of A, B and C" would include but not be limited to systems that have A alone, B alone, C alone, A and B together, A and C together, B and C together, and/or A, B and C together, etc.). The terms "first", "second", etc. are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, features defined as "first" or "second" may explicitly or implicitly include one or more of the described features.
The term "highway network" refers to an architecture design method for constructing a deep neural network, which can help to construct a deeper neural network without difficult model training, and the highway network adjusts the information flow proportion through a gating function, so that information can flow in a multilayer network without attenuation, and helps to quickly converge the model.
A customer's interaction with a telephone bank may be converted from an audio file into a text file via ASR techniques. Machine reading comprehension techniques can be used to extract the customer's intent from such text, completing the collection and statistics of customer intent. This is especially important for dialog scenarios that have not been configured. At present, for scenarios that cannot be recognized from the customer's utterance, a common approach is to play a scripted prompt reminding the customer to choose from the existing scenario dialogs. Machine reading comprehension technology, by contrast, can extract dialog scenarios that interest the user but have not yet been implemented. For example, the audio content fed back by the customer is converted into a text file, the customer's question and the context information are extracted through machine reading comprehension, and the text information for the question is displayed to the user.
Embodiments of the present disclosure provide a method, apparatus, electronic device, and medium for analyzing text information using an attention model. The method converts the question word and the context sequence in the input text information into a question vector Q and a context vector P, extracts an encoded question feature U and an encoded context feature H, further obtains a context feature Z for the question word, and finally obtains the position of the question word in the context sequence. The method of the present disclosure addresses the attention deficiency and redundancy problems of reading comprehension models in the related art: the position in the context sequence of the content related to the question word generated according to the method is more accurate, and text semantic features can be extracted more precisely.
Fig. 1 schematically illustrates an exemplary system architecture 100 that may be applied to a method of analyzing textual information using an attention model according to an embodiment of the disclosure. It should be noted that fig. 1 is only an example of a system architecture to which the embodiments of the present disclosure may be applied to help those skilled in the art understand the technical content of the present disclosure, and does not mean that the embodiments of the present disclosure may not be applied to other devices, systems, environments or scenarios.
It should be noted that the method, the apparatus, the electronic device, and the storage medium for analyzing the text information using the attention model provided in the embodiments of the present disclosure may be used in the technical field of artificial intelligence and the financial field, and may also be used in other fields outside the financial field, such as the technical field of machine learning.
As shown in fig. 1, the system architecture 100 according to this embodiment may include terminal devices 101, 102, 103, a network 104 and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. Network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
The user may use the terminal devices 101, 102, 103 to interact with the server 105 via the network 104 to receive or send messages or the like. The terminal devices 101, 102, 103 may have installed thereon various communication client applications, such as shopping-like applications, web browser applications, search-like applications, instant messaging tools, mailbox clients, social platform software, etc. (by way of example only).
The terminal devices 101, 102, 103 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smart phones, tablet computers, laptop portable computers, desktop computers, and the like.
The server 105 may be a server providing various services, such as a background management server (for example only) providing support for websites browsed by users using the terminal devices 101, 102, 103. The background management server may analyze and perform other processing on the received data such as the user request, and feed back a processing result (e.g., a webpage, information, or data obtained or generated according to the user request) to the terminal device.
It should be noted that the method for analyzing text information using an attention model provided by the embodiment of the present disclosure may be generally performed by the server 105. Accordingly, the apparatus for analyzing text information using an attention model provided by the embodiments of the present disclosure may be generally disposed in the server 105. The method for analyzing text information using an attention model provided by the embodiments of the present disclosure may also be performed by a server or a server cluster different from the server 105 and capable of communicating with the terminal devices 101, 102, 103 and/or the server 105. Accordingly, the apparatus for analyzing text information using attention model provided by the embodiment of the present disclosure may also be disposed in a server or a server cluster different from the server 105 and capable of communicating with the terminal devices 101, 102, 103 and/or the server 105.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
FIG. 2A schematically illustrates a flow diagram of a method of analyzing textual information using an attention model, in accordance with an embodiment of the disclosure.
Fig. 2B schematically illustrates a structural schematic of an attention model according to an embodiment of the disclosure. As shown in FIG. 2B, the attention model 200 includes an input layer 210, a context coding layer 220, an information interaction layer 230, an information inference layer 240, and an output layer 250.
A method for analyzing text information using an attention model according to an embodiment of the present disclosure will be described in detail below with reference to fig. 2A and 2B.
As shown in fig. 2A, the flow of the method of analyzing text information using an attention model of the embodiment of the present disclosure includes operations S201 to S205.
In operation S201, the question word and the context sequence included in the input text information are converted into a question vector Q and a context vector P, respectively, using the input layer 210.
The question vector may be, for example, a word vector or a sentence vector. A word vector converts a word or phrase into a mathematical representation in a high-dimensional continuous vector space, and includes one-hot representations and distributed word vectors. A sentence vector represents sentences of different lengths with fixed-length vectors and serves downstream NLP tasks.
According to the embodiment of the disclosure, the problem words and the context sequences in the input text information are converted into the problem vector Q and the context vector P, so that the context global information can be better considered, and more effective semantic reasoning can be carried out.
In the input layer 210, the text sequence is encoded at multiple granularities, such as word vectors, character embedding vectors, position embedding vectors, and part-of-speech embedding vectors. First, the input text sequence is embedded using pre-trained 300-dimensional GloVe word vectors, yielding a fixed-dimensional vector representation for each word of the input text sequence. Meanwhile, a one-dimensional convolutional neural network (CNN) encodes the characters of each word: each character is mapped to a high-dimensional vector, the dimension of the character embedding equals the number of channels of the convolution kernels, and finally max pooling over the width is applied to obtain the character embedding vector of the word.
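A minimal PyTorch sketch of the character-level encoding described above; the character vocabulary size, embedding width, kernel size, and channel count are illustrative assumptions.

import torch
import torch.nn as nn

class CharCNNEmbedding(nn.Module):
    # Maps each character to a vector, applies a 1-D convolution over the
    # characters of each word, and max-pools over the word width, so the
    # character embedding dimension equals the number of kernel channels.
    def __init__(self, n_chars=128, char_dim=16, n_filters=100, kernel=5):
        super().__init__()
        self.char_emb = nn.Embedding(n_chars, char_dim)
        self.conv = nn.Conv1d(char_dim, n_filters, kernel_size=kernel, padding=kernel // 2)

    def forward(self, char_ids: torch.Tensor) -> torch.Tensor:
        # char_ids: (batch, seq_len, max_word_len) integer character indices
        b, n, w = char_ids.shape
        x = self.char_emb(char_ids.view(b * n, w))   # (b*n, w, char_dim)
        x = self.conv(x.transpose(1, 2))             # (b*n, n_filters, w)
        x = torch.relu(x).max(dim=-1).values         # max pooling over the width
        return x.view(b, n, -1)                      # (batch, seq_len, n_filters)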
In addition, position embedding vectors of the text sequence are obtained using the absolute and relative position embedding methods used in the Transformer model. The input text sequence is part-of-speech tagged, and the part-of-speech sequence of the text is input into a one-layer convolutional neural network (CNN), which encodes the part of speech of each word to obtain a part-of-speech embedded representation. The position embedding vector, the part-of-speech embedding vector, and the word vector of the sequence are then added, and the word vector and the character vector of each word are concatenated along the vector dimension, finally yielding the embedded representations of the question and of the context paragraph, X_Q and X_P, where d is the word embedding dimension.

Through this simple addition and concatenation of word vectors, the input text sequence is converted into a feature-embedded representation containing semantic information. To better fuse the text feature information, the text feature representation is input into a two-layer highway network, whose processing outputs the feature sequences of the question and of the context paragraph, respectively.
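Putting the pieces of the input layer together, a sketch under stated assumptions (the tensor shapes and the helper Highway from the sketch above are illustrative, not the disclosure's exact configuration):

import torch
import torch.nn as nn

def build_input_representation(word_vec, pos_vec, tag_vec, char_vec, highway):
    # word_vec, pos_vec, tag_vec: (batch, seq_len, d) word, position, and
    # part-of-speech embeddings; char_vec: (batch, seq_len, d_char) from the
    # character CNN; highway: a two-layer highway network, e.g.
    # nn.Sequential(Highway(d + d_char), Highway(d + d_char)).
    summed = word_vec + pos_vec + tag_vec       # add position and POS embeddings to the word vector
    x = torch.cat([summed, char_vec], dim=-1)   # concatenate word and character features
    return highway(x)                           # highway fusion yields the feature sequence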
in operation S202, the problem vector Q and the context vector P are respectively encoded and feature extracted by using the context encoding layer 220, so as to obtain an encoding problem feature U and an encoding context feature H.
For example, a transform model based on attention mechanism is used as the feature encoder. Respectively inputting the text feature representation output by the high-speed network into an encoder consisting of three layers of bidirectional transducer modules, wherein each layer of transducer module consists of a self-attention module, a feedforward neural network and an output module, and performing residual error connection and layer standardization operation on the output part of the modules. Context coding is carried out on text characteristics through a three-layer bidirectional Transformer module, and finally coding problem characteristics U and coding context characteristics H are obtained, wherein
Figure BDA0003035349340000096
And
Figure BDA0003035349340000097
according to the embodiment of the disclosure, by adopting the Transformer model as the feature encoder, long text sequences can be well processed, and parallel operation can be performed. A Transformer module is introduced to serve as a question and paragraph information encoder, the feature coding capability of the model is effectively improved, and a pointer network is adopted as a candidate answer boundary predictor to return a final answer.
Fig. 3A schematically illustrates a structural diagram of an attention module in an attention model according to an embodiment of the present disclosure.
In operation S203, the information interaction layer 230 is used to fuse the encoded question feature U and the encoded context feature H to obtain the context feature Z for the question word.

For example, the information interaction layer 230 includes a stack of multiple layers of attention modules 231. The number of attention module layers may be, for example, three, four, or five, and the specific number can be set according to actual requirements. The attention modules 231 are connected in series to form the information interaction layer.

In an embodiment of the present disclosure, as shown in FIG. 3A, each layer of attention module 231 may include a question and context alignment module 310, a context self-alignment module 320, and a context inference module 330. The input of the question and context alignment module 310 is connected to the output of the context coding layer 220 to obtain the encoded question feature U and the encoded context feature H from the context coding layer 220. The output of the question and context alignment module 310 is connected to the input of the context self-alignment module 320, and the output of the context self-alignment module 320 is connected to the context inference module 330. In an embodiment of the present disclosure, the question and context alignment module 310, the context self-alignment module 320, and the context inference module 330 are connected in series.
In the disclosed embodiment, the multi-layer attention module is formed by stacking a plurality of single-layer attention modules 231, and the operations of the single-layer attention modules are connected using a multi-level alignment attention mechanism, calculated according to equation (1).

For example, taking a connection of three attention-alignment layers as an example:

Z_1, E_1, S_1 = Align_1(U, H)
Z_2, E_2, S_2 = Align_2(U, H, Z_1, E_1, S_1)
Z_3, E_3, S_3 = Align_3(U, H, Z_2, E_2, S_2)    (1)

where Align_l denotes the l-th attention-alignment layer. Except for the similarity matrices E_1 and S_1 of the first layer, the similarity matrices of the remaining layers are updated using equation (2) and equation (3). Meanwhile, a residual connection Z_l = Align_l(Z_{l-1}) + Z_{l-1} is used in each layer to obtain the feature output of each layer.

Equation (2) (rendered as an image in the original filing) updates the question-context similarity matrix, where σ(·) is the sigmoid function and W_g1 is a trainable parameter matrix.

Equation (3) (likewise an image in the original filing) updates the context self-similarity matrix, where W_g2 is a trainable parameter matrix and the update takes as input the context sequence feature representation output by the previous attention layer.
According to the embodiment of the disclosure, in the multi-layer attention module, by aligning the question and the context sequence, the extracted context sequence high-dimensional feature is also aligned with the context sequence feature. By the method, the context sequence information can be continuously supplemented to the multi-layer attention operation, and the loss of real context characteristic information during deep attention operation is avoided.
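The serial connection of equation (1) and the gated reuse of similarity matrices can be sketched as follows; since equations (2) and (3) are images in the original filing, the sigmoid-gated mixture below is an assumption for illustration, not the patent's exact update rule.

import torch

def gated_update(prev, new, W_g):
    # Plausible form of the similarity-matrix update: a sigmoid gate mixes
    # the previous layer's matrix with the freshly computed one (assumption).
    g = torch.sigmoid(prev @ W_g)        # gate derived from the previous layer's matrix
    return g * new + (1.0 - g) * prev    # keeps earlier attention information flowing

def align_stack(align_layers, U, H):
    # Serial connection per equation (1), with the residual connection
    # Z_l = Align_l(Z_{l-1}) + Z_{l-1} described above.
    Z = E = S = None
    for align in align_layers:
        Z_new, E, S = align(U, H, Z, E, S)
        Z = Z_new if Z is None else Z_new + Z   # residual connection across layers
    return Z, E, S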
Fig. 3B schematically illustrates a flow diagram for fusing an encoding problem feature and an encoding context feature to obtain a problem-specific context feature using an information interaction layer according to an embodiment of the present disclosure.
In the embodiment of the present disclosure, the process of fusing the encoded question feature U and the encoded context feature H using the information interaction layer 230 to obtain the context feature Z for the question includes operations S301 to S303.

In operation S301, using the question and context alignment module 310, a context paragraph representation V for the question is obtained according to the encoded question feature U and the encoded context feature H.
In operation S302, using the context self-alignment module 320, the feature representation H̃ of each word in the context sequence is obtained according to the encoded context feature H and the context paragraph representation V.

In operation S303, using the context inference module 330, the context feature representation Z for the question is obtained from the per-word feature representation H̃.
In an embodiment of the present disclosure, the question and context alignment module 310 is configured to handle the information interaction between the question and the context paragraph, obtaining the context paragraph representation V for the question according to the encoded question feature U and the encoded context feature H.
Fig. 3C schematically illustrates a flowchart of obtaining a context paragraph representation for a problem word according to a method of analyzing textual information using an attention model according to an embodiment of the present disclosure.
As shown in fig. 3C, the operation flow of obtaining the context paragraph representation V for the question includes operations S311 to S313.
In operation S311, a first similarity matrix E between the encoded question feature U and the encoded context feature H is calculated.

For example, the given encoded question feature U and encoded context feature H are written as:

U = {u_1, u_2, …, u_m}
H = {h_1, h_2, …, h_n}

where m and n denote the lengths of the question and of the context paragraph, respectively. The attention similarity matrix E ∈ R^(m×n) between the encoded question feature U and the encoded context feature H can be calculated by equation (4):

E_ij = f(u_i, h_j)    (4)

where i ∈ [1, m], j ∈ [1, n], and E_ij denotes the semantic similarity between the i-th word of the question sequence and the j-th word of the context paragraph; f(·,·) is a similarity function between words, and the same function is used in the different modules. The similarity calculation function f is defined as shown in equation (5):

f(u, h) = W_f^T [u; h; u ⊙ h]    (5)

where ⊙ is element-wise multiplication of vectors, [;] is the concatenation operation between vectors, and W_f ∈ R^(3d) is a trainable weight matrix.
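Equations (4) and (5) translate directly into code; a minimal PyTorch sketch (batch dimension omitted for clarity):

import torch
import torch.nn as nn

class TrilinearSimilarity(nn.Module):
    # f(u, h) = W_f^T [u; h; u ⊙ h], equation (5), with W_f trainable in R^(3d).
    def __init__(self, dim: int):
        super().__init__()
        self.w = nn.Linear(3 * dim, 1, bias=False)

    def forward(self, U: torch.Tensor, H: torch.Tensor) -> torch.Tensor:
        # U: (m, d) question features; H: (n, d) context features
        m, n = U.size(0), H.size(0)
        u = U.unsqueeze(1).expand(m, n, -1)   # broadcast u_i across columns
        h = H.unsqueeze(0).expand(m, n, -1)   # broadcast h_j across rows
        feats = torch.cat([u, h, u * h], dim=-1)
        return self.w(feats).squeeze(-1)      # E in R^(m×n), equation (4)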
In operation S312, an attention distribution softmax (E) of each word in the context sequence with respect to the question word is calculated according to the similarity matrix E.
For example, the attention distribution Softmax(E) is computed by normalizing the similarity matrix over the question words. In particular, the attention distribution of the j-th context word with respect to the question, Softmax(E_:j), is obtained from the j-th column of the similarity matrix.
In operation S313, a context paragraph representation V for the problem word is calculated according to the attention distribution softmax (e).
FIG. 3D schematically illustrates a flow chart for computing a context paragraph representation for a problem word from an attention distribution according to an embodiment of the present disclosure.
As shown in fig. 3D, the flowchart of calculating the context paragraph representation V for the problem word according to the attention distribution softmax (e) includes operations S321 to S322.
In operation S321, a context paragraph representation matrix Ṽ of each word in the context sequence with respect to the question word is computed according to the encoded question feature U and the attention distribution Softmax(E). For example, based on the attention distribution of the j-th context word with respect to the question word obtained from the similarity matrix, the context paragraph representation of the j-th word with respect to the question word is obtained as ṽ_j = Σ_i Softmax(E_:j)_i · u_i.

In operation S322, the context paragraph representation V for the question word is calculated by weighted summation over the elements of the context paragraph representation matrix Ṽ.
For example, an output gate is created through a highway network to control the input proportion of the context features, and a nonlinear transformation is applied to the output features. The resulting question-aware context paragraph representation V output by the question and context alignment module is given by equation (6) (the exact expressions are rendered as images in the original filing), where W_r and the gate parameters are trainable weights.
According to the embodiment of the disclosure, introducing a highway network into the question and context alignment module allows better control over the input proportion of the question-aware context paragraph features and mitigates the gradient vanishing caused by multi-layer attention computation.
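Combining equations (4) to (6), a sketch of the question and context alignment module; equation (6) is an image in the original filing, so the exact form of the highway-style output gate below (a sigmoid gate over the concatenated features) is an assumption.

import torch
import torch.nn.functional as F

def question_context_align(U, H, sim, gate):
    # sim: the trilinear similarity above; gate: a linear layer of shape (2d -> d).
    E = sim(U, H)                        # (m, n) similarity, equation (4)
    A = F.softmax(E, dim=0)              # attention over question words per context word
    V_tilde = A.transpose(0, 1) @ U      # (n, d): attended question features per context word
    g = torch.sigmoid(gate(torch.cat([V_tilde, H], dim=-1)))   # output gate (assumed form)
    return g * torch.relu(V_tilde) + (1.0 - g) * H             # gated fusion yields V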
In an embodiment of the disclosure, as shown in FIG. 3A, the context self-alignment module 320 is configured to perform a self-attention operation on the context paragraph, capture semantic information within the context paragraph at different layers, and obtain the feature representation H̃ of each word in the context sequence.
Fig. 3E schematically illustrates a flow diagram for computing a feature representation for each word in a context sequence from an encoding context feature and a context paragraph representation according to an embodiment of the disclosure.
As shown in FIGS. 3A and 3E, obtaining the feature representation H̃ of each word in the context sequence according to the encoded context feature H and the context paragraph representation V using the context self-alignment module 320 includes operations S331 to S333.
In the context self-alignment module 320, long-distance dependencies between different words in the context are captured by a self-attention operation on the encoded context feature H.

In operation S331, a similarity matrix S between the words in the context sequence is calculated.

For example, a self-attention operation is performed using the initial encoded context feature H, and the similarity matrix S between the words of the context sequence is calculated according to equations (5) and (7), where equation (7) is:

S_ij = 0 if i = j; S_ij = f(h_i, h_j) if i ≠ j    (7)

where i ∈ [1, n] and j ∈ [1, n]. When i = j, the entry would measure the similarity between a context paragraph word and itself, which would concentrate the attention of each paragraph word on itself instead of spreading it to other words, so this similarity is set to 0. When i ≠ j, the entry denotes the similarity between different words of the context paragraph, computed with the similarity calculation function f(·,·) of equation (5). The resulting similarity matrix S ∈ R^(n×n) between the words of the context sequence is given in matrix form by equation (8) (rendered as an image in the original filing), where W_s is the trainable weight parameter matrix.
In operation S332, an attention distribution softmax (S) of each word in the context sequence with respect to the context sequence is calculated according to the similarity matrix S.
For example, by applying Softmax normalization to the self-similarity matrix of the context sequence, the attention distribution Softmax(S) of each word in the context sequence with respect to the context is obtained.
In operation S333, the feature representation H̃ of each word in the context sequence is computed according to the attention distribution Softmax(S).

For example, a weighted summation using the attention distribution as the weights of the context sequence feature vectors yields the weighted feature representation h̃_j of each paragraph word, as shown in equation (9):

h̃_j = Σ_k Softmax(S)_jk · h_k    (9)
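A sketch of the context self-alignment of equations (7) to (9); sim is the trilinear similarity of equation (5) with its own trainable weights W_s.

import torch
import torch.nn.functional as F

def context_self_align(H, sim):
    # H: (n, d) encoded context features
    S = sim(H, H)                               # (n, n) pairwise word similarities
    mask = torch.eye(S.size(0), dtype=torch.bool)
    S = S.masked_fill(mask, 0.0)                # equation (7): self-similarity set to 0
    A = F.softmax(S, dim=-1)                    # attention of each word over the context sequence
    return A @ H                                # equation (9): weighted sum gives H_tilde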
in an embodiment of the present disclosure, the context inference module 330 is configured to further infer context paragraph information, and fuse semantic information of the context to obtain a context feature representation Z for the problem.
As shown in FIG. 3A, a context inference module 330 is used to characterize a representation based on the characteristics of each word
Figure BDA0003035349340000151
A contextual feature representation Z for the problem is obtained.
For example, a weighted feature representation of each paragraph word to be obtained
Figure BDA0003035349340000152
Inputting into a bidirectional LSTM loop network, further coding and fusing semantic information of the coding problem feature U and the coding context feature H, and outputting a context feature representation aiming at the problem
Figure BDA0003035349340000153
Figure BDA0003035349340000154
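A sketch of this inference step, assuming hidden size d and batch-first tensors:

import torch
import torch.nn as nn

d = 128
bilstm = nn.LSTM(input_size=d, hidden_size=d, bidirectional=True, batch_first=True)

def context_inference(H_tilde: torch.Tensor) -> torch.Tensor:
    # H_tilde: (batch, n, d) weighted per-word feature representations
    Z, _ = bilstm(H_tilde)   # (batch, n, 2d): forward and backward states concatenated
    return Z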
In embodiments of the present disclosure, for the multi-layer attention module, the similarity matrix E_j and the similarity matrix S_j of the attention module at the j-th layer are calculated according to the similarity matrix E_{j-1} and the similarity matrix S_{j-1} of the attention at the (j-1)-th layer, where j is greater than or equal to 1 and less than or equal to J, and J is the total number of layers of the attention module.
In operation S204, the encoded context paragraph feature M for the context feature Z of the question word is obtained using the information inference layer.
In the embodiment of the disclosure, the information inference layer is mainly used to reason over the context feature Z of the question word so as to capture semantic information combining the question and the context sequence. In this layer, a two-layer bidirectional Transformer network structure is adopted as the encoder of the semantic information between the context and the question: the context feature Z for the question word is input into the encoder, and the question information is further fused into the context paragraph, yielding the encoded context paragraph feature M.
In operation S205, the encoded context paragraph feature M is processed using the output layer to obtain and output the position of the question word in the context sequence.
In embodiments of the present disclosure, the primary role of the output layer is to predict the true answer to the question in the context. The output layer mainly predicts the specific start position and end position at which the answer to the question appears in the context sequence. A pointer network is used as the answer prediction network structure.
For example, attention pooling is applied to the encoded question feature U to generate the initial hidden state v^1 of the pointer network. The encoded context paragraph feature M output by the information inference layer is then input into the pointer network, and the probability distribution p^1 of the answer start position in the context paragraph is calculated as shown in equation (10) (rendered as an image in the original filing).

After the probability distribution of the answer start position is obtained, the initial state of the answer end-position prediction module is generated from the start-position probability distribution and the output vector of the LSTM network as v^2 = Σ p^1 v, where v is the hidden state output by the LSTM network in the answer start-position prediction module. Finally, the probability distribution p^2 of the answer end position is calculated from the start-position probability vector, as shown in equation (11) (also rendered as an image in the original filing), where w_1 and w_2 are trainable weight parameters.
The probability distributions of the answer start and end positions output by the output layer are used to build the loss function of the model by maximum likelihood estimation; the objective function is shown in equation (12):

L(θ) = -(1/N) Σ_i [ log p^1(y_i^1) + log p^2(y_i^2) ]    (12)

where θ denotes all trainable parameters of the model, and y_i^1 and y_i^2 are respectively the true start position and the true end position of the answer in the context paragraph of the i-th training example.
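Since equations (10) and (11) are images in the original filing, the sketch below uses a bilinear pointer scoring as an illustrative assumption; the attention pooling of U, the LSTM step between the start and end predictions, and the negative log-likelihood of equation (12) follow the description above.

import torch
import torch.nn as nn
import torch.nn.functional as F

class PointerOutput(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.pool = nn.Linear(dim, 1)          # attention pooling over question words
        self.w1 = nn.Bilinear(dim, dim, 1)     # start-position scoring (assumed bilinear form)
        self.w2 = nn.Bilinear(dim, dim, 1)     # end-position scoring (assumed bilinear form)
        self.cell = nn.LSTMCell(dim, dim)      # LSTM step between start and end prediction

    def forward(self, U: torch.Tensor, M: torch.Tensor):
        # U: (m, d) encoded question features; M: (n, d) encoded context paragraph features
        a = F.softmax(self.pool(U), dim=0)                # pooling weights
        v1 = (a * U).sum(dim=0)                           # initial pointer state v^1
        p1 = F.softmax(self.w1(M, v1.expand_as(M)).squeeze(-1), dim=0)   # start distribution p^1
        evidence = (p1.unsqueeze(-1) * M).sum(dim=0, keepdim=True)       # v^2 = sum p^1 v
        h2, _ = self.cell(evidence, (v1.unsqueeze(0), torch.zeros(1, M.size(1))))
        p2 = F.softmax(self.w2(M, h2.expand(M.size(0), -1)).squeeze(-1), dim=0)  # end distribution p^2
        return p1, p2

def span_loss(p1, p2, y1: int, y2: int) -> torch.Tensor:
    # Equation (12): negative log-likelihood of the true start/end positions.
    return -(torch.log(p1[y1]) + torch.log(p2[y2]))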
In an embodiment of the present disclosure, the method of analyzing text information using an attention model further includes determining at least one question corresponding to the question word using a position of the question word in the context sequence to provide a response to the at least one question.
In the embodiment of the disclosure, during communication between customer service and a customer, or during interaction between a customer and an automated telephone customer service, the audio of the conversation or interaction is converted into a text file. The text file is analyzed with the method for analyzing text information using an attention model of the embodiments of the present disclosure. Then, according to the question posed by the customer, the context information is extracted based on the question, and finally the content associated with the customer's question is shown to the customer. With this method of analyzing text information with an attention model, more accurate context information can be extracted for a question.
Fig. 4 schematically shows a structural diagram of a bidirectional Transformer module in an attention model according to an embodiment of the present disclosure.
As shown in fig. 4, the bidirectional Transformer module 400 includes a self-attention module 410, a feedforward neural network 420, and an output module 430.
The self-attention module 410 is configured to implement context feature extraction in conjunction with a convolutional neural network. The feedforward neural network 420 applies a feedforward transformation to inputs such as the question vector Q and the context vector P. The output module 430 is configured to output the data of the bidirectional Transformer module.
Fig. 5 schematically illustrates a block diagram of an apparatus for analyzing text information using an attention model according to an embodiment of the present disclosure.
As shown in fig. 5, the apparatus 500 for analyzing text information using an attention model includes a first module 510, a second module 520, a third module 530, a fourth module 540, and a fifth module 550.
Wherein the first module 510 is configured to convert a question word and a context sequence included in the input text information into a question vector Q and a context vector P, respectively, using the input layer.
The second module 520 is configured to use the context coding layer to perform coding and feature extraction on the problem vector and the context vector, respectively, to obtain a coding problem feature and a coding context feature.
The third module 530 is configured to fuse the encoded question feature U and the encoded context feature H using the information interaction layer to obtain a context feature Z for the question word.
The fourth module 540 is configured to use the information inference layer to derive an encoded context paragraph feature M for the context feature Z.
The fifth module 550 is configured to process the encoded context paragraph feature using the output layer to obtain and output a position of the problem word in the context sequence.
Any number of modules, sub-modules, units, sub-units, or at least part of the functionality of any number thereof according to embodiments of the present disclosure may be implemented in one module. Any one or more of the modules, sub-modules, units, and sub-units according to the embodiments of the present disclosure may be implemented by being split into a plurality of modules. Any one or more of the modules, sub-modules, units, sub-units according to embodiments of the present disclosure may be implemented at least in part as a hardware circuit, such as a Field Programmable Gate Array (FPGA), a Programmable Logic Array (PLA), a system on a chip, a system on a substrate, a system on a package, an Application Specific Integrated Circuit (ASIC), or may be implemented in any other reasonable manner of hardware or firmware by integrating or packaging a circuit, or in any one of or a suitable combination of software, hardware, and firmware implementations. Alternatively, one or more of the modules, sub-modules, units, sub-units according to embodiments of the disclosure may be at least partially implemented as a computer program module, which when executed may perform the corresponding functions.
For example, any plurality of the first module 510, the second module 520, the third module 530, the fourth module 540, and the fifth module 550 may be combined and implemented in one module, or any one of the modules may be split into a plurality of modules. Alternatively, at least part of the functionality of one or more of these modules may be combined with at least part of the functionality of the other modules and implemented in one module. According to an embodiment of the present disclosure, at least one of the first module 510, the second module 520, the third module 530, the fourth module 540, and the fifth module 550 may be implemented at least partially as a hardware circuit, such as a Field Programmable Gate Array (FPGA), a Programmable Logic Array (PLA), a system on a chip, a system on a substrate, a system on a package, an Application Specific Integrated Circuit (ASIC), or may be implemented in hardware or firmware by any other reasonable manner of integrating or packaging a circuit, or in any one of three implementations of software, hardware, and firmware, or in any suitable combination of any of them. Alternatively, at least one of the first module 510, the second module 520, the third module 530, the fourth module 540 and the fifth module 550 may be implemented at least partially as a computer program module, which, when executed, may perform a corresponding function.
Fig. 6 schematically shows a block diagram of an electronic device according to an embodiment of the disclosure. The electronic device shown in fig. 6 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present disclosure.
As shown in fig. 6, an electronic device 600 according to an embodiment of the present disclosure includes a processor 601, which can perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 602 or a program loaded from a storage section 608 into a random access memory (RAM) 603. Processor 601 may include, for example, a general-purpose microprocessor (e.g., a CPU), an instruction set processor and/or associated chipset, and/or a special-purpose microprocessor (e.g., an application-specific integrated circuit (ASIC)), among others. The processor 601 may also include onboard memory for caching purposes. Processor 601 may include a single processing unit or multiple processing units for performing the different actions of the method flow according to embodiments of the disclosure.
In the RAM 603, various programs and data necessary for the operation of the electronic device 600 are stored. The processor 601, the ROM 602, and the RAM 603 are connected to each other via a bus 604. The processor 601 performs various operations of the method flows according to the embodiments of the present disclosure by executing programs in the ROM 602 and/or RAM 603. It is to be noted that the programs may also be stored in one or more memories other than the ROM 602 and RAM 603. The processor 601 may also perform various operations of the method flows according to embodiments of the present disclosure by executing programs stored in the one or more memories.
Electronic device 600 may also include an input/output (I/O) interface 605, which is also connected to bus 604, according to an embodiment of the disclosure. The electronic device 600 may also include one or more of the following components connected to the I/O interface 605: an input portion 606 including a keyboard, a mouse, and the like; an output portion 607 including a display such as a cathode ray tube (CRT) or liquid crystal display (LCD), and a speaker; a storage section 608 including a hard disk and the like; and a communication section 609 including a network interface card such as a LAN card or a modem. The communication section 609 performs communication processing via a network such as the Internet. A drive 610 is also connected to the I/O interface 605 as needed. A removable medium 611, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 610 as necessary, so that a computer program read out therefrom is installed into the storage section 608 as needed.
The present disclosure also provides a computer-readable storage medium, which may be contained in the apparatus/electronic device described in the above embodiments, or may exist separately without being assembled into the apparatus/electronic device. The computer-readable storage medium carries one or more programs which, when executed, implement the method according to an embodiment of the disclosure.
According to embodiments of the present disclosure, the computer-readable storage medium may be a non-volatile computer-readable storage medium, which may include, but is not limited to: a portable computer diskette, a hard disk, a Random Access Memory (RAM), a Read-Only Memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer-readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. For example, according to embodiments of the present disclosure, the computer-readable storage medium may include the ROM 602 and/or the RAM 603 described above, and/or one or more memories other than the ROM 602 and the RAM 603.
Embodiments of the present disclosure also include a computer program product comprising a computer program that contains program code for performing the method provided by the embodiments of the present disclosure. When the computer program product is run on an electronic device, the program code causes the electronic device to carry out the method for analyzing text information using an attention model provided by the embodiments of the present disclosure.
The computer program, when executed by the processor 601, performs the above-described functions defined in the system/apparatus of the embodiments of the present disclosure. The systems, apparatuses, modules, units, etc. described above may be implemented by computer program modules according to embodiments of the present disclosure.
In one embodiment, the computer program may be carried on a tangible storage medium such as an optical storage device or a magnetic storage device. In another embodiment, the computer program may also be transmitted or distributed in the form of a signal over a network medium, downloaded and installed through the communication section 609, and/or installed from the removable medium 611. The computer program containing the program code may be transmitted using any suitable network medium, including but not limited to wireless and wired media, or any suitable combination of the foregoing.
In accordance with embodiments of the present disclosure, the program code for carrying out the computer programs provided by the embodiments of the present disclosure may be written in any combination of one or more programming languages; in particular, these computer programs may be implemented using high-level procedural and/or object-oriented programming languages, and/or assembly/machine languages. Such languages include, but are not limited to, Java, C++, Python, the "C" language, and the like. The program code may execute entirely on the user computing device, partly on the user device, partly on a remote computing device, or entirely on the remote computing device or server. In the latter case, the remote computing device may be connected to the user computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods, apparatus, and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
Those skilled in the art will appreciate that the features recited in the various embodiments and/or claims of the present disclosure may be combined and/or sub-combined in various ways, even if such combinations or sub-combinations are not expressly recited in the present disclosure. In particular, such combinations and/or sub-combinations may be made without departing from the spirit or teaching of the present disclosure, and all of them fall within the scope of the present disclosure.
The embodiments of the present disclosure have been described above. However, these embodiments are for illustrative purposes only and are not intended to limit the scope of the present disclosure. Although the embodiments are described separately above, this does not mean that measures in different embodiments cannot be used in advantageous combination. The scope of the disclosure is defined by the appended claims and their equivalents. Various alternatives and modifications can be devised by those skilled in the art without departing from the scope of the present disclosure, and such alternatives and modifications are intended to fall within the scope of the present disclosure.

Claims (12)

1. A method of analyzing text information using an attention model, the attention model comprising an input layer, a context coding layer, an information interaction layer, an information inference layer, and an output layer, the method comprising:
converting a question word and a context sequence included in input text information into a question vector Q and a context vector P, respectively, using the input layer;
coding the question vector Q and the context vector P and extracting features, respectively, using the context coding layer, to obtain an encoded question feature U and an encoded context feature H;
fusing the encoded question feature U and the encoded context feature H using the information interaction layer to obtain a context feature Z for the question word;
obtaining an encoded context paragraph feature M for the context feature Z using the information inference layer; and
processing the encoded context paragraph feature M using the output layer to obtain and output a position of the question word in the context sequence.
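By way of illustration only, the following sketch in Python (using the PyTorch library, which the disclosure does not mandate) traces the five-layer data flow recited in claim 1. Every module choice, dimension, and name below is an assumption of this sketch rather than a feature of the claimed invention.

    import torch
    import torch.nn as nn

    class AttentionReader(nn.Module):
        def __init__(self, vocab_size=10000, d_model=128):
            super().__init__()
            # Input layer: maps token ids to the question vector Q and the context vector P.
            self.embed = nn.Embedding(vocab_size, d_model)
            # Context coding layer: a shared encoder producing U and H.
            layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
            self.encoder = nn.TransformerEncoder(layer, num_layers=2)
            # Information interaction layer: fuses U into H to obtain Z.
            self.interact = nn.MultiheadAttention(d_model, num_heads=4, batch_first=True)
            # Information inference layer: produces the context paragraph feature M.
            self.infer = nn.GRU(d_model, d_model, batch_first=True)
            # Output layer: scores each context position as the start or end of the answer.
            self.start_head = nn.Linear(d_model, 1)
            self.end_head = nn.Linear(d_model, 1)

        def forward(self, question_ids, context_ids):
            Q = self.embed(question_ids)        # question vector Q
            P = self.embed(context_ids)         # context vector P
            U = self.encoder(Q)                 # encoded question feature U
            H = self.encoder(P)                 # encoded context feature H
            Z, _ = self.interact(H, U, U)       # context feature Z for the question word
            M, _ = self.infer(Z)                # encoded context paragraph feature M
            return self.start_head(M).squeeze(-1), self.end_head(M).squeeze(-1)

    model = AttentionReader()
    starts, ends = model(torch.randint(0, 10000, (1, 8)), torch.randint(0, 10000, (1, 60)))
    print(starts.shape, ends.shape)  # position logits over the 60 context tokens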
2. The method of claim 1, wherein the context coding layer comprises a plurality of bi-directional Transformer modules, each bi-directional Transformer module comprising a self-attention module, a feed-forward neural network, and an output module.
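Claim 2 names the three sub-modules of each bi-directional Transformer module without fixing their internals. The minimal sketch below assumes the conventional arrangement in which the output module is a residual connection followed by layer normalization; omitting the causal mask is what makes the self-attention bi-directional.

    import torch
    import torch.nn as nn

    class BiTransformerBlock(nn.Module):
        def __init__(self, d_model=128, nhead=4, d_ff=512):
            super().__init__()
            self.self_attn = nn.MultiheadAttention(d_model, nhead, batch_first=True)
            self.ffn = nn.Sequential(
                nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
            self.norm1 = nn.LayerNorm(d_model)
            self.norm2 = nn.LayerNorm(d_model)

        def forward(self, x):
            # No causal mask is applied, so every token attends to words on
            # both sides of it: the module is bi-directional.
            attn_out, _ = self.self_attn(x, x, x)
            x = self.norm1(x + attn_out)        # output module: add & norm
            return self.norm2(x + self.ffn(x))  # feed-forward network, add & norm

    print(BiTransformerBlock()(torch.randn(1, 60, 128)).shape)  # torch.Size([1, 60, 128])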
3. The method of claim 1 or 2, wherein the information interaction layer comprises a stack of multiple attention layers, each attention layer comprising a question and context alignment module, a context self-alignment module, and a context inference module;
wherein the fusing, using the information interaction layer, the encoded question feature U and the encoded context feature H to obtain the context feature Z for the question word comprises:
obtaining, using the question and context alignment module, a context paragraph representation V for the question word according to the encoded question feature U and the encoded context feature H;
obtaining, using the context self-alignment module, a feature representation H′ of each word in the context sequence according to the encoded context feature H and the context paragraph representation V; and
obtaining, using the context inference module, the context feature Z for the question word according to the feature representation H′ of each word.
4. The method of claim 3, wherein the obtaining, using the question and context alignment module, the context paragraph representation V for the question word according to the encoded question feature U and the encoded context feature H comprises:
calculating a first similarity matrix E between the encoded question feature U and the encoded context feature H;
calculating an attention distribution Softmax(E) of each word in the context sequence with respect to the question word according to the first similarity matrix E; and
calculating the context paragraph representation V for the question word according to the attention distribution Softmax(E).
5. The method of claim 4, wherein the calculating the context paragraph representation V for the question word according to the attention distribution comprises:
calculating a context paragraph representation matrix V′ of each word in the context sequence with respect to the question word according to the encoded question feature U and the attention distribution Softmax(E); and
calculating the context paragraph representation V for the question word by weighting and summing the elements in the context paragraph representation matrix V′.
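Claims 4 and 5 together specify the question and context alignment computation. The sketch below assumes a dot-product form for the first similarity matrix E (the claims do not fix the similarity function); the final matrix product performs the weighting and summing over the elements of the context paragraph representation matrix V′ recited in claim 5.

    import torch
    import torch.nn.functional as F

    def align_question_context(U, H):
        # U: (m, d) encoded question feature; H: (n, d) encoded context feature.
        E = H @ U.T                # first similarity matrix E, shape (n, m)
        A = F.softmax(E, dim=-1)   # attention distribution Softmax(E) over the question words
        # The context paragraph representation matrix V' has elements A[i, j] * U[j];
        # the matrix product below weights and sums those elements over j to give V.
        return A @ U               # context paragraph representation V, shape (n, d)

    U, H = torch.randn(8, 128), torch.randn(60, 128)
    print(align_question_context(U, H).shape)  # torch.Size([60, 128])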
6. The method of claim 5, wherein the obtaining the feature representation H′ of each word in the context sequence according to the encoded context feature H and the context paragraph representation V comprises:
calculating a similarity matrix S between the words in the context sequence;
calculating an attention distribution Softmax(S) of each word in the context sequence with respect to the context sequence according to the similarity matrix S; and
calculating the feature representation H′ of each word in the context sequence according to the attention distribution Softmax(S).
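A sketch of the context self-alignment of claim 6 follows. The concatenation of H with V as the per-word input, the dot-product form of the similarity matrix S, and the masking of the diagonal (so that a word is not trivially aligned with itself) are all assumptions of this illustration.

    import torch
    import torch.nn.functional as F

    def self_align_context(H, V):
        # H: (n, d) encoded context; V: (n, d) context paragraph representation.
        X = torch.cat([H, V], dim=-1)   # assumed fusion of H and V, shape (n, 2d)
        S = X @ X.T                     # similarity matrix S between context words
        # Assumed choice: mask the diagonal so a word does not align with itself.
        S = S.masked_fill(torch.eye(S.size(0), dtype=torch.bool), float('-inf'))
        A = F.softmax(S, dim=-1)        # attention distribution Softmax(S)
        return A @ X                    # feature representation H' of each word

    H, V = torch.randn(60, 128), torch.randn(60, 128)
    print(self_align_context(H, V).shape)  # torch.Size([60, 256])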
7. The method of claim 6, wherein, for the multiple attention layers, the similarity matrix E_j and the similarity matrix S_j of the j-th attention layer are calculated according to the similarity matrix E_(j-1) and the similarity matrix S_(j-1) of the (j-1)-th attention layer, where j is greater than or equal to 1 and less than or equal to J, and J is the total number of attention layers.
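Claim 7 requires only that the j-th layer's similarity matrices be computed from those of layer j-1. The sketch below realizes that dependence with a residual update, which is one possible choice assumed purely for illustration.

    import torch
    import torch.nn.functional as F

    def stacked_attention(U, H, num_layers=3):
        E = H @ U.T                       # E_1
        S = H @ H.T                       # S_1
        for _ in range(1, num_layers):
            H = F.softmax(S, dim=-1) @ H  # re-encode the context via S_(j-1)
            E = H @ U.T + E               # E_j derived from E_(j-1) (assumed residual update)
            S = H @ H.T + S               # S_j derived from S_(j-1) (assumed residual update)
        return F.softmax(E, dim=-1) @ U   # question-aware context after J layers

    U, H = torch.randn(8, 128), torch.randn(60, 128)
    print(stacked_attention(U, H).shape)  # torch.Size([60, 128])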
8. The method of any of claims 1 to 7, further comprising: determining at least one question corresponding to the question word using the position of the question word in the context sequence, so as to provide a response to the at least one question.
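Claim 8 turns the predicted position into a response. A minimal span-decoding sketch follows; the greedy start/end search and the maximum span length are assumptions of this illustration, not limitations of the claim.

    import torch

    def answer_from_position(start_logits, end_logits, context_tokens, max_len=15):
        # Pick the start/end pair with the highest combined score and
        # return the corresponding span of the context as the response.
        best, span = float('-inf'), (0, 0)
        for i in range(len(context_tokens)):
            for j in range(i, min(i + max_len, len(context_tokens))):
                score = (start_logits[i] + end_logits[j]).item()
                if score > best:
                    best, span = score, (i, j)
        return " ".join(context_tokens[span[0]:span[1] + 1])

    tokens = "the loan can be repaid at any branch of the bank".split()
    print(answer_from_position(torch.randn(len(tokens)), torch.randn(len(tokens)), tokens))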
9. An apparatus for analyzing text information using an attention model, the attention model including an input layer, a context coding layer, an information interaction layer, an information inference layer, and an output layer, the apparatus comprising:
the system comprises a first module, a second module and a third module, wherein the first module is used for converting a question word and a context sequence included in input text information into a question vector Q and a context vector P respectively by using an input layer; and
the second module is used for respectively coding the problem vector and the context vector by using the context coding layer and extracting the features to obtain coding problem features and coding context features;
a third module, configured to fuse the coding question feature U and the coding context feature H using the information interaction layer to obtain a context feature Z for the question word;
a fourth module for obtaining an encoded context paragraph feature M for the context feature Z using the information inference layer; and
a fifth module for processing the encoded context paragraph feature M using the output layer to obtain and output a position of the question word in the context sequence.
10. An electronic device, comprising:
one or more processors;
a storage device for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to perform the method of any of claims 1-8.
11. A computer readable storage medium having stored thereon executable instructions which, when executed by a processor, cause the processor to perform the method of any one of claims 1 to 8.
12. A computer program product comprising a computer program which, when executed, implements the method according to any one of claims 1 to 8.
CN202110445332.0A 2021-04-23 2021-04-23 Method and device for analyzing text information by using attention model and electronic equipment Pending CN113033213A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110445332.0A CN113033213A (en) 2021-04-23 2021-04-23 Method and device for analyzing text information by using attention model and electronic equipment

Publications (1)

Publication Number Publication Date
CN113033213A 2021-06-25

Family

ID=76457671

Country Status (1)

Country Link
CN (1) CN113033213A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114170468A (en) * 2022-02-14 2022-03-11 阿里巴巴达摩院(杭州)科技有限公司 Text recognition method, storage medium and computer terminal
CN114170468B (en) * 2022-02-14 2022-05-31 阿里巴巴达摩院(杭州)科技有限公司 Text recognition method, storage medium and computer terminal

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination