CN113378925B - Method and device for generating double attention training sequence and readable storage medium - Google Patents
- Publication number
- CN113378925B CN113378925B CN202110646605.8A CN202110646605A CN113378925B CN 113378925 B CN113378925 B CN 113378925B CN 202110646605 A CN202110646605 A CN 202110646605A CN 113378925 B CN113378925 B CN 113378925B
- Authority
- CN
- China
- Prior art keywords
- vector
- input text
- matrix
- layer
- value
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2411—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
Abstract
The invention provides a method and a device for generating a double attention training sequence, and a readable storage medium, and relates to the field of computer deep learning. The method generates a double attention training sequence corresponding to the input text by constructing a double attention mechanism model oriented to both the importance of the characters in the input text and the relevance of the slot values corresponding to those characters; the slot value relevance is measured by taking the query paraphrase matrix of the slot values as prior knowledge, so that the training sequence carries both the importance feature and the relevance feature. The invention associates the elements of the training sequence with their context through a Bi-LSTM layer and computes weight ratios, thereby obtaining the importance of each character in the input text; using such training sequences as training data yields a model that attends better to important information. The method also reduces the number of slot value labels required by traditional labeling methods through a dual-sequence labeling method, which facilitates fitting of the training model and improves efficiency.
Description
Technical Field
The invention relates to the field of computer deep learning, in particular to a method and a device for generating a double attention training sequence and a readable storage medium.
Background
With the continuous development of the computer field, various technologies based on machine learning are also continuously innovated.
Deep learning is a branch of machine learning: a class of algorithms that attempt high-level abstraction of data using multiple processing layers composed of complex structures or multiple nonlinear transformations. It is a representation-learning approach within machine learning, and deep learning frameworks such as convolutional neural networks, deep belief networks and recurrent neural networks have been applied to computer vision, speech recognition, natural language processing, audio recognition, bioinformatics and other fields with excellent results.
Deep learning plays a very important role in natural language processing: it enables a machine to process natural language and complete operations such as question answering, transcription and query, and it therefore has extremely wide application in the field of intelligent services.
The construction of the learning model affects natural language processing capability, but the training data fed to the model also has a direct influence on it. Before model training, the training data must be screened, labeled and preprocessed so that it trains the model the user actually wants. Processing of the training data is therefore key to improving natural language processing capability.
In the prior art, the IOBES scheme is generally adopted to label the slot value sequence; it uses too many kinds of slot value labels, which makes model training and fitting difficult, and the slot value label information is not used effectively. Alternative schemes such as IO and IOE1 cannot effectively distinguish the various labels.
For this purpose, the invention application with application number CN202011024360.7 provides a deep learning sequence labeling method, a device and a computer readable storage medium: each word in a sentence of the text to be processed is preprocessed by an initialized embedding layer to obtain a word vector of each word; the word vectors are processed by a Bi-LSTM layer to obtain the text features of the text; the text features are processed by a softmax layer to obtain the predicted labeling positions; and the predicted labeling positions are processed by a loss layer to complete the sequence labeling. That application provides a novel method for calculating the loss value and improves the accuracy of sequence labeling in deep learning, but the kinds of slot value labels are still too many, and the training sequence contains only single attention information.
Therefore, it is necessary to provide a new training sequence generation method that can effectively utilize the slot value and reduce the types of slot value labels to solve the above-mentioned technical problems.
Disclosure of Invention
In order to solve the above technical problem, the method for generating a double attention training sequence generates a double attention training sequence corresponding to the input text by constructing a double attention mechanism model oriented to the importance of the characters in the input text and the relevance of the slot values corresponding to those characters; the double attention training sequence is used for training a deep learning model oriented to language understanding;
the double attention mechanism model is used for vector conversion of an input text, dimension conversion of a vector of the input text, association conversion of an input text matrix, and state association of importance of characters in the input text and association of corresponding slot values of the characters in the input text;
the dual attention mechanism model comprises a character-hidden state path, a slot value-query value path and a state association path;
obtaining an associated hidden state matrix of the input text through a character-hidden state path, wherein the associated hidden state matrix is used for measuring the importance of each character in the input text;
obtaining a paraphrasing matrix of the slot value sequence through a slot value-query value path, wherein the paraphrasing matrix is used for measuring the association degree of the slot value corresponding to the character in the input text;
and the state association path is used for carrying out state association on the paraphrase matrix and the association hidden state matrix to obtain an association state matrix, and the association state matrix is used for generating a double-attention training sequence.
As a further solution, the double attention mechanism model comprises an input text layer, an Embedding layer, a Bi-LSTM layer, a query value paraphrase layer, a Bi-attention layer, a Dense layer and a Softmax function layer;
the input text layer comprises a text-vector conversion layer and a text-slot value labeling layer; the text-vector conversion layer converts the input text into an input text vector taking indexes as elements through a character-index dictionary; the text-slot value labeling layer is used for labeling the slot values of all characters in the input text and obtaining a slot value sequence corresponding to the input text;
the Embedding layer is used for preprocessing each character in the input text to obtain a word vector of each character and form a vector matrix, and for mapping low-dimensional vectors to high-dimensional vectors according to the dimension requirements to form a mapping matrix of high-dimensional vectors;
the Bi-LSTM layer comprises a forward LSTM and a backward LSTM; the high-dimensional vector matrix converted by the Embedding layer is input into the Bi-LSTM layer, and at each position the hidden state vector output by the forward LSTM is spliced bitwise with the hidden state vector of the backward LSTM to obtain mutually associated hidden state vectors;
the query value paraphrase layer is used for paraphrasing the slot value sequence corresponding to the input text, the paraphrase content being stored in a paraphrase vector for each slot value element;
the Bi-attention layer is used for analyzing the importance of the characters in the input text and the relevance of the slot values corresponding to those characters, and for obtaining a double attention training sequence carrying both character importance and slot value relevance attention;
the Dense layer is used for performing dimension transformation on an input vector through a linear transformation and outputting a vector of a set dimension;
and the Softmax function layer calculates the weight ratio of each element in the input vector through the normalized exponential function, and performs classification output according to the weight ratios.
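The weight-ratio computation of the Softmax function layer can be sketched as follows; this is a minimal numpy illustration (the example scores are invented for demonstration), not the patent's implementation:

```python
import numpy as np

def softmax(x):
    # Subtract the max for numerical stability before exponentiating.
    e = np.exp(x - np.max(x))
    return e / e.sum()

scores = np.array([2.0, 1.0, 0.1])   # hypothetical input vector
weights = softmax(scores)

# Each element becomes a weight ratio in (0, 1); the ratios sum to 1,
# and the classification output is the index with the largest ratio.
predicted = int(np.argmax(weights))
```

The normalized exponential function turns arbitrary scores into a probability-like weighting, which is what lets the later paths read the results as importance or relevance weights.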
As a further solution, the double attention mechanism model obtains the associated hidden state matrix through the character-hidden state path, which comprises the following steps:
converting the input text into input text vectors through the input text layer;
inputting the input text vectors into the Embedding layer, which converts them into an input text matrix with the same dimension as the slot value sequence;
inputting the input text matrix into the Bi-LSTM layer, which obtains a forward hidden state vector and a backward hidden state vector of the input text matrix through the forward LSTM and the backward LSTM;
the Bi-LSTM layer splices the forward and backward hidden state vectors at the corresponding element positions to obtain contextually associated hidden state vectors;
and obtaining the associated hidden state vector of each input text vector in the input text matrix through the Bi-LSTM layer, thereby forming the associated hidden state matrix.
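The steps above can be sketched in numpy. As a simplification, a plain tanh recurrent cell stands in for the full LSTM gating, and all sizes are invented toy values; the point is the forward/backward pass and the per-position splice that forms the associated hidden state matrix:

```python
import numpy as np

rng = np.random.default_rng(0)

def recurrent_pass(X, W, U):
    # Simplified recurrent cell (a tanh RNN standing in for LSTM gates);
    # returns the hidden state at every position of the sequence.
    h = np.zeros(W.shape[1])
    states = []
    for x in X:
        h = np.tanh(x @ W + h @ U)
        states.append(h)
    return np.stack(states)

seq_len, emb_dim, hid_dim = 5, 8, 4            # toy sizes (assumed)
X = rng.normal(size=(seq_len, emb_dim))        # input text matrix from Embedding
W = rng.normal(size=(emb_dim, hid_dim)) * 0.1
U = rng.normal(size=(hid_dim, hid_dim)) * 0.1

fwd = recurrent_pass(X, W, U)                  # forward direction
bwd = recurrent_pass(X[::-1], W, U)[::-1]      # backward direction, re-aligned

# Splice forward and backward hidden vectors position by position:
H = np.concatenate([fwd, bwd], axis=-1)        # associated hidden state matrix
```

Each row of `H` depends on the characters both before and after its position, which is what makes the matrix suitable for measuring contextual importance.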
As a further solution, the double attention mechanism model obtains the paraphrase matrix through the slot value-query value path, which comprises the following steps:
marking the slot value of each character in the input text to obtain a corresponding slot value sequence;
paraphrasing each slot value element in the slot value sequence through the query value paraphrase layer, and storing the paraphrase content into a paraphrase vector for each slot value element;
and obtaining a paraphrase matrix through the slot value elements and the corresponding paraphrase vectors.
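A minimal sketch of building the paraphrase matrix from a slot value sequence; the slot names and three-element paraphrase vectors here are invented placeholders (real paraphrase vectors would encode synonym substitutes, textual meaning and relevance information):

```python
import numpy as np

# Hypothetical paraphrase vectors, one per slot value type.
paraphrase_table = {
    "O":     np.array([1.0, 0.0, 0.0]),
    "city":  np.array([0.0, 0.9, 0.1]),
    "state": np.array([0.0, 0.2, 0.8]),
}

# Slot value sequence produced by labeling the input text characters.
slot_sequence = ["O", "city", "city", "state"]

# Rows follow the slot value sequence; columns are the paraphrase vector
# elements, matching the row/column convention used for the matrix.
paraphrase_matrix = np.stack([paraphrase_table[s] for s in slot_sequence])
```

Because the matrix has one row per slot value element, it lines up with the input text characters and can later be associated with the hidden state matrix.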
As a further solution, the slot value labeling is performed manually and/or by machine; the paraphrase content in the paraphrase vector comprises synonym substitutes for the slot value, the textual meaning of the slot value, association degree information of the substitutes, and paraphrase information corresponding to the slot value, and the dimension of the paraphrase vector is directly proportional to the amount of paraphrase content.
As a further solution, the double attention mechanism model obtains the association state matrix through the state association path, which comprises the following steps:
performing matrix multiplication of the associated hidden state matrix and the paraphrase matrix to obtain the association state matrix;
summing the association state matrix along the important state direction to obtain a character importance state vector;
solving the importance weight value of each character of the input text in the character importance state vector through the Softmax function layer;
taking the importance weight value of each character as a vector element to obtain a character importance vector;
summing the association state matrix along the association state direction to obtain a slot value association state vector;
solving the association degree weight value of each slot value with its corresponding character in the slot value association state vector through the Softmax function layer;
taking the association degree weight value of each slot value as a vector element to obtain a slot value association degree vector;
and splicing the character importance vector and the slot value association degree vector head to tail in the last dimension to obtain the double attention vector.
As a further solution, the important state direction and the association state direction are determined as follows:
the vector elements of the input text vector are set as the row-direction elements of the associated hidden state matrix, and the column direction of the associated hidden state matrix is expanded through the Embedding layer;
the slot value elements are set as the row elements of the paraphrase matrix, and the vector elements of the paraphrase vector corresponding to each slot value element are set as the column elements of the paraphrase matrix;
the row direction of the association state matrix is then the association state direction, and its column direction is the important state direction.
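The state association path can be sketched with toy matrices; all sizes are invented, and the choice of which summation axis realizes the important versus association state direction is one reading of the convention above, so treat the axes as assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

n, hid = 4, 6                        # toy: 4 characters/slots, hidden dim 6
H = rng.normal(size=(n, hid))        # associated hidden state matrix
P = rng.normal(size=(n, hid))        # paraphrase matrix (one row per slot)

A = H @ P.T                          # association state matrix (n x n)

# One entry per character (summed over the important state direction):
char_important_state = A.sum(axis=1)
# One entry per slot value (summed over the association state direction):
slot_assoc_state = A.sum(axis=0)

char_importance = softmax(char_important_state)
slot_relevance = softmax(slot_assoc_state)

# Head-to-tail splice in the last dimension gives the double attention
# vector, twice the dimension of the initial slot value sequence.
double_attention = np.concatenate([char_importance, slot_relevance])
```

The doubled dimension of `double_attention` is what the Dense layer later maps back to the initial dimension.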
As a further solution, before the deep learning model oriented to language understanding trains on the double attention vector, dual-sequence labeling is carried out, comprising the following steps:
labeling the double attention vector by the IOB2 method to obtain an original labeling sequence;
removing the labeling prefixes from the original labeling sequence, and dividing the prefix-free sequence into a start labeling sequence and an end labeling sequence;
the start labeling sequence records the start label of each run of same-type labels: of several consecutive labels of the same type, only the first is retained;
the end labeling sequence records the end label of each run of same-type labels: of several consecutive labels of the same type, only the last is retained.
As a further solution, the present invention provides a double attention training sequence generating apparatus comprising at least one processor and a memory communicatively coupled to the at least one processor, wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the double attention training sequence generation method described above.
As a further solution, the present invention provides a computer readable storage medium storing a computer program which, when executed by a processor, implements the double attention training sequence generation method described above.
Compared with the related art, the method, the device and the readable storage medium for generating the double attention training sequence have the following beneficial effects:
1. the method generates a double attention training sequence corresponding to the input text by constructing a double attention mechanism model oriented to the importance of the characters in the input text and the relevance of the slot values corresponding to those characters; the slot value relevance is measured by taking the query paraphrase matrix of the slot values as prior knowledge, so that the training sequence carries both the importance feature and the relevance feature;
2. the invention associates the elements of the training sequence with their context through the Bi-LSTM layer and computes weight ratios, thereby obtaining the importance of each character in the input text; using such training sequences as training data yields a model that attends better to important information;
3. the method reduces the number of slot value labels required by traditional labeling methods through the dual-sequence labeling method, which facilitates fitting of the training model and improves efficiency.
Drawings
FIG. 1 is a system diagram of a method for generating a dual attention training sequence according to a preferred embodiment of the present invention;
FIG. 2 is a schematic diagram of a state association path of a dual attention training sequence generation method according to a preferred embodiment of the present invention;
fig. 3 is a schematic diagram of a dual sequence labeling process of a dual attention training sequence generation method according to a preferred embodiment of the present invention.
Detailed Description
The invention is further described with reference to the following figures and embodiments.
As shown in fig. 1 to fig. 3, the method for generating a double attention training sequence according to the present invention generates a double attention training sequence corresponding to the input text by constructing a double attention mechanism model oriented to the importance of the characters in the input text and the relevance of the slot values corresponding to those characters; the double attention training sequence is used for training a deep learning model oriented to language understanding. The double attention mechanism model is used for vector conversion of the input text, dimension conversion of the input text vector, association conversion of the input text matrix, and state association of the importance of the characters in the input text with the relevance of the slot values corresponding to those characters. The model comprises a character-hidden state path, a slot value-query value path and a state association path: the character-hidden state path yields the associated hidden state matrix of the input text, which measures the importance of each character in the input text; the slot value-query value path yields the paraphrase matrix of the slot value sequence, which measures the relevance of the slot value corresponding to each character; and the state association path performs state association of the paraphrase matrix with the associated hidden state matrix to obtain the association state matrix, which is used for generating the double attention training sequence.
It should be noted that the double attention mechanism lets the training data account for both kinds of features, which has a positive effect on natural language recognition. Not all information in natural language is equally important, so the importance of each character in the input text must be evaluated, with emphasis placed on understanding the important characters while unimportant content has little influence on the model. Meanwhile, the slot values corresponding to the characters in the input text are not perfect matches, and a slot value cannot simply be assumed to correspond strictly to its character; the slot values must therefore be evaluated as well, with the degree of matching reflected by the relevance of the slot value to its character, and the slot is filled by selecting the label with the highest score. The optimal slot value filling is selected by means of the Softmax function.
As a further solution, the double attention mechanism model comprises an input text layer, an Embedding layer, a Bi-LSTM layer, a query value paraphrase layer, a Bi-attention layer, a Dense layer and a Softmax function layer;
the input text layer comprises a text-vector conversion layer and a text-slot value labeling layer; the text-vector conversion layer converts the input text into an input text vector taking indexes as elements through a character-index dictionary; the text-slot value labeling layer labels the slot value of each character in the input text and obtains the slot value sequence corresponding to the input text;
the Embedding layer preprocesses each character in the input text to obtain a word vector of each character and form a vector matrix, and maps low-dimensional vectors to high-dimensional vectors according to the dimension requirements to form a mapping matrix of high-dimensional vectors;
the Bi-LSTM layer comprises a forward LSTM and a backward LSTM; the high-dimensional vector matrix converted by the Embedding layer is input into the Bi-LSTM layer, and at each position the hidden state vector output by the forward LSTM is spliced bitwise with the hidden state vector of the backward LSTM to obtain mutually associated hidden state vectors;
the query value paraphrase layer paraphrases the slot value sequence corresponding to the input text, the paraphrase content being stored in a paraphrase vector for each slot value element;
the Bi-attention layer analyzes the importance of the characters in the input text and the relevance of the slot values corresponding to those characters, and obtains a double attention training sequence carrying both character importance and slot value relevance attention;
the Dense layer performs dimension transformation on an input vector through a linear transformation and outputs a vector of a set dimension;
and the Softmax function layer calculates the weight ratio of each element in the input vector through the normalized exponential function, and performs classification output according to the weight ratios.
It should be noted that the character-hidden state path mainly associates the characters in the input text with one another: each character is influenced by its context and differs in importance. For example, given the input text "I want to go to Shanghai tomorrow; how is the weather there?", "there" obviously refers to "Shanghai", but without associating the context the model cannot connect the two. In this embodiment the context is associated through the Bi-LSTM layer, which facilitates the training of the model.
As a further solution, the double attention mechanism model obtains the associated hidden state matrix through the character-hidden state path, which comprises the following steps:
converting the input text into input text vectors through the input text layer;
inputting the input text vectors into the Embedding layer, which converts them into an input text matrix with the same dimension as the slot value sequence;
inputting the input text matrix into the Bi-LSTM layer, which obtains a forward hidden state vector and a backward hidden state vector of the input text matrix through the forward LSTM and the backward LSTM;
the Bi-LSTM layer splices the forward and backward hidden state vectors at the corresponding element positions to obtain contextually associated hidden state vectors;
and obtaining the associated hidden state vector of each input text vector in the input text matrix through the Bi-LSTM layer, thereby forming the associated hidden state matrix.
It should be noted that the slot value-query value path mainly explains the definition of each slot value in detail, provides other similar slot values as alternatives for replacement, calculates the respective association degrees, and outputs the vector with the highest association degree for machine learning.
As a further solution, the double attention mechanism model obtains the paraphrase matrix through the slot value-query value path, which comprises the following steps:
labeling the slot value of each character in the input text to obtain the corresponding slot value sequence;
paraphrasing each slot value element in the slot value sequence through the query value paraphrase layer, and storing the paraphrase content into a paraphrase vector for each slot value element;
and obtaining the paraphrase matrix from the slot value elements and their corresponding paraphrase vectors.
As a further solution, the slot value labeling is performed manually and/or by machine; the paraphrase content in the paraphrase vector comprises synonym substitutes for the slot value, the textual meaning of the slot value, association degree information of the substitutes, and paraphrase information corresponding to the slot value, and the dimension of the paraphrase vector is directly proportional to the amount of paraphrase content.
As a further solution, the double attention mechanism model obtains the association state matrix through the state association path, which comprises the following steps:
performing matrix multiplication of the associated hidden state matrix and the paraphrase matrix to obtain the association state matrix;
summing the association state matrix along the important state direction to obtain a character importance state vector;
solving the importance weight value of each character of the input text in the character importance state vector through the Softmax function layer;
taking the importance weight value of each character as a vector element to obtain a character importance vector;
summing the association state matrix along the association state direction to obtain a slot value association state vector;
solving the association degree weight value of each slot value with its corresponding character in the slot value association state vector through the Softmax function layer;
taking the association degree weight value of each slot value as a vector element to obtain a slot value association degree vector;
and splicing the character importance vector and the slot value association degree vector head to tail in the last dimension to obtain the double attention vector.
As a further solution, the important state direction and the association state direction are determined as follows:
the vector elements of the input text vector are set as the row-direction elements of the associated hidden state matrix, and the column direction of the associated hidden state matrix is expanded through the Embedding layer;
the slot value elements are set as the row elements of the paraphrase matrix, and the vector elements of the paraphrase vector corresponding to each slot value element are set as the column elements of the paraphrase matrix;
the row direction of the association state matrix is then the association state direction, and its column direction is the important state direction.
It should be noted that the association state direction and the important state direction are determined by the row and column elements of the associated hidden state matrix and the paraphrase matrix. This embodiment fixes the convention that the row direction of the association state matrix is the association state direction and its column direction is the important state direction, which is convenient to implement.
It should be noted that splicing the character importance vector and the slot value association degree vector yields a double attention vector with twice the dimension of the initially input slot value sequence, which is reduced back to the initial dimension through the Dense layer.
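The Dense reduction described here amounts to a single linear transformation; a minimal sketch with invented toy sizes and random weights (a trained layer would learn `W` and `b`):

```python
import numpy as np

rng = np.random.default_rng(2)

seq_dim = 4                                    # initial slot value sequence dimension (toy)
double_attention = rng.normal(size=2 * seq_dim)  # spliced vector, twice the size

# Dense layer: linear transformation W x + b mapping the doubled
# dimension back to the initial one.
W = rng.normal(size=(seq_dim, 2 * seq_dim)) * 0.1
b = np.zeros(seq_dim)
reduced = W @ double_attention + b
```

After this reduction the vector again matches the slot value sequence length and can be labeled position by position.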
As a further solution, before the deep learning model oriented to language understanding trains on the double attention vector, dual-sequence labeling is carried out, comprising the following steps:
reducing the double attention vector to its initial dimension, the same as the slot value sequence, through the Dense layer;
labeling the double attention vector by the IOB2 method to obtain an original labeling sequence;
removing the labeling prefixes from the original labeling sequence, and dividing the prefix-free sequence into a start labeling sequence and an end labeling sequence;
the start labeling sequence records the start label of each run of same-type labels: of several consecutive labels of the same type, only the first is retained;
the end labeling sequence records the end label of each run of same-type labels: of several consecutive labels of the same type, only the last is retained.
It should be noted that before training, the training data must be labeled so that the model knows where the slot values are located in the sequence. The existing labeling methods in common use are IOB1, IOE1, IOE2, IOBES, IO and the like, but the traditional methods use many labels, which affects the efficiency of the model. For example, IOBES comprises 5 kinds of labels: the S label represents a text block consisting of a single character; when a block consists of more than one character, the first character always takes the B label, the last character the E label, and the middle characters the I label; and O, i.e. Other, marks irrelevant characters. To solve this problem, the present embodiment labels the training data by the dual-sequence labeling method, for example:
the original labels noted by IOB2 are: "O O O O B-reserve _ type O B-city I-city B-state O B-party _ size _ description I-party _ size _ description I-party _ size _ description".
The original tag with the prefix removed is "O O O O residual _ type O city state O party _ size _ description party _ size _ description _ size _ description".
The initial tagging sequence is used for recording initial tags of the same type of tags, and only the first occurring tag is reserved for a plurality of continuously occurring tags of the same type, so that the initial sequence is as follows: "O O O O residual _ type O city O state O party _ size _ description O O O".
The end labeling sequence is used for recording end labels of the same type of labels, and only the last label is reserved for a plurality of labels of the same type which continuously appear, so that the end labeling sequence is as follows: "O O O O residual _ type O city state O O O O party _ size _ description".
The corresponding position of each slot value can be determined through the starting sequence and the ending labeling sequence, and the related label is only used for marking an irrelevant character O, so that the model learning efficiency is higher.
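The dual-sequence labeling rule above can be sketched as follows. This is a minimal illustration of the described rule (keep the first tag of each same-type run in the start sequence and the last in the end sequence), with an invented tag set; it is not the patented implementation:

```python
# A minimal sketch (not the patented implementation) of dual-sequence labeling:
# an IOB2 tag sequence is split into a start sequence keeping only the first
# tag of each same-type run, and an end sequence keeping only the last.

def dual_sequence_labels(iob2_tags):
    """Split an IOB2 tag sequence into start and end label sequences."""
    # Remove the B-/I- labeling prefixes; "O" stays as-is.
    plain = [t.split("-", 1)[1] if "-" in t else t for t in iob2_tags]
    n = len(plain)
    start, end = ["O"] * n, ["O"] * n
    for i, tag in enumerate(plain):
        if tag == "O":
            continue
        if i == 0 or plain[i - 1] != tag:      # first of a same-type run
            start[i] = tag
        if i == n - 1 or plain[i + 1] != tag:  # last of a same-type run
            end[i] = tag
    return start, end

# Hypothetical tag sequence for illustration.
tags = ["O", "B-city", "I-city", "B-state", "O", "B-party_size", "I-party_size"]
start_seq, end_seq = dual_sequence_labels(tags)
# start_seq: ["O", "city", "O", "state", "O", "party_size", "O"]
# end_seq:   ["O", "O", "city", "state", "O", "O", "party_size"]
```

Each slot's span can then be read off by pairing a start label with the next end label of the same type.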
As a further solution, the present invention provides a dual attention training sequence generating apparatus, the apparatus comprising at least one processor and a memory communicatively coupled to the at least one processor, wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the dual attention training sequence generation method described above.
As a further solution, the present invention provides a computer readable storage medium for dual attention training sequence generation, storing a computer program which, when executed by a processor, implements the dual attention training sequence generation method described above.
The above description is only an embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.
Claims (8)
1. A method for generating a double-attention training sequence, characterized in that a double-attention training sequence corresponding to an input text is generated by constructing a double-attention mechanism model oriented to the importance of the characters in the input text and to the association of the slot values corresponding to those characters, the double-attention training sequence being used for training and learning of a language-understanding-oriented deep learning model;
the double attention mechanism model is used for vector conversion of an input text, dimension conversion of a vector of the input text, association conversion of an input text matrix, and state association of importance of characters in the input text and association of corresponding slot values of the characters in the input text;
the dual attention mechanism model comprises a character-hidden state path, a slot value-query value path and a state association path;
obtaining an associated hidden state matrix of the input text through a character-hidden state path, wherein the associated hidden state matrix is used for measuring the importance of each character in the input text;
obtaining a paraphrasing matrix of the slot value sequence through a slot value-query value path, wherein the paraphrasing matrix is used for measuring the association degree of the slot value corresponding to the character in the input text;
the state association path is used for carrying out state association on the paraphrase matrix and the association hidden state matrix to obtain an association state matrix, and the association state matrix is used for generating a double-attention training sequence;
the double attention mechanism model comprises an input text layer, an Embedding layer, a Bi-LSTM layer, a query value paraphrasing layer, a Bi-attention layer, a Dense layer and a Softmax function layer;
the input text layer comprises a text-vector conversion layer and a text-slot value labeling layer; the text-vector conversion layer converts the input text into an input text vector taking indexes as elements through a character-index dictionary; the text-slot value labeling layer is used for labeling the slot values of all characters in the input text and obtaining a slot value sequence corresponding to the input text;
the Embedding layer is used for preprocessing each character in the input text to obtain a word vector of each character and form a vector matrix, and for mapping low-dimensional vectors to high-dimensional vectors according to the dimension requirements to form a mapping matrix of high-dimensional vectors;
the Bi-LSTM layer comprises a forward LSTM and a backward LSTM; the high-dimensional vector matrix converted by the Embedding layer is input into the Bi-LSTM layer, which splices, bit by bit at each position, the hidden state vector output by the forward LSTM with the hidden state vector of the backward LSTM to obtain mutually related hidden state vectors;
the query value paraphrasing layer is used for paraphrasing the slot values of the slot value sequence corresponding to the input text, the paraphrase content being stored in the paraphrase vector of each slot value element;
the Bi-attention layer is used for analyzing the importance of the characters in the input text and the association of the slot values corresponding to those characters, and for obtaining a double-attention training sequence containing the double attention of character importance and slot value association;
the Dense layer is used for carrying out dimension transformation on an input vector through linear transformation and outputting a set dimension vector;
the Softmax function layer calculates the weight ratio of each element in the input vector through a normalized exponential function, and performs classified output according to the weight ratios;
the dual attention mechanism model obtains an associated state matrix through a state associated path, and the state associated path comprises the following steps:
performing matrix multiplication on the correlation hidden state matrix and the paraphrase matrix to obtain a correlation state matrix;
performing important state vector summation on the correlation state matrix to obtain a character important state vector;
solving the importance weight value of each character in the input text in the character importance state vector through a Softmax function layer;
taking the importance weight value of each character in the input text as a vector element to obtain a character importance vector;
performing correlation state vector summation on the correlation state matrix to obtain a slot value correlation state vector;
solving the association degree weight value of each slot value and the corresponding character in the slot value association state vector through a Softmax function layer;
taking the association degree weight value of each slot value and the corresponding character as a vector element to obtain a slot value association degree vector;
and performing head-to-tail splicing on the character importance vector and the slot value association vector in the last dimension to obtain double attention vectors.
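Under the assumption that the associated hidden state matrix and the paraphrase matrix share a common feature dimension, the state association path of claim 1 can be sketched with toy NumPy matrices. The shapes, random values, and axis choices below are assumptions for illustration, not the patented implementation:

```python
import numpy as np

rng = np.random.default_rng(1)

n_chars, n_slots, d = 6, 4, 16
H = rng.normal(size=(n_chars, d))   # associated hidden state matrix (character rows)
P = rng.normal(size=(n_slots, d))   # paraphrase matrix (slot value element rows)

def softmax(v):
    e = np.exp(v - v.max())         # stabilized normalized exponential
    return e / e.sum()

# State association: matrix multiplication of the two matrices.
A = H @ P.T                          # association state matrix, shape (n_chars, n_slots)

# Summing along the slot axis gives the character important state vector;
# Softmax turns it into per-character importance weights.
char_importance = softmax(A.sum(axis=1))
# Summing along the character axis gives the slot value association state vector;
# Softmax turns it into per-slot association weights.
slot_association = softmax(A.sum(axis=0))

# Head-to-tail splicing in the last dimension yields the dual-attention vector.
dual_attention = np.concatenate([char_importance, slot_association])
```

The resulting vector has one importance weight per character followed by one association weight per slot value.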
2. The method for generating a bi-attention training sequence according to claim 1, wherein the bi-attention mechanism model obtains the associative hidden state matrix through a character-hidden state path, and the character-hidden state path comprises the following steps:
converting the input text into input text vectors through the input text layer;
Inputting the input text vector into an Embedding layer, and converting the input text vector into an input text matrix with the same dimension as the slot value sequence through the Embedding layer;
inputting an input text matrix to a Bi-lstm layer; the Bi-LSTM layer obtains a forward hidden state vector and a backward hidden state vector of the input text matrix through a forward LSTM and a backward LSTM;
the Bi-lstm layer splices the forward hidden state vector and the backward hidden state vector at corresponding element positions to obtain front-and-back related associated hidden state vectors;
and obtaining the associated hidden state vector of each input text vector in the input text matrix through a Bi-lstm layer, and forming an associated hidden state matrix.
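The splicing step of claim 2 can be illustrated with stand-in arrays in place of real forward and backward LSTM outputs. The random values and sizes are assumptions for illustration only:

```python
import numpy as np

rng = np.random.default_rng(2)

seq_len, hidden = 6, 8
# Stand-ins for the per-position hidden states of the forward and backward LSTMs.
h_forward = rng.normal(size=(seq_len, hidden))
h_backward = rng.normal(size=(seq_len, hidden))

# Splice the forward and backward hidden state vectors at corresponding
# positions (concatenation along the feature axis) to obtain associated
# hidden state vectors that carry both left and right context.
associated_hidden = np.concatenate([h_forward, h_backward], axis=-1)
```

Each row of `associated_hidden` is the front-and-back related hidden state of one character.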
3. The method as claimed in claim 1, wherein the dual attention mechanism model obtains the paraphrase matrix through a slot value-query value path, and the slot value-query value path comprises the following steps:
marking the slot value of each character in the input text to obtain a corresponding slot value sequence;
explaining each slot value element in the slot value sequence through a query value explanation layer, and storing explanation contents into each slot value element explanation vector;
and obtaining a paraphrase matrix through the slot value elements and the corresponding paraphrase vectors.
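A hypothetical sketch of the slot value-query value path of claim 3: each slot value element is mapped to a paraphrase vector and the vectors are stacked into a matrix whose rows are slot elements. The lookup table and the 4-dimensional vectors below are invented for illustration:

```python
import numpy as np

# Hypothetical paraphrase lookup: slot value element -> paraphrase vector.
# The table entries and vector values are invented purely for illustration.
paraphrase_table = {
    "O":     np.array([1.0, 0.0, 0.0, 0.0]),
    "city":  np.array([0.0, 1.0, 0.2, 0.1]),
    "state": np.array([0.0, 0.3, 1.0, 0.1]),
}

slot_sequence = ["O", "city", "city", "state"]

# Stack one paraphrase vector per slot element: rows are slot value elements,
# columns are paraphrase vector dimensions (the layout stated in claim 5).
paraphrase_matrix = np.stack([paraphrase_table[s] for s in slot_sequence])
```

In this sketch the paraphrase vector dimension would grow with the amount of paraphrase content stored, as claim 4 states.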
4. The method for generating a bi-attention training sequence according to claim 3, wherein the slot value labeling is performed manually and/or by machine; the paraphrase content in the paraphrase vector comprises synonymy replacement words of the slot values, the text meaning of the slot values, the association degree information of the replacement words of the slot values and the paraphrase information corresponding to the slot values, and the dimension of the paraphrase vector is in direct proportion to the content amount of the paraphrase.
5. The method of claim 1, wherein the significant state vector and the associated state vector are determined by:
setting vector elements of an input text vector as row direction elements of an associated hidden state matrix, and expanding the column direction of the associated hidden state matrix through an Embedding layer;
setting the slot value elements as row elements of the paraphrase matrix, and setting the vector elements of the slot value elements corresponding to the paraphrase vector as column elements of the paraphrase matrix;
the row direction of the correlation state matrix is a correlation state vector; the column direction of the associated state matrix is then the significant state vector.
6. The method as claimed in claim 5, wherein a dual-sequence labeling is performed before the dual-attention vector is trained and learned by a deep learning model for language understanding, the dual-sequence labeling comprises the following steps:
reducing the dimension of the double attention vectors into an initial dimension through a Dense layer, wherein the initial dimension is the same as that of the slot value sequence;
labeling the double attention vectors by an IOB2 method, and obtaining an original labeling sequence;
removing the labeling prefix of the original labeling sequence, and dividing the original labeling sequence without the prefix into a starting labeling sequence and an ending labeling sequence;
the initial labeling sequence is used for recording initial labels of the same type of labels, and only the first appearing label is reserved for a plurality of continuously appearing labels of the same type;
the end labeling sequence is used for recording end labels of the same type of label, and only the last label is reserved for a plurality of labels of the same type which continuously appear.
7. A dual attention training sequence generating apparatus, characterized in that the apparatus comprises at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the dual attention training sequence generation method of any one of claims 1 to 6.
8. A dual attention training sequence generation computer readable storage medium storing a computer program, wherein the computer program when executed by a processor implements a dual attention training sequence generation method as claimed in any one of claims 1 to 6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110646605.8A CN113378925B (en) | 2021-06-10 | 2021-06-10 | Method and device for generating double attention training sequence and readable storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113378925A CN113378925A (en) | 2021-09-10 |
CN113378925B true CN113378925B (en) | 2022-09-20 |
Family
ID=77573419
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110646605.8A Active CN113378925B (en) | 2021-06-10 | 2021-06-10 | Method and device for generating double attention training sequence and readable storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113378925B (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108710704A (en) * | 2018-05-28 | 2018-10-26 | 出门问问信息科技有限公司 | Determination method, apparatus, electronic equipment and the storage medium of dialogue state |
CN111985229A (en) * | 2019-05-21 | 2020-11-24 | 腾讯科技(深圳)有限公司 | Sequence labeling method and device and computer equipment |
CN112115714A (en) * | 2020-09-25 | 2020-12-22 | 平安国际智慧城市科技股份有限公司 | Deep learning sequence labeling method and device and computer readable storage medium |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
TWI648515B (en) * | 2013-11-15 | 2019-01-21 | 美商克萊譚克公司 | Measurement targets and their measurement, target design files, measurement methods and computer-based equipment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||