CN113378925B - Method and device for generating double attention training sequence and readable storage medium - Google Patents


Info

Publication number
CN113378925B
Authority
CN
China
Prior art keywords: vector, input text, matrix, layer, value
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110646605.8A
Other languages
Chinese (zh)
Other versions
CN113378925A (en)
Inventor
胡光敏
娄坤
姜黎
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Ccvui Intelligent Technology Co ltd
Original Assignee
Hangzhou Ccvui Intelligent Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Ccvui Intelligent Technology Co ltd filed Critical Hangzhou Ccvui Intelligent Technology Co ltd
Priority to CN202110646605.8A priority Critical patent/CN113378925B/en
Publication of CN113378925A publication Critical patent/CN113378925A/en
Application granted granted Critical
Publication of CN113378925B publication Critical patent/CN113378925B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G06F 18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F 18/2411 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Machine Translation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a method and a device for generating a dual attention training sequence, and a readable storage medium, and relates to the field of computer deep learning. According to the method, a dual attention training sequence corresponding to an input text is generated by constructing a dual attention mechanism model oriented to the importance of the characters in the input text and to the degree of association between those characters and their corresponding slot values; the association degree of the slot values is measured by taking a query paraphrase matrix of the slot values as prior knowledge, so that the training sequence carries both importance features and association features. The training sequence is contextually related through a Bi-LSTM layer and weight ratios are calculated, giving the importance of each character in the input text; using such a training sequence as training data yields a model that attends better to important information. The dual-sequence labeling method reduces the number of slot value labels used by conventional labeling methods, which facilitates fitting of the training model and improves efficiency.

Description

Method and device for generating double attention training sequence and readable storage medium
Technical Field
The invention relates to the field of computer deep learning, in particular to a method and a device for generating a double attention training sequence and a readable storage medium.
Background
With the continuous development of the computer field, various technologies based on machine learning are also continuously innovated.
Deep learning is a branch of machine learning: a family of algorithms that attempt to perform high-level abstraction of data using multiple processing layers containing complex structures or composed of multiple nonlinear transformations. Deep learning is based on representation learning of data, and deep learning frameworks such as convolutional neural networks, deep belief networks and recurrent neural networks have been applied with excellent results in fields such as computer vision, speech recognition, natural language processing, audio recognition and bioinformatics.
Deep learning plays a very important role in natural language processing: it enables a machine to process natural language and complete operations such as question answering, recording and querying, and therefore has extremely wide application in the field of intelligent services.
On the one hand, the construction of the learning model affects natural language processing capability; on the other hand, the training data fed to the learning model also has a direct influence on it. Before model training, the training data needs to be screened, labeled and preprocessed, so that the input training data can train the model the user actually wants. Processing of training data therefore becomes key to improving natural language processing capability.
In the prior art, IOBES labeling is generally adopted for slot value sequences, but it uses too many label types, which makes model training and fitting difficult, and slot value label information is not effectively utilized; simpler schemes such as IO and IOE1 are also used, but they cannot effectively distinguish the various labels.
For this purpose, the invention application with application number CN202011024360.7 provides a deep learning sequence labeling method, a device and a computer readable storage medium: each word in a sentence of the text to be processed is preprocessed by an initialized embedding layer to obtain a word vector of each word; the word vectors are processed by a Bi-LSTM layer to obtain text features of the text to be processed; the text features are processed by a softmax layer to obtain predicted labeling positions of the text features; and the predicted labeling positions are processed by a loss layer to complete sequence labeling of the text to be processed. That application provides a new way of calculating the loss value and improves the accuracy of sequence labeling in deep learning, but the number of slot value label types is still too large, and the training sequence contains only attention information.
Therefore, it is necessary to provide a new training sequence generation method that can effectively utilize slot values and reduce the number of slot value label types, so as to solve the above technical problems.
Disclosure of Invention
In order to solve the above technical problem, the method for generating a dual attention training sequence generates a dual attention training sequence corresponding to an input text by constructing a dual attention mechanism model oriented to the importance of the characters in the input text and to the degree of association between those characters and their corresponding slot values, and the dual attention training sequence is used for training a deep learning model oriented to language understanding;
the double attention mechanism model is used for vector conversion of an input text, dimension conversion of a vector of the input text, association conversion of an input text matrix, and state association of importance of characters in the input text and association of corresponding slot values of the characters in the input text;
the dual attention mechanism model comprises a character-hidden state path, a slot value-query value path and a state association path;
obtaining an associated hidden state matrix of the input text through a character-hidden state path, wherein the associated hidden state matrix is used for measuring the importance of each character in the input text;
obtaining a paraphrasing matrix of the slot value sequence through a slot value-query value path, wherein the paraphrasing matrix is used for measuring the association degree of the slot value corresponding to the character in the input text;
and the state association path is used for carrying out state association on the paraphrase matrix and the association hidden state matrix to obtain an association state matrix, and the association state matrix is used for generating a double-attention training sequence.
As a further solution, the dual attention mechanism model comprises an input text layer, an Embedding layer, a Bi-LSTM layer, a query value paraphrasing layer, a Bi-attention layer, a Dense layer and a Softmax function layer;
the input text layer comprises a text-vector conversion layer and a text-slot value labeling layer; the text-vector conversion layer converts the input text into an input text vector taking indexes as elements through a character-index dictionary; the text-slot value labeling layer is used for labeling the slot values of all characters in the input text and obtaining a slot value sequence corresponding to the input text;
the Embedding layer is used for preprocessing each character in an input text to obtain a word vector of each character in the input text and form a vector matrix, and mapping low-latitude vectors to high-latitude vectors according to dimensional requirements and form a mapping matrix of the high-latitude vectors;
the Bi-LSTM layer comprises a forward LSTM and a backward LSTM; the Bilstm layer inputs the high weft vector matrix converted by the Embedding layer into the Bilstm layer, and bitwise splices the hidden vector output by the forward LSTM and the hidden state vector of the backward LSTM at each position to obtain hidden state vectors related to each other;
the query value paraphrasing layer is used for performing groove value paraphrasing on a groove value sequence corresponding to the input text, and paraphrasing contents are stored in each groove value element paraphrasing vector;
the Bi-attention layer is used for analyzing the importance of characters in an input text and the corresponding groove value association of the characters in the input text and obtaining a double-attention training sequence containing the importance of the characters and the double-attention of the groove value association;
the Dense layer is used for performing dimension transformation on an input vector through linear transformation and outputting a set dimension vector;
and the Softmax function layer calculates the weight ratio of each element in the input vector through a normalized exponential function, and performs classified output according to the weight ratio.
As a further solution, the dual attention mechanism model obtains the associated hidden state matrix through the character-hidden state path, which comprises the following steps:
converting the input text into an input text vector through the input text layer;
inputting the input text vector into the Embedding layer, and converting it into an input text matrix with the same dimension as the slot value sequence;
inputting the input text matrix to the Bi-LSTM layer; the Bi-LSTM layer obtains a forward hidden state vector and a backward hidden state vector of the input text matrix through the forward LSTM and the backward LSTM;
the Bi-LSTM layer splices the forward hidden state vector and the backward hidden state vector at the corresponding element positions to obtain contextually related associated hidden state vectors;
and obtaining the associated hidden state vector of each input text vector in the input text matrix through the Bi-LSTM layer to form the associated hidden state matrix.
As a further solution, the dual attention mechanism model obtains the paraphrase matrix through the slot value-query value path, which comprises the following steps:
marking the slot value of each character in the input text to obtain a corresponding slot value sequence;
explaining each slot value element in the slot value sequence through a query value explanation layer, and storing explanation contents into each slot value element explanation vector;
and obtaining a paraphrase matrix through the slot value elements and the corresponding paraphrase vectors.
As a further solution, the slot value labeling is performed manually and/or by machine; the paraphrase content in a paraphrase vector comprises synonymous replacement words of the slot value, the textual meaning of the slot value, association degree information of the replacement words, and other paraphrase information corresponding to the slot value, and the dimension of the paraphrase vector is proportional to the amount of paraphrase content.
As a further solution, the dual-attention mechanism model obtains a correlation state matrix through a state correlation path, and the state correlation path comprises the following steps:
performing matrix multiplication on the correlation hidden state matrix and the paraphrase matrix to obtain a correlation state matrix;
performing important state direction summation on the correlation state matrix to obtain a character important state vector;
solving the importance weight value of each character in the input text in the character importance state vector through a Softmax function layer;
taking the importance weight value of each character in the input text as a vector element to obtain a character importance vector;
performing correlation state direction summation on the correlation state matrix to obtain a slot value correlation state vector;
solving the association degree weight value of each slot value and the corresponding character in the slot value association state vector through a Softmax function layer;
taking the association degree weight value of each slot value and the corresponding character as a vector element to obtain a slot value association degree vector;
and performing head-to-tail splicing on the character importance vector and the slot value association vector in the last dimension to obtain double attention vectors.
As a further solution, the important state direction and the correlation state direction are determined as follows:
setting vector elements of an input text vector as row direction elements of an associated hidden state matrix, and expanding the column direction of the associated hidden state matrix through an Embedding layer;
setting the slot value elements as row elements of the paraphrase matrix, and setting the vector elements of the slot value elements corresponding to the paraphrase vector as column elements of the paraphrase matrix;
the row direction of the correlation state matrix is the correlation state direction; the column direction of the correlation state matrix is the important state direction.
As a further solution, dual-sequence labeling is carried out before the dual attention vector is used for training a deep learning model oriented to language understanding, and the dual-sequence labeling comprises the following steps:
labeling the double attention vectors by an IOB2 method, and obtaining an original labeling sequence;
removing the labeling prefix of the original labeling sequence, and dividing the original labeling sequence without the prefix into a starting labeling sequence and an ending labeling sequence;
the initial labeling sequence is used for recording initial labels of the same type of labels, and only the first appearing label is reserved for a plurality of continuously appearing labels of the same type;
the end labeling sequence is used for recording end labels of the same type of label, and only the last label is reserved for a plurality of labels of the same type which continuously appear.
As a further solution, the present invention provides a dual attention training sequence generating apparatus, comprising at least one processor and a memory communicatively coupled to the at least one processor, wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the dual attention training sequence generation method described above.
As a further solution, the present invention provides a computer readable storage medium storing a computer program which, when executed by a processor, implements the dual attention training sequence generation method described above.
Compared with the related art, the method and device for generating a dual attention training sequence and the readable storage medium have the following beneficial effects:
1. a dual attention training sequence corresponding to the input text is generated by constructing a dual attention mechanism model oriented to the importance of the characters in the input text and to the degree of association between those characters and their corresponding slot values; the association degree of the slot values is measured by taking the query paraphrase matrix of the slot values as prior knowledge, so that the training sequence carries both importance features and association features;
2. the training sequence is contextually related through the Bi-LSTM layer and the weight ratios are calculated, giving the importance of each character in the input text; using such a training sequence as training data yields a model that attends better to important information;
3. the dual-sequence labeling method reduces the number of slot value labels used by conventional labeling methods, which facilitates fitting of the training model and improves efficiency.
Drawings
FIG. 1 is a system diagram of a method for generating a dual attention training sequence according to a preferred embodiment of the present invention;
FIG. 2 is a schematic diagram of a state association path of a dual attention training sequence generation method according to a preferred embodiment of the present invention;
fig. 3 is a schematic diagram of a dual sequence labeling process of a dual attention training sequence generation method according to a preferred embodiment of the present invention.
Detailed Description
The invention is further described with reference to the following figures and embodiments.
As shown in fig. 1 to fig. 3, in the method for generating a dual attention training sequence according to the present invention, a dual attention training sequence corresponding to an input text is generated by constructing a dual attention mechanism model oriented to the importance of the characters in the input text and to the degree of association between those characters and their corresponding slot values, and the dual attention training sequence is used for training a deep learning model oriented to language understanding. The dual attention mechanism model is used for vector conversion of the input text, dimension conversion of the input text vector, association conversion of the input text matrix, and state association between the importance of the characters in the input text and the association degree of their corresponding slot values. The dual attention mechanism model comprises a character-hidden state path, a slot value-query value path and a state association path: the associated hidden state matrix of the input text is obtained through the character-hidden state path and is used for measuring the importance of each character in the input text; the paraphrase matrix of the slot value sequence is obtained through the slot value-query value path and is used for measuring the association degree of the slot value corresponding to each character in the input text; and the state association path performs state association between the paraphrase matrix and the associated hidden state matrix to obtain an association state matrix, which is used for generating the dual attention training sequence.
It should be noted that the dual attention mechanism allows the training data to take both kinds of features into account, which has a positive effect on natural language recognition. Not all information in natural language is equally important, so the importance of each character in the input text needs to be evaluated, with emphasis placed on understanding the important characters, while less important content has little influence on the model. At the same time, the slot values corresponding to the characters in the input text are not perfectly matched, and a slot value cannot simply be assumed to correspond strictly to its character; therefore the slot values corresponding to the characters also need to be evaluated, the matching degree is reflected by the association degree between the characters and their corresponding slot values, and the slot is filled by selecting the label with the highest score. The optimal slot value filling is selected by means of the Softmax function.
As a further solution, the dual attention mechanism model comprises an input text layer, an Embedding layer, a Bi-LSTM layer, a query value paraphrasing layer, a Bi-attention layer, a Dense layer and a Softmax function layer;
the input text layer comprises a text-vector conversion layer and a text-slot value labeling layer; the text-vector conversion layer converts the input text into an input text vector taking indexes as elements through a character-index dictionary; the text-slot value labeling layer is used for labeling the slot values of all characters in the input text and obtaining a slot value sequence corresponding to the input text;
the Embedding layer is used for preprocessing each character in the input text to obtain a word vector of each character and form a vector matrix, and for mapping low-dimensional vectors to high-dimensional vectors according to the dimension requirements to form a mapping matrix of high-dimensional vectors;
the Bi-LSTM layer comprises a forward LSTM and a backward LSTM; the Bilstm layer inputs the high weft vector matrix converted by the Embedding layer into the Bilstm layer, and bitwise splices the hidden vector output by the forward LSTM and the hidden state vector of the backward LSTM at each position to obtain hidden state vectors related to each other;
the query value paraphrasing layer is used for paraphrasing the slot value sequence corresponding to the input text, and the paraphrase content is stored in the paraphrase vector of each slot value element;
the Bi-attention layer is used for analyzing the importance of the characters in the input text and the association degree of their corresponding slot values, and obtaining a dual attention training sequence containing both the character importance and the slot value association attention;
the Dense layer is used for performing dimension transformation on an input vector through linear transformation and outputting a set dimension vector;
and the Softmax function layer calculates the weight ratio of each element in the input vector through a normalized exponential function, and performs classified output according to the weight ratios.
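For reference, the normalized exponential function referred to here is the standard softmax: for an input vector x = (x_1, ..., x_n), the weight ratio of the i-th element is w_i = exp(x_i) / (exp(x_1) + ... + exp(x_n)), so that all weight ratios are positive and sum to 1, and the classification output follows the element with the largest weight ratio.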
It should be noted that the character-hidden state path is mainly used for relating the characters of the input text to one another; each character in the input text is influenced by its context and has a different degree of importance. For example, for the input text "I want to go to Shanghai tomorrow, how is the weather there?", "there" obviously refers to "Shanghai", but without context association the model cannot relate the two. In this embodiment the context is associated through the Bi-LSTM layer, which facilitates training of the model.
As a further solution, the dual attention mechanism model obtains the associated hidden state matrix through the character-hidden state path, which comprises the following steps:
converting the input text into an input text vector through the input text layer;
inputting the input text vector into the Embedding layer, and converting it into an input text matrix with the same dimension as the slot value sequence;
inputting the input text matrix to the Bi-LSTM layer; the Bi-LSTM layer obtains a forward hidden state vector and a backward hidden state vector of the input text matrix through the forward LSTM and the backward LSTM;
the Bi-LSTM layer splices the forward hidden state vector and the backward hidden state vector at the corresponding element positions to obtain contextually related associated hidden state vectors;
and obtaining the associated hidden state vector of each input text vector in the input text matrix through the Bi-LSTM layer to form the associated hidden state matrix.
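As an illustrative sketch only, not the patent's reference implementation, the character-hidden state path can be approximated in PyTorch as follows; the character-index dictionary, the example tokens and the dimensions are assumptions chosen for demonstration.

```python
import torch
import torch.nn as nn

# Assumed character-index dictionary and dimensions (illustration only).
char2idx = {"<pad>": 0, "I": 1, "want": 2, "to": 3, "go": 4, "Shanghai": 5}
embed_dim, hidden_dim = 64, 64

embedding = nn.Embedding(num_embeddings=len(char2idx), embedding_dim=embed_dim)
# Forward and backward LSTMs whose hidden states are spliced position by position.
bi_lstm = nn.LSTM(input_size=embed_dim, hidden_size=hidden_dim,
                  batch_first=True, bidirectional=True)

# Text-vector conversion layer: input text -> index vector.
tokens = ["I", "want", "to", "go", "Shanghai"]
input_ids = torch.tensor([[char2idx[t] for t in tokens]])   # shape (1, seq_len)

input_matrix = embedding(input_ids)                          # (1, seq_len, embed_dim)
assoc_hidden_matrix, _ = bi_lstm(input_matrix)               # (1, seq_len, 2*hidden_dim)
# Each position of assoc_hidden_matrix is the forward/backward splice for one character,
# i.e. an associated hidden state vector reflecting its context-dependent importance.
```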
It should be noted that the slot value-query value path mainly explains the definition of each slot value in detail, provides other alternative and similar slot values for replacement, calculates the respective association degrees, and outputs the vector with the highest association degree for machine learning.
As a further solution, the dual attention mechanism model obtains the paraphrase matrix through the slot value-query value path, which comprises the following steps:
marking the slot value of each character in the input text to obtain a corresponding slot value sequence;
explaining each slot value element in the slot value sequence through a query value explanation layer, and storing explanation contents into each slot value element explanation vector;
and obtaining a paraphrase matrix through the slot value elements and the corresponding paraphrase vectors.
As a further solution, the slot value labeling is performed manually and/or by machine; the paraphrase content in a paraphrase vector comprises synonymous replacement words of the slot value, the textual meaning of the slot value, association degree information of the replacement words, and other paraphrase information corresponding to the slot value, and the dimension of the paraphrase vector is proportional to the amount of paraphrase content.
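A minimal sketch of how a paraphrase matrix could be assembled, assuming (purely for illustration) that each slot value element's paraphrase vector is the mean embedding of its paraphrase words; the slot names, paraphrase words and dimensions are not prescribed by the description and are assumptions.

```python
import torch
import torch.nn as nn

vocab = {"O": 0, "city": 1, "town": 2, "metropolis": 3,
         "reserve_type": 4, "booking": 5, "reservation": 6}
paraphrase_dim = 32            # assumed; grows with the amount of paraphrase content
word_emb = nn.Embedding(len(vocab), paraphrase_dim)

# Paraphrase content per slot value: synonymous replacement words, etc.
paraphrases = {
    "O": ["O"],
    "city": ["city", "town", "metropolis"],
    "reserve_type": ["reserve_type", "booking", "reservation"],
}

def paraphrase_vector(slot_value: str) -> torch.Tensor:
    """Return one paraphrase vector per slot value element (here: mean word embedding)."""
    ids = torch.tensor([vocab[w] for w in paraphrases[slot_value]])
    return word_emb(ids).mean(dim=0)

slot_value_sequence = ["O", "reserve_type", "city", "city"]   # rows of the paraphrase matrix
paraphrase_matrix = torch.stack([paraphrase_vector(s) for s in slot_value_sequence])
print(paraphrase_matrix.shape)       # (len(slot_value_sequence), paraphrase_dim)
```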
As a further solution, the dual attention mechanism model obtains the correlation state matrix through the state association path, which comprises the following steps:
performing matrix multiplication on the correlation hidden state matrix and the paraphrase matrix to obtain a correlation state matrix;
carrying out important state direction summation on the correlation state matrix to obtain a character important state vector;
solving the importance weight value of each character in the input text in the character importance state vector through a Softmax function layer;
taking the importance weight value of each character in the input text as a vector element to obtain a character importance vector;
performing correlation state direction summation on the correlation state matrix to obtain a slot value correlation state vector;
solving the association degree weight value of each slot value and the corresponding character in the slot value association state vector through a Softmax function layer;
taking the association degree weight value of each slot value and the corresponding character as a vector element to obtain a slot value association degree vector;
and performing head-to-tail splicing on the character importance vector and the slot value association vector in the last dimension to obtain double attention vectors.
As a further solution, the important state direction and the correlation state direction are determined as follows:
setting vector elements of an input text vector as row direction elements of an associated hidden state matrix, and expanding the column direction of the associated hidden state matrix through an Embedding layer;
setting the slot value elements as row elements of the paraphrase matrix, and setting the vector elements of the slot value elements corresponding to the paraphrase vector as column elements of the paraphrase matrix;
the row direction of the correlation state matrix is the correlation state direction; the column direction of the correlation state matrix is the important state direction.
It should be noted that the correlation state direction and the important state direction are determined according to the row and column elements of the associated hidden state matrix and the paraphrase matrix. This embodiment fixes the row direction of the correlation state matrix as the correlation state direction and the column direction of the correlation state matrix as the important state direction, which is convenient to implement.
It should be noted that splicing the character importance vector and the slot value association degree vector yields a dual attention vector whose dimension is twice that of the initially input text slot value sequence, and the dimension is reduced back to the initial dimension through the Dense layer.
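Continuing the illustrative tensors above, the state association path might be sketched as below. The mapping of rows and columns to the important state direction and the correlation state direction follows the reading of the description given here, and the shapes (equal hidden and paraphrase dimensions, matching sequence lengths) are assumptions made so that the matrix product is defined.

```python
import torch
import torch.nn as nn

seq_len, dim = 5, 128                                        # assumed shared dimension
assoc_hidden_matrix = torch.randn(seq_len, dim)              # rows indexed by input characters
paraphrase_matrix = torch.randn(seq_len, dim)                # rows indexed by slot value elements

# Correlation state matrix: matrix multiplication of the two matrices.
corr_state = assoc_hidden_matrix @ paraphrase_matrix.T       # (seq_len, seq_len)

# Summation along the important state direction -> character important state vector.
char_important_state = corr_state.sum(dim=1)                 # one value per character
char_importance = torch.softmax(char_important_state, dim=0) # importance weight of each character

# Summation along the correlation state direction -> slot value correlation state vector.
slot_corr_state = corr_state.sum(dim=0)                      # one value per slot value element
slot_association = torch.softmax(slot_corr_state, dim=0)     # association weight of each slot value

# Head-to-tail splice in the last dimension, then Dense reduction back to the initial dimension.
dual_attention = torch.cat([char_importance, slot_association], dim=-1)  # (2*seq_len,)
dense = nn.Linear(2 * seq_len, seq_len)
dual_attention_reduced = dense(dual_attention)               # same length as the slot value sequence
```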
As a further solution, dual-sequence labeling is carried out before the dual attention vector is used for training a deep learning model oriented to language understanding, and the dual-sequence labeling comprises the following steps:
reducing the dimension of the dual attention vector back to the initial dimension through the Dense layer, the initial dimension being the same as that of the slot value sequence;
labeling the double attention vectors by an IOB2 method, and obtaining an original labeling sequence;
removing the labeling prefix of the original labeling sequence, and dividing the original labeling sequence without the prefix into a starting labeling sequence and an ending labeling sequence;
the initial labeling sequence is used for recording initial labels of the same type of labels, and only the first appearing label is reserved for a plurality of continuously appearing labels of the same type;
the end labeling sequence is used for recording end labels of the same type of label, and only the last label is reserved for a plurality of labels of the same type which continuously appear.
It should be noted that before training, the training data needs to be labeled so that the model knows where the slot values are located in the sequence. Existing labeling methods such as IOB1, IOE1, IOE2, IOBES and IO are commonly used, but these conventional methods use many label types, which affects the efficiency of the model. For example, IOBES includes 5 kinds of labels: the S label represents a text block composed of a single character; when a block is composed of more than one character, the first character always uses the B label, the last character always uses the E label, and the middle characters use the I label; and O (Other) marks irrelevant characters. In order to solve this problem, the present embodiment labels the training data by a dual-sequence labeling method, for example:
the original labels noted by IOB2 are: "O O O O B-reserve _ type O B-city I-city B-state O B-party _ size _ description I-party _ size _ description I-party _ size _ description".
The original tag with the prefix removed is "O O O O residual _ type O city state O party _ size _ description party _ size _ description _ size _ description".
The initial tagging sequence is used for recording initial tags of the same type of tags, and only the first occurring tag is reserved for a plurality of continuously occurring tags of the same type, so that the initial sequence is as follows: "O O O O residual _ type O city O state O party _ size _ description O O O".
The end labeling sequence is used for recording end labels of the same type of labels, and only the last label is reserved for a plurality of labels of the same type which continuously appear, so that the end labeling sequence is as follows: "O O O O residual _ type O city state O O O O party _ size _ description".
The position of each slot value can be determined from the start labeling sequence and the end labeling sequence, and the only remaining auxiliary label is O for irrelevant characters, so the model learns more efficiently.
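The conversion from an IOB2 sequence to the start and end labeling sequences described above follows directly from the stated rules; the sketch below is an illustrative re-implementation, using the example labels from this embodiment.

```python
def strip_prefix(tag: str) -> str:
    """Remove the IOB2 prefix (B-/I-), keeping only the slot value type."""
    return tag.split("-", 1)[1] if "-" in tag else tag

def dual_sequence_labels(iob2_tags):
    plain = [strip_prefix(t) for t in iob2_tags]
    start, end = [], []
    for i, tag in enumerate(plain):
        # Start sequence: keep only the first of several consecutive labels of the same type.
        start.append(tag if tag != "O" and (i == 0 or plain[i - 1] != tag) else "O")
        # End sequence: keep only the last of several consecutive labels of the same type.
        end.append(tag if tag != "O" and (i == len(plain) - 1 or plain[i + 1] != tag) else "O")
    return start, end

iob2 = ["O", "O", "O", "O", "B-reserve_type", "O", "B-city", "I-city", "B-state", "O",
        "B-party_size_description", "I-party_size_description", "I-party_size_description"]
start_seq, end_seq = dual_sequence_labels(iob2)
print(" ".join(start_seq))   # start labeling sequence
print(" ".join(end_seq))     # end labeling sequence
```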
As a further solution, the present invention provides a dual attention training sequence generating apparatus, comprising at least one processor and a memory communicatively coupled to the at least one processor, wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the dual attention training sequence generation method described above.
As a further solution, the present invention provides a computer readable storage medium storing a computer program which, when executed by a processor, implements the dual attention training sequence generation method described above.
The above description is only an embodiment of the present invention and is not intended to limit the scope of the present invention. All equivalent structural or process modifications made using the contents of the present specification and the accompanying drawings, or direct or indirect applications in other related technical fields, are likewise included within the scope of the present invention.

Claims (8)

1. A method for generating a dual attention training sequence, characterized in that a dual attention training sequence corresponding to an input text is generated by constructing a dual attention mechanism model oriented to the importance of the characters in the input text and to the degree of association between those characters and their corresponding slot values, and the dual attention training sequence is used for training a deep learning model oriented to language understanding;
the double attention mechanism model is used for vector conversion of an input text, dimension conversion of a vector of the input text, association conversion of an input text matrix, and state association of importance of characters in the input text and association of corresponding slot values of the characters in the input text;
the dual attention mechanism model comprises a character-hidden state path, a slot value-query value path and a state association path;
obtaining an associated hidden state matrix of the input text through a character-hidden state path, wherein the associated hidden state matrix is used for measuring the importance of each character in the input text;
obtaining a paraphrasing matrix of the slot value sequence through a slot value-query value path, wherein the paraphrasing matrix is used for measuring the association degree of the slot value corresponding to the character in the input text;
the state association path is used for carrying out state association on the paraphrase matrix and the association hidden state matrix to obtain an association state matrix, and the association state matrix is used for generating a double-attention training sequence;
the dual attention mechanism model comprises an input text layer, an Embedding layer, a Bi-LSTM layer, a query value paraphrasing layer, a Bi-attention layer, a Dense layer and a Softmax function layer;
the input text layer comprises a text-vector conversion layer and a text-slot value labeling layer; the text-vector conversion layer converts the input text into an input text vector taking indexes as elements through a character-index dictionary; the text-slot value labeling layer is used for labeling the slot values of all characters in the input text and obtaining a slot value sequence corresponding to the input text;
the Embedding layer is used for preprocessing each character in the input text to obtain a word vector of each character and form a vector matrix, and for mapping low-dimensional vectors to high-dimensional vectors according to the dimension requirements to form a mapping matrix of high-dimensional vectors;
the Bi-LSTM layer comprises a forward LSTM and a backward LSTM; it receives the high-dimensional vector matrix produced by the Embedding layer, and bitwise splices, at each position, the hidden state vector output by the forward LSTM with the hidden state vector of the backward LSTM to obtain mutually related hidden state vectors;
the query value paraphrasing layer is used for paraphrasing the slot value sequence corresponding to the input text, and the paraphrase content is stored in the paraphrase vector of each slot value element;
the Bi-attention layer is used for analyzing the importance of the characters in the input text and the association degree of their corresponding slot values, and obtaining a dual attention training sequence containing both the character importance and the slot value association attention;
the Dense layer is used for carrying out dimension transformation on an input vector through linear transformation and outputting a set dimension vector;
the Softmax function layer calculates the weight ratio of each element in the input vector through a normalized exponential function, and performs classified output according to the weight ratios;
the dual attention mechanism model obtains the correlation state matrix through the state association path, which comprises the following steps:
performing matrix multiplication on the correlation hidden state matrix and the paraphrase matrix to obtain a correlation state matrix;
performing summation along the important state direction on the correlation state matrix to obtain a character important state vector;
solving the importance weight value of each character in the input text in the character importance state vector through a Softmax function layer;
taking the importance weight value of each character in the input text as a vector element to obtain a character importance vector;
performing summation along the correlation state direction on the correlation state matrix to obtain a slot value correlation state vector;
solving the association degree weight value of each slot value and the corresponding character in the slot value association state vector through a Softmax function layer;
taking the association degree weight value of each slot value and the corresponding character as a vector element to obtain a slot value association degree vector;
and performing head-to-tail splicing on the character importance vector and the slot value association vector in the last dimension to obtain double attention vectors.
2. The method for generating a dual attention training sequence according to claim 1, wherein the dual attention mechanism model obtains the associated hidden state matrix through the character-hidden state path, which comprises the following steps:
converting the input text into an input text vector through the input text layer;
inputting the input text vector into the Embedding layer, and converting it into an input text matrix with the same dimension as the slot value sequence;
inputting the input text matrix to the Bi-LSTM layer; the Bi-LSTM layer obtains a forward hidden state vector and a backward hidden state vector of the input text matrix through the forward LSTM and the backward LSTM;
the Bi-LSTM layer splices the forward hidden state vector and the backward hidden state vector at the corresponding element positions to obtain contextually related associated hidden state vectors;
and obtaining the associated hidden state vector of each input text vector in the input text matrix through the Bi-LSTM layer to form the associated hidden state matrix.
3. The method as claimed in claim 1, wherein the dual attention mechanism model obtains the paraphrase matrix through the slot value-query value path, which comprises the following steps:
marking the slot value of each character in the input text to obtain a corresponding slot value sequence;
explaining each slot value element in the slot value sequence through a query value explanation layer, and storing explanation contents into each slot value element explanation vector;
and obtaining a paraphrase matrix through the slot value elements and the corresponding paraphrase vectors.
4. The method for generating a dual attention training sequence according to claim 3, wherein the slot value labeling is performed manually and/or by machine; the paraphrase content in a paraphrase vector comprises synonymous replacement words of the slot value, the textual meaning of the slot value, association degree information of the replacement words, and other paraphrase information corresponding to the slot value, and the dimension of the paraphrase vector is proportional to the amount of paraphrase content.
5. The method of claim 1, wherein the important state direction and the correlation state direction are determined as follows:
setting vector elements of an input text vector as row direction elements of an associated hidden state matrix, and expanding the column direction of the associated hidden state matrix through an Embedding layer;
setting the slot value elements as row elements of the paraphrase matrix, and setting the vector elements of the slot value elements corresponding to the paraphrase vector as column elements of the paraphrase matrix;
the row direction of the correlation state matrix is the correlation state direction; the column direction of the correlation state matrix is the important state direction.
6. The method as claimed in claim 5, wherein dual-sequence labeling is performed before the dual attention vector is used for training a deep learning model oriented to language understanding, the dual-sequence labeling comprising the following steps:
reducing the dimension of the dual attention vector back to the initial dimension through the Dense layer, the initial dimension being the same as that of the slot value sequence;
labeling the double attention vectors by an IOB2 method, and obtaining an original labeling sequence;
removing the labeling prefix of the original labeling sequence, and dividing the original labeling sequence without the prefix into a starting labeling sequence and an ending labeling sequence;
the initial labeling sequence is used for recording initial labels of the same type of labels, and only the first appearing label is reserved for a plurality of continuously appearing labels of the same type;
the end labeling sequence is used for recording end labels of the same type of label, and only the last label is reserved for a plurality of labels of the same type which continuously appear.
7. A dual attention training sequence generating apparatus, characterized in that the apparatus comprises at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the dual attention training sequence generation method of any one of claims 1 to 6.
8. A dual attention training sequence generation computer readable storage medium storing a computer program, wherein the computer program when executed by a processor implements a dual attention training sequence generation method as claimed in any one of claims 1 to 6.
CN202110646605.8A 2021-06-10 2021-06-10 Method and device for generating double attention training sequence and readable storage medium Active CN113378925B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110646605.8A CN113378925B (en) 2021-06-10 2021-06-10 Method and device for generating double attention training sequence and readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110646605.8A CN113378925B (en) 2021-06-10 2021-06-10 Method and device for generating double attention training sequence and readable storage medium

Publications (2)

Publication Number Publication Date
CN113378925A CN113378925A (en) 2021-09-10
CN113378925B true CN113378925B (en) 2022-09-20

Family

ID=77573419

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110646605.8A Active CN113378925B (en) 2021-06-10 2021-06-10 Method and device for generating double attention training sequence and readable storage medium

Country Status (1)

Country Link
CN (1) CN113378925B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108710704A (en) * 2018-05-28 2018-10-26 出门问问信息科技有限公司 Determination method, apparatus, electronic equipment and the storage medium of dialogue state
CN111985229A (en) * 2019-05-21 2020-11-24 腾讯科技(深圳)有限公司 Sequence labeling method and device and computer equipment
CN112115714A (en) * 2020-09-25 2020-12-22 平安国际智慧城市科技股份有限公司 Deep learning sequence labeling method and device and computer readable storage medium

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI648515B (en) * 2013-11-15 2019-01-21 美商克萊譚克公司 Measurement targets and their measurement, target design files, measurement methods and computer-based equipment


Also Published As

Publication number Publication date
CN113378925A (en) 2021-09-10

Similar Documents

Publication Publication Date Title
CN110598203B (en) Method and device for extracting entity information of military design document combined with dictionary
CN109062893B (en) Commodity name identification method based on full-text attention mechanism
CN111985239B (en) Entity identification method, entity identification device, electronic equipment and storage medium
CN111666427B (en) Entity relationship joint extraction method, device, equipment and medium
CN111488931B (en) Article quality evaluation method, article recommendation method and corresponding devices
CN114580424B (en) Labeling method and device for named entity identification of legal document
CN112183083A (en) Abstract automatic generation method and device, electronic equipment and storage medium
CN112100384B (en) Data viewpoint extraction method, device, equipment and storage medium
CN115062134B (en) Knowledge question-answering model training and knowledge question-answering method, device and computer equipment
CN111738169A (en) Handwriting formula recognition method based on end-to-end network model
CN112800239A (en) Intention recognition model training method, intention recognition method and device
CN109388805A (en) A kind of industrial and commercial analysis on altered project method extracted based on entity
CN114780723B (en) Portrayal generation method, system and medium based on guide network text classification
CN109446523A (en) Entity attribute extraction model based on BiLSTM and condition random field
CN116245097A (en) Method for training entity recognition model, entity recognition method and corresponding device
CN115238691A (en) Knowledge fusion based embedded multi-intention recognition and slot filling model
CN111209362A (en) Address data analysis method based on deep learning
CN112989811B (en) History book reading auxiliary system based on BiLSTM-CRF and control method thereof
CN113378925B (en) Method and device for generating double attention training sequence and readable storage medium
CN111199152A (en) Named entity identification method based on label attention mechanism
CN114444488B (en) Few-sample machine reading understanding method, system, equipment and storage medium
CN116306653A (en) Regularized domain knowledge-aided named entity recognition method
CN113449528B (en) Address element extraction method and device, computer equipment and storage medium
CN115934883A (en) Entity relation joint extraction method based on semantic enhancement and multi-feature fusion
CN116028888A (en) Automatic problem solving method for plane geometry mathematics problem

Legal Events

Code Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant