CN116910190A - Method, device and equipment for acquiring multi-task perception model and readable storage medium - Google Patents

Method, device and equipment for acquiring multi-task perception model and readable storage medium Download PDF

Info

Publication number
CN116910190A
Authority
CN
China
Prior art keywords
slot
intention
vector representation
representation
semantic vector
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310261734.4A
Other languages
Chinese (zh)
Inventor
陈思
黄毅
冯俊兰
邓超
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Mobile Communications Group Co Ltd
China Mobile Communications Ltd Research Institute
Original Assignee
China Mobile Communications Group Co Ltd
China Mobile Communications Ltd Research Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Mobile Communications Group Co Ltd, China Mobile Communications Ltd Research Institute filed Critical China Mobile Communications Group Co Ltd
Priority to CN202310261734.4A priority Critical patent/CN116910190A/en
Publication of CN116910190A publication Critical patent/CN116910190A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/332Query formulation
    • G06F16/3329Natural language query formulation or dialogue systems
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35Clustering; Classification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/30Semantic analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Machine Translation (AREA)

Abstract

The application discloses a method, an apparatus, a device and a readable storage medium for acquiring a multi-task perception model, wherein the method comprises the following steps: inputting semantic vectors of a dialogue text into a Bi-directional long short-term memory (Bi-LSTM) model to obtain a latent semantic vector representation; acquiring an intention slot vector representation of the dialogue text according to the latent semantic vector representation; acquiring a coded representation of the intention slots according to the latent semantic vector representation and the intention slot vector representation; performing intention recognition and slot filling according to the coded representation of the intention slots; and acquiring the multi-task perception model by training with the loss function of intention recognition and slot filling as the training target. In this scheme, a multi-task joint training mode of multi-intention recognition and slot filling is adopted, which helps the intentions and slots interact and associate with each other; the context information of the two tasks is mutually fused, and the information interaction between the two tasks ensures sufficient information transfer.

Description

Method, device and equipment for acquiring multi-task perception model and readable storage medium
Technical Field
The application belongs to the technical field of wireless communication, and particularly relates to a method, a device, equipment and a readable storage medium for acquiring a multi-task perception model.
Background
From the perspective of the parameter sharing mechanism, the most classical existing multi-task joint training approach is the parameter sharing model. As shown in fig. 1, multiple tasks share the same set of parameters during feature extraction at the bottom layer, while the top layer consists of output layers unique to each task. In this parameter-sharing multi-task training mode, the information interaction between tasks is limited to the feature extraction layer shared at the bottom, and there is no additional information interaction between tasks, which causes the problems of insufficient or inefficient information transfer.
Disclosure of Invention
The embodiment of the application provides a method, an apparatus, a device and a readable storage medium for acquiring a multi-task perception model, which can solve the problems of insufficient or inefficient information transfer caused by the parameter-sharing multi-task training mode, in which the information interaction between tasks is limited to the feature extraction layer shared at the bottom and there is no additional information interaction between tasks.
In order to solve the above technical problems, an embodiment of the present application provides a method for acquiring a multi-task perception model, including:
inputting semantic vectors of the dialogue text into a Bi-directional long-short-term memory Bi-LSTM model to obtain a latent semantic vector representation;
Acquiring an intention slot vector representation of the dialogue text according to the latent semantic vector representation;
acquiring a coding representation of the intended slot according to the latent semantic vector representation and the intended slot vector representation;
performing intention recognition and slot filling according to the code representation of the intention slot;
and acquiring a multi-task perception model based on the loss function of intention recognition and slot filling as a training target.
Optionally, the inputting the semantic vector of the dialogue text into the Bi-directional long-short term memory Bi-LSTM model to obtain the latent semantic vector representation includes:
inputting the semantic vector of the dialogue text into a forward Bi-LSTM model to obtain a first state of the semantic vector;
inputting the semantic vector of the dialogue text into the reverse Bi-LSTM model to obtain a second state of the semantic vector;
and determining the latent semantic vector representation after being encoded by the Bi-LSTM model according to the first state and the second state.
Optionally, the obtaining the intended slot vector representation of the dialog text according to the latent semantic vector representation includes:
obtaining weight coefficients of a first semantic vector and other semantic vectors according to dot products of a query vector of the first semantic vector in the latent semantic vector representation and key vectors of each semantic vector in the latent semantic vector representation;
Carrying out weighted summation on the weight coefficient and a value vector of each semantic vector in the latent semantic vector representation to obtain an intention slot vector representation corresponding to the dialogue text sequence;
wherein the first semantic vector is any semantic vector in the latent semantic vector representation.
Optionally, the obtaining the encoded representation of the intended slot according to the latent semantic vector representation and the intended slot vector representation includes:
acquiring an intention distribution matrix and a slot position distribution matrix according to the latent semantic vector representation and the intention slot position vector representation;
constructing a graph network with intention and slot nodes according to the intention distribution matrix, the slot position distribution matrix and the intention slot position vector representation;
according to the graph network, a coded representation of the intended slot is obtained.
Optionally, the obtaining an intention distribution matrix and a slot distribution matrix according to the latent semantic vector representation and the intention slot vector representation includes:
carrying out convolution iteration on the latent semantic vector representation and the intention slot vector representation through a convolution network to obtain node characterization;
and calculating the node characterization through a normalized exponential function to respectively obtain an intention distribution matrix and a slot position distribution matrix.
Optionally, the performing intention recognition and slot filling according to the coded representation of the intention slot includes:
performing intention recognition according to the coding representation of the intention slot to obtain intention classification;
and carrying out slot filling according to the coded representation of the intention slot to obtain a slot label.
Optionally, the acquiring the multi-task perception model based on the loss function of the intent recognition and the slot filling as a training target includes:
performing cross entropy on the intention classification obtained by intention recognition and the reference intention of the dialogue text to obtain a first numerical value;
cross entropy is carried out on the slot label obtained by filling the slot and the reference slot of the dialogue text, so that a second numerical value is obtained;
acquiring an intended slot loss function based on the first value and the second value;
training based on the intended slot position loss function to obtain a multi-task perception model.
The embodiment of the application also provides a device for acquiring the multi-task perception model, which comprises the following steps:
the first acquisition module is used for inputting semantic vectors of the dialogue text into the Bi-directional long-short-term memory Bi-LSTM model to obtain a latent semantic vector representation;
the second acquisition module is used for acquiring the intention slot position vector representation of the dialogue text according to the latent semantic vector representation;
The third acquisition module is used for acquiring the coding representation of the intention slot according to the latent semantic vector representation and the intention slot vector representation;
the processing module is used for carrying out intention recognition and slot filling according to the coding representation of the intention slot;
and the fourth acquisition module is used for acquiring the multi-task perception model based on the loss function of the intention recognition and the slot filling as a training target.
The embodiment of the application also provides a device for acquiring the multi-task perception model, which comprises a transceiver and a processor;
the processor is configured to: inputting semantic vectors of the dialogue text into a Bi-directional long-short-term memory Bi-LSTM model to obtain a latent semantic vector representation;
acquiring an intention slot vector representation of the dialogue text according to the latent semantic vector representation;
acquiring a coding representation of the intended slot according to the latent semantic vector representation and the intended slot vector representation;
performing intention recognition and slot filling according to the code representation of the intention slot;
and acquiring a multi-task perception model based on the loss function of intention recognition and slot filling as a training target.
The embodiment of the application also provides a device for acquiring the multi-task perception model, which comprises a memory, a processor and a computer program stored in the memory and capable of running on the processor, wherein the processor realizes the steps of the method for acquiring the multi-task perception model when executing the program.
The embodiment of the present application also provides a readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the above-described method.
The embodiment of the application can achieve at least the following beneficial effects:
1. the multi-task joint training mode of multi-intention recognition and slot filling is adopted to help the intentions and slots interact and associate with each other, and the context information of the two tasks is mutually fused, so that the information interaction between the two tasks ensures sufficient information transfer;
2. by adopting a Bi-LSTM model with a self-attention mechanism, the training process is adjusted, independent context information transfer for intention recognition and slot filling is realized, the semantic relation between the two tasks is deeply mined and transferred, and a more accurate multi-task perception model can be obtained.
Drawings
FIG. 1 is a schematic diagram of a parameter sharing mechanism;
FIG. 2 is a flow chart of a method for acquiring a multi-task perception model according to an embodiment of the present application;
FIG. 3 is a block diagram of Bi-LSTM;
FIG. 4 is a diagram of the overall architecture of the model;
FIG. 5 is a schematic block diagram of a multi-task perception model acquisition apparatus according to an embodiment of the present application;
fig. 6 shows a block diagram of a multitasking perceptual model acquisition device of an embodiment of the present application.
Detailed Description
The following description of the embodiments of the present application will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are some, but not all embodiments of the application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
The terms first, second and the like in the description and in the claims, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged, as appropriate, such that embodiments of the present application may be implemented in sequences other than those illustrated or described herein, and that the objects identified by "first," "second," etc. are generally of a type, and are not limited to the number of objects, such as the first object may be one or more. Furthermore, in the description and claims, "and/or" means at least one of the connected objects, and the character "/", generally means that the associated object is an "or" relationship.
The related art related to the present application will be described below.
The method of multi-task joint training improves the generalization and performance of a model by leveraging the training information of multiple related tasks. The approach adopted is that multiple tasks share some parameters at the bottom layer of the model and use task-specific parameters at the task-specific layer (usually the top layer), so that multi-task joint training can realize multiple subtasks with one model. For example, for the subtasks considered here, the judgment criteria include three aspects: whether the multiple intents in the dialogue sentence are identified correctly; whether the slot value boundaries are correct; and whether the slot types are predicted correctly. In general, multi-task joint training selects several tasks that belong to the same training type but have different training targets for joint learning (for example, if intention recognition and behavior prediction both belong to classification tasks, the same modeling process can be selected), and the modeling process is relatively easy. Multi-task training across different task types is more complex than within the same type, and the recognition of slot value boundaries is more difficult than the classification tasks (multi-intent recognition and slot type prediction).
Intent recognition and slot filling are two core subtasks of natural language understanding in dialogue systems, intended to recognize the user intent and extract the relevant slot information from dialogue sentences. Intent recognition analyzes the actual user needs in a dialogue sentence and is generally regarded as a text classification task. Early text classification adopted the classical Support Vector Machine (SVM) model; as deep learning developed, convolutional neural networks were adopted to extract features of the text. Due to the sequential nature of text, recurrent neural networks were proposed to better model the sequence features of text. Later, with the proposal of the attention model, the attention mechanism can focus on words in the input sequence that are critical to the current task and ignore words irrelevant to the current task as far as possible. Technical advances have driven this task toward higher accuracy, because an erroneous intent recognition result will lead to the end of the entire dialogue process. The public data set Mobile Customer-Service (Mobile CS) in the human dialogue customer service field comes from multi-turn spoken dialogues between people, and a single turn of dialogue often contains multiple intents, so the scope of the intent recognition subtask studied in this application is multi-intent recognition.
Slot filling refers to the process of labeling each word or character in the user sentence, which helps the task-oriented dialogue system clearly define the user instruction and identify the key slot information contained in the user sentence; it is generally regarded as a sequence labeling task. Existing methods mainly include rule-based methods and statistics-based methods. In the early stage, rule-based methods were mostly chosen, using rule templates, word-list enumeration, search-history matching and similar techniques; when the extraction patterns accurately reflect the language rules, rule-based methods outperform statistics-based methods. However, rules often depend on a specific language scenario; when the scenario is more complex or the language more varied, preparing the rules becomes time-consuming and labor-intensive, and the recognition performance also decreases. Therefore, rule-based methods are mainly applied to small-scale scenario data whose rules are easy to comb through and summarize. Statistics-based methods adopted machine learning models in the early stage, and as deep learning technology was gradually applied to the slot filling task, performance improved greatly. Slots can be labeled with BIO coding: B-X denotes the starting position of a slot, I-X denotes an internal position of a slot, and O denotes a non-slot position; X denotes the slot type, such as a traffic package or a main package. Representative methods include support vector machines, hidden Markov models, and neural networks.
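As a concrete illustration of the BIO scheme just described, the short Python sketch below labels a hypothetical tokenized utterance; the tokens, the slot type "traffic_package" and the labels are invented for illustration only.

```python
# Hypothetical example of BIO slot labels for a tokenized utterance.
# "traffic_package" is an assumed slot type; tokens and labels are illustrative only.
tokens = ["please", "check", "my", "10GB", "monthly", "data", "package"]
labels = ["O", "O", "O", "B-traffic_package", "I-traffic_package",
          "I-traffic_package", "I-traffic_package"]

# B-X marks the first token of a slot of type X, I-X marks tokens inside it,
# and O marks tokens that belong to no slot.
for tok, lab in zip(tokens, labels):
    print(f"{tok}\t{lab}")
```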
Intention recognition and slot filling are closely related: both tasks perform feature modeling on the same dialogue text, and the semantic information of one task often guides the other. For example, when the user intent "help-query" is recognized, the slot information about the traffic package in the dialogue text receives more attention, so the intention recognition and slot filling tasks are often modeled jointly. From the perspective of information transfer, part of the existing research applies the intention recognition task to the slot filling task and uses the recognized intent to constrain the recognition of slots, realizing only a unidirectional influence of intention recognition on slot filling; another part of the research focuses on alternately realizing intent-to-slot or slot-to-intent information transfer in an iterative fashion, where in one model iteration only intent information can be transferred to slot filling, or only slot information can be transferred to the intention recognition task. These models therefore suffer from insufficient or inefficient information transfer between subtasks, which affects the overall performance of the model.
The existing intention recognition and slot filling task joint training method mainly has the following problems:
1. The existing parameter sharing multi-task combined training mode is used, information interaction among tasks is limited to a feature extraction layer shared by a bottom layer, and information interaction among additional tasks is lacked, so that the problems of insufficient information transmission and unreasonable utilization of task interaction information are caused.
2. Research that applies the intention recognition task to the slot filling task and uses the recognized intent to constrain the recognition of slots only realizes a unidirectional influence of the intention recognition task on the slot filling task, and cannot realize a bidirectional information interaction process between the tasks.
3. Interaction between tasks in an iterative fashion can only alternately realize unidirectional information transfer from intent to slot or from slot to intent, so information can be transferred in only one direction in one model iteration, and the training efficiency is low.
4. Existing joint training of intention recognition and slot filling mostly focuses on solving the single-intent recognition problem. However, in person-to-person customer-service dialogue data from spoken communication platforms, different intent labels can appear together in one sentence at random, and it cannot be guaranteed that a sentence contains only one intent label. The work therefore cannot be limited to the simpler single-intent label recognition problem.
The method, the device, the equipment and the readable storage medium for acquiring the multi-task perception model provided by the embodiment of the application are described in detail through specific embodiments and application scenes thereof with reference to the accompanying drawings.
As shown in fig. 2, at least one embodiment of the present application provides a method for acquiring a multitasking perceptual model, including:
step 201, inputting semantic vectors of a dialogue text into a Bi-directional long-short term memory Bi-LSTM model to obtain a latent semantic vector representation;
it should be noted that the dialogue text mentioned in the embodiments of the present application refers to a set of utterances from one dialogue participant used for model training; it may be a continuous passage spoken by that participant. For example, if the dialogue participants include user A and user B, the dialogue text in the embodiments of the present application may be a continuous passage of sentences spoken by user A.
Step 202, obtaining an intention slot vector representation of the dialogue text according to the latent semantic vector representation;
Step 203, obtaining a coding representation of the intended slot according to the latent semantic vector representation and the intended slot vector representation;
step 204, performing intention recognition and slot filling according to the code representation of the intention slot;
Step 205, obtaining a multi-task perception model based on the intent recognition and the loss function of slot filling as training targets.
In the embodiment of the application, the multi-intention recognition and slot filling multi-task combined training mode is adopted to help the intention and the slot to be interactively associated, and the context information of the two tasks is mutually fused, so that the information between the two tasks is fully interacted, and the full transmission of the information is ensured.
Optionally, in an embodiment of the present application, a specific implementation manner of inputting the semantic vector of the dialog text into the Bi-directional long-short term memory Bi-LSTM model to obtain the latent semantic vector representation includes:
step a1, inputting a semantic vector of a dialogue text into a forward Bi-LSTM model to obtain a first state of the semantic vector;
it should be noted that, the step is to input the semantic vector into the forward Bi-LSTM model in a positive sequence manner to perform feature extraction, so as to obtain the first state of the semantic vector.
Step a2, inputting the semantic vector of the dialogue text into a reverse Bi-LSTM model to obtain a second state of the semantic vector;
it should be noted that, in this step, the semantic vector is input into the reverse Bi-LSTM model in an inverted order to perform feature extraction, so as to obtain the second state of the semantic vector.
And a3, determining the expression of the latent semantic vector after being coded by the Bi-LSTM model according to the first state and the second state.
And splicing the first state of the semantic vector obtained after feature extraction with the second state of the semantic vector, and taking the formed word vector as a latent semantic vector representation.
The principle of the implementation of this process is illustrated below.
The input of the Bi-LSTM model is the embedding representation $\{x_1, x_2, \ldots, x_n\}$ of the dialogue text $X$, where $x_i$ is the token of the $i$-th position, i.e. its word vector; an embedding is a digitized word representation, which can be understood as a fixed-size word vector.
To obtain an embedding representation that carries the context information of the user dialogue text, the Bi-LSTM model is selected first. As an improvement of the recurrent neural network model, LSTM adds a gating mechanism that can alleviate the long-distance dependence problem of dialogue text. Specifically, an LSTM cell contains three gates: a forget gate, an input gate and an output gate. Given the token $x_i$ of the dialogue text $X$, the hidden state and memory cell state of the preceding position $i-1$ are denoted $h_{i-1}$ and $c_{i-1}$ respectively, and the hidden state of the $i$-th position of the dialogue text is denoted $h_i$. The calculation process is as follows:
$F(x_i, h_{i-1}) = \mathrm{sigmoid}([h_{i-1}, x_i] W_f + b_f)$;
$K(x_i, h_{i-1}) = \mathrm{sigmoid}([h_{i-1}, x_i] W_k + b_k)$;
$S(x_i, h_{i-1}) = \mathrm{sigmoid}([h_{i-1}, x_i] W_s + b_s)$;
$c_i = C(x_i, h_{i-1}, c_{i-1}) = F(x_i, h_{i-1}) \odot c_{i-1} + K(x_i, h_{i-1}) \odot \tanh([h_{i-1}, x_i] W_c + b_c)$;
$h_i = S(x_i, h_{i-1}) \odot \tanh(c_i)$;
where $F(x_i, h_{i-1})$ denotes the forget gate output function; $K(x_i, h_{i-1})$ denotes the input gate output function; $S(x_i, h_{i-1})$ denotes the output gate output function; $C(x_i, h_{i-1}, c_{i-1})$ denotes the function that updates the memory cell state for the sequence element $x_i$; $h_{i-1}$ denotes the hidden state of the $(i-1)$-th position; $h_i$ denotes the hidden state of the $i$-th position; $W_f$ and $b_f$ denote the weight coefficient and bias of the forget gate; $W_k$ and $b_k$ denote the weight coefficient and bias of the input gate; $W_s$ and $b_s$ denote the weight coefficient and bias of the output gate; $c_{i-1}$ denotes the memory cell state of the position $i-1$ preceding $x_i$; $W_c$ and $b_c$ denote the weight coefficient and bias for the memory cell state; $\mathrm{sigmoid}(\cdot)$ denotes the activation function of a neural network unit; and $\odot$ denotes the element-wise product of two same-order vectors.
First, for the input sequence $\{x_1, x_2, \ldots, x_n\}$, each position $i$ has LSTM cells that learn from the forward and backward directions respectively. The hidden state output by the forward LSTM cell at position $i$ is $\overrightarrow{h_i} = \overrightarrow{\mathrm{LSTM}}(x_i, \overrightarrow{h}_{i-1})$, and the hidden state output by the backward LSTM cell is $\overleftarrow{h_i} = \overleftarrow{\mathrm{LSTM}}(x_i, \overleftarrow{h}_{i-1})$;
where $\overrightarrow{h}_{i-1}$ denotes the hidden state output by the forward LSTM cell at position $i-1$, and $\overleftarrow{h}_{i-1}$ denotes the hidden state output by the backward LSTM cell at position $i-1$.
Then, the outputs of the forward and backward LSTM cells are combined, giving for each semantic vector $x_i$ the hidden state after Bi-LSTM encoding, i.e. the latent semantic vector representation:
$h_i = [\overrightarrow{h_i}; \overleftarrow{h_i}]$;
$H = \{h_1, \ldots, h_i, \ldots, h_n\}$;
where $H$ denotes the latent semantic vector representation obtained after Bi-LSTM encoding, and $h_i$ denotes the hidden state of the $i$-th position obtained after Bi-LSTM encoding.
The specific structure of the Bi-LSTM encoder is shown in fig. 3.
The semantic vectors of the dialogue text are input into the Bi-LSTM model in forward order and reverse order respectively for feature extraction, and the word vectors formed by splicing the extracted feature vectors are taken as the latent semantic vector representation of the dialogue text. In this way, the feature data obtained at the current moment carries both past and future information. It should be noted that the parameters of the two LSTM neural networks in the Bi-LSTM are independent of each other; they only share the word embedding vector list.
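A minimal sketch of this encoding step is given below in PyTorch; it assumes pre-computed word embeddings, and the layer sizes are illustrative assumptions rather than the configuration of the application itself.

```python
import torch
import torch.nn as nn

# Minimal sketch of the Bi-LSTM encoding step: the embedded dialogue text is read
# forward and backward, and the two hidden states of each position are concatenated
# into the latent semantic vector representation H.
# embed_dim and hidden_dim are illustrative assumptions.
embed_dim, hidden_dim = 128, 64
bilstm = nn.LSTM(input_size=embed_dim, hidden_size=hidden_dim,
                 batch_first=True, bidirectional=True)

x = torch.randn(1, 10, embed_dim)   # embeddings {x_1, ..., x_n} of one dialogue text (n = 10)
H, _ = bilstm(x)                    # H: (1, 10, 2 * hidden_dim); forward/backward states spliced per position
print(H.shape)                      # torch.Size([1, 10, 128])
```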
Optionally, in an embodiment of the present application, the specific implementation manner of obtaining the intended slot vector representation of the dialog text according to the latent semantic vector representation includes:
Step b1, obtaining weight coefficients of a first semantic vector and other semantic vectors according to dot products of a query vector of the first semantic vector in the latent semantic vector representation and key vectors of each semantic vector in the latent semantic vector representation;
it should be noted that, the first semantic vector is any semantic vector in the latent semantic vector representation; in this step, each semantic vector in the latent semantic vector representation needs to be processed identically to obtain the weight coefficient of the semantic vector and other semantic vectors.
Step b2, carrying out weighted summation on the weight coefficient and a value vector of each semantic vector in the latent semantic vector representation to obtain an intention slot vector representation corresponding to the dialogue text sequence;
it should be noted that, the process mainly uses a self-attention mechanism to obtain context information of an indefinite length sequence, i.e. an intended slot vector representation. Specifically, the self-attention mechanism focuses on words in the input sequence, which play a key role in the current decoding task, ignores words irrelevant to the current decoding task as much as possible, gives different weights to the words in the input sequence according to the current decoding task, and obtains a final attention value through weighted summation operation, thereby obtaining a final feature expression of the sequence, namely, the intended slot vector expression.
The implementation principle of this process is exemplified below.
Attention and self-attention mechanisms: the attention mechanism improves model performance by increasing the weight of important information related to the task and reducing the model's attention to redundant or noisy data. It generally comprises steps of data input, attention weight calculation and weighting of the data information. For example, a word in a given sentence sequence serves as the query (Query); the attention weights are calculated from its similarity to the keywords (Key), and the values (Value) of all words are weighted and summed according to these weights to obtain the final sentence semantic representation. The self-attention mechanism (self-attention) is a variant of the attention mechanism that focuses more on the dependencies or interactions inside the data features; in recent years it has been widely applied in pre-training models such as Transformer and BERT and in the natural language processing of many downstream tasks, with good results. A Self Attention (SA) mechanism is used to calculate the degree of correlation between each word in the input text sequence and all words in the sequence, helping the model capture the contextual semantic links of words at different positions in the sequence. Because Self Attention calculates the correlation between words over the whole sequence, it can alleviate the problem of long-distance text memory loss; moreover, the attention for every word can be computed simultaneously without any sequential order, which greatly improves the computational efficiency of the model.
Specifically, given the semantic vector representation of the dialogue text $H = \{h_1, \ldots, h_i, \ldots, h_n\} \in \mathbb{R}^{n \times m}$, where $n$ denotes the sequence length, $m$ denotes the word vector dimension, and $h_i$ denotes the word vector of the $i$-th word in the sequence, i.e. the hidden state of the $i$-th position obtained after Bi-LSTM encoding (which can be understood as the semantic vector of the $i$-th position after Bi-LSTM encoding). The query vector $q_i$, key vector $k_i$ and value vector $v_i$ are then obtained through different linear transformations, and likewise for the other word vectors. The query vector and the key vector are used to represent the attention that the query term should assign to the other key terms. The self-attention calculation computes the dot product between the query vector $q_i$ of a word and the key vector $k_j$ of each word to obtain a weight coefficient that represents the degree of correlation between that word and the other words. The magnitude of the weight coefficient reflects how similar two words are in the multidimensional space: the larger the weight coefficient, the more similar they are. The weight coefficients are then combined with the value vectors $v_j$ by weighted summation, which ensures that attention is focused on the current word, avoids interference from irrelevant words, and yields a vector representation of each word based on the attention weights. $d_k$ denotes the dimension of the key vector $k_i$, and dividing by $\sqrt{d_k}$ avoids vanishing gradients. The final vector representation $c_i$ is calculated as follows:
$q_i = x_i \cdot W_Q$;
$k_j = x_j \cdot W_K$;
$v_j = x_j \cdot W_V$;
$c_i = \sum_{j=1}^{n} \mathrm{softmax}\!\left(\frac{q_i \cdot k_j}{\sqrt{d_k}}\right) v_j$;
$C = \{c_1, \ldots, c_i, \ldots, c_n\}$;
where $W_Q$, $W_K$ and $W_V$ are the trainable weight matrices of $q_i$, $k_j$ and $v_j$ respectively, all with dimension $m \times d$; $C$ denotes the intention slot vector representation corresponding to the dialogue text sequence; $c_i$ denotes the intention slot vector representation corresponding to the semantic vector of the $i$-th position; $q_i$ denotes the query vector of the semantic vector of the $i$-th position; $k_j$ denotes the key vector of the semantic vector of the $j$-th position; and $v_j$ denotes the value vector of the semantic vector of the $j$-th position.
Finally, the outputs of the Bi-LSTM and self-attention networks are spliced to obtain $E$ (the concatenation of $H$ and $C$), which is used as the input for obtaining the coded representation of the intention slots.
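A minimal sketch of this scaled dot-product self-attention step and the splicing into E is given below in PyTorch; the dimensions and the module names W_q, W_k, W_v are illustrative assumptions standing in for W_Q, W_K, W_V.

```python
import math
import torch
import torch.nn as nn

# Sketch of the self-attention step over the Bi-LSTM output H, with illustrative sizes.
d_model, d_k = 128, 64
W_q = nn.Linear(d_model, d_k, bias=False)
W_k = nn.Linear(d_model, d_k, bias=False)
W_v = nn.Linear(d_model, d_k, bias=False)

H = torch.randn(1, 10, d_model)                      # latent semantic vectors from the Bi-LSTM
Q, K, V = W_q(H), W_k(H), W_v(H)                     # query / key / value projections
scores = Q @ K.transpose(-2, -1) / math.sqrt(d_k)    # dot products scaled by sqrt(d_k)
attn = torch.softmax(scores, dim=-1)                 # weight coefficients between word pairs
C = attn @ V                                         # intention slot vector representation

E = torch.cat([H, C], dim=-1)                        # splice Bi-LSTM and self-attention outputs
print(E.shape)                                       # torch.Size([1, 10, 192])
```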
optionally, in an embodiment of the present application, a specific implementation manner of obtaining the coded representation of the intended slot according to the latent semantic vector representation and the intended slot vector representation includes:
step c1, acquiring an intention distribution matrix and a slot position distribution matrix according to the latent semantic vector representation and the intention slot position vector representation;
step c2, constructing a graph network with intention and slot nodes according to the intention distribution matrix, the slot position distribution matrix and the intention slot position vector representation;
It should be noted that the edges of the graph network are mainly of three types: intent-intent, slot-slot, and slot-intent. Intent-to-intent connection means that all intent nodes are interconnected, modeling the relationship between the intents; slot-to-slot connection means that each slot connects the other slots having co-occurrence relationships with it, to further combine context information with slot dependency modeling; intent-to-slot connection reflects that intents and slots are highly correlated, and each slot connects all predicted intents to enhance the interaction between intents and slots.
And c3, acquiring the coding representation of the intended slot according to the graph network.
Optionally, in an embodiment of the present application, the specific implementation of obtaining an intention distribution matrix and a slot distribution matrix according to the latent semantic vector representation and the intention slot vector representation includes:
step d1, carrying out convolution iteration on the latent semantic vector representation and the intention slot position vector representation through a convolution network to obtain node representations;
and d2, calculating the node characterization through a normalized exponential function, and respectively obtaining an intention distribution matrix and a slot position distribution matrix.
In order to fully consider the dependency relationships within the sentence, the present application introduces a graph convolutional network (Graph Convolutional Network, GCN). First, the latent semantic vector representation and the intention slot vector representation are analyzed with the StanfordNLP parser to obtain the dependency relationships of the sentence, and an adjacency matrix $A \in \mathbb{R}^{k \times k}$ is constructed, where $k$ denotes the total number of nodes and $A_{ij} = 1$ means that node $i$ is connected to node $j$. The application adopts an $l$-layer convolutional network, and the final node representations are obtained through $l$ convolution iterations, as follows:
$G^{(0)} = \{g_1, \ldots, g_i, \ldots, g_n\} = E$;
$g_i^{(l)} = \sigma\Big(\sum_{j} A_{ij}\, W_l\, g_j^{(l-1)} + b_l\Big)$;
where $g_i^{(l)}$ denotes the representation of node $i$ after the $l$-th iteration; $W_l$ denotes the weight matrix of the $l$-th layer; $g_j^{(l-1)}$ denotes the representation of node $j$ after $l-1$ iterations; and $b_l$ denotes the bias of the $l$-th layer.
The $G^{(k)}$ obtained by the above process ($G^{(k)}$ is the graph network constructed from the node representations) is passed through a normalized exponential (softmax) function to obtain the intention distribution matrix $Y^I$ and the slot distribution matrix $Y^S$. The calculation process is as follows:
$Y^I = \mathrm{softmax}\big(G^{(k)} W^I\big)$;
$Y^S = \mathrm{softmax}\big(G^{(k)} W^S\big)$;
where $Y^I$ denotes the intention distribution matrix; $Y^S$ denotes the slot position distribution matrix; $W^I$ denotes the weight matrix of the intention distribution; and $W^S$ denotes the weight matrix of the slot distribution.
Further, according to the intention distribution matrix, the slot distribution matrix and the corresponding intention slot vector representation, a graph network with intention and slot nodes can be constructed: the node representations are taken from the intention distribution matrix and the slot position distribution matrix, the nodes are fully connected, and the attention mechanism of the graph attention network (Graph Attention Network, GAT) is adopted:
$e_{ij} = \mathrm{LeakyReLU}\big(a^{\top}[W_y y_i \,\|\, W_y y_j]\big)$;
$\alpha_{ij} = \dfrac{\exp(e_{ij})}{\sum_{j' \in N_i} \exp(e_{ij'})}$;
$y_i' = \sigma\Big(\sum_{j \in N_i} \alpha_{ij}\, W_y\, y_j\Big)$;
where $a$ denotes a parameter matrix; $N_i$ denotes the set of neighbouring nodes centered on node $i$; $y_i$ denotes the representation of the $i$-th node (i.e. the representation of the $i$-th intent or slot); $y_i'$ denotes the representation of $y_i$ updated by aggregating neighbouring node information through the attention mechanism; $e_{ij}$ denotes an intermediate quantity; $W_y$ denotes the weight coefficient of $y_i$; $y_j$ denotes the representation of the $j$-th node (i.e. the representation of the $j$-th intent or slot); $\alpha_{ij}$ denotes the attention weight of $y_i$ and $y_j$; and $j'$ ranges over the elements of $N_i$.
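The graph interaction step can be sketched as follows in PyTorch; the adjacency matrix, node count and layer sizes are assumptions for illustration, and only a single graph-convolution iteration with the softmax read-out of the two distribution matrices is shown (the GAT-style attention update is omitted for brevity).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Sketch of one graph-convolution iteration over the node features G^(0) = E, followed by
# the softmax read-out of the intention and slot distribution matrices.
# Node count, feature size and label counts are illustrative assumptions.
num_nodes, feat_dim, num_intents, num_slot_labels = 10, 192, 5, 8

A = (torch.rand(num_nodes, num_nodes) > 0.5).float()   # assumed adjacency from the dependency parse
A = A + torch.eye(num_nodes)                            # self-loops so every node also sees itself
W_l = nn.Linear(feat_dim, feat_dim)

G0 = torch.randn(num_nodes, feat_dim)                   # initial node features (the spliced representation E)
G1 = F.relu(A @ W_l(G0))                                # one convolution iteration: aggregate connected nodes

W_I = nn.Linear(feat_dim, num_intents)
W_S = nn.Linear(feat_dim, num_slot_labels)
Y_I = torch.softmax(W_I(G1), dim=-1)                    # intention distribution matrix
Y_S = torch.softmax(W_S(G1), dim=-1)                    # slot distribution matrix
print(Y_I.shape, Y_S.shape)                             # (10, 5) and (10, 8)
```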
Optionally, in an embodiment of the present application, the implementation of intent recognition and slot filling according to the encoded representation of the intent slot includes:
step e1, carrying out intention recognition according to the coding representation of the intention slot to obtain intention classification;
and e2, carrying out slot filling according to the coded representation of the intention slot to obtain a slot label.
It should be noted that the following two tasks are completed simultaneously in this process: first, intention recognition, which classifies the category of the intention; and second, slot filling, which outputs a slot label sequence based on the coded representation of the intention slots.
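A minimal decoding sketch under assumed shapes is shown below: a softmax over intention scores gives the intention classification, and a per-token softmax over slot scores gives the slot label sequence; all names, label counts and dimensions are hypothetical.

```python
import torch

# Decoding sketch: intention classification from intent scores and a slot label per token
# from slot scores, both via softmax; label counts and token count are hypothetical.
intent_logits = torch.randn(5)            # scores over 5 assumed intention labels
slot_logits = torch.randn(10, 8)          # scores over 8 assumed slot labels for 10 tokens

intent_probs = torch.softmax(intent_logits, dim=-1)
slot_probs = torch.softmax(slot_logits, dim=-1)

predicted_intent = int(intent_probs.argmax())        # intention classification
predicted_slot_tags = slot_probs.argmax(dim=-1)      # slot label sequence, one tag per token
print(predicted_intent, predicted_slot_tags.tolist())
```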
Optionally, in an embodiment of the present application, the obtaining a specific implementation of the multi-task perception model based on the loss function of intent recognition and slot filling as a training target includes:
Step f1, performing cross entropy on intention classification obtained by intention recognition and reference intention of the dialogue text to obtain a first numerical value;
it should be noted that the loss function of the intention recognition task takes the form of cross entropy, and the calculation process is as follows:
$\mathcal{L}_I = -\sum_{j} \hat{y}_j^{I} \log\big(y_j^{I'}\big)$;
where $\mathcal{L}_I$ denotes the first value; $\hat{y}_j^{I}$ denotes the reference intention of the dialogue text, i.e. the true intention corresponding to the dialogue text; and $y_j^{I'}$ denotes the intention classification of the dialogue text.
Step f2, cross entropy is carried out on the slot label obtained by filling the slot and the reference slot of the dialogue text, so as to obtain a second numerical value;
it should be noted that the loss function of the slot filling task takes the form of cross entropy, and the calculation process is as follows:
$\mathcal{L}_S = -\sum_{j} \hat{y}_j^{S} \log\big(y_j^{S'}\big)$;
where $\mathcal{L}_S$ denotes the second value; $y_j^{S'}$ denotes the slot label of the dialogue text; and $\hat{y}_j^{S}$ denotes the reference slot of the dialogue text, i.e. the true slot corresponding to the dialogue text.
Step f3, acquiring an intended slot loss function based on the first numerical value and the second numerical value;
specifically, the intention slot loss function is obtained as shown in the following formula:
$\mathcal{L}(\theta) = \alpha\, \mathcal{L}_I + (1 - \alpha)\, \mathcal{L}_S$;
where $\mathcal{L}(\theta)$ denotes the intention slot loss function; $\alpha$ denotes a hyperparameter; and $\theta$ denotes the set of all parameters.
And f4, training based on the intended slot position loss function to obtain a multi-task perception model.
It should be noted that, the intended slot loss function in the embodiment of the present application adds the loss functions of the two subtasks to obtain a joint loss function, and performs continuous training of the multi-task perception model based on the joint loss function, so as to obtain a multi-task perception model converged by the joint loss function.
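The joint objective can be sketched as follows in PyTorch; the alpha-weighted combination of the two cross-entropy terms is an assumption consistent with the description above, and all shapes are illustrative.

```python
import torch
import torch.nn.functional as F

# Sketch of the joint intention-slot objective: cross entropy for intention classification
# (the first value), cross entropy for slot filling (the second value), combined with a
# hyperparameter alpha. The exact weighting scheme is an assumption.
intent_logits = torch.randn(1, 5)                # predicted intention scores (batch of 1, 5 labels)
intent_target = torch.tensor([2])                # reference intention
slot_logits = torch.randn(10, 8)                 # per-token slot scores (10 tokens, 8 labels)
slot_target = torch.randint(0, 8, (10,))         # reference slot tags

loss_intent = F.cross_entropy(intent_logits, intent_target)   # first value
loss_slot = F.cross_entropy(slot_logits, slot_target)         # second value

alpha = 0.5                                      # hyperparameter balancing the two tasks
loss = alpha * loss_intent + (1 - alpha) * loss_slot
print(float(loss))
```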
In summary, the application provides a multi-task perception model combining intention recognition and slot filling, which has an interactive perception coding module and a task-collaborative graph interaction module and can simultaneously perform multi-task training of multi-label classification and sequence labeling. In order to solve the multi-intention correlation problem, the application proposes to use a graph attention network framework to bidirectionally associate intentions with slots, enhancing the connection between intention recognition and slot filling and realizing the mutual promotion of the multiple tasks in the model training process. In addition, different sentences have different grammatical structures, and this characteristic is fully used to help the model capture the interaction information of intention slots in multiple scenarios. Specifically, the model uses the dependency relationships within sentences to model intentions, thereby realizing multi-intention knowledge transfer at the sentence level.
Specifically, the overall architecture of the multi-task perception model is shown in fig. 4. The multi-task perception model consists of three parts. The first part is the interactive perception coding module, which inputs the initial semantic vectors into the Bi-LSTM model to obtain the latent semantic vector representation, and then fully obtains the context-dependent intention slot vector representation, i.e. fine-grained semantic information, through the self-attention mechanism. The second part is the graph-attention-network-based multi-task collaborative graph interaction module, which makes full use of graph structure information such as dependency relationships to perform task collaborative interaction and obtain the coded representation of the intention slots. The third part is the decoder, which realizes two functions: the intention vectors pass through a softmax to complete intention classification and realize the intention recognition function, and the slot vectors pass through a softmax to output the slot sequence and realize the slot filling function. Finally, the model merges the loss functions of the intention recognition task and the slot filling task as the joint training loss function.
The interactive perception coding module is a Bi-LSTM model combined with a self-attention mechanism. The basic model is divided into two independent LSTMs; the input sequence is fed into the two LSTM neural networks in forward order and reverse order respectively for feature extraction, and the word vectors formed by splicing the extracted feature vectors are used as the final feature representation of the sequence. The multi-task collaborative graph interaction module is a slot-intention-aware interaction network structure. It connects all predicted N intentions with the M slots, constructs a graph network containing N+M nodes, and obtains the final node representations of intentions and slots through repeated iterative updates. The edges of the network structure are mainly of three types: intent-intent, slot-slot, and slot-intent. Intent-to-intent connection means that all intent nodes are interconnected, modeling the relationship between the intents; slot-to-slot connection means that each slot connects the other slots having co-occurrence relationships with it, to further combine context information with slot dependency modeling; intent-to-slot connection reflects that intents and slots are highly correlated, and each slot connects all predicted intents to enhance the interaction between intents and slots.
The application combines the two steps together, which is called joint training. The model obtained after the joint optimization comprises parameters thereof, namely a joint training model which integrates two tasks of multi-purpose recognition and slot filling.
In summary, the following problems can be solved by the embodiments of the present application:
1. for the problems of insufficient information transfer and unreasonable use of interaction information, a multi-task joint training mode of multi-intention recognition and slot filling is adopted to help the intentions and slots interact and associate with each other, further improving the performance of the model.
2. And capturing the relation among the entities by utilizing the supplemental content such as text description information, dependency relationship, graph convolution structural information and the like, and performing supplemental coding on entity slots by combining with an intention recognition task, so as to reasonably utilize multidimensional information.
3. The problem that bidirectional association cannot be established for many times during iteration is solved through the graph attention network model, the relation between intention recognition and slot filling is enhanced, and the interaction of multitasking in the model training process is realized.
4. The method used in this proposal captures longer-distance dependencies more efficiently and truly makes use of bidirectional context information. The training process is adjusted by a Bi-LSTM model combined with a self-attention mechanism, alleviating the current difficulty of training a good model when the human dialogue text is long and contains many intentions and frequent multiple slots.
Further, the method for acquiring the multi-task perception model provided by the embodiment of the application can achieve the following effects:
1. the problem that bidirectional association cannot be established for many times during iteration is solved by using the graph attention network model, bidirectional information interaction between intention and slot positions is established, connection between intention recognition and slot position filling is enhanced, and performance of respective tasks is mutually improved.
2. The interactive perception coding module adopts a Bi-LSTM model with a self-attention mechanism to adjust the training process, realizes independent context information transfer for intention recognition and slot filling, and deeply mines and transfers the semantic connection between the two tasks.
3. And taking the text information, the dependency relationship among the entities, the structured graph convolution and other information as supplementary content, capturing the relation among the entities, carrying out supplementary coding on the entity slot positions by combining the intention recognition task, and improving the performance of the whole model by utilizing the multidimensional information.
4. The multi-task collaborative graph interaction module adopts a multi-task joint training mode of multi-intention recognition and slot filling to help the intention to be in interactive association with the slot, and the context information of the two tasks is mutually fused.
As shown in fig. 5, at least one embodiment of the present application further provides a multi-task perception model obtaining apparatus 500, including:
The first obtaining module 501 is configured to input a semantic vector of a dialog text into the Bi-directional long-short term memory Bi-LSTM model to obtain a latent semantic vector representation;
a second obtaining module 502, configured to obtain an intended slot vector representation of the dialog text according to the latent semantic vector representation;
a third obtaining module 503, configured to obtain a coded representation of the intended slot according to the latent semantic vector representation and the intended slot vector representation;
a processing module 504, configured to perform intent recognition and slot filling according to the encoded representation of the intent slots;
a fourth obtaining module 505, configured to obtain a multitasking perception model based on the intent recognition and the loss function of slot filling as training targets.
Optionally, the first obtaining module 501 includes:
the first acquisition unit is used for inputting the semantic vector of the dialogue text into the forward Bi-LSTM model to obtain a first state of the semantic vector;
the second acquisition unit is used for inputting the semantic vector of the dialogue text into the reverse Bi-LSTM model to obtain a second state of the semantic vector;
and the determining unit is used for determining the latent semantic vector representation after being encoded by the Bi-LSTM model according to the first state and the second state.
Optionally, the second obtaining module 502 includes:
the third acquisition unit is used for acquiring the weight coefficients of the first semantic vector and other semantic vectors according to the dot product of the query vector of the first semantic vector in the latent semantic vector representation and the key vector of each semantic vector in the latent semantic vector representation;
the fourth obtaining unit is used for carrying out weighted summation on the weight coefficient and the value vector of each semantic vector in the latent semantic vector representation to obtain the intended slot position vector representation corresponding to the dialogue text sequence;
wherein the first semantic vector is any semantic vector in the latent semantic vector representation.
Optionally, the third obtaining module 503 includes:
a fifth obtaining unit, configured to obtain an intention distribution matrix and a slot distribution matrix according to the latent semantic vector representation and the intention slot vector representation;
a building unit, configured to build a graph network with intention and slot nodes according to the intention distribution matrix, the slot distribution matrix, and the intention slot vector representation;
and a sixth acquisition unit, configured to acquire, according to the graph network, an encoded representation of the intended slot.
Optionally, the fifth obtaining unit is configured to:
carrying out convolution iteration on the latent semantic vector representation and the intention slot vector representation through a convolution network to obtain node characterization;
and calculating the node characterization through a normalized exponential function to respectively obtain an intention distribution matrix and a slot position distribution matrix.
Optionally, the processing module 504 includes:
a seventh obtaining unit, configured to identify intent according to the encoded representation of the intent slot, and obtain intent classification;
and the eighth acquisition unit is used for carrying out slot filling according to the coded representation of the intention slot to obtain a slot label.
Optionally, the fourth obtaining module 505 includes:
a ninth obtaining unit, configured to compute the cross entropy between the intention classification obtained by intention recognition and the reference intention of the dialogue text, to obtain a first value;
a tenth acquisition unit, configured to compute the cross entropy between the slot labels obtained by slot filling and the reference slots of the dialogue text, to obtain a second value;
an eleventh acquisition unit, configured to acquire an intention slot loss function based on the first value and the second value;
and a twelfth acquisition unit, configured to train based on the intention slot loss function to obtain the multi-task perception model.
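As a sketch of the training objective: the first value is the cross entropy between the predicted intention classification and the reference intention, and the second value is the cross entropy between the predicted slot labels and the reference slots. How the two values are combined into the intention slot loss function is not specified here; the simple weighted sum below is an assumption of this example.

```python
import torch
import torch.nn.functional as F

def intent_slot_loss(intent_logits: torch.Tensor, slot_logits: torch.Tensor,
                     intent_labels: torch.Tensor, slot_labels: torch.Tensor,
                     alpha: float = 1.0) -> torch.Tensor:
    # first value: cross entropy between intention classification and reference intention
    intent_loss = F.cross_entropy(intent_logits, intent_labels)
    # second value: cross entropy between predicted slot labels and reference slots
    slot_loss = F.cross_entropy(slot_logits.flatten(0, 1), slot_labels.flatten())
    # intention slot loss function: weighted sum (alpha is an assumed weighting factor)
    return intent_loss + alpha * slot_loss
```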
It should be noted that the apparatus provided in at least one embodiment of the present application is an apparatus capable of executing the above multi-task perception model acquisition method, and all embodiments of that method are applicable to the apparatus and achieve the same or similar beneficial effects.
At least one embodiment of the present application also provides a multi-task perception model acquisition apparatus, including a transceiver and a processor;
the processor is configured to: inputting semantic vectors of the dialogue text into a Bi-directional long-short-term memory Bi-LSTM model to obtain a latent semantic vector representation;
acquiring an intention slot vector representation of the dialogue text according to the latent semantic vector representation;
acquiring an encoded representation of the intention slot according to the latent semantic vector representation and the intention slot vector representation;
performing intention recognition and slot filling according to the encoded representation of the intention slot;
and obtaining a multi-task perception model by training with the loss functions of intention recognition and slot filling as the training target.
Optionally, the processor is configured to:
inputting the semantic vector of the dialogue text into a forward Bi-LSTM model to obtain a first state of the semantic vector;
inputting the semantic vector of the dialogue text into the reverse Bi-LSTM model to obtain a second state of the semantic vector;
and determining the latent semantic vector representation after being encoded by the Bi-LSTM model according to the first state and the second state.
Optionally, the processor is configured to:
obtaining weight coefficients of a first semantic vector and other semantic vectors according to dot products of a query vector of the first semantic vector in the latent semantic vector representation and key vectors of each semantic vector in the latent semantic vector representation;
carrying out weighted summation on the weight coefficient and a value vector of each semantic vector in the latent semantic vector representation to obtain an intention slot vector representation corresponding to the dialogue text sequence;
wherein the first semantic vector is any semantic vector in the latent semantic vector representation.
Optionally, the processor is configured to:
acquiring an intention distribution matrix and a slot distribution matrix according to the latent semantic vector representation and the intention slot vector representation;
constructing a graph network with intention and slot nodes according to the intention distribution matrix, the slot distribution matrix and the intention slot vector representation;
and acquiring an encoded representation of the intention slot according to the graph network.
Optionally, the processor is configured to:
carrying out convolution iteration on the latent semantic vector representation and the intention slot vector representation through a convolution network to obtain node representations;
and calculating the node representations through a normalized exponential function (softmax) to respectively obtain the intention distribution matrix and the slot distribution matrix.
Optionally, the processor is configured to:
performing intention recognition according to the encoded representation of the intention slot to obtain an intention classification;
and performing slot filling according to the encoded representation of the intention slot to obtain slot labels.
Optionally, the processor is configured to:
computing the cross entropy between the intention classification obtained by intention recognition and the reference intention of the dialogue text to obtain a first value;
computing the cross entropy between the slot labels obtained by slot filling and the reference slots of the dialogue text to obtain a second value;
acquiring an intention slot loss function based on the first value and the second value;
and training based on the intention slot loss function to obtain the multi-task perception model.
As shown in fig. 6, an embodiment of the present invention further provides a multi-task perception model obtaining apparatus, including a processor 600, a transceiver 610, a memory 620, and a program stored on the memory 620 and executable on the processor 600; the transceiver 610 is connected to the processor 600 and the memory 620 through a bus interface, where the processor 600 is configured to read a program in the memory, and perform the following procedures:
Inputting semantic vectors of the dialogue text into a Bi-directional long-short-term memory Bi-LSTM model to obtain a latent semantic vector representation;
acquiring an intention slot vector representation of the dialogue text according to the latent semantic vector representation;
acquiring an encoded representation of the intention slot according to the latent semantic vector representation and the intention slot vector representation;
performing intention recognition and slot filling according to the encoded representation of the intention slot;
and obtaining a multi-task perception model by training with the loss functions of intention recognition and slot filling as the training target.
The transceiver 610 is configured to receive and transmit data under the control of the processor 600.
In fig. 6, the bus architecture may comprise any number of interconnected buses and bridges, specifically linking together one or more processors represented by the processor 600 and various circuits of the memory represented by the memory 620. The bus architecture may also link together various other circuits such as peripheral devices, voltage regulators and power management circuits, which are well known in the art and therefore are not described further herein. The bus interface provides an interface. The transceiver 610 may be a plurality of elements, that is, include a transmitter and a receiver, and provide a means for communicating with various other apparatuses over transmission media, including wireless channels, wired channels, optical cables and the like.
The processor 600 is responsible for managing the bus architecture and general processing, and the memory 620 may store data used by the processor 600 in performing operations.
Alternatively, the processor 600 may be a CPU (Central Processing Unit), an ASIC (Application-Specific Integrated Circuit), an FPGA (Field-Programmable Gate Array) or a CPLD (Complex Programmable Logic Device), and the processor may also adopt a multi-core architecture.
The processor is configured to perform, by invoking a computer program stored in the memory, any of the methods provided by the embodiments of the present application in accordance with the obtained executable instructions. The processor and the memory may also be physically separate.
Optionally, the processor is configured to read the program in the memory, and perform the following procedure:
inputting the semantic vector of the dialogue text into a forward Bi-LSTM model to obtain a first state of the semantic vector;
inputting the semantic vector of the dialogue text into the reverse Bi-LSTM model to obtain a second state of the semantic vector;
and determining the latent semantic vector representation after being encoded by the Bi-LSTM model according to the first state and the second state.
Optionally, the processor is configured to read the program in the memory, and perform the following procedure:
obtaining weight coefficients of a first semantic vector and other semantic vectors according to dot products of a query vector of the first semantic vector in the latent semantic vector representation and key vectors of each semantic vector in the latent semantic vector representation;
carrying out weighted summation on the weight coefficient and a value vector of each semantic vector in the latent semantic vector representation to obtain an intention slot vector representation corresponding to the dialogue text sequence;
wherein the first semantic vector is any semantic vector in the latent semantic vector representation.
Optionally, the processor is configured to read the program in the memory, and perform the following procedure:
acquiring an intention distribution matrix and a slot distribution matrix according to the latent semantic vector representation and the intention slot vector representation;
constructing a graph network with intention and slot nodes according to the intention distribution matrix, the slot distribution matrix and the intention slot vector representation;
and acquiring an encoded representation of the intention slot according to the graph network.
Optionally, the processor is configured to read the program in the memory, and perform the following procedure:
carrying out convolution iteration on the latent semantic vector representation and the intention slot vector representation through a convolution network to obtain node representations;
and calculating the node representations through a normalized exponential function (softmax) to respectively obtain the intention distribution matrix and the slot distribution matrix.
Optionally, the processor is configured to read the program in the memory, and perform the following procedure:
performing intention recognition according to the encoded representation of the intention slot to obtain an intention classification;
and performing slot filling according to the encoded representation of the intention slot to obtain slot labels.
Optionally, the processor is configured to read the program in the memory, and perform the following procedure:
computing the cross entropy between the intention classification obtained by intention recognition and the reference intention of the dialogue text to obtain a first value;
computing the cross entropy between the slot labels obtained by slot filling and the reference slots of the dialogue text to obtain a second value;
acquiring an intention slot loss function based on the first value and the second value;
and training based on the intention slot loss function to obtain the multi-task perception model.
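To illustrate how the processor steps above could be chained during training, a minimal end-to-end sketch follows. It reuses the illustrative classes BiLSTMEncoder, IntentSlotAttention, NodeConvolution, JointHeads and intent_slot_loss introduced earlier; the hyper-parameters, the optimizer and the hypothetical data iterator dialog_batches are assumptions of this example and not part of the claimed scheme.

```python
import torch

# Assumed hyper-parameters, for illustration only
EMBED_DIM, HIDDEN_DIM, NUM_INTENTS, NUM_SLOT_LABELS = 128, 128, 10, 20

encoder = BiLSTMEncoder(EMBED_DIM, HIDDEN_DIM)
attention = IntentSlotAttention(2 * HIDDEN_DIM)
convolution = NodeConvolution(2 * HIDDEN_DIM, NUM_INTENTS, NUM_SLOT_LABELS)
heads = JointHeads(2 * HIDDEN_DIM, NUM_INTENTS, NUM_SLOT_LABELS)

params = [p for m in (encoder, attention, convolution, heads) for p in m.parameters()]
optimizer = torch.optim.Adam(params, lr=1e-3)

for embeddings, intent_labels, slot_labels in dialog_batches:  # hypothetical data iterator
    latent = encoder(embeddings)                          # latent semantic vector representation
    intent_slot = attention(latent)                       # intention slot vector representation
    intent_matrix, slot_matrix = convolution(latent, intent_slot)  # distribution matrices
    # a graph network over intention and slot nodes would refine the encoding here;
    # this sketch simply reuses the fused representation as the encoded intention slot
    encoded = latent + intent_slot
    intent_logits, slot_logits = heads(encoded)           # intention recognition and slot filling
    loss = intent_slot_loss(intent_logits, slot_logits, intent_labels, slot_labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```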
At least one embodiment of the present application further provides a device for acquiring a multi-task perception model, which includes a memory, a processor, and a computer program stored in the memory and capable of running on the processor, where the processor implements each process in the embodiment of the method for acquiring a multi-task perception model when executing the program, and the same technical effects can be achieved, and for avoiding repetition, a detailed description is omitted herein.
At least one embodiment of the present application further provides a computer readable storage medium on which a computer program is stored, where the program, when executed by a processor, implements each process in the embodiments of the method for acquiring a multi-task perception model described above and can achieve the same technical effects; to avoid repetition, details are not described herein again. The computer readable storage medium is, for example, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element preceded by the phrase "comprising a ..." does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element. Furthermore, it should be noted that the scope of the methods and apparatus in the embodiments of the present application is not limited to performing the functions in the order shown or discussed, but may also include performing the functions in a substantially simultaneous manner or in a reverse order depending on the functions involved; for example, the described methods may be performed in an order different from that described, and various steps may be added, omitted, or combined. Additionally, features described with reference to certain examples may be combined in other examples.
From the above description of the embodiments, it will be clear to those skilled in the art that the methods of the above embodiments may be implemented by means of software plus a necessary general-purpose hardware platform, and may of course also be implemented by hardware, although in many cases the former is the preferred implementation. Based on such an understanding, the technical solution of the present application may be embodied, in essence or in the part contributing to the prior art, in the form of a software product stored in a storage medium (e.g. ROM/RAM, magnetic disk, optical disk) comprising instructions for causing a terminal (which may be a mobile phone, a computer, a server, an air conditioner, a network device, etc.) to perform the methods according to the embodiments of the present application.
The embodiments of the present application have been described above with reference to the accompanying drawings, but the present application is not limited to the above-described embodiments, which are merely illustrative and not restrictive; those of ordinary skill in the art may derive many further forms from them without departing from the spirit of the present application and the scope protected by the claims, all of which fall within the protection of the present application.

Claims (11)

1. A method for acquiring a multi-task perception model, comprising:
inputting semantic vectors of the dialogue text into a Bi-directional long-short-term memory Bi-LSTM model to obtain a latent semantic vector representation;
acquiring an intention slot vector representation of the dialogue text according to the latent semantic vector representation;
acquiring an encoded representation of the intention slot according to the latent semantic vector representation and the intention slot vector representation;
performing intention recognition and slot filling according to the encoded representation of the intention slot;
and obtaining a multi-task perception model by training with the loss functions of intention recognition and slot filling as the training target.
2. The method of claim 1, wherein the inputting semantic vectors of the dialogue text into the Bi-directional long-short-term memory Bi-LSTM model to obtain a latent semantic vector representation comprises:
inputting the semantic vector of the dialogue text into a forward Bi-LSTM model to obtain a first state of the semantic vector;
inputting the semantic vector of the dialogue text into the reverse Bi-LSTM model to obtain a second state of the semantic vector;
and determining the latent semantic vector representation after being encoded by the Bi-LSTM model according to the first state and the second state.
3. The method of claim 1, wherein the acquiring an intention slot vector representation of the dialogue text according to the latent semantic vector representation comprises:
obtaining weight coefficients of a first semantic vector and other semantic vectors according to dot products of a query vector of the first semantic vector in the latent semantic vector representation and key vectors of each semantic vector in the latent semantic vector representation;
carrying out weighted summation on the weight coefficient and a value vector of each semantic vector in the latent semantic vector representation to obtain an intention slot vector representation corresponding to the dialogue text sequence;
wherein the first semantic vector is any semantic vector in the latent semantic vector representation.
4. The method according to claim 1, wherein the acquiring an encoded representation of the intention slot according to the latent semantic vector representation and the intention slot vector representation comprises:
acquiring an intention distribution matrix and a slot distribution matrix according to the latent semantic vector representation and the intention slot vector representation;
constructing a graph network with intention and slot nodes according to the intention distribution matrix, the slot distribution matrix and the intention slot vector representation;
and acquiring an encoded representation of the intention slot according to the graph network.
5. The method of claim 4, wherein the acquiring an intention distribution matrix and a slot distribution matrix according to the latent semantic vector representation and the intention slot vector representation comprises:
carrying out convolution iteration on the latent semantic vector representation and the intention slot vector representation through a convolution network to obtain node representations;
and calculating the node representations through a normalized exponential function (softmax) to respectively obtain the intention distribution matrix and the slot distribution matrix.
6. The method of claim 1, wherein the performing intention recognition and slot filling according to the encoded representation of the intention slot comprises:
performing intention recognition according to the encoded representation of the intention slot to obtain an intention classification;
and performing slot filling according to the encoded representation of the intention slot to obtain slot labels.
7. The method of claim 1, wherein the obtaining a multi-task perception model by training with the loss functions of intention recognition and slot filling as the training target comprises:
computing the cross entropy between the intention classification obtained by intention recognition and the reference intention of the dialogue text to obtain a first value;
computing the cross entropy between the slot labels obtained by slot filling and the reference slots of the dialogue text to obtain a second value;
acquiring an intention slot loss function based on the first value and the second value;
and training based on the intention slot loss function to obtain the multi-task perception model.
8. A multi-task perception model acquisition apparatus, comprising:
the first acquisition module is used for inputting semantic vectors of the dialogue text into the Bi-directional long-short-term memory Bi-LSTM model to obtain a latent semantic vector representation;
the second acquisition module is used for acquiring the intention slot position vector representation of the dialogue text according to the latent semantic vector representation;
the third acquisition module is used for acquiring an encoded representation of the intention slot according to the latent semantic vector representation and the intention slot vector representation;
the processing module is used for performing intention recognition and slot filling according to the encoded representation of the intention slot;
and the fourth acquisition module is used for obtaining a multi-task perception model by training with the loss functions of intention recognition and slot filling as the training target.
9. A multi-task perception model acquisition device, comprising a transceiver and a processor;
The processor is configured to: inputting semantic vectors of the dialogue text into a Bi-directional long-short-term memory Bi-LSTM model to obtain a latent semantic vector representation;
acquiring an intention slot vector representation of the dialogue text according to the latent semantic vector representation;
acquiring an encoded representation of the intention slot according to the latent semantic vector representation and the intention slot vector representation;
performing intention recognition and slot filling according to the encoded representation of the intention slot;
and obtaining a multi-task perception model by training with the loss functions of intention recognition and slot filling as the training target.
10. A multi-task perception model acquisition device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the steps of the multi-task perception model acquisition method according to any one of claims 1-7 when executing the program.
11. A readable storage medium, on which a computer program is stored, characterized in that the program, when being executed by a processor, implements the steps of the method according to any of claims 1-7.
CN202310261734.4A 2023-03-17 2023-03-17 Method, device and equipment for acquiring multi-task perception model and readable storage medium Pending CN116910190A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310261734.4A CN116910190A (en) 2023-03-17 2023-03-17 Method, device and equipment for acquiring multi-task perception model and readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310261734.4A CN116910190A (en) 2023-03-17 2023-03-17 Method, device and equipment for acquiring multi-task perception model and readable storage medium

Publications (1)

Publication Number Publication Date
CN116910190A true CN116910190A (en) 2023-10-20

Family

ID=88361558

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310261734.4A Pending CN116910190A (en) 2023-03-17 2023-03-17 Method, device and equipment for acquiring multi-task perception model and readable storage medium

Country Status (1)

Country Link
CN (1) CN116910190A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117648930A (en) * 2023-11-22 2024-03-05 平安创科科技(北京)有限公司 Combined task realization method, device, equipment and medium

Similar Documents

Publication Publication Date Title
CN111783474B (en) Comment text viewpoint information processing method and device and storage medium
CN108536679B (en) Named entity recognition method, device, equipment and computer readable storage medium
CN109902301B (en) Deep neural network-based relationship reasoning method, device and equipment
CN113204952B (en) Multi-intention and semantic slot joint identification method based on cluster pre-analysis
CN112069302A (en) Training method of conversation intention recognition model, conversation intention recognition method and device
CN109857846B (en) Method and device for matching user question and knowledge point
US11645479B1 (en) Method for AI language self-improvement agent using language modeling and tree search techniques
CN111078847A (en) Power consumer intention identification method and device, computer equipment and storage medium
CN110196928B (en) Fully parallelized end-to-end multi-turn dialogue system with domain expansibility and method
CN116664719B (en) Image redrawing model training method, image redrawing method and device
CN114818703B (en) Multi-intention recognition method and system based on BERT language model and TextCNN model
WO2023231513A1 (en) Conversation content generation method and apparatus, and storage medium and terminal
CN116661805B (en) Code representation generation method and device, storage medium and electronic equipment
CN114327483A (en) Graph tensor neural network model establishing method and source code semantic identification method
CN114692605A (en) Keyword generation method and device fusing syntactic structure information
CN114780777B (en) Cross-modal retrieval method and device based on semantic enhancement, storage medium and terminal
CN117033602A (en) Method for constructing multi-mode user mental perception question-answering model
CN116910190A (en) Method, device and equipment for acquiring multi-task perception model and readable storage medium
CN114638228A (en) Chinese named entity recognition method based on word set self-attention
CN110489730A (en) Text handling method, device, terminal and storage medium
CN113486174A (en) Model training, reading understanding method and device, electronic equipment and storage medium
CN116258147A (en) Multimode comment emotion analysis method and system based on heterogram convolution
CN113741759B (en) Comment information display method and device, computer equipment and storage medium
CN114860908A (en) Task-based dialogue state tracking method fusing slot association and semantic association
CN111222533B (en) Deep learning visual question-answering method and system based on dependency tree

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination