CN113239693B - Training method, device, equipment and storage medium of intention recognition model - Google Patents


Info

Publication number
CN113239693B
Authority
CN
China
Prior art keywords: intention, recognition, sample, sequence, attention
Prior art date
Legal status (assumed, not a legal conclusion): Active
Application number
CN202110611219.5A
Other languages
Chinese (zh)
Other versions
CN113239693A
Inventor
李志韬
王健宗
程宁
于凤英
Current Assignee
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd
Priority to CN202110611219.5A
Publication of CN113239693A
Application granted
Publication of CN113239693B


Classifications

    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking (Natural language analysis; Handling natural language data)
    • G06F40/30 Semantic analysis (Handling natural language data)
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management


Abstract

The application relates to the field of artificial intelligence and discloses a training method, apparatus, device and storage medium for an intention recognition model. The method comprises the following steps: acquiring a sample text and the intention sequence corresponding to the sample text, and adding interactive marks to the sample text to obtain a sample marker sequence; performing attention calculation on the sample marker sequence based on an attention network to obtain the attention output of the sample marker sequence; obtaining a first recognition intention according to the attention output of the sample marker sequence and a pre-constructed embedding matrix; inputting the attention output of the sample marker sequence into a multi-layer perceptron network to obtain a second recognition intention of the sample marker sequence; determining the recognition intention according to the first recognition intention and the second recognition intention, training the attention network and the multi-layer perceptron network based on the recognition intention and the intention sequence, and taking the trained attention network and multi-layer perceptron network together as the intention recognition model.

Description

Training method, device, equipment and storage medium of intention recognition model
Technical Field
The present application relates to the field of intent recognition, and in particular, to a training method, apparatus, device, and storage medium for an intent recognition model.
Background
With the continuous development of artificial intelligence technology, multi-intention recognition is increasingly widely applied as a subtask of functions such as information retrieval and label recommendation. Multi-intention recognition classifies an input sentence or input speech into several intention scenes, and accurate multi-intention recognition improves the efficiency of intelligent interaction. In the prior art, multi-intention recognition is usually converted into several classification problems or into a sequence generation task, but both methods have slow inference, are unsuitable for real-time interaction, and their recognition accuracy is not high.
Disclosure of Invention
The application provides a training method, apparatus, device and storage medium for an intention recognition model, so as to improve the accuracy of the trained intention recognition model in multi-intention recognition.
In a first aspect, the present application provides a training method of an intent recognition model, the method comprising:
acquiring a sample text and an intention sequence corresponding to the sample text, and adding interactive marks to the sample text to obtain a sample marker sequence;
performing attention calculation on the sample marker sequence based on an attention network to obtain an attention output of the sample marker sequence;
obtaining a first recognition intention according to the attention output of the sample marker sequence and a pre-constructed embedding matrix;
inputting the attention output of the sample marker sequence into a multi-layer perceptron network to obtain a second recognition intention of the sample marker sequence;
determining a recognition intention according to the first recognition intention and the second recognition intention, training the attention network and the multi-layer perceptron network based on the recognition intention and the intention sequence, and taking the trained attention network and multi-layer perceptron network together as an intention recognition model.
In a second aspect, the present application also provides a training apparatus for an intent recognition model, the apparatus comprising:
the sample marking module, configured to acquire a sample text and an intention sequence corresponding to the sample text, and to add interactive marks to the sample text to obtain a sample marker sequence;
the attention calculation module, configured to perform attention calculation on the sample marker sequence based on an attention network to obtain an attention output of the sample marker sequence;
the first recognition module, configured to obtain a first recognition intention according to the attention output of the sample marker sequence and a pre-constructed embedding matrix;
the second recognition module, configured to input the attention output of the sample marker sequence into a multi-layer perceptron network to obtain a second recognition intention of the sample marker sequence;
the model training module, configured to determine a recognition intention according to the first recognition intention and the second recognition intention, to train the attention network and the multi-layer perceptron network based on the recognition intention and the intention sequence, and to take the trained attention network and multi-layer perceptron network together as an intention recognition model.
In a third aspect, the present application also provides a computer device comprising a memory and a processor; the memory is used for storing a computer program; the processor is configured to execute the computer program and implement the training method of the intention recognition model as described above when the computer program is executed.
In a fourth aspect, the present application also provides a computer readable storage medium storing a computer program, which when executed by a processor causes the processor to implement a training method of an intention recognition model as described above.
The application discloses a training method, apparatus, device and storage medium for an intention recognition model: a sample text and its corresponding intention sequence are acquired, and interactive marks are added to the sample text to obtain a sample marker sequence; attention calculation is performed on the sample marker sequence based on the attention network to obtain the attention output of the sample marker sequence; a first recognition intention is then obtained according to the attention output of the sample marker sequence and a pre-constructed embedding matrix; the attention output of the sample marker sequence is input into a multi-layer perceptron network to obtain a second recognition intention of the sample marker sequence; finally, the recognition intention is determined according to the first recognition intention and the second recognition intention, the attention network and the multi-layer perceptron network are trained based on the recognition intention and the intention sequence, and the trained attention network and multi-layer perceptron network together serve as the intention recognition model. The mutual complementation of the first recognition intention and the second recognition intention improves the accuracy of the resulting recognition intention.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings required for the description of the embodiments are briefly introduced below. The drawings in the following description are obviously only some embodiments of the present application, and other drawings may be obtained from them by a person skilled in the art without inventive effort.
FIG. 1 is a schematic flow chart of the steps of a training method of an intention recognition model provided by an embodiment of the present application;
FIG. 2 is a schematic flow chart of steps for adding interactive marks to sample text provided by an embodiment of the present application;
FIG. 3 is a schematic flow chart of steps for obtaining a first recognition intent provided by an embodiment of the present application;
FIG. 4 is a schematic flow chart of the training steps of a multi-layered perceptron network, according to an embodiment of the present application;
FIG. 5 is a schematic flow chart of the steps for intent recognition provided by an embodiment of the present application;
FIG. 6 is a schematic block diagram of a training apparatus for an intent recognition model provided by an embodiment of the present application;
fig. 7 is a schematic block diagram of a computer device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application are described clearly and completely below with reference to the accompanying drawings. The described embodiments are obviously some, but not all, embodiments of the application. All other embodiments obtained by those skilled in the art based on the embodiments of the application without inventive effort fall within the scope of the application.
The flow diagrams depicted in the figures are merely illustrative and not necessarily all of the elements and operations/steps are included or performed in the order described. For example, some operations/steps may be further divided, combined, or partially combined, so that the order of actual execution may be changed according to actual situations.
It is to be understood that the terminology used in the description of the application herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in this specification and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should also be understood that the term "and/or" as used in the present specification and the appended claims refers to any and all possible combinations of one or more of the associated listed items, and includes such combinations.
The embodiments of the application provide a training method and apparatus for an intention recognition model, a computer device, and a storage medium. The training method trains two intention recognition networks and lets their results complement each other, which improves the recognition speed and accuracy of the trained intention recognition model in multi-intention recognition, and thereby improves the efficiency and accuracy of intelligent interaction.
Some embodiments of the present application are described in detail below with reference to the accompanying drawings. The following embodiments and features of the embodiments may be combined with each other without conflict.
Referring to fig. 1, fig. 1 is a schematic flowchart of a training method of an intent recognition model according to an embodiment of the present application.
As shown in fig. 1, the training method of the intent recognition model specifically includes: step S101 to step S105.
S101, acquiring a sample text and an intention sequence corresponding to the sample text, and adding interactive marks to the sample text to obtain a sample mark sequence.
The sample text may be directly input text, or input speech converted to text by a speech-to-text technique. After the sample text is obtained, it may be preprocessed, for example by removing special symbols and sensitive information, and interactive marks are then added to the sample text; the interactive marks delimit the attention interaction range. In an implementation, symbolic markers may be added to the sample text first, indicating the start position and end position of the sample text.
For example, when the sample text is "Hello, my work number is X, is your address correct?", after adding the symbolic markers the sample text becomes "[sos] Hello, my work number is X, is your address correct? [eos]", where [sos] represents the start position of the sample text and [eos] represents its end position.
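As a minimal sketch of this marker step (the function name and marker strings are illustrative, not taken from the patent):

```python
def add_symbolic_markers(text: str, sos: str = "[sos]", eos: str = "[eos]") -> str:
    """Wrap a sample text with symbolic markers for its start and end positions."""
    return f"{sos} {text} {eos}"

marked = add_symbolic_markers("Hello, my work number is X, is your address correct?")
```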
In one embodiment, referring to fig. 2, the step of adding interactive markers to the sample text may include:
s1011, performing word segmentation and vectorization on the sample text to obtain a text vector corresponding to the sample text; and S1012, adding interactive marks to the text vector according to a pre-constructed mark matrix.
Firstly, the sample text is segmented to obtain a plurality of segmentation words in the sample text, and then a text vector corresponding to the sample text is determined according to a vector corresponding to each segmentation word in the sample text.
In the implementation process, the sample text may be segmented with a tokenizer. After each word segment of the sample text is obtained, the dimension of each segment is determined according to a pre-constructed configuration document, and the initial vector of each segment is then obtained through a SentencePiece interface. After the initial vectors of all word segments of the sample text are obtained, they are concatenated to yield the text vector corresponding to the sample text.
The initial vector of each word segment comprises a word-segment vector and a relative position vector of the same dimension. The word-segment vector is the word vector corresponding to the segment; the relative position vector encodes the relative positions between word segments and is used to represent context.
After the text vector of the sample text is obtained, an interactive mark, namely a mask mark, is added to the text vector according to a pre-constructed mark matrix to guide attention interaction. After the interactive marks are added, each character appended to the sample marker sequence only performs attention interaction with the tokens before it during attention calculation, while the original sample tokens interact with one another freely. In the specific implementation, positions in the mark matrix that participate in the attention calculation are filled with the value 1, and positions that do not participate are filled with 0.
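A sketch of how such a 0/1 mark matrix can be constructed. The prefix/generated split and the shapes are illustrative assumptions based on the radiation-range tables in this description:

```python
import numpy as np

def build_mark_matrix(n_prefix: int, n_generated: int) -> np.ndarray:
    """Mark matrix guiding attention interaction: positions that take part in
    an attention computation are filled with 1, the others with 0.
    Prefix (sample) tokens attend to every prefix token; each generated
    character attends to the prefix and to characters generated before it."""
    n = n_prefix + n_generated
    mask = np.zeros((n, n), dtype=np.int64)
    mask[:, :n_prefix] = 1          # every position sees the whole prefix
    for i in range(n_prefix, n):    # generated part is causal (excludes itself)
        mask[i, n_prefix:i] = 1
    return mask

mask = build_mark_matrix(n_prefix=4, n_generated=3)
```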
S102, performing attention calculation on the sample mark sequence based on an attention network to obtain attention output of the sample mark sequence.
Because the interactive marks guiding attention interaction have been added to the sample marker sequence, the attention calculation can be carried out according to these marks to obtain the attention output of the sample marker sequence. The attention network may be the Transformer layers of a model such as BERT or RoBERTa. When the Transformer layers of these models are used as the attention network, the weights of the pre-trained models can be reused without changing their structure, which speeds up the convergence of the whole intention recognition model during training.
In the implementation process there are several Transformer layers, which can be pruned according to the actual scene. The output of each Transformer layer is the input of the next, multi-head attention is computed within each layer, and the output of the last Transformer layer is taken as the attention output of the sample marker sequence.
In an embodiment, the performing attention computation on the sample marker sequence to obtain an attention output of the sample marker sequence includes: determining the radiation range of each word segmentation attention calculation in the sample mark sequence according to the interactive marks in the sample mark sequence; and performing attention interaction according to the radiation range to obtain the attention output of the sample marking sequence.
The radiation range of each word segment in the sample marker sequence during attention calculation is determined by the interactive marks in the sample marker sequence. Table 1 shows the radiation ranges of the attention calculation for a sample marker sequence. Taking the sample text "Hello, my work number is X, is your address correct?" as an example, the radiation range of each token in the sample marker sequence is as follows:
TABLE 1
        [sos]  你  好  我  …  对  吗  [eos]
你        1    1   1   1  …   1   1    1
好        1    1   1   1  …   1   1    1
…         …    …   …   …  …   …   …    …
[eos]     1    1   1   1  …   1   1    1
(Tokens abbreviated: 你 = you, 好 = hello, 我 = my, 工 = work, 号 = number, 对 = correct, 吗 = question particle.)
The rows in Table 1 represent the attending word segments, and a 1 in a row marks a token inside that segment's radiation range during attention calculation. For example, in the encoding stage the first token 你 (you) in the second row may perform attention calculation with every token whose column contains a 1, i.e. 你 may interact with 你, 好, 我, …, 对, 吗 and [eos]. That is, every word segment of the sample text can perform attention interaction with all word segments of the sample text, which yields the attention output of the sample marker sequence.
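The masked attention interaction can be sketched as follows. This is a single-head illustration only; real Transformer layers add learned Q/K/V projections and multiple heads:

```python
import numpy as np

def masked_self_attention(x: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Scaled dot-product self-attention restricted by a 0/1 mark matrix:
    positions marked 0 are set to -inf before the softmax, so each token
    only interacts with tokens inside its radiation range."""
    d = x.shape[-1]
    scores = (x @ x.T) / np.sqrt(d)
    scores = np.where(mask == 1, scores, -np.inf)
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w = w / w.sum(axis=-1, keepdims=True)
    return w @ x

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 8))                # 5 tokens, dimension 8 (illustrative)
mask = np.ones((5, 5), dtype=np.int64)     # full interaction, as in Table 1
out = masked_self_attention(x, mask)
```

With an identity mask each token would only see itself and the output would equal the input, which is an easy sanity check on the masking logic.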
S103, obtaining a first recognition intention according to the attention output of the sample marking sequence and a pre-constructed embedding matrix.
After the attention output of the sample mark sequence is obtained, multiplying the attention output of the sample mark sequence by a pre-constructed embedding matrix to obtain the score of each character. The output character is determined based on the score of each character, thereby determining the first recognition intention based on the resulting plurality of output characters.
In one embodiment, referring to fig. 3, the step of obtaining the first recognition intention may include: S1031, calculating character scores according to the attention output of the sample marker sequence and a pre-constructed embedding matrix; S1032, determining an output character according to the character scores, and appending the output character to the tail of the sample marker sequence to obtain an input sequence; S1033, adding interactive marks to the input sequence and performing attention calculation to obtain the attention output of the input sequence, cyclically repeating the above steps on the attention output of the input sequence to obtain several output characters, and obtaining the first recognition intention from these output characters.
The pre-constructed embedding matrix is obtained from the token embedding layer over the vocabulary: each character has an embedding vector, and combining these vectors yields the embedding matrix. The attention output of the sample marker sequence is multiplied by the pre-constructed embedding matrix to obtain a character score for each character; the character with the highest score is selected as the first output character and appended to the tail of the sample marker sequence to obtain the input sequence.
Interactive marks are then added to the input sequence for attention calculation, character scores are computed from the attention output of the input sequence and the pre-constructed embedding matrix, a second output character is determined, and it is again appended to the tail of the input sequence to obtain a new input sequence. This step is executed cyclically to obtain several output characters, which are combined and converted into a character sequence by the tokenizer to obtain the first recognition intention.
In a specific implementation, the loop ends when the resulting input sequence reaches a maximum length, or when the terminator [eos] is generated. The maximum length of the input sequence may be preset.
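The decoding loop of this step can be sketched as follows; `step_fn` stands in for the attention network (a hypothetical interface returning the attention output of the last position):

```python
import numpy as np

def greedy_decode(prefix_ids, step_fn, embedding, eos_id, max_len):
    """Repeatedly score characters by multiplying the attention output with
    the embedding matrix, append the highest-scoring character to the tail of
    the input sequence, and stop on [eos] or at the preset maximum length."""
    seq = list(prefix_ids)
    output_chars = []
    while len(seq) < max_len:
        h = step_fn(seq)                 # attention output of the last position
        char_scores = embedding @ h      # one score per vocabulary character
        nxt = int(np.argmax(char_scores))
        if nxt == eos_id:                # terminator encountered
            break
        output_chars.append(nxt)
        seq.append(nxt)                  # tail of the new input sequence
    return output_chars

# Toy run: the fake attention network always points at the next character id.
vocab = np.eye(4)
decoded = greedy_decode([0], lambda s: vocab[len(s) % 4], vocab, eos_id=3, max_len=10)
```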
For example, if the marker sequence is "[sos] Hello, my work number is X, is your address correct? [eos]" and the first output character obtained is 自 ("self"), the resulting input sequence is "[sos] Hello, my work number is X, is your address correct? [eos] 自".
After the input sequence is obtained, the interactive mark is added to the input sequence again, and the process of adding the interactive mark is the same as the process described above, and will not be described again here. At this time, the radiation ranges of the attention calculations in the input sequence are shown in table 2.
        [sos]  你  好  我  工  号  对  吗  [eos]  自
你        1    1   1   1   1   1   1   1    1    0
好        1    1   1   1   1   1   1   1    1    0
我        1    1   1   1   1   1   1   1    1    0
工        1    1   1   1   1   1   1   1    1    0
号        1    1   1   1   1   1   1   1    1    0
对        1    1   1   1   1   1   1   1    1    0
吗        1    1   1   1   1   1   1   1    1    0
[eos]     1    1   1   1   1   1   1   1    1    0
自        1    1   1   1   1   1   1   1    1    0
TABLE 2
The rows in Table 2 represent the attending tokens; a 1 in a row marks a token inside the radiation range, and a 0 marks a token with which no attention calculation is performed. In the encoding phase, every word segment of the sample marker sequence may interact with all word segments of the sample marker sequence. The character 自 can only see the tokens of the sample marker sequence, so 自 only performs attention interaction with those tokens.
After attention calculation is performed according to the table, the attention output of the input sequence is obtained and multiplied by the pre-constructed embedding matrix to obtain the character scores; the character with the highest score is selected as the second output character and appended to the tail of the input sequence to obtain a new input sequence. The new input sequence obtained at this point is "[sos] Hello, my work number is X, is your address correct? [eos] 自我".
When interactive marks are added to the new input sequence and attention interaction is performed, the character 我 can see all word segments and characters before it, so 我 interacts with all tokens of the marker sequence and with 自.
The radiation ranges for the final attention calculations after multiple cycles are shown in table 3.
        [sos]  你  好  我  工  号  对  吗  [eos]  自  我  地  址
你        1    1   1   1   1   1   1   1    1    0   0   0   0
好        1    1   1   1   1   1   1   1    1    0   0   0   0
我        1    1   1   1   1   1   1   1    1    0   0   0   0
工        1    1   1   1   1   1   1   1    1    0   0   0   0
号        1    1   1   1   1   1   1   1    1    0   0   0   0
对        1    1   1   1   1   1   1   1    1    0   0   0   0
吗        1    1   1   1   1   1   1   1    1    0   0   0   0
[eos]     1    1   1   1   1   1   1   1    1    0   0   0   0
自        1    1   1   1   1   1   1   1    1    0   0   0   0
我        1    1   1   1   1   1   1   1    1    1   0   0   0
地        1    1   1   1   1   1   1   1    1    1   1   0   0
址        1    1   1   1   1   1   1   1    1    1   1   1   0
[eos]     1    1   1   1   1   1   1   1    1    1   1   1   1
TABLE 3
(自, 我, 地, 址 and the final [eos] are the generated characters; 地址 = "address".)
Because of the interactive marks, the attention interaction range is known during attention interaction, so each generated character can only interact with the characters before it. This takes the relevance between intentions into account, which improves prediction accuracy and also speeds up inference.
S104, inputting the attention output of the sample marking sequence into a multi-layer perceptron network to obtain a second identification intention of the sample marking sequence.
The attention output of the sample marker sequence is taken as the input of the multi-layer perceptron network, and a second recognition intention of the sample marker sequence is obtained through the multi-layer perceptron network. It is used to correct the first recognition intention, making intention generation controllable.
In an implementation, the multi-layer perceptron network comprises an input layer, layer normalization, a first fully-connected layer, a second fully-connected layer, and an output layer. The number of categories produced by the second fully-connected layer is the number of predicted intentions.
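The forward pass of such a perceptron head can be sketched as follows. The ReLU activation, the dimensions and the weight shapes are assumptions, not stated in the patent:

```python
import numpy as np

def layer_norm(x: np.ndarray, eps: float = 1e-5) -> np.ndarray:
    """Normalize along the last axis to zero mean, unit variance."""
    return (x - x.mean(-1, keepdims=True)) / np.sqrt(x.var(-1, keepdims=True) + eps)

def perceptron_scores(h, w1, b1, w2, b2):
    """Layer normalization, first fully-connected layer (ReLU assumed), then a
    second fully-connected layer whose output size is the number of intents."""
    z = np.maximum(layer_norm(h) @ w1 + b1, 0.0)
    return z @ w2 + b2

rng = np.random.default_rng(1)
h = rng.normal(size=8)                       # attention output (dim 8 assumed)
scores = perceptron_scores(h,
                           rng.normal(size=(8, 16)), np.zeros(16),
                           rng.normal(size=(16, 3)), np.zeros(3))
```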
In an embodiment, the step of inputting the attention output of the sample marker sequence into a multi-layer perceptron network to obtain the second recognition intent of the sample marker sequence may comprise: inputting the attention output of the sample marker sequence into a multi-layer perceptron network to obtain intent scores of a plurality of predicted intentions of the sample marker sequence; a second recognition intent of the sample tag sequence is determined based on the intent score and a score threshold for each of the predicted intents.
The attention output of the sample marker sequence is input into the multi-layer perceptron network to obtain the network's output. The output of the multi-layer perceptron network is an N×1 vector, where N is the number of output intentions, and each value in the vector is the intention score of one predicted intention. The intention score of each predicted intention is compared with a score threshold: if the intention score of a predicted intention is greater than the score threshold, the predicted intention can serve as a second recognition intention; if its intention score is less than or equal to the score threshold, it cannot.
For example, if the multi-layer perceptron network outputs the scores (0.1, 2.3, -2) corresponding to the three intentions (self-introduction | check address | objection handling), comparing them with the score threshold 0 gives (1, 1, 0), i.e. the category labels describing the input sequence are "self-introduction | check address".
In the implementation process, the score threshold value can be set to 0, and can be adjusted according to actual conditions.
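The threshold comparison above can be sketched as follows (function and label names are illustrative):

```python
def select_intents(scores, labels, threshold=0.0):
    """Keep each predicted intent whose score is strictly greater than the
    score threshold (0 by default, adjustable to the actual situation)."""
    return [label for s, label in zip(scores, labels) if s > threshold]

# Example from the text: scores (0.1, 2.3, -2) for
# (self-introduction | check address | objection handling)
picked = select_intents(
    [0.1, 2.3, -2.0],
    ["self-introduction", "check address", "objection handling"])
```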
S105, determining a recognition intention according to the first recognition intention and the second recognition intention, training the attention network and the multi-layer perceptron network based on the recognition intention and the intention sequence, and taking the trained attention network and multi-layer perceptron network together as the intention recognition model.
After the first recognition intention and the second recognition intention are obtained, the final recognition intention is determined from them. The parameters of the attention network are adjusted according to the recognition intention and the intention sequence corresponding to the sample text, thereby training the attention network; the loss function of the multi-layer perceptron network is likewise calculated from the recognition intention and the intention sequence in order to train that network.
After the attention network and the multi-layer perceptron network are trained, they are used together as the intention recognition model for carrying out intention recognition.
In an embodiment, the step of determining the recognition intention of the sample text according to the first recognition intention and the second recognition intention may include:
performing de-duplication processing on the first recognition intention and the second recognition intention, and taking the de-duplicated first recognition intention and the de-duplicated second recognition intention as the recognition intention of the sample text.
The intentions in the first recognition intention are compared with the intentions in the second recognition intention; if any are the same, the repeated intentions are de-duplicated, and all remaining intentions after de-duplication are taken as the recognition intentions of the sample text.
In the specific implementation process, the intentions in the first recognition intention and those in the second recognition intention may first be merged and then de-duplicated, and the remaining intentions are taken as the recognition intentions of the sample text.
That is, if the first recognition intention includes three of A, B and C and the second recognition intention includes three of B, D, E, the recognition intention of the finally obtained sample text after the deduplication process is A, B, C, D.
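As a minimal sketch of this deduplicating merge (the function name and intent labels are illustrative, not taken from the patent), the union of the two intention sets can be written as:

```python
def merge_intents(first, second):
    """Union of the first and second recognition intentions with
    duplicates removed, preserving first-seen order (illustrative helper)."""
    merged = []
    for intent in list(first) + list(second):
        if intent not in merged:
            merged.append(intent)
    return merged

# first = {A, B, C}, second = {B, D, E} -> union without duplicates
result = merge_intents(["A", "B", "C"], ["B", "D", "E"])  # ['A', 'B', 'C', 'D', 'E']
```

Preserving first-seen order keeps the merge deterministic, which matters if downstream logic ranks or truncates the merged intents.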
In one embodiment, referring to fig. 4, the step of training the multi-layer perceptron network may include:
S1051, classifying the recognition intents according to the intention sequence, and respectively acquiring intention scores of the recognition intents;
S1052, calculating a loss function of the multi-layer perceptron network in a smooth approximation calculation mode according to the category of the recognition intention, the intention score of the recognition intention and the score threshold;
S1053, training the multi-layer perceptron network according to the loss function of the multi-layer perceptron network.
Classifying the recognition intents according to the intention sequence, classifying the recognition intents into a target category and a non-target category, and respectively obtaining the intention scores of the recognition intents. Wherein the recognition intention in the target category is the same as the intention sequence, and the recognition intention in the non-target category is different from the intention sequence.
To ensure accuracy of prediction, it is required that every intention score of a recognition intention belonging to the target category is greater than every intention score of a recognition intention belonging to the non-target category, that all intention scores of recognition intents belonging to the target category are greater than or equal to the score threshold, and that all intention scores of recognition intents belonging to the non-target category are less than the score threshold. Thus, the formula for the loss function may be:

loss = max( max_{m∈Ω0, n∈Ω1} (S_m − S_n), max_{m∈Ω0} (S_m − γ), max_{n∈Ω1} (γ − S_n), 0 )

wherein S_m represents the intention score of a recognition intention in the non-target category, S_n represents the intention score of a recognition intention in the target category, and γ represents the score threshold.
From the smooth approximation of max, namely max(x_1, …, x_k) ≈ log Σ_i e^{x_i}, the following formula can be derived:

loss ≈ log( 1 + Σ_{m∈Ω0} Σ_{n∈Ω1} e^{S_m − S_n} + Σ_{m∈Ω0} e^{S_m − γ} + Σ_{n∈Ω1} e^{γ − S_n} )

wherein Ω0 represents the set of recognition intents belonging to the non-target category, and Ω1 represents the set of recognition intents belonging to the target category.
Based on the above formula, the formula of the final loss function can be expressed as:

loss = log( 1 + Σ_{m∈Ω0} e^{S_m − γ} ) + log( 1 + Σ_{n∈Ω1} e^{γ − S_n} )
The loss value of the multi-layer perceptron network is calculated based on the loss function formula, and the multi-layer perceptron network is then iteratively trained according to the calculated loss value until the loss value converges to a minimum, thereby completing the training of the multi-layer perceptron network. By exploiting the natural advantage of the smooth approximation of max in circle_loss, the problem of unbalanced samples in multi-label classification is alleviated.
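A runnable sketch of this final loss function, written in plain Python for clarity (a real implementation would operate on score tensors and use a numerically stable log-sum-exp):

```python
import math

def smooth_multilabel_loss(scores, target_indices, gamma=0.0):
    """Smooth-approximation multi-label loss:
    log(1 + sum_{m in O0} e^{S_m - gamma}) + log(1 + sum_{n in O1} e^{gamma - S_n}).
    scores: intention scores of all recognition intents;
    target_indices: indices of intents in the target category (O1)."""
    pos = [s for i, s in enumerate(scores) if i in target_indices]      # O1
    neg = [s for i, s in enumerate(scores) if i not in target_indices]  # O0
    loss_neg = math.log(1.0 + sum(math.exp(s - gamma) for s in neg))
    loss_pos = math.log(1.0 + sum(math.exp(gamma - s) for s in pos))
    return loss_neg + loss_pos

# Well-separated scores (target above gamma, non-target below) give a small
# loss; a violated ordering gives a large one.
good = smooth_multilabel_loss([6.0, -6.0, -6.0], {0})
bad = smooth_multilabel_loss([-6.0, 6.0, 6.0], {0})
```

The loss pushes every target score above γ and every non-target score below γ without requiring the number of positive labels to be known in advance, which is what makes it robust to label imbalance.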
Referring to fig. 5, a flowchart of applying a trained intent recognition model for intent recognition is shown.
S201, acquiring a text to be identified, and adding interactive marks to the text to be identified to obtain a mark sequence.
S202, performing attention calculation on the marker sequence to obtain attention output of the marker sequence.
S203, inputting the attention output of the marker sequence into a pre-trained intention recognition model to obtain a first recognition intention and a second recognition intention.
Specifically, the attention output of the marker sequence is input into the pre-trained intention recognition model: the first recognition intention is obtained according to the attention output of the marker sequence and the pre-constructed embedding matrix, and the attention output of the marker sequence is input into the multi-layer perceptron network, thereby obtaining the second recognition intention of the marker sequence.
S204, determining the recognition intention of the text to be recognized according to the first recognition intention and the second recognition intention.
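The inference flow of S201–S204 can be sketched end to end as follows; the marking scheme, the model internals and the intent labels are stand-ins (assumptions for illustration), not the patent's actual implementation:

```python
class FakeIntentModel:
    """Stand-in for a trained intention recognition model (assumption);
    the real attention network and multi-layer perceptron are omitted."""
    def attention(self, seq):          # S202: attention output of the sequence
        return seq
    def generate_intents(self, attn):  # generative branch (embedding matrix)
        return ["query_price", "book_ticket"]
    def mlp_intents(self, attn):       # multi-layer perceptron branch
        return ["book_ticket", "cancel_order"]

def add_interactive_marks(text):
    """S201: illustrative marking scheme using placeholder tokens."""
    return ["[CLS]"] + list(text) + ["[SEP]"]

def recognize(text, model):
    seq = add_interactive_marks(text)        # S201: mark sequence
    attn = model.attention(seq)              # S202: attention calculation
    first = model.generate_intents(attn)     # S203: first recognition intention
    second = model.mlp_intents(attn)         # S203: second recognition intention
    merged = []                              # S204: de-duplicated union
    for intent in first + second:
        if intent not in merged:
            merged.append(intent)
    return merged
```

The two branches complement each other: the generative branch can produce intents the classifier misses, while the classifier branch catches intents the decoder fails to generate.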
According to the training method of the intention recognition model, which is provided by the embodiment, the sample text and the corresponding intention sequence are obtained, and interactive marks are added to the sample text to obtain a sample mark sequence; performing attention calculation on the sample mark sequence based on the attention network to obtain attention output of the sample mark sequence; then obtaining a first recognition intention according to the attention output of the sample marking sequence and a pre-constructed embedding matrix; inputting the attention output of the sample mark sequence into a multi-layer perceptron network to obtain a second recognition intention of the sample mark sequence; finally, determining the recognition intention according to the first recognition intention and the second recognition intention, training the attention network and the multi-layer perceptron network based on the recognition intention and the intention sequence, and taking the trained attention network and multi-layer perceptron network together as an intention recognition model. The accuracy of the obtained recognition intention is improved by the mutual complementation between the first recognition intention and the second recognition intention.
Referring to fig. 6, fig. 6 is a schematic block diagram of an apparatus for training an intent recognition model according to an embodiment of the present application, wherein the apparatus is used for performing the foregoing method for training an intent recognition model. The training device of the intention recognition model can be configured in a server or a terminal.
The servers may be independent servers or may be server clusters. The terminal can be electronic equipment such as a mobile phone, a tablet computer, a notebook computer, a desktop computer, a personal digital assistant, wearable equipment and the like.
As shown in fig. 6, the training apparatus 300 of the intention recognition model includes: a sample tagging module 301, an attention calculation module 302, a first recognition module 303, a second intent module 304, and a model training module 305.
The sample marking module 301 is configured to obtain a sample text and an intention sequence corresponding to the sample text, and add an interactive mark to the sample text to obtain a sample marking sequence.
In an embodiment, the sample tagging module 301 includes a text vector sub-module 3011 and a label adding sub-module 3012.
The text vector sub-module 3011 is configured to perform word segmentation and vectorization on the sample text to obtain a text vector corresponding to the sample text; the label adding sub-module 3012 is configured to add interactive labels to the text vectors according to a pre-constructed label matrix.
The attention calculating module 302 is configured to perform attention calculation on the sample tag sequence based on an attention network, so as to obtain an attention output of the sample tag sequence.
The first recognition module 303 is configured to obtain a first recognition intention according to the attention output of the sample marker sequence and a pre-constructed embedding matrix.
In an embodiment, the first recognition module 303 includes a score computation sub-module 3031, an input sequence sub-module 3032, and a first intent sub-module 3033.
The score computing sub-module 3031 is used for computing character scores according to the attention output of the sample marking sequence and a pre-constructed embedding matrix; the input sequence sub-module 3032 is configured to determine an output character according to the character score, and add the output character to the tail of the sample tag sequence to obtain an input sequence; the first intention sub-module 3033 is configured to add an interactive mark to the input sequence and perform attention calculation to obtain the attention output of the input sequence, cyclically perform the above steps according to the attention output of the input sequence to obtain a plurality of output characters, and obtain the first recognition intention according to the plurality of output characters.
A second intention module 304, configured to input the attention output of the sample tag sequence into a multi-layer perceptron network, and obtain a second recognition intention of the sample tag sequence.
The model training module 305 is configured to determine a recognition intention according to the first recognition intention and the second recognition intention, train the attention network and the multi-layer perceptron network based on the recognition intention and the intention sequence, and use the trained attention network and multi-layer perceptron network together as an intention recognition model.
In an embodiment, the model training module 305 includes an intent acquisition sub-module 3051, a loss function sub-module 3052, and a network training sub-module 3053.
The intent acquisition sub-module 3051 is configured to classify the recognition intents according to the intention sequence, and respectively acquire intention scores of the recognition intents; the loss function sub-module 3052 is used for calculating a loss function of the multi-layer perceptron network in a smooth approximation calculation mode according to the category of the recognition intention, the intention score of the recognition intention and the score threshold; the network training sub-module 3053 is configured to train the multi-layer perceptron network according to the loss function of the multi-layer perceptron network.
It should be noted that, for convenience and brevity of description, the training device of the intent recognition model and the specific working process of each module described above may refer to the corresponding process in the foregoing training method embodiment of the intent recognition model, which is not described herein.
The training apparatus of the intention recognition model described above may be implemented in the form of a computer program which is executable on a computer device as shown in fig. 7.
Referring to fig. 7, fig. 7 is a schematic block diagram of a computer device according to an embodiment of the present application. The computer device may be a server or a terminal.
Referring to fig. 7, the computer device includes a processor, a memory, and a network interface connected by a system bus, wherein the memory may include a non-volatile storage medium and an internal memory.
The non-volatile storage medium may store an operating system and a computer program. The computer program comprises program instructions that, when executed, cause the processor to perform any one of the training methods of an intention recognition model described herein.
The processor is used to provide computing and control capabilities to support the operation of the entire computer device.
The internal memory provides an environment for running the computer program in the non-volatile storage medium; when executed by the processor, the computer program causes the processor to perform any one of the training methods of the intention recognition model described herein.
The network interface is used for network communication such as transmitting assigned tasks and the like. It will be appreciated by those skilled in the art that the structure shown in FIG. 7 is merely a block diagram of some of the structures associated with the present inventive arrangements and is not limiting of the computer device to which the present inventive arrangements may be applied, and that a particular computer device may include more or fewer components than shown, or may combine some of the components, or have a different arrangement of components.
It should be appreciated that the processor may be a central processing unit (Central Processing Unit, CPU), but may also be other general purpose processors, digital signal processors (Digital Signal Processor, DSP), application specific integrated circuits (Application Specific Integrated Circuit, ASIC), field-programmable gate arrays (Field-Programmable Gate Array, FPGA) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, or the like. Wherein the general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
Wherein in one embodiment the processor is configured to run a computer program stored in the memory to implement the steps of:
acquiring a sample text and an intention sequence corresponding to the sample text, and adding an interactive mark to the sample text to obtain a sample mark sequence; performing attention calculation on the sample marker sequence based on an attention network to obtain attention output of the sample marker sequence; obtaining a first recognition intention according to the attention output of the sample marking sequence and a pre-constructed embedding matrix; inputting the attention output of the sample marker sequence into a multi-layer perceptron network to obtain a second recognition intention of the sample marker sequence; determining an identification intention according to the first identification intention and the second identification intention, training the attention network and the multi-layer perceptron network based on the identification intention and the intention sequence, and taking the trained attention network and multi-layer perceptron network together as an intention identification model.
In one embodiment, the processor, when implementing the adding of the interactive label to the sample text, is configured to implement:
performing word segmentation and vectorization on the sample text to obtain a text vector corresponding to the sample text; and adding interactive marks to the text vector according to a pre-constructed mark matrix.
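One hedged way to realize this segmentation-and-vectorization step (character-level segmentation and a toy vocabulary are assumptions; a real system would use a trained tokenizer and an embedding table):

```python
def vectorize(text, vocab):
    """Segment the sample text (character-level here, as an assumption)
    and map each token to its vocabulary id; tokens outside the
    vocabulary fall back to the <UNK> id."""
    tokens = list(text)                  # word segmentation (toy: per character)
    unk = vocab.get("<UNK>", 0)
    return [vocab.get(tok, unk) for tok in tokens]

toy_vocab = {"<UNK>": 0, "a": 1, "b": 2}
vec = vectorize("ab?", toy_vocab)  # [1, 2, 0]
```

The resulting id sequence is what the mark matrix of the label adding step would then decorate with interactive marks.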
In one embodiment, the processor is configured to, when implementing the attention network based attention calculation on the sample marker sequence to obtain an attention output of the sample marker sequence, implement:
determining the radiation range of each word segmentation attention calculation in the mark sequence according to the interactive marks in the sample mark sequence; and performing attention interaction according to the radiation range to obtain the attention output of the sample marking sequence.
In one embodiment, the processor, when implementing the first recognition intention from the attention output of the sample marker sequence and a pre-constructed embedding matrix, is configured to implement:
calculating a character score according to the attention output of the sample marking sequence and a pre-constructed embedding matrix; determining an output character according to the character score, and adding the output character to the tail part of the sample marking sequence to obtain an input sequence; and adding interactive marks to the input sequence, performing attention calculation to obtain attention output of the input sequence, circularly executing the steps according to the attention output of the input sequence to obtain a plurality of output characters, and obtaining a first recognition intention according to the plurality of output characters.
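The cyclic generation of the first recognition intention described above can be sketched as a greedy decode loop; the scoring function here is a stub standing in for "attention output × embedding matrix" (an assumption for illustration):

```python
def decode_first_intent(seq, score_fn, end_token="<END>", max_steps=20):
    """Greedy loop: score candidate output characters, append the best
    one to the tail of the sequence, and repeat until an end token
    (or a step limit) is reached."""
    out = []
    for _ in range(max_steps):
        scores = score_fn(seq)              # stand-in for attention x embedding
        char = max(scores, key=scores.get)  # character with the highest score
        if char == end_token:
            break
        out.append(char)
        seq = seq + [char]                  # new input sequence for next step
    return "".join(out)

# Toy demonstration: a scripted scorer that spells out "buy" then stops.
base = ["[CLS]", "x", "[SEP]"]
plan = ["b", "u", "y", "<END>"]
def scripted_scores(seq):
    ch = plan[len(seq) - len(base)]
    return {ch: 1.0, "z": 0.0}
```

The `max_steps` guard prevents an unbounded loop if the model never emits the end token.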
In one embodiment, the processor, when implementing the inputting of the attention output of the sample marker sequence into the multi-layer perceptron network, is configured to implement:
inputting the attention output of the sample marker sequence into a multi-layer perceptron network to obtain intent scores of a plurality of predicted intentions of the sample marker sequence; a second recognition intent of the sample tag sequence is determined based on the intent score and a score threshold for each of the predicted intents.
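A hedged sketch of this classifier branch (a one-hidden-layer perceptron with sigmoid outputs and a score threshold; the weights and intent labels below are illustrative, not from the patent):

```python
import math

def mlp_forward(x, w1, w2):
    """Tiny multi-layer perceptron: ReLU hidden layer, sigmoid outputs,
    one score per predicted intent (biases omitted for brevity)."""
    hidden = [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in w1]
    return [1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
            for row in w2]

def second_intent(scores, labels, threshold=0.5):
    """Keep every predicted intent whose score reaches the threshold."""
    return [lab for lab, s in zip(labels, scores) if s >= threshold]

labels = ["query_price", "book_ticket"]
scores = mlp_forward([1.0, 0.5],
                     w1=[[2.0, 0.0], [0.0, 2.0]],
                     w2=[[3.0, 0.0], [-3.0, 0.0]])
picked = second_intent(scores, labels)  # only intents scoring >= 0.5
```

Because each output is thresholded independently, this branch can emit zero, one, or several intents for a single utterance, matching the multi-label setting.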
In one embodiment, the processor, when implementing the determining the recognition intent of the sample text based on the first recognition intent and the second recognition intent, is to implement:
performing de-duplication processing on the first recognition intention and the second recognition intention, and taking the de-duplicated first recognition intention and the de-duplicated second recognition intention as the recognition intention of the sample text.
In one embodiment, the processor, when implementing the training of the multi-layered perceptron network based on the recognition intent and the sequence of intents, is to implement:
classifying the recognition intents according to the intention sequence, and respectively acquiring intention scores of the recognition intents; calculating a loss function of the multi-layer perceptron network by adopting a smooth approximation calculation mode according to the category of the recognition intention, the intention score of the recognition intention and the score threshold; training the multi-layer perceptron network according to a loss function of the multi-layer perceptron network.
The embodiment of the application also provides a computer readable storage medium, wherein the computer readable storage medium stores a computer program, the computer program comprises program instructions, and the processor executes the program instructions to realize the training method of any intention recognition model provided by the embodiment of the application.
The computer readable storage medium may be an internal storage unit of the computer device according to the foregoing embodiment, for example, a hard disk or a memory of the computer device. The computer readable storage medium may also be an external storage device of the computer device, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), or the like, which are provided on the computer device.
While the application has been described with reference to certain preferred embodiments, it will be understood by those skilled in the art that various changes and substitutions of equivalents may be made and equivalents will be apparent to those skilled in the art without departing from the scope of the application. Therefore, the protection scope of the application is subject to the protection scope of the claims.

Claims (7)

1. A method of training an intent recognition model, comprising:
acquiring a sample text and an intention sequence corresponding to the sample text, and adding an interactive mark to the sample text to obtain a sample mark sequence;
performing attention calculation on the sample marker sequence based on an attention network to obtain attention output of the sample marker sequence;
obtaining a first recognition intention according to the attention output of the sample marking sequence and a pre-constructed embedding matrix;
inputting the attention output of the sample marker sequence into a multi-layer perceptron network to obtain a second recognition intention of the sample marker sequence;
determining an identification intention according to the first identification intention and the second identification intention, training the attention network and the multi-layer perceptron network based on the identification intention and the intention sequence, and taking the trained attention network and multi-layer perceptron network together as an intention identification model;
the attention network performs attention calculation on the sample mark sequence to obtain attention output of the sample mark sequence, and the attention network comprises the following steps:
determining the radiation range of each word segmentation attention calculation in the mark sequence according to the interactive marks in the sample mark sequence;
performing attention interaction according to the radiation range to obtain attention output of the sample marking sequence;
wherein the obtaining a first recognition intention according to the attention output of the sample marking sequence and a pre-constructed embedding matrix comprises the following steps:
calculating a character score according to the attention output of the sample marking sequence and a pre-constructed embedding matrix;
determining an output character according to the character score, and adding the output character to the tail part of the sample marking sequence to obtain an input sequence;
adding an interactive mark to the input sequence and performing attention calculation to obtain an attention output of the input sequence, cyclically performing, according to the attention output of the input sequence, the steps of calculating the character score and determining the output character according to the character score, to obtain a plurality of output characters, and obtaining a first recognition intention according to the plurality of output characters;
wherein the inputting the attention output of the sample marker sequence into the multi-layer perceptron network to obtain a second recognition intent of the sample marker sequence comprises:
inputting the attention output of the sample marker sequence into a multi-layer perceptron network to obtain intent scores of a plurality of predicted intentions of the sample marker sequence;
a second recognition intent of the sample tag sequence is determined based on the intent score and a score threshold for each of the predicted intents.
2. The method of training an intent recognition model as claimed in claim 1, wherein said adding interactive markers to said sample text includes:
performing word segmentation and vectorization on the sample text to obtain a text vector corresponding to the sample text;
and adding interactive marks to the text vector according to a pre-constructed mark matrix.
3. The method of training an intent recognition model as recited in claim 1, wherein said determining recognition intent of the sample text based on the first recognition intent and the second recognition intent includes:
performing de-duplication processing on the first recognition intention and the second recognition intention, and taking the de-duplicated first recognition intention and the de-duplicated second recognition intention as the recognition intention of the sample text.
4. The method of training an intent recognition model of claim 1, wherein the training the multi-layered perceptron network based on the recognition intent and the sequence of intents comprises:
classifying the recognition intents according to the intention sequence, and respectively acquiring intention scores of the recognition intents;
calculating a loss function of the multi-layer perceptron network by adopting a smooth approximation calculation mode according to the category of the recognition intention, the intention score of the recognition intention and the score threshold;
training the multi-layer perceptron network according to a loss function of the multi-layer perceptron network.
5. A training device for an intention recognition model, characterized in that the device is adapted to implement the method according to any one of claims 1 to 4, and comprises:
the sample marking module is used for obtaining a sample text and an intention sequence corresponding to the sample text, and adding interactive marks to the sample text to obtain a sample marking sequence;
the attention calculating module is used for carrying out attention calculation on the sample mark sequence based on an attention network to obtain attention output of the sample mark sequence;
the first recognition module is used for obtaining a first recognition intention according to the attention output of the sample marking sequence and a pre-constructed embedding matrix;
a second intention module for inputting the attention output of the sample marker sequence into a multi-layer perceptron network to obtain a second recognition intention of the sample marker sequence;
the model training module is used for determining the recognition intention according to the first recognition intention and the second recognition intention, training the attention network and the multi-layer perceptron network based on the recognition intention and the intention sequence, and taking the trained attention network and multi-layer perceptron network together as an intention recognition model.
6. A computer device, the computer device comprising a memory and a processor;
the memory is used for storing a computer program;
the processor for executing the computer program and for implementing a training method of an intent recognition model as claimed in any one of the claims 1 to 4 when the computer program is executed.
7. A computer readable storage medium, characterized in that the computer readable storage medium stores a computer program which, when executed by a processor, causes the processor to implement the training method of the intention recognition model according to any one of claims 1 to 4.
CN202110611219.5A 2021-06-01 2021-06-01 Training method, device, equipment and storage medium of intention recognition model Active CN113239693B (en)

Publications (2)

Publication Number Publication Date
CN113239693A CN113239693A (en) 2021-08-10
CN113239693B (en) 2023-10-27
