CN113239693A - Method, device and equipment for training intention recognition model and storage medium - Google Patents

Method, device and equipment for training intention recognition model and storage medium Download PDF

Info

Publication number
CN113239693A
CN113239693A
Authority
CN
China
Prior art keywords
intention
recognition
sample
sequence
attention
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110611219.5A
Other languages
Chinese (zh)
Other versions
CN113239693B (en)
Inventor
李志韬
王健宗
程宁
于凤英
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd filed Critical Ping An Technology Shenzhen Co Ltd
Priority to CN202110611219.5A priority Critical patent/CN113239693B/en
Publication of CN113239693A publication Critical patent/CN113239693A/en
Application granted granted Critical
Publication of CN113239693B publication Critical patent/CN113239693B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/289Phrasal analysis, e.g. finite state techniques or chunking
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/30Semantic analysis
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Image Analysis (AREA)
  • Character Discrimination (AREA)

Abstract

The application relates to the field of artificial intelligence, and particularly discloses a training method, apparatus, device and storage medium for an intention recognition model, wherein the method comprises the following steps: acquiring a sample text and an intention sequence corresponding to the sample text, and adding interactive marks to the sample text to obtain a sample mark sequence; performing attention calculation on the sample mark sequence based on an attention network to obtain the attention output of the sample mark sequence; obtaining a first recognition intention according to the attention output of the sample mark sequence and a pre-constructed embedding matrix; inputting the attention output of the sample mark sequence into a multilayer perceptron network to obtain a second recognition intention of the sample mark sequence; and determining recognition intentions according to the first recognition intention and the second recognition intention, training the attention network and the multilayer perceptron network based on the recognition intentions and the intention sequence, and taking the trained attention network and multilayer perceptron network together as the intention recognition model.

Description

Method, device and equipment for training intention recognition model and storage medium
Technical Field
The present application relates to the field of intent recognition, and in particular, to a method, an apparatus, a device, and a storage medium for training an intent recognition model.
Background
With the continuous development of artificial intelligence technology, multi-intention recognition, as a subtask of functions such as information retrieval and label recommendation, is widely applied in the field of artificial intelligence. Multi-intention recognition means classifying an input sentence or input picture into a plurality of intention scenes, and performing multi-intention recognition accurately can improve the efficiency of intelligent interaction. In the prior art, multi-intention recognition is usually converted into a plurality of binary classification problems or into a sequence generation task; however, both methods are slow at inference, unsuitable for real-time interaction, and not highly accurate in intention recognition.
Disclosure of Invention
The application provides a training method, apparatus, device and storage medium for an intention recognition model, which improve the accuracy of the trained intention recognition model in multi-intention recognition.
In a first aspect, the present application provides a method for training an intent recognition model, the method comprising:
acquiring a sample text and an intention sequence corresponding to the sample text, and adding an interactive mark to the sample text to obtain a sample mark sequence;
performing attention calculation on the sample marking sequence based on an attention network to obtain attention output of the sample marking sequence;
obtaining a first recognition intention according to attention output of the sample marking sequence and a pre-constructed embedding matrix;
inputting the attention output of the sample marking sequence into a multilayer perceptron network to obtain a second recognition intention of the sample marking sequence;
and determining recognition intents according to the first recognition intention and the second recognition intention, training the attention network and the multi-layer perceptron network based on the recognition intents and the intention sequence, and taking the trained attention network and the multi-layer perceptron network as intention recognition models together.
In a second aspect, the present application further provides an apparatus for training an intention recognition model, the apparatus including:
the sample marking module is used for acquiring a sample text and an intention sequence corresponding to the sample text, and adding interactive marks to the sample text to obtain a sample marking sequence;
the attention calculation module is used for carrying out attention calculation on the sample marking sequence based on an attention network to obtain the attention output of the sample marking sequence;
the first recognition module is used for obtaining a first recognition intention according to the attention output of the sample marking sequence and a pre-constructed embedding matrix;
a second intention module, configured to input the attention output of the sample labeling sequence into a multi-layer perceptron network, so as to obtain a second recognition intention of the sample labeling sequence;
and the model training module is used for determining recognition intents according to the first recognition intents and the second recognition intents, training the attention network and the multilayer perceptron network based on the recognition intents and the intention sequences, and taking the trained attention network and the trained multilayer perceptron network as intention recognition models together.
In a third aspect, the present application further provides a computer device comprising a memory and a processor; the memory is used for storing a computer program; the processor is configured to execute the computer program and to implement the method for training an intention recognition model as described above when executing the computer program.
In a fourth aspect, the present application further provides a computer-readable storage medium storing a computer program, which when executed by a processor causes the processor to implement the method for training an intent recognition model as described above.
The application discloses a training method, apparatus, device and storage medium for an intention recognition model: a sample text and a corresponding intention sequence are obtained, and interactive marks are added to the sample text to obtain a sample mark sequence; attention calculation is then performed on the sample mark sequence based on an attention network to obtain the attention output of the sample mark sequence; a first recognition intention is obtained according to the attention output of the sample mark sequence and a pre-constructed embedding matrix; the attention output of the sample mark sequence is input into a multilayer perceptron network to obtain a second recognition intention of the sample mark sequence; finally, recognition intentions are determined according to the first recognition intention and the second recognition intention, the attention network and the multilayer perceptron network are trained based on the recognition intentions and the intention sequence, and the trained attention network and multilayer perceptron network are together taken as the intention recognition model. Through the mutual complementation of the first recognition intention and the second recognition intention, the accuracy of the obtained recognition intention is improved.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings needed in the description of the embodiments are briefly introduced below. The drawings described below are obviously only some embodiments of the present application, and other drawings can be obtained from them by those skilled in the art without creative effort.
FIG. 1 is a flow chart illustrating the steps of a training method for an intent recognition model provided in an embodiment of the present application;
FIG. 2 is a flowchart illustrating steps provided by an embodiment of the present application for adding interactive labels to sample text;
FIG. 3 is a flowchart illustrating steps provided by an embodiment of the present application to obtain a first recognition intent;
FIG. 4 is a flowchart illustrating steps provided by an embodiment of the present application for training a multi-layered perceptron network;
FIG. 5 is a flow chart illustrating steps provided by an embodiment of the present application for intent recognition;
FIG. 6 is a schematic block diagram of a training apparatus for an intention recognition model provided in an embodiment of the present application;
fig. 7 is a schematic block diagram of a structure of a computer device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some, but not all, embodiments of the present application. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The flow diagrams depicted in the figures are merely illustrative and do not necessarily include all of the elements and operations/steps, nor do they necessarily have to be performed in the order depicted. For example, some operations/steps may be decomposed, combined or partially combined, so that the actual execution sequence may be changed according to the actual situation.
It is to be understood that the terminology used in the description of the present application herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in the specification of the present application and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should also be understood that the term "and/or" as used in this specification and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items.
The embodiment of the application provides a training method and device of an intention recognition model, computer equipment and a storage medium. According to the training method of the intention recognition model, the two intention recognition networks are trained respectively, and mutual complementation of results of the two intention recognition networks is achieved, so that the recognition speed and accuracy of the trained intention recognition model during multi-intention recognition are improved, and the intelligent interaction efficiency and accuracy are further improved.
Some embodiments of the present application will be described in detail below with reference to the accompanying drawings. The embodiments described below and the features of the embodiments can be combined with each other without conflict.
Referring to fig. 1, fig. 1 is a schematic flow chart of a training method for an intention recognition model according to an embodiment of the present application.
As shown in fig. 1, the method for training the intention recognition model specifically includes: step S101 to step S105.
S101, obtaining a sample text and an intention sequence corresponding to the sample text, and adding interactive marks to the sample text to obtain a sample mark sequence.
The sample text may be directly input text, or input speech that is converted into text by speech-to-text techniques. After the sample text is acquired, it may be preprocessed, for example by removing special symbols and sensitive information, and an interactive mark is then added to the sample text; the interactive mark is used to delimit the attention interaction range. In a specific implementation, a symbol mark may also be added to the sample text to indicate its start position and end position.
For example, when the sample text is "Hello, my job number is X, is your address correct", the sample text becomes "[sos] Hello, my job number is X, is your address correct [eos]" after the symbol marks are added, where sos denotes the start position of the sample text and eos denotes the end position.
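Purely for illustration (no code appears in the original filing), this symbol-marking step can be sketched in a few lines of Python; the function name and marker strings are assumptions following the example above.

```python
# Minimal illustrative sketch of the symbol-marking step; the function
# name and marker strings are assumptions, not part of the original method.
def add_symbol_marks(sample_text: str) -> str:
    """Wrap a sample text with the [sos]/[eos] symbol marks."""
    return f"[sos] {sample_text} [eos]"

print(add_symbol_marks("Hello, my job number is X, is your address correct"))
# [sos] Hello, my job number is X, is your address correct [eos]
```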
In one embodiment, referring to fig. 2, the step of adding interactive mark to the sample text may include:
s1011, performing word segmentation and vectorization processing on the sample text to obtain a text vector corresponding to the sample text; and S1012, adding interactive marks to the text vectors according to a pre-constructed mark matrix.
Firstly, a sample text is segmented to obtain a plurality of participles in the sample text, and then a text vector corresponding to the sample text is determined according to a vector corresponding to each participle in the sample text.
In a specific implementation, a tokenizer can be used to segment the sample text. After each participle in the sample text is obtained, the dimension of each participle is determined according to a pre-constructed configuration document, and an initial vector of each participle is then obtained through a SentencePiece interface. Once the initial vectors of all the participles in the sample text are obtained, they are spliced together to form the text vector corresponding to the sample text.
The initial vector of each participle comprises a participle vector and a relative position vector of the same dimensionality. The participle vector is the word vector corresponding to the participle, and the relative position vector encodes the relative positions between participles and is used to represent the context relationship.
After the text vector of the sample text is obtained, an interactive mark, namely a mask mark, is added to the text vector according to a pre-constructed mark matrix to guide attention interaction. Once the interactive marks are added to the sample mark sequence, attention calculation at each position is confined to the positions its mark allows: the sample participles may interact with all sample participles, while each subsequently generated character may interact only with the positions before it (see the tables below). In a specific implementation, positions in the mark matrix that participate in attention calculation are filled with 1, and positions that do not participate are filled with 0.
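As an illustration only, the mark matrix described above might be sketched in Python as follows; the split into n_text sample positions and n_generated generated positions is an assumption drawn from the tables later in this description.

```python
import torch

# Illustrative sketch of the mark (mask) matrix: the first n_text positions
# hold the sample-text participles (full mutual attention among themselves)
# and the remaining positions hold generated characters, each attending
# only to the positions before it.
def build_mark_matrix(n_text: int, n_generated: int) -> torch.Tensor:
    n = n_text + n_generated
    mask = torch.zeros(n, n)          # 0 = does not participate in attention
    mask[:n_text, :n_text] = 1.0      # sample participles see all sample participles
    for i in range(n_text, n):
        mask[i, :i] = 1.0             # generated character i sees only earlier positions
    return mask

print(build_mark_matrix(3, 2))
```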
S102, performing attention calculation on the sample marking sequence based on an attention network to obtain attention output of the sample marking sequence.
Because the sample mark sequence contains interactive marks that guide attention interaction, attention calculation can be carried out according to these marks to obtain the attention output of the sample mark sequence. The attention network can be the transformer layers of a model such as BERT or RoBERTa; when the transformer layers of such a model are used as the attention network, the weights of the pre-trained model can be reused without changing the model structure, which accelerates the convergence of the whole intention recognition model during training.
In a specific implementation, there are generally multiple transformer layers, and layers may also be pruned according to the actual scenario. The output of each transformer layer is the input of the next, multi-head attention is computed within each layer, and the output of the last transformer layer is taken as the attention output of the sample mark sequence.
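For illustration, a minimal PyTorch stand-in for such a stack of transformer layers might look as follows; it is a sketch under assumed dimensions, not the pre-trained BERT/RoBERTa layers themselves, and it reuses the build_mark_matrix helper sketched above.

```python
import torch
import torch.nn as nn

# A stand-in for the stacked transformer layers, built from torch.nn
# primitives instead of an actual BERT/RoBERTa checkpoint; the hidden
# size, head count and layer count are illustrative assumptions.
dim, n_heads, n_layers = 768, 12, 4
layer = nn.TransformerEncoderLayer(d_model=dim, nhead=n_heads, batch_first=True)
encoder = nn.TransformerEncoder(layer, num_layers=n_layers)

x = torch.randn(1, 11, dim)                    # (batch, seq_len, dim) sample mark sequence
mark = build_mark_matrix(9, 2)                 # mark matrix from the previous sketch
attn_mask = (mark == 0)                        # True = position is masked out
attention_output = encoder(x, mask=attn_mask)  # output of the last transformer layer
```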
In an embodiment, performing attention calculation on the sample mark sequence to obtain its attention output includes: determining the radiation range of the attention calculation for each participle in the sample mark sequence according to the interactive marks in the sample mark sequence; and carrying out attention interaction according to the radiation range to obtain the attention output of the sample mark sequence.
The radiation range of each participle in the sample mark sequence during attention calculation is determined according to the interactive marks in the sample mark sequence; the radiation range of the attention calculation over the sample mark sequence is shown in Table 1. Taking the sample text "Hello, my job number is X, is your address correct" as an example, the radiation range of each participle in the sample mark sequence is as follows:
(Table 1 appears only as images in the source publication. It contains the radiation-range matrix of the sample mark sequence: each participle row carries a 1 in the column of every participle of the sequence, indicating that each participle may interact with all participles.)

TABLE 1
The rows in Table 1 represent the participles, and each 1 in a row marks the radiation range of that participle during attention calculation. For example, in the encoding stage, the first character "you" in the second row may perform attention calculation with the participles of every column containing a 1; that is, "you" may perform attention interaction with "you", "good", "i", …, "pairs", "do", "eos". In other words, each participle in the sample text can perform attention interaction with all participles in the sample text, which yields the attention output of the sample mark sequence.
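The mechanics of restricting attention to this radiation range can be sketched as a single masked attention step; this is an illustrative reading of the description, not code from the filing, and the tensor shapes are assumptions.

```python
import math
import torch
import torch.nn.functional as F

# Sketch of one masked attention step: positions whose mark value is 0 are
# pushed to -inf before the softmax, so they fall outside the radiation range.
def masked_attention(q, k, v, mark_matrix):
    # q, k, v: (seq_len, dim); mark_matrix: (seq_len, seq_len) of 0/1
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))
    scores = scores.masked_fill(mark_matrix == 0, float("-inf"))
    weights = F.softmax(scores, dim=-1)   # attention restricted to marked positions
    return weights @ v                    # attention output of the sequence
```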
S103, obtaining a first identification intention according to the attention output of the sample marking sequence and a pre-constructed embedding matrix.
After the attention output of the sample mark sequence is obtained, it is multiplied by the pre-constructed embedding matrix to obtain the score of each character. An output character is determined from these scores, and the first recognition intention is then determined from the plurality of output characters obtained.
In an embodiment, referring to fig. 3, the step of obtaining the first recognition intention may include: s1031, calculating character scores according to attention output of the sample marking sequences and a pre-constructed embedding matrix; s1032, determining output characters according to the character scores, and adding the output characters to the tail of the sample marking sequence to obtain an input sequence; s1033, adding interactive marks to the input sequence, performing attention calculation to obtain attention output of the input sequence, executing the steps circularly according to the attention output of the input sequence to obtain a plurality of output characters, and obtaining a first recognition intention according to the output characters.
The pre-constructed embedding matrix is the embedding matrix of the token embedding layer over the vocabulary (vocab): each character has an embedding vector, and these embedding vectors are combined to form the embedding matrix. The attention output of the sample mark sequence is multiplied by the pre-constructed embedding matrix to obtain the character score of each character; the character with the highest character score is selected as the first output character and appended to the tail of the sample mark sequence to obtain an input sequence.
An interactive mark is added to the input sequence for attention calculation; a character score is then calculated according to the attention output of the input sequence and the pre-constructed embedding matrix, a second output character is thereby determined, and the second output character is appended to the tail of the input sequence to obtain a new input sequence. This step is executed cyclically to obtain a plurality of output characters, which are combined and converted into a character sequence by the tokenizer to obtain the first recognition intention.
In a specific implementation, the loop terminates when the input sequence reaches a preset maximum length or when the terminator eos is generated.
For example, if the mark sequence is "[sos] Hello, my job number is X, is your address correct [eos]" and the first output character obtained is "self", the resulting input sequence is "[sos] Hello, my job number is X, is your address correct [eos] self".
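The cyclic generation of output characters described above can be sketched as follows; encode_with_marks and embedding_matrix are assumed helpers standing in for the mask-guided attention pass and the pre-constructed embedding matrix, and scoring from the last position is an assumption rather than a detail given in the filing.

```python
import torch

# Hedged sketch of the cyclic decoding that yields the first recognition
# intention; encode_with_marks(input_ids) -> (seq_len, dim) attention output
# and embedding_matrix (vocab_size, dim) are assumed helpers.
def generate_first_intention(sample_ids, embedding_matrix, max_len, eos_id):
    input_ids = list(sample_ids)
    output_ids = []
    while len(input_ids) < max_len:
        attn_out = encode_with_marks(input_ids)           # attention output of the sequence
        char_scores = attn_out[-1] @ embedding_matrix.T   # character score for every character
        next_id = int(torch.argmax(char_scores))          # character with the highest score
        if next_id == eos_id:                             # stop on the terminator eos
            break
        output_ids.append(next_id)
        input_ids.append(next_id)                         # append to the tail, then loop
    return output_ids  # decode with the tokenizer to obtain the intention characters
```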
After the input sequence is obtained, an interactive mark is added to it again; the process of adding the interactive mark is the same as described above and is not repeated here. The radiation range of the attention calculation over the input sequence at this point is shown in Table 2.
|               | [sos] | You | Good taste | I am | Worker's tool | To pair | Does one | [eos] | From |
|---------------|-------|-----|------------|------|---------------|---------|----------|-------|------|
| You           | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 |
| Good taste    | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 |
| I am          | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 |
| Worker's tool | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 |
| —             | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 |
| To pair       | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 |
| Does one      | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 |
| [eos]         | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 |
| From          | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 |

TABLE 2 (the row marked — is a sample participle whose label was not preserved in the source table)
The rows in Table 2 represent the participles or characters; a 1 in a row marks the radiation range of that participle or character during attention calculation, and a 0 means it does not perform attention calculation with the participle or character of that column. For example, in the encoding stage, each participle in the sample mark sequence can perform attention interaction with all participles in the sample mark sequence, whereas the generated character "self" can only see the participles of the sample mark sequence, so "self" can perform attention interaction only with those participles.
After attention calculation is carried out according to the contents of the table, the attention output of the input sequence is obtained and multiplied by the pre-constructed embedding matrix to obtain the character score of each character; the character with the highest character score is selected as the second output character, which is appended to the tail of the input sequence to obtain a new input sequence. With the second output character "I" appended, the new input sequence is "[sos] Hello, my job number is X, is your address correct [eos] self I".
When the new input sequence is marked and attention interaction is performed, the character "I" can see all the participles and characters before it, so "I" can perform attention interaction with all participles of the mark sequence and with the character "self".
After a number of cycles, the final attention calculation radiation range is shown in table 3.
|               | [sos] | You | Good taste | I am | Worker's tool | To pair | Does one | [eos] | From | I | Ground | Address | [eos] |
|---------------|-------|-----|------------|------|---------------|---------|----------|-------|------|---|--------|---------|-------|
| You           | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 |
| Good taste    | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 |
| I am          | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 |
| Worker's tool | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 |
| —             | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 |
| To pair       | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 |
| Does one      | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 |
| [eos]         | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 |
| From          | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 |
| I             | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 |
| Ground        | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 |
| Address       | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 |
| [eos]         | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 |

TABLE 3 (each generated character From, I, Ground, Address, [eos] attends only to the positions before it; the row marked — is the same unlabeled sample participle as in Table 2)
Because of the interactive marks, the attention interaction range is known during attention interaction, so that each generated character can perform attention interaction only with the characters before it. This mode takes the relevance between intentions into account, improving prediction accuracy while accelerating inference.
And S104, inputting the attention output of the sample marking sequence into a multilayer perceptron network to obtain a second recognition intention of the sample marking sequence.
The attention output of the sample mark sequence is taken as the input of the multilayer perceptron network, and the second recognition intention of the sample mark sequence is obtained through the multilayer perceptron network; it is used to correct the first recognition intention, thereby making intention generation controllable.
In a specific implementation, the multilayer perceptron network comprises an input layer, a layer normalization, a first fully-connected layer, a second fully-connected layer and an output layer; the number of categories into which the second fully-connected layer classifies is the number of predicted intentions.
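An illustrative PyTorch rendering of this perceptron structure might look as follows; the hidden size and the intention count are assumptions, not values given in the filing.

```python
import torch.nn as nn

# Illustrative version of the multilayer perceptron described above
# (input -> layer normalization -> two fully-connected layers -> output).
class IntentPerceptron(nn.Module):
    def __init__(self, dim: int = 768, hidden: int = 256, n_intents: int = 3):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.fc1 = nn.Linear(dim, hidden)
        self.act = nn.ReLU()
        self.fc2 = nn.Linear(hidden, n_intents)   # one score per predicted intention

    def forward(self, attention_output):
        # attention_output: (batch, dim) attention output of the sample mark sequence
        return self.fc2(self.act(self.fc1(self.norm(attention_output))))
```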
In one embodiment, the step of inputting the attention output of the sample labeling sequence into a multi-layer perceptron network to obtain the second recognition intent of the sample labeling sequence may include: inputting the attention output of the sample marking sequence into a multilayer perceptron network to obtain intention scores of a plurality of predicted intentions of the sample marking sequence; determining a second recognition intent of the sample token sequence according to the intent score and a score threshold for each of the predicted intents.
The attention output of the sample mark sequence is input into the multilayer perceptron network to obtain the network's output, an N×1 vector in which N is the number of output intentions and each value is the intention score of one predicted intention. The intention score of each predicted intention is compared with a score threshold: if the score is greater than the threshold, the predicted intention can be taken as a second recognition intention; if the score is less than or equal to the threshold, it cannot.
For example, if the multilayer perceptron network outputs the scores (0.1, 2.3, -2) corresponding to the three intentions (introduction | check address | objection handling), comparing the scores with the score threshold 0 gives (1, 1, 0), i.e. the category label of the input sequence is "introduction | check address".
In a specific implementation, the score threshold may be set to 0, or adjusted according to the actual situation.
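The worked example above can be reproduced in a few lines; the threshold of 0 follows the example.

```python
import torch

# Reproducing the worked example: scores (0.1, 2.3, -2) compared against
# a score threshold of 0 give the multi-hot result (1, 1, 0).
scores = torch.tensor([0.1, 2.3, -2.0])
threshold = 0.0
second_intentions = (scores > threshold).int()
print(second_intentions.tolist())   # [1, 1, 0] -> "introduction | check address"
```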
S105, determining recognition intents according to the first recognition intents and the second recognition intents, training the attention network and the multilayer perceptron network based on the recognition intents and the intention sequence, and taking the trained attention network and the trained multilayer perceptron network as intention recognition models together.
After the first recognition intention and the second recognition intention are obtained, the final recognition intention is determined from them. The parameters of the attention network are adjusted according to the recognition intention and the intention sequence corresponding to the sample text so as to train the attention network, and a loss function of the multilayer perceptron network is calculated according to the recognition intention and the intention sequence corresponding to the sample text so as to train the multilayer perceptron network.
After the attention network and the multilayer perceptron network are trained, the attention network and the multilayer perceptron network are jointly used as an intention recognition model for intention recognition.
In an embodiment, the step of determining the recognition intention of the sample text according to the first recognition intention and the second recognition intention may include:
and performing deduplication processing on the first recognition intention and the second recognition intention, and taking the deduplicated first recognition intention and the deduplicated second recognition intention as the recognition intentions of the sample text together.
The intentions in the first recognition intention are compared with those in the second recognition intention; if some are the same, the repeated intentions are deduplicated, and all deduplicated intentions together serve as the recognition intentions of the sample text.
In a specific implementation, the intentions in the first recognition intention and those in the second recognition intention may first be merged and then deduplicated, and the remaining intentions are taken as the recognition intentions of the sample text.
That is, if the first recognition intention includes A, B and C, and the second recognition intention includes B, D and E, the recognition intentions of the sample text finally obtained after deduplication are A, B, C, D and E.
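A minimal sketch of this deduplicating merge, matching the A/B/C and B/D/E example above:

```python
# Order-preserving union of the two recognition results.
first_intentions = ["A", "B", "C"]
second_intentions = ["B", "D", "E"]
merged = list(dict.fromkeys(first_intentions + second_intentions))
print(merged)  # ['A', 'B', 'C', 'D', 'E']
```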
In an embodiment, referring to fig. 4, the step of training the multi-layer perceptron network may include:
s1051, classifying the recognition intents according to the intention sequence, and respectively obtaining intention scores of the recognition intents; s1052, calculating a loss function of the multilayer perceptron network by adopting a smooth approximate calculation mode according to the category of the recognition intention, the intention score of the recognition intention and a score threshold value; s1053, training the multilayer perceptron network according to the loss function of the multilayer perceptron network.
The recognition intentions are classified according to the intention sequence into a target category and a non-target category, and the intention score of each recognition intention is obtained. A recognition intention in the target category is one that appears in the intention sequence; a recognition intention in the non-target category does not.
To ensure prediction accuracy, every intention score of a recognition intention belonging to the target category is required to be greater than every intention score of a recognition intention belonging to the non-target category; in addition, all intention scores of the target category must be greater than or equal to the score threshold, and all intention scores of the non-target category must be less than the score threshold. Thus, the loss function may be formulated as:
$$L = \max\left(\max_{m\in\Omega_0,\,n\in\Omega_1}\left(S_m - S_n\right),\ \max_{m\in\Omega_0}\left(S_m-\gamma\right),\ \max_{n\in\Omega_1}\left(\gamma-S_n\right),\ 0\right)$$
where $S_m$ represents the intention score of a non-target category, $S_n$ represents the intention score of a target category, and $\gamma$ represents the score threshold.
From a smooth approximation of max (the log-sum-exp function), the following equation can be derived:
$$L = \log\left(1 + \sum_{m\in\Omega_0,\,n\in\Omega_1} e^{S_m - S_n} + \sum_{m\in\Omega_0} e^{S_m-\gamma} + \sum_{n\in\Omega_1} e^{\gamma-S_n}\right)$$
where $\Omega_0$ represents the set of recognition intentions belonging to the non-target category, and $\Omega_1$ represents the set of recognition intentions belonging to the target category.
Based on the above formula, the final loss function can be formulated as:
$$L = \log\left(1 + \sum_{m\in\Omega_0} e^{S_m-\gamma}\right) + \log\left(1 + \sum_{n\in\Omega_1} e^{\gamma-S_n}\right)$$
and calculating the loss value of the multilayer perceptron network based on a formula of a loss function, and then carrying out iterative training on the multilayer perceptron network according to the calculated loss value until the loss value is minimum, thereby finishing the training of the multilayer perceptron network. The natural advantage of smooth approximation of max of circle _ loss is adopted, and the problem of unbalanced multi-label classification samples is solved.
Referring to FIG. 5, a flow chart for applying a trained intent recognition model for intent recognition is shown.
S201, obtaining a text to be recognized, and adding an interactive mark to the text to be recognized to obtain a mark sequence.
S202, performing attention calculation on the marker sequence to obtain attention output of the marker sequence.
S203, inputting the attention output of the marking sequence into a pre-trained intention recognition model to obtain a first recognition intention and a second recognition intention.
The attention output of the mark sequence is input into the pre-trained intention recognition model; a first recognition intention is obtained according to the attention output of the mark sequence and the pre-constructed embedding matrix, and the attention output of the mark sequence is input into the multilayer perceptron network to obtain a second recognition intention of the mark sequence.
S204, determining the recognition intention of the text to be recognized according to the first recognition intention and the second recognition intention.
In the method for training the intention recognition model provided by this embodiment, a sample text and a corresponding intention sequence are obtained, and interactive marks are added to the sample text to obtain a sample mark sequence; attention calculation is then performed on the sample mark sequence based on an attention network to obtain the attention output of the sample mark sequence; a first recognition intention is obtained according to the attention output of the sample mark sequence and a pre-constructed embedding matrix; the attention output of the sample mark sequence is input into a multilayer perceptron network to obtain a second recognition intention of the sample mark sequence; finally, recognition intentions are determined according to the first recognition intention and the second recognition intention, the attention network and the multilayer perceptron network are trained based on the recognition intentions and the intention sequence, and the trained attention network and multilayer perceptron network are together taken as the intention recognition model. Through the mutual complementation of the first recognition intention and the second recognition intention, the accuracy of the obtained recognition intention is improved.
Referring to fig. 6, fig. 6 is a schematic block diagram of an intention recognition model training apparatus according to an embodiment of the present application, which is used for performing the aforementioned method for training an intention recognition model. Wherein, the training device of the intention recognition model can be configured in a server or a terminal.
The server may be an independent server or a server cluster. The terminal can be an electronic device such as a mobile phone, a tablet computer, a notebook computer, a desktop computer, a personal digital assistant and a wearable device.
As shown in fig. 6, the training apparatus 300 for the intention recognition model includes: a sample labeling module 301, an attention calculation module 302, a first recognition module 303, a second intent module 304, and a model training module 305.
The sample marking module 301 is configured to obtain a sample text and an intention sequence corresponding to the sample text, and add an interactive mark to the sample text to obtain a sample marking sequence.
In one embodiment, the sample labeling module 301 includes a text vector sub-module 3011 and a label addition sub-module 3012.
The text vector submodule 3011 is configured to perform word segmentation and vectorization processing on the sample text to obtain a text vector corresponding to the sample text; the tag adding submodule 3012 is configured to add an interactive tag to the text vector according to a pre-constructed tag matrix.
An attention calculation module 302, configured to perform attention calculation on the sample label sequence based on an attention network, and obtain an attention output of the sample label sequence.
A first recognition module 303, configured to obtain a first recognition intention according to the attention output of the sample marker sequence and a pre-constructed embedding matrix.
In an embodiment, the first recognition module 303 includes a score calculation submodule 3031, an input sequence submodule 3032, and a first intent submodule 3033.
The score calculating submodule 3031 is used for calculating a character score according to the attention output of the sample marking sequence and a pre-constructed embedding matrix; the input sequence submodule 3032 is configured to determine an output character according to the character score, and add the output character to the tail of the sample marker sequence to obtain an input sequence; the first intention submodule 3033 is configured to add an interaction flag to the input sequence, perform attention calculation to obtain an attention output of the input sequence, perform the above steps according to an attention output cycle of the input sequence to obtain a plurality of output characters, and obtain a first recognition intention according to the plurality of output characters.
A second intention module 304, configured to input the attention output of the sample labeling sequence into the multi-layered perceptron network, so as to obtain a second recognition intention of the sample labeling sequence.
A model training module 305, configured to determine a recognition intent according to the first recognition intent and the second recognition intent, train the attention network and the multi-layer perceptron network based on the recognition intent and the intent sequence, and use the trained attention network and multi-layer perceptron network together as an intent recognition model.
In an embodiment, the model training module 305 includes an intent score sub-module 3051, a loss function sub-module 3052, and a network training sub-module 3053.
The intention score sub-module 3051 is configured to classify the recognition intents according to the intention sequence, and obtain intention scores of the recognition intents respectively; the loss function sub-module 3052 is configured to calculate a loss function of the multi-layer perceptron network in a smooth approximation calculation manner according to the category of the recognition intent, the intent score of the recognition intent, and a score threshold; the network training sub-module 3053 is configured to train the multi-layer perceptron network according to a loss function of the multi-layer perceptron network.
It should be noted that, as will be clearly understood by those skilled in the art, for convenience and brevity of description, the specific working processes of the training apparatus for the intention recognition model and each module described above may refer to the corresponding processes in the aforementioned embodiment of the training method for the intention recognition model, and are not described herein again.
The training means of the above-described intention recognition model may be implemented in the form of a computer program which can be run on a computer device as shown in fig. 7.
Referring to fig. 7, fig. 7 is a schematic block diagram of a computer device according to an embodiment of the present disclosure. The computer device may be a server or a terminal.
Referring to fig. 7, the computer device includes a processor, a memory and a network interface connected by a system bus, wherein the memory may include a nonvolatile storage medium and an internal memory.
The non-volatile storage medium may store an operating system and a computer program. The computer program includes program instructions that, when executed, cause a processor to perform any of the methods of training an intent recognition model.
The processor is used for providing calculation and control capability and supporting the operation of the whole computer equipment.
The internal memory provides an environment for the execution of a computer program on a non-volatile storage medium, which when executed by the processor, causes the processor to perform any of the methods of training an intent recognition model.
The network interface is used for network communication, such as sending assigned tasks and the like. Those skilled in the art will appreciate that the architecture shown in fig. 7 is merely a block diagram of some of the structures associated with the disclosed aspects and is not intended to limit the computing devices to which the disclosed aspects apply, as particular computing devices may include more or less components than those shown, or may combine certain components, or have a different arrangement of components.
It should be understood that the processor may be a Central Processing Unit (CPU), another general-purpose processor, a Digital Signal Processor (DSP), an Application-Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
Wherein, in one embodiment, the processor is configured to execute a computer program stored in the memory to implement the steps of:
acquiring a sample text and an intention sequence corresponding to the sample text, and adding an interactive mark to the sample text to obtain a sample mark sequence; performing attention calculation on the sample marking sequence based on an attention network to obtain attention output of the sample marking sequence; obtaining a first recognition intention according to attention output of the sample marking sequence and a pre-constructed embedding matrix; inputting the attention output of the sample marking sequence into a multilayer perceptron network to obtain a second recognition intention of the sample marking sequence; and determining recognition intents according to the first recognition intention and the second recognition intention, training the attention network and the multi-layer perceptron network based on the recognition intents and the intention sequence, and taking the trained attention network and the multi-layer perceptron network as intention recognition models together.
In one embodiment, the processor, in implementing the adding of the interactive mark to the sample text, is configured to implement:
performing word segmentation and vectorization processing on the sample text to obtain a text vector corresponding to the sample text; and adding interactive marks to the text vectors according to a pre-constructed mark matrix.
In one embodiment, the processor, when implementing the attention network-based attention calculation on the sample marker sequence to obtain the attention output of the sample marker sequence, is configured to implement:
determining the radiation range of each word segmentation attention calculation in the marking sequence according to the interactive marks in the sample marking sequence; and carrying out attention interaction according to the radiation range to obtain the attention output of the sample marking sequence.
In one embodiment, the processor, in implementing the deriving the first recognition intent from the attention output of the sample marker sequence and a pre-constructed embedding matrix, is configured to implement:
calculating a character score according to the attention output of the sample marking sequence and a pre-constructed embedding matrix; determining output characters according to the character scores, and adding the output characters to the tail of the sample marking sequence to obtain an input sequence; adding interactive marks to the input sequence, performing attention calculation to obtain attention output of the input sequence, circularly executing the steps according to the attention output of the input sequence to obtain a plurality of output characters, and obtaining a first recognition intention according to the output characters.
In one embodiment, the processor, in implementing the inputting of the attention output of the sample marker sequence into the multi-layered perceptron network, resulting in the second recognition intent of the sample marker sequence, is configured to implement:
inputting the attention output of the sample marking sequence into a multilayer perceptron network to obtain intention scores of a plurality of predicted intentions of the sample marking sequence; determining a second recognition intent of the sample token sequence according to the intent score and a score threshold for each of the predicted intents.
In one embodiment, the processor, in implementing the determining the recognition intent of the sample text from the first recognition intent and the second recognition intent, is configured to implement:
and performing deduplication processing on the first recognition intention and the second recognition intention, and taking the deduplicated first recognition intention and the deduplicated second recognition intention as the recognition intentions of the sample text together.
In one embodiment, the processor, in implementing the training of the multi-layered perceptron network based on the recognition intent and the sequence of intents, is configured to implement:
classifying the recognition intents according to the intention sequence, and respectively obtaining intention scores of the recognition intents; calculating a loss function of the multilayer perceptron network by adopting a smooth approximate calculation mode according to the category of the recognition intention, the intention score of the recognition intention and a score threshold value; and training the multilayer perceptron network according to the loss function of the multilayer perceptron network.
The embodiment of the application also provides a computer-readable storage medium, wherein a computer program is stored in the computer-readable storage medium, the computer program comprises program instructions, and the processor executes the program instructions to realize the method for training any intention recognition model provided by the embodiment of the application.
The computer-readable storage medium may be an internal storage unit of the computer device described in the foregoing embodiment, for example, a hard disk or a memory of the computer device. The computer readable storage medium may also be an external storage device of the computer device, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like provided on the computer device.
While the invention has been described with reference to specific embodiments, the scope of the invention is not limited thereto, and those skilled in the art can easily conceive various equivalent modifications or substitutions within the technical scope of the invention. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (10)

1. A method for training an intention recognition model, comprising:
acquiring a sample text and an intention sequence corresponding to the sample text, and adding an interactive mark to the sample text to obtain a sample mark sequence;
performing attention calculation on the sample marking sequence based on an attention network to obtain attention output of the sample marking sequence;
obtaining a first recognition intention according to attention output of the sample marking sequence and a pre-constructed embedding matrix;
inputting the attention output of the sample marking sequence into a multilayer perceptron network to obtain a second recognition intention of the sample marking sequence;
and determining recognition intents according to the first recognition intention and the second recognition intention, training the attention network and the multi-layer perceptron network based on the recognition intents and the intention sequence, and taking the trained attention network and the multi-layer perceptron network as intention recognition models together.
2. The method for training the intent recognition model according to claim 1, wherein the adding interactive labels to the sample text comprises:
performing word segmentation and vectorization processing on the sample text to obtain a text vector corresponding to the sample text;
and adding interactive marks to the text vectors according to a pre-constructed mark matrix.
3. The method for training an intention recognition model according to claim 1, wherein the performing attention calculation on the sample label sequence based on the attention network to obtain the attention output of the sample label sequence comprises:
determining the radiation range of each word segmentation attention calculation in the marking sequence according to the interactive marks in the sample marking sequence;
and carrying out attention interaction according to the radiation range to obtain the attention output of the sample marking sequence.
4. The method for training the intention recognition model according to claim 1, wherein the obtaining the first recognition intention according to the attention output of the sample label sequence and a pre-constructed embedding matrix comprises:
calculating a character score according to the attention output of the sample marking sequence and a pre-constructed embedding matrix;
determining output characters according to the character scores, and adding the output characters to the tail of the sample marking sequence to obtain an input sequence;
adding interactive marks to the input sequence, performing attention calculation to obtain attention output of the input sequence, circularly executing the steps according to the attention output of the input sequence to obtain a plurality of output characters, and obtaining a first recognition intention according to the output characters.
5. The method for training the intention recognition model according to claim 1, wherein the inputting the attention output of the sample label sequence into a multi-layer perceptron network to obtain the second recognition intention of the sample label sequence comprises:
inputting the attention output of the sample marking sequence into a multilayer perceptron network to obtain intention scores of a plurality of predicted intentions of the sample marking sequence;
determining a second recognition intent of the sample token sequence according to the intent score and a score threshold for each of the predicted intents.
6. The method for training the intention recognition model according to claim 1, wherein the determining the recognition intention of the sample text according to the first recognition intention and the second recognition intention comprises:
and performing deduplication processing on the first recognition intention and the second recognition intention, and taking the deduplicated first recognition intention and the deduplicated second recognition intention as the recognition intentions of the sample text together.
7. The method for training the intent recognition model according to claim 1, wherein the training the multi-layer perceptron network based on the recognition intent and the sequence of intentions comprises:
classifying the recognition intents according to the intention sequence, and respectively obtaining intention scores of the recognition intents;
calculating a loss function of the multilayer perceptron network by adopting a smooth approximate calculation mode according to the category of the recognition intention, the intention score of the recognition intention and a score threshold value;
and training the multilayer perceptron network according to the loss function of the multilayer perceptron network.
8. An apparatus for training an intention recognition model, comprising:
the sample marking module is used for acquiring a sample text and an intention sequence corresponding to the sample text, and adding interactive marks to the sample text to obtain a sample marking sequence;
the attention calculation module is used for carrying out attention calculation on the sample marking sequence based on an attention network to obtain the attention output of the sample marking sequence;
the first recognition module is used for obtaining a first recognition intention according to the attention output of the sample marking sequence and a pre-constructed embedding matrix;
a second intention module, configured to input the attention output of the sample labeling sequence into a multi-layer perceptron network, so as to obtain a second recognition intention of the sample labeling sequence;
and the model training module is used for determining recognition intents according to the first recognition intents and the second recognition intents, training the attention network and the multilayer perceptron network based on the recognition intents and the intention sequences, and taking the trained attention network and the trained multilayer perceptron network as intention recognition models together.
9. A computer device, wherein the computer device comprises a memory and a processor;
the memory is used for storing a computer program;
the processor for executing the computer program and implementing the method of training an intent recognition model according to any of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program which, when executed by a processor, causes the processor to implement the method of training an intent recognition model according to any of claims 1 to 7.
CN202110611219.5A 2021-06-01 2021-06-01 Training method, device, equipment and storage medium of intention recognition model Active CN113239693B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110611219.5A CN113239693B (en) 2021-06-01 2021-06-01 Training method, device, equipment and storage medium of intention recognition model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110611219.5A CN113239693B (en) 2021-06-01 2021-06-01 Training method, device, equipment and storage medium of intention recognition model

Publications (2)

Publication Number Publication Date
CN113239693A (en) 2021-08-10
CN113239693B (en) 2023-10-27

Family

ID=77136328

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110611219.5A Active CN113239693B (en) 2021-06-01 2021-06-01 Training method, device, equipment and storage medium of intention recognition model

Country Status (1)

Country Link
CN (1) CN113239693B (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019144542A1 (en) * 2018-01-26 2019-08-01 Institute Of Software Chinese Academy Of Sciences Affective interaction systems, devices, and methods based on affective computing user interface
US20190278378A1 (en) * 2018-03-09 2019-09-12 Adobe Inc. Utilizing a touchpoint attribution attention neural network to identify significant touchpoints and measure touchpoint contribution in multichannel, multi-touch digital content campaigns
WO2019229768A1 (en) * 2018-05-28 2019-12-05 Thottapilly Sanjeev A bot engine for automatic dynamic intent computation
CA3081242A1 (en) * 2019-05-22 2020-11-22 Royal Bank Of Canada System and method for controllable machine text generation architecture
CN112380861A (en) * 2020-11-13 2021-02-19 北京京东尚科信息技术有限公司 Model training method and device and intention identification method and device
CN112634867A (en) * 2020-12-11 2021-04-09 平安科技(深圳)有限公司 Model training method, dialect recognition method, device, server and storage medium

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114036943A (en) * 2021-11-15 2022-02-11 平安普惠企业管理有限公司 Intention recognition method and device, computer equipment and storage medium
CN114880438A (en) * 2022-06-08 2022-08-09 深圳市赛为智能股份有限公司 Legal recommendation method and device, computer equipment and storage medium

Also Published As

Publication number Publication date
CN113239693B (en) 2023-10-27

Similar Documents

Publication Publication Date Title
US11062181B2 (en) Image classification neural networks
CN116194912A (en) Method and system for aspect-level emotion classification using graph diffusion transducers
CN111695352A (en) Grading method and device based on semantic analysis, terminal equipment and storage medium
EP3926531A1 (en) Method and system for visio-linguistic understanding using contextual language model reasoners
CN113343982B (en) Entity relation extraction method, device and equipment for multi-modal feature fusion
CN113836992B (en) Label identification method, label identification model training method, device and equipment
CN110990555B (en) End-to-end retrieval type dialogue method and system and computer equipment
CN111475617A (en) Event body extraction method and device and storage medium
CN111368037A (en) Text similarity calculation method and device based on Bert model
CN113159013B (en) Paragraph identification method, device, computer equipment and medium based on machine learning
CN113239693B (en) Training method, device, equipment and storage medium of intention recognition model
CN113158687B (en) Semantic disambiguation method and device, storage medium and electronic device
CN110866098A (en) Machine reading method and device based on transformer and lstm and readable storage medium
CN112183111A (en) Long text semantic similarity matching method and device, electronic equipment and storage medium
CN113297366A (en) Multi-turn dialogue emotion recognition model training method, device, equipment and medium
CN111898363B (en) Compression method, device, computer equipment and storage medium for long and difficult text sentence
CN113255328A (en) Language model training method and application method
CN113836303A (en) Text type identification method and device, computer equipment and medium
CN111967253A (en) Entity disambiguation method and device, computer equipment and storage medium
CN112668333A (en) Named entity recognition method and device, and computer-readable storage medium
CN112395412A (en) Text classification method, device and computer readable medium
CN115878805A (en) Emotion analysis method and device, electronic equipment and storage medium
CN113435531B (en) Zero sample image classification method and system, electronic equipment and storage medium
CN108536791B Neural network-based searching method, equipment and storage medium
CN112487813B (en) Named entity recognition method and system, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant