CN108920622B - Training method, training device and recognition device for intention recognition - Google Patents

Training method, training device and recognition device for intention recognition

Info

Publication number
CN108920622B
CN108920622B (application CN201810694995.4A)
Authority
CN
China
Prior art keywords
model
training
recognition
vector
loss function
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810694995.4A
Other languages
Chinese (zh)
Other versions
CN108920622A (en)
Inventor
符文君
吴友政
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing QIYI Century Science and Technology Co Ltd
Original Assignee
Beijing QIYI Century Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing QIYI Century Science and Technology Co Ltd filed Critical Beijing QIYI Century Science and Technology Co Ltd
Priority to CN201810694995.4A priority Critical patent/CN108920622B/en
Publication of CN108920622A publication Critical patent/CN108920622A/en
Application granted granted Critical
Publication of CN108920622B publication Critical patent/CN108920622B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/30Semantic analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)

Abstract

A training method, a training device, and a recognition device for intention recognition are provided. The method comprises the following steps: obtaining corpus text vectors corresponding to the corpora in a corpus database; constructing a joint loss function formula for the training models; acquiring training data; segmenting the training data, mapping the segmented training data to the corresponding corpus text vectors, and recording them as training vectors; predicting on the training vectors with each training model and calculating the loss function value of each model from its prediction result; and calculating a joint loss function value from the loss function values of the models, judging whether the joint loss function value is smaller than a set threshold, finishing training if it is, and otherwise updating the parameters of each model and continuing iterative training. In this way, the generalization capability of intention recognition can be improved as far as possible, and the problems of semantic ambiguity and fault tolerance are alleviated.

Description

Training method, training device and recognition device for intention recognition
Technical Field
The invention relates to the field of intelligent technologies, and in particular to a training method, a training device, and a recognition device for intention recognition based on multi-task learning.
Background
"intention recognition" refers to determining the category of intention to which a user inputs a piece of information for expressing the requirements of a query. The current intention recognition technology is mainly applied to search engines, man-machine conversation systems and the like, and can be particularly divided into methods based on templates/word banks and methods based on supervised classification, wherein the method based on the templates/word banks is used for mining specific intention templates/word banks from historical input of users, and if the input of the users is matched with words in templates/word banks of corresponding categories, the input is considered to belong to the intention category; the method based on supervised classification builds an intention classification model based on historical input data and predicts the input intention category of the user. The applicant has found through research that the defects of the prior art are mainly in the following aspects:
First, generalization capability: a template-based method is limited by the coverage of its templates and lexicons, while a supervised-classification-based method is limited by the scale and quality of its training corpus.
Second, the problems of ambiguity, missing semantics, and fault tolerance: short texts are often incomplete, semantically missing, or mistyped. For example, a user inputs "play bear at night" while actually wanting to search for "the midnight palace of the teddy bear".
In addition, some prior art attempts classification based on multi-task learning. However, such methods have two problems. First, they obtain a text vector by training on other tasks, splice it with the text vector of the target task, and then train the classifier for the target task, so errors from the other tasks may negatively affect the target task. Second, if the other tasks are irrelevant to the current task, a large amount of irrelevant external information is introduced, which may disturb the classification result.
Therefore, how to solve the problems of missing semantics, fault tolerance, ambiguity, and generalization in intention recognition has become one of the technical problems urgently to be solved by those skilled in the art.
Disclosure of Invention
In view of this, embodiments of the present invention provide a training method, a training device, and a recognition device for intent recognition based on multi-task learning, so as to solve the problems of model generalization ability, ambiguity, and semantic fault tolerance during intent recognition.
In order to achieve the above purpose, the embodiments of the present invention provide the following technical solutions:
a training method of intent recognition, comprising:
mapping the corpora in the corpus text to a semantic space to obtain a low-dimensional dense vector corresponding to the corpora, and recording the low-dimensional dense vector as a corpus text vector;
constructing a joint loss function formula of the training model, wherein the training model comprises an intention recognition model, a similar short text generation model and an entity recognition model;
acquiring training data for the training model;
segmenting the training data, mapping the processed training data to corresponding corpus text vectors, and recording the mapped training data as training vectors;
inputting the training vector into the training model, and calculating loss function values of the intention recognition model, the similar short text generation model and the entity recognition model based on a prediction result output by the training model and a real result corresponding to the training vector;
substituting the loss function values of the models into the joint loss function formula to obtain joint loss function values;
and judging whether the joint loss function value is smaller than a set value, if not, adjusting training parameters of the intention recognition model, the similar short text generation model and the entity recognition model to reduce the loss function value of the models, continuing iterative training, and if so, finishing the training.
Preferably, in the training method for intention recognition, the constructing a joint loss function formula of the training model includes:
constructing a joint loss function formula of the intention recognition model, the similar short text generation model, and the entity recognition model: loss_total = α*loss_intent_recognition + β*loss_sim_query_generation + γ*loss_entity_recognition, where α, β, and γ are preset loss weighting factors, loss_intent_recognition is the loss function of the intention recognition model, loss_sim_query_generation is the loss function of the similar short text generation model, and loss_entity_recognition is the loss function of the entity recognition model.
Preferably, in the above training method for intention recognition, the mapping corpora in the corpus text to a semantic space to obtain a low-dimensional dense vector corresponding to the corpora, includes:
segmenting the corpora in the corpus database at the corresponding character-level, word-level, or pinyin-level granularity;
training the segmented text based on a neural network model and expressing the segmented text as low-dimensional dense vectors, wherein the low-dimensional dense vectors comprise character vectors, word vectors, or pinyin vectors.
Preferably, in the above training method for intention recognition, modeling the intention recognition task to obtain the intention recognition model includes:
modeling the intention recognition task with an LSTM model to obtain the intention recognition model, wherein the input of the intention recognition model is a query and its output is an intention category label.
Preferably, in the above training method for intention recognition, modeling the similar short texts to obtain the similar short text generation model includes:
establishing the similar short text generation model for similar short texts with a Seq2Seq model, wherein the input of the similar short text generation model is a query input by a user and its output is a similar short text.
Preferably, in the above training method for intention recognition, modeling the entity recognition task to obtain the entity recognition task model includes:
constructing a multi-classification model based on a convolutional neural network as the entity recognition task model, and training the entity recognition task with the multi-classification model based on training data, wherein the training data is context text containing the entities contained in the text to be recognized, the model input is a short text containing an entity, and the output is an entity type tag.
An intent recognition training apparatus comprising:
the corpus vector training unit is used for mapping the corpus in the corpus text to a semantic space to obtain a low-dimensional dense vector corresponding to the corpus and marking the low-dimensional dense vector as a corpus text vector;
the model storage unit is used for storing an intention recognition model, a similar short text generation model and an entity recognition model;
the combined loss function formula storage unit is used for storing a combined loss function formula of a training model, and the training model comprises an intention recognition model, a similar short text generation model and an entity recognition model;
a training data acquisition unit for acquiring training data for the training model;
the training vector acquisition unit is used for segmenting the training data, mapping the processed training data to corresponding corpus text vectors and recording the mapped corpus text vectors as training vectors;
a loss function value calculation unit, configured to input the training vector to the training model, calculate a loss function value of the intention recognition model, the similar short text generation model, and the entity recognition model based on a prediction result output by the training model and a real result corresponding to the training vector, and bring the loss function value of each model into the joint loss function formula to obtain a joint loss function value;
and the parameter adjusting unit is used for judging whether the joint loss function value is smaller than a set value, if not, adjusting the training parameters of the intention recognition model, the similar short text generation model and the entity recognition model so as to reduce the loss function value of the models, continuing iterative training, and if so, finishing the training.
Preferably, in the training apparatus for intention recognition, the joint loss function formula is:
loss_total = α*loss_intent_recognition + β*loss_sim_query_generation + γ*loss_entity_recognition, where α, β, and γ are preset loss weighting factors, loss_intent_recognition is the loss function of the intention recognition model, loss_sim_query_generation is the loss function of the similar short text generation model, and loss_entity_recognition is the loss function of the entity recognition model.
Preferably, in the training apparatus for intention recognition, the corpus vector training unit is specifically configured to:
segmenting the corpora in the corpus database at the corresponding character-level, word-level, or pinyin-level granularity;
training the segmented text based on a neural network model and expressing the segmented text as low-dimensional dense vectors, wherein the low-dimensional dense vectors comprise character vectors, word vectors, or pinyin vectors.
Preferably, in the intention recognition training device, the intention recognition model stored in the model storage unit is:
modeling aiming at an intention recognition task by adopting an LSTM model to obtain an intention recognition model, wherein the input of the intention recognition model is a query, and the output of the intention recognition model is an intention category label;
the similar short text generation model stored in the model storage unit is as follows:
establishing the similar short text generation model for similar short texts with a Seq2Seq model, wherein the input of the similar short text generation model is a query input by a user and its output is a similar short text;
the entity recognition task model stored in the model storage unit is as follows:
the multi-classification model is constructed based on a convolutional neural network, and its training data is context text containing the entities contained in the text to be recognized; the model input is a short text containing an entity, and the output is an entity type tag.
An intent recognition device comprising a memory and a processor;
the memory is stored with an intention recognition model, a similar short text generation model and an entity recognition model obtained by training with any intention recognition training method, and the processor is used for calling and executing the intention recognition model, the similar short text generation model and the entity recognition model when a query of a user is obtained.
Based on the above technical solution, the scheme provided by the embodiments of the present invention obtains an intention recognition model, a similar short text generation model, and an entity recognition model through multi-task learning and uses them to process the input text. In this way, relevant linguistic knowledge can be learned more effectively, the generalization capability of the intention recognition model is improved, and the semantic ambiguity problems in intention recognition can be alleviated as far as possible.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the provided drawings without creative efforts.
Fig. 1 is a schematic flowchart of a training method for intention recognition disclosed in an embodiment of the present application;
FIG. 2 is an example of building an intent recognition model based on an LSTM model;
FIG. 3 is a schematic diagram of an intention recognition training apparatus provided in the present application;
fig. 4 is a schematic structural diagram of an intention identifying apparatus disclosed in an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In view of the problem in the prior art that ambiguity often occurs when intention recognition is performed on input text, the present application discloses a training method for intention recognition, which may include the following steps:
step S101: training the linguistic data in the expectation database to obtain a text vector corresponding to the linguistic data;
specifically, in this step, the corpora in the corpus text are mapped to a semantic space, so as to obtain low-dimensional dense vectors corresponding to the corpora, which are recorded as corpus text vectors, the low-dimensional dense vectors may be word vectors, and pinyin vectors, and during training, the adopted training model may adopt models such as word2 vec. The corpus database stores corpus such as historical text input information, click logs and voice recognition result information of a user, the corpus is in a text form, the text input information is input data input by the user through input equipment, the click logs refer to link contents clicked by the user, for example, the user searches, when a link is clicked and opened through a mouse, the clicked contents and corresponding query are stored in the click logs, the voice information is voice contents when the user searches through voice or performs other voice interaction, and the voice information can be converted into matched characters through a voice recognition technology and then is stored in the corpus database as voice corpus. Specifically, the corpus database may capture corresponding corpora from logs of each system in the electronic device through a preset data capture application, for example, the corpora in the corpus database may be obtained from mass short message data such as a search query log, a click log, a question and answer knowledge base title, a microblog, a voice conversation log and the like through the data capture application, and the captured historical input text information, click log information, voice information and the like are stored in the corpus database.
The method specifically comprises: first, a text (corpus) in the corpus database is segmented at the corresponding character-level, word-level, or pinyin-level granularity; then training is performed based on a neural network model, and the segmented text is expressed as a low-dimensional dense real-valued vector. For example, if the text is "中国人民" ("the Chinese people"), character-level segmentation yields "中/国/人/民", word-level segmentation yields "中国/人民", and pinyin-level segmentation yields "zhongguo/renmin". Obtaining the character vectors of a text alleviates the out-of-vocabulary problem, and obtaining the pinyin vectors of a text makes the scheme tolerant of speech recognition conversion errors. Of course, to facilitate data storage and retrieval, the corpus text vectors corresponding to each corpus can be stored in matrix form: after the low-dimensional dense vectors of a corpus are obtained, they are combined based on the word2vec model into the feature vector matrix of the text. For example, if 100-dimensional character vectors are chosen, then after training "中/国/人/民" yields a corresponding 4 × 100 character vector matrix, where each row corresponds to the vector of one character and each column to one dimension of the character vector, as follows:
"中" [-0.1111, 0.111, … , 0.1233]
"国" [0.5324, 0.3678, … , -0.1111]
"人" [-0.1111, 0.1915, … , 0.4996]
"民" [-0.1111, 0.8113, … , 0.3267]
Step S102: constructing a joint loss function formula of the training model, wherein the training model comprises an intention recognition model, a similar short text generation model and an entity recognition model;
before constructing the joint loss function formula of the training model, modeling needs to be performed for each task to obtain the training model, specifically:
modeling is respectively carried out on the intention recognition task, the similar short text task and the entity recognition task to obtain an intention recognition model, a similar short text generation model and an entity recognition model;
for the intention recognition task, when the intention recognition task is trained, training data are query input by a user and an intention category label corresponding to the query, after the query is subjected to word segmentation or word segmentation, each word or word is mapped to a corresponding vector based on the training result of step S101, then each word or word is mapped to all corresponding vectors to be averaged, a vector corresponding to the query is obtained and is marked as V1, then entity recognition is performed on the query, and whether the query matches a specific pattern in a preset regular pattern table is checked, for example: "play laojiu gate", laojiu gate is identified as album by the regular pattern table of presetting, match the mode in the regular pattern table: and playing album, and obtaining a corresponding K-dimensional vector V2 by the query, wherein K is the number of the modes in the regular mode table, if the ith dimension value is 1, the query matches the ith mode in the regular mode table, and if the ith dimension value is 0, the query does not match the modes in the regular mode table. V1 and V2 are then used as inputs to the preset neural network model. In this embodiment, the structure of the neural network includes an input layer, an lstm layer, a dense layer, and a skip layer. See the example of fig. 2 in particular.
In this embodiment, for example, for a video search engine, if the input query is "play Old Nine Gates", the output intention category label is "PLAY_VIDEO"; if the input query is "download Honor of Kings", the output label is "DOWNLOAD_GAME"; and if the input query is "membership recharge", the output label is "PURCHASE".
For the similar short text generation task, the training data are similar short text pairs, which specifically comprise the following three types: first, pairs of queries that clicked the same document; second, queries belonging to the same session; third, a query and the title of the document it clicked. When modeling the similar short text generation task, the similar short texts are modeled based on a Seq2Seq model to obtain the similar short text generation model, whose input is a user's query and whose output is a short text similar to that query.
The goal of the Seq2Seq model is to output a sequence Y given a sequence X, and its loss function value is calculated with cross entropy. The Seq2Seq model consists of an encoder and a decoder: the encoder neural network converts the input sequence into a fixed-length vector, and the decoder neural network generates a new sequence from the vector produced by the encoder. During model training, the vectors of the encoder neural network are shared with the other tasks. When predicting text, for example, if the model input is "how to buy a membership", the encoder network feeds the text into the network word by word or character by character and converts it into a fixed-length vector, and the decoder network converts that fixed-length vector into a new text output, for example "how to recharge a membership".
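A minimal PyTorch sketch of such an encoder-decoder structure follows; the vocabulary size, hidden size, and class name are assumptions chosen purely for illustration, and the embedding table stands in for the input vectors shared with the other tasks.

    # Minimal Seq2Seq sketch: encoder compresses the query, decoder emits a similar short text.
    import torch
    import torch.nn as nn

    class SimQuerySeq2Seq(nn.Module):
        def __init__(self, vocab_size, emb_dim=100, hidden=128):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, emb_dim)     # shared with the other tasks
            self.encoder = nn.LSTM(emb_dim, hidden, batch_first=True)
            self.decoder = nn.LSTM(emb_dim, hidden, batch_first=True)
            self.out = nn.Linear(hidden, vocab_size)

        def forward(self, src_ids, tgt_ids):
            _, state = self.encoder(self.embed(src_ids))       # fixed-length encoder state
            dec_out, _ = self.decoder(self.embed(tgt_ids), state)
            return self.out(dec_out)                           # token logits, trained with cross entropy

    model = SimQuerySeq2Seq(vocab_size=5000)
    logits = model(torch.randint(0, 5000, (1, 6)), torch.randint(0, 5000, (1, 7)))
    loss = nn.CrossEntropyLoss()(logits.view(-1, 5000), torch.randint(0, 5000, (7,)))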
For the entity recognition task, a multi-classification model is constructed based on a convolutional neural network to serve as the entity recognition model, and the entity recognition task is trained with this multi-classification model based on training data. The training data of the entity recognition model is context text containing the entities contained in the text to be recognized; the model input is a short text containing an entity, and the output is an entity type tag. Specific sources of the training data may be corpus resources publicly available on the Internet, such as encyclopedias, or corpus data constructed through manual annotation by application-related personnel.
In this embodiment, the convolutional neural network architecture includes the following layers: a vector lookup layer, linear transformation layer 1, a Sigmoid layer, linear transformation layer 2, and a whole-sentence analysis layer. When the convolutional neural network is trained, the neural network model is trained with a cross-entropy loss based on the stochastic gradient descent algorithm. When predicting the entity type label of an input text, for example for the input "北京市" ("Beijing City"), at time t the Chinese character to be processed is "北" ("north"); "北" is mapped to a real-valued vector by the vector lookup layer and passed to the linear transformation layer, and after processing by the linear layers and the sigmoid layer, the system scores all possible tags for "北", where a higher score means a higher tagging probability. At the next time t+1, the system then processes the next Chinese character "京" in the sentence. At the sentence analysis layer, a score network is generated for the processed text: the nodes in column t are the scores of all entity type tags for the Chinese character processed at time t, and the edges between the nodes in column t and the nodes in column t+1 carry transition probabilities describing how likely one tag is to be followed by another. Finally, based on the Viterbi algorithm, the path with the highest overall score in the network is found and used as the final tag sequence. For example, the entity type tag path corresponding to "北京市" is "B-LOC I-LOC I-LOC".
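The sketch below shows only the whole-sentence analysis step under hand-picked scores: given per-character tag scores and tag-to-tag transition scores, a standard Viterbi decode picks the highest-scoring tag path. The score values are made up for illustration and are not produced by the network described above.

    # Viterbi decode over a score network: emit[t][j] scores tag j at character t,
    # trans[i][j] scores moving from tag i to tag j.
    import numpy as np

    def viterbi(emit, trans):
        T, L = emit.shape
        score = emit[0].copy()
        back = np.zeros((T, L), dtype=int)
        for t in range(1, T):
            cand = score[:, None] + trans + emit[t][None, :]   # all (prev tag, next tag) pairs
            back[t] = cand.argmax(axis=0)
            score = cand.max(axis=0)
        path = [int(score.argmax())]
        for t in range(T - 1, 0, -1):
            path.append(int(back[t, path[-1]]))
        return path[::-1]

    tags = ["B-LOC", "I-LOC", "O"]
    emit = np.array([[2.0, 0.1, 0.3], [0.2, 2.5, 0.4], [0.1, 2.0, 0.9]])   # e.g. "北", "京", "市"
    trans = np.array([[0.1, 1.0, 0.1], [0.1, 1.0, 0.5], [0.5, 0.1, 0.5]])
    print([tags[i] for i in viterbi(emit, trans)])                          # ['B-LOC', 'I-LOC', 'I-LOC']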
After the training models are built, the loss function of each training model is constructed: the loss function of the intention recognition model is loss_intent_recognition, the loss function of the entity recognition model is loss_entity_recognition, and the loss function of the similar short text generation model is loss_sim_query_generation.
The values of loss_intent_recognition, loss_entity_recognition, and loss_sim_query_generation can be obtained by the mean square error method or the cross-entropy method when training the corresponding model. A specific loss weighting factor is assigned to each loss function value, and the weighted sum of the three loss function values is used as the joint loss function value of the intention recognition model, the similar short text generation model, and the entity recognition model, i.e. the joint loss function formula:
loss_total = α*loss_intent_recognition + β*loss_sim_query_generation + γ*loss_entity_recognition
is used to compute the joint loss function value loss_total, where α, β, and γ are the loss weighting factors of loss_intent_recognition, loss_sim_query_generation, and loss_entity_recognition respectively. α, β, and γ take values in (0, 1), and the specific values can be adjusted according to user requirements. That is, when x is the user input, the loss function of the intention recognition model is f1(x), corresponding to loss_intent_recognition; the loss function of the similar short text generation model is f2(x), corresponding to loss_sim_query_generation; and the loss function of the entity recognition model is f3(x), corresponding to loss_entity_recognition. The joint loss function is then the linear weighting of the three loss functions, α*f1(x) + β*f2(x) + γ*f3(x).
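As a tiny sketch of this formula, the helper below simply computes the weighted sum; the loss values and the default weights are placeholder numbers rather than values taught by this embodiment.

    # loss_total = α*loss_intent_recognition + β*loss_sim_query_generation + γ*loss_entity_recognition
    def joint_loss(loss_intent, loss_sim_query, loss_entity,
                   alpha=0.4, beta=0.3, gamma=0.3):
        return alpha * loss_intent + beta * loss_sim_query + gamma * loss_entity

    loss_total = joint_loss(0.82, 1.10, 0.45)   # 0.4*0.82 + 0.3*1.10 + 0.3*0.45 = 0.793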
Step S103: acquiring training data for the training model;
wherein the training data comprises training data for an intent recognition model, a similar short text generation model, and an entity recognition model;
in this step, the training data is some data in a preset training data set, specific data content of the training data can be selected according to user requirements, the training data is used as an input text of each model, and an actual similar short text and a category value corresponding to each training data are known.
Step S104: carrying out segmentation processing on the training data;
the method specifically comprises the following steps: performing word segmentation, word segmentation or word and word mixed segmentation processing on the training data of the intention recognition model, the similar short text generation model and the entity recognition model;
step S105: mapping the processed training data to corresponding corpus text vectors, and recording the training vectors as training vectors;
Word segmentation, character segmentation, or mixed segmentation is performed on the input text of each model. When the input text is Chinese, a segmentation model based on hidden Markov models, characters, or conditional random fields may be selected to segment the text; when the input text is English, punctuation and spaces are used as separators. If this implementation selects word segmentation and the input text is "我想看老九门" ("I want to watch Old Nine Gates"), it is segmented into "我/想/看/老九门" ("I/want/watch/Old Nine Gates"); if the input text is "I want to play games", it is segmented into "I/want/to/play/games". If this implementation selects character segmentation, the text is segmented character by character, for example "我想看老九门" is segmented into "我/想/看/老/九/门" ("I/want/watch/old/nine/gates"). In addition, according to application requirements, mixed word-and-character segmentation may also be selected: the Chinese portion of the input text is segmented by characters and the English portion by words, for example "我/想/看/Billions".
After the training data input to each model is segmented, the segmented training data is mapped to the corpus text vectors obtained in step S101. The specific process is: for each training sample, the corpus text vector of the identical corpus in the corpus database is obtained, and this corpus text vector, which may be a character vector, a word vector, or a pinyin vector, is the training vector corresponding to the training data. The three models may share one set of character/word/pinyin vectors.
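A small sketch of steps S104-S105 under the character-segmentation choice follows; w2v stands for the shared vector table produced in step S101 (for instance the one trained in the earlier sketch), and the helper name is an assumption.

    # Segment a training text by characters and map it to the shared corpus text vectors.
    def to_training_vectors(text, w2v):
        tokens = list(text)                                # "我想看老九门" -> ["我","想","看","老","九","门"]
        return [w2v.wv[t] for t in tokens if t in w2v.wv]  # the three models share this vector table

    train_vecs = to_training_vectors("我想看老九门", w2v)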
Step S106: inputting the training vector into the training model, and calculating loss function values of the intention recognition model, the similar short text generation model and the entity recognition model based on a prediction result output by the training model and a real result corresponding to the training vector;
in this step, when each model is trained, each task is cycled, a training vector corresponding to a part of training data is randomly extracted as a model input, and a loss function value of each model is calculated according to an output result of the model and a real result corresponding to the training data.
Step S107: calculating to obtain a joint loss function value based on the joint loss function formula and the loss function values of the models;
in this step, each loss function value calculated in step S107 is substituted into the joint loss function formula, so that the joint loss function value can be obtained.
Step S108: judging whether the joint loss function value is smaller than a set value, if so, finishing training, and if not, executing the step S109;
step S109: and updating parameters of the intention recognition model, the similar short text generation model and the entity recognition model based on a back propagation algorithm, synchronously updating the input vector shared by the three models as a model parameter, and continuously executing the step S103 for iterative training.
In this scheme, the input text is processed in a multi-task learning manner, so relevant linguistic knowledge can be learned more effectively, the generalization capability of the intention recognition model is improved, and the problems of missing semantics, fault tolerance, ambiguity, and generalization in intention recognition are alleviated as far as possible. In particular, the parameters of the intention recognition model, the similar short text generation model, and the entity recognition model can be updated based on the back-propagation algorithm.
Corresponding to the above method, the present application also discloses an intention recognition training device, see fig. 3, which includes:
the corpus vector training unit 01 is used for mapping the corpus in the corpus text to a semantic space to obtain a low-dimensional dense vector corresponding to the corpus, and marking the low-dimensional dense vector as a corpus text vector;
the model storage unit 02 is used for storing an intention recognition model, a similar short text generation model and an entity recognition model;
the combined loss function formula storage unit 03 is used for storing a combined loss function formula of a training model, wherein the training model comprises an intention recognition model, a similar short text generation model and an entity recognition model;
a training data acquisition unit 04 for acquiring training data for the training model;
the training vector acquisition unit 05 is used for segmenting the training data, mapping the processed training data to corresponding corpus text vectors and recording the mapped corpus text vectors as training vectors;
a loss function value calculation unit 06, configured to input the training vector to the training model, calculate a loss function value of the intention recognition model, the similar short text generation model, and the entity recognition model based on a prediction result output by the training model and a real result corresponding to the training vector, and bring the loss function value of each model into the joint loss function formula to obtain a joint loss function value;
and the parameter adjusting unit 07 is used for judging whether the joint loss function value is smaller than a set value, if not, adjusting the training parameters of the intention recognition model, the similar short text generation model and the entity recognition model so as to reduce the loss function value of the models, continuing iterative training, and if so, finishing the training.
Corresponding to the method, the corpus database stores historical text input information, click logs and voice information of users, and can train corpus text vectors and construct training data sets of three models based on the information.
Corresponding to the above method, the corpus vector training unit is specifically configured to, when training a vector:
segmenting the corpora in the corpus database at the corresponding character-level, word-level, or pinyin-level granularity;
training the segmented text based on a neural network model and expressing the segmented text as low-dimensional dense vectors, wherein the low-dimensional dense vectors comprise character vectors, word vectors, or pinyin vectors.
Corresponding to the above method, the intention recognition model stored in the model storage unit is:
modeling aiming at an intention recognition task by adopting an LSTM model to obtain an intention recognition model, wherein the input of the intention recognition model is a query, and the output of the intention recognition model is an intention category label;
the similar short text generation model stored in the model storage unit is as follows:
establishing the similar short text generation model for similar short texts with a Seq2Seq model, wherein the input of the similar short text generation model is a query input by a user and its output is a similar short text;
the entity recognition task model stored in the model storage unit is as follows:
the multi-classification model is constructed based on a convolutional neural network, and its training data is context text containing the entities contained in the text to be recognized; the model input is a short text containing an entity, and the output is an entity type tag.
Corresponding to the above training method for intention recognition, the present application also discloses an intention recognition device, and referring to fig. 4, the intention recognition device disclosed in the embodiments of the present application is schematically configured, and the device may include:
a memory 100 and a processor 200;
the intention recognition device further comprises a communication interface 300 and a communication bus 400, wherein the memory 100, the processor 200 and the communication interface 300 are all in communication with each other via the communication bus 400.
The memory 100 is used for storing program codes; the program code includes computer operational instructions. Specifically, the memory stores program codes of an intention recognition model, a similar short text generation model, and an entity recognition model obtained by training the intention recognition training method disclosed in any one of the above embodiments of the present application.
Memory 100 may comprise a high-speed RAM memory and may also include a non-volatile memory, such as at least one disk memory.
The processor 200 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement embodiments of the present invention. The processor 200 is used to call and execute the program code; specifically, the processor is configured to call and execute the intention recognition model, the similar short text generation model, and the entity recognition model when a query of a user is obtained.
For convenience of description, the above system is described with the functions divided into various modules, which are described separately. Of course, the functionality of the various modules may be implemented in the same one or more software and/or hardware implementations as the present application.
The embodiments in the present specification are described in a progressive manner, and the same and similar parts among the embodiments are referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, the system or system embodiments are substantially similar to the method embodiments and therefore are described in a relatively simple manner, and reference may be made to some of the descriptions of the method embodiments for related points. The above-described system and system embodiments are only illustrative, wherein the units described as separate parts may or may not be physically separate, and the parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Those of skill would further appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative components and steps have been described above generally in terms of their functionality in order to clearly illustrate this interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in Random Access Memory (RAM), memory, Read Only Memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
It is further noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (11)

1. A training method for intention recognition, comprising:
mapping the corpora in the corpus database text to a semantic space to obtain low-dimensional dense vectors corresponding to the corpora, and recording the low-dimensional dense vectors as corpus text vectors, wherein the corpora comprise users' historical text input information, click logs, and speech recognition result information;
constructing a joint loss function formula of a training model, wherein the training model comprises an intention recognition model, a similar short text generation model and an entity recognition model;
acquiring training data for the training model;
segmenting the training data, mapping the processed training data to corresponding corpus text vectors, and recording the mapped training data as training vectors;
inputting the training vector into the training model, and calculating loss function values of the intention recognition model, the similar short text generation model and the entity recognition model based on a prediction result output by the training model and a real result corresponding to the training vector; wherein the inputting the training vector to the training model comprises: circularly traversing each model included in the training model, randomly extracting a training vector corresponding to a part of training data as model input, and inputting the training vector into the model;
substituting the loss function values of the models into the joint loss function formula to obtain joint loss function values;
and judging whether the joint loss function value is smaller than a set value, if not, adjusting training parameters of the intention recognition model, the similar short text generation model and the entity recognition model to reduce the loss function value of the models, continuing iterative training, and if so, finishing the training.
2. The training method for intention recognition according to claim 1, wherein constructing the joint loss function formula of the training model comprises:
constructing a joint loss function formula of the intention recognition model, the similar short text generation model, and the entity recognition model: loss_total = α*loss_intent_recognition + β*loss_sim_query_generation + γ*loss_entity_recognition, where α, β, and γ are preset loss weighting factors, loss_intent_recognition is the loss function of the intention recognition model, loss_sim_query_generation is the loss function of the similar short text generation model, and loss_entity_recognition is the loss function of the entity recognition model.
3. The method for training intent recognition according to claim 1, wherein the mapping corpora in the corpus text to semantic space to obtain the low-dimensional dense vector corresponding to the corpora comprises:
segmenting the corpora in the corpus database at the corresponding character-level, word-level, or pinyin-level granularity;
training the segmented text based on a neural network model and expressing the segmented text as low-dimensional dense vectors, wherein the low-dimensional dense vectors comprise character vectors, word vectors, or pinyin vectors.
4. The training method for intention recognition according to claim 1, wherein modeling the intention recognition task to obtain the intention recognition model comprises:
modeling the intention recognition task with an LSTM model to obtain the intention recognition model, wherein the input of the intention recognition model is a query and its output is an intention category label.
5. The training method for intention recognition according to claim 1, wherein modeling the similar short texts to obtain the similar short text generation model comprises:
establishing the similar short text generation model for similar short texts with a Seq2Seq model, wherein the input of the similar short text generation model is a query input by a user and its output is a similar short text.
6. The training method for intention recognition according to claim 1, wherein modeling for the entity recognition task to obtain an entity recognition task model comprises:
constructing a multi-classification model based on a convolutional neural network as the entity recognition task model, and training the entity recognition task with the multi-classification model based on training data, wherein the training data is context text containing the entities contained in the text to be recognized, the model input is a short text containing an entity, and the output is an entity type tag.
7. An intent recognition training device, comprising:
the corpus vector training unit is used for mapping the corpus in the corpus database text to a semantic space to obtain a low-dimensional dense vector corresponding to the corpus and marking the low-dimensional dense vector as a corpus database text vector, wherein the corpus comprises historical text input information, a click log and voice recognition result information of a user;
the model storage unit is used for storing an intention recognition model, a similar short text generation model and an entity recognition model;
the combined loss function formula storage unit is used for storing a combined loss function formula of a training model, and the training model comprises an intention recognition model, a similar short text generation model and an entity recognition model;
a training data acquisition unit for acquiring training data for the training model;
the training vector acquisition unit is used for segmenting the training data, mapping the processed training data to corresponding corpus text vectors and recording the mapped corpus text vectors as training vectors;
a loss function value calculation unit, configured to input the training vector to the training model, calculate a loss function value of the intention recognition model, the similar short text generation model, and the entity recognition model based on a prediction result output by the training model and a real result corresponding to the training vector, and bring the loss function value of each model into the joint loss function formula to obtain a joint loss function value; wherein the inputting the training vector to the training model comprises: circularly traversing each model included in the training model, randomly extracting a training vector corresponding to a part of training data as model input, and inputting the training vector into the model;
and the parameter adjusting unit is used for judging whether the joint loss function value is smaller than a set value, if not, adjusting the training parameters of the intention recognition model, the similar short text generation model and the entity recognition model so as to reduce the loss function value of the models, continuing iterative training, and if so, finishing the training.
8. The training device for intention recognition of claim 7, wherein the joint loss function is formulated as:
loss_total = α*loss_intent_recognition + β*loss_sim_query_generation + γ*loss_entity_recognition, where α, β, and γ are preset loss weighting factors, loss_intent_recognition is the loss function of the intention recognition model, loss_sim_query_generation is the loss function of the similar short text generation model, and loss_entity_recognition is the loss function of the entity recognition model.
9. The training device for intention recognition according to claim 7, wherein the corpus vector training unit is specifically configured to:
segmenting the corpora in the corpus database at the corresponding character-level, word-level, or pinyin-level granularity;
training the segmented text based on a neural network model and expressing the segmented text as low-dimensional dense vectors, wherein the low-dimensional dense vectors comprise character vectors, word vectors, or pinyin vectors.
10. The training apparatus for intention recognition according to claim 7, wherein the intention recognition models stored in the model storage unit are:
modeling aiming at an intention recognition task by adopting an LSTM model to obtain an intention recognition model, wherein the input of the intention recognition model is a query, and the output of the intention recognition model is an intention category label;
the similar short text generation model stored in the model storage unit is as follows:
establishing the similar short text generation model for similar short texts with a Seq2Seq model, wherein the input of the similar short text generation model is a query input by a user and its output is a similar short text;
the entity recognition task model stored in the model storage unit is as follows:
the multi-classification model is constructed based on a convolutional neural network, and its training data is context text containing the entities contained in the text to be recognized; the model input is a short text containing an entity, and the output is an entity type tag.
11. An intent recognition device comprising a memory and a processor;
the memory stores an intention recognition model, a similar short text generation model and an entity recognition model which are obtained by training through the intention recognition training method according to any one of claims 1 to 6, and the processor is used for calling and executing the intention recognition model, the similar short text generation model and the entity recognition model when a query of a user is obtained.
CN201810694995.4A 2018-06-29 2018-06-29 Training method, training device and recognition device for intention recognition Active CN108920622B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810694995.4A CN108920622B (en) 2018-06-29 2018-06-29 Training method, training device and recognition device for intention recognition

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810694995.4A CN108920622B (en) 2018-06-29 2018-06-29 Training method, training device and recognition device for intention recognition

Publications (2)

Publication Number Publication Date
CN108920622A CN108920622A (en) 2018-11-30
CN108920622B true CN108920622B (en) 2021-07-20

Family

ID=64423645

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810694995.4A Active CN108920622B (en) 2018-06-29 2018-06-29 Training method, training device and recognition device for intention recognition

Country Status (1)

Country Link
CN (1) CN108920622B (en)

Families Citing this family (58)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111292752B (en) * 2018-12-06 2023-05-12 北京嘀嘀无限科技发展有限公司 User intention recognition method and device, electronic equipment and storage medium
CN109741751A (en) * 2018-12-11 2019-05-10 上海交通大学 Intension recognizing method and device towards intelligent sound control
CN109545186B (en) * 2018-12-16 2022-05-27 魔门塔(苏州)科技有限公司 Speech recognition training system and method
CN111354354B (en) * 2018-12-20 2024-02-09 深圳市优必选科技有限公司 Training method, training device and terminal equipment based on semantic recognition
CN109857844B (en) * 2018-12-29 2022-01-14 北京三快在线科技有限公司 Intent recognition method and device based on ordering dialogue text and electronic equipment
US20200226476A1 (en) * 2019-01-10 2020-07-16 Visa International Service Association System, Method, and Computer Program Product for Incorporating Knowledge from More Complex Models in Simpler Models
CN109857868A (en) * 2019-01-25 2019-06-07 北京奇艺世纪科技有限公司 Model generating method, file classification method, device and computer readable storage medium
CN111563209B (en) * 2019-01-29 2023-06-30 株式会社理光 Method and device for identifying intention and computer readable storage medium
CN111563208B (en) * 2019-01-29 2023-06-30 株式会社理光 Method and device for identifying intention and computer readable storage medium
CN114254750A (en) * 2019-01-29 2022-03-29 北京金山数字娱乐科技有限公司 Accuracy loss determination method and apparatus
CN109871485B (en) * 2019-02-13 2022-04-05 北京航空航天大学 Personalized recommendation method and device
CN111680514B (en) * 2019-02-25 2024-03-01 北京猎户星空科技有限公司 Information processing and model training method, device, equipment and storage medium
CN111612025B (en) * 2019-02-25 2023-12-12 北京嘀嘀无限科技发展有限公司 Description model training method, text description device and electronic equipment
CN109933663A (en) * 2019-02-26 2019-06-25 上海凯岸信息科技有限公司 Intention assessment algorithm based on embedding method
CN113348502A (en) * 2019-04-17 2021-09-03 深圳市欢太科技有限公司 Voice recognition method, voice recognition device, storage medium and electronic equipment
CN110147535A (en) * 2019-04-18 2019-08-20 平安科技(深圳)有限公司 Similar Text generation method, device, equipment and storage medium
CN110085308B (en) * 2019-04-23 2022-02-25 挂号网(杭州)科技有限公司 Diagnosis and treatment department classification method based on fusion deep learning
CN110287283B (en) * 2019-05-22 2023-08-01 中国平安财产保险股份有限公司 Intention model training method, intention recognition method, device, equipment and medium
CN110134969B (en) * 2019-05-27 2023-07-14 北京奇艺世纪科技有限公司 Entity identification method and device
CN110287285B (en) * 2019-05-31 2023-06-16 平安科技(深圳)有限公司 Method and device for identifying problem intention, computer equipment and storage medium
CN110222339B (en) * 2019-06-05 2023-04-28 深圳市思迪信息技术股份有限公司 Intention recognition method and device based on improved XGBoost algorithm
CN110334344A * 2019-06-13 2019-10-15 腾讯科技(深圳)有限公司 A semantic intention recognition method, device, equipment and storage medium
CN110516035A * 2019-07-05 2019-11-29 同济大学 A human-computer interaction method and system with hybrid modules
CN112185351A (en) * 2019-07-05 2021-01-05 北京猎户星空科技有限公司 Voice signal processing method and device, electronic equipment and storage medium
CN110610001B (en) * 2019-08-12 2024-01-23 大箴(杭州)科技有限公司 Short text integrity recognition method, device, storage medium and computer equipment
CN112417145A (en) * 2019-08-23 2021-02-26 武汉Tcl集团工业研究院有限公司 Text multi-classification model generation method, text processing device and medium
CN110532368B (en) * 2019-09-04 2023-03-14 达闼机器人股份有限公司 Question answering method, electronic equipment and computer readable storage medium
CN110619047B (en) * 2019-09-16 2022-09-02 出门问问信息科技有限公司 Method and device for constructing natural language model and readable storage medium
CN110674253A (en) * 2019-09-23 2020-01-10 出门问问信息科技有限公司 Semantic processing method and equipment
CN110807332B (en) 2019-10-30 2024-02-27 腾讯科技(深圳)有限公司 Training method, semantic processing method, device and storage medium for semantic understanding model
CN112749565A (en) * 2019-10-31 2021-05-04 华为终端有限公司 Semantic recognition method and device based on artificial intelligence and semantic recognition equipment
CN110852108B (en) * 2019-11-11 2022-03-29 中山大学 Joint training method, apparatus and medium for entity recognition and entity disambiguation
CN110956018B (en) * 2019-11-22 2023-04-18 腾讯科技(深圳)有限公司 Training method of text processing model, text processing method, text processing device and storage medium
CN112906370B (en) * 2019-12-04 2022-12-20 马上消费金融股份有限公司 Intention recognition model training method, intention recognition method and related device
CN113010667A (en) * 2019-12-20 2021-06-22 王道维 Training method for machine learning decision model by using natural language corpus
CN111198938B (en) * 2019-12-26 2023-12-01 深圳市优必选科技股份有限公司 Sample data processing method, sample data processing device and electronic equipment
CN111161740A (en) * 2019-12-31 2020-05-15 中国建设银行股份有限公司 Intention recognition model training method, intention recognition method and related device
CN111209383B (en) * 2020-01-06 2023-04-07 广州小鹏汽车科技有限公司 Method and device for processing multi-turn dialogue, vehicle, and storage medium
CN111341309A (en) * 2020-02-18 2020-06-26 百度在线网络技术(北京)有限公司 Voice interaction method, device, equipment and computer storage medium
CN111309915B (en) * 2020-03-03 2022-09-13 爱驰汽车有限公司 Method, system, device and storage medium for training natural language of joint learning
CN111667066A (en) * 2020-04-23 2020-09-15 北京旷视科技有限公司 Network model training and character recognition method and device and electronic equipment
CN111597804B (en) * 2020-05-15 2023-03-10 腾讯科技(深圳)有限公司 Method and related device for training entity recognition model
CN111680517B (en) * 2020-06-10 2023-05-16 北京百度网讯科技有限公司 Method, apparatus, device and storage medium for training model
CN111538894B (en) * 2020-06-19 2020-10-23 腾讯科技(深圳)有限公司 Query feedback method and device, computer equipment and storage medium
CN111737416B (en) * 2020-06-29 2022-08-19 重庆紫光华山智安科技有限公司 Case processing model training method, case text processing method and related device
CN111785350B (en) * 2020-06-30 2023-04-18 易联众信息技术股份有限公司 Information extraction method, application, device and medium
CN111949784A (en) * 2020-08-14 2020-11-17 中国工商银行股份有限公司 Outbound method and device based on intention recognition
CN112102832B (en) * 2020-09-18 2021-12-28 广州小鹏汽车科技有限公司 Speech recognition method, speech recognition device, server and computer-readable storage medium
CN112183631B (en) * 2020-09-28 2024-01-12 云知声智能科技股份有限公司 Method and terminal for establishing intention classification model
CN112256826A (en) * 2020-10-19 2021-01-22 网易(杭州)网络有限公司 Emotion analysis method, evaluation method and emotion analysis model training method and device
CN112417116B (en) * 2020-11-18 2022-03-15 四川长虹电器股份有限公司 Question understanding model training method and system based on few-sample corpus
CN112417886A (en) * 2020-11-20 2021-02-26 平安普惠企业管理有限公司 Intention entity information extraction method and device, computer equipment and storage medium
CN112528679B (en) * 2020-12-17 2024-02-13 科大讯飞股份有限公司 Method and device for training intention understanding model, and method and device for intention understanding
CN112765356B (en) * 2021-01-29 2022-07-12 思必驰科技股份有限公司 Training method and system of multi-intention recognition model
CN112905893B (en) * 2021-03-22 2024-01-12 北京百度网讯科技有限公司 Training method of search intention recognition model, search intention recognition method and device
CN113094475B (en) * 2021-06-08 2021-09-21 成都晓多科技有限公司 Dialog intention recognition system and method based on context attention flow
CN113254617B (en) * 2021-06-11 2021-10-22 成都晓多科技有限公司 Message intention identification method and system based on pre-training language model and encoder
CN113486178B (en) * 2021-07-12 2023-12-01 恒安嘉新(北京)科技股份公司 Text recognition model training method, text recognition method, device and medium

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170193099A1 (en) * 2015-12-31 2017-07-06 Quixey, Inc. Machine Identification of Grammar Rules That Match a Search Query

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106156003A * 2016-06-30 2016-11-23 北京大学 A question understanding method for question answering systems
CN106815566A * 2016-12-29 2017-06-09 天津中科智能识别产业技术研究院有限公司 A face retrieval method based on multitask convolutional neural networks
CN107704866A * 2017-06-15 2018-02-16 清华大学 Multitask scene semantic understanding model based on a novel neural network and its application
CN107464559A * 2017-07-11 2017-12-12 中国科学院自动化研究所 Joint prediction model construction method and system based on Chinese prosodic structure and stress

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
A Joint Model of Intent Determination and Slot Filling for Spoken Language Understanding; Xiaodong Zhang et al.; Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence; 2016-07-15; pp. 2993-2999 *

Also Published As

Publication number Publication date
CN108920622A (en) 2018-11-30

Similar Documents

Publication Publication Date Title
CN108920622B (en) Training method, training device and recognition device for intention recognition
CN109657054B (en) Abstract generation method, device, server and storage medium
CN110069709B (en) Intention recognition method, device, computer readable medium and electronic equipment
CN111753060A (en) Information retrieval method, device, equipment and computer readable storage medium
CN110727779A (en) Question-answering method and system based on multi-model fusion
CN108932342A (en) A semantic matching method, a model learning method and a server
CN109376222B (en) Question-answer matching degree calculation method, question-answer automatic matching method and device
CN109960728B (en) Method and system for identifying named entities of open domain conference information
CN111259127B (en) Long text answer selection method based on transfer learning sentence vector
CN111159485B (en) Tail entity linking method, device, server and storage medium
CN110633577B (en) Text desensitization method and device
CN111222305A (en) Information structuring method and device
CN111708878B (en) Method, device, storage medium and equipment for extracting sports text abstract
WO2023134083A1 (en) Text-based sentiment classification method and apparatus, and computer device and storage medium
CN113961666B (en) Keyword recognition method, apparatus, device, medium, and computer program product
CN116304745B (en) Text topic matching method and system based on deep semantic information
CN113821605A (en) Event extraction method
CN110597968A (en) Reply selection method and device
CN112632224A (en) Case recommendation method and device based on case knowledge graph and electronic equipment
CN112464655A (en) Word vector representation method, device and medium combining Chinese characters and pinyin
CN114417823A (en) Aspect level emotion analysis method and device based on syntax and graph convolution network
CN113486143A (en) User portrait generation method based on multi-level text representation and model fusion
CN115357720B (en) BERT-based multitask news classification method and device
WO2023137903A1 (en) Reply statement determination method and apparatus based on rough semantics, and electronic device
CN110874408B (en) Model training method, text recognition device and computing equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant