CN108268442A

CN108268442A - A kind of sentence Intention Anticipation method and system

Info

Publication number: CN108268442A
Application number: CN201711378005.8A
Authority: CN
Inventors: 沈磊; 陈见耸; 朱鹏程
Original assignee: Yutou Technology Hangzhou Co Ltd
Current assignee: Yutou Technology Hangzhou Co Ltd
Priority date: 2017-12-19
Filing date: 2017-12-19
Publication date: 2018-07-10

Abstract

The invention discloses a kind of sentence Intention Anticipation method and system, belong to semantics recognition technical field；It specifically includes：To being split by the directive statement that user instruction is converted to as unit of word；Then word is sent into corresponding first recognition unit and is handled, obtains corresponding intermediate vector；It is converted to form the sentence for being associated with directive statement vector according to all intermediate vectors；Finally sentence vector is sent into one second recognition unit, is obtained with processing and the intent classifier of output order sentence is handled to obtain slot label as a result, intermediate vector is sent into corresponding first taxon simultaneously；The method of training intent classifier model is same as described above.The advantageous effect of above-mentioned technical proposal is：Slot is extracted and is integrated into the identification model of an entirety with intent classifier so that the two can be corrected mutually in training, joint effect and be optimized jointly, so as to promote the accuracy rate of actual prediction, and can accelerate the processing speed of prediction.

Description

Statement intention prediction method and system

Technical Field

The invention relates to the technical field of semantic recognition, in particular to a statement intention prediction method and system.

Background

With the popularization of Artificial Intelligence (Artificial Intelligence) concept and the continuous development of related scientific technologies, more and more electronic devices are beginning to develop towards Artificial Intelligence. The artificial intelligence is characterized in that the intelligent device can perform intelligent human-computer interaction with a user, such as interaction through voice or interaction through inputting characters, and in the interaction process, instructions input by the user no longer have a fixed format, but are input in a user-accustomed mode according to the user's own will, such as voice instructions spoken by the user in a spoken language, and the like. In such interaction processes, under the premise of domain determination, the intelligent device needs to understand the intention of the user and extract useful information from the input of the user for subsequent processing, that is, classifying the intention of the sentence input by the user and extracting the slot (i.e. the defined useful information) therein, which is a key step for processing the instruction input by the user.

In the prior art, independent processing is respectively carried out on slot extraction and intention classification, namely, mutually independent recognition models are respectively adopted for processing in the process of processing user instructions. The drawback of this approach is that in the process of training the intent classification model and using the two models to process the instructions, the mispredictions obtained by the intent classification model cannot be returned to the slot extraction model and corrected, and thus the intent classification model and the slot extraction model cannot be optimized well. In this method, it is necessary to first perform slot extraction in the prediction process, and then perform intent classification according to the prediction result of the slot extraction, so that the input sentence needs to be subjected to classification prediction twice, and the processing speed is slow.

Disclosure of Invention

According to the problems in the prior art, a technical scheme of a sentence intent prediction method and system is provided, which aims to integrate slot extraction and intent classification into an integral recognition model, and the slot extraction and the intent classification can be mutually corrected, jointly influenced and jointly optimized during training, so that the accuracy of actual prediction is improved, and the processing speed can be increased.

The technical scheme specifically comprises the following steps:

a statement intention prediction method is suitable for a human-computer interaction process; pre-training to form an intention prediction model, and applying the intention prediction model to a human-computer interaction process so as to carry out semantic intention classification on a user instruction;

the method for classifying the intention of the user instruction by adopting the intention prediction model specifically comprises the following steps:

step S1, dividing the instruction sentence in the character form converted according to the user instruction by taking the character as a unit;

step S2, sending each word obtained by segmentation into a corresponding first recognition unit, respectively outputting an intermediate vector corresponding to each word after processing, and then executing step S3a and step S3b, respectively;

step S3a, the intermediate vector corresponding to each word is respectively sent to the corresponding first classification unit, the slot label corresponding to each word is respectively output after processing, the slot label is used as the slot prediction result of the instruction statement, and then the exit is carried out;

step S3b, converting all the intermediate vectors associated with the instruction sentence to form a sentence vector associated with the instruction sentence, and then turning to step S4;

step S4, sending the sentence vector into a second recognition unit to obtain and output the intention classification result of the instruction sentence;

in the process of training in advance to form the intention prediction model, the steps S1-S4 are also executed, and training is carried out according to a large number of preset training sentences;

the intention prediction model comprises an intention classification model used for predicting the user instruction and outputting the intention classification result, and a groove extraction model used for predicting the user instruction and outputting the groove prediction result.

Preferably, in the sentence intent prediction method, in step S2, the first recognition unit processes and outputs the corresponding intermediate vector through a deep neural network according to the input word.

Preferably, the sentence is intended to predict the method, wherein the deep neural network is a recurrent neural network, or a gated recurrent unit neural network, or a long-term memory neural network.

Preferably, the sentence intent prediction method, wherein the first classification unit is implemented using a classification network.

Preferably, in the sentence intent prediction method, in step S3b, all the intermediate vectors associated with the instruction sentence form a vector matrix;

in step S3b, the vector matrix is converted into the corresponding sentence vector by using a max-posing layer, a mean-posing layer, or an attention layer.

the last row vector in the output vector matrix is obtained as the sentence vector in step S3 b.

Preferably, the sentence intent prediction method, wherein the second recognition unit is implemented using a classification network.

A statement intention prediction system is suitable for a human-computer interaction process; the method comprises the following steps of training in advance to form an intention prediction model, applying the intention prediction model to a human-computer interaction process to carry out semantic intention classification on a user instruction, and further comprising the following steps:

a dividing unit, configured to divide the instruction sentence in the text form converted according to the user instruction by taking a word as a unit;

the first recognition unit comprises a plurality of first recognition modules, each first recognition module is connected to the segmentation unit and corresponds to each word obtained by segmentation one to one, and each first recognition module processes the word according to the corresponding word and outputs a middle vector of the corresponding word;

the conversion unit is respectively connected with each first identification module and used for converting all the intermediate vectors which are output by all the first identification units and are related to the instruction sentence into sentence vectors which are related to the instruction sentence and outputting the sentence vectors;

the second identification unit is connected with the conversion unit and used for obtaining and outputting the intention classification result of the instruction statement according to the sentence vector;

the first classification units are connected with the first identification modules in a one-to-one correspondence manner, and are used for acquiring and processing the intermediate vectors output by the corresponding first identification units, and then outputting the slot labels of each word respectively to serve as the slot prediction results of the instruction statements;

training according to a large number of preset training sentences by adopting the sentence intention prediction system to obtain the intention prediction model;

The beneficial effects of the above technical scheme are:

1) the sentence intention prediction method can integrate groove extraction and intention classification into an integral recognition model, so that the groove extraction and the intention classification can be mutually corrected, jointly influenced and jointly optimized during training, the accuracy of actual prediction is improved, and the processing speed can be increased.

2) Provided is a sentence intent prediction system capable of implementing the sentence intent prediction method.

Drawings

FIG. 1 is a schematic diagram of a prior art tank extraction model;

FIG. 2 is a diagram illustrating the structure of an intention classification model in the prior art;

FIG. 3 is a schematic flow chart of a sentence intent prediction method according to a preferred embodiment of the present invention;

FIG. 4 is a block diagram illustrating an overall structure of a sentence intent prediction system according to a preferred embodiment of the present invention;

FIG. 5 is a schematic diagram of an intent prediction model that integrates slot extraction and intent classification according to a preferred embodiment of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

It should be noted that the embodiments and features of the embodiments may be combined with each other without conflict.

The invention is further described with reference to the following drawings and specific examples, which are not intended to be limiting.

As shown in fig. 1, in a prior art, during training, an input sentence needs to be segmented into words, and each word is sent to a deep Neural Network related to slot extraction, where the deep Neural Network is a Neural Network that identifies a slot in the sentence, and may be implemented by using, for example, RNN (Recurrent Neural Network) or RNN + CRF (Conditional Random Field), and the like, in fig. 1, the input sentence segmented by taking the word as a unit is processed by using RNN, and an output of the RNN is classified by using a classification Network and then output as a prediction result of slot extraction. In the prediction result, one tag is predicted for each word in the sentence. In general, the labels of the slots can be classified into the following: the beginning, middle, and end of a slot, and others (not part of a slot). Slots that are extractable, for example, in the field of music include: singer name, song name, and song style, the tags for slots within the music domain may include: the beginning of the name of the singer (B-singer), the middle of the name of the singer (M-singer), the end of the name of the singer (E-singer), the beginning of the name of the song (B-song), the middle of the name of the song (M-song), the end of the name of the song (E-song), the beginning of the style of the song (B-style), the middle of the style of the song (M-style), the end of the style of the song (E-style) and other sections (O) not belonging to the slot. Then for a sentence (i want to hear the encounter of grandchild posture), the prediction result obtained after slot extraction should be "O O O B-singer M-singer E-singer O B-song E-song". The RNN network of the slot extraction model of fig. 1 employs bi-directional RNNs, and is therefore represented by bi-directional arrows. Classification networks typically use a single layer neural network.

As shown in fig. 2, after the slot extraction, the sentence with the slot extraction result is input into the intention classification model, and is output into the classification network after the deep neural network processing, and finally, the prediction result of the intention classification of the sentence can be obtained.

In the process, the training process of groove extraction and intention classification is completely split, so that the groove extraction and the intention classification cannot be influenced mutually, and the result of the intention classification performed according to the error prediction result of the groove extraction cannot be fed back to the groove extraction model to correct the groove extraction model, so that the model training may have larger deviation. In addition, in the actual prediction process, since the prediction process is the same as the training process, one sentence needs to be extracted through the slot and then classified according to intentions, and one sentence needs to be classified and predicted twice, so that the processing speed is low.

In a preferred embodiment of the present invention, based on the above-mentioned technical problems, a sentence intent prediction method is provided, which is suitable for use in a human-computer interaction process, and is further suitable for use in a human-computer interaction process of an intelligent device, such as an intelligent sound box or an intelligent robot. In the sentence intent prediction method, an intent prediction model is formed by pre-training, and the intent prediction model is applied to a human-computer interaction process to perform semantic intent classification on a user instruction, and the steps are specifically shown in fig. 3, and include:

step S1, dividing the instruction sentence in character form converted according to the user instruction by taking character as unit;

step S3a, respectively sending the intermediate vector corresponding to each word into the corresponding first classification unit, respectively outputting the slot label corresponding to each word after processing, taking the slot label as the slot prediction result of the instruction statement, and then exiting;

step S3b, converting all intermediate vectors associated with the instruction sentence to form sentence vectors associated with the instruction sentence, and then turning to step S4;

the intention prediction model comprises an intention classification model used for predicting the user instruction and outputting an intention classification result and a groove extraction model used for predicting the user instruction and outputting a groove prediction result.

Specifically, in this embodiment, for the instruction input by the user, if the instruction is a voice instruction, it needs to be converted into a corresponding instruction statement in a text form by using the same voice conversion method as that in the prior art, and if the instruction is a text instruction, the instruction statement is directly used as the instruction statement. Subsequently, in the same way as in the prior art, the instruction sentence is divided in units of words, and each word is sent to a corresponding first recognition unit for processing, wherein the first recognition unit is used for recognizing and obtaining the intermediate vector. Each first recognition unit outputs the intermediate vector of the corresponding word after processing, and the integration of the outputs of all the first recognition units is the vector matrix with the intermediate vector of the corresponding word. The vector matrix is then fed into a conversion unit to convert the instruction sentence into a corresponding sentence vector and output to a second recognition unit. The second identification unit is used for identifying and obtaining the prediction result of the intention classification.

Meanwhile, in the present embodiment, in the step S3a, the predicted result of the slot extraction, that is, the slot extraction model is completed, may be obtained through a first classification unit connected to each first recognition unit. Specifically, the intermediate vector of each word is respectively sent to the corresponding first classification unit for classification processing, and finally, each first classification unit outputs the slot label of each word, namely, the slot prediction result extracted by the slot is output. Meanwhile, in the actual prediction process, the prediction result of the intended classification can be obtained and the prediction result of the slot extraction can be obtained according to the process.

Therefore, in the intention prediction model, in prediction, in the forward calculation process, the slot extraction can influence the intention classification; during model training, intent classification may adversely affect slot extraction. I.e. the fraction of the slot extraction and the fraction intended to be sorted will interact with each other to achieve co-optimization.

Accordingly, the training process of the intention prediction model can also be realized by performing the above steps S1-S4. Specifically, a large number of training sentences are set in advance, and then each training sentence is processed according to the above steps S1 to S4: for a training sentence, firstly, dividing the training sentence by taking a character as a division unit, and then respectively sending each character into a corresponding first recognition unit for training so as to output a middle vector corresponding to the character; converting the vector matrix comprising all the intermediate vectors to obtain sentence vectors corresponding to the training sentences; and finally, the sentence vector is sent to a second recognition unit for training, and the intention classification result of the training sentence is finally obtained. The steps are carried out on all training sentences, so that a complete intention prediction model can be trained. In the intention prediction model, the parts extracted by the grooves and the parts classified by the intention can mutually correct and influence each other during training, thereby realizing common optimization.

Based on the fact that the prediction process and the training process are completely the same, the process of processing the instruction sentence by using the intention prediction model in the actual prediction process is described hereinafter, that is, the process of training the intention prediction model is described. Therefore, hereinafter, the prediction process and the training process are not particularly distinguished.

In a preferred embodiment of the present invention, the first recognition unit processes and outputs a corresponding intermediate vector through a deep neural network according to the input word. Further, the deep Neural Network may be a Recurrent Neural Network (RNN), a Convolutional Neural Network (CNN), or a variant of RNN such as a GRU (Gated Neural Network) Neural Network, or an LSTM (Long Short Term Memory) Neural Network. The deep neural network can be realized by adopting a unidirectional neural network or a bidirectional neural network. For the purpose of description in the present invention, the first identifying unit is defined as being processed by using the RNN network.

In a preferred embodiment of the present invention, the first classifying unit is implemented by using a classification network, and the classification network is usually a single-layer neural network.

In a preferred embodiment of the present invention, in the step S3b, all intermediate vectors associated with the instruction statement form a vector matrix;

then in step S3b, the vector matrix is converted into corresponding sentence vectors using a max-posing layer, or a mean-posing layer, or an attention layer.

Specifically, the method comprises the following steps:

1) the method for obtaining the max-posing layer of sentence vectors from the vector matrix of the intermediate vectors output from the first recognition unit comprises the following steps:

wherein,

s^maxthe output used for representing the max-posing layer, namely the sentence vector formed by conversion;

the vector matrix output by the first identification unit is h, the subscript I represents the ith vector output by the first identification unit, the subscript j represents the jth column in the vector matrix output by the first identification unit, and I represents the number of rows in the vector matrix.

2) The method for obtaining the mean-posing layer of sentence vectors from the vector matrix of the intermediate vectors output from the first recognition unit comprises the following steps:

wherein,

s^meanthe output used to represent the mean-posing layer, i.e., the sentence vector formed by the conversion, and the definitions of the remaining variables are as described above.

3) The method for obtaining the attribute layer of the sentence vector from the vector matrix of the intermediate vector output from the first recognition unit comprises the following steps:

wherein,

s^attnoutput for representing an attention layer;

j represents the maximum value that subscript J can take;

U_ais a parameter matrix;

V_ais a parameter vector, V_a ^TIs a transposed matrix of the parameter vectors;

b is also a parameter vector.

U_a，V_aAnd b can both be obtained by training.

In a preferred embodiment of the present invention, in the step S3b, the last line vector in the output vector matrix is obtained as the sentence vector.

Specifically, in addition to the three optional conversion manners, the last row vector of the vector matrix output by the first identification unit may be selected, that is, the last hidden layer vector (last hidden state) of the RNN network is taken as a sentence vector, and the purpose of converting the sentence vector may also be achieved.

In a preferred embodiment of the present invention, the second identification unit may also be implemented by using a classification network, and the classification network may also be implemented by using a single-layer neural network.

In summary, in the technical solution of the present invention, the two independent models are integrated by converting the sentence with the intermediate vector output by the slot extraction model into the corresponding sentence vector and inputting the sentence vector as the intention prediction model, so that the slot extraction and the intention classification are performed in the same recognition model, and the slot extraction and the intention classification can be influenced and corrected with each other in the training process, thereby achieving the effect of jointly optimizing the slot extraction and the intention classification. In the actual prediction intention classification process, the input sentences can obtain the final prediction result without two classification prediction processes, so that the processing speed is greatly improved.

In the preferred embodiment of the present invention, based on the sentence intent prediction method described above, a sentence intent prediction system is now provided, which is also applicable to human-computer interaction processes.

In the statement intention prediction system, an intention prediction model is also formed by pre-training, and the intention prediction model is applied to a human-computer interaction process so as to carry out semantic intention classification on user instructions.

The sentence intent prediction system is specifically shown in fig. 4, and includes:

a dividing unit 1 for dividing an instruction sentence in a character form converted according to a user instruction in units of characters;

the first recognition unit 2, the first recognition unit 2 includes a plurality of first recognition modules 21, each first recognition module 21 is connected to the segmentation unit 1, each first recognition module 21 corresponds to each word obtained by segmentation one to one, each first recognition module 21 processes according to the corresponding word and outputs the intermediate vector of the corresponding word; each of the first identification modules 21 has a certain relationship with each other, and an identification sequence can be formed by the identification results of all the first identification modules 21;

the conversion unit 3 is respectively connected with each first identification unit 21 and is used for converting all intermediate vectors which are output by all the first identification units and are related to the instruction sentence into sentence vectors which are related to the instruction sentence and outputting the sentence vectors;

the second identification unit 4 is connected with the conversion unit 3 and used for obtaining an intention classification result of the instruction sentence according to the sentence vector;

the first classification units are connected with the first identification modules in a one-to-one correspondence mode and used for acquiring and processing intermediate vectors output by the corresponding first identification units and then respectively outputting the slot labels of all the words to serve as slot prediction results of the instruction statements;

training according to a large number of preset training sentences by adopting a sentence intention prediction system to obtain an intention prediction model;

In this embodiment, in the process of obtaining the intent prediction model through the pre-training, a large number of training sentences may be prepared in advance, and the training sentences may be sent to the sentence intent prediction system to obtain a corresponding intent prediction model through the processing training of the sentence intent prediction system, where the intent prediction model includes a slot extraction model and an intent classification model, and the finally formed intent prediction model is as shown in fig. 5.

While the invention has been described with reference to a preferred embodiment, it will be understood by those skilled in the art that various changes in form and detail may be made therein without departing from the spirit and scope of the invention.

Claims

1. A statement intention prediction method is suitable for a human-computer interaction process; the method is characterized in that an intention prediction model is formed by pre-training, and the intention prediction model is applied to a man-machine interaction process so as to carry out semantic intention classification on a user instruction;

2. The sentence intent prediction method of claim 1 wherein in step S2, the first recognition unit processes and outputs the corresponding intermediate vector through a deep neural network based on the input word.

3. The sentence intent prediction method of claim 2 wherein the deep neural network is a recurrent neural network, or a gated recurrent unit neural network, or a long-term memory neural network.

4. The sentence intent prediction method of claim 1 wherein the first classification unit is implemented using a classification network.

5. The sentence intent prediction method of claim 1 wherein in step S3b, all the intermediate vectors associated with the instruction sentence form a vector matrix;

6. The sentence intent prediction method of claim 2 wherein in step S3b, all the intermediate vectors associated with the instruction sentence form a vector matrix;

7. The sentence intent prediction method of claim 1 wherein the second recognition unit is implemented using a classification network.

8. A statement intention prediction system is suitable for a human-computer interaction process; the method is characterized in that an intention prediction model is formed by pre-training and is applied to a human-computer interaction process so as to carry out semantic intention classification on a user instruction, and the method also comprises the following steps: