CN114139551A - Method and device for training intention recognition model and method and device for recognizing intention - Google Patents

Method and device for training intention recognition model and method and device for recognizing intention

Info

Publication number
CN114139551A
Authority
CN
China
Prior art keywords
intention
text data
recognition model
semantic features
historical text
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111273471.6A
Other languages
Chinese (zh)
Inventor
陈东
鲁威
宫学谦
赵云
孙迁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
SuningCom Co ltd
Original Assignee
SuningCom Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by SuningCom Co ltd filed Critical SuningCom Co ltd
Priority to CN202111273471.6A priority Critical patent/CN114139551A/en
Publication of CN114139551A publication Critical patent/CN114139551A/en
Priority to CA3180493A priority patent/CA3180493A1/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/30 Semantic analysis
    • G06F 40/35 Discourse or dialogue representation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Biomedical Technology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Machine Translation (AREA)

Abstract

The application relates to a method and apparatus for training an intention recognition model and a method and apparatus for intention recognition. The training method comprises: acquiring historical text data input by users through a man-machine conversation system, each item of historical text data carrying an intention label representing the user's intention; inputting each item of historical text data into an initial recognition model; extracting semantic features of the historical text data through the initial recognition model; fusing the semantic features to obtain multi-dimensional semantic features corresponding to the intention labels; and training the initial recognition model according to the multi-dimensional semantic features and their corresponding intention labels to obtain the intention recognition model. Because the semantic features of the text data are obtained from multiple dimensions, the intention recognition model can accurately recognize the user's intention.

Description

Method and device for training intention recognition model and method and device for recognizing intention
Technical Field
The present application relates to the field of computer technologies, and in particular, to a method and an apparatus for training an intention recognition model, and a method and an apparatus for intention recognition.
Background
With the rapid accumulation of data, the great improvement in computing power, the continuous evolution of algorithm models, and the rapid rise of industrial applications, the development environment of artificial intelligence has changed greatly. Meanwhile, service enterprises such as banks, insurers, and telecom operators have fully realized the importance of service quality and are actively moving toward intelligent customer service, providing users with a better service experience and improving user satisfaction.
In an intelligent man-machine dialog system, accurately recognizing the user's intention is of great significance for understanding the questions the user poses and the help the user seeks. At present, user intention recognition relies mainly on rule matching, feature-based machine learning, and deep-learning models, but existing network structures have difficulty processing users' complicated and varied inputs and cannot accurately recognize user intention.
Disclosure of Invention
In view of the above, it is necessary to provide a method and apparatus for training an intention recognition model and a method and apparatus for intention recognition that can accurately recognize the user's intention.
In a first aspect, a method for training an intent recognition model is provided, the method comprising:
acquiring historical text data input by a user through a man-machine conversation system, wherein each historical text data has an intention label representing the intention of the user;
inputting each historical text data into an initial recognition model;
extracting semantic features of the historical text data through an initial recognition model;
fusing the semantic features to obtain multi-dimensional semantic features corresponding to the intention labels;
and training the initial recognition model according to the multi-dimensional semantic features and the intention labels corresponding to the multi-dimensional semantic features to obtain an intention recognition model.
In one possible implementation, the initial recognition model is constructed based on an Encoder-Decoder basic framework and comprises a fully connected layer; the semantic features comprise depth semantic features and potential semantic features; and fusing the semantic features to obtain the multi-dimensional semantic features corresponding to the intention labels comprises:
and fusing the depth semantic features and the potential semantic features through the full-connection layer to obtain the multi-dimensional semantic features corresponding to the intention labels.
In one possible implementation, the initial recognition model further includes a global word vector network, a BiGRU neural network, an Attention neural network, and a Softmax function layer; extracting semantic features of the historical text data through an initial recognition model, wherein the semantic features comprise:
converting the historical text data through a global word vector network to obtain a historical text data vector;
extracting a hidden semantic feature vector of the historical text through a BiGRU neural network according to the historical text data vector;
acquiring the Attention score of the hidden semantic feature vector through an Attention neural network;
according to the attention score of the hidden semantic feature vector, calculating an attention weight value of the hidden semantic feature vector through a Softmax function layer;
and extracting the depth semantic features according to the attention weight value of the hidden semantic feature vector and the hidden semantic feature vector.
In one possible implementation, the initial recognition model further includes a convolutional layer and a global maximum pooling layer of the convolutional neural network; extracting semantic features of the historical text data through an initial recognition model, wherein the semantic features comprise:
extracting the features of the hidden semantic feature vectors through the convolutional layers to obtain local semantic feature vectors;
and sampling the local semantic feature vectors through the global maximum pooling layer to obtain potential semantic features.
In one possible implementation, acquiring historical text data input by a user through a man-machine conversation system includes:
acquiring original text data input by a user through a man-machine conversation system;
performing text word segmentation and stop-word removal on original text data to obtain a text data set;
and acquiring historical text data from the text data set by a stratified sampling method.
In a second aspect, a method of intent recognition is provided, the method comprising:
acquiring data to be identified input by a user through a man-machine conversation system;
and performing intention recognition on the data to be recognized through an intention recognition model to obtain a recognition result for representing the intention of the user, wherein the intention recognition model is trained by using the method according to the first aspect or any one of the possible implementation manners of the first aspect.
In a third aspect, an apparatus for intention recognition model training is provided, the apparatus comprising:
the acquisition module is used for acquiring historical text data input by a user through a man-machine conversation system, and each historical text data has an intention label representing the intention of the user;
the input module is used for inputting each historical text data into the initial recognition model;
the extraction module is used for extracting semantic features of the historical text data through the initial recognition model;
the fusion module is used for fusing the semantic features to obtain multi-dimensional semantic features corresponding to the intention labels;
and the training module is used for training the initial recognition model according to the multi-dimensional semantic features and the intention labels corresponding to the multi-dimensional semantic features to obtain an intention recognition model.
In a fourth aspect, there is provided an apparatus for intent recognition, the apparatus comprising:
the acquisition module is used for acquiring data to be identified input by a user through a man-machine conversation system;
the identification module is configured to identify data to be identified through an intention identification model to obtain an identification result for representing an intention of a user, where the intention identification model is trained by using the method according to the first aspect or any one of the possible implementation manners of the first aspect.
In a fifth aspect, a computer device is provided, which comprises a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the steps of the method of the first aspect or any one of the possible implementations of the first aspect, or the steps of the method of the second aspect, when executing the computer program.
In a further aspect, a computer-readable storage medium is provided, on which a computer program is stored which, when being executed by a processor, carries out the steps of the method of the first aspect or any one of the possible implementations of the first aspect, or the steps of the method of the second aspect.
The intention recognition model training method and apparatus and the intention recognition method and apparatus acquire historical text data input by users through a man-machine conversation system, each item of historical text data carrying an intention label representing the user's intention; input each item of historical text data into an initial recognition model; extract semantic features of the historical text data through the initial recognition model; fuse the semantic features to obtain multi-dimensional semantic features corresponding to the intention labels; and train the initial recognition model according to the multi-dimensional semantic features and their corresponding intention labels to obtain the intention recognition model. Because the semantic features of the text data are obtained from multiple dimensions, the intention recognition model can accurately recognize the user's intention.
Drawings
FIG. 1 is a diagram of an application environment of a training method of an intention recognition model and an intention recognition method in an embodiment of the present application;
FIG. 2 is a schematic flow chart diagram illustrating a training method for an intent recognition model according to an embodiment of the present application;
FIG. 3 is a schematic flow chart illustrating the process of extracting semantic features of historical text data through an initial recognition model according to an embodiment of the present application;
FIG. 4 is a schematic flow chart diagram illustrating a method for intent recognition in one embodiment of the present application;
FIG. 5 is a block diagram of an intention recognition model training apparatus in an embodiment of the present application;
FIG. 6 is a block diagram of an intention recognition apparatus in an embodiment of the present application;
FIG. 7 is an internal structure diagram of a computer device in an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
In real application scenarios, the user's intention is often difficult to understand because: first, user input is highly colloquial, and the same intention is expressed with different wording; second, user inputs are relatively short, some consisting of only a few words; third, user utterances carry many stop words, pet phrases, modal particles, and the like; fourth, Chinese semantics vary widely, for example through ambiguity and implicit intent. These problems make it difficult for the man-machine dialog system to understand the user's true intention, so the user's intention becomes ambiguous, the recognition result is inaccurate, and a highly accurate reply cannot be provided to the user. Therefore, if the intelligent man-machine conversation system is to play a larger role in practical applications, it needs to understand the user's intention more accurately.
In order to solve the prior art problems, embodiments of the present application provide a training method and apparatus for an intention recognition model, and an intention recognition method and apparatus. The method for training the intention recognition model and the method for recognizing the intention provided by the application can be applied to the application environment shown in FIG. 1. Wherein the terminal 102 communicates with the server 104 via a network. The method comprises the steps that a terminal 102 obtains historical text data input by a user through a man-machine conversation system, each historical text data has an intention label representing the intention of the user, each historical text data is input into an initial recognition model, semantic features of the historical text data are extracted through the initial recognition model, the semantic features are fused to obtain multi-dimensional semantic features corresponding to the intention labels, and the initial recognition model is trained according to the multi-dimensional semantic features and the intention labels corresponding to the multi-dimensional semantic features to obtain an intention recognition model. Or, the terminal 102 acquires data to be recognized input by a user through a man-machine conversation system, and performs intention recognition on the data to be recognized through an intention recognition model to obtain a recognition result for representing the intention of the user. The terminal 102 may be, but not limited to, various personal computers, notebook computers, smart phones, and tablet computers, and the server 104 may be implemented by an independent server or a server cluster formed by a plurality of servers.
In some embodiments, as shown in fig. 2, a method for training an intention recognition model is provided, which is described by taking the method as an example for being applied to the terminal in fig. 1, and includes the following steps:
s210, historical text data input by a user through a man-machine conversation system are obtained, and each historical text data has an intention label representing the intention of the user.
Before model training, sample data for training the intention recognition model, namely historical text data generated in the man-machine conversation system, need to be acquired. The historical text data are sentences input by users when using the man-machine conversation system, including consultation sentences, help-seeking sentences, reply sentences, and the like. For example, when a user opens an application and enters "I need to know XXX" or "where is XXX", these sentences become historical text data of the man-machine dialog system corresponding to that application.
The historical text data input by users through the man-machine conversation system are acquired, preprocessed, and labeled with intention labels representing user intentions, so that each item of historical text data has an intention label; the intention recognition model is then trained with the plurality of labeled historical text data.
S220, inputting each historical text data into an initial recognition model.
Each historical text data is input into an initial recognition model for feature extraction.
And S230, extracting semantic features of the historical text data through the initial recognition model.
And performing feature construction and feature extraction on the historical text data from multiple aspects through a corresponding neural network in the initial recognition model to obtain semantic features for training the intention recognition model.
And S240, fusing the semantic features to obtain the multi-dimensional semantic features corresponding to the intention labels.
The semantic features extracted from each item of historical text data in several aspects are connected and fused to obtain the multi-dimensional semantic features corresponding to the intention labels. The multi-dimensional semantic features parse the historical text data from multiple aspects, so that the sample data used for intention-model training reflect the user intention more truly.
And S250, training an initial recognition model according to the multi-dimensional semantic features and the intention labels corresponding to the multi-dimensional semantic features to obtain an intention recognition model.
After the initial recognition model extracts and fuses the multi-dimensional semantic features, it outputs the intention labels corresponding to them, and the initial recognition model is trained on this correspondence to obtain the intention recognition model.
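As a non-limiting illustration, training step S250 could be sketched in Python/PyTorch as follows; the model class, data loader, and hyper-parameters are assumptions for illustration and are not specified by this application.

```python
import torch
import torch.nn as nn

def train(model: nn.Module, train_loader, epochs: int = 10, lr: float = 1e-3):
    # Cross-entropy compares the predicted intent logits with the intention labels.
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    model.train()
    for epoch in range(epochs):
        total_loss = 0.0
        for texts, labels in train_loader:   # labels: intention tags as class indices
            optimizer.zero_grad()
            logits = model(texts)            # fused multi-dimensional features -> logits
            loss = criterion(logits, labels)
            loss.backward()
            optimizer.step()
            total_loss += loss.item()
        print(f"epoch {epoch}: loss = {total_loss / len(train_loader):.4f}")
```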
In the embodiment of the application, historical text data input by users through a man-machine conversation system are acquired, each item of historical text data carrying an intention label representing the user's intention; each item of historical text data is input into an initial recognition model; semantic features of the historical text data are extracted through the initial recognition model; the semantic features are fused to obtain multi-dimensional semantic features corresponding to the intention labels; and the initial recognition model is trained according to the multi-dimensional semantic features and their corresponding intention labels to obtain an intention recognition model. Because the semantic features of the historical text data are obtained from multiple dimensions, the intention recognition model can accurately recognize the user's intention.
In some embodiments, the initial recognition model is constructed based on an Encoder-Decoder base framework, including a fully connected layer; the semantic features comprise depth semantic features and potential semantic features; fusing semantic features to obtain multi-dimensional semantic features corresponding to the intention labels, wherein the method comprises the following steps:
and fusing the depth semantic features and the potential semantic features through the full-connection layer to obtain the multi-dimensional semantic features corresponding to the intention labels.
The initial recognition model is constructed on the Encoder-Decoder basic framework, which raises the algorithmic complexity the model can accommodate, meets the computational requirements of training on multi-dimensional semantic features of historical text data, and can effectively improve the accuracy of text-data intention recognition.
The depth semantic features are the hidden-state semantic features extracted from the historical text data by the initial recognition model, i.e., the semantic features that highlight the important text in the historical text data. The potential semantic features are the latent semantic features extracted from the historical text data by the initial recognition model, i.e., the semantic features that supplement the text meaning; although they are only auxiliary features, they are also very important for recognizing the text data.
The fully connected layer fuses the depth semantic features and the potential semantic features to obtain the multi-dimensional semantic features corresponding to the intention labels, which avoids losing semantic data, enables the extracted semantic features to accurately express the user's intention, and improves the accuracy of the intention recognition model.
During model training, a Dropout mechanism is introduced to reduce weight connections and discard some parameters, preventing the model from over-fitting.
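A minimal sketch of such a fully connected fusion layer with Dropout, written in PyTorch for illustration only, might look as follows; all dimension names are assumptions rather than values given in this application.

```python
import torch
import torch.nn as nn

class FeatureFusion(nn.Module):
    """Fuses depth and potential semantic features through a fully connected layer."""
    def __init__(self, depth_dim: int, potential_dim: int, num_intents: int, p: float = 0.5):
        super().__init__()
        self.dropout = nn.Dropout(p)  # Dropout mechanism to curb over-fitting
        self.fc = nn.Linear(depth_dim + potential_dim, num_intents)

    def forward(self, depth_feat: torch.Tensor, potential_feat: torch.Tensor) -> torch.Tensor:
        # Concatenate the two feature types into the multi-dimensional semantic feature.
        fused = torch.cat([depth_feat, potential_feat], dim=-1)
        return self.fc(self.dropout(fused))  # intention-class logits
```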
In some embodiments, the initial recognition model further comprises a global word vector network, a BiGRU neural network, an Attention neural network, and a Softmax function layer; in S230, the process of extracting semantic features of the historical text data through the initial recognition model specifically includes:
and S231, converting the historical text data through a global word vector network to obtain a historical text data vector.
To enable the computer to understand the semantics of the user's input, the global word vector network adopts the GloVe global word vector network to convert the historical text data into semantic vectors the computer can process, namely historical text data vectors.
The GloVe global word vector network trains a word-vectorization neural network on global word co-occurrence statistics combined with a local context window method. Compared with other word-vector networks, it trains faster, makes fuller use of corpus statistics, and performs well on data sets of both large and small scales; obtaining the co-occurrence matrix by statistical methods makes the word-word semantic relevance in the text data more prominent.
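For illustration only, pre-trained GloVe-style word vectors could be loaded into an embedding layer as sketched below; the assumed file format is the standard GloVe text format of one "word v1 v2 ..." line per word, and the path and dimension are placeholders.

```python
import numpy as np
import torch
import torch.nn as nn

def load_glove_embedding(path: str, vocab: dict, dim: int = 300) -> nn.Embedding:
    # Words absent from the GloVe file keep small random vectors.
    weights = np.random.normal(scale=0.1, size=(len(vocab), dim)).astype("float32")
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            word, vec = parts[0], parts[1:]
            if word in vocab and len(vec) == dim:
                weights[vocab[word]] = np.asarray(vec, dtype="float32")
    return nn.Embedding.from_pretrained(torch.from_numpy(weights), freeze=False)
```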
And S232, extracting the hidden semantic feature vector of the historical text through a BiGRU neural network according to the historical text data vector.
In constructing the intention recognition model, the Encoder-Decoder framework solves the problem of unequal input and output lengths in a sequence. Because natural language is structurally bidirectional, i.e., bidirectional associations exist between texts, a bidirectional recurrent neural network is adopted for temporal encoding of the text, overcoming the unidirectionality of an ordinary recurrent network in extracting text features; splicing the forward and backward hidden states yields more comprehensive and robust hidden-state semantic features. Therefore, in the Encoder stage, the BiGRU neural network is adopted to obtain the hidden semantic feature vectors h = (h_1, h_2, ..., h_T) of the historical text data.
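A minimal PyTorch sketch of such a BiGRU encoder follows; the embedding and hidden sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn

# Bidirectional GRU whose per-step outputs form the hidden semantic
# feature vectors h = (h_1, ..., h_T); sizes are placeholders.
bigru = nn.GRU(input_size=300, hidden_size=128, batch_first=True, bidirectional=True)

x = torch.randn(8, 20, 300)   # (batch, T, embed_dim) word vectors from the GloVe network
h, h_n = bigru(x)             # h: (batch, T, 256), forward/backward states spliced per step
```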
S233, acquiring the Attention score of the hidden semantic feature vector through the Attention neural network.
In conventional natural language processing, the output h_t of the recurrent layer at the last time step t is usually taken as the semantic feature representation of the whole text, and the fixed-length vector h_t is then decoded into the corresponding semantic sequence. A potential problem with this approach is that once the model tries to compress all the information of a long text sequence into one fixed-length vector, some information is lost. For recurrent networks in particular, the last output's memory of distant earlier information is hard to guarantee, and the performance of the whole network degrades rapidly as the text sequence grows longer. Therefore, the Attention neural network is used to strengthen the selection of important text features.
Through the attention mechanism of the Attention neural network, a Multiplicative Attention scoring mechanism scores the hidden semantic feature vectors to obtain the attention score of each phrase-level semantic feature vector in them. Multiplicative attention is efficient to compute and store, which improves the training efficiency and accuracy of the intention recognition model.
A score vector is first computed from the hidden semantic feature vectors and the relevance metric weight; the attention score is then obtained as the dot product of the score vector and the hidden semantic feature vector of the encoder's hidden state at the last moment. The score vector is computed as:

e_t = W_h h_t …………(1)

where e_t denotes the score vector, W_h the relevance metric weight, and h_t the hidden semantic feature vector of the encoder's hidden state at time t.
And S234, calculating an attention weight value of the hidden semantic feature vector through a Softmax function layer according to the attention score of the hidden semantic feature vector.
The attention weight value of the hidden semantic feature vector is calculated through the Softmax function layer of the intention recognition model. An attention weight value is the attention probability assigned to an input: a scalar between 0 and 1 representing the importance of each word during decoding, where a larger value indicates more important information and a smaller value less important information.
The attention weight value may be computed as:

a_t = exp(s_t · e_t) / Σ_{k=1}^{T} exp(s_t · e_k) …………(2)

where a_t denotes the attention weight value, s_t the hidden semantic feature vector of the encoder's hidden state at the last moment, e_t the score vector at time t, T the number of elements in the hidden semantic feature vectors, and e_k the score vector of the encoder's hidden state at time k.
And S235, extracting the depth semantic features according to the attention weight value of the hidden semantic feature vector and the hidden semantic feature vector.
First, the attention weight values are used to form a weighted sum with the hidden-state hidden semantic feature vectors, yielding the attention-weighted context semantic vector.
Then the attention-weighted context semantic vector is spliced with the hidden semantic feature vector of the encoder's hidden state at the last moment to obtain the depth semantic features.
The context semantic vector may be computed as:

c = Σ_{t=1}^{T} a_t h_t …………(3)

where c denotes the context semantic vector, a_t the attention weight value, and h_t the hidden semantic feature vector of the encoder's hidden state at time t.
The depth semantic features may be computed as:

C = [c; s_t] …………(4)

where C denotes the depth semantic features, c the context semantic vector, and s_t the hidden semantic feature vector of the encoder's hidden state at the last moment; [·; ·] denotes the splicing (concatenation) of the two vectors.
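For illustration, equations (1) to (4) might be realized by the following PyTorch sketch, which follows the description above; the tensor shapes and dimension names are assumptions.

```python
import torch
import torch.nn as nn

class MultiplicativeAttention(nn.Module):
    """Sketch of equations (1)-(4): score vectors, attention weights,
    context vector, and splicing into the depth semantic feature."""
    def __init__(self, hidden_dim: int):
        super().__init__()
        self.W_h = nn.Linear(hidden_dim, hidden_dim, bias=False)  # relevance metric weight W_h

    def forward(self, h: torch.Tensor, s_t: torch.Tensor) -> torch.Tensor:
        # h: (batch, T, hidden_dim) hidden semantic feature vectors
        # s_t: (batch, hidden_dim) encoder hidden state at the last moment
        e = self.W_h(h)                                        # eq. (1): e_t = W_h h_t
        scores = torch.bmm(e, s_t.unsqueeze(-1)).squeeze(-1)   # dot product with s_t
        a = torch.softmax(scores, dim=1)                       # eq. (2): attention weights a_t
        c = torch.bmm(a.unsqueeze(1), h).squeeze(1)            # eq. (3): c = sum_t a_t h_t
        return torch.cat([c, s_t], dim=-1)                     # eq. (4): C = [c; s_t]
```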
In some embodiments, the initial recognition model further comprises convolutional layers and global maximum pooling layers of a convolutional neural network; extracting semantic features of the historical text data through an initial recognition model, wherein the semantic features comprise:
extracting the features of the hidden semantic feature vectors through the convolutional layers to obtain local semantic feature vectors;
and sampling the local semantic feature vectors through the global maximum pooling layer to obtain potential semantic features.
Semantic features that highlight the important text in the historical text data can be obtained based on the Attention mechanism, but because of the characteristics of its network structure it cannot obtain the potential semantic features in the historical text data, which are very important for recognizing the intention in text data. Therefore, after the BiGRU model obtains the hidden-state hidden semantic feature vectors of the historical text data, feature extraction continues with a convolutional neural network and a global maximum pooling operation to obtain potential semantic features with temporal characteristics. The specific extraction process is as follows:
First, the hidden-state hidden semantic feature vectors h = (h_1, h_2, ..., h_T) of the input historical text data are captured from the BiGRU model; then sliding windows of step 1 and size m are applied in turn to h_{1:m}, h_{2:m+1}, ..., h_{T-m+1:T} for local feature extraction, giving the feature map H = (H_1, H_2, ..., H_{T-m+1}).
Then global maximum pooling samples the extracted local features to obtain the potential semantic feature M = max(H_1, H_2, ..., H_{T-m+1}).
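For illustration, the convolution and global maximum pooling described above might be sketched as follows; the window size m, channel sizes, and ReLU activation are assumptions.

```python
import torch
import torch.nn as nn

m, hidden_dim, num_filters = 3, 256, 128   # placeholders
conv = nn.Conv1d(in_channels=hidden_dim, out_channels=num_filters, kernel_size=m, stride=1)

def potential_features(h: torch.Tensor) -> torch.Tensor:
    # h: (batch, T, hidden_dim); Conv1d expects (batch, channels, T)
    feature_map = torch.relu(conv(h.transpose(1, 2)))   # H = (H_1, ..., H_{T-m+1})
    return feature_map.max(dim=-1).values               # M = max(H_1, ..., H_{T-m+1})
```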
In some embodiments, obtaining historical text data input by a user through a human-computer dialog system comprises:
acquiring original text data input by a user through a man-machine conversation system;
performing text word segmentation and stop-word removal on the original text data to obtain a text data set;
and acquiring historical text data from the text data set by a stratified sampling method.
The original text data are sentences input by users through the man-machine conversation system without any preprocessing. They may contain stop words, pet phrases, modal particles, and the like; most are colloquial, and some may even contain local dialect words, making it difficult for the initial recognition model to recognize and extract semantic features. Therefore, the original text data need to be preprocessed, specifically as follows:
1. Text word segmentation: the PKUSeg Chinese word segmentation tool open-sourced by Peking University is adopted.
2. Stop-word removal: stop words include punctuation, numbers, single characters, and other meaningless words such as auxiliary words and modal particles. They are handled with a manually built stop-word dictionary, i.e., special symbols, punctuation, and numbers are removed.
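As a non-limiting sketch of this preprocessing, using the open-source pkuseg package and an assumed stop-word dictionary file (one word per line):

```python
import pkuseg

seg = pkuseg.pkuseg()   # default PKUSeg segmentation model

with open("stopwords.txt", encoding="utf-8") as f:   # manually built stop-word dictionary (assumed path)
    STOPWORDS = {line.strip() for line in f}

def preprocess(sentence: str):
    tokens = seg.cut(sentence)   # text word segmentation
    # Stop-word removal: dictionary entries, digits, and single characters are dropped.
    return [t for t in tokens if t not in STOPWORDS and not t.isdigit() and len(t) > 1]
```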
The original text data are preprocessed to obtain a text data set, which is divided into a training set and a validation set by the hold-out method; when dividing the data, stratified sampling is adopted, which keeps the distribution of positive and negative samples consistent between the training set and the validation set and speeds up model convergence.
Stratified sampling means dividing the text data set into several classes or strata according to preset attribute characteristics and then randomly drawing text data from each class or stratum, so that the proportions of positive and negative samples in the training and validation sets match those in the whole text data set. By classifying and stratifying, stratified sampling increases the commonality of the text data within each class and makes representative samples easy to draw, so the text data used to train the intention recognition model reflect user intentions to the greatest extent, improving the accuracy with which the intention recognition model recognizes the user's intention from input text.
When the text data set is divided into a training set and a validation set by stratified sampling, the text data in the training set are taken as the historical text data for training the intention recognition model.
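For illustration, such a stratified hold-out split could be performed with scikit-learn's train_test_split; the sample sentences, labels, and split ratio below are placeholders.

```python
from sklearn.model_selection import train_test_split

texts = ["我要查订单", "怎么退货", "物流到哪了", "退款多久到账"]   # placeholder sentences
labels = [0, 1, 0, 1]                                          # placeholder intention tags

# stratify=labels keeps the intention-label proportions identical in both splits.
train_texts, val_texts, train_labels, val_labels = train_test_split(
    texts, labels, test_size=0.5, stratify=labels, random_state=42)
```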
After the initial recognition model is trained according to the methods described in S210 to S250 and the intention recognition model is obtained, the parameters of the intention recognition model must still be optimized and its recognition performance evaluated.
Parameter optimization of the intention recognition model uses the controlled-variable method, i.e., only one condition affecting model performance is varied at a time. The tuned parameters include the word vector dimension, the number of BiGRU neurons, the convolution kernel size, and the Dropout value.
In the performance evaluation, drawing on the idea of cross-validation, the intention recognition model is tested 6 times. The experimental data for each run are randomly drawn from the training and validation sets while their sizes are kept fixed, which balances the experimental data; the average of the 6 runs is taken as the measure of model performance. It should be noted that two metrics, accuracy (Acc) and F1, are used in the performance evaluation, avoiding the chance effects of a single evaluation metric.
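A sketch of averaging accuracy and F1 over the 6 test runs, using scikit-learn metrics, might look as follows; the run data are placeholders standing in for the model's predictions.

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score

def evaluate_runs(runs):
    """runs: iterable of (y_true, y_pred) pairs, one per test run."""
    accs, f1s = [], []
    for y_true, y_pred in runs:
        accs.append(accuracy_score(y_true, y_pred))
        f1s.append(f1_score(y_true, y_pred, average="macro"))  # macro-F1 across intention classes
    return float(np.mean(accs)), float(np.mean(f1s))

# Example with placeholder predictions from 2 of the 6 runs:
print(evaluate_runs([([0, 1, 1], [0, 1, 0]), ([1, 0, 1], [1, 0, 1])]))
```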
In some embodiments, as shown in FIG. 4, an intention recognition method is provided, which is described by taking its application to the terminal in FIG. 1 as an example, and includes the following steps:
and S410, acquiring data to be identified input by a user through a man-machine conversation system.
When a user speaks to the man-machine conversation system, the terminal collects the user's voice information and obtains from it the sentence data input by the user, including consultation sentences, help-seeking sentences, reply sentences, and the like.
And S420, performing intention recognition on the data to be recognized through an intention recognition model to obtain a recognition result representing the user's intention, the intention recognition model having been trained by the method described in S210 to S250.
The intention label corresponding to the data to be recognized represents the recognition result: the data to be recognized are input into the intention recognition model, the model outputs the corresponding intention label, and the user's intention is known from that label. Because the intention recognition model trained by the method described in S210 to S250 can accurately recognize the user intention of the data to be recognized, the accuracy of the responses the man-machine dialog system provides to the user is improved.
In the embodiment of the application, the data to be recognized input by the user through the man-machine conversation system are acquired and recognized by the intention recognition model; the user intention of the data to be recognized can thus be accurately recognized, improving the accuracy of the responses the man-machine conversation system provides to the user.
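As a non-limiting illustration, intention recognition at inference time might be sketched as follows; the trained model, input tensor, and label mapping are assumptions.

```python
import torch

@torch.no_grad()
def recognize_intent(model, text_tensor: torch.Tensor, id2intent: dict) -> str:
    model.eval()
    logits = model(text_tensor)              # (1, num_intents) scores for the data to be recognized
    label_id = int(logits.argmax(dim=-1))    # most probable intention tag
    return id2intent[label_id]               # recognition result representing the user's intention
```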
It should be understood that although the steps in the flowcharts of FIGS. 2-4 are shown in the order indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated herein, the steps are not restricted to a strict order and may be performed in other orders. Moreover, at least some of the steps in FIGS. 2-4 may include multiple sub-steps or stages that are not necessarily performed at the same moment but may be performed at different moments, and their order of performance is not necessarily sequential; they may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
In some embodiments, as shown in FIG. 5, there is provided an intent recognition model training apparatus 500, comprising: an acquisition module 510, an input module 520, an extraction module 530, a fusion module 540, and a training module 550, wherein:
an obtaining module 510, configured to obtain historical text data input by a user through a man-machine interaction system, where each historical text data has an intention tag representing an intention of the user;
an input module 520 for inputting each historical text data into the initial recognition model;
an extracting module 530, configured to extract semantic features of the historical text data through an initial recognition model;
a fusion module 540, configured to fuse the semantic features to obtain a multidimensional semantic feature corresponding to the intention tag;
and a training module 550, configured to train the initial recognition model according to the multidimensional semantic features and the intention labels corresponding to the multidimensional semantic features, so as to obtain an intention recognition model.
In some embodiments, the initial recognition model is constructed based on an Encoder-Decoder base framework, including a fully connected layer; the semantic features comprise depth semantic features and potential semantic features; the fusion module 540 is specifically configured to:
and fusing the depth semantic features and the potential semantic features through the full-connection layer to obtain the multi-dimensional semantic features corresponding to the intention labels.
In some embodiments, the initial recognition model further comprises a global word vector network, a BiGRU neural network, an Attention neural network, and a Softmax function layer; the extracting module 530 is specifically configured to:
converting the historical text data through a global word vector network to obtain a historical text data vector;
extracting a hidden semantic feature vector of the historical text through a BiGRU neural network according to the historical text data vector;
acquiring the Attention score of the hidden semantic feature vector through an Attention neural network;
according to the attention score of the hidden semantic feature vector, calculating an attention weight value of the hidden semantic feature vector through a Softmax function layer;
and extracting the depth semantic features according to the attention weight value of the hidden semantic feature vector and the hidden semantic feature vector.
In some embodiments, the initial recognition model further comprises convolutional layers and global maximum pooling layers of a convolutional neural network; the extracting module 530 is specifically configured to:
extracting the features of the hidden semantic feature vectors through the convolutional layers to obtain local semantic feature vectors;
and sampling the local semantic feature vectors through the global maximum pooling layer to obtain potential semantic features.
In some embodiments, the obtaining module 510 is specifically configured to:
acquiring original text data input by a user through a man-machine conversation system;
performing text word segmentation and stop-word removal on the original text data to obtain a text data set;
and acquiring historical text data from the text data set by a stratified sampling method.
In some embodiments, as shown in fig. 6, there is provided an intent recognition apparatus 600 comprising: an obtaining module 610 and an identifying module 620, wherein:
the acquisition module 610 is used for acquiring data to be identified input by a user through a man-machine conversation system;
and the identifying module 620 is configured to identify the data to be identified through an intention identification model to obtain an identification result for characterizing the intention of the user, where the intention identification model is trained by using the method according to S210 to S250.
For the specific definition of the intention recognition model training apparatus, reference may be made to the definition of the intention recognition model training method above; for the specific definition of the intention recognition apparatus, reference may be made to the definition of the intention recognition method above; details are not repeated here. The modules in the intention recognition model training apparatus and the intention recognition apparatus may be implemented wholly or partly by software, hardware, or a combination thereof. The modules may be embedded in hardware in, or independent of, a processor of a computer device, or stored in software in a memory of the computer device, so that the processor can invoke and perform the operations corresponding to the modules.
In some embodiments, a computer device is provided, which may be a terminal, and its internal structure diagram may be as shown in fig. 7. The computer device includes a processor, a memory, a network interface, a display screen, and an input device connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device comprises a nonvolatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of an operating system and computer programs in the non-volatile storage medium. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program is executed by a processor to implement an intent recognition model training method, or an intent recognition method. The display screen of the computer equipment can be a liquid crystal display screen or an electronic ink display screen, and the input device of the computer equipment can be a touch layer covered on the display screen, a key, a track ball or a touch pad arranged on the shell of the computer equipment, an external keyboard, a touch pad or a mouse and the like.
Those skilled in the art will appreciate that the architecture shown in fig. 7 is merely a block diagram of some of the structures associated with the disclosed aspects and is not intended to limit the computing devices to which the disclosed aspects apply, as particular computing devices may include more or less components than those shown, or may combine certain components, or have a different arrangement of components.
In some embodiments, there is provided a computer device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, the processor implementing the following steps when executing the computer program:
acquiring historical text data input by a user through a man-machine conversation system, wherein each historical text data has an intention label representing the intention of the user;
inputting each historical text data into an initial recognition model;
extracting semantic features of the historical text data through an initial recognition model;
fusing the semantic features to obtain multi-dimensional semantic features corresponding to the intention labels;
and training the initial recognition model according to the multi-dimensional semantic features and the intention labels corresponding to the multi-dimensional semantic features to obtain an intention recognition model; or,
acquiring data to be identified input by a user through a man-machine conversation system;
and performing intention recognition on the data to be recognized through an intention recognition model to obtain a recognition result for representing the intention of the user, wherein the intention recognition model is trained by using the method described in S210 to S250.
In some embodiments, the processor, when executing the computer program, further performs the steps of: the initial recognition model is constructed based on an Encoder-Decoder basic framework and comprises a fully connected layer; the semantic features comprise depth semantic features and potential semantic features; and fusing the semantic features to obtain the multi-dimensional semantic features corresponding to the intention labels comprises: fusing the depth semantic features and the potential semantic features through the fully connected layer to obtain the multi-dimensional semantic features corresponding to the intention labels.
In some embodiments, the processor, when executing the computer program, further performs the steps of: the initial recognition model further comprises a global word vector network, a BiGRU neural network, an Attention neural network, and a Softmax function layer; and extracting semantic features of the historical text data through the initial recognition model comprises: converting the historical text data through the GloVe global word vector network to obtain a historical text data vector; extracting a hidden semantic feature vector of the historical text through the BiGRU neural network according to the historical text data vector; acquiring the attention score of the hidden semantic feature vector through the Attention neural network; calculating an attention weight value of the hidden semantic feature vector through the Softmax function layer according to the attention score; and extracting the depth semantic features according to the attention weight value of the hidden semantic feature vector and the hidden semantic feature vector.
In some embodiments, the processor, when executing the computer program, further performs the steps of: the initial recognition model further comprises a convolutional layer and a global maximum pooling layer of a convolutional neural network; and extracting semantic features of the historical text data through the initial recognition model comprises: performing feature extraction on the hidden semantic feature vector through the convolutional layer to obtain a local semantic feature vector; and sampling the local semantic feature vector through the global maximum pooling layer to obtain the potential semantic features.
In some embodiments, the processor, when executing the computer program, further performs the steps of: acquiring historical text data input by a user through a man-machine conversation system comprises: acquiring original text data input by the user through the man-machine conversation system; performing text word segmentation and stop-word removal on the original text data to obtain a text data set; and acquiring the historical text data from the text data set by a stratified sampling method.
In some embodiments, there is provided a computer readable storage medium having stored thereon a computer program which when executed by a processor performs the steps of:
acquiring historical text data input by a user through a man-machine conversation system, wherein each historical text data has an intention label representing the intention of the user;
inputting each historical text data into an initial recognition model;
extracting semantic features of the historical text data through an initial recognition model;
fusing the semantic features to obtain multi-dimensional semantic features corresponding to the intention labels;
and training the initial recognition model according to the multi-dimensional semantic features and the intention labels corresponding to the multi-dimensional semantic features to obtain an intention recognition model; or,
acquiring data to be identified input by a user through a man-machine conversation system;
and performing intention recognition on the data to be recognized through an intention recognition model to obtain a recognition result for representing the intention of the user, wherein the intention recognition model is trained by using the method described in S210 to S250.
In some embodiments, the computer program, when executed by the processor, further performs the steps of: the initial recognition model is constructed based on an Encoder-Decoder basic framework and comprises a fully connected layer; the semantic features comprise depth semantic features and potential semantic features; and fusing the semantic features to obtain the multi-dimensional semantic features corresponding to the intention labels comprises: fusing the depth semantic features and the potential semantic features through the fully connected layer to obtain the multi-dimensional semantic features corresponding to the intention labels.
In some embodiments, the computer program, when executed by the processor, further performs the steps of: the initial recognition model further comprises a global word vector network, a BiGRU neural network, an Attention neural network, and a Softmax function layer; and extracting semantic features of the historical text data through the initial recognition model comprises: converting the historical text data through the GloVe global word vector network to obtain a historical text data vector; extracting a hidden semantic feature vector of the historical text through the BiGRU neural network according to the historical text data vector; acquiring the attention score of the hidden semantic feature vector through the Attention neural network; calculating an attention weight value of the hidden semantic feature vector through the Softmax function layer according to the attention score; and extracting the depth semantic features according to the attention weight value of the hidden semantic feature vector and the hidden semantic feature vector.
In some embodiments, the computer program, when executed by the processor, further performs the steps of: the initial recognition model further comprises a convolutional layer and a global maximum pooling layer of a convolutional neural network; and extracting semantic features of the historical text data through the initial recognition model comprises: performing feature extraction on the hidden semantic feature vector through the convolutional layer to obtain a local semantic feature vector; and sampling the local semantic feature vector through the global maximum pooling layer to obtain the potential semantic features.
In some embodiments, the computer program, when executed by the processor, further performs the steps of: acquiring historical text data input by a user through a man-machine conversation system comprises: acquiring original text data input by the user through the man-machine conversation system; performing text word segmentation and stop-word removal on the original text data to obtain a text data set; and acquiring the historical text data from the text data set by a stratified sampling method.
Those skilled in the art will understand that all or part of the processes of the methods of the above embodiments may be implemented by a computer program instructing the relevant hardware; the program may be stored in a non-volatile computer-readable storage medium and, when executed, may include the processes of the above method embodiments. Any reference to memory, storage, database, or other media used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of the technical features in the above embodiments are described; however, as long as a combination of these technical features contains no contradiction, it should be considered within the scope of this specification.
The above embodiments express only several implementations of the present application, and their description is specific and detailed, but they should not therefore be construed as limiting the scope of the invention. It should be noted that those of ordinary skill in the art may make several variations and improvements without departing from the concept of the present application, all of which fall within the protection scope of the present application. Therefore, the protection scope of this patent shall be subject to the appended claims.

Claims (10)

1. A method of training an intention recognition model, the method comprising:
acquiring historical text data input by a user through a man-machine conversation system, wherein each piece of historical text data carries an intention label representing the user's intention;
inputting each piece of the historical text data into an initial recognition model;
extracting semantic features of the historical text data through the initial recognition model;
fusing the semantic features to obtain multi-dimensional semantic features corresponding to the intention labels;
and training the initial recognition model according to the multi-dimensional semantic features and the intention labels corresponding to the multi-dimensional semantic features to obtain an intention recognition model.
2. The method of claim 1, wherein the initial recognition model is constructed based on an Encoder-Decoder basic framework and comprises a fully connected layer; the semantic features comprise deep semantic features and latent semantic features; and the fusing the semantic features to obtain the multi-dimensional semantic features corresponding to the intention labels comprises:
fusing the deep semantic features and the latent semantic features through the fully connected layer to obtain the multi-dimensional semantic features corresponding to the intention labels.
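To make the fusion step concrete, a minimal sketch combining the two extractors from the earlier embodiments through a fully connected layer might read as follows; the fused dimension and the cross-entropy classification head are assumptions, since the claim specifies only that the fully connected layer fuses the two feature views:

```python
class IntentRecognitionModel(nn.Module):
    """Fuses deep and latent semantic features through a fully connected layer."""

    def __init__(self, vocab_size, hidden_dim=128, num_filters=64, num_intents=10):
        super().__init__()
        self.deep_branch = DeepSemanticExtractor(vocab_size, hidden_dim=hidden_dim)
        self.latent_branch = LatentSemanticExtractor(hidden_dim=hidden_dim,
                                                     num_filters=num_filters)
        # Fully connected layer produces the multi-dimensional fused semantic feature.
        self.fuse = nn.Linear(2 * hidden_dim + num_filters, hidden_dim)
        self.classifier = nn.Linear(hidden_dim, num_intents)

    def forward(self, token_ids):
        deep, h = self.deep_branch(token_ids)     # attention-weighted deep features
        latent = self.latent_branch(h)            # conv + global-max-pool latent features
        fused = torch.relu(self.fuse(torch.cat([deep, latent], dim=1)))
        return self.classifier(fused)             # intent logits; train with nn.CrossEntropyLoss
```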
3. The method of claim 2, wherein the initial recognition model further comprises a global word vector network, a BiGRU neural network, an Attention neural network, and a Softmax function layer; the extracting semantic features of the historical text data through the initial recognition model comprises:
converting the historical text data through the global word vector network to obtain a historical text data vector;
extracting a hidden semantic feature vector of the historical text through the BiGRU neural network according to the historical text data vector;
acquiring an Attention score of the hidden semantic feature vector through the Attention neural network;
according to the attention score of the hidden semantic feature vector, calculating an attention weight value of the hidden semantic feature vector through the Softmax function layer;
and extracting the deep semantic features according to the attention weight value of the hidden semantic feature vector and the hidden semantic feature vector.
4. The method of claim 3, wherein the initial recognition model further comprises a convolutional layer and a global max pooling layer of a convolutional neural network; and the extracting semantic features of the historical text data through the initial recognition model comprises:
performing feature extraction on the hidden semantic feature vector through the convolutional layer to obtain a local semantic feature vector;
and sampling the local semantic feature vector through the global max pooling layer to obtain the latent semantic features.
5. The method of claim 1, wherein the acquiring historical text data input by a user through a man-machine conversation system comprises:
acquiring original text data input by a user through a man-machine conversation system;
performing text word segmentation and stop-word removal on the original text data to obtain a text data set;
and acquiring the historical text data from the text data set by a stratified sampling method.
6. A method of intention recognition, the method comprising:
acquiring data to be recognized input by a user through a man-machine conversation system;
and performing intention recognition on the data to be recognized through an intention recognition model to obtain a recognition result for representing the intention of a user, wherein the intention recognition model is trained by using the method according to any one of claims 1 to 5.
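At inference time, the trained model could be applied to a single utterance roughly as sketched below, building on the same PyTorch sketches above; `tokenizer` stands in for whatever text-to-id mapping was used during training and is hypothetical here:

```python
def recognize_intent(model, tokenizer, text, intent_names):
    """Maps one user utterance to the highest-scoring intent label (sketch)."""
    model.eval()
    with torch.no_grad():
        token_ids = torch.tensor([tokenizer(text)])   # wrap ids in a batch of size 1
        logits = model(token_ids)
        return intent_names[logits.argmax(dim=1).item()]
```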
7. An apparatus for training an intention recognition model, the apparatus comprising:
an acquisition module, configured to acquire historical text data input by a user through a man-machine conversation system, wherein each piece of historical text data carries an intention label representing the user's intention;
an input module, configured to input each piece of the historical text data into an initial recognition model;
an extraction module, configured to extract semantic features of the historical text data through the initial recognition model;
a fusion module, configured to fuse the semantic features to obtain multi-dimensional semantic features corresponding to the intention labels;
and a training module, configured to train the initial recognition model according to the multi-dimensional semantic features and the intention labels corresponding to the multi-dimensional semantic features to obtain an intention recognition model.
8. An apparatus for intention recognition, the apparatus comprising:
an acquisition module, configured to acquire data to be recognized input by a user through a man-machine conversation system;
a recognition module, configured to recognize the data to be recognized through an intention recognition model, so as to obtain a recognition result for characterizing an intention of a user, where the intention recognition model is trained by using the method according to any one of claims 1 to 5.
9. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the steps of the method of any one of claims 1 to 5 or the steps of the method of claim 6 when executing the computer program.
10. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 5 or carries out the steps of the method of claim 6.
CN202111273471.6A 2021-10-29 2021-10-29 Method and device for training intention recognition model and method and device for recognizing intention Pending CN114139551A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202111273471.6A CN114139551A (en) 2021-10-29 2021-10-29 Method and device for training intention recognition model and method and device for recognizing intention
CA3180493A CA3180493A1 (en) 2021-10-29 2022-10-31 Training method and device of intention recognition model and intention recognition method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111273471.6A CN114139551A (en) 2021-10-29 2021-10-29 Method and device for training intention recognition model and method and device for recognizing intention

Publications (1)

Publication Number Publication Date
CN114139551A true CN114139551A (en) 2022-03-04

Family

ID=80395112

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111273471.6A Pending CN114139551A (en) 2021-10-29 2021-10-29 Method and device for training intention recognition model and method and device for recognizing intention

Country Status (2)

Country Link
CN (1) CN114139551A (en)
CA (1) CA3180493A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116611429A (en) * 2023-04-25 2023-08-18 上海任意门科技有限公司 Intention recognition method and device, electronic equipment and storage medium

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117648930A (en) * 2023-11-22 2024-03-05 平安创科科技(北京)有限公司 Combined task realization method, device, equipment and medium
CN117435738B (en) * 2023-12-19 2024-04-16 中国人民解放军国防科技大学 Text multi-intention analysis method and system based on deep learning

Also Published As

Publication number Publication date
CA3180493A1 (en) 2023-04-29

Similar Documents

Publication Publication Date Title
CN110765265B (en) Information classification extraction method and device, computer equipment and storage medium
CN110598206B (en) Text semantic recognition method and device, computer equipment and storage medium
CN109858010B (en) Method and device for recognizing new words in field, computer equipment and storage medium
CN108427707B (en) Man-machine question and answer method, device, computer equipment and storage medium
CN111444723B (en) Information extraction method, computer device, and storage medium
CN110909137A (en) Information pushing method and device based on man-machine interaction and computer equipment
CN111783394B (en) Training method of event extraction model, event extraction method, system and equipment
CN111930942B (en) Text classification method, language model training method, device and equipment
CN114139551A (en) Method and device for training intention recognition model and method and device for recognizing intention
CN113094578B (en) Deep learning-based content recommendation method, device, equipment and storage medium
CN111191032B (en) Corpus expansion method, corpus expansion device, computer equipment and storage medium
CN112052684A (en) Named entity identification method, device, equipment and storage medium for power metering
CN115599901B (en) Machine question-answering method, device, equipment and storage medium based on semantic prompt
CN111191457A (en) Natural language semantic recognition method and device, computer equipment and storage medium
CN112966068A (en) Resume identification method and device based on webpage information
CN114298035A (en) Text recognition desensitization method and system thereof
CN113887229A (en) Address information identification method and device, computer equipment and storage medium
EP4361843A1 (en) Neural network searching method and related device
CN111191028A (en) Sample labeling method and device, computer equipment and storage medium
CN113051887A (en) Method, system and device for extracting announcement information elements
CN115827819A (en) Intelligent question and answer processing method and device, electronic equipment and storage medium
CN113392265A (en) Multimedia processing method, device and equipment
CN111368066B (en) Method, apparatus and computer readable storage medium for obtaining dialogue abstract
CN113449081A (en) Text feature extraction method and device, computer equipment and storage medium
CN113673225A (en) Method and device for judging similarity of Chinese sentences, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination