CN110968671A - Intent determination method and device based on Bert - Google Patents
Intent determination method and device based on Bert
- Publication number
- CN110968671A CN110968671A CN201911219821.3A CN201911219821A CN110968671A CN 110968671 A CN110968671 A CN 110968671A CN 201911219821 A CN201911219821 A CN 201911219821A CN 110968671 A CN110968671 A CN 110968671A
- Authority
- CN
- China
- Prior art keywords
- target
- vector
- hidden layer
- fully
- bert
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/334—Query execution
- G06F16/3344—Query execution using natural language analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/18—Speech classification or search using natural language modelling
- G10L15/183—Speech classification or search using natural language modelling using context dependencies, e.g. language models
Abstract
The invention discloses a Bert-based intent determination method applied to a target Bert model, in which the fully-connected hidden layer of the Bert model is replaced with a non-fully-connected hidden layer. The method comprises the following steps: determining a mask vector in an input sentence and taking the mask vector as a prediction target; obtaining a target vector corresponding to the prediction target based on the non-fully-connected hidden layer; and performing multi-intent recognition on the target vector to determine the target intent of the input sentence. In this determination method, the target Bert model replaces the fully-connected hidden layer of the Bert model with a non-fully-connected hidden layer. The fully-connected hidden layer multiplies the amount of computation, whereas the non-fully-connected hidden layer reduces the structural complexity of the model and requires less computation time, thereby reducing the time overhead of intent prediction based on the Bert model.
Description
Technical Field
The invention relates to the technical field of speech recognition, and in particular to a Bert-based intent determination method and device.
Background
The Bert model (Bidirectional Encoder Representations from Transformers) is a new type of language model. It pre-trains deep bidirectional representations by jointly conditioning the bidirectional Transformers in all layers.
The core of the Bert algorithm is a 24-layer Transformer feature-extraction neural network, jointly trained on masked language modeling (Masked LM) and context prediction. Google trained it on a corpus of 3.3 billion words; while this yields excellent algorithm performance, the time spent on model training and use is enormous.
Therefore, in the process of intent prediction based on the Bert model, the time overhead of model training and use is very large: training takes 3 days on a cluster of 16 TPUs, or about 3 months on an ordinary GPU, and during use engineers need to fine-tune the Bert model on the data of each specific NLP task, so the computation cost is huge.
Disclosure of Invention
In view of the above, the present invention provides a Bert-based intent determination method and apparatus to solve the prior-art problem that, although the Bert model achieves excellent algorithm performance, the time overhead of model training and use is very large: training takes 3 days on a cluster of 16 TPUs or about 3 months on an ordinary GPU, and during use engineers need to fine-tune the Bert model on the data of each specific NLP task, so the computation cost is also huge. The specific scheme is as follows:
A Bert-based intent determination method, applied to a target Bert model in which the fully-connected hidden layer of the Bert model is replaced with a non-fully-connected hidden layer, comprises the following steps:
determining a mask vector in an input sentence, and taking the mask vector as a prediction target;
obtaining a target vector corresponding to the prediction target based on the non-fully-connected hidden layer;
and performing multi-intent recognition on the target vector to determine the target intent of the input sentence.
Optionally, in the foregoing method, obtaining the target vector corresponding to the prediction target based on the non-fully-connected hidden layer includes:
obtaining each vector output for the prediction target based on the non-fully-connected hidden layer;
extracting the feature value of each vector, and segmenting the feature values according to a preset step length;
and taking the vector corresponding to the maximum feature value in each segment as the target vector of that segment.
Optionally, in the foregoing method, determining a mask vector in an input sentence and taking the mask vector as a prediction target includes:
performing word segmentation on the input sentence;
mapping the segmentation result to token vectors according to a vocabulary;
and randomly selecting mask vectors at a preset coverage rate from the token vectors and taking the mask vectors as prediction targets.
Optionally, in the foregoing method, performing multi-intent recognition on the target vector to determine the target intent of the input sentence includes:
performing a state operation on the target vector to obtain a target vector to be analyzed;
obtaining each candidate intent corresponding to the target vector to be analyzed, and passing the target vector to be analyzed to a Softmax prediction function to predict each candidate intent and obtain a probability value for each candidate intent;
and selecting the candidate intent with the highest probability value as the target intent of the input sentence.
Optionally, the foregoing method further includes, before obtaining the target vector corresponding to the prediction target based on the non-fully-connected hidden layer:
providing candidate answers for the prediction target.
Optionally, in the foregoing method, determining a mask vector in an input sentence and taking the mask vector as a prediction target further includes:
deleting, in the case where the input sentence is a single sentence, the classification method provided for sentence-level features in the Bert model.
A Bert-based intent determination device, applied to a target Bert model in which the fully-connected hidden layer of the Bert model is replaced with a non-fully-connected hidden layer, comprises:
a prediction target determining module, configured to determine a mask vector in an input sentence and take the mask vector as a prediction target;
an obtaining module, configured to obtain a target vector corresponding to the prediction target based on the non-fully-connected hidden layer;
and a target intent determining module, configured to perform multi-intent recognition on the target vector and determine the target intent of the input sentence.
Optionally, in the above apparatus, the obtaining module includes:
an obtaining unit, configured to obtain each vector output for the prediction target based on the non-fully-connected hidden layer;
an extracting and segmenting unit, configured to extract the feature values of the vectors and segment the feature values according to a preset step length;
and a determining unit, configured to take the vector corresponding to the maximum feature value in each segment as the target vector of that segment.
A storage medium comprising a stored program, wherein the program performs the Bert-based intent determination method described above.
A processor for executing a program, wherein the program when executed performs the Bert based intent determination method described above.
Compared with the prior art, the invention has the following advantages:
the invention discloses a Bert-based intent determination method applied to a target Bert model, in which the fully-connected hidden layer of the Bert model is replaced with a non-fully-connected hidden layer. The method comprises the following steps: determining a mask vector in an input sentence and taking the mask vector as a prediction target; obtaining a target vector corresponding to the prediction target based on the non-fully-connected hidden layer; and performing multi-intent recognition on the target vector to determine the target intent of the input sentence. In this determination method, the target Bert model replaces the fully-connected hidden layer of the Bert model with a non-fully-connected hidden layer. The fully-connected hidden layer multiplies the amount of computation, whereas the non-fully-connected hidden layer reduces the structural complexity of the model and requires less computation time, thereby reducing the time overhead of intent prediction based on the Bert model.
Of course, it is not necessary for any product in which the invention is practiced to achieve all of the above-described advantages at the same time.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. It is obvious that the drawings in the following description are only some embodiments of the present invention, and that those skilled in the art can derive other drawings from them without creative effort.
FIG. 1 is a schematic structural diagram of a prior art Bert model;
FIG. 2 is a flowchart of an intent determination method based on Bert disclosed in an embodiment of the present application;
FIG. 3 is another flowchart of a Bert-based intent determination method disclosed in an embodiment of the present application;
fig. 4 is a block diagram of a Bert-based intent determination apparatus according to an embodiment of the present disclosure.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
The invention discloses a Bert-based intent determination method and device, applied in the process of determining the intent of an input sentence with the Bert algorithm. The core of the Bert algorithm is a 24-layer Transformer feature-extraction neural network, jointly trained on Masked LM and context prediction. Google trained it on a corpus of 3.3 billion words; while this yields excellent algorithm performance, the time spent on model training and use is enormous. The Bert model is one of the pre-training/fine-tuning models: it is first pre-trained on unlabeled corpora, and during use it must be fine-tuned on corpora from the application domain. The structure of the Bert model is shown in fig. 1, where TRM is a Transformer module in the Bert model, En is an input of the Bert model, that is, the input sentence, and Tn is the result of feature extraction by the Transformer. The Transformer is a novel feature-extraction tool that replaces the original sequential structure with an attention mechanism, achieves better parallelization, and can handle the long-range dependency problem.
The execution flow of the Bert-based intent determination method is shown in fig. 2. The method is applied to a target Bert model in which the fully-connected hidden layer of the Bert model is replaced with a non-fully-connected hidden layer, and comprises the following steps:
S101, determining a mask vector in an input sentence, and taking the mask vector as a prediction target;
In the embodiment of the invention, the input sentence can be fed into the Bert model only after vectorization. The input sentence is first word-segmented and then mapped, according to a vocabulary, to randomly initialized token vectors. The vocabulary is constructed during historical training and is preferably updated whenever a preset duration elapses, an update instruction is received, or another update condition is met. At the same time, 15% of the token vectors are randomly selected as mask vectors and used as the prediction targets of the model.
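The masking step above can be sketched in Python. This is an illustrative sketch only, assuming token ids and a `[MASK]` id; the function name and parameters are not the patent's implementation:

```python
import random

def mask_tokens(token_ids, mask_id, coverage=0.15, seed=None):
    """Randomly select a preset coverage (15% here) of token positions as
    mask vectors; the masked positions become the model's prediction targets."""
    rng = random.Random(seed)
    n_mask = max(1, int(len(token_ids) * coverage))
    positions = rng.sample(range(len(token_ids)), n_mask)
    masked = list(token_ids)
    for pos in positions:
        masked[pos] = mask_id          # replace with the [MASK] token id
    return masked, sorted(positions)   # masked sequence + prediction targets
```

A real implementation would work on the randomly initialized token vectors rather than plain ids, but the selection logic is the same.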
S102, obtaining a target vector corresponding to the prediction target based on the non-fully-connected hidden layer;
In the embodiment of the invention, the target Bert model replaces the fully-connected hidden layer of the Bert model with a non-fully-connected hidden layer. The Bert model has some 300 million parameters, so a large amount of computation is needed for every input vector. The parameters are so numerous because of the fully-connected structure between the hidden layers of the neural network: full connection strengthens the nonlinear fitting capacity of the model, but at the same time multiplies its amount of computation. Therefore, in the embodiment of the present invention, the fully-connected hidden layer is replaced with a non-fully-connected hidden layer, a maxpool (max-pooling) method is used to reduce the dimension of each hidden-state vector output during prediction, and the resulting vectors are combined by a vector calculation to obtain the target vector. The vector calculation may be chosen according to the specific situation, such as direct summation or summation according to a preset rule, and is not limited in the embodiment of the present invention.
S103, performing multi-intent recognition on the target vector, and determining the target intent of the input sentence.
In the embodiment of the present invention, the purpose of the intent recognition task is fine-tuning, and the purpose of fine-tuning is to let the target Bert model learn a specific task. For example, if the input sentence is "how is the weather today?", the intent is a weather query; if the user later asks a similar sentence, the target Bert model will know that it falls under the weather-query intent.
The multi-intent recognition task is executed as follows: a state operation is performed on each target vector to obtain the target vectors to be analyzed, where each target vector to be analyzed is pre-assigned corresponding candidate intents (the specific assignment principle may be set empirically). Each candidate intent corresponding to the target vector to be analyzed is obtained, the target vector to be analyzed is passed to a softmax prediction function to predict each candidate intent, and the target Bert model is trained with back-propagation and a gradient descent algorithm. The softmax prediction function is:
y'_i = e^(z_i) / Σ_{j=1..n} e^(z_j)
wherein: y'_i — the prediction probability of candidate intent i;
z_i — the score the model outputs for candidate intent i;
n — the number of candidate intent categories.
In the embodiment of the invention, if there are 5 candidate intent categories, the target Bert model obtains a prediction probability for each category. Softmax is a normalization function: it converts the raw scores into decimals in the interval (0, 1), mapping the data into the range 0-1 for more convenient processing. After normalization, the category with the maximum probability value is the target intent of the model. Error calculation is then performed with the prediction probabilities, using the cross-entropy formula:
L = - Σ_{i=1..n} y_i · log(y'_i)
wherein: n is the number of categories, y_i is the true value (0 or 1) for category i, and y'_i is the prediction probability of that category.
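The softmax normalization and error calculation above can be written as a small self-contained sketch; the formulas are standard, but the score values here are made-up example inputs:

```python
import math

def softmax(scores):
    """Normalize raw scores into prediction probabilities in (0, 1) that sum to 1."""
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(true_onehot, probs):
    """Error calculation L = -sum_i y_i * log(y'_i), with y_i in {0, 1}."""
    return -sum(y * math.log(p) for y, p in zip(true_onehot, probs))

# Example with 5 candidate intent categories, as in the passage above.
probs = softmax([2.0, 1.0, 0.5, 0.2, 0.1])
target_intent = max(range(len(probs)), key=lambda i: probs[i])  # max-probability category
loss = cross_entropy([1, 0, 0, 0, 0], probs)
```

The category with the maximum probability is taken as the target intent, exactly as described in the text.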
Gradient descent calculation method:
θ ← θ - α · ∂L/∂θ
where θ is the parameter to be updated, α is the learning rate, and ∂L/∂θ is the gradient of the loss with respect to the parameter.
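A minimal sketch of the gradient-descent update over a flat list of parameters (the names and values are illustrative assumptions):

```python
def sgd_step(theta, grads, alpha=0.01):
    """One gradient-descent update: theta <- theta - alpha * dL/dtheta."""
    return [t - alpha * g for t, g in zip(theta, grads)]

# Each parameter moves against its gradient, scaled by the learning rate.
updated = sgd_step([1.0, 2.0], [0.5, -0.5], alpha=0.1)
```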
The invention discloses a Bert-based intent determination method applied to a target Bert model, in which the fully-connected hidden layer of the Bert model is replaced with a non-fully-connected hidden layer. The method comprises the following steps: determining a mask vector in an input sentence and taking the mask vector as a prediction target; obtaining a target vector corresponding to the prediction target based on the non-fully-connected hidden layer; and performing multi-intent recognition on the target vector to determine the target intent of the input sentence. In this determination method, the target Bert model replaces the fully-connected hidden layer of the Bert model with a non-fully-connected hidden layer. The fully-connected hidden layer multiplies the amount of computation, whereas the non-fully-connected hidden layer reduces the structural complexity of the model and requires less computation time, thereby reducing the time overhead of intent prediction based on the Bert model.
In the embodiment of the present invention, it is noted that the prior art does not provide candidate answers for the prediction target: the traditional training mode is context prediction, in which the next possible word is predicted from the context, which is equivalent to randomly selecting a word from the whole vocabulary. Because the Bert model converges slowly when extra information is scarce, the prediction cycle is long; therefore prediction is performed here by providing candidate answers for the prediction target. The specific process is as follows: the target Bert model uses Masked LM in training, which leaks information in a manner similar to a fill-in-the-blank exercise. At prediction time, several candidate answers (including the correct answer) are first randomly selected and provided to the target Bert model; selecting the correct answer from the candidate answers helps the target Bert model converge quickly.
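The candidate-answer step above can be sketched as drawing a few distractors from the vocabulary and mixing in the correct answer. This is an illustrative sketch; the function name and the candidate-set size are assumptions, not the patent's implementation:

```python
import random

def build_candidates(correct_id, vocab_ids, n_candidates=5, seed=None):
    """Provide a small candidate set (including the correct answer) so the
    model predicts over it instead of over the whole vocabulary."""
    rng = random.Random(seed)
    distractors = [i for i in vocab_ids if i != correct_id]
    candidates = rng.sample(distractors, n_candidates - 1) + [correct_id]
    rng.shuffle(candidates)   # hide the correct answer's position
    return candidates
```

Scoring only a handful of candidates instead of the full vocabulary is what shrinks the prediction problem and, per the text, speeds up convergence.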
Furthermore, to keep the target Bert model from learning too little, in addition to applying Masked LM to the input sentence, 10% masking is randomly applied between the Transformer layers to increase the amount of knowledge the model learns.
In the embodiment of the present invention, S101 corresponds to the pre-training task in the prior art and aims to let the Bert model learn grammatical knowledge.
In the embodiment of the present invention, a flow of a method for obtaining a target vector corresponding to the predicted target based on the non-fully-connected hidden layer is shown in fig. 3, and includes the steps of:
S201, obtaining each vector output for the prediction target based on the non-fully-connected hidden layer;
In the embodiment of the present invention, the prediction target yields output vectors after passing through the non-fully-connected hidden layer, and each of these vectors is obtained.
S202, extracting the feature values of the vectors, and segmenting the feature values according to a preset step length;
In the embodiment of the present invention, the feature values of the vectors are extracted and, after extraction, the feature values are segmented with a preset step length. The preset step length may be set according to experience or the specific situation, and its value is not limited here.
S203, taking the vector corresponding to the maximum feature value in each segment as the target vector of that segment.
In the embodiment of the present invention, the vector corresponding to the maximum feature value in each segment is used as the target vector of that segment. For example, with a preset step length of 6, the original 768-dimensional vector of each hidden layer can be reduced to 128 dimensions. Because the feature dimension of the hidden layers is reduced, the corresponding weight matrices shrink accordingly, the amount of computation and the training time also decrease, and the main information of each token is still retained.
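The segment-wise max-pooling of S202-S203 can be sketched on a plain list of feature values; with a step of 6, a 768-dimensional hidden vector reduces to 768 / 6 = 128 dimensions (illustrative sketch only):

```python
def maxpool_reduce(features, step=6):
    """Segment the feature values by a preset step length and keep the
    maximum of each segment, e.g. 768 dims -> 768 / 6 = 128 dims."""
    assert len(features) % step == 0, "length must be divisible by the step"
    return [max(features[i:i + step]) for i in range(0, len(features), step)]

reduced = maxpool_reduce(list(range(768)), step=6)  # 128 segment maxima
```

This matches the dimension-reduction arithmetic in the text: the feature dimension shrinks by the step factor, and only the strongest value per segment is kept.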
In the embodiment of the invention, based on the above intent determination method, the information-free mask prediction in the Bert model is changed into prediction with candidate answers provided, which reduces the learning difficulty of the model. At the same time, a non-fully-connected hidden layer replaces the fully-connected hidden layer, reducing the structural complexity of the model. This optimizes the time overhead of the Bert algorithm so that it meets the requirements of an industrial production environment.
In the embodiment of the invention, the Bert model provides segment embeddings for sentence-level features and adds a sentence-level binary classification task.
In the embodiment of the invention, 700,000 training samples and 10,000 test samples, divided into 9 categories, were used. They were analyzed with the Bert model and with the target Bert model respectively, and the results are shown in table 1:
TABLE 1
With the target Bert model, the time required to train the model is greatly reduced, while the loss in model precision remains within an acceptable range.
Based on the above Bert-based intent determination method, an embodiment of the present invention further provides a Bert-based intent determination device, applied to a target Bert model in which the fully-connected hidden layer of the Bert model is replaced with a non-fully-connected hidden layer. A structural block diagram of the device is shown in fig. 4, and it comprises:
a prediction target determining module 301, an obtaining module 302 and a target intent determining module 303.
Wherein,
the prediction target determining module 301 is configured to determine a mask vector in an input sentence and take the mask vector as a prediction target;
the obtaining module 302 is configured to obtain a target vector corresponding to the prediction target based on the non-fully-connected hidden layer;
the target intent determining module 303 is configured to perform multi-intent recognition on the target vector and determine the target intent of the input sentence.
The invention discloses a Bert-based intent determination method applied to a target Bert model, in which the fully-connected hidden layer of the Bert model is replaced with a non-fully-connected hidden layer. The method comprises the following steps: determining a mask vector in an input sentence and taking the mask vector as a prediction target; obtaining a target vector corresponding to the prediction target based on the non-fully-connected hidden layer; and performing multi-intent recognition on the target vector to determine the target intent of the input sentence. In this determination method, the target Bert model replaces the fully-connected hidden layer of the Bert model with a non-fully-connected hidden layer. The fully-connected hidden layer multiplies the amount of computation, whereas the non-fully-connected hidden layer reduces the structural complexity of the model and requires less computation time, thereby reducing the time overhead of intent prediction based on the Bert model.
In this embodiment of the present invention, the obtaining module 302 includes:
an acquisition unit 304, an extraction and segmentation unit 305 and a determination unit 306.
Wherein,
the obtaining unit 304 is configured to obtain each vector output for the prediction target based on the non-fully-connected hidden layer;
the extracting and segmenting unit 305 is configured to extract the feature values of the vectors and segment the feature values according to a preset step length;
the determining unit 306 is configured to take the vector corresponding to the maximum feature value in each segment as the target vector of that segment.
The device comprises a processor and a memory. The prediction target determining module, the obtaining module, the target intent determining module and the like are stored in the memory as program units, and the processor executes the program units stored in the memory to realize the corresponding functions.
The processor comprises a kernel, and the kernel calls the corresponding program unit from the memory. One or more kernels may be provided. The kernel provides candidate answers for the prediction target and performs dimension reduction on each hidden-state vector, thereby reducing the learning complexity and the structural complexity of the Bert model respectively, and so reducing the time overhead of the Bert algorithm.
The memory may include volatile memory in a computer-readable medium, random access memory (RAM), and/or nonvolatile memory such as read-only memory (ROM) or flash memory (flash RAM); the memory includes at least one memory chip.
An embodiment of the present invention provides a storage medium on which a program is stored, the program implementing the intent determination method when executed by a processor.
The embodiment of the invention provides a processor, which is used for running a program, wherein the intention determination method is executed when the program runs.
An embodiment of the invention provides a device comprising a processor, a memory, and a program stored on the memory and runnable on the processor, wherein the processor, when executing the program, realizes the following steps:
determining a mask vector in an input sentence, and taking the mask vector as a prediction target;
obtaining a target vector corresponding to the prediction target based on the non-fully-connected hidden layer;
and performing multi-intent recognition on the target vector to determine the target intent of the input sentence.
Optionally, in the foregoing method, obtaining the target vector corresponding to the prediction target based on the non-fully-connected hidden layer includes:
obtaining each vector output for the prediction target based on the non-fully-connected hidden layer;
extracting the feature value of each vector, and segmenting the feature values according to a preset step length;
and taking the vector corresponding to the maximum feature value in each segment as the target vector of that segment.
Optionally, in the foregoing method, determining a mask vector in an input sentence and taking the mask vector as a prediction target includes:
performing word segmentation on the input sentence;
mapping the segmentation result to token vectors according to a vocabulary;
and randomly selecting mask vectors at a preset coverage rate from the token vectors and taking the mask vectors as prediction targets.
Optionally, in the foregoing method, performing multi-intent recognition on the target vector to determine the target intent of the input sentence includes:
performing a state operation on the target vector to obtain a target vector to be analyzed;
obtaining each candidate intent corresponding to the target vector to be analyzed, and passing the target vector to be analyzed to a Softmax prediction function to predict each candidate intent and obtain a probability value for each candidate intent;
selecting the candidate intent corresponding to the highest of the probability values as the target intent of the input sentence.
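The Softmax prediction and highest-probability selection can be illustrated as below. The logits and intent labels are invented for the example; a numerically stable softmax (shifting by the maximum before exponentiating) is used, which is standard practice rather than anything specified in the patent.

```python
import math

def softmax(logits):
    """Numerically stable softmax: shift by the max before exponentiating."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def pick_intent(logits, intents):
    """Return the candidate intent with the highest probability value,
    along with the full probability distribution."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return intents[best], probs
```

Because softmax is monotonic in its input, the selected intent is simply the argmax of the logits; the probability values are still useful when a confidence threshold is applied.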
Optionally, in the foregoing method, before obtaining the target vector corresponding to the prediction target based on the non-fully-connected hidden layer, the method further includes:
providing candidate answers for the prediction target.
Optionally, in the foregoing method, determining the mask vector in the input sentence and using the mask vector as the prediction target further includes:
in the case that the input sentence is a single sentence, removing from the Bert model the classification mechanism proposed for sentence-level features.
The device herein may be a server, a PC, a tablet (PAD), a mobile phone, or the like.
The present application also provides a computer program product which, when executed on a data processing device, is adapted to execute a program that performs the following method steps:
determining a mask vector in an input sentence, and using the mask vector as a prediction target;
obtaining a target vector corresponding to the prediction target based on the non-fully-connected hidden layer;
performing multi-intent recognition on the target vector to determine the target intent of the input sentence.
Optionally, in the foregoing method, obtaining the target vector corresponding to the prediction target based on the non-fully-connected hidden layer includes:
obtaining each vector that the non-fully-connected hidden layer outputs for the prediction target;
extracting the feature value of each vector, and dividing the feature values into segments according to a preset step size;
taking the vector corresponding to the largest feature value in each segment as the target vector of that segment.
Optionally, in the foregoing method, determining the mask vector in the input sentence and using the mask vector as the prediction target includes:
performing word segmentation processing on the input sentence;
mapping the segmentation result to token vectors according to a vocabulary;
randomly selecting mask vectors at a preset coverage rate from the token vectors, and using the mask vectors as prediction targets.
Optionally, in the foregoing method, performing multi-intent recognition on the target vector to determine the target intent of the input sentence includes:
performing a state operation on the target vector to obtain a target vector to be analyzed;
obtaining each candidate intent corresponding to the target vector to be analyzed, and passing the target vector to be analyzed to a Softmax prediction function to predict each candidate intent and obtain a probability value for each candidate intent;
selecting the candidate intent corresponding to the highest of the probability values as the target intent of the input sentence.
Optionally, in the foregoing method, before obtaining the target vector corresponding to the prediction target based on the non-fully-connected hidden layer, the method further includes:
providing candidate answers for the prediction target.
Optionally, in the foregoing method, determining the mask vector in the input sentence and using the mask vector as the prediction target further includes:
in the case that the input sentence is a single sentence, removing from the Bert model the classification mechanism proposed for sentence-level features.
It should be noted that, in the present specification, the embodiments are all described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments may be referred to each other. For the device-like embodiment, since it is basically similar to the method embodiment, the description is simple, and for the relevant points, reference may be made to the partial description of the method embodiment.
Finally, it should also be noted that, herein, relational terms such as "first" and "second" may be used solely to distinguish one entity or action from another, without necessarily requiring or implying any actual such relationship or order between those entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such a process, method, article, or apparatus. Without further limitation, an element introduced by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or apparatus that comprises the element.
For convenience of description, the above devices are described as being divided into various units by function, and are described separately. Of course, the functions of the units may be implemented in the same software and/or hardware or in a plurality of software and/or hardware when implementing the invention.
From the above description of the embodiments, it is clear to those skilled in the art that the present invention can be implemented by software plus necessary general hardware platform. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which may be stored in a storage medium, such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method according to the embodiments or some parts of the embodiments.
The Bert-based intent determination method and device provided by the present invention have been described in detail above. Specific examples have been used herein to explain the principle and implementation of the invention, and the description of the above embodiments is intended only to help in understanding the method and its core idea. Meanwhile, for those skilled in the art, there may be variations in the specific implementation and the scope of application according to the idea of the present invention. In summary, the content of this specification should not be construed as limiting the present invention.
Claims (10)
1. A Bert-based intent determination method, applied to a target Bert model, wherein the target Bert model replaces a fully-connected hidden layer in the Bert model with a non-fully-connected hidden layer, the method comprising:
determining a mask vector in an input sentence, and using the mask vector as a prediction target;
obtaining a target vector corresponding to the prediction target based on the non-fully-connected hidden layer;
performing multi-intent recognition on the target vector to determine the target intent of the input sentence.
2. The method according to claim 1, wherein obtaining the target vector corresponding to the prediction target based on the non-fully-connected hidden layer comprises:
obtaining each vector that the non-fully-connected hidden layer outputs for the prediction target;
extracting the feature value of each vector, and dividing the feature values into segments according to a preset step size;
taking the vector corresponding to the largest feature value in each segment as the target vector of that segment.
3. The method according to claim 1, wherein determining a mask vector in the input sentence and using the mask vector as a prediction target comprises:
performing word segmentation processing on the input sentence;
mapping the segmentation result to token vectors according to a vocabulary;
randomly selecting mask vectors at a preset coverage rate from the token vectors, and using the mask vectors as prediction targets.
4. The method according to claim 1, wherein performing multi-intent recognition on the target vector to determine the target intent of the input sentence comprises:
performing a state operation on the target vector to obtain a target vector to be analyzed;
obtaining each candidate intent corresponding to the target vector to be analyzed, and passing the target vector to be analyzed to a Softmax prediction function to predict each candidate intent and obtain a probability value for each candidate intent;
selecting the candidate intent corresponding to the highest of the probability values as the target intent of the input sentence.
5. The method according to claim 1, wherein before obtaining the target vector corresponding to the prediction target based on the non-fully-connected hidden layer, the method further comprises:
providing candidate answers for the prediction target.
6. The method according to claim 1, wherein determining a mask vector in the input sentence and using the mask vector as a prediction target further comprises:
in the case that the input sentence is a single sentence, removing from the Bert model the classification mechanism proposed for sentence-level features.
7. A Bert-based intent determination device, applied to a target Bert model, wherein the target Bert model replaces a fully-connected hidden layer in the Bert model with a non-fully-connected hidden layer, the device comprising:
a prediction target determining module, configured to determine a mask vector in an input sentence and use the mask vector as a prediction target;
an obtaining module, configured to obtain a target vector corresponding to the prediction target based on the non-fully-connected hidden layer;
a target intent determining module, configured to perform multi-intent recognition on the target vector and determine the target intent of the input sentence.
8. The device according to claim 7, wherein the obtaining module comprises:
an obtaining unit, configured to obtain each vector that the non-fully-connected hidden layer outputs for the prediction target;
an extracting and segmenting unit, configured to extract the feature value of each vector and divide the feature values into segments according to a preset step size;
a determining unit, configured to take the vector corresponding to the largest feature value in each segment as the target vector of that segment.
9. A storage medium, comprising a stored program, wherein the program, when running, performs the Bert-based intent determination method according to any one of claims 1 to 6.
10. A processor, configured to run a program, wherein the program, when running, performs the Bert-based intent determination method according to any one of claims 1 to 6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911219821.3A CN110968671A (en) | 2019-12-03 | 2019-12-03 | Intent determination method and device based on Bert |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911219821.3A CN110968671A (en) | 2019-12-03 | 2019-12-03 | Intent determination method and device based on Bert |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110968671A true CN110968671A (en) | 2020-04-07 |
Family
ID=70032677
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201911219821.3A Pending CN110968671A (en) | 2019-12-03 | 2019-12-03 | Intent determination method and device based on Bert |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110968671A (en) |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111401077A (en) * | 2020-06-02 | 2020-07-10 | 腾讯科技(深圳)有限公司 | Language model processing method and device and computer equipment |
CN111554304A (en) * | 2020-04-25 | 2020-08-18 | 中信银行股份有限公司 | User tag obtaining method, device and equipment |
CN111783443A (en) * | 2020-06-29 | 2020-10-16 | 百度在线网络技术(北京)有限公司 | Text disturbance detection method, disturbance reduction method, disturbance processing method and device |
CN112053687A (en) * | 2020-07-31 | 2020-12-08 | 出门问问信息科技有限公司 | Voice processing method and device, computer readable storage medium and equipment |
CN112257432A (en) * | 2020-11-02 | 2021-01-22 | 北京淇瑀信息科技有限公司 | Self-adaptive intention identification method and device and electronic equipment |
CN112507704A (en) * | 2020-12-15 | 2021-03-16 | 中国联合网络通信集团有限公司 | Multi-intention recognition method, device, equipment and storage medium |
CN112989800A (en) * | 2021-04-30 | 2021-06-18 | 平安科技(深圳)有限公司 | Multi-intention identification method and device based on Bert sections and readable storage medium |
CN113571062A (en) * | 2020-04-28 | 2021-10-29 | 中国移动通信集团浙江有限公司 | Client tag identification method and device based on voice data and computing equipment |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109977428A (en) * | 2019-03-29 | 2019-07-05 | 北京金山数字娱乐科技有限公司 | A kind of method and device that answer obtains |
CN110232652A (en) * | 2019-05-27 | 2019-09-13 | 珠海格力电器股份有限公司 | Image processing engine processing method, the image processing method for terminal, terminal |
CN110276075A (en) * | 2019-06-21 | 2019-09-24 | 腾讯科技(深圳)有限公司 | Model training method, name entity recognition method, device, equipment and medium |
CN110334210A (en) * | 2019-05-30 | 2019-10-15 | 哈尔滨理工大学 | A kind of Chinese sentiment analysis method merged based on BERT with LSTM, CNN |
CN110442777A (en) * | 2019-06-24 | 2019-11-12 | 华中师范大学 | Pseudo-linear filter model information search method and system based on BERT |
- 2019-12-03: Application CN201911219821.3A filed; publication CN110968671A (status: Pending)
Cited By (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111554304A (en) * | 2020-04-25 | 2020-08-18 | 中信银行股份有限公司 | User tag obtaining method, device and equipment |
CN113571062A (en) * | 2020-04-28 | 2021-10-29 | 中国移动通信集团浙江有限公司 | Client tag identification method and device based on voice data and computing equipment |
CN111401077A (en) * | 2020-06-02 | 2020-07-10 | 腾讯科技(深圳)有限公司 | Language model processing method and device and computer equipment |
CN111401077B (en) * | 2020-06-02 | 2020-09-18 | 腾讯科技(深圳)有限公司 | Language model processing method and device and computer equipment |
CN111783443A (en) * | 2020-06-29 | 2020-10-16 | 百度在线网络技术(北京)有限公司 | Text disturbance detection method, disturbance reduction method, disturbance processing method and device |
CN111783443B (en) * | 2020-06-29 | 2023-08-15 | 百度在线网络技术(北京)有限公司 | Text disturbance detection method, disturbance recovery method, disturbance processing method and device |
CN112053687A (en) * | 2020-07-31 | 2020-12-08 | 出门问问信息科技有限公司 | Voice processing method and device, computer readable storage medium and equipment |
CN112257432A (en) * | 2020-11-02 | 2021-01-22 | 北京淇瑀信息科技有限公司 | Self-adaptive intention identification method and device and electronic equipment |
CN112507704A (en) * | 2020-12-15 | 2021-03-16 | 中国联合网络通信集团有限公司 | Multi-intention recognition method, device, equipment and storage medium |
CN112507704B (en) * | 2020-12-15 | 2023-10-03 | 中国联合网络通信集团有限公司 | Multi-intention recognition method, device, equipment and storage medium |
CN112989800A (en) * | 2021-04-30 | 2021-06-18 | 平安科技(深圳)有限公司 | Multi-intention identification method and device based on Bert sections and readable storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110968671A (en) | Intent determination method and device based on Bert | |
CN111310438B (en) | Chinese sentence semantic intelligent matching method and device based on multi-granularity fusion model | |
CN108959246A (en) | Answer selection method, device and electronic equipment based on improved attention mechanism | |
CN110929515B (en) | Reading understanding method and system based on cooperative attention and adaptive adjustment | |
CN111783474B (en) | Comment text viewpoint information processing method and device and storage medium | |
CN109948149B (en) | Text classification method and device | |
CN110598206A (en) | Text semantic recognition method and device, computer equipment and storage medium | |
CN108875074A (en) | Based on answer selection method, device and the electronic equipment for intersecting attention neural network | |
CN110232122A (en) | A kind of Chinese Question Classification method based on text error correction and neural network | |
CN111783993A (en) | Intelligent labeling method and device, intelligent platform and storage medium | |
CN110781686B (en) | Statement similarity calculation method and device and computer equipment | |
CN111160000B (en) | Composition automatic scoring method, device terminal equipment and storage medium | |
CN110275928B (en) | Iterative entity relation extraction method | |
CN110633359A (en) | Sentence equivalence judgment method and device | |
CN111583911A (en) | Speech recognition method, device, terminal and medium based on label smoothing | |
CN110874392B (en) | Text network information fusion embedding method based on depth bidirectional attention mechanism | |
CN114386409A (en) | Self-distillation Chinese word segmentation method based on attention mechanism, terminal and storage medium | |
CN110969005B (en) | Method and device for determining similarity between entity corpora | |
CN114064852A (en) | Method and device for extracting relation of natural language, electronic equipment and storage medium | |
CN110929532B (en) | Data processing method, device, equipment and storage medium | |
CN111400492B (en) | Hierarchical feature text classification method and system based on SFM-DCNN | |
CN112579739A (en) | Reading understanding method based on ELMo embedding and gating self-attention mechanism | |
CN113591988B (en) | Knowledge cognitive structure analysis method, system, computer equipment, medium and terminal | |
CN114638229A (en) | Entity identification method, device, medium and equipment of record data | |
CN114398482A (en) | Dictionary construction method and device, electronic equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
RJ01 | Rejection of invention patent application after publication |
Application publication date: 20200407 |