CN110968671A - Intent determination method and device based on Bert - Google Patents
Intent determination method and device based on Bert
- Publication number
- CN110968671A CN110968671A CN201911219821.3A CN201911219821A CN110968671A CN 110968671 A CN110968671 A CN 110968671A CN 201911219821 A CN201911219821 A CN 201911219821A CN 110968671 A CN110968671 A CN 110968671A
- Authority
- CN
- China
- Prior art keywords
- target
- vector
- hidden layer
- fully
- bert
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/334—Query execution
- G06F16/3344—Query execution using natural language analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/18—Speech classification or search using natural language modelling
- G10L15/183—Speech classification or search using natural language modelling using context dependencies, e.g. language models
Abstract
The invention discloses a Bert-based intent determination method applied to a target Bert model, in which the fully-connected hidden layer of the Bert model is replaced with a non-fully-connected hidden layer. The method comprises the following steps: determining a mask vector in an input sentence and taking the mask vector as a prediction target; obtaining a target vector corresponding to the prediction target based on the non-fully-connected hidden layer; and performing multi-intent recognition on the target vector to determine the target intent of the input sentence. In this determination method, the target Bert model replaces the fully-connected hidden layer of the Bert model with a non-fully-connected hidden layer. The fully-connected hidden layer multiplies the amount of computation, whereas the non-fully-connected hidden layer reduces the structural complexity of the model and requires less computation time, thereby reducing the time overhead of intent prediction based on the Bert model.
Description
Technical Field
The invention relates to the technical field of speech recognition, and in particular to a Bert-based intent determination method and device.
Background
The Bert model (Bidirectional Encoder Representations from Transformers) is a new type of language model. It pre-trains deep bidirectional representations by jointly conditioning the bidirectional Transformers in all layers.
The core of the Bert algorithm is a 24-layer Transformer feature-extraction neural network, jointly trained on masked language modeling (Masked LM) and context prediction. Google trained it on a corpus of 3.3 billion words; while this yields excellent algorithm performance, the time spent on model training and use is enormous.
Therefore, in the process of intent prediction based on the Bert model, the time overhead of model training and use is very large: training takes 3 days on a cluster of 16 TPUs, or about 3 months on an ordinary GPU, and during use engineers need to fine-tune the Bert model on the data of each specific NLP task, so the computation cost is huge.
Disclosure of Invention
In view of the above, the present invention provides a Bert-based intent determination method and apparatus to solve the prior-art problem that, although the Bert model achieves excellent algorithm performance, the time overhead of model training and use is very large: training takes 3 days on a cluster of 16 TPUs or about 3 months on an ordinary GPU, and during use engineers need to fine-tune the Bert model on the data of each specific NLP task, so the computation cost is also huge. The specific scheme is as follows:
A Bert-based intent determination method, applied to a target Bert model in which the fully-connected hidden layer of the Bert model is replaced with a non-fully-connected hidden layer, comprises the following steps:
determining a mask vector in an input sentence, and taking the mask vector as a prediction target;
obtaining a target vector corresponding to the prediction target based on the non-fully-connected hidden layer;
and performing multi-intent recognition on the target vector to determine the target intent of the input sentence.
Optionally, in the foregoing method, obtaining the target vector corresponding to the prediction target based on the non-fully-connected hidden layer includes:
obtaining each vector output for the prediction target based on the non-fully-connected hidden layer;
extracting the feature value of each vector, and segmenting the feature values according to a preset step length;
and taking the vector corresponding to the maximum feature value in each segment as the target vector of that segment.
Optionally, in the foregoing method, determining a mask vector in an input sentence and taking the mask vector as a prediction target includes:
performing word segmentation on the input sentence;
mapping the segmentation result to token vectors according to a vocabulary;
and randomly selecting mask vectors at a preset coverage rate from the token vectors and taking the mask vectors as prediction targets.
Optionally, in the foregoing method, performing multi-intent recognition on the target vector to determine the target intent of the input sentence includes:
performing a state operation on the target vector to obtain a target vector to be analyzed;
obtaining each candidate intent corresponding to the target vector to be analyzed, and passing the target vector to be analyzed to a Softmax prediction function to predict each candidate intent and obtain a probability value for each candidate intent;
and selecting the candidate intent with the highest probability value as the target intent of the input sentence.
Optionally, the foregoing method further includes, before obtaining the target vector corresponding to the prediction target based on the non-fully-connected hidden layer:
providing candidate answers for the prediction target.
Optionally, in the foregoing method, determining a mask vector in an input sentence and taking the mask vector as a prediction target further includes:
deleting, in the case where the input sentence is a single sentence, the classification method provided for sentence-level features in the Bert model.
A Bert-based intent determination device, applied to a target Bert model in which the fully-connected hidden layer of the Bert model is replaced with a non-fully-connected hidden layer, comprises:
a prediction target determining module, configured to determine a mask vector in an input sentence and take the mask vector as a prediction target;
an obtaining module, configured to obtain a target vector corresponding to the prediction target based on the non-fully-connected hidden layer;
and a target intent determining module, configured to perform multi-intent recognition on the target vector and determine the target intent of the input sentence.
Optionally, in the above apparatus, the obtaining module includes:
an obtaining unit, configured to obtain each vector output for the prediction target based on the non-fully-connected hidden layer;
an extracting and segmenting unit, configured to extract the feature values of the vectors and segment the feature values according to a preset step length;
and a determining unit, configured to take the vector corresponding to the maximum feature value in each segment as the target vector of that segment.
A storage medium comprising a stored program, wherein the program performs the Bert-based intent determination method described above.
A processor for executing a program, wherein the program when executed performs the Bert based intent determination method described above.
Compared with the prior art, the invention has the following advantages:
the invention discloses a Bert-based intent determination method applied to a target Bert model, in which the fully-connected hidden layer of the Bert model is replaced with a non-fully-connected hidden layer. The method comprises the following steps: determining a mask vector in an input sentence and taking the mask vector as a prediction target; obtaining a target vector corresponding to the prediction target based on the non-fully-connected hidden layer; and performing multi-intent recognition on the target vector to determine the target intent of the input sentence. In this determination method, the target Bert model replaces the fully-connected hidden layer of the Bert model with a non-fully-connected hidden layer. The fully-connected hidden layer multiplies the amount of computation, whereas the non-fully-connected hidden layer reduces the structural complexity of the model and requires less computation time, thereby reducing the time overhead of intent prediction based on the Bert model.
Of course, it is not necessary for any product in which the invention is practiced to achieve all of the above-described advantages at the same time.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. It is obvious that the drawings in the following description are only some embodiments of the present invention, and that those skilled in the art can derive other drawings from them without creative effort.
FIG. 1 is a schematic structural diagram of a prior art Bert model;
FIG. 2 is a flowchart of an intent determination method based on Bert disclosed in an embodiment of the present application;
FIG. 3 is another flowchart of a Bert-based intent determination method disclosed in an embodiment of the present application;
fig. 4 is a block diagram of a Bert-based intent determination apparatus according to an embodiment of the present disclosure.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
The invention discloses a Bert-based intent determination method and device, applied in the process of determining the intent of an input sentence with the Bert algorithm. The core of the Bert algorithm is a 24-layer Transformer feature-extraction neural network, jointly trained on Masked LM and context prediction. Google trained it on a corpus of 3.3 billion words; while this yields excellent algorithm performance, the time spent on model training and use is enormous. The Bert model is one of the pre-training/fine-tuning models: it is first pre-trained on unlabeled corpora, and during use it must be fine-tuned on corpora from the application domain. The structure of the Bert model is shown in fig. 1, where TRM is a Transformer module in the Bert model, En is an input of the Bert model, that is, the input sentence, and Tn is the result of feature extraction by the Transformer. The Transformer is a novel feature-extraction tool that replaces the original sequential structure with an attention mechanism, achieves better parallelization, and can handle the long-range dependency problem.
The execution flow of the Bert-based intent determination method is shown in fig. 2. The method is applied to a target Bert model in which the fully-connected hidden layer of the Bert model is replaced with a non-fully-connected hidden layer, and comprises the following steps:
S101, determining a mask vector in an input sentence, and taking the mask vector as a prediction target;
In the embodiment of the invention, the input sentence can be fed into the Bert model only after vectorization. The input sentence is first word-segmented and then mapped, according to a vocabulary, to randomly initialized token vectors. The vocabulary is constructed during historical training and is preferably updated whenever a preset duration elapses, an update instruction is received, or another update condition is met. At the same time, 15% of the token vectors are randomly selected as mask vectors and used as the prediction targets of the model.
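The masking step above can be sketched in Python. This is an illustrative sketch only, assuming token ids and a `[MASK]` id; the function name and parameters are not the patent's implementation:

```python
import random

def mask_tokens(token_ids, mask_id, coverage=0.15, seed=None):
    """Randomly select a preset coverage (15% here) of token positions as
    mask vectors; the masked positions become the model's prediction targets."""
    rng = random.Random(seed)
    n_mask = max(1, int(len(token_ids) * coverage))
    positions = rng.sample(range(len(token_ids)), n_mask)
    masked = list(token_ids)
    for pos in positions:
        masked[pos] = mask_id          # replace with the [MASK] token id
    return masked, sorted(positions)   # masked sequence + prediction targets
```

A real implementation would work on the randomly initialized token vectors rather than plain ids, but the selection logic is the same.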
S102, obtaining a target vector corresponding to the prediction target based on the non-fully-connected hidden layer;
In the embodiment of the invention, the target Bert model replaces the fully-connected hidden layer of the Bert model with a non-fully-connected hidden layer. The Bert model has some 300 million parameters, so a large amount of computation is needed for every input vector. The parameters are so numerous because of the fully-connected structure between the hidden layers of the neural network: full connection strengthens the nonlinear fitting capacity of the model, but at the same time multiplies its amount of computation. Therefore, in the embodiment of the present invention, the fully-connected hidden layer is replaced with a non-fully-connected hidden layer, a maxpool (max-pooling) method is used to reduce the dimension of each hidden-state vector output during prediction, and the resulting vectors are combined by a vector calculation to obtain the target vector. The vector calculation may be chosen according to the specific situation, such as direct summation or summation according to a preset rule, and is not limited in the embodiment of the present invention.
S103, performing multi-intent recognition on the target vector, and determining the target intent of the input sentence.
In the embodiment of the present invention, the purpose of the intent recognition task is fine-tuning, and the purpose of fine-tuning is to let the target Bert model learn a specific task. For example, if the input sentence is "how is the weather today?", the intent is a weather query; if the user later asks a similar sentence, the target Bert model will know that it falls under the weather-query intent.
The multi-intent recognition task is executed as follows: a state operation is performed on each target vector to obtain the target vectors to be analyzed, where each target vector to be analyzed is pre-assigned corresponding candidate intents (the specific assignment principle may be set empirically). Each candidate intent corresponding to the target vector to be analyzed is obtained, the target vector to be analyzed is passed to a softmax prediction function to predict each candidate intent, and the target Bert model is trained with back-propagation and a gradient descent algorithm. The softmax prediction function is:
y'_i = e^(z_i) / Σ_{j=1..n} e^(z_j)
wherein: y'_i — the prediction probability of candidate intent i;
z_i — the score the model outputs for candidate intent i;
n — the number of candidate intent categories.
In the embodiment of the invention, if there are 5 candidate intent categories, the target Bert model obtains a prediction probability for each category. Softmax is a normalization function: it converts the raw scores into decimals in the interval (0, 1), mapping the data into the range 0-1 for more convenient processing. After normalization, the category with the maximum probability value is the target intent of the model. Error calculation is then performed with the prediction probabilities, using the cross-entropy formula:
L = - Σ_{i=1..n} y_i · log(y'_i)
wherein: n is the number of categories, y_i is the true value (0 or 1) for category i, and y'_i is the prediction probability of that category.
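The softmax normalization and error calculation above can be written as a small self-contained sketch; the formulas are standard, but the score values here are made-up example inputs:

```python
import math

def softmax(scores):
    """Normalize raw scores into prediction probabilities in (0, 1) that sum to 1."""
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(true_onehot, probs):
    """Error calculation L = -sum_i y_i * log(y'_i), with y_i in {0, 1}."""
    return -sum(y * math.log(p) for y, p in zip(true_onehot, probs))

# Example with 5 candidate intent categories, as in the passage above.
probs = softmax([2.0, 1.0, 0.5, 0.2, 0.1])
target_intent = max(range(len(probs)), key=lambda i: probs[i])  # max-probability category
loss = cross_entropy([1, 0, 0, 0, 0], probs)
```

The category with the maximum probability is taken as the target intent, exactly as described in the text.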
Gradient descent calculation method:
θ ← θ - α · ∂L/∂θ
where θ is the parameter to be updated, α is the learning rate, and ∂L/∂θ is the gradient of the loss with respect to the parameter.
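A minimal sketch of the gradient-descent update over a flat list of parameters (the names and values are illustrative assumptions):

```python
def sgd_step(theta, grads, alpha=0.01):
    """One gradient-descent update: theta <- theta - alpha * dL/dtheta."""
    return [t - alpha * g for t, g in zip(theta, grads)]

# Each parameter moves against its gradient, scaled by the learning rate.
updated = sgd_step([1.0, 2.0], [0.5, -0.5], alpha=0.1)
```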
The invention discloses a Bert-based intent determination method applied to a target Bert model, in which the fully-connected hidden layer of the Bert model is replaced with a non-fully-connected hidden layer. The method comprises the following steps: determining a mask vector in an input sentence and taking the mask vector as a prediction target; obtaining a target vector corresponding to the prediction target based on the non-fully-connected hidden layer; and performing multi-intent recognition on the target vector to determine the target intent of the input sentence. In this determination method, the target Bert model replaces the fully-connected hidden layer of the Bert model with a non-fully-connected hidden layer. The fully-connected hidden layer multiplies the amount of computation, whereas the non-fully-connected hidden layer reduces the structural complexity of the model and requires less computation time, thereby reducing the time overhead of intent prediction based on the Bert model.
In the embodiment of the present invention, it is noted that the prior art does not provide candidate answers for the prediction target: the traditional training mode is context prediction, in which the next possible word is predicted from the context, which is equivalent to randomly selecting a word from the whole vocabulary. Because the Bert model converges slowly when extra information is scarce, the prediction cycle is long; therefore prediction is performed here by providing candidate answers for the prediction target. The specific process is as follows: the target Bert model uses Masked LM in training, which leaks information in a manner similar to a fill-in-the-blank exercise. At prediction time, several candidate answers (including the correct answer) are first randomly selected and provided to the target Bert model; selecting the correct answer from the candidate answers helps the target Bert model converge quickly.
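The candidate-answer step above can be sketched as drawing a few distractors from the vocabulary and mixing in the correct answer. This is an illustrative sketch; the function name and the candidate-set size are assumptions, not the patent's implementation:

```python
import random

def build_candidates(correct_id, vocab_ids, n_candidates=5, seed=None):
    """Provide a small candidate set (including the correct answer) so the
    model predicts over it instead of over the whole vocabulary."""
    rng = random.Random(seed)
    distractors = [i for i in vocab_ids if i != correct_id]
    candidates = rng.sample(distractors, n_candidates - 1) + [correct_id]
    rng.shuffle(candidates)   # hide the correct answer's position
    return candidates
```

Scoring only a handful of candidates instead of the full vocabulary is what shrinks the prediction problem and, per the text, speeds up convergence.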
Furthermore, to keep the target Bert model from learning too little, in addition to applying Masked LM to the input sentence, 10% masking is randomly applied between the Transformer layers to increase the amount of knowledge the model learns.
In the embodiment of the present invention, S101 corresponds to the pre-training task in the prior art and aims to let the Bert model learn grammatical knowledge.
In the embodiment of the present invention, a flow of a method for obtaining a target vector corresponding to the predicted target based on the non-fully-connected hidden layer is shown in fig. 3, and includes the steps of:
S201, obtaining each vector output for the prediction target based on the non-fully-connected hidden layer;
In the embodiment of the present invention, the prediction target yields output vectors after passing through the non-fully-connected hidden layer, and each of these vectors is obtained.
S202, extracting the feature values of the vectors, and segmenting the feature values according to a preset step length;
In the embodiment of the present invention, the feature values of the vectors are extracted and, after extraction, the feature values are segmented with a preset step length. The preset step length may be set according to experience or the specific situation, and its value is not limited here.
S203, taking the vector corresponding to the maximum feature value in each segment as the target vector of that segment.
In the embodiment of the present invention, the vector corresponding to the maximum feature value in each segment is used as the target vector of that segment. For example, with a preset step length of 6, the original 768-dimensional vector of each hidden layer can be reduced to 128 dimensions. Because the feature dimension of the hidden layers is reduced, the corresponding weight matrices shrink accordingly, the amount of computation and the training time also decrease, and the main information of each token is still retained.
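The segment-wise max-pooling of S202-S203 can be sketched on a plain list of feature values; with a step of 6, a 768-dimensional hidden vector reduces to 768 / 6 = 128 dimensions (illustrative sketch only):

```python
def maxpool_reduce(features, step=6):
    """Segment the feature values by a preset step length and keep the
    maximum of each segment, e.g. 768 dims -> 768 / 6 = 128 dims."""
    assert len(features) % step == 0, "length must be divisible by the step"
    return [max(features[i:i + step]) for i in range(0, len(features), step)]

reduced = maxpool_reduce(list(range(768)), step=6)  # 128 segment maxima
```

This matches the dimension-reduction arithmetic in the text: the feature dimension shrinks by the step factor, and only the strongest value per segment is kept.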
In the embodiment of the invention, based on the above intent determination method, the information-free mask prediction in the Bert model is changed into prediction with candidate answers provided, which reduces the learning difficulty of the model. At the same time, a non-fully-connected hidden layer replaces the fully-connected hidden layer, reducing the structural complexity of the model. This optimizes the time overhead of the Bert algorithm so that it meets the requirements of an industrial production environment.
In the embodiment of the invention, the Bert model provides segment embeddings for sentence-level features and adds a sentence-level binary classification task.
In the embodiment of the invention, 700,000 training samples and 10,000 test samples, divided into 9 categories, were used. They were analyzed with the Bert model and with the target Bert model respectively, and the results are shown in table 1:
TABLE 1
With the target Bert model, the time required to train the model is greatly reduced, while the loss in model precision remains within an acceptable range.
Based on the above Bert-based intent determination method, an embodiment of the present invention further provides a Bert-based intent determination device, applied to a target Bert model in which the fully-connected hidden layer of the Bert model is replaced with a non-fully-connected hidden layer. A structural block diagram of the device is shown in fig. 4, and it comprises:
a prediction target determining module 301, an obtaining module 302 and a target intent determining module 303.
Wherein,
the prediction target determining module 301 is configured to determine a mask vector in an input sentence and take the mask vector as a prediction target;
the obtaining module 302 is configured to obtain a target vector corresponding to the prediction target based on the non-fully-connected hidden layer;
the target intent determining module 303 is configured to perform multi-intent recognition on the target vector and determine the target intent of the input sentence.
The invention discloses a Bert-based intent determination method applied to a target Bert model, in which the fully-connected hidden layer of the Bert model is replaced with a non-fully-connected hidden layer. The method comprises the following steps: determining a mask vector in an input sentence and taking the mask vector as a prediction target; obtaining a target vector corresponding to the prediction target based on the non-fully-connected hidden layer; and performing multi-intent recognition on the target vector to determine the target intent of the input sentence. In this determination method, the target Bert model replaces the fully-connected hidden layer of the Bert model with a non-fully-connected hidden layer. The fully-connected hidden layer multiplies the amount of computation, whereas the non-fully-connected hidden layer reduces the structural complexity of the model and requires less computation time, thereby reducing the time overhead of intent prediction based on the Bert model.
In this embodiment of the present invention, the obtaining module 302 includes:
an acquisition unit 304, an extraction and segmentation unit 305 and a determination unit 306.
Wherein,
the obtaining unit 304 is configured to obtain each vector output for the prediction target based on the non-fully-connected hidden layer;
the extracting and segmenting unit 305 is configured to extract the feature values of the vectors and segment the feature values according to a preset step length;
the determining unit 306 is configured to take the vector corresponding to the maximum feature value in each segment as the target vector of that segment.
The device comprises a processor and a memory. The prediction target determining module, the obtaining module, the target intent determining module and the like are stored in the memory as program units, and the processor executes the program units stored in the memory to realize the corresponding functions.
The processor comprises a kernel, and the kernel calls the corresponding program unit from the memory. One or more kernels may be provided. The kernel provides candidate answers for the prediction target and performs dimension reduction on each hidden-state vector, thereby reducing the learning complexity and the structural complexity of the Bert model respectively, and so reducing the time overhead of the Bert algorithm.
The memory may include volatile memory in a computer-readable medium, random access memory (RAM), and/or nonvolatile memory such as read-only memory (ROM) or flash memory (flash RAM); the memory includes at least one memory chip.
An embodiment of the present invention provides a storage medium on which a program is stored, the program implementing the intent determination method when executed by a processor.
The embodiment of the invention provides a processor, which is used for running a program, wherein the intention determination method is executed when the program runs.
An embodiment of the invention provides a device comprising a processor, a memory, and a program stored on the memory and runnable on the processor, wherein the processor, when executing the program, realizes the following steps:
determining a mask vector in an input sentence, and taking the mask vector as a prediction target;
obtaining a target vector corresponding to the prediction target based on the non-fully-connected hidden layer;
and performing multi-intent recognition on the target vector to determine the target intent of the input sentence.
Optionally, in the foregoing method, obtaining the target vector corresponding to the prediction target based on the non-fully-connected hidden layer includes:
obtaining each vector output for the prediction target based on the non-fully-connected hidden layer;
extracting the feature value of each vector, and segmenting the feature values according to a preset step length;
and taking the vector corresponding to the maximum feature value in each segment as the target vector of that segment.
Optionally, in the foregoing method, determining a mask vector in an input sentence and taking the mask vector as a prediction target includes:
performing word segmentation on the input sentence;
mapping the segmentation result to token vectors according to a vocabulary;
and randomly selecting mask vectors at a preset coverage rate from the token vectors and taking the mask vectors as prediction targets.
Optionally, in the foregoing method, performing multi-intent recognition on the target vector to determine the target intent of the input sentence includes:
performing a state operation on the target vector to obtain a target vector to be analyzed;
obtaining each candidate intent corresponding to the target vector to be analyzed, and passing the target vector to be analyzed to a Softmax prediction function to predict each candidate intent and obtain a probability value for each candidate intent;
selecting the candidate intent corresponding to the highest of the probability values as the target intent of the input sentence.
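The Softmax prediction and highest-probability selection can be illustrated as below. The logits and intent labels are invented for the example; a numerically stable softmax (shifting by the maximum before exponentiating) is used, which is standard practice rather than anything specified in the patent.

```python
import math

def softmax(logits):
    """Numerically stable softmax: shift by the max before exponentiating."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def pick_intent(logits, intents):
    """Return the candidate intent with the highest probability value,
    along with the full probability distribution."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return intents[best], probs
```

Because softmax is monotonic in its input, the selected intent is simply the argmax of the logits; the probability values are still useful when a confidence threshold is applied.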
Optionally, in the foregoing method, before obtaining the target vector corresponding to the prediction target based on the non-fully-connected hidden layer, the method further includes:
providing candidate answers for the prediction target.
Optionally, in the foregoing method, determining the mask vector in the input sentence and using the mask vector as the prediction target further includes:
in the case that the input sentence is a single sentence, removing from the Bert model the classification mechanism proposed for sentence-level features.
The device herein may be a server, a PC, a tablet (PAD), a mobile phone, or the like.
The present application also provides a computer program product which, when executed on a data processing device, is adapted to execute a program that performs the following method steps:
determining a mask vector in an input sentence, and using the mask vector as a prediction target;
obtaining a target vector corresponding to the prediction target based on the non-fully-connected hidden layer;
performing multi-intent recognition on the target vector to determine the target intent of the input sentence.
Optionally, in the foregoing method, obtaining the target vector corresponding to the prediction target based on the non-fully-connected hidden layer includes:
obtaining each vector that the non-fully-connected hidden layer outputs for the prediction target;
extracting the feature value of each vector, and dividing the feature values into segments according to a preset step size;
taking the vector corresponding to the largest feature value in each segment as the target vector of that segment.
Optionally, in the foregoing method, determining the mask vector in the input sentence and using the mask vector as the prediction target includes:
performing word segmentation processing on the input sentence;
mapping the segmentation result to token vectors according to a vocabulary;
randomly selecting mask vectors at a preset coverage rate from the token vectors, and using the mask vectors as prediction targets.
Optionally, in the foregoing method, performing multi-intent recognition on the target vector to determine the target intent of the input sentence includes:
performing a state operation on the target vector to obtain a target vector to be analyzed;
obtaining each candidate intent corresponding to the target vector to be analyzed, and passing the target vector to be analyzed to a Softmax prediction function to predict each candidate intent and obtain a probability value for each candidate intent;
selecting the candidate intent corresponding to the highest of the probability values as the target intent of the input sentence.
Optionally, in the foregoing method, before obtaining the target vector corresponding to the prediction target based on the non-fully-connected hidden layer, the method further includes:
providing candidate answers for the prediction target.
Optionally, in the foregoing method, determining the mask vector in the input sentence and using the mask vector as the prediction target further includes:
in the case that the input sentence is a single sentence, removing from the Bert model the classification mechanism proposed for sentence-level features.
It should be noted that, in the present specification, the embodiments are all described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments may be referred to each other. For the device-like embodiment, since it is basically similar to the method embodiment, the description is simple, and for the relevant points, reference may be made to the partial description of the method embodiment.
Finally, it should also be noted that, herein, relational terms such as "first" and "second" may be used solely to distinguish one entity or action from another, without necessarily requiring or implying any actual such relationship or order between those entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such a process, method, article, or apparatus. Without further limitation, an element introduced by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or apparatus that comprises the element.
For convenience of description, the above devices are described as being divided into various units by function, and are described separately. Of course, the functions of the units may be implemented in the same software and/or hardware or in a plurality of software and/or hardware when implementing the invention.
From the above description of the embodiments, it is clear to those skilled in the art that the present invention can be implemented by software plus necessary general hardware platform. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which may be stored in a storage medium, such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method according to the embodiments or some parts of the embodiments.
The Bert-based intent determination method and device provided by the present invention have been described in detail above. Specific examples have been used herein to explain the principle and implementation of the invention, and the description of the above embodiments is intended only to help in understanding the method and its core idea. Meanwhile, for those skilled in the art, there may be variations in the specific implementation and the scope of application according to the idea of the present invention. In summary, the content of this specification should not be construed as limiting the present invention.
Claims (10)
1. A Bert-based intent determination method, applied to a target Bert model, wherein the target Bert model replaces a fully-connected hidden layer in the Bert model with a non-fully-connected hidden layer, the method comprising:
determining a mask vector in an input sentence, and using the mask vector as a prediction target;
obtaining a target vector corresponding to the prediction target based on the non-fully-connected hidden layer;
performing multi-intent recognition on the target vector to determine the target intent of the input sentence.
2. The method according to claim 1, wherein obtaining the target vector corresponding to the prediction target based on the non-fully-connected hidden layer comprises:
obtaining each vector that the non-fully-connected hidden layer outputs for the prediction target;
extracting the feature value of each vector, and dividing the feature values into segments according to a preset step size;
taking the vector corresponding to the largest feature value in each segment as the target vector of that segment.
3. The method according to claim 1, wherein determining a mask vector in the input sentence and using the mask vector as a prediction target comprises:
performing word segmentation processing on the input sentence;
mapping the segmentation result to token vectors according to a vocabulary;
randomly selecting mask vectors at a preset coverage rate from the token vectors, and using the mask vectors as prediction targets.
4. The method according to claim 1, wherein performing multi-intent recognition on the target vector to determine the target intent of the input sentence comprises:
performing a state operation on the target vector to obtain a target vector to be analyzed;
obtaining each candidate intent corresponding to the target vector to be analyzed, and passing the target vector to be analyzed to a Softmax prediction function to predict each candidate intent and obtain a probability value for each candidate intent;
selecting the candidate intent corresponding to the highest of the probability values as the target intent of the input sentence.
5. The method according to claim 1, wherein before obtaining the target vector corresponding to the prediction target based on the non-fully-connected hidden layer, the method further comprises:
providing candidate answers for the prediction target.
6. The method according to claim 1, wherein determining a mask vector in the input sentence and using the mask vector as a prediction target further comprises:
in the case that the input sentence is a single sentence, removing from the Bert model the classification mechanism proposed for sentence-level features.
7. A Bert-based intent determination device, applied to a target Bert model, wherein the target Bert model replaces a fully-connected hidden layer in the Bert model with a non-fully-connected hidden layer, the device comprising:
a prediction target determining module, configured to determine a mask vector in an input sentence and use the mask vector as a prediction target;
an obtaining module, configured to obtain a target vector corresponding to the prediction target based on the non-fully-connected hidden layer;
a target intent determining module, configured to perform multi-intent recognition on the target vector and determine the target intent of the input sentence.
8. The device according to claim 7, wherein the obtaining module comprises:
an obtaining unit, configured to obtain each vector that the non-fully-connected hidden layer outputs for the prediction target;
an extracting and segmenting unit, configured to extract the feature value of each vector and divide the feature values into segments according to a preset step size;
a determining unit, configured to take the vector corresponding to the largest feature value in each segment as the target vector of that segment.
9. A storage medium, comprising a stored program, wherein the program, when running, performs the Bert-based intent determination method according to any one of claims 1 to 6.
10. A processor, configured to run a program, wherein the program, when running, performs the Bert-based intent determination method according to any one of claims 1 to 6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911219821.3A CN110968671A (en) | 2019-12-03 | 2019-12-03 | Intent determination method and device based on Bert |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911219821.3A CN110968671A (en) | 2019-12-03 | 2019-12-03 | Intent determination method and device based on Bert |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110968671A true CN110968671A (en) | 2020-04-07 |
Family
ID=70032677
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201911219821.3A Pending CN110968671A (en) | 2019-12-03 | 2019-12-03 | Intent determination method and device based on Bert |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110968671A (en) |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111401077A (en) * | 2020-06-02 | 2020-07-10 | 腾讯科技(深圳)有限公司 | Language model processing method and device and computer equipment |
CN111554304A (en) * | 2020-04-25 | 2020-08-18 | 中信银行股份有限公司 | User tag obtaining method, device and equipment |
CN111783443A (en) * | 2020-06-29 | 2020-10-16 | 百度在线网络技术(北京)有限公司 | Text disturbance detection method, disturbance reduction method, disturbance processing method and device |
CN112053687A (en) * | 2020-07-31 | 2020-12-08 | 出门问问信息科技有限公司 | Voice processing method and device, computer readable storage medium and equipment |
CN112257432A (en) * | 2020-11-02 | 2021-01-22 | 北京淇瑀信息科技有限公司 | Self-adaptive intention identification method and device and electronic equipment |
CN112507704A (en) * | 2020-12-15 | 2021-03-16 | 中国联合网络通信集团有限公司 | Multi-intention recognition method, device, equipment and storage medium |
CN112989800A (en) * | 2021-04-30 | 2021-06-18 | 平安科技(深圳)有限公司 | Multi-intention identification method and device based on Bert sections and readable storage medium |
CN113571062A (en) * | 2020-04-28 | 2021-10-29 | 中国移动通信集团浙江有限公司 | Client tag identification method and device based on voice data and computing equipment |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109977428A (en) * | 2019-03-29 | 2019-07-05 | 北京金山数字娱乐科技有限公司 | A kind of method and device that answer obtains |
CN110232652A (en) * | 2019-05-27 | 2019-09-13 | 珠海格力电器股份有限公司 | Image processing engine processing method, the image processing method for terminal, terminal |
CN110276075A (en) * | 2019-06-21 | 2019-09-24 | 腾讯科技(深圳)有限公司 | Model training method, name entity recognition method, device, equipment and medium |
CN110334210A (en) * | 2019-05-30 | 2019-10-15 | 哈尔滨理工大学 | A kind of Chinese sentiment analysis method merged based on BERT with LSTM, CNN |
CN110442777A (en) * | 2019-06-24 | 2019-11-12 | 华中师范大学 | Pseudo-linear filter model information search method and system based on BERT |
- 2019-12-03: Application CN201911219821.3A filed; publication CN110968671A (status: Pending)
Cited By (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111554304A (en) * | 2020-04-25 | 2020-08-18 | 中信银行股份有限公司 | User tag obtaining method, device and equipment |
CN113571062A (en) * | 2020-04-28 | 2021-10-29 | 中国移动通信集团浙江有限公司 | Client tag identification method and device based on voice data and computing equipment |
CN111401077A (en) * | 2020-06-02 | 2020-07-10 | 腾讯科技(深圳)有限公司 | Language model processing method and device and computer equipment |
CN111401077B (en) * | 2020-06-02 | 2020-09-18 | 腾讯科技(深圳)有限公司 | Language model processing method and device and computer equipment |
CN111783443A (en) * | 2020-06-29 | 2020-10-16 | 百度在线网络技术(北京)有限公司 | Text disturbance detection method, disturbance reduction method, disturbance processing method and device |
CN111783443B (en) * | 2020-06-29 | 2023-08-15 | 百度在线网络技术(北京)有限公司 | Text disturbance detection method, disturbance recovery method, disturbance processing method and device |
CN112053687A (en) * | 2020-07-31 | 2020-12-08 | 出门问问信息科技有限公司 | Voice processing method and device, computer readable storage medium and equipment |
CN112257432A (en) * | 2020-11-02 | 2021-01-22 | 北京淇瑀信息科技有限公司 | Self-adaptive intention identification method and device and electronic equipment |
CN112507704A (en) * | 2020-12-15 | 2021-03-16 | 中国联合网络通信集团有限公司 | Multi-intention recognition method, device, equipment and storage medium |
CN112507704B (en) * | 2020-12-15 | 2023-10-03 | 中国联合网络通信集团有限公司 | Multi-intention recognition method, device, equipment and storage medium |
CN112989800A (en) * | 2021-04-30 | 2021-06-18 | 平安科技(深圳)有限公司 | Multi-intention identification method and device based on Bert sections and readable storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110968671A (en) | Intent determination method and device based on Bert | |
CN111310438B (en) | Chinese sentence semantic intelligent matching method and device based on multi-granularity fusion model | |
CN108959246A (en) | Answer selection method, device and electronic equipment based on improved attention mechanism | |
CN110929515B (en) | Reading understanding method and system based on cooperative attention and adaptive adjustment | |
CN111783474B (en) | Comment text viewpoint information processing method and device and storage medium | |
CN109948149B (en) | Text classification method and device | |
CN110598206A (en) | Text semantic recognition method and device, computer equipment and storage medium | |
CN108875074A (en) | Based on answer selection method, device and the electronic equipment for intersecting attention neural network | |
CN110232122A (en) | A kind of Chinese Question Classification method based on text error correction and neural network | |
CN111783993A (en) | Intelligent labeling method and device, intelligent platform and storage medium | |
CN110781686B (en) | Statement similarity calculation method and device and computer equipment | |
CN111160000B (en) | Composition automatic scoring method, device terminal equipment and storage medium | |
CN110275928B (en) | Iterative entity relation extraction method | |
CN110633359A (en) | Sentence equivalence judgment method and device | |
CN111583911A (en) | Speech recognition method, device, terminal and medium based on label smoothing | |
CN110874392B (en) | Text network information fusion embedding method based on depth bidirectional attention mechanism | |
CN114386409A (en) | Self-distillation Chinese word segmentation method based on attention mechanism, terminal and storage medium | |
CN110969005B (en) | Method and device for determining similarity between entity corpora | |
CN114064852A (en) | Method and device for extracting relation of natural language, electronic equipment and storage medium | |
CN110929532B (en) | Data processing method, device, equipment and storage medium | |
CN111400492B (en) | Hierarchical feature text classification method and system based on SFM-DCNN | |
CN112579739A (en) | Reading understanding method based on ELMo embedding and gating self-attention mechanism | |
CN113591988B (en) | Knowledge cognitive structure analysis method, system, computer equipment, medium and terminal | |
CN114638229A (en) | Entity identification method, device, medium and equipment of record data | |
CN114398482A (en) | Dictionary construction method and device, electronic equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
RJ01 | Rejection of invention patent application after publication |
Application publication date: 20200407 |