CN110968671A - Intent determination method and device based on Bert - Google Patents

Intent determination method and device based on Bert

Info

Publication number
CN110968671A
CN110968671A
Authority
CN
China
Prior art keywords
target
vector
hidden layer
fully
bert
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911219821.3A
Other languages
Chinese (zh)
Inventor
周思丞
苏少炜
陈孝良
常乐
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing SoundAI Technology Co Ltd
Original Assignee
Beijing SoundAI Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing SoundAI Technology Co Ltd filed Critical Beijing SoundAI Technology Co Ltd
Priority to CN201911219821.3A priority Critical patent/CN110968671A/en
Publication of CN110968671A publication Critical patent/CN110968671A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/3331Query processing
    • G06F16/334Query execution
    • G06F16/3344Query execution using natural language analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/08Speech classification or search
    • G10L15/18Speech classification or search using natural language modelling
    • G10L15/183Speech classification or search using natural language modelling using context dependencies, e.g. language models

Abstract

The invention discloses a Bert-based intent determination method, applied to a target Bert model in which the fully connected hidden layer of the Bert model is replaced with a non-fully-connected hidden layer. The method comprises the following steps: determining a mask vector in an input sentence, and taking the mask vector as a prediction target; obtaining a target vector corresponding to the prediction target based on the non-fully-connected hidden layer; and performing multi-intent recognition on the target vector to determine the target intent of the input sentence. In this determination method, the target Bert model replaces the fully connected hidden layer of the Bert model with a non-fully-connected hidden layer; whereas full connection multiplies the amount of computation, the non-fully-connected hidden layer reduces the structural complexity of the model and requires less computation time, thereby reducing the time overhead of Bert-based intent prediction.

Description

Intent determination method and device based on Bert
Technical Field
The invention relates to the technical field of speech recognition, and in particular to a Bert-based intent determination method and device.
Background
The Bert model (Bidirectional Encoder Representations from Transformers) is a new type of language model. It pre-trains deep bidirectional representations by jointly conditioning the bidirectional Transformers in all layers.
At its core, the Bert algorithm uses a 24-layer Transformer feature-extraction network trained jointly on masked language modeling (Mask LM) and context prediction. Google trained it on a corpus of 3.3 billion words; while the algorithm performs excellently, the time spent on training and using the model is enormous.
Therefore, in Bert-based intent prediction, the time overhead of training and using the model is very large: training requires 16 TPU clusters for 3 days, or about 3 months on an ordinary GPU, and during use engineers must fine-tune the Bert model on data for each specific NLP task, so the computation cost is huge.
Disclosure of Invention
In view of the above, the present invention provides a Bert-based intent determination method and apparatus to solve the prior-art problem that, although the Bert model achieves excellent algorithm performance, the time overhead of training and use is very large: training requires 16 TPU clusters for 3 days, or about 3 months on an ordinary GPU, and during use engineers must fine-tune the Bert model on data for each specific NLP task, so the computation cost is also huge. The specific scheme is as follows:
A Bert-based intent determination method, applied to a target Bert model in which the fully connected hidden layer of the Bert model is replaced with a non-fully-connected hidden layer, comprises the following steps:
determining a mask vector in an input sentence, and taking the mask vector as a prediction target;
obtaining a target vector corresponding to the prediction target based on the non-fully-connected hidden layer;
and performing multi-intent recognition on the target vector to determine the target intent of the input sentence.
Optionally, in the above method, obtaining a target vector corresponding to the prediction target based on the non-fully-connected hidden layer comprises:
obtaining each vector output for the prediction target based on the non-fully-connected hidden layer;
extracting the feature values of each vector, and segmenting the feature values according to a preset step length;
and taking, in each segment, the vector corresponding to the maximum feature value as the target vector of that segment.
Optionally, in the above method, determining a mask vector in an input sentence and taking the mask vector as a prediction target comprises:
performing word segmentation on the input sentence;
mapping the segmentation result to token vectors according to a vocabulary;
and randomly selecting, from the token vectors, mask vectors at a preset coverage rate, and taking the mask vectors as prediction targets.
Optionally, in the above method, performing multi-intent recognition on the target vector to determine the target intent of the input sentence comprises:
performing a state operation on the target vector to obtain a target vector to be analyzed;
obtaining each candidate intent corresponding to the target vector to be analyzed, and passing the target vector to be analyzed to a Softmax prediction function to predict each candidate intent and obtain a probability value for each candidate intent;
and selecting the candidate intent corresponding to the highest of the probability values as the target intent of the input sentence.
Optionally, the above method, before obtaining the target vector corresponding to the prediction target based on the non-fully-connected hidden layer, further comprises:
providing alternative answers for the prediction target.
Optionally, in the above method, determining a mask vector in an input sentence and taking the mask vector as a prediction target further comprises:
deleting, when the input sentence is a single sentence, the classification method proposed for sentence-level features in the Bert model.
A Bert-based intent determination device, applied to a target Bert model in which the fully connected hidden layer of the Bert model is replaced with a non-fully-connected hidden layer, comprises:
a prediction target determining module, configured to determine a mask vector in an input sentence and take the mask vector as a prediction target;
an obtaining module, configured to obtain a target vector corresponding to the prediction target based on the non-fully-connected hidden layer;
and a target intent determining module, configured to perform multi-intent recognition on the target vector and determine the target intent of the input sentence.
In the above device, optionally, the obtaining module comprises:
an obtaining unit, configured to obtain each vector output for the prediction target based on the non-fully-connected hidden layer;
an extracting and segmenting unit, configured to extract the feature values of each vector and segment the feature values according to a preset step length;
and a determining unit, configured to take, in each segment, the vector corresponding to the maximum feature value as the target vector of that segment.
A storage medium comprising a stored program, wherein the program performs the Bert-based intent determination method described above.
A processor configured to run a program, wherein the program, when run, performs the Bert-based intent determination method described above.
Compared with the prior art, the invention has the following advantages:
The invention discloses a Bert-based intent determination method, applied to a target Bert model in which the fully connected hidden layer of the Bert model is replaced with a non-fully-connected hidden layer. The method comprises: determining a mask vector in an input sentence and taking it as a prediction target; obtaining a target vector corresponding to the prediction target based on the non-fully-connected hidden layer; and performing multi-intent recognition on the target vector to determine the target intent of the input sentence. In this determination method, whereas full connection multiplies the amount of computation, the non-fully-connected hidden layer reduces the structural complexity of the model and requires less computation time, thereby reducing the time overhead of Bert-based intent prediction.
Of course, it is not necessary for any product in which the invention is practiced to achieve all of the above-described advantages at the same time.
Drawings
To more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention, and those skilled in the art can derive other drawings from them without creative effort.
FIG. 1 is a schematic structural diagram of a prior art Bert model;
FIG. 2 is a flowchart of an intent determination method based on Bert disclosed in an embodiment of the present application;
FIG. 3 is another flowchart of a Bert-based intent determination method disclosed in an embodiment of the present application;
FIG. 4 is a block diagram of a Bert-based intent determination apparatus according to an embodiment of the present disclosure.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
The invention discloses a Bert-based intent determination method and device, applied in determining the intent of an input sentence with the Bert algorithm. At its core, the Bert algorithm uses a 24-layer Transformer feature-extraction network trained jointly on MaskLM and context prediction. Google trained it on a corpus of 3.3 billion words; while the algorithm performs excellently, the time spent on training and using the model is enormous. The Bert model is a pre-training/fine-tuning model: it is first pre-trained on unlabeled corpus, and during use it must be fine-tuned on corpus from the application domain. The structure of the Bert model is shown schematically in fig. 1, where TRM is a Transformer module in the Bert model, En is an input of the Bert model (that is, an input sentence), and Tn is the feature-extraction result of the Transformer model. The Transformer is a novel feature-extraction tool that replaces the original sequential structure with an attention mechanism, achieves better parallelization, and can solve the long-range dependence problem.
The execution flow of the Bert-based intent determination method is shown in fig. 2. The method is applied to a target Bert model in which the fully connected hidden layer of the Bert model is replaced with a non-fully-connected hidden layer, and comprises the following steps:
S101, determining a mask vector in an input sentence, and taking the mask vector as a prediction target;
In the embodiment of the invention, an input sentence can be fed into the Bert model only after vectorization. The input sentence is first segmented into words and mapped, according to a vocabulary, to randomly initialized token vectors. The vocabulary is constructed during historical training and is preferably updated whenever a preset duration elapses, an update instruction is received, or another update condition is met. Meanwhile, 15% of the token vectors are randomly selected to be covered as mask vectors, which serve as the model's prediction targets.
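As an illustration of this vectorization and masking step, the following sketch selects 15% of the token positions as prediction targets. The whitespace tokenizer, toy vocabulary, and function name are assumptions for illustration; the patent does not specify an implementation.

```python
import random

MASK_RATE = 0.15  # preset coverage rate from the description

def build_prediction_targets(sentence, vocab, seed=None):
    """Segment a sentence, map the tokens to vocabulary ids, and randomly
    cover ~15% of the positions as mask vectors (the prediction targets)."""
    rng = random.Random(seed)
    tokens = sentence.split()  # stand-in for a real word segmenter
    token_ids = [vocab.get(tok, vocab["[UNK]"]) for tok in tokens]
    n_masked = max(1, round(MASK_RATE * len(token_ids)))
    mask_positions = sorted(rng.sample(range(len(token_ids)), n_masked))
    masked_ids = [vocab["[MASK]"] if i in mask_positions else tid
                  for i, tid in enumerate(token_ids)]
    return masked_ids, mask_positions

# Toy vocabulary and sentence; a real system would use the trained vocabulary.
vocab = {"[UNK]": 0, "[MASK]": 1, "how": 2, "is": 3, "the": 4,
         "weather": 5, "today": 6}
masked_ids, targets = build_prediction_targets("how is the weather today",
                                               vocab, seed=0)
```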
S102, obtaining a target vector corresponding to the prediction target based on the non-fully-connected hidden layer;
In the embodiment of the invention, the target Bert model replaces the fully connected hidden layer in the Bert model with a non-fully-connected hidden layer. The original Bert model has some 300 million parameters, so a large amount of computation is needed for every input vector. The parameters are so numerous because of the fully connected structure between the hidden layers of the neural network: full connection strengthens the model's nonlinear fitting capacity, but at the same time multiplies the amount of computation. Therefore, in the embodiment of the present invention, the fully connected hidden layer is replaced with a non-fully-connected hidden layer: a maxpool method is used to reduce the dimension of each hidden-state vector output during prediction, and the resulting vectors are combined by a vector calculation to obtain the target vector. The vector calculation may be chosen according to the specific situation, for example direct accumulation or accumulation according to a preset rule, and is not limited in the embodiment of the present invention.
S103, performing multi-intent recognition on the target vector, and determining the target intent of the input sentence.
In the embodiment of the present invention, the purpose of the intent recognition task is to implement fine-tuning, and the purpose of fine-tuning is to let the target Bert model learn a specific task. For example, if the input sentence is "How is the weather today?", the intent here is a weather query; if the user later asks a similar sentence, the target Bert model will know that it falls under the weather-query intent.
The multi-intent recognition task proceeds as follows: a state operation is performed on each target vector to obtain target vectors to be analyzed, each of which is pre-assigned corresponding candidate intents (the specific assignment principle may follow experience). Each candidate intent corresponding to a target vector to be analyzed is obtained, the target vector to be analyzed is passed to a softmax prediction function to predict each candidate intent, and the target Bert model is trained with back-propagation and the gradient descent algorithm. The softmax prediction function is:
$$\mathrm{predict}_i = \frac{e^{z_i}}{\sum_{j=1}^{n} e^{z_j}}$$
wherein: predict_i is the predicted probability value of the i-th candidate intent; z_i is the raw prediction score for that intent; and n is the number of candidate intent categories.
In the embodiment of the invention, if there are 5 candidate intent categories, the target Bert model obtains a prediction probability for each category. Softmax is a normalization function: it turns numbers into decimals between 0 and 1, mapping the data into the range [0, 1], which makes processing more convenient. After normalization, the category with the maximum probability value is the model's target intent. The prediction probability is then used to calculate the error with the following formula:
$$L = -\frac{1}{n}\sum_{i=1}^{n}\left[y_i \ln y_i' + (1 - y_i)\ln(1 - y_i')\right]$$
wherein: n is the number of categories, y_i is the true value (0 or 1) for category i, and y_i' is the predicted probability for that category.
The gradient descent update is calculated as:
$$\theta \leftarrow \theta - \alpha \frac{\partial L}{\partial \theta}$$
where θ is the parameter to be updated, α is the learning rate, and ∂L/∂θ is the gradient direction of the parameter.
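Taken together, the three formulas above can be sketched in a few lines. The 5-category scores below are made-up numbers and the function names are illustrative; for softmax combined with the categorical cross-entropy, the gradient with respect to the raw scores simplifies to probs − y_true, which is used for the illustrative update step.

```python
import numpy as np

def softmax(z):
    """Normalize raw scores into probability values in (0, 1) that sum to 1."""
    e = np.exp(z - z.max())  # subtract max for numerical stability
    return e / e.sum()

def error(y_true, y_prob, eps=1e-12):
    """Error formula from the description: mean binary cross-entropy
    over the n candidate-intent categories."""
    y_prob = np.clip(y_prob, eps, 1.0 - eps)
    return -np.mean(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob))

z = np.array([1.2, 0.3, -0.5, 2.1, 0.0])  # raw scores for 5 candidate intents
y_true = np.array([0., 0., 0., 1., 0.])   # true intent as a one-hot vector
probs = softmax(z)
target_intent = int(np.argmax(probs))     # category with the max probability
loss = error(y_true, probs)

alpha = 0.1                                # learning rate
grad_z = probs - y_true                    # gradient of softmax + categorical CE
z = z - alpha * grad_z                     # one gradient-descent update
```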
The invention discloses a Bert-based intent determination method, applied to a target Bert model in which the fully connected hidden layer of the Bert model is replaced with a non-fully-connected hidden layer. The method comprises: determining a mask vector in an input sentence and taking it as a prediction target; obtaining a target vector corresponding to the prediction target based on the non-fully-connected hidden layer; and performing multi-intent recognition on the target vector to determine the target intent of the input sentence. In this determination method, whereas full connection multiplies the amount of computation, the non-fully-connected hidden layer reduces the structural complexity of the model and requires less computation time, thereby reducing the time overhead of Bert-based intent prediction.
In the prior art, no alternative answers are provided for the prediction target: the traditional training mode is context prediction, in which the next likely word is predicted from the context, which is equivalent to randomly selecting a word from the whole vocabulary. Since the Bert model converges slowly when extra information is scarce, the prediction cycle is long. In the embodiment of the present invention, prediction is therefore performed by providing alternative answers for the prediction target. The specific process is as follows: the target Bert model uses MaskLM in training, which leaks information in a manner similar to a fill-in-the-blank exercise; at prediction time, several alternative answers (including the correct answer) are first randomly selected and provided to the target Bert model, and selecting the correct answer from the alternatives helps the target Bert model converge rapidly.
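A minimal sketch of this candidate-answer prediction, assuming the model exposes raw scores over the whole vocabulary and that the alternative answers are supplied as token ids; all names and sizes here are illustrative assumptions, not the patent's implementation:

```python
import numpy as np

def predict_among_candidates(vocab_scores, candidate_ids):
    """Score only a small set of alternative answers (which includes the
    correct one) instead of the whole vocabulary, narrowing the search
    space at the masked position and speeding up convergence."""
    scores = vocab_scores[candidate_ids]
    e = np.exp(scores - scores.max())
    probs = e / e.sum()
    return int(candidate_ids[np.argmax(probs)]), probs

vocab_scores = np.random.randn(21128)        # assumed vocabulary size
candidate_ids = np.array([388, 2769, 5401])  # randomly drawn alternatives + answer
best_token, probs = predict_among_candidates(vocab_scores, candidate_ids)
```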
Furthermore, to prevent the target Bert model from learning too little, in addition to applying the Mask LM scheme to the input sentence, masks are randomly applied at a rate of 10% between the Transformer layers to increase the amount of knowledge the model learns.
In the embodiment of the present invention, S101 is equivalent to a pre-training task in the prior art, and aims to enable the Bert model to learn grammar knowledge.
In the embodiment of the present invention, the flow of the method for obtaining a target vector corresponding to the prediction target based on the non-fully-connected hidden layer is shown in fig. 3 and comprises the steps of:
S201, obtaining each vector output for the prediction target based on the non-fully-connected hidden layer;
In the embodiment of the present invention, the prediction target yields output vectors after passing through the non-fully-connected hidden layer, and each vector output for the prediction target based on the non-fully-connected hidden layer is obtained.
S202, extracting the feature values of each vector, and segmenting the feature values according to a preset step length;
In the embodiment of the present invention, the feature values of each vector are extracted and, once extraction is complete, segmented by a preset step length. The preset step length may be set according to experience or the specific situation, and its value is not limited.
S203, taking, in each segment, the vector corresponding to the maximum feature value as the target vector of that segment.
In the embodiment of the present invention, the vector corresponding to the maximum feature value in each segment is taken as the target vector of that segment. For example, with a preset step length of 6, the original 768-dimensional vector of each hidden layer can be reduced to 128 dimensions. Because the feature dimensions of the hidden layers shrink, the corresponding weight matrices shrink accordingly, the amount of computation and the training time are reduced, and the main information of each token is still retained.
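A sketch of this maxpool dimension reduction, using the 768-dimensional hidden size and step length of 6 from the example above; the function name is an assumption for illustration:

```python
import numpy as np

def maxpool_reduce(hidden_vec, step=6):
    """Segment the feature values of a hidden-state vector by a preset
    step length and keep the maximum feature value of each segment,
    reducing 768 dimensions to 768 / 6 = 128."""
    assert hidden_vec.shape[0] % step == 0, "dimension must divide by step"
    segments = hidden_vec.reshape(-1, step)  # one row per segment
    return segments.max(axis=1)              # max feature value per segment

hidden_vec = np.random.randn(768)  # one hidden-layer output vector
target_vec = maxpool_reduce(hidden_vec, step=6)
assert target_vec.shape == (128,)
```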
In the embodiment of the invention, based on the above intent determination method, the uninformed mask prediction in the Bert model is changed to prediction with provided alternative answers, which reduces the model's learning difficulty; meanwhile, a non-fully-connected hidden layer replaces the fully connected hidden layer, which reduces the structural complexity of the model. Together these optimize the time overhead of the Bert algorithm so that it meets the requirements of an industrial production environment.
In the embodiment of the invention, the Bert model provides segment embedding for sentence-level features, adding a sentence-level binary classification problem.
In the embodiment of the invention, 700,000 training corpus entries and 10,000 test corpus entries, divided into 9 categories, are used. They are analyzed with the Bert model and the target Bert model respectively, and the analysis results are shown in Table 1:
TABLE 1 (training time and model accuracy of the Bert model versus the target Bert model; rendered as an image in the original publication)
With the target Bert model, the time required for training is greatly reduced, while the loss in model accuracy remains within an acceptable range.
Based on the above Bert-based intent determination method, an embodiment of the present invention further provides a Bert-based intent determination device, applied to a target Bert model in which the fully connected hidden layer of the Bert model is replaced with a non-fully-connected hidden layer. A structural block diagram of the device is shown in fig. 4, and the device comprises:
a predicted target determining module 301, an obtaining module 302 and a target intention extracting module 303.
Wherein:
the prediction target determining module 301 is configured to determine a mask vector in an input sentence and take the mask vector as a prediction target;
the obtaining module 302 is configured to obtain a target vector corresponding to the prediction target based on the non-fully-connected hidden layer;
the target intent determining module 303 is configured to perform multi-intent recognition on the target vector and determine the target intent of the input sentence.
The invention discloses a Bert-based intent determination method, applied to a target Bert model in which the fully connected hidden layer of the Bert model is replaced with a non-fully-connected hidden layer. The method comprises: determining a mask vector in an input sentence and taking it as a prediction target; obtaining a target vector corresponding to the prediction target based on the non-fully-connected hidden layer; and performing multi-intent recognition on the target vector to determine the target intent of the input sentence. In this determination method, whereas full connection multiplies the amount of computation, the non-fully-connected hidden layer reduces the structural complexity of the model and requires less computation time, thereby reducing the time overhead of Bert-based intent prediction.
In this embodiment of the present invention, the obtaining module 302 includes:
an acquisition unit 304, an extraction and segmentation unit 305 and a determination unit 306.
Wherein:
the obtaining unit 304 is configured to obtain each vector output for the prediction target based on the non-fully-connected hidden layer;
the extracting and segmenting unit 305 is configured to extract the feature values of each vector and segment the feature values according to a preset step length;
the determining unit 306 is configured to take, in each segment, the vector corresponding to the maximum feature value as the target vector of that segment.
The device comprises a processor and a memory. The prediction target determining module, the obtaining module, the target intent determining module and the like are stored in the memory as program units, and the processor executes the program units stored in the memory to realize the corresponding functions.
The processor comprises a kernel, and the kernel calls the corresponding program unit from the memory. One or more kernels may be provided. The kernel provides alternative answers for the prediction target and performs dimension reduction on each hidden-state vector, thereby reducing the learning complexity and the structural complexity of the Bert model respectively, and so reducing the time overhead of the Bert algorithm.
The memory may include volatile memory in a computer-readable medium, random access memory (RAM), and/or non-volatile memory such as read-only memory (ROM) or flash memory (flash RAM); the memory includes at least one memory chip.
An embodiment of the present invention provides a storage medium on which a program is stored, the program implementing the intent determination method when executed by a processor.
The embodiment of the invention provides a processor, which is used for running a program, wherein the intention determination method is executed when the program runs.
An embodiment of the invention provides a device comprising a processor, a memory, and a program stored on the memory and runnable on the processor, wherein the processor, when executing the program, realizes the following steps:
determining a mask vector in an input sentence, and taking the mask vector as a prediction target;
obtaining a target vector corresponding to the prediction target based on the non-fully-connected hidden layer;
and performing multi-intent recognition on the target vector to determine the target intent of the input sentence.
Optionally, in the above method, obtaining a target vector corresponding to the prediction target based on the non-fully-connected hidden layer comprises:
obtaining each vector output for the prediction target based on the non-fully-connected hidden layer;
extracting the feature values of each vector, and segmenting the feature values according to a preset step length;
and taking, in each segment, the vector corresponding to the maximum feature value as the target vector of that segment.
Optionally, in the above method, determining a mask vector in an input sentence and taking the mask vector as a prediction target comprises:
performing word segmentation on the input sentence;
mapping the segmentation result to token vectors according to a vocabulary;
and randomly selecting, from the token vectors, mask vectors at a preset coverage rate, and taking the mask vectors as prediction targets.
Optionally, in the above method, performing multi-intent recognition on the target vector to determine the target intent of the input sentence comprises:
performing a state operation on the target vector to obtain a target vector to be analyzed;
obtaining each candidate intent corresponding to the target vector to be analyzed, and passing the target vector to be analyzed to a Softmax prediction function to predict each candidate intent and obtain a probability value for each candidate intent;
and selecting the candidate intent corresponding to the highest of the probability values as the target intent of the input sentence.
Optionally, the above method, before obtaining the target vector corresponding to the prediction target based on the non-fully-connected hidden layer, further comprises:
providing alternative answers for the prediction target.
Optionally, in the above method, determining a mask vector in an input sentence and taking the mask vector as a prediction target further comprises:
deleting, when the input sentence is a single sentence, the classification method proposed for sentence-level features in the Bert model.
The device herein may be a server, a PC, a PAD, a mobile phone, etc.
The present application also provides a computer program product which, when executed on a data processing device, is adapted to carry out a program with the following method steps:
determining a mask vector in an input sentence, and taking the mask vector as a prediction target;
obtaining a target vector corresponding to the prediction target based on the non-fully-connected hidden layer;
and performing multi-intent recognition on the target vector to determine the target intent of the input sentence.
Optionally, in the above method, obtaining a target vector corresponding to the prediction target based on the non-fully-connected hidden layer comprises:
obtaining each vector output for the prediction target based on the non-fully-connected hidden layer;
extracting the feature values of each vector, and segmenting the feature values according to a preset step length;
and taking, in each segment, the vector corresponding to the maximum feature value as the target vector of that segment.
Optionally, in the above method, determining a mask vector in an input sentence and taking the mask vector as a prediction target comprises:
performing word segmentation on the input sentence;
mapping the segmentation result to token vectors according to a vocabulary;
and randomly selecting, from the token vectors, mask vectors at a preset coverage rate, and taking the mask vectors as prediction targets.
Optionally, in the above method, performing multi-intent recognition on the target vector to determine the target intent of the input sentence comprises:
performing a state operation on the target vector to obtain a target vector to be analyzed;
obtaining each candidate intent corresponding to the target vector to be analyzed, and passing the target vector to be analyzed to a Softmax prediction function to predict each candidate intent and obtain a probability value for each candidate intent;
and selecting the candidate intent corresponding to the highest of the probability values as the target intent of the input sentence.
Optionally, the above method, before obtaining the target vector corresponding to the prediction target based on the non-fully-connected hidden layer, further comprises:
providing alternative answers for the prediction target.
Optionally, in the above method, determining a mask vector in an input sentence and taking the mask vector as a prediction target further comprises:
deleting, when the input sentence is a single sentence, the classification method proposed for sentence-level features in the Bert model.
It should be noted that, in the present specification, the embodiments are all described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments may be referred to each other. For the device-like embodiment, since it is basically similar to the method embodiment, the description is simple, and for the relevant points, reference may be made to the partial description of the method embodiment.
Finally, it should also be noted that, herein, relational terms such as first and second may be used solely to distinguish one entity or action from another without necessarily requiring or implying any actual such relationship or order between them. Also, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element preceded by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or apparatus that comprises the element.
For convenience of description, the above devices are described as being divided into various units by function, and are described separately. Of course, the functions of the units may be implemented in the same software and/or hardware or in a plurality of software and/or hardware when implementing the invention.
From the above description of the embodiments, it is clear to those skilled in the art that the present invention can be implemented by software plus necessary general hardware platform. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which may be stored in a storage medium, such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method according to the embodiments or some parts of the embodiments.
The Bert-based intent determination method and device provided by the invention are described in detail above. Specific examples are used herein to explain the principle and implementation of the invention, and the description of the above embodiments is only intended to help understand the method and its core idea. Meanwhile, for those skilled in the art, there may be variations in the specific embodiments and the application scope according to the idea of the present invention. In summary, the content of this specification should not be construed as limiting the present invention.

Claims (10)

1. A Bert-based intent determination method, applied to a target Bert model in which the fully connected hidden layer of the Bert model is replaced with a non-fully-connected hidden layer, comprising the following steps:
determining a mask vector in an input sentence, and taking the mask vector as a prediction target;
obtaining a target vector corresponding to the prediction target based on the non-fully-connected hidden layer;
and performing multi-intent recognition on the target vector to determine the target intent of the input sentence.
2. The method according to claim 1, wherein obtaining a target vector corresponding to the prediction target based on the non-fully-connected hidden layer comprises:
obtaining each vector output for the prediction target based on the non-fully-connected hidden layer;
extracting the feature values of each vector, and segmenting the feature values according to a preset step length;
and taking, in each segment, the vector corresponding to the maximum feature value as the target vector of that segment.
3. The method of claim 1, wherein determining a mask vector in the input sentence and taking the mask vector as a prediction target comprises:
performing word segmentation on the input sentence;
mapping the segmentation result to token vectors according to a vocabulary;
and randomly selecting, from the token vectors, mask vectors at a preset coverage rate, and taking the mask vectors as prediction targets.
4. The method of claim 1, wherein performing multi-intent recognition on the target vector to determine the target intent of the input sentence comprises:
performing a state operation on the target vector to obtain a target vector to be analyzed;
obtaining each candidate intent corresponding to the target vector to be analyzed, and passing the target vector to be analyzed to a Softmax prediction function to predict each candidate intent and obtain a probability value for each candidate intent;
and selecting the candidate intent corresponding to the highest of the probability values as the target intent of the input sentence.
5. The method according to claim 1, wherein, before obtaining the target vector corresponding to the prediction target based on the non-fully-connected hidden layer, the method further comprises:
providing alternative answers for the prediction target.
6. The method of claim 1, wherein determining a mask vector in the input sentence and taking the mask vector as a prediction target further comprises:
deleting, when the input sentence is a single sentence, the classification method proposed for sentence-level features in the Bert model.
7. A Bert-based intent determination device, applied to a target Bert model in which the fully connected hidden layer of the Bert model is replaced with a non-fully-connected hidden layer, comprising:
a prediction target determining module, configured to determine a mask vector in an input sentence and take the mask vector as a prediction target;
an obtaining module, configured to obtain a target vector corresponding to the prediction target based on the non-fully-connected hidden layer;
and a target intent determining module, configured to perform multi-intent recognition on the target vector and determine the target intent of the input sentence.
8. The device of claim 7, wherein the obtaining module comprises:
an obtaining unit, configured to obtain each vector output for the prediction target based on the non-fully-connected hidden layer;
an extracting and segmenting unit, configured to extract the feature values of each vector and segment the feature values according to a preset step length;
and a determining unit, configured to take, in each segment, the vector corresponding to the maximum feature value as the target vector of that segment.
9. A storage medium comprising a stored program, wherein the program performs the Bert-based intent determination method of any one of claims 1 to 6.
10. A processor configured to run a program, wherein the program, when run, performs the Bert-based intent determination method of any one of claims 1 to 6.
CN201911219821.3A 2019-12-03 2019-12-03 Intent determination method and device based on Bert Pending CN110968671A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911219821.3A CN110968671A (en) 2019-12-03 2019-12-03 Intent determination method and device based on Bert

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911219821.3A CN110968671A (en) 2019-12-03 2019-12-03 Intent determination method and device based on Bert

Publications (1)

Publication Number Publication Date
CN110968671A true CN110968671A (en) 2020-04-07

Family

ID=70032677

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911219821.3A Pending CN110968671A (en) 2019-12-03 2019-12-03 Intent determination method and device based on Bert

Country Status (1)

Country Link
CN (1) CN110968671A (en)



Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109977428A (en) * 2019-03-29 2019-07-05 北京金山数字娱乐科技有限公司 A kind of method and device that answer obtains
CN110232652A (en) * 2019-05-27 2019-09-13 珠海格力电器股份有限公司 Image processing engine processing method, the image processing method for terminal, terminal
CN110334210A (en) * 2019-05-30 2019-10-15 哈尔滨理工大学 A kind of Chinese sentiment analysis method merged based on BERT with LSTM, CNN
CN110276075A (en) * 2019-06-21 2019-09-24 腾讯科技(深圳)有限公司 Model training method, name entity recognition method, device, equipment and medium
CN110442777A (en) * 2019-06-24 2019-11-12 华中师范大学 Pseudo-linear filter model information search method and system based on BERT

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111554304A (en) * 2020-04-25 2020-08-18 中信银行股份有限公司 User tag obtaining method, device and equipment
CN113571062A (en) * 2020-04-28 2021-10-29 中国移动通信集团浙江有限公司 Client tag identification method and device based on voice data and computing equipment
CN111401077A (en) * 2020-06-02 2020-07-10 腾讯科技(深圳)有限公司 Language model processing method and device and computer equipment
CN111401077B (en) * 2020-06-02 2020-09-18 腾讯科技(深圳)有限公司 Language model processing method and device and computer equipment
CN111783443A (en) * 2020-06-29 2020-10-16 百度在线网络技术(北京)有限公司 Text disturbance detection method, disturbance reduction method, disturbance processing method and device
CN111783443B (en) * 2020-06-29 2023-08-15 百度在线网络技术(北京)有限公司 Text disturbance detection method, disturbance recovery method, disturbance processing method and device
CN112053687A (en) * 2020-07-31 2020-12-08 出门问问信息科技有限公司 Voice processing method and device, computer readable storage medium and equipment
CN112257432A (en) * 2020-11-02 2021-01-22 北京淇瑀信息科技有限公司 Self-adaptive intention identification method and device and electronic equipment
CN112507704A (en) * 2020-12-15 2021-03-16 中国联合网络通信集团有限公司 Multi-intention recognition method, device, equipment and storage medium
CN112507704B (en) * 2020-12-15 2023-10-03 中国联合网络通信集团有限公司 Multi-intention recognition method, device, equipment and storage medium
CN112989800A (en) * 2021-04-30 2021-06-18 平安科技(深圳)有限公司 Multi-intention identification method and device based on Bert sections and readable storage medium

Similar Documents

Publication Publication Date Title
CN110968671A (en) Intent determination method and device based on Bert
CN111310438B (en) Chinese sentence semantic intelligent matching method and device based on multi-granularity fusion model
CN108959246A (en) Answer selection method, device and electronic equipment based on improved attention mechanism
CN110929515B (en) Reading understanding method and system based on cooperative attention and adaptive adjustment
CN111783474B (en) Comment text viewpoint information processing method and device and storage medium
CN109948149B (en) Text classification method and device
CN110598206A (en) Text semantic recognition method and device, computer equipment and storage medium
CN108875074A (en) Based on answer selection method, device and the electronic equipment for intersecting attention neural network
CN110232122A (en) A kind of Chinese Question Classification method based on text error correction and neural network
CN111783993A (en) Intelligent labeling method and device, intelligent platform and storage medium
CN110781686B (en) Statement similarity calculation method and device and computer equipment
CN111160000B (en) Composition automatic scoring method, device terminal equipment and storage medium
CN110275928B (en) Iterative entity relation extraction method
CN110633359A (en) Sentence equivalence judgment method and device
CN111583911A (en) Speech recognition method, device, terminal and medium based on label smoothing
CN110874392B (en) Text network information fusion embedding method based on depth bidirectional attention mechanism
CN114386409A (en) Self-distillation Chinese word segmentation method based on attention mechanism, terminal and storage medium
CN110969005B (en) Method and device for determining similarity between entity corpora
CN114064852A (en) Method and device for extracting relation of natural language, electronic equipment and storage medium
CN110929532B (en) Data processing method, device, equipment and storage medium
CN111400492B (en) Hierarchical feature text classification method and system based on SFM-DCNN
CN112579739A (en) Reading understanding method based on ELMo embedding and gating self-attention mechanism
CN113591988B (en) Knowledge cognitive structure analysis method, system, computer equipment, medium and terminal
CN114638229A (en) Entity identification method, device, medium and equipment of record data
CN114398482A (en) Dictionary construction method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200407