CN113177415A - Semantic understanding method and device, electronic equipment and storage medium - Google Patents

Semantic understanding method and device, electronic equipment and storage medium

Info

Publication number
CN113177415A
Authority
CN
China
Prior art keywords
semantic understanding
semantic
model
text
recognition
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110481797.1A
Other languages
Chinese (zh)
Inventor
法羚玲
代旭东
顾成敏
赵远
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
iFlytek Co Ltd
Original Assignee
iFlytek Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by iFlytek Co Ltd filed Critical iFlytek Co Ltd
Priority to CN202110481797.1A priority Critical patent/CN113177415A/en
Publication of CN113177415A publication Critical patent/CN113177415A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G06F 18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F 18/2415 Classification techniques relating to the classification model based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/25 Fusion techniques

Abstract

The invention provides a semantic understanding method, a semantic understanding device, electronic equipment and a storage medium. The method comprises the following steps: determining a text to be understood; and inputting the text into a semantic understanding model that fuses a plurality of semantic understanding subtasks, to obtain a semantic understanding result output by the semantic understanding model. The semantic understanding model is obtained through multi-task joint distillation training based on a sample text, its sample semantic understanding result, and a teacher model corresponding to each semantic understanding subtask, where the semantic understanding subtasks comprise entity recognition, question type recognition and intent recognition. By obtaining, through multi-task joint distillation training, a single semantic understanding model that fuses multiple subtasks, the method, device, electronic equipment and storage medium compress the model scale and reduce the computation amount while improving the multi-task semantic understanding effect achieved by one model, thereby realizing an accurate, reliable and widely applicable semantic understanding scheme.

Description

Semantic understanding method and device, electronic equipment and storage medium
Technical Field
The present invention relates to the field of natural language processing technologies, and in particular, to a semantic understanding method, an apparatus, an electronic device, and a storage medium.
Background
Semantic understanding means that a machine interprets the user's intent from the natural language the user provides, so that it can then perform the corresponding operations. Semantic understanding typically involves multiple tasks, such as entity recognition, question type recognition and intent recognition.
A multi-task semantic understanding system usually needs a dedicated model for each task, and these models are trained and run independently, which makes the computation amount huge, the maintenance difficult, and the usability in real scenarios low. Alternatively, the tasks can share an encoding layer through multi-task learning and then run under a single model with their own task-specific parameters, which reduces both the computation amount and the maintenance difficulty.
Disclosure of Invention
The invention provides a semantic understanding method, a semantic understanding device, electronic equipment and a storage medium, which are intended to overcome the defects of the prior-art multi-task semantic understanding schemes, namely large computation amount and poor effect.
The invention provides a semantic understanding method, which comprises the following steps:
determining a text to be understood;
inputting the text into a semantic understanding model fusing a plurality of semantic understanding subtasks to obtain a semantic understanding result output by the semantic understanding model;
the semantic understanding model is obtained by carrying out multi-task combined distillation training on the basis of a sample text, a sample semantic understanding result and teacher models corresponding to semantic understanding subtasks respectively, wherein the semantic understanding subtasks comprise entity recognition, problem type recognition and intention recognition.
According to the semantic understanding method provided by the invention, the loss function of the semantic understanding model is determined based on the subtask loss function of each semantic understanding subtask;
the subtask loss function of the entity identification and the question type identification includes a distillation loss, or includes a tag loss and the distillation loss, wherein the distillation loss is determined based on a teacher model of the corresponding semantic understanding subtask.
According to the semantic understanding method provided by the invention, the distillation loss is determined based on the following steps:
determining teacher classification probability distribution of the sample text in any semantic understanding subtask based on a teacher model of the semantic understanding subtask;
determining the student classification probability distribution of the sample text in any semantic understanding subtask based on a semantic understanding model in a training stage;
and determining distillation loss of any semantic understanding subtask based on teacher classification probability distribution and student classification probability distribution of the sample text in any semantic understanding subtask.
According to the semantic understanding method provided by the invention, the subtask loss function of intent recognition is determined based on the label loss of performing binary classification on each candidate intent.
According to the semantic understanding method provided by the invention, the step of inputting the text into a semantic understanding model fusing a plurality of semantic understanding subtasks to obtain a semantic understanding result output by the semantic understanding model comprises the following steps:
inputting the text into a text coding layer of the semantic understanding model to obtain text semantic codes of the text output by the text coding layer and word semantic codes of each word in the text;
inputting the text semantic code to a semantic recognition layer of the semantic understanding model to obtain a problem type recognition result and an intention recognition result in the semantic understanding result output by the semantic recognition layer;
and inputting the word semantic code of each word into an entity recognition layer of the semantic understanding model to obtain an entity recognition result in the semantic understanding result output by the entity recognition layer.
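As an illustration only, the layered data flow just described (a shared text encoding layer feeding question type and intent recognition at text level, and entity recognition at word level) can be sketched as a toy forward pass. The stand-in encoder and all head weights below are invented for illustration and are not the patent's implementation:

```python
import math

def softmax(scores):
    """Normalize a list of scores into a probability distribution."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def encode(tokens):
    """Stand-in for the shared text encoding layer (e.g. a BERT encoder):
    returns one word-level code per token plus a text-level code."""
    word_codes = [[float(len(t)), float(i)] for i, t in enumerate(tokens)]
    text_code = [sum(c[0] for c in word_codes), sum(c[1] for c in word_codes)]
    return text_code, word_codes

def semantic_understanding(tokens, q_head, i_head, e_head):
    text_code, word_codes = encode(tokens)
    # Question type and intent recognition both consume the text-level code.
    question_probs = softmax([sum(w * x for w, x in zip(row, text_code)) for row in q_head])
    intent_scores = [sum(w * x for w, x in zip(row, text_code)) for row in i_head]
    # Entity recognition consumes the per-word codes (sequence labelling).
    entity_tags = [softmax([sum(w * x for w, x in zip(row, c)) for row in e_head])
                   for c in word_codes]
    return question_probs, intent_scores, entity_tags

q_probs, i_scores, e_tags = semantic_understanding(
    ["how", "tall", "is", "yao"],
    q_head=[[0.1, 0.2], [0.3, -0.1], [0.0, 0.05]],   # 3 candidate question types
    i_head=[[0.2, 0.1], [-0.1, 0.3]],                # 2 candidate intents
    e_head=[[0.1, 0.0], [0.0, 0.1], [0.05, 0.05]],   # 3 entity tags per word
)
```

The point of the sketch is the shape of the outputs: one question-type distribution and one intent score vector for the whole text, and one entity-tag distribution per word.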
According to the semantic understanding method provided by the invention, the semantic coding of the text is input to a semantic recognition layer of the semantic understanding model to obtain a problem type recognition result and an intention recognition result in the semantic understanding result output by the semantic recognition layer, and the method comprises the following steps:
inputting the text semantic code to a problem coding layer in the semantic recognition layer to obtain a problem code output by the problem coding layer;
inputting the question code into a question recognition layer in the semantic recognition layer to obtain the question type recognition result output by the question recognition layer;
inputting the text semantic code to an intention coding layer in the semantic recognition layer to obtain an intention code output by the intention coding layer;
inputting the intention code into an intention recognition layer in the semantic recognition layer to obtain the intention recognition result output by the intention recognition layer.
According to the semantic understanding method provided by the invention, the semantic understanding model is obtained by training based on the following steps:
determining an initial model fusing a plurality of semantic understanding subtasks;
and taking the initial model as a student model of each semantic understanding subtask, and performing multi-task combined distillation training on the initial model based on the sample text, the sample semantic understanding result thereof and the teacher model corresponding to each semantic understanding subtask to obtain the semantic understanding model.
The present invention also provides a semantic understanding apparatus, comprising:
the text acquisition unit is used for determining a text to be understood;
the semantic understanding unit is used for inputting the text into a semantic understanding model fusing a plurality of semantic understanding subtasks to obtain a semantic understanding result output by the semantic understanding model;
the semantic understanding model is obtained by carrying out multi-task combined distillation training on the basis of a sample text, a sample semantic understanding result and teacher models corresponding to semantic understanding subtasks respectively, wherein the semantic understanding subtasks comprise entity recognition, problem type recognition and intention recognition.
The present invention also provides an electronic device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the steps of any of the semantic understanding methods described above when executing the computer program.
The invention also provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the semantic understanding method as any one of the above.
According to the semantic understanding method, the semantic understanding device, the electronic equipment and the storage medium, a semantic understanding model fusing a plurality of semantic understanding subtasks is obtained through a multi-task combined distillation training mode, semantic understanding is carried out based on the semantic understanding model, the multi-task semantic understanding effect realized based on the same model is improved while the model scale is compressed and the operation amount is reduced, and therefore an accurate, reliable and widely applicable semantic understanding scheme is realized.
Drawings
In order to more clearly illustrate the technical solutions of the present invention or the prior art, the drawings needed to be used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on the drawings without creative efforts.
FIG. 1 is a flow diagram of a semantic understanding method provided by the present invention;
FIG. 2 is a schematic flow diagram of a distillation loss determination method provided by the present invention;
FIG. 3 is a schematic diagram of classification labels and classification probability distributions provided by the present invention;
FIG. 4 is a flow chart illustrating step 120 of the semantic understanding method provided by the present invention;
FIG. 5 is a schematic structural diagram of a text coding layer in the semantic understanding model provided by the present invention;
FIG. 6 is a schematic structural diagram of a semantic recognition layer in the semantic understanding model provided by the present invention;
FIG. 7 is a flow chart diagram of a semantic understanding model training method provided by the present invention;
FIG. 8 is a schematic diagram of training a semantic understanding model provided by the present invention;
FIG. 9 is a block diagram of a semantic understanding model provided by the present invention;
FIG. 10 is a schematic diagram of training a semantic understanding model provided by the present invention;
FIG. 11 is a schematic structural diagram of a semantic understanding apparatus provided in the present invention;
fig. 12 is a schematic structural diagram of an electronic device provided in the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is obvious that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
At present, there are generally two schemes for a multi-task semantic understanding system:
firstly, the corresponding models are respectively set for each task, and a plurality of models are independently trained and independently run. However, the above scheme requires a huge amount of computation for execution, is difficult to maintain, and has low usability in an actual scene. Especially, with the popularization of BERT (Bidirectional Encoder retrieval from transforms) in the technical field of natural language processing, applying BERT to a model corresponding to each task for semantic coding will result in higher operation and deployment costs.
Second, the single models are integrated through multi-task learning. In multi-task learning, several tasks are learned together and share one model space: after encoding, each task is connected to its own loss function, and model parameters are updated by back-propagating a weighted sum of the per-task loss functions. Multi-task learning reduces the computation amount and the maintenance difficulty, but because each task has its own exclusive label system and learning pattern, merely weighting the per-task loss functions makes it difficult to track and balance the training progress and learning difficulty of each task. The resulting model therefore struggles to match the effect of a separate model per task, which directly affects the reliability and accuracy of semantic understanding.
In view of the above problems, an embodiment of the present invention provides a semantic understanding method. Fig. 1 is a schematic flow chart of a semantic understanding method provided by the present invention, and as shown in fig. 1, the method includes:
at step 110, text to be understood is determined.
Specifically, the text to be understood is a text that needs semantic understanding; it may be an interactive text directly input by the user during human-computer interaction, or it may be obtained by performing speech recognition on speech input by the user.
Step 120, inputting the text into a semantic understanding model fusing a plurality of semantic understanding subtasks to obtain a semantic understanding result output by the semantic understanding model;
the semantic understanding model is obtained by carrying out multi-task combined distillation training on the basis of a sample text, a sample semantic understanding result and a teacher model corresponding to each semantic understanding subtask, wherein the semantic understanding subtasks comprise entity recognition, problem type recognition and intention recognition.
In particular, semantic understanding covers multiple subtasks, including entity recognition, question type recognition and intent recognition. Entity recognition identifies entities with specific meanings in a text, such as person names, place names, organization names and countries; its labelling scheme can be BIO, BIOES, etc., where B marks the beginning of an entity, E the end of an entity, I an entity-internal word, O a non-entity word, and S a single-word entity. Question type recognition identifies the question type to which the text belongs; common question types include single entity (encyclopedia questions), first-order questions, list questions, second-order questions, reverse questions, comparison questions and non-questions, each with its own distinctive expression patterns. Intent recognition identifies the intents related to the entities in the text; common intents include relations and attributes, e.g. "daughter" is a relation and "height" is an attribute in knowledge-graph triples such as (xx, daughter, AA) and (xx, height, 1.85 meters). A single text may involve multiple intents, so intent recognition is usually implemented as multi-label classification.
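To make the BIO labelling scheme mentioned above concrete, here is a minimal sketch; the example tokens, tag names and helper function are illustrative assumptions, not part of the patent:

```python
# Toy illustration of BIO labelling: B marks the start of an entity,
# I an entity-internal word, O a non-entity word.
tokens = ["yao", "ming", "was", "born", "in", "shanghai"]
tags   = ["B-PER", "I-PER", "O", "O", "O", "B-LOC"]

def extract_entities(tokens, tags):
    """Collect (entity_text, entity_type) spans from a BIO tag sequence."""
    entities, current, ctype = [], [], None
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            if current:  # a new B closes any open span
                entities.append((" ".join(current), ctype))
            current, ctype = [token], tag[2:]
        elif tag.startswith("I-") and current:
            current.append(token)
        else:  # "O" ends any open entity span
            if current:
                entities.append((" ".join(current), ctype))
            current, ctype = [], None
    if current:
        entities.append((" ".join(current), ctype))
    return entities
```

Running `extract_entities(tokens, tags)` on the example recovers the two entity spans, one person and one location.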
Since semantic understanding covers multiple subtasks, a semantic understanding model that fuses these subtasks can be used to semantically understand the text. Such a model is a single model integrated through multi-task learning: it contains a shared text encoding layer that encodes the input text and extracts its semantics, plus a back-end module for each semantic understanding subtask, so that each subtask is executed by its back-end module on top of the encoding produced by the shared layer. Accordingly, the resulting semantic understanding result contains one result per subtask.
Considering the independence of the subtasks and the label system and learning pattern exclusive to each of them, the embodiment of the invention introduces, on top of conventional multi-task learning, a teacher model for each semantic understanding subtask, and performs knowledge distillation per subtask based on its teacher model. This improves the effect of each subtask under multi-task learning while keeping the computation amount and scale of the fused semantic understanding model as small as possible.
Here, the teacher model of each semantic understanding subtask is a model that is larger, more complex and better at the task than the corresponding subtask branch in the semantic understanding model. Correspondingly, each subtask branch in the semantic understanding model can be understood as the student model of that subtask.
Based on the teacher-student network idea, the teacher model of a single subtask can transfer its knowledge to the student model, improving the student model's performance; this knowledge transfer process is knowledge distillation. Transferring the knowledge of the teacher models of all subtasks into the same semantic understanding model brings its performance closer to that of each teacher model.
Before step 120 is executed, the semantic understanding model may be obtained by pre-training, specifically through multi-task joint distillation training, which may include the following steps: first, a large number of sample texts are collected, and each sample text is labelled under every semantic understanding subtask; these labels serve as the sample semantic understanding result of the sample text. In addition, a teacher model is obtained for each subtask. Then, multi-task joint distillation training is performed on an initial model containing one branch per subtask, based on the sample texts, their sample semantic understanding results, and the soft labels (probability distributions) output by each teacher model for the sample texts, thereby obtaining the semantic understanding model.
According to the method provided by the embodiment of the invention, the semantic understanding model fusing a plurality of semantic understanding subtasks is obtained through a multi-task combined distillation training mode, the semantic understanding is carried out based on the semantic understanding model, the scale of the model is compressed, the calculation amount is reduced, and meanwhile, the multi-task semantic understanding effect realized based on the same model is improved, so that an accurate, reliable and widely applicable semantic understanding scheme is realized.
Based on the above embodiment, the loss function of the semantic understanding model is determined based on the subtask loss function of each semantic understanding subtask;
the subtask loss function of the entity identification and the question type identification includes a distillation loss, or includes a tag loss and the distillation loss, wherein the distillation loss is determined based on a teacher model of the corresponding semantic understanding subtask.
Specifically, for the multi-task joint training of the semantic understanding model, each semantic understanding subtask serves as a training target and is given a corresponding subtask loss function, representing the loss of its branch in the model during training. On this basis, the subtask loss functions can be integrated into the loss function of the whole semantic understanding model: for example, the sum of the subtask loss functions can be used as the model's loss function, or the subtask loss functions can be weighted and the weighted sum used as the model's loss function.
Further, a subtask loss function can be set for each semantic understanding subtask. Specifically, in entity recognition, whether each word belongs to an entity can be uniquely determined, and in question type recognition the question type of the text is likewise unique. Therefore, both entity recognition and question type recognition can be treated as single-label classification problems, and their subtask loss functions can be set through knowledge distillation.
For entity recognition and question type recognition, the corresponding subtask loss function includes a distillation loss, i.e. a loss evaluated through knowledge distillation against the corresponding teacher model. Taking question type recognition as an example: the sample text is input into the teacher model for question type recognition, which analyses the question type of the sample text and yields not only the specific question type but also the probability or score of the sample text belonging to every question type. During knowledge distillation, these probabilities or scores also guide how the question-type-recognition branch of the initial model analyses the sample text; compared with the single question-type label, they provide much richer knowledge for that branch to learn from. The distillation loss reflects the difference between the probability or score distributions produced by the teacher model and by the corresponding subtask branch of the initial model for the same sample text.
In addition, since the teacher model may have a certain error rate, training the corresponding subtask branches only with the distillation loss determined from the teacher model would propagate the teacher model's errors into those branches. A label loss can therefore be added to the loss function of the semantic understanding subtasks, so that the training of each branch is also guided by the corresponding label in the sample semantic understanding result of the sample text, avoiding error propagation from the teacher model. Here, the label loss is the gap between the result produced by the corresponding subtask branch of the initial model for the sample text and the corresponding label in the sample text's semantic understanding result.
In addition, for the intent recognition subtask in semantic understanding, since a single text may contain multiple intents, intent recognition is a multi-label classification problem, which can be converted into a per-label binary classification problem, i.e. judging whether each intent is contained in the text. After this conversion, the subtask loss function of intent recognition may also be set to include the distillation loss, or both the distillation loss and the label loss, which is not particularly limited in the embodiment of the invention.
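A minimal sketch of the per-label binary conversion described above, judging each candidate intent independently with a binary cross-entropy label loss (the function names and the averaging choice are assumptions, not the patent's exact formulation):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def multilabel_binary_loss(scores, labels):
    """Per-intent binary cross-entropy: each candidate intent is judged
    independently as present (1) or absent (0) in the text."""
    loss = 0.0
    for s, y in zip(scores, labels):
        p = sigmoid(s)  # probability that this intent is present
        loss += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return loss / len(scores)
```

With confident, correct scores (e.g. a high score on the one positive intent and low scores on the negatives) the loss approaches zero; uncertain scores near zero give a larger loss.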
For example, the subtask loss function L_kg may be as follows:
L_kg = α′·L_soft + β′·L_hard
wherein α′ and β′ are preset weights, L_soft is the distillation loss, and L_hard is the label loss.
The loss function L_multi-task of the semantic understanding model may be as follows:
L_multi-task = α·L_kg-ner + β·L_kg-questype + γ·L_intent
wherein α, β and γ are preset weights, and L_kg-ner, L_kg-questype and L_intent are the subtask loss functions of the three semantic understanding subtasks of entity recognition, question type recognition and intent recognition, respectively.
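The two weighted combinations above can be sketched directly; the default weight values below are placeholders, since the patent only describes α, β, γ, α′ and β′ as preset weights:

```python
def subtask_loss(distill_loss, label_loss, alpha_p=0.5, beta_p=0.5):
    """L_kg = α'·L_soft + β'·L_hard, for entity / question type recognition."""
    return alpha_p * distill_loss + beta_p * label_loss

def multitask_loss(l_ner, l_questype, l_intent, alpha=1.0, beta=1.0, gamma=1.0):
    """L_multi-task = α·L_kg-ner + β·L_kg-questype + γ·L_intent."""
    return alpha * l_ner + beta * l_questype + gamma * l_intent
```

The weights let training balance subtasks whose losses live on different scales, which is exactly the knob the plain weighted multi-task scheme relies on.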
According to the method provided by the embodiment of the invention, each subtask branch learns the teacher model's knowledge through knowledge distillation, so that it can acquire the teacher model's analysis capability; moreover, combining knowledge distillation with label-based training prevents the teacher model's errors from being passed on to the student model, further improving the reliability of the semantic understanding model.
Based on any of the above embodiments, fig. 2 is a schematic flow chart of a distillation loss determination method provided by the present invention, and as shown in fig. 2, the distillation loss is determined based on the following steps:
determining teacher classification probability distribution of the sample text in any semantic understanding subtask based on a teacher model of the semantic understanding subtask;
determining the student classification probability distribution of the sample text in the semantic understanding subtask based on a semantic understanding model in a training stage;
and determining distillation loss of the semantic understanding subtask based on the teacher classification probability distribution and the student classification probability distribution of the sample text in the semantic understanding subtask.
Specifically, for any semantic understanding subtask, the distillation loss is determined based on the classification probability distribution output by the teacher model and the student model respectively. Here, the student model is a branch in the semantic understanding model for executing the semantic understanding subtask.
Further, the sample text is input into the teacher model, which analyses it and produces a probability distribution over the classification results of the semantic understanding subtask; this is recorded as the teacher classification probability distribution. The sample text is also input into the semantic understanding model in its training stage, where the branch corresponding to the subtask, i.e. the student model, analyses it and produces its own probability distribution over the classification results, recorded as the student classification probability distribution. Both distributions reflect the probabilities or scores with which the sample text belongs to the candidate types.
After the classification probability distribution aiming at the same sample text is obtained based on the teacher model and the student model respectively, the distillation loss of the semantic understanding subtask can be determined by combining the difference between the teacher classification probability distribution and the student classification probability distribution, and the distillation loss is acted on the parameter iteration of the student model.
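A minimal sketch of such a distillation loss, here using the KL divergence between the teacher and student classification probability distributions; the patent does not fix the exact divergence measure, so this particular choice is an assumption:

```python
import math

def distillation_loss(teacher_probs, student_probs, eps=1e-12):
    """KL(teacher || student): penalizes the student distribution for
    deviating from the teacher's classification probability distribution."""
    return sum(t * math.log((t + eps) / (s + eps))
               for t, s in zip(teacher_probs, student_probs))
```

When the student matches the teacher exactly the loss is zero; the more the student's distribution diverges from the teacher's, the larger the loss, and minimizing it drives the student's parameter updates toward the teacher's behaviour.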
Based on any of the above embodiments, fig. 3 is a schematic diagram of the classification labels and classification probability distributions provided by the present invention. As shown in fig. 3, suppose the question type recognition task has 6 candidate question types, marked 0, 1, 2, 3, 4 and 5. The sample semantic understanding result (hard target) used for the label loss is shown on the left of fig. 3: among the 6 candidates, only the positive-label question type 3 is marked 1, while the remaining negative-label question types are all marked 0. The classification probability distribution (soft target) used for the distillation loss is shown on the right of fig. 3: every candidate question type, positive or negative, has its own classification probability, so the soft target reflects not only the question type of the sample text but also the inter-type information learned from the features.
Further, the soft target may be taken from the softmax output of the model's last layer. However, when the softmax output is used directly as the soft target and the entropy of that probability distribution is relatively small, the values of the negative labels are all close to 0 and contribute very little to the distillation loss. In view of this, a temperature can be introduced into the softmax function, giving the classification probability distribution q_i shown in the following formula:
$$q_i = \frac{\exp(z_i / T)}{\sum_j \exp(z_j / T)}$$
In the formula, z_i is the score of category i and T is the temperature. The higher T is, the smoother the softmax output distribution becomes and the larger its entropy, so the information carried by the negative labels is relatively amplified.
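The temperature-scaled softmax can be sketched in plain Python; the logits below are invented for illustration:

```python
import math

def softmax_with_temperature(scores, T=1.0):
    """q_i = exp(z_i / T) / sum_j exp(z_j / T); a higher T flattens the output."""
    exps = [math.exp(z / T) for z in scores]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(dist):
    return -sum(p * math.log(p) for p in dist if p > 0)

scores = [8.0, 2.0, 1.0, 0.5, 0.2, 0.1]  # hypothetical logits for 6 types

sharp = softmax_with_temperature(scores, T=1.0)
smooth = softmax_with_temperature(scores, T=4.0)

# With a higher temperature the negative labels receive visibly larger
# probabilities, so they contribute more to the distillation loss.
assert entropy(smooth) > entropy(sharp)
```

At T=1 the top logit dominates and the negative labels are nearly 0; at T=4 the distribution spreads out, which is exactly the amplification effect described above.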
Based on any of the above embodiments, the subtask loss function for intention recognition is determined based on a tag loss that performs binary classification on each candidate intention.
Specifically, for the intention recognition subtask in semantic understanding, a single text may contain multiple intentions, i.e. intention recognition is a multi-label classification problem. The multi-label problem can be converted into a binary classification problem for each label, i.e. judging whether the text contains each intention, so the subtask loss function for intention recognition can be expressed as a tag loss that performs binary classification on each candidate intention.
Further, the number of intentions to be predicted is far lower than the total number of intentions in the question-and-answer knowledge graph, so the number of positive samples in the sample text is far lower than the number of negative samples, which may introduce an imbalance between positive and negative training samples. To address this, a Log-Sum-Exp loss is adopted to compute the loss function L_intent of the intention recognition subtask in the semantic understanding model, which can be expressed as the following formula:
$$L_{intent} = \log\Big(1 + \sum_{i \in neg} e^{s_i}\Big) + \log\Big(1 + \sum_{j \in pos} e^{-s_j}\Big)$$
In the formula, neg denotes the set of negative samples and pos the set of positive samples; s_i and s_j are the outputs of the intention recognition branch of the semantic understanding model for the negative and positive samples of the sample text, respectively.
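The Log-Sum-Exp loss above can be sketched in plain Python; the scores are invented for illustration:

```python
import math

def log_sum_exp_loss(scores, labels):
    """Multi-label Log-Sum-Exp loss:
    L = log(1 + sum_{i in neg} e^{s_i}) + log(1 + sum_{j in pos} e^{-s_j}),
    where `scores` are the raw outputs of the intention branch and
    `labels` are 0/1 indicators for each candidate intention."""
    neg = [s for s, y in zip(scores, labels) if y == 0]
    pos = [s for s, y in zip(scores, labels) if y == 1]
    loss_neg = math.log(1.0 + sum(math.exp(s) for s in neg))
    loss_pos = math.log(1.0 + sum(math.exp(-s) for s in pos))
    return loss_neg + loss_pos

# Hypothetical scores for 5 candidate intentions; only intention 1 is positive.
scores = [-3.0, 4.0, -2.5, -1.8, -3.2]
labels = [0, 1, 0, 0, 0]
loss = log_sum_exp_loss(scores, labels)
```

Because the loss sums over sets rather than averaging per label, the single positive label is not drowned out by the many negative labels, which is why it suits the positive/negative imbalance described above.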
The method provided by the embodiment of the invention sets a dedicated subtask loss function for the intention recognition subtask based on the characteristics of intention recognition, which helps improve the reliability and accuracy of intention recognition.
Based on any of the above embodiments, the semantic understanding model that incorporates a plurality of semantic understanding subtasks includes a text coding layer that is common to each semantic understanding subtask. Accordingly, fig. 4 is a schematic flow chart of step 120 in the semantic understanding method provided by the present invention, and as shown in fig. 4, step 120 includes:
step 121, inputting the text into a text coding layer of the semantic understanding model to obtain text semantic codes of the text output by the text coding layer and word semantic codes of each word in the text.
Specifically, in the semantic understanding model, three semantic understanding subtasks share one text encoding layer. The text coding layer can extract and code semantic features of the input text, so that semantic coding aiming at the whole text, namely text semantic coding, and semantic coding aiming at each word and the context thereof in the text, namely word semantic coding of each word are output.
And step 122, inputting the text semantic code into a semantic recognition layer of the semantic understanding model to obtain a problem type recognition result and an intention recognition result in a semantic understanding result output by the semantic recognition layer.
And 123, inputting the word semantic code of each word into an entity recognition layer of the semantic understanding model to obtain an entity recognition result in the semantic understanding result output by the entity recognition layer.
Specifically, the problem type identification and the intention identification need to consider the semantic information of the whole text, while the entity identification needs to consider the semantic information of each word in the text, so a semantic identification layer and an entity identification layer are respectively arranged in a semantic understanding model, and the semantic information of the whole text, namely the text semantic code, is identified and analyzed through the semantic identification layer, so that the problem type identification result and the intention identification result of the text are obtained; and identifying and analyzing semantic information of each word in the text, namely the word semantic code of each word, through an entity identification layer, so as to obtain an entity identification result of each word in the text.
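The dataflow of steps 121-123 can be sketched schematically. Every function body below is a placeholder (the real layers are neural networks); only the interfaces matter: the semantic recognition layer consumes the sentence-level code, and the entity recognition layer consumes the per-character codes:

```python
def text_encoding_layer(text):
    """Stand-in for the shared encoder (a BERT network in the patent):
    returns one sentence-level code and one code per character."""
    dim = 4
    text_code = [0.1] * dim                   # semantic code of the whole text
    char_codes = [[0.1] * dim for _ in text]  # one semantic code per character
    return text_code, char_codes

def semantic_recognition_layer(text_code):
    """Consumes the sentence-level code: problem type + intention results."""
    question_type = "definition"              # placeholder classification result
    intents = ["lookup"]                      # placeholder multi-label result
    return question_type, intents

def entity_recognition_layer(char_codes):
    """Consumes the per-character codes: one tag per character."""
    return ["O"] * len(char_codes)            # placeholder sequence-labeling result

text = "example input"
text_code, char_codes = text_encoding_layer(text)
question_type, intents = semantic_recognition_layer(text_code)
entity_tags = entity_recognition_layer(char_codes)
assert len(entity_tags) == len(text)
```

The split mirrors the description: one encoder pass feeds two consumers, with sentence-level classification and character-level labeling never touching each other's codes.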
According to the method provided by the embodiment of the invention, the text coding layer shared by all semantic understanding subtasks effectively compresses the scale of the semantic understanding model and reduces the amount of computation required for semantic understanding.
Based on any of the above embodiments, fig. 5 is a schematic structural diagram of a text coding layer in the semantic understanding model provided by the present invention, as shown in fig. 5, step 121 includes:
inputting the text into the text coding layer; the text coding layer sets a sentence-head symbol at the beginning of the text and performs context coding on each character of the text including the sentence-head symbol, obtaining the semantic code of each character output by the text coding layer; the semantic code of the sentence-head symbol serves as the text semantic code.
Specifically, when encoding the text, the text coding layer may set a sentence-head symbol at the beginning of the input text, for example Cls in fig. 5. Since the sentence-head symbol itself carries no semantics, during context coding it can absorb the semantic information of each character or word in the text more evenly; the resulting semantic code of the sentence-head symbol therefore directly reflects the semantic information of the whole text and can be used directly as the text semantic code.
For example, for the 9-character example text input in fig. 5, the text coding layer outputs the semantic code of Cls together with the semantic codes T1-Tn, where n is an integer greater than 9; the semantic codes of the first 9 characters may be T1-T9. The semantic code of Cls can be used as the text semantic code of the text.
Based on any of the above embodiments, fig. 6 is a schematic structural diagram of the semantic recognition layer in the semantic understanding model provided by the present invention. As shown in fig. 6, the semantic recognition layer is divided into a problem coding layer and a problem recognition layer for the problem type recognition subtask, and an intention coding layer and an intention recognition layer for the intention recognition subtask. Accordingly, step 122 includes:
inputting the text semantic codes to a problem coding layer in a semantic recognition layer to obtain problem codes output by the problem coding layer;
inputting the problem codes into a problem recognition layer in the semantic recognition layer to obtain a problem type recognition result output by the problem recognition layer;
inputting the text semantic codes into an intention coding layer in a semantic recognition layer to obtain intention codes output by the intention coding layer;
and inputting the intention code into an intention recognition layer in the semantic recognition layer to obtain an intention recognition result output by the intention recognition layer.
Specifically, both problem type recognition and intention recognition operate on the semantic information of the text, i.e. the text semantic code obtained through the shared text coding layer of the model. The text coding layer therefore has to learn two different coding schemes, one for problem type recognition and one for intention recognition, at the same time. The multi-label classification of intention recognition is in fact harder than the problem type recognition task, and naively sharing the text semantic code can cause the learning of the two tasks to conflict and interfere with each task's effect.
To resolve this semantic representation conflict, the embodiment of the invention sets a separate coding layer for each of problem type recognition and intention recognition inside the semantic recognition layer, namely a problem coding layer and an intention coding layer, which perform further coding for their respective tasks. Through the problem coding layer and the intention coding layer, each of the two tasks is connected to its own small coding network with unshared parameters, so the semantic code can be adjusted more specifically in each independent coding layer. This overcomes the loss of effect caused by the differing difficulty of similar tasks in a multi-task model, and balances the training progress and effect of each task within a single model.
Further, the problem coding layer and the intention coding layer may have the same or different network structures, for example a dense fully-connected layer followed by a tanh activation layer.
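A minimal sketch of the two unshared coding heads, each a dense layer with tanh activation as suggested above; the weights are random placeholders, not trained values:

```python
import math
import random

def make_dense_tanh(in_dim, out_dim, seed):
    """One fully-connected layer with tanh activation. Each head gets its
    own (unshared) weights, here randomly initialized for illustration."""
    rng = random.Random(seed)
    W = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]
    b = [0.0] * out_dim
    def forward(x):
        return [math.tanh(sum(w * xi for w, xi in zip(row, x)) + bi)
                for row, bi in zip(W, b)]
    return forward

text_semantic_code = [0.2, -0.5, 0.7, 0.1]

# Two separate coding networks on top of the same shared text semantic code.
question_coding_layer = make_dense_tanh(4, 4, seed=1)
intent_coding_layer = make_dense_tanh(4, 4, seed=2)

question_code = question_coding_layer(text_semantic_code)
intent_code = intent_coding_layer(text_semantic_code)
# Because the parameters are unshared, each head can adapt its code
# to its own task without interfering with the other.
assert question_code != intent_code
```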
Based on any of the above embodiments, fig. 7 is a schematic flow chart of the semantic understanding model training method provided by the present invention, and as shown in fig. 7, the semantic understanding model is obtained by training based on the following steps:
step 710, determining an initial model fusing a plurality of semantic understanding subtasks;
and 720, taking the initial model as a student model of each semantic understanding subtask, and performing multi-task combined distillation training on the initial model based on the sample text, the sample semantic understanding result thereof and the teacher model corresponding to each semantic understanding subtask to obtain the semantic understanding model.
Specifically, when performing multi-task joint distillation training with a teacher model for each of the plurality of semantic understanding subtasks, two schemes are feasible. In the first, teacher and student models are distilled one to one: distillation training is performed separately for each semantic understanding subtask to obtain a student model per subtask, and the student models are then fused and compressed into the semantic understanding model. In the second, multi-task joint distillation training is performed directly on the teacher models of the plurality of semantic understanding subtasks, compressing them into a single semantic understanding model.
Considering that the first scheme may incur knowledge loss during single-task distillation, and that this loss would propagate as cascading loss into the fusion and compression stage of the student models, the embodiment of the invention prefers the second scheme, i.e. directly compressing and fusing the teacher models of the plurality of semantic understanding subtasks into one semantic understanding model.
For example, as shown in fig. 8, during fusion a 12-layer BERT complex model may be set up for each semantic understanding subtask as its teacher model, a multi-task small model containing a 3-layer BERT may be used as the initial model, and iterative multi-task distillation training is performed to guide the initial model until its effect reaches or even exceeds that of each subtask's teacher model.
From this, a semantic understanding model as shown in fig. 9 can be obtained. As shown in fig. 9, in the semantic understanding model, three semantic understanding subtasks share a text coding layer constructed by a 3-layer BERT network, and corresponding subtask execution modules are respectively arranged on the basis of the text coding layer, so as to realize semantic understanding under multi-task fusion.
Based on any of the above embodiments, fig. 10 is a training schematic of the semantic understanding model provided by the present invention. As shown in fig. 10, the student model and the three task branches inside the dashed frame constitute the semantic understanding model. The student model is a 3-layer BERT network, i.e. BERT3; Task1 denotes the entity recognition subtask NER, Task2 denotes the problem type recognition subtask (Multi class denotes multiple problem types), and Task3 denotes the intention recognition subtask (Multi label denotes multiple labels). The student model is initialized from the 3-layer RBT model used for entity recognition NER, i.e. RBT3.
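Initializing a 3-layer student from a deeper pretrained model can be sketched as a state-dict filter. The `encoder.layer.<k>.` key naming convention below is an assumption for illustration, not taken from the patent:

```python
def init_student_from_pretrained(pretrained_state, num_student_layers=3):
    """Keep the embedding parameters and the first `num_student_layers`
    transformer layers; drop the deeper layers."""
    student_state = {}
    for name, value in pretrained_state.items():
        if name.startswith("encoder.layer."):
            layer_idx = int(name.split(".")[2])
            if layer_idx < num_student_layers:
                student_state[name] = value
        else:
            student_state[name] = value  # embeddings and other shared parameters
    return student_state

# Toy 12-layer state dict: one parameter (a plain number here) per layer.
pretrained = {"embeddings.word": 0.0}
pretrained.update({f"encoder.layer.{k}.weight": float(k) for k in range(12)})

student = init_student_from_pretrained(pretrained, num_student_layers=3)
# → keeps the embeddings plus layers 0, 1 and 2 only
```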
Specifically, when training a semantic understanding model:
the subtask loss function for the entity identification subtask 1 is implemented in combination with tag loss and distillation loss of the teacher model. The teacher model is an entity identification model comprising a 12-layer BERT network, teacher classification probability distribution generated by the teacher model is recorded as Soft label1, distillation loss obtained based on Soft label1 is KDloss1, a label of entity identification is recorded as Hard label1, and label loss obtained based on Hard label1 is CEloss 1.
The subtask loss function of the problem type recognition subtask 2 likewise combines the tag loss and the distillation loss from the teacher model. The teacher model is a problem type recognition model comprising a 12-layer BERT network; its teacher classification probability distribution is recorded as Soft label2, the distillation loss obtained from Soft label2 is KDloss2, the problem type label is recorded as Hard label2, and the tag loss obtained from Hard label2 is CEloss2.
The subtask loss function of the intention recognition subtask 3 is implemented based on the tag loss alone. The intention label is recorded as Hard label3, and the tag loss obtained from Hard label3 is the Circle loss.
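The five losses above can be combined into one training objective for the student model. The equal weighting is an assumption, since the patent does not specify how the losses are weighted:

```python
def total_distillation_loss(kd_loss1, ce_loss1, kd_loss2, ce_loss2, circle_loss,
                            weights=(1.0, 1.0, 1.0, 1.0, 1.0)):
    """Joint objective for the three-branch student model:
    entity recognition (KDloss1 + CEloss1), problem type recognition
    (KDloss2 + CEloss2) and intention recognition (Circle loss)."""
    losses = (kd_loss1, ce_loss1, kd_loss2, ce_loss2, circle_loss)
    return sum(w * l for w, l in zip(weights, losses))

# Hypothetical per-branch loss values from one training step.
total = total_distillation_loss(0.8, 0.3, 0.5, 0.2, 0.4)
```

One backward pass over this combined scalar updates the shared 3-layer encoder from all three branches at once, which is what makes the distillation "joint".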
Based on any of the above embodiments, fig. 11 is a schematic structural diagram of a semantic understanding apparatus provided by the present invention, and as shown in fig. 11, the apparatus includes:
a text acquisition unit 1110 for determining a text to be understood;
a semantic understanding unit 1120, configured to input the text into a semantic understanding model fusing a plurality of semantic understanding subtasks, and obtain a semantic understanding result output by the semantic understanding model;
the semantic understanding model is obtained by carrying out multi-task combined distillation training on the basis of a sample text, a sample semantic understanding result and teacher models corresponding to semantic understanding subtasks respectively, wherein the semantic understanding subtasks comprise entity recognition, problem type recognition and intention recognition.
The device provided by the embodiment of the invention obtains a semantic understanding model fusing a plurality of semantic understanding subtasks through multi-task joint distillation training and performs semantic understanding based on that model. While compressing the model scale and reducing the amount of computation, it improves the effect of multi-task semantic understanding realized with a single model, thereby providing an accurate, reliable and widely applicable semantic understanding scheme.
Based on any of the above embodiments, the loss function of the semantic understanding model is determined based on the subtask loss function of each semantic understanding subtask;
the subtask loss function of the entity identification and the question type identification includes a distillation loss, or includes a tag loss and the distillation loss, wherein the distillation loss is determined based on a teacher model of the corresponding semantic understanding subtask.
Based on any embodiment above, further comprising a distillation loss determination unit for:
determining teacher classification probability distribution of the sample text in any semantic understanding subtask based on a teacher model of the semantic understanding subtask;
determining the student classification probability distribution of the sample text in any semantic understanding subtask based on a semantic understanding model in a training stage;
and determining distillation loss of any semantic understanding subtask based on teacher classification probability distribution and student classification probability distribution of the sample text in any semantic understanding subtask.
Based on any of the above embodiments, the subtask loss function for intention recognition is determined based on a tag loss that performs binary classification on each candidate intention.
Based on any of the above embodiments, the semantic understanding unit 1120 includes:
the text coding subunit is used for inputting the text into a text coding layer of the semantic understanding model to obtain text semantic codes of the text output by the text coding layer and word semantic codes of each word in the text;
the semantic recognition subunit is used for inputting the text semantic code into a semantic recognition layer of the semantic understanding model to obtain a problem type recognition result and an intention recognition result in the semantic understanding result output by the semantic recognition layer;
and the entity identification subunit is used for inputting the word semantic code of each word into the entity identification layer of the semantic understanding model to obtain the entity identification result in the semantic understanding result output by the entity identification layer.
Based on any of the above embodiments, the semantic identification subunit is configured to:
inputting the text semantic code to a problem coding layer in the semantic recognition layer to obtain a problem code output by the problem coding layer;
inputting the question code into a question recognition layer in the semantic recognition layer to obtain the question type recognition result output by the question recognition layer;
inputting the text semantic code to an intention coding layer in the semantic recognition layer to obtain an intention code output by the intention coding layer;
inputting the intention code into an intention recognition layer in the semantic recognition layer to obtain the intention recognition result output by the intention recognition layer.
Based on any of the above embodiments, the method further comprises a model training unit, configured to:
determining an initial model fusing a plurality of semantic understanding subtasks;
and taking the initial model as a student model of each semantic understanding subtask, and performing multi-task combined distillation training on the initial model based on the sample text, the sample semantic understanding result thereof and the teacher model corresponding to each semantic understanding subtask to obtain the semantic understanding model.
Fig. 12 illustrates the physical structure of an electronic device. As shown in fig. 12, the electronic device may include: a processor 1210, a communications interface 1220, a memory 1230 and a communication bus 1240, wherein the processor 1210, the communications interface 1220 and the memory 1230 communicate with each other via the communication bus 1240. The processor 1210 may call logic instructions in the memory 1230 to perform a semantic understanding method comprising: determining a text to be understood; inputting the text into a semantic understanding model fusing a plurality of semantic understanding subtasks to obtain a semantic understanding result output by the semantic understanding model; the semantic understanding model is obtained through multi-task joint distillation training based on a sample text, a sample semantic understanding result and a teacher model corresponding to each semantic understanding subtask, wherein the semantic understanding subtasks comprise at least two of entity recognition, problem type recognition and intention recognition.
In addition, the logic instructions in the memory 1230 may be implemented in software functional units and stored in a computer readable storage medium when the logic instructions are sold or used as a stand-alone product. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
In another aspect, the present invention also provides a computer program product comprising a computer program stored on a non-transitory computer-readable storage medium, the computer program comprising program instructions which, when executed by a computer, enable the computer to perform a semantic understanding method provided by the above methods, the method comprising: determining a text to be understood; inputting the text into a semantic understanding model fusing a plurality of semantic understanding subtasks to obtain a semantic understanding result output by the semantic understanding model; the semantic understanding model is obtained by carrying out multi-task combined distillation training based on a sample text, a sample semantic understanding result and a teacher model corresponding to each semantic understanding subtask, wherein the semantic understanding subtasks comprise at least two of entity recognition, problem type recognition and intention recognition.
In yet another aspect, the present invention also provides a non-transitory computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the semantic understanding method provided above, the method comprising: determining a text to be understood; inputting the text into a semantic understanding model fusing a plurality of semantic understanding subtasks to obtain a semantic understanding result output by the semantic understanding model; the semantic understanding model is obtained through multi-task joint distillation training based on a sample text, a sample semantic understanding result and a teacher model corresponding to each semantic understanding subtask, wherein the semantic understanding subtasks comprise at least two of entity recognition, problem type recognition and intention recognition.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (10)

1. A method of semantic understanding, comprising:
determining a text to be understood;
inputting the text into a semantic understanding model fusing a plurality of semantic understanding subtasks to obtain a semantic understanding result output by the semantic understanding model;
the semantic understanding model is obtained by carrying out multi-task combined distillation training on the basis of a sample text, a sample semantic understanding result and teacher models corresponding to semantic understanding subtasks respectively, wherein the semantic understanding subtasks comprise entity recognition, problem type recognition and intention recognition.
2. The semantic understanding method according to claim 1, wherein the loss function of the semantic understanding model is determined based on a subtask loss function of each semantic understanding subtask;
the subtask loss function of the entity identification and the question type identification includes a distillation loss, or includes a tag loss and the distillation loss, wherein the distillation loss is determined based on a teacher model of the corresponding semantic understanding subtask.
3. The semantic understanding method according to claim 2, wherein the distillation loss is determined based on the following steps:
determining teacher classification probability distribution of the sample text in any semantic understanding subtask based on a teacher model of the semantic understanding subtask;
determining the student classification probability distribution of the sample text in any semantic understanding subtask based on a semantic understanding model in a training stage;
and determining distillation loss of any semantic understanding subtask based on teacher classification probability distribution and student classification probability distribution of the sample text in any semantic understanding subtask.
4. The semantic understanding method according to claim 2, wherein the subtask loss function for intention recognition is determined based on a tag loss that performs binary classification on each candidate intention.
5. The semantic understanding method according to any one of claims 1 to 4, wherein the inputting the text into a semantic understanding model fusing a plurality of semantic understanding subtasks to obtain a semantic understanding result output by the semantic understanding model comprises:
inputting the text into a text coding layer of the semantic understanding model to obtain text semantic codes of the text output by the text coding layer and word semantic codes of each word in the text;
inputting the text semantic code to a semantic recognition layer of the semantic understanding model to obtain a problem type recognition result and an intention recognition result in the semantic understanding result output by the semantic recognition layer;
and inputting the word semantic code of each word into an entity recognition layer of the semantic understanding model to obtain an entity recognition result in the semantic understanding result output by the entity recognition layer.
6. The semantic understanding method according to claim 5, wherein the inputting of the text semantic code into a semantic recognition layer of the semantic understanding model to obtain a question type recognition result and an intention recognition result in the semantic understanding result output by the semantic recognition layer comprises:
inputting the text semantic code to a problem coding layer in the semantic recognition layer to obtain a problem code output by the problem coding layer;
inputting the question code into a question recognition layer in the semantic recognition layer to obtain the question type recognition result output by the question recognition layer;
inputting the text semantic code to an intention coding layer in the semantic recognition layer to obtain an intention code output by the intention coding layer;
inputting the intention code into an intention recognition layer in the semantic recognition layer to obtain the intention recognition result output by the intention recognition layer.
7. The semantic understanding method according to any one of claims 1 to 4, wherein the semantic understanding model is trained based on the following steps:
determining an initial model fusing a plurality of semantic understanding subtasks;
and taking the initial model as a student model of each semantic understanding subtask, and performing multi-task combined distillation training on the initial model based on the sample text, the sample semantic understanding result thereof and the teacher model corresponding to each semantic understanding subtask to obtain the semantic understanding model.
8. A semantic understanding apparatus, comprising:
the text acquisition unit is used for determining a text to be understood;
the semantic understanding unit is used for inputting the text into a semantic understanding model fusing a plurality of semantic understanding subtasks to obtain a semantic understanding result output by the semantic understanding model;
the semantic understanding model is obtained by carrying out multi-task combined distillation training on the basis of a sample text, a sample semantic understanding result and teacher models corresponding to semantic understanding subtasks respectively, wherein the semantic understanding subtasks comprise entity recognition, problem type recognition and intention recognition.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the steps of the semantic understanding method according to any one of claims 1 to 7 when executing the program.
10. A non-transitory computer readable storage medium, having stored thereon a computer program, wherein the computer program, when executed by a processor, implements the steps of the semantic understanding method according to any one of claims 1 to 7.
CN202110481797.1A 2021-04-30 2021-04-30 Semantic understanding method and device, electronic equipment and storage medium Pending CN113177415A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110481797.1A CN113177415A (en) 2021-04-30 2021-04-30 Semantic understanding method and device, electronic equipment and storage medium


Publications (1)

Publication Number Publication Date
CN113177415A true CN113177415A (en) 2021-07-27

Family

ID=76925920

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110481797.1A Pending CN113177415A (en) 2021-04-30 2021-04-30 Semantic understanding method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113177415A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113722446A (en) * 2021-11-01 2021-11-30 南方电网数字电网研究院有限公司 Power system operation data generation method and device and computer equipment

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2017166137A1 (en) * 2016-03-30 2017-10-05 Institute of Automation, Chinese Academy of Sciences Method for multi-task deep learning-based aesthetic quality assessment on natural image
CN107704866A (en) * 2017-06-15 2018-02-16 Tsinghua University Multi-task scene semantic understanding model based on a novel neural network and its application
CN111104498A (en) * 2019-12-12 2020-05-05 South China University of Technology Semantic understanding method in task-oriented dialogue system
CN111274823A (en) * 2020-01-06 2020-06-12 iFlytek (Suzhou) Technology Co., Ltd. Text semantic understanding method and related device
CN111767711A (en) * 2020-09-02 2020-10-13 Zhejiang Lab Compression method and platform for pre-trained language model based on knowledge distillation
CN112101038A (en) * 2019-06-18 2020-12-18 Hangzhou Zhongruan Anren Network Communication Co., Ltd. Semantic understanding method
CN112232086A (en) * 2020-10-16 2021-01-15 Tencent Technology (Shenzhen) Co., Ltd. Semantic recognition method and device, computer equipment and storage medium
CN112257471A (en) * 2020-11-12 2021-01-22 Tencent Technology (Shenzhen) Co., Ltd. Model training method and device, computer equipment and storage medium
CN112395876A (en) * 2021-01-21 2021-02-23 East China Jiaotong University Discourse relation recognition method and device based on knowledge distillation and multi-task learning
CN112528034A (en) * 2020-11-16 2021-03-19 Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences Entity relationship extraction method based on knowledge distillation
CN112530437A (en) * 2020-11-18 2021-03-19 Beijing Baidu Netcom Science and Technology Co., Ltd. Semantic recognition method, device, equipment and storage medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
XIAODONG LIU ET AL: "Improving Multi-Task Deep Neural Networks via Knowledge Distillation for Natural Language Understanding", arXiv, pages 1 - 8 *
ZHANG CHAORAN ET AL: "A Survey of Machine Reading Comprehension Based on Pre-trained Models", Computer Engineering and Applications, vol. 56, no. 11, pages 17 - 25 *

Similar Documents

Publication Publication Date Title
CN109992782B (en) Legal document named entity identification method and device and computer equipment
CN110990543A (en) Intelligent conversation generation method and device, computer equipment and computer storage medium
CN111738016B (en) Multi-intention recognition method and related equipment
CN111985229B (en) Sequence labeling method and device and computer equipment
CN110968660B (en) Information extraction method and system based on joint training model
CN110196982B (en) Method and device for extracting upper-lower relation and computer equipment
CN108710704B (en) Method and device for determining conversation state, electronic equipment and storage medium
CN109918681B (en) Chinese character-pinyin-based fusion problem semantic matching method
CN111709242B (en) Chinese punctuation mark adding method based on named entity recognition
CN110309511B (en) Shared representation-based multitask language analysis system and method
CN112883193A (en) Training method, device and equipment of text classification model and readable medium
CN114757176A (en) Method for obtaining target intention recognition model and intention recognition method
CN115545041B (en) Model construction method and system for enhancing semantic vector representation of medical statement
CN114492460B (en) Event causal relationship extraction method based on derivative prompt learning
CN116541492A (en) Data processing method and related equipment
CN113177415A (en) Semantic understanding method and device, electronic equipment and storage medium
CN115510193B (en) Query result vectorization method, query result determination method and related devices
CN115906854A (en) Multi-level confrontation-based cross-language named entity recognition model training method
CN115130475A (en) Extensible universal end-to-end named entity identification method
CN116227498A (en) Cross-language natural language understanding method based on global-local contrast learning
CN112883183B (en) Method for constructing multi-classification model, intelligent customer service method, and related device and system
CN115292492A (en) Method, device and equipment for training intention classification model and storage medium
CN115221284A (en) Text similarity calculation method and device, electronic equipment and storage medium
CN111090720B (en) Hot word adding method and device
CN114357964A (en) Subjective question scoring method, model training method, computer device, and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination