WO2024067377A1 - Sample generation method and apparatus, and electronic device and storage medium - Google Patents



Publication number
WO2024067377A1
Authority
WO
WIPO (PCT)
Prior art keywords: text, intent, data, frequency, low
Application number
PCT/CN2023/120564
Other languages
French (fr)
Chinese (zh)
Inventor
丁隆耀
蒋宁
吴海英
李宽
吕乐宾
Original Assignee
马上消费金融股份有限公司
Application filed by 马上消费金融股份有限公司
Publication of WO2024067377A1


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30: Information retrieval of unstructured textual data
    • G06F16/33: Querying
    • G06F16/332: Query formulation
    • G06F40/00: Handling natural language data
    • G06F40/20: Natural language analysis
    • G06F40/205: Parsing
    • G06F40/216: Parsing using statistical methods
    • G06F40/30: Semantic analysis

Definitions

  • the present application relates to the field of artificial intelligence, and in particular to a sample generation method, device, electronic device and storage medium.
  • Robot agents can automatically answer questions raised by customers, saving a lot of human resources and improving communication efficiency.
  • the questions raised by customers are many and varied. Some questions contain user intentions that appear frequently, which can be called high-frequency intentions; other questions contain user intentions that appear less frequently, which can be called low-frequency intentions.
  • For high-frequency intentions, since the related questions appear frequently, the high-frequency intention training data used for model training is easy to obtain, and the intention recognition results of the robot agent obtained through model training are relatively accurate.
  • the present application provides a sample generation method, device, electronic device and storage medium to expand the number of low-frequency intent samples to meet model training requirements, thereby improving the recognition accuracy of low-frequency intent.
  • the present application provides a sample generation method, including: obtaining log data to be processed; the log data includes text and intent recognition results of the text; according to the intent recognition results of the text, performing data screening processing on the log data to obtain low-frequency intent data; inputting the low-frequency intent data and standard text of a preset intent category into a text comparison model for similarity prediction processing to obtain text comparison results corresponding to the low-frequency intent data; the text comparison model is a model obtained by training an initial text comparison model based on a training sample set; the training sample set is constructed based on the low-frequency intent data; and generating a low-frequency intent sample based on the text comparison result and a preset similarity threshold.
  • the present application provides a training method for an intent recognition model, comprising: generating low-frequency intent samples by the sample generation method as described in the first aspect; inputting the low-frequency intent samples into an initial intent recognition model for iterative training to obtain an intent recognition model.
  • the present application provides an intention recognition method applied to a digital human, comprising: obtaining a text to be recognized input by a user; inputting the text to be recognized into an intention recognition model for intent recognition to obtain the user intention; the intention recognition model is obtained by inputting low-frequency intention samples into an initial intention recognition model for iterative training; the low-frequency intention samples are generated by the sample generation method as described above; according to the user intention, obtaining a target text corresponding to the user intention in the digital human system, and displaying the target text.
  • the present application provides a sample generation device, including: a first acquisition unit, used to acquire log data to be processed; the log data includes text and intent recognition results of the text; a screening unit, used to perform data screening processing on the log data according to the intent recognition results of the text, to obtain low-frequency intent data; a prediction unit, used to input the low-frequency intent data and standard text of a preset intent category into a text comparison model for similarity prediction processing, to obtain text comparison results corresponding to the low-frequency intent data; the text comparison model is a model obtained by training an initial text comparison model based on a training sample set; the training sample set is constructed based on the low-frequency intent data; a first generation unit, used to generate a low-frequency intent sample according to the text comparison result and a preset similarity threshold.
  • the present application provides a training device for an intent recognition model, comprising: a second generation unit, used to generate low-frequency intent samples through the sample generation method as described in the first aspect; a training unit, used to input the low-frequency intent samples into an initial intent recognition model for iterative training to obtain an intent recognition model.
  • the present application provides an intention recognition device applied to a digital human, comprising: a second acquisition unit, used to acquire a text to be recognized input by a user; a recognition unit, used to input the text to be recognized into an intention recognition model for intention recognition to obtain the user intention; the intention recognition model is obtained by inputting low-frequency intention samples into an initial intention recognition model for iterative training; the low-frequency intention samples are generated by the above-mentioned sample generation method; a display unit, used to obtain a target text corresponding to the user intention in the system of the digital human according to the user intention, and display the target text.
  • the present application provides an electronic device, comprising: a processor; and a memory configured to store computer-executable instructions, which, when executed, cause the processor to execute a sample generation method as described above, or a training method for an intent recognition model as described above, or a method for intent recognition applied to a digital human as described above.
  • the present application provides a computer-readable storage medium for storing computer-executable instructions, which, when executed by a processor, implement the sample generation method as described in the first aspect, or the training method of the intent recognition model as described above, or the intent recognition method applied to a digital human as described above.
  • FIG1 is a processing flow chart of a sample generation method provided in an embodiment of the present application.
  • FIG2 is a processing flow chart of another sample generation method provided in an embodiment of the present application.
  • FIG3 is a schematic diagram of a training method of a text comparison model provided in an embodiment of the present application.
  • FIG4 is a business flow chart of a sample generation method provided in an embodiment of the present application.
  • FIG5 is a processing flow chart of a training method for an intent recognition model provided in an embodiment of the present application.
  • FIG6 is a processing flow chart of a method for identifying intentions of a digital human provided in an embodiment of the present application.
  • FIG7 is a schematic diagram of a sample generation device provided in an embodiment of the present application.
  • FIG8 is a schematic diagram of a training device for an intent recognition model provided in an embodiment of the present application.
  • FIG9 is a schematic diagram of an intention recognition device for a digital human provided in an embodiment of the present application.
  • FIG. 10 is a schematic diagram of the structure of an electronic device provided in an embodiment of the present application.
  • the agent can be a customer service representative or any other position or person who can respond to text or voice.
  • an embodiment of the present application provides a sample generation method.
  • the sample generation method proposed in this application can be executed by an electronic device, specifically by a processor in the electronic device.
  • the electronic device mentioned here can be a terminal device, such as a smart phone, a tablet computer, a desktop computer, an intelligent voice interaction device, a wearable device, a robot, and a vehicle terminal, etc.; or, the electronic device can also be a server, such as an independent physical server, a server cluster composed of multiple servers, or a cloud server capable of cloud computing.
  • Referring to FIG. 1, a processing flow chart of a sample generation method provided in an embodiment of the present application is shown.
  • the sample generation method provided in an embodiment of the present application may specifically include the following steps:
  • Step S102 obtaining log data to be processed; the log data includes text and intent recognition results of the text.
  • Log data may be historical data related to the target business recorded during the operation of the target business.
  • the text may be a natural language text for which intent recognition is required.
  • the text may be text input by a user, text obtained by voice conversion, or text obtained by other means. This specification does not impose any special restrictions on the method of obtaining the text.
  • the text may be a question text asked by the customer to the robot, for example: How do I check the bill?
  • the intent recognition result of the text may be the intent recognition result obtained by the robot after performing intent recognition on the above question text. For example, the intent recognition result of "How do I check the bill?" is "Inquire about the bill query method.”
  • Acquiring the log data to be processed may be acquiring the conversation data in the log data to be processed.
  • the conversation data may include the text of the question raised by the customer and the text of the robot's answer to the customer.
  • the robot may be pre-configured with a correspondence between intent recognition results and answer texts. Based on this correspondence and the robot's answer text to the customer, the intent recognition result of the question text can be queried. Then, the question text can be determined as the text in the log data, and the intent recognition result of the question text can be determined as the intent recognition result of the text in the log data.
  • obtaining the log data to be processed includes: obtaining conversation data in the log data to be processed; the conversation data includes the question text raised by the customer and the robot's response text to the customer; querying the intent recognition result of the question text according to the pre-configured correspondence between intent recognition results and response texts; determining the question text as the text in the log data, and determining the intent recognition result of the question text as the intent recognition result of the text in the log data.
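The acquisition logic described above can be sketched as follows; the field names and the answer-to-intent mapping below are illustrative assumptions, not part of the disclosed system:

```python
# Sketch: recover the intent of each question from the robot's answer text,
# assuming a pre-configured answer -> intent correspondence (illustrative).
ANSWER_TO_INTENT = {
    "You can check your bill in the app under 'Bills'.": "bill_query_method",
    "Early repayment can be made on the repayment page.": "early_repayment",
}

def extract_log_records(conversations):
    """Turn single-round conversation turns into (text, intent) log records."""
    records = []
    for turn in conversations:
        intent = ANSWER_TO_INTENT.get(turn["answer"])
        if intent is not None:  # keep only turns whose answer maps to a known intent
            records.append({"text": turn["question"], "intent": intent})
    return records
```

Because each answer is bound to exactly one recognized intent, the question text inherits that intent as its recognition result.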
  • the log data in the agent system is the log text of the interaction between the customer and the agent robot generated in the actual production scenario.
  • the amount of log data generated by the agent system is usually large and comes from different sources.
  • the acquired log data may be restricted to the customer's chat data, excluding recommended questions, frequently asked questions (FAQ), and multi-round engine data; the remaining data is single-round conversation data, for example: the customer asks "How to repay in advance" and the robot answers "XXX". Because the robot's answer is bound to the identified intent, the final log data format is as shown in Table 1, which presents part of the log data.
  • the log data may include multiple records. After obtaining the log data, duplicate data in the log data may be removed to reduce redundancy and improve data processing efficiency. Duplicate data may be multiple records of log data with completely identical customer text.
  • the log data includes:
  • record 1 and record 2 can be determined to be duplicate data, and one of record 1 and record 2 can be deleted.
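The deduplication step described above can be sketched as follows (the record shape is an illustrative assumption):

```python
def deduplicate(records):
    """Remove log records whose customer text is completely identical,
    keeping only the first occurrence (sketch)."""
    seen, unique = set(), []
    for rec in records:
        if rec["text"] not in seen:
            seen.add(rec["text"])
            unique.append(rec)
    return unique
```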
  • Step S104 based on the intent recognition result of the text, perform data screening processing on the log data to obtain low-frequency intent data.
  • It is determined whether the intent recognition result of the text is a preset high-frequency intent. If so, the text and the intent recognition result of the text are deleted from the log data; if not, the text and the intent recognition result of the text are retained as low-frequency intent data.
  • the log data is screened and processed to obtain low-frequency intention data, including: according to the intention recognition result of the text, determining whether the intention recognition result of the text is a preset high-frequency intention; if the intention recognition result of the text is a preset high-frequency intention, deleting the text and the intention recognition result of the text from the log data; if the intention recognition result of the text is not the preset high-frequency intention, treating the text and the intention recognition result of the text as low-frequency intention data.
  • the log data is screened to obtain low-frequency intent data, including: inputting the log data into a high-frequency intent classification model to obtain first log data and the confidence of the intent classification result of the first log data; the intent classification result of the first log data is a preset high-frequency intent; the high-frequency intent classification model is used to perform intent classification on the log data based on the intent recognition result of the text in the log data; based on the first log data and the confidence of the intent classification result of the first log data, the log data is screened to obtain low-frequency intent data.
  • the high-frequency intent classification model may include a pre-trained language model, a multi-layer perceptron, and a normalized exponential function, i.e., a Softmax function, which are connected in sequence.
  • the output of the pre-trained language model is the input of the multi-layer perceptron; and the output of the multi-layer perceptron is the input of the normalized exponential function.
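A minimal sketch of the classification head described above (pre-trained language model, then multi-layer perceptron, then softmax); the pre-trained encoder is stubbed out by a fixed-size embedding, and all dimensions and weights are chosen for illustration only:

```python
import math
import random

random.seed(0)

EMB_DIM, HIDDEN, NUM_HIGH_FREQ_INTENTS = 8, 16, 3

# Randomly initialised weights stand in for a trained model (illustrative).
W1 = [[random.gauss(0, 0.1) for _ in range(EMB_DIM)] for _ in range(HIDDEN)]
W2 = [[random.gauss(0, 0.1) for _ in range(HIDDEN)] for _ in range(NUM_HIGH_FREQ_INTENTS)]

def mlp_softmax(embedding):
    """Multi-layer perceptron followed by a normalized exponential (softmax),
    producing one confidence per preset high-frequency intent (sketch)."""
    # Hidden layer with ReLU activation.
    hidden = [max(0.0, sum(w * x for w, x in zip(row, embedding))) for row in W1]
    logits = [sum(w * h for w, h in zip(row, hidden)) for row in W2]
    # Numerically stable softmax.
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]
```

The softmax output can be read directly as the confidence of the intent classification result used in the screening step.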
  • Pre-trained language models include but are not limited to: BERT (Bidirectional Encoder Representations from Transformers) model, or RoBERTa (a Robustly Optimized BERT Pretraining Approach), etc.
  • the BERT model is a language representation model built on the bidirectional encoder of the Transformer architecture.
  • the training process of the BERT model can be divided into a pre-training part and a model fine-tuning part.
  • the model fine-tuning part uses the pre-trained BERT model for model fine-tuning training, which is widely used in text classification, text matching and other tasks.
  • Pre-training and model fine-tuning can be illustrated by the following example: assuming that there is a training set A, the network is first pre-trained with training set A, the network parameters are learned on task A, and then saved for later use. When a new task B comes, the same network structure is adopted. When the network parameters are initialized, the parameters learned in A can be loaded, and other high-level parameters are randomly initialized. Then, the training data of task B is used to train the network. When the loaded parameters are constantly changed as the training of task B progresses, it is called "fine-tuning", that is, the parameters are adjusted to make them more suitable for the current task B.
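The pre-training/fine-tuning example above can be sketched with plain parameter lists standing in for network weights (an illustrative toy, not BERT itself):

```python
import random

random.seed(1)

def init_network(num_body, num_head):
    """Toy 'network': flat parameter lists for the shared body and task head."""
    return {"body": [random.gauss(0, 0.1) for _ in range(num_body)],
            "head": [random.gauss(0, 0.1) for _ in range(num_head)]}

# Pre-training on task A: learn the body parameters and save them for later use.
task_a_net = init_network(num_body=4, num_head=2)
saved_body = list(task_a_net["body"])

# Fine-tuning on task B: same network structure, body initialised from A's
# saved parameters, head randomly re-initialised; training on task B's data
# would then keep adjusting ("fine-tuning") the loaded parameters.
task_b_net = init_network(num_body=4, num_head=2)
task_b_net["body"] = list(saved_body)
```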
  • the RoBERTa model is similar to the BERT model, with several adjustments based on BERT: 1) longer training time, larger batch size, and more training data; 2) removal of the next-sentence prediction loss; 3) longer training sequences; 4) dynamic adjustment of the masking mechanism. It is widely used in NLP (Natural Language Processing) tasks because it performs better than the BERT model in many scenarios.
  • In this way, model fine-tuning of the pre-trained language model can be achieved.
  • Since the amount of log data is very large and easy to obtain, the training data of the high-frequency intent classification model is easy to obtain; thus the training effect of the high-frequency intent classification model is better, and the accuracy of its intent recognition results for high-frequency intents is higher.
  • the log data can be classified according to the intent recognition results of the text in the log data.
  • the intent classification process is performed on the log data to obtain the first log data whose intent classification result is a preset high-frequency intent and the confidence level of the intent classification result of the first log data.
  • the first log data may be a record in the log data, and the first log data may include a text and an intention recognition result of the text. Specifically, the first log data may include a question text raised by a customer and an intention recognition result of the question text.
  • the confidence of the intent classification result can be used to characterize the accuracy of the intent classification result. The higher the confidence, the higher the accuracy of the intent classification result.
  • the preset high-frequency intent may include multiple preset high-frequency intents, for example, "customer consultation preset question 1", “customer consultation preset question 2", “customer complaint”, etc.
  • the intent classification result as the preset high-frequency intent may be that the intent classification result is one of the multiple preset high-frequency intents.
  • the log data is screened to obtain low-frequency intent data, including: determining high-frequency intent data based on a comparison result of the confidence of the intent classification result of the first log data and a preset confidence threshold; and deleting the high-frequency intent data in the log data to obtain low-frequency intent data.
  • the confidence of the intent classification result of the first log data is greater than a preset confidence threshold, it means that the accuracy of the intent classification result of the first log data is high, and the first log data can be determined as high-frequency intent data.
  • the confidence of the intent classification result of the first log data is less than or equal to the preset confidence threshold, it means that the accuracy of the intent classification result of the first log data is low, and it can be considered that the first log data does not belong to high-frequency intent data.
  • By setting the confidence threshold, high-frequency intent data can be screened out from the log data more accurately.
  • the high-frequency intent data in the log data is deleted to obtain the low-frequency intent data.
  • the low-frequency intent data here is not the intent data with a low frequency of occurrence, but the intent data in the log data except the high-frequency intent data.
  • For example, the log data includes 5 records: record 1, record 2, record 3, record 4, and record 5, among which record 1, record 3, and record 4 are high-frequency intention data. Record 1, record 3, and record 4 are then deleted from the log data, leaving record 2 and record 5, which are determined as low-frequency intention data.
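The confidence-based screening can be sketched as follows; the threshold value and the data shapes are illustrative assumptions:

```python
CONFIDENCE_THRESHOLD = 0.9  # preset confidence threshold (illustrative value)

def screen_low_frequency(log_data, classified):
    """classified maps record id -> (is_high_freq_intent, confidence);
    a record counts as high-frequency data only when it is classified as a
    preset high-frequency intent AND its confidence exceeds the threshold.
    Everything that remains is kept as low-frequency intent data."""
    high_freq_ids = {rid for rid, (is_hf, conf) in classified.items()
                     if is_hf and conf > CONFIDENCE_THRESHOLD}
    return [rec for rec in log_data if rec["id"] not in high_freq_ids]
```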
  • Step S106 input the low-frequency intent data and the standard text of the preset intent category into the text comparison model for similarity prediction processing to obtain the text comparison result corresponding to the low-frequency intent data;
  • the text comparison model is a model obtained by training the initial text comparison model based on the training sample set;
  • the training sample set is constructed based on the low-frequency intent data.
  • low-frequency intent data may be input into an initial text comparison model, and the initial text comparison model may be trained to obtain a text comparison model.
  • the initial text comparison model may be a model to be trained in which all parameters to be trained take initial values.
  • the text comparison model can be a contrastive unsupervised learning model.
  • no labels from existing models are available for the low-frequency intent data in the log data.
  • the low-frequency intent data can be treated as unlabeled data.
  • Self-supervised learning is a type of unsupervised learning paradigm. It does not require manually labeled category label information, but directly uses the data itself as supervision information to learn the feature expression of sample data and use it for downstream tasks.
  • Contrastive Learning is a type of self-supervised learning that learns the feature representation of samples by comparing data with positive samples and negative samples in feature space.
  • the core of its training is to shorten the distance between similar samples and increase the distance between irrelevant samples.
  • contrastive learning aims to bring similar samples closer and push dissimilar samples apart, that is, to construct similar sample pairs (x_i, x_i^+) and dissimilar sample pairs (x_i, x_j^+), where j ≠ i.
  • Low-frequency intent data can be determined as unlabeled samples; the unlabeled samples are input into the initial text comparison model, and the initial text comparison model is iteratively trained to obtain a text comparison model.
  • the initial text comparison model includes an encoder and a similarity prediction module connected in sequence; the output of the encoder is the input of the similarity prediction module; the encoder is used to perform encoding processing according to the low-frequency intent data to obtain similar sample pairs and non-similar sample pairs corresponding to the low-frequency intent data; The similarity prediction module is used to perform iterative training based on similar sample pairs and non-similar sample pairs corresponding to low-frequency intent data.
  • the initial text comparison model includes an encoder and a similarity prediction module connected in sequence; the output of the encoder is the input of the similarity prediction module; the sample generation method also includes: the encoder performs encoding processing based on the low-frequency intent data to obtain similar sample pairs and dissimilar sample pairs corresponding to the low-frequency intent data; the similarity prediction module performs iterative training based on the similar sample pairs and dissimilar sample pairs corresponding to the low-frequency intent data.
  • the following strategy is adopted: using the random deactivation (dropout) mechanism of the encoder, based on the question text in the target record, two texts corresponding to the question text are generated, the semantics of the two texts are exactly the same, and the encoding forms are different. Then, the two texts can be determined as similar text pairs corresponding to the target record. In addition, based on the question text in each record except the target record in the low-frequency intent data, a text corresponding to the question text is generated. Then, based on a text corresponding to the question text in each record and one of the two texts corresponding to the question text in the aforementioned target record, multiple non-similar text pairs can be generated.
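The dropout-based construction of similar and non-similar pairs can be sketched with a toy deterministic embedding in place of the real encoder (everything below, including dimensions and dropout probability, is illustrative):

```python
import random

def toy_encode(text, p=0.1, dim=6, seed=None):
    """Toy stand-in for the encoder: a deterministic embedding with random
    dropout applied, so encoding the same text twice (with different dropout
    masks) yields two different encodings with identical semantics."""
    rng = random.Random(seed)
    s = sum(ord(c) for c in text)
    base = [((s * (k + 1)) % 97) / 97.0 for k in range(dim)]
    return [0.0 if rng.random() < p else v for v in base]

def build_pairs(target_text, other_texts):
    """One similar pair from two dropout encodings of the target text,
    plus one non-similar pair per remaining text (sketch)."""
    enc_a = toy_encode(target_text, seed=1)
    enc_b = toy_encode(target_text, seed=2)
    similar_pair = (enc_a, enc_b)
    dissimilar_pairs = [(enc_a, toy_encode(t, seed=3)) for t in other_texts]
    return similar_pair, dissimilar_pairs
```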
  • the similarity prediction module can be iteratively trained.
  • the loss function is as follows:

    l_i = −log [ exp(Sim(h_i, h_i^+)/τ) / Σ_{j=1}^{N} exp(Sim(h_i, h_j^+)/τ) ]

  • l_i is used to represent the loss function value for the i-th sample.
  • τ is used to represent the temperature hyperparameter of the softmax, which is only used to control the randomness of the prediction.
  • h_i, h_i^+ and h_j^+ are the encoding representations of x_i, x_i^+ and x_j^+ in the similar sample pair (x_i, x_i^+) and the non-similar sample pairs (x_i, x_j^+), respectively.
  • N can be a preset value, for example the number of sample pairs involved in one round of training.
  • the values of i and j can be determined based on the subscripts of the similar sample pairs and non-similar sample pairs.
  • Sim(h 1 , h 2 ) can be used to represent the similarity between two vectors h 1 and h 2.
  • the similarity can be calculated using cosine similarity.
  • the loss function value corresponding to the training can be calculated. If the loss function value is less than or equal to the preset threshold, the training is stopped to obtain a trained similarity prediction module, that is, a trained text comparison model.
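The loss function described above, with cosine similarity as Sim, can be sketched in pure Python; the temperature value and the vectors used below are illustrative:

```python
import math

def cosine(u, v):
    """Cosine similarity, used as Sim(h1, h2)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def contrastive_loss(h_i, h_i_pos, negatives, tau=0.05):
    """l_i = -log( e^{Sim(h_i, h_i+)/tau} / sum_j e^{Sim(h_i, h_j+)/tau} ),
    where the denominator runs over the positive and all negatives."""
    sims = [cosine(h_i, h_i_pos)] + [cosine(h_i, h_j) for h_j in negatives]
    exps = [math.exp(s / tau) for s in sims]
    return -math.log(exps[0] / sum(exps))
```

Minimizing this loss pulls the similar pair together and pushes the non-similar pairs apart: an orthogonal negative yields a near-zero loss, while a negative identical to the anchor yields a large loss.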
  • a trained text comparison model can be obtained. Its significance lies in that the model training can be based on unlabeled samples, so that the trained text comparison model has the ability to judge whether two texts are similar. Since log data is a kind of historical data that can be continuously expanded with time, when the time span is long enough, the amount of log data is large and easy to obtain. Therefore, the amount of low-frequency intent data used to train the text comparison model is large, and unsupervised learning can also achieve relatively good results.
  • the low-frequency intent data includes target text and non-target text
  • the encoder is specifically used to: perform encoding processing according to the target text to obtain a target encoding result and a similar encoding result corresponding to the target text, and perform encoding processing according to the non-target text to obtain an encoding result corresponding to the non-target text; determine the target encoding result and the similar encoding result corresponding to the target text as similar sample pairs corresponding to the low-frequency intent data; determine the target encoding result corresponding to the target text and the encoding result corresponding to the non-target text as non-similar sample pairs corresponding to the low-frequency intent data.
  • the low-frequency intent data includes target text and non-target text;
  • the sample generation method also includes: the encoder performs encoding processing according to the target text to obtain a target encoding result and a similar encoding result corresponding to the target text, and performs encoding processing according to the non-target text to obtain an encoding result corresponding to the non-target text;
  • the target encoding result and the similar encoding result corresponding to the target text are determined as similar sample pairs corresponding to the low-frequency intent data;
  • the target encoding result corresponding to the target text and the encoding result corresponding to the non-target text are determined as non-similar sample pairs corresponding to the low-frequency intent data.
  • the low-frequency intent data includes target text and non-target text.
  • the number of target texts can be one.
  • the number of non-target texts can be one or more.
  • the low-frequency intent data includes record 1, record 2, record 3, record 4, and record 5.
  • record 1 includes the target text and the intent recognition result of the target text
  • record 2 includes non-target text 1 and the intent recognition result of non-target text 1
  • record 3 includes non-target text 2 and the intent recognition result of non-target text 2
  • record 4 includes non-target text 3 and the intent recognition result of non-target text 3
  • record 5 includes non-target text 4 and the intent recognition result of non-target text 4.
  • the target text included in record 1 in the input low-frequency intent data can be encoded by the encoder to obtain the target encoding result and similar encoding result corresponding to the target text.
  • the non-target texts 1-4 included in records 2-5 in the input low-frequency intent data can be encoded by the encoder to obtain the encoding results corresponding to the non-target texts 1-4.
  • the target encoding result and the similar encoding result corresponding to record 1 can be determined as similar sample pairs corresponding to low-frequency intent data
  • the target encoding result corresponding to record 1 and the encoding result corresponding to record 2 can be determined as a non-similar sample pair
  • the target encoding result corresponding to record 1 and the encoding result corresponding to record 3 can be determined as a non-similar sample pair
  • the target encoding result corresponding to record 1 and the encoding result corresponding to record 4 can be determined as a non-similar sample pair
  • the target encoding result corresponding to record 1 and the encoding result corresponding to record 5 can be determined as a non-similar sample pair.
  • a similar sample pair and four non-similar sample pairs are generated.
  • the encoder includes an attention layer and a fully connected layer connected in sequence; the output of the attention layer is the input of the fully connected layer; the attention layer is used to perform a first encoding process according to a preset first random inactivation probability and low-frequency intent data to obtain intermediate encoded data; the fully connected layer is used to perform a conversion process according to a preset second random inactivation probability and the intermediate encoded data to obtain similar sample pairs and non-similar sample pairs corresponding to the low-frequency intent data.
  • the encoder includes an attention layer and a fully connected layer connected in sequence; the output of the attention layer is the input of the fully connected layer; the sample generation method also includes: the attention layer performs a first encoding process according to a preset first random inactivation probability and the low-frequency intention data to obtain intermediate encoded data; the fully connected layer performs a conversion process according to a preset second random inactivation probability and the intermediate encoded data to obtain similar sample pairs and non-similar sample pairs corresponding to the low-frequency intent data.
  • the first random inactivation probability of the attention layer can be pre-configured, and the second random inactivation probability of the fully connected layer can be pre-configured.
  • the first random inactivation probability takes effect in every layer of the transformer, so two different semantic representations of the same text are obtained. By inputting the same text twice, a similar sample pair whose two members have exactly the same semantics but different representations is obtained.
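The two-pass trick described above (encoding the same text twice so that different dropout masks yield two views with identical semantics, in the spirit of SimCSE-style contrastive learning) can be illustrated with a minimal numeric sketch. The stand-in "attention" and "fully connected" layers, the dimensions, weights, and probabilities below are all illustrative assumptions, not the patent's actual model:

```python
import numpy as np

rng = np.random.default_rng(7)

def dropout(x, p):
    # inverted dropout: zero each element with probability p, rescale the rest
    mask = rng.random(x.shape) >= p
    return np.where(mask, x / (1.0 - p), 0.0)

def encode(embedding, p_attn=0.1, p_fc=0.1):
    # stand-in "attention layer": applies the first random inactivation probability
    hidden = dropout(embedding, p_attn)
    # stand-in "fully connected layer": fixed weights plus the second probability
    weights = np.diag(np.arange(1.0, hidden.size + 1.0))
    return dropout(hidden @ weights, p_fc)

text_embedding = np.linspace(0.1, 6.4, 64)   # a fake 64-dim text embedding
view1 = encode(text_embedding)               # first pass
view2 = encode(text_embedding)               # second pass: same text, different
                                             # dropout masks -> a similar pair
```

Because each pass draws fresh dropout masks, `view1` and `view2` differ numerically while representing the same text, which is exactly what makes them usable as a similar sample pair.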
  • the sample generation method further includes: when the text lengths of non-similar sample pairs are different, the length of the shorter text in the non-similar sample pairs is extended by punctuation marks.
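A minimal sketch of the punctuation-based length extension above; the choice of "." as the padding mark is an assumption, since the patent text does not specify which punctuation marks are used:

```python
def pad_with_punctuation(text_a, text_b, pad="."):
    # extend the shorter text of a non-similar pair with punctuation marks
    # so that both texts have the same length (pad mark "." is an assumption)
    diff = len(text_a) - len(text_b)
    if diff > 0:
        text_b = text_b + pad * diff
    elif diff < 0:
        text_a = text_a + pad * (-diff)
    return text_a, text_b

padded_a, padded_b = pad_with_punctuation("short", "a longer text")
```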
  • the preset intent category may be one or more low-frequency intent categories.
  • the low-frequency intent data includes multiple low-frequency intent texts; the text comparison model is specifically used to: determine each low-frequency intent text and a standard text of a preset intent category as a similar sample pair corresponding to each low-frequency intent text; perform similarity prediction processing on the similar sample pairs corresponding to each low-frequency intent text to obtain a similarity score for each low-frequency intent text; and determine the similarity score of each low-frequency intent text as the text comparison result corresponding to the low-frequency intent data.
  • the low-frequency intent data includes multiple low-frequency intent texts; the sample generation method also includes: the text comparison model determines each low-frequency intent text and the standard text of the preset intent category as a similar sample pair corresponding to each low-frequency intent text; performs similarity prediction processing on the similar sample pairs corresponding to each low-frequency intent text to obtain a similarity score for each low-frequency intent text; and determines the similarity score of each low-frequency intent text as the text comparison result corresponding to the low-frequency intent data.
  • Each low-frequency intent text and the standard text of the preset intent category are determined as similar sample pairs corresponding to each low-frequency intent text.
  • one or more standard questions are used as the x_i input to the text comparison model, and the low-frequency intent data is traversed as x_i+ to form (x_i, x_i+) data pairs for prediction.
  • the prediction result is a similarity score between 0 and 1.
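A hedged sketch of producing a 0-1 similarity score for an (x_i, x_i+) pair: the patent does not specify the scoring function, so cosine similarity rescaled from [-1, 1] to [0, 1] is used here purely as an illustrative assumption, with plain vectors standing in for text embeddings:

```python
import math

def similarity_score(vec_std, vec_low):
    # cosine similarity of the (x_i, x_i+) embedding pair, rescaled to [0, 1];
    # both the embeddings and the rescaling are assumptions for illustration
    dot = sum(a * b for a, b in zip(vec_std, vec_low))
    norm = (math.sqrt(sum(a * a for a in vec_std))
            * math.sqrt(sum(b * b for b in vec_low)))
    return (dot / norm + 1.0) / 2.0
```

Identical embeddings score 1.0, orthogonal ones 0.5, and opposite ones 0.0, which fits the 0-1 range stated above.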
  • Step S110 generating a low-frequency intent sample according to the text comparison result and a preset similarity threshold.
  • the preset similarity threshold may be a preset value, and the preset similarity threshold may be updated once or multiple times based on a pre-configured threshold change rule.
  • the preset similarity threshold may be 95%
  • the threshold change rule may be that each time the threshold is updated, 5% is subtracted from the current similarity threshold to obtain an updated similarity threshold.
  • a low-frequency intent sample is generated.
  • the low-frequency intent text with a similarity score greater than or equal to the preset similarity threshold can be determined as a low-frequency intent sample, or the low-frequency intent text with a similarity score greater than or equal to the preset similarity threshold can be determined as similar sample data; the similar sample data is then quality-checked, and the similar sample data that passes the quality check is determined as a low-frequency intent sample.
  • the similar sample data is used to indicate candidate sample data that needs to be quality-checked to determine whether it is a low-frequency intent sample.
  • the quality inspection method can be manual quality inspection or quality inspection processing according to preset quality inspection rules.
  • a low-frequency intent sample is generated according to a text comparison result and a preset similarity threshold, including: determining the number of similar sample data corresponding to the preset similarity threshold according to a comparison result between the preset similarity threshold and the text comparison result; if the number of similar sample data corresponding to the low-frequency intent data is less than a preset number threshold, subtracting a preset reduction value from the current similarity threshold to obtain an updated similarity threshold, and determining the updated number of similar sample data corresponding to the updated similarity threshold according to a comparison result between the updated similarity threshold and the text comparison result; and repeating the above operation according to the updated number and the preset number threshold until the preset stopping condition is met.
  • the preset stopping condition is that the number of samples is greater than or equal to the preset number threshold; the number of samples is the sum of the number of similar sample data corresponding to the preset similarity threshold and the numbers of similar sample data corresponding to each updated similarity threshold; each sample data in the similar sample data corresponding to the preset similarity threshold and in the similar sample data corresponding to each updated similarity threshold is determined as a low-frequency intention sample corresponding to each sample data.
  • a low-frequency intent sample is generated, including: according to the comparison result between the preset similarity threshold and the text comparison result, the number of similar sample data corresponding to the preset similarity threshold is determined; if the number of similar sample data corresponding to the low-frequency intent data is less than the preset number threshold, the preset reduction value is subtracted from the current similarity threshold to obtain an updated similarity threshold, and, according to the comparison result between the updated similarity threshold and the text comparison result, the updated number of similar sample data corresponding to the updated similarity threshold is determined; this operation is repeated until the number of samples is greater than or equal to the preset number threshold.
  • the preset number threshold is 100
  • the initial value of the preset similarity threshold is 99%
  • the number of similar sample data corresponding to 99% is determined to be 10, which is less than the preset number threshold 100
  • a threshold update is performed:
  • the initial value of the preset similarity threshold can be relatively high, for example, 95%. Initially, the threshold is set high, candidate data is strictly recalled and quality-checked, and qualified data is used as the similar-question data corresponding to the standard question of this low-frequency intent. Once all similar-question data under the high threshold have been labeled and analyzed, the threshold is gradually reduced: a lower similarity threshold is set, new candidate data are gradually recalled for quality inspection while data that have already been inspected are excluded; this process is repeated to obtain the similar-question data of the low-frequency intent.
  • the workload of quality inspection can be reduced and the efficiency of quality inspection can be improved.
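The stepwise threshold reduction described above can be sketched as a small loop. The function name, the default values, and the use of score indices as stand-ins for candidate texts are all illustrative assumptions:

```python
def recall_candidates(scores, start_threshold=0.95, step=0.05, min_count=100):
    """Lower the similarity threshold stepwise until enough candidate
    samples are recalled (names and defaults are illustrative)."""
    threshold = start_threshold
    inspected = set()                      # data already recalled and quality-checked
    while True:
        newly = {i for i, s in enumerate(scores)
                 if s >= threshold and i not in inspected}
        inspected |= newly                 # exclude them from later, wider recalls
        if len(inspected) >= min_count or threshold <= step:
            return threshold, sorted(inspected)
        threshold -= step                  # widen the recall for the next pass

scores = [0.99, 0.97, 0.96, 0.90, 0.80, 0.50]
final_threshold, recalled = recall_candidates(scores, 0.95, 0.05, min_count=4)
```

With these toy scores, the first pass at 0.95 recalls three candidates; since that is below `min_count`, the threshold drops to 0.90 and a fourth candidate is recalled, after which the loop stops.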
  • the number of low-frequency intent samples can also continue to increase with the expansion of log data.
  • a large number of samples of preset intent categories can be accumulated, and the preset intent category can be an intent category of low-frequency intent.
  • the initial intent recognition model can be trained based on the low-frequency intent samples of the preset intent category to obtain an intent recognition model, and the intent recognition model has a high recognition accuracy for low-frequency intents of the preset intent category.
  • the robot agent can recognize the intent of the text based on the trained intent recognition model.
  • the intent recognition model can be an intent recognition model obtained after training the initial intent recognition model with low-frequency intent samples generated by the sample generation method provided in the embodiment of Figure 1. Since the number of low-frequency intent samples is large enough, the training effect of the intent recognition model is good.
  • the robot agent can use the intent recognition model to accurately identify the user's low-frequency intentions, and then make appropriate responses to the user based on the accurately identified low-frequency intentions, thereby improving user satisfaction.
  • First, log data to be processed is obtained; the log data includes text and intent recognition results of the text; secondly, according to the intent recognition results of the text, the log data is screened to obtain low-frequency intent data; then, the low-frequency intent data and standard text of a preset intent category are input into a text comparison model for similarity prediction processing to obtain text comparison results corresponding to the low-frequency intent data; the text comparison model is a model obtained by training an initial text comparison model based on a training sample set; the training sample set is constructed based on the low-frequency intent data; finally, the low-frequency intent samples are generated according to the text comparison results and the preset similarity threshold.
  • Log data is a kind of historical data that grows continuously over time. Even if the low-frequency intent data appears less frequently in the log data, if the time span corresponding to the log data is long enough, a large amount of accumulated low-frequency intent data can be screened from the log data. Based on this large amount of low-frequency intent data, a sufficient amount of training data can be generated for training the initial text comparison model, and the amount of training data can be continuously expanded as the time span of the log data increases. Therefore, when the amount of training data is large enough, the prediction results of the similarity prediction performed by the text comparison model obtained after training are relatively accurate.
  • the low-frequency intent samples with high similarity to the standard text of the preset intent category in the low-frequency intent data can be determined.
  • a large number of low-frequency intent samples of the preset intent category can be accumulated by using the growing log data and the text comparison model, thereby meeting the training requirements of the intent recognition model corresponding to the low-frequency intent samples and improving the recognition accuracy of the low-frequency intent.
  • FIG2 is a processing flow chart of another sample generation method provided by the embodiment of the present application.
  • the model acquisition stage includes steps S202 to S204 .
  • Step S202 unsupervised contrastive learning training.
  • Step S202 may refer to the corresponding description part of the embodiment of FIG. 1 in which “the text comparison model is a model obtained by training the initial text comparison model based on the training sample set; the training sample set is constructed based on low-frequency intent data”.
  • Step S204 obtaining a comparative learning model.
  • the data recall stage includes steps S206 to S210.
  • Step S206 adjusting the threshold precision recall.
  • the threshold value may be a preset similarity threshold value.
  • the adjusted threshold in step S206 may be the initial value of the preset similarity threshold. Precision recall can determine, based on the comparison result between the text comparison result and the preset similarity threshold, whether the low-frequency intent text is similar sample data to be quality-inspected.
  • Step S208 manual quality inspection to determine whether the data is qualified.
  • step S210 is executed.
  • Step S210 adjusting the threshold wide recall.
  • the adjusted threshold in step S210 may be the updated similarity threshold obtained by subtracting a preset reduction value from the current similarity threshold. Wide recall may determine, based on the comparison result between the text comparison result and the current similarity threshold, whether the low-frequency intent text is similar sample data to be quality-inspected.
  • For steps S206, S208 and S210, reference may be made to the corresponding description of step S108 in the embodiment of FIG. 1.
  • Figure 3 is a schematic diagram of a text contrast model training method provided by the embodiment of the present application.
  • a batch of data may include n sample data: sample data 1 (sample data 301 in FIG3), sample data 2 (sample data 302 in FIG3), ..., sample data n.
  • n is a natural number greater than 0.
  • the n sample data are input into the encoder 303 for encoding.
  • the encoder 303 may generate an x sample 304 and a similar sample 305 based on the sample data 301.
  • the x sample 304 and the similar sample 305 are two samples with the same semantics but different formats obtained after the same sample data is encoded in different ways.
  • the encoder 303 may generate a non-similar sample 1 based on the sample data 302, i.e., a non-similar sample 306 in FIG3 , ...
  • the encoder 303 may generate a non-similar sample n based on the sample data n.
  • the x sample 304 and the similar sample 305 may constitute a similar sample pair.
  • the x sample 304 and the non-similar sample 306 may constitute a non-similar sample pair.
  • the initial text comparison model can be iteratively trained to obtain a text comparison model.
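Iterative training on the similar and non-similar pairs above is typically implemented with a contrastive loss. The patent does not name the loss function, so the InfoNCE-style loss below is an assumed illustration (the temperature value is also an assumption):

```python
import math

def info_nce_loss(sim_pos, sims_neg, temperature=0.05):
    # contrastive loss for one similar pair against in-batch non-similar pairs:
    # -log( exp(s+/t) / (exp(s+/t) + sum_i exp(s-_i/t)) )
    pos = math.exp(sim_pos / temperature)
    denom = pos + sum(math.exp(s / temperature) for s in sims_neg)
    return -math.log(pos / denom)
```

The loss shrinks as the similar pair's similarity rises above the non-similar pairs' similarities, which is exactly the signal that pulls similar encodings together and pushes non-similar ones apart during iterative training.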
  • the embodiment of the present application also provides a sample generation method applied to the field of robotics.
  • Business process diagram of the method
  • Step S402 the robot goes online.
  • the robot may be a robot with automatic answering capability, which may call an intent recognition model to perform intent recognition on the text, obtain the user's intent, and then automatically answer based on the user's intent.
  • the robot going online means that the robot enters a working state, and the robot can automatically respond to the obtained text in the working state.
  • Step S404 log analysis.
  • the log may be the work log data of the robot, including but not limited to: the text to be responded to received by the robot, the record data of the robot's intention recognition of the text, and the robot's response record data, etc.
  • Step S406 the algorithm tool recalls similar question data.
  • Step S408 manual labeling and quality inspection.
  • For step S406 and step S408, reference may be made to the corresponding description portion of step S108 in the embodiment of FIG. 1.
  • Step S410 Add new labeled data to the model and perform iterative training.
  • the model may be an intent recognition model, which may be used to recognize whether a text contains a low-frequency intent.
  • Step S412 The new robot comes online and the iteration continues.
  • the present application embodiment also provides a method for training an intent recognition model.
  • FIG5 is a processing flow chart of a method for training an intent recognition model provided by the present application embodiment.
  • Step S502 Generate low-frequency intention samples through a sample generation method.
  • the low-frequency intention sample may be generated by the sample generation method described above in the present application.
  • Step S504 inputs the low-frequency intent samples into the initial intent recognition model for iterative training to obtain the intent recognition model.
  • the initial intent recognition model may be a low-frequency intent classification model in which all parameters to be trained take initial values and the model has not been fine-tuned.
  • the low-frequency intent classification model may be a pre-trained language model.
  • Pre-trained language models include but are not limited to: BERT (Bidirectional Encoder Representations from Transformers) model, or RoBERTa (a Robustly Optimized BERT Pretraining Approach) model, etc.
  • the intent recognition model obtained after iterative training can be used to identify whether the text contains low-frequency intent.
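As a self-contained illustration of the "iterative training" idea only (not the actual BERT/RoBERTa fine-tuning, which requires a pre-trained checkpoint), a toy logistic-regression intent classifier can be trained iteratively on synthetic data; every name, dimension, and number below is an assumption:

```python
import numpy as np

# Toy stand-in for iteratively training an intent classifier; a real system
# would fine-tune a pre-trained model such as BERT, which is out of scope here.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 8))             # fake sample embeddings
y = (X[:, 0] > 0).astype(float)          # fake "contains low-frequency intent" labels
w = np.zeros(8)

def loss(w):
    p = 1.0 / (1.0 + np.exp(-(X @ w)))
    return -np.mean(y * np.log(p + 1e-9) + (1 - y) * np.log(1 - p + 1e-9))

initial = loss(w)
for _ in range(200):                     # the iterative training loop
    p = 1.0 / (1.0 + np.exp(-(X @ w)))
    w -= 0.5 * (X.T @ (p - y)) / len(y)  # gradient step on the logistic loss
final = loss(w)
```

Each pass over the samples lowers the training loss, mirroring how repeated passes over the low-frequency intent samples refine the intent recognition model.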
  • a low-frequency intent sample is generated by the sample generation method provided by the above-mentioned sample generation method embodiment; the low-frequency intent sample is input into the initial intent recognition model for iterative training to obtain the intent recognition model.
  • Log data is a kind of historical data that continues to grow over time.
  • the low-frequency intent samples with high similarity to the standard text of the preset intent category in the low-frequency intent data can be determined.
  • the initial intent recognition model can be iteratively trained using the low-frequency intent samples of the preset intent category, which can achieve better training results and ensure that the intent recognition model obtained after training has higher recognition accuracy for low-frequency intents.
  • the present application embodiment also provides an intention recognition method applied to a digital human.
  • Figure 6 is a processing flow chart of an intention recognition method applied to a digital human provided by the present application embodiment.
  • Step S602 obtaining the text to be recognized input by the user.
  • Step S604 input the text to be recognized into the intention recognition model for intent recognition to obtain the user intention; the intention recognition model is obtained by inputting low-frequency intention samples into the initial intention recognition model for iterative training; the low-frequency intention samples are generated by a sample generation method.
  • the low-frequency intent samples may be generated by the sample generation method described above in the present application.
  • the initial intent recognition model and the intent recognition model may refer to the corresponding description part of the embodiment of the training method of the intent recognition model shown in FIG5 .
  • Step S606 according to the user intention, the target text corresponding to the user intention is obtained in the digital human system, and the target text is displayed.
  • the digital human system may store a pre-configured correspondence between preset user intentions and preset texts. According to the user intention obtained in step S604 and the correspondence between the preset user intentions and preset texts, the target text corresponding to the user intention may be queried in the digital human system and displayed.
  • a target text corresponding to the user intention is obtained in the digital human system according to the user intention, including: querying the digital human system for the target text corresponding to the user intention according to the user intention and the pre-configured correspondence between preset user intentions and preset texts.
  • the preset user intent can be a pre-configured low-frequency intent, such as "early repayment"
  • the preset text can be the response text predetermined by the digital human system for the low-frequency intent, such as "You can make an appointment for this service with xxx according to xxx".
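The correspondence-table lookup described above can be sketched as a plain mapping. The intent name, response text, and fallback below are hypothetical placeholders (the "..." stands in for the details elided in the example above):

```python
# Hypothetical correspondence between preset user intents and preset response
# texts; the intent names and texts are placeholders, not the system's data.
PRESET_RESPONSES = {
    "early repayment": "You can make an appointment for this service ...",
}

def target_text(user_intent, fallback="Sorry, please rephrase your question."):
    # query the pre-configured correspondence for the recognized intent
    return PRESET_RESPONSES.get(user_intent, fallback)
```

A recognized low-frequency intent retrieves its preset response; an unmapped intent falls back to a default prompt.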
  • First, the text to be recognized input by the user is obtained; secondly, the text to be recognized is input into the intention recognition model for intention recognition to obtain the user's intention; the intention recognition model is obtained by inputting low-frequency intention samples into the initial intention recognition model for iterative training; the low-frequency intention samples are generated by the sample generation method provided by the aforementioned sample generation method embodiment; finally, according to the user's intention, the target text corresponding to the user's intention is obtained in the digital human system, and the target text is displayed.
  • Log data is a kind of historical data that grows continuously over time.
  • the low-frequency intent samples with high similarity to the standard text of the preset intent category in the low-frequency intent data can be determined.
  • the initial intent recognition model is iteratively trained by using the low-frequency intent samples of the preset intent category, and a good training effect can be achieved, so that the intent recognition model obtained after training has a high recognition accuracy for low-frequency intent.
  • the accurate user intent obtained by recognition can be used to obtain and display the target text that meets the user intent from the digital human system, thereby improving the user experience.
  • FIG. 7 is a schematic diagram of a sample generating device provided in an embodiment of the present application.
  • the present embodiment provides a sample generation device, including: a first acquisition unit 701, used to acquire log data to be processed; the log data includes text and intent recognition results of the text; a screening unit 702, used to perform data screening processing on the log data according to the intent recognition results of the text, to obtain low-frequency intent data; a prediction unit 703, used to input the low-frequency intent data and standard text of a preset intent category into a text comparison model for similarity prediction processing, to obtain text comparison results corresponding to the low-frequency intent data; the text comparison model is a model obtained by training an initial text comparison model based on a training sample set; the training sample set is constructed based on the low-frequency intent data; a first generation unit 704, used to generate a low-frequency intent sample according to the text comparison result and a preset similarity threshold.
  • the screening unit 702 includes: a classification subunit, configured to input the log data into a high-frequency intent classification model to obtain the first log data and the confidence of the intent classification result of the first log data;
  • the intent classification result of the first log data is a preset high-frequency intent;
  • the high-frequency intent classification model is used to perform intent classification processing on the log data according to the intent recognition result of the text in the log data;
  • the screening subunit is used to perform data screening processing on the log data according to the first log data and the confidence of the intent classification result of the first log data to obtain low-frequency intent data.
  • the screening subunit is specifically used to: determine high-frequency intent data based on a comparison result of the confidence of the intent classification result of the first log data with a preset confidence threshold; delete the high-frequency intent data in the log data to obtain low-frequency intent data.
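The screening subunit's confidence-based deletion can be sketched as a simple filter. The entry texts, confidence values, and the 0.9 threshold below are illustrative assumptions:

```python
def screen_low_frequency(log_data, confidences, confidence_threshold=0.9):
    """Drop entries confidently classified as high-frequency intents; what
    remains is treated as low-frequency intent data (threshold is illustrative)."""
    return [entry for entry, conf in zip(log_data, confidences)
            if conf < confidence_threshold]

logs = ["check balance", "early repayment", "reset password"]
confs = [0.98, 0.30, 0.95]   # confidence of the high-frequency classification
low_freq = screen_low_frequency(logs, confs)
```

Entries whose high-frequency classification confidence meets the threshold are deleted as high-frequency intent data, and the remainder is kept as low-frequency intent data.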
  • the initial text comparison model includes an encoder and a similarity prediction module connected in sequence; the output of the encoder is the input of the similarity prediction module; the encoder is used to perform encoding processing based on the low-frequency intent data to obtain similar sample pairs and dissimilar sample pairs corresponding to the low-frequency intent data; the similarity prediction module is used to perform iterative training based on similar sample pairs and dissimilar sample pairs corresponding to the low-frequency intent data.
  • the low-frequency intent data includes target text and non-target text
  • the encoder is specifically used to: perform encoding processing according to the target text to obtain a target encoding result and a similar encoding result corresponding to the target text, and perform encoding processing according to the non-target text to obtain an encoding result corresponding to the non-target text; determine the target encoding result and the similar encoding result corresponding to the target text as similar sample pairs corresponding to the low-frequency intent data; determine the target encoding result corresponding to the target text and the encoding result corresponding to the non-target text as non-similar sample pairs corresponding to the low-frequency intent data.
  • the encoder includes an attention layer and a fully connected layer connected in sequence; the output of the attention layer is the input of the fully connected layer; the attention layer is used to perform a first encoding process according to a preset first random inactivation probability and the low-frequency intent data to obtain intermediate encoded data; the fully connected layer is used to perform conversion processing according to a preset second random inactivation probability and the intermediate encoded data to obtain similar sample pairs and non-similar sample pairs corresponding to the low-frequency intent data.
  • the low-frequency intent data includes a plurality of low-frequency intent texts; the text comparison model is specifically used to: determine each low-frequency intent text and a standard text of a preset intent category as a similar sample pair corresponding to each low-frequency intent text; perform similarity prediction processing on the similar sample pairs corresponding to each low-frequency intent text to obtain the similarity score of each low-frequency intent text; and determine the similarity score of each low-frequency intent text as the text comparison result corresponding to the low-frequency intent data.
  • the first generating unit 704 is specifically used to: determine the number of similar sample data corresponding to the preset similarity threshold according to the comparison result between the preset similarity threshold and the text comparison result; if the number of similar sample data corresponding to the low-frequency intent data is less than the preset number threshold, repeatedly perform the operation of subtracting the preset reduction value from the current similarity threshold to obtain an updated similarity threshold and, according to the comparison result between the updated similarity threshold and the text comparison result, determining the number of similar sample data corresponding to the updated similarity threshold, until the preset stop condition is met;
  • the preset stop condition is that the number of samples is greater than or equal to the preset number threshold; the number of samples is the sum of the number of similar sample data corresponding to the preset similarity threshold and the numbers of similar sample data corresponding to each updated similarity threshold; each sample data in the similar sample data corresponding to the preset similarity threshold and in the similar sample data corresponding to each updated similarity threshold is determined as a low-frequency intent sample corresponding to each sample data.
  • the sample generation device includes: a first acquisition unit, a screening unit, a prediction unit and a first generation unit, wherein the first acquisition unit is used to acquire the log data to be processed; the log data includes text and the intention recognition result of the text; the screening unit is used to perform data screening processing on the log data according to the intention recognition result of the text to obtain low-frequency intention data; the prediction unit is used to input the low-frequency intention data and the standard text of the preset intention category into the text comparison model for similarity prediction processing to obtain the text comparison result corresponding to the low-frequency intention data; the text comparison model is a model obtained by training the initial text comparison model based on the training sample set; the training sample set is constructed based on the low-frequency intention data; the first generation unit is used to generate a low-frequency intention sample according to the text comparison result and the preset similarity threshold.
  • Log data is a kind of historical data that grows continuously with time. Even if the frequency of occurrence of low-frequency intention data in the log data is low, when the time span corresponding to the log data is long enough, a large amount of accumulated low-frequency intention data can be screened from the log data, and a sufficient amount of training data for training the initial text comparison model can be generated based on the large amount of low-frequency intention data, and the amount of training data can increase with the time span of the log data. Therefore, when the amount of training data is large enough, the prediction result of similarity prediction by the text comparison model obtained after training is relatively accurate.
  • the low-frequency intent samples with high similarity to the standard text of the preset intent category in the low-frequency intent data can be determined.
  • a large number of low-frequency intent samples of the preset intent category can be accumulated by using the growing log data and the text comparison model, thereby meeting the training requirements of the intent recognition model corresponding to the low-frequency intent samples and improving the recognition accuracy of the low-frequency intent.
  • FIG8 is a schematic diagram of a training device for an intent recognition model provided in an embodiment of the present application.
  • This embodiment provides a training device for an intent recognition model, including: a second generation unit 801, used to generate low-frequency intent samples through a sample generation method; a training unit 802, used to input the low-frequency intent samples into an initial intent recognition model for iterative training to obtain an intent recognition model.
  • the training device of the intent recognition model provided in the embodiment of the present application includes a second generation unit and a training unit, wherein the second generation unit is used to generate low-frequency intent samples through the sample generation method provided in the above-mentioned sample generation method embodiment; the training unit is used to input the low-frequency intent samples into the initial intent recognition model for iterative training to obtain the intent recognition model.
  • Log data is a kind of historical data that grows continuously with time.
  • the low-frequency intent data has a low frequency of occurrence in the log data
  • a large amount of accumulated low-frequency intent data can be screened from the log data, and a sufficient amount of training data can be generated based on the large amount of low-frequency intent data for training the initial text comparison model, and the amount of training data can be continuously expanded as the time span of the log data increases. Therefore, when the amount of training data is sufficient, the prediction result when similarity prediction is performed by the text comparison model obtained after training is more accurate.
  • the low-frequency intent samples with higher similarity with the standard text of the preset intent category in the low-frequency intent data can be determined.
  • the unlabeled log data can be used to train the initial text comparison model.
  • the continuously growing log data and the text comparison model can be used to accumulate a large number of low-frequency intent samples of the preset intent category; the low-frequency intent samples of the preset intent category are then used to iteratively train the initial intent recognition model, which can achieve better training results and give the intent recognition model obtained after training higher recognition accuracy for low-frequency intents.
  • FIG. 9 is a schematic diagram of an intention recognition device for a digital human provided in an embodiment of the present application.
  • This embodiment provides an intention recognition device applied to a digital human, comprising: a second acquisition unit 901, used to acquire a text to be recognized input by a user; a recognition unit 902, used to input the text to be recognized into an intention recognition model for intention recognition, and obtain the user intention; the intention recognition model is obtained by inputting a low-frequency intention sample into an initial intention recognition model for iterative training; the low-frequency intention sample is generated by the above-mentioned sample generation method; a display unit 903, used to obtain a target text corresponding to the user intention in the digital human system according to the user intention, and display the target text.
  • the intention recognition device for digital human includes a second acquisition unit, an identification unit and a display unit, wherein the second acquisition unit is used to acquire the text to be recognized input by the user; the identification unit is used to input the text to be recognized into the intention recognition model for intention recognition to obtain the user intention; the intention recognition model is obtained by inputting low-frequency intention samples into the initial intention recognition model for iterative training; the low-frequency intention samples are generated by the sample generation method provided by the aforementioned sample generation method embodiment; the display unit is used to obtain the target text corresponding to the user intention in the digital human system according to the user intention, and display the target text.
  • Log data is a kind of historical data that grows with time.
  • low-frequency intent samples with high similarity to standard text of preset intent categories in low-frequency intent data can be determined.
  • the initial intent recognition model can be iteratively trained using the low-frequency intent samples of the preset intent categories, which can achieve better training results. This makes the intent recognition model obtained after training have high recognition accuracy for low-frequency intents. Then, the accurate user intent obtained by recognition can be used to obtain and display the target text that meets the user intent from the digital human system, thereby improving the user experience.
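The recognize-then-display flow described above can be sketched as follows. This is an illustrative toy, not the patent's implementation: the trained intent recognition model is replaced by a hypothetical keyword rule, and the intent-to-target-text mapping is invented for the example.

```python
# Illustrative sketch (hypothetical data, not from the patent): recognize the
# user intent, then look up and display the target text bound to that intent
# in the digital-human system.

intent_to_target = {  # hypothetical intent -> target text configuration
    "early repayment": "Early repayment can be requested on the repayment page.",
}

def recognize_intent(text):
    # Stand-in for the trained intent recognition model.
    return "early repayment" if "repay" in text else "unknown"

def answer(text):
    intent = recognize_intent(text)
    # Fall back to a clarification prompt when the intent has no target text.
    return intent_to_target.get(intent, "Sorry, could you rephrase that?")

shown = answer("I want to repay early")
print(shown)
```

The mapping and rule above stand in for the trained model and the digital-human system's configured target texts.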
  • an embodiment of the present application also provides an electronic device, which is used to execute one or more of the sample generation method, the training method for the intent recognition model, and the intent recognition method applied to a digital human provided above.
  • Figure 10 is a structural schematic diagram of an electronic device provided in an embodiment of the present application.
  • electronic devices may differ considerably due to different configurations or performance, and may include one or more processors 1001 and a memory 1002; the memory 1002 may store one or more applications or data.
  • the memory 1002 may be a short-term storage or a persistent storage.
  • the application stored in the memory 1002 may include one or more modules (not shown in the figure), and each module may include a series of computer executable instructions in the electronic device.
  • the processor 1001 may be configured to communicate with the memory 1002 and execute a series of computer executable instructions in the memory 1002 on the electronic device.
  • the electronic device may also include one or more power supplies 1003, one or more wired or wireless network interfaces 1004, one or more input/output interfaces 1005, one or more keyboards 1006, etc.
  • the electronic device includes a memory and one or more programs, wherein the one or more programs are stored in the memory.
  • the program may include one or more modules, and each module may include a series of computer executable instructions in the electronic device, and is configured to be executed by one or more processors.
  • the one or more programs include the following computer executable instructions: obtaining log data to be processed; the log data includes text and the intent recognition result of the text; according to the intent recognition result of the text, the log data is screened to obtain low-frequency intent data; the low-frequency intent data and the standard text of the preset intent category are input into the text comparison model for similarity prediction processing to obtain the text comparison result corresponding to the low-frequency intent data; the text comparison model is a model obtained by training the initial text comparison model based on the training sample set; the training sample set is constructed based on the low-frequency intent data; and a low-frequency intent sample is generated according to the text comparison result and the preset similarity threshold.
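The four instructions above (obtain log data, screen it for low-frequency intent data, predict similarity against the standard text of a preset intent category, and apply the similarity threshold) can be sketched as follows. This is a minimal illustration with invented data; in particular, the trained text comparison model is replaced by a simple token-overlap score, which is only a stand-in.

```python
# Illustrative sketch of the sample generation pipeline (not the patent's
# implementation). The similarity model is replaced by a trivial
# token-overlap score so the example stays self-contained.

def screen_low_frequency(log_data, max_count=1):
    """Keep records whose intent label occurs at most max_count times."""
    counts = {}
    for _, intent in log_data:
        counts[intent] = counts.get(intent, 0) + 1
    return [(text, intent) for text, intent in log_data
            if counts[intent] <= max_count]

def similarity(a, b):
    """Stand-in for the trained text comparison model: token overlap ratio."""
    ta, tb = set(a.split()), set(b.split())
    return len(ta & tb) / max(len(ta | tb), 1)

def generate_samples(log_data, standard_text, category, threshold=0.5):
    """Label screened texts with the preset category when similarity passes."""
    low_freq = screen_low_frequency(log_data)
    return [(text, category) for text, _ in low_freq
            if similarity(text, standard_text) >= threshold]

log_data = [
    ("how to check my bill", "bill query"),
    ("how to check my bill today", "bill query"),
    ("how to repay early", "early repayment"),  # low-frequency intent
]
samples = generate_samples(log_data, "how to repay early please",
                           "early repayment")
```

In practice the frequency cutoff, the comparison model, and the threshold would all come from the trained system rather than these toy stand-ins.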
  • an electronic device in another specific embodiment, includes a memory, and one or more programs, wherein the one or more programs are stored in the memory, and the one or more programs may include one or more modules, and each module may include a series of computer executable instructions in the electronic device, and the one or more programs are configured to be executed by one or more processors, and include the following computer executable instructions: generating low-frequency intent samples by a sample generation method; inputting the low-frequency intent samples into an initial intent recognition model for iterative training to obtain an intent recognition model.
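The training instruction above (feed low-frequency intent samples to an initial model for iterative training) might look like the following toy loop. The real model would be a neural intent classifier; it is replaced here by a simple keyword-vote table purely for illustration.

```python
# Toy iterative training sketch (not the patent's model): each epoch
# accumulates keyword -> intent votes from the low-frequency intent samples.

def train(samples, epochs=10):
    """Learn a keyword -> intent vote table over several epochs."""
    model = {}
    for _ in range(epochs):
        for text, intent in samples:
            for tok in text.split():
                model.setdefault(tok, {}).setdefault(intent, 0)
                model[tok][intent] += 1
    return model

def predict(model, text):
    """Return the intent with the most keyword votes, or None."""
    votes = {}
    for tok in text.split():
        for intent, weight in model.get(tok, {}).items():
            votes[intent] = votes.get(intent, 0) + weight
    return max(votes, key=votes.get) if votes else None

samples = [("how to repay early", "early repayment"),
           ("i want to repay in advance", "early repayment"),
           ("how to check my bill", "bill query")]
model = train(samples)
pred = predict(model, "repay early")
```

The point of the sketch is only the loop structure: the generated low-frequency samples are iterated over repeatedly to fit the initial model.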
  • the electronic device includes a memory, and one or more programs, wherein the one or more programs are stored in the memory, and the one or more programs may include one or more modules, and each module may include a series of computer executable instructions in the electronic device, and is configured to be executed by one or more processors.
  • the one or more programs include the following computer executable instructions: obtaining text to be recognized input by the user; inputting the text to be recognized into an intention recognition model for intention recognition to obtain the user intention; the intention recognition model is obtained by inputting low-frequency intention samples into the initial intention recognition model for iterative training; the low-frequency intention samples are generated by a sample generation method; according to the user intention, a target text corresponding to the user intention is obtained in the digital human system, and the target text is displayed.
  • an embodiment of the present application also provides a computer-readable storage medium.
  • a computer-readable storage medium is used to store computer-executable instructions, which implement the following process when executed by a processor: obtaining log data to be processed; the log data includes text and intent recognition results of the text; based on the intent recognition results of the text, the log data is screened to obtain low-frequency intent data; the low-frequency intent data and standard text of a preset intent category are input into a text comparison model for similarity prediction processing to obtain text comparison results corresponding to the low-frequency intent data; the text comparison model is a model obtained by training an initial text comparison model based on a training sample set; the training sample set is constructed based on the low-frequency intent data; and a low-frequency intent sample is generated based on the text comparison result and a preset similarity threshold.
  • a computer-readable storage medium is used to store computer-executable instructions, which implement the following process when executed by a processor: generating low-frequency intent samples through a sample generation method; inputting the low-frequency intent samples into an initial intent recognition model for iterative training to obtain an intent recognition model.
  • a computer-readable storage medium is used to store computer-executable instructions, which implement the following process when executed by a processor: obtaining a text to be recognized input by a user; inputting the text to be recognized into an intention recognition model for intent recognition to obtain the user's intention; the intention recognition model is obtained by inputting low-frequency intention samples into an initial intention recognition model for iterative training; the low-frequency intention samples are generated by a sample generation method; and according to the user's intention, a target text corresponding to the user's intention is obtained in the digital human system, and the target text is displayed.
  • the embodiment of the computer-readable storage medium in this specification is based on the same inventive concept as at least one of the embodiments of the sample generation method, the training method for the intent recognition model, and the intent recognition method applied to a digital human in this specification. Therefore, for the specific implementation of this embodiment, reference may be made to the implementation of the corresponding methods above; repeated details are not described again.
  • the embodiments of the present application may be provided as methods, systems or computer program products. Therefore, the embodiments of the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, this specification may take the form of a computer program product implemented on one or more computer-readable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.
  • These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable device to work in a specific manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction device that implements the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
  • These computer program instructions may also be loaded onto a computer or other programmable device, so that a series of operational steps are executed on the computer or other programmable device to produce a computer-implemented process, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
  • a computing device includes one or more processors (CPU), input/output interfaces, network interfaces, and memory.
  • Memory may include non-persistent storage in computer-readable media, such as random access memory (RAM). Memory is an example of a computer-readable medium.
  • Computer readable media include permanent and non-permanent, removable and non-removable media that can be implemented by any method or technology to store information.
  • Information can be computer readable instructions, data structures, program modules or other data.
  • Examples of computer storage media include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disk read-only memory (CD-ROM), digital versatile disk (DVD) or other optical storage, magnetic cassettes, disk storage or other magnetic storage devices or any other non-transmission media that can be used to store information that can be accessed by a computing device.
  • computer readable media does not include temporary computer readable media (transitory media), such as modulated data signals and carrier waves.
  • Embodiments of the present application may be described in the general context of computer-executable instructions executed by a computer, such as program modules.
  • program modules include routines, programs, objects, components, data structures, etc. that perform specific tasks or implement specific abstract data types.
  • One or more embodiments of the present specification may also be practiced in distributed computing environments where tasks are performed by remote processing devices connected via a communications network.
  • program modules may be located in local and remote computer storage media, including storage devices.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Mathematical Physics (AREA)
  • Machine Translation (AREA)

Abstract

Provided in the embodiments of the present description are a sample generation method and apparatus, and an electronic device and a storage medium. The sample generation method comprises: acquiring log data to be processed, wherein the log data comprises text and an intention recognition result of the text; performing data screening processing on the log data according to the intention recognition result of the text, so as to obtain low-frequency intention data; inputting the low-frequency intention data and standard text of a preset intention category into a text comparison model so as to perform similarity prediction processing, and obtaining a text comparison result corresponding to the low-frequency intention data, wherein the text comparison model is a model obtained by training an initial text comparison model on the basis of a training sample set, and the training sample set is constructed on the basis of the low-frequency intention data; and generating a low-frequency intention sample according to the text comparison result and a preset similarity threshold.

Description

Sample generation method, apparatus, electronic device and storage medium
Cross Reference
This application claims priority to the Chinese patent application filed with the China Patent Office on September 26, 2022, with application number 202211178539.7 and titled "Sample Generation Method, Device, Electronic Device and Storage Medium", the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the field of artificial intelligence, and in particular to a sample generation method and apparatus, an electronic device, and a storage medium.
Background
With the development of electronic technology, robots are used more and more widely. Robot agents can automatically answer questions raised by customers, saving a large amount of human resources and improving communication efficiency. The questions raised by customers are highly varied. The user intents contained in some questions occur frequently and may be called high-frequency intents; the user intents contained in other questions occur infrequently and may be called low-frequency intents. For each high-frequency intent, because questions related to it occur frequently, high-frequency intent training data for model training is easy to obtain, so the intent recognition results of a robot agent obtained through model training are relatively accurate. For each low-frequency intent, however, because questions related to it occur infrequently, there is often a lack of sufficient training data when training the robot agent's model. As a result, the accuracy of the robot agent's intent recognition is low and its replies are beside the point, which gives users a poor experience and indirectly increases the workload of human agents.
Summary
The present application provides a sample generation method, apparatus, electronic device and storage medium, to expand the number of low-frequency intent samples and meet model training requirements, thereby improving the recognition accuracy of low-frequency intents.
In one aspect, the present application provides a sample generation method, including: obtaining log data to be processed, the log data including text and an intent recognition result of the text; performing data screening processing on the log data according to the intent recognition result of the text to obtain low-frequency intent data; inputting the low-frequency intent data and standard text of a preset intent category into a text comparison model for similarity prediction processing to obtain a text comparison result corresponding to the low-frequency intent data, where the text comparison model is obtained by training an initial text comparison model based on a training sample set, and the training sample set is constructed based on the low-frequency intent data; and generating a low-frequency intent sample according to the text comparison result and a preset similarity threshold.
In one aspect, the present application provides a training method for an intent recognition model, including: generating low-frequency intent samples by the sample generation method described in the first aspect; and inputting the low-frequency intent samples into an initial intent recognition model for iterative training to obtain an intent recognition model.
In one aspect, the present application provides an intent recognition method applied to a digital human, including: obtaining a text to be recognized input by a user; inputting the text to be recognized into an intent recognition model for intent recognition to obtain a user intent, where the intent recognition model is obtained by inputting low-frequency intent samples into an initial intent recognition model for iterative training, and the low-frequency intent samples are generated by the sample generation method described above; and obtaining, according to the user intent, a target text corresponding to the user intent in the digital human's system, and displaying the target text.
In one aspect, the present application provides a sample generation apparatus, including: a first acquisition unit, configured to obtain log data to be processed, the log data including text and an intent recognition result of the text; a screening unit, configured to perform data screening processing on the log data according to the intent recognition result of the text to obtain low-frequency intent data; a prediction unit, configured to input the low-frequency intent data and standard text of a preset intent category into a text comparison model for similarity prediction processing to obtain a text comparison result corresponding to the low-frequency intent data, where the text comparison model is obtained by training an initial text comparison model based on a training sample set, and the training sample set is constructed based on the low-frequency intent data; and a first generation unit, configured to generate a low-frequency intent sample according to the text comparison result and a preset similarity threshold.
In one aspect, the present application provides a training apparatus for an intent recognition model, including: a second generation unit, configured to generate low-frequency intent samples by the sample generation method described in the first aspect; and a training unit, configured to input the low-frequency intent samples into an initial intent recognition model for iterative training to obtain an intent recognition model.
In one aspect, the present application provides an intent recognition apparatus applied to a digital human, including: a second acquisition unit, configured to obtain a text to be recognized input by a user; a recognition unit, configured to input the text to be recognized into an intent recognition model for intent recognition to obtain a user intent, where the intent recognition model is obtained by inputting low-frequency intent samples into an initial intent recognition model for iterative training, and the low-frequency intent samples are generated by the sample generation method described above; and a display unit, configured to obtain, according to the user intent, a target text corresponding to the user intent in the digital human's system, and display the target text.
In one aspect, the present application provides an electronic device, including: a processor; and a memory configured to store computer-executable instructions which, when executed, cause the processor to perform the sample generation method described above, or the training method for an intent recognition model described above, or the intent recognition method applied to a digital human described above.
In one aspect, the present application provides a computer-readable storage medium for storing computer-executable instructions which, when executed by a processor, implement the sample generation method described in the first aspect, or the training method for an intent recognition model described above, or the intent recognition method applied to a digital human described above.
Brief Description of the Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application or in the prior art, the drawings required for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are only some of the embodiments described in this specification, and a person of ordinary skill in the art can obtain other drawings from these drawings without creative effort.
FIG. 1 is a processing flowchart of a sample generation method provided in an embodiment of the present application;
FIG. 2 is a processing flowchart of another sample generation method provided in an embodiment of the present application;
FIG. 3 is a schematic diagram of a training method for a text comparison model provided in an embodiment of the present application;
FIG. 4 is a business flowchart of a sample generation method provided in an embodiment of the present application;
FIG. 5 is a processing flowchart of a training method for an intent recognition model provided in an embodiment of the present application;
FIG. 6 is a processing flowchart of an intent recognition method applied to a digital human provided in an embodiment of the present application;
FIG. 7 is a schematic diagram of a sample generation apparatus provided in an embodiment of the present application;
FIG. 8 is a schematic diagram of a training apparatus for an intent recognition model provided in an embodiment of the present application;
FIG. 9 is a schematic diagram of an intent recognition apparatus applied to a digital human provided in an embodiment of the present application;
FIG. 10 is a schematic structural diagram of an electronic device provided in an embodiment of the present application.
Detailed Description
In order to enable those skilled in the art to better understand the technical solutions in the embodiments of the present application, the technical solutions in the embodiments of the present application will be described clearly and completely below in conjunction with the drawings in the embodiments of the present application. Obviously, the described embodiments are only some, not all, of the embodiments of this specification. Based on the embodiments of the present application, all other embodiments obtained by a person of ordinary skill in the art without creative effort shall fall within the protection scope of the present application.
In practical applications, the number of user intents is very large. For some high-frequency intents, sufficient training data is often easy to obtain and can be used to train an intent recognition model, so the recognition accuracy of high-frequency intents is relatively high. For low-frequency intents, however, different users may use different expressions to convey the same intent. Because low-frequency intents occur infrequently, the training data used to train the intent recognition model is difficult to obtain and small in quantity, which harms the training effect. When a robot agent performs intent recognition and answers automatically, ignoring low-frequency intents and recognizing only high-frequency intents greatly degrades the user experience of some customers, making those customers feel that the robot is unintelligent and hard to communicate with. The agent may be a customer service representative, or any other role or person who can respond to text or voice.
To overcome the above problems, an embodiment of the present application provides a sample generation method.
The sample generation method proposed in this application may be executed by an electronic device, specifically by a processor in the electronic device. The electronic device mentioned here may be a terminal device, such as a smartphone, a tablet computer, a desktop computer, an intelligent voice interaction device, a wearable device, a robot, or an in-vehicle terminal; alternatively, the electronic device may be a server, such as an independent physical server, a server cluster composed of multiple servers, or a cloud server capable of cloud computing.
The sample generation method proposed in this application is described in detail below through several embodiments.
Referring to FIG. 1, which is a processing flowchart of a sample generation method provided in an embodiment of the present application. As shown in FIG. 1, the sample generation method provided in this embodiment may specifically include the following steps:
Step S102: obtain log data to be processed; the log data includes text and an intent recognition result of the text.
The log data may be historical data related to a target business that is recorded during the operation of the target business.
The text may be natural language text for which intent recognition is required. The text may be text input by a user, text obtained by speech conversion, or text obtained in other ways; this specification places no special restriction on how the text is obtained.
In a scenario where a robot answers automatically, the text may be the question text that a customer puts to the robot, for example: "How do I check the bill?" The intent recognition result of the text may be the result obtained after the robot performs intent recognition on the question text; for example, the intent recognition result of "How do I check the bill?" is "inquiring about the bill query method".
Obtaining the log data to be processed may be obtaining the conversation data in the log data to be processed. The conversation data may include the question text raised by the customer and the robot's answer text to the customer. The robot may be pre-configured with a correspondence between intent recognition results and answer texts; according to this correspondence and the robot's answer text to the customer, the intent recognition result of the question text can be obtained by lookup. The question text can then be determined as the text in the log data, and the intent recognition result of the question text as the intent recognition result of that text in the log data.
From the above, in some embodiments of the present application, obtaining the log data to be processed includes: obtaining conversation data in the log data to be processed, the conversation data including the question text raised by the customer and the robot's answer text to the customer; obtaining the intent recognition result of the question text by lookup according to the answer text and the pre-configured correspondence between intent recognition results and answer texts; and determining the question text as the text in the log data, and the intent recognition result of the question text as the intent recognition result of the text in the log data.
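The lookup described above, from the robot's answer text back to the intent recognition result it is bound to, can be sketched as a simple mapping. The reply strings and intent names below are invented for illustration and are not from the patent.

```python
# Sketch of the pre-configured correspondence lookup (hypothetical data):
# the robot's reply text is mapped back to the intent it was bound to.

reply_to_intent = {  # hypothetical pre-configured correspondence
    "You can check your bill in the app under 'Bills'.": "bill query method",
    "Early repayment can be requested on the repayment page.": "early repayment",
}

def intent_of(question, reply):
    """Recover the intent recognition result for a logged (question, reply) pair."""
    return question, reply_to_intent.get(reply)

text, intent = intent_of("How do I check the bill?",
                         "You can check your bill in the app under 'Bills'.")
```

The resulting (text, intent) pair is exactly the form of log record the screening step consumes.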
示例性地,坐席系统中的日志数据是实际生产场景中产生的客户和坐席机器人交互的日志文本。坐席系统产生的日志数据量通常很大,且来源不一。获取的日志数据可以仅包括客户的聊天数据,排除掉推荐问题、常见问题解答(frequently-asked questions,FAQ)、多轮引擎等数据,剩下的数据则是单轮对话数据,例如:客户问:“如何提前还款”,机器人答“XXX”。因为机器人的回答是和所识别的意图绑定的,所以最终得到的日志数据形式如表1所示。表1示出了部分日志数据。Exemplarily, the log data in the agent system is the log text of interactions between customers and the agent robot generated in an actual production scenario. The amount of log data generated by the agent system is usually large and comes from varied sources. The acquired log data may include only the customers' chat data; after excluding data such as recommended questions, frequently-asked questions (FAQ) and the multi-round engine, the remaining data is single-round conversation data, for example: the customer asks "How to repay in advance", and the robot answers "XXX". Because the robot's answer is bound to the recognized intent, the final log data takes the form shown in Table 1. Table 1 shows part of the log data.
表1
Table 1
另外,日志数据可以包括多条记录。在获得日志数据之后,为减少冗余,提高数据处理效率,可以去除日志数据中的重复数据。重复数据可以是客户文本完全一致的日志数据的多条记录。例如,日志数据包括:In addition, the log data may include multiple records. After obtaining the log data, duplicate data in the log data may be removed to reduce redundancy and improve data processing efficiency. Duplicate data may be multiple records of log data with completely identical customer text. For example, the log data includes:
记录1:客户文本“如何提前还款”,机器人识别意图“客户咨询如何提前还款”。Record 1: Customer text "How to repay early", robot recognizes the intent "Customer inquires about how to repay early".
记录2:客户文本“如何提前还款”,机器人识别意图“客户咨询如何提前还款”。 Record 2: Customer text "How to repay early", robot recognizes the intent "Customer inquires about how to repay early".
记录3:客户文本“我想提前还款”,机器人识别意图“客户咨询如何提前还款”。Record 3: Customer text "I want to repay early", robot recognizes the intent "Customer asks how to repay early".
由于记录1与记录2的客户文本完全一致,故记录1与记录2可以确定为重复数据,可以删除掉记录1和记录2中的一者。Since the customer texts of record 1 and record 2 are completely consistent, record 1 and record 2 can be determined to be duplicate data, and one of record 1 and record 2 can be deleted.
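上述按客户文本去重的逻辑可以用如下示意代码表示(仅为便于理解的假设性示例,记录的字段名与函数名均为示例假设,并非本申请的限定实现):The deduplication-by-customer-text logic above can be sketched as follows (a hypothetical illustration only; the record field names and the function name are assumptions, not a limiting implementation of this application):

```python
def dedup_records(records):
    """对客户文本完全一致的多条记录,仅保留第一条。
    Keep only the first record for each distinct customer text."""
    seen = set()
    result = []
    for rec in records:
        text = rec["customer_text"]
        if text in seen:
            continue  # 客户文本重复,删除该条记录 / duplicate customer text, drop it
        seen.add(text)
        result.append(rec)
    return result

log_data = [
    {"customer_text": "如何提前还款", "intent": "客户咨询如何提前还款"},  # 记录1
    {"customer_text": "如何提前还款", "intent": "客户咨询如何提前还款"},  # 记录2,与记录1重复
    {"customer_text": "我想提前还款", "intent": "客户咨询如何提前还款"},  # 记录3
]
deduped = dedup_records(log_data)  # 记录1与记录2中仅保留一条
```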
步骤S104,根据文本的意图识别结果,对日志数据进行数据筛选处理,得到低频意图数据。Step S104, based on the intent recognition result of the text, perform data screening processing on the log data to obtain low-frequency intent data.
在一些实施方式中,可以根据文本的意图识别结果,确定文本的意图识别结果是否为预设的高频意图,若是,则从日志数据中删除掉该文本以及文本的意图识别结果;若否,则保留该文本以及文本的意图识别结果作为低频意图数据。In some embodiments, based on the intent recognition result of the text, it can be determined whether the intent recognition result of the text is a preset high-frequency intent. If so, the text and the intent recognition result of the text are deleted from the log data; if not, the text and the intent recognition result of the text are retained as low-frequency intent data.
根据以上内容,可知本申请的一些实施方式中,根据文本的意图识别结果,对日志数据进行数据筛选处理,得到低频意图数据,包括:根据文本的意图识别结果,确定文本的意图识别结果是否为预设的高频意图;若文本的意图识别结果为预设的高频意图,则从日志数据中删除文本和文本的意图识别结果;若文本的意图识别结果不是预设的高频意图,则将文本以及文本的意图识别结果作为低频意图数据。Based on the above content, it can be known that in some embodiments of the present application, according to the intention recognition result of the text, the log data is screened and processed to obtain low-frequency intention data, including: according to the intention recognition result of the text, determining whether the intention recognition result of the text is a preset high-frequency intention; if the intention recognition result of the text is a preset high-frequency intention, deleting the text and the intention recognition result of the text from the log data; if the intention recognition result of the text is not the preset high-frequency intention, treating the text and the intention recognition result of the text as low-frequency intention data.
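该筛选过程可以用如下示意代码表示(假设性示例:预设高频意图以集合表示,字段名为示例假设):This screening process can be sketched as follows (a hypothetical example: the preset high-frequency intents are represented as a set, and the field names are assumptions):

```python
def filter_low_frequency(records, high_freq_intents):
    """删除意图识别结果为预设高频意图的记录,保留其余记录作为低频意图数据。
    Drop records whose intent recognition result is a preset high-frequency
    intent; keep the rest as low-frequency intent data."""
    return [r for r in records if r["intent"] not in high_freq_intents]

high_freq_intents = {"客户投诉", "客户咨询预设问题1"}
log_data = [
    {"text": "如何提前还款", "intent": "客户咨询如何提前还款"},
    {"text": "我要投诉", "intent": "客户投诉"},
]
low_freq_data = filter_low_frequency(log_data, high_freq_intents)
```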
在一些实施方式中,根据文本的意图识别结果,对日志数据进行数据筛选处理,得到低频意图数据,包括:将日志数据输入高频意图分类模型,得到第一日志数据和第一日志数据的意图分类结果的置信度;第一日志数据的意图分类结果为预设高频意图;高频意图分类模型用于根据日志数据中文本的意图识别结果对日志数据进行意图分类处理;根据第一日志数据和第一日志数据的意图分类结果的置信度,对日志数据进行数据筛选处理,得到低频意图数据。In some embodiments, based on the intent recognition result of the text, the log data is screened to obtain low-frequency intent data, including: inputting the log data into a high-frequency intent classification model to obtain first log data and the confidence of the intent classification result of the first log data; the intent classification result of the first log data is a preset high-frequency intent; the high-frequency intent classification model is used to perform intent classification on the log data based on the intent recognition result of the text in the log data; based on the first log data and the confidence of the intent classification result of the first log data, the log data is screened to obtain low-frequency intent data.
高频意图分类模型可以包括依次连接的预训练的语言模型、多层感知机以及归一化指数函数,即Softmax函数。预训练的语言模型的输出为多层感知机的输入;多层感知机的输出为归一化指数函数的输入。 The high-frequency intent classification model may include a pre-trained language model, a multi-layer perceptron, and a normalized exponential function, i.e., a Softmax function, which are connected in sequence. The output of the pre-trained language model is the input of the multi-layer perceptron; and the output of the multi-layer perceptron is the input of the normalized exponential function.
预训练的语言模型包括且不限于:BERT(Bidirectional Encoder Representations from Transformers,基于Transformer的双向编码)模型,或者,RoBERTa(a Robustly Optimized BERT Pretraining Approach,一种稳健优化的BERT预训练方法)模型,等等。Pre-trained language models include but are not limited to: BERT (Bidirectional Encoder Representations from Transformers) model, or RoBERTa (a Robustly Optimized BERT Pretraining Approach), etc.
其中,BERT模型是一种语言表征模型,用Transformer(变换模型)的双向编码器表示,BERT模型的训练过程可以分为预训练部分和模型微调部分,其中模型微调部分使用预训练好的BERT模型进行模型微调训练,广泛的应用于文本分类,文本匹配等任务。Among them, the BERT model is a language representation model, represented by the bidirectional encoder of Transformer (transformation model). The training process of the BERT model can be divided into a pre-training part and a model fine-tuning part. The model fine-tuning part uses the pre-trained BERT model for model fine-tuning training, which is widely used in text classification, text matching and other tasks.
预训练和模型微调可以通过如下示例来说明:假设已有A训练集,先用A训练集对网络进行预训练,在A任务上学会网络参数,然后保存以备后用,当来一个新的任务B,采取相同的网络结构,网络参数初始化的时候可以加载A学习好的参数,其他的高层参数随机初始化,之后用B任务的训练数据来训练网络,当加载的参数随着B任务的训练进行不断地改变,称为“fine-tuning(微调)”,即更好地把参数调整使得更适合当前的B任务。Pre-training and model fine-tuning can be illustrated by the following example: assuming that there is a training set A, the network is first pre-trained with training set A, the network parameters are learned on task A, and then saved for later use. When a new task B comes, the same network structure is adopted. When the network parameters are initialized, the parameters learned in A can be loaded, and other high-level parameters are randomly initialized. Then, the training data of task B is used to train the network. When the loaded parameters are constantly changed as the training of task B progresses, it is called "fine-tuning", that is, the parameters are adjusted to make them more suitable for the current task B.
RoBERTa模型和BERT模型类似,主要是在BERT基础上做了几点调整:1)训练时间更长,批尺寸(batch size)更大,训练数据更多;2)移除了下一次预测损失(next predict loss);3)训练序列更长;4)动态调整掩码机制。因其在诸多场景下比BERT模型效果更好而广泛应用在NLP(Natural Language Processing,自然语言处理)任务中。The RoBERTa model is similar to the BERT model, with several adjustments based on BERT: 1) longer training time, larger batch size, and more training data; 2) removal of the next prediction loss; 3) longer training sequences; 4) dynamic adjustment of the mask mechanism. It is widely used in NLP (Natural Language Processing) tasks because it performs better than the BERT model in many scenarios.
通过设置高频意图分类模型包括依次连接的预训练的语言模型、多层感知机以及Softmax函数,可以实现对预训练的语言模型的模型微调,该训练方式下,当用于训练模型的样本数量较多时,模型的训练效果较好。由于日志数据的数据量极大且易获得,故高频意图分类模型的训练数据易获得,进而高频意图分类模型的训练效果较好,对高频意图的意图识别结果的准确率较高。By setting the high-frequency intent classification model to include a pre-trained language model, a multi-layer perceptron, and a Softmax function connected in sequence, the model fine-tuning of the pre-trained language model can be achieved. Under this training method, when the number of samples used to train the model is large, the model training effect is better. Since the amount of log data is very large and easy to obtain, the training data of the high-frequency intent classification model is easy to obtain, and thus the training effect of the high-frequency intent classification model is better, and the accuracy of the intent recognition results for high-frequency intents is higher.
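分类头的“多层感知机+Softmax”部分可以用如下示意代码表示(假设性示例:以单个线性层代替预训练语言模型与多层感知机,特征、权重均为演示取值):The "multi-layer perceptron + Softmax" part of the classification head can be sketched as follows (a hypothetical example: a single linear layer stands in for the pre-trained language model plus the multi-layer perceptron, and the features and weights are demonstration values only):

```python
import math

def softmax(logits):
    """归一化指数函数。Normalized exponential over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def linear(features, weights, biases):
    # 以单个线性层作为多层感知机的最小替代。
    # A single linear layer as a minimal stand-in for the MLP.
    return [sum(w * f for w, f in zip(row, features)) + b
            for row, b in zip(weights, biases)]

def classify_intent(features, weights, biases):
    """返回意图类别下标及其置信度。Return (intent index, confidence)."""
    probs = softmax(linear(features, weights, biases))
    idx = max(range(len(probs)), key=probs.__getitem__)
    return idx, probs[idx]

# 演示参数:2维“语言模型特征”、2个意图类别。
idx, conf = classify_intent([1.0, 0.0], [[2.0, 0.0], [0.0, 2.0]], [0.0, 0.0])
```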
通过高频意图分类模型,可以根据日志数据中文本的意图识别结果对日志数据进行意图分类处理,得到意图分类结果为预设高频意图的第一日志数据以及第一日志数据的意图分类结果的置信度。Through the high-frequency intent classification model, intent classification processing can be performed on the log data according to the intent recognition results of the texts in the log data, to obtain the first log data whose intent classification result is a preset high-frequency intent, together with the confidence of the intent classification result of the first log data.
第一日志数据可以是日志数据中的一条记录,第一日志数据可以包括一个文本以及该文本的意图识别结果。具体地,第一日志数据可以包括一个客户提出的问题文本以及该问题文本的意图识别结果。The first log data may be a record in the log data, and the first log data may include a text and an intention recognition result of the text. Specifically, the first log data may include a question text raised by a customer and an intention recognition result of the question text.
意图分类结果的置信度可以用于表征意图分类结果的准确性。置信度越高,说明意图分类结果的准确性越高。The confidence of the intent classification result can be used to characterize the accuracy of the intent classification result. The higher the confidence, the higher the accuracy of the intent classification result.
预设高频意图可以包括多种预设的出现频度较高的意图,例如,“客户咨询预设问题1”,“客户咨询预设问题2”,“客户投诉”,等等。则意图分类结果为预设高频意图可以是,意图分类结果为多种预设的出现频度较高的意图中的一者。The preset high-frequency intent may include multiple preset high-frequency intents, for example, "customer consultation preset question 1", "customer consultation preset question 2", "customer complaint", etc. Then the intent classification result as the preset high-frequency intent may be that the intent classification result is one of the multiple preset high-frequency intents.
在一种实施方式中,根据第一日志数据和第一日志数据的意图分类结果的置信度,对日志数据进行数据筛选处理,得到低频意图数据,包括:根据第一日志数据的意图分类结果的置信度与预设置信度阈值的比较结果,确定高频意图数据;将日志数据中的高频意图数据删除,得到低频意图数据。In one embodiment, based on the first log data and the confidence of the intent classification result of the first log data, the log data is screened to obtain low-frequency intent data, including: determining high-frequency intent data based on a comparison result of the confidence of the intent classification result of the first log data and a preset confidence threshold; and deleting the high-frequency intent data in the log data to obtain low-frequency intent data.
若第一日志数据的意图分类结果的置信度大于预设置信度阈值,说明第一日志数据的意图分类结果的准确性较高,可以将第一日志数据确定为高频意图数据。If the confidence of the intent classification result of the first log data is greater than a preset confidence threshold, it means that the accuracy of the intent classification result of the first log data is high, and the first log data can be determined as high-frequency intent data.
若第一日志数据的意图分类结果的置信度小于等于预设置信度阈值,说明第一日志数据的意图分类结果的准确性较低,可以认为第一日志数据不属于高频意图数据。通过设置置信度阈值,可以从日志数据中较为精确地筛选出高频意图数据。If the confidence of the intent classification result of the first log data is less than or equal to the preset confidence threshold, it means that the accuracy of the intent classification result of the first log data is low, and it can be considered that the first log data does not belong to high-frequency intent data. By setting the confidence threshold, high-frequency intent data can be more accurately screened out from the log data.
将日志数据中的高频意图数据删除,得到低频意图数据。需要注意的是,此处的低频意图数据不是出现频度较低的意图数据,而是日志数据中除高频意图数据外的意图数据。The high-frequency intent data in the log data is deleted to obtain the low-frequency intent data. It should be noted that the low-frequency intent data here is not the intent data with a low frequency of occurrence, but the intent data in the log data except the high-frequency intent data.
例如,日志数据包括5条记录:记录1、记录2、记录3、记录4以及记录5,其中,记录1、记录3以及记录4为高频意图数据,则将日志数据中的记录1、记录3以及记录4删除,得到记录2和记录5,将记录2和记录5确定为低频意图数据。For example, the log data includes 5 records: record 1, record 2, record 3, record 4 and record 5, among which record 1, record 3 and record 4 are high-frequency intent data; record 1, record 3 and record 4 are then deleted from the log data, leaving record 2 and record 5, and record 2 and record 5 are determined as low-frequency intent data.
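基于置信度阈值的筛选可以用如下示意代码表示(假设性示例:字段名与阈值取值均为演示假设):The confidence-threshold screening can be sketched as follows (a hypothetical example; the field names and threshold value are demonstration assumptions):

```python
def split_by_confidence(records, threshold):
    """意图分类结果为预设高频意图且置信度大于阈值的记录作为高频意图数据删除,
    其余记录保留为低频意图数据。
    Records predicted as a preset high-frequency intent with confidence above
    the threshold are removed as high-frequency data; the rest are kept."""
    low_freq = []
    for rec in records:
        if rec["is_high_freq"] and rec["confidence"] > threshold:
            continue  # 高频意图数据,从日志数据中删除 / drop high-frequency data
        low_freq.append(rec)
    return low_freq

log_data = [
    {"id": 1, "is_high_freq": True,  "confidence": 0.97},
    {"id": 2, "is_high_freq": False, "confidence": 0.40},
    {"id": 3, "is_high_freq": True,  "confidence": 0.95},
    {"id": 4, "is_high_freq": True,  "confidence": 0.99},
    {"id": 5, "is_high_freq": False, "confidence": 0.10},
]
low_freq = split_by_confidence(log_data, threshold=0.9)  # 记录2和记录5保留
```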
步骤S106,将低频意图数据、预设意图类别的标准文本输入文本对比模型进行相似度预测处理,得到低频意图数据对应的文本对比结果;文本对比模型为基于训练样本集对初始文本对比模型进行训练所得到的模型;训练样本集基于低频意图数据构建。Step S106, input the low-frequency intent data and the standard text of the preset intent category into the text comparison model for similarity prediction processing to obtain the text comparison result corresponding to the low-frequency intent data; the text comparison model is a model obtained by training the initial text comparison model based on the training sample set; the training sample set is constructed based on the low-frequency intent data.
在步骤S106执行之前,可以将低频意图数据输入初始文本对比模型,对初始文本对比模型进行训练,得到文本对比模型。Before executing step S106, low-frequency intent data may be input into an initial text comparison model, and the initial text comparison model may be trained to obtain a text comparison model.
初始文本对比模型可以是各个待训练参数均取初始值的待训练模型。The initial text comparison model may be a model to be trained in which all parameters to be trained take initial values.
文本对比模型可以是对比无监督学习模型。The text contrast model can be a contrastive unsupervised learning model.
日志数据中的低频意图数据是无法利用现有模型标签的。此处可以将低频意图数据当无标签数据处理。The low-frequency intent data in the log data cannot use the existing model labels. Here, the low-frequency intent data can be treated as unlabeled data.
自监督学习(Self-supervised Learning)属于无监督学习范式的一种,特点是不需要人工标注的类别标签信息,直接利用数据本身作为监督信息,来学习样本数据的特征表达,并用于下游任务。Self-supervised learning is a type of unsupervised learning paradigm. It does not require manually labeled category label information, but directly uses the data itself as supervision information to learn the feature expression of sample data and use it for downstream tasks.
对比学习(Contrastive Learning)是自监督学习的一种,是通过将数据分别与正例样本和负例样本在特征空间进行对比,来学习样本的特征表示。其训练的核心是拉近相似样本的距离,拉远不相干样本的距离。Contrastive Learning is a type of self-supervised learning that learns the feature representation of samples by comparing data with positive samples and negative samples in feature space. The core of its training is to shorten the distance between similar samples and increase the distance between irrelevant samples.
对比学习的主要思想是拉近相似样本,推开非相似样本,即构建相似样本对(xi,xi+)和非相似样本对(xi,xj+)。The main idea of contrastive learning is to bring similar samples closer and push dissimilar samples apart, that is, to construct similar sample pairs (xi, xi+) and dissimilar sample pairs (xi, xj+).
可以将低频意图数据确定为无标签样本;将无标签样本输入初始文本对比模型,对初始文本对比模型进行迭代训练,得到文本对比模型。Low-frequency intent data can be determined as unlabeled samples; the unlabeled samples are input into the initial text comparison model, and the initial text comparison model is iteratively trained to obtain a text comparison model.
在一种实施方式中,初始文本对比模型包括依次连接的编码器和相似度预测模块;编码器的输出为相似度预测模块的输入;编码器用于根据低频意图数据进行编码处理,得到低频意图数据对应的相似样本对和非相似样本对; 相似度预测模块用于根据低频意图数据对应的相似样本对和非相似样本对进行迭代训练。In one embodiment, the initial text comparison model includes an encoder and a similarity prediction module connected in sequence; the output of the encoder is the input of the similarity prediction module; the encoder is used to perform encoding processing according to the low-frequency intent data to obtain similar sample pairs and non-similar sample pairs corresponding to the low-frequency intent data; The similarity prediction module is used to perform iterative training based on similar sample pairs and non-similar sample pairs corresponding to low-frequency intent data.
初始文本对比模型包括依次连接的编码器和相似度预测模块;编码器的输出为相似度预测模块的输入;样本生成方法还包括:编码器根据低频意图数据进行编码处理,得到低频意图数据对应的相似样本对和非相似样本对;相似度预测模块根据低频意图数据对应的相似样本对和非相似样本对进行迭代训练。The initial text comparison model includes an encoder and a similarity prediction module connected in sequence; the output of the encoder is the input of the similarity prediction module; the sample generation method also includes: the encoder performs encoding processing based on the low-frequency intent data to obtain similar sample pairs and dissimilar sample pairs corresponding to the low-frequency intent data; the similarity prediction module performs iterative training based on the similar sample pairs and dissimilar sample pairs corresponding to the low-frequency intent data.
在构建相似样本对和非相似样本对的时候,采用这样一种策略:利用编码器的随机失活(dropout)机制,基于目标记录中的问题文本,生成该问题文本对应的两个文本,该两个文本的语义完全相同,编码形式不同。进而可以将该两个文本确定为目标记录对应的相似文本对。另外,基于低频意图数据中除目标记录外的每条记录中的问题文本,生成该问题文本对应的一个文本。进而可以基于每条记录中的问题文本对应的一个文本与前述的目标记录中的问题文本对应的两个文本中的一者,生成多个非相似文本对。When constructing similar and non-similar sample pairs, the following strategy is adopted: using the random deactivation (dropout) mechanism of the encoder, two texts corresponding to the question text in the target record are generated; the two texts are semantically identical but differ in encoded form. The two texts can then be determined as the similar text pair corresponding to the target record. In addition, based on the question text in each record of the low-frequency intent data other than the target record, one text corresponding to that question text is generated. Multiple non-similar text pairs can then be generated from the one text corresponding to the question text in each such record and one of the two texts corresponding to the question text in the aforementioned target record.
以低频意图数据所包括的记录数量batchsize=64为例,一个batchsize中有2个相似样本,62个非相似样本,形成1个相似样本对,62个非相似样本对。Taking batchsize=64, the number of records included in the low-frequency intent data, as an example, there are 2 similar samples and 62 non-similar samples in one batchsize, forming 1 similar sample pair and 62 non-similar sample pairs.
进而根据低频意图数据对应的相似样本对和非相似样本对,可以对相似度预测模块进行迭代训练。Then, the similarity prediction module can be iteratively trained based on the similar sample pairs and non-similar sample pairs corresponding to the low-frequency intent data.
示例性地,损失函数如下所示。Exemplarily, the loss function is as follows.

l_i = -log( e^{Sim(h_i, h_i^+)/τ} / Σ_{j=1}^{N} e^{Sim(h_i, h_j^+)/τ} )

其中,li用于表示损失函数值。τ用于表示softmax的温度超参,仅用来控制预测的随机性。hi、hi+和hj+分别是相似样本对(xi,xi+)和非相似样本对(xi,xj+)中的xi、xi+以及xj+的编码表示。N可以是预设数值。i和j的取值可以基于相似样本对和非相似样本对的角标确定。Where l_i denotes the loss function value; τ denotes the temperature hyperparameter of the softmax, used only to control the randomness of the prediction; h_i, h_i+ and h_j+ are the encoded representations of x_i, x_i+ and x_j+ in the similar sample pair (xi, xi+) and the non-similar sample pair (xi, xj+), respectively; N may be a preset value; and the values of i and j may be determined based on the subscripts of the similar and non-similar sample pairs.
Sim(h1,h2)可以用于表示两个向量h1与h2的相似度。相似度可以采用余弦相似来计算。Sim(h 1 , h 2 ) can be used to represent the similarity between two vectors h 1 and h 2. The similarity can be calculated using cosine similarity.
在每次迭代训练之后都可以计算得到该次训练对应的损失函数值,若损失函数值小于等于预设阈值,则停止训练,得到训练完成的相似度预测模块,即得到训练完成的文本对比模型。After each iterative training, the loss function value corresponding to the training can be calculated. If the loss function value is less than or equal to the preset threshold, the training is stopped to obtain a trained similarity prediction module, that is, a trained text comparison model.
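上述损失的计算可以用如下示意代码表示(假设性示例:相似度采用余弦相似度,向量与温度取值仅为演示,分母为相似样本与各非相似样本的指数项之和):The computation of the above loss can be sketched as follows (a hypothetical example: cosine similarity is used as Sim, the vectors and temperature are demonstration values, and the denominator sums the exponential terms of the similar sample and all non-similar samples):

```python
import math

def cosine(u, v):
    """两个向量的余弦相似度 Sim(h1, h2)。Cosine similarity of two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def contrastive_loss(h_i, h_i_pos, negatives, tau=0.05):
    """l_i = -log( e^{Sim(h_i,h_i+)/tau} / (e^{Sim(h_i,h_i+)/tau} + Σ_j e^{Sim(h_i,h_j+)/tau}) )
    拉近相似样本对的距离,拉远非相似样本对的距离。"""
    pos = math.exp(cosine(h_i, h_i_pos) / tau)
    denom = pos + sum(math.exp(cosine(h_i, h_j) / tau) for h_j in negatives)
    return -math.log(pos / denom)

# 相似样本编码一致、非相似样本正交时,损失接近 0。
low_loss = contrastive_loss([1.0, 0.0], [1.0, 0.0], [[0.0, 1.0]])
# 相似样本正交、非相似样本反而一致时,损失很大。
high_loss = contrastive_loss([1.0, 0.0], [0.0, 1.0], [[1.0, 0.0]])
```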
通过以上的方法,即可得到一个训练完成的文本对比模型,其意义在于可以基于无标签样本进行模型训练,使得训练完成的文本对比模型具有判断两个文本是否是相似的能力,由于日志数据是一种可以随着时间变化而不断扩充的历史数据,在时间跨度足够长的情况下,日志数据的数据量很大且易获得,故用于训练文本对比模型的低频意图数据的数据量很大,采用无监督学习亦可达到相对优良的效果。Through the above method, a trained text comparison model can be obtained. Its significance lies in that the model training can be based on unlabeled samples, so that the trained text comparison model has the ability to judge whether two texts are similar. Since log data is a kind of historical data that can be continuously expanded with time, when the time span is long enough, the amount of log data is large and easy to obtain. Therefore, the amount of low-frequency intent data used to train the text comparison model is large, and unsupervised learning can also achieve relatively good results.
在一种实施方式中,低频意图数据包括目标文本和非目标文本;编码器具体用于:根据目标文本进行编码处理,得到目标文本对应的目标编码结果和相似编码结果,以及,根据非目标文本进行编码处理,得到非目标文本对应的编码结果;将目标文本对应的目标编码结果和相似编码结果确定为低频意图数据对应的相似样本对;将目标文本对应的目标编码结果和非目标文本对应的编码结果确定为低频意图数据对应的非相似样本对。In one embodiment, the low-frequency intent data includes target text and non-target text; the encoder is specifically used to: perform encoding processing according to the target text to obtain a target encoding result and a similar encoding result corresponding to the target text, and perform encoding processing according to the non-target text to obtain an encoding result corresponding to the non-target text; determine the target encoding result and the similar encoding result corresponding to the target text as similar sample pairs corresponding to the low-frequency intent data; determine the target encoding result corresponding to the target text and the encoding result corresponding to the non-target text as non-similar sample pairs corresponding to the low-frequency intent data.
低频意图数据包括目标文本和非目标文本;样本生成方法还包括:编码器根据目标文本进行编码处理,得到目标文本对应的目标编码结果和相似编码结果,以及,根据非目标文本进行编码处理,得到非目标文本对应的编码结果;将目标文本对应的目标编码结果和相似编码结果确定为低频意图数据对应的相似样本对;将目标文本对应的目标编码结果和非目标文本对应的编码结果确定为低频意图数据对应的非相似样本对。The low-frequency intent data includes target text and non-target text; the sample generation method also includes: the encoder performs encoding processing according to the target text to obtain a target encoding result and a similar encoding result corresponding to the target text, and performs encoding processing according to the non-target text to obtain an encoding result corresponding to the non-target text; the target encoding result and the similar encoding result corresponding to the target text are determined as similar sample pairs corresponding to the low-frequency intent data; the target encoding result corresponding to the target text and the encoding result corresponding to the non-target text are determined as non-similar sample pairs corresponding to the low-frequency intent data.
低频意图数据包括目标文本和非目标文本。目标文本的数量可以是一个。 非目标文本的数量可以是一个也可以是多个。例如,低频意图数据包括记录1、记录2、记录3、记录4以及记录5。其中,记录1包括目标文本和目标文本的意图识别结果;记录2包括非目标文本1和非目标文本1的意图识别结果;记录3包括非目标文本2和非目标文本2的意图识别结果;记录4包括非目标文本3和非目标文本3的意图识别结果;记录5包括非目标文本4和非目标文本4的意图识别结果。The low-frequency intent data includes target text and non-target text. The number of target texts can be one. The number of non-target texts can be one or more. For example, the low-frequency intent data includes record 1, record 2, record 3, record 4, and record 5. Among them, record 1 includes the target text and the intent recognition result of the target text; record 2 includes non-target text 1 and the intent recognition result of non-target text 1; record 3 includes non-target text 2 and the intent recognition result of non-target text 2; record 4 includes non-target text 3 and the intent recognition result of non-target text 3; record 5 includes non-target text 4 and the intent recognition result of non-target text 4.
通过编码器可以对输入的低频意图数据中的记录1所包括的目标文本进行编码处理,得到目标文本对应的目标编码结果和相似编码结果,以及,同一时间,可以通过编码器对输入的低频意图数据中的记录2-5所包括的非目标文本1-4进行编码处理,得到非目标文本1-4对应的编码结果。The target text included in record 1 in the input low-frequency intent data can be encoded by the encoder to obtain the target encoding result and similar encoding result corresponding to the target text. At the same time, the non-target texts 1-4 included in records 2-5 in the input low-frequency intent data can be encoded by the encoder to obtain the encoding results corresponding to the non-target texts 1-4.
接着,可以将记录1对应的目标编码结果和相似编码结果确定为低频意图数据对应的相似样本对,将记录1对应的目标编码结果和记录2对应的编码结果确定为一个非相似样本对,将记录1对应的目标编码结果和记录3对应的编码结果确定为一个非相似样本对,将记录1对应的目标编码结果和记录4对应的编码结果确定为一个非相似样本对,将记录1对应的目标编码结果和记录5对应的编码结果确定为一个非相似样本对。综上共生成了一个相似样本对和4个非相似样本对。Next, the target encoding result and the similar encoding result corresponding to record 1 can be determined as similar sample pairs corresponding to low-frequency intent data, the target encoding result corresponding to record 1 and the encoding result corresponding to record 2 can be determined as a non-similar sample pair, the target encoding result corresponding to record 1 and the encoding result corresponding to record 3 can be determined as a non-similar sample pair, the target encoding result corresponding to record 1 and the encoding result corresponding to record 4 can be determined as a non-similar sample pair, and the target encoding result corresponding to record 1 and the encoding result corresponding to record 5 can be determined as a non-similar sample pair. In summary, a similar sample pair and four non-similar sample pairs are generated.
在一种实施方式中,编码器包括依次连接的注意力层和全连接层;注意力层的输出为全连接层的输入;注意力层用于根据预设的第一随机失活概率和低频意图数据进行第一编码处理,得到中间编码数据;全连接层用于根据预设的第二随机失活概率和中间编码数据进行转换处理,得到低频意图数据对应的相似样本对和非相似样本对。In one embodiment, the encoder includes an attention layer and a fully connected layer connected in sequence; the output of the attention layer is the input of the fully connected layer; the attention layer is used to perform a first encoding process according to a preset first random inactivation probability and low-frequency intent data to obtain intermediate encoded data; the fully connected layer is used to perform a conversion process according to a preset second random inactivation probability and the intermediate encoded data to obtain similar sample pairs and non-similar sample pairs corresponding to the low-frequency intent data.
所述编码器包括依次连接的注意力层和全连接层;所述注意力层的输出为所述全连接层的输入;样本生成方法还包括:所述注意力层根据预设的第一随机失活概率和所述低频意图数据进行第一编码处理,得到中间编码数据;所述全连接层根据预设的第二随机失活概率和所述中间编码数据进行转换处理,得到所述低频意图数据对应的相似样本对和非相似样本对。The encoder includes an attention layer and a fully connected layer connected in sequence; the output of the attention layer is the input of the fully connected layer; the sample generation method further includes: the attention layer performs first encoding processing according to a preset first random deactivation probability and the low-frequency intent data to obtain intermediate encoded data; and the fully connected layer performs conversion processing according to a preset second random deactivation probability and the intermediate encoded data to obtain similar sample pairs and non-similar sample pairs corresponding to the low-frequency intent data.
具体实施时,可以预先配置注意力层的第一随机失活概率,以及,可以预先配置全连接层的第二随机失活概率。In a specific implementation, the first random deactivation probability of the attention layer can be pre-configured, and the second random deactivation probability of the fully connected layer can be pre-configured.
第一随机失活概率将在transformer的每一层都产生作用,从而得到同一份文本的两个不同的语义表示:将同一份文本输入两次,则得到由两个语义完全相同、编码不同的样本构成的相似样本对。The first random deactivation probability takes effect in every layer of the transformer, thereby producing two different semantic representations of the same text: inputting the same text twice yields a similar sample pair consisting of two samples that are semantically identical but differently encoded.
另外,因为相似样本对的长度必定一致,而非相似样本对的长度却不相同,为了消除模型将文本长度作为数据特征从而带来的影响,在训练时采取标点符号填充的方式进行长度扩充,因为逗号的语义特征最小,近乎忽略不计,所以采用将逗号随机插入的方法加入到相对较短的文本中,弥补长度差距带来的影响。In addition, because the lengths of similar sample pairs must be consistent, while the lengths of non-similar sample pairs are different, in order to eliminate the impact of the model using text length as a data feature, punctuation padding is used to expand the length during training. Because the semantic features of commas are the smallest and almost negligible, commas are randomly inserted into relatively short texts to make up for the impact of the length difference.
根据以上内容,可知本申请的一些实施方式中,样本生成方法还包括:在非相似样本对的文本长度不同的情况下,通过标点符号对非相似样本对中文本长度较短的文本进行长度扩充处理。Based on the above, it can be known that in some embodiments of the present application, the sample generation method further includes: when the text lengths of non-similar sample pairs are different, the length of the shorter text in the non-similar sample pairs is extended by punctuation marks.
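逗号填充的长度扩充可以用如下示意代码表示(假设性示例:以半角逗号作为填充符号、以固定随机种子演示随机插入,具体符号与插入策略并非本申请的限定):The comma-padding length expansion can be sketched as follows (a hypothetical example: an ASCII comma serves as the padding symbol and a fixed random seed demonstrates the random insertion; the exact symbol and insertion strategy are not limiting):

```python
import random

def pad_with_commas(text, target_len, seed=0):
    """将逗号随机插入较短文本,直至达到目标长度。
    Randomly insert commas into the shorter text until it reaches target_len."""
    rng = random.Random(seed)
    chars = list(text)
    while len(chars) < target_len:
        # randint 的上下界均含端点,可插入到任意字符间隙(含首尾)。
        chars.insert(rng.randint(0, len(chars)), ",")
    return "".join(chars)

padded = pad_with_commas("如何还款", 8)  # 将4字文本扩充到与较长文本等长
```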
在获得文本对比模型后,先确定需要召回的预设意图类别,该预设意图类别可以是一种或多种低频意图类别。After obtaining the text comparison model, first determine the preset intent category that needs to be recalled. The preset intent category may be one or more low-frequency intent categories.
在一种实施方式中,低频意图数据包括多个低频意图文本;文本对比模型,具体用于:将每个低频意图文本和预设意图类别的标准文本确定为每个低频意图文本对应的相似样本对;对每个低频意图文本对应的相似样本对进行相似度预测处理,得到每个低频意图文本的相似度评分;将每个低频意图文本的相似度评分确定为低频意图数据对应的文本对比结果。In one embodiment, the low-frequency intent data includes multiple low-frequency intent texts; the text comparison model is specifically used to: determine each low-frequency intent text and a standard text of a preset intent category as a similar sample pair corresponding to each low-frequency intent text; perform similarity prediction processing on the similar sample pairs corresponding to each low-frequency intent text to obtain a similarity score for each low-frequency intent text; and determine the similarity score of each low-frequency intent text as the text comparison result corresponding to the low-frequency intent data.
低频意图数据包括多个低频意图文本;样本生成方法还包括:文本对比模型将每个低频意图文本和预设意图类别的标准文本确定为每个低频意图文本对应的相似样本对;对每个低频意图文本对应的相似样本对进行相似度预测处理,得到每个低频意图文本的相似度评分;将每个低频意图文本的相似度评分确定为低频意图数据对应的文本对比结果。 The low-frequency intent data includes multiple low-frequency intent texts; the sample generation method also includes: the text comparison model determines each low-frequency intent text and the standard text of the preset intent category as a similar sample pair corresponding to each low-frequency intent text; performs similarity prediction processing on the similar sample pairs corresponding to each low-frequency intent text to obtain a similarity score for each low-frequency intent text; and determines the similarity score of each low-frequency intent text as the text comparison result corresponding to the low-frequency intent data.
将每个低频意图文本和预设意图类别的标准文本确定为每个低频意图文本对应的相似样本对。对于每个预设意图类别,将其中的一个或多个标准问作为文本对比模型输入的xi文本,遍历低频意图数据作为xi+,以形成(xi,xi+)数据对进行预测。预测结果为一个0-1的相似度打分。Each low-frequency intent text and the standard text of the preset intent category are determined as similar sample pairs corresponding to each low-frequency intent text. For each preset intent category, one or more standard questions are used as the xi text input to the text comparison model, and the low-frequency intent data is traversed as xi+ to form (xi, xi+) data pairs for prediction. The prediction result is a similarity score of 0-1.
对每个低频意图文本对应的相似样本对进行相似度预测处理,得到每个低频意图文本的相似度评分;将每个低频意图文本的相似度评分确定为低频意图数据对应的文本对比结果。Perform similarity prediction processing on similar sample pairs corresponding to each low-frequency intent text to obtain a similarity score for each low-frequency intent text; determine the similarity score for each low-frequency intent text as a text comparison result corresponding to the low-frequency intent data.
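上述逐对打分的流程可以用如下示意代码表示(假设性示例:以字符重合率这一简单函数代替训练完成的文本对比模型,仅用于演示数据对的组织方式):The pair-by-pair scoring flow above can be sketched as follows (a hypothetical example: a simple character-overlap ratio stands in for the trained text comparison model, only to demonstrate how the data pairs are organized):

```python
def score_against_standard(predict_similarity, standard_text, low_freq_texts):
    """以标准问为 xi,遍历低频意图文本作为 xi+,返回每对的 0-1 相似度评分。
    Pair the standard question (xi) with each low-frequency text (xi+) and
    return a 0-1 similarity score per pair."""
    return {x: predict_similarity(standard_text, x) for x in low_freq_texts}

def char_overlap(a, b):
    # 字符重合率,仅作为文本对比模型的演示替代。
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb)

scores = score_against_standard(char_overlap, "如何提前还款", ["怎么提前还款", "查询账单"])
```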
步骤S110,根据文本对比结果与预设相似度阈值,生成低频意图样本。Step S110, generating a low-frequency intent sample according to the text comparison result and a preset similarity threshold.
预设相似度阈值可以是一个预设数值,预设相似度阈值可以基于预先配置的阈值变化规则进行一次或多次阈值更新。The preset similarity threshold may be a preset value, and the preset similarity threshold may be updated once or multiple times based on a pre-configured threshold change rule.
例如,预设相似度阈值可以是95%,阈值变化规则可以是每次进行阈值更新时,对当前的相似度阈值减去5%,得到更新后的相似度阈值。For example, the preset similarity threshold may be 95%, and the threshold change rule may be that each time the threshold is updated, 5% is subtracted from the current similarity threshold to obtain an updated similarity threshold.
根据文本对比结果与预设相似度阈值,生成低频意图样本,可以是将相似度评分大于预设相似度阈值的低频意图文本确定为低频意图样本,也可以是将相似度评分大于预设相似度阈值的低频意图文本确定为相似样本数据,对相似样本数据进行质检,将质检通过的相似样本数据确定为低频意图样本。相似样本数据用于表示需要进行质检以确定其是否为低频意图样本的候选样本数据。质检方式可以是人工质检,也可以是按照预设质检规则进行质检处理。Generating low-frequency intent samples according to the text comparison results and the preset similarity threshold may be determining the low-frequency intent texts whose similarity score is greater than the preset similarity threshold as low-frequency intent samples, or determining the low-frequency intent texts whose similarity score is greater than the preset similarity threshold as similar sample data, performing quality inspection on the similar sample data, and determining the similar sample data that passes the quality inspection as low-frequency intent samples. The similar sample data represents candidate sample data that needs quality inspection to determine whether it is a low-frequency intent sample. The quality inspection may be manual, or may be performed according to preset quality inspection rules.
在一种实施方式中,根据文本对比结果与预设相似度阈值,生成低频意图样本,包括:根据预设相似度阈值与文本对比结果的比较结果,确定预设相似度阈值对应的相似样本数据的数量;若低频意图数据对应的相似样本数据的数量小于预设数量阈值,则将当前的相似度阈值减去预设降低值以得到更新的相似度阈值,以及,根据更新的相似度阈值与文本对比结果的比较结果,确定更新的相似度阈值对应的相似样本数据的更新后的数量,根据所述更新后的数量与所述预设数量阈值重复上述操作,直至更新的相似度阈值满足预设停止条件;预设停止条件为样本数量大于等于预设数量阈值;样本数量为预设相似度阈值对应的相似样本数据的数量与各个更新的相似度阈值对应的相似样本数据的数量之和;将预设相似度阈值对应的相似样本数据和各个更新的相似度阈值对应的相似样本数据中的每个样本数据确定为每个样本数据对应的低频意图样本。In one embodiment, generating a low-frequency intent sample according to the text comparison result and the preset similarity threshold includes: determining, according to a comparison result between the preset similarity threshold and the text comparison result, the number of similar sample data corresponding to the preset similarity threshold; if the number of similar sample data corresponding to the low-frequency intent data is less than a preset number threshold, subtracting a preset reduction value from the current similarity threshold to obtain an updated similarity threshold, determining, according to a comparison result between the updated similarity threshold and the text comparison result, an updated number of similar sample data corresponding to the updated similarity threshold, and repeating the above operations according to the updated number and the preset number threshold until the updated similarity threshold satisfies a preset stop condition; the preset stop condition is that the number of samples is greater than or equal to the preset number threshold; the number of samples is the sum of the number of similar sample data corresponding to the preset similarity threshold and the numbers of similar sample data corresponding to the respective updated similarity thresholds; and determining each sample data among the similar sample data corresponding to the preset similarity threshold and the similar sample data corresponding to the respective updated similarity thresholds as a low-frequency intent sample.
根据文本对比结果与预设相似度阈值,生成低频意图样本,包括:根据预设相似度阈值与文本对比结果的比较结果,确定预设相似度阈值对应的相似样本数据的数量;若低频意图数据对应的相似样本数据的数量小于预设数量阈值,则将当前的相似度阈值减去预设降低值以得到更新的相似度阈值,以及,根据更新的相似度阈值与文本对比结果的比较结果,确定更新的相似度阈值对应的相似样本数据的更新的数量,若样本数量大于等于预设数量阈值,则确定预设相似度阈值对应的相似样本数据的数量为最终的更新的数量。Generating a low-frequency intent sample according to the text comparison result and the preset similarity threshold includes: determining, according to a comparison result between the preset similarity threshold and the text comparison result, the number of similar sample data corresponding to the preset similarity threshold; if the number of similar sample data corresponding to the low-frequency intent data is less than the preset number threshold, subtracting a preset reduction value from the current similarity threshold to obtain an updated similarity threshold, and determining, according to a comparison result between the updated similarity threshold and the text comparison result, an updated number of similar sample data corresponding to the updated similarity threshold; if the number of samples is greater than or equal to the preset number threshold, the number of similar sample data corresponding to the preset similarity threshold is determined as the final updated number.
例如,预设数量阈值为100,预设相似度阈值的初始值为99%,根据99%与文本对比结果的比较结果,确定99%对应的相似样本数据的数量为10个,小于预设数量阈值100,则进行一次阈值更新:当前的预设相似度阈值为99%,减去预设降低值5%以得到更新后的相似度阈值94%,以及,根据94%与文本对比结果的比较结果,确定94%对应的相似样本数据的数量为30个,10+30=40,40小于预设数量阈值100,则进行一次阈值更新;当前的相似度阈值为94%,减去预设降低值5%以得到更新后的相似度阈值89%,以及,根据89%与文本对比结果的比较结果,确定89%对应的相似样本数据的数量为70个,10+30+70=110>100,满足预设停止条件,不再进行阈值更新。进而,可以将该110个相似样本数据中的每个样本数据确定为一个低频意图样本。For example, the preset number threshold is 100, the initial value of the preset similarity threshold is 99%, and according to the comparison result between 99% and the text comparison result, the number of similar sample data corresponding to 99% is determined to be 10, which is less than the preset number threshold 100, and then a threshold update is performed: the current preset similarity threshold is 99%, minus the preset reduction value of 5% to obtain the updated similarity threshold 94%, and according to the comparison result between 94% and the text comparison result, the number of similar sample data corresponding to 94% is determined to be 30, 10+30=40, 40 is less than the preset number threshold 100, and then a threshold update is performed; the current similarity threshold is 94%, minus the preset reduction value of 5% to obtain the updated similarity threshold 89%, and according to the comparison result between 89% and the text comparison result, the number of similar sample data corresponding to 89% is determined to be 70, 10+30+70=110>100, which meets the preset stop condition and no longer performs the threshold update. Furthermore, each sample data in the 110 similar sample data can be determined as a low-frequency intention sample.
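The worked example above (99% → 94% → 89%, accumulating 10, then 40, then 110 samples) can be reproduced with a short loop. This is a minimal sketch, not the disclosed implementation: all function and parameter names are hypothetical, and scores are expressed as integer percentages to avoid floating-point drift.

```python
def recall_by_threshold_decay(scores, start_threshold, reduction, count_threshold):
    """Recalls similar sample data by gradually lowering the similarity
    threshold. scores holds one similarity score per low-frequency intent
    text, as an integer percentage. Returns (recalled indices, final threshold).
    """
    threshold = start_threshold
    recalled = set()
    while threshold >= 0:
        # Cumulative pool: every text whose score reaches the current threshold
        # (texts recalled under a higher threshold remain in the pool).
        recalled.update(i for i, s in enumerate(scores) if s >= threshold)
        if len(recalled) >= count_threshold:
            break  # preset stop condition: sample count >= preset number threshold
        threshold -= reduction  # subtract the preset reduction value; widen the recall
    return sorted(recalled), threshold
```

With 10 texts scoring 99, 30 scoring 95, and 70 scoring 90, a start threshold of 99, a reduction of 5, and a count threshold of 100, the loop stops at threshold 89 with 110 recalled samples, matching the example.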
预设相似度阈值的初始值可以取较高的数值,例如,95%。初始时将阈值设高,严格地召回候选数据并进行质检,将合格的数据作为此低频意图的标准问对应的相似问数据。当高阈值下的相似问数据被全部打标分析后,逐步降低预设相似度阈值,逐步召回新的候选数据供进行质检,并且排除掉已经质检过的数据;重复以上工作,获取低频意图的相似问数据。The initial value of the preset similarity threshold can be relatively high, for example, 95%. Initially, the threshold is set high, candidate data is strictly recalled and quality-checked, and the qualified data is used as the similar-question data corresponding to the standard question of this low-frequency intent. After all similar-question data under the high threshold has been labeled and analyzed, the preset similarity threshold is gradually lowered so that new candidate data is gradually recalled for quality checking, excluding data that has already been checked; this process is repeated to obtain similar-question data for the low-frequency intent.
通过重复执行将当前的相似度阈值减去预设降低值以得到更新的相似度阈值,以及,根据更新的相似度阈值与文本对比结果的比较结果,确定更新的相似度阈值对应的相似样本数据的数量的操作,直至更新的相似度阈值满足预设停止条件,可以减少质检的工作量,提高质检效率。By repeatedly performing the operation of subtracting the preset reduction value from the current similarity threshold to obtain an updated similarity threshold and determining, based on a comparison result between the updated similarity threshold and the text comparison result, the number of similar sample data corresponding to the updated similarity threshold, until the updated similarity threshold satisfies the preset stop condition, the workload of quality checking can be reduced and its efficiency improved.
由于日志数据可以随着时间变化不断扩增,则低频意图样本的数量也可以随着日志数据的扩增而不断增加。基于海量的日志数据和文本对比模型可以累计得到预设意图类别的大量样本,该预设意图类别可以是一种低频意图的意图类别。在预设意图类别的低频意图样本的数量足够多的情况下,可以基于该预设意图类别的低频意图样本对初始意图识别模型进行训练,得到意图识别模型,且该意图识别模型对于预设意图类别的低频意图的识别准确率较高。Since log data can continue to increase over time, the number of low-frequency intent samples can also continue to increase with the expansion of log data. Based on massive log data and text comparison models, a large number of samples of preset intent categories can be accumulated, and the preset intent category can be an intent category of low-frequency intent. When the number of low-frequency intent samples of the preset intent category is large enough, the initial intent recognition model can be trained based on the low-frequency intent samples of the preset intent category to obtain an intent recognition model, and the intent recognition model has a high recognition accuracy for low-frequency intents of the preset intent category.
在机器人自动应答的场景中,机器人坐席可以基于训练后的意图识别模型对文本进行意图识别,该意图识别模型可以是利用图1实施例所提供的样本生成方法生成的低频意图样本对初始意图识别模型进行训练之后所得到的意图识别模型,由于低频意图样本的数量足够多,该意图识别模型的训练效果较好,机器人坐席利用该意图识别模型可以准确地识别用户的低频意图,进而可以基于准确识别的低频意图对用户做出恰当的应答,提高了用户的满意度。In the scenario where the robot automatically answers, the robot agent can recognize the intent of the text based on the trained intent recognition model. The intent recognition model can be an intent recognition model obtained after training the initial intent recognition model with low-frequency intent samples generated by the sample generation method provided in the embodiment of Figure 1. Since the number of low-frequency intent samples is large enough, the training effect of the intent recognition model is good. The robot agent can use the intent recognition model to accurately identify the user's low-frequency intentions, and then make appropriate responses to the user based on the accurately identified low-frequency intentions, thereby improving user satisfaction.
在如图1所示的实施例中,首先,获取待处理的日志数据;日志数据包括文本和文本的意图识别结果;其次,根据文本的意图识别结果,对日志数据进行数据筛选处理,得到低频意图数据;然后,将低频意图数据、预设意图类别的标准文本输入文本对比模型进行相似度预测处理,得到低频意图数据对应的文本对比结果;文本对比模型为基于训练样本集对初始文本对比模型进行训练所得到的模型;训练样本集基于低频意图数据构建;最后,根据文本对比结果与预设相似度阈值,生成低频意图样本。日志数据是一种随着时间变化不断增长的历史数据。即便低频意图数据在日志数据中的出现频率较低,在日志数据所对应的时间跨度足够长的情况下,可以从日志数据中筛选得到累计的大量低频意图数据,基于该大量低频意图数据可以生成数量足够用于训练初始文本对比模型的训练数据,且训练数据的数量可以随着日志数据的时间跨度增长而不断扩增。因此,在训练数据的数量足够多的情况下,通过训练后得到的文本对比模型进行相似度预测时的预测结果较为准确,进而,通过文本对比模型对低频意图数据和预设意图类别的标准文本进行相似度预测处理,可以确定低频意图数据中与预设意图类别的标准文本相似度较高的低频意图样本,在获取的日志数据随着时间变化不断增加的情况下,可以利用不断增长的日志数据和文本对比模型累计得到大量预设意图类别的低频意图样本,进而满足低频意图样本对应的意图识别模型的训练需求,提高低频意图的识别准确性。In the embodiment shown in FIG. 1, first, log data to be processed is obtained; the log data includes text and intent recognition results of the text; secondly, according to the intent recognition results of the text, the log data is screened to obtain low-frequency intent data; then, the low-frequency intent data and standard text of a preset intent category are input into a text comparison model for similarity prediction processing to obtain text comparison results corresponding to the low-frequency intent data; the text comparison model is a model obtained by training an initial text comparison model based on a training sample set; the training sample set is constructed based on the low-frequency intent data; finally, the low-frequency intent samples are generated according to the text comparison results and the preset similarity threshold. Log data is a kind of historical data that grows continuously over time. Even if the low-frequency intent data appears less frequently in the log data, if the time span corresponding to the log data is long enough, a large amount of accumulated low-frequency intent data can be screened from the log data.
Based on this large amount of low-frequency intent data, a sufficient amount of training data can be generated for training the initial text comparison model, and the amount of training data can be continuously expanded as the time span of the log data increases. Therefore, when the amount of training data is large enough, the prediction results of the similarity prediction performed by the text comparison model obtained after training are relatively accurate. Furthermore, by performing similarity prediction processing on the low-frequency intent data and the standard text of the preset intent category through the text comparison model, the low-frequency intent samples with high similarity to the standard text of the preset intent category in the low-frequency intent data can be determined. When the acquired log data increases with time, a large number of low-frequency intent samples of the preset intent category can be accumulated by using the growing log data and the text comparison model, thereby meeting the training requirements of the intent recognition model corresponding to the low-frequency intent samples and improving the recognition accuracy of the low-frequency intent.
出于与图1的方法实施例相同的技术构思,本申请实施例还提供另一种样本生成方法。图2为本申请实施例提供的另一种样本生成方法的处理流程图。Based on the same technical concept as the method embodiment of FIG1 , the embodiment of the present application also provides another sample generation method. FIG2 is a processing flow chart of another sample generation method provided by the embodiment of the present application.
如图2所示,获取模型阶段包括步骤S202至步骤S204。As shown in FIG. 2 , the model acquisition stage includes steps S202 to S204 .
步骤S202,无监督对比学习训练。Step S202: unsupervised contrastive learning training.
步骤S202可以参照图1实施例中的“文本对比模型为基于训练样本集对初始文本对比模型进行训练所得到的模型;训练样本集基于低频意图数据构建”的对应说明部分。Step S202 may refer to the corresponding description part of the embodiment of FIG. 1 in which “the text comparison model is a model obtained by training the initial text comparison model based on the training sample set; the training sample set is constructed based on low-frequency intent data”.
步骤S204,获得对比学习模型。Step S204, obtaining a comparative learning model.
召回数据阶段包括步骤S206至步骤S210。The data recall stage includes steps S206 to S210.
步骤S206,调整阈值精召回。Step S206, adjusting the threshold precision recall.
阈值可以是预设相似度阈值。步骤S206中的调整阈值可以是设置预设相似度阈值的初始值。精召回可以是基于文本对比结果与预设相似度阈值的比较结果确定低频意图文本是否为待质检的相似样本数据。The threshold may be the preset similarity threshold. Adjusting the threshold in step S206 may mean setting the initial value of the preset similarity threshold. Precision recall may mean determining, based on a comparison between the text comparison result and the preset similarity threshold, whether a low-frequency intent text is similar sample data to be quality-checked.
步骤S208,人工质检是否合格。Step S208: Manual quality inspection to see if it is qualified.
若合格,则结束人工质检,若不合格,则执行步骤S210。If qualified, the manual quality inspection ends; if unqualified, step S210 is executed.
步骤S210,调整阈值宽召回。Step S210, adjusting the threshold wide recall.
步骤S210中的调整阈值可以是将当前的相似度阈值减去预设降低值以得到更新的相似度阈值。宽召回可以是基于文本对比结果与当前的相似度阈值的比较结果确定低频意图文本是否为待质检的相似样本数据。The adjustment threshold in step S210 may be obtained by subtracting a preset reduction value from the current similarity threshold to obtain an updated similarity threshold. Wide recall may be determined based on the comparison result between the text comparison result and the current similarity threshold to determine whether the low-frequency intended text is similar sample data to be inspected.
步骤S206、步骤S208以及步骤S210可以参照图1实施例中的步骤S108的对应说明部分。For steps S206 , S208 and S210 , reference may be made to the corresponding description of step S108 in the embodiment of FIG. 1 .
出于与图1的方法实施例相同的技术构思,本申请实施例还提供一种文本对比模型的训练方法。图3为本申请实施例提供的一种文本对比模型的训练方式示意图。Based on the same technical concept as the method embodiment of Figure 1, the embodiment of the present application also provides a method for training a text contrast model. Figure 3 is a schematic diagram of a text contrast model training method provided by the embodiment of the present application.
如图3所示,一份Batchsize数据可以包括n个样本数据:样本数据1,即图3中的样本数据301,样本数据2,即图3中的样本数据302……样本数据n。n为大于0的自然数。将n个样本数据输入编码器303进行编码处理。编码器303可以基于样本数据301生成x样本304以及相似样本305,该x样本304和相似样本305是同一个样本数据通过不同方式编码后得到两个语义相同格式不同的样本。编码器303可以基于样本数据302生成非相似样本1,即图3中的非相似样本306……编码器303可以基于样本数据n生成非相似样本n。x样本304和相似样本305可以构成一个相似样本对。x样本304和非相似样本306可以构成一个非相似样本对。As shown in FIG3 , a batch size data may include n sample data: sample data 1, i.e., sample data 301 in FIG3 , sample data 2, i.e., sample data 302 in FIG3 , ... sample data n. n is a natural number greater than 0. The n sample data are input into the encoder 303 for encoding. The encoder 303 may generate an x sample 304 and a similar sample 305 based on the sample data 301. The x sample 304 and the similar sample 305 are two samples with the same semantics but different formats obtained after the same sample data is encoded in different ways. The encoder 303 may generate a non-similar sample 1 based on the sample data 302, i.e., a non-similar sample 306 in FIG3 , ... The encoder 303 may generate a non-similar sample n based on the sample data n. The x sample 304 and the similar sample 305 may constitute a similar sample pair. The x sample 304 and the non-similar sample 306 may constitute a non-similar sample pair.
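A minimal sketch of this pair construction, under stated assumptions: the disclosure's encoder 303 would be a neural network, while the toy stand-in below simulates dropout by randomly zeroing features, so that two encodings of the same sample form a similar pair and encodings of different samples in the batch form dissimilar (in-batch negative) pairs. All names are illustrative.

```python
import random

def toy_encode(features, dropout_p, seed):
    """Toy stand-in encoder: applies a random dropout mask to a feature
    vector. Encoding the same input twice with different masks yields two
    representations that carry the same semantics in different forms,
    analogous to x-sample 304 and similar sample 305."""
    rng = random.Random(seed)
    return [0.0 if rng.random() < dropout_p else v for v in features]

def build_contrastive_pairs(batch, dropout_p=0.1):
    """For each of the n samples in a batch, its two encodings form a
    similar pair; its first encoding paired with another batch sample's
    encoding forms a dissimilar pair (in-batch negative)."""
    encoded = [
        (toy_encode(x, dropout_p, seed=2 * i), toy_encode(x, dropout_p, seed=2 * i + 1))
        for i, x in enumerate(batch)
    ]
    similar = [(a, b) for a, b in encoded]
    dissimilar = [
        (encoded[i][0], encoded[j][0])
        for i in range(len(encoded))
        for j in range(len(encoded))
        if i != j
    ]
    return similar, dissimilar
```

A batch of n samples thus yields n similar pairs and n·(n−1) dissimilar pairs, which is the pool the iterative contrastive training described next draws from.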
基于相似样本对和多个非相似样本对,可以对初始文本对比模型进行迭代训练,得到文本对比模型。Based on similar sample pairs and multiple non-similar sample pairs, the initial text comparison model can be iteratively trained to obtain a text comparison model.
出于与图1的方法实施例相同的技术构思,本申请实施例还提供一种应用于机器人领域的样本生成方法。图4为本申请实施例提供的一种样本生成方法的业务流程图。Based on the same technical concept as the method embodiment of FIG. 1, the embodiment of the present application further provides a sample generation method applied to the field of robotics. FIG. 4 is a business flow chart of a sample generation method provided by an embodiment of the present application.
步骤S402,机器人上线。Step S402, the robot goes online.
机器人可以是具有自动应答能力的机器人,该机器人可以调用意图识别模型对文本进行意图识别,得到用户意图,进而根据用户意图进行自动应答。The robot may be a robot with automatic answering capability, which may call an intent recognition model to perform intent recognition on the text, obtain the user's intent, and then automatically answer based on the user's intent.
机器人上线可以是机器人进入工作状态,机器人在工作状态下可以针对获取的文本进行自动应答。The robot going online means that the robot enters a working state, and the robot can automatically respond to the obtained text in the working state.
步骤S404,日志分析。Step S404: log analysis.
日志可以是机器人的工作日志数据。日志包括且不限于:机器人所接收的待应答的文本,机器人对文本进行意图识别的记录数据,以及机器人的应答记录数据,等等。The log may be the work log data of the robot, including but not limited to: the text to be responded to received by the robot, the record data of the robot's intention recognition of the text, and the robot's response record data, etc.
步骤S406,算法工具召回相似问数据。Step S406: the algorithm tool recalls similar question data.
步骤S408,人工标注并质检。Step S408: manual labeling and quality inspection.
步骤S406以及步骤S408可以参照图1实施例的步骤S108的对应说明部分。For step S406 and step S408 , reference may be made to the corresponding description portion of step S108 in the embodiment of FIG. 1 .
步骤S410,新标数据加入模型,迭代训练。Step S410: Add new labeled data to the model and perform iterative training.
模型可以是意图识别模型,该意图识别模型可以用于识别文本是否包含低频意图。The model may be an intent recognition model, which may be used to recognize whether a text contains a low-frequency intent.
步骤S412,新机器人上线,继续迭代。Step S412: The new robot comes online and the iteration continues.
出于与上述各样本生成方法实施例相同的技术构思,本申请实施例还提供了一种意图识别模型的训练方法。图5为本申请实施例提供的一种意图识别模型的训练方法的处理流程图。Based on the same technical concept as the above-mentioned sample generation method embodiments, the present application embodiment also provides a method for training an intent recognition model. FIG5 is a processing flow chart of a method for training an intent recognition model provided by the present application embodiment.
步骤S502,通过样本生成方法生成低频意图样本。Step S502: Generate low-frequency intention samples through a sample generation method.
具体地,低频意图样本可以是通过本申请中前述的样本生成方法所生成的。Specifically, the low-frequency intention sample may be generated by the sample generation method described above in the present application.
步骤S504,将低频意图样本输入初始意图识别模型进行迭代训练,得到意图识别模型。Step S504, the low-frequency intent samples are input into the initial intent recognition model for iterative training to obtain the intent recognition model.
初始意图识别模型可以是各个待训练参数均取初始值的尚未进行模型微调的低频意图分类模型。该低频意图分类模型可以是预训练的语言模型。预训练的语言模型包括且不限于:BERT(Bidirectional Encoder Representations from Transformers)模型,或者,RoBERTa(a Robustly Optimized BERT Pretraining Approach)模型,等等。The initial intent recognition model may be a low-frequency intent classification model in which all parameters to be trained take initial values and the model has not been fine-tuned. The low-frequency intent classification model may be a pre-trained language model. Pre-trained language models include but are not limited to: BERT (Bidirectional Encoder Representations from Transformers) model, or RoBERTa (a Robustly Optimized BERT Pretraining Approach) model, etc.
在进行迭代训练之后所得到的意图识别模型可以用于识别文本是否包含低频意图。The intent recognition model obtained after iterative training can be used to identify whether the text contains low-frequency intent.
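As a runnable stand-in for this training step — the disclosure fine-tunes a pretrained model such as BERT or RoBERTa, which is beyond a short sketch — the toy bag-of-words nearest-centroid classifier below is trained on hypothetical generated low-frequency intent samples. All labels, texts, and function names are illustrative assumptions, not the disclosed method.

```python
from collections import Counter

def bag_of_words(text):
    """Whitespace bag-of-words features for a short user utterance."""
    return Counter(text.lower().split())

def train_intent_model(samples):
    """samples: list of (text, intent_label) pairs, e.g. generated
    low-frequency intent samples. Builds one bag-of-words centroid per
    intent label (a toy stand-in for fine-tuning a pretrained model)."""
    centroids = {}
    for text, label in samples:
        centroids.setdefault(label, Counter()).update(bag_of_words(text))
    return centroids

def predict_intent(centroids, text):
    """Predicts the intent whose centroid shares the most word mass
    with the input text."""
    query = bag_of_words(text)
    def overlap(centroid):
        return sum(min(query[w], centroid[w]) for w in query)
    return max(centroids, key=lambda label: overlap(centroids[label]))
```

The point of the sketch is the data flow — generated low-frequency samples in, an intent classifier out — not the model family; a production system would substitute a pretrained transformer fine-tuned on the same samples.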
在如图5所示的意图识别模型的训练方法实施例中,通过上述样本生成方法实施例所提供的样本生成方法生成低频意图样本;将低频意图样本输入初始意图识别模型进行迭代训练,得到意图识别模型。日志数据是一种随着时间变化不断增长的历史数据。即便低频意图数据在日志数据中的出现频率较低,在日志数据所对应的时间跨度足够长的情况下,可以从日志数据中筛选得到累计的大量低频意图数据,基于该大量低频意图数据可以生成数量足够用于训练初始文本对比模型的训练数据,且训练数据的数量可以随着日志数据的时间跨度增长而不断扩增。因此,在训练数据的数量足够多的情况下,通过训练后得到的文本对比模型进行相似度预测时的预测结果较为准确,进而,通过文本对比模型对低频意图数据和预设意图类别的标准文本进行相似度预测处理,可以确定低频意图数据中与预设意图类别的标准文本相似度较高的低频意图样本,在获取的日志数据随着时间变化不断增加的情况下,可以利用不断增长的日志数据和文本对比模型累计得到大量预设意图类别的低频意图样本,进而利用该预设意图类别的低频意图样本对初始意图识别模型进行迭代训练,可以取得较好的训练效果,使得训练之后得到的意图识别模型对低频意图的识别准确性较高。In the training method embodiment of the intent recognition model as shown in FIG5 , a low-frequency intent sample is generated by the sample generation method provided by the above-mentioned sample generation method embodiment; the low-frequency intent sample is input into the initial intent recognition model for iterative training to obtain the intent recognition model. Log data is a kind of historical data that continues to grow over time. Even if the frequency of occurrence of low-frequency intent data in the log data is low, if the time span corresponding to the log data is long enough, a large amount of accumulated low-frequency intent data can be screened from the log data, and based on the large amount of low-frequency intent data, a sufficient amount of training data can be generated for training the initial text comparison model, and the amount of training data can be continuously expanded as the time span of the log data increases. Therefore, when the amount of training data is large enough, the prediction results of similarity prediction performed by the text comparison model obtained after training are relatively accurate. Furthermore, by performing similarity prediction processing on the low-frequency intent data and the standard text of the preset intent category through the text comparison model, the low-frequency intent samples with high similarity to the standard text of the preset intent category in the low-frequency intent data can be determined. 
When the acquired log data increases with time, a large number of low-frequency intent samples of the preset intent category can be accumulated by using the growing log data and the text comparison model. Then, the initial intent recognition model can be iteratively trained using the low-frequency intent samples of the preset intent category, which can achieve better training results and ensure that the intent recognition model obtained after training has higher recognition accuracy for low-frequency intents.
出于与上述各样本生成方法实施例相同的技术构思,本申请实施例还提供了一种应用于数字人的意图识别方法。图6为本申请实施例提供的一种应用于数字人的意图识别方法的处理流程图。 Based on the same technical concept as the above-mentioned sample generation method embodiments, the present application embodiment also provides an intention recognition method applied to a digital human. Figure 6 is a processing flow chart of an intention recognition method applied to a digital human provided by the present application embodiment.
步骤S602,获取用户输入的待识别文本。Step S602: obtaining the text to be recognized input by the user.
步骤S604,将待识别文本输入意图识别模型进行意图识别,得到用户意图;意图识别模型是通过将低频意图样本输入初始意图识别模型进行迭代训练所得到的;低频意图样本是通过样本生成方法所生成的。Step S604, input the text to be recognized into the intention recognition model for intent recognition to obtain the user intention; the intention recognition model is obtained by inputting low-frequency intention samples into the initial intention recognition model for iterative training; the low-frequency intention samples are generated by a sample generation method.
具体地,低频意图样本可以是通过本申请中前述的样本生成方法所生成的。初始意图识别模型和意图识别模型可以参照如图5所示的意图识别模型的训练方法实施例的对应说明部分。Specifically, the low-frequency intent samples may be generated by the sample generation method described above in the present application. The initial intent recognition model and the intent recognition model may refer to the corresponding description part of the embodiment of the training method of the intent recognition model shown in FIG5 .
步骤S606,根据用户意图在数字人的系统中获取对应用户意图的目标文本,并对目标文本进行展示。Step S606, according to the user intention, the target text corresponding to the user intention is obtained in the digital human system, and the target text is displayed.
数字人的系统中可以存储有预先配置的预设用户意图与预设文本的对应关系,根据步骤S604中得到的用户意图和预设用户意图与预设文本的对应关系,可以在数字人的系统中查询得到对应用户意图的目标文本并展示。The digital human system may store a pre-configured correspondence between preset user intentions and preset texts. According to the user intention obtained in step S604 and the correspondence between the preset user intentions and preset texts, the target text corresponding to the user intention may be queried in the digital human system and displayed.
根据以上内容,可知本申请的一些实施方式中,根据用户意图在数字人的系统中获取对应用户意图的目标文本,包括:根据用户意图和预先配置的预设用户意图与预设文本的对应关系,在数字人的系统中查询得到用户意图对应的目标文本。Based on the above content, it can be known that in some implementations of the present application, a target text corresponding to the user intention is obtained in the digital human system according to the user intention, including: querying the digital human system to obtain the target text corresponding to the user intention according to the correspondence between the user intention and the pre-configured preset user intention and the preset text.
在数字人场景中,预设用户意图可以是预先配置的低频意图,例如,“提前还款”,预设文本可以是数字人的系统针对该低频意图预先确定的应答文本,例如,“您可以按照xxx向xxx预约该项服务”。In the digital human scenario, the preset user intent can be a pre-configured low-frequency intent, such as "early repayment", and the preset text can be the response text predetermined by the digital human system for the low-frequency intent, such as "You can make an appointment for this service with xxx according to xxx".
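The preset intent-to-text lookup described above can be sketched as a plain mapping query. The entries below are hypothetical placeholders mirroring the "early repayment" example; the disclosure does not specify how the correspondence is stored.

```python
# Hypothetical preset correspondence stored in the digital human's system:
# preset user intent -> preset response text (placeholder entries).
PRESET_TEXTS = {
    "early_repayment": "You can make an appointment for this service with xxx according to xxx.",
}

DEFAULT_TEXT = "Sorry, I did not understand. Could you rephrase?"

def get_target_text(user_intent):
    """Queries the preset intent-to-text correspondence for the recognized
    user intent and returns the target text to display."""
    return PRESET_TEXTS.get(user_intent, DEFAULT_TEXT)
```

A fallback for unrecognized intents is an assumption added here for completeness; the disclosure only describes the lookup for configured intents.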
如图6所示的应用于数字人的意图识别方法实施例中,首先,获取用户输入的待识别文本;其次,将待识别文本输入意图识别模型进行意图识别,得到用户意图;意图识别模型是通过将低频意图样本输入初始意图识别模型进行迭代训练所得到的;低频意图样本是通过前述样本生成方法实施例所提供的样本生成方法所生成的;最后,根据用户意图在数字人的系统中获取对应用户意图的目标文本,并对目标文本进行展示。日志数据是一种随着时间变化不断增长的历史数据。即便低频意图数据在日志数据中的出现频率较低, In the embodiment of the intent recognition method applied to a digital human as shown in FIG. 6, first, the text to be recognized input by the user is obtained; secondly, the text to be recognized is input into the intent recognition model for intent recognition to obtain the user intent; the intent recognition model is obtained by inputting low-frequency intent samples into the initial intent recognition model for iterative training; the low-frequency intent samples are generated by the sample generation method provided by the aforementioned sample generation method embodiment; finally, according to the user intent, the target text corresponding to the user intent is obtained in the digital human system, and the target text is displayed. Log data is a kind of historical data that grows continuously over time. Even if low-frequency intent data appears only infrequently in the log data,
Based on the large amount of low-frequency intent data, a sufficient amount of training data can be generated for training the initial text comparison model, and the amount of training data can be continuously expanded as the time span of the log data increases. Therefore, when the amount of training data is large enough, the prediction result of similarity prediction by the text comparison model obtained after training is relatively accurate. Then, by performing similarity prediction processing on the low-frequency intent data and the standard text of the preset intent category through the text comparison model, the low-frequency intent samples with high similarity to the standard text of the preset intent category in the low-frequency intent data can be determined. When the acquired log data increases with time, a large amount of low-frequency intent samples of the preset intent category can be accumulated by using the growing log data and the text comparison model. Then, the initial intent recognition model is iteratively trained by using the low-frequency intent samples of the preset intent category, and a good training effect can be achieved, so that the intent recognition model obtained after training has a high recognition accuracy for low-frequency intent. Then, the accurate user intent obtained by recognition can be used to obtain and display the target text that meets the user intent from the digital human system, thereby improving the user experience.
在上述的实施例中,提供了一种样本生成方法,与之相对应的,还提供了一种样本生成装置,下面结合附图进行说明。In the above-mentioned embodiment, a sample generation method is provided, and correspondingly, a sample generation device is also provided, which will be described below with reference to the accompanying drawings.
图7为本申请实施例提供的一种样本生成装置示意图。FIG. 7 is a schematic diagram of a sample generating device provided in an embodiment of the present application.
本实施例提供一种样本生成装置,包括:第一获取单元701,用于获取待处理的日志数据;日志数据包括文本和文本的意图识别结果;筛选单元702,用于根据文本的意图识别结果,对日志数据进行数据筛选处理,得到低频意图数据;预测单元703,用于将低频意图数据、预设意图类别的标准文本输入文本对比模型进行相似度预测处理,得到低频意图数据对应的文本对比结果;文本对比模型为基于训练样本集对初始文本对比模型进行训练所得到的模型;训练样本集基于低频意图数据构建;第一生成单元704,用于根据文本对比结果与预设相似度阈值,生成低频意图样本。The present embodiment provides a sample generation device, including: a first acquisition unit 701, used to acquire log data to be processed; the log data includes text and intent recognition results of the text; a screening unit 702, used to perform data screening processing on the log data according to the intent recognition results of the text, to obtain low-frequency intent data; a prediction unit 703, used to input the low-frequency intent data and standard text of a preset intent category into a text comparison model for similarity prediction processing, to obtain text comparison results corresponding to the low-frequency intent data; the text comparison model is a model obtained by training an initial text comparison model based on a training sample set; the training sample set is constructed based on the low-frequency intent data; a first generation unit 704, used to generate a low-frequency intent sample according to the text comparison result and a preset similarity threshold.
可选地,筛选单元702,包括:分类子单元,用于将日志数据输入高频意图分类模型,得到第一日志数据和第一日志数据的意图分类结果的置信度;第一日志数据的意图分类结果为预设高频意图;高频意图分类模型用于根据日志数据中文本的意图识别结果对日志数据进行意图分类处理;筛选子单元,用于根据第一日志数据和第一日志数据的意图分类结果的置信度,对日志数据进行数据筛选处理,得到低频意图数据。Optionally, the screening unit 702 includes: a classification subunit, configured to input the log data into a high-frequency intent classification model to obtain first log data and a confidence of an intent classification result of the first log data, the intent classification result of the first log data being a preset high-frequency intent, the high-frequency intent classification model being used to perform intent classification processing on the log data according to the intent recognition results of the texts in the log data; and a screening subunit, configured to perform data screening processing on the log data according to the first log data and the confidence of the intent classification result of the first log data to obtain the low-frequency intent data.
可选地,筛选子单元,具体用于:根据第一日志数据的意图分类结果的置信度与预设置信度阈值的比较结果,确定高频意图数据;将日志数据中的高频意图数据删除,得到低频意图数据。Optionally, the screening subunit is specifically used to: determine high-frequency intent data based on a comparison result of the confidence of the intent classification result of the first log data with a preset confidence threshold; delete the high-frequency intent data in the log data to obtain low-frequency intent data.
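The confidence-based deletion performed by the screening subunit can be sketched as follows. The record layout and field names are assumptions for illustration; the disclosure does not define a log schema.

```python
def filter_low_frequency(log_data, confidence_threshold):
    """Data screening sketch (hypothetical record layout): each log entry
    carries the confidence of its high-frequency intent classification,
    or None if it was not classified as a preset high-frequency intent.
    Entries whose confidence reaches the threshold are treated as
    high-frequency intent data and deleted; the rest is kept as
    low-frequency intent data."""
    low_frequency = []
    for entry in log_data:
        confidence = entry.get("confidence")
        if confidence is not None and confidence >= confidence_threshold:
            continue  # high-frequency intent data: delete from the log data
        low_frequency.append(entry)
    return low_frequency
```

Entries with low or missing high-frequency confidence survive the filter, which is what makes the remaining pool a candidate set of low-frequency intent data.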
可选地,初始文本对比模型包括依次连接的编码器和相似度预测模块;编码器的输出为相似度预测模块的输入;编码器用于根据低频意图数据进行编码处理,得到低频意图数据对应的相似样本对和非相似样本对;相似度预测模块用于根据低频意图数据对应的相似样本对和非相似样本对进行迭代训练。Optionally, the initial text comparison model includes an encoder and a similarity prediction module connected in sequence; the output of the encoder is the input of the similarity prediction module; the encoder is used to perform encoding processing based on the low-frequency intent data to obtain similar sample pairs and dissimilar sample pairs corresponding to the low-frequency intent data; the similarity prediction module is used to perform iterative training based on similar sample pairs and dissimilar sample pairs corresponding to the low-frequency intent data.
可选地,低频意图数据包括目标文本和非目标文本;编码器具体用于:根据目标文本进行编码处理,得到目标文本对应的目标编码结果和相似编码结果,以及,根据非目标文本进行编码处理,得到非目标文本对应的编码结果;将目标文本对应的目标编码结果和相似编码结果确定为低频意图数据对应的相似样本对;将目标文本对应的目标编码结果和非目标文本对应的编码结果确定为低频意图数据对应的非相似样本对。Optionally, the low-frequency intent data includes target text and non-target text; the encoder is specifically used to: perform encoding processing according to the target text to obtain a target encoding result and a similar encoding result corresponding to the target text, and perform encoding processing according to the non-target text to obtain an encoding result corresponding to the non-target text; determine the target encoding result and the similar encoding result corresponding to the target text as similar sample pairs corresponding to the low-frequency intent data; determine the target encoding result corresponding to the target text and the encoding result corresponding to the non-target text as non-similar sample pairs corresponding to the low-frequency intent data.
可选地,编码器包括依次连接的注意力层和全连接层;注意力层的输出为全连接层的输入;注意力层用于根据预设的第一随机失活概率和低频意图数据进行第一编码处理,得到中间编码数据;全连接层用于根据预设的第二随机失活概率和中间编码数据进行转换处理,得到低频意图数据对应的相似样本对和非相似样本对。Optionally, the encoder includes an attention layer and a fully connected layer connected in sequence; the output of the attention layer is the input of the fully connected layer; the attention layer is used to perform a first encoding process according to a preset first random inactivation probability and the low-frequency intent data to obtain intermediate encoded data; the fully connected layer is used to perform conversion processing according to a preset second random inactivation probability and the intermediate encoded data to obtain similar sample pairs and non-similar sample pairs corresponding to the low-frequency intent data.
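上述通过两次随机失活得到同一文本的两个不同编码、从而构造相似样本对的思路，与对比学习中基于 dropout 的做法类似，可用如下简化示意（其中嵌入函数、向量维度与失活概率均为示意性假设，真实编码器应为依次连接的注意力层与全连接层）。The idea above of obtaining two different encodings of the same text via two random-dropout passes, thereby forming a similar sample pair, resembles dropout-based contrastive learning; a simplified sketch follows (the embedding function, vector size and dropout probabilities are illustrative assumptions; the real encoder would be an attention layer followed by a fully connected layer).

```python
import numpy as np

# Toy sketch of the dropout-based construction of similar / dissimilar
# sample pairs described above. The embedding function, vector size and
# dropout probabilities are illustrative assumptions; in the application
# the encoder is an attention layer followed by a fully connected layer.

rng = np.random.default_rng(0)
P_DROP_1 = 0.1  # preset first random-dropout probability (assumed)
P_DROP_2 = 0.1  # preset second random-dropout probability (assumed)

def embed(text):
    """Deterministic toy embedding: hash characters into a 16-dim vector."""
    v = np.zeros(16)
    for i, ch in enumerate(text):
        v[(i + ord(ch)) % 16] += 1.0
    return v

def encode(text):
    """Encode with random dropout; two passes on the same text differ."""
    h = embed(text)
    h = h * (rng.random(16) >= P_DROP_1)  # dropout after the "attention" stage
    h = h * (rng.random(16) >= P_DROP_2)  # dropout after the fully connected stage
    return h

target = "如何申请延期还款"
pos_a, pos_b = encode(target), encode(target)  # similar pair: same text, two passes
neg = encode("今天天气怎么样")                  # with pos_a: dissimilar pair
```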
可选地，低频意图数据包括多个低频意图文本；文本对比模型，具体用于：将每个低频意图文本和预设意图类别的标准文本确定为每个低频意图文本对应的相似样本对；对每个低频意图文本对应的相似样本对进行相似度预测处理，得到每个低频意图文本的相似度评分；将每个低频意图文本的相似度评分确定为低频意图数据对应的文本对比结果。Optionally, the low-frequency intent data includes a plurality of low-frequency intent texts; the text comparison model is specifically used to: determine each low-frequency intent text and a standard text of a preset intent category as a similar sample pair corresponding to that low-frequency intent text; perform similarity prediction processing on the similar sample pair corresponding to each low-frequency intent text to obtain a similarity score for each low-frequency intent text; and determine the similarity score of each low-frequency intent text as the text comparison result corresponding to the low-frequency intent data.
可选地,第一生成单元704,具体用于:根据预设相似度阈值与文本对比结果的比较结果,确定预设相似度阈值对应的相似样本数据的数量;若低频意图数据对应的相似样本数据的数量小于预设数量阈值,则重复执行将当前的相似度阈值减去预设降低值以得到更新的相似度阈值,以及,根据更新的相似度阈值与文本对比结果的比较结果,确定更新的相似度阈值对应的相似样本数据的数量的操作,直至更新的相似度阈值满足预设停止条件;预设停止条件为样本数量大于等于预设数量阈值;样本数量为预设相似度阈值对应的相似样本数据的数量与各个更新的相似度阈值对应的相似样本数据的数量之和;将预设相似度阈值对应的相似样本数据和各个更新的相似度阈值对应的相似样本数据中的每个样本数据确定为每个样本数据对应的低频意图样本。Optionally, the first generating unit 704 is specifically used to: determine the number of similar sample data corresponding to the preset similarity threshold according to the comparison result between the preset similarity threshold and the text comparison result; if the number of similar sample data corresponding to the low-frequency intent data is less than the preset number threshold, repeatedly perform the operation of subtracting the preset reduction value from the current similarity threshold to obtain an updated similarity threshold, and, according to the comparison result between the updated similarity threshold and the text comparison result, determine the number of similar sample data corresponding to the updated similarity threshold, until the updated similarity threshold meets the preset stop condition; the preset stop condition is that the number of samples is greater than or equal to the preset number threshold; the number of samples is the sum of the number of similar sample data corresponding to the preset similarity threshold and the number of similar sample data corresponding to each updated similarity threshold; each sample data in the similar sample data corresponding to the preset similarity threshold and the similar sample data corresponding to each updated similarity threshold is determined as a low-frequency intent sample corresponding to each sample data.
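上述逐步降低相似度阈值直至样本数量达标的循环，可用如下示意代码表达（其中预设相似度阈值、预设降低值与预设数量阈值均为示意性取值，并非本申请限定的实现）。The loop above, which repeatedly lowers the similarity threshold until the accumulated sample count is sufficient, can be sketched as follows (the preset similarity threshold, reduction value and number threshold are illustrative values, not fixed by this application).

```python
# Illustrative sketch of the threshold-lowering loop described above.
# The preset similarity threshold, reduction value and number threshold
# are assumed example values, not fixed by this application.

def collect_low_freq_samples(scores, sim_threshold=0.9, step=0.05, min_count=3):
    """Lower the similarity threshold until enough samples qualify.

    `scores` maps each candidate text to its similarity score against the
    standard text of the preset intent category. The loop stops once the
    accumulated number of qualifying samples reaches `min_count`.
    """
    selected = {}
    threshold = sim_threshold
    while threshold > 0:
        for text, s in scores.items():
            if s >= threshold:
                selected[text] = s
        if len(selected) >= min_count:  # preset stop condition
            break
        threshold -= step  # preset reduction value
    return threshold, sorted(selected)

scores = {"a": 0.95, "b": 0.88, "c": 0.81, "d": 0.40}
th, samples = collect_low_freq_samples(scores)
```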
本申请实施例所提供的样本生成装置包括：第一获取单元、筛选单元、预测单元以及第一生成单元，其中，第一获取单元，用于获取待处理的日志数据；日志数据包括文本和文本的意图识别结果；筛选单元，用于根据文本的意图识别结果，对日志数据进行数据筛选处理，得到低频意图数据；预测单元，用于将低频意图数据、预设意图类别的标准文本输入文本对比模型进行相似度预测处理，得到低频意图数据对应的文本对比结果；文本对比模型为基于训练样本集对初始文本对比模型进行训练所得到的模型；训练样本集基于低频意图数据构建；第一生成单元，用于根据文本对比结果与预设相似度阈值，生成低频意图样本。日志数据是一种随着时间变化不断增长的历史数据。即便低频意图数据在日志数据中的出现频率较低，在日志数据所对应的时间跨度足够长的情况下，可以从日志数据中筛选得到累计的大量低频意图数据，基于该大量低频意图数据可以生成数量足够用于训练初始文本对比模型的训练数据，且训练数据的数量可以随着日志数据的时间跨度增长而不断扩增。因此，在训练数据的数量足够多的情况下，通过训练后得到的文本对比模型进行相似度预测时的预测结果较为准确，进而，通过文本对比模型对低频意图数据和预设意图类别的标准文本进行相似度预测处理，可以确定低频意图数据中与预设意图类别的标准文本相似度较高的低频意图样本，在获取的日志数据随着时间变化不断增加的情况下，可以利用不断增长的日志数据和文本对比模型累计得到大量预设意图类别的低频意图样本，进而满足低频意图样本对应的意图识别模型的训练需求，提高低频意图的识别准确性。The sample generation apparatus provided in the embodiments of the present application includes: a first acquisition unit, a screening unit, a prediction unit and a first generation unit, wherein the first acquisition unit is used to acquire log data to be processed; the log data includes text and an intent recognition result of the text; the screening unit is used to perform data screening processing on the log data according to the intent recognition result of the text to obtain low-frequency intent data; the prediction unit is used to input the low-frequency intent data and standard text of a preset intent category into a text comparison model for similarity prediction processing to obtain a text comparison result corresponding to the low-frequency intent data; the text comparison model is a model obtained by training an initial text comparison model based on a training sample set; the training sample set is constructed based on the low-frequency intent data; the first generation unit is used to generate low-frequency intent samples according to the text comparison result and a preset similarity threshold. Log data is historical data that grows continuously over time. Even though low-frequency intent data appears in the log data at a low frequency, when the time span covered by the log data is long enough, a large accumulated amount of low-frequency intent data can be screened from the log data, enough training data for training the initial text comparison model can be generated from that accumulated data, and the amount of training data keeps expanding as the time span of the log data grows. Therefore, with a sufficient amount of training data, the trained text comparison model makes relatively accurate similarity predictions; in turn, by performing similarity prediction on the low-frequency intent data against the standard text of the preset intent category, low-frequency intent samples that are highly similar to that standard text can be identified within the low-frequency intent data. As the acquired log data keeps increasing over time, a large number of low-frequency intent samples of the preset intent category can be accumulated from the growing log data and the text comparison model, thereby meeting the training requirements of the intent recognition model corresponding to the low-frequency intent samples and improving the recognition accuracy for low-frequency intents.
在上述的实施例中,提供了一种意图识别模型的训练方法,与之相对应的,还提供了一种意图识别模型的训练装置,下面结合附图进行说明。In the above-mentioned embodiment, a method for training an intent recognition model is provided, and correspondingly, a device for training an intent recognition model is also provided, which will be described below in conjunction with the accompanying drawings.
图8为本申请实施例提供的一种意图识别模型的训练装置示意图。FIG8 is a schematic diagram of a training device for an intent recognition model provided in an embodiment of the present application.
本实施例提供一种意图识别模型的训练装置,包括:第二生成单元801,用于通过样本生成方法生成低频意图样本;训练单元802,用于将所述低频意图样本输入初始意图识别模型进行迭代训练,得到意图识别模型。This embodiment provides a training device for an intent recognition model, including: a second generation unit 801, used to generate low-frequency intent samples through a sample generation method; a training unit 802, used to input the low-frequency intent samples into an initial intent recognition model for iterative training to obtain an intent recognition model.
本申请实施例提供的意图识别模型的训练装置包括第二生成单元和训练单元，其中，第二生成单元用于通过上述样本生成方法实施例所提供的样本生成方法生成低频意图样本；训练单元用于将低频意图样本输入初始意图识别模型进行迭代训练，得到意图识别模型。日志数据是一种随着时间变化不断增长的历史数据。即便低频意图数据在日志数据中的出现频率较低，在日志数据所对应的时间跨度足够长的情况下，可以从日志数据中筛选得到累计的大量低频意图数据，基于该大量低频意图数据可以生成数量足够用于训练初始文本对比模型的训练数据，且训练数据的数量可以随着日志数据的时间跨度增长而不断扩增。因此，在训练数据的数量足够多的情况下，通过训练后得到的文本对比模型进行相似度预测时的预测结果较为准确，进而，通过文本对比模型对低频意图数据和预设意图类别的标准文本进行相似度预测处理，可以确定低频意图数据中与预设意图类别的标准文本相似度较高的低频意图样本，在获取的日志数据随着时间变化不断增加的情况下，可以利用不断增长的日志数据和文本对比模型累计得到大量预设意图类别的低频意图样本，进而利用该预设意图类别的低频意图样本对初始意图识别模型进行迭代训练，可以取得较好的训练效果，使得训练之后得到的意图识别模型对低频意图的识别准确性较高。The training apparatus for an intent recognition model provided in the embodiments of the present application includes a second generation unit and a training unit, wherein the second generation unit is used to generate low-frequency intent samples through the sample generation method provided in the sample generation method embodiments above; the training unit is used to input the low-frequency intent samples into an initial intent recognition model for iterative training to obtain the intent recognition model. Log data is historical data that grows continuously over time. Even though low-frequency intent data appears in the log data at a low frequency, when the time span covered by the log data is long enough, a large accumulated amount of low-frequency intent data can be screened from the log data, enough training data for training the initial text comparison model can be generated from that accumulated data, and the amount of training data keeps expanding as the time span of the log data grows. Therefore, with a sufficient amount of training data, the trained text comparison model makes relatively accurate similarity predictions; in turn, by performing similarity prediction on the low-frequency intent data against the standard text of the preset intent category, low-frequency intent samples that are highly similar to that standard text can be identified within the low-frequency intent data. As the acquired log data keeps increasing over time, a large number of low-frequency intent samples of the preset intent category can be accumulated from the growing log data and the text comparison model; iteratively training the initial intent recognition model with these low-frequency intent samples then achieves a good training effect, so that the resulting intent recognition model recognizes low-frequency intents with high accuracy.
在上述的实施例中,提供了一种应用于数字人的意图识别方法,与之相对应的,还提供了一种应用于数字人的意图识别装置,下面结合附图进行说明。In the above-mentioned embodiment, a method for identifying intentions applied to a digital human is provided, and correspondingly, a device for identifying intentions applied to a digital human is also provided, which will be described below in conjunction with the accompanying drawings.
图9为本申请实施例提供的一种应用于数字人的意图识别装置示意图。FIG. 9 is a schematic diagram of an intention recognition device for a digital human provided in an embodiment of the present application.
本实施例提供一种应用于数字人的意图识别装置,包括:第二获取单元901,用于获取用户输入的待识别文本;识别单元902,用于将待识别文本输入意图识别模型进行意图识别,得到用户意图;意图识别模型是通过将低频意图样本输入初始意图识别模型进行迭代训练所得到的;低频意图样本是通过上述的样本生成方法所生成的;展示单元903,用于根据用户意图在数字人的系统中获取对应用户意图的目标文本,并对目标文本进行展示。This embodiment provides an intention recognition device applied to a digital human, comprising: a second acquisition unit 901, used to acquire a text to be recognized input by a user; a recognition unit 902, used to input the text to be recognized into an intention recognition model for intention recognition, and obtain the user intention; the intention recognition model is obtained by inputting a low-frequency intention sample into an initial intention recognition model for iterative training; the low-frequency intention sample is generated by the above-mentioned sample generation method; a display unit 903, used to obtain a target text corresponding to the user intention in the digital human system according to the user intention, and display the target text.
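上述获取、识别、展示三个单元的处理流程可用如下示意代码表达（其中意图识别函数与目标文本表均为示意性假设，分别代替训练好的意图识别模型和数字人系统中的目标文本存储）。The flow through the acquisition, recognition and display units described above can be sketched as follows (the intent recognizer and the target-text table are hypothetical stand-ins for the trained intent recognition model and the target-text store in the digital-human system).

```python
# Minimal sketch of the digital-human flow described above: acquire the
# user's text, recognize the intent, then fetch and display the target
# text. The recognizer and the reply table are hypothetical stand-ins
# for the trained intent recognition model and the digital-human system.

def recognize_intent(text):
    """Stand-in for the trained intent recognition model."""
    return "deferred_repayment" if "延期" in text else "unknown"

TARGET_TEXTS = {  # target texts kept in the digital-human system (assumed)
    "deferred_repayment": "您可以在账单日前通过客服渠道申请延期还款。",
    "unknown": "抱歉，我暂时没有理解您的问题。",
}

def handle_user_input(text):
    intent = recognize_intent(text)  # recognition unit
    target = TARGET_TEXTS.get(intent, TARGET_TEXTS["unknown"])
    return intent, target            # display unit would render `target`

intent, reply = handle_user_input("我想延期还款")
```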
本申请实施例提供的应用于数字人的意图识别装置包括第二获取单元、识别单元以及展示单元，其中，第二获取单元用于获取用户输入的待识别文本；识别单元用于将待识别文本输入意图识别模型进行意图识别，得到用户意图；意图识别模型是通过将低频意图样本输入初始意图识别模型进行迭代训练所得到的；低频意图样本是通过前述样本生成方法实施例所提供的样本生成方法所生成的；展示单元用于根据用户意图在数字人的系统中获取对应用户意图的目标文本，并对目标文本进行展示。日志数据是一种随着时间变化不断增长的历史数据。即便低频意图数据在日志数据中的出现频率较低，在日志数据所对应的时间跨度足够长的情况下，可以从日志数据中筛选得到累计的大量低频意图数据，基于该大量低频意图数据可以生成数量足够用于训练初始文本对比模型的训练数据，且训练数据的数量可以随着日志数据的时间跨度增长而不断扩增。因此，在训练数据的数量足够多的情况下，通过训练后得到的文本对比模型进行相似度预测时的预测结果较为准确，进而，通过文本对比模型对低频意图数据和预设意图类别的标准文本进行相似度预测处理，可以确定低频意图数据中与预设意图类别的标准文本相似度较高的低频意图样本，在获取的日志数据随着时间变化不断增加的情况下，可以利用不断增长的日志数据和文本对比模型累计得到大量预设意图类别的低频意图样本，进而利用该预设意图类别的低频意图样本对初始意图识别模型进行迭代训练，可以取得较好的训练效果，使得训练之后得到的意图识别模型对低频意图的识别准确性较高，进而利用识别得到的准确的用户意图可以从数字人的系统中获取符合用户意图的目标文本并展示，提高了用户体验。The intent recognition apparatus applied to a digital human provided in the embodiments of the present application includes a second acquisition unit, a recognition unit and a display unit, wherein the second acquisition unit is used to acquire text to be recognized input by a user; the recognition unit is used to input the text to be recognized into an intent recognition model for intent recognition to obtain the user intent; the intent recognition model is obtained by inputting low-frequency intent samples into an initial intent recognition model for iterative training; the low-frequency intent samples are generated through the sample generation method provided in the sample generation method embodiments above; the display unit is used to obtain, in the digital-human system, target text corresponding to the user intent and to display the target text. Log data is historical data that grows continuously over time. Even though low-frequency intent data appears in the log data at a low frequency, when the time span covered by the log data is long enough, a large accumulated amount of low-frequency intent data can be screened from the log data, enough training data for training the initial text comparison model can be generated from that accumulated data, and the amount of training data keeps expanding as the time span of the log data grows. Therefore, with a sufficient amount of training data, the trained text comparison model makes relatively accurate similarity predictions; in turn, by performing similarity prediction on the low-frequency intent data against the standard text of the preset intent category, low-frequency intent samples that are highly similar to that standard text can be identified within the low-frequency intent data. As the acquired log data keeps increasing over time, a large number of low-frequency intent samples of the preset intent category can be accumulated from the growing log data and the text comparison model; iteratively training the initial intent recognition model with these low-frequency intent samples then achieves a good training effect, so that the resulting intent recognition model recognizes low-frequency intents with high accuracy. The accurate user intent thus recognized can then be used to obtain and display target text that matches the user intent from the digital-human system, improving the user experience.
对应上述描述的一种样本生成方法,或者,对应上述描述的一种意图识别模型的训练方法,或者,对应上述描述的一种应用于数字人的意图识别方法,基于相同的技术构思,本申请实施例还提供一种电子设备,该电子设备用于执行上述提供的样本生成方法、意图识别模型的训练方法以及应用于数字人的意图识别方法中的一者或多者,图10为本申请实施例提供的一种电子设备的结构示意图。Corresponding to a sample generation method described above, or, corresponding to a training method for an intent recognition model described above, or, corresponding to a method for intent recognition applied to a digital human described above, based on the same technical concept, an embodiment of the present application also provides an electronic device, which is used to execute one or more of the sample generation method, the training method for the intent recognition model, and the intent recognition method applied to a digital human provided above. Figure 10 is a structural schematic diagram of an electronic device provided in an embodiment of the present application.
如图10所示，电子设备可因配置或性能不同而产生比较大的差异，可以包括一个或一个以上的处理器1001和存储器1002，存储器1002中可以存储有一个或一个以上应用程序或数据。其中，存储器1002可以是短暂存储或持久存储。存储在存储器1002的应用程序可以包括一个或一个以上模块（图示未示出），每个模块可以包括电子设备中的一系列计算机可执行指令。更进一步地，处理器1001可以设置为与存储器1002通信，在电子设备上执行存储器1002中的一系列计算机可执行指令。电子设备还可以包括一个或一个以上电源1003，一个或一个以上有线或无线网络接口1004，一个或一个以上输入/输出接口1005，一个或一个以上键盘1006等。As shown in FIG. 10, electronic devices may differ considerably in configuration or performance, and may include one or more processors 1001 and a memory 1002; the memory 1002 may store one or more application programs or data. The memory 1002 may be short-term storage or persistent storage. An application program stored in the memory 1002 may include one or more modules (not shown in the figure), and each module may include a series of computer-executable instructions for the electronic device. Furthermore, the processor 1001 may be configured to communicate with the memory 1002 and execute, on the electronic device, the series of computer-executable instructions in the memory 1002. The electronic device may also include one or more power supplies 1003, one or more wired or wireless network interfaces 1004, one or more input/output interfaces 1005, one or more keyboards 1006, and the like.
在一个具体的实施例中，电子设备包括有存储器，以及一个或一个以上的程序，其中一个或者一个以上程序存储于存储器中，且一个或者一个以上程序可以包括一个或一个以上模块，且每个模块可以包括对电子设备中的一系列计算机可执行指令，且经配置以由一个或者一个以上处理器执行该一个或者一个以上程序包含用于进行以下计算机可执行指令：获取待处理的日志数据；日志数据包括文本和文本的意图识别结果；根据文本的意图识别结果，对日志数据进行数据筛选处理，得到低频意图数据；将低频意图数据、预设意图类别的标准文本输入文本对比模型进行相似度预测处理，得到低频意图数据对应的文本对比结果；文本对比模型为基于训练样本集对初始文本对比模型进行训练所得到的模型；训练样本集基于低频意图数据构建；根据文本对比结果与预设相似度阈值，生成低频意图样本。In a specific embodiment, the electronic device includes a memory and one or more programs, wherein the one or more programs are stored in the memory; the one or more programs may include one or more modules, and each module may include a series of computer-executable instructions for the electronic device; and the one or more programs are configured to be executed by one or more processors and include computer-executable instructions for: acquiring log data to be processed, the log data including text and an intent recognition result of the text; performing data screening processing on the log data according to the intent recognition result of the text to obtain low-frequency intent data; inputting the low-frequency intent data and standard text of a preset intent category into a text comparison model for similarity prediction processing to obtain a text comparison result corresponding to the low-frequency intent data, the text comparison model being a model obtained by training an initial text comparison model based on a training sample set, and the training sample set being constructed based on the low-frequency intent data; and generating low-frequency intent samples according to the text comparison result and a preset similarity threshold.
在另一个具体的实施例中,电子设备包括有存储器,以及一个或一个以上的程序,其中一个或者一个以上程序存储于存储器中,且一个或者一个以上程序可以包括一个或一个以上模块,且每个模块可以包括对电子设备中的一系列计算机可执行指令,且经配置以由一个或者一个以上处理器执行该一个或者一个以上程序包含用于进行以下计算机可执行指令:通过样本生成方法生成低频意图样本;将低频意图样本输入初始意图识别模型进行迭代训练,得到意图识别模型。In another specific embodiment, an electronic device includes a memory, and one or more programs, wherein the one or more programs are stored in the memory, and the one or more programs may include one or more modules, and each module may include a series of computer executable instructions in the electronic device, and the one or more programs are configured to be executed by one or more processors, and include the following computer executable instructions: generating low-frequency intent samples by a sample generation method; inputting the low-frequency intent samples into an initial intent recognition model for iterative training to obtain an intent recognition model.
在又一个具体的实施例中,电子设备包括有存储器,以及一个或一个以上的程序,其中一个或者一个以上程序存储于存储器中,且一个或者一个以上程序可以包括一个或一个以上模块,且每个模块可以包括对电子设备中的一系列计算机可执行指令,且经配置以由一个或者一个以上处理器执行该一个或者一个以上程序包含用于进行以下计算机可执行指令:获取用户输入的待识别文本;将待识别文本输入意图识别模型进行意图识别,得到用户意图;意图识别模型是通过将低频意图样本输入初始意图识别模型进行迭代训练所得到的;低频意图样本是通过样本生成方法所生成的;根据用户意图在数字人的系统中获取对应用户意图的目标文本,并对目标文本进行展示。In another specific embodiment, the electronic device includes a memory, and one or more programs, wherein the one or more programs are stored in the memory, and the one or more programs may include one or more modules, and each module may include a series of computer executable instructions in the electronic device, and is configured to be executed by one or more processors. The one or more programs include the following computer executable instructions: obtaining text to be recognized input by the user; inputting the text to be recognized into an intention recognition model for intention recognition to obtain the user intention; the intention recognition model is obtained by inputting low-frequency intention samples into the initial intention recognition model for iterative training; the low-frequency intention samples are generated by a sample generation method; according to the user intention, a target text corresponding to the user intention is obtained in the digital human system, and the target text is displayed.
对应上述描述的一种样本生成方法，或者，对应上述描述的一种意图识别模型的训练方法，或者，对应上述描述的一种应用于数字人的意图识别方法，基于相同的技术构思，本申请实施例还提供一种计算机可读存储介质。Corresponding to the sample generation method described above, or to the training method for an intent recognition model described above, or to the intent recognition method applied to a digital human described above, and based on the same technical concept, an embodiment of the present application further provides a computer-readable storage medium.
在一个具体的实施例中,计算机可读存储介质,用于存储计算机可执行指令,计算机可执行指令在被处理器执行时实现以下流程:获取待处理的日志数据;日志数据包括文本和文本的意图识别结果;根据文本的意图识别结果,对日志数据进行数据筛选处理,得到低频意图数据;将低频意图数据、预设意图类别的标准文本输入文本对比模型进行相似度预测处理,得到低频意图数据对应的文本对比结果;文本对比模型为基于训练样本集对初始文本对比模型进行训练所得到的模型;训练样本集基于低频意图数据构建;根据文本对比结果与预设相似度阈值,生成低频意图样本。In a specific embodiment, a computer-readable storage medium is used to store computer-executable instructions, which implement the following process when executed by a processor: obtaining log data to be processed; the log data includes text and intent recognition results of the text; based on the intent recognition results of the text, the log data is screened to obtain low-frequency intent data; the low-frequency intent data and standard text of a preset intent category are input into a text comparison model for similarity prediction processing to obtain text comparison results corresponding to the low-frequency intent data; the text comparison model is a model obtained by training an initial text comparison model based on a training sample set; the training sample set is constructed based on the low-frequency intent data; and a low-frequency intent sample is generated based on the text comparison result and a preset similarity threshold.
在另一个具体的实施例中,计算机可读存储介质,用于存储计算机可执行指令,计算机可执行指令在被处理器执行时实现以下流程:通过样本生成方法生成低频意图样本;将低频意图样本输入初始意图识别模型进行迭代训练,得到意图识别模型。In another specific embodiment, a computer-readable storage medium is used to store computer-executable instructions, which implement the following process when executed by a processor: generating low-frequency intent samples through a sample generation method; inputting the low-frequency intent samples into an initial intent recognition model for iterative training to obtain an intent recognition model.
在又一个具体的实施例中,计算机可读存储介质,用于存储计算机可执行指令,计算机可执行指令在被处理器执行时实现以下流程:获取用户输入的待识别文本;将待识别文本输入意图识别模型进行意图识别,得到用户意图;意图识别模型是通过将低频意图样本输入初始意图识别模型进行迭代训练所得到的;低频意图样本是通过样本生成方法所生成的;根据用户意图在数字人的系统中获取对应用户意图的目标文本,并对目标文本进行展示。In another specific embodiment, a computer-readable storage medium is used to store computer-executable instructions, which implement the following process when executed by a processor: obtaining a text to be recognized input by a user; inputting the text to be recognized into an intention recognition model for intent recognition to obtain the user's intention; the intention recognition model is obtained by inputting low-frequency intention samples into an initial intention recognition model for iterative training; the low-frequency intention samples are generated by a sample generation method; and according to the user's intention, a target text corresponding to the user's intention is obtained in the digital human system, and the target text is displayed.
需要说明的是,本说明书中关于计算机可读存储介质的实施例与本说明书中关于样本生成方法的实施例、意图识别模型的训练方法的实施例以及应用于数字人的意图识别方法的实施例中的至少一者基于同一发明构思,因此该实施例的具体实施可以参见前述对应方法的实施,重复之处不再赘述。It should be noted that the embodiment of the computer-readable storage medium in this specification and at least one of the embodiments of the sample generation method, the training method of the intent recognition model, and the intent recognition method applied to a digital human in this specification are based on the same inventive concept. Therefore, the specific implementation of this embodiment can refer to the implementation of the corresponding method mentioned above, and the repeated parts will not be repeated.
上述对本说明书特定实施例进行了描述。其它实施例在所附权利要求书的范围内。在一些情况下，在权利要求书中记载的动作或步骤可以按照不同于实施例中的顺序来执行并且仍然可以实现期望的结果。另外，在附图中描绘的过程不一定要求示出的特定顺序或者连续顺序才能实现期望的结果。在某些实施方式中，多任务处理和并行处理也是可以的或者可能是有利的。The above describes specific embodiments of the present specification. Other embodiments are within the scope of the appended claims. In some cases, the actions or steps recited in the claims may be performed in an order different from that in the embodiments and still achieve the desired results. In addition, the processes depicted in the accompanying drawings do not necessarily require the particular order shown, or a sequential order, to achieve the desired results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.
本领域内的技术人员应明白,本申请实施例可提供为方法、系统或计算机程序产品。因此,本申请实施例可采用完全硬件实施例、完全软件实施例、或结合软件和硬件方面的实施例的形式。而且,本说明书可采用在一个或多个其中包含有计算机可用程序代码的计算机可读存储介质(包括但不限于磁盘存储器、CD-ROM、光学存储器等)上实施的计算机程序产品的形式。Those skilled in the art will appreciate that the embodiments of the present application may be provided as methods, systems or computer program products. Therefore, the embodiments of the present application may adopt the form of a complete hardware embodiment, a complete software embodiment, or an embodiment in combination with software and hardware. Moreover, this specification may adopt the form of a computer program product implemented on one or more computer-readable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.
本说明书是参照根据本说明书实施例的方法、设备(系统)、和计算机程序产品的流程图和/或方框图来描述的。应理解可由计算机程序指令实现流程图和/或方框图中的每一流程和/或方框、以及流程图和/或方框图中的流程和/或方框的结合。可提供这些计算机程序指令到通用计算机、专用计算机、嵌入式处理机或其他可编程设备的处理器以产生一个机器,使得通过计算机或其他可编程设备的处理器执行的指令产生用于实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能的装置。This specification is described with reference to the flowchart and/or block diagram of the method, device (system), and computer program product according to the embodiment of this specification. It should be understood that each process and/or box in the flowchart and/or block diagram, as well as the combination of the process and/or box in the flowchart and/or block diagram can be implemented by computer program instructions. These computer program instructions can be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor or other programmable device to produce a machine, so that the instructions executed by the processor of the computer or other programmable device produce a device for implementing the functions specified in one process or multiple processes in the flowchart and/or one box or multiple boxes in the block diagram.
这些计算机程序指令也可存储在能引导计算机或其他可编程设备以特定方式工作的计算机可读存储器中,使得存储在该计算机可读存储器中的指令产生包括指令装置的制造品,该指令装置实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能。These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable device to work in a specific manner, so that the instructions stored in the computer-readable memory produce a manufactured product including an instruction device that implements the functions specified in one or more processes in the flowchart and/or one or more boxes in the block diagram.
这些计算机程序指令也可装载到计算机或其他可编程设备上,使得在计算机或其他可编程设备上执行一系列操作步骤以产生计算机实现的处理,从而在计算机或其他可编程设备上执行的指令提供用于实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能的步骤。These computer program instructions may also be loaded onto a computer or other programmable device so that a series of operational steps are executed on the computer or other programmable device to produce a computer-implemented process, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more processes in the flowchart and/or one or more boxes in the block diagram.
在一个典型的配置中,计算设备包括一个或多个处理器(CPU)、输入/输出接口、网络接口和内存。In a typical configuration, a computing device includes one or more processors (CPU), input/output interfaces, network interfaces, and memory.
内存可能包括计算机可读介质中的非永久性存储器，随机存取存储器(RAM)和/或非易失性内存等形式，如只读存储器(ROM)或闪存(flash RAM)。内存是计算机可读介质的示例。The memory may include non-permanent storage in a computer-readable medium, in the form of random access memory (RAM) and/or non-volatile memory such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
计算机可读介质包括永久性和非永久性、可移动和非可移动媒体，可以由任何方法或技术来实现信息存储。信息可以是计算机可读指令、数据结构、程序的模块或其他数据。计算机的存储介质的例子包括，但不限于相变内存（PRAM）、静态随机存取存储器（SRAM）、动态随机存取存储器（DRAM）、其他类型的随机存取存储器（RAM）、只读存储器（ROM）、电可擦除可编程只读存储器（EEPROM）、快闪记忆体或其他内存技术、只读光盘只读存储器（CD-ROM）、数字多功能光盘（DVD）或其他光学存储、磁盒式磁带，磁盘存储或其他磁性存储设备或任何其他非传输介质，可用于存储可以被计算设备访问的信息。按照本文中的界定，计算机可读介质不包括暂存电脑可读媒体（transitory media），如调制的数据信号和载波。Computer-readable media include permanent and non-permanent, removable and non-removable media, and information storage can be implemented by any method or technology. Information can be computer-readable instructions, data structures, program modules or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassettes, disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory media such as modulated data signals and carrier waves.
还需要说明的是,术语“包括”、“包含”或者其任何其他变体意在涵盖非排他性的包含,从而使得包括一系列要素的过程、方法、商品或者设备不仅包括那些要素,而且还包括没有明确列出的其他要素,或者是还包括为这种过程、方法、商品或者设备所固有的要素。在没有更多限制的情况下,由语句“包括一个……”限定的要素,并不排除在包括所述要素的过程、方法、商品或者设备中还存在另外的相同要素。It should also be noted that the terms "include", "comprises" or any other variations thereof are intended to cover non-exclusive inclusion, so that a process, method, commodity or device including a series of elements includes not only those elements, but also other elements not explicitly listed, or also includes elements inherent to such process, method, commodity or device. In the absence of more restrictions, the elements defined by the sentence "comprises a ..." do not exclude the existence of other identical elements in the process, method, commodity or device including the elements.
本申请实施例可以在由计算机执行的计算机可执行指令的一般上下文中描述,例如程序模块。一般地,程序模块包括执行特定任务或实现特定抽象数据类型的例程、程序、对象、组件、数据结构等等。也可以在分布式计算环境中实践本说明书的一个或多个实施例,在这些分布式计算环境中,由通过通信网络而被连接的远程处理设备来执行任务。在分布式计算环境中,程序模块可以位于包括存储设备在内的本地和远程计算机存储介质中。Embodiments of the present application may be described in the general context of computer-executable instructions executed by a computer, such as program modules. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform specific tasks or implement specific abstract data types. One or more embodiments of the present specification may also be practiced in distributed computing environments where tasks are performed by remote processing devices connected via a communications network. In a distributed computing environment, program modules may be located in local and remote computer storage media, including storage devices.
The embodiments in this specification are described in a progressive manner; for identical or similar parts, the embodiments may refer to one another, and each embodiment focuses on its differences from the others. In particular, since the system embodiment is substantially similar to the method embodiment, its description is relatively brief; for relevant details, refer to the corresponding description of the method embodiment.
The above descriptions are merely embodiments of this document and are not intended to limit it. Those skilled in the art may make various modifications and variations to this document. Any modification, equivalent replacement, improvement, or the like made within the spirit and principles of this document shall fall within the scope of the claims of this document.

Claims (17)

  1. A sample generation method, comprising:
    obtaining log data to be processed, wherein the log data includes a text and an intent recognition result of the text;
    performing data screening on the log data according to the intent recognition result of the text to obtain low-frequency intent data;
    inputting the low-frequency intent data and a standard text of a preset intent category into a text comparison model for similarity prediction to obtain a text comparison result corresponding to the low-frequency intent data, wherein the text comparison model is obtained by training an initial text comparison model on a training sample set, and the training sample set is constructed based on the low-frequency intent data; and
    generating a low-frequency intent sample according to the text comparison result and a preset similarity threshold.
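The four steps of claim 1 can be read as a filter-score-select pipeline. The following is a minimal illustrative sketch, not the patented implementation: the intent labels, the standard text, and the `similarity` function (a `difflib` stand-in for the trained text comparison model) are all invented for illustration.

```python
# Hedged sketch of the claim-1 pipeline. HIGH_FREQ_INTENTS, STANDARD_TEXTS,
# SIM_THRESHOLD, and the similarity() stand-in are assumptions.
from difflib import SequenceMatcher

HIGH_FREQ_INTENTS = {"check_balance", "reset_password"}   # assumed labels
STANDARD_TEXTS = {"report_fraud": "I want to report a fraudulent charge"}
SIM_THRESHOLD = 0.5

def similarity(a: str, b: str) -> float:
    # Placeholder for the trained text comparison model's similarity prediction.
    return SequenceMatcher(None, a, b).ratio()

def generate_low_freq_samples(log_data):
    # Step 1: screen out records whose recognized intent is high-frequency.
    low_freq = [r for r in log_data if r["intent"] not in HIGH_FREQ_INTENTS]
    # Steps 2-3: score each remaining record against the standard text of a
    # preset intent category and keep those at or above the threshold.
    samples = []
    for record in low_freq:
        for intent, std_text in STANDARD_TEXTS.items():
            if similarity(record["text"], std_text) >= SIM_THRESHOLD:
                samples.append({"text": record["text"], "intent": intent})
    return samples

logs = [
    {"text": "check my balance please", "intent": "check_balance"},
    {"text": "I want to report a fraud charge", "intent": "unknown"},
]
candidates = generate_low_freq_samples(logs)
```

In this toy run, the balance query is removed as high-frequency data at step 1, and the fraud query is retained as a low-frequency intent sample matched to the `report_fraud` category.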
  2. The method according to claim 1, wherein performing data screening on the log data according to the intent recognition result of the text to obtain the low-frequency intent data comprises:
    inputting the log data into a high-frequency intent classification model to obtain first log data and a confidence of an intent classification result of the first log data, wherein the intent classification result of the first log data is a preset high-frequency intent, and the high-frequency intent classification model is configured to perform intent classification on the log data according to the intent recognition result of the text in the log data; and
    performing data screening on the log data according to the first log data and the confidence of the intent classification result of the first log data to obtain the low-frequency intent data.
  3. The method according to claim 2, wherein performing data screening on the log data according to the first log data and the confidence of the intent classification result of the first log data to obtain the low-frequency intent data comprises:
    determining high-frequency intent data according to a comparison between the confidence of the intent classification result of the first log data and a preset confidence threshold; and
    deleting the high-frequency intent data from the log data to obtain the low-frequency intent data.
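Claims 2-3 describe screening by classifier confidence: a record is treated as high-frequency data only when the classifier both labels it with a preset high-frequency intent and does so with confidence at or above a threshold. A hedged sketch, in which `classify` is a hypothetical stand-in for the high-frequency intent classification model and the threshold value is invented:

```python
# Sketch of the confidence-based screening in claims 2-3; classify() and
# CONF_THRESHOLD are assumptions, not the patented model.
CONF_THRESHOLD = 0.9

def classify(record):
    # Placeholder classifier: returns (is_high_frequency, confidence).
    return record["intent"] == "check_balance", record.get("conf", 0.0)

def screen_low_freq(log_data):
    high_freq = []
    for record in log_data:
        is_high, conf = classify(record)
        # Only confident high-frequency classifications count as
        # high-frequency intent data (claim 3).
        if is_high and conf >= CONF_THRESHOLD:
            high_freq.append(record)
    # Deleting the high-frequency data from the logs leaves the
    # low-frequency intent data.
    return [r for r in log_data if r not in high_freq]

logs = [
    {"text": "balance?", "intent": "check_balance", "conf": 0.97},
    {"text": "odd request", "intent": "check_balance", "conf": 0.55},
    {"text": "report fraud", "intent": "unknown", "conf": 0.99},
]
low_freq_data = screen_low_freq(logs)
```

Note that the low-confidence `check_balance` record survives the screen: uncertain classifications are deliberately kept as potential low-frequency material.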
  4. The method according to claim 1, wherein
    the initial text comparison model includes an encoder and a similarity prediction module connected in sequence, and the output of the encoder is the input of the similarity prediction module;
    the method further comprises:
    encoding, by the encoder, the low-frequency intent data to obtain similar sample pairs and non-similar sample pairs corresponding to the low-frequency intent data; and
    iteratively training the similarity prediction module on the similar sample pairs and the non-similar sample pairs corresponding to the low-frequency intent data.
  5. The method according to claim 4, wherein the low-frequency intent data includes a target text and a non-target text;
    the method further comprises: encoding, by the encoder, the target text to obtain a target encoding result and a similar encoding result corresponding to the target text, and encoding the non-target text to obtain an encoding result corresponding to the non-target text; and
    determining the target encoding result and the similar encoding result corresponding to the target text as a similar sample pair corresponding to the low-frequency intent data, and determining the target encoding result corresponding to the target text and the encoding result corresponding to the non-target text as a non-similar sample pair corresponding to the low-frequency intent data.
  6. The method according to claim 4, wherein the encoder includes an attention layer and a fully connected layer connected in sequence, and the output of the attention layer is the input of the fully connected layer;
    the method further comprises:
    performing, by the attention layer, first encoding according to a preset first dropout probability and the low-frequency intent data to obtain intermediate encoded data; and
    performing, by the fully connected layer, conversion according to a preset second dropout probability and the intermediate encoded data to obtain the similar sample pairs and the non-similar sample pairs corresponding to the low-frequency intent data.
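The pair construction in claims 4-6 relies on random dropout: encoding the same text twice under independent dropout masks yields two distinct vectors that can serve as a similar (positive) pair, while encodings of different texts serve as non-similar (negative) pairs. The following numerical sketch illustrates only that mechanism; the 8-dimensional vectors stand in for the attention-layer output and are invented, not taken from the patent.

```python
# Minimal numerical sketch of dropout-based pair construction. The vectors,
# dimensions, and dropout probabilities are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
P_DROP_1, P_DROP_2 = 0.1, 0.1   # preset first/second dropout probabilities

def encode(vec, p):
    # Inverted dropout: zero components with probability p, rescale the rest.
    mask = rng.random(vec.shape) >= p
    return vec * mask / (1.0 - p)

target = rng.random(8)       # stand-in for one low-frequency intent text
non_target = rng.random(8)   # stand-in for a different text

# Two dropout passes over the SAME input -> a similar sample pair.
pos_a = encode(encode(target, P_DROP_1), P_DROP_2)
pos_b = encode(encode(target, P_DROP_1), P_DROP_2)
# Target vs. non-target encoding -> a non-similar sample pair.
neg = encode(encode(non_target, P_DROP_1), P_DROP_2)
```

This is the same augmentation idea used in dropout-based contrastive sentence encoders: the two passes differ only in which components were randomly zeroed, so the pair is "similar" by construction.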
  7. The method according to claim 1, wherein the low-frequency intent data includes a plurality of low-frequency intent texts, and the method further comprises:
    determining, by the text comparison model, each low-frequency intent text and the standard text of the preset intent category as a similar sample pair corresponding to that low-frequency intent text; and
    performing similarity prediction on the similar sample pair corresponding to each low-frequency intent text to obtain a similarity score of each low-frequency intent text, and determining the similarity score of each low-frequency intent text as the text comparison result corresponding to the low-frequency intent data.
  8. The method according to any one of claims 1-7, wherein generating the low-frequency intent sample according to the text comparison result and the preset similarity threshold comprises:
    determining a number of similar sample data corresponding to the preset similarity threshold according to a comparison between the preset similarity threshold and the text comparison result; and
    if the number of similar sample data corresponding to the low-frequency intent data is less than a preset number threshold, subtracting a preset reduction value from the current similarity threshold to obtain an updated similarity threshold, and determining an updated number of similar sample data corresponding to the updated similarity threshold according to a comparison between the updated similarity threshold and the text comparison result; and if the number of samples is greater than or equal to the preset number threshold, determining the number of similar sample data corresponding to that similarity threshold as the final updated number.
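Claim 8 describes an adaptive threshold: if too few texts score above the similarity threshold, lower the threshold by a preset reduction value and recount, stopping once enough samples qualify. A hedged sketch of that loop, with all constants (initial threshold, minimum count, step size) invented for illustration:

```python
# Sketch of the adaptive thresholding in claim 8; the constants are assumptions.
def select_samples(scores, threshold=0.9, min_count=3, step=0.1, floor=0.0):
    while threshold >= floor:
        kept = [s for s in scores if s >= threshold]
        if len(kept) >= min_count:
            return threshold, kept
        threshold -= step   # subtract the preset reduction value
    # Fall back to the last threshold tried if the floor is reached.
    return threshold + step, [s for s in scores if s >= threshold + step]

scores = [0.95, 0.82, 0.78, 0.40]
final_threshold, kept = select_samples(scores)
```

With these toy scores, the threshold is lowered twice (0.9 → 0.8 → 0.7) before three samples qualify, so the loop returns three similar sample data at a threshold of roughly 0.7.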
  9. The method according to claim 1, wherein performing data screening on the log data according to the intent recognition result of the text to obtain the low-frequency intent data comprises:
    determining, according to the intent recognition result of the text, whether the intent recognition result of the text is a preset high-frequency intent;
    if the intent recognition result of the text is a preset high-frequency intent, deleting the text and the intent recognition result of the text from the log data; and
    if the intent recognition result of the text is not a preset high-frequency intent, using the text and the intent recognition result of the text as the low-frequency intent data.
  10. The method according to claim 1, wherein obtaining the log data to be processed comprises:
    obtaining dialogue data in the log data to be processed, wherein the dialogue data includes a question text raised by a customer and a response text of a robot to the customer;
    querying, according to the response text and a preconfigured correspondence between intent recognition results and response texts, to obtain the intent recognition result of the question text; and
    determining the question text as the text in the log data, and determining the intent recognition result of the question text as the intent recognition result of the text in the log data.
  11. The method according to claim 4, wherein the sample generation method further comprises:
    in a case where the text lengths of the non-similar sample pair differ, performing length extension on the text with the shorter length in the non-similar sample pair by using punctuation marks.
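The length equalization in claim 11 can be sketched in a few lines. The choice of `"."` as the padding punctuation mark is an assumption for illustration; the claim only specifies that punctuation is used:

```python
# Hedged sketch of claim 11: pad the shorter text of a non-similar pair
# with punctuation so both members have equal length. The "." is assumed.
def pad_pair(a: str, b: str, pad: str = ".") -> tuple[str, str]:
    if len(a) < len(b):
        a = a + pad * (len(b) - len(a))
    elif len(b) < len(a):
        b = b + pad * (len(a) - len(b))
    return a, b

x, y = pad_pair("hi", "hello")
```

Padding with punctuation rather than words keeps the lengths matched without injecting semantic content that could make a non-similar pair spuriously similar.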
  12. A method for training an intent recognition model, comprising:
    generating a low-frequency intent sample by the sample generation method according to any one of claims 1-11; and
    inputting the low-frequency intent sample into an initial intent recognition model for iterative training to obtain an intent recognition model.
  13. An intent recognition method applied to a digital human, comprising:
    obtaining a to-be-recognized text input by a user;
    inputting the to-be-recognized text into an intent recognition model for intent recognition to obtain a user intent, wherein the intent recognition model is obtained by inputting low-frequency intent samples into an initial intent recognition model for iterative training, and the low-frequency intent samples are generated by the sample generation method according to any one of claims 1-11; and
    obtaining, in a system of the digital human according to the user intent, a target text corresponding to the user intent, and displaying the target text.
  14. The method according to claim 13, wherein obtaining, in the system of the digital human according to the user intent, the target text corresponding to the user intent comprises:
    querying, in the system of the digital human according to the user intent and a preconfigured correspondence between preset user intents and preset texts, to obtain the target text corresponding to the user intent.
  15. A sample generation apparatus, comprising:
    a first obtaining unit, configured to obtain log data to be processed, wherein the log data includes a text and an intent recognition result of the text;
    a screening unit, configured to perform data screening on the log data according to the intent recognition result of the text to obtain low-frequency intent data;
    a prediction unit, configured to input the low-frequency intent data and a standard text of a preset intent category into a text comparison model for similarity prediction to obtain a text comparison result corresponding to the low-frequency intent data, wherein the text comparison model is obtained by training an initial text comparison model on a training sample set, and the training sample set is constructed based on the low-frequency intent data; and
    a first generation unit, configured to generate a low-frequency intent sample according to the text comparison result and a preset similarity threshold.
  16. An electronic device, comprising:
    a processor; and a memory configured to store computer-executable instructions that, when executed, cause the processor to perform the sample generation method according to any one of claims 1-11, or the method for training an intent recognition model according to claim 12, or the intent recognition method applied to a digital human according to any one of claims 13-14.
  17. A computer-readable storage medium storing computer-executable instructions that, when executed by a processor, implement the sample generation method according to any one of claims 1-11, or the method for training an intent recognition model according to claim 12, or the intent recognition method applied to a digital human according to any one of claims 13-14.
PCT/CN2023/120564 2022-09-26 2023-09-22 Sample generation method and apparatus, and electronic device and storage medium WO2024067377A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202211178539.7A CN117807987A (en) 2022-09-26 2022-09-26 Sample generation method, device, electronic equipment and storage medium
CN202211178539.7 2022-09-26

Publications (1)

Publication Number Publication Date
WO2024067377A1 true WO2024067377A1 (en) 2024-04-04

Family

ID=90424072

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/120564 WO2024067377A1 (en) 2022-09-26 2023-09-22 Sample generation method and apparatus, and electronic device and storage medium

Country Status (2)

Country Link
CN (1) CN117807987A (en)
WO (1) WO2024067377A1 (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20210089724A1 (en) * 2019-09-25 2021-03-25 Google Llc Contrastive Pre-Training for Language Tasks
US20220164532A1 (en) * 2020-11-23 2022-05-26 International Business Machines Corporation Text data protection against automated analysis
CN114610851A (en) * 2022-03-30 2022-06-10 苏州科达科技股份有限公司 Method for training intention recognition model, intention recognition method, apparatus and medium
CN114661909A (en) * 2022-03-25 2022-06-24 鼎富智能科技有限公司 Intention recognition model training method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN117807987A (en) 2024-04-02

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23870609

Country of ref document: EP

Kind code of ref document: A1