CN114510570A - Intention classification method and device based on small sample corpus and computer equipment - Google Patents

Intention classification method and device based on small sample corpus and computer equipment

Info

Publication number
CN114510570A
Authority
CN
China
Prior art keywords
classification
sample
data set
classification model
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210071898.6A
Other languages
Chinese (zh)
Inventor
吴粤敏
舒畅
陈又新
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd
Priority to CN202210071898.6A
Priority to PCT/CN2022/090659 (WO2023137911A1)
Publication of CN114510570A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F 16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G06F 18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G06F 18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F 18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/047 Probabilistic or stochastic networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/048 Activation functions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Biophysics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiment of the application provides an intention classification method and device based on a small sample corpus, and computer equipment. The method comprises the following steps: constructing a first sample data set, wherein the first sample data set comprises a labeled sample data set and an unlabeled sample data set; obtaining a plurality of initial weak classification models; training each initial weak classification model based on the labeled sample data set to obtain target weak classification models; inputting the unlabeled sample data set into each target weak classification model to obtain first prediction classification labels; constructing a second sample data set based on the unlabeled sample data set and the first prediction classification labels; training an initial intention classification model based on the second sample data set to obtain a target intention classification model, performing intention classification by using the target intention classification model, and outputting a classification result. Through the steps, more accurate intention classification can be achieved on the basis of a small number of labeled samples and unlabeled samples.

Description

Intention classification method and device based on small sample corpus and computer equipment
Technical Field
The application relates to the field of artificial intelligence, in particular to an intention classification method and device based on a small sample corpus and computer equipment.
Background
Intention classification is an important function of a chatbot: the robot needs to recognize the user's intention. For example, user A says "help me check what the weather is like tomorrow"; the intent of user A is "check weather". User B says "Hello, what is your name?"; the intent of user B is chit-chat. Only by accurately identifying the user's intention can the chatbot lay the foundation for subsequent processing (such as word slot extraction) and improve user satisfaction.
Robot intentions can generally be divided into a chit-chat type and a task type. In the related art, intention classification is usually realized by a supervised-learning-based method, which needs a large amount of manually labeled data. In an actual application scenario, there may be few or almost no usable corpora related to the actual service at first; usable corpora are obtained only as session logs accumulate over a period of time, and manually labeling this corpus data consumes a great deal of manpower, material resources and financial resources.
Disclosure of Invention
To solve the problems of the prior art at least to some extent, embodiments of the present application provide an intention classification method and device based on a small sample corpus, and computer equipment, which can obtain an intention classification model that does not need fine-tuning on the basis of a small number of labeled samples and unlabeled samples, thereby saving manpower, material resources and financial resources while realizing intention classification.
The technical scheme of the embodiment of the application is as follows:
in a first aspect, the present application provides a method for intent classification based on a small sample corpus, the method comprising:
constructing a first sample data set, wherein the first sample data set comprises a labeled sample data set and an unlabeled sample data set, the labeled sample data set comprises a plurality of first sample corpora and labeled labels corresponding to the first sample corpora, and the unlabeled sample data set comprises a plurality of unlabeled second sample corpora;
obtaining a plurality of initial weak classification models, wherein the initial weak classification models are provided with different template-classification label pairs, and the template-classification label pairs are used for representing the mapping relation between a template and a classification label;
training each initial weak classification model based on the labeled sample data set to obtain target weak classification models which correspond to the initial weak classification models one by one;
inputting each second sample corpus in the unlabeled sample data set into each target weak classification model to obtain a first prediction classification label corresponding to the second sample corpus, which is output by each target weak classification model;
constructing a second sample data set based on all the second sample corpora and the first prediction classification labels corresponding to the second sample corpora;
training an initial intention classification model based on the second sample data set to obtain a target intention classification model;
and performing intention classification by using the target intention classification model, and outputting an intention classification result.
According to some embodiments of the present application, the set of tagged sample data comprises a training set of tagged samples and a testing set of tagged samples;
training each initial weak classification model based on the labeled sample data set to obtain target weak classification models corresponding to the initial weak classification models one by one, and the method comprises the following steps:
training each initial weak classification model based on the labeled sample training set to obtain target weak classification models which correspond to the initial weak classification models one by one;
after obtaining the target weak classification models corresponding to the initial weak classification models one by one, the method further includes:
traversing each target weak classification model, and aiming at the currently traversed target weak classification model, executing the following processing:
inputting each first sample corpus in the labeled sample test set into the target weak classification model, so that the target weak classification model outputs a second predicted classification label corresponding to each first sample corpus;
and determining the prediction accuracy of the target weak classification model according to the second prediction classification label and the labeling label corresponding to each first sample corpus.
According to some embodiments of the present application, the training each of the initial weak classification models based on the training set of labeled samples to obtain target weak classification models corresponding to the plurality of initial weak classification models one by one includes:
inputting each first sample corpus of the labeled sample training set into the initial weak classification model, so that the initial weak classification model outputs a third predicted classification label;
determining a value of a first loss function of the initial weak classification model according to the third prediction classification label and the labeling label corresponding to each first sample corpus;
under the condition that the value of the first loss function meets a preset training end condition, ending the training to obtain the target weak classification model;
and under the condition that the value of the first loss function does not meet a preset training end condition, adjusting the model parameters of the initial weak classification model, and continuing training the initial weak classification model based on the training set with the labeled samples.
According to some embodiments of the present application, the constructing a second sample data set based on all the second sample corpora and the first prediction classification tags corresponding to the second sample corpora includes:
determining a first prediction classification label corresponding to each second sample corpus according to the prediction accuracy of each target weak classification model and a fourth prediction classification label output by each target weak classification model;
and constructing the second sample data set according to all the second sample corpora and the first prediction classification labels corresponding to the second sample corpora.
According to some embodiments of the present application, the first prediction classification label corresponding to each second sample corpus is determined according to the prediction accuracy of each target weak classification model and the fourth prediction classification label output by each target weak classification model, and the calculation formula is as follows:

S(l|x) = (1/Z) Σ_p ω(p) · S_p(l|x), where Z = Σ_p ω(p)

wherein ω(p) represents the prediction accuracy, S_p(l|x) represents the fourth prediction classification label, Z is the normalization term over all template-classification label pairs, p represents the template-classification label pair, x represents the second sample corpus, and l represents the first prediction classification label.
According to some embodiments of the present application, training the initial intention classification model based on the second sample data set to obtain a target intention classification model includes:
inputting each second sample corpus of the second sample data set into the initial intention classification model, so that the initial intention classification model outputs a fifth predicted classification label;
determining a value of a second loss function of the initial intention classification model according to the fifth prediction classification label and the first prediction classification label corresponding to each second sample corpus;
under the condition that the value of the second loss function meets a preset training end condition, ending the training to obtain the target intention classification model;
and under the condition that the value of the second loss function does not meet the preset training end condition, adjusting the model parameters of the initial intention classification model, and continuing training the initial intention classification model based on the second sample data set.
According to some embodiments of the present application, the determining a value of a second loss function of the initial intention classification model according to the fifth prediction classification label and the first prediction classification label corresponding to each of the second sample corpora includes:
calculating the distance between the distribution of the fifth prediction classification label and the distribution of the first prediction classification label by utilizing the KL divergence to obtain a KL divergence value;
determining a value of the second loss function according to the KL divergence value.
In a second aspect, the present application provides an intent classification apparatus based on a small sample corpus, comprising:
the data set construction module is used for constructing a first sample data set, wherein the first sample data set comprises a labeled sample data set and a non-labeled sample data set, and the labeled data set comprises a plurality of first sample corpora and labeled labels corresponding to the first sample corpora, and the non-labeled sample data set comprises a plurality of second sample corpora without labeled labels;
the model acquisition module is used for acquiring a plurality of initial weak classification models, wherein the initial weak classification models are provided with different template-classification label pairs, and the template-classification label pairs are used for representing the mapping relation between a template and a classification label;
the first training module is used for training each initial weak classification model based on the sample data set with the labels to obtain target weak classification models which correspond to the initial weak classification models one by one;
the data labeling module is used for inputting each second sample corpus in the unlabeled sample data set into each target weak classification model to obtain a first prediction classification label which is output by each target weak classification model and corresponds to the second sample corpus;
the data set construction module is further used for constructing a second sample data set based on all the second sample corpora and the first prediction classification labels corresponding to the second sample corpora;
the second training module is used for training the initial intention classification model based on the second sample data set to obtain a target intention classification model;
and the processing module is used for carrying out intention classification by utilizing the target intention classification model and outputting an intention classification result.
In a third aspect, the present application provides a computer device comprising a memory and a processor, the memory having stored therein computer-readable instructions which, when executed by one or more of the processors, cause the one or more processors to perform the steps of any one of the methods described above in the first aspect.
In a fourth aspect, the present application also provides a computer-readable storage medium readable by a processor, the storage medium storing computer instructions which, when executed by one or more processors, cause the one or more processors to perform the steps of any of the methods described above in the first aspect.
The technical scheme provided by the embodiment of the application has the following beneficial effects:
according to the embodiment of the application, a first sample data set is constructed, and the first sample data set comprises a sample data set with a label and a sample data set without the label; acquiring a plurality of initial weak classification models, wherein the initial weak classification models are provided with different template-classification label pairs, and the template-classification label pairs are used for representing the mapping relation between a template and a classification label; training each initial weak classification model based on the sample data set with the labels to obtain target weak classification models corresponding to the initial weak classification models one by one, and realizing a text blank filling form through a template-classification label pair so as to improve the accuracy rate of the predicted vocabulary; inputting each second sample corpus in the unlabeled sample data set into each target weak classification model to obtain a first prediction classification label corresponding to the second sample corpus, which is output by each target weak classification model; constructing a second sample data set based on all the second sample corpora and the first prediction classification labels corresponding to the second sample corpora; training the initial intention classification model based on the second sample data set to obtain a target intention classification model so as to increase the generalization of the model, performing intention classification by using the target intention classification model, and outputting an intention classification result. According to the embodiment of the application, the intention classification model which does not need to be finely adjusted can be obtained on the basis of a small number of marked samples and unmarked samples, so that the manpower, material resources and financial resources are saved, and the intention classification is realized.
Drawings
FIG. 1 is a flow chart illustrating a method for classifying intent based on a small sample corpus according to an embodiment of the present application;
FIG. 2 is a flow chart illustrating the sub-steps of step S130 in FIG. 1;
FIG. 3 is a flow chart illustrating a method for small sample corpus-based intent classification according to another embodiment of the present application;
FIG. 4 is a flow chart illustrating the sub-steps of step S131 in FIG. 2;
FIG. 5 is a flow chart illustrating the sub-steps of step S150 in FIG. 1;
FIG. 6 is a flow chart illustrating the sub-steps of step S160 in FIG. 1;
FIG. 7 is a flowchart illustrating the sub-steps of step S162 of FIG. 6;
FIG. 8 is a schematic structural diagram of an intent classification apparatus based on a small sample corpus according to an embodiment of the present application;
FIG. 9 is a schematic diagram of an initial weak classification network structure of a small sample corpus-based intention classification device according to an embodiment of the present application;
FIG. 10 is a schematic diagram of a target weak classification network of the small sample corpus-based intention classification device according to an embodiment of the present application;
FIG. 11 is a schematic diagram illustrating an initial intent classification network structure of a small sample corpus-based intent classification apparatus according to an embodiment of the present application;
fig. 12 is a schematic structural diagram of a computer device provided in an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
It is to be noted that, unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used herein is for the purpose of describing embodiments of the present application only and is not intended to be limiting of the application.
First, several terms referred to in the present application are explained:
artificial Intelligence (AI): is a new technical science for researching and developing theories, methods, technologies and application systems for simulating, extending and expanding human intelligence; artificial intelligence is a branch of computer science that attempts to understand the essence of intelligence and produces a new intelligent machine that can react in a manner similar to human intelligence, and research in this field includes robotics, language recognition, image recognition, natural language processing, and expert systems, among others. The artificial intelligence can simulate the information process of human consciousness and thinking. Artificial intelligence is also a theory, method, technique and application system that uses a digital computer or a machine controlled by a digital computer to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge and use the knowledge to obtain the best results.
Natural Language Processing (NLP): the technology of interactive communication with machines using the natural language that humans use to communicate. By processing natural language, the computer can read and understand it. Relevant research in natural language processing began with human exploration of machine translation. Although natural language processing involves operations at multiple levels such as phonetics, grammar, semantics and pragmatics, put simply, its basic task is to segment the corpus to be processed into term units that take the minimal part of speech as a unit and are rich in semantics, based on an ontology dictionary, word frequency statistics, contextual semantic analysis and the like.
Prompt template: a prompt template is a function f(x) that, given an input text x, converts it into a set format. It uses a template, typically a piece of natural language, containing two empty positions: a position [X] for filling in the input x and a position [Z] for generating the answer text z. The input x is filled into the position [X], leaving an empty position for the answer; this position is usually inside the sentence or at the end of the sentence. If it is inside the sentence, the prompt is called a cloze prompt; if it is at the end of the sentence, the prompt is called a prefix prompt. The position and number of [X] and [Z] may affect the result, and can therefore be adjusted flexibly as required. The template format can be defined manually according to the actual business situation or generated automatically through a neural network.
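By way of a non-limiting illustration, the formatting function f(x) described above may be sketched as follows; the concrete template strings and the [mask] token are assumed purely for illustration.

```python
# Illustrative sketch of prompt templates as formatting functions f(x).
# The answer position sits inside the sentence for a cloze prompt
# and at the end of the sentence for a prefix prompt.

def cloze_prompt(x: str) -> str:
    # [mask] appears inside the sentence -> cloze prompt
    return f"It was [mask]. {x}"

def prefix_prompt(x: str) -> str:
    # [mask] appears at the end of the sentence -> prefix prompt
    return f"{x} It was [mask]."

print(cloze_prompt("what is your name"))   # "It was [mask]. what is your name"
print(prefix_prompt("what is your name"))  # "what is your name It was [mask]."
```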
Bidirectional Encoder Representations from Transformers (BERT): the BERT model is an NLP model built on the Transformer architecture that further increases the generalization capability of word vector models and fully describes character-level, word-level, sentence-level and even inter-sentence relationship features. There are three kinds of embedding vectors in BERT: token embeddings, segment embeddings and position embeddings. Token embeddings are the word vectors; the first token is the [CLS] mark, which can be used for subsequent classification tasks. Segment embeddings are used to distinguish two sentences, because pre-training performs not only language modeling but also a classification task that takes two sentences as input. Position embeddings are not the trigonometric functions used in the original Transformer; instead, BERT directly trains a position embedding to preserve position information: a vector is randomly initialized at each position and learned during model training, finally yielding an embedding that contains the position information, which BERT combines with the token embedding.
Loss function: a loss function maps the value of a random event, or of its associated random variable, to a non-negative real number to represent the "risk" or "loss" of the random event. In applications, the loss function is usually associated with an optimization problem as a learning criterion, i.e. the model is solved and evaluated by minimizing the loss function.
Cross-entropy loss function: the cross-entropy loss function measures the difference between the true value and the predicted value, and the quality of a prediction model is judged by the loss value. Cross-entropy loss is often used in classification problems, particularly when neural networks are used for classification; since it involves calculating the probability of each class, it almost always appears together with a sigmoid (or softmax) function. In the binary case, the model's final prediction has only two outcomes, and the probabilities obtained for the two classes are p and 1-p.
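As a non-limiting sketch of the binary case described above (assumed function and variable names; y is the true label and p the predicted probability of the positive class):

```python
import math

# Illustrative sketch of binary cross-entropy for a single sample.
def binary_cross_entropy(y: int, p: float, eps: float = 1e-12) -> float:
    p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

print(binary_cross_entropy(1, 0.9))  # small loss: confident, correct prediction
print(binary_cross_entropy(1, 0.1))  # large loss: confident, wrong prediction
```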
KL divergence (Kullback-Leibler divergence) loss function: the KL divergence loss function is an asymmetric measure of the difference between two probability distributions; the difference between the two distributions is judged by the loss value. The relative entropy of two random distributions is zero when they are identical and increases as the difference between them increases. The KL divergence can be used to compare the similarity of texts: first count the word frequencies, and then calculate the KL divergence.
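As a non-limiting sketch (assumed function and variable names), the KL divergence between two discrete distributions, for example the normalized word-frequency distributions of two texts, may be computed as follows:

```python
import math

# Illustrative sketch of the KL divergence between two discrete distributions p and q.
def kl_divergence(p, q, eps: float = 1e-12) -> float:
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

p = [0.5, 0.3, 0.2]
print(kl_divergence(p, [0.5, 0.3, 0.2]))  # 0.0: identical distributions
print(kl_divergence(p, [0.1, 0.1, 0.8]))  # > 0: the distributions differ
```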
Based on this, the embodiments of the application provide an intention classification method and device based on a small sample corpus, and computer equipment, which can obtain an intention classification model that does not need fine-tuning on the basis of a small number of labeled samples and unlabeled samples, thereby realizing intention classification.
The intention classification method based on the small sample corpus provided by the embodiment of the application can be used for acquiring and processing related data based on an Artificial Intelligence (AI) technology. AI is a theory, method, technique and application system that uses a digital computer or a machine controlled by a digital computer to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge and use the knowledge to obtain the best results. As artificial intelligence technology has been researched and developed in a wide variety of fields, it is believed that with the development of technology, artificial intelligence technology will be applied in more fields and will play an increasingly important role.
Embodiments of the application are operational with numerous general purpose or special purpose computing system environments or configurations. For example: personal computers, server computers, hand-held or portable devices, tablet-type devices, multiprocessor systems, microprocessor-based systems, set top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, and the like. The application may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The application may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
Referring to fig. 1, fig. 1 shows a schematic flow chart of an intent classification method based on small sample corpus according to an embodiment of the present application. The method includes, but is not limited to, step S110, step S120, step S130, step S140, step S150, step S160, and step S170.
Step S110, a first sample data set is constructed, wherein the first sample data set comprises a labeled sample data set and a non-labeled sample data set, the labeled sample data set comprises a plurality of first sample corpora and labeled labels corresponding to the first sample corpora, and the non-labeled sample data set comprises a plurality of second sample corpora without labeled labels.
It can be understood that, in an industrial application scenario, there may initially be no corpus data related to the actual service, and the corresponding corpus data is obtained through accumulation over a period of time, from which the first sample data set is constructed. The first sample data set includes a small labeled sample data set τ1 and the remaining unlabeled sample data set τ2, where the labeled sample data set is labeled manually; because the labeled sample data set contains only a small number of first sample corpora, the time spent on manual labeling is relatively small. This alleviates the problem that manually labeling a large amount of data consumes a great deal of manpower, material resources and financial resources, and saves the working time of manual participation.
It should be noted that the first sample data set is a set formed by actual service scene corpora; the first sample corpus and the second sample corpus may be text data in a service scene or voice data, and if the first sample corpus and the second sample corpus are voice data, text information extraction needs to be performed on the voice data, which is not described herein in detail, or data in other forms may be converted into the text data.
Step S120, a plurality of initial weak classification models are obtained, different template-classification label pairs are arranged on the initial weak classification models, and the template-classification label pairs are used for representing the mapping relation between the templates and the classification labels.
It is understood that each different template-classification label pair includes a prompt template and a mapping relationship between text vocabulary and labels (a verbalizer, Verbmap); the prompt template and the Verbmap can be defined manually according to the actual business scenario or generated automatically through a model. Illustratively, in the chit-chat versus task intent classification, the above template-classification label pairs are set with the label set y = {y_1, y_2}, where y_1 represents the chit-chat intent and y_2 represents the task intent. For the input "what is your name", the prompt template can be defined in two ways: P1(x) = "It was [mask]. what is your name" and P2(x) = "what is your name. [mask]". Here, P1(x) represents one way and P2(x) represents the other. The value at [mask] is a word in a vocabulary, such as "chat" or "task", where "chat" corresponds to the "chit-chat" intent class and "task" corresponds to the "task" intent class; the mapping relationship between words and the label y is denoted by V, i.e., V("chit-chat") = "chat" and V("task") = "task". According to the actual business scenario, a plurality of different template-classification label pairs, denoted (Pn, Vn), are formed through this setting, and subsequent operations can be carried out using the set template-classification label pairs.
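By way of a non-limiting illustration, the template-classification label pairs (Pn, Vn) described above may be sketched in code as follows; the concrete template strings, dictionary layout and function names are assumptions for illustration rather than the exact structures of this disclosure.

```python
# Illustrative sketch (assumed structures) of template-classification label pairs (Pn, Vn)
# for the chit-chat vs. task-type intent example above.

# Prompt templates P1 and P2: each wraps the input text x around a [mask] slot.
templates = {
    "P1": lambda x: f"It was [mask]. {x}",
    "P2": lambda x: f"{x}. [mask]",
}

# Verbalizer V: maps each intent class to the vocabulary word filled in at [mask].
verbalizer = {
    "chit-chat": "chat",   # label y_1
    "task": "task",        # label y_2
}

# The template-classification label pairs (Pn, Vn) used to build the initial weak classifiers.
template_label_pairs = [(templates["P1"], verbalizer),
                        (templates["P2"], verbalizer)]

print(templates["P1"]("what is your name"))  # -> "It was [mask]. what is your name"
```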
And S130, training each initial weak classification model based on the labeled sample data set to obtain target weak classification models corresponding to the initial weak classification models one by one.
In some embodiments, the set of tagged sample data includes a training set of tagged samples and a testing set of tagged samples. Referring to fig. 2 and 9, training each initial weak classification model based on a labeled sample data set to obtain target weak classification models corresponding to a plurality of initial weak classification models one by one, including but not limited to the following steps:
and S131, training each initial weak classification model based on a labeled sample training set to obtain target weak classification models corresponding to the initial weak classification models one by one.
It can be understood that, when inputting the labeled sample training set into the initial weak classification model, the first sample corpora may be input one at a time, or input in batches according to a preset batch size, where the preset batch size may be 4 or 8 and may be modified according to the actual training situation. The initial weak classification models are trained to obtain target weak classification models corresponding to the initial weak classification models one by one, and the target weak classification models can be used for subsequent operations. Because the classification training is performed on only a small amount of labeled corpora, the training process consumes little time and few resources.
It should be noted that before each first sample corpus of the labeled sample training set is input into the initial weak classification model, word vector conversion or other format conversion is performed on each first sample corpus, so that the input format of the initial weak classification model is satisfied.
Referring to fig. 4 and 10, training each initial weak classification model based on a labeled sample training set to obtain target weak classification models corresponding to a plurality of initial weak classification models one by one, including but not limited to the following steps:
step 1311, inputting each first sample corpus of the labeled sample training set into the initial weak classification model, so that the initial weak classification model outputs a third prediction classification label.
It is understood that the initial weak classification model includes a pre-trained language model (PLM), a fully connected layer (FC) and a softmax layer. The pre-trained language model performs feature extraction and vocabulary prediction on the first sample corpora in the input labeled sample training set; the fully connected layer fuses the feature vectors output by the pre-trained language model; the fusion result is then input into the softmax layer, which outputs the prediction probabilities. The first sample corpus is input into the initial weak classification model so that the initial weak classification model outputs third predicted classification labels, denoted Y1 and Y2. A plurality of pre-trained language models are provided, one pre-trained language model corresponding to one of the template-classification label pairs described above. The initial weak classification model may be a deep learning model or a shallow neural network model that can fully extract the features of the first sample corpus. The pre-trained model further comprises the vocabulary described in step S120; words from the vocabulary are used to fill the [mask] position in the prompt template, and the template-classification label pair is combined to guide the initial weak classification model to predict the third prediction classification label, thereby improving the accuracy of the vocabulary predicted to fill the prompt template.
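As a non-limiting structural sketch of such a weak classification model (the torch-based skeleton, layer sizes and parameter names are assumptions, and the pre-trained language model is treated as a black-box encoder):

```python
import torch
import torch.nn as nn

# Illustrative sketch of an initial weak classification model: a pre-trained
# language model (PLM) encoder, a fully connected (FC) fusion layer, and a
# softmax layer that outputs prediction probabilities.
class WeakClassifier(nn.Module):
    def __init__(self, plm_encoder: nn.Module, hidden_size: int = 768, num_labels: int = 2):
        super().__init__()
        self.plm = plm_encoder                        # pre-trained language model (black box here)
        self.fc = nn.Linear(hidden_size, num_labels)  # fully connected layer fusing PLM features
        self.softmax = nn.Softmax(dim=-1)             # turns logits into prediction probabilities

    def forward(self, inputs: torch.Tensor) -> torch.Tensor:
        features = self.plm(inputs)   # feature vectors extracted from the prompt-formatted corpus
        logits = self.fc(features)    # fuse the features into one score per classification label
        return self.softmax(logits)   # predicted probability for each label (e.g., Y1 and Y2)
```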
It should be noted that the pre-trained model may be BERT, or RoBERTa (A Robustly Optimized BERT Pretraining Approach), or other neural network models applicable to natural language processing for vocabulary prediction. RoBERTa makes the following adjustments on the basis of BERT: a longer training time, a larger batch size and more training data; removal of the next sentence prediction loss; longer training sequences; and a dynamically adjusted masking mechanism.
It should be further noted that the third prediction classification label is the label predicted and output by the initial weak classification model when its input is the labeled sample training set.
Step S1312 determines a value of the first loss function of the initial weak classification model according to the third prediction classification label and the tagging label corresponding to each first sample corpus.
It can be understood that the cross-entropy loss function is applied to the third prediction classification label and the annotation label to obtain the value of the first loss function, and gradients of the weights and biases are computed by back-propagating the value of the first loss function, so that the parameters of the initial weak classification model are updated to obtain the target weak classification model. Other loss functions suitable for classification may also be used, which are not described in detail herein.
It should be noted that the value of the first loss function is obtained by inputting the first sample corpus into the initial weak classification model to output the third prediction classification label, and calculating the third prediction classification label and the labeling label.
Step 1313, when the value of the first loss function meets a preset training end condition, ending the training to obtain a target weak classification model.
It is understood that the training end condition may be that the value of the first loss function is smaller than a preset loss value, at which point training ends. The number of training iterations may also be used as the training end condition; illustratively, with a preset number of 1000 training iterations, training ends when the loop counter reaches the preset number. Training may also end under other end conditions. When training ends, the target weak classification model is output and used for subsequent operations.
And step S1314, when the value of the first loss function does not satisfy the preset training end condition, adjusting the model parameters of the initial weak classification model, and continuing training the initial weak classification model based on the training set with the labeled samples.
It can be understood that the labeled sample training set is input into the initial weak classification model for continued training. If the value of the first loss function cannot meet the condition after many rounds of training, the number of training iterations is used as the training end condition: when the preset number of iterations is reached, training ends regardless of whether the value of the first loss function satisfies the preset training end condition, and the target weak classification model is obtained. By combining different training end conditions, the infinite loop problem can be avoided; here "many rounds" may be, for example, more than 100000 iterations, and this value can be modified according to the actual situation. Other combinations of end conditions can also be adopted to avoid the infinite loop problem.
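A non-limiting sketch of a training loop that combines both end conditions (a preset loss threshold and a maximum number of iterations) to avoid the infinite loop problem described above; the optimizer, loss function, threshold and iteration count are assumed values for illustration:

```python
# Illustrative sketch of the weak-classifier training loop with two combined
# stop conditions: a preset loss threshold and a maximum number of iterations.
def train_weak_classifier(model, optimizer, loss_fn, batches,
                          loss_threshold=0.01, max_steps=100_000):
    for step in range(max_steps):                # hard cap avoids an infinite loop
        inputs, labels = batches[step % len(batches)]
        optimizer.zero_grad()
        probs = model(inputs)                    # third prediction classification labels
        loss = loss_fn(probs, labels)            # value of the first loss function
        loss.backward()                          # back-propagate to the weights and biases
        optimizer.step()                         # adjust the model parameters
        if loss.item() < loss_threshold:         # preset training end condition satisfied
            break
    return model                                 # target weak classification model
```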
Referring to fig. 3, fig. 3 is a schematic flow chart illustrating a small sample corpus-based intention classification method according to an embodiment of the present application, where after a target weak classification model corresponding to a plurality of initial weak classification models one by one is obtained, the small sample corpus-based intention classification method according to the above embodiment further includes, but is not limited to, step S210, step S220, and step S230.
Step S210, traversing each target weak classification model, and executing step S220 and step S230 for the currently traversed target weak classification model.
Step S220, inputting each first sample corpus in the labeled sample test set into the target weak classification model, so that the target weak classification model outputs a second prediction classification label corresponding to each first sample corpus.
It can be understood that the target weak classification model is obtained by training the initial weak classification model. Each first sample corpus of the labeled sample test set is input into the target weak classification model; the first sample corpora may be input one at a time, or input in batches according to a preset batch size, where the preset batch size may be 2 or 4 and may be modified according to the actual training situation. Testing the target weak classification model with a small amount of labeled corpora saves the time and resources consumed in the testing process.
It should be noted that before each first sample corpus of the labeled sample test set is input into the target weak classification model, word vector conversion or other format conversion is performed on each first sample corpus, so that the input format of the target weak classification model is satisfied.
It should be noted that the second prediction classification label is the label predicted and output by the target weak classification model when its input is the labeled sample test set.
And step S230, determining the prediction accuracy of the target weak classification model according to the second prediction classification label and the labeling label corresponding to each first sample corpus.
It can be understood that the second prediction classification labels are obtained according to step S220 above; for example, the first sample corpora in the labeled sample test set are input into the target weak classification model one at a time to obtain the corresponding second prediction classification labels, and the number of second prediction classification labels that agree with the annotation labels is divided by the total number of test samples to obtain the corresponding prediction accuracy, denoted ωn. The accuracy may also be calculated in a batch processing manner, and other accuracy calculation methods may also be adopted, which are not described herein again. The accuracy obtained in this way facilitates the subsequent calculation.
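A non-limiting sketch of the prediction-accuracy computation ωn on the labeled sample test set (assumed function and variable names):

```python
# Illustrative sketch: prediction accuracy of one target weak classification model,
# i.e. the share of second prediction classification labels matching the annotation labels.
def prediction_accuracy(predicted_labels, annotation_labels) -> float:
    correct = sum(1 for p, a in zip(predicted_labels, annotation_labels) if p == a)
    return correct / len(annotation_labels)

omega = prediction_accuracy(["chat", "task", "chat"], ["chat", "task", "task"])
print(omega)  # 0.666...: two of the three test samples were predicted correctly
```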
Step S140, inputting each second sample corpus in the unlabeled sample data set into each target weak classification model to obtain a first prediction classification label corresponding to the second sample corpus output by each target weak classification model.
It can be understood that the target weak classification model is obtained by training the initial weak classification model, and the unlabeled sample data set is input into the target weak classification model in a manner similar to that in step S131 and step S220, which is not described herein again. Through the steps, the first prediction classification label corresponding to the second sample corpus can be obtained, the second sample corpus without labels is converted into the second sample corpus with the first prediction classification label, manual participation in data labeling is not needed, manpower, material resources and financial resources are saved, model enhancement is achieved, and the generalization of the model is improved.
It should be noted that before each second sample corpus of the unlabeled sample data set is input into the target weak classification model, word vector conversion or other format conversion is performed on each second sample corpus, so that the input format of the target weak classification model is satisfied.
It should be further noted that the first prediction classification label is the soft label finally output when the unlabeled sample data set is input into the target weak classification models, i.e., the first prediction classification label serves as the label of its corresponding second sample corpus.
Step S150, a second sample data set is constructed based on all the second sample corpora and the first prediction classification labels corresponding to the second sample corpora.
Referring to fig. 5, the second sample data set is constructed based on all the second sample corpora and the first prediction classification tags corresponding to the second sample corpora, including but not limited to the following steps:
and step S151, determining a first prediction classification label corresponding to each second sample corpus according to the prediction accuracy of each target weak classification model and a fourth prediction classification label output by each target weak classification model.
It will be appreciated that, given a second sample corpus x in the input unlabeled sample data set, it is input into each of the target weak classification models, and each target weak classification model outputs a fourth predicted classification label, denoted S_p(l|x) = M(V(l)|P(x)), where P(x) represents the prompt template information, V(l) represents the mapping relationship between text vocabulary and labels, and M represents the target weak classification model. Combining the prediction accuracy obtained in step S230, a weighted summation of the prediction accuracies and the fourth prediction classification labels is performed to obtain the first prediction classification label corresponding to each second sample corpus. Through this process, the unlabeled second sample corpora can be converted into second sample corpora with first prediction classification labels, without manual participation in data labeling, saving manpower, material resources and financial resources.
The fourth prediction classification label is the intermediate label output by the target weak classification model when its input is the unlabeled sample data set.
In some embodiments, the first prediction classification label corresponding to each second sample corpus is determined according to the prediction accuracy of each target weak classification model and the fourth prediction classification label output by each target weak classification model, and the calculation formula is as follows:

S(l|x) = (1/Z) Σ_p ω(p) · S_p(l|x), where Z = Σ_p ω(p)

where ω(p) denotes the prediction accuracy, S_p(l|x) denotes the fourth prediction classification label, Z is the normalization term over all template-classification label pairs, p denotes the template-classification label pair, x denotes the second sample corpus, and l denotes the first prediction classification label.
Note that, a data set including all the second sample corpora and the fourth prediction classification labels corresponding to the second sample corpora is denoted by τ 2 n.
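A non-limiting sketch of the weighted voting step above, assuming each target weak classification model returns a probability distribution over the labels; the data layout and function name are assumptions for illustration:

```python
# Illustrative sketch of the accuracy-weighted aggregation that produces the first
# prediction classification label (a soft label) for one second sample corpus x.
def aggregate_soft_label(accuracies, label_distributions):
    # accuracies: list of omega(p); label_distributions: list of {label: probability} dicts,
    # one per target weak classification model (i.e. per template-classification label pair p).
    total_weight = sum(accuracies)
    labels = label_distributions[0].keys()
    return {
        l: sum(w * dist[l] for w, dist in zip(accuracies, label_distributions)) / total_weight
        for l in labels
    }

soft_label = aggregate_soft_label(
    [0.9, 0.6],                                                # omega(p) per weak classifier
    [{"chat": 0.8, "task": 0.2}, {"chat": 0.4, "task": 0.6}],  # S_p(l|x) per weak classifier
)
print(soft_label)  # accuracy-weighted soft label for the second sample corpus
```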
Step S152, a second sample data set is constructed according to all the second sample corpora and the first prediction classification labels corresponding to the second sample corpora.
It should be noted that every second sample corpus in the second sample data set has a corresponding first prediction classification label, and the second sample data set is denoted τ'2. The second sample data set is formed from the unlabeled sample data set τ2 obtained from the actual service scene: the second sample corpora are processed by the target weak classification models to obtain second sample corpora labeled with first prediction classification labels.
And step S160, training the initial intention classification model based on the second sample data set to obtain a target intention classification model.
Referring to fig. 6 and 11, training the initial intention classification model based on the second sample data set to obtain the target intention classification model, including but not limited to the following steps:
step S161, inputting each second sample corpus of the second sample data set into the initial intent classification model, so that the initial intent classification model outputs a fifth predicted classification tag.
It is understood that the initial intention classification model is similar in structure to the initial weak classification model in step S130 and will not be described again here. The initial intention classification model may be a deep learning model or a shallow neural network model that can fully extract the features of the second sample corpus; combined with the template-classification label pairs, it is guided to make predictions, improving the accuracy of the vocabulary predicted to fill the prompt template.
The fifth prediction classification label is the label predicted and output by the initial intention classification model when its input is the second sample corpora, labeled with the first prediction classification labels, of the second sample data set.
Step S162, determining a value of a second loss function of the initial intention classification model according to the fifth prediction classification label and the first prediction classification label corresponding to each second sample corpus.
According to fig. 7, determining the value of the second loss function of the initial intent classification model according to the fifth prediction classification label and the first prediction classification label corresponding to each second sample corpus includes, but is not limited to, the following steps:
step S1621, calculating the distance between the distribution of the fifth prediction classification label and the distribution of the first prediction classification label by utilizing the KL divergence to obtain a KL divergence value;
in step S1622, a value of the second loss function is determined based on the KL divergence value.
It can be understood that the distance between the distribution of the fifth prediction classification label and the distribution of the first prediction classification label is calculated using the KL divergence to obtain a KL divergence value; the KL divergence values are summed and averaged, or summed with a cross-entropy loss term and then averaged, to obtain the value of the second loss function. Gradients of the weights and biases are computed by back-propagating the value of the second loss function, so that the parameters of the initial intention classification model are updated to obtain the target intention classification model. Other loss functions incorporating the KL divergence may also be used, which are not described in detail here.
It should be noted that the value of the second loss function is a distribution-similarity value obtained by inputting the second sample corpora into the initial intention classification model to output the fifth prediction classification labels, and then computing the KL divergence between the fifth prediction classification labels and the first prediction classification labels.
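A non-limiting sketch of the second loss, averaging the KL divergence between the first and fifth prediction label distributions over a batch; the optional cross-entropy term mentioned above is omitted, and the function and variable names are assumptions for illustration:

```python
import math

# Illustrative sketch of the second loss: the mean KL divergence between the
# distributions of the first prediction classification labels (soft labels) and
# the fifth prediction classification labels (initial intention model outputs).
def second_loss(soft_label_dists, student_dists, eps=1e-12):
    def kl(p, q):
        return sum(p[l] * math.log((p[l] + eps) / (q[l] + eps)) for l in p)
    values = [kl(soft, student) for soft, student in zip(soft_label_dists, student_dists)]
    return sum(values) / len(values)

loss = second_loss(
    [{"chat": 0.9, "task": 0.1}],  # first prediction classification labels (soft labels)
    [{"chat": 0.7, "task": 0.3}],  # fifth prediction classification labels
)
print(loss)  # non-negative; 0 only when the two distributions coincide
```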
And step S163, finishing the training under the condition that the value of the second loss function meets the preset training finishing condition, and obtaining the target intention classification model.
It should be noted that, when the value of the second loss function satisfies the preset training end condition, the subsequent operations are performed in a manner similar to the processing in step S1313, which is not described herein again. Training ends and a target intention classification model that requires no fine-tuning is obtained for the subsequent intention classification task.
And step S164, under the condition that the value of the second loss function does not meet the preset training end condition, adjusting the model parameters of the initial intention classification model, and continuing training the initial intention classification model based on the second sample data set.
It should be noted that, when the value of the second loss function does not satisfy the preset training end condition, the subsequent operations are performed in a manner similar to the processing manner in step S1314, and are not described herein again. According to the training end conditions, the problem of infinite loop training can be avoided, and a target intention classification model is obtained.
And step S170, performing intention classification by using the target intention classification model and outputting intention classification results.
It should be noted that the corpus data obtained in the actual service scene is input into the target intention classification model, so that the intention classification of the input corpus can be obtained and used for subsequent processing, thereby improving user satisfaction. The obtained target intention classification model can accept corpus data from different scenes, provides more accurate intention classification results and has better generalization.
It should be further noted that a first sample data set is constructed, the first sample data set comprising a labeled sample data set and an unlabeled sample data set; a plurality of initial weak classification models are acquired, wherein the initial weak classification models are provided with different template-classification label pairs, and the template-classification label pairs are used for representing the mapping relation between a template and a classification label; each initial weak classification model is trained based on the labeled sample data set to obtain target weak classification models corresponding to the initial weak classification models one by one, and a cloze-style (text blank filling) formulation is realized through the template-classification label pairs, improving the accuracy of the predicted vocabulary; each second sample corpus in the unlabeled sample data set is input into each target weak classification model to obtain the first prediction classification label corresponding to the second sample corpus output by each target weak classification model; a second sample data set is constructed based on all the second sample corpora and the corresponding first prediction classification labels; the initial intention classification model is trained based on the second sample data set to obtain a target intention classification model with improved generalization, intention classification is performed by using the target intention classification model, and an intention classification result is output. According to the embodiments of the application, an intention classification model that does not need fine-tuning can be obtained on the basis of a small number of labeled samples and unlabeled samples, thereby saving manpower, material resources and financial resources while realizing intention classification.
Referring to fig. 8, fig. 8 is a schematic structural diagram illustrating an apparatus 100 for classifying intent based on small sample corpus according to an embodiment of the present application, where the apparatus 100 includes:
a data set constructing module 110, configured to construct a first sample data set, where the first sample data set includes a labeled sample data set and an unlabeled sample data set, the labeled sample data set includes a plurality of first sample corpora and labeled labels corresponding to the first sample corpora, and the unlabeled sample data set includes a plurality of second sample corpora without labeled labels;
a model obtaining module 120, configured to obtain a plurality of initial weak classification models, where the plurality of initial weak classification models are provided with different template-classification label pairs, and the template-classification label pairs are used to represent a mapping relationship between a template and a classification label;
the first training module 130 is configured to train each initial weak classification model based on a sample data set with a label to obtain target weak classification models corresponding to the plurality of initial weak classification models one by one;
the data labeling module 140 is configured to input each second sample corpus in the unlabeled sample data set into each target weak classification model to obtain a first predicted classification label corresponding to the second sample corpus, which is output by each target weak classification model;
the data set constructing module 110 is further configured to construct a second sample data set based on all the second sample corpora and the first prediction classification tags corresponding to the second sample corpora;
the second training module 150 is configured to train the initial intention classification model based on the second sample data set to obtain a target intention classification model;
and the processing module 160 is used for performing intention classification by using the target intention classification model and outputting an intention classification result.
It can be understood that the data set constructing module 110 is adopted to construct a first sample data set, where the first sample data set includes a labeled sample data set and an unlabeled sample data set, the labeled sample data set includes a plurality of first sample corpora and labeled labels corresponding to the first sample corpora, and the unlabeled sample data set includes a plurality of unlabeled second sample corpora; the model obtaining module 120 is used for obtaining a plurality of initial weak classification models, where the initial weak classification models are provided with different template-classification label pairs, and the template-classification label pairs are used for representing the mapping relation between the templates and the classification labels; the first training module 130 is configured to train each initial weak classification model based on the labeled sample data set to obtain target weak classification models corresponding to the plurality of initial weak classification models one by one; the data labeling module 140 inputs each second sample corpus in the unlabeled sample data set into each target weak classification model to obtain a first prediction classification label, output by each target weak classification model, corresponding to the second sample corpus; the data set constructing module 110 constructs a second sample data set based on all the second sample corpora and the first prediction classification labels corresponding to the second sample corpora; the second training module 150 trains the initial intention classification model based on the second sample data set to obtain a target intention classification model; and the processing module 160 performs intention classification by using the target intention classification model and outputs an intention classification result. According to this embodiment, an intention classification model that requires no fine-tuning can be obtained on the basis of a small number of labeled samples and a large number of unlabeled samples, so that intention classification is realized.
It should be noted that the device 100 is used to train and obtain a target intention classification model; corpus data obtained in an actual service scene is input into the obtained target intention classification model, so that the intention classification of the input corpus data can be obtained and used for subsequent processing, thereby improving user satisfaction. The target intention classification model trained by the device 100 can perform intention classification on sample corpora input from any scene and output more accurate intention classification results.
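Purely as an illustrative aid, the wiring of the modules described above might look like the following sketch; all module interfaces and method names are assumptions rather than a definitive implementation of the apparatus 100.

class IntentClassificationDevice:
    # Mirrors modules 110-160 of the apparatus described above; every
    # collaborator object and method name is an assumed interface.
    def __init__(self, dataset_builder, model_getter, first_trainer,
                 data_labeler, second_trainer, processor):
        self.dataset_builder = dataset_builder   # data set constructing module 110
        self.model_getter = model_getter         # model obtaining module 120
        self.first_trainer = first_trainer       # first training module 130
        self.data_labeler = data_labeler         # data labeling module 140
        self.second_trainer = second_trainer     # second training module 150
        self.processor = processor               # processing module 160

    def run(self, labeled_set, unlabeled_set):
        first_set = self.dataset_builder.build_first(labeled_set, unlabeled_set)
        weak_models = self.model_getter.get_initial_weak_models()
        target_weak = self.first_trainer.train(weak_models, first_set["labeled"])
        pseudo_labeled = self.data_labeler.label(unlabeled_set, target_weak)
        second_set = self.dataset_builder.build_second(pseudo_labeled)
        intent_model = self.second_trainer.train(second_set)
        return self.processor.classify(intent_model)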
The embodiments described in the embodiments of the present application are for more clearly illustrating the technical solutions of the embodiments of the present application, and do not constitute a limitation to the technical solutions provided in the embodiments of the present application, and it is obvious to those skilled in the art that the technical solutions provided in the embodiments of the present application are also applicable to similar technical problems with the evolution of technology and the emergence of new application scenarios.
Fig. 12 illustrates a computer device 500 provided by an embodiment of the present application. The computer device 500 may be a server or a terminal, and the internal structure of the computer device 500 includes but is not limited to:
a memory 510 for storing programs;
and a processor 520 for executing the program stored in the memory 510; when executing the program stored in the memory 510, the processor 520 performs the above-mentioned intent classification method based on the small sample corpus.
The processor 520 and the memory 510 may be connected by a bus or other means.
The memory 510 is a non-transitory computer readable storage medium, and can be used to store non-transitory software programs and non-transitory computer executable programs, such as the small sample corpus-based intent classification method described in any embodiment of the present invention. The processor 520 implements the above-described small sample corpus-based intent classification method by executing non-transitory software programs and instructions stored in the memory 510.
The memory 510 may include a storage program area and a storage data area, wherein the storage program area may store an operating system and an application program required by at least one function, and the storage data area may store data created during execution of the above-described intent classification method based on the small sample corpus. Further, the memory 510 may include high speed random access memory, and may also include non-transitory memory, such as at least one disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, the memory 510 may optionally include memory located remotely from the processor 520, which may be connected to the processor 520 via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The non-transitory software programs and instructions required to implement the above-described method for small sample corpus-based intent classification are stored in the memory 510 and, when executed by the one or more processors 520, perform the method for small sample corpus-based intent classification provided by any of the embodiments of the present invention.
The embodiment of the application also provides a computer-readable storage medium, which stores computer-executable instructions, and the computer-executable instructions are used for executing the above intent classification method based on the small sample corpus.
In one embodiment, the storage medium stores computer-executable instructions that are executed by one or more processors 520, for example by one processor 520 in the computer device 500, to cause the one or more processors 520 to perform the intent classification method based on the small sample corpus according to any embodiment of the present invention.
The above described embodiments are merely illustrative, wherein elements illustrated as separate components may or may not be physically separate, may be located in one place, or may be distributed over a plurality of network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment.
The terms "first," "second," and the like (if any) in the description of the present application and the above-described figures are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the application described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
It should be understood that, in this application, "at least one" means one or more, "a plurality" means two or more. "and/or" for describing an association relationship of associated objects, indicating that there may be three relationships, e.g., "a and/or B" may indicate: only A, only B and both A and B are present, wherein A and B may be singular or plural. The character "/" generally indicates that the former and latter associated objects are in an "or" relationship. "at least one of the following" or similar expressions refer to any combination of these items, including any combination of single item(s) or plural items. For example, at least one (one) of a, b, or c, may represent: a, b, c, "a and b", "a and c", "b and c", or "a and b and c", wherein a, b, c may be single or plural.
One of ordinary skill in the art will appreciate that all or some of the steps, systems, and methods disclosed above may be implemented as software, firmware, hardware, and suitable combinations thereof. Some or all of the physical components may be implemented as software executed by a processor, such as a central processing unit, digital signal processor, or microprocessor, or as hardware, or as an integrated circuit, such as an application specific integrated circuit. Such software may be distributed on computer readable media, which may include computer storage media (or non-transitory media) and communication media (or transitory media). The term computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data, as is well known to those of ordinary skill in the art. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, Digital Versatile Disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by a computer. In addition, communication media typically embodies computer readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media as known to those skilled in the art.
While the preferred embodiments of the present invention have been described in detail, the present invention is not limited to the above embodiments. Those skilled in the art can make various equivalent modifications or substitutions without departing from the spirit of the present invention, and such equivalent modifications or substitutions are included within the scope of the invention defined by the claims.

Claims (10)

1. A small sample corpus-based intention classification method is characterized by comprising the following steps:
constructing a first sample data set, wherein the first sample data set comprises a labeled sample data set and an unlabeled sample data set, the labeled sample data set comprises a plurality of first sample corpora and labeled labels corresponding to the first sample corpora, and the unlabeled sample data set comprises a plurality of unlabeled second sample corpora;
obtaining a plurality of initial weak classification models, wherein the initial weak classification models are provided with different template-classification label pairs, and the template-classification label pairs are used for representing the mapping relation between a template and a classification label;
training each initial weak classification model based on the sample data set with the labels to obtain target weak classification models which correspond to the initial weak classification models one by one;
inputting each second sample corpus in the unlabeled sample data set into each target weak classification model to obtain a first prediction classification label corresponding to the second sample corpus, which is output by each target weak classification model;
constructing a second sample data set based on all the second sample corpora and the first prediction classification labels corresponding to the second sample corpora;
training an initial intention classification model based on the second sample data set to obtain a target intention classification model;
and performing intention classification by using the target intention classification model, and outputting an intention classification result.
2. The method of claim 1, wherein the set of tagged sample data comprises a training set of tagged samples and a testing set of tagged samples;
training each initial weak classification model based on the labeled sample data set to obtain target weak classification models corresponding to the initial weak classification models one by one, and the method comprises the following steps:
training each initial weak classification model based on the labeled sample training set to obtain target weak classification models which correspond to the initial weak classification models one by one;
after obtaining the target weak classification models corresponding to the initial weak classification models one by one, the method further includes:
traversing each target weak classification model, and aiming at the currently traversed target weak classification model, executing the following processing:
inputting each first sample corpus in the labeled sample test set into the target weak classification model, so that the target weak classification model outputs a second prediction classification label corresponding to each first sample corpus;
and determining the prediction accuracy of the target weak classification model according to the second prediction classification label and the labeling label corresponding to each first sample corpus.
3. The method of claim 2, wherein the training each of the initial weak classification models based on the labeled sample training set to obtain a target weak classification model that corresponds one-to-one to the plurality of initial weak classification models comprises:
inputting each first sample corpus of the labeled sample training set into the initial weak classification model, so that the initial weak classification model outputs a third predicted classification label;
determining a value of a first loss function of the initial weak classification model according to the third prediction classification label and the labeling label corresponding to each first sample corpus;
under the condition that the value of the first loss function meets a preset training end condition, ending the training to obtain the target weak classification model;
and under the condition that the value of the first loss function does not meet a preset training end condition, adjusting the model parameters of the initial weak classification model, and continuing training the initial weak classification model based on the training set with the labeled samples.
4. The method according to claim 2, wherein the constructing a second sample data set based on all the second sample corpora and the first prediction classification labels corresponding to the second sample corpora comprises:
determining a first prediction classification label corresponding to each second sample corpus according to the prediction accuracy of each target weak classification model and a fourth prediction classification label output by each target weak classification model;
and constructing the second sample data set according to all the second sample corpora and the first prediction classification labels corresponding to the second sample corpora.
5. The method according to claim 4, wherein the first predicted classification label corresponding to each second sample corpus is determined according to the prediction accuracy of each target weak classification model and a fourth predicted classification label output by each target weak classification model, and a calculation formula is as follows:
S(l|x) = (1/Z) · Σp ω(p) · Sp(l|x)
wherein ω(p) represents the prediction accuracy, Sp(l|x) represents the fourth prediction classification label,
Z = Σp ω(p)
p represents the template-classification label pair, x represents the second sample corpus, and l represents the first prediction classification label.
6. The method of claim 1, wherein training an initial intent classification model based on the second set of sample data to obtain a target intent classification model comprises:
inputting each second sample corpus of the second sample data set into the initial intention classification model, so that the initial intention classification model outputs a fifth prediction classification label;
determining a value of a second loss function of the initial intention classification model according to the fifth prediction classification label and the first prediction classification label corresponding to each second sample corpus;
under the condition that the value of the second loss function meets a preset training end condition, ending training to obtain the target intention classification model;
and under the condition that the value of the second loss function does not meet a preset training end condition, adjusting the model parameters of the initial intention classification model, and continuing training the initial intention classification model based on the second sample data set.
7. The method according to claim 6, wherein the determining a value of a second loss function of the initial intention classification model according to the fifth prediction classification label and the first prediction classification label corresponding to each of the second sample corpuses comprises:
calculating the distance between the distribution of the fifth prediction classification label and the distribution of the first prediction classification label by utilizing the KL divergence to obtain a KL divergence value;
determining a value of the second loss function according to the KL divergence value.
8. An intention classification device based on a small sample corpus, characterized by comprising:
the data set construction module is used for constructing a first sample data set, wherein the first sample data set comprises a labeled sample data set and an unlabeled sample data set, the labeled sample data set comprises a plurality of first sample corpora and labeled labels corresponding to the first sample corpora, and the unlabeled sample data set comprises a plurality of second sample corpora without labeled labels;
the model acquisition module is used for acquiring a plurality of initial weak classification models, wherein the initial weak classification models are provided with different template-classification label pairs, and the template-classification label pairs are used for representing the mapping relation between a template and a classification label;
the first training module is used for training each initial weak classification model based on the sample data set with the labels to obtain target weak classification models which correspond to the initial weak classification models one by one;
the data labeling module is used for inputting each second sample corpus in the unlabeled sample data set into each target weak classification model to obtain a first prediction classification label which is output by each target weak classification model and corresponds to the second sample corpus;
the data set construction module is further used for constructing a second sample data set based on all the second sample corpora and the first prediction classification labels corresponding to the second sample corpora;
the second training module is used for training the initial intention classification model based on the second sample data set to obtain a target intention classification model;
and the processing module is used for carrying out intention classification by utilizing the target intention classification model and outputting an intention classification result.
9. A computer device comprising a memory and a processor, the memory having stored therein computer-readable instructions which, when executed by one or more of the processors, cause the one or more processors to perform the steps of the method of any one of claims 1 to 7.
10. A computer-readable storage medium readable by a processor, the storage medium storing computer instructions which, when executed by one or more processors, cause the one or more processors to perform the steps of the method of any one of claims 1 to 7.
CN202210071898.6A 2022-01-21 2022-01-21 Intention classification method and device based on small sample corpus and computer equipment Pending CN114510570A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202210071898.6A CN114510570A (en) 2022-01-21 2022-01-21 Intention classification method and device based on small sample corpus and computer equipment
PCT/CN2022/090659 WO2023137911A1 (en) 2022-01-21 2022-04-29 Intention classification method and apparatus based on small-sample corpus, and computer device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210071898.6A CN114510570A (en) 2022-01-21 2022-01-21 Intention classification method and device based on small sample corpus and computer equipment

Publications (1)

Publication Number Publication Date
CN114510570A true CN114510570A (en) 2022-05-17

Family

ID=81549679

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210071898.6A Pending CN114510570A (en) 2022-01-21 2022-01-21 Intention classification method and device based on small sample corpus and computer equipment

Country Status (2)

Country Link
CN (1) CN114510570A (en)
WO (1) WO2023137911A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116029492A (en) * 2022-12-01 2023-04-28 广州云趣信息科技有限公司 Order sending method and device
CN117150026A (en) * 2023-11-01 2023-12-01 智者四海(北京)技术有限公司 Text content multi-label classification method and device

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116738343B (en) * 2023-08-08 2023-10-20 云筑信息科技(成都)有限公司 Material data identification method and device for construction industry and electronic equipment
CN116702048B (en) * 2023-08-09 2023-11-10 恒生电子股份有限公司 Newly added intention recognition method, model training method, device and electronic equipment

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019046463A1 (en) * 2017-08-29 2019-03-07 Zhoa Tiancheng System and method for defining dialog intents and building zero-shot intent recognition models
CN111753863A (en) * 2019-04-12 2020-10-09 北京京东尚科信息技术有限公司 Image classification method and device, electronic equipment and storage medium
CN113821668A (en) * 2021-06-24 2021-12-21 腾讯科技(深圳)有限公司 Data classification identification method, device, equipment and readable storage medium
CN113792818B (en) * 2021-10-18 2023-03-10 平安科技(深圳)有限公司 Intention classification method and device, electronic equipment and computer readable storage medium

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116029492A (en) * 2022-12-01 2023-04-28 广州云趣信息科技有限公司 Order sending method and device
CN116029492B (en) * 2022-12-01 2023-12-01 广州云趣信息科技有限公司 Order sending method and device
CN117150026A (en) * 2023-11-01 2023-12-01 智者四海(北京)技术有限公司 Text content multi-label classification method and device
CN117150026B (en) * 2023-11-01 2024-01-26 智者四海(北京)技术有限公司 Text content multi-label classification method and device

Also Published As

Publication number Publication date
WO2023137911A1 (en) 2023-07-27

Similar Documents

Publication Publication Date Title
CN110377911B (en) Method and device for identifying intention under dialog framework
CN107679039B (en) Method and device for determining statement intention
CN110442718B (en) Statement processing method and device, server and storage medium
CN111950269A (en) Text statement processing method and device, computer equipment and storage medium
CN114510570A (en) Intention classification method and device based on small sample corpus and computer equipment
CN110309514A (en) A kind of method for recognizing semantics and device
CN113268609B (en) Knowledge graph-based dialogue content recommendation method, device, equipment and medium
CN110795944A (en) Recommended content processing method and device, and emotion attribute determining method and device
CN113987147A (en) Sample processing method and device
CN110968725B (en) Image content description information generation method, electronic device and storage medium
CN117521675A (en) Information processing method, device, equipment and storage medium based on large language model
CN111782793A (en) Intelligent customer service processing method, system and equipment
CN115270797A (en) Text entity extraction method and system based on self-training semi-supervised learning
CN116975288A (en) Text processing method and text processing model training method
CN114490949A (en) Document retrieval method, device, equipment and medium based on BM25 algorithm
CN117291185A (en) Task processing method, entity identification method and task processing data processing method
CN116719915A (en) Intelligent question-answering method, device, equipment and storage medium
CN115796141A (en) Text data enhancement method and device, electronic equipment and storage medium
CN113627197B (en) Text intention recognition method, device, equipment and storage medium
CN114610878A (en) Model training method, computer device and computer-readable storage medium
CN113886543A (en) Method, apparatus, medium, and program product for generating an intent recognition model
CN113569091A (en) Video data processing method and device
CN113535946A (en) Text identification method, device and equipment based on deep learning and storage medium
CN112949313A (en) Information processing model training method, device, equipment and storage medium
CN115114910B (en) Text processing method, device, equipment, storage medium and product

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination