CN112883193A - Training method, device and equipment of text classification model and readable medium - Google Patents

Training method, device and equipment of text classification model and readable medium Download PDF

Info

Publication number
CN112883193A
Authority
CN
China
Prior art keywords
text data
sample set
probability distribution
loss function
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110210276.2A
Other languages
Chinese (zh)
Inventor
黄海龙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Life Insurance Company of China Ltd
Original Assignee
Ping An Life Insurance Company of China Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Life Insurance Company of China Ltd filed Critical Ping An Life Insurance Company of China Ltd
Priority to CN202110210276.2A priority Critical patent/CN112883193A/en
Publication of CN112883193A publication Critical patent/CN112883193A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35Clustering; Classification
    • G06F16/355Class or cluster creation or modification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/237Lexical tools
    • G06F40/247Thesauruses; Synonyms
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/30Semantic analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/40Processing or translation of natural language
    • G06F40/58Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/044Recurrent networks, e.g. Hopfield networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent

Abstract

The embodiments of the present application belong to the field of artificial intelligence and disclose a training method, apparatus, and device for a text classification model, and a readable medium. The training method includes: obtaining a labeled sample set and an unlabeled sample set for the classification model; performing text enhancement processing on each second text data in the unlabeled sample set to obtain enhanced text data; inputting the labeled sample set, the unlabeled sample set, and the enhanced text data into the classification model; training the classification model according to the labeled sample set, the unlabeled sample set, and a first loss function determined from the predicted first probability distribution of each first text data in the labeled sample set, the predicted second probability distribution of each second text data, and the predicted third probability distribution of the enhanced text data; and when the first loss function meets the training end condition, determining the target text classification model. By adopting the embodiments of the present application, model training efficiency can be improved and the speed of service iteration can be increased. The present application relates to blockchain technology, and the data described above may be stored in a blockchain.

Description

Training method, device and equipment of text classification model and readable medium
Technical Field
The present invention relates to the field of artificial intelligence, and in particular to a training method, apparatus, and device for a text classification model, and a readable medium.
Background
With the growth of text information, text classification has become a key technology for processing text information and is widely applied in various fields. For example, in the field of human-computer interaction, a computer device may receive an inquiry sentence spoken by a user, classify the text information corresponding to the inquiry sentence, determine the category of that text information, and then automatically answer the user's inquiry and push related information according to that category. At present, the most common approach to classifying text information is to predict with a trained deep learning model, and training such a deep learning model requires a large amount of labeled corpora. However, because online text information is updated quickly, classification with a previously trained model performs poorly.
The usual solution is to extract a large amount of unlabeled data from the logs, wait for a labeling team to finish annotating it, retrain the model on the new data to obtain updated parameters, and then classify with the updated model. However, manually labeling a large corpus is inefficient, which slows the speed of service iteration.
Disclosure of Invention
The embodiments of the present invention provide a training method, apparatus, and device for a text classification model, and a readable medium, which can improve the efficiency of model training and thereby increase the speed of service iteration.
In a first aspect, an embodiment of the present application provides a method for training a text classification model, including:
acquiring a training sample set of an initial classification model, wherein the training sample set comprises a marked sample set and an unmarked sample set, the marked sample set comprises a plurality of first text data, each first text data carries a category label, and the unmarked sample set comprises a plurality of second text data;
performing text enhancement processing on each second text data in the unmarked sample set to obtain enhanced unmarked text data;
inputting the labeled sample set, the unlabeled sample set, and the enhanced unlabeled text data into the initial classification model, respectively, to obtain a first probability distribution of a prediction class label of each first text data in the labeled sample set, a second probability distribution of a prediction class label of each second text data in the unlabeled sample set, and a third probability distribution of a prediction class label of the enhanced unlabeled text data;
determining a first loss function according to the first probability distribution, the second probability distribution and the third probability distribution, and performing iterative training on the initial classification model according to the first loss function and the training sample set;
and when the first loss function meets the training end condition, determining the initial classification model when the first loss function meets the training end condition as a target text classification model.
Further, the determining a first loss function from the first, second, and third probability distributions includes:
calculating a first difference degree of the first probability distribution and a preset probability distribution corresponding to each first text data in the marked sample set according to a preset cross entropy, and determining a second loss function according to the first difference degree;
calculating a second difference degree between the second probability distribution and the third probability distribution according to the preset cross entropy, and determining a third loss function according to the second difference degree;
determining the first loss function according to the second loss function and the third loss function.
Further, the determining the first loss function according to the second loss function and the third loss function includes:
calculating the product of the first proportional coefficient and the second loss function according to a preset first proportional coefficient and the second loss function to obtain a first result, wherein the first proportional coefficient is a positive number;
calculating the product of the second proportionality coefficient and the third loss function according to a preset second proportionality coefficient and the third loss function to obtain a second result, wherein the second proportionality coefficient is a positive number;
determining a sum of the first result and the second result as the first loss function.
Further, after the calculating a second difference between the second probability distribution and the third probability distribution according to the preset cross entropy and determining a third loss function according to the second difference, the method further includes:
determining the category label corresponding to the maximum probability in the third probability distribution as the category label corresponding to the enhanced unmarked text data under the condition that the third loss function is smaller than a preset threshold value;
adding the enhanced unmarked text data and the category label corresponding to the maximum probability in the third probability distribution into the marked sample set;
and training the initial classification model according to the labeled sample set added with the enhanced unlabeled text data and the class label corresponding to the maximum probability in the third probability distribution.
Further, the second text data includes first language text data; performing text enhancement processing on each second text data in the unlabeled sample set to obtain enhanced unlabeled text data, including:
performing language conversion processing on the first language text data to obtain second language text data;
randomly extracting words in the second language text data, acquiring synonyms corresponding to the words from a preset synonym set according to the corresponding relation between preset words and the synonyms, and replacing the words in the second language text data with the synonyms;
and performing language conversion processing on the replaced second language text data to obtain updated first language text data, and determining the updated first language text data as the enhanced unmarked text data.
Further, the second text data includes first language text data; performing text enhancement processing on each second text data in the unlabeled sample set to obtain enhanced unlabeled text data, including:
performing language conversion processing on the first language text data to obtain second language text data;
acquiring M words with the frequency greater than a preset frequency threshold in the second language text data, wherein M is an integer greater than or equal to 1;
obtaining synonyms corresponding to each word in the M words from a preset synonym set according to the corresponding relation between preset words and the synonyms, and replacing each word in the M words in the second language text data with the synonym corresponding to each word in the M words;
and performing language conversion processing on the replaced second language text data to obtain updated first language text data, and determining the updated first language text data as the enhanced unmarked text data.
Further, the training end condition is that, in the first loss functions obtained by N consecutive times of training, the number of times that the difference value of the first loss functions obtained by two adjacent times of training is smaller than a preset difference threshold is greater than or equal to a preset number threshold, where N is an integer greater than 2.
In a second aspect, an embodiment of the present application provides a training apparatus for a text classification model, including:
the device comprises an acquisition unit, a classification unit and a classification unit, wherein the acquisition unit is used for acquiring a training sample set of an initial classification model, the training sample set comprises a marked sample set and an unmarked sample set, the marked sample set comprises a plurality of first text data, each first text data carries a category label, and the unmarked sample set comprises a plurality of second text data;
the enhancement unit is used for performing text enhancement processing on each second text data in the unmarked sample set to obtain enhanced unmarked text data;
an input unit, configured to input the labeled sample set, the unlabeled sample set, and the enhanced unlabeled text data into the initial classification model, respectively, to obtain a first probability distribution of a prediction category label of each first text data in the labeled sample set, a second probability distribution of a prediction category label of each second text data in the unlabeled sample set, and a third probability distribution of a prediction category label of the enhanced unlabeled text data;
a first determining unit, configured to determine a first loss function according to the first probability distribution, the second probability distribution, and the third probability distribution, and iteratively train the initial classification model according to the first loss function and the training sample set;
and the second determining unit is used for determining the initial classification model when the first loss function meets the training end condition as the target text classification model when the first loss function meets the training end condition.
Further, the first determining unit is specifically configured to:
calculating a first difference degree of the first probability distribution and a preset probability distribution corresponding to each first text data in the marked sample set according to a preset cross entropy, and determining a second loss function according to the first difference degree;
calculating a second difference degree between the second probability distribution and the third probability distribution according to the preset cross entropy, and determining a third loss function according to the second difference degree;
determining the first loss function according to the second loss function and the third loss function.
Further, the first determining unit is specifically configured to:
calculating the product of the first proportional coefficient and the second loss function according to a preset first proportional coefficient and the second loss function to obtain a first result, wherein the first proportional coefficient is a positive number;
calculating the product of the second proportionality coefficient and the third loss function according to a preset second proportionality coefficient and the third loss function to obtain a second result, wherein the second proportionality coefficient is a positive number;
determining a sum of the first result and the second result as the first loss function.
Further, after the calculating a second difference between the second probability distribution and the third probability distribution according to the preset cross entropy and determining a third loss function according to the second difference, the apparatus further includes:
a third determining unit, configured to determine, when the third loss function is smaller than a preset threshold, a category label corresponding to a maximum probability in the third probability distribution as a category label corresponding to the enhanced unmarked text data;
an adding unit, configured to add the enhanced unmarked text data and the category label corresponding to the maximum probability in the third probability distribution into the marked sample set;
and the training unit is used for training the initial classification model according to the labeled sample set added with the enhanced label-free text data and the class label corresponding to the maximum probability in the third probability distribution.
Further, the second text data includes first language text data; the enhancement unit is specifically configured to:
performing language conversion processing on the first language text data to obtain second language text data;
randomly extracting words in the second language text data, acquiring synonyms corresponding to the words from a preset synonym set according to the corresponding relation between preset words and the synonyms, and replacing the words in the second language text data with the synonyms;
and performing language conversion processing on the replaced second language text data to obtain updated first language text data, and determining the updated first language text data as the enhanced unmarked text data.
Further, the second text data includes first language text data; the enhancement unit is specifically configured to:
performing language conversion processing on the first language text data to obtain second language text data;
acquiring M words with the frequency greater than a preset frequency threshold in the second language text data, wherein M is an integer greater than or equal to 1;
obtaining synonyms corresponding to each word in the M words from a preset synonym set according to the corresponding relation between preset words and the synonyms, and replacing each word in the M words in the second language text data with the synonym corresponding to each word in the M words;
and performing language conversion processing on the replaced second language text data to obtain updated first language text data, and determining the updated first language text data as the enhanced unmarked text data.
Further, the training end condition is that, in the first loss functions obtained by N consecutive times of training, the number of times that the difference value of the first loss functions obtained by two adjacent times of training is smaller than a preset difference threshold is greater than or equal to a preset number threshold, where N is an integer greater than 2.
In a third aspect, an embodiment of the present application provides a computer device, which includes a memory, a processor, and a transceiver; the processor is connected to the memory and the transceiver, respectively, where the memory stores computer program codes, and the processor and the transceiver are configured to call the program codes to execute the method provided by the first aspect and/or any possible implementation manner of the first aspect.
In a fourth aspect, the present application provides a computer-readable storage medium storing a computer program, which, when executed by a computer device, implements the method for training a text classification model as disclosed in any one of the possible implementations of the first aspect.
In the embodiment of the application, each second text data in the unlabeled sample set is enhanced through an obtained training sample set of the initial classification model, so that enhanced unlabeled text data is obtained, the labeled sample set, the unlabeled sample set and the enhanced unlabeled text data are input into the initial classification model, and a first probability distribution of a category label of each first text data in the labeled sample set, a second probability distribution of a category label of each second text data and a third probability distribution of a category label of the enhanced unlabeled text data are respectively obtained; and then determining a first loss function according to the three probability distributions, training the initial classification model according to the first loss function and the training sample set, and determining the initial classification model at the moment as a target text classification model when the first loss function obtained by training meets the training end condition. Therefore, the text classification model is trained by using the label-free sample set, so that the manual labeling cost is reduced, the model training efficiency is improved, and the service iteration speed is increased.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
Fig. 1 is a schematic flowchart of a method for training a text classification model according to an embodiment of the present disclosure;
fig. 2 is another schematic flowchart of a method for training a text classification model according to an embodiment of the present disclosure;
FIG. 3 is a timing diagram illustrating a training method of a text classification model according to an embodiment of the present disclosure;
FIG. 4 is a schematic structural diagram of a training apparatus for text classification models according to an embodiment of the present invention;
fig. 5 is a schematic structural diagram of a computer device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The following describes schematically a training method of a text classification model according to an embodiment of the present application with reference to fig. 1 to 3.
Referring to fig. 1, fig. 1 is a schematic flowchart illustrating a method for training a text classification model according to an embodiment of the present disclosure. As shown in fig. 1, the method may include:
101. the method comprises the steps of obtaining a training sample set of an initial classification model, wherein the training sample set comprises a marked sample set and an unmarked sample set, the marked sample set comprises a plurality of first text data, each first text data carries a category label, and the unmarked sample set comprises a plurality of second text data.
In the embodiment of the present application, each training sample in the training sample set is text data used for training the initial classification model.
The training sample set comprises a marked sample set and an unmarked sample set, the marked sample set comprises a plurality of first text data, each first text data carries a category label, and the category label represents the real text category of the corresponding first text data. The unlabeled sample set includes a plurality of second text data, and each second text data does not carry a category label, that is, the true text category of each second text data in the unlabeled sample set is unknown.
The specific classification of the real text category represented by the category label may be determined based on the actual application scene requirement, which is not limited herein. For example, the category labels described above may characterize emotional categories (positive, negative, neutral, etc.), or characterize different traffic types in insurance traffic, etc.
The acquisition mode of the training sample set is not limited in the embodiment of the present application, for example, text data in a related field is acquired from the internet based on big data and the like, and category labeling is performed on part of the text data to divide the text data into a labeled sample set and an unlabeled sample set.
For example, in the scenario of an Artificial Intelligence (AI) interview, a robot simulates a human interviewer to interview candidate insurance agents: the robot selects a question according to a fixed question script, performs intent recognition on the candidate's answer, and then finds the next branch node in the script according to the result of the intent recognition. Training samples can be obtained from the dialog logs between the AI and the candidate insurance agents, which reduces the human-resources workload and improves the efficiency of recruiting insurance agents.
102. And performing text enhancement processing on each second text data in the unmarked sample set to obtain enhanced unmarked text data.
In one possible implementation, the plurality of second text data in the unlabeled sample set are subjected to enhancement processing. Specifically, the second text data in the unlabeled sample set include first-language text data. The first-language text data may be subjected to language conversion processing to obtain second-language text data; a word in the second-language text data is then randomly extracted, a synonym corresponding to the extracted word is obtained from a preset synonym set according to a preset correspondence between words and synonyms, and the extracted word in the second-language text data is replaced with the synonym. The replaced second-language text data is then converted back into the first language to obtain updated first-language text data, and the updated first-language text data is determined as the enhanced unlabeled text data.
Illustratively, the first language is Chinese, the second language is English, and the second text data includes Chinese text data: the Chinese text data is translated into English text data, a word is randomly extracted from the English text data, a synonym of the extracted word is obtained from the preset synonym set according to the preset correspondence between words and synonyms, the extracted word is replaced with the synonym, and the replaced English text data is translated back into Chinese to obtain updated Chinese text data, which is used as the enhanced unlabeled text data.
In another possible implementation of the text enhancement processing on each second text data in the unlabeled sample set, the first-language text data may also be converted into second-language text data, and synonym replacement may then be performed on M words in the second-language text data whose frequency exceeds a preset frequency threshold. The synonym corresponding to each of the M words is obtained from the preset synonym set according to the preset correspondence between words and synonyms, and each of the M words in the second-language text data is replaced with its corresponding synonym to obtain replaced second-language text data. The replaced text is then converted back into the first language to obtain updated first-language text data, which is determined as the enhanced unlabeled text data. M is a positive integer.
For example, with the first language being Chinese, the second language being English, and M = 2: the Chinese text data is first translated into English text data, 2 words whose occurrence frequency exceeds the preset frequency threshold are obtained from the English text data, the synonyms corresponding to these 2 words are obtained from the preset synonym set according to the preset correspondence between words and synonyms, both words are replaced with their synonyms, and the replaced English text data is translated back into Chinese to obtain updated Chinese text data, which is used as the enhanced unlabeled text data.
Optionally, the updated first-language text data may be compared with the second text data in the existing unlabeled sample set, and any enhanced unlabeled text data that duplicates the second text data may be removed.
Because existing translation methods introduce translation deviations during the translate-and-back-translate process, i.e. the wording of the text changes while the text semantics do not, text enhancement based on back-translation and synonym replacement can be achieved without changing the semantics of the second text data.
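A minimal sketch of this enhancement step is shown below, assuming hypothetical `translate` and synonym-lookup helpers; the embodiment does not name a specific translation engine or synonym dictionary, so these names and the example synonym set are placeholders.

```python
import random

# Hypothetical helper -- the embodiment does not prescribe a concrete translation service.
def translate(text: str, src: str, dst: str) -> str:
    """Translate text from language `src` to language `dst` (assumed external service)."""
    raise NotImplementedError

# Preset correspondence between words and synonyms (illustrative entries only).
SYNONYM_SET = {"rapid": ["fast", "quick"], "assist": ["help", "aid"]}

def enhance_by_back_translation(second_text: str, num_replacements: int = 1) -> str:
    """Back-translate one unlabeled sample and swap randomly chosen words for synonyms."""
    # 1. First-language text -> second-language text (e.g. Chinese -> English).
    second_lang_text = translate(second_text, src="zh", dst="en")
    words = second_lang_text.split()

    # 2. Randomly pick words that have an entry in the preset synonym set and replace them.
    candidates = [i for i, w in enumerate(words) if w.lower() in SYNONYM_SET]
    for i in random.sample(candidates, min(num_replacements, len(candidates))):
        words[i] = random.choice(SYNONYM_SET[words[i].lower()])

    # 3. Translate back to the first language; the result is the enhanced unlabeled text.
    return translate(" ".join(words), src="en", dst="zh")
```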
103. The labeled sample set, the unlabeled sample set, and the enhanced unlabeled text data are input to the initial classification model, respectively, to obtain a first probability distribution of the prediction type label of each first text data in the labeled sample set, a second probability distribution of the prediction type label of each second text data in the unlabeled sample set, and a third probability distribution of the prediction type label of the enhanced unlabeled text data.
Specifically, in the embodiment of the present application, the initial classification model is a classification model or classification structure based on a neural network, including but not limited to a convolutional neural network (CNN), a recurrent neural network (RNN), a long short-term memory (LSTM) network, a gated recurrent unit (GRU), and a Bidirectional Encoder Representations from Transformers (BERT) model. The training of the initial classification model mainly comprises a supervised learning training process and an unsupervised learning training process. In other words, in each training pass of the initial classification model, the text data of the labeled sample set and of the unlabeled sample set are used for model training at the same time.
Each first text data in the labeled sample set is input into the initial classification model, and the predicted category label of each first text data is obtained through the initial classification model. For any first text data, inputting it into the initial classification model yields the first probability distribution of its predicted category label; this distribution gives the probability that the predicted category label of the first text data is each of the possible category labels, and the predicted category label of the first text data can then be determined from the distribution. For example, taking binary classification with category labels A and B, the first probability distribution of the predicted category label output by the initial classification model for a first text data may be (0.6, 0.4): the probability of the predicted category label being A is 0.6 and the probability of it being B is 0.4.
Each second text data in the unlabeled sample set is likewise input into the initial classification model, which outputs the predicted category label of each second text data. For any second text data, inputting it into the initial classification model yields the second probability distribution of its predicted category label.
Similarly, the enhanced unlabeled text data is also input into the initial classification model, which outputs the third probability distribution of the predicted category label of the enhanced unlabeled text data.
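As a sketch, the three probability distributions can be obtained as below, assuming a PyTorch-style classifier whose forward pass returns logits over the category labels; names such as `model` and the batch variables are illustrative, not taken from the patent.

```python
import torch
import torch.nn.functional as F

# `model` is the initial classification model; each batch is assumed to be already-encoded
# text (e.g. token-id tensors), and model(batch) returns logits of shape [batch, num_classes].
def predict_distributions(model, labeled_batch, unlabeled_batch, enhanced_batch):
    p_first = F.softmax(model(labeled_batch), dim=-1)     # first probability distribution
    p_second = F.softmax(model(unlabeled_batch), dim=-1)  # second probability distribution
    p_third = F.softmax(model(enhanced_batch), dim=-1)    # third probability distribution
    return p_first, p_second, p_third

# For a binary task with labels A and B, one row of p_first might look like
# tensor([0.6000, 0.4000]): probability 0.6 for label A and 0.4 for label B.
```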
104. And determining a first loss function according to the first probability distribution, the second probability distribution and the third probability distribution, and performing iterative training on the initial classification model according to the first loss function and the training sample set.
In the embodiment of the present application, a loss function is used to estimate the degree of inconsistency between the model's predicted value and the true value. The loss functions in this application are computed as degrees of difference between probability distributions.
In a possible implementation, each first text data carries a category label, i.e. a known preset probability distribution. For example, in binary classification, if the label carried by a certain first text data is A, the preset probability distribution of that first text data is (1, 0). The difference between the predicted first probability distribution and the preset probability distribution may be used to determine the loss function of the supervised learning training process (the Supervised Cross-entropy Loss), i.e. the second loss function. The second loss function may be the cross entropy between the first probability distribution and the preset probability distribution. The second loss function characterizes the difference between the true category label and the predicted category label of the first text data: the larger the second loss function, the larger the difference between the true text category and the predicted text category of the first text data, and the worse the classification effect of the initial classification model during training.
Further, the loss function of the unsupervised learning training process (the Unsupervised Consistency Loss), i.e. the third loss function, may be determined based on the difference between the second probability distribution and the third probability distribution; the third loss function may also be the cross entropy between the second probability distribution and the third probability distribution. The first loss function is then determined from the second loss function of supervised learning and the third loss function of unsupervised learning, and the initial classification model is iteratively trained based on the first loss function and the training sample set.
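A hedged sketch of this combined loss in PyTorch follows; the use of logits, the default values of the scaling factors a and b, and the variable names are assumptions for illustration, since the patent only requires a weighted sum of the two losses (see Formulas 1-3 below).

```python
import torch
import torch.nn.functional as F

def first_loss(labeled_logits, true_labels, unlabeled_probs, enhanced_logits,
               a: float = 0.5, b: float = 0.5):
    """Weighted sum of the supervised loss (Loss2) and the consistency loss (Loss3).

    a and b are the preset positive scaling factors; a + b = 1 is one possible choice.
    """
    # Loss2: cross entropy between the predicted first probability distribution and the
    # preset (one-hot) distribution given by each first text's category label.
    loss2 = F.cross_entropy(labeled_logits, true_labels)

    # Loss3: cross entropy between the second probability distribution (original unlabeled
    # text) and the third probability distribution (enhanced unlabeled text).
    log_p_enhanced = F.log_softmax(enhanced_logits, dim=-1)
    loss3 = -(unlabeled_probs * log_p_enhanced).sum(dim=-1).mean()

    return a * loss2 + b * loss3  # Loss1
```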
105. And when the first loss function meets the training end condition, determining the initial classification model when the first loss function meets the training end condition as the target text classification model.
In a possible implementation manner, in the process of iteratively training the initial classification model based on the first loss function and the training sample set, the relevant model parameters of the initial classification model are continuously adjusted through a back-propagation mechanism until the first loss function obtained by training meets the training end condition, at which point training ends and the initial classification model at the end of training is determined as the target text classification model.
In a possible implementation manner, the training end condition may be that, among the first loss functions obtained in N consecutive training passes, the number of times the difference between the first loss functions of two adjacent passes is smaller than a preset difference threshold is greater than or equal to a preset number threshold, where N is an integer greater than 2. This indicates that the classification performance of the initial classification model has stabilized, so training can end.
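One way to check this end condition is sketched below; the window size N, the difference threshold, and the number threshold are preset values that the patent leaves to the implementer.

```python
def training_should_end(loss_history, n: int, diff_threshold: float,
                        count_threshold: int) -> bool:
    """Return True when, within the last n first-loss values, the number of adjacent pairs
    whose absolute difference is below diff_threshold reaches count_threshold."""
    if len(loss_history) < n:
        return False
    window = loss_history[-n:]
    small_diffs = sum(
        1 for prev, cur in zip(window, window[1:])
        if abs(cur - prev) < diff_threshold
    )
    return small_diffs >= count_threshold
```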
Optionally, a part of the second text data in the current unlabeled training sample set may be labeled in an active learning manner, and the labeled second text data is added to the labeled sample set for the next training.
Optionally, text enhancement processing may also be performed on the plurality of first text data in the labeled sample set to increase the number of labeled training samples, i.e. to augment the first text data. Specifically, the text enhancement modes may include, but are not limited to, back translation, term frequency-inverse document frequency (tf-idf) word replacement, and random replacement. Back translation translates a labeled sample in the labeled sample set into text in a second language (assuming the first text data in the labeled sample set is in a first language) and then translates the second-language text back into the first language to obtain the corresponding enhanced training sample; for example, first text data in Chinese is translated into English and then translated back into Chinese, and the back-translated Chinese text serves as the enhanced labeled text data. Tf-idf word replacement replaces high-frequency words in the text information, for example by synonym replacement. Random replacement may randomly extract a word from the text information, randomly select one word from that word's synonym set, and insert the selected word at a random position in the original sentence.
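A small sketch of the frequency-based replacement variant follows, using a plain term-frequency count as a stand-in for the tf-idf score; the exact weighting scheme, the threshold, and the example synonym set are assumptions rather than details fixed by the embodiment.

```python
from collections import Counter
import random

# Preset correspondence between words and synonyms (illustrative entries only).
SYNONYM_SET = {"quick": ["fast", "rapid"], "help": ["assist", "aid"]}

def replace_high_frequency_words(text: str, freq_threshold: int = 2) -> str:
    """Replace words whose frequency exceeds the threshold with a preset synonym."""
    words = text.split()
    counts = Counter(w.lower() for w in words)
    return " ".join(
        random.choice(SYNONYM_SET[w.lower()])
        if counts[w.lower()] > freq_threshold and w.lower() in SYNONYM_SET
        else w
        for w in words
    )
```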
In the embodiment of the application, each second text data in the unlabeled sample set is enhanced through an obtained training sample set of the initial classification model, so that enhanced unlabeled text data is obtained, the labeled sample set, the unlabeled sample set and the enhanced unlabeled text data are input into the initial classification model, and a first probability distribution of a category label of each first text data in the labeled sample set, a second probability distribution of a category label of each second text data and a third probability distribution of a category label of the enhanced unlabeled text data are respectively obtained; and then determining a first loss function according to the three probability distributions, training the initial classification model according to the first loss function and the training sample set, and determining the initial classification model at the moment as a target text classification model when the first loss function obtained by training meets the training end condition. Therefore, the text classification model is trained by using the label-free sample set, so that the manual labeling cost is reduced, the model training efficiency is improved, and the service iteration speed is increased.
Referring to fig. 2, fig. 2 is another schematic flow chart of a method for training a text classification model according to an embodiment of the present application. As shown in fig. 2, the method may include:
201. and calculating a first difference degree of the first probability distribution and a preset probability distribution corresponding to each first text data in the marked sample set according to a preset cross entropy, and determining a second loss function according to the first difference degree.
In one possible implementation, the first difference degree between the first probability distribution and the preset probability distribution is calculated as the cross entropy between the two distributions, as shown in Formula 1:

Loss2 = CrossEntropy(p(y'|x_labeled), y)    (Formula 1)

where Loss2 denotes the second loss function, CrossEntropy denotes the cross entropy, p(y'|x_labeled) denotes the probability distribution of the predicted category label y' corresponding to the first text data x_labeled, and y denotes the category label carried by the first text data.
202. And calculating a second difference degree between the second probability distribution and the third probability distribution according to the preset cross entropy, and determining a third loss function according to the second difference degree.
In one possible implementation, denote the second text data as x_unlabeled and the enhanced unlabeled text data as x̂_unlabeled. Each second text data x_unlabeled in the unlabeled sample set is input into the initial classification model to obtain the probability distribution p(y|x_unlabeled) of its predicted text category y, and the enhanced unlabeled text data x̂_unlabeled is input into the initial classification model to obtain the probability distribution p(y|x̂_unlabeled) of its predicted text category y. Because the semantics of the enhanced unlabeled text data x̂_unlabeled and of the second text data x_unlabeled are the same, when the classification accuracy of the initial classification model is high, the probability distributions of the predicted category label y for x̂_unlabeled and x_unlabeled should be the same or similar. Based on this, the similarity between p(y|x_unlabeled) and p(y|x̂_unlabeled) can be used to measure the stability of the initial network model during training, i.e. the cross entropy between p(y|x_unlabeled) and p(y|x̂_unlabeled) is taken as the third loss function. The smaller the third loss function, the greater the similarity between p(y|x_unlabeled) and p(y|x̂_unlabeled), and the higher the classification accuracy of the initial classification model.
Optionally, the first difference degree and the second difference degree may also be calculated according to a preset relative entropy. Taking the third loss function as an example, the relative entropy between the second probability distribution and the third probability distribution can be calculated as shown in Formula 2:

Loss3 = KL( p(y|x_unlabeled) || p(y|x̂_unlabeled) )    (Formula 2)

where Loss3 represents the third loss function and KL represents the relative entropy, also called the KL divergence (Kullback-Leibler divergence).
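A brief sketch of this relative-entropy variant of the consistency term is given below; the variable names are illustrative, and PyTorch's F.kl_div expects log-probabilities as its first argument.

```python
import torch.nn.functional as F

def consistency_kl(unlabeled_logits, enhanced_logits):
    """Loss3 as KL(second distribution || third distribution), per Formula 2."""
    log_p_enhanced = F.log_softmax(enhanced_logits, dim=-1)  # third distribution (log form)
    p_unlabeled = F.softmax(unlabeled_logits, dim=-1)        # second distribution
    # F.kl_div(input, target) computes KL(target || input) when input is log-probabilities.
    return F.kl_div(log_p_enhanced, p_unlabeled, reduction="batchmean")
```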
203. And determining the first loss function according to the second loss function and the third loss function.
In a possible implementation manner, a product of the first scaling factor and the second loss function is calculated according to a preset first scaling factor and a second loss function to obtain a first result, and a product of the second scaling factor and the third loss function is calculated according to a preset second scaling factor and a third loss function to obtain a second result. The first scaling factor and the second scaling factor may be preset and are both positive numbers. Further, a sum of the first result and the second result is determined as a first penalty function.
Specifically, denoting the first scaling factor as a and the second scaling factor as b, the first loss function Loss1 can be calculated as shown in Formula 3:
Loss1 = Loss2 × a + Loss3 × b    (Formula 3)
Optionally, the sum of the preset first scaling factor and second scaling factor may be set equal to 1, i.e. a + b = 1.
In the embodiment of the application, each second text data in the unlabeled sample set is enhanced through an obtained training sample set of the initial classification model, so that enhanced unlabeled text data is obtained, the labeled sample set, the unlabeled sample set and the enhanced unlabeled text data are input into the initial classification model, and a first probability distribution of a category label of each first text data in the labeled sample set, a second probability distribution of a category label of each second text data and a third probability distribution of a category label of the enhanced unlabeled text data are respectively obtained; and then determining a first loss function according to the three probability distributions, training the initial classification model according to the first loss function and the training sample set, and determining the initial classification model at the moment as a target text classification model when the first loss function obtained by training meets the training end condition. Therefore, the text classification model is trained by using the label-free sample set, so that the manual labeling cost is reduced, the model training efficiency is improved, and the service iteration speed is increased.
Referring to fig. 3, fig. 3 is a timing diagram illustrating a training method of a text classification model according to an embodiment of the present disclosure. As shown in fig. 3, the training method of the text classification model in the present application mainly includes a training process of supervised learning (left half part) and a training process of unsupervised learning (right half part). The training process of supervised learning is based on the labeled sample set for training, and the training process of unsupervised learning is based on the unlabeled sample set for training. In other words, in each training process of the initial classification model, the first text data carrying the class labels and the second text data not carrying the class labels are simultaneously adopted for model training.
During training, each first text data in the labeled sample set is input into the initial classification model, and the predicted text category of each first text data is obtained through the initial classification model. Inputting any first text data into the initial classification model yields the first probability distribution of its predicted category label, and the second loss function is then calculated according to the first difference degree between the first probability distribution and the preset probability distribution of the first text data.
During training, text enhancement processing is performed on each second text data in the unlabeled sample set to obtain enhanced unlabeled text data. The initial classification model is then trained on the second text data and the enhanced unlabeled text data: each second text data and the enhanced unlabeled text data are input into the initial classification model to obtain, respectively, the second probability distribution of the predicted category label of each second text data and the third probability distribution of the predicted category label of the enhanced unlabeled text data. The third loss function is determined from the second difference degree between the second probability distribution and the third probability distribution, and the sum of the second loss function and the third loss function is determined as the first loss function. Further, the two scaling factors may be applied to the second loss function and the third loss function when determining the first loss function.
It is then judged whether the obtained first loss function meets the training end condition; when it does not, the model parameters of the initial classification model are continuously adjusted through a back-propagation mechanism. When the first loss function obtained by training meets the training end condition, the initial classification model at that point is determined as the target text classification model.
Further, a text to be classified can be obtained and classified based on the target text classification model. Specifically, when classifying a text to be classified, the word vectors of the words in the text are first determined and input into the target text classification model as input features. After the target text classification model processes the input features, the probability distribution of the predicted category label of the text to be classified is obtained; this distribution gives the probability that the predicted category label of the text is each of the possible category labels, and the text category with the highest probability can then be determined as the predicted category label of the text to be classified.
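A sketch of this inference step follows, under the assumption that the trained target model consumes word vectors and returns one score per category label; the helper names and the label set are illustrative, not fixed by the patent.

```python
import torch
import torch.nn.functional as F

CATEGORY_LABELS = ["A", "B", "C"]  # illustrative label set

def classify(target_model, embed, text: str) -> str:
    """Embed the text, run the target text classification model, and take the argmax label.

    `embed` is an assumed helper that turns the text into word vectors of shape
    [sequence_length, embedding_dim]; the patent does not fix a particular embedding.
    """
    word_vectors = embed(text)
    with torch.no_grad():
        logits = target_model(word_vectors.unsqueeze(0))  # add a batch dimension
        probs = F.softmax(logits, dim=-1).squeeze(0)
    return CATEGORY_LABELS[int(probs.argmax())]
```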
For example, in the scenario where a robot simulates a human interviewer to interview candidate insurance agents, the target text classification model may be applied to classify the text information in the candidate's received answer, and the next branch node is found in the script according to the classification, so as to ask further questions and evaluate the candidate.
Optionally, the category label of the text information in the candidate's answer may also be uploaded to an administrator, and the administrator determines whether the candidate is hired.
Referring to fig. 4, fig. 4 is a schematic structural diagram of a training apparatus for text classification models according to an embodiment of the present disclosure. The training apparatus 400 for text classification model includes:
an obtaining unit 401, configured to obtain a training sample set of an initial classification model, where the training sample set includes a labeled sample set and an unlabeled sample set, the labeled sample set includes a plurality of first text data, each first text data carries a category label, and the unlabeled sample set includes a plurality of second text data;
an enhancing unit 402, configured to perform text enhancement processing on each second text data in the unlabeled sample set to obtain enhanced unlabeled text data;
an input unit 403, configured to input the labeled sample set, the unlabeled sample set, and the enhanced unlabeled text data into the initial classification model, so as to obtain a first probability distribution of a prediction type label of each first text data in the labeled sample set, a second probability distribution of a prediction type label of each second text data in the unlabeled sample set, and a third probability distribution of a prediction type label of the enhanced unlabeled text data;
a first determining unit 404, configured to determine a first loss function according to the first probability distribution, the second probability distribution, and the third probability distribution, and perform iterative training on the initial classification model according to the first loss function and the training sample set;
a second determining unit 405, configured to determine, when the first loss function satisfies a training end condition, that the initial classification model when the first loss function satisfies the training end condition is the target text classification model.
Further, the first determining unit 404 is specifically configured to:
calculating a first difference degree of the first probability distribution and a preset probability distribution corresponding to each first text data in the marked sample set according to a preset cross entropy, and determining a second loss function according to the first difference degree;
calculating a second difference degree between the second probability distribution and the third probability distribution according to the preset cross entropy, and determining a third loss function according to the second difference degree;
and determining the first loss function according to the second loss function and the third loss function.
Further, the first determining unit 404 is specifically configured to:
calculating the product of the first proportional coefficient and the second loss function according to a preset first proportional coefficient and the second loss function to obtain a first result, wherein the first proportional coefficient is a positive number;
calculating the product of the second proportionality coefficient and the third loss function according to a preset second proportionality coefficient and the third loss function to obtain a second result, wherein the second proportionality coefficient is a positive number;
determining a sum of the first result and the second result as the first penalty function.
Further, after calculating a second difference between the second probability distribution and the third probability distribution according to the preset cross entropy, and determining a third loss function according to the second difference, the apparatus 400 further includes:
a third determining unit 406, configured to determine, when the third loss function is smaller than a preset threshold, a category label corresponding to a maximum probability in the third probability distribution as a category label corresponding to the enhanced unmarked text data;
an adding unit 407, configured to add the enhanced unmarked text data and the category label corresponding to the maximum probability in the third probability distribution to the marked sample set;
a training unit 408, configured to train the initial classification model according to the labeled sample set added with the enhanced unlabeled text data and the class label corresponding to the maximum probability in the third probability distribution.
Further, the second text data includes first language text data; the enhancing unit 402 is specifically configured to:
performing language conversion processing on the first language text data to obtain second language text data;
randomly extracting words in the second language text data, acquiring synonyms corresponding to the words from a preset synonym set according to the corresponding relation between preset words and the synonyms, and replacing the words in the second language text data with the synonyms;
and performing language conversion processing on the replaced second language text data to obtain updated first language text data, and determining the updated first language text data as the enhanced unmarked text data.
Further, the second text data includes first language text data; the enhancing unit 402 is specifically configured to:
performing language conversion processing on the first language text data to obtain second language text data;
acquiring M words with the frequency greater than a preset frequency threshold in the second language text data, wherein M is an integer greater than or equal to 1;
obtaining a synonym corresponding to each word in the M words from a preset synonym set according to the corresponding relation between a preset word and the synonym, and replacing each word in the M words in the second language text data by using the synonym corresponding to each word in the M words;
and performing language conversion processing on the replaced second language text data to obtain updated first language text data, and determining the updated first language text data as the enhanced unmarked text data.
Further, the training end condition is that, in the first loss functions obtained by N consecutive times of training, the number of times that the difference value of the first loss functions obtained by two adjacent times of training is smaller than a preset difference threshold is greater than or equal to a preset number threshold, where N is an integer greater than 2.
The detailed descriptions of the obtaining unit 401, the enhancing unit 402, the input unit 403, the first determining unit 404, the second determining unit 405, the third determining unit 406, the adding unit 407, and the training unit 408 may be directly obtained by directly referring to the related descriptions in the method embodiments shown in fig. 1 to fig. 3, and are not repeated herein.
In the embodiment of the application, text enhancement processing is performed on each second text data in the unlabeled sample set of the obtained training sample set of the initial classification model to obtain enhanced unlabeled text data; the labeled sample set, the unlabeled sample set and the enhanced unlabeled text data are input into the initial classification model to respectively obtain a first probability distribution of the category label of each first text data in the labeled sample set, a second probability distribution of the category label of each second text data and a third probability distribution of the category label of the enhanced unlabeled text data; a first loss function is then determined according to the three probability distributions, the initial classification model is trained according to the first loss function and the training sample set, and when the first loss function obtained by training meets the training end condition, the initial classification model at that moment is determined as the target text classification model. In this way, the text classification model is also trained with the unlabeled sample set, so that the manual labeling cost is reduced, the model training efficiency is improved, and the service iteration speed is increased.
Referring to fig. 5, fig. 5 is a schematic structural diagram of a computer device according to an embodiment of the present disclosure, and as shown in fig. 5, a computer device 500 according to an embodiment of the present disclosure may include:
a processor 501, a transceiver 502, and a memory 505; the computer device 500 may further include a user interface 504 and at least one communication bus 503, where the communication bus 503 is used to enable connection and communication between these components. The user interface 504 may include a display (Display) and a keyboard (Keyboard), and the memory 505 may be a high-speed RAM memory or a non-volatile memory (non-volatile memory), such as at least one disk memory. The memory 505 may alternatively be at least one storage device located remotely from the processor 501 and the transceiver 502. As shown in fig. 5, the memory 505, which is a type of computer storage medium, may include an operating system, a network communication module, a user interface module, and a device control application program.
In the computer device 500 shown in fig. 5, the transceiver 502 may provide a network communication function to enable communication between servers; the user interface 504 is mainly used to provide an input interface for the user; and the processor 501 may be configured to invoke the device control application stored in the memory 505 to perform the following operations:
The processor 501 is configured to: obtain a training sample set of an initial classification model, where the training sample set includes a labeled sample set and an unlabeled sample set, the labeled sample set includes a plurality of first text data, each first text data carries a category label, and the unlabeled sample set includes a plurality of second text data; perform text enhancement processing on each second text data in the unlabeled sample set to obtain enhanced unlabeled text data; input the labeled sample set, the unlabeled sample set, and the enhanced unlabeled text data into the initial classification model to obtain a first probability distribution of a prediction class label of each first text data in the labeled sample set, a second probability distribution of a prediction class label of each second text data in the unlabeled sample set, and a third probability distribution of a prediction class label of the enhanced unlabeled text data; determine a first loss function according to the first probability distribution, the second probability distribution and the third probability distribution, and perform iterative training on the initial classification model according to the first loss function and the training sample set; and when the first loss function meets the training end condition, determine the initial classification model when the first loss function meets the training end condition as the target text classification model.
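Put together, these operations amount to the following training skeleton. Every callable here (the model, optimiser, enhancement function, loss computation and end-condition check) is a placeholder standing in for the components described in this application, assuming a PyTorch-style model and optimiser:

```python
def train_to_convergence(model, optimizer, labeled_set, unlabeled_set,
                         enhance, compute_first_loss, training_finished):
    """Iterative training: enhance each unlabeled text, feed the labeled set,
    the unlabeled set and the enhanced text through the same initial
    classification model inside compute_first_loss, and repeat until the
    training end condition is met."""
    first_losses = []
    while not training_finished(first_losses):
        enhanced = [enhance(text) for text in unlabeled_set]
        first_loss = compute_first_loss(model, labeled_set, unlabeled_set, enhanced)
        optimizer.zero_grad()
        first_loss.backward()
        optimizer.step()
        first_losses.append(first_loss.item())
    return model  # the target text classification model
```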
In a possible implementation manner, the processor 501 determines a first loss function according to the first probability distribution, the second probability distribution, and the third probability distribution, and is specifically configured to:
calculating a first difference degree of the first probability distribution and a preset probability distribution corresponding to each first text data in the labeled sample set according to a preset cross entropy, and determining a second loss function according to the first difference degree; calculating a second difference degree between the second probability distribution and the third probability distribution according to the preset cross entropy, and determining a third loss function according to the second difference degree; and determining the first loss function according to the second loss function and the third loss function.
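One plausible PyTorch rendering of the two cross-entropy terms follows. Passing logits to F.cross_entropy, detaching the clean (second) distribution, and the small constant added inside the logarithm are implementation choices assumed here, not details taken from this application:

```python
import torch
import torch.nn.functional as F

def second_and_third_losses(labeled_logits, labels, second_probs, third_probs):
    """Second loss: cross entropy between the first probability distribution
    (predictions on the labeled data) and the preset one-hot distribution of
    the true labels. Third loss: cross entropy between the second and third
    probability distributions, measuring how much the prediction changes
    after text enhancement."""
    second_loss = F.cross_entropy(labeled_logits, labels)
    third_loss = -(second_probs.detach()
                   * torch.log(third_probs + 1e-8)).sum(dim=-1).mean()
    return second_loss, third_loss
```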
In a possible implementation manner, the processor 501 determines the first loss function according to the second loss function and the third loss function, and is specifically configured to:
calculating the product of the first proportional coefficient and the second loss function according to a preset first proportional coefficient and the second loss function to obtain a first result, wherein the first proportional coefficient is a positive number; calculating the product of the second proportional coefficient and the third loss function according to a preset second proportional coefficient and the third loss function to obtain a second result, wherein the second proportional coefficient is a positive number; determining a sum of the first result and the second result as the first loss function.
In a possible implementation manner, after the processor 501 calculates a second difference between the second probability distribution and the third probability distribution according to the preset cross entropy, and determines a third loss function according to the second difference, the processor 501 is further configured to:
determining the category label corresponding to the maximum probability in the third probability distribution as the category label corresponding to the enhanced unlabeled text data under the condition that the third loss function is smaller than a preset threshold value; adding the enhanced unlabeled text data and the category label corresponding to the maximum probability in the third probability distribution into the labeled sample set; and training the initial classification model according to the labeled sample set to which the enhanced unlabeled text data and the category label corresponding to the maximum probability in the third probability distribution have been added.
In a possible implementation manner, the second text data includes first language text data; the processor 501 performs text enhancement processing on each second text data in the unlabeled sample set to obtain enhanced unlabeled text data, which is specifically configured to:
performing language conversion processing on the first language text data to obtain second language text data; randomly extracting words in the second language text data, acquiring synonyms corresponding to the words from a preset synonym set according to the corresponding relation between preset words and the synonyms, and replacing the words in the second language text data with the synonyms; and performing language conversion processing on the replaced second language text data to obtain updated first language text data, and determining the updated first language text data as the enhanced unlabeled text data.
In a possible implementation manner, the second text data includes first language text data; the processor 501 performs text enhancement processing on each second text data in the unlabeled sample set to obtain enhanced unlabeled text data, which is specifically configured to:
performing language conversion processing on the first language text data to obtain second language text data; acquiring M words with the frequency greater than a preset frequency threshold in the second language text data, wherein M is an integer greater than or equal to 1; obtaining a synonym corresponding to each word in the M words from a preset synonym set according to the corresponding relation between a preset word and the synonym, and replacing each word in the M words in the second language text data by using the synonym corresponding to each word in the M words; and performing language conversion processing on the replaced second language text data to obtain updated first language text data, and determining the updated first language text data as the enhanced unlabeled text data.
In a possible implementation manner, the training end condition is that, in the first loss functions obtained by N consecutive times of training, the number of times that the difference value of the first loss functions obtained by two adjacent times of training is smaller than a preset difference threshold is greater than or equal to a preset number threshold, where N is an integer greater than 2.
It should be understood that, in some possible embodiments, the processor 501 may be a Central Processing Unit (CPU), and the processor 501 may also be other general-purpose processors, Digital Signal Processors (DSPs), Application Specific Integrated Circuits (ASICs), field-programmable gate arrays (FPGAs) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, and the like. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
The memory 505 may include both read-only memory and random access memory and provides instructions and data to the processor. A portion of memory 505 may also include non-volatile random access memory.
In a specific implementation, the computer device 500 may execute the implementation manners provided in the steps in fig. 1 to fig. 3 through the built-in functional modules, which may specifically refer to the implementation manners provided in the steps, and are not described herein again.
In the embodiment of the application, text enhancement processing is performed on each second text data in the unlabeled sample set of the obtained training sample set of the initial classification model to obtain enhanced unlabeled text data; the labeled sample set, the unlabeled sample set and the enhanced unlabeled text data are input into the initial classification model to respectively obtain a first probability distribution of the category label of each first text data in the labeled sample set, a second probability distribution of the category label of each second text data and a third probability distribution of the category label of the enhanced unlabeled text data; a first loss function is then determined according to the three probability distributions, the initial classification model is trained according to the first loss function and the training sample set, and when the first loss function obtained by training meets the training end condition, the initial classification model at that moment is determined as the target text classification model. In this way, the text classification model is also trained with the unlabeled sample set, so that the manual labeling cost is reduced, the model training efficiency is improved, and the service iteration speed is increased.
Further, it is to be noted that an embodiment of the present application also provides a computer-readable storage medium, where the computer-readable storage medium stores the computer program executed by the aforementioned computer device, and the computer program includes program instructions. When the processor executes the program instructions, it can perform the method described in the embodiments corresponding to fig. 1 to fig. 3, which is therefore not repeated here; the beneficial effects of the same method are likewise not described again. For technical details not disclosed in the embodiments of the computer-readable storage medium of the present application, reference is made to the description of the method embodiments of the present application.
It will be understood by those skilled in the art that all or part of the processes of the methods of the above embodiments can be implemented by a computer program instructing relevant hardware; the program can be stored in a computer-readable storage medium and, when executed, can include the processes of the above method embodiments. The storage medium may be a magnetic disk, an optical disk, a read-only memory (ROM), a random access memory (RAM), or the like.
It is emphasized that, in order to further ensure the privacy and security of the data, the data may also be stored in a node of a blockchain. A blockchain is a novel application mode of computer technologies such as distributed data storage, peer-to-peer transmission, consensus mechanisms and encryption algorithms. A blockchain is essentially a decentralized database, a chain of data blocks associated with one another by cryptographic methods, where each data block contains the information of a batch of network transactions and is used to verify the validity (anti-counterfeiting) of the information and to generate the next block. The blockchain may include a blockchain underlying platform, a platform product service layer, an application service layer, and the like.
The above disclosure only illustrates preferred embodiments of the present application and is not to be construed as limiting its scope; the present application is not limited thereto, and equivalent variations and modifications remain within the scope of the present application.

Claims (10)

1. A training method of a text classification model is characterized by comprising the following steps:
acquiring a training sample set of an initial classification model, wherein the training sample set comprises a labeled sample set and an unlabeled sample set, the labeled sample set comprises a plurality of first text data, each first text data carries a category label, and the unlabeled sample set comprises a plurality of second text data;
performing text enhancement processing on each second text data in the unlabeled sample set to obtain enhanced unlabeled text data;
inputting the labeled sample set, the unlabeled sample set, and the enhanced unlabeled text data into the initial classification model, respectively, to obtain a first probability distribution of a prediction class label of each first text data in the labeled sample set, a second probability distribution of a prediction class label of each second text data in the unlabeled sample set, and a third probability distribution of a prediction class label of the enhanced unlabeled text data;
determining a first loss function according to the first probability distribution, the second probability distribution and the third probability distribution, and performing iterative training on the initial classification model according to the first loss function and the training sample set;
and when the first loss function meets the training end condition, determining the initial classification model when the first loss function meets the training end condition as a target text classification model.
2. The method of claim 1, wherein determining a first loss function based on the first probability distribution, the second probability distribution, and the third probability distribution comprises:
calculating a first difference degree of the first probability distribution and a preset probability distribution corresponding to each first text data in the labeled sample set according to a preset cross entropy, and determining a second loss function according to the first difference degree;
calculating a second difference degree between the second probability distribution and the third probability distribution according to the preset cross entropy, and determining a third loss function according to the second difference degree;
determining the first loss function according to the second loss function and the third loss function.
3. The method of claim 2, wherein determining the first loss function from the second loss function and the third loss function comprises:
calculating the product of the first proportional coefficient and the second loss function according to a preset first proportional coefficient and the second loss function to obtain a first result, wherein the first proportional coefficient is a positive number;
calculating the product of the second proportional coefficient and the third loss function according to a preset second proportional coefficient and the third loss function to obtain a second result, wherein the second proportional coefficient is a positive number;
determining a sum of the first result and the second result as the first loss function.
4. The method according to claim 2, wherein after calculating a second degree of difference between the second probability distribution and the third probability distribution according to the preset cross entropy and determining a third loss function according to the second degree of difference, the method further comprises:
determining the category label corresponding to the maximum probability in the third probability distribution as the category label corresponding to the enhanced unlabeled text data under the condition that the third loss function is smaller than a preset threshold value;
adding the enhanced unlabeled text data and the category label corresponding to the maximum probability in the third probability distribution into the labeled sample set;
and training the initial classification model according to the labeled sample set to which the enhanced unlabeled text data and the category label corresponding to the maximum probability in the third probability distribution have been added.
5. The method of claim 1, wherein the second text data comprises first language text data; performing text enhancement processing on each second text data in the unlabeled sample set to obtain enhanced unlabeled text data, including:
performing language conversion processing on the first language text data to obtain second language text data;
randomly extracting words in the second language text data, acquiring synonyms corresponding to the words from a preset synonym set according to the corresponding relation between preset words and the synonyms, and replacing the words in the second language text data with the synonyms;
and performing language conversion processing on the replaced second language text data to obtain updated first language text data, and determining the updated first language text data as the enhanced unlabeled text data.
6. The method of claim 1, wherein the second text data comprises first language text data; performing text enhancement processing on each second text data in the unlabeled sample set to obtain enhanced unlabeled text data, including:
performing language conversion processing on the first language text data to obtain second language text data;
acquiring M words with the frequency greater than a preset frequency threshold in the second language text data, wherein M is an integer greater than or equal to 1;
obtaining synonyms corresponding to each word in the M words from a preset synonym set according to the corresponding relation between preset words and the synonyms, and replacing each word in the M words in the second language text data with the synonym corresponding to each word in the M words;
and performing language conversion processing on the replaced second language text data to obtain updated first language text data, and determining the updated first language text data as the enhanced unlabeled text data.
7. The method of claim 1,
the training end condition is that, in the first loss functions obtained by N consecutive times of training, the number of times that the difference value of the first loss functions obtained by two adjacent times of training is smaller than a preset difference threshold value is greater than or equal to a preset number threshold value, wherein N is an integer greater than 2.
8. An apparatus for training a text classification model, the apparatus comprising:
an acquisition unit, configured to acquire a training sample set of an initial classification model, wherein the training sample set comprises a labeled sample set and an unlabeled sample set, the labeled sample set comprises a plurality of first text data, each first text data carries a category label, and the unlabeled sample set comprises a plurality of second text data;
an enhancement unit, configured to perform text enhancement processing on each second text data in the unlabeled sample set to obtain enhanced unlabeled text data;
an input unit, configured to input the labeled sample set, the unlabeled sample set, and the enhanced unlabeled text data into the initial classification model, respectively, to obtain a first probability distribution of a prediction category label of each first text data in the labeled sample set, a second probability distribution of a prediction category label of each second text data in the unlabeled sample set, and a third probability distribution of a prediction category label of the enhanced unlabeled text data;
a first determining unit, configured to determine a first loss function according to the first probability distribution, the second probability distribution, and the third probability distribution, and iteratively train the initial classification model according to the first loss function and the training sample set;
and the second determining unit is used for determining the initial classification model when the first loss function meets the training end condition as the target text classification model when the first loss function meets the training end condition.
9. A computer device comprising a memory and a processor, the memory storing a computer program that, when executed by the processor, causes the processor to perform the method of any one of claims 1-7.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program which is executed by a processor to implement the method of any one of claims 1-7.
CN202110210276.2A 2021-02-25 2021-02-25 Training method, device and equipment of text classification model and readable medium Pending CN112883193A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110210276.2A CN112883193A (en) 2021-02-25 2021-02-25 Training method, device and equipment of text classification model and readable medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110210276.2A CN112883193A (en) 2021-02-25 2021-02-25 Training method, device and equipment of text classification model and readable medium

Publications (1)

Publication Number Publication Date
CN112883193A true CN112883193A (en) 2021-06-01

Family

ID=76054458

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110210276.2A Pending CN112883193A (en) 2021-02-25 2021-02-25 Training method, device and equipment of text classification model and readable medium

Country Status (1)

Country Link
CN (1) CN112883193A (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018107906A1 (en) * 2016-12-12 2018-06-21 腾讯科技(深圳)有限公司 Classification model training method, and data classification method and device
CN111368997A (en) * 2020-03-04 2020-07-03 支付宝(杭州)信息技术有限公司 Training method and device of neural network model
CN111522958A (en) * 2020-05-28 2020-08-11 泰康保险集团股份有限公司 Text classification method and device
CN111783981A (en) * 2020-06-29 2020-10-16 百度在线网络技术(北京)有限公司 Model training method and device, electronic equipment and readable storage medium
CN111782472A (en) * 2020-06-30 2020-10-16 平安科技(深圳)有限公司 System abnormality detection method, device, equipment and storage medium

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113420165A (en) * 2021-06-11 2021-09-21 北京达佳互联信息技术有限公司 Training of two-classification model and classification method and device of multimedia data
CN113420165B (en) * 2021-06-11 2024-03-05 北京达佳互联信息技术有限公司 Training of classification model and classification method and device of multimedia data
CN114065759A (en) * 2021-11-19 2022-02-18 深圳视界信息技术有限公司 Model failure detection method and device, electronic equipment and medium
CN114065759B (en) * 2021-11-19 2023-10-13 深圳数阔信息技术有限公司 Model failure detection method and device, electronic equipment and medium
CN114238573B (en) * 2021-12-15 2023-09-22 平安科技(深圳)有限公司 Text countercheck sample-based information pushing method and device
CN114238573A (en) * 2021-12-15 2022-03-25 平安科技(深圳)有限公司 Information pushing method and device based on text countermeasure sample
CN114254650A (en) * 2021-12-16 2022-03-29 北京百度网讯科技有限公司 Information processing method, device, equipment and medium
CN114692724B (en) * 2022-03-03 2023-03-28 支付宝(杭州)信息技术有限公司 Training method of data classification model, data classification method and device
CN114692724A (en) * 2022-03-03 2022-07-01 支付宝(杭州)信息技术有限公司 Training method of data classification model, data classification method and device
CN114637824B (en) * 2022-03-18 2023-12-01 马上消费金融股份有限公司 Data enhancement processing method and device
CN114637824A (en) * 2022-03-18 2022-06-17 马上消费金融股份有限公司 Data enhancement processing method and device
CN116738345A (en) * 2023-08-15 2023-09-12 腾讯科技(深圳)有限公司 Classification processing method, related device and medium
CN116738345B (en) * 2023-08-15 2024-03-01 腾讯科技(深圳)有限公司 Classification processing method, related device and medium

Similar Documents

Publication Publication Date Title
US11501182B2 (en) Method and apparatus for generating model
CN108363790B (en) Method, device, equipment and storage medium for evaluating comments
CN112883193A (en) Training method, device and equipment of text classification model and readable medium
CN109344236B (en) Problem similarity calculation method based on multiple characteristics
CN110895559B (en) Model training method, text processing method, device and equipment
CN109857846B (en) Method and device for matching user question and knowledge point
CN110598203A (en) Military imagination document entity information extraction method and device combined with dictionary
JPH07295989A (en) Device that forms interpreter to analyze data
CN109086265B (en) Semantic training method and multi-semantic word disambiguation method in short text
US20220358292A1 (en) Method and apparatus for recognizing entity, electronic device and storage medium
CN110442880B (en) Translation method, device and storage medium for machine translation
CN112069312B (en) Text classification method based on entity recognition and electronic device
CN111274822A (en) Semantic matching method, device, equipment and storage medium
CN111274829A (en) Sequence labeling method using cross-language information
CN115292457A (en) Knowledge question answering method and device, computer readable medium and electronic equipment
CN114676255A (en) Text processing method, device, equipment, storage medium and computer program product
CN113705196A (en) Chinese open information extraction method and device based on graph neural network
CN116992007B (en) Limiting question-answering system based on question intention understanding
CN111597807B (en) Word segmentation data set generation method, device, equipment and storage medium thereof
CN112528654A (en) Natural language processing method and device and electronic equipment
CN113326374B (en) Short text emotion classification method and system based on feature enhancement
CN113486174B (en) Model training, reading understanding method and device, electronic equipment and storage medium
CN112818091A (en) Object query method, device, medium and equipment based on keyword extraction
CN114330483A (en) Data processing method, model training method, device, equipment and storage medium
CN113705207A (en) Grammar error recognition method and device

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination