CN113011531B - Classification model training method, apparatus, terminal device and storage medium
- Publication number: CN113011531B (application CN202110476068A)
- Authority: CN (China)
- Prior art keywords: sample, training, target, translated, training data
- Legal status: Active (assumed; not a legal conclusion)
Classifications
- G06F18/2415: Classification techniques relating to the classification model based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
- G06F18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
- G06F40/58: Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation
Abstract
The application relates to the technical field of artificial intelligence and provides a classification model training method, apparatus, terminal device and storage medium. The method includes: performing model training based on first training data to obtain an initial classification model; obtaining a second training sample with an unlabeled sample class, and performing back-translation processing on the second training sample to obtain a back-translated second training sample; inputting the back-translated second training sample into the initial classification model to obtain a target second sample class; and performing comprehensive training with the first training data as first target training data, the second training sample and the target second sample class as second target training data, and the back-translated second training sample and the target second sample class as third target training data, to obtain a target classification model. With this method, a terminal device can obtain a classification model with high classification accuracy even when model training is performed on only a small amount of labeled training data.
Description
Technical Field
The application belongs to the technical field of artificial intelligence, and in particular relates to a classification model training method, apparatus, terminal device and storage medium.
Background
With the development of deep learning technology, neural network models for classification can be trained using deep learning methods to solve machine classification problems such as text classification and image classification. Generally, for a trained neural network model to achieve high accuracy in classification, a large amount of labeled sample data is required for model training. In this process, not only must a large amount of sample data be collected, but the collected sample data must also be labeled manually to obtain the labeled sample data used to train the neural network model.
However, labeling a large amount of sample data takes a significant amount of time and effort, while performing model training directly on only a small amount of labeled sample data yields a neural network model with low classification accuracy.
Disclosure of Invention
The embodiments of the application provide a classification model training method, apparatus, terminal device and storage medium, which can solve the problem that a neural network model generated by performing model training directly on a small amount of labeled sample data has low classification accuracy.
In a first aspect, an embodiment of the present application provides a classification model training method, including:
performing model training based on first training data to obtain an initial classification model, wherein the first training data comprises a first training sample and a first sample class corresponding to the first training sample;
obtaining a second training sample with an unlabeled sample class, and performing back-translation processing on the second training sample to obtain a back-translated second training sample;
inputting the back-translated second training sample into the initial classification model to obtain a target second sample class corresponding to the back-translated second training sample;
and performing comprehensive training with the first training data as first target training data, the second training sample and the target second sample class as second target training data, and the back-translated second training sample and the target second sample class as third target training data, to obtain a target classification model.
In an embodiment, performing back-translation processing on the second training sample to obtain a back-translated second training sample includes:
determining a sample language of the second training sample;
translating the second training sample into a second training sample in a preset language using a first translation interface;
and translating the second training sample in the preset language back into a back-translated second training sample in the original sample language using a second translation interface.
In an embodiment, inputting the back-translated second training sample into the initial classification model to obtain the target second sample class corresponding to the back-translated second training sample includes:
inputting the back-translated second training sample into the initial classification model to obtain a back-translated second sample class corresponding to the back-translated second training sample;
determining a back-translation class weight value corresponding to the back-translated second sample class;
and weighting the back-translated second sample class according to a preset weighting formula and the back-translation class weight value to obtain the target second sample class.
In an embodiment, weighting the back-translated second sample class according to a preset weighting formula and the back-translation class weight value to obtain the target second sample class includes:
inputting the second training sample with the unlabeled sample class into the initial classification model to obtain an initial second sample class corresponding to the second training sample;
determining a preset weight value corresponding to the initial second sample class;
and weighting the back-translated second sample class and the initial second sample class according to the preset weighting formula, the back-translation class weight value and the preset weight value to obtain the target second sample class.
In an embodiment, after weighting the back-translated second sample class according to the preset weighting formula and the back-translation class weight value to obtain the target second sample class, the method further includes:
regularizing the target second sample class using a preset regularization formula to obtain the target second sample class finally used for the comprehensive training.
In an embodiment, performing comprehensive training with the first training data as first target training data, the second training sample and the target second sample class as second target training data, and the back-translated second training sample and the target second sample class as third target training data, to obtain a target classification model, includes:
randomly extracting, multiple times, at least two target training data items from the first target training data, the second target training data and the third target training data;
for the at least two target training data items randomly extracted at any one time, performing random weighting processing on them to obtain new target training data;
and performing training based on the new target training data obtained over the multiple extractions to generate the target classification model.
In an embodiment, each target training data item includes a target training sample and a target sample class corresponding to the target training sample;
performing random weighting processing on the at least two target training data items randomly extracted at any one time to obtain new target training data includes:
performing weighted summation on the target training samples in the at least two target training data items according to a Beta distribution formula to obtain a new target training sample; and
performing weighted summation on the target sample classes in the at least two target training data items according to the Beta distribution formula to obtain a new target sample class;
and using the new target training sample and the new target sample class as the new target training data.
In a second aspect, an embodiment of the present application provides a classification model training apparatus, including:
a first training module, configured to perform model training based on first training data to obtain an initial classification model, wherein the first training data comprises a first training sample and a first sample class corresponding to the first training sample;
a back-translation module, configured to obtain a second training sample with an unlabeled sample class, and perform back-translation processing on the second training sample to obtain a back-translated second training sample;
an input module, configured to input the back-translated second training sample into the initial classification model to obtain a target second sample class corresponding to the back-translated second training sample;
and a second training module, configured to perform comprehensive training with the first training data as first target training data, the second training sample and the target second sample class as second target training data, and the back-translated second training sample and the target second sample class as third target training data, to obtain a target classification model.
In a third aspect, an embodiment of the present application provides a terminal device including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the method of any one of the implementations of the first aspect when executing the computer program.
In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the method of any one of the implementations of the first aspect.
In a fifth aspect, an embodiment of the present application provides a computer program product which, when run on a terminal device, causes the terminal device to perform the method of any one of the implementations of the first aspect.
Compared with the prior art, the embodiments of the application have the following beneficial effects: the terminal device can train an initial classification model from first training data with labeled sample classes, and then perform back-translation processing on second training samples with unlabeled classes to obtain a large number of back-translated second training samples, thereby achieving data augmentation. The back-translated second training samples are then recognized by the initial classification model to obtain the target second sample classes corresponding to the second training samples. Finally, a large amount of labeled training data is generated for comprehensive training based on the first training data, the second target training data formed by the second training samples and the target second sample classes, and the third target training data formed by the back-translated second training samples and the target second sample classes, and a target classification model is obtained. In this way, the terminal device reduces the time and labor cost of labeling a large amount of data, and can obtain a target classification model with high classification accuracy by training on only a small amount of labeled training data.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the embodiments or the description of the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings can be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flowchart of an implementation of a classification model training method according to an embodiment of the application;
FIG. 2 is a schematic diagram of an implementation of S102 of a classification model training method according to an embodiment of the application;
FIG. 3 is a schematic diagram of an implementation of S103 of a classification model training method according to an embodiment of the application;
FIG. 4 is a schematic diagram of an implementation of S1033 of a classification model training method according to an embodiment of the application;
FIG. 5 is a schematic diagram of an implementation of S104 of a classification model training method according to an embodiment of the application;
FIG. 6 is a schematic diagram of an implementation of S1042 of a classification model training method according to an embodiment of the application;
FIG. 7 is a block diagram of a classification model training apparatus according to an embodiment of the application;
FIG. 8 is a block diagram of a terminal device according to an embodiment of the application.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth such as the particular system architecture, techniques, etc., in order to provide a thorough understanding of the embodiments of the present application. It will be apparent, however, to one skilled in the art that the present application may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present application with unnecessary detail.
It should be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It should also be understood that the term "and/or" as used in the present specification and the appended claims refers to any and all possible combinations of one or more of the associated listed items, and includes such combinations.
Furthermore, the terms "first," "second," "third," and the like in the description of the present specification and in the appended claims, are used for distinguishing between descriptions and not necessarily for indicating or implying a relative importance.
The classification model training method provided by the embodiments of the application can be applied to terminal devices such as tablet computers, notebook computers, ultra-mobile personal computers (UMPCs) and netbooks; the specific type of terminal device is not limited.
Referring to fig. 1, fig. 1 shows a flowchart of an implementation of a classification model training method according to an embodiment of the present application, where the method includes the following steps:
S101, performing model training based on first training data to obtain an initial classification model, wherein the first training data comprises a first training sample and a first sample class corresponding to the first training sample.
In an embodiment, the first training data is training data with known labeled classes, and its amount may be less than the amount of training data normally required to train a classification model. The initial classification model may be an expression classification model or a text classification model; this embodiment does not limit the specific application of the initial classification model. For ease of explanation, this embodiment is described in terms of a text classification model.
By way of example, the first training sample may be a first training text, including but not limited to text in the form of news, papers, and the like. The first sample class is the class of the corresponding first training sample; for example, the first sample class includes, but is not limited to, the first training sample belonging to a class such as the physics discipline or the linguistics discipline. It should be noted that the first training data generally includes a plurality of first training samples, each corresponding to its own first sample class. Specifically, the first training data may be represented as L = (X_m, Y_m), 1 ≤ m ≤ n, where n is the number of first training samples, m indexes the m-th first training sample, X_m is the sample feature of the m-th first training sample, and Y_m is the first sample class of the m-th first training sample.
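Purely as an illustration of S101, the following Python sketch trains an initial text classification model on a small labeled set L = (X_m, Y_m); the scikit-learn pipeline, the example texts and the class names are assumptions for illustration, not part of the claimed method.

```python
# Minimal sketch of S101: fit an initial classification model on the
# small amount of labeled first training data.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# First training data L = (X_m, Y_m): labeled first training texts.
first_training_samples = ["quantum entanglement in photon pairs",
                          "syntax trees in generative grammar"]
first_sample_classes = ["physics", "linguistics"]

initial_model = make_pipeline(TfidfVectorizer(), LogisticRegression())
initial_model.fit(first_training_samples, first_sample_classes)
```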
S102, obtaining a second training sample with an unlabeled sample class, and performing back-translation processing on the second training sample to obtain a back-translated second training sample.
In an embodiment, the second training sample may be similar to the first training sample, including but not limited to text in the form of news, papers, and the like. It should be noted that the second training sample is not labeled with a corresponding sample class in advance. It will be appreciated that one of the objects of this embodiment is to use the second training samples with unlabeled classes in training the classification model, thereby improving the utilization of unlabeled samples, so that model training based on a small amount of labeled sample data can still yield a classification model with high classification accuracy. The second training samples may be represented as U = {X_U}.
It should be added that the first training samples and the second training samples may be texts stored in advance in the terminal device under a specified storage path. The second training samples may also be training samples crawled by the terminal device from the network; this is not limited.
In an embodiment, the back-translation processing translates the second training sample into text in another language and then translates that text back into the original language. For example, if the second training sample is text in Chinese, it may be translated into an English second training sample using Chinese-to-English translation, and the English second training sample may then be translated back using English-to-Chinese translation to obtain a back-translated second training sample.
It can be understood that there are likewise a plurality of second training samples, and performing back-translation processing on the second training samples means performing back-translation processing on each second training sample. It should be specifically noted that, in this process, one second training sample may be translated into training samples in a plurality of different languages, and when each of these training samples in a different language is translated back, a plurality of back-translated second training samples is obtained. That is, after back-translation processing, each second training sample yields at least one back-translated second training sample, so that the second training samples are data-augmented.
For example, the i-th second training sample X_{U,i} is back-translated to obtain a plurality of back-translated second training samples X_{U,i,j}, j = 1, 2, ..., K, where K means that the second training sample is translated into text in K languages and the text in each of the K languages is then translated back, yielding K back-translated second training samples; X_{U,i,j} is the back-translated second training sample in the j-th language obtained when the i-th second training sample is back-translated.
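As a concrete illustration of S102, the following sketch back-translates one sample through several pivot languages; the translate() helper is a hypothetical stand-in for the first and second translation interfaces described later (e.g. wrappers around an external translation API), not a real library call.

```python
# Sketch of S102: back-translate each unlabeled second training sample
# through K pivot languages to obtain K augmented copies X_{U,i,j}.
PIVOT_LANGUAGES = ["en", "fr", "de"]  # the K preset languages (assumed)

def translate(text: str, src: str, tgt: str) -> str:
    """Hypothetical wrapper around a translation interface (e.g. a Baidu
    or Google translation API call); placeholder only."""
    raise NotImplementedError("call a real translation service here")

def back_translate(sample: str, sample_lang: str = "zh") -> list[str]:
    augmented = []
    for lang in PIVOT_LANGUAGES:
        pivot_text = translate(sample, src=sample_lang, tgt=lang)      # first translation interface
        round_trip = translate(pivot_text, src=lang, tgt=sample_lang)  # second translation interface
        augmented.append(round_trip)
    return augmented  # the K back-translated second training samples
```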
S103, inputting the back-translated second training sample into the initial classification model to obtain a target second sample class corresponding to the back-translated second training sample.
In an embodiment, the initial classification model is the classification model obtained by training on the small amount of first training data, and it can classify the back-translated second training samples to obtain the back-translated second sample class corresponding to each back-translated second training sample. Since the initial classification model is trained on first training data with labeled classes, it already has a certain classification accuracy. On this basis, the terminal device may determine the back-translated second sample class as the target second sample class corresponding to the back-translated second training sample.
It should be noted that when the plurality of back-translated second training samples obtained by back-translating the same second training sample are recognized by the initial classification model, the back-translated second sample classes corresponding to them should be the same. That is, after model recognition is performed on the plurality of back-translated second training samples X_{U,i,j}, j = 1, 2, ..., K, corresponding to one second training sample, K back-translated second sample classes are obtained, and these K classes are substantially the same. A back-translated second sample class may be represented by p(X_{U,i,j}), i.e., the second sample class of the j-th of the K back-translated second training samples.
In practice, however, one second training sample may yield a plurality of back-translated second training samples whose back-translated second sample classes differ. Therefore, to unify the back-translated second sample classes corresponding to the back-translated second training samples of one second training sample, the terminal device may process the back-translated second sample classes to obtain a single target second sample class. That is, for any second training sample, although a plurality of back-translated second training samples can be obtained, each of them finally corresponds to only one sample class (the target second sample class).
For example, for the plurality of back-translated second sample classes obtained by back-translating any second training sample, the terminal device may determine the back-translation class weight value corresponding to each back-translated second sample class, and then weight the back-translated second sample classes according to a preset weighting formula and the back-translation class weight values to obtain the target second sample class.
S104, performing comprehensive training with the first training data as first target training data, the second training sample and the target second sample class as second target training data, and the back-translated second training sample and the target second sample class as third target training data, to obtain a target classification model.
In an embodiment, the comprehensive training may consist of randomly combining the first target training data, the second target training data and the third target training data to obtain new training data, and performing model training on the new training data to obtain the target classification model. The model training process itself is prior art and will not be described in detail.
In an embodiment, the first training data L is training data with known sample classes. In the second target training data, after the second training samples have been processed through S102-S103, each back-translated second training sample corresponds to the same target second sample class. The terminal device may therefore use the target second sample class as the sample class of the second training sample, which may be represented as U1 = {X_U, Y'_U}, where Y'_U is the target second sample class corresponding to the second training sample. As described in S103, the target second sample class is also determined as the class of each back-translated second training sample corresponding to that second training sample. Thus, the terminal device may use the back-translated second training samples and the target second sample class as the third target training data, represented as U2 = {X_{U,K}, Y'_U}, where X_{U,K} is the set of back-translated second training samples obtained by back-translating the second training sample X_U.
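To make the three sets of target training data concrete, a minimal sketch of their assembly follows; the function and variable names are assumptions for illustration only, continuing the earlier sketches.

```python
def assemble_target_training_data(L_pairs, X_U, Y_prime_U, X_UK):
    """L_pairs: labeled (sample, class) pairs, i.e. the first training data L;
    X_U: unlabeled second training samples; Y_prime_U[i]: the target second
    sample class of X_U[i] from S103; X_UK[i]: the list of back-translated
    copies of X_U[i] from S102."""
    U1 = list(zip(X_U, Y_prime_U))            # second target training data
    U2 = [(bt, y)                             # third target training data:
          for bts, y in zip(X_UK, Y_prime_U)  # every back-translated copy
          for bt in bts]                      # shares the target class
    return list(L_pairs) + U1 + U2            # pool used for comprehensive training
```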
In this embodiment, the terminal device can train an initial classification model from the first training data with labeled sample classes, and then perform back-translation processing on the second training samples with unlabeled classes to obtain a large number of back-translated second training samples, thereby achieving data augmentation. The back-translated second training samples are then recognized by the initial classification model to obtain the target second sample classes corresponding to the second training samples. Finally, a large amount of labeled training data is generated for comprehensive training based on the first training data, the second target training data formed by the second training samples and the target second sample classes, and the third target training data formed by the back-translated second training samples and the target second sample classes, and a target classification model is obtained. In this way, the terminal device reduces the time and labor cost of labeling a large amount of data and can train a target classification model with high classification accuracy.
Referring to fig. 2, in an embodiment, performing back-translation processing on the second training sample in S102 to obtain a back-translated second training sample specifically includes the following substeps S1021-S1023, described in detail as follows:
S1021, determining the sample language of the second training sample.
S1022, translating the second training sample into a second training sample in a preset language using a first translation interface.
S1023, translating the second training sample in the preset language back into a back-translated second training sample in the original sample language using a second translation interface.
In an embodiment, the first translation interface and the second translation interface are existing open language translation interfaces, through which the terminal device can accurately translate the second training sample. For example, the translation interface may be a Baidu translation interface or a Google translation interface, which may be used to translate a second training sample in Chinese into a second training sample in English and then translate the English second training sample back into a back-translated second training sample in Chinese. In this case, the interface that translates the Chinese second training sample into English is the first translation interface, and the interface that translates the English second training sample back into Chinese is the second translation interface.
In an embodiment, the preset language may be any of a plurality of languages preset in the terminal device; for each language, a corresponding first translation interface and second translation interface are preconfigured for calling.
Referring to fig. 3, in an embodiment, inputting the back-translated second training sample to the initial classification model in S103 to obtain the target second sample class corresponding to the back-translated second training sample includes the following substeps S1031-S1033, described in detail below:
S1031, inputting the back-translated second training sample into the initial classification model to obtain a back-translated second sample class corresponding to the back-translated second training sample.
In an embodiment, the back-translated second sample class was specifically explained in S103; refer to the explanation of p(X_{U,i,j}), which is not repeated here.
S1032, determining a back-translation class weight value corresponding to the back-translated second sample class.
In an embodiment, as described for S1023, the terminal device stores a plurality of preset languages in advance, together with the translation interface corresponding to each language. On this basis, for the back-translated second sample class of each preset language, the terminal device may also set in advance a corresponding back-translation class weight value for each translation interface (i.e., for each preset language). Thus, the terminal device may first determine the back-translated second training sample underlying a back-translated second sample class and the translation interface called when that sample was produced, and thereby determine the back-translation class weight value corresponding to that back-translated second sample class.
S1033, weighting the back-translated second sample classes according to a preset weighting formula and the back-translation class weight values to obtain the target second sample class.
In an embodiment, the above preset weighting formula may be specifically expressed as:

y_{U,i} = Σ_{j=1}^{K} w_j · p(X_{U,i,j})

where y_{U,i} is the target second sample class of the i-th second training sample after the processing of S102-S103; X_{U,i} denotes the sample feature of the i-th second training sample; X_{U,i,j} is the sample feature of the j-th back-translated second training sample obtained by back-translating the i-th second training sample; p(X_{U,i,j}) is the back-translated second sample class of the j-th back-translated second training sample; w_j is the weight of the j-th back-translated second training sample; and K is the number of back-translated second training samples (i.e., the number of preset languages).
It should be noted that, since each back-translated second training sample is recognized by the initial classification model, the accuracy of the predicted class may not be consistent across languages. For this reason, the user may set in advance, in the terminal device and according to the actual situation, the weight values w_j of the back-translated second training samples in the different languages, so that the target second sample class is closer to the actual class of the second training sample. That is, the back-translation class weight values corresponding to different translation interfaces may be the same or different; this is not limited.
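As a small illustration of this weighted aggregation, the following sketch (assuming numpy) combines the K probability vectors predicted for the back-translated copies of one sample; the example weights and probabilities are made up.

```python
import numpy as np

def aggregate_back_translated_classes(probs, weights):
    """probs: the K probability vectors p(X_{U,i,j}) predicted by the
    initial model, one per back-translated copy; weights: the K
    back-translation class weight values w_j.
    Returns y_{U,i} = sum_j w_j * p(X_{U,i,j})."""
    probs = np.asarray(probs)      # shape (K, num_classes)
    weights = np.asarray(weights)  # shape (K,)
    return weights @ probs         # weighted sum over the K copies

# Example: three pivot languages, two classes.
y_ui = aggregate_back_translated_classes(
    probs=[[0.8, 0.2], [0.7, 0.3], [0.9, 0.1]],
    weights=[0.3, 0.3, 0.4])       # -> array([0.81, 0.19])
```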
Referring to fig. 4, in an embodiment, weighting the back-translated second sample classes in S1033 according to a preset weighting formula and the back-translation class weight values to obtain the target second sample class specifically includes the following substeps S10331-S10333, described in detail below:
S10331, inputting the second training sample with the unlabeled sample class into the initial classification model to obtain an initial second sample class corresponding to the second training sample.
In an embodiment, since the back-translated second training samples are obtained by back-translating the second training sample, the sample class of the original second training sample also needs to be considered when determining the target second sample class corresponding to the back-translated second training samples. Therefore, the terminal device may input the second training sample into the initial classification model to obtain the initial second sample class corresponding to the second training sample.
S10332, determining a preset weight value corresponding to the initial second sample class.
In an embodiment, the preset weight value corresponding to the initial second sample class may likewise be a value preset inside the terminal device. It should be noted that for any second training sample input to the initial classification model to obtain its initial second sample class, the preset weight value is consistent; i.e., the same preset weight value is used for every second training sample.
S10333, weighting the back-translated second sample classes and the initial second sample class according to the preset weighting formula, the back-translation class weight values and the preset weight value to obtain the target second sample class.
In an embodiment, the preset weighting formula may be:

y_{U,i} = w_0 · p(X_{U,i}) + Σ_{j=1}^{K} w_j · p(X_{U,i,j})

where w_0 is the weight value corresponding to the second training sample (i.e., the preset weight value corresponding to the initial second sample class), and p(X_{U,i}) is the initial second sample class of the i-th second training sample predicted by the initial classification model; y_{U,i}, X_{U,i,j}, p(X_{U,i,j}), w_j and K are as explained for S1033 above and are not explained again.
For example, when the initial classification model performs sample class prediction, the prediction result is usually a vector expressing the probability that the sample belongs to each class. Specifically, if the initial classification model is a two-class prediction model, the prediction result is [y1, y2], where y1 represents the predicted probability that the sample belongs to the first class, y2 represents the predicted probability that the sample belongs to the second class, and y1 + y2 = 1. On this basis, the calculation formula for the above weighting process can be written per class as:

ys = w_0 · y1(X_{U,i}) + Σ_{j=1}^{K} w_j · y1(X_{U,i,j})
yz = w_0 · y2(X_{U,i}) + Σ_{j=1}^{K} w_j · y2(X_{U,i,j})

where ys is the probability, finally determined by the terminal device, that the i-th second training sample belongs to the first class, and yz is the probability, finally determined by the terminal device, that it belongs to the second class. It should be added that, among the class probabilities obtained in this way, the terminal device generally takes only the class with the highest probability as the classification class of the training sample (i.e., the target second sample class). That is, between ys and yz, the class corresponding to the maximum value is determined as the target second sample class.
It should further be added that the initial second sample class predicted by the initial classification model for the second training sample is typically closer to the actual sample class of the second training sample. For this reason, during the weighted summation, the terminal device may set the preset weight value w_0 of the initial second sample class to be greater than the weight values w_j corresponding to the back-translated second sample classes. In this way, the target second sample class obtained by the terminal device more closely approximates the actual sample class.
In an embodiment, after S1033 weights the back-translated second sample classes according to the preset weighting formula and the back-translation class weight values to obtain the target second sample class, the method further includes the following step, described in detail below:
regularizing the target second sample class using a preset regularization formula to obtain the target second sample class finally used for the comprehensive training.
In an embodiment, there are a plurality of second training samples U = {X_U}, and accordingly there are also a plurality of target second sample classes. If the distribution of these target second sample classes is too uniform, the final target classification model may suffer from over-fitting or under-fitting when model training is performed based on the second training samples and the target second sample classes.
For this reason, the terminal device needs to regularize the target second sample classes to control their distribution and avoid excessively large values. In other words, after the weighting in S10333, the sum of the values ys and yz in a target second sample class y_{U,i} may be greater than 1; the regularization can therefore also be regarded as a normalization of y_{U,i} so that ys and yz sum to 1.
Specifically, the regularization formula may be:

y'_{U,i} = y_{U,i}^{1/T} / || y_{U,i}^{1/T} ||_1

where T is a preset hyperparameter, y_{U,i} is as explained above, and the exponentiation and L1 normalization are applied element-wise to the class probabilities. On this basis, the terminal device obtains the regularized final target second sample class y'_{U,i} of the i-th second training sample (i.e., the probabilities of belonging to the first class and to the second class, respectively). The terminal device may then determine the maximum of the probabilities corresponding to the two classes and determine the class corresponding to that maximum as the target second sample class.
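A minimal sketch of this regularization, read as temperature sharpening followed by normalization, is given below; numpy and the example value of T are assumptions.

```python
import numpy as np

def regularize_target_class(y_ui, T=0.5):
    """Sharpen the weighted class vector y_{U,i} with hyperparameter T and
    renormalize so its components (e.g. ys and yz) sum to 1."""
    sharpened = np.power(np.asarray(y_ui, dtype=float), 1.0 / T)
    return sharpened / sharpened.sum()

y_prime = regularize_target_class([0.81, 0.19])  # -> approx. [0.948, 0.052]
target_class_index = int(np.argmax(y_prime))     # class with the maximum probability
```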
Referring to fig. 5, in an embodiment, performing comprehensive training in S104 with the first training data as first target training data, the second training sample and the target second sample class as second target training data, and the back-translated second training sample and the target second sample class as third target training data, to obtain a target classification model, specifically includes the following substeps S1041-S1043, described in detail below:
S1041, randomly extracting, multiple times, at least two target training data items from the first target training data, the second target training data and the third target training data.
S1042, for the at least two target training data items randomly extracted at any one time, performing random weighting processing on them to obtain new target training data.
In an embodiment, the first target training data, the second target training data and the third target training data each include corresponding training samples and the sample classes corresponding to those training samples, so each item of target training data can participate in model training. However, the target second sample class is predicted by the initial classification model and determined after the above processing, and may be regarded as less accurate. Therefore, to improve how well the target training samples match the target sample classes in the target training data, random weighting processing may be applied to the target training data to obtain new target training data.
Since the first target training data, the second target training data and the third target training data each include a plurality of training samples and corresponding sample classes, the terminal device may randomly determine at least two training samples, with their corresponding sample classes, and apply random weighting to the training samples to obtain a new training sample; after determining the at least two training samples, the terminal device also applies random weighting to their corresponding sample classes to obtain a new sample class. The new training sample and the new sample class together constitute the new target training data.
S1043, performing training based on the new target training data obtained over the multiple extractions to generate the target classification model.
In an embodiment, the process of model training using the target training data is prior art, as noted in S104, and will not be described.
Referring to fig. 6, in an embodiment, each of the at least two target training data items includes a target training sample and a target sample class corresponding to the target training sample; in S1042, performing random weighting processing on the at least two target training data items randomly extracted at any one time to obtain new target training data specifically includes the following substeps S10421-S10423, described in detail below:
S10421, performing weighted summation on the target training samples in the at least two target training data items according to a Beta distribution formula to obtain a new target training sample; and
S10422, performing weighted summation on the target sample classes in the at least two target training data items according to the Beta distribution formula to obtain a new target sample class.
S10423, using the new target training sample and the new target sample class as the new target training data.
In an embodiment, when the weighted summation is performed on the at least two target training data items based on the above Beta distribution formula, the weight assigned to each target training data item participating in the calculation is drawn from a Beta distribution. Specifically, when the exact weight of each target training data item is not known, each item is randomly assigned a weight whose occurrence probability follows the Beta distribution. In this way, the terminal device makes the generated new target training data objective, so that the target classification model generated from the new target training data achieves high classification accuracy.
Specifically, for the above three sets of target training data, at least two target training data items may be randomly determined from any set. For example, two target training data items are randomly extracted each time; these may be any two items from the same set or one item from each of two sets, without limitation. After two target training data items are determined, for example target training data A = (x1, y1) and target training data B = (x2, y2), the calculation can be performed with the following Beta distribution formulas:

x' = a · x1 + (1 - a) · x2;   (1)
y' = a · y1 + (1 - a) · y2;   (2)
a' ~ Beta(λ, λ);   (3)
a = max(a', 1 - a');   (4)

where a' is a weight drawn at random from the Beta distribution with parameter λ; x1 is the sample feature obtained when the initial classification model processes target training sample A, and x2 is the sample feature obtained when it processes target training sample B; y1 is the target sample class of target training sample A, and y2 is the target sample class of target training sample B. The terminal device thus obtains the new target training data (x', y') according to formulas (1) and (2). The significance of formulas (3) and (4) is that once the value of λ is determined, a' is drawn from Beta(λ, λ), and the terminal device then takes the maximum of a' and 1 - a' as the value of a that participates in the calculation of formulas (1) and (2), obtaining the new target training data.
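Formulas (1)-(4) can be sketched as the following mixup-style interpolation; numpy and the example value of λ are assumptions, and the class vectors are expected in probability form.

```python
import numpy as np

def mix_target_training_data(x1, y1, x2, y2, lam=0.75, rng=None):
    """Randomly weight two target training data items A = (x1, y1) and
    B = (x2, y2): draw a' ~ Beta(lam, lam) per formula (3), take
    a = max(a', 1 - a') per formula (4), and interpolate both the sample
    features and the class vectors per formulas (1) and (2)."""
    rng = rng or np.random.default_rng()
    a_prime = rng.beta(lam, lam)                           # formula (3)
    a = max(a_prime, 1.0 - a_prime)                        # formula (4)
    x_new = a * np.asarray(x1) + (1 - a) * np.asarray(x2)  # formula (1)
    y_new = a * np.asarray(y1) + (1 - a) * np.asarray(y2)  # formula (2)
    return x_new, y_new
```

Taking a = max(a', 1 - a') guarantees a ≥ 0.5, so each mixed item stays dominated by the first of the two extracted items.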
Referring to fig. 7, fig. 7 is a block diagram of a classification model training apparatus according to an embodiment of the application. The classification model training apparatus in this embodiment includes modules for executing the steps in the embodiments corresponding to fig. 1 to 6; refer to fig. 1 to 6 and the related descriptions in the embodiments corresponding to fig. 1 to 6. For convenience of explanation, only the portions related to this embodiment are shown. Referring to fig. 7, the classification model training apparatus 700 includes: a first training module 710, a back-translation module 720, an input module 730 and a second training module 740, wherein:
the first training module 710 is configured to perform model training based on first training data to obtain an initial classification model, wherein the first training data includes a first training sample and a first sample class corresponding to the first training sample;
the back-translation module 720 is configured to obtain a second training sample with an unlabeled sample class, and perform back-translation processing on the second training sample to obtain a back-translated second training sample;
the input module 730 is configured to input the back-translated second training sample into the initial classification model to obtain a target second sample class corresponding to the back-translated second training sample;
and the second training module 740 is configured to perform comprehensive training with the first training data as first target training data, the second training sample and the target second sample class as second target training data, and the back-translated second training sample and the target second sample class as third target training data, to obtain a target classification model.
In an embodiment, the back-translation module 720 is further configured to:
determine a sample language of the second training sample; translate the second training sample into a second training sample in a preset language using a first translation interface; and translate the second training sample in the preset language back into a back-translated second training sample in the original sample language using a second translation interface.
In an embodiment, the input module 730 is further configured to:
input the back-translated second training sample into the initial classification model to obtain a back-translated second sample class corresponding to the back-translated second training sample; determine a back-translation class weight value corresponding to the back-translated second sample class; and weight the back-translated second sample class according to a preset weighting formula and the back-translation class weight value to obtain the target second sample class.
In an embodiment, the input module 730 is further configured to:
input the second training sample with the unlabeled sample class into the initial classification model to obtain an initial second sample class corresponding to the second training sample; determine a preset weight value corresponding to the initial second sample class; and weight the back-translated second sample class and the initial second sample class according to the preset weighting formula, the back-translation class weight value and the preset weight value to obtain the target second sample class.
In an embodiment, the classification model training apparatus 700 further includes the following module:
a regularization processing module, configured to regularize the target second sample class using a preset regularization formula to obtain the target second sample class finally used for the comprehensive training.
In an embodiment, the second training module 740 is further configured to:
randomly extract, multiple times, at least two target training data items from the first target training data, the second target training data and the third target training data; for the at least two target training data items randomly extracted at any one time, perform random weighting processing on them to obtain new target training data; and perform training based on the new target training data obtained over the multiple extractions to generate the target classification model.
In an embodiment, each target training data item includes a target training sample and a target sample class corresponding to the target training sample; the second training module 740 is further configured to:
perform weighted summation on the target training samples in the at least two target training data items according to a Beta distribution formula to obtain a new target training sample; perform weighted summation on the target sample classes in the at least two target training data items according to the Beta distribution formula to obtain a new target sample class; and use the new target training sample and the new target sample class as the new target training data.
It is to be understood that, in the block diagram of the classification model training apparatus shown in fig. 7, each unit/module is configured to perform each step in the embodiments corresponding to fig. 1 to 6, and each step in the embodiments corresponding to fig. 1 to 6 has been explained in detail in the above embodiments, and specific reference is made to fig. 1 to 6 and related descriptions in the embodiments corresponding to fig. 1 to 6, which are not repeated herein.
Fig. 8 is a block diagram of a terminal device according to another embodiment of the present application. As shown in fig. 8, the terminal device 800 of this embodiment includes: a processor 810, a memory 820, and a computer program 830 stored in the memory 820 and executable on the processor 810, such as a program implementing the classification model training method. The processor 810, when executing the computer program 830, implements the steps of the classification model training method embodiments described above, such as S101 to S104 shown in fig. 1. Alternatively, the processor 810, when executing the computer program 830, performs the functions of the modules in the embodiment corresponding to fig. 7, for example the functions of modules 710 to 740 shown in fig. 7; refer to the related descriptions in the embodiment corresponding to fig. 7.
By way of example, the computer program 830 may be partitioned into one or more units stored in the memory 820 and executed by the processor 810 to implement the application. The one or more units may be a series of computer program instruction segments capable of performing specified functions, the instruction segments describing the execution of the computer program 830 in the terminal device 800.
Terminal device 800 can include, but is not limited to, a processor 810, a memory 820. It will be appreciated by those skilled in the art that fig. 8 is merely an example of a terminal device 800 and is not intended to limit the terminal device 800, and may include more or fewer components than shown, or may combine certain components, or different components, e.g., the terminal device may further include an input-output device, a network access device, a bus, etc.
The processor 810 may be a central processing unit, but may also be other general purpose processors, digital signal processors, application specific integrated circuits, off-the-shelf programmable gate arrays or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, or the like. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
The memory 820 may be an internal storage unit of the terminal device 800, such as a hard disk or a memory of the terminal device 800. The memory 820 may also be an external storage device of the terminal device 800, such as a plug-in hard disk, a smart memory card, a flash memory card, etc. provided on the terminal device 800. Further, the memory 820 may also include both internal storage units and external storage devices of the terminal device 800.
Embodiments of the present application also provide a computer readable storage medium storing a computer program which, when executed by a processor, implements steps for implementing the various method embodiments described above.
Embodiments of the present application provide a computer program product which, when run on a mobile terminal, causes the mobile terminal to perform steps that enable the implementation of the method embodiments described above.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer readable storage medium. Based on such understanding, the present application may implement all or part of the flow of the method of the above embodiments, and may be implemented by a computer program to instruct related hardware, where the computer program may be stored in a computer readable storage medium, and when the computer program is executed by a processor, the computer program may implement the steps of each of the method embodiments described above. Wherein the computer program comprises computer program code which may be in source code form, object code form, executable file or some intermediate form etc. The computer readable medium may include at least: any entity or device capable of carrying computer program code to a photographing device/terminal apparatus, recording medium, computer Memory, read-Only Memory (ROM), random access Memory (RAM, random Access Memory), electrical carrier signals, telecommunications signals, and software distribution media. Such as a U-disk, removable hard disk, magnetic or optical disk, etc. In some jurisdictions, computer readable media may not be electrical carrier signals and telecommunications signals in accordance with legislation and patent practice.
The above embodiments are only intended to illustrate the technical solution of the present application, not to limit it. Although the application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present application and are intended to be included within the scope of the present application.
Claims (8)
1. A classification model training method, comprising:
performing model training based on first training data to obtain an initial classification model, wherein the first training data comprise a first training sample and a first sample category corresponding to the first training sample;
obtaining a second training sample of an unlabeled sample category, and performing back-translation processing on the second training sample to obtain back-translated second training samples, wherein each second training sample corresponds to a plurality of the back-translated second training samples;
inputting the back-translated second training sample into the initial classification model to obtain a target second sample category corresponding to the back-translated second training sample; and
taking the first training data as first target training data, taking the second training sample and the target second sample category as second target training data, taking the back-translated second training sample and the target second sample category as third target training data, and performing comprehensive training to obtain a target classification model;
wherein inputting the second training sample into the initial classification model to obtain the target second sample category corresponding to the second training sample comprises:
inputting the back-translated second training sample into the initial classification model to obtain a back-translated second sample category corresponding to the back-translated second training sample;
determining a back-translation category weight value corresponding to the back-translated second sample category;
inputting the second training sample of the unlabeled sample category into the initial classification model to obtain an initial second sample category corresponding to the second training sample;
determining a preset weight value corresponding to the initial second sample category; and
performing weighting processing on the back-translated second sample category and the initial second sample category according to a preset weighting formula, the back-translation category weight value, and the preset weight value, to obtain the target second sample category.
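By way of illustration only, the following is a minimal sketch of the target-category weighting at the end of claim 1, assuming the unspecified "preset weighting formula" is a convex combination of the two predicted class distributions; `model.predict` is a hypothetical interface returning a probability vector, not anything named by the patent.

```python
import numpy as np

def target_second_sample_class(model, second_sample, back_translated_samples,
                               back_translation_weight=0.5, preset_weight=0.5):
    """Combine the initial and back-translated class predictions (hypothetical)."""
    # Initial class distribution predicted for the unlabeled sample itself.
    initial_class = np.asarray(model.predict(second_sample))

    # Average the model's predictions over all back-translated variants.
    back_translated_class = np.mean(
        [model.predict(s) for s in back_translated_samples], axis=0)

    # Assumed weighting formula: a convex combination of the two distributions.
    combined = (back_translation_weight * back_translated_class
                + preset_weight * initial_class)
    return combined / combined.sum()  # renormalize to a valid distribution
```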
2. The classification model training method according to claim 1, wherein performing back-translation processing on the second training sample to obtain the back-translated second training sample comprises:
determining a sample language of the second training sample;
translating the second training sample into a second training sample in a preset language using a first translation interface; and
translating the second training sample in the preset language back into a back-translated second training sample in the original sample language using a second translation interface.
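A minimal sketch of claim 2's back-translation flow follows; the language-detection and translation interfaces are hypothetical callables standing in for whatever first and second translation interfaces an implementation would use.

```python
from typing import Callable

def back_translate(sample: str,
                   detect_language: Callable[[str], str],
                   first_interface: Callable[[str, str, str], str],
                   second_interface: Callable[[str, str, str], str],
                   pivot_language: str = "en") -> str:
    """Translate a sample into a pivot language, then back into its own language."""
    sample_language = detect_language(sample)  # e.g. "zh"
    # First translation interface: sample language -> preset (pivot) language.
    pivot_text = first_interface(sample, sample_language, pivot_language)
    # Second translation interface: pivot language -> original sample language.
    return second_interface(pivot_text, pivot_language, sample_language)
```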
3. The classification model training method according to claim 1, further comprising, after performing weighting processing on the back-translated second sample category and the initial second sample category according to the preset weighting formula, the back-translation category weight value, and the preset weight value to obtain the target second sample category:
performing regularization processing on the target second sample category using a preset regularization formula, to obtain the target second sample category finally used for the comprehensive training.
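A minimal sketch of claim 3's regularization step, assuming the unspecified "preset regularization formula" is the temperature sharpening used in semi-supervised methods such as MixMatch; the patent does not fix the formula.

```python
import numpy as np

def sharpen(class_distribution: np.ndarray, temperature: float = 0.5) -> np.ndarray:
    """Sharpen a predicted class distribution and renormalize it."""
    powered = class_distribution ** (1.0 / temperature)  # lower T -> sharper
    return powered / powered.sum()
```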
4. The classification model training method according to claim 1, wherein taking the first training data as the first target training data, taking the second training sample and the target second sample category as the second target training data, taking the back-translated second training sample and the target second sample category as the third target training data, and performing comprehensive training to obtain the target classification model comprises:
randomly extracting, a plurality of times, at least two target training data from among the first target training data, the second target training data, and the third target training data;
for the at least two target training data randomly extracted at any one time, performing random weighting processing on the at least two target training data to obtain new target training data; and
training based on the new target training data obtained over the plurality of extractions, to generate the target classification model.
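A minimal sketch of claim 4's extraction loop, assuming the three kinds of target training data are pooled as (sample, category) pairs; `mix_pairs` is a placeholder for the random weighting step detailed in claim 5.

```python
import random

def comprehensive_training_data(first_data, second_data, third_data,
                                mix_pairs, num_draws=1000):
    """Yield new training pairs by repeatedly mixing randomly drawn pairs."""
    # Pool the three kinds of target training data as (sample, category) pairs.
    pool = list(first_data) + list(second_data) + list(third_data)
    for _ in range(num_draws):
        # "At least two" target training data; two is the simplest case.
        pair_a, pair_b = random.sample(pool, 2)
        yield mix_pairs(pair_a, pair_b)
```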
5. The classification model training method according to claim 4, wherein each of the at least two target training data respectively comprises a target training sample and a target sample category corresponding to the target training sample; and
wherein performing random weighting processing on the at least two target training data randomly extracted at any one time to obtain the new target training data comprises:
performing weighted summation on the target training samples in the at least two target training data according to a forward distribution formula, to obtain a new target training sample;
performing weighted summation on the target sample categories in the at least two target training data according to the forward distribution formula, to obtain a new target sample category; and
taking the new target training sample and the new target sample category as the new target training data.
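A minimal sketch of claim 5's random weighting, assuming the unspecified "forward distribution formula" yields a Beta-distributed mixing coefficient, as in mixup-style augmentation; samples are assumed to be feature vectors and categories probability (or one-hot) vectors so that weighted summation is well defined.

```python
import numpy as np

def mix_target_training_data(pair_a, pair_b, alpha=0.75):
    """Randomly weight two (sample, category) pairs into one new pair."""
    sample_a, class_a = pair_a
    sample_b, class_b = pair_b
    lam = np.random.beta(alpha, alpha)  # assumed random mixing weight in (0, 1)
    new_sample = lam * sample_a + (1.0 - lam) * sample_b
    new_class = lam * class_a + (1.0 - lam) * class_b
    return new_sample, new_class
```

This function could serve as the `mix_pairs` argument in the sketch after claim 4.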
6. A classification model training apparatus, comprising:
a first training module configured to perform model training based on first training data to obtain an initial classification model, wherein the first training data comprise a first training sample and a first sample category corresponding to the first training sample;
a back-translation module configured to obtain a second training sample of an unlabeled sample category and perform back-translation processing on the second training sample to obtain back-translated second training samples, wherein each second training sample corresponds to a plurality of the back-translated second training samples;
an input module configured to input the back-translated second training sample into the initial classification model to obtain a target second sample category corresponding to the back-translated second training sample; and
a second training module configured to take the first training data as first target training data, take the second training sample and the target second sample category as second target training data, take the back-translated second training sample and the target second sample category as third target training data, and perform comprehensive training to obtain a target classification model;
wherein the input module is further configured to:
input the back-translated second training sample into the initial classification model to obtain a back-translated second sample category corresponding to the back-translated second training sample; determine a back-translation category weight value corresponding to the back-translated second sample category; input the second training sample of the unlabeled sample category into the initial classification model to obtain an initial second sample category corresponding to the second training sample; determine a preset weight value corresponding to the initial second sample category; and perform weighting processing on the back-translated second sample category and the initial second sample category according to a preset weighting formula, the back-translation category weight value, and the preset weight value, to obtain the target second sample category.
7. A terminal device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the method according to any one of claims 1 to 5 when executing the computer program.
8. A computer-readable storage medium storing a computer program, characterized in that the computer program, when executed by a processor, implements the method according to any one of claims 1 to 5.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110476068.7A CN113011531B (en) | 2021-04-29 | 2021-04-29 | Classification model training method, device, terminal equipment and storage medium |
PCT/CN2021/097286 WO2022227214A1 (en) | 2021-04-29 | 2021-05-31 | Classification model training method and apparatus, and terminal device and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110476068.7A CN113011531B (en) | 2021-04-29 | 2021-04-29 | Classification model training method, device, terminal equipment and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113011531A CN113011531A (en) | 2021-06-22 |
CN113011531B (en) | 2024-05-07
Family
ID=76381033
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110476068.7A Active CN113011531B (en) | 2021-04-29 | 2021-04-29 | Classification model training method, device, terminal equipment and storage medium |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN113011531B (en) |
WO (1) | WO2022227214A1 (en) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113408291B (en) * | 2021-07-09 | 2023-06-30 | 平安国际智慧城市科技股份有限公司 | Training method, training device, training equipment and training storage medium for Chinese entity recognition model |
CN115858783A (en) * | 2022-11-30 | 2023-03-28 | 北京猿力教育科技有限公司 | Training method and device of theme recognition model |
CN115983294B (en) * | 2023-01-06 | 2024-01-02 | 北京有竹居网络技术有限公司 | Translation model training method, translation method and translation equipment |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112581250A (en) * | 2019-09-30 | 2021-03-30 | 深圳无域科技技术有限公司 | Model generation method and device, computer equipment and storage medium |
CN112597766A (en) * | 2020-12-29 | 2021-04-02 | 杭州电子科技大学 | Noisy semi-supervised text classification method based on BERT-base network |
Family Cites Families (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8521507B2 (en) * | 2010-02-22 | 2013-08-27 | Yahoo! Inc. | Bootstrapping text classifiers by language adaptation |
CN113785314A (en) * | 2019-05-06 | 2021-12-10 | 谷歌有限责任公司 | Semi-supervised training of machine learning models using label guessing |
US11568307B2 (en) * | 2019-05-20 | 2023-01-31 | International Business Machines Corporation | Data augmentation for text-based AI applications |
CN110543645B (en) * | 2019-09-04 | 2023-04-07 | 网易有道信息技术(北京)有限公司 | Machine learning model training method, medium, device and computing equipment |
CN111858935A (en) * | 2020-07-13 | 2020-10-30 | 北京航空航天大学 | Fine-grained emotion classification system for flight comment |
CN111881983B (en) * | 2020-07-30 | 2024-05-28 | 平安科技(深圳)有限公司 | Data processing method and device based on classification model, electronic equipment and medium |
CN112347769B (en) * | 2020-10-30 | 2024-01-23 | 北京百度网讯科技有限公司 | Entity recognition model generation method and device, electronic equipment and storage medium |
CN112347261A (en) * | 2020-12-07 | 2021-02-09 | 携程计算机技术(上海)有限公司 | Classification model training method, system, equipment and storage medium |
- 2021-04-29: CN CN202110476068.7A patent/CN113011531B/en, status Active
- 2021-05-31: WO PCT/CN2021/097286 patent/WO2022227214A1/en, Application Filing
Also Published As
Publication number | Publication date |
---|---|
WO2022227214A1 (en) | 2022-11-03 |
CN113011531A (en) | 2021-06-22 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN113011531B (en) | Classification model training method, device, terminal equipment and storage medium | |
Lin et al. | Comparison of handcrafted features and convolutional neural networks for liver MR image adequacy assessment | |
Sheehan et al. | Deep learning for population genetic inference | |
US11803758B2 (en) | Adversarial pretraining of machine learning models | |
JP2022542639A (en) | Systems and methods for training machine learning algorithms for processing biology-related data, microscopes and trained machine learning algorithms | |
Pérez-Rodríguez et al. | An R package for fitting Bayesian regularized neural networks with applications in animal breeding | |
US20150095017A1 (en) | System and method for learning word embeddings using neural language models | |
CN109086265B (en) | Semantic training method and multi-semantic word disambiguation method in short text | |
Jiang et al. | Multi-domain neural machine translation with word-level adaptive layer-wise domain mixing | |
Zhou et al. | Interactive segmentation as Gaussian process classification | |
CN111782804B (en) | Text CNN-based co-distributed text data selection method, system and storage medium | |
EP4361843A1 (en) | Neural network searching method and related device | |
CN110717013B (en) | Vectorization of documents | |
CN113377909B (en) | Paraphrasing analysis model training method and device, terminal equipment and storage medium | |
Pramanik et al. | TOPSIS aided ensemble of CNN models for screening COVID-19 in chest X-ray images | |
CN114678141A (en) | Method, apparatus and medium for predicting drug-pair interaction relationship | |
Wu et al. | AGNet: Automatic generation network for skin imaging reports | |
US20220172055A1 (en) | Predicting biological functions of proteins using dilated convolutional neural networks | |
US20200311538A1 (en) | Methods and systems for text sequence style transfer by two encoder decoders | |
CN114970467B (en) | Method, device, equipment and medium for generating composition manuscript based on artificial intelligence | |
Jain et al. | Detecting Twitter posts with Adverse Drug Reactions using Convolutional Neural Networks. | |
CN111767710B (en) | Indonesia emotion classification method, device, equipment and medium | |
Han et al. | Latent variable autoencoder | |
CN114328916A (en) | Event extraction and training method of model thereof, and device, equipment and medium thereof | |
Gönen et al. | Bayesian multiview dimensionality reduction for learning predictive subspaces |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||