CN115688868A - Model training method and computing device - Google Patents

Model training method and computing device

Info

Publication number: CN115688868A (granted as CN115688868B)
Application number: CN202211715713.7A
Authority: CN (China)
Legal status: Active (granted)
Inventors: 崔和涛, 张云柯
Assignee: Honor Device Co Ltd (application filed by Honor Device Co Ltd)
Original language: Chinese (zh)
Classification: Electrically Operated Instructional Devices (AREA)
Abstract

The embodiments of this application disclose a model training method and a computing device. The computing device inputs n pieces of original data into a first neural network model for training to obtain a pre-training model; the computing device performs data enhancement on the n pieces of original data to obtain m pieces of total enhanced data; the computing device uses the n pieces of original data and the m pieces of total enhanced data as training data and trains the pre-training model in Z steps. For the ith training step, the computing device determines an ith enhancement ratio λ(i), the enhancement ratio being the proportion of the enhanced data participating in this step's training to the m pieces of total enhanced data; the computing device determines the ith training data based on λ(i) and inputs the ith training data into the (i-1)th training model for training to obtain the ith training model. The value of i runs from 1 to Z; λ(i) ≥ λ(i-1), where λ(i-1) is the (i-1)th enhancement ratio of step i-1. According to the embodiments of this application, the model training effect can be improved.

Description

Model training method and computing device
Technical Field
This application relates to the field of terminal technologies, and in particular, to a model training method and a computing device.
Background
In model training, scarcity of data leads to poor training results. To improve the model training effect, data enhancement can be performed on the known data to obtain more training data. However, noise is introduced during data enhancement, which biases the data and degrades the result metrics of model training.
Disclosure of Invention
The embodiments of this application disclose a model training method and a computing device, which can improve the model training effect.
In a first aspect, the present application provides a model training method applied to a computing device, including: the computing device inputs n pieces of original data into a first neural network model for training to obtain a pre-training model; the computing device performs data enhancement on the n pieces of original data to obtain m pieces of total enhanced data; the computing device uses the n pieces of original data and the m pieces of total enhanced data as training data and trains the pre-training model in Z steps; wherein, for the ith training step, the computing device determines an ith enhancement ratio λ(i), the enhancement ratio being the proportion of the enhanced data participating in this step's training to the m pieces of total enhanced data; the computing device determines the ith training data based on λ(i) and inputs the ith training data into the (i-1)th training model for training to obtain the ith training model; the value of i runs from 1 to Z; λ(i) ≥ λ(i-1), where λ(i-1) is the (i-1)th enhancement ratio determined during the training of step i-1; and n, m, and Z are positive integers.

Here, the (i-1)th training model is the first neural network model output by training step i-1; in the first training step (i = 1), the (i-1)th training model is the pre-training model; and in the Zth training step, the ith training model obtained is the Zth training model, the model result output by the whole training.

In the embodiments of this application, the computing device divides training into multiple steps and increases the enhanced data within the training data at each step, so that the fusion of enhanced data with the original data is adjusted dynamically: the enhanced data participates in training gradually, which reduces the model drift caused by degraded data, while all the enhanced data eventually participates in training, ensuring it plays its full role. In addition, gradually increasing the training amount of enhanced data can improve the model training effect and the result metrics of the trained model.
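To make the Z-step schedule concrete, here is a minimal Python sketch of the loop described above; `pretrain`, `train_step`, and `enhancement_ratio` are hypothetical stand-ins for the patent's training and schedule functions, not its actual implementation.

```python
from typing import Any, Callable, List

def staged_training(
    pretrain: Callable[[List[Any]], Any],
    train_step: Callable[[Any, List[Any]], Any],
    original: List[Any],      # n pieces of original data
    enhanced: List[Any],      # m pieces of total enhanced data
    Z: int,
    enhancement_ratio: Callable[[int], float],  # lambda(i), non-decreasing
) -> Any:
    model = pretrain(original)              # pre-training model
    m = len(enhanced)
    prev = 0.0
    for i in range(1, Z + 1):
        lam = enhancement_ratio(i)
        assert lam >= prev, "the method requires lambda(i) >= lambda(i-1)"
        step_data = original + enhanced[: int(m * lam)]   # ith training data
        model = train_step(model, step_data)  # (i-1)th model -> ith model
        prev = lam
    return model                              # Zth model: the final result
```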
In a possible implementation, the computing device determines the ith enhancement ratio λ(i), specifically including: the computing device determines a function-based ratio d from the current step i,

    d = f(i; Z, λ0)    [the exact expression for d is rendered as an image in the original publication]

and determines the ith enhancement ratio λ(i) = min(1, d) based on d, where Z and λ0 are preset values; or, the computing device obtains the (i-1)th verification result and determines a gap value k based on the (i-1)th verification result logit_dev and the ideal result label_dev,

    k = g(logit_dev, label_dev), expressed via loss_pre and loss_post    [formula rendered as an image in the original; k > 0 when loss_pre > loss_post and k < 0 when loss_pre < loss_post]

then the computing device determines d and, based on d and k, determines the ith enhancement ratio λ(i) = max(1, 1+k)·d. Here, loss_pre represents the loss value between the (i-2)th verification result and the ideal result; loss_post represents the loss value between the (i-1)th verification result and the ideal result; the (i-1)th verification result is obtained by inputting verification data into the current (i-1)th training model; the (i-2)th verification result is obtained by inputting the verification data into the current (i-2)th training model; and the ideal result is the correct output result of the verification data. In this way, the computing device can determine the ith enhancement ratio λ(i) and hence the amount of incremental data, ensuring that the added data benefits the training result. For the first method, as training steps increase, the amount of newly added enhanced data becomes smaller and smaller, so the enhanced data influences the model substantially in the early training stage while the model stabilizes in the later stage, yielding a better training result. For the second method, the ith enhancement ratio is adjusted based on the verification result of the previous step's training model, and k has two cases: in one case, loss_pre is greater than loss_post, k is positive, and the loss improved across the two training steps; the ith enhancement ratio is then (1+k)·d and the incremental data increases. In the other case, loss_pre is less than loss_post, k is negative, and the loss worsened across the two training steps; the ith enhancement ratio is then d, and incremental data is added according to the size of d. In this process, more incremental data is added only when verification improves, which improves the output effect and training accuracy while maintaining training efficiency.

Here, loss_pre at step i is the loss_post of step i-1; the computing device has already computed and stored it once and uses it directly in the above calculation. The ideal result label_dev is pre-stored data.
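The two selection rules can be sketched as follows. Since the original expressions for d and k are rendered as images, `gap_value` below uses one plausible relative-improvement definition that matches the sign behavior described in the text; treat both helpers as assumptions.

```python
from typing import Optional

def gap_value(loss_pre: float, loss_post: float) -> float:
    # Assumed form: positive iff the loss improved (loss_pre > loss_post).
    return (loss_pre - loss_post) / loss_pre

def enhancement_ratio_i(d: float, k: Optional[float] = None) -> float:
    if k is None:                    # method 1: lambda(i) = min(1, d)
        return min(1.0, d)
    return max(1.0, 1.0 + k) * d     # method 2: lambda(i) = max(1, 1+k) * d
```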
In a possible implementation, the computing device determines the ith training data based on λ(i), specifically including: the computing device determines the ith incremental data based on λ(i), the ith incremental data being the enhanced data newly added to training in the ith training step; the computing device obtains the (i-1)th verification result and determines a gap value k based on the (i-1)th verification result logit_dev and the ideal result label_dev,

    k = g(logit_dev, label_dev), expressed via loss_pre and loss_post    [formula rendered as an image in the original]

where loss_pre represents the loss value between the (i-2)th verification result and the ideal result; loss_post represents the loss value between the (i-1)th verification result and the ideal result; the (i-1)th verification result is obtained by inputting verification data into the current (i-1)th training model; the (i-2)th verification result is obtained by inputting the verification data into the current (i-2)th training model; and the ideal result is the correct output result of the verification data. The computing device determines an adjustment value Δ = batch_ori·(1-k) based on k and, based on the initial proportion value batch_ori of the original data, the initial proportion value batch_aug of the enhanced data, and k, determines the ith training proportion

    batch_ori(i) : batch_aug(i) = (batch_ori - Δ) : (batch_aug + Δ)

where batch_ori and batch_aug are preset values; the ith training proportion is the proportion of original data to enhanced data within the training batch in the ith training step; and the computing device determines the ith training data based on the ith training proportion, the ith incremental data, and the n pieces of original data. In this way, the method ensures that the enhanced data does not act excessively on the neural network training model, avoiding model drift and improving the output effect of the trained first neural network model.
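A small sketch of the proportion adjustment, taking Δ = batch_ori·(1-k) as read from the text above (the original formula is partly garbled, so treat this as an assumption):

```python
from typing import Tuple

def step_proportion(batch_ori: float, batch_aug: float, k: float) -> Tuple[float, float]:
    delta = batch_ori * (1.0 - k)           # adjustment value
    return batch_ori - delta, batch_aug + delta

# e.g. step_proportion(64, 32, 0.9) -> (57.6, 38.4): a strong verification
# improvement shifts the batch toward enhanced data.
```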
In a possible implementation, the computing device determines the ith incremental data based on λ(i), specifically including: the computing device determines the first m·(λ(i) - λ(i-1)) pieces of data in the (i-1)th data-to-be-added as the ith incremental data; the (i-1)th data-to-be-added is the enhanced data that has not participated in training in the first i-1 steps and is waiting to participate. In this way, the computing device can select data from the (i-1)th data-to-be-added to join training according to the ith enhancement ratio, guaranteeing the accuracy of the amount of added enhanced data and improving the training effect.

Here, the (i-1)th data-to-be-added may be the (i-1)th data sequence or the (i-1)th non-added data. The (i-1)th non-added data is the enhanced data that did not participate in training in the first i-1 steps; the (i-1)th data sequence is the (i-1)th non-added data after sorting. When the (i-1)th data-to-be-added is the (i-1)th non-added data, the training steps can be simplified and the training efficiency of the training model improved.

In a possible implementation, when the (i-1)th evaluation result and the (i-1)th non-added data are obtained, the method further includes: the computing device sorts the (i-1)th non-added data based on the (i-1)th evaluation result to obtain the (i-1)th data sequence, where the (i-1)th evaluation result is the result of evaluating the (i-1)th non-added data with the model trained in step i-1, and the (i-1)th non-added data is the enhanced data that did not participate in training in the first i-1 steps; the computing device determines the first m·(λ(i) - λ(i-1)) pieces of data in the (i-1)th data-to-be-added as the ith incremental data, specifically including: the computing device determines the first m·(λ(i) - λ(i-1)) pieces of data in the (i-1)th data sequence as the ith incremental data. In this way, the computing device can sort the (i-1)th non-added data in advance so that the incremental data is the data with the better training effect, ensuring a better-performing training model.

In a possible implementation, when the (i-1)th evaluation result is not obtained, the (i-1)th data-to-be-added is the (i-1)th non-added data, i.e., the enhanced data that did not participate in training in the first i-1 steps; the computing device determines the first m·(λ(i) - λ(i-1)) pieces of data in the (i-1)th data-to-be-added as the ith incremental data, specifically including: determining the first m·(λ(i) - λ(i-1)) pieces of data in the (i-1)th non-added data as the ith incremental data. In this way, the training steps can be simplified and the training efficiency of the training model improved.
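A sketch of the incremental selection over the (i-1)th data-to-be-added (simple list bookkeeping; helper names are illustrative):

```python
from typing import Any, List, Tuple

def ith_incremental(
    to_be_added: List[Any], m: int, lam_i: float, lam_prev: float
) -> Tuple[List[Any], List[Any]]:
    # First m*(lambda(i) - lambda(i-1)) items become the ith incremental data;
    # the remainder is the ith non-added data.
    count = int(m * (lam_i - lam_prev))
    return to_be_added[:count], to_be_added[count:]
```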
In a possible implementation, the computing device sorts the (i-1)th non-added data based on the (i-1)th evaluation result to obtain the (i-1)th data sequence, specifically including: the computing device determines a first scoring index score1 of each piece of enhanced data based on the text feature data of the (i-1)th non-added data; the computing device determines a second scoring index score2 of each piece of enhanced data in the (i-1)th non-added data based on the (i-1)th evaluation result; the computing device determines a first score value s1 based on score1 and score2; and the computing device sorts the (i-1)th non-added data based on s1 to obtain the (i-1)th data sequence. In this way, enhanced data whose text features are simpler, and for which the correct result is easier to output, is input into the neural network model with higher priority; enhanced data is added to the training set from easy to hard, ensuring that better, higher-quality data is used in the early stage when training works well, making the training model more stable and effective.

In a possible implementation, the computing device determines the first scoring index score1 of each piece of enhanced data based on the text feature data of the (i-1)th non-added data, specifically including: the computing device determines the text feature data of each piece of enhanced data in the (i-1)th non-added data and determines score1 as a weighted sum of the parameters of the text feature data, the parameters including one or more of the text length l, the entity count c, the clause count s, and the edit distance. The computing device determines the second scoring index score2 of each piece of enhanced data in the (i-1)th non-added data based on the (i-1)th evaluation result, specifically including: the computing device determines a loss value and a confidence for each piece of enhanced data in the (i-1)th non-added data based on the (i-1)th evaluation result and determines score2 as a weighted sum of the loss value and the confidence. The computing device determines the first score value s1 based on score1 and score2, specifically including: the computing device determines the first score value s1 of each piece of enhanced data in the (i-1)th non-added data as a weighted sum of score1 and score2. The computing device sorts the (i-1)th non-added data based on s1 to obtain the (i-1)th data sequence, specifically including: the computing device sorts the (i-1)th non-added data by s1 from small to large to obtain the (i-1)th data sequence. Here, the original data and the total enhanced data are text data. Through this process, the remaining untrained enhanced data is sorted before training and added to the training set from easy to hard, as described above.

Here, the text length l is the number of characters in the text; the entity count c represents the number of extracted information items; the clause count s represents the number of sentences into which a text is divided by punctuation marks; and the edit distance is the distance between two character strings, i.e., the minimum number of edit operations required to change one into the other.

In a possible implementation, the computing device determines the first scoring index score1 of each piece of enhanced data based on the text feature data of the (i-1)th non-added data, specifically including: the computing device determines the text feature data of each piece of enhanced data in the (i-1)th non-added data and determines score1 as the normalized sum of the parameters of the text feature data, the parameters including one or more of the text length l, the entity count c, the clause count s, and the edit distance. The computing device determines the second scoring index score2 of each piece of enhanced data in the (i-1)th non-added data based on the (i-1)th evaluation result, specifically including: the computing device determines a loss value and a confidence for each piece of enhanced data in the (i-1)th non-added data based on the (i-1)th evaluation result and determines score2 as the normalized sum of the loss value and the confidence. The computing device determines the first score value s1 based on score1 and score2, specifically including: the computing device determines the first score value s1 of each piece of enhanced data in the (i-1)th non-added data as the sum of score1 and score2. The computing device sorts the (i-1)th non-added data based on s1 to obtain the (i-1)th data sequence, specifically including: the computing device sorts the (i-1)th non-added data by s1 from small to large to obtain the (i-1)th data sequence. Here, the original data and the total enhanced data are text data. Through this process, the remaining untrained enhanced data is likewise sorted before training and added to the training set from easy to hard.
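An easy-to-hard sorting sketch following the weighted-sum variant; the weights and the feature normalization are illustrative, since the patent leaves them unspecified:

```python
from typing import Dict, List

def score1(feat: Dict[str, float], w: Dict[str, float]) -> float:
    # Weighted sum of text feature parameters: text length l, entity count c,
    # clause count s, edit distance.
    return sum(w[name] * feat[name] for name in ("l", "c", "s", "edit"))

def score2(loss: float, conf: float, w_loss: float = 0.5, w_conf: float = 0.5) -> float:
    # Weighted sum of the per-item loss value and confidence.
    return w_loss * loss + w_conf * conf

def sort_easy_to_hard(items: List[dict], w: Dict[str, float]) -> List[dict]:
    # s1 = score1 + score2 (equal weights here); ascending order puts the
    # "easier" enhanced data first.
    return sorted(
        items,
        key=lambda it: score1(it["features"], w) + score2(it["loss"], it["conf"]),
    )
```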
In a possible implementation, after the computing device determines the ith incremental data based on λ(i), the method further includes: when the (i-1)th non-added data is obtained, the computing device removes the ith incremental data from the (i-1)th non-added data and determines the remaining data as the ith non-added data; the ith non-added data is the enhanced data that did not participate in training in the first i training steps, and the (i-1)th non-added data is the enhanced data that did not participate in training in the first i-1 training steps; and when the ith training model is obtained, the computing device inputs the ith non-added data into the ith training model to obtain the ith evaluation result. In this way, the evaluation result can supply parameters for the next round of sorting, ensuring the realizability and completeness of the sorting process, thereby improving the accuracy of the sorting result and the training effect of the model.

In a possible implementation, the computing device determines the ith training data based on the ith training proportion, the ith incremental data, and the n pieces of original data, specifically including: when the (i-1)th screened data is obtained, the computing device determines the ith incremental data together with the (i-1)th screened data as the ith added data; the ith added data is the enhanced data that has already participated in training in the first i training steps, and the (i-1)th screened data is the result of screening the enhanced data that participated in training in the first i-1 training steps; the computing device determines the ith training data based on the ith added data, the n pieces of original data, and the ith training proportion, the ratio of the ith added data to the n pieces of original data within the training batch being the ith training proportion. In this way, the computing device can ensure that the composition of the ith training data follows the ith training proportion, improving the training effect.

In a possible implementation, after the computing device determines the ith incremental data based on λ(i), the method further includes: when the ith added data is obtained and the ith training model is obtained, the computing device inputs the ith added data into the ith training model to obtain an output result of the ith added data and screens the ith added data based on the output result to obtain the ith screened data. In this way, the computing device can screen the enhanced data that has already been trained and reject abnormal data; that is, it screens according to the post-training result, continues training the screened data in the next step based on that result, and removes abnormal data, guaranteeing screening accuracy and model quality while ensuring the enhanced data plays its role to the fullest.

Here, the ith screened data is the result of screening the enhanced data that has already participated in training in the first i training steps.

In a possible implementation, the computing device screens the ith added data based on the output result to obtain the ith screened data, specifically including: the computing device determines a second score value s2 of the ith added data; the computing device determines the enhanced data in the ith added data whose s2 is greater than a first threshold as abnormal data and removes the abnormal data from the ith added data to obtain the ith screened data. In this way, the computing device can judge through a concrete score value whether each piece of the ith added data is abnormal and reject it when abnormal, ensuring that the ith screened data improves the effect of training the model in subsequent steps.

In a possible implementation, the computing device determines the second score value s2 of the ith added data, specifically including: the computing device determines a third scoring index score3 of each piece of enhanced data based on the text feature data of the ith added data; the computing device determines a fourth scoring index score4 of each piece of enhanced data in the ith added data based on the output result of the ith added data; and the computing device determines the second score value s2 based on score3 and score4. In this way, the computing device can determine the text feature data and the output result of the ith added data and hence the second score value; judging by this score value guarantees the accuracy of screening and improves the effect of training the model.
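A screening sketch; the way s2 combines score3 and score4, and the threshold value, are illustrative assumptions:

```python
from typing import List, Tuple

def screen_added(added: List[dict], threshold: float) -> Tuple[List[dict], List[dict]]:
    # Items whose second score value s2 exceeds the first threshold are
    # treated as abnormal data and removed before the next training step.
    kept, abnormal = [], []
    for item in added:
        s2 = item["score3"] + item["score4"]
        (abnormal if s2 > threshold else kept).append(item)
    return kept, abnormal
```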
In a second aspect, the present application provides a computing device comprising: one or more processors and one or more memories; the one or more processors are coupled with the one or more memories, the one or more memories for storing computer program code, the computer program code comprising computer instructions that, when executed by the one or more processors, cause the computing device to perform:
inputting n pieces of original data into a first neural network model for training to obtain a pre-training model; performing data enhancement on the n pieces of original data to obtain m pieces of total enhanced data; using the n pieces of original data and the m pieces of total enhanced data as training data and training the pre-training model in Z steps; for the ith training step, determining an ith enhancement ratio λ(i), the enhancement ratio being the proportion of the enhanced data participating in this step's training to the m pieces of total enhanced data; determining the ith training data based on λ(i) and inputting the ith training data into the (i-1)th training model for training to obtain the ith training model; the value of i runs from 1 to Z; λ(i) ≥ λ(i-1), where λ(i-1) is the (i-1)th enhancement ratio determined during the training of step i-1; and n, m, and Z are positive integers.

Here, the (i-1)th training model is the first neural network model output by training step i-1; in the first training step (i = 1), the (i-1)th training model is the pre-training model; and in the Zth training step, the ith training model obtained is the Zth training model, the model result output by the whole training.

In the embodiments of this application, the computing device divides training into multiple steps and increases the enhanced data within the training data at each step, so that the fusion of enhanced data with the original data is adjusted dynamically: the enhanced data participates in training gradually, which reduces the model drift caused by degraded data, while all the enhanced data eventually participates in training, ensuring it plays its full role. In addition, gradually increasing the training amount of enhanced data can improve the model training effect and the result metrics of the trained model.
In a possible implementation, the computing device determines the ith enhancement ratio λ(i) and specifically performs: determining a function-based ratio d from the current step i,

    d = f(i; Z, λ0)    [the exact expression for d is rendered as an image in the original publication]

and determining the ith enhancement ratio λ(i) = min(1, d) based on d, where Z and λ0 are preset values; or, when the (i-1)th verification result is obtained, determining a gap value k based on the (i-1)th verification result logit_dev and the ideal result label_dev,

    k = g(logit_dev, label_dev), expressed via loss_pre and loss_post    [formula rendered as an image in the original; k > 0 when loss_pre > loss_post and k < 0 when loss_pre < loss_post]

then determining d and, based on d and k, determining the ith enhancement ratio λ(i) = max(1, 1+k)·d. Here, loss_pre represents the loss value between the (i-2)th verification result and the ideal result; loss_post represents the loss value between the (i-1)th verification result and the ideal result; the (i-1)th verification result is obtained by inputting verification data into the current (i-1)th training model; the (i-2)th verification result is obtained by inputting the verification data into the current (i-2)th training model; and the ideal result is the correct output result of the verification data. In this way, the computing device can determine the ith enhancement ratio λ(i) and hence the amount of incremental data, ensuring that the added data benefits the training result. For the first method, as training steps increase, the amount of newly added enhanced data becomes smaller and smaller, so the enhanced data influences the model substantially in the early training stage while the model stabilizes in the later stage, yielding a better training result. For the second method, the ith enhancement ratio is adjusted based on the verification result of the previous step's training model, and k can be positive or negative: when loss_pre is greater than loss_post, k is positive and the loss improved across the two training steps; the ith enhancement ratio is then (1+k)·d and the incremental data increases. When loss_pre is less than loss_post, k is negative and the loss worsened across the two training steps; the ith enhancement ratio is then d, and incremental data is added according to the size of d. In this process, more incremental data is added only when verification improves, which improves the output effect and training accuracy while maintaining training efficiency.

Here, loss_pre at step i is the loss_post of step i-1; the computing device has already computed and stored it once and uses it directly in the above calculation. The ideal result label_dev is pre-stored data.
In a possible implementation, the computing device determines the ith training data based on λ(i) and specifically performs: determining the ith incremental data based on λ(i), the ith incremental data being the enhanced data newly added to training in the ith training step; when the (i-1)th verification result is obtained, determining a gap value k based on the (i-1)th verification result logit_dev and the ideal result label_dev,

    k = g(logit_dev, label_dev), expressed via loss_pre and loss_post    [formula rendered as an image in the original]

where loss_pre represents the loss value between the (i-2)th verification result and the ideal result; loss_post represents the loss value between the (i-1)th verification result and the ideal result; the (i-1)th verification result is obtained by inputting verification data into the current (i-1)th training model; the (i-2)th verification result is obtained by inputting the verification data into the current (i-2)th training model; and the ideal result is the correct output result of the verification data; determining an adjustment value Δ = batch_ori·(1-k) based on k and, based on the initial proportion value batch_ori of the original data, the initial proportion value batch_aug of the enhanced data, and k, determining the ith training proportion

    batch_ori(i) : batch_aug(i) = (batch_ori - Δ) : (batch_aug + Δ)

where batch_ori and batch_aug are preset values; the ith training proportion is the proportion of original data to enhanced data within the training batch in the ith training step; and the computing device determines the ith training data based on the ith training proportion, the ith incremental data, and the n pieces of original data. In this way, the method ensures that the enhanced data does not act excessively on the neural network training model, avoiding model drift and improving the output effect of the trained first neural network model.
In a possible implementation, the computing device determines the ith incremental data based on λ(i) and specifically performs: determining the first m·(λ(i) - λ(i-1)) pieces of data in the (i-1)th data-to-be-added as the ith incremental data; the (i-1)th data-to-be-added is the enhanced data that has not participated in training in the first i-1 steps and is waiting to participate. In this way, the computing device can select data from the (i-1)th data-to-be-added to join training according to the ith enhancement ratio, guaranteeing the accuracy of the amount of added enhanced data and improving the training effect.

Here, the (i-1)th data-to-be-added may be the (i-1)th data sequence or the (i-1)th non-added data. The (i-1)th non-added data is the enhanced data that did not participate in training in the first i-1 steps; the (i-1)th data sequence is the (i-1)th non-added data after sorting. When the (i-1)th data-to-be-added is the (i-1)th non-added data, the training steps can be simplified and the training efficiency of the training model improved.

In a possible implementation, when the (i-1)th evaluation result and the (i-1)th non-added data are obtained, the computing device further performs: sorting the (i-1)th non-added data based on the (i-1)th evaluation result to obtain the (i-1)th data sequence, where the (i-1)th evaluation result is the result of evaluating the (i-1)th non-added data with the model trained in step i-1, and the (i-1)th non-added data is the enhanced data that did not participate in training in the first i-1 steps; the computing device determines the first m·(λ(i) - λ(i-1)) pieces of data in the (i-1)th data-to-be-added as the ith incremental data and specifically performs: determining the first m·(λ(i) - λ(i-1)) pieces of data in the (i-1)th data sequence as the ith incremental data. In this way, the computing device can sort the (i-1)th non-added data in advance so that the incremental data is the data with the better training effect, ensuring a better-performing training model.

In a possible implementation, when the (i-1)th evaluation result is not obtained, the (i-1)th data-to-be-added is the (i-1)th non-added data, i.e., the enhanced data that did not participate in training in the first i-1 steps; the computing device determines the first m·(λ(i) - λ(i-1)) pieces of data in the (i-1)th data-to-be-added as the ith incremental data and specifically performs: determining the first m·(λ(i) - λ(i-1)) pieces of data in the (i-1)th non-added data as the ith incremental data. In this way, the training steps can be simplified and the training efficiency of the training model improved.
In a possible implementation, the computing device sorts the (i-1)th non-added data based on the (i-1)th evaluation result to obtain the (i-1)th data sequence and specifically performs: determining a first scoring index score1 of each piece of enhanced data based on the text feature data of the (i-1)th non-added data; determining a second scoring index score2 of each piece of enhanced data in the (i-1)th non-added data based on the (i-1)th evaluation result; determining a first score value s1 based on score1 and score2; and sorting the (i-1)th non-added data based on s1 to obtain the (i-1)th data sequence. In this way, enhanced data whose text features are simpler, and for which the correct result is easier to output, is input into the neural network model with higher priority; enhanced data is added to the training set from easy to hard, ensuring that better, higher-quality data is used in the early stage when training works well, making the training model more stable and effective.

In a possible implementation, the computing device determines the first scoring index score1 of each piece of enhanced data based on the text feature data of the (i-1)th non-added data and specifically performs: determining the text feature data of each piece of enhanced data in the (i-1)th non-added data and determining score1 as a weighted sum of the parameters of the text feature data, the parameters including one or more of the text length l, the entity count c, the clause count s, and the edit distance. The computing device determines the second scoring index score2 of each piece of enhanced data in the (i-1)th non-added data based on the (i-1)th evaluation result and specifically performs: determining a loss value and a confidence for each piece of enhanced data in the (i-1)th non-added data based on the (i-1)th evaluation result, and determining score2 as a weighted sum of the loss value and the confidence. The computing device determines the first score value s1 based on score1 and score2 and specifically performs: determining the first score value s1 of each piece of enhanced data in the (i-1)th non-added data as a weighted sum of score1 and score2. The computing device sorts the (i-1)th non-added data based on s1 to obtain the (i-1)th data sequence and specifically performs: sorting the (i-1)th non-added data by s1 from small to large to obtain the (i-1)th data sequence. Here, the original data and the total enhanced data are text data. Through this process, the remaining untrained enhanced data is sorted before training and added to the training set from easy to hard, as described above.

Here, the text length l is the number of characters in the text; the entity count c represents the number of extracted information items; the clause count s represents the number of sentences into which a text is divided by punctuation marks; and the edit distance is the distance between two character strings, i.e., the minimum number of edit operations required to change one into the other.

In a possible implementation, the computing device determines the first scoring index score1 of each piece of enhanced data based on the text feature data of the (i-1)th non-added data and specifically performs: determining the text feature data of each piece of enhanced data in the (i-1)th non-added data and determining score1 as the normalized sum of the parameters of the text feature data, the parameters including one or more of the text length l, the entity count c, the clause count s, and the edit distance. The computing device determines the second scoring index score2 of each piece of enhanced data in the (i-1)th non-added data based on the (i-1)th evaluation result and specifically performs: determining a loss value and a confidence for each piece of enhanced data in the (i-1)th non-added data based on the (i-1)th evaluation result, and determining score2 as the normalized sum of the loss value and the confidence. The computing device determines the first score value s1 based on score1 and score2 and specifically performs: determining the first score value s1 of each piece of enhanced data in the (i-1)th non-added data as the sum of score1 and score2. The computing device sorts the (i-1)th non-added data based on s1 to obtain the (i-1)th data sequence and specifically performs: sorting the (i-1)th non-added data by s1 from small to large to obtain the (i-1)th data sequence. Here, the original data and the total enhanced data are text data. Through this process, the remaining untrained enhanced data is likewise sorted before training and added to the training set from easy to hard.
In a possible implementation, after the computing device determines the ith incremental data based on λ(i), the computing device further performs: when the (i-1)th non-added data is obtained, removing the ith incremental data from the (i-1)th non-added data and determining the remaining data as the ith non-added data, where the ith non-added data is the enhanced data that did not participate in training in the first i training steps and the (i-1)th non-added data is the enhanced data that did not participate in training in the first i-1 training steps; and when the ith training model is obtained, inputting the ith non-added data into the ith training model to obtain the ith evaluation result. In this way, the evaluation result can supply parameters for the next round of sorting, ensuring the realizability and completeness of the sorting process, thereby improving the accuracy of the sorting result and the training effect of the model.

In a possible implementation, the computing device determines the ith training data based on the ith training proportion, the ith incremental data, and the n pieces of original data, and specifically performs: when the (i-1)th screened data is obtained, determining the ith incremental data together with the (i-1)th screened data as the ith added data, where the ith added data is the enhanced data that has already participated in training in the first i training steps and the (i-1)th screened data is the result of screening the enhanced data that participated in training in the first i-1 training steps; and determining the ith training data based on the ith added data, the n pieces of original data, and the ith training proportion, the ratio of the ith added data to the n pieces of original data within the training batch being the ith training proportion. In this way, the computing device can ensure that the composition of the ith training data follows the ith training proportion, improving the training effect.

In a possible implementation, after the computing device determines the ith incremental data based on λ(i), the computing device further performs: when the ith added data is obtained and the ith training model is obtained, inputting the ith added data into the ith training model to obtain an output result of the ith added data, and screening the ith added data based on the output result to obtain the ith screened data. In this way, the computing device can screen the enhanced data that has already been trained and reject abnormal data; that is, it screens according to the post-training result, continues training the screened data in the next step based on that result, and removes abnormal data, guaranteeing screening accuracy and model quality while ensuring the enhanced data plays its role to the fullest.

In a possible implementation, the computing device screens the ith added data based on the output result to obtain the ith screened data, and specifically performs: determining a second score value s2 of the ith added data; determining the enhanced data in the ith added data whose s2 is greater than a first threshold as abnormal data; and removing the abnormal data from the ith added data to obtain the ith screened data. In this way, the computing device can judge through a concrete score value whether each piece of the ith added data is abnormal and reject it when abnormal, ensuring that the ith screened data improves the effect of training the model in subsequent steps.

In a possible implementation, the computing device determines the second score value s2 of the ith added data and specifically performs: determining a third scoring index score3 of each piece of enhanced data based on the text feature data of the ith added data; determining a fourth scoring index score4 of each piece of enhanced data in the ith added data based on the output result of the ith added data; and determining the second score value s2 based on score3 and score4. In this way, the computing device can determine the text feature data and the output result of the ith added data and hence the second score value; judging by this score value guarantees the accuracy of screening and improves the effect of training the model.
In a third aspect, the present application provides a computing device comprising one or more functional modules, where the one or more functional modules are configured to perform the model training method in any possible implementation of any of the above aspects.

In a fourth aspect, an embodiment of the present application provides a computer storage medium comprising computer instructions that, when run on a computing device, cause the computing device to perform the model training method in any possible implementation of any of the above aspects.

In a fifth aspect, the present application provides a computer program product that, when run on a computer, causes the computer to perform the model training method in any possible implementation of any of the above aspects.
Drawings
FIG. 1 is a schematic structural diagram of a training verification and use process of a neural network model provided in an embodiment of the present application;
FIG. 2 is a schematic structural diagram of a method for training a model according to an embodiment of the present application;
FIG. 3 is a diagram illustrating a function between an ith enhancement ratio and a training step i according to an embodiment of the present application;
FIG. 4 is a schematic structural diagram of another method for training a model according to an embodiment of the present application;
FIG. 5 is a schematic diagram of the hardware structure of a computing device 100 according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of this application are described clearly and completely below with reference to the accompanying drawings. In the description of the embodiments of this application, "/" means "or"; for example, A/B may mean A or B. "And/or" herein merely describes an association relationship between associated objects, indicating that three relationships may exist; for example, "A and/or B" may mean: A alone, both A and B, or B alone. In addition, in the description of the embodiments of this application, "a plurality of" means two or more.
Hereinafter, the terms "first" and "second" are used for descriptive purposes only and should not be understood as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined with "first" or "second" may explicitly or implicitly include one or more of that feature. In the description of the embodiments of this application, unless otherwise stated, "a plurality of" means two or more.
1. Natural Language Processing (NLP).
NLP is an important research direction in the fields of computer science and artificial intelligence. It studies how to process natural language and builds a communication bridge between machine language and human language to realize human-machine interaction. Natural language is the speech and writing system used in human communication; the object studied by NLP is mainly the writing system (i.e., "text").

NLP includes Natural Language Understanding (NLU) and Natural Language Generation (NLG). NLU is the general term for models or tasks in which a machine understands the meaning carried within text. NLU includes word segmentation, part-of-speech tagging, syntactic analysis, text classification/clustering, information extraction, and so on. NLG is a software process that automatically converts structured data into human-readable text. NLG includes content determination, text structuring, sentence aggregation, grammaticalization, referring-expression generation, and linguistic realization, among other steps.
NLP processing requires lexical analysis, syntactic analysis, and semantic analysis. Lexical analysis includes word segmentation and part-of-speech tagging, i.e., after the text is divided into individual words, each word is assigned a category; categories may be nouns, verbs, adjectives, and so on. Syntactic analysis analyzes syntactic structure in units of sentences. Semantic analysis is understanding the true semantics that a sentence expresses.

At present, NLP is widely used in speech recognition, text generation, information extraction, text classification, information recommendation, and other fields. NLP is often implemented by deep-learning means such as convolutional neural networks (CNN), recurrent neural networks (RNN), and long short-term memory (LSTM).
2. Concepts in a neural network model training process.
In deep learning, model training needs to be performed in advance; the training process adjusts the network structure and network parameters of the neural network model. For example, training text data is input into the neural network model, and a loss value exists between the output result and the preset correct result for that text data. The training model is adjusted in the direction that makes the loss value smaller; through continuous iteration, the model converges when the loss value no longer decreases.
Illustratively, the training method of the neural network model may be gradient descent. Gradient descent back-propagates a cost function through the neural network and, through continuous iteration, updates the weight parameters θ to find the lowest point of the loss function. The weight parameters of the neural network model are adjusted continuously during training so that the model converges. If the weight parameter of the current training iteration is θ, then after the iteration θ = θ - η∇θJ(θ), where η is the step size (learning rate), which determines the length of each step taken along the negative gradient direction during descent, ∇θ denotes the gradient with respect to θ [the expanded gradient expression is rendered as an image in the original], and J(θ) is the loss function.
To evaluate the quality of the neural network model, the degree of fit is measured using a loss function. Minimizing the loss function means the best degree of fit, and the corresponding model parameters are the optimal parameters. During training, based on the result of the loss function, the loss value is used to measure how training is progressing. The loss function determines the performance of the model by comparing the neural network model's predicted output with the ideal output, thereby indicating the direction in which to optimize the model. If the gap between the predicted output and the ideal output is large, the loss value is large; conversely, if the gap is small, the loss value is small. Different loss functions may be chosen for different neural network models. For classification or labeling problems, the loss value may be computed with square loss, exponential loss, hinge loss, negative log-likelihood loss (NLL loss), cross-entropy loss, KL-divergence loss, cosine similarity loss, and so on; this application does not limit the loss function.
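A toy sketch of the update θ = θ - η∇θJ(θ) with an illustrative squared-error loss (not tied to the patent):

```python
import numpy as np

def gradient_descent(target: np.ndarray, eta: float = 0.1, steps: int = 100) -> np.ndarray:
    # J(theta) = ||theta - target||^2, so grad J = 2 * (theta - target).
    theta = np.zeros_like(target)
    for _ in range(steps):
        grad = 2.0 * (theta - target)
        theta = theta - eta * grad     # step along the negative gradient
    return theta

print(gradient_descent(np.array([1.0, -2.0])))  # converges toward [1, -2]
```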
Batch and epoch (round) are important concepts in neural network training on large-scale data.

During training, the batch of samples used in one iteration is called a batch, the number of samples in it is batch_size, and one training pass over batch_size samples is called an iteration. The batch size is a hyperparameter that defines the number of samples to process before updating the internal model parameters. In deep learning, the loss function for one parameter update is obtained not from a single sample but from the weighted data of a batch. One epoch is one training pass using all the samples in the training set; colloquially, the value of epoch is the number of times the entire training data set is used. The epoch number is a hyperparameter that defines how many times the learning algorithm works through the training data set; one epoch means every sample in the training data set has had an opportunity to update the internal model parameters. An epoch may consist of one or more batches.

For example, in batch training, assuming there are r pieces of data and each batch has size batch_size, there are r/batch_size + 1 batches (the final batch may be partial), which together are called one epoch. Training runs one epoch at a time; after one epoch of training, the training data can be shuffled and reordered for the next epoch, as sketched below.
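The bookkeeping in code (math.ceil covers the final partial batch):

```python
import math

def batches_per_epoch(r: int, batch_size: int) -> int:
    # One epoch visits all r samples once, batch_size samples per iteration.
    return math.ceil(r / batch_size)

print(batches_per_epoch(1000, 64))  # 16 iterations per epoch
```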
3. Data Augmentation (DA).
Data enhancement (also called data augmentation) alleviates scenes with insufficient data in deep learning. It was first widely used in the image field, later extended to the NLP field, and has proved effective on many tasks. One major direction is to increase the diversity of the training data and thereby improve the model's generalization ability. Data enhancement refers to increasing the amount of data by making slight changes to existing data or newly creating synthetic data from existing data. That is, with the original data known, minor modifications can be made to it to create synthetic enhanced data.

In the NLP field, data enhancement can take many forms, as follows:
1. and performing data enhancement based on the paraphrase.
Given the original data, the computing device may take all the alternative synonyms for words in the original data and select r of them for substitution, after which the enhanced data is formed. Alternatively, the original text data can be segmented into several word segments and the words in those segments replaced, thereby forming enhanced data. Enhanced data may also be generated by machine-translating the original text data and then translating it back.
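A minimal synonym-substitution sketch; the toy synonym table is an assumption (real systems draw on thesaurus resources or machine translation):

```python
import random
from typing import Dict, List

SYNONYMS: Dict[str, List[str]] = {
    "quick": ["fast", "rapid"],
    "happy": ["glad", "joyful"],
}

def synonym_augment(tokens: List[str], r: int = 1) -> List[str]:
    out = list(tokens)
    candidates = [i for i, t in enumerate(out) if t in SYNONYMS]
    for i in random.sample(candidates, min(r, len(candidates))):
        out[i] = random.choice(SYNONYMS[out[i]])  # replace r words with synonyms
    return out

print(synonym_augment(["the", "quick", "fox", "is", "happy"], r=2))
```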
2. Data enhancement based on specific rules.
The original text data may be subject to different processing tasks, and for such specific tasks, corresponding information can be extracted, e.g., data labels, data formats, and so on.

The computing device may determine the type of the original text data, select different augmentation rules according to the type, and process the original text data according to the corresponding rule to obtain enhanced data. For example, when the original text data is an express-delivery SMS, the key information in it can be extracted according to the specific text collation rule of express SMS messages, and the extracted key information can be recombined according to a preset text collation rule to obtain an enhanced text.
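A rule-based sketch for the express-SMS example; the pattern and output template are hypothetical, since the patent does not fix a concrete rule:

```python
import re
from typing import Optional

PICKUP = re.compile(r"(?P<code>\d{4,8}).*?(?P<station>\w+ station)", re.S)

def augment_sms(text: str) -> Optional[str]:
    # Extract key information and recombine it per a preset template.
    m = PICKUP.search(text)
    if m is None:
        return None
    return f"Pickup code {m.group('code')} at {m.group('station')}."

print(augment_sms("Your parcel 483921 has arrived at Sunshine station today."))
```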
It should be noted that the method for enhancing data described above may also include other methods, and the present application is not limited thereto.
4. Edit distance.
The edit distance is the minimum number of editing operations required to convert one character string into another; the larger the distance, the more different the two strings are. The editing operations are insertion, deletion and replacement. The edit distance (Levenshtein distance) between two strings a and b can be written Lev_{a,b}(|a|, |b|), where |a| and |b| are the lengths of a and b (in this application, the length of a piece of text data). The Levenshtein distance of the two strings a and b is described mathematically as:
Lev_{a,b}(q, p) = max(q, p), if min(q, p) = 0;
Lev_{a,b}(q, p) = min{ Lev_{a,b}(q-1, p) + 1, Lev_{a,b}(q, p-1) + 1, Lev_{a,b}(q-1, p-1) + 1_(a_q ≠ b_p) }, otherwise.
Here Lev_{a,b}(q, p) is the distance between the first q characters of a and the first p characters of b, so q and p can be regarded as prefix lengths of a and b. Character indices start from 1 (in practice a 0-th position is prepended to each string), so the final edit distance is the value at q = |a|, p = |b|, namely Lev_{a,b}(|a|, |b|).
When min(q, p) = 0, one of the two prefixes is empty, and max(q, p) single-character editing operations suffice to turn one into the other, so the edit distance is max(q, p), i.e. the larger of q and p.
When min(q, p) ≠ 0, Lev_{a,b}(q, p) is the minimum of three cases: 1. Lev_{a,b}(q-1, p) + 1, representing deletion of a_q; 2. Lev_{a,b}(q, p-1) + 1, representing insertion of b_p; 3. Lev_{a,b}(q-1, p-1) + 1_(a_q ≠ b_p), representing replacement of a_q by b_p. Here 1_(a_q ≠ b_p) is an indicator function, equal to 0 when a_q = b_p and to 1 when a_q ≠ b_p.
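The recurrence above translates directly into a dynamic-programming table; a minimal Python sketch follows (the function name is assumed):

```python
def levenshtein(a, b):
    """Dynamic-programming edit distance; entry [q][p] is the distance
    between the first q characters of a and the first p characters of b."""
    lev = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for q in range(len(a) + 1):
        lev[q][0] = q                 # delete all q characters
    for p in range(len(b) + 1):
        lev[0][p] = p                 # insert all p characters
    for q in range(1, len(a) + 1):
        for p in range(1, len(b) + 1):
            sub = 0 if a[q - 1] == b[p - 1] else 1   # indicator 1(a_q != b_p)
            lev[q][p] = min(lev[q - 1][p] + 1,       # deletion
                            lev[q][p - 1] + 1,       # insertion
                            lev[q - 1][p - 1] + sub) # replacement
    return lev[len(a)][len(b)]

assert levenshtein("kitten", "sitting") == 3
```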
Fig. 1 is a schematic structural diagram of the training, verification and use process of a neural network model disclosed in an embodiment of this application. As shown in Fig. 1, a neural network model generally goes through three stages: training, verification and use. In the training stage, training data is input into the neural network model under training, and the network parameters and/or structure are adjusted so that the training output converges, completing training. The verification stage follows: verification data is input into the neural network model under verification to obtain a verification output; if the verification output meets the convergence condition, the currently trained neural network model can enter the use stage; otherwise, further training is needed based on the verification results. In the use stage, usage data can be input into the deployed neural network model to obtain the usage output. The training data, verification data and usage data are different data; the computing device may hold back a portion of the training data from training to serve as verification data.
When solving the related technical problems with neural networks in the NLP field, the neural network model must be trained in advance: a large amount of training data is input into the model, and over many iterations the network parameters and/or structure are adjusted until the model converges, completing training. However, the training data (e.g., text data) is highly private to users, and collecting it must be done by purchase or under rules with consent, so training data is scarce. This scarcity degrades the trained neural network model considerably, and the output results are not ideal.
To address the above problem, the text data may be augmented: the original data is enhanced to obtain enhanced data, and the raw data and the enhanced data together serve as training data. The enhanced data, however, is obtained by adding extra words or semantic information to the original data, which introduces noise. Noise leads to poor result indexes in model training and to data deviation. For the noise brought by the enhanced data, the enhanced data can be filtered in advance; yet filtering leaves the data under-used and limits the overall effect of the enhanced data, and deviation in the filtering criteria also causes deviation in the data.
During training, text data is needed as training data for the neural network model; when the collected text data is limited, data augmentation must be performed on the existing text to obtain enhanced text. The enhanced text data may carry noise, which affects the training result and biases the enhanced text; screening the enhanced text data, on the other hand, may limit its effect.
In view of the above problems, an embodiment of this application discloses a model training method in which the computing device divides training on raw data and enhanced data into two stages. In the first stage, the computing device inputs the raw data into the first neural network model for training and adjusts the network model to obtain a pre-training model. In the second stage, training proceeds in multiple steps: the enhanced data is adjusted dynamically to determine the enhanced data participating in each step, and the dynamically adjusted enhanced data is input into the model step by step until all the enhanced data has been trained. Because the dynamically adjusted enhanced data is fused with the raw data before being input into the model, the model indexes can still be trained up even when the enhanced data carries noise, the training effect is good, and the problem of enhanced-data deviation is alleviated. In addition, in each training step the enhanced data can be evaluated and data with abnormal evaluations screened out, ensuring that partially abnormal results are removed from the next step's enhanced data; this dynamic filtering makes the trained model better.
Fig. 2 is a schematic structural diagram of a model training method disclosed in an embodiment of this application. As shown in Fig. 2, in this embodiment the training of the first neural network model may be divided into two stages, a first-stage training and a second-stage training. In the first stage, training uses the raw data; in the second stage, the raw data and the enhanced data are dynamically fused and screened, and training proceeds in steps, with enhanced data added in batches step by step until all of it has been trained, yielding the trained model.
In this embodiment the trained model is a neural network model that may process text data — for example, a text-content classification model, a text-content information extraction model, a semantic understanding model, and so on. The neural network model may equally be a network model for image processing, or for speech analysis, etc. The specific function and structure of the neural network model are not limited in this application.
The first stage is as follows: the first neural network model is trained on raw data.
The data set input for training the first neural network model is the training data set. In this application, the training data set may include the raw data and the total enhancement data. The raw data is the data collected for training; there are n pieces of it, and it may be text data, e.g. text data collected by the computing device. The raw data is collected with the user's consent, e.g. collected in a legal way, or collected after the user confirms that it may be collected. The total enhancement data is data obtained by performing data enhancement on the raw data; there are m pieces of it, and it may likewise be text data.
As shown in Fig. 2, during the first-stage training, the computing device may use a training method such as gradient descent to make the first neural network model converge. With the raw data acquired, it is input into the first neural network model for processing to obtain a pre-output result. During training, the ideal output result corresponding to each piece of raw data is known; the ideal output result can be regarded as the correct output of the first neural network model for that raw data. The computing device may compute the loss value between the pre-output result and the ideal output result, adjust the parameters of the first neural network model, and input the raw data again so that the loss value between the pre-output result and the ideal output result decreases, iterating in this way until the loss value no longer falls, at which point the first neural network model is determined to have converged. For the loss function, loss value, gradient descent and so on, see the descriptions of these concepts in the neural network training process above, which are not repeated here.
For example, the first neural network model may classify input text data and extract its key information, producing a text label and key information. Say one piece of raw data is "The MM1234 flight you are taking from city A to city B will take off at 12:00 on October 1, 2022 …"; after processing by the first neural network model, the pre-output result is the text label "flight" and the key information "departure place: city A; destination: city B; flight number: MM1234; takeoff time: 12:00 on October 1, 2022". y pieces of raw data can be input into the first neural network model as one batch, giving y pre-output results; the y ideal output results are known, and the y pre-output results with their corresponding y ideal output results can be fed into the loss function to obtain a loss value. Here y is a positive integer, the batch_size, e.g. 16, 32 or 64. Training over all the raw data once is one epoch, giving the loss values of all batches; the loss value of the epoch may be taken as the average of the batch loss values. Based on the epoch's loss value, the weight parameters for the next iteration are computed and the next epoch's training begins, until each epoch's loss value no longer decreases relative to the previous epoch (or several previous epochs); the first-stage training is then determined to be complete, yielding the first neural network model after first-stage training.
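As a sketch of this first-stage loop under a generic PyTorch-style interface — an assumption; model, loss_fn, optimizer and the early-stop rule below are illustrative, not the original implementation:

```python
def train_first_stage(model, raw_batches, loss_fn, optimizer):
    """Iterate epochs over the raw data until the mean epoch loss stops
    decreasing; `model`, `loss_fn` and `optimizer` follow a generic
    PyTorch-style interface, which is an assumption here."""
    best = float("inf")
    while True:
        batch_losses = []
        for x, ideal in raw_batches:      # one epoch over all raw data
            pred = model(x)               # pre-output result
            loss = loss_fn(pred, ideal)   # vs. the ideal output result
            optimizer.zero_grad()
            loss.backward()               # gradient-descent update
            optimizer.step()
            batch_losses.append(loss.item())
        epoch_loss = sum(batch_losses) / len(batch_losses)
        if epoch_loss >= best:            # loss no longer falls: converged
            return model
        best = epoch_loss
```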
After the first-stage training completes, the result of the last training pass may be taken as the pre-output result; that is, the pre-output result is the output of the first neural network model that is no longer being adjusted (training finished) in the first stage.
The second stage: the first neural network model continues to be trained with the raw data and the enhanced data.
With the raw data determined (as in the first stage), the computing device may determine the total enhancement data based on the raw data; the total enhancement data may be m pieces. The quantity ratio of raw data to total enhancement data is n:m, which may be 1:1, 1:5, 1:10, 1:20, etc.; the range of the ratio is not limited. Note that determining the total enhancement data and pre-training have no required order of execution; this application does not limit the execution order. In addition, since each piece of the total enhancement data is generated from one piece of raw data, the enhancement data and the raw data correspond to each other. Before the second-stage training starts, the amount of already-trained data within the total enhancement data is 0, and the amount of untrained data is m.
When the raw data is text data, it may undergo synonym replacement, or data enhancement by methods such as translation in the NLP data enhancement process, to obtain the total enhancement data; see the related description of the data enhancement process above, which is not repeated here.
In the above process, when the raw data is scarce the total enhancement data can be very large, and training on all of it at once cannot guarantee the accuracy and effectiveness of the training result: too large a ratio of total enhancement data to raw data may cause problems such as model degradation. Therefore, the second-stage training can proceed in steps, the training data of each step being the raw data plus a part of the enhancement data, with the amount of added enhancement data growing as the steps advance. In each step, the part of the enhancement data to train and the ratio of raw data to enhancement data are determined, and training is then carried out.
As shown in Fig. 2, after the first-stage training of the first neural network model completes, the second-stage training can begin. The second stage may be trained separately in Z steps, where Z is a positive integer. In each training step the computing device may train with a newly added portion of the enhancement data plus all the raw data, the added portion being drawn from the data in the total enhancement data that has not yet been trained. After the Z steps, all the total enhancement data has been input into the first neural network model for training.
During the Z steps of training, when the first neural network model of one step converges, the corresponding output data is determined and that step's training is deemed complete. The next step's training must continue from the output data and the first neural network model left by the previous step: in each step, the input data is the previous step's output data. The output data may include the training model, the non-added enhancement data, the added-and-screened enhancement data, the verification result, and so on. Taking the i-th step as an example, the training input of the i-th step is the (i-1)-th data, which may include the (i-1)-th training model, the (i-1)-th evaluation result, the (i-1)-th non-added data and the (i-1)-th screened data. Optionally, the (i-1)-th data may also include the (i-1)-th verification result.
Here the (i-1)-th training model is the first neural network model trained in step i-1; the (i-1)-th evaluation result is the result of evaluating the (i-1)-th non-added data with the (i-1)-th training model; the (i-1)-th added data is the portion of the total enhancement data that has participated in training during the first i-1 steps (the enhancement data added to the training set); the (i-1)-th non-added data is the portion that has not participated in training during the first i-1 steps; and the (i-1)-th screened data is the result of evaluating and screening the (i-1)-th added data during the first i-1 steps. The output of the i-th training is the i-th data, which may include the i-th training model, the i-th evaluation result, the i-th non-added data and the i-th screened data; optionally, the i-th data may also include the i-th verification result. The i-th data can be used for training in step i+1. i is an integer from 1 to Z.
In the first training step (i = 1), the input training data may include the pre-training data in the 0-th data together with the total enhancement data, where the pre-training data may include the pre-training model (understood as the 0-th training model, i.e. the first neural network model output by the first-stage training). At that point the 0-th non-added data is the total enhancement data and the 0-th screened data holds 0 pieces, i.e. no enhancement data has been trained yet; the first step's input may therefore lack screened data and an evaluation result. The output of the second stage as a whole is the Z-th training model.
In this embodiment, the i-th training model means the first neural network model trained in the i-th step; this is not repeated below.
The basic idea of each training step is the same; the i-th step is taken as an example and described in detail below.
As shown in Fig. 2, the training module may include a data sorting module, an enhancement data determining module, a model training module and a result evaluation module. The training module executes the i-th training step; the execution of each module is described as follows:
The data sorting module sorts the (i-1)-th non-added data according to the (i-1)-th evaluation result to obtain the (i-1)-th data sequence.
The data sorting module may receive the (i-1)-th evaluation result and the (i-1)-th non-added data. The (i-1)-th non-added data is the portion of the total enhancement data that has not participated in training during the first i-1 steps (enhancement data not added to the training set). The (i-1)-th evaluation result is the result of evaluating the (i-1)-th non-added data with the (i-1)-th training model.
Having received the (i-1)-th non-added data and the (i-1)-th evaluation result output by the (i-1)-th training step, the data sorting module can compute a first score value for each piece of enhancement data in the (i-1)-th non-added data based on the (i-1)-th evaluation result, and sort by the first score value to obtain the (i-1)-th data sequence. The (i-1)-th data sequence is the (i-1)-th non-added data after sorting, ordered from small to large score value.
Specifically, the data sorting module can determine the first score value from the text feature data of each piece of enhancement data in the (i-1)-th non-added data and from the corresponding evaluation result of the previous model: a first scoring index is determined from the text feature data, a second scoring index from the previous training model's evaluation result, and the two are combined to give the first score value.
First, the first scoring index score1 is determined from the text feature data of each piece of enhancement data in the (i-1)-th non-added data:
The data sorting module can acquire the text feature data of each piece of enhancement data in the (i-1)-th non-added data. The text feature data may include one or more of the following parameters: the text length l, the number of entities c, the number of text clauses s, and the edit distance distance(t_aug, t_ori) between the enhancement data t_aug and its original data t_ori.
Here the number of entities (information points) c indicates how many pieces of information are extracted. In a particular training model, the computing device can extract information from each piece of enhancement data and count the useful information points extracted. For example, for the enhancement data "Dear customer, your express parcel has arrived at the courier station; the pickup code is 1234; please collect it as soon as possible", the extracted information points are three: express parcel, courier station, and pickup code 1234. Different texts may yield different numbers of extracted information points. The text length l may be the number of characters of the piece of text. The number of text clauses s is the number of sentences into which the text is divided by punctuation marks. distance(t_aug_i-1, t_ori_i-1) may be the edit distance, i.e. the minimum number of editing operations needed to turn one of the strings t_aug_i-1, t_ori_i-1 into the other; the larger the distance, the more different they are. The permitted editing operations are replacing one character with another, inserting one character, and deleting one character. For the concrete calculation, see the description of the edit distance above, which is not repeated here.
Optionally, the first scoring index score1 is a weighted sum of the parameters of the text feature data. Based on the text length l, the number of entities c, the number of text clauses s and the edit distance, the data sorting module may determine the first scoring index as: score1 = α·l + β·c + γ·s + distance(t_aug_i-1, t_ori_i-1), where α, β and γ are weighting coefficients preset in the computing device and determined by the specific training requirements; this application does not limit them. t_aug_i-1 is the text string of one piece of enhancement data in the (i-1)-th non-added data, and t_ori_i-1 is the text string of the original data it corresponds to (the enhancement data t_aug_i-1 was produced by enhancing the original data t_ori_i-1, so the two correspond). Of course, the first scoring index may also use any subset of these terms, e.g. score1 = α·l + β·c; score1 = α·l + γ·s; score1 = α·l + distance(t_aug_i-1, t_ori_i-1); score1 = β·c + γ·s; score1 = β·c + distance(t_aug_i-1, t_ori_i-1); score1 = γ·s + distance(t_aug_i-1, t_ori_i-1); score1 = α·l + β·c + γ·s; score1 = β·c + γ·s + distance(t_aug_i-1, t_ori_i-1); etc. — this application is not limited here.
Optionally, the first scoring index score1 is a sum of normalized parameters of the text feature data: score1 = l/l_max + c/c_max + s/s_max + distance(t_aug_i-1, t_ori_i-1)/distance_max, where l_max, c_max, s_max and distance_max may be preset (positive) values or the maxima over all parameters; this application is not limited. l/l_max normalizes the text length; c/c_max normalizes the number of entities; s/s_max normalizes the number of text clauses; distance(t_aug_i-1, t_ori_i-1)/distance_max normalizes the edit distance. Of course, this form of the first scoring index may likewise use any subset of the normalized terms, e.g. score1 = l/l_max + c/c_max; score1 = l/l_max + s/s_max; score1 = c/c_max + distance(t_aug_i-1, t_ori_i-1)/distance_max; score1 = s/s_max + distance(t_aug_i-1, t_ori_i-1)/distance_max; etc. — this application is not limited here.
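The two variants of score1 can be written as one helper; a minimal sketch, in which the coefficient values and parameter names are illustrative assumptions:

```python
def score1(l, c, s, dist, weights=None, maxima=None):
    """First scoring index. l: text length, c: entity count, s: clause
    count, dist: edit distance to the original. With `weights` given the
    weighted form is used; with `maxima` given, the normalized form.
    All coefficient values are illustrative assumptions."""
    if weights is not None:
        alpha, beta, gamma = weights
        return alpha * l + beta * c + gamma * s + dist
    l_max, c_max, s_max, d_max = maxima   # preset (or observed) maxima
    return l / l_max + c / c_max + s / s_max + dist / d_max

# e.g. score1(40, 3, 2, 5, weights=(0.1, 1.0, 0.5))
```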
Second, the second scoring index score2 of each piece of enhancement data in the (i-1)-th non-added data is determined from the evaluation result of the previous step's training model (the (i-1)-th evaluation result):
After the computing device completes the (i-1)-th training step, each piece of the (i-1)-th non-added data has a corresponding evaluation result: each piece of enhancement data in the (i-1)-th non-added data is evaluated by the (i-1)-th training model, giving the (i-1)-th evaluation result logit_aug_i-1. For each piece of the (i-1)-th non-added data, the data sorting module can compute the loss value loss(logit_aug_i-1, label_aug_i-1) and the confidence confidence(logit_aug_i-1, label_aug_i-1) between the output logit_aug_i-1 and the ideal result label_aug_i-1. The data sorting module may then combine the loss value and the confidence to obtain the second scoring index score2.
Here loss(x, y) denotes a loss value between x and y, representing the difference between them; the loss calculation may differ across neural network models. confidence(x, y) denotes the confidence between x and y, i.e. the confidence that the evaluation result equals the ideal result. When x and y are discrete results, such as classification or label results, the confidence can be computed with softmax. In this embodiment, the computing device stores the ideal results of all the enhancement data; this is not repeated later.
Optionally, the second scoring index score2 is a weighted combination of the loss value and the confidence. The data sorting module may determine score2 = loss(logit_aug_i-1, label_aug_i-1) − μ·confidence(logit_aug_i-1, label_aug_i-1), where μ is a weight coefficient preset in the computing device and determined by the specific training requirements. The weighted form here should match the weighted form used for score1 above.
Optionally, the second scoring index score2 is a combination of the normalized loss value and confidence. The data sorting module may determine score2 = loss(logit_aug_i-1, label_aug_i-1)/loss_max − confidence(logit_aug_i-1, label_aug_i-1)/confidence_max, where loss_max and confidence_max may be preset (positive) values or the maxima over all parameters; this application is not limited. loss(logit_aug_i-1, label_aug_i-1)/loss_max normalizes the loss value, and confidence(logit_aug_i-1, label_aug_i-1)/confidence_max normalizes the confidence. The normalized form here should match the normalized form used for score1 above.
Finally, the first score value s1 is determined from the first scoring index score1 and the second scoring index score2:
Optionally, the first score value s1 is a weighted sum of the first scoring index score1 and the second scoring index score2: the data sorting module may determine s1 = score1 + ν·score2, where ν is a value preset in the computing device and determined by the specific training requirements. The weighted form here should match the weighted forms used for score1 and score2 above.
Optionally, the first score value s1 is the plain sum of the first scoring index score1 and the second scoring index score2: the data sorting module may determine s1 = score1 + score2. The normalized form here should match the normalized forms used for score1 and score2 above.
After the first score value of each piece of the (i-1)-th non-added data is determined, the (i-1)-th non-added data can be sorted by s1 to obtain the (i-1)-th data sequence. Specifically, the data sorting module sorts the (i-1)-th non-added data by s1 from small to large: the smaller s1 is, the earlier the piece appears; the larger s1 is, the later it appears. The data sorting module may then send the (i-1)-th data sequence to the proportion determining module; correspondingly, the proportion determining module receives the (i-1)-th data sequence from the data sorting module.
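Putting the pieces together, a minimal sketch of the sorting step; the item field names, loss_fn/conf_fn callbacks and the weights are assumptions, and score1 refers to the sketch above:

```python
def build_data_sequence(non_added, loss_fn, conf_fn, mu=1.0, nu=1.0):
    """Score and sort the (i-1)-th non-added data into the (i-1)-th data
    sequence. Each item is assumed to be a dict holding a feature tuple
    `feat` = (l, c, s, dist), the previous model's output `logit` and
    the ideal result `label`."""
    for item in non_added:
        s2 = (loss_fn(item["logit"], item["label"])
              - mu * conf_fn(item["logit"], item["label"]))  # second index
        item["s1"] = score1(*item["feat"], weights=(1.0, 1.0, 1.0)) + nu * s2
    # ascending: smaller s1 (easier enhancement data) trains earlier
    return sorted(non_added, key=lambda item: item["s1"])
```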
In the above embodiment, the computing device determines the score value from the text feature data and from the previous step's results, i.e. the score reflects both the characteristics of the enhancement data itself and how well the enhancement data is evaluated. Through this process, the remaining untrained enhancement data is sorted before training: enhancement data with simpler text features, which the model can more easily predict correctly, is fed to the neural network model first. That is, enhancement data is added to the training set from easy to hard, ensuring that higher-quality data is used in the early stage when training benefits most, which makes the training model more stable and more effective.
In the above process, the amount of the (i-1)-th non-added data shrinks as i grows; when i reaches Z, the amount of the Z-th non-added data is 0, i.e. no enhancement data remains to output.
The enhancement data determining module determines the enhancement data input to the i-th training step and may include a proportion determining module and a scale adjustment module. The enhancement data determining module may determine the i-th training data based on the (i-1)-th data sequence, the (i-1)-th screened data and the raw data. The i-th training data is the training set of the i-th step and may include raw data and enhancement data.
The proportion determining module is used for determining the proportion (i-th enhancement proportion) of the enhancement data needing to be trained in the current step (i-th training) to all the enhancement data (total enhancement data).
The proportion determining module may first determine the i-th enhancement ratio — the share, by count, of the total enhancement data that participates in the i-th training — and then determine the i-th incremental data from the (i-1)-th data sequence based on that ratio. The i-th incremental data is the newly added part of the enhancement data input into the first neural network model for training in the i-th step.
In this embodiment, over the training from step 1 to step Z, the enhancement ratio of a later step is greater than or equal to that of the preceding step. That is, with the current step i taking values from 1 to Z in order, λ(i) ≥ λ(i-1), where λ(i) is the i-th enhancement ratio determined in the i-th training step and λ(i-1) is the (i-1)-th enhancement ratio determined in the (i-1)-th training step; the enhancement ratio is the proportion of the enhancement data participating in the current step's training to the m pieces of total enhancement data.
Based on the above description, two methods for determining the ith enhancement ratio are described below.
The method comprises the following steps: determining an ith enhancement ratio based on a function:
The proportion determining module may calculate the function proportion d of the current step as a preset function of the step i with parameters Z and λ0, both set values (the exact formula was given as an image in the original; d is non-decreasing in i). Z is the maximum training step: when i reaches Z, d = 1, so the i-th enhancement ratio is 1 and all remaining enhancement data is trained.
Having determined the function proportion d, the proportion determining module may take the i-th enhancement ratio as the minimum of the function proportion and 1, i.e. the i-th enhancement ratio of the i-th training step is λ(i) = min(1, d), where min(1, d) is the smaller of 1 and d.
Fig. 3 is a schematic diagram of the function between the i-th enhancement ratio and the training step i disclosed in an embodiment of this application. As shown in Fig. 3, the function proportion d grows as i grows, and it grows non-uniformly: the increase is larger in the early stage and smaller in the later stage. i is at most Z (here Z = 100), so for i greater than Z, λ(i) can be 1 and stays at its maximum value of 1. The function shows that as the training steps increase, the amount of newly added enhancement data shrinks, so the early enhancement data can substantially influence the model while the later model tends to be stable, giving a better training result.
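A sketch of the method-1 schedule. Since the exact function d was supplied as an image in the original, the square-root curve below is only an assumed stand-in with the described properties (it starts near λ0, grows faster early than late, and reaches 1 at i = Z):

```python
import math

def function_ratio(i, Z, lam0=0.1):
    """Method-1 schedule lambda(i) = min(1, d); the square-root form of
    d is an assumption, not the original formula."""
    d = lam0 + (1 - lam0) * math.sqrt(i / Z)
    return min(1.0, d)
```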
The method 2 comprises the following steps: and dynamically adjusting the ith enhancement ratio based on the verification result of the previous step:
The proportion determining module can obtain the (i-1)-th verification result, which verifies the first neural network model trained in the previous step (step i-1): the verification data is input into the current (i-1)-th training model, and the verification result is logit_dev. The ideal result of the verification data is label_dev, the known correct output for the verification data, which the computing device may store in advance. Based on the (i-1)-th verification result logit_dev and the ideal result label_dev, the proportion determining module can determine a gap value k between them.
The gap value is computed from the two most recent verification losses, e.g. as the relative improvement k = (loss_pre − loss_post) / loss_pre (the original formula was given as an image). Here loss_pre denotes the loss value between the (i-2)-th verification result and the ideal result; it was computed (and stored) in the previous step and can be used directly in calculating k. loss_post denotes the loss value between the (i-1)-th verification result — obtained by inputting the verification data into the current (i-1)-th training model — and the ideal result.
As in method 1, the proportion determining module also needs to determine the function proportion d; for the specific method, see the description in method 1 above, which is not repeated.
After k and d are determined, the proportion determining module may obtain the i-th enhancement ratio as λ(i) = max(1, 1 + k)·d, where max(x, y) is the larger of x and y.
When determining the i-th enhancement ratio by method 2, the ratio is adjusted based on the previous step's verification of the training model, and k can be positive or negative. In one case, loss_pre is greater than loss_post and k is positive: the loss improved over the last two training steps, the i-th enhancement ratio is (1 + k)·d, and the incremental data grows faster. In the other case, loss_pre is less than loss_post and k is negative: the loss worsened over the last two steps, the i-th enhancement ratio is d, and incremental data is added according to d alone. In this way, when verification looks good, incremental data can be added ahead of schedule, improving the output effect, the training accuracy and the training efficiency; when verification looks poor, the increase simply follows d.
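A sketch of the method-2 adjustment; the relative-improvement form of k and the final clamp to 1 are assumptions consistent with the sign behaviour described above:

```python
import math

def dynamic_ratio(i, Z, loss_pre, loss_post, lam0=0.1):
    """Method-2 schedule lambda(i) = max(1, 1 + k) * d; k's exact form
    and the clamp are assumptions, since the original formula is lost."""
    k = (loss_pre - loss_post) / loss_pre        # k > 0 when loss improved
    d = lam0 + (1 - lam0) * math.sqrt(i / Z)     # same assumed d as method 1
    return min(1.0, max(1.0, 1.0 + k) * d)
```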
After determining the i-th enhancement ratio, the proportion determining module may determine the i-th incremental data based on the i-th enhancement ratio and the (i-1)-th data sequence. The proportion determining module knows the enhancement ratio λ(i-1) from the previous step (the (i-1)-th training), and 1 − λ(i-1) is the share of the (i-1)-th non-added data in the total enhancement data. With m pieces of total enhancement data, the i-th incremental data can be determined as the first m·(λ(i) − λ(i-1)) pieces of the (i-1)-th data sequence, i.e. the i-th incremental data comprises m·(λ(i) − λ(i-1)) pieces. The proportion determining module may then send the i-th incremental data to the scale adjustment module; correspondingly, the scale adjustment module receives the i-th incremental data from the proportion determining module. At this point the (i-1)-th non-added data is exactly the enhancement data that has not participated in training during the first i-1 steps and is waiting to join the training.
For example, suppose one step of training is one epoch, so the i-th step is the i-th epoch and its training data includes the raw data and the i-th incremental data. Suppose m = 10000 pieces of total enhancement data, Z is set to 20 (20 epochs in total), the i-th enhancement ratio λ(i) of the i-th epoch is 35.5%, and the (i-1)-th enhancement ratio λ(i-1) of the (i-1)-th epoch is 30.0%. The newly input enhancement data is then 35.5% − 30.0% = 5.5% of the total, so the computing device determines the i-th incremental data as 10000 × 5.5% = 550 pieces, i.e. the first 550 pieces of the (i-1)-th data sequence can be taken as the i-th incremental data. The above is illustrative and not restrictive.
In addition, the proportion determining module can acquire the (i-1)-th screened data, and may determine the i-th non-added data and the i-th added data from the (i-1)-th screened data, the i-th incremental data and the (i-1)-th data sequence. The i-th non-added data is the (i-1)-th non-added data (or the (i-1)-th data sequence) minus the i-th incremental data: the i-th incremental data is removed from the (i-1)-th non-added data and the remainder is the i-th non-added data. The i-th added data is the (i-1)-th screened data plus the i-th incremental data: the i-th incremental data and the (i-1)-th screened data together form the i-th added data. Thereafter, the proportion determining module may send the i-th non-added data and the i-th added data to the result evaluation module; correspondingly, the result evaluation module receives the i-th non-added data and the i-th added data from the proportion determining module. Here the i-th non-added data is the enhancement data that has not participated in training during the first i training steps; the (i-1)-th non-added data is the enhancement data that has not participated during the first i-1 steps; and the i-th added data is the enhancement data that has participated in training during the first i steps.
Illustratively, suppose the (i-1)-th screened data is 5000 pieces, the (i-1)-th data sequence is 5000 pieces, and the i-th incremental data is 550 pieces. Removing the 550 pieces of i-th incremental data from the 5000 pieces of (i-1)-th non-added data leaves 4450 pieces of i-th non-added data. The i-th added data is the 5000 pieces of (i-1)-th screened data plus the 550 pieces of i-th incremental data, i.e. 5550 pieces.
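The bookkeeping in this example can be sketched as follows; the list-based representation and the helper name are assumptions:

```python
def split_step_data(sequence, screened_prev, m, lam_i, lam_prev):
    """Bookkeeping for step i: the first m*(lam_i - lam_prev) items of
    the (i-1)-th data sequence become the i-th incremental data; the
    rest is the i-th non-added data, and the i-th added data is the
    (i-1)-th screened data plus the increment."""
    count = round(m * (lam_i - lam_prev))
    incremental = sequence[:count]         # i-th incremental data
    non_added = sequence[count:]           # i-th non-added data
    added = screened_prev + incremental    # i-th added data
    return incremental, non_added, added
```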
At this point the proportion determining module has determined the enhancement data participating in the current step's training, namely the i-th incremental data together with the (i-1)-th screened data. In the above process, the computing device determines, through the i-th enhancement ratio, which part of the total enhancement data participates in training. The size of the i-th enhancement ratio is adjustable, which means the amount of enhancement data is adjustable, so both the training effect and the training efficiency can be improved.
The scale adjustment module determines the ratio between the enhancement data and the raw data in the amount of data trained.
The scale adjustment module may determine the i-th training data once the raw data, the (i-1)-th screened data and the i-th incremental data are determined. The data participating in the i-th training comprises the raw data and the i-th added data, where the i-th added data comprises the (i-1)-th screened data and the i-th incremental data.
Specifically, the scale adjustment module may also receive the (i-1)-th verification result (see the description of the proportion determining module, not repeated), determine an i-th training ratio based on it, and then adjust the ratio of raw data to enhancement data participating in training according to the i-th training ratio to obtain the i-th training data. The i-th training ratio is the ratio, by data amount, of the enhancement data to the raw data in the i-th training. The i-th training data is the training data formed by distributing the raw data and the enhancement data (the i-th added data) at the i-th training ratio.
Optionally, the scale adjustment module may pre-store an initial per-batch quantity ratio of raw data to enhancement data, batch_ori : batch_aug, where batch_ori is the initial share of raw data and batch_aug the initial share of enhancement data. The scale adjustment module may determine an adjustment value Δ = batch_ori·(1 − k) based on k, the gap value between the verification results of the (i-1)-th and (i-2)-th steps (for its calculation, see the description in the proportion determining module; the stored result can be reused). The scale adjustment module then adjusts batch_ori and batch_aug by Δ, determining the i-th training ratio of the i-th step as batch_ori(i) : batch_aug(i) = (batch_ori − Δ) : (batch_aug + Δ), i.e. batch_ori(i) = batch_ori − Δ and batch_aug(i) = batch_aug + Δ. batch_ori and batch_aug are preset values; the i-th training ratio is the ratio, by training count, of raw data to enhancement data in the i-th training.
Optionally, the scale adjustment module may also directly take the i-th training ratio batch_ori(i) : batch_aug(i) to be the initial ratio batch_ori : batch_aug. In that case no adjustment is needed, which simplifies the training process and improves training efficiency.
Having determined the i-th training ratio, the computing device may determine the i-th training data from the i-th training ratio, the raw data and the i-th added data. The i-th training data reflects the training ratio of raw to enhancement data: the ratio, by training count, of the i-th added data to the n pieces of raw data is the i-th training ratio, and this ratio can be kept at a suitable size, for example about 1:1. In this way the enhancement data is prevented from acting excessively on the neural network training model, model deviation is avoided, and the output effect of the trained first neural network model is guaranteed.
Illustratively, assume a batch has 32 pieces of data and the initial ratio of raw to enhancement data is batch_ori : batch_aug = 16 : 16. When Δ is computed to be 2, batch_ori(i) : batch_aug(i) = 14 : 18 can be determined. If the current number of raw data is n = 1400 and the number of enhancement data is 1200, 2600 pieces of data are available for one epoch. In that case 600 of the 1200 pieces of enhancement data can be repeated (re-read) so that their total becomes 1800, keeping the count ratio of raw data to enhancement data at 14 : 18. The i-th training data is then 1400 pieces of raw data and 1800 pieces of enhancement data (600 of which are repeats).
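A sketch of the ratio adjustment and the oversampling used in this example; the parameter names, the default values and the with-replacement sampling are assumptions:

```python
import math, random

def balance_batches(raw, added, batch_ori=16, batch_aug=16, delta=2):
    """Shift the per-batch raw/enhanced split by the adjustment value
    delta, then oversample the enhanced data so every batch can keep
    that split."""
    b_ori, b_aug = batch_ori - delta, batch_aug + delta
    # enhanced pieces needed to pair with all raw data at ratio b_ori:b_aug
    need = math.ceil(len(raw) * b_aug / b_ori)
    extra = random.choices(added, k=need - len(added)) if need > len(added) else []
    return raw, added + extra, (b_ori, b_aug)

# With 1400 raw and 1200 enhanced pieces this re-reads 600 enhanced
# pieces, reproducing the 14:18 split of the example above.
```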
The model training module inputs the i-th training data into the (i-1)-th training model for training to obtain the i-th training model.
The first neural network model trained in the i-th step is the network model that converged in the (i-1)-th step. In the i-th step's model training, the training set is the i-th training data, which is input into the (i-1)-th training model; the converged model result, the i-th training model, is output. During training, the parameters of the first neural network model are adjusted in the direction that reduces the loss value until the loss no longer falls, at which point the i-th step is determined to be complete and the next step (step i+1) can begin. The (i-1)-th training model is the model result obtained by the (i-1)-th (previous) step of training; the i-th training model is the model result obtained by the i-th (current) step of training.
After the model training module obtains the ith training model, the ith training model can be sent to the result evaluation module; correspondingly, the result evaluation module may receive an ith training model from the model training module.
The result evaluation module evaluates the i-th added data with the i-th training model to obtain the output result of the i-th added data, and screens the i-th added data based on that output result to obtain the i-th screened data. In addition, the result evaluation module can input the i-th non-added data into the i-th training model to obtain the i-th evaluation result, and can input the verification data into the i-th training model to obtain the i-th verification result.
Firstly, screening abnormal data in the ith added data to obtain the ith screened data.
With the i-th added data and the i-th training model available, the result evaluation module can input the i-th added data into the i-th training model to obtain its output result, and screen the i-th added data based on that output to obtain the i-th screened data. The i-th screened data is the result of screening the enhancement data that has participated in training during the first i training steps.
The result evaluation module may first determine a third scoring index based on the text feature data of the ith added data. When the model training module completes training and obtains the ith training model, the result evaluation module may input the ith added data to the ith training model to obtain an output result, and score the output result to obtain a fourth scoring index. The result evaluation module may then determine a second score value based on the third scoring index and the fourth scoring index.
First, a third score index score3 is determined based on the text feature data of each piece of enhancement data in the ith added data:
Optionally, the third scoring index score3 is a weighted sum of the parameters of the text feature data of each piece of the i-th added data. For each piece of data, the result evaluation module may determine the third scoring index as: score3 = α·l + β·c + γ·s + distance(t_aug_i, t_ori_i), where t_aug_i is a piece of the i-th added data and t_ori_i is the original data corresponding to it. For the specific parameters in the calculation, see the description of the first scoring index in the data sorting module, which is not repeated.
Optionally, the third scoring index score3 is a sum of normalized parameters of the text feature data: score3 = l/l_max + c/c_max + s/s_max + distance(t_aug_i, t_ori_i)/distance_max, where l_max, c_max, s_max and distance_max may be preset (positive) values or the maxima over all parameters; this application is not limited. l/l_max normalizes the text length, c/c_max the number of entities, s/s_max the number of text clauses, and distance(t_aug_i, t_ori_i)/distance_max the edit distance.
The specific parameters involved in the calculation process of the two methods may refer to the description related to the first score index in the data sorting module, which is not repeated herein.
Next, a fourth score index score4 is calculated based on the output result of the ith added data:
Optionally, the fourth scoring index score4 is a weighted combination of the loss value and the confidence. The result evaluation module may determine score4 = loss(logit_aug_i, label_aug_i) − μ·confidence(logit_aug_i, label_aug_i), where μ is a weight coefficient preset in the computing device and determined by the specific training requirements, logit_aug_i is the output result of a piece of the i-th added data under the i-th training model, and label_aug_i is its ideal result; each piece of the i-th added data has an output result logit_aug_i and an ideal result label_aug_i. The weighted form here should match the weighted form used for score3 above.
Optionally, the fourth scoring index score4 is a combination of the normalized loss value and confidence. The result evaluation module may determine score4 = loss(logit_aug_i, label_aug_i)/loss_max − confidence(logit_aug_i, label_aug_i)/confidence_max. The normalized form here should match the normalized form used for score3 above.
The specific parameters involved in the calculation process of the two methods may refer to the relevant description of the second scoring index in the data sorting module, which is not repeated herein.
Finally, the second score value s2 is determined from the third scoring index score3 and the fourth scoring index score4:
Optionally, a second score value s 2 For a weighted summation of the third scoring index score3 and the fourth scoring index score3, the result evaluation module may determine a second score value s 2 (= score3+ ν score 4). Wherein ν is a value preset by the computing device, and needs to be specifically determined according to training needs. Where the weighted sum corresponds to the weighted sum in the previous steps score3 and score4, the method should remain consistent.
Optionally, the second score value s2 is the plain sum of the third scoring index score3 and the fourth scoring index score4: the result evaluation module may determine s2 = score3 + score4. The normalized form here should match the normalized forms used for score3 and score4 above.
The calculation of the second score value may also follow that of the first score value; for the specific evaluation method, see the data sorting module, which is not detailed here.
After the result evaluation module obtains the i-th added data, it can screen the i-th added data based on the output result of the i-th added data to obtain the i-th screened data.
The result evaluation module may determine the abnormal data in the i-th added data based on its output result. Specifically, the result evaluation module may mark as abnormal any enhancement data in the i-th added data whose second score value s2 exceeds a first threshold, and remove the abnormal data from the i-th added data to obtain the i-th screened data. The output result of the i-th added data is the result of inputting the i-th added data into the i-th training model.
Optionally, a second score value s of each piece of data in the ith added data is obtained 2 Thereafter, the result evaluation module can determine anomalous data. At a certain second score value s 2 In case of a second score value significantly larger than the other ith incremental data, this data may be determined as anomalous data. In particular, s 2_n -s value >f (first threshold), s can be determined 2_n The corresponding enhancement data is anomalous data. Wherein s is 2_n For one of X pieces of data of ith incremental data, s value Second score value s for X ith incremental data 2 F is a preset value.
Alternatively, the result evaluation module may determine the first threshold according to the σ, 2σ and 3σ rules of the Gaussian probability density function, and screen the i-th added data for abnormal data accordingly.
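Both screening rules can be sketched together; the fixed margin f and the 3σ fallback follow the two options above, with names and defaults assumed:

```python
import statistics

def screen_added(added, scores, f=None):
    """Drop abnormal enhanced data whose second score value s2 lies far
    above the mean: with f given, use the fixed margin s2 - mean > f;
    otherwise fall back to the 3-sigma rule."""
    mean = statistics.mean(scores)
    margin = f if f is not None else 3 * statistics.pstdev(scores)
    return [item for item, s2 in zip(added, scores) if s2 - mean <= margin]
```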
The above two ways are merely exemplary, and the method for determining the abnormal data is not limited in the embodiment of the present application.
After the abnormal enhancement data is determined, it can be removed from the i-th added data to obtain the i-th screened data.
Illustratively, if the i-th added data is 5550 pieces of which 2 are abnormal, the resulting i-th screened data is 5548 pieces. In the next training step this avoids abnormal data degrading the model, so the convergence of the first neural network model can be optimized and the influence of outliers on the model avoided.
Secondly, the i-th non-added data is input into the i-th training model to obtain the i-th evaluation result.
With the i-th training model obtained when the i-th training finishes, the result evaluation module can input the i-th non-added data into it, evaluating the data that has not participated in training; the output is the i-th evaluation result, which the data sorting module uses for sorting in the next step (step i+1).
Thirdly, the verification data is input into the i-th training model to obtain the i-th verification result.
The computing device stores verification data, and the result evaluation module can input it into the i-th training model to obtain the i-th verification result. The i-th verification result can be used in step i+1 by the enhancement data determining module to determine the next step's incremental data and training data.
In this embodiment, all the enhancement data should have entered the training set before the Z-th training step. Illustratively, training can be declared finished after H further steps once all the enhancement data has been input into the training model as the training set, where H is an integer greater than 1 and smaller than Z. Suppose the enhancement data grows to its maximum from step 1 to step H, so that all of it has been input into the model for training; then from step H+1 to step Z, all the enhancement data minus the removed abnormal data serves as the training set. In this way abnormal data is screened dynamically during training, and training continues after all of it is removed, improving the effect of the training model and guaranteeing the indexes of the output model.
The above modules are shared across the training of steps 1 to Z, so the training result of the previous step is retained, and the data of the previous step, the untrained enhanced data and the like can be determined directly. As the value of i increases from 1 to Z, the computing device trains the first neural network model in each step until the Zth training step is completed.
In addition, among the four training modules, the data sorting module, the enhanced data determination module and the result evaluation module are all optional. These three modules may be combined in any way; for the methods formed by the various combinations, reference may be made to the above description, and details are not repeated here. Further, the enhanced data determination module may include a proportion determination module and/or a scale adjustment module.
To compare the training method of the embodiment of the present application more accurately, the training results are compared against a general method. The general method directly inputs the original data and the enhanced data into the neural network model for training. The method of the present application divides training into two stages: the first stage trains on the original data, and the second stage adds the enhanced data step by step in proportion. For a neural network model performing intention inference and information extraction, the results of the two methods are compared: with the general training method, the accuracy of the intention inference result output by the trained neural network model is 95.53%, and the accuracy of information extraction is 97.27%; with the training method of the present application, the accuracy of intention inference is 96.25% and the accuracy of information extraction is 97.62%. The comparison shows that the neural network model trained by the embodiment of the present application achieves higher accuracy, and adding the enhanced data in steps at different proportions during training keeps the neural network model from larger deviation, so the training result is more stable.
In the above embodiment, the ith incremental data determined by the dynamically generated ith enhancement ratio is added to the training set for training, and the original data and the enhanced data are allocated according to the dynamically changing ith training proportion, which can improve the indexes of the training model and reduce the data offset. Adding the incremental data to the training set step by step guarantees that all the enhanced data participates in the training process; gradually adding the incremental data improves the reliability of model training and the model effect, gives full play to the enhanced data, and at the same time avoids deviation of the training model, improving the training effect. After the model training of each step is finished, abnormal noise data is screened out, which solves the noise problem caused by data enhancement and improves the training indexes of the training model in data-scarce scenarios.
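To tie the modules together, the stepwise procedure summarized above can be sketched as follows. This is a schematic outline under stated assumptions, not the patent's implementation: train_step, ratio_fn and screen_fn stand in for the model training module, the enhanced data determination module and the result evaluation module, and the unsorted pool corresponds to the variant without the data sorting module described next.

```python
def train_in_z_steps(model, raw_data, aug_data, Z, ratio_fn, train_step, screen_fn):
    added = []                  # screened enhanced data already in training
    pool = list(aug_data)       # enhanced data not yet added (non-added data)
    taken = 0
    for i in range(1, Z + 1):
        lam = ratio_fn(i)                        # ith enhancement ratio lambda(i)
        n_target = int(len(aug_data) * lam)      # pieces that should be in by step i
        increment, pool = pool[:n_target - taken], pool[n_target - taken:]
        taken = n_target                         # ith incremental data now taken
        model = train_step(model, raw_data + added + increment)  # ith training model
        added = screen_fn(model, added + increment)              # ith screened data
    return model
```

With ratio_fn set to the enhancement_ratio sketch above, all enhanced data enters the training set by step H, and steps H+1 to Z train on the screened set.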
With reference to the model training method shown in fig. 2, fig. 4 is a flowchart illustrating another model training method according to an embodiment of the present application. As shown in fig. 4, in this model training method the computing device includes only the enhanced data determination module, the result evaluation module and the model training module, i.e., it does not include the data sorting module of fig. 2. The differences between the following model training method and the method in fig. 2 are described in detail below.
As shown in fig. 4, in the training of the ith step, the (i-1)th data input into the ith step for training comprises the (i-1)th non-added data, the (i-1)th screened data, the (i-1)th verification result and the (i-1)th training model. Since the computing device does not sort through a data sorting module, no (i-1)th evaluation result is input, and the (i-1)th data does not comprise the (i-1)th evaluation result.
As shown in fig. 4, after determining the ith enhancement ratio λ(i), the proportion determination module may determine m·(λ(i) − λ(i-1)) pieces of data among the (i-1)th non-added data as the ith incremental data. Here, the input of the proportion determination module replaces the (i-1)th data sequence in fig. 2: the (i-1)th data to be added is the enhanced data that has not participated in training in the previous i-1 steps and is waiting to participate, and in this embodiment it is simply the (i-1)th non-added data.
As shown in fig. 4, the result evaluation module may evaluate the ith added data based on the ith training model to obtain an output result of the ith added data, and remove the abnormal data in the ith added data based on that output result to obtain the ith screened data. In addition, the result evaluation module can also input the verification data into the ith training model to obtain the ith verification result. Unlike fig. 2, the result evaluation module does not need to input the ith non-added data into the ith training model to obtain an ith evaluation result.
In the embodiment of the method shown in fig. 4, the sorting process is removed; while the training effect is guaranteed, the computational load of the computing device is reduced and the training method executes more efficiently than that shown in fig. 2.
In this embodiment of the application, the training process is performed by a computing device, which may be a cloud device, for example, a server. After training is completed, the first neural network model can be delivered to the terminal device, which then enters the use stage.
The following describes an apparatus according to an embodiment of the present application.
Fig. 5 is a schematic hardware structure diagram of a computing device 100 according to an embodiment of the present disclosure.
Computing device 100 may include a processor 110, an external memory interface 120, an internal memory 121, a Universal Serial Bus (USB) interface 130, a charging management module 140, a power management module 141, a battery 142, an antenna 1, an antenna 2, a mobile communication module 150, and a wireless communication module 160.
It is to be understood that the illustrated architecture of the embodiments of the invention does not constitute a specific limitation of the computing device 100. In other embodiments of the present application, computing device 100 may include more or fewer components than shown, or some components may be combined, some components may be split, or a different arrangement of components. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.
Processor 110 may include one or more processing units, such as: the Processor 110 may include an Application Processor (AP), a modem Processor, a Graphics Processing Unit (GPU), an Image Signal Processor (ISP), a controller, a memory, a video codec, a Digital Signal Processor (DSP), a baseband Processor, and/or a Neural-Network Processing Unit (NPU), etc. The different processing units may be separate devices or may be integrated into one or more processors.
Among other things, the controller can be a neural center and a command center of the computing device 100. The controller can generate an operation control signal according to the instruction operation code and the timing signal to complete the control of instruction fetching and instruction execution.
A memory may also be provided in the processor 110 for storing instructions and data. In some embodiments, the memory in the processor 110 is a cache memory. The memory may hold instructions or data that have just been used or recycled by the processor 110. If the processor 110 needs to reuse the instruction or data, it can be called directly from the memory. Avoiding repeated accesses reduces the latency of the processor 110, thereby increasing the efficiency of the system.
The charging management module 140 is configured to receive charging input from a charger. The charging management module 140 may also provide power to the computing device 100 via the power management module 141 while charging the battery 142. In an embodiment of the present application, the charging management module may include a battery charging module.
The power management module 141 is used to connect the battery 142, the charging management module 140 and the processor 110. The power management module 141 receives input from the battery 142 and/or the charge management module 140 and provides power to the processor 110, the internal memory 121, the external memory, the display 194, the camera 193, the wireless communication module 160, and the like.
The wireless communication function of the computing device 100 may be implemented by the antenna 1, the antenna 2, the mobile communication module 150, the wireless communication module 160, a modem processor, a baseband processor, and the like.
The antennas 1 and 2 are used for transmitting and receiving electromagnetic wave signals. Each antenna in computing device 100 may be used to cover a single or multiple communication bands. Different antennas can also be multiplexed to improve the utilization of the antennas.
The mobile communication module 150 may provide a solution for applications on the computing device 100 that includes wireless communication, such as 2G/3G/4G/5G. The mobile communication module 150 may include at least one filter, a switch, a power Amplifier, a Low Noise Amplifier (LNA), and the like. The mobile communication module 150 may receive the electromagnetic wave from the antenna 1, filter, amplify, etc. the received electromagnetic wave, and transmit the electromagnetic wave to the modem processor for demodulation. The mobile communication module 150 may also amplify the signal modulated by the modem processor, and convert the signal into electromagnetic wave through the antenna 1 to radiate the electromagnetic wave.
The modem processor may include a modulator and a demodulator. The modulator is used for modulating a low-frequency baseband signal to be transmitted into a medium-high frequency signal. The demodulator is used for demodulating the received electromagnetic wave signal into a low-frequency baseband signal. The demodulator then passes the demodulated low frequency baseband signal to a baseband processor for processing. The low frequency baseband signal is processed by the baseband processor and then transferred to the application processor. The application processor outputs a sound signal through an audio device (not limited to the speaker 170A, the receiver 170B, etc.) or displays an image or video through the display screen 194.
The wireless communication module 160 may provide solutions for wireless communication applied to the computing device 100, including Wireless Local Area Networks (WLAN) (e.g., Wireless Fidelity (Wi-Fi) networks), Bluetooth (BT), Global Navigation Satellite System (GNSS), Frequency Modulation (FM), Near Field Communication (NFC), Infrared (IR), and so on. The wireless communication module 160 may be one or more devices integrating at least one communication processing module. The wireless communication module 160 receives electromagnetic waves via the antenna 2, performs frequency modulation and filtering processing on electromagnetic wave signals, and transmits the processed signals to the processor 110. The wireless communication module 160 may also receive a signal to be transmitted from the processor 110, perform frequency modulation and amplification on the signal, and convert the signal into electromagnetic waves through the antenna 2 to radiate the electromagnetic waves.
The external memory interface 120 may be used to connect an external memory card, such as a Micro SD card, to extend the storage capability of the computing device 100. The external memory card communicates with the processor 110 through the external memory interface 120 to implement a data storage function. For example, files such as music, video, etc. are saved in an external memory card.
The internal memory 121 may be used to store computer-executable program code, which includes instructions. The processor 110 executes various functional applications of the computing device 100 and data processing by executing instructions stored in the internal memory 121. The internal memory 121 may include a program storage area and a data storage area. The storage program area may store an operating system, an application program (such as a sound playing function, an image video playing function, and the like) required by at least one function, and the like. The data storage area may store data created during use of the computing device 100 (e.g., audio data, phone books, etc.), and the like.
As used in the above embodiments, the term "when…" may be interpreted to mean "if…", "after…", "in response to determining…" or "in response to detecting…", depending on the context. Similarly, the phrase "upon determining…" or "if (a stated condition or event) is detected" may be interpreted to mean "if it is determined…", "in response to determining…", "upon detecting (a stated condition or event)" or "in response to detecting (a stated condition or event)", depending on the context.
In the above embodiments, the implementation may be wholly or partially realized by software, hardware, firmware, or any combination thereof. When implemented in software, it may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. The procedures or functions described in accordance with the embodiments of the application are all or partially generated when the computer program instructions are loaded and executed on a computer. The computer may be a general purpose computer, a special purpose computer, a network of computers, or other programmable device. The computer instructions may be stored in a computer readable storage medium or transmitted from one computer readable storage medium to another, for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wire (e.g., coaxial cable, fiber optic, digital subscriber line) or wirelessly (e.g., infrared, wireless, microwave, etc.). The computer-readable storage medium can be any available medium that can be accessed by a computer or a data storage device, such as a server, a data center, etc., that includes one or more available media. The usable medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., solid state disk), among others.
One of ordinary skill in the art will appreciate that all or part of the processes in the methods of the above embodiments may be implemented by hardware related to instructions of a computer program, which may be stored in a computer-readable storage medium, and when executed, may include the processes of the above method embodiments. And the aforementioned storage medium includes: various media capable of storing program codes, such as ROM or RAM, magnetic or optical disks, etc.

Claims (15)

1. A model training method, applied to a computing device, the method comprising:
the computing equipment inputs n pieces of original data into a first neural network model for training to obtain a pre-training model;
the computing equipment performs data enhancement on the n pieces of original data to obtain m pieces of total enhanced data;
the computing equipment takes the n pieces of original data and the m pieces of total enhancement data as training data, and trains the pre-training model in Z steps;
wherein, for the ith training, the computing device determines an ith enhancement ratio λ (i); the enhancement ratio is the proportion of the enhancement data participating in the training in the step to the m pieces of total enhancement data;
the computing equipment determines ith training data based on the lambda (i), and inputs the ith training data into an ith-1 training model for training to obtain an ith training model; the value of i is from 1 to Z in sequence; the lambda (i) is not less than lambda (i-1); and the lambda (i-1) is the i-1 enhancing ratio determined in the training process of the i-1 step, and the n, the m and the Z are positive integers.
2. The method according to claim 1, wherein the computing device determines an ith enhancement ratio λ(i), specifically comprising:
the computing device determines a function value d based on the current step i:
[formula for d not reproduced in the source text; d is computed from i, Z and λ0]
and determines the ith enhancement ratio λ(i) = min(1, d) based on d; Z and λ0 are preset values; or,
the computing device acquires the (i-1)th verification result logit_dev, and determines a gap value k based on the (i-1)th verification result logit_dev and the ideal result label_dev:
[formula for k not reproduced in the source text; k is computed from loss_pre and loss_post]
the computing device determines d, and determines the ith enhancement ratio λ(i) = max(1, 1+k)·d based on d and k;
wherein loss_pre represents the loss value between the (i-2)th verification result and the ideal result; loss_post represents the loss value between the (i-1)th verification result and the ideal result; the (i-1)th verification result is obtained by inputting verification data into the current (i-1)th training model; the (i-2)th verification result is obtained by inputting the verification data into the current (i-2)th training model; the ideal result is the correct output result of the verification data.
3. The method of claim 1, wherein the computing device determines ith training data based on λ(i), specifically comprising:
the computing device determines ith incremental data based on λ(i); the ith incremental data is the enhancement data newly added to training in the ith training step;
the computing device acquires the (i-1)th verification result logit_dev, and determines a gap value k based on the (i-1)th verification result and the ideal result label_dev:
[formula for k not reproduced in the source text; k is computed from loss_pre and loss_post]
wherein loss_pre represents the loss value between the (i-2)th verification result and the ideal result; loss_post represents the loss value between the (i-1)th verification result and the ideal result; the (i-1)th verification result is obtained by inputting verification data into the current (i-1)th training model; the (i-2)th verification result is obtained by inputting the verification data into the current (i-2)th training model; the ideal result is the correct output result of the verification data;
the computing device determines an adjustment value ∆ = batch_ori·(1−k) based on k, and determines the ith training proportion batch_ori(i) : batch_aug(i) = (batch_ori − ∆) : (batch_aug + ∆) based on the initial proportion value batch_ori of the original data, the initial proportion value batch_aug of the enhanced data, and k; wherein batch_ori and batch_aug are preset values; the ith training proportion is the proportion between the original data and the enhanced data in the number of pieces trained in the ith training step;
the computing device determines the ith training data based on the ith training proportion, the ith incremental data and the n pieces of raw data.
4. The method of claim 3, wherein the computing device determines ith incremental data based on the λ (i), comprising in particular:
the computing device determines the first m (8729) of the i-1 th data to be added as the i-th incremental data; and the data to be added in the (i-1) th step is the enhanced data which is not involved in training in the first (i-1) step and is waiting to be involved in training.
5. The method according to claim 4, wherein, in the case of acquiring the (i-1)th evaluation result and the (i-1)th non-added data, the method further comprises: the computing device sorts the (i-1)th non-added data based on the (i-1)th evaluation result to obtain the (i-1)th data sequence, wherein the (i-1)th evaluation result is the result obtained after the (i-1)th non-added data is evaluated through the (i-1)th training step, and the (i-1)th non-added data is the enhanced data that has not participated in training in the previous i-1 steps;
the computing device determines the first m·(λ(i) − λ(i-1)) pieces of data in the (i-1)th data to be added as the ith incremental data, specifically comprising:
the computing device determines the first m·(λ(i) − λ(i-1)) pieces of data in the (i-1)th data sequence as the ith incremental data.
6. The method according to claim 5, wherein the computing device performs sorting processing on the i-1 th non-added data based on the i-1 th evaluation result to obtain an i-1 th data sequence, specifically comprising:
the computing device determines a first scoring index score1 of each piece of enhancement data based on the text feature data of the i-1 th non-added data;
the computing device determining a second score index score2 for each enhanced data in the i-1 th non-added data based on the i-1 th assessment result;
the computing device determines a first score value s based on the first scoring index score1 and the second scoring index score2 1
The computing device is based on the first score value s 1 And sequencing the (i-1) th data without addition to obtain an (i-1) th data sequence.
7. The method according to claim 6, wherein the determining, by the computing device, the first score index score1 of each piece of enhancement data based on the text feature data of the i-1 st un-added data comprises:
the computing equipment determines text characteristic data of each piece of enhancement data in the i-1 th non-added data, and determines a first score index score1 as weighted summation of parameters of the text characteristic data; the parameters of the text characteristic data comprise one or more of text length l, entity number c, text clause number s and edit distance;
the computing equipment determines a second scoring index score2 of each piece of enhanced data in the i-1 th non-added data based on the i-1 th evaluation result, and specifically comprises the following steps:
the computing device determines the loss value and the confidence of each piece of enhancement data in the (i-1)th non-added data based on the (i-1)th evaluation result, and determines the second scoring index score2 as a weighted sum of the loss value and the confidence;
the computing device determines a first score value s1 based on the first scoring index score1 and the second scoring index score2, specifically comprising:
the computing device determines the first score value s1 of each piece of enhancement data in the (i-1)th non-added data as a weighted sum of the first scoring index score1 and the second scoring index score2;
the computing device sorts the (i-1)th non-added data based on the first score value s1 to obtain the (i-1)th data sequence, specifically comprising:
the computing device sorts the (i-1)th non-added data in ascending order of the first score value s1 to obtain the (i-1)th data sequence;
wherein the original data and the total enhancement data are text data.
8. The method according to claim 6, wherein the determining, by the computing device, the first score index score1 of each piece of enhancement data based on the text feature data of the i-1 st un-added data comprises:
the computing equipment determines text characteristic data of each enhanced data in the i-1 th non-added data and determines a first score index score1 as normalized sum of all parameters of the text characteristic data; the parameters of the text characteristic data comprise one or more of text length l, entity number c, text clause number s and edit distance;
the computing equipment determines a second score index score2 of each piece of enhanced data in the i-1 th non-added data based on the i-1 th evaluation result, and specifically comprises the following steps:
the computing device determines the loss value and the confidence of each piece of enhancement data in the (i-1)th non-added data based on the (i-1)th evaluation result, and determines the second scoring index score2 as a normalized sum of the loss value and the confidence;
the computing device determines a first score value s1 based on the first scoring index score1 and the second scoring index score2, specifically comprising:
the computing device determines the first score value s1 of each piece of enhancement data in the (i-1)th non-added data as the sum of the first scoring index score1 and the second scoring index score2;
the computing device sorts the (i-1)th non-added data based on the first score value s1 to obtain the (i-1)th data sequence, specifically comprising:
the computing device sorts the (i-1)th non-added data in ascending order of the first score value s1 to obtain the (i-1)th data sequence;
wherein the original data and the total enhancement data are text data.
9. The method of claim 5, wherein after the computing device determines the ith incremental data based on the λ (i), the method further comprises:
in the case that the (i-1)th non-added data is acquired, the computing device removes the ith incremental data from the (i-1)th non-added data, and the remaining data is determined as the ith non-added data; the ith non-added data is the enhanced data that has not participated in training in the previous i training steps; the (i-1)th non-added data is the enhanced data that has not participated in training in the previous i-1 training steps;
and under the condition of obtaining the ith training model, the computing equipment inputs the ith non-added data into the ith training model to obtain the ith evaluation result.
10. The method of claim 3, wherein the computing device determines an ith training data based on the ith training proportion, the ith incremental data, and the n pieces of raw data, and specifically comprises:
in the case that the (i-1)th screened data is acquired, the computing device determines the ith incremental data and the (i-1)th screened data as the ith added data; the ith added data is the enhanced data that has already participated in training in the previous i training steps; the (i-1)th screened data is the data obtained by screening the enhanced data that participated in training in the previous i-1 training steps;
the computing device determining an ith training data based on the ith added data, the n pieces of raw data, and the ith training proportion; the ratio of the ith added data to the n pieces of original data in the training number is the ith training ratio.
11. The method of claim 10, wherein after the computing device determines the ith incremental data based on the λ (i), the method further comprises:
under the condition that ith added data are obtained and an ith training model is obtained, the computing equipment inputs the ith added data into the ith training model to obtain an output result of the ith added data, and screens the ith added data based on the output result to obtain ith screened data.
12. The method according to claim 11, wherein the computing device filters the ith added data based on the output result to obtain ith filtered data, specifically comprising:
the computing device determines a second score value s for the ith added data 2
The computing equipment is used for adding a second scoring value s in the ith added data 2 And determining the enhanced data larger than the first threshold value as abnormal data, and removing the abnormal data in the ith added data to obtain the ith screened data.
13. The method of claim 12, wherein the computing device determines a second score value s2 of the ith added data, specifically comprising:
the computing device determining a third scoring index score3 for each enhanced data based on the text feature data of the ith added data;
the computing device determines a fourth score index score4 of each piece of enhancement data in the ith added data based on the output result of the ith added data;
the computing device determines a second score value s based on the third scoring index score3 and the fourth scoring index score4 2
14. A computing device, comprising: one or more processors and one or more memories; the one or more processors are coupled with the one or more memories for storing computer program code, the computer program code comprising computer instructions that, when executed by the one or more processors, cause the computing device to perform the method of any of claims 1-13.
15. A computer-readable storage medium comprising instructions that, when executed on a computing device, cause the computing device to perform the method of any of claims 1-13.
CN202211715713.7A 2022-12-30 2022-12-30 Model training method and computing equipment Active CN115688868B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211715713.7A CN115688868B (en) 2022-12-30 2022-12-30 Model training method and computing equipment


Publications (2)

Publication Number Publication Date
CN115688868A true CN115688868A (en) 2023-02-03
CN115688868B CN115688868B (en) 2023-10-20

Family

ID=85056988


Country Status (1)

Country Link
CN (1) CN115688868B (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020220539A1 (en) * 2019-04-28 2020-11-05 平安科技(深圳)有限公司 Data increment method and device, computer device and storage medium
CN110543645A (en) * 2019-09-04 2019-12-06 网易有道信息技术(北京)有限公司 Machine learning model training method, medium, device and computing equipment
WO2021139250A1 (en) * 2020-07-31 2021-07-15 平安科技(深圳)有限公司 Data enhancement model training method and apparatus
CN113407842A (en) * 2021-06-28 2021-09-17 携程旅游信息技术(上海)有限公司 Model training method, method and system for obtaining theme recommendation reason and electronic equipment
CN114398893A (en) * 2021-12-15 2022-04-26 北京易康医疗科技有限公司 Clinical data processing model training method and device based on contrast learning
CN114637847A (en) * 2022-03-15 2022-06-17 平安科技(深圳)有限公司 Model training method, text classification method and device, equipment and medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
CONNOR SHORTEN et al.: "Text Data Augmentation for Deep Learning", Journal of Big Data, pp. 1-34 *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117494672A (en) * 2023-11-13 2024-02-02 北京大学长沙计算与数字经济研究院 Method and device for generating industry document and computer readable storage medium

Also Published As

Publication number Publication date
CN115688868B (en) 2023-10-20


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant