CN115688868B - Model training method and computing equipment

Info

Publication number
CN115688868B
CN115688868B
Authority
CN
China
Prior art keywords
data, training, ith, computing device, enhancement
Legal status
Active
Application number
CN202211715713.7A
Other languages
Chinese (zh)
Other versions
CN115688868A (en)
Inventor
崔和涛
张云柯
Current Assignee
Honor Device Co Ltd
Original Assignee
Honor Device Co Ltd
Application filed by Honor Device Co Ltd
Priority to CN202211715713.7A
Publication of CN115688868A
Application granted
Publication of CN115688868B

Abstract

The embodiment of the application discloses a model training method and a computing device. The computing device inputs n pieces of original data into a first neural network model for training to obtain a pre-training model; the computing device performs data enhancement on the n pieces of original data to obtain m pieces of total enhancement data; the computing device takes the n pieces of original data and the m pieces of total enhancement data as training data and trains the pre-training model in Z steps. For the ith training step, the computing device determines an ith enhancement ratio λ(i); the enhancement ratio is the proportion of the enhancement data participating in this step of training to the m pieces of total enhancement data. The computing device determines the ith training data based on λ(i), and inputs the ith training data into the (i-1)th training model for training to obtain the ith training model; i takes values from 1 to Z in sequence; λ(i) ≥ λ(i-1), where λ(i-1) is the (i-1)th enhancement ratio of step i-1. According to the embodiment of the application, the model training effect can be improved.

Description

Model training method and computing equipment
Technical Field
The present application relates to the field of terminal technologies, and in particular, to a model training method and a computing device.
Background
For model training, data scarcity leads to poor training results. To improve the model training effect, data enhancement can be performed on the known data to obtain more training data. However, noise introduced in the data enhancement process causes the data to deviate, degrading the result indexes of model training.
Disclosure of Invention
The embodiment of the application discloses a model training method and computing equipment, which can improve the model training effect.
In a first aspect, the present application provides a model training method, the method being applied to a computing device, the method comprising: the computing device inputs n pieces of original data into a first neural network model for training to obtain a pre-training model; the computing device performs data enhancement on the n pieces of original data to obtain m pieces of total enhancement data; the computing device takes the n pieces of original data and the m pieces of total enhancement data as training data and trains the pre-training model in Z steps; wherein, for the ith training step, the computing device determines an ith enhancement ratio λ(i); the enhancement ratio is the proportion of the enhancement data participating in this step of training to the m pieces of total enhancement data; the computing device determines the ith training data based on λ(i), and inputs the ith training data into the (i-1)th training model for training to obtain the ith training model; the value of i runs from 1 to Z in sequence; λ(i) ≥ λ(i-1), where λ(i-1) is the (i-1)th enhancement ratio determined in the training process of step i-1; n, m and Z are positive integers.
The ith training model is the first neural network model output by the ith training step. In the first training step (i=1), the (i-1)th training model is the pre-training model; in the Zth training step, the obtained training model is the Zth training model, which is the model output by the whole training.
In the embodiment of the application, the computing device divides the training into multiple steps, and the amount of enhancement data in the training data increases from step to step, so that the fusion of enhancement data and original data can be dynamically adjusted. The enhancement data gradually participates in training, which reduces the model deviation caused by degraded data, while all the enhancement data eventually participates in training so that it can play its full role. In addition, gradually increasing the amount of enhancement data in training improves the model training effect and the result indexes of the training model. The overall flow is sketched in the example below.
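A minimal sketch of the Z-step flow follows. The helper names determine_lambda and train_one_step, and the list-based data handling, are assumptions made for illustration, not the patent's actual implementation:

```python
def staged_training(model, original, enhanced, Z, determine_lambda, train_one_step):
    # Pre-train on the n pieces of original data to obtain the pre-training model.
    model = train_one_step(model, original)
    m = len(enhanced)
    for i in range(1, Z + 1):
        lam_i = determine_lambda(i)            # i-th enhancement ratio, non-decreasing in i
        used = int(m * lam_i)                  # enhancement pieces participating at step i
        training_data = original + enhanced[:used]
        model = train_one_step(model, training_data)   # (i-1)-th model -> i-th model
    return model                               # the Z-th training model, the final output
```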
In one possible implementation, the computing device determines the ith enhancement ratio λ(i), specifically including: the computing device determines a proportion value d as a function of the current step i, the total number of training steps Z and an initial ratio λ0, and determines the ith enhancement ratio λ(i) = min(1, d); Z and λ0 are preset values. Alternatively, the computing device obtains the (i-1)th verification result logit_dev and, based on logit_dev and the ideal result label_dev, determines a gap value k from the difference between loss_pre and loss_post; the computing device then determines d and, based on d and k, determines the ith enhancement ratio λ(i) = max(1, 1+k)·d. Here loss_pre represents the loss value between the (i-2)th verification result and the ideal result, and loss_post represents the loss value between the (i-1)th verification result and the ideal result; the (i-1)th verification result is obtained by inputting verification data into the current (i-1)th training model; the (i-2)th verification result is obtained by inputting the verification data into the current (i-2)th training model; the ideal result is the correct output result of the verification data. In this way, the computing device can determine the ith enhancement ratio λ(i) and hence the amount of incremental data, and ensure that the increased amount of data benefits the training result. For the first method, as the training steps increase, the amount of enhancement data actually added to training becomes smaller and smaller, so the enhancement data added early in training can substantially influence the model, while the later model tends to be stable and the training result is better. For the second method, the ith enhancement ratio is adjusted based on the verification result of the training model in the previous step, and k can be positive or negative. When loss_pre is greater than loss_post, k is positive, which means the loss value improved between the two training steps; in this case the ith enhancement ratio is (1+k)·d and more incremental data is added. When loss_pre is less than loss_post, k is negative, which means the loss value degraded over the two training steps; in this case the ith enhancement ratio is d and incremental data is added according to the size of d. In this process, more incremental data is added when the verification effect is good, which improves the output effect, improves the training accuracy and ensures the training efficiency.

Wherein loss_pre is the loss_post of step i-1; the computing device has already calculated it once and stored it, and can use it directly in the above calculation. The ideal result label_dev is pre-stored data.
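A sketch of the two ways of determining λ(i) follows. The concrete forms of d and of the gap value k are not reproduced in the text above, so the square-root schedule and the relative-difference form of k below are assumptions chosen to match the described behaviour (early steps add more data; k > 0 when the loss improves):

```python
def gap_value(loss_pre: float, loss_post: float) -> float:
    # Assumed form of k: positive when the loss improved (loss_pre > loss_post),
    # negative when it degraded, as the surrounding text requires.
    return (loss_pre - loss_post) / max(loss_pre, 1e-8)

def enhancement_ratio(i, Z, lambda0, loss_pre=None, loss_post=None):
    # Assumed schedule for d: grows with i but with shrinking increments,
    # matching "the amount of newly added enhancement data becomes smaller".
    d = lambda0 + (1.0 - lambda0) * (i / Z) ** 0.5
    if loss_pre is None or loss_post is None:
        return min(1.0, d)                       # first method: lambda(i) = min(1, d)
    k = gap_value(loss_pre, loss_post)
    return min(1.0, max(1.0, 1.0 + k) * d)       # second method: max(1, 1+k) * d, capped at 1
```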
In one possible implementation, the computing device determines the ith training data based on λ(i), specifically including: the computing device determines the ith incremental data based on λ(i); the ith incremental data is the enhancement data newly added to training in the ith training step. The computing device obtains the (i-1)th verification result logit_dev and, based on logit_dev and the ideal result label_dev, determines the gap value k, where loss_pre represents the loss value between the (i-2)th verification result and the ideal result, and loss_post represents the loss value between the (i-1)th verification result and the ideal result; the (i-1)th verification result is obtained by inputting verification data into the current (i-1)th training model; the (i-2)th verification result is obtained by inputting the verification data into the current (i-2)th training model; the ideal result is the correct output result of the verification data. The computing device determines an adjustment value ∆ = batch_ori·(1-k) based on k, and based on the initial proportion value batch_ori of the original data, the initial proportion value batch_aug of the enhancement data and k, determines the ith training ratio batch_ori(i):batch_aug(i) = (batch_ori - ∆):(batch_aug + ∆); batch_ori and batch_aug are preset values; the ith training ratio is the proportion of original data to enhancement data in the number of samples trained in the ith training step. The computing device determines the ith training data based on the ith training ratio, the ith incremental data and the n pieces of original data. This ensures that the enhancement data does not act excessively on the neural network training model, avoids model deviation, and improves the output effect of the trained first neural network model.
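A minimal sketch of the batch-ratio adjustment, taking the formulas above at face value (∆ = batch_ori·(1-k), with ∆ rounded to whole samples):

```python
def training_ratio(batch_ori: int, batch_aug: int, k: float) -> tuple[int, int]:
    # Adjustment value delta = batch_ori * (1 - k); delta samples are shifted
    # from the original-data share of the batch to the enhancement share.
    delta = round(batch_ori * (1.0 - k))
    return batch_ori - delta, batch_aug + delta   # i-th training ratio
```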
In one possible implementation, the computing device determines the ith incremental data based on λ(i), specifically including: the computing device determines the first m·(λ(i) - λ(i-1)) pieces of data in the (i-1)th data to be added as the ith incremental data; the (i-1)th data to be added is the enhancement data that did not participate in training in the previous i-1 steps and is waiting to participate in training. In this way, the computing device adds the data selected from the (i-1)th data to be added according to the ith enhancement ratio into training, ensuring the accuracy of the amount of added enhancement data and thereby improving the training effect.

The (i-1)th data to be added may be the (i-1)th data sequence or the (i-1)th non-added data. The (i-1)th non-added data is the enhancement data that has not participated in training in the previous i-1 steps; the (i-1)th data sequence is the (i-1)th non-added data after sorting. When the (i-1)th data to be added is the (i-1)th non-added data, the training steps can be simplified and the training efficiency of the training model improved. A selection sketch follows.
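A minimal sketch of the incremental-data selection, assuming the pending enhancement data is held in a list:

```python
def select_increment(pending, m, lam_i, lam_prev):
    # Take the first m * (lambda(i) - lambda(i-1)) pieces of the data to be
    # added as the i-th incremental data; the rest stays pending.
    n_new = int(m * (lam_i - lam_prev))
    return pending[:n_new], pending[n_new:]   # (i-th increment, i-th non-added data)
```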
In a possible implementation, in the case that the (i-1)th evaluation result and the (i-1)th non-added data are obtained, the method further includes: the computing device sorts the (i-1)th non-added data based on the (i-1)th evaluation result to obtain the (i-1)th data sequence, where the (i-1)th evaluation result is the result obtained after the (i-1)th non-added data is evaluated by the (i-1)th training model; the (i-1)th non-added data is the enhancement data that did not participate in training in the previous i-1 steps. The computing device determining the first m·(λ(i) - λ(i-1)) pieces of data in the (i-1)th data to be added as the ith incremental data specifically includes: the computing device determines the first m·(λ(i) - λ(i-1)) pieces of data in the (i-1)th data sequence as the ith incremental data. In this way, the computing device sorts the (i-1)th non-added data in advance, so that the incremental data is the data with the better training effect, which ensures a better effect of the training model.

In a possible implementation, in the case that the (i-1)th evaluation result is not obtained, the (i-1)th data to be added is the (i-1)th non-added data, i.e. the enhancement data that did not participate in training in the previous i-1 steps. The computing device determining the first m·(λ(i) - λ(i-1)) pieces of data in the (i-1)th data to be added as the ith incremental data specifically includes: determining the first m·(λ(i) - λ(i-1)) pieces of data in the (i-1)th non-added data as the ith incremental data. In this way, the training steps can be simplified and the training efficiency of the training model improved.
In one possible implementation, the computing device sorts the (i-1)th non-added data based on the (i-1)th evaluation result to obtain the (i-1)th data sequence, specifically including: the computing device determines a first scoring index score1 for each piece of enhancement data based on the text feature data of the (i-1)th non-added data; the computing device determines a second scoring index score2 for each piece of enhancement data in the (i-1)th non-added data based on the (i-1)th evaluation result; the computing device determines a first scoring value s1 based on score1 and score2; the computing device sorts the (i-1)th non-added data based on s1 to obtain the (i-1)th data sequence. In this way, enhancement data whose text features are simpler and whose correct output is easier to produce is input into the neural network model with higher priority; that is, the enhancement data is added to the training set from easy to difficult, data of better quality is used in the early training stage where the effect is better, and the training model is more stable and effective.
In one possible implementation, the computing device determining the first scoring index score1 of each piece of enhancement data based on the text feature data of the (i-1)th non-added data specifically includes: the computing device determines the text feature data of each piece of enhancement data in the (i-1)th non-added data, and determines the first scoring index score1 as a weighted summation of the parameters of the text feature data; the parameters of the text feature data include one or more of the text length l, the entity number c, the text clause number s and the edit distance. The computing device determining the second scoring index score2 of each piece of enhancement data in the (i-1)th non-added data based on the (i-1)th evaluation result specifically includes: the computing device determines a loss value and a confidence for each piece of enhancement data in the (i-1)th non-added data based on the (i-1)th evaluation result, and determines the second scoring index score2 as a weighted summation of the loss value and the confidence. The computing device determining the first scoring value s1 based on score1 and score2 specifically includes: the computing device determines the first scoring value s1 of each piece of enhancement data in the (i-1)th non-added data as a weighted summation of score1 and score2. The computing device sorting the (i-1)th non-added data based on s1 to obtain the (i-1)th data sequence specifically includes: the computing device sorts the (i-1)th non-added data by s1 from small to large to obtain the (i-1)th data sequence; the original data and the total enhancement data are text data. Through this process, the remaining untrained enhancement data is sorted before training, so that enhancement data whose text features are simpler and whose correct output is easier to produce is input into the neural network model first; that is, the enhancement data is added to the training set from easy to difficult, data of better quality is used in the early training stage where the effect is better, and the training model is more stable and effective.
The text length l is the number of characters of the text; the entity number c represents the amount of information extracted; the text clause number s represents the number of sentences into which a piece of text is divided by punctuation; the edit distance is the distance between two character strings, i.e. the minimum number of editing operations required to transform one into the other. A scoring sketch follows.
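A minimal sketch of the easy-to-difficult scoring; the weight values are assumptions, since the patent leaves them unspecified:

```python
def first_scoring_value(length, entities, clauses, edit_dist, loss, confidence,
                        w_feat=(0.25, 0.25, 0.25, 0.25), w_eval=(0.5, 0.5),
                        w_mix=(0.5, 0.5)):
    # score1: weighted summation of the text-feature parameters (l, c, s, edit distance).
    score1 = (w_feat[0] * length + w_feat[1] * entities
              + w_feat[2] * clauses + w_feat[3] * edit_dist)
    # score2: weighted summation of the loss value and confidence taken from
    # the (i-1)-th evaluation result.
    score2 = w_eval[0] * loss + w_eval[1] * confidence
    # s1: weighted summation of score1 and score2; sorting ascending by s1
    # orders the data from easy to difficult.
    return w_mix[0] * score1 + w_mix[1] * score2
```

Sorting the non-added data in ascending order of this value then yields the data sequence.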
In one possible implementation, the computing device determining the first scoring index score1 of each piece of enhancement data based on the text feature data of the (i-1)th non-added data specifically includes: the computing device determines the text feature data of each piece of enhancement data in the (i-1)th non-added data, and determines the first scoring index score1 as a normalized summation of the parameters of the text feature data; the parameters of the text feature data include one or more of the text length l, the entity number c, the text clause number s and the edit distance. The computing device determining the second scoring index score2 of each piece of enhancement data in the (i-1)th non-added data based on the (i-1)th evaluation result specifically includes: the computing device determines a loss value and a confidence for each piece of enhancement data in the (i-1)th non-added data based on the (i-1)th evaluation result, and determines the second scoring index score2 as a normalized summation of the loss value and the confidence. The computing device determining the first scoring value s1 based on score1 and score2 specifically includes: the computing device determines the first scoring value s1 of each piece of enhancement data in the (i-1)th non-added data as the sum of score1 and score2. The computing device sorting the (i-1)th non-added data based on s1 to obtain the (i-1)th data sequence specifically includes: the computing device sorts the (i-1)th non-added data by s1 from small to large to obtain the (i-1)th data sequence; the original data and the total enhancement data are text data. Through this process, the remaining untrained enhancement data is sorted before training, so that enhancement data whose text features are simpler and whose correct output is easier to produce is input into the neural network model first; that is, the enhancement data is added to the training set from easy to difficult, data of better quality is used in the early training stage where the effect is better, and the training model is more stable and effective.
In one possible implementation, after the computing device determines the ith incremental data based on λ(i), the method further includes: in the case that the ith incremental data in the (i-1)th non-added data is obtained, the computing device removes the ith incremental data from the (i-1)th non-added data and determines the remaining data as the ith non-added data; the ith non-added data is the enhancement data that did not participate in training in the previous i training steps; the (i-1)th non-added data is the enhancement data that did not participate in training in the previous i-1 training steps. In the case that the ith training model is obtained, the computing device inputs the ith non-added data into the ith training model to obtain the ith evaluation result. In this way, the evaluation result can supply the parameters for the sorting of the next step, ensuring the realizability and integrity of the sorting process, which improves the accuracy of the sorting result and the training effect of the model.
In one possible implementation, the computing device determining the ith training data based on the ith training ratio, the ith incremental data and the n pieces of original data specifically includes: in the case that the (i-1)th screened data is acquired, the computing device determines the ith incremental data and the (i-1)th screened data together as the ith added data; the ith added data is the enhancement data that has already participated in training in the previous i training steps; the (i-1)th screened data is the result of screening the enhancement data that participated in training in the previous i-1 training steps; the computing device determines the ith training data based on the ith added data, the n pieces of original data and the ith training ratio; the ratio of the ith added data to the n pieces of original data in the training quantity is the ith training ratio. In this way, the computing device can ensure that the proportions within the ith training data follow the ith training ratio, thereby improving the training effect.
In one possible implementation, after the computing device determines the ith incremental data based on λ(i), the method further includes: in the case that the ith added data and the ith training model are obtained, the computing device inputs the ith added data into the ith training model to obtain the output result of the ith added data, and screens the ith added data based on the output result to obtain the ith screened data. In this way, the computing device can screen the trained enhancement data and reject abnormal data; that is, the data is screened according to the result after training, the screened data continues to be trained in the next step, and the abnormal data is rejected, which ensures the screening accuracy while ensuring the effect of the model, so that the enhancement data plays its role to the maximum.

The ith screened data is the result of screening the enhancement data that has already participated in training in the previous i training steps.
In one possible implementation, the computing device screening the ith added data based on the output result to obtain the ith screened data specifically includes: the computing device determines a second scoring value s2 for the ith added data; the computing device determines the enhancement data whose second scoring value s2 is greater than a first threshold in the ith added data as abnormal data, and removes the abnormal data from the ith added data to obtain the ith screened data. In this way, the computing device can judge from the concrete scoring value whether each piece of the ith added data is abnormal, and reject the abnormal data, ensuring that the ith screened data improves the effect of the training model in the later steps.
In one possible implementation, the computing device determining the second scoring value s2 of the ith added data specifically includes: the computing device determines a third scoring index score3 for each piece of enhancement data based on the text feature data of the ith added data; the computing device determines a fourth scoring index score4 for each piece of enhancement data in the ith added data based on the output result of the ith added data; the computing device determines the second scoring value s2 based on score3 and score4. In this way, the computing device can determine the text feature data and the output result of the ith added data and hence the second scoring value, and by judging against this scoring value the accuracy of screening is ensured, which improves the effect of the training model. A screening sketch follows.
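A minimal sketch of the anomaly screening, assuming each sample already carries its second scoring value s2 and that the first threshold is given:

```python
def screen_added_data(added, s2_values, threshold):
    # Keep only the samples whose second scoring value s2 does not exceed the
    # first threshold; samples above it are treated as abnormal data and rejected.
    return [d for d, s2 in zip(added, s2_values) if s2 <= threshold]
```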
In a second aspect, the present application provides a computing device comprising: one or more processors and one or more memories; the one or more processors are coupled with the one or more memories, the one or more memories for storing computer program code comprising computer instructions that, when executed by the one or more processors, cause the computing device to perform:
inputting n pieces of original data into a first neural network model for training to obtain a pre-training model; performing data enhancement on the n pieces of original data to obtain m pieces of total enhancement data; taking the n pieces of original data and the m pieces of total enhancement data as training data and training the pre-training model in Z steps; wherein, for the ith training step, an ith enhancement ratio λ(i) is determined; the enhancement ratio is the proportion of the enhancement data participating in this step of training to the m pieces of total enhancement data; the ith training data is determined based on λ(i) and input into the (i-1)th training model for training to obtain the ith training model; the value of i runs from 1 to Z in sequence; λ(i) ≥ λ(i-1), where λ(i-1) is the (i-1)th enhancement ratio determined in the training process of step i-1; n, m and Z are positive integers.

The ith training model is the first neural network model output by the ith training step. In the first training step (i=1), the (i-1)th training model is the pre-training model; in the Zth training step, the obtained training model is the Zth training model, which is the model output by the whole training.

In the embodiment of the application, the computing device divides the training into multiple steps, and the amount of enhancement data in the training data increases from step to step, so that the fusion of enhancement data and original data can be dynamically adjusted. The enhancement data gradually participates in training, which reduces the model deviation caused by degraded data, while all the enhancement data eventually participates in training so that it can play its full role. In addition, gradually increasing the amount of enhancement data in training improves the model training effect and the result indexes of the training model.
In one possible implementation, the computing device determines the ith enhancement ratio λ(i), specifically performing: determining a proportion value d as a function of the current step i, the total number of training steps Z and an initial ratio λ0, and determining the ith enhancement ratio λ(i) = min(1, d); Z and λ0 are preset values. Alternatively, in the case that the (i-1)th verification result logit_dev is acquired, a gap value k is determined based on logit_dev and the ideal result label_dev from the difference between loss_pre and loss_post; d is then determined, and the ith enhancement ratio λ(i) = max(1, 1+k)·d is determined based on d and k. Here loss_pre represents the loss value between the (i-2)th verification result and the ideal result, and loss_post represents the loss value between the (i-1)th verification result and the ideal result; the (i-1)th verification result is obtained by inputting verification data into the current (i-1)th training model; the (i-2)th verification result is obtained by inputting the verification data into the current (i-2)th training model; the ideal result is the correct output result of the verification data. In this way, the computing device can determine the ith enhancement ratio λ(i) and hence the amount of incremental data, and ensure that the increased amount of data benefits the training result. For the first method, as the training steps increase, the amount of enhancement data actually added to training becomes smaller and smaller, so the enhancement data added early in training can substantially influence the model, while the later model tends to be stable and the training result is better. For the second method, the ith enhancement ratio is adjusted based on the verification result of the training model in the previous step, and k can be positive or negative. When loss_pre is greater than loss_post, k is positive, which means the loss value improved between the two training steps; in this case the ith enhancement ratio is (1+k)·d and more incremental data is added. When loss_pre is less than loss_post, k is negative, which means the loss value degraded over the two training steps; in this case the ith enhancement ratio is d and incremental data is added according to the size of d. In this process, more incremental data is added when the verification effect is good, which improves the output effect, improves the training accuracy and ensures the training efficiency.

Wherein loss_pre is the loss_post of step i-1; the computing device has already calculated it once and stored it, and can use it directly in the above calculation. The ideal result label_dev is pre-stored data.
In one possible implementation, the computing device determines the ith training data based on λ(i), specifically performing: determining the ith incremental data based on λ(i); the ith incremental data is the enhancement data newly added to training in the ith training step. In the case that the (i-1)th verification result logit_dev is obtained, a gap value k is determined based on logit_dev and the ideal result label_dev, where loss_pre represents the loss value between the (i-2)th verification result and the ideal result, and loss_post represents the loss value between the (i-1)th verification result and the ideal result; the (i-1)th verification result is obtained by inputting verification data into the current (i-1)th training model; the (i-2)th verification result is obtained by inputting the verification data into the current (i-2)th training model; the ideal result is the correct output result of the verification data. An adjustment value ∆ = batch_ori·(1-k) is determined based on k, and the ith training ratio batch_ori(i):batch_aug(i) = (batch_ori - ∆):(batch_aug + ∆) is determined based on the initial proportion value batch_ori of the original data, the initial proportion value batch_aug of the enhancement data and k; batch_ori and batch_aug are preset values; the ith training ratio is the proportion of original data to enhancement data in the number of samples trained in the ith training step. The computing device determines the ith training data based on the ith training ratio, the ith incremental data and the n pieces of original data. This ensures that the enhancement data does not act excessively on the neural network training model, avoids model deviation, and improves the output effect of the trained first neural network model.
In one possible implementation, the computing device determines the ith incremental data based on λ(i), specifically performing: determining the first m·(λ(i) - λ(i-1)) pieces of data in the (i-1)th data to be added as the ith incremental data; the (i-1)th data to be added is the enhancement data that did not participate in training in the previous i-1 steps and is waiting to participate in training. In this way, the computing device adds the data selected from the (i-1)th data to be added according to the ith enhancement ratio into training, ensuring the accuracy of the amount of added enhancement data and thereby improving the training effect.

The (i-1)th data to be added may be the (i-1)th data sequence or the (i-1)th non-added data. The (i-1)th non-added data is the enhancement data that has not participated in training in the previous i-1 steps; the (i-1)th data sequence is the (i-1)th non-added data after sorting. When the (i-1)th data to be added is the (i-1)th non-added data, the training steps can be simplified and the training efficiency of the training model improved.
In one possible implementation, in the case that the (i-1)th evaluation result and the (i-1)th non-added data are obtained, the computing device further performs: sorting the (i-1)th non-added data based on the (i-1)th evaluation result to obtain the (i-1)th data sequence, where the (i-1)th evaluation result is the result obtained after the (i-1)th non-added data is evaluated by the (i-1)th training model; the (i-1)th non-added data is the enhancement data that did not participate in training in the previous i-1 steps. The computing device determining the first m·(λ(i) - λ(i-1)) pieces of data in the (i-1)th data to be added as the ith incremental data specifically performs: determining the first m·(λ(i) - λ(i-1)) pieces of data in the (i-1)th data sequence as the ith incremental data. In this way, the computing device sorts the (i-1)th non-added data in advance, so that the incremental data is the data with the better training effect, which ensures a better effect of the training model.

In a possible implementation, in the case that the (i-1)th evaluation result is not obtained, the (i-1)th data to be added is the (i-1)th non-added data, i.e. the enhancement data that did not participate in training in the previous i-1 steps. The computing device determining the first m·(λ(i) - λ(i-1)) pieces of data in the (i-1)th data to be added as the ith incremental data specifically performs: determining the first m·(λ(i) - λ(i-1)) pieces of data in the (i-1)th non-added data as the ith incremental data. In this way, the training steps can be simplified and the training efficiency of the training model improved.
In one possible implementation, the computing device sorts the (i-1)th non-added data based on the (i-1)th evaluation result to obtain the (i-1)th data sequence, specifically performing: determining a first scoring index score1 for each piece of enhancement data based on the text feature data of the (i-1)th non-added data; determining a second scoring index score2 for each piece of enhancement data in the (i-1)th non-added data based on the (i-1)th evaluation result; determining a first scoring value s1 based on score1 and score2; sorting the (i-1)th non-added data based on s1 to obtain the (i-1)th data sequence. In this way, enhancement data whose text features are simpler and whose correct output is easier to produce is input into the neural network model with higher priority; that is, the enhancement data is added to the training set from easy to difficult, data of better quality is used in the early training stage where the effect is better, and the training model is more stable and effective.
In one possible implementation, the computing device determining the first scoring index score1 of each piece of enhancement data based on the text feature data of the (i-1)th non-added data specifically performs: determining the text feature data of each piece of enhancement data in the (i-1)th non-added data, and determining the first scoring index score1 as a weighted summation of the parameters of the text feature data; the parameters of the text feature data include one or more of the text length l, the entity number c, the text clause number s and the edit distance. The computing device determining the second scoring index score2 of each piece of enhancement data in the (i-1)th non-added data based on the (i-1)th evaluation result specifically performs: determining a loss value and a confidence for each piece of enhancement data in the (i-1)th non-added data based on the (i-1)th evaluation result, and determining the second scoring index score2 as a weighted summation of the loss value and the confidence. The computing device determining the first scoring value s1 based on score1 and score2 specifically includes: determining the first scoring value s1 of each piece of enhancement data in the (i-1)th non-added data as a weighted summation of score1 and score2. The computing device sorting the (i-1)th non-added data based on s1 to obtain the (i-1)th data sequence specifically performs: sorting the (i-1)th non-added data by s1 from small to large to obtain the (i-1)th data sequence; the original data and the total enhancement data are text data. Through this process, the remaining untrained enhancement data is sorted before training, so that enhancement data whose text features are simpler and whose correct output is easier to produce is input into the neural network model first; that is, the enhancement data is added to the training set from easy to difficult, data of better quality is used in the early training stage where the effect is better, and the training model is more stable and effective.

The text length l is the number of characters of the text; the entity number c represents the amount of information extracted; the text clause number s represents the number of sentences into which a piece of text is divided by punctuation; the edit distance is the distance between two character strings, i.e. the minimum number of editing operations required to transform one into the other.

In one possible implementation, the computing device determining the first scoring index score1 of each piece of enhancement data based on the text feature data of the (i-1)th non-added data specifically performs: determining the text feature data of each piece of enhancement data in the (i-1)th non-added data, and determining the first scoring index score1 as a normalized summation of the parameters of the text feature data; the parameters of the text feature data include one or more of the text length l, the entity number c, the text clause number s and the edit distance. The computing device determining the second scoring index score2 of each piece of enhancement data in the (i-1)th non-added data based on the (i-1)th evaluation result specifically performs: determining a loss value and a confidence for each piece of enhancement data in the (i-1)th non-added data based on the (i-1)th evaluation result, and determining the second scoring index score2 as a normalized summation of the loss value and the confidence. The computing device determining the first scoring value s1 based on score1 and score2 specifically includes: determining the first scoring value s1 of each piece of enhancement data in the (i-1)th non-added data as the sum of score1 and score2. The computing device sorting the (i-1)th non-added data based on s1 to obtain the (i-1)th data sequence specifically performs: sorting the (i-1)th non-added data by s1 from small to large to obtain the (i-1)th data sequence; the original data and the total enhancement data are text data. Through this process, the remaining untrained enhancement data is sorted before training, so that enhancement data whose text features are simpler and whose correct output is easier to produce is input into the neural network model first; that is, the enhancement data is added to the training set from easy to difficult, data of better quality is used in the early training stage where the effect is better, and the training model is more stable and effective.
In one possible implementation, after the computing device determines the ith incremental data based on λ(i), the computing device further performs: in the case that the ith incremental data in the (i-1)th non-added data is obtained, removing the ith incremental data from the (i-1)th non-added data and determining the remaining data as the ith non-added data; the ith non-added data is the enhancement data that did not participate in training in the previous i training steps; the (i-1)th non-added data is the enhancement data that did not participate in training in the previous i-1 training steps. In the case that the ith training model is obtained, the ith non-added data is input into the ith training model to obtain the ith evaluation result. In this way, the evaluation result can supply the parameters for the sorting of the next step, ensuring the realizability and integrity of the sorting process, which improves the accuracy of the sorting result and the training effect of the model.
In one possible implementation, the computing device determines the ith training data based on the ith training ratio, the ith incremental data and the n pieces of original data, specifically performing: in the case that the (i-1)th screened data is acquired, determining the ith incremental data and the (i-1)th screened data together as the ith added data; the ith added data is the enhancement data that has already participated in training in the previous i training steps; the (i-1)th screened data is the result of screening the enhancement data that participated in training in the previous i-1 training steps; determining the ith training data based on the ith added data, the n pieces of original data and the ith training ratio; the ratio of the ith added data to the n pieces of original data in the training quantity is the ith training ratio. In this way, the computing device can ensure that the proportions within the ith training data follow the ith training ratio, thereby improving the training effect.
In one possible implementation, after the computing device determines the ith incremental data based on λ(i), the computing device further performs: in the case that the ith added data and the ith training model are obtained, inputting the ith added data into the ith training model to obtain the output result of the ith added data, and screening the ith added data based on the output result to obtain the ith screened data. In this way, the computing device can screen the trained enhancement data and reject abnormal data; that is, the data is screened according to the result after training, the screened data continues to be trained in the next step, and the abnormal data is rejected, which ensures the screening accuracy while ensuring the effect of the model, so that the enhancement data plays its role to the maximum.
In one possible implementation, the computing device screens the ith added data based on the output result to obtain the ith screened data, specifically performing: determining a second scoring value s2 for the ith added data; determining the enhancement data whose second scoring value s2 is greater than a first threshold in the ith added data as abnormal data, and removing the abnormal data from the ith added data to obtain the ith screened data. In this way, the computing device can judge from the concrete scoring value whether each piece of the ith added data is abnormal, and reject the abnormal data, ensuring that the ith screened data improves the effect of the training model in the later steps.

In one possible implementation, the computing device determining the second scoring value s2 of the ith added data specifically includes: determining a third scoring index score3 for each piece of enhancement data based on the text feature data of the ith added data; determining a fourth scoring index score4 for each piece of enhancement data in the ith added data based on the output result of the ith added data; determining the second scoring value s2 based on score3 and score4. In this way, the computing device can determine the text feature data and the output result of the ith added data and hence the second scoring value, and by judging against this scoring value the accuracy of screening is ensured, which improves the effect of the training model.
In a third aspect, the present application provides a computing device comprising: one or more functional modules. One or more functional modules are used to perform the model training method in any of the possible implementations of the above aspect.
In a fourth aspect, embodiments of the present application provide a computer storage medium comprising computer instructions which, when run on a computing device, cause the computing device to perform the model training method in any of the possible implementations of the above aspects.
In a fifth aspect, embodiments of the present application provide a computer program product, which when run on a computer causes the computer to perform the model training method in any one of the possible implementations of the above aspect.
Drawings
FIG. 1 is a schematic diagram of a training verification and use process of a neural network model according to an embodiment of the present application;
FIG. 2 is a schematic diagram of a method for training a model according to an embodiment of the present application;
FIG. 3 is a schematic diagram of a function between an ith enhancement duty cycle and training step i according to an embodiment of the present application;
FIG. 4 is a schematic diagram of another method for training a model according to an embodiment of the present application;
FIG. 5 is a schematic diagram of the hardware structure of a computing device 100 according to an embodiment of the present application.
Detailed Description
The technical solutions of the embodiments of the present application will be clearly and thoroughly described below with reference to the accompanying drawings. In the description of the embodiments of the present application, unless otherwise indicated, "/" means "or"; for example, A/B may represent A or B. "And/or" merely describes an association relation between associated objects and indicates that three relations may exist; for example, "A and/or B" may indicate: A alone, both A and B, and B alone. In addition, in the description of the embodiments of the present application, "plural" means two or more.
The terms "first," "second," and the like, are used below for descriptive purposes only and are not to be construed as implying or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defining "a first" or "a second" may explicitly or implicitly include one or more such feature, and in the description of embodiments of the application, unless otherwise indicated, the meaning of "a plurality" is two or more.
1. Natural language processing (natural language processing, NLP).
NLP is an important direction in the fields of computer science and artificial intelligence. It studies how to process natural language and build a bridge between machine language and human language, so as to realize human-computer interaction. Natural language is the speech and writing system used in human communication, and the object studied by NLP is the writing system (i.e. "text").

NLP includes natural language understanding (natural language understanding, NLU) and natural language generation (natural language generation, NLG). NLU is a general term for the method models or tasks that enable machines to understand the meaning of text. NLU may include word segmentation, part-of-speech tagging, syntactic analysis, text classification/clustering, information extraction and so on. NLG is a software process that automatically converts structured data into human-readable text. NLG may include content determination, text structuring, sentence aggregation, grammaticalization, referring expression generation and linguistic realization.
In the NLP processing process, lexical analysis, syntactic analysis and semantic analysis are needed. Where lexical analysis may include word segmentation and part-of-speech tagging, i.e., assigning each word to a category after the text is divided into separate words. The categories may be nouns, verbs, adjectives, and the like. The syntactic analysis may be a syntactic structure analysis process in sentence units. Semantic analysis is the understanding of the true semantics of sentence expressions.
At present, NLP is widely applied in fields such as speech recognition, text generation, information extraction, text classification and information recommendation. For NLP, deep learning means such as the convolutional neural network (CNN), the recurrent neural network (RNN) and long short-term memory (LSTM) are often adopted.
2. Concepts in the neural network model training process.
In the deep learning process, model training needs to be performed in advance, and the training process is the process of adjusting the network structure and network parameters of the neural network model. For example, training text data is input into a neural network model, and a loss value exists between the output result corresponding to the text data and the preset correct result. The training model is adjusted in the direction that decreases the loss value, iterating continuously; the model converges when the loss value no longer decreases.
For example, the training method of the neural network model may be the gradient descent method. The gradient descent method back-propagates the cost function through the neural network and, by continuous iteration, finds the weight parameter θ at the lowest point of the loss function. The weight parameters of the neural network model are continuously adjusted during training so that the model converges. If the weight parameter trained by the current neural network model is θ, the iterated weight parameter is θ = θ - η∇_θ J(θ), where η is the step size (learning rate), which determines the length of each step taken along the negative gradient direction during gradient descent, ∇ denotes the gradient, and J(θ) is the loss function.
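A one-line sketch of the update rule above; grad_J stands for the gradient ∇_θ J(θ) computed by back-propagation:

```python
import numpy as np

def gradient_descent_step(theta: np.ndarray, grad_J: np.ndarray, eta: float) -> np.ndarray:
    # theta = theta - eta * grad(J(theta)): one step along the negative gradient.
    return theta - eta * grad_J
```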
In order to evaluate the quality of the neural network model, the degree of fitting is measured with the loss function. Minimizing the loss function means the fit is best, and the corresponding model parameters are the optimal parameters. In the training process of the neural network model, the training condition is measured based on the loss value given by the loss function. The loss function determines the performance of the model by comparing the predicted output result of the neural network model with the ideal output result, thereby giving the optimization direction of the model. If the difference between the predicted output result and the ideal output result is relatively large, the loss value is large; conversely, if the deviation is small, the loss value is small. For different neural network models, different loss functions may be selected to calculate the loss value: square loss, exponential loss, hinge loss, negative log-likelihood loss (NLL loss), cross entropy loss, KL divergence loss, cosine similarity loss (cosine embedding loss) and so on. The present application does not limit the choice of loss function.
Batch and epoch are important concepts in neural network training on large-scale data.

In the training process, the batch of samples used in one iteration is called a batch, the number of samples in it is batch_size, and training once with batch_size samples is called one iteration. The batch size is a hyperparameter that defines the number of samples to be processed before updating the internal model parameters. In deep learning, one parameter update of the loss function is derived not from a single sample but from the weighted data of one batch. One epoch is one pass of training using all samples in the training set. Colloquially, the epoch value is the number of times the entire training data set is used. The epoch number is a hyperparameter that defines the number of times the learning algorithm works through the entire training data set; one epoch means that every sample in the training data set has had the opportunity to update the internal model parameters. An epoch may consist of one or more batches.
Illustratively, during batch training, assuming there are r pieces of data and each batch is of size batch_size, then there are a total of r/batch_size batches (r/batch_size + 1 under integer division when r is not an exact multiple of batch_size), which together constitute one epoch. After one epoch of training, the training data may be shuffled to perform the next epoch of training.
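A minimal sketch of the epoch/batch iteration just described:

```python
import random

def run_epochs(data, batch_size, epochs, train_step):
    # One epoch = one pass over all samples; one batch per iteration.
    for _ in range(epochs):
        random.shuffle(data)                  # reshuffle between epochs
        for start in range(0, len(data), batch_size):
            train_step(data[start:start + batch_size])
```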
3. Data enhancement (data augmentation, DA).
Data enhancement (also referred to as data augmentation) can alleviate data scarcity in deep learning. It was first widely used in the image field, later extended to the NLP field, and has proved effective on many tasks. One of its main purposes is to increase the diversity of training data and thereby improve model generalization ability. Data enhancement refers to methods that increase the amount of data by adding minor changes to existing data or by newly creating synthetic data from existing data. That is, with the original data known, minor modifications can be made to it to create synthetic enhancement data.
In the NLP field, data enhancement methods are varied, as specifically described below:
1. data enhancement is performed based on paraphrasing.
With the original data known, the computing device may obtain all the alternative synonyms for words in the original data, select r of them and replace them, thereby forming enhancement data. The original text data may also be split into several word fragments, and the words of some fragments replaced, thereby forming enhancement data. Enhancement data may also be generated by machine-translating the original text data and translating it back. A sketch of synonym replacement follows.
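A minimal sketch of synonym replacement; the tiny synonym table is a placeholder for a real synonym dictionary:

```python
import random

SYNONYMS = {"quick": ["fast", "rapid"], "parcel": ["package"]}  # toy lexicon

def synonym_replace(tokens: list[str], r: int = 1) -> list[str]:
    # Replace up to r words that have synonyms with a randomly chosen alternative.
    out = list(tokens)
    candidates = [i for i, t in enumerate(out) if t in SYNONYMS]
    for i in random.sample(candidates, min(r, len(candidates))):
        out[i] = random.choice(SYNONYMS[out[i]])
    return out
```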
2. Data enhancement is performed based on specific rules.
The original text data may be subject to different processing tasks, and in the face of these specific tasks the corresponding information, e.g. data tags, data formats and so on, may be extracted.

The computing device may first determine the type of the original text data and select different augmentation rules according to the type, that is, process the data according to the corresponding augmentation rule to obtain the enhancement data. For example, when the type of the original text data is an express-delivery short message, the key information in the message can be extracted according to the specific text ordering rule of express messages, and the extracted key information recombined according to a preset text ordering rule to obtain the enhanced text. A sketch of such a rule follows.
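A minimal sketch of rule-based enhancement for a delivery-notification message; the regular expressions and the English template are illustrative assumptions (the extraction rules in practice depend on the message format):

```python
import re

PICKUP_RE = re.compile(r"pickup code\s*(\w+)")
STATION_RE = re.compile(r"at\s+([\w\s]+ station)")

def augment_express_sms(text: str) -> str | None:
    # Extract the key information (station, pickup code), then recombine it
    # with a preset template to form the enhanced text.
    code, station = PICKUP_RE.search(text), STATION_RE.search(text)
    if not (code and station):
        return None
    return f"Your parcel is at {station.group(1)}; pickup code {code.group(1)}."
```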
It should be noted that the above method for enhancing data may also include other methods, which are not limited by the present application.
4. Edit distance.
The edit distance refers to the minimum number of editing operations required to change one character string into another; the larger the distance, the more different the two strings are. Editing operations are of three types: insertion, deletion, and substitution. The edit distance (Levenshtein distance) between two strings a and b can be written lev_{a,b}(|a|, |b|), where |a| and |b| are the lengths of a and b (in the present application, the length of a piece of text data). The Levenshtein distance between a and b is defined mathematically as:

lev_{a,b}(q, p) = max(q, p), if min(q, p) = 0;
lev_{a,b}(q, p) = min( lev_{a,b}(q-1, p) + 1, lev_{a,b}(q, p-1) + 1, lev_{a,b}(q-1, p-1) + 1(a_q ≠ b_p) ), otherwise.

Here lev_{a,b}(q, p) is the distance between the first q characters of a and the first p characters of b; q and p can be seen as prefix lengths of a and b. String indexing starts at 1 (in implementations, index 0 corresponds to the empty prefix), so the final edit distance is lev_{a,b}(|a|, |b|), taken at q = |a| and p = |b|.

When min(q, p) = 0, one of the prefixes of a and b is empty, and only max(q, p) single-character editing operations are needed to turn one into the other, so their edit distance is max(q, p), the larger of q and p.

When min(q, p) ≠ 0, lev_{a,b}(q, p) is the minimum of three cases: 1. lev_{a,b}(q-1, p) + 1, representing deletion of a_q; 2. lev_{a,b}(q, p-1) + 1, representing insertion of b_p; 3. lev_{a,b}(q-1, p-1) + 1(a_q ≠ b_p), representing substitution of a_q with b_p. Here 1(a_q ≠ b_p) is the indicator function: it takes the value 0 when a_q = b_p, and 1 when a_q ≠ b_p.
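The recurrence above corresponds to the standard dynamic-programming implementation sketched below; variable names are illustrative.

```python
# Standard dynamic-programming Levenshtein distance implementing the recurrence above.
def levenshtein(a: str, b: str) -> int:
    q, p = len(a), len(b)
    # dp[x][y] = lev_{a,b}(x, y): distance between first x chars of a, first y of b
    dp = [[0] * (p + 1) for _ in range(q + 1)]
    for x in range(q + 1):
        dp[x][0] = x          # base case: delete all x characters
    for y in range(p + 1):
        dp[0][y] = y          # base case: insert all y characters
    for x in range(1, q + 1):
        for y in range(1, p + 1):
            sub = 0 if a[x - 1] == b[y - 1] else 1  # indicator 1(a_x != b_y)
            dp[x][y] = min(dp[x - 1][y] + 1,        # deletion
                           dp[x][y - 1] + 1,        # insertion
                           dp[x - 1][y - 1] + sub)  # substitution (or match)
    return dp[q][p]

assert levenshtein("kitten", "sitting") == 3
```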
Fig. 1 is a schematic structural diagram of the training, verification, and use process of a neural network model according to an embodiment of the present application. As shown in fig. 1, a neural network model typically goes through three stages: training, verification, and use. In the training stage, training data is input into the neural network model and the network parameters and/or structure are adjusted until the training output result converges, at which point training is complete. The verification stage then begins: verification data is input into the trained neural network model to obtain a verification output result. If the verification output result meets the convergence condition, the current trained neural network model can enter the use stage; otherwise, further training is required based on the verification result. In the use stage, the usage data is input into the neural network model to obtain the usage output result. The training data, verification data, and usage data are all different; the computing device may withhold a portion of the training data from training to serve as verification data.
In the NLP field, related technical problems are generally solved with neural network models. Training a neural network model requires inputting a large amount of training data, training the model, and adjusting the network parameters and/or structure over many iterations until the model converges. However, training data (e.g., text data) carries strong user privacy, and collecting it requires purchase or compliant acquisition, so training data is scarce; as a result, the trained neural network model degrades markedly and its output is unsatisfactory.
To address this problem, the text data may be augmented: the original data is data-enhanced to obtain enhancement data, and both the original data and the enhancement data are used as training data. Because enhancement data is obtained by adding extra words or semantic information to the original data, it introduces noise. Noise both degrades the model training results and causes data offset. The noise could be handled by filtering the enhancement data in advance, but filtering leaves the data underutilized and limits the full effect of the enhancement data; moreover, deviations in the filtering criteria introduce their own data bias.
In an exemplary embodiment, text data is required as training data for the neural network model, and when the collected text data is limited, data augmentation based on the existing text is needed to obtain enhanced text. The enhanced text data may carry noise, affecting the training results and introducing deviation, while screening the enhanced text data limits its effect.
In view of the above problems, the embodiments of the present application disclose a model training method in which a computing device divides training on raw data and enhancement data into two phases. In the first phase, the computing device inputs the raw data into the first neural network model for training and adjusts the model to obtain a pre-training model. In the second phase, training is divided into multiple steps: the enhancement data participating in each step is dynamically adjusted and input into the model step by step until all enhancement data has been trained. Because the enhancement data and the original data are dynamically blended before being input into the model, the model indices can still improve even if the enhancement data contains noise, the training effect is good, and the data-offset problem is alleviated. In addition, during each training step the enhancement data can be evaluated, abnormal evaluation results screened out, and the corresponding abnormal pieces excluded from the next training step; this dynamic filtering further improves the trained model.
Fig. 2 is a schematic structural diagram of a model training method according to an embodiment of the present application. As shown in fig. 2, in the embodiment of the present application, the training of the first neural network model may be divided into two phases: first-stage training and second-stage training. In the first stage, training uses the original data; in the second stage, the original data and the enhancement data are dynamically fused and screened, and training proceeds in steps, adding enhancement data batch by batch at each training step until all enhancement data has been trained, yielding the trained model.
In an embodiment of the application, the training model is a neural network model that may be used for processing text data, such as a text content classification model, a text information extraction model, or a semantic understanding model. The neural network model may also be a network model for image processing, or one for speech analysis, and so on. The specific function and structure of the neural network model are not limited by the present application.
The first stage: the first neural network model is trained from the raw data.
The data set input to the first neural network model for training is the training data set. In the present application, the training data set may include raw data and total enhancement data. The raw data is the data acquired for training; there are n pieces of it, and it may be text data, for example text data collected by a computing device. The raw data is collected with the user's consent, i.e., collection occurs only where the collection method is lawful or the user has approved it. The total enhancement data is obtained by data enhancement of the original data; there are m pieces of it, and it likewise may be text data.
As shown in fig. 2, during first-stage training, the computing device may use a training method such as gradient descent to make the first neural network model converge. Given the original data, it is input into the first neural network model for processing to obtain a pre-output result. During training, the ideal output result corresponding to each piece of original data is known and can be regarded as the correct output of the first neural network model for that data. The computing device may calculate a loss value between the pre-output result and the ideal output result, adjust the parameters of the first neural network model, and input the raw data again so that the loss value decreases; iteration continues until the loss value no longer decreases, at which point the first neural network model is determined to have converged. For the loss function, loss value, gradient descent, and so on, refer to the related descriptions of these concepts in neural network training, which are not repeated here.
For example, the first neural network model may classify the input text data and extract text key information to obtain text labels and key information. For instance, after the raw data "your MM1234 flight from city A to city B will take off at 12:00 on October 1, 2022 …" is processed by the first neural network model, the pre-output result has the text label "flight" and the key information "departure place: city A; destination: city B; flight number: MM1234; take-off time: 2022-10-01 12:00". y pieces of original data can be input into the first neural network model as one batch to determine y pre-output results; the y ideal output results are known, and the y pre-output results together with the corresponding y ideal output results can be fed into the loss function to obtain a loss value. Here y is a positive integer, i.e., the batch_size, e.g., 16, 32, or 64. Training over all the raw data forms one epoch, which yields the loss values of all its batches; the epoch's loss value can be taken as the average of the batch loss values. The weight parameters for the next iteration are computed from the epoch loss, and the next epoch of iterative training begins, until the loss value of an epoch no longer decreases relative to the previous one (or several); first-stage training is then complete, yielding the first neural network model after first-stage training. A minimal sketch of such a loop follows.
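The sketch below shows such a first-stage loop in PyTorch; the model, stand-in data, and stopping test are placeholders rather than the patent's configuration.

```python
# Minimal first-stage training loop; all shapes, data, and the stopping rule are
# illustrative assumptions, not the patent's configuration.
import torch
from torch import nn

model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 2))
opt = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()
x, y = torch.randn(64, 8), torch.randint(0, 2, (64,))  # stand-in raw data + labels

prev_loss, batch_size = float("inf"), 16
for epoch in range(100):
    batch_losses = []
    for i in range(0, len(x), batch_size):
        logits = model(x[i:i + batch_size])           # pre-output result
        loss = loss_fn(logits, y[i:i + batch_size])   # vs. ideal output result
        opt.zero_grad()
        loss.backward()
        opt.step()                                    # one gradient-descent step
        batch_losses.append(loss.item())
    epoch_loss = sum(batch_losses) / len(batch_losses)  # mean over the batches
    if epoch_loss >= prev_loss:                         # loss no longer decreasing
        break
    prev_loss = epoch_loss
```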
After first-stage training is complete, the result of the final training pass may be taken as the pre-output result; that is, the pre-output result is the output of the first neural network model as it stands at the end of first-stage training, requiring no further adjustment.
And a second stage: the first neural network model is continuously trained through the original data and the enhanced data.
With the raw data determined (in the first stage), the computing device may determine the total enhancement data based on the raw data. The total enhancement data may number m pieces. The ratio of the number of raw data to total enhancement data is n:m, which may be 1:1, 1:5, 1:10, 1:20, and so on; the range of the ratio is not limited. Note that determining the total enhancement data and performing the pre-training may occur in either order; the present application does not limit the execution sequence. In addition, when determining the total enhancement data from the original data, the computing device generates each piece of total enhancement data from one piece of original data, so each piece of enhancement data corresponds to a piece of original data. Before the second-stage training steps begin, the amount of total enhancement data that has been through training is 0, and the amount not yet trained is m.
In the case that the original data is text data, the data enhancement can be performed by performing synonym replacement on the original data, or performing back translation or the like through a data enhancement process in the NLP field, so as to obtain total enhancement data, and specific reference may be made to the related description of the data enhancement process, which is not repeated here.
In the above process, if the original data is scarce and the total enhancement data is large, training on all of it at once cannot guarantee the accuracy and effectiveness of the training result: an excessive ratio of total enhancement data to original data causes problems such as model deterioration. Therefore, second-stage training may proceed in steps, with the training data of each step consisting of the original data and part of the enhancement data; as the number of steps increases, more enhancement data is added to training. In each step's training process, the part of the enhancement data to be trained first and the proportion of original to enhancement data are determined before training.
As shown in fig. 2, after the first-stage training of the first neural network model is complete, the second-stage training process may begin. Second-stage training is performed in Z steps, where Z is a positive integer. In each training step the computing device trains with all of the original data plus a further portion of the enhancement data; the data added at each step is drawn from the total enhancement data that has not yet been trained. After the Z steps, all of the total enhancement data has been input into the first neural network model for training.
Within the Z-step training process, when the first neural network model converges in one step, the corresponding output data is determined and that step's training is complete. The next step's training then continues from the output data and the first neural network model of the previous step; that is, the input data of each step is the output data of the previous step. The output data may include the training model, the non-added enhancement data, the added-and-filtered enhancement data, evaluation results, and so on. Taking the i-th training step as an example, its input is the (i-1)-th data, which may include the (i-1)-th training model, the (i-1)-th evaluation result, the (i-1)-th non-added data, and the (i-1)-th filtered data. Optionally, the (i-1)-th data may also include the (i-1)-th verification result.
The (i-1)-th training model is the first neural network model trained in step i-1; the (i-1)-th evaluation result is the result of evaluating the (i-1)-th non-added data with the (i-1)-th training model; the (i-1)-th added data is the total enhancement data that has already participated in training (been added to the training set) in the first i-1 steps; the (i-1)-th non-added data is the total enhancement data that has not participated in training in the first i-1 steps; the (i-1)-th filtered data is the result of evaluating and screening the (i-1)-th added data in the first i-1 steps. The output of the i-th training step is the i-th data, which may include the i-th training model, the i-th evaluation result, the i-th non-added data, and the i-th filtered data; optionally it may further include the i-th verification result. The i-th data is used for training at step i+1. i is an integer from 1 to Z.
For the first training step, the input may include the pre-training data and the total enhancement data, where the pre-training data may include the pre-training model (which can be understood as the 0-th training model, i.e., the first neural network model output by first-stage training); the 0-th non-added data is the full total enhancement data and the 0-th filtered data is 0 pieces, since no enhancement data has yet been trained. The input to the first training step therefore need not include filtered data or evaluation results. The output of the second stage as a whole is the Z-th training model.
In the embodiment of the present application, the (i-1)-th training model always refers to the first neural network model trained in step i-1; this is not repeated below.
In the training process of each step, the basic ideas are the same, and the i-th training is taken as an example and is explained in detail below.
As shown in FIG. 2, the training module may include a data ordering module, an enhanced data determination module, a model training module, and a result evaluation module. The training module may perform the training process of step i, and the following describes the execution process of each module specifically:
The data sorting module can sort the i-1 non-added data based on the i-1 evaluation result to obtain an i-1 data sequence.
The data sorting module may receive the (i-1)-th evaluation result and the (i-1)-th non-added data. The (i-1)-th non-added data is the total enhancement data that has not participated in training in the first i-1 steps (enhancement data not yet added to the training set). The (i-1)-th evaluation result is the result of evaluating the (i-1)-th non-added data with the (i-1)-th training model.
Upon receiving the (i-1)-th non-added data and the (i-1)-th evaluation result output by the (i-1)-th training step, the data sorting module can calculate a first score value for each piece of enhancement data in the (i-1)-th non-added data based on the (i-1)-th evaluation result, and sort the data by the size of the first score value to obtain the (i-1)-th data sequence. The (i-1)-th data sequence is the (i-1)-th non-added data sorted in ascending order of first score value.
Specifically, the data sorting module may determine the first score value from each piece's text feature data and its evaluation result under the previous step's model: a first scoring index is determined from the text feature data, a second scoring index from the previous training model's evaluation result, and the two are summed to obtain the first score value.
First, a first scoring index score1 is determined based on text feature data of each piece of enhancement data in the i-1 th non-added data:
The data sorting module may obtain the text feature data of each piece of enhancement data in the (i-1)-th non-added data. The parameters of the text feature data may include one or more of: the text length l, the number of entities c, the number of text clauses s, and the distance(t_aug, t_ori) between the enhancement data t_aug and its original data t_ori.

Here the number of entities c is the number of extracted information points. In a particular training model, the computing device can extract information from each piece of enhancement data and count the useful information points extracted. For example, for the enhancement data "[express] Dear client, your express has arrived at the station; pickup code 1234; please collect it", the extracted information points include three items: "express", "station", and "pickup code 1234". The amount of information extracted may vary from text to text. The text length l is the number of characters in the piece of text. The number of text clauses s is the number of sentences into which a piece of text is divided by punctuation. distance(t_aug_i-1, t_ori_i-1) may be the edit distance, i.e., the minimum number of editing operations required to change the string t_aug_i-1 into t_ori_i-1; the larger the distance, the more different the two strings are. The permitted editing operations are replacing one character with another, inserting a character, and deleting a character. For the specific calculation of the edit distance, refer to the description above, which is not repeated.
Optionally, the first scoring index score1 is a weighted sum of the parameters of the text feature data. Based on the text length l, entity number c, text clause number s, and edit distance, the data sorting module may determine the first scoring index as score1 = αl + βc + γs + distance(t_aug_i-1, t_ori_i-1), where α, β, and γ are weight coefficients preset by the computing device and determined according to training requirements; the application does not limit them. Here t_aug_i-1 is the text string of one piece of enhancement data in the (i-1)-th non-added data, and t_ori_i-1 is the text string of the original data corresponding to that piece; since the piece of enhancement data t_aug_i-1 was obtained by data enhancement of the original data t_ori_i-1, the two correspond. Of course, the first scoring index may also use any subset of these terms, e.g., score1 = αl + βc; score1 = αl + γs; score1 = αl + distance(t_aug_i-1, t_ori_i-1); score1 = γs + distance(t_aug_i-1, t_ori_i-1); score1 = βc + distance(t_aug_i-1, t_ori_i-1); score1 = βc + γs; score1 = αl + βc + distance(t_aug_i-1, t_ori_i-1); score1 = αl + γs + distance(t_aug_i-1, t_ori_i-1); score1 = αl + βc + γs; or score1 = βc + γs + distance(t_aug_i-1, t_ori_i-1); the application is not limited here.
Optionally, the first scoring index score1 is a normalized sum of the parameters of the text feature data: score1 = l/l_max + c/c_max + s/s_max + distance(t_aug_i-1, t_ori_i-1)/distance_max. Here l_max, c_max, s_max, and distance_max may be preset values (all positive) or the maxima over all parameters; the application is not limited. l/l_max normalizes the text length; c/c_max the number of entities; s/s_max the number of text clauses; and distance(t_aug_i-1, t_ori_i-1)/distance_max the edit distance. Of course, the first scoring index may again use any subset of these normalized terms, e.g., score1 = l/l_max + c/c_max; score1 = l/l_max + s/s_max; score1 = l/l_max + distance(t_aug_i-1, t_ori_i-1)/distance_max; score1 = c/c_max + s/s_max; score1 = c/c_max + distance(t_aug_i-1, t_ori_i-1)/distance_max; score1 = s/s_max + distance(t_aug_i-1, t_ori_i-1)/distance_max; score1 = l/l_max + c/c_max + s/s_max; and so on; the application is not limited here.
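A small sketch of both forms of score1; the weights, maxima, and sample feature values are assumed placeholders.

```python
# Sketch of the first scoring index score1 for one piece of enhancement data.
def score1_weighted(l, c, s, dist, alpha=1.0, beta=1.0, gamma=1.0):
    # score1 = alpha*l + beta*c + gamma*s + distance(t_aug, t_ori)
    return alpha * l + beta * c + gamma * s + dist

def score1_normalized(l, c, s, dist, l_max, c_max, s_max, dist_max):
    # each parameter divided by its preset (or observed) maximum
    return l / l_max + c / c_max + s / s_max + dist / dist_max

# text length 40, 3 entities, 2 clauses, edit distance 5 to its original text
print(score1_weighted(40, 3, 2, 5),
      score1_normalized(40, 3, 2, 5, l_max=120, c_max=10, s_max=8, dist_max=30))
```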
Second, a second score index score2 of each piece of enhancement data in the i-1 th non-added data is determined based on the evaluation result (i-1 st evaluation result) of the training model in the last step:
After the computing device completes the (i-1)-th training step, each piece of the (i-1)-th non-added data has a corresponding evaluation result: each piece of enhancement data in the (i-1)-th non-added data is evaluated with the step-(i-1) training model to obtain the (i-1)-th evaluation result logit_aug_i-1. For each piece of the (i-1)-th non-added data, the data sorting module can calculate the loss value loss(logit_aug_i-1, label_aug_i-1) and the confidence confidence(logit_aug_i-1, label_aug_i-1) between the output logit_aug_i-1 and the ideal result label_aug_i-1, and then combine the loss value and the confidence to obtain the second scoring index score2.
Here loss(x, y) is a loss value between x and y, representing the difference between them; it can be calculated in different ways, which the application does not limit; refer to the earlier description of loss values, which is not repeated. confidence(x, y) is the confidence that the evaluation result x corresponds to the ideal result y. When x and y are classification results or other discrete results such as labels, the confidence may be calculated with softmax. In the embodiment of the present application, the computing device stores the ideal results of all enhancement data; this is not repeated below.
Optionally, the second scoring index score2 is a weighted combination of the loss value and the confidence: the data sorting module may determine score2 = loss(logit_aug_i-1, label_aug_i-1) − μ·confidence(logit_aug_i-1, label_aug_i-1), where μ is a weight coefficient preset by the computing device and determined according to training requirements. If the weighted form is used here, it should be used consistently with the weighted summation in score1 above.
Optionally, the second scoring index score2 is a normalized combination of the loss value and the confidence: the data sorting module may determine score2 = loss(logit_aug_i-1, label_aug_i-1)/loss_max − confidence(logit_aug_i-1, label_aug_i-1)/confidence_max. Here loss_max and confidence_max may be preset values (both positive) or the maxima over all parameters; the application is not limited. loss(logit_aug_i-1, label_aug_i-1)/loss_max normalizes the loss value; confidence(logit_aug_i-1, label_aug_i-1)/confidence_max normalizes the confidence. If the normalized form is used here, it should be used consistently with the normalized summation in score1 above.
Finally, a first score value s_1 is determined based on the first scoring index score1 and the second scoring index score2:
Optionally, the first score value s_1 is a weighted sum of the first scoring index score1 and the second scoring index score2: the data sorting module may determine s_1 = score1 + ν·score2, where ν is a value preset by the computing device and determined according to training requirements. The weighted summation here should be consistent with the weighted summations used in score1 and score2 above.
Optionally, the first score value s_1 is the plain sum of the first scoring index score1 and the second scoring index score2: the data sorting module may determine s_1 = score1 + score2. The normalized summation here should be consistent with the normalized summations used in score1 and score2 above.
After determining the first score value of each piece of the (i-1)-th non-added data, the data sorting module may sort the (i-1)-th non-added data by s_1 to obtain the (i-1)-th data sequence. Specifically, the (i-1)-th non-added data is sorted by s_1 in ascending order: the smaller s_1 is, the earlier the piece appears; the larger s_1 is, the later it appears. The data sorting module may then send the (i-1)-th data sequence to the duty ratio determining module; correspondingly, the duty ratio determining module receives the (i-1)-th data sequence from the data sorting module. A compact sketch of this ordering step follows.
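The sketch below combines score1 with the loss-minus-confidence form of score2 and sorts ascending; function names and sample numbers are illustrative assumptions.

```python
# Compact sketch of the ordering step on the i-1 th non-added data.
def score2(loss_value, confidence, mu=1.0):
    # low loss and high confidence -> small score -> trained earlier
    return loss_value - mu * confidence

def first_score(s1, s2, v=1.0):
    return s1 + v * s2  # s_1 = score1 + v * score2

# each tuple: (enhancement text, score1, loss under step i-1 model, softmax confidence)
not_added = [("aug a", 2.1, 0.30, 0.95), ("aug b", 3.4, 1.20, 0.40)]
data_sequence = sorted(not_added,
                       key=lambda t: first_score(t[1], score2(t[2], t[3])))
print([t[0] for t in data_sequence])  # ['aug a', 'aug b'], easiest first
```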
In the above embodiment, the computing device determines the score value from the text feature data and the previous step's results; that is, the score value reflects both the characteristics of the enhancement data itself and the quality of its evaluation. Through this process, the remaining untrained enhancement data can be ordered before training: pieces with simpler text features and more accurate outputs are input into the neural network model earlier. In other words, enhancement data is added to the training set from easy to hard, so the better-quality data is used in the early phase of training when it has the most effect, making the training more stable and the trained model better.
In the above process, as i increases, the amount of (i-1)-th non-added data shrinks; when i reaches Z, the amount of non-added enhancement data is 0, i.e., no enhancement data remains to be output.
The enhancement data determining module determines the enhancement data input to the i-th training step and may include a duty ratio determining module and a scaling module. The enhanced data determination module may determine the ith training data based on the ith data sequence, the ith-1 filtered data, and the original data. The ith training data is the training set of the ith step, and may include original data and enhanced data.
The duty ratio determining module is used for determining the proportion (i-th enhancement duty ratio) of the enhancement data to be trained in the current step (i-th training) to all the enhancement data (total enhancement data).
The duty ratio determining module may determine the i-th enhancement ratio, i.e., the proportion, by count, of the enhancement data input at step i to the total enhancement data, and then determine the i-th incremental data from the (i-1)-th data sequence based on that ratio. The i-th enhancement ratio is the proportion of the enhancement data participating in the i-th training to the total enhancement data; the i-th incremental data is the newly added portion of enhancement data input to the first neural network model for training at step i.
In the embodiment of the application, over the training steps 1 through Z, each step's enhancement ratio is greater than or equal to the previous step's: for i from 1 to Z, λ(i) ≥ λ(i-1). Here the i-th enhancement ratio λ(i) is the enhancement ratio of the i-th training step, λ(i-1) is the (i-1)-th enhancement ratio determined during the (i-1)-th training step, and an enhancement ratio is the proportion of the enhancement data participating in that step's training to the m pieces of total enhancement data.
Based on the above description, two methods of determining the i-th enhancement ratio are described below, respectively.
Method 1: determining an ith enhancement duty cycle based on a function:
The duty ratio determining module can calculate the function proportion d of the current step, where Z and λ0 are set values. Z is the maximum training step: when i reaches Z, d = 1, the i-th enhancement ratio is 1, and all of the enhancement data is in training.
In the case of determining the function proportion d, the duty cycle determination module may determine the i-th enhancement duty cycle as the minimum value of the function proportion and 1, i.e., may determine that the i-th enhancement duty cycle of the i-th training step is λ (i) =min (1, d). Wherein the result of min (1, d) is a minimum between 1 and d.
Fig. 3 is a schematic diagram of the function between the i-th enhancement ratio and training step i according to an embodiment of the present application. As shown in fig. 3, as i increases, the function proportion d increases gradually and unevenly: the early increments are larger and the later increments smaller. When i reaches Z (here Z = 100), d reaches 1; for i ≥ Z, λ(i) is held at the maximum value 1. Under this function, as the training steps advance, the amount of newly added enhancement data shrinks, so the enhancement data trained early can substantially influence the model while the later training stabilizes it, giving a better training result. A hedged sketch of such a pacing function follows.
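The patent's exact pacing function d is not reproduced in the text above; as an assumption, the sketch below uses a root pacing curve with the same qualitative shape (large early increments, small late ones, d = 1 at i = Z).

```python
# Assumed pacing curve for d; the exact formula in the patent is not shown here.
import math

def enhancement_ratio(i: int, Z: int = 100, lambda0: float = 0.1) -> float:
    d = math.sqrt(i * (1 - lambda0 ** 2) / Z + lambda0 ** 2)  # assumed form of d
    return min(1.0, d)  # lambda(i) = min(1, d), capped at 1 for i >= Z

print([round(enhancement_ratio(i), 3) for i in (1, 25, 50, 100, 120)])
```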
Method 2: dynamically adjusting the ith enhancement duty cycle based on the results of the previous step of verification:
The duty ratio determining module may obtain the (i-1)-th verification result, which comes from verifying the first neural network model trained in the previous step (step i-1): the verification data is input into the current (i-1)-th training model, giving the verification result logit_dev. The ideal result of the verification data is label_dev, the correct output for the verification data, which the computing device may store in advance. From the (i-1)-th verification result logit_dev and the ideal result label_dev, a gap value k can be determined.
Here loss_pre is the loss value between the (i-2)-th verification result and the ideal result; it was already calculated (and stored) in the (i-1)-th training step and can be used directly in computing k. loss_post is the loss value between the (i-1)-th verification result and the ideal result, where the (i-1)-th verification result is obtained by inputting the verification data into the current (i-1)-th training model.
Likewise, the duty ratio determining module also needs to determine the function proportion d as in method 1; for the specific determination, refer to the description in method 1, which is not repeated.
After determining k and d, the duty ratio determining module may obtain the i-th enhancement ratio as λ(i) = max(1, 1+k)·d, where max(x, y) is the larger of x and y.
When determining the i-th enhancement ratio with method 2, the ratio is adjusted based on the previous step's verification of the training model, and k can be positive or negative. If loss_pre is greater than loss_post, k is positive: the loss value improved between the two steps, the i-th enhancement ratio becomes (1+k)·d, and more incremental data is added. If loss_pre is less than loss_post, k is negative: the loss degraded between the two steps, the i-th enhancement ratio is simply d, and incremental data grows only according to d. In this way, when verification shows improvement, the incremental data can be increased, raising output and training accuracy while maintaining training efficiency; when verification is poor, data is added according to d alone. A sketch follows.
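The exact formula for the gap value k is not shown in the text above; the sketch below assumes k is the relative loss improvement (loss_pre − loss_post)/loss_pre, which matches the signs discussed.

```python
# Sketch of method 2: scale d by the verification gap k.
def enhancement_ratio_dynamic(d: float, loss_pre: float, loss_post: float) -> float:
    k = (loss_pre - loss_post) / loss_pre   # assumed definition of the gap value k
    return max(1.0, 1.0 + k) * d            # lambda(i) = max(1, 1+k) * d

print(enhancement_ratio_dynamic(0.3, loss_pre=0.50, loss_post=0.40))  # improved: k > 0
print(enhancement_ratio_dynamic(0.3, loss_pre=0.40, loss_post=0.50))  # degraded: k < 0
```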
After determining the i-th enhancement ratio, the duty ratio determining module may determine the i-th incremental data from it and the (i-1)-th data sequence. The module knows the (i-1)-th enhancement ratio λ(i-1) from the previous step (the (i-1)-th training), and 1 − λ(i-1) is the proportion of the (i-1)-th non-added data to the total enhancement data. With m pieces of total enhancement data, the i-th incremental data is the first m·(λ(i) − λ(i-1)) pieces of the (i-1)-th data sequence, i.e., it numbers m·(λ(i) − λ(i-1)). The duty ratio determining module may then send the i-th incremental data to the scaling module; correspondingly, the scaling module receives the i-th incremental data from the duty ratio determining module. The i-th incremental data is enhancement data that did not participate in the first i-1 training steps and is now waiting to participate.
Illustratively, suppose one step of training is one epoch, so the i-th step is the i-th epoch and its training data includes the original data and the i-th incremental data. Suppose the total enhancement data numbers m = 10000 and Z is set to 20. If the i-th enhancement ratio λ(i) of the i-th epoch is 35.5% and the (i-1)-th enhancement ratio λ(i-1) is 30.0%, the newly input enhancement data is 35.5% − 30.0% = 5.5% of the total. The computing device may determine the number of i-th incremental data as 10000 × 5.5% = 550, namely the first 550 pieces of the (i-1)-th data sequence. The above example is not limiting.
In addition, the duty ratio determining module may obtain the (i-1)-th filtered data, and from the (i-1)-th filtered data, the i-th incremental data, and the (i-1)-th data sequence it may determine the i-th non-added data and the i-th added data. The i-th non-added data is the (i-1)-th non-added data (the (i-1)-th data sequence) with the i-th incremental data removed: removing the i-th incremental data from the (i-1)-th non-added data, what remains is the i-th non-added data. The i-th added data is the (i-1)-th filtered data plus the i-th incremental data. The duty ratio determining module may then send the i-th non-added data and the i-th added data to the result evaluation module; correspondingly, the result evaluation module receives them. Here the i-th non-added data is the enhancement data that has not participated in training in the first i training steps, the (i-1)-th non-added data is the enhancement data that did not participate in training in the first i-1 training steps, and the i-th added data is the enhancement data that has participated in training in the first i training steps; see the sketch after the example below.
Illustratively, suppose the (i-1)-th filtered data has 5000 pieces and the (i-1)-th data sequence has 5000 pieces, with 550 pieces of i-th incremental data. The i-th non-added data is then the 4450 pieces left after removing the 550 incremental pieces from the 5000 pieces of (i-1)-th non-added data, and the i-th added data is the 5000 pieces of (i-1)-th filtered data plus the 550 incremental pieces, 5550 pieces in total.
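A bookkeeping sketch of this partition, reproducing the counts of the example above; variable names are illustrative.

```python
# Split off the i-th incremental data and update the added / non-added sets.
def step_partition(seq_prev, filtered_prev, lam_i, lam_prev, m):
    n_inc = round(m * (lam_i - lam_prev))    # m * (lambda(i) - lambda(i-1))
    incremental = seq_prev[:n_inc]           # head of the i-1 th data sequence
    not_added = seq_prev[n_inc:]             # i-th non-added data
    added = filtered_prev + incremental      # i-th added data
    return incremental, not_added, added

inc, rest, added = step_partition(list(range(5000)), list(range(5000)),
                                  lam_i=0.355, lam_prev=0.30, m=10000)
print(len(inc), len(rest), len(added))  # 550 4450 5550, as in the example above
```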
The duty ratio determining module can thus determine the enhancement data participating in the current step's training, namely the i-th incremental data plus the (i-1)-th filtered data. Through the i-th enhancement ratio, the computing device determines which portion of the total enhancement data participates in training; since the ratio is adjustable, the amount of enhancement data is adjustable, improving both the training effect and the training efficiency.
The scaling module determines a ratio between the enhanced data and the original data over the amount of data trained.
The scaling module may determine the i-th training data once the original data, the (i-1)-th filtered data, and the i-th incremental data are known. The data participating in the i-th training includes the original data and the i-th added data, where the i-th added data comprises the (i-1)-th filtered data and the i-th incremental data.
Specifically, the scaling module may also receive the (i-1)-th verification result (see the description in the duty ratio determining module above, not repeated), determine the i-th training proportion based on it, and then adjust the proportion of original data to enhancement data participating in training accordingly to obtain the i-th training data. The i-th training proportion is the ratio, by count, of original data to enhancement data in the i-th training; the i-th training data is the training data formed by allocating the original data and the enhancement data (the i-th added data) according to the i-th training proportion.
Optionally, the scaling module may pre-store an initial count ratio of original to enhancement data, batch_ori : batch_aug, where batch_ori is the initial proportion value for original data and batch_aug for enhancement data. The scaling module may determine an adjustment value ∆ from k and batch_ori (with k the gap value between the verification results of steps i-1 and i-2; for its calculation see the duty ratio determining module above, whose result can be reused directly). Based on ∆, batch_ori, and batch_aug, the i-th training proportion is batch_ori(i) : batch_aug(i) = (batch_ori − ∆) : (batch_aug + ∆), i.e., batch_ori(i) = batch_ori − ∆ and batch_aug(i) = batch_aug + ∆. batch_ori and batch_aug are preset values, and the i-th training proportion is the count proportion of original to enhancement data in the i-th training.
Optionally, the scaling module may take the initial ratio batch_ori : batch_aug directly as the i-th training proportion batch_ori(i) : batch_aug(i). In that case the training proportion needs no adjustment, which simplifies the training process and improves training efficiency.
With the i-th training proportion determined, the computing device may determine the i-th training data from the i-th training proportion, the raw data, and the i-th added data: the count ratio of the i-th added data to the n pieces of original data in training is the i-th training proportion. This keeps the count ratio of original to enhancement data at a suitable size, e.g., around 1:1, ensuring that the enhancement data does not act excessively on the neural network training model, avoiding model deviation and preserving the output quality of the trained first neural network model.
Illustratively, assume a batch has 32 pieces of data, with the initial original-to-enhanced ratio batch_ori : batch_aug = 16:16. If ∆ is calculated to be 2, then batch_ori(i) : batch_aug(i) = 14:18. Suppose the current number of original data is n = 1400 and the number of enhancement data is 1200, so one epoch trains 2600 pieces of data. To keep the 14:18 ratio, the 1200 pieces of enhancement data can be partially repeated so that their total becomes 1800 (i.e., 600 pieces are read again). The i-th training data is then the 1400 pieces of original data plus 1800 pieces of enhancement data (600 of which are duplicates), as in the sketch below.
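A sketch of the oversampling arithmetic in this example; the function and variable names are illustrative.

```python
# Keep the per-batch original:enhanced ratio by repeating enhanced data.
def oversample_enhanced(n_ori, n_aug, batch_ori, batch_aug):
    target_aug = n_ori * batch_aug // batch_ori   # enhanced count matching the ratio
    repeats = max(0, target_aug - n_aug)          # enhancement pieces to re-read
    return target_aug, repeats

target, repeats = oversample_enhanced(1400, 1200, batch_ori=14, batch_aug=18)
print(target, repeats)  # 1800 600: 600 enhanced pieces are repeated
```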
The model training module inputs the ith training data into the ith-1 training model to train, and the ith training model is obtained.
The first neural network model trained at step i is the converged network model from step i-1. In the i-th training process, the model training module uses the i-th training data as the training set, inputs it into the (i-1)-th training model, and outputs a converged model, namely the i-th training model. During training, the parameters of the first neural network model are adjusted in the direction of decreasing loss until the loss value no longer decreases, at which point the i-th step's training is complete and the (i+1)-th step can begin. The (i-1)-th training model is the model obtained by training at step i-1 (the previous step); the i-th training model is the model obtained by training at step i (this step).
After the model training module obtains the ith training model, the ith training model can be sent to the result evaluation module; correspondingly, the result evaluation module may receive the ith training model from the model training module.
The result evaluation module evaluates the ith added data based on the ith training model to obtain an output result of the ith added data, and screens the ith added data based on the output result to obtain ith screened data. In addition, the result evaluation module can input the ith non-added data into the ith training model to obtain an ith evaluation result, and can also input the verification data into the ith training model to obtain an ith verification result.
Firstly, screening abnormal data in the ith added data to obtain ith screened data.
With the i-th added data and the i-th training model available, the result evaluation module can input the i-th added data into the i-th training model to obtain its output result, and screen the i-th added data based on that output to obtain the i-th filtered data. The i-th filtered data is the result of screening the enhancement data that has participated in training in the first i training steps.
The result evaluation module may first determine a third scoring indicator based on text feature data of the ith added data. And under the condition that the model training module finishes training to obtain an ith training model, the result evaluation module can input the ith added data into the ith training model to obtain an output result, and score the output result to obtain a fourth scoring index. The result evaluation module may then determine a second score value based on the third score indicator and the fourth score indicator.
First, a third score index score3 is determined based on the text feature data of each piece of enhancement data in the i-th added data:
Optionally, the third scoring index score3 is a weighted sum of the parameters of the text feature data of each piece of the i-th added data. Based on each piece's text length l, entity number c, text clause number s, and edit distance, the result evaluation module may determine score3 = αl + βc + γs + distance(t_aug_i, t_ori_i), where t_aug_i is a piece of the i-th added data and t_ori_i is its corresponding original data.
Optionally, the third scoring index score3 is a normalized sum of the parameters of the text feature data: score3 = l/l_max + c/c_max + s/s_max + distance(t_aug_i, t_ori_i)/distance_max. Here l_max, c_max, s_max, and distance_max may be preset values (all positive) or the maxima over all parameters; the application is not limited. l/l_max normalizes the text length; c/c_max the number of entities; s/s_max the number of text clauses; and distance(t_aug_i, t_ori_i)/distance_max the edit distance.
The specific parameters involved in the calculation process of the two methods may refer to the related description of the first scoring index in the data sorting module, which is not repeated.
Next, a fourth score index score4 is calculated based on the output result of the i-th added data:
The fourth scoring index score4 is computed from the loss value and confidence between each piece's output result logit_aug_i and its ideal result label_aug_i, where logit_aug_i is the output of the i-th training model for a piece of the i-th added data and label_aug_i is that piece's ideal result.
Optionally, the fourth scoring index score4 is a weighted combination of the loss value and the confidence: the result evaluation module may determine score4 = loss(logit_aug_i, label_aug_i) − μ·confidence(logit_aug_i, label_aug_i), where μ is a weight coefficient preset by the computing device and determined according to training requirements. The weighted form here should be consistent with the weighted summation in score3 above.
Optionally, the fourth scoring index score4 is a normalized combination of the loss value and the confidence: the result evaluation module may determine score4 = loss(logit_aug_i, label_aug_i)/loss_max − confidence(logit_aug_i, label_aug_i)/confidence_max. The normalized form here should be consistent with the normalized summation in score3 above.
The specific parameters involved in the calculation process of the two methods may refer to the related description of the second scoring index in the data sorting module, which is not repeated.
Finally, a second score value s_2 is determined based on the third scoring index score3 and the fourth scoring index score4:
Optionally, the second score value s_2 is a weighted sum of the third scoring index score3 and the fourth scoring index score4: the result evaluation module may determine s_2 = score3 + ν·score4, where ν is a value preset by the computing device and determined according to training requirements. The weighted summation here should be consistent with the weighted summations used in score3 and score4 above.
Optionally, the second score value s_2 is the plain sum of the third scoring index score3 and the fourth scoring index score4: the result evaluation module may determine s_2 = score3 + score4. The normalized summation here should be consistent with the normalized summations used in score3 and score4 above.
The calculation of the second score value parallels that of the first score value; for the specific evaluation method, refer to the data sorting module above, which is not detailed again.
After the result evaluation module obtains the ith added data, the ith added data can be filtered based on the output result of the ith added data, so as to obtain the ith filtered data.
The result evaluation module may determine abnormal data in the i-th added data based on its output result. Specifically, pieces of enhancement data whose second score value s_2 exceeds a first threshold (in the sense below) are determined to be abnormal, and the abnormal data is removed from the i-th added data to obtain the i-th filtered data. The output result of the i-th added data is the output obtained by inputting the i-th added data into the i-th training model.
Optionally, a second score value s is obtained for each piece of data in the ith added data 2 The result evaluation module may then determine the anomaly data. At a certain second score value s 2 In the case of a second score value significantly greater than the other i-th incremental data, this data may be determined as abnormal data. Specifically, s 2_n -s value >In the case of f (first threshold), s can be determined 2_n The corresponding enhancement data is abnormal data. Wherein s is 2_n Is one of the X pieces of data of the ith increment data, s value Second scoring value s for the X ith incremental data 2 F is a preset value.
Alternatively, the result evaluation module may determine the first threshold according to the σ, 2σ, and 3σ principles of the Gaussian probability density function and screen the abnormal data out of the i-th added data.
The above two ways are merely illustrative, and the method for determining abnormal data according to the embodiment of the present application is not limited.
After the abnormal enhancement data is determined, the abnormal data in the i-th added data can be removed, so that i-th screened data can be obtained.
Illustratively, if the i-th added data is 5550 pieces of which 2 are abnormal, the resulting i-th filtered data is 5548 pieces. In the next training step this avoids the poor model effect caused by abnormal data, so the convergence of the first neural network model can be optimized and the influence of outliers on the model avoided. A screening sketch follows.
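The sketch below shows the gap test against a preset f and an assumed 3-sigma variant; taking s_value as the mean of the second score values is an assumption, as is the 3-sigma fallback for choosing f.

```python
# Sketch of the anomaly screen on the i-th added data.
import statistics

def screen(added, scores, f=None):
    s_value = statistics.mean(scores)          # assumed reference value
    if f is None:
        f = 3 * statistics.pstdev(scores)      # assumed 3-sigma threshold
    return [d for d, s in zip(added, scores) if s - s_value <= f]

data = [f"aug {i}" for i in range(6)]
print(screen(data, [1.0, 1.1, 0.9, 1.2, 9.5, 1.0], f=3.0))  # drops the 9.5 outlier
```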
Inputting the ith non-added data into the ith training model to obtain an ith evaluation result.
Once the i-th training step is complete and the i-th training model obtained, the result evaluation module can input the i-th non-added data into the i-th training model to evaluate the data that has not participated in training; the output obtained is the i-th evaluation result, which is used for the sorting processing of the data sorting module in the next step (step i+1).
Thirdly, inputting the verification data into an ith training model to obtain an ith verification result.
The computing device stores verification data, and the result evaluation module can input it into the i-th training model to obtain the i-th verification result, which may be used by the enhancement data determining module in the next step (step i+1) to determine the incremental data and training data.
In the embodiment of the application, all enhancement data should have entered the training set before the Z-th training step. For example, the end of training may come after training H further steps once all enhancement data has been input into the training model as the training set, where H is an integer greater than 1 and less than Z. That is, from step 1 to step H the enhancement data gradually increases to its maximum, and by step H all enhancement data is input into the model for training; from step H+1 to step Z, the original data plus all enhancement data with the abnormal data removed serve as the training set. Abnormal data is thus screened dynamically during training, and training continues after all of it is removed, improving the effect of the trained model and safeguarding the model's output indices.
The above modules may be shared by the training of steps 1 to Z; therefore, the training result of the previous step is also retained, so that the data of the previous step, the enhancement data that has not yet participated in training, and the like can be determined directly. As the value of i increases from 1 to Z, the computing device trains the first neural network model in each step until the Z-th training step is completed.
In addition, among the 4 modules of the training module, the data sorting module, the enhanced data determining module and the result evaluating module are all optional. Any combination of these three modules may be used; for the methods formed by the various combinations, reference may be made to the above description, which is not repeated here. Further, the enhanced data determining module may include a duty-cycle determining module and/or a scaling module.
In order to evaluate the embodiment of the present application more accurately, a general training method is provided for comparison, and the training results are compared. The general method is a training method in which the original data and the enhancement data are directly input into the neural network model. The training method of the present application divides training into two stages: training on the original data in the first stage, and gradually adding the enhancement data in proportion in the second stage. For a neural network model for intent speculation and information extraction, the results of intent speculation and information extraction are compared as follows: with the general training method, the accuracy of the intent speculation result output by the trained neural network model is 95.53%, and the accuracy of information extraction is 97.27%; with the neural network model trained by the present application, the accuracy of the intent speculation result is 96.25%, and the accuracy of information extraction is 97.62%. Comparing the training results shows that the neural network model trained by the embodiment of the present application has higher accuracy, which indicates that adding the enhancement data step by step in different proportions at different steps of the training process ensures that the neural network model does not deviate significantly, so the training result is more stable.
In the above embodiment, the i-th delta data determined by the dynamically generated i-th enhancement duty cycle is added to the training set for training, and the training quantity is allocated between the original data and the enhancement data according to the dynamically changing i-th training ratio, so that the indexes of the training model can be improved and the data offset reduced. According to this method, delta data is added to the training set step by step; while ensuring that all the enhancement data participates in the training process, the gradual addition improves the reliability of model training and the model effect, makes full use of the enhancement data, avoids deviation of the training model, and improves the training effect. After each training step, abnormal noise data is screened out, which addresses the noise problem introduced by data enhancement and improves the training indexes of the model in data-scarce scenarios.
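The dynamic allocation between the original data and the enhancement data can be sketched as follows; because the exact formula of the gap value k appears only as an image in the original claims, the relative loss improvement used here is an assumption for illustration.

```python
def adjust_training_ratio(loss_pre, loss_post, batch_ori, batch_aug):
    """Compute the i-th training ratio batch_ori(i) : batch_aug(i).

    loss_pre  : loss between the (i-2)-th verification result and the ideal result
    loss_post : loss between the (i-1)-th verification result and the ideal result
    batch_ori, batch_aug : preset initial per-batch counts of original/enhanced data
    """
    k = (loss_pre - loss_post) / loss_pre   # gap value k (assumed form)
    delta = batch_ori * (1 - k)             # adjustment value from claim 3
    return batch_ori - delta, batch_aug + delta
```

Under this reading, when the verification loss stops improving (k close to 0) the allocation shifts quota from the original data toward the enhancement data, and when it improves strongly (k close to 1) the preset ratio is kept nearly unchanged.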
In connection with the structure of the model training method in fig. 2, fig. 4 is a schematic flowchart of another model training method according to an embodiment of the present application. As shown in fig. 4, in this model training method the computing device includes only the enhanced data determining module, the result evaluating module and the model training module; that is, it does not include the data sorting module in fig. 2. The differences between the following model training method and the method in fig. 2 are described below.
As shown in fig. 4, in the i-th training step, the (i-1)-th data input for the i-th training step includes the (i-1)-th non-added data, the (i-1)-th filtered data, the (i-1)-th verification result and the (i-1)-th training model. Because the computing device no longer performs sorting with the data sorting module, the (i-1)-th evaluation result is no longer input; therefore the (i-1)-th data does not include the (i-1)-th evaluation result.
As shown in fig. 4, after determining the i-th enhancement duty cycle λ(i), the duty-cycle determining module may determine m·(λ(i) − λ(i-1)) pieces of data among the (i-1)-th non-added data as the i-th delta data. In this case, this input of the duty-cycle determining module replaces the (i-1)-th data sequence in fig. 2; the (i-1)-th data to be added is the enhancement data that did not participate in training in the previous i-1 steps and is waiting to participate in training, i.e., the (i-1)-th data to be added is the (i-1)-th non-added data.
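A short sketch of this selection in the fig. 4 variant follows (the names are hypothetical); because no sorting module is present, the first pieces are taken in their stored order.

```python
def select_delta_data(non_added, m, lam_i, lam_prev):
    """Pick the i-th delta data from the (i-1)-th non-added data (fig. 4 variant)."""
    count = int(m * (lam_i - lam_prev))   # m * (lambda(i) - lambda(i-1)) new pieces
    delta = non_added[:count]             # taken in stored order; no sorting here
    remaining = non_added[count:]         # becomes the i-th non-added data
    return delta, remaining
```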
As shown in fig. 4, the result evaluation module may evaluate the i-th added data based on the i-th training model to obtain the output result of the i-th added data, and reject the abnormal data in the i-th added data based on that output result to obtain the i-th filtered data. In addition, the result evaluation module may also input the verification data into the i-th training model to obtain the i-th verification result. In this case, the result evaluation module does not need to input the i-th non-added data into the i-th training model as in fig. 2 to obtain the i-th evaluation result.
In the method embodiment shown in fig. 4, the sorting process is eliminated; while the training effect can still be ensured, this training method executes more efficiently than the training method of the computing device shown in fig. 2.
In the embodiment of the present application, the training process may be completed by a computing device such as a cloud device, for example, a server. After training is completed, the first neural network model can be delivered to the terminal device, which then enters the use stage.
The following describes the apparatus according to the embodiment of the present application.
Fig. 5 is a schematic hardware structure of a computing device 100 according to an embodiment of the present application.
Computing device 100 may include a processor 110, an external memory interface 120, an internal memory 121, a universal serial bus (Universal Serial Bus, USB) interface 130, a charge management module 140, a power management module 141, a battery 142, an antenna 1, an antenna 2, a mobile communication module 150, and a wireless communication module 160.
It should be understood that the architecture illustrated in the embodiments of the present application does not constitute a specific limitation on the computing device 100. In other embodiments of the present application, the computing device 100 may include more or fewer components than illustrated, or combine certain components, or split certain components, or arrange the components differently. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.
The processor 110 may include one or more processing units, such as: the processor 110 may include an application processor (Application Processor, AP), a modem processor, a graphics processor (Graphics Processing unit, GPU), an image signal processor (Image Signal Processor, ISP), a controller, a memory, a video codec, a digital signal processor (Digital Signal Processor, DSP), a baseband processor, and/or a Neural network processor (Neural-network Processing Unit, NPU), etc. Wherein the different processing units may be separate devices or may be integrated in one or more processors.
Among other things, the controller may be a neural hub and a command center of the computing device 100. The controller can generate operation control signals according to the instruction operation codes and the time sequence signals to finish the control of instruction fetching and instruction execution.
A memory may also be provided in the processor 110 for storing instructions and data. In some embodiments, the memory in the processor 110 is a cache memory. The memory may hold instructions or data that the processor 110 has just used or uses cyclically. If the processor 110 needs to use the instructions or data again, it can call them directly from the memory, which avoids repeated accesses, reduces the waiting time of the processor 110 and thus improves the efficiency of the system.
The charge management module 140 is configured to receive a charge input from a charger. The charging management module 140 may also power the computing device 100 through the power management module 141 while charging the battery 142. In an embodiment of the present application, the charge management module may include a battery charging module.
The power management module 141 is used for connecting the battery 142, and the charge management module 140 and the processor 110. The power management module 141 receives input from the battery 142 and/or the charge management module 140 and provides power to the processor 110, the internal memory 121, the external memory, the display 194, the camera 193, the wireless communication module 160, and the like.
The wireless communication functions of computing device 100 may be implemented by antenna 1, antenna 2, mobile communication module 150, wireless communication module 160, modem processor, baseband processor, and the like.
The antennas 1 and 2 are used for transmitting and receiving electromagnetic wave signals. Each antenna in computing device 100 may be used to cover a single or multiple communication bands. Different antennas may also be multiplexed to improve the utilization of the antennas.
The mobile communication module 150 may provide a solution for wireless communications, including 2G/3G/4G/5G, as applied on the computing device 100. The mobile communication module 150 may include at least one filter, switch, power amplifier, low noise amplifier (Low Noise Amplifier, LNA), etc. The mobile communication module 150 may receive electromagnetic waves from the antenna 1, perform processes such as filtering, amplifying, and the like on the received electromagnetic waves, and transmit the processed electromagnetic waves to the modem processor for demodulation. The mobile communication module 150 can amplify the signal modulated by the modem processor, and convert the signal into electromagnetic waves through the antenna 1 to radiate.
The modem processor may include a modulator and a demodulator. The modulator is used for modulating the low-frequency baseband signal to be transmitted into a medium-high frequency signal. The demodulator is used for demodulating the received electromagnetic wave signal into a low-frequency baseband signal. The demodulator then transmits the demodulated low frequency baseband signal to the baseband processor for processing. The low frequency baseband signal is processed by the baseband processor and then transferred to the application processor. The application processor outputs sound signals through an audio device (not limited to the speaker 170A, the receiver 170B, etc.), or displays images or video through the display screen 194.
The wireless communication module 160 may provide solutions for wireless communication including wireless local area network (Wireless Local Area Networks, WLAN) (e.g., wireless fidelity (Wireless Fidelity, wi-Fi) network), bluetooth (BT), global navigation satellite system (Global Navigation Satellite System, GNSS), frequency modulation (Frequency Modulation, FM), near field wireless communication technology (Near Field Communication, NFC), infrared technology (IR), etc., as applied on the computing device 100. The wireless communication module 160 may be one or more devices that integrate at least one communication processing module. The wireless communication module 160 receives electromagnetic waves via the antenna 2, modulates the electromagnetic wave signals, filters the electromagnetic wave signals, and transmits the processed signals to the processor 110. The wireless communication module 160 may also receive a signal to be transmitted from the processor 110, frequency modulate it, amplify it, and convert it to electromagnetic waves for radiation via the antenna 2.
The external memory interface 120 may be used to connect external memory cards, such as Micro SD cards, to enable expansion of the memory capabilities of the computing device 100. The external memory card communicates with the processor 110 through an external memory interface 120 to implement data storage functions. For example, files such as music, video, etc. are stored in an external memory card.
The internal memory 121 may be used to store computer executable program code including instructions. The processor 110 executes various functional applications of the computing device 100 and data processing by executing instructions stored in the internal memory 121. The internal memory 121 may include a storage program area and a storage data area. The storage program area may store an application program (such as a sound playing function, an image video playing function, etc.) required for at least one function of the operating system, etc. The storage data area may store data created during use of the computing device 100 (e.g., audio data, phonebooks, etc.), and so on.
As used in the above embodiments, the term "when …" may be interpreted to mean "if …" or "after …" or "in response to determination …" or "in response to detection …" depending on the context. Similarly, the phrase "at the time of determination …" or "if detected (a stated condition or event)" may be interpreted to mean "if determined …" or "in response to determination …" or "at the time of detection (a stated condition or event)" or "in response to detection (a stated condition or event)" depending on the context.
In the above embodiments, it may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When loaded and executed on a computer, produces a flow or function in accordance with embodiments of the present application, in whole or in part. The computer may be a general purpose computer, a special purpose computer, a computer network, or other programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another computer-readable storage medium, for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by a wired (e.g., coaxial cable, fiber optic, digital subscriber line), or wireless (e.g., infrared, wireless, microwave, etc.). The computer readable storage medium may be any available medium that can be accessed by a computer or a data storage device such as a server, data center, etc. that contains an integration of one or more available media. The usable medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., solid state disk), etc.
Those of ordinary skill in the art will appreciate that implementing all or part of the above-described method embodiments may be accomplished by a computer program to instruct related hardware, the program may be stored in a computer readable storage medium, and the program may include the above-described method embodiments when executed. And the aforementioned storage medium includes: ROM or random access memory RAM, magnetic or optical disk, etc.

Claims (15)

1. A model training method, the method being applied to a computing device, the method comprising:
the computing equipment inputs n pieces of original data into a first neural network model for training to obtain a pre-training model;
the computing equipment carries out data enhancement on the n pieces of original data to obtain m pieces of total enhancement data; the original data and the total enhancement data are text data;
the computing equipment takes the n pieces of original data and the m pieces of total enhancement data as training data, and trains the pre-training model in Z steps;
wherein, for the i-th step training, the computing device determines a function proportion d and determines an i-th enhancement duty cycle λ(i) based on the d; the enhancement duty cycle is the proportion of the enhancement data participating in the training of the step to the m pieces of total enhancement data; the d is determined according to a preset function of the i, the Z and a λ0; the Z and the λ0 are preset values;
the computing device determines i-th training data based on the λ(i), and inputs the i-th training data into the (i-1)-th training model for training to obtain an i-th training model; the value of i runs sequentially from 1 to Z; the λ(i) is greater than or equal to λ(i-1); the λ(i-1) is the (i-1)-th enhancement duty cycle determined in the (i-1)-th training step; the i-th training data adds the i-th delta data relative to the (i-1)-th training data; the i-th training data comprises the n pieces of original data; and the n, the m and the Z are all positive integers.
2. The method according to claim 1, wherein the computing device determines an i-th enhancement duty cycle λ (i) based on the d, in particular comprising:
the computing device determining an i-th enhancement duty cycle λ (i) =min (1, d) based on the d; or alternatively, the first and second heat exchangers may be,
the computing device obtains the (i-1)-th verification result, and determines a gap value k based on the (i-1)-th verification result logits_dev and the ideal result label_dev:
the computing device determining the d and determining an ith enhancement duty cycle λ (i) =max (1, 1+k) ·d based on the d and the k;
wherein the loss_pre represents a loss value between the (i-2)-th verification result and the ideal result; the loss_post represents a loss value between the (i-1)-th verification result and the ideal result; the (i-1)-th verification result is obtained by inputting verification data into the current (i-1)-th training model; the (i-2)-th verification result is obtained by inputting the verification data into the current (i-2)-th training model; the ideal result is the correct output result of the verification data.
3. The method according to claim 1, wherein the computing device determines an ith training data based on the λ (i), in particular comprising:
the computing device determining i-th delta data based on the λ(i); the i-th delta data is the enhancement data newly added to training in the i-th training step;
the computing device obtains the (i-1)-th verification result logits_dev, and determines a gap value k based on the (i-1)-th verification result and the ideal result label_dev:
wherein the loss_pre represents a loss value between the (i-2)-th verification result and the ideal result; the loss_post represents a loss value between the (i-1)-th verification result and the ideal result; the (i-1)-th verification result is obtained by inputting verification data into the current (i-1)-th training model; the (i-2)-th verification result is obtained by inputting the verification data into the current (i-2)-th training model; the ideal result is a correct output result of the verification data;
the computing device determines an adjustment value Δ = batch_ori·(1 − k) based on the k, and determines an i-th training ratio batch_ori(i) : batch_aug(i) = (batch_ori − Δ) : (batch_aug + Δ) based on an initial scale value batch_ori of the raw data, an initial scale value batch_aug of the enhancement data, and the k; wherein the batch_ori and the batch_aug are preset values; the i-th training ratio is the ratio between the original data and the enhancement data in the training quantity in the i-th training step;
the computing device determines an ith training data based on the ith training scale, the ith delta data, and the n pieces of raw data.
4. A method according to claim 3, wherein the computing device determines the i-th delta data based on the λ (i), comprising in particular:
the computing device determines the first m·(λ(i) − λ(i-1)) pieces of data in the (i-1)-th data to be added as the i-th delta data; the (i-1)-th data to be added is the enhancement data that did not participate in training in the previous i-1 steps and is waiting to participate in training.
5. The method according to claim 4, wherein, in the case where the (i-1)-th evaluation result and the (i-1)-th non-added data are obtained, the method further comprises: the computing device performs sorting processing on the (i-1)-th non-added data based on the (i-1)-th evaluation result to obtain an (i-1)-th data sequence, wherein the (i-1)-th evaluation result is a result obtained after the (i-1)-th non-added data is evaluated by the (i-1)-th training model; the (i-1)-th non-added data is the enhancement data that did not participate in training in the previous i-1 steps;
the computing device determining the first m·(λ(i) − λ(i-1)) pieces of data in the (i-1)-th data to be added as the i-th delta data specifically comprises:
the computing device determines the first m·(λ(i) − λ(i-1)) pieces of data in the (i-1)-th data sequence as the i-th delta data.
6. The method according to claim 5, wherein the computing device performs a sorting process on the i-1 non-added data based on the i-1 evaluation result to obtain an i-1 data sequence, specifically including:
the computing device determines a first scoring index score1 of each piece of enhancement data based on the text feature data of the (i-1)-th non-added data;
the computing device determines a second scoring index score2 for each piece of enhancement data in the i-1 th non-added data based on the i-1 th evaluation result;
the computing device determines a first score value s based on the first score index score1 and the second score index score2 1
The computing device is based on the first scoring value s 1 And sequencing the i-1 non-added data to obtain an i-1 data sequence.
7. The method of claim 6, wherein the computing device determines a first score index score1 for each piece of enhancement data based on the text feature data of the i-1 th non-added data, specifically comprising:
The computing device determines text feature data of each piece of enhancement data in the i-1 non-added data, and determines a first scoring index score1 as a weighted summation of parameters of the text feature data; the parameters of the text characteristic data comprise one or more of text length l, entity number c, text clause number s and editing distance;
the computing device determines a second score index score2 of each piece of enhancement data in the i-1 non-added data based on the i-1 evaluation result, and specifically includes:
the computing device determines a loss value and a confidence coefficient of each piece of enhanced data in the i-1-th non-added data based on the i-1-th evaluation result, and determines a second evaluation index score2 as a weighted sum of the loss value and the confidence coefficient;
the computing device determines a first score value s based on the first score index score1 and the second score index score2 1 The method specifically comprises the following steps:
the computing device determines a first scoring value s for each piece of enhancement data in the i-1 th non-added data 1 A weighted sum of the first score index score1 and the second score index score 2;
the computing device sorting the (i-1)-th non-added data based on the first score value s_1 to obtain the (i-1)-th data sequence specifically comprises:
the computing device sorts the (i-1)-th non-added data in ascending order of the first score value s_1 to obtain the (i-1)-th data sequence.
8. The method of claim 6, wherein the computing device determines a first score index score1 for each piece of enhancement data based on the text feature data of the i-1 th non-added data, specifically comprising:
the computing equipment determines text characteristic data of each piece of enhancement data in the i-1 non-added data, and determines a first scoring index score1 as normalized summation of various parameters of the text characteristic data; the parameters of the text characteristic data comprise one or more of text length l, entity number c, text clause number s and editing distance;
the computing device determines a second score index score2 of each piece of enhancement data in the i-1 non-added data based on the i-1 evaluation result, and specifically includes:
the computing device determines a loss value and a confidence coefficient of each piece of enhanced data in the i-1-th non-added data based on the i-1-th evaluation result, and determines a second evaluation index score2 as a normalized summation of the loss value and the confidence coefficient;
the computing device determining a first score value s_1 based on the first score index score1 and the second score index score2 specifically comprises:
the computing device determines the first score value s_1 of each piece of enhancement data in the (i-1)-th non-added data as a sum of the first score index score1 and the second score index score2;
the computing device sorting the (i-1)-th non-added data based on the first score value s_1 to obtain the (i-1)-th data sequence specifically comprises:
the computing device sorts the (i-1)-th non-added data in ascending order of the first score value s_1 to obtain the (i-1)-th data sequence.
9. The method of claim 5, wherein after the computing device determines the i-th delta data based on the λ (i), the method further comprises:
in the case where the i-th delta data in the (i-1)-th non-added data is obtained, the computing device eliminates the i-th delta data from the (i-1)-th non-added data and determines the remaining data as the i-th non-added data; the i-th non-added data is the enhancement data that did not participate in training in the previous i training steps; the (i-1)-th non-added data is the enhancement data that did not participate in training in the previous i-1 training steps;
And under the condition that an ith training model is obtained, the computing equipment inputs the ith non-added data into the ith training model to obtain an ith evaluation result.
10. A method according to claim 3, wherein the computing device determines an ith training data based on the ith training scale, the ith delta data and the n pieces of raw data, comprising:
in the case where the (i-1)-th filtered data is acquired, the computing device determines the i-th delta data and the (i-1)-th filtered data as the i-th added data; the i-th added data is the enhancement data that has already participated in training in the previous i training steps; the (i-1)-th filtered data is the data obtained by screening the enhancement data that already participated in training in the previous i-1 training steps;
the computing device determining an ith training data based on the ith added data, the n pieces of raw data, and the ith training scale; the ratio of the ith added data to the n pieces of original data in the training quantity is the ith training ratio.
11. The method of claim 10, wherein after the computing device determines the i-th delta data based on the λ (i), the method further comprises:
And under the condition that the ith added data is obtained and the ith training model is obtained, the computing equipment inputs the ith added data into the ith training model to obtain an output result of the ith added data, and screens the ith added data based on the output result to obtain ith screened data.
12. The method according to claim 11, wherein the computing device screens the i-th added data based on the output result to obtain i-th screened data, specifically comprising:
the computing device determines a second score value s_2 for the i-th added data;
the computing device determines the enhancement data whose second score value s_2 in the i-th added data is larger than a first threshold as abnormal data, and eliminates the abnormal data from the i-th added data to obtain the i-th filtered data.
13. The method of claim 12, wherein the computing device determining the second score value s_2 for the i-th added data specifically comprises:
the computing device determining a third scoring index score3 for each piece of enhancement data based on the text feature data of the ith added data;
The computing device determines a fourth score index score4 of each piece of enhancement data in the ith added data based on the output result of the ith added data;
the computing device determines the second score value s_2 based on the third score index score3 and the fourth score index score4.
14. A computing device, comprising: one or more processors and one or more memories; the one or more processors being coupled with the one or more memories, the one or more memories being configured to store computer program code, the computer program code comprising computer instructions that, when executed by the one or more processors, cause the computing device to perform the method of any of claims 1-13.
15. A computer-readable storage medium comprising instructions that, when executed on a computing device, cause the computing device to perform the method of any of claims 1-13.
CN202211715713.7A 2022-12-30 2022-12-30 Model training method and computing equipment Active CN115688868B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211715713.7A CN115688868B (en) 2022-12-30 2022-12-30 Model training method and computing equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211715713.7A CN115688868B (en) 2022-12-30 2022-12-30 Model training method and computing equipment

Publications (2)

Publication Number Publication Date
CN115688868A CN115688868A (en) 2023-02-03
CN115688868B true CN115688868B (en) 2023-10-20

Family

ID=85056988

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211715713.7A Active CN115688868B (en) 2022-12-30 2022-12-30 Model training method and computing equipment

Country Status (1)

Country Link
CN (1) CN115688868B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117494672A (en) * 2023-11-13 2024-02-02 北京大学长沙计算与数字经济研究院 Method and device for generating industry document and computer readable storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110543645A (en) * 2019-09-04 2019-12-06 网易有道信息技术(北京)有限公司 Machine learning model training method, medium, device and computing equipment
WO2020220539A1 (en) * 2019-04-28 2020-11-05 平安科技(深圳)有限公司 Data increment method and device, computer device and storage medium
WO2021139250A1 (en) * 2020-07-31 2021-07-15 平安科技(深圳)有限公司 Data enhancement model training method and apparatus
CN113407842A (en) * 2021-06-28 2021-09-17 携程旅游信息技术(上海)有限公司 Model training method, method and system for obtaining theme recommendation reason and electronic equipment
CN114398893A (en) * 2021-12-15 2022-04-26 北京易康医疗科技有限公司 Clinical data processing model training method and device based on contrast learning
CN114637847A (en) * 2022-03-15 2022-06-17 平安科技(深圳)有限公司 Model training method, text classification method and device, equipment and medium


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Text Data Augmentation for Deep Learning;Connor Shorten等;《Journal of big data》;第1-34页 *

Also Published As

Publication number Publication date
CN115688868A (en) 2023-02-03

Similar Documents

Publication Publication Date Title
WO2020143844A1 (en) Intent analysis method and apparatus, display terminal, and computer readable storage medium
CN108399428B (en) Triple loss function design method based on trace ratio criterion
WO2020083073A1 (en) Non-motorized vehicle image multi-label classification method, system, device and storage medium
CN110033008B (en) Image description generation method based on modal transformation and text induction
CN113313022B (en) Training method of character recognition model and method for recognizing characters in image
CN112528637B (en) Text processing model training method, device, computer equipment and storage medium
CN113255755A (en) Multi-modal emotion classification method based on heterogeneous fusion network
WO2020155619A1 (en) Method and apparatus for chatting with machine with sentiment, computer device and storage medium
CN108563791A (en) A kind of construction quality complains the method and system of text classification
CN114973062A (en) Multi-modal emotion analysis method based on Transformer
WO2020151690A1 (en) Statement generation method, device and equipment and storage medium
CN111523640A (en) Training method and device of neural network model
CN112633420B (en) Image similarity determination and model training method, device, equipment and medium
CN114445831A (en) Image-text pre-training method, device, equipment and storage medium
CN115688868B (en) Model training method and computing equipment
WO2024098524A1 (en) Text and video cross-searching method and apparatus, model training method and apparatus, device, and medium
WO2024098623A1 (en) Cross-media retrieval method and apparatus, cross-media retrieval model training method and apparatus, device, and recipe retrieval system
CN112307048B (en) Semantic matching model training method, matching method, device, equipment and storage medium
CN112002311A (en) Text error correction method and device, computer readable storage medium and terminal equipment
JP2021081713A (en) Method, device, apparatus, and media for processing voice signal
CN114579746A (en) Optimized high-precision text classification method and device
CN113469338B (en) Model training method, model training device, terminal device and storage medium
WO2022246986A1 (en) Data processing method, apparatus and device, and computer-readable storage medium
CN113870863A (en) Voiceprint recognition method and device, storage medium and electronic equipment
CN117725220A (en) Method, server and storage medium for document characterization and document retrieval

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant