WO2020168843A1

WO2020168843A1 - Model training method and apparatus based on disturbance samples

Info

Publication number: WO2020168843A1
Application number: PCT/CN2020/070290
Authority: WO
Inventors: 林建滨
Original assignee: 阿里巴巴集团控股有限公司
Priority date: 2019-02-22
Filing date: 2020-01-03
Publication date: 2020-08-27
Also published as: CN110033094A

Abstract

Provided are a method and apparatus for acquiring a disturbance sample set and training a model on the basis of the disturbance sample set. The method for acquiring a disturbance sample set comprises: calculating a mean square error of feature values of each dimension in a plurality of feature vectors respectively corresponding to a plurality of initial samples (S202); and for each dimension in each of the plurality of feature vectors, generating a corresponding random number, and updating the current feature value of the dimension of the feature vector to be the sum of the current feature value and the corresponding random number so as to generate a plurality of disturbance samples respectively corresponding to the plurality of feature vectors, and thereby acquiring a disturbance sample set (S204), wherein a value range of each random number is determined based on the product of a predetermined first parameter and a mean square error of feature values of a dimension corresponding to the random number.

Description

Model training method and device based on disturbance samples

Technical field

The embodiments of this specification relate to the field of machine learning, and more specifically to a method and device for obtaining a disturbance sample set based on an initial sample set, a method and device for obtaining a model training sample set, a method and device for obtaining a model training sample set and a test sample set, and a model training method and device based on disturbance samples.

Background art

Deploying a machine learning model in a real environment raises various challenges, one of the most important being the stability of the model. Take a speech recognition model as an example. The data used to train a machine learning model has usually been properly processed and noise-reduced. In a real environment, however, the conditions the model faces are complex: a noisy setting or microphone echo, for example, makes the data to be processed noisy and therefore inconsistent with the training data, which causes large changes in the accuracy of the model. Improving the robustness of machine learning models is therefore of great significance for their practical application. At present, machine learning algorithms generally use L1 regularization and L2 regularization to enhance the robustness of a model. Both methods achieve robustness by limiting the search space of the model parameters (the absolute values of the parameters).

Therefore, a more effective model training method for enhancing the robustness of the model is needed.

Summary of the invention

The embodiments of this specification aim to provide a more effective model training method to solve the deficiencies in the prior art.

To achieve the above objective, one aspect of this specification provides a method for obtaining a disturbance sample set based on an initial sample set, the initial sample set includes a plurality of initial samples, and each initial sample includes a corresponding feature vector, the method includes:

Calculating the standard deviation (mean square deviation) of the feature values of each dimension across the plurality of feature vectors respectively corresponding to the plurality of initial samples; and

For each dimension of each of the plurality of feature vectors, generating a corresponding random number and updating the current feature value of that dimension of the feature vector to the sum of the current feature value and the corresponding random number, so as to generate a plurality of disturbance samples respectively corresponding to the plurality of feature vectors, thereby obtaining a disturbance sample set, wherein the value range of each random number is determined based on the product of a predetermined first parameter and the standard deviation of the feature values of the dimension corresponding to that random number.

In an embodiment, the random number is a Gaussian-distributed random number whose standard deviation is the product of the first parameter and the standard deviation of the feature values of the dimension corresponding to the random number.

In one embodiment, the random number is a uniformly distributed random number whose value range lies between plus and minus a first value, wherein the first value is the product of the first parameter and the standard deviation of the feature values of the dimension corresponding to the random number.

Another aspect of this specification provides a method for obtaining a model training sample set based on an initial sample set, wherein the initial sample set includes a plurality of initial samples, and the method includes:

Obtain a disturbance sample set by the above method, the disturbance sample set including a plurality of disturbance samples respectively corresponding to the plurality of initial samples; and

By combining the multiple initial samples with the multiple disturbance samples, a training sample set is obtained.

Another aspect of this specification provides a method for obtaining a model training sample set and a test sample set based on an initial sample set, wherein the initial sample set includes a plurality of initial samples, and the method includes:

Obtain a disturbance sample set by the above method for obtaining a disturbance sample set, the disturbance sample set including a plurality of disturbance samples respectively corresponding to the plurality of initial samples;

Obtaining a training sample set by merging part of the initial samples in the plurality of initial samples with part of the disturbance samples in the plurality of disturbance samples; and

The test sample set is obtained by merging at least part of the remaining initial samples in the plurality of initial samples with at least part of the remaining disturbance samples in the plurality of disturbance samples.

In one embodiment, the proportion of the partial initial samples in the plurality of initial samples is the same as the proportion of the partial disturbance samples in the plurality of disturbance samples.

In one embodiment, obtaining a test sample set by merging at least part of the remaining initial samples in the plurality of initial samples with at least part of the remaining disturbance samples in the plurality of disturbance samples includes: obtaining the test sample set by merging the remaining initial samples in the plurality of initial samples with the remaining disturbance samples in the plurality of disturbance samples.

In an embodiment, the partial initial samples respectively correspond to the partial disturbance samples.

The embodiment of this specification provides a model training method, including:

Acquiring an initial sample set, wherein the initial sample set includes a plurality of initial samples;

Obtaining, by the above method for obtaining a training sample set and a test sample set, a plurality of training sample sets and a plurality of test sample sets respectively corresponding to the plurality of training sample sets based on the initial sample set, wherein the plurality of training sample sets respectively correspond to a plurality of first parameters with different values;

Use the multiple training sample sets to train the current model respectively to obtain multiple updated models;

Using the multiple test sample sets to respectively evaluate corresponding update models, wherein the test sample set and the corresponding update model correspond to the same training sample set; and

Based on the evaluation result, an update model of the current model is determined among the multiple update models.

In one embodiment, the model is any of the following types of models: supervised learning models, unsupervised learning models, and reinforcement learning models.

Another aspect of this specification provides a device for obtaining a disturbance sample set based on an initial sample set, the initial sample set includes a plurality of initial samples, each initial sample includes a corresponding feature vector, and the device includes:

The calculation unit is configured to calculate the standard deviation of the feature values of each dimension across the plurality of feature vectors respectively corresponding to the plurality of initial samples; and

The generating unit is configured to generate a corresponding random number for each dimension of each of the plurality of feature vectors, and to update the current feature value of that dimension of the feature vector to the sum of the current feature value and the corresponding random number, so as to generate a plurality of disturbance samples respectively corresponding to the plurality of feature vectors, thereby obtaining a disturbance sample set, wherein the value range of each random number is determined based on the product of a predetermined first parameter and the standard deviation of the feature values of the dimension corresponding to that random number.

Another aspect of this specification provides a device for obtaining a model training sample set based on an initial sample set, wherein the initial sample set includes a plurality of initial samples, and the device includes:

The obtaining unit is configured to obtain a disturbance sample set through the above-mentioned device, the disturbance sample set including a plurality of disturbance samples respectively corresponding to the plurality of initial samples; and

The merging unit is configured to obtain a training sample set by merging the multiple initial samples with the multiple disturbance samples.

Another aspect of this specification provides a device for acquiring a model training sample set and a test sample set based on an initial sample set, wherein the initial sample set includes a plurality of initial samples, and the device includes:

The obtaining unit is configured to obtain a disturbance sample set through the foregoing apparatus for obtaining a disturbance sample set, the disturbance sample set including a plurality of disturbance samples respectively corresponding to the plurality of initial samples;

The first merging unit is configured to obtain a training sample set by merging part of the initial samples in the plurality of initial samples and part of the disturbance samples in the plurality of disturbance samples; and

The second merging unit is configured to obtain a test sample set by merging at least part of the remaining initial samples in the plurality of initial samples with at least part of the remaining disturbance samples in the plurality of disturbance samples.

In an embodiment, the second merging unit is further configured to obtain a test sample set by merging the remaining initial samples in the plurality of initial samples with the remaining disturbance samples in the plurality of disturbance samples.

Another aspect of this specification provides a model training device, including:

The first obtaining unit is configured to obtain an initial sample set, wherein the initial sample set includes a plurality of initial samples;

The second obtaining unit is configured to obtain, through the above-mentioned apparatus for obtaining a training sample set and a test sample set, a plurality of training sample sets and a plurality of test sample sets respectively corresponding to the plurality of training sample sets based on the initial sample set, wherein the plurality of training sample sets respectively correspond to a plurality of first parameters with different values;

The training unit is configured to use the multiple training sample sets to train the current model respectively to obtain multiple updated models;

The evaluation unit is configured to use the multiple test sample sets to respectively evaluate corresponding update models, wherein the test sample set and the corresponding update model correspond to the same training sample set; and

The determining unit is configured to determine the update model of the current model among the multiple update models based on the evaluation result.

Another aspect of this specification provides a computer-readable storage medium on which a computer program is stored. When the computer program is executed in a computer, the computer is caused to execute any of the above methods.

Another aspect of this specification provides a computing device including a memory and a processor, wherein the memory stores executable code, and when the processor executes the executable code, any one of the above methods is implemented.

The embodiments of this specification perturb the training data of the model to simulate data noise in a real environment, thereby increasing the robustness of the model. The model is trained and evaluated using the perturbed data, and the predetermined parameter of the model is determined based on the evaluation result, so that the improvement in the model's effectiveness on abnormal data is quantified. In addition, the embodiments of this specification place no restriction on the parameters of the machine learning model, so the learning ability of the model is not limited.

Description of the drawings

By describing the embodiments of this specification in conjunction with the accompanying drawings, the embodiments of this specification can be made clearer:

Fig. 1 shows a schematic diagram of a model training system 100 according to an embodiment of the present specification;

Fig. 2 shows a method for obtaining a disturbance sample set based on an initial sample set according to an embodiment of the present specification;

Fig. 3 schematically shows the calculation of the standard deviation of the feature values of one dimension across multiple feature vectors;

FIG. 4 shows n disturbance eigenvectors respectively corresponding to the n eigenvectors in FIG. 3;

FIG. 5 shows a flowchart of a method for obtaining a model training sample set based on an initial sample set according to an embodiment of the present specification;

Fig. 6 shows a flow chart of a method for acquiring a model training sample set and a test sample set based on an initial sample set according to an embodiment of the specification;

FIG. 7 shows a flowchart of a model training method according to an embodiment of the specification;

FIG. 8 shows an apparatus 800 for obtaining a disturbance sample set based on an initial sample set according to an embodiment of the present specification;

FIG. 9 shows an apparatus 900 for acquiring a model training sample set based on an initial sample set according to an embodiment of the present specification;

FIG. 10 shows an apparatus 1000 for acquiring a model training sample set and a test sample set based on an initial sample set according to an embodiment of the present specification;

FIG. 11 shows a model training device 1100 according to an embodiment of this specification.

Detailed description

The embodiments of this specification will be described below with reference to the drawings.

Fig. 1 shows a schematic diagram of a model training system 100 according to an embodiment of the present specification. As shown in Fig. 1, the system 100 includes a data processing module 11, a training module 12 and an evaluation module 13. In the data processing module 11, data set B is obtained by applying a disturbance to the value of each dimension of the feature vector of each sample in data set A. In the training module 12, at least part of the data is taken from data set A (for example, 70% of the data in data set A) and at least part of the data is taken from data set B (for example, 70% of the data in data set B); the two parts are merged to obtain a training data set, and the machine learning model is trained using that training data set. The machine learning model can be any model, for example the speech recognition model mentioned above, which is, for example, a supervised learning model or a reinforcement learning model. It can be understood that the model may also be an unsupervised learning model. In the evaluation module 13, the remaining data is taken from data set A (for example, 30% of the data in data set A) and from data set B (for example, 30% of the data in data set B); the two parts are merged to obtain a test data set, and the trained model is evaluated using the test data set.

The various processing procedures described above are described in detail below.

Fig. 2 shows a method for obtaining a disturbance sample set based on an initial sample set according to an embodiment of the present specification, the initial sample set includes a plurality of initial samples, each initial sample includes a corresponding feature vector, and the method includes:

In step S202, calculating the standard deviation of the feature values of each dimension across the plurality of feature vectors respectively corresponding to the plurality of initial samples; and

In step S204, for each dimension of each of the plurality of feature vectors, generating a corresponding random number and updating the current feature value of that dimension of the feature vector to the sum of the current feature value and the corresponding random number, so as to generate a plurality of disturbance samples respectively corresponding to the plurality of feature vectors, thereby obtaining a disturbance sample set, wherein the value range of each random number is determined based on the product of a predetermined first parameter and the standard deviation of the feature values of the dimension corresponding to that random number.

First, in step S202, the standard deviation of the feature values of each dimension across the plurality of feature vectors respectively corresponding to the plurality of initial samples is calculated.

The initial sample set is, for example, the data set A shown in Fig. 1. Data set A includes, for example, n initial samples, each of which includes its own feature vector. The feature vector is, for example, an m-dimensional vector, and the value of each dimension corresponds to one feature value. In addition, when the model to be trained is, for example, a supervised learning model, each sample also includes a corresponding label value.

The mean square deviation referred to here is the standard deviation, that is, the square root of the variance σ², and can be denoted σ. The variance of multiple samples is usually calculated by the following formula (1):

$$\sigma^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \mu)^2 \qquad (1)$$

where n is the total number of samples and μ is the mean of the n values x_i.

Fig. 3 schematically shows the calculation of the standard deviation of the feature values of one dimension across a plurality of feature vectors. In Fig. 3, assume that there are n feature vectors, each containing the feature values of m features, and that the feature value of the i-th dimension (feature i in the figure) of the j-th feature vector (vector j in the figure) is denoted x_ij, where i ∈ [1, m] and j ∈ [1, n]. The standard deviation σ_i of the feature values of the i-th dimension across the n feature vectors can then be calculated by the following formula (2):

$$\sigma_i = \sqrt{\frac{1}{n} \sum_{j=1}^{n} (x_{ij} - \mu_i)^2} \qquad (2)$$

where μ_i is calculated by the following formula (3):

$$\mu_i = \frac{1}{n} \sum_{j=1}^{n} x_{ij} \qquad (3)$$

Specifically, for example, for the feature values x_21, x_22, ..., x_2n of the second dimension (feature 2) of each vector, shown in the dashed box in Fig. 3, the mean of these n values is calculated by formula (3):

$$\mu_2 = \frac{1}{n} \sum_{j=1}^{n} x_{2j}$$

and their standard deviation is calculated by formula (2):

$$\sigma_2 = \sqrt{\frac{1}{n} \sum_{j=1}^{n} (x_{2j} - \mu_2)^2}$$
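The per-dimension calculation referred to as formulas (2) and (3) can be sketched in a few lines of Python. This is only an illustration: the function name `dimension_std` and the sample data are hypothetical, and the sketch assumes the population form of the standard deviation (division by n).

```python
import math


def dimension_std(vectors):
    """Return the standard deviation of each dimension over all vectors.

    `vectors` is a list of n feature vectors, each a list of m floats.
    Assumption: the population form (divide by n) is used.
    """
    n = len(vectors)
    m = len(vectors[0])
    sigmas = []
    for i in range(m):
        column = [vector[i] for vector in vectors]   # x_i1 ... x_in
        mu_i = sum(column) / n                       # mean of dimension i
        variance_i = sum((x - mu_i) ** 2 for x in column) / n
        sigmas.append(math.sqrt(variance_i))         # standard deviation sigma_i
    return sigmas


# Hypothetical data: three 2-dimensional feature vectors.
vectors = [[1.0, 10.0], [2.0, 10.0], [3.0, 10.0]]
sigmas = dimension_std(vectors)
```

A dimension whose value never changes (the second one here) has a standard deviation of zero, so it would receive no disturbance in step S204.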

Still referring to Fig. 3, suppose that for vector 1 in the figure, corresponding random numbers a_11, a_21, ..., a_m1 are generated for dimensions 1, 2, ..., m of the feature vector, and the current feature value x_i1 of each dimension is updated to x_i1 + a_i1, where i ∈ [1, m]. This yields the disturbance feature vector corresponding to feature vector 1 (shown in the dashed box in Fig. 4). The perturbation can be applied to every feature vector in the same way to obtain its corresponding disturbance feature vector. Fig. 4 shows the n disturbance feature vectors respectively corresponding to the n feature vectors in Fig. 3. Training the model with the disturbance feature vectors, that is, adding noise to the training samples, improves the model's resistance to noise and enhances its stability.

In one embodiment, each random number a_ij in Fig. 4, where i ∈ [1, m] and j ∈ [1, n], is drawn from a Gaussian random variable A = norm(0, λσ_i); that is, the mean of the random variable is 0 and its standard deviation is λσ_i, where σ_i is the standard deviation of the feature values of dimension i (feature i) calculated by formula (2) above, and λ is a predetermined parameter whose value may lie between 0.0001 and 0.1. For example, referring to Figs. 3 and 4, each a_2j corresponding to the feature 2 dimension in Fig. 4 is generated by A = norm(0, λσ_2), where σ_2 is the σ_2 shown in Fig. 3. As is well known to those skilled in the art, according to the probability density curve of a Gaussian variable, more than 99% (about 99.7%) of the possible values of a_ij fall within the interval [-3λσ_i, 3λσ_i]. In other words, the product λσ_i of the predetermined parameter λ and the standard deviation σ_i of the feature values of dimension i defines the value range of each random number a_ij of dimension i, where j ranges from 1 to n.
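The Gaussian embodiment can be illustrated with the following minimal sketch, which uses only the Python standard library. The function name `perturb_gaussian`, the seed and the sample data are hypothetical, and the population standard deviation (`statistics.pstdev`) is assumed for σ_i.

```python
import random
import statistics


def perturb_gaussian(vectors, lam, rng):
    """Return one disturbance vector per input vector.

    Each value x_ij receives additive noise a_ij drawn from a Gaussian
    with mean 0 and standard deviation lam * sigma_i, where sigma_i is
    the standard deviation of dimension i over all input vectors.
    """
    # zip(*vectors) regroups the data by dimension: one tuple per feature i.
    sigmas = [statistics.pstdev(column) for column in zip(*vectors)]
    return [
        [x + rng.gauss(0.0, lam * sigma) for x, sigma in zip(vector, sigmas)]
        for vector in vectors
    ]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
initial = [[1.0, 10.0], [2.0, 10.0], [3.0, 10.0], [6.0, 10.0]]
disturbed = perturb_gaussian(initial, lam=0.01, rng=rng)
```

The input vectors are left unmodified, so the initial and disturbance samples can later be merged into one training set.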

In an embodiment, each random number a_ij in Fig. 4 can be drawn from a uniformly distributed random variable B, for example one that is uniform over the range [-λσ_i, λσ_i]. In other words, the value range of each random number a_ij is limited by the product λσ_i. The value range of the uniform random variable B is not limited to [-λσ_i, λσ_i]; it can also be, for example, [-3λσ_i, 3λσ_i]. In addition, the random numbers a_ij are not limited to being drawn from the above Gaussian or uniform random variable; they can be drawn from any other random variable, such as a Poisson random variable, as long as the value range is limited by λσ_i.
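A corresponding sketch for the uniform embodiment is given below. The function name `perturb_uniform` and the `width` argument are hypothetical; `width=1.0` gives the range [-λσ_i, λσ_i] and `width=3.0` gives [-3λσ_i, 3λσ_i]. The population standard deviation is again assumed.

```python
import random
import statistics


def perturb_uniform(vectors, lam, rng, width=1.0):
    """Add noise drawn uniformly from [-width*lam*sigma_i, +width*lam*sigma_i]."""
    sigmas = [statistics.pstdev(column) for column in zip(*vectors)]
    disturbed = []
    for vector in vectors:
        new_vector = []
        for x, sigma in zip(vector, sigmas):
            limit = width * lam * sigma          # half-width of the noise range
            new_vector.append(x + rng.uniform(-limit, limit))
        disturbed.append(new_vector)
    return disturbed


rng = random.Random(1)  # fixed seed so the sketch is repeatable
initial = [[0.0, 5.0], [4.0, 7.0], [8.0, 9.0]]
disturbed = perturb_uniform(initial, lam=0.1, rng=rng)
```

Unlike the Gaussian case, every disturbance here is strictly bounded by the chosen half-width.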

Here, λ is used to balance the added noise against model performance. The smaller the value of λ, the smaller the value range of a_ij, that is, the smaller the disturbance applied. When the disturbance is too small, its effect on the feature vector is too small to improve the performance of the model; when the disturbance is too large, it degrades the prediction accuracy of the model. The value of λ is therefore important for improving model performance. In one embodiment, the value of λ can be determined by the specific environment in which the model is applied. For example, for a speech recognition model, λ can be set larger when the application environment is noisy, and smaller when the application environment is relatively quiet. In an embodiment, as will be described in detail below, the value of λ is determined by evaluating the trained models, that is, the λ that yields the better evaluation value is selected as the final λ for model training. In one embodiment, after λ has been determined while training the model with a previous batch of training samples, that λ value can be reused in subsequent rounds of training.
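As a worked illustration with assumed numbers (not taken from this specification), suppose the feature values of dimension i have standard deviation σ_i = 2.0. For the Gaussian embodiment, the noise standard deviation λσ_i and the approximate bound 3λσ_i then scale with λ as follows:

```latex
\begin{aligned}
\lambda = 0.0001 &: \quad \lambda\sigma_i = 0.0002, \quad 3\lambda\sigma_i = 0.0006 \\
\lambda = 0.01   &: \quad \lambda\sigma_i = 0.02,   \quad 3\lambda\sigma_i = 0.06 \\
\lambda = 0.1    &: \quad \lambda\sigma_i = 0.2,    \quad 3\lambda\sigma_i = 0.6
\end{aligned}
```

Because the disturbance is expressed relative to σ_i, the same λ produces proportionally larger noise on dimensions whose values vary more.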

After obtaining the disturbance samples corresponding to each initial sample of the initial sample set by the method shown in FIG. 2, the disturbance sample set can be obtained, so that the training sample set and the test sample set of the model can be obtained based on the initial sample set and the disturbance sample set.

Fig. 5 shows a flowchart of a method for acquiring a model training sample set based on an initial sample set according to an embodiment of the present specification, wherein the initial sample set includes a plurality of initial samples, and the method includes:

In step S502, a disturbance sample set is obtained based on the initial sample set by the method shown in FIG. 2, the disturbance sample set includes a plurality of disturbance samples respectively corresponding to the plurality of initial samples; and

In step S504, a training sample set is obtained by combining the multiple initial samples with the multiple disturbance samples.

First, in step S502, the method shown in Fig. 2 is used to generate a plurality of disturbance samples respectively corresponding to the plurality of initial samples, wherein the plurality of disturbance samples correspond to the same value of the first parameter. The plurality of disturbance samples are, for example, those shown in Fig. 4. As described above, each random number a_ij in the disturbance samples can be drawn from the Gaussian random variable A = norm(0, λσ_i); that is, the whole batch of disturbance samples corresponds to the same value of λ.

In step S504, a training sample set is obtained by merging the plurality of initial samples with the plurality of disturbance samples. When the trained model does not need to be evaluated, all the samples in the initial sample set and all the samples in the disturbance sample set can be merged to obtain the training sample set. Including both the initial sample set and the disturbance sample set in the training sample set enriches the training samples, so that the model can adapt to different real environments.
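The merge in step S504 could look like the sketch below for a supervised model. The function name `build_training_set` and the data are hypothetical, and the sketch assumes that each disturbance sample keeps the label value of the initial sample it was derived from, which this specification does not state explicitly.

```python
def build_training_set(initial_samples, disturbed_vectors):
    """Merge initial samples with their disturbance samples.

    `initial_samples` is a list of (feature_vector, label) pairs and
    `disturbed_vectors[j]` is the perturbed copy of sample j's vector.
    Assumption: a disturbance sample reuses the label of its initial sample.
    """
    if len(initial_samples) != len(disturbed_vectors):
        raise ValueError("expected one disturbance vector per initial sample")
    disturbance_samples = [
        (vector, label)
        for vector, (_, label) in zip(disturbed_vectors, initial_samples)
    ]
    return list(initial_samples) + disturbance_samples


# Hypothetical data: two labelled samples and their perturbed vectors.
initial_samples = [([1.0, 5.0], "yes"), ([2.0, 7.0], "no")]
disturbed_vectors = [[1.01, 4.98], [1.99, 7.03]]
training_set = build_training_set(initial_samples, disturbed_vectors)
```

The merged set is twice the size of the initial sample set, with the initial samples first and the disturbance samples after them.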

Fig. 6 shows a flow chart of a method for obtaining a model training sample set and a test sample set based on an initial sample set according to an embodiment of the present specification, wherein the initial sample set includes a plurality of initial samples, and the method includes:

In step S602, by using the method shown in FIG. 2, a disturbance sample set is obtained based on the initial sample set, the disturbance sample set including a plurality of disturbance samples respectively corresponding to the plurality of initial samples;

In step S604, a training sample set is obtained by merging part of the initial samples in the plurality of initial samples with part of the disturbance samples in the plurality of disturbance samples; and

In step S606, a test sample set is obtained by merging at least part of the remaining initial samples in the plurality of initial samples with at least part of the remaining disturbance samples in the plurality of disturbance samples.

In this method, after the disturbance sample set is obtained, part of the initial samples and part of the disturbance samples are merged to obtain the training sample set. In one embodiment, for example, 70% of the initial samples in the initial sample set and 70% of the disturbance samples in the disturbance sample set are merged to obtain a training sample set. The remaining samples, for example the remaining 30% of the initial samples and the remaining 30% of the disturbance samples, are then merged to obtain the test sample set corresponding to that training sample set. The 70% of initial samples and the 70% of disturbance samples in the training sample set may or may not correspond to each other one-to-one.

In an embodiment, the proportion of the partial initial samples among the plurality of initial samples may differ from the proportion of the partial disturbance samples among the plurality of disturbance samples. For example, when the actual application environment of the model is noisy, the training sample set may include a larger proportion of disturbance samples, such as 80% of all disturbance samples and 20% of all initial samples. Correspondingly, the test sample set can be configured in the same ratio, for example by including the remaining 20% of all disturbance samples together with initial samples, taken from the remaining initial samples, that amount to 5% of all initial samples.
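The splitting described for Fig. 6 can be sketched as follows. The function name `split_train_test`, its parameters and the placeholder samples are hypothetical. The sketch follows the embodiment in which all remaining samples go into the test set, and when both fractions are equal the initial and disturbance samples in the training set correspond to each other.

```python
import random


def split_train_test(initial, disturbed, frac_initial=0.7, frac_disturbed=0.7, seed=0):
    """Split paired initial/disturbance samples into training and test sets."""
    if len(initial) != len(disturbed):
        raise ValueError("expected one disturbance sample per initial sample")
    order = list(range(len(initial)))
    random.Random(seed).shuffle(order)           # random but repeatable order
    k_init = round(frac_initial * len(order))    # initial samples used for training
    k_dist = round(frac_disturbed * len(order))  # disturbance samples used for training
    train = [initial[j] for j in order[:k_init]] + [disturbed[j] for j in order[:k_dist]]
    test = [initial[j] for j in order[k_init:]] + [disturbed[j] for j in order[k_dist:]]
    return train, test


# Placeholder samples, tagged so their origin stays visible.
initial = [("init", j) for j in range(10)]
disturbed = [("dist", j) for j in range(10)]

train, test = split_train_test(initial, disturbed)  # 70% / 70%
train_noisy, test_noisy = split_train_test(
    initial, disturbed, frac_initial=0.2, frac_disturbed=0.8
)
```

To reproduce the equal-ratio test set of the 80%/20% example, the test set would additionally need to be subsampled; that step is omitted here.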

After the training sample set and the test sample set of the model are obtained as shown in Fig. 5 and Fig. 6, the model can be trained with the training sample set and evaluated with the test sample set. The following describes a method of selecting the predetermined parameter λ based on the evaluation of the model with the test sample set, so as to further optimize the model.

Fig. 7 shows a flow chart of a model training method according to an embodiment of the specification, including:

In step S702, an initial sample set is obtained, wherein the initial sample set includes a plurality of initial samples;

In step S704, multiple training sample sets and multiple test sample sets respectively corresponding to the multiple training sample sets are obtained based on the initial sample set by the method shown in Fig. 6, wherein the multiple training sample sets respectively correspond to multiple first parameters with different values;

In step S706, use the multiple training sample sets to train the current model respectively to obtain multiple updated models;

In step S708, use the multiple test sample sets to evaluate corresponding update models respectively, wherein the test sample set and the corresponding update model correspond to the same training sample set; and

In step S710, based on the evaluation result, an update model of the current model is determined among the multiple update models.

First, in step S702, an initial sample set is obtained, wherein the initial sample set includes a plurality of initial samples. The model is not limited to a specific type. As described above, it can be any of a supervised learning model, an unsupervised learning model and a reinforcement learning model. For example, the model is a speech recognition model as described above, which is, for example, a supervised learning model. In this case, speech can be recorded manually and the corresponding feature vector extracted from it, so that the feature vector and the label value (the semantics) are used as an initial sample of the model. However, the model may encounter different environments in practical applications, such as a quiet environment or various environments with different kinds of noise. Initial samples obtained manually in a single environment cannot simulate so many different environments, and the cost of obtaining samples manually in different environments is relatively high. Therefore, this method can be used to expand the samples based on the initial sample set to obtain the training sample set.

In step S704, multiple training sample sets and multiple test sample sets respectively corresponding to the multiple training sample sets are obtained based on the initial sample set by the method shown in Fig. 6, wherein the multiple training sample sets respectively correspond to multiple first parameters with different values.

That is to say, the method shown in Fig. 6 is performed multiple times, each time with a different value of the aforementioned predetermined parameter λ, to obtain multiple training sample sets and multiple test sample sets based on the initial sample set. For example, λ can be set to 0.0001, 0.001, 0.01 and 0.1 respectively, so that the influence of the magnitude of λ on model training can be determined. It can be understood that the values of λ are not limited to the foregoing values or to this number of values, and can be set according to the specific model. Specifically, for the above four λ values, four disturbance sample sets B_1, B_2, B_3 and B_4 can be obtained from the above initial sample set A by the method shown in Fig. 2. Based on these four disturbance sample sets, four pairs of sample sets (C_1, D_1), (C_2, D_2), (C_3, D_3) and (C_4, D_4) are obtained, where C_i denotes a training sample set and D_i denotes the corresponding test sample set.

In step S706, the multiple training sample sets are used to train the current model respectively to obtain multiple updated models.

In the above example, the training sample sets C₁, C₂, C₃, and C₄ are each used to train the current model, yielding four updated models M₁, M₂, M₃, and M₄, respectively.

In step S708, the multiple test sample sets are used to respectively evaluate the corresponding update models, where the test sample set and the corresponding update model correspond to the same training sample set.

In the above example, the test sample sets D₁, D₂, D₃, and D₄ are used to evaluate the four updated models M₁, M₂, M₃, and M₄, respectively. The test sample set D₁ and the updated model M₁ both correspond to the training sample set C₁, that is, D₁ corresponds to M₁. Similarly, D₂ corresponds to M₂, D₃ corresponds to M₃, and D₄ corresponds to M₄. The test samples can be used to calculate various evaluation indicators of the corresponding updated model, such as accuracy, precision, and recall, so as to evaluate that model. For example, these evaluation indicators can be combined to obtain an evaluation value for the model.
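
As an illustration of combining evaluation indicators into a single evaluation value, the following minimal sketch computes accuracy, precision, and recall for a binary classifier and takes their weighted average. The specification does not state how the indicators are combined, so the weighted average, the function name `evaluation_value`, and the `weights` parameter are assumptions made for this example only.

```python
def evaluation_value(y_true, y_pred, weights=(1.0, 1.0, 1.0)):
    """Combine accuracy, precision and recall into one score.

    Assumes binary labels (0 or 1). The weighted average is an
    illustrative choice, not a rule taken from the specification.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0

    w_acc, w_prec, w_rec = weights
    total = w_acc + w_prec + w_rec
    return (w_acc * accuracy + w_prec * precision + w_rec * recall) / total
```

With equal weights the three indicators contribute equally; a different weighting could be chosen if, for instance, recall matters more for the application.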

In the above example, after the evaluation values of the updated models corresponding to each λ are obtained, the updated model with the highest evaluation value may, for example, be determined as the updated model of the current model, that is, the trained model. The determined updated model is retained for subsequent use, such as model prediction.
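
Steps S704 to S708 and the final selection can be sketched as a simple loop over candidate λ values. This is a minimal sketch, not the implementation of the specification: the function name `select_best_model` and the three callables `make_sets`, `train`, and `evaluate` are hypothetical placeholders standing in for sample-set construction, model training, and model evaluation.

```python
def select_best_model(lambdas, make_sets, train, evaluate):
    """Train one model per lambda and keep the one with the highest score.

    make_sets(lam) -> (training_set, test_set)   # hypothetical callable
    train(training_set) -> model                 # hypothetical callable
    evaluate(model, test_set) -> float           # hypothetical callable
    """
    results = []
    for lam in lambdas:
        train_set, test_set = make_sets(lam)   # the pair (C_i, D_i) for this lambda
        model = train(train_set)               # updated model M_i
        score = evaluate(model, test_set)      # M_i is scored only on its own D_i
        results.append((score, lam, model))

    # Pick the entry with the highest evaluation value.
    best_score, best_lam, best_model = max(results, key=lambda r: r[0])
    return best_lam, best_model, best_score
```

Each model is evaluated only on the test set built with the same λ, which mirrors the correspondence between the test sample set and the updated model described above.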

Fig. 8 shows a device 800 for obtaining a disturbance sample set based on an initial sample set according to an embodiment of the present specification, the initial sample set includes a plurality of initial samples, each initial sample includes a corresponding feature vector, and the device includes:

The calculation unit 81 is configured to calculate the mean square deviation of the feature values of each dimension in the plurality of feature vectors respectively corresponding to the plurality of initial samples; and

The generating unit 82 is configured to generate a corresponding random number for each dimension of each feature vector in the plurality of feature vectors, and update the current feature value of that dimension of the feature vector to the sum of the current feature value and the corresponding random number, so as to generate a plurality of disturbance samples respectively corresponding to the plurality of feature vectors, thereby obtaining a disturbance sample set, wherein the value range of each random number is determined based on the product of the predetermined first parameter and the mean square deviation of the feature values of the dimension corresponding to the random number.
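
The behavior of the calculation unit and the generating unit can be illustrated with a short sketch. It assumes that the per-dimension "mean square deviation" is the population standard deviation of the feature values, and it supports Gaussian or uniform noise. The function name `perturb_samples` and its parameters `lam`, `kind`, and `rng` are hypothetical names chosen for this example.

```python
import random
import statistics


def perturb_samples(features, lam, kind="gaussian", rng=None):
    """Return one disturbance sample per input feature vector.

    features: list of equal-length lists of floats (one list per initial sample).
    lam:      the first parameter (lambda) that scales the noise.
    kind:     "gaussian" or "uniform" (illustrative option names).
    """
    rng = random.Random() if rng is None else rng
    n_dims = len(features[0])

    # Calculation step: per-dimension deviation over all initial samples.
    # Assumption: "mean square deviation" is taken as the population standard deviation.
    sigma = [statistics.pstdev([row[d] for row in features]) for d in range(n_dims)]

    perturbed = []
    for row in features:
        new_row = []
        for d, value in enumerate(row):
            bound = lam * sigma[d]  # product of the first parameter and the deviation
            if kind == "gaussian":
                noise = rng.gauss(0.0, bound)       # zero mean, deviation = bound
            elif kind == "uniform":
                noise = rng.uniform(-bound, bound)  # value range [-bound, +bound]
            else:
                raise ValueError(f"unknown kind: {kind!r}")
            new_row.append(value + noise)           # current value plus random number
        perturbed.append(new_row)
    return perturbed
```

Because the noise scale is proportional to each dimension's own deviation, dimensions with very different numeric ranges are perturbed by a comparable relative amount.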

Fig. 9 shows a device 900 for acquiring a model training sample set based on an initial sample set according to an embodiment of the present specification, wherein the initial sample set includes a plurality of initial samples, and the device includes:

The obtaining unit 91 is configured to obtain a disturbance sample set based on the initial sample set through the foregoing device, the disturbance sample set including a plurality of disturbance samples respectively corresponding to the plurality of initial samples; and

The merging unit 92 is configured to obtain a training sample set by merging the multiple initial samples with the multiple disturbance samples.

Fig. 10 shows an apparatus 1000 for acquiring a model training sample set and a test sample set based on an initial sample set according to an embodiment of the present specification, wherein the initial sample set includes a plurality of initial samples, and the device includes:

The obtaining unit 101 is configured to obtain a disturbance sample set based on the initial sample set through the aforementioned apparatus for obtaining a disturbance sample set, the disturbance sample set including a plurality of disturbance samples respectively corresponding to the plurality of initial samples;

The first merging unit 102 is configured to obtain a training sample set by merging part of the initial samples in the plurality of initial samples and part of the disturbance samples in the plurality of disturbance samples; and

The second merging unit 103 is configured to obtain a test sample set by merging at least part of the remaining initial samples in the plurality of initial samples with at least part of the remaining disturbance samples in the plurality of disturbance samples.
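
The two merging units can be sketched as a paired split: the same indices select the initial samples and their corresponding disturbance samples for training, and the remaining pairs go to the test set. This is one possible arrangement in which the proportions match and the selected samples correspond to each other. The function name `split_train_test` and the default `train_fraction` of 0.8 are assumptions for this example, not values from the specification.

```python
import random


def split_train_test(initial, disturbed, train_fraction=0.8, rng=None):
    """Split paired initial/disturbance samples into training and test sets.

    initial[i] and disturbed[i] are assumed to correspond to each other.
    train_fraction is an illustrative default.
    """
    if len(initial) != len(disturbed):
        raise ValueError("initial and disturbed sets must have the same length")
    rng = random.Random() if rng is None else rng

    indices = list(range(len(initial)))
    rng.shuffle(indices)
    cut = int(len(indices) * train_fraction)
    train_idx, test_idx = indices[:cut], indices[cut:]

    # First merging step: part of the initial samples plus the matching disturbance samples.
    train_set = [initial[i] for i in train_idx] + [disturbed[i] for i in train_idx]
    # Second merging step: the remaining initial samples plus the remaining disturbance samples.
    test_set = [initial[i] for i in test_idx] + [disturbed[i] for i in test_idx]
    return train_set, test_set
```

Keeping each initial sample and its disturbance sample on the same side of the split avoids evaluating a model on a lightly perturbed copy of a sample it was trained on.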

FIG. 11 shows a model training device 1100 according to an embodiment of this specification, including:

The first obtaining unit 111 is configured to obtain an initial sample set, wherein the initial sample set includes a plurality of initial samples;

The second obtaining unit 112 is configured to obtain, based on the initial sample set and through the foregoing apparatus for obtaining a training sample set and a test sample set, a plurality of training sample sets and a plurality of test sample sets respectively corresponding to the plurality of training sample sets, wherein the plurality of training sample sets respectively correspond to a plurality of first parameters with different values;

The training unit 113 is configured to separately train the current model using the multiple training sample sets to obtain multiple updated models respectively;

The evaluation unit 114 is configured to use the plurality of test sample sets to respectively evaluate corresponding update models, where the test sample set and the corresponding update model correspond to the same training sample set; and

The determining unit 115 is configured to determine an update model of the current model among the multiple update models based on the evaluation result.

The various embodiments in this specification are described in a progressive manner, and the same or similar parts between the various embodiments can be referred to each other, and each embodiment focuses on the differences from other embodiments. In particular, as for the system embodiment, since it is basically similar to the method embodiment, the description is relatively simple, and for related parts, please refer to the part of the description of the method embodiment.

The foregoing describes specific embodiments of this specification. Other embodiments are within the scope of the appended claims. In some cases, the actions or steps described in the claims may be performed in a different order than in the embodiments and still achieve desired results. In addition, the processes depicted in the drawings do not necessarily require the specific order or sequential order shown to achieve the desired result. In certain embodiments, multitasking and parallel processing are also possible or may be advantageous.

Those of ordinary skill in the art should further appreciate that the units and algorithm steps of the examples described in the embodiments disclosed herein can be implemented by electronic hardware, computer software, or a combination of the two. To clearly illustrate the interchangeability of hardware and software, the composition and steps of each example have been described above generally in terms of their functions. Whether these functions are implemented in hardware or software depends on the specific application and the design constraints of the technical solution. Those of ordinary skill in the art can use different methods to implement the described functions for each specific application, but such implementation should not be considered as going beyond the scope of this application.

The steps of the method or algorithm described in the embodiments disclosed in this document can be implemented by hardware, by a software module executed by a processor, or by a combination of the two. The software module can be placed in random access memory (RAM), internal memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, hard disks, removable disks, CD-ROMs, or any other form of storage medium known in the technical field.

The specific embodiments described above explain the purpose, technical solutions, and beneficial effects of the present invention in further detail. It should be understood that the above are only specific embodiments of the present invention and are not intended to limit the scope of protection of the present invention. Any modification, equivalent replacement, improvement, etc., made within the spirit and principle of the present invention shall be included in the scope of protection of the present invention.

Claims

A method for obtaining a disturbance sample set based on an initial sample set, the initial sample set includes a plurality of initial samples, and each initial sample includes a corresponding feature vector, the method includes:

Calculating the mean square deviation of the feature values of each dimension in the plurality of feature vectors respectively corresponding to the plurality of initial samples; and

For each dimension of each feature vector in the plurality of feature vectors, generating a corresponding random number, and updating the current feature value of that dimension of the feature vector to the sum of the current feature value and the corresponding random number, so as to generate a plurality of disturbance samples respectively corresponding to the plurality of feature vectors, thereby obtaining a disturbance sample set, wherein the value range of each random number is determined based on the product of a predetermined first parameter and the mean square deviation of the feature values of the dimension corresponding to the random number.
The method according to claim 1, wherein the random number is a Gaussian distributed random number, and the mean square deviation of the Gaussian distributed random number is the product of the first parameter and the mean square deviation of the feature values of the dimension corresponding to the random number.
The method according to claim 1, wherein the random number is a uniformly distributed random number whose value range lies between the negative and the positive of a first value, wherein the first value is the product of the first parameter and the mean square deviation of the feature values of the dimension corresponding to the random number.
A method for obtaining a model training sample set based on an initial sample set, wherein the initial sample set includes a plurality of initial samples, and the method includes:

According to the method of claim 1, obtaining a disturbance sample set based on the initial sample set, the disturbance sample set including a plurality of disturbance samples respectively corresponding to the plurality of initial samples; and

By combining the multiple initial samples with the multiple disturbance samples, a training sample set is obtained.
A method for acquiring a model training sample set and a test sample set based on an initial sample set, wherein the initial sample set includes a plurality of initial samples, and the method includes:

According to the method of claim 1, obtaining a disturbance sample set based on the initial sample set, the disturbance sample set including a plurality of disturbance samples respectively corresponding to the plurality of initial samples;

Obtaining a training sample set by merging part of the initial samples in the plurality of initial samples with part of the disturbance samples in the plurality of disturbance samples; and

The test sample set is obtained by merging at least part of the remaining initial samples in the plurality of initial samples with at least part of the remaining disturbance samples in the plurality of disturbance samples.
The method according to claim 5, wherein the proportion of the partial initial samples to the plurality of initial samples is the same as the proportion of the partial disturbance samples to the plurality of disturbance samples.
The method according to claim 6, wherein obtaining a test sample set by merging at least part of the remaining initial samples in the plurality of initial samples with at least part of the remaining disturbance samples in the plurality of disturbance samples includes: obtaining the test sample set by merging the remaining initial samples in the plurality of initial samples with the remaining disturbance samples in the plurality of disturbance samples.
The method according to claim 6, wherein the part of the initial sample corresponds to the part of the disturbance sample respectively.
A model training method includes:

Acquiring an initial sample set, wherein the initial sample set includes a plurality of initial samples;

According to the method of claim 5, obtaining, based on the initial sample set, multiple training sample sets and multiple test sample sets respectively corresponding to the multiple training sample sets, wherein the multiple training sample sets respectively correspond to multiple first parameters with different values;

Use the multiple training sample sets to train the current model respectively to obtain multiple updated models;

Using the multiple test sample sets to respectively evaluate corresponding update models, wherein the test sample set and the corresponding update model correspond to the same training sample set; and

Based on the evaluation result, an update model of the current model is determined among the multiple update models.
The method according to claim 9, wherein the model is any of the following types of models: supervised learning models, unsupervised learning models, and reinforcement learning models.
A device for obtaining a disturbance sample set based on an initial sample set, the initial sample set includes a plurality of initial samples, each initial sample includes a corresponding feature vector, and the device includes:

The calculation unit is configured to calculate the mean square deviation of the feature values of each dimension in the plurality of feature vectors respectively corresponding to the plurality of initial samples; and

The generating unit is configured to generate a corresponding random number for each dimension of each feature vector in the plurality of feature vectors, and update the current feature value of that dimension of the feature vector to the sum of the current feature value and the corresponding random number, so as to generate a plurality of disturbance samples respectively corresponding to the plurality of feature vectors, thereby obtaining a disturbance sample set, wherein the value range of each random number is determined based on the product of a predetermined first parameter and the mean square deviation of the feature values of the dimension corresponding to the random number.
The device according to claim 11, wherein the random number is a Gaussian distributed random number, and the mean square deviation of the Gaussian distributed random number is the product of the first parameter and the mean square deviation of the feature values of the dimension corresponding to the random number.
The device according to claim 11, wherein the random number is a uniformly distributed random number whose value range lies between the negative and the positive of a first value, wherein the first value is the product of the first parameter and the mean square deviation of the feature values of the dimension corresponding to the random number.
A device for acquiring a model training sample set based on an initial sample set, wherein the initial sample set includes a plurality of initial samples, and the device includes:

The obtaining unit is configured to obtain a disturbance sample set based on the initial sample set through the apparatus of claim 11, the disturbance sample set including a plurality of disturbance samples respectively corresponding to the plurality of initial samples; and

The merging unit is configured to obtain a training sample set by merging the multiple initial samples with the multiple disturbance samples.
A device for acquiring a model training sample set and a test sample set based on an initial sample set, wherein the initial sample set includes a plurality of initial samples, and the device includes:

The obtaining unit is configured to obtain a disturbance sample set based on the initial sample set through the device according to claim 11, the disturbance sample set including a plurality of disturbance samples respectively corresponding to the plurality of initial samples;

The first merging unit is configured to obtain a training sample set by merging part of the initial samples in the plurality of initial samples and part of the disturbance samples in the plurality of disturbance samples; and

The second merging unit is configured to obtain a test sample set by merging at least part of the remaining initial samples in the plurality of initial samples with at least part of the remaining disturbance samples in the plurality of disturbance samples.
The device according to claim 15, wherein the proportion of the partial initial samples in the plurality of initial samples is the same as the proportion of the partial disturbance samples in the plurality of disturbance samples.
The apparatus according to claim 16, wherein the second merging unit is further configured to obtain the test sample set by merging the remaining initial samples in the plurality of initial samples with the remaining disturbance samples in the plurality of disturbance samples.
The apparatus according to claim 16, wherein the part of the initial sample corresponds to the part of the disturbance sample respectively.
A model training device includes:

The first obtaining unit is configured to obtain an initial sample set, wherein the initial sample set includes a plurality of initial samples;

The second acquiring unit is configured to acquire, based on the initial sample set and through the apparatus of claim 15, a plurality of training sample sets and a plurality of test sample sets respectively corresponding to the plurality of training sample sets, wherein the plurality of training sample sets respectively correspond to a plurality of first parameters with different values;

The training unit is configured to use the multiple training sample sets to train the current model respectively to obtain multiple updated models;

The evaluation unit is configured to use the multiple test sample sets to respectively evaluate corresponding update models, wherein the test sample set and the corresponding update model correspond to the same training sample set; and

The determining unit is configured to determine an updated model of the current model among the multiple updated models based on the evaluation result.
The device according to claim 19, wherein the model is any of the following types of models: supervised learning models, unsupervised learning models, and reinforcement learning models.
A computer-readable storage medium with a computer program stored thereon, and when the computer program is executed in a computer, the computer is caused to execute the method of any one of claims 1-10.
A computing device, comprising a memory and a processor, characterized in that executable code is stored in the memory, and when the processor executes the executable code, the method of any one of claims 1-10 is implemented.