CN117077813A - Training method and training system for machine learning model - Google Patents

Training method and training system for machine learning model

Info

Publication number
CN117077813A
CN117077813A (application CN202311234190.9A)
Authority
CN
China
Prior art keywords
training
model
machine learning
verification
subsets
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311234190.9A
Other languages
Chinese (zh)
Inventor
陆苗
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangsu College Of Finance & Accounting
Original Assignee
Jiangsu College Of Finance & Accounting
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiangsu College Of Finance & Accounting filed Critical Jiangsu College Of Finance & Accounting
Priority to CN202311234190.9A
Publication of CN117077813A
Legal status: Pending

Links

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00 Machine learning
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T 10/00 Road transport of goods or passengers
    • Y02T 10/10 Internal combustion engine [ICE] based vehicles
    • Y02T 10/40 Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The application provides a training method and a training system for a machine learning model, and relates to the technical field of deep learning. The method comprises the following steps: acquiring an original data set and dividing it into a training set and a validation set, wherein the training set is used for training the model and the validation set is used for evaluating the model; dividing the training set into K mutually non-overlapping subsets, taking each subset in turn as the validation set and the other K-1 subsets as the training set, training K models, and obtaining the average validation error of the K models; during model training, reducing the complexity of the model through regularization to prevent the model from overfitting the training set data; and selecting the model that ranks first on the validation set as the final machine learning model and evaluating it using the test set. More training data or data augmentation techniques can be used to help mitigate the overfitting problem.

Description

Training method and training system for machine learning model
Technical Field
The application relates to the technical field of deep learning, and in particular to a training method and a training system for a machine learning model.
Background
Currently, with the widespread adoption of machine learning, machine learning models are attracting increasing attention. A machine learning model generally needs to be trained on training data (also referred to as training samples) first, and the trained model is then used to make predictions, such as class predictions.
During training, samples often need to be added to or modified for the machine learning model. To enlarge the set of training samples, different features must be added, or different feature combinations must be fed into the model one by one; existing training methods are therefore complex, time-consuming, inefficient, and lack flexibility and applicability. Although machine learning models have achieved great success in many areas, they still suffer from several potential drawbacks, including the following:
Data bias problem: the quality and accuracy of a machine learning model depend on the quality and diversity of its training data. If the data are biased, for example because the data set is too small, imbalanced, or insufficiently representative, the model tends to learn these biases rather than the underlying patterns and performs poorly on new data.
Overfitting problem: overfitting occurs when a model performs well on the training data but poorly on the test data, typically because the model is too complex or the training data are too scarce. An overfitted model cannot generalize to new data sets and therefore loses its practical value.
Interpretability problem: some machine learning models, particularly deep learning models, can be very complex, and the basis of their decisions is hard to understand. This low interpretability makes them difficult to apply in fields that require transparent decision-making, such as medicine and finance.
Resource consumption problem: training a deep learning model requires substantial computational resources and time; with insufficient hardware or computing power, training can take very long. Training also consumes a great deal of energy, which can have a negative environmental impact.
Privacy problem: machine learning models require access to large amounts of personal data for training. If such data are abused or leaked, user privacy may be harmed. Moreover, because the trained model itself may encode sensitive information, it may need to be protected and encrypted in some cases.
Disclosure of Invention
The application aims to provide a training method for a machine learning model that can use more training data or data augmentation techniques to help mitigate the overfitting problem.
Another object of the present application is to provide a training system for a machine learning model that is capable of executing the training method of the machine learning model.
Embodiments of the present application are implemented as follows:
In a first aspect, an embodiment of the present application provides a training method for a machine learning model, including: obtaining an original data set and dividing it into a training set and a validation set, where the training set is used for training the model and the validation set is used for evaluating the model;
dividing the training set into K mutually non-overlapping subsets, taking each subset in turn as the validation set and the other K-1 subsets as the training set, training K models, and obtaining the average validation error of the K models;
during model training, reducing the complexity of the model through regularization to prevent the model from overfitting the training set data;
and selecting the model that ranks first on the validation set as the final machine learning model and evaluating the final machine learning model using the test set.
In some embodiments of the present application, obtaining the original data set and dividing it into a training set and a validation set, where the training set is used for training the model and the validation set is used for evaluating the model, comprises: increasing or decreasing the sizes of the training set and the validation set according to the size of the data set, so as to improve the stability and generalization capability of the machine learning model, while keeping the distribution of the data set consistent with the actual application scenario.
In some embodiments of the present application, dividing the training set into K mutually non-overlapping subsets, taking each subset in turn as the validation set and the other K-1 subsets as the training set, training K models, and obtaining the average validation error of the K models comprises: A1. dividing the data set into K mutually non-overlapping subsets; A2. for each subset, selecting that subset as the validation set and the remaining subsets as the training set; A3. training a model, evaluating it on the validation set, and recording the performance index; A4. repeating steps A2 and A3 K times, selecting a different subset as the validation set each time; A5. calculating the average of the K performance indexes as the final performance index of the machine learning model.
In some embodiments of the application, the foregoing further comprises: evaluating model performance using K-fold cross-validation, which divides the data set into K subsets and selects one subset as the validation set at a time with the remaining subsets as the training set, and/or leave-one-out cross-validation, which takes each sample in turn as the validation set with the remaining samples as the training set.
In some embodiments of the present application, reducing the complexity of the model through regularization during model training to prevent the model from overfitting the training set data comprises: B1. dividing the data set into k subsets, with k-1 subsets as the training set and 1 subset as the validation set; B2. for each regularization coefficient λ, training a model using the k-1 subsets and evaluating its performance on the validation set; B3. repeating step B2 until all regularization coefficients have been evaluated; B4. selecting the regularization coefficient that ranks first in performance.
In some embodiments of the application, the foregoing further comprises: adjusting the regularization coefficient λ during training, where a λ that is too small can cause overfitting and a λ that is too large can cause underfitting, so as to avoid overfitting or underfitting caused by the randomness of the data set.
In some embodiments of the application, the foregoing further comprises: improving the performance of the model by tuning its hyper-parameters, including the regularization coefficient, learning rate, network depth, and activation function.
In a second aspect, an embodiment of the present application provides a training system for a machine learning model, including a data set partitioning module configured to obtain an original data set and divide it into a training set and a validation set, where the training set is used for training the model and the validation set is used for evaluating the model;
a cross-validation module configured to divide the training set into K mutually non-overlapping subsets, take each subset in turn as the validation set and the other K-1 subsets as the training set, and train K models to obtain the average validation error of the K models;
a regularization module configured to reduce the complexity of the model through regularization during model training and prevent the model from overfitting the training set data;
and a model selection module configured to select the model that ranks first on the validation set as the final machine learning model and evaluate the final machine learning model using the test set.
In a third aspect, embodiments of the present application provide a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements any one of the above training methods of a machine learning model.
Compared with the prior art, the embodiment of the application has at least the following advantages or beneficial effects:
by collecting more data, diversity and representativeness of the data set is ensured. In addition, data enhancement techniques can be used to augment the data set to balance the number differences between different categories in the data set. Regularization techniques are used to reduce the complexity of the model and prevent overfitting. An interpretable model, such as a decision tree, linear model, bayesian model, etc., is used to better understand the decision process of the model. More efficient algorithms and model architectures are used to reduce the need for computing resources and time.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings needed in the embodiments are briefly described below. It should be understood that the following drawings illustrate only some embodiments of the present application and should therefore not be regarded as limiting its scope; other related drawings may be obtained from these drawings by a person skilled in the art without inventive effort.
FIG. 1 is a schematic diagram of steps of a training method of a machine learning model according to an embodiment of the present application;
FIG. 2 is a schematic diagram of a training system module of a machine learning model according to an embodiment of the present application;
fig. 3 is an electronic device provided in an embodiment of the present application.
Icon: 101-memory; 102-a processor; 103-communication interface.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present application more apparent, the technical solutions of the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present application, and it is apparent that the described embodiments are some embodiments of the present application, but not all embodiments of the present application. The components of the embodiments of the present application generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations.
Thus, the following detailed description of the embodiments of the application, as presented in the figures, is not intended to limit the scope of the application, as claimed, but is merely representative of selected embodiments of the application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
It should be noted that: like reference numerals and letters denote like items in the following figures, and thus once an item is defined in one figure, no further definition or explanation thereof is necessary in the following figures.
It should be noted that the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
Some embodiments of the present application are described in detail below with reference to the accompanying drawings. The various embodiments and features of the embodiments described below may be combined with one another without conflict.
Example 1
Referring to fig. 1, fig. 1 is a schematic diagram of steps of a training method of a machine learning model according to an embodiment of the present application, which is as follows:
Step S100, acquiring an original data set and dividing it into a training set and a validation set, wherein the training set is used for training the model and the validation set is used for evaluating the model;
In some embodiments, the data set is generally partitioned into the following three parts:
Training set: used for training the model, i.e. for estimating the model parameters. The size of the training set directly affects the complexity and fitting ability of the model, and it usually needs to be large enough to cover most of the information in the data set.
Validation set: used for model selection and tuning, i.e. for adjusting the hyper-parameters of the model. The validation set is typically smaller than the training set, but needs to be large enough to reflect the generalization ability of the model.
Test set: used for evaluating the generalization ability of the model, i.e. for measuring its performance on unseen data. The test set is typically similar in size to the validation set and likewise needs to be large enough to reflect the generalization ability of the model.
The partitioning of the data set needs to take into account several factors:
data set size: the larger the data set, the larger the training set and validation set can be, and the more appropriate the size of the training set and validation set can be, thereby improving the stability and generalization ability of the model.
Data distribution: the distribution of the data sets should be as consistent as possible with the actual application scenario to ensure that the model performs well on unseen data.
Task difficulty: the more difficult the task, the larger data sets and finer data set divisions are required to ensure the performance of the model.
Computing resources: the partitioning of the data set also requires consideration of computational resource constraints to ensure that training and evaluation of the model can be completed within an acceptable time.
In practical applications, data set partitioning is usually combined with cross-validation to improve the stability and generalization ability of the model and to ensure that its bias and variance with respect to the data are reasonably controlled.
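As a concrete illustration of the split described above, the following sketch divides a data set into training, validation, and test sets. The toy data, the 60/20/20 ratio, and the use of scikit-learn's train_test_split are assumptions made for the example, not part of the claimed method:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data standing in for the original data set (X: features, y: labels).
rng = np.random.RandomState(0)
X = rng.randn(1000, 20)
y = (X[:, 0] + 0.1 * rng.randn(1000) > 0).astype(int)

# First hold out a test set (20%), then split the remainder into a
# training set and a validation set (60% / 20% of the original data).
X_rest, X_test, y_rest, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y)
X_train, X_val, y_train, y_val = train_test_split(
    X_rest, y_rest, test_size=0.25, random_state=0, stratify=y_rest)

print(len(X_train), len(X_val), len(X_test))  # 600 200 200
```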
Step S110, dividing the training set into K mutually non-overlapping subsets, taking each subset in turn as the validation set and the other K-1 subsets as the training set, and training K models to obtain the average validation error of the K models;
in some embodiments, the flow of cross-validation is as follows:
A1. The data set is divided into K mutually non-overlapping subsets (typically by random partitioning).
A2. For each subset, that subset is chosen as the validation set and the remaining subsets as the training set.
A3. A model is trained and evaluated on the validation set, and the performance indexes (e.g., accuracy, precision, recall) are recorded.
A4. Steps A2 and A3 are repeated K times, selecting a different subset as the validation set each time.
A5. The average of the K performance indexes is calculated as the final performance index of the model.
Cross-validation has the advantage of using the data more fully, which improves the stability and generalization ability of the model. The performance of the model is evaluated multiple times, reducing the influence of randomness and yielding more accurate performance indexes. It also avoids the bias and variance problems caused by an unsuitable data set split, further improving the generalization ability of the model.
In some embodiments, K-fold cross-validation or leave-one-out cross-validation may also be used to evaluate model performance. K-fold cross-validation divides the data set into K subsets and selects one subset as the validation set at a time, with the remaining subsets as the training set; leave-one-out cross-validation takes each sample in turn as the validation set, with the remaining samples as the training set.
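A minimal sketch of the K-fold procedure A1-A5 above, using scikit-learn's KFold for the split. The logistic-regression model, the accuracy metric, and K=5 are illustrative assumptions:

```python
import numpy as np
from sklearn.model_selection import KFold
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

def k_fold_validate(X, y, K=5):
    """Average validation score over K folds (steps A1-A5)."""
    kf = KFold(n_splits=K, shuffle=True, random_state=0)  # A1: K non-overlapping subsets
    scores = []
    for train_idx, val_idx in kf.split(X):                 # A2/A4: each subset is the validation set once
        model = LogisticRegression(max_iter=1000)
        model.fit(X[train_idx], y[train_idx])              # A3: train on the other K-1 subsets
        scores.append(accuracy_score(y[val_idx], model.predict(X[val_idx])))
    return np.mean(scores)                                 # A5: average performance index
```

Leave-one-out cross-validation corresponds to the special case where K equals the number of samples.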
Step S120, during model training, reducing the complexity of the model through regularization to prevent the model from overfitting the training set data;
in some embodiments, the model complexity is penalized or the model parameters are limited in size by adding a regularization term (Regularization Term) to the model's loss function, thereby reducing the variance of the model and improving the generalization ability of the model.
The basic idea of regularization is to add a regularization term to the model loss function, taking the complexity of the model into account. The usual regularization methods are L1 regularization and L2 regularization, corresponding to Lasso and Ridge regression, respectively.
The regularization term of L1 regularization (Lasso regression) is the sum of the absolute values of the model parameters; it can shrink the parameter values of irrelevant or unimportant features to 0, thereby achieving feature selection. The advantage of L1 regularization is that feature selection is performed within the model, removing irrelevant or unimportant features; the disadvantage is that the regularization term is not differentiable and must be solved with numerical optimization algorithms.
The regularization term of L2 regularization (Ridge regression) is the sum of squares of the model parameters; it keeps the parameter values from growing too large, thereby reducing the risk of overfitting. The advantage of L2 regularization is that it prevents overfitting and improves the generalization ability of the model; the disadvantage is that irrelevant or unimportant features cannot be removed.
Specific implementations of regularization are typically achieved by adding a regularization term to the model's loss function. Taking L2 regularization as an example, the loss function of the model can be written as:
L(w) = L0(w) + λ·||w||₂²
where L0(w) is the original loss function of the model, w is the parameter vector of the model, λ is the regularization coefficient (Regularization Coefficient), and ||w||₂ denotes the L2 norm of the parameter vector. When λ is larger, the influence of the regularization term is larger and the complexity of the model is lower, so the risk of overfitting is reduced; when λ is smaller, the influence of the regularization term is smaller, the model can be more complex, and its fitting ability increases.
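As a concrete reading of the formula, the following sketch computes the L2-regularized loss for a linear model. The choice of mean squared error as the original loss L0 is an assumption made for illustration:

```python
import numpy as np

def l2_regularized_loss(w, X, y, lam):
    """L(w) = L0(w) + lam * ||w||_2^2, with L0 taken here as mean squared error."""
    residual = X @ w - y
    l0 = np.mean(residual ** 2)        # original loss L0(w)
    penalty = lam * np.sum(w ** 2)     # lam times the squared L2 norm of the parameters
    return l0 + penalty
```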
It should be noted that the effect of regularization is closely related to the choice of regularization coefficients, with too large regularization coefficients resulting in under-fitting and too small regularization coefficients resulting in over-fitting. It is often necessary to select the optimal regularization coefficients by cross-validation or the like.
In some implementations, cross-validation may be used to select the optimal regularization coefficient during model training. In particular, the data set may be divided into subsets, with a portion used as the validation set each time and the remainder as the training set; the model is trained and then evaluated on the validation set. Performing cross-validation multiple times yields the model performance under different regularization coefficients, from which the optimal coefficient is selected.
For example, when L2 regularization is used, the optimal regularization coefficient λ may be selected by cross-validation. The specific steps are as follows:
B1. the dataset is divided into k subsets, with k-1 subsets as training sets and 1 subset as validation set.
B2. For each regularization coefficient λ, a k-1 subset training model was used and model performance was evaluated on the validation set.
B3. Step B2 is repeated until all regularization coefficients are evaluated.
B4. The regularization coefficient with the best performance is selected.
Selecting the optimal regularization coefficient through cross-validation effectively avoids model overfitting and improves the generalization ability of the model. At the same time, regularization limits the complexity of the model to avoid overfitting, and combining regularization with cross-validation can further improve the performance and stability of the model.
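A sketch of steps B1-B4, selecting λ by cross-validation with ridge regression. The candidate λ values and the use of scikit-learn's Ridge are assumptions chosen for the example:

```python
import numpy as np
from sklearn.model_selection import KFold
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error

def select_lambda(X, y, lambdas=(0.001, 0.01, 0.1, 1.0, 10.0), k=5):
    """Return the regularization coefficient with the lowest average validation error."""
    kf = KFold(n_splits=k, shuffle=True, random_state=0)         # B1: k subsets
    avg_errors = {}
    for lam in lambdas:                                           # B2/B3: evaluate every candidate
        errors = []
        for train_idx, val_idx in kf.split(X):
            model = Ridge(alpha=lam).fit(X[train_idx], y[train_idx])
            errors.append(mean_squared_error(y[val_idx], model.predict(X[val_idx])))
        avg_errors[lam] = np.mean(errors)
    return min(avg_errors, key=avg_errors.get)                    # B4: best-performing coefficient
```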
Step S130, selecting the model that ranks first on the validation set as the final machine learning model, and evaluating the final machine learning model using the test set.
In some embodiments, the performance of the model is improved by tuning its hyper-parameters. Hyper-parameters are parameters that are not learned automatically during training and typically need to be set manually before training. Common hyper-parameters include the regularization coefficient, learning rate, network depth, and activation function. Poorly chosen hyper-parameters can lead to overfitting or underfitting and thus degrade the performance of the model.
Hyper-parameter tuning methods generally include grid search, random search, and Bayesian optimization. Grid search is an exhaustive method: it searches over a preset grid of hyper-parameter values to find the optimal combination. It is simple and easy to understand and suits small hyper-parameter spaces, but when the hyper-parameter space is large it becomes computationally expensive and impractical.
Random search is based on random sampling: a set of hyper-parameters is sampled at random from the hyper-parameter space, a model is trained and its performance index computed, and the process is repeated until a good hyper-parameter combination is found. Random search handles large hyper-parameter spaces more efficiently than grid search, but because the sampling is random it does not guarantee finding the globally optimal solution.
Bayesian optimization is a tuning method based on a probabilistic model: the objective over the hyper-parameter space is modeled with a Gaussian process, and the hyper-parameter combination expected to perform best at the next evaluation is selected for the next training run, so as to optimize the objective function. Its advantage is that modeling the objective function with a Gaussian process reduces the number of evaluations and shortens the search time, which is especially valuable when the hyper-parameter space is large and evaluating the objective function is expensive.
Whichever hyper-parameter tuning method is used, attention must be paid to overfitting. During hyper-parameter tuning, the data set should be divided into a training set, a validation set, and a test set, and the validation set should be used to evaluate model performance, so that overfitting to the training set is avoided.
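As an illustration of the random-search approach described above, the sketch below samples two of the hyper-parameters named earlier. The search ranges and the train_and_validate callable are hypothetical placeholders, to be replaced by whatever model-specific training and validation routine is actually used:

```python
import numpy as np

def random_search(train_and_validate, n_trials=20, seed=0):
    """Randomly sample hyper-parameter combinations and keep the best validation score.

    train_and_validate(params) is a hypothetical callable that trains a model with the
    given hyper-parameters and returns its score on the validation set.
    """
    rng = np.random.RandomState(seed)
    best_params, best_score = None, -np.inf
    for _ in range(n_trials):
        params = {
            "learning_rate": 10 ** rng.uniform(-4, -1),               # log-uniform in [1e-4, 1e-1]
            "regularization_coefficient": 10 ** rng.uniform(-4, 1),   # log-uniform in [1e-4, 10]
        }
        score = train_and_validate(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score
```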
In some embodiments, selecting the model requires consideration of the following:
data set characteristics: different data sets have different features, such as data set size, number of features, sample distribution, etc., and an appropriate model needs to be selected based on the features of the data sets. For example, when the data set sample size is small, decision tree-based models, such as random forests, may be selected because of the high robustness and ease of interpretation of these models.
Task type: different task types require different models to be selected. For example, for classification tasks, models such as logistic regression, support vector machines, decision trees, etc. may be selected; for regression tasks, models of linear regression, ridge regression, neural networks, etc. may be selected.
Model complexity: the complexity of the model directly affects the generalization ability of the model. In general, the higher the complexity of the model, the more likely the problem of overfitting occurs, so that it is necessary to select the appropriate model complexity according to the actual situation.
Training and prediction time: in practical applications, training and prediction time of the model are also important considerations. Some models may require long training times or significant computational resources, which may place certain limitations on the application.
Optimization and debugging of the model: some models require complex optimization and tuning to achieve better performance. For example, neural networks require adjustments to hyper-parameters and optimization of model structures, which require high skill and experience.
When selecting a model, these factors need to be weighed comprehensively for the specific situation and the most suitable model chosen. Model evaluation and validation should also be carried out to ensure that the model has good performance and generalization ability.
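A sketch of step S130 under these considerations: a few candidate models are compared by their average cross-validated score on the training data, the top-ranked one is refit, and it is then evaluated once on the held-out test set. The three candidate estimators are illustrative choices, not a prescribed list:

```python
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC

def select_final_model(X_train, y_train, X_test, y_test):
    """Pick the candidate that ranks first on validation score, then test it once."""
    candidates = {
        "logistic_regression": LogisticRegression(max_iter=1000),
        "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
        "svm": SVC(),
    }
    cv_scores = {name: cross_val_score(model, X_train, y_train, cv=5).mean()
                 for name, model in candidates.items()}
    best_name = max(cv_scores, key=cv_scores.get)        # model ranked first on validation
    final_model = candidates[best_name].fit(X_train, y_train)
    test_score = final_model.score(X_test, y_test)        # final evaluation on the test set
    return best_name, test_score
```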
Example 2
Referring to fig. 2, fig. 2 is a schematic diagram of a training system module of a machine learning model according to an embodiment of the present application, which is as follows:
the data set partitioning module is configured to obtain an original data set and divide it into a training set and a validation set, wherein the training set is used for training the model and the validation set is used for evaluating the model;
the cross-validation module is configured to divide the training set into K mutually non-overlapping subsets, take each subset in turn as the validation set and the other K-1 subsets as the training set, and train K models to obtain the average validation error of the K models;
the regularization module is configured to reduce the complexity of the model through regularization during model training and prevent the model from overfitting the training set data;
and the model selection module is configured to select the model that ranks first on the validation set as the final machine learning model and evaluate the final machine learning model using the test set.
As shown in fig. 3, an embodiment of the present application provides an electronic device including a memory 101 for storing one or more programs and a processor 102. When the one or more programs are executed by the processor 102, the method of any of the first aspects described above is implemented.
And a communication interface 103, where the memory 101, the processor 102 and the communication interface 103 are electrically connected directly or indirectly to each other to realize data transmission or interaction. For example, the components may be electrically connected to each other via one or more communication buses or signal lines. The memory 101 may be used to store software programs and modules that are stored within the memory 101 for execution by the processor 102 to perform various functional applications and data processing. The communication interface 103 may be used for communication of signaling or data with other node devices.
The Memory 101 may be, but is not limited to, a random access Memory (Random Access Memory, RAM), a Read Only Memory (ROM), a programmable Read Only Memory (Programmable Read-Only Memory, PROM), an erasable Read Only Memory (Erasable Programmable Read-Only Memory, EPROM), an electrically erasable Read Only Memory (Electric Erasable Programmable Read-Only Memory, EEPROM), etc.
The processor 102 may be an integrated circuit chip with signal processing capabilities. The processor 102 may be a general purpose processor including a central processing unit (Central Processing Unit, CPU), a network processor (Network Processor, NP), etc.; but also digital signal processors (Digital Signal Processing, DSP), application specific integrated circuits (Application Specific Integrated Circuit, ASIC), field programmable gate arrays (Field-Programmable Gate Array, FPGA) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components.
In the embodiments provided in the present application, it should be understood that the disclosed method and training system may be implemented in other manners. The above-described method and training system embodiments are merely illustrative, for example, the flow diagrams and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of methods and training systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
In addition, functional modules in the embodiments of the present application may be integrated together to form a single part, or each module may exist alone, or two or more modules may be integrated to form a single part.
In another aspect, an embodiment of the application provides a computer readable storage medium having stored thereon a computer program which, when executed by the processor 102, implements a method as in any of the first aspects described above. The functions, if implemented in the form of software functional modules and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application may be embodied essentially or in a part contributing to the prior art or in a part of the technical solution, in the form of a software product stored in a storage medium, comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present application. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a random access Memory (RAM, random Access Memory), a magnetic disk, or an optical disk, or other various media capable of storing program codes.
In summary, the training method and training system for a machine learning model provided by the embodiments of the present application ensure the diversity and representativeness of the data set by collecting more data. In addition, data augmentation techniques can be used to expand the data set and balance the number of samples across categories. Regularization techniques are used to reduce the complexity of the model and prevent overfitting. Interpretable models, such as decision trees, linear models, and Bayesian models, are used to better understand the decision process of the model. More efficient algorithms and model architectures are used to reduce the demand for computing resources and time.
The above is only a preferred embodiment of the present application, and is not intended to limit the present application, but various modifications and variations can be made to the present application by those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the protection scope of the present application.
It will be evident to those skilled in the art that the application is not limited to the details of the foregoing illustrative embodiments, and that the present application may be embodied in other specific forms without departing from the spirit or essential characteristics thereof. The present embodiments are, therefore, to be considered in all respects as illustrative and not restrictive, the scope of the application being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference sign in a claim should not be construed as limiting the claim concerned.

Claims (9)

1. A method of training a machine learning model, comprising:
acquiring an original data set and dividing it into a training set and a validation set, wherein the training set is used for training a model and the validation set is used for evaluating the model;
dividing the training set into K mutually non-overlapping subsets, taking each subset in turn as the validation set and the other K-1 subsets as the training set, training K models, and obtaining the average validation error of the K models;
during model training, reducing the complexity of the model through regularization to prevent the model from overfitting the training set data; and
selecting the model that ranks first on the validation set as the final machine learning model and evaluating the final machine learning model using the test set.
2. The method of training a machine learning model of claim 1, wherein the acquiring the original data set and dividing it into a training set and a validation set, wherein the training set is used for training the model and the validation set is used for evaluating the model, comprises:
increasing or decreasing the sizes of the training set and the validation set according to the size of the data set, so as to improve the stability and generalization capability of the machine learning model, while keeping the distribution of the data set consistent with the actual application scenario.
3. The method of training a machine learning model of claim 1, wherein the dividing the training set into K mutually non-overlapping subsets, taking each subset in turn as the validation set and the other K-1 subsets as the training set, training K models, and obtaining the average validation error of the K models comprises:
A1. dividing the data set into K mutually non-overlapping subsets;
A2. for each subset, selecting that subset as the validation set and the remaining subsets as the training set;
A3. training a model, evaluating it on the validation set, and recording the performance index;
A4. repeating steps A2 and A3 K times, selecting a different subset as the validation set each time;
A5. calculating the average of the K performance indexes as the final performance index of the machine learning model.
4. A method of training a machine learning model as claimed in claim 3, further comprising:
evaluating model performance using K-fold cross-validation, which divides the data set into K subsets and selects one subset as the validation set at a time with the remaining subsets as the training set, and/or leave-one-out cross-validation, which takes each sample in turn as the validation set with the remaining samples as the training set.
5. The method of training a machine learning model of claim 1, wherein said reducing the complexity of the model by regularization during model training to prevent the model from overfitting the training set data comprises:
B1. dividing the data set into k subsets, with k-1 subsets as the training set and 1 subset as the validation set;
B2. for each regularization coefficient λ, training a model using the k-1 subsets and evaluating its performance on the validation set;
B3. repeating step B2 until all regularization coefficients have been evaluated;
B4. selecting the regularization coefficient that ranks first in performance.
6. The method of training a machine learning model of claim 5 further comprising:
adjusting the regularization coefficient λ during training, where a λ that is too small can cause overfitting and a λ that is too large can cause underfitting, so as to avoid overfitting or underfitting caused by the randomness of the data set.
7. The method of training a machine learning model of claim 6 further comprising:
improving the performance of the model by tuning hyper-parameters of the model, including the regularization coefficient, learning rate, network depth, and activation function.
8. A training system for a machine learning model, comprising:
a data set partitioning module configured to obtain an original data set and divide it into a training set and a validation set, wherein the training set is used for training a model and the validation set is used for evaluating the model;
a cross-validation module configured to divide the training set into K mutually non-overlapping subsets, take each subset in turn as the validation set and the other K-1 subsets as the training set, and train K models to obtain the average validation error of the K models;
a regularization module configured to reduce the complexity of the model through regularization during model training and prevent the model from overfitting the training set data;
and a model selection module configured to select the model that ranks first on the validation set as the final machine learning model and evaluate the final machine learning model using the test set.
9. A computer readable storage medium, on which a computer program is stored, which computer program, when being executed by a processor, implements the method according to any of claims 1-7.
CN202311234190.9A 2023-09-21 2023-09-21 Training method and training system for machine learning model Pending CN117077813A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311234190.9A CN117077813A (en) 2023-09-21 2023-09-21 Training method and training system for machine learning model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311234190.9A CN117077813A (en) 2023-09-21 2023-09-21 Training method and training system for machine learning model

Publications (1)

Publication Number Publication Date
CN117077813A true CN117077813A (en) 2023-11-17

Family

ID=88719616

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311234190.9A Pending CN117077813A (en) 2023-09-21 2023-09-21 Training method and training system for machine learning model

Country Status (1)

Country Link
CN (1) CN117077813A (en)

Similar Documents

Publication Publication Date Title
US20210049512A1 (en) Explainers for machine learning classifiers
Corchado et al. Ibr retrieval method based on topology preserving mappings
CN110728317A (en) Training method and system of decision tree model, storage medium and prediction method
KR102295805B1 (en) Method for managing training data
CN116011510A (en) Framework for optimizing machine learning architecture
CN111667022A (en) User data processing method and device, computer equipment and storage medium
Albisua et al. The quest for the optimal class distribution: an approach for enhancing the effectiveness of learning via resampling methods for imbalanced data sets
CN116596095B (en) Training method and device of carbon emission prediction model based on machine learning
US20210158149A1 (en) Bayesian graph convolutional neural networks
Raza et al. A parallel rough set based dependency calculation method for efficient feature selection
Bonaccorso Hands-On Unsupervised Learning with Python: Implement machine learning and deep learning models using Scikit-Learn, TensorFlow, and more
Pham et al. Unsupervised training of Bayesian networks for data clustering
Dekhovich et al. Continual prune-and-select: class-incremental learning with specialized subnetworks
Belzile et al. A modeler’s guide to extreme value software
Rai Advanced deep learning with R: Become an expert at designing, building, and improving advanced neural network models using R
Kumagai et al. Few-shot learning for feature selection with hilbert-schmidt independence criterion
KR20200092989A (en) Production organism identification using unsupervised parameter learning for outlier detection
Raimundo et al. Exploring multiobjective training in multiclass classification
CN111767474A (en) Method and equipment for constructing user portrait based on user operation behaviors
Dixon et al. Feedforward Neural Networks
US11295229B1 (en) Scalable generation of multidimensional features for machine learning
Bütepage et al. Gaussian process encoders: Vaes with reliable latent-space uncertainty
CN117077813A (en) Training method and training system for machine learning model
Patil et al. Enhanced over_sampling techniques for imbalanced big data set classification
Shang et al. Alpine meadow: A system for interactive automl

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication