CN112668639A - Model training method and device, server and storage medium - Google Patents
- Publication number: CN112668639A (application CN202011576010.1A)
- Authority: CN (China)
- Legal status: Withdrawn (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classification
- Image Analysis (AREA)
Abstract
The invention discloses a model training method, an apparatus, a server and a storage medium. The method comprises the following steps: determining an initial image recognition model and an initial sample number for model training; taking the initial image recognition model as the current image recognition model and the initial sample number as the current sample number, and executing a model parameter adjustment step; determining, based on the N gradient matrices obtained from N rounds of training, whether the current sample number needs to be adjusted; and if so, increasing the current sample number, taking the increased number as the current sample number, and executing the model parameter adjustment step again until training of the initial image recognition model is completed. The method effectively improves the efficiency of model training and enables the model to reach the optimal model more quickly.
Description
Technical Field
The invention relates to the technical field of computers, and in particular to a model training method and device, a server, and a storage medium.
Background
With the continuous development of science and technology, image recognition technology is applied in more and more fields, and image recognition is generally realized through a machine learning model. In the prior art, when a machine learning model is used to recognize images, the model must be trained iteratively; during this iterative training, model parameters are adjusted to obtain an optimal model.
A hyper-parameter is a parameter set before the training process begins, rather than parameter data obtained by training. There are many hyper-parameters, such as the learning rate, the number of network layers, the number of neuron nodes, and Batch_Size (the number of samples selected per training round). In the field of machine learning, the setting of hyper-parameters has a direct influence on model performance, and appropriate hyper-parameters enable the model to converge more quickly to the optimal model. How to optimize the hyper-parameters is therefore crucial in deep learning.
In the prior art, hyper-parameters are usually tuned by manual adjustment, grid search, random search and the like. For Batch_Size, the prior art obtains the optimal value by traversing all candidate values.
However, in the process of implementing the technical solution of the invention in the embodiments of the present application, the inventors of the present application find that the above-mentioned technology has at least the following technical problems:
the mode of manual parameter adjustment and the like causes low efficiency of model optimization training, slow convergence speed of the model and waste of a large amount of resources.
Disclosure of Invention
By providing a model training method, a model training apparatus, a server and a storage medium, the present application solves the technical problems in the prior art that manual parameter tuning and similar approaches make model optimization training inefficient, slow model convergence, and waste a large amount of resources, thereby achieving the technical effects of improving model training efficiency, accelerating model convergence and saving resources.
In a first aspect, the present application provides a model training method, the method comprising:
determining an initial image recognition model and an initial sample number for model training;
taking the initial image recognition model as a current image recognition model, taking the initial sample number as a current sample number, and executing a model parameter adjusting step, wherein the model parameter adjusting step comprises the following steps: performing N rounds of training on the current image recognition model based on the sample images of the current sample number to obtain a gradient matrix corresponding to each round of training; adjusting model parameters based on the gradient matrix corresponding to each round of training, wherein N is a positive integer greater than 1;
determining whether the number of the current samples needs to be adjusted or not based on N gradient matrixes obtained by N rounds of training; if so, increasing the number of the current samples, taking the increased number of the samples as the number of the current samples, and executing the model parameter adjusting step again until the training of the initial image recognition model is completed.
Preferably, the determining whether the current sample number needs to be adjusted based on the N gradient matrices obtained by the N rounds of training includes:
determining two target gradient matrixes from the N gradient matrixes;
determining whether an adjustment to the current number of samples is needed based on the two target gradient matrices.
Preferably, the determining whether the current sample number needs to be adjusted based on the two target gradient matrices includes:
extracting M groups of gradient vector groups from the two target gradient matrixes, wherein each group of gradient vector group comprises two gradient vectors which are respectively positioned at the same position of the two target gradient matrixes, and M is a positive integer;
and calculating cosine values of two gradient vectors in each group of gradient vector groups, and determining whether the current sample number needs to be adjusted or not based on the cosine values corresponding to each group of gradient vector groups.
Preferably, the determining whether the current sample number needs to be adjusted based on the cosine values corresponding to each group of gradient vectors includes:
if the cosine values corresponding to each group of gradient vector groups are all larger than a threshold value, determining that the number of the current samples needs to be adjusted;
and if the cosine values corresponding to each group of gradient vector groups are all smaller than the threshold value, keeping the number of the current samples unchanged.
Preferably, the increasing the current sample number and taking the increased sample number as the current sample number includes:
and doubling the number of the current samples, and taking the doubled number of the samples as the number of the current samples.
Preferably, before determining whether the current sample number needs to be adjusted based on N gradient matrices obtained by N rounds of training, the method further includes:
judging whether the number of times of model training reaches a preset number of times;
if not, executing the N gradient matrixes obtained based on N rounds of training, and determining whether the current sample number needs to be adjusted;
and if so, determining to finish the training of the initial image recognition model.
In a second aspect, the present application provides a model training apparatus, the apparatus comprising:
the first processing module is used for determining an initial image recognition model and the number of initial samples for model training;
a second processing module, configured to use the initial image recognition model as a current image recognition model, use the initial sample number as a current sample number, and perform a model parameter adjustment step, where the model parameter adjustment step includes: performing N rounds of training on the current image recognition model based on the sample images of the current sample number to obtain a gradient matrix corresponding to each round of training; adjusting model parameters based on the gradient matrix corresponding to each round of training, wherein N is a positive integer greater than 1;
the third processing module is used for determining whether the number of the current samples needs to be adjusted or not based on N gradient matrixes obtained by N rounds of training; if so, increasing the number of the current samples, taking the increased number of the samples as the number of the current samples, and executing the model parameter adjusting step again until the training of the initial image recognition model is completed.
Preferably, the determining whether the current sample number needs to be adjusted based on the N gradient matrices obtained by the N rounds of training includes:
determining two target gradient matrixes from the N gradient matrixes;
determining whether an adjustment to the current number of samples is needed based on the two target gradient matrices.
Preferably, the determining whether the current sample number needs to be adjusted based on the two target gradient matrices includes:
extracting M groups of gradient vector groups from the two target gradient matrixes, wherein each group of gradient vector group comprises two gradient vectors which are respectively positioned at the same position of the two target gradient matrixes, and M is a positive integer;
and calculating cosine values of two gradient vectors in each group of gradient vector groups, and determining whether the current sample number needs to be adjusted or not based on the cosine values corresponding to each group of gradient vector groups.
Preferably, the determining whether the current sample number needs to be adjusted based on the cosine values corresponding to each group of gradient vectors includes:
if the cosine values corresponding to each group of gradient vector groups are all larger than a threshold value, determining that the number of the current samples needs to be adjusted;
and if the cosine values corresponding to each group of gradient vector groups are all smaller than the threshold value, keeping the number of the current samples unchanged.
Preferably, the increasing the current sample number and taking the increased sample number as the current sample number includes:
and doubling the number of the current samples, and taking the doubled number of the samples as the number of the current samples.
Preferably, before determining whether the current sample number needs to be adjusted based on N gradient matrices obtained by N rounds of training, the apparatus further includes:
the judging module is used for judging whether the model training times reach preset times or not;
if not, executing the N gradient matrixes obtained based on N rounds of training, and determining whether the current sample number needs to be adjusted;
and if so, determining to finish the training of the initial image recognition model.
In a third aspect, the present application provides a server, including a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor implements the steps of the model training method described above when executing the program.
In a fourth aspect, the present application provides a computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of any of the methods described above.
One or more technical solutions provided in the embodiments of the present application have at least the following technical effects or advantages:
in the model training method of the embodiment of the invention, firstly, an initial image recognition model and the number of initial samples for model training are determined; then, taking the initial image recognition model as a current image recognition model, taking the initial sample number as a current sample number, and executing a model parameter adjusting step, wherein the model parameter adjusting step comprises the following steps: performing N rounds of training on the current image recognition model based on the sample images of the current sample number to obtain a gradient matrix corresponding to each round of training; secondly, adjusting model parameters based on the gradient matrix corresponding to each round of training, wherein N is a positive integer greater than 1; further, determining whether the number of the current samples needs to be adjusted or not based on N gradient matrixes obtained by N rounds of training; if so, increasing the number of the current samples, taking the increased number of the samples as the number of the current samples, and executing the model parameter adjusting step again until the training of the initial image recognition model is completed.
In the above scheme, the model parameters are adjusted based on the gradient matrix obtained by model training, and the number of training samples used in each model training is further adjusted. Model parameters and the size of a training sample used for model training each time are continuously and dynamically adjusted in a circulating training mode, so that the model is trained to be the most model quickly. The method effectively solves the technical problems that the optimization training efficiency of the image recognition model is low, the model convergence speed is low and a large amount of resources are wasted due to manual parameter adjustment and other modes, and achieves the technical effects of improving the training efficiency of the image recognition model, improving the convergence speed of the image recognition model and saving training resources.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on the drawings without creative efforts.
Fig. 1 is a flowchart of a model training method according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a model training apparatus according to a second embodiment of the present invention;
fig. 3 is a schematic diagram of a server according to a third embodiment of the present invention.
Detailed Description
The embodiment of the application provides a model training method, and the technical problems that in the prior art, the model optimization training efficiency is low, the model convergence speed is low, and a large amount of resources are wasted due to the manual parameter adjusting mode and the like are solved.
In order to solve the technical problems, the general idea of the embodiment of the application is as follows:
a method of model training, the method comprising: determining an initial image recognition model and an initial sample number for model training; taking the initial image recognition model as a current image recognition model, taking the initial sample number as a current sample number, and executing a model parameter adjusting step, wherein the model parameter adjusting step comprises the following steps: performing N rounds of training on the current image recognition model based on the sample images of the current sample number to obtain a gradient matrix corresponding to each round of training; adjusting model parameters based on the gradient matrix corresponding to each round of training, wherein N is a positive integer greater than 1; determining whether the number of the current samples needs to be adjusted or not based on N gradient matrixes obtained by N rounds of training; if so, increasing the number of the current samples, taking the increased number of the samples as the number of the current samples, and executing the model parameter adjusting step again until the training of the initial image recognition model is completed. In order to better understand the technical solution, the technical solution will be described in detail with reference to the drawings and the specific embodiments.
In order to better understand the technical solution, the technical solution will be described in detail with reference to the drawings and the specific embodiments.
Example one
The present embodiment provides a model training method, as shown in fig. 1, which is a flowchart of the model training method provided in the embodiments of the present specification, and the method includes the following steps:
step S101: determining an initial image recognition model and an initial sample number for model training;
step S102: taking the initial image recognition model as a current image recognition model, taking the initial sample number as a current sample number, and executing a model parameter adjusting step, wherein the model parameter adjusting step comprises the following steps: performing N rounds of training on the current image recognition model based on the sample images of the current sample number to obtain a gradient matrix corresponding to each round of training; adjusting model parameters based on the gradient matrix corresponding to each round of training, wherein N is a positive integer greater than 1;
step S103: determining whether the number of the current samples needs to be adjusted or not based on N gradient matrixes obtained by N rounds of training; if so, increasing the number of the current samples, taking the increased number of the samples as the number of the current samples, and executing the model parameter adjusting step again until the training of the initial image recognition model is completed.
The model training method provided by the embodiment of the specification can be applied to the model training process under various scenes. The present specification mainly describes the present invention by taking model training in an image recognition scenario as an example.
In the specific implementation process, step S101 is first executed: an initial image recognition model is determined, as well as an initial number of samples for model training.
The initial image recognition model may be an untrained model for performing image recognition, and may be selected according to actual needs; for example, it may be a convolutional neural network model, a fast convolutional neural network model, or the like. The initial sample number may be an initially set value of Batch_Size, which may be set randomly or obtained from prior experience.
Further, step S102 is executed: taking the initial image recognition model as a current image recognition model, taking the initial sample number as a current sample number, and executing a model parameter adjusting step, wherein the model parameter adjusting step comprises the following steps: performing N rounds of training on the current image recognition model based on the sample images of the current sample number to obtain a gradient matrix corresponding to each round of training; and adjusting model parameters based on the gradient matrix corresponding to each round of training, wherein N is a positive integer greater than 1.
In a specific implementation process, the initial image recognition model needs to gradually adjust model parameters through multiple rounds of training so as to obtain a final image recognition model with optimal model parameters. In this embodiment of the present specification, a training process of an initial image recognition model may be implemented through a model parameter adjustment step, specifically, sample images of an initial sample number are obtained, input data of the image recognition model is determined based on the sample images, and the input data may be determined according to actual needs, for example, feature extraction is performed on the sample images to obtain image features, and the image features are used as input data of the image recognition model, or the sample images may be directly used as input data of the image recognition model.
In the first round of training, the initial image recognition model is taken as the current image recognition model, and input data determined based on the sample images is fed into it to obtain an image recognition result. A loss function is calculated based on the image recognition result; the choice of loss function can be set according to actual needs and is not limited here. After the loss is obtained, the gradient matrix of this round of training is calculated, and the parameters of the current image recognition model are adjusted accordingly based on the gradient matrix. The image recognition model after parameter adjustment is then taken as the current image recognition model and the parameter adjustment step is executed again, i.e., the second round of training is performed. N rounds of model training are completed through these steps, with the same number of samples used in each round.
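A minimal sketch of one such round, using a toy linear model in place of the image recognition model (the function name, the squared-error loss, and all data shapes are illustrative assumptions, not the patent's definitive implementation):

```python
import numpy as np

def train_one_round(W, X, y, lr=0.01):
    """One training round: forward pass on the batch, squared-error loss,
    gradient matrix (same shape as the parameter matrix W), parameter update."""
    pred = X @ W                       # recognition result (toy forward pass)
    grad = X.T @ (pred - y) / len(X)   # gradient matrix for this round
    return W - lr * grad, grad         # adjusted parameters, per-round gradient

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 4))            # batch of 8 samples, 4 features each
y = rng.normal(size=(8, 2))
W = np.zeros((4, 2))
grads = []
for _ in range(3):                     # N = 3 rounds at the same sample number
    W, g = train_one_round(W, X, y)
    grads.append(g)
print(len(grads), grads[0].shape)      # 3 gradient matrices, each 4 x 2
```

Each round yields one gradient matrix; the N matrices collected here feed the adjustment decision described next.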
In the embodiment of the specification, the number of training samples affects the training speed of the model: the larger the number of training samples, the more accurate the gradient descent direction and the smaller the resulting training oscillation. Therefore, after N rounds of model training with the same number of samples, the gradient behaviour at that sample number can be examined to determine whether the number of training samples needs to be adjusted. That is, step S103: determining whether the current sample number needs to be adjusted based on the N gradient matrices obtained from the N rounds of training; if so, increasing the current sample number, taking the increased number as the current sample number, and executing the model parameter adjustment step again until training of the initial image recognition model is completed.
Specifically, across different training iterations, the more similar the gradient vectors between the gradient matrices are, the more accurate the gradient descent direction is, indicating that the corresponding number of training samples is reasonable. Conversely, the larger the difference between the gradient vectors of the gradient matrices across iterations, the larger the oscillation in the training process; in order to improve training precision and reduce this oscillation, the number of training samples can be adjusted. In the embodiment of the specification, when the number of training samples needs to be adjusted, the sample number may be enlarged from its current value and the model trained again with the enlarged number. Meanwhile, the similarity of the gradient vectors between the gradient matrices obtained after the adjustment is checked: if the similarity is still high, the sample number is increased again; once the similarity satisfies a preset condition, the sample number is kept unchanged. This process is repeated until training of the image recognition model is completed.
For the N gradient matrices obtained under the same number of training samples, the decision whether to adjust may be made from all of the gradient matrices or from a part of them. For example, when using all N gradient matrices, a gradient vector may be determined in each matrix and the similarities between all of these vectors compared to decide whether to adjust the sample number. When using a partial set, some of the N gradient matrices may be selected, and the similarity between the gradient vectors contained in them used to decide whether to adjust the sample number.
As a preferred embodiment, the determining whether the current sample number needs to be adjusted by the N gradient matrices obtained by the N training rounds includes: determining two target gradient matrixes from the N gradient matrixes; determining whether an adjustment to the current number of samples is needed based on the two target gradient matrices.
The target gradient matrices may be determined in various ways: for example, the gradient matrices obtained in preset rounds of training may be selected as the target gradient matrices, two gradient matrices may be randomly extracted from the N gradient matrices, or the gradient matrices of two randomly chosen adjacent rounds may be used. Whether to adjust the value of Batch_Size is then judged according to the determined target gradient matrices. The embodiments of the present disclosure preferably randomly extract the gradient matrices of two adjacent rounds as the target gradient matrices, for example the gradient matrices corresponding to the K-th and (K+1)-th rounds of training.
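The preferred selection — two adjacent rounds chosen at random — can be sketched with a small helper (the function name is hypothetical):

```python
import random

def pick_adjacent_targets(grad_matrices):
    """From the N per-round gradient matrices, randomly pick the matrices of
    two adjacent rounds (round K and round K+1) as the target matrices."""
    k = random.randrange(len(grad_matrices) - 1)
    return grad_matrices[k], grad_matrices[k + 1]

a, b = pick_adjacent_targets([10, 11, 12, 13])  # toy stand-ins for matrices
print(b - a)  # adjacent rounds, so the two picks are one position apart
```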
As a preferred embodiment, the determining whether the current sample number needs to be adjusted based on the two target gradient matrices includes: extracting M groups of gradient vector groups from the two target gradient matrixes, wherein each group of gradient vector group comprises two gradient vectors which are respectively positioned at the same position of the two target gradient matrixes, and M is a positive integer; and calculating cosine values of two gradient vectors in each group of gradient vector groups, and determining whether the current sample number needs to be adjusted or not based on the cosine values corresponding to each group of gradient vector groups.
In a specific implementation process, the gradient matrices obtained after each round of training have the same size. M groups of gradient vectors are extracted from the two target gradient matrices; the extraction can be performed at any position of the matrices or at preset positions. Each group contains the two vectors at the same position in the two target gradient matrices. For example, if the two target gradient matrices are matrix A and matrix B, both of size i rows by j columns, the gradient vector in the third row and fourth column of matrix A and the gradient vector in the third row and fourth column of matrix B may be selected as one group. The cosine value of each extracted group is then calculated: if the two gradient vectors in a group are a and b, and the cosine value is denoted cos θ, the calculation formula is:
cos θ = (a · b) / (|a| · |b|)
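This cosine computation can be sketched as follows (the matrix values are made up purely for illustration):

```python
import numpy as np

def cosine(a, b):
    """cos(theta) = (a . b) / (|a| * |b|) for two gradient vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

A = np.array([[1.0, 2.0], [3.0, 4.0]])  # target gradient matrix, round K
B = np.array([[1.1, 1.9], [2.8, 4.2]])  # target gradient matrix, round K+1
a, b = A[0], B[0]                        # one group: same position in A and B
print(round(cosine(a, b), 3))            # close to 1: directions nearly agree
```

A value near 1 means the two rounds descended in nearly the same direction, which is the signal used below to decide whether Batch_Size should grow.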
the relation between the cosine value and the preset threshold value is used as the judgment condition for adjusting, and the judgment condition can be in various situations, such as: if one group of cosine values corresponding to the M groups of gradient vector groups is larger than the threshold value, determining that the number of the current samples needs to be adjusted; if more than half of cosine values corresponding to the M groups of gradient vector groups are larger than a threshold value, determining that the number of the current samples needs to be adjusted; and if the cosine values corresponding to each gradient vector group are all larger than the threshold value, determining that the number of the current samples needs to be adjusted, and the like.
In this embodiment of the present description, taking extracting a group of gradient vector sets as an example, that is, M is 1, extracting a group of gradient vector sets at the same position in two target gradient matrices, respectively, calculating a cosine value of the group of gradient vectors, if the cosine value is greater than a threshold, adjusting the number of the current samples, if the cosine value is less than the threshold, keeping the number of the current samples unchanged, and training the image recognition model again.
As a preferred embodiment, the increasing the current sample number and taking the increased sample number as the current sample number includes: and doubling the number of the current samples, and taking the doubled number of the samples as the number of the current samples.
For example, if the current sample number is A, N rounds of training are performed on the image recognition model, two adjacent gradient matrices are randomly extracted from the N resulting gradient matrices as target gradient matrices, two gradient vectors at the same position are then randomly extracted from the two target matrices, and their cosine value is calculated. The cosine value is compared with the threshold; when it is greater than the threshold, the current sample number is to be increased, so it is adjusted to 2A and N rounds of training continue. It should be noted that the number of rounds trained with each sample number may be set according to actual needs, i.e., the value of N may vary: for example, the image recognition model may be trained for 5 rounds when the sample number is A and for 8 rounds when the sample number is 2A.
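The doubling decision itself reduces to a one-line rule (the threshold value here is an illustrative assumption):

```python
def adjust_batch_size(batch_size, cos_value, threshold=0.8):
    """Double the current sample number when the gradient directions of two
    rounds agree (cosine above the threshold); otherwise keep it unchanged."""
    return batch_size * 2 if cos_value > threshold else batch_size

print(adjust_batch_size(64, 0.95))  # directions agree -> 128
print(adjust_batch_size(64, 0.40))  # directions differ -> 64
```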
As a preferred embodiment, before determining whether the current number of samples needs to be adjusted based on the N gradient matrices obtained by the N rounds of training, the method further includes: judging whether the number of model training rounds has reached a preset number; if not, executing the step of determining, based on the N gradient matrices obtained by the N rounds of training, whether the current number of samples needs to be adjusted; and if so, determining that the training of the initial image recognition model is finished.
Specifically, each image recognition model is assigned a number of training rounds, i.e., a preset number, before being trained. Before determining whether the Batch_Size needs to be adjusted, it is judged whether the number of training rounds of the image recognition model has reached the preset number; if so, the training ends; if not, it is further judged whether the Batch_Size needs to be adjusted, and the image recognition model is then trained again.
In order to better explain the scheme, this specification also provides the flow of a model training method under the preferred embodiment, which is explained below:
First, the initial image recognition model is determined as the current image recognition model, and an initial Batch_Size is set as the current Batch_Size.
Then, a parameter adjusting step is executed:
First, N rounds of training are performed on the current image recognition model with the current Batch_Size;
then, two adjacent gradient matrices are randomly extracted from the N-round training results as target gradient matrices;
then, two gradient vectors at the same position are randomly extracted from the two target gradient matrices;
next, the cosine value of the two gradient vectors is calculated.
It is then judged whether the cosine value of the two gradient vectors is larger than the preset threshold; if not, the current Batch_Size is kept unchanged and the parameter adjusting step is executed again.
If the cosine value is larger than the threshold, it is further judged whether the number of training rounds has reached the preset number; if not, the Batch_Size is doubled, and the doubled Batch_Size is taken as the current Batch_Size for executing the parameter adjusting step again.
When the number of training rounds reaches the preset number, the model training ends.
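The steps above can be combined into a single loop. The sketch below is assumption-laden rather than the patent's implementation: `train_n_rounds` is a hypothetical callback that runs N rounds and returns the N gradient matrices (each given as a list of gradient vectors), and the numeric defaults are arbitrary.

```python
import math
import random

def cosine(v1, v2):
    """Cosine of the angle between two gradient vectors."""
    dot = sum(a * b for a, b in zip(v1, v2))
    return dot / (math.sqrt(sum(a * a for a in v1)) *
                  math.sqrt(sum(b * b for b in v2)))

def train_with_dynamic_batch(train_n_rounds, batch_size, threshold,
                             preset_rounds, n=5):
    """Parameter adjusting step of the preferred embodiment (sketch)."""
    rounds_done = 0
    while True:
        matrices = train_n_rounds(batch_size, n)  # N rounds of training
        rounds_done += n
        if rounds_done >= preset_rounds:          # preset count reached:
            return batch_size                     # model training ends
        g1, g2 = random.sample(matrices, 2)       # two target matrices
        i = random.randrange(len(g1))             # same random position
        if cosine(g1[i], g2[i]) > threshold:      # still well aligned:
            batch_size *= 2                       # double the Batch_Size
        # otherwise keep the current Batch_Size and train again
```

With aligned gradients the batch size keeps doubling until the preset round count is reached; with near-orthogonal gradients it stays fixed, matching the decision rule described above.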
The technical scheme in the embodiment of the invention at least has the following technical effects or advantages:
In the embodiment of the invention, target gradient matrices are randomly extracted, gradient vectors are randomly extracted from the target gradient matrices, their cosine value is calculated, and the number of training samples is then dynamically adjusted by comparing the cosine value with the threshold. Compared with manually adjusting the number of training samples as in the prior art, this greatly improves the training efficiency of the image recognition model, shortens the training time, makes the image recognition model converge faster, and saves training resources.
Example two
Based on the same inventive concept, as shown in fig. 2, an embodiment of the present specification provides a model training apparatus 200, including:
a first processing module 201, configured to determine an initial image recognition model and an initial sample number for model training;
a second processing module 202, configured to use the initial image recognition model as a current image recognition model, use the initial sample number as a current sample number, and perform a model parameter adjustment step, where the model parameter adjustment step includes: performing N rounds of training on the current image recognition model based on the sample images of the current sample number to obtain a gradient matrix corresponding to each round of training; adjusting model parameters based on the gradient matrix corresponding to each round of training, wherein N is a positive integer greater than 1;
a third processing module 203, configured to determine whether the current number of samples needs to be adjusted based on the N gradient matrices obtained by the N rounds of training; and if so, increase the current number of samples, take the increased number as the current number of samples, and execute the model parameter adjusting step again until the training of the initial image recognition model is completed.
As an alternative embodiment, the determining, based on the N gradient matrices obtained by the N rounds of training, whether the current number of samples needs to be adjusted includes:
determining two target gradient matrixes from the N gradient matrixes;
determining whether an adjustment to the current number of samples is needed based on the two target gradient matrices.
As an alternative embodiment, the determining whether the current sample number needs to be adjusted based on the two target gradient matrices includes:
extracting M groups of gradient vector groups from the two target gradient matrixes, wherein each group of gradient vector group comprises two gradient vectors which are respectively positioned at the same position of the two target gradient matrixes, and M is a positive integer;
and calculating cosine values of two gradient vectors in each group of gradient vector groups, and determining whether the current sample number needs to be adjusted or not based on the cosine values corresponding to each group of gradient vector groups.
As an optional embodiment, the determining whether the current sample number needs to be adjusted based on the cosine values corresponding to each group of gradient vector groups includes:
if the cosine values corresponding to each group of gradient vector groups are all larger than a threshold value, determining that the number of the current samples needs to be adjusted;
and if the cosine values corresponding to each group of gradient vector groups are all smaller than the threshold value, keeping the number of the current samples unchanged.
As an alternative embodiment, the increasing the current sample number and taking the increased sample number as the current sample number includes:
and doubling the number of the current samples, and taking the doubled number of the samples as the number of the current samples.
As an optional embodiment, before determining whether the current sample number needs to be adjusted based on N gradient matrices obtained by N rounds of training, the apparatus further includes:
the judging module is used for judging whether the model training times reach preset times or not;
if not, executing the step of determining, based on the N gradient matrices obtained by the N rounds of training, whether the current number of samples needs to be adjusted;
and if so, determining to finish the training of the initial image recognition model.
With regard to the above-mentioned apparatus, the specific functions of the respective modules have been described in detail in the embodiment of the model training method provided in the embodiment of the present specification, and will not be elaborated herein.
EXAMPLE III
Based on the same inventive concept as the model training method in the foregoing embodiments, an embodiment of this specification further provides a server, as shown in fig. 3, including:
a memory 304, a processor 302 and a computer program stored on the memory 304 and executable on the processor 302, the processor 302 implementing the steps of the model training method described above when executing the program.
Fig. 3 shows a bus architecture (represented by bus 300). Bus 300 may include any number of interconnected buses and bridges, and links together various circuits, including one or more processors (represented by processor 302) and memory (represented by memory 304). Bus 300 may also link together various other circuits, such as peripherals, voltage regulators, and power management circuits; these are well known in the art and are therefore not described further herein. A bus interface 306 provides an interface between bus 300 and the receiver 301 and transmitter 303. The receiver 301 and the transmitter 303 may be the same element, i.e., a transceiver, providing a unit for communicating with various other apparatus over a transmission medium. The processor 302 is responsible for managing bus 300 and general processing, and the memory 304 may be used to store data used by the processor 302 when performing operations.
Example four
Based on the same inventive concept, the present specification provides a computer-readable storage medium, on which a computer program is stored, which when executed by a processor implements the steps of any one of the model training methods described above.
Since the electronic device described in this embodiment is an electronic device used for implementing the model training method in the embodiments of the present application, a person skilled in the art can, based on the model training method described herein, understand the specific implementation of the electronic device of this embodiment and its various variations; therefore, how the electronic device implements the method is not described in detail here. Any electronic device used by a person skilled in the art to implement the model training method in the embodiments of the present application falls within the scope of the present application.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all such alterations and modifications as fall within the scope of the invention.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.
Claims (10)
1. A method of model training, the method comprising:
determining an initial image recognition model and an initial sample number for model training;
taking the initial image recognition model as a current image recognition model, taking the initial sample number as a current sample number, and executing a model parameter adjusting step, wherein the model parameter adjusting step comprises the following steps: performing N rounds of training on the current image recognition model based on the sample images of the current sample number to obtain a gradient matrix corresponding to each round of training; adjusting model parameters based on the gradient matrix corresponding to each round of training, wherein N is a positive integer greater than 1;
determining whether the number of the current samples needs to be adjusted or not based on N gradient matrixes obtained by N rounds of training; if so, increasing the number of the current samples, taking the increased number of the samples as the number of the current samples, and executing the model parameter adjusting step again until the training of the initial image recognition model is completed.
2. The model training method of claim 1, wherein the determining, based on the N gradient matrices obtained by the N rounds of training, whether the current number of samples needs to be adjusted comprises:
determining two target gradient matrixes from the N gradient matrixes;
determining whether an adjustment to the current number of samples is needed based on the two target gradient matrices.
3. The model training method of claim 2, wherein said determining whether an adjustment to the current number of samples is needed based on the two target gradient matrices comprises:
extracting M groups of gradient vector groups from the two target gradient matrixes, wherein each group of gradient vector group comprises two gradient vectors which are respectively positioned at the same position of the two target gradient matrixes, and M is a positive integer;
and calculating cosine values of two gradient vectors in each group of gradient vector groups, and determining whether the current sample number needs to be adjusted or not based on the cosine values corresponding to each group of gradient vector groups.
4. The model training method of claim 3, wherein the determining whether the current number of samples needs to be adjusted based on the cosine values corresponding to each group of gradient vectors comprises:
if the cosine values corresponding to each group of gradient vector groups are all larger than a threshold value, determining that the number of the current samples needs to be adjusted;
and if the cosine values corresponding to each group of gradient vector groups are all smaller than the threshold value, keeping the number of the current samples unchanged.
5. The model training method of claim 1, wherein the increasing the current sample number and taking the increased sample number as the current sample number comprises:
and doubling the number of the current samples, and taking the doubled number of the samples as the number of the current samples.
6. The model training method of claim 1, wherein before determining whether the current number of samples needs to be adjusted based on N gradient matrices obtained by N rounds of training, the method further comprises:
judging whether the number of times of model training reaches a preset number of times;
if not, executing the step of determining, based on the N gradient matrices obtained by the N rounds of training, whether the current number of samples needs to be adjusted;
and if so, determining to finish the training of the initial image recognition model.
7. A model training apparatus, the apparatus comprising:
the first processing module is used for determining an initial image recognition model and the number of initial samples for model training;
a second processing module, configured to use the initial image recognition model as a current image recognition model, use the initial sample number as a current sample number, and perform a model parameter adjustment step, where the model parameter adjustment step includes: performing N rounds of training on the current image recognition model based on the sample images of the current sample number to obtain a gradient matrix corresponding to each round of training; adjusting model parameters based on the gradient matrix corresponding to each round of training, wherein N is a positive integer greater than 1;
the third processing module is used for determining whether the number of the current samples needs to be adjusted or not based on N gradient matrixes obtained by N rounds of training; if so, increasing the number of the current samples, taking the increased number of the samples as the number of the current samples, and executing the model parameter adjusting step again until the training of the initial image recognition model is completed.
8. The model training apparatus of claim 7, wherein before determining whether the current number of samples needs to be adjusted based on N gradient matrices obtained from N rounds of training, the apparatus further comprises:
the judging module is used for judging whether the model training times reach preset times or not;
if not, executing the step of determining, based on the N gradient matrices obtained by the N rounds of training, whether the current number of samples needs to be adjusted;
and if so, determining to finish the training of the initial image recognition model.
9. A server, comprising a processor and a memory:
the memory is used for storing a program for executing the method of any one of claims 1-6;
the processor is configured to execute programs stored in the memory.
10. A computer-readable storage medium, having stored thereon a computer program which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011576010.1A CN112668639A (en) | 2020-12-28 | 2020-12-28 | Model training method and device, server and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011576010.1A CN112668639A (en) | 2020-12-28 | 2020-12-28 | Model training method and device, server and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112668639A true CN112668639A (en) | 2021-04-16 |
Family
ID=75410223
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011576010.1A Withdrawn CN112668639A (en) | 2020-12-28 | 2020-12-28 | Model training method and device, server and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112668639A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114972928A (en) * | 2022-07-26 | 2022-08-30 | 深圳比特微电子科技有限公司 | Image recognition model training method and device |
CN114972928B (en) * | 2022-07-26 | 2022-11-11 | 深圳比特微电子科技有限公司 | Image recognition model training method and device |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112288087A (en) | Neural network pruning method and device, electronic equipment and storage medium | |
CN105184367A (en) | Model parameter training method and system for depth neural network | |
CN111914908B (en) | Image recognition model training method, image recognition method and related equipment | |
CN109711534A (en) | Dimensionality reduction model training method, device and electronic equipment | |
CN111695624B (en) | Updating method, device, equipment and storage medium of data enhancement strategy | |
CN111967573A (en) | Data processing method, device, equipment and computer readable storage medium | |
CN115860081B (en) | Core algorithm scheduling method, system, electronic equipment and storage medium | |
CN112016697A (en) | Method, device and equipment for federated learning and storage medium | |
CN110874626B (en) | Quantization method and quantization device | |
CN111984414B (en) | Data processing method, system, equipment and readable storage medium | |
Gao et al. | Vacl: Variance-aware cross-layer regularization for pruning deep residual networks | |
CN111831355A (en) | Weight precision configuration method, device, equipment and storage medium | |
CN110852417A (en) | Single-depth neural network model robustness improving method for application of Internet of things | |
CN116644804A (en) | Distributed training system, neural network model training method, device and medium | |
CN114511042A (en) | Model training method and device, storage medium and electronic device | |
CN114138231B (en) | Method, circuit and SOC for executing matrix multiplication operation | |
CN111831359A (en) | Weight precision configuration method, device, equipment and storage medium | |
CN112668639A (en) | Model training method and device, server and storage medium | |
CN112200310B (en) | Intelligent processor, data processing method and storage medium | |
CN113761026A (en) | Feature selection method, device, equipment and storage medium based on conditional mutual information | |
Huang et al. | Transfer learning with efficient convolutional neural networks for fruit recognition | |
CN108427773B (en) | Distributed knowledge graph embedding method | |
CN107992821B (en) | Image identification method and system | |
CN110502975A (en) | A kind of batch processing system that pedestrian identifies again | |
WO2022111231A1 (en) | Cnn training method, electronic device, and computer readable storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
WW01 | Invention patent application withdrawn after publication | Application publication date: 20210416 |