CN114417982A - Model training method, terminal device and computer readable storage medium - Google Patents


Info

Publication number
CN114417982A
Authority
CN
China
Prior art keywords
training
model
neural network
node
nodes
Prior art date
Legal status
Pending
Application number
CN202111642717.2A
Other languages
Chinese (zh)
Inventor
陶超
周红林
范先旭
彭少杰
龙汉
Current Assignee
Shenzhen Juding Medical Co Ltd
Original Assignee
Shenzhen Juding Medical Co Ltd
Priority date
Filing date
Publication date
Application filed by Shenzhen Juding Medical Co Ltd filed Critical Shenzhen Juding Medical Co Ltd
Priority to CN202111642717.2A
Publication of CN114417982A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G06N20/10 Machine learning using kernel methods, e.g. support vector machines [SVM]
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/08 Learning methods
    • G06N7/00 Computing arrangements based on specific mathematical models
    • G06N7/01 Probabilistic graphical models, e.g. probabilistic networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computational Linguistics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Mathematical Analysis (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Pure & Applied Mathematics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Mathematical Optimization (AREA)
  • Algebra (AREA)
  • Computational Mathematics (AREA)
  • Medical Informatics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The application discloses a model training method, a terminal device and a computer readable storage medium, wherein the method comprises the following steps: reading a training data set; constructing a distributed network model, wherein the distributed network model comprises a plurality of nodes on which the same neural network is deployed; distributing the training data set to a neural network on each node for training; and training a final distributed network model based on model parameters obtained by the neural network training of each node. By the method, the training data set can be distributed to the neural network of each node for training, the training time is shortened, and the model training efficiency is effectively improved.

Description

Model training method, terminal device and computer readable storage medium
Technical Field
The present application relates to the field of artificial intelligence technologies, and in particular, to a model training method, a terminal device, and a computer-readable storage medium.
Background
In recent years, natural language processing technology represented by pre-training in the field of artificial intelligence has gained explosive development, and new technologies and new models have emerged. Under the background of a new era, how to efficiently apply diversified scientific achievements in the advanced natural language processing field to industrial practice and solve practical problems is a core problem in the natural language processing field. Machine learning models are key technologies in the field of artificial intelligence, and developers usually carry out related work based on a machine learning framework.
However, in the process of applying the model to industrial practice, the data volume is increased due to complex application scenarios, and the traditional model training method has low training efficiency and cannot meet the requirements of users on an efficient model training method.
Disclosure of Invention
The application provides a model training method, a terminal device and a computer readable storage medium, which aim to solve the technical problem of low training efficiency in the prior art.
In order to solve the above problems, the first technical solution provided by the present application is: providing a model training method, wherein the model is distributed in a plurality of nodes for training, and the model training method comprises the following steps:
reading a training data set;
constructing a distributed network model, wherein the distributed network model comprises deploying the same neural network on a plurality of nodes;
distributing the training data set to a neural network on each node for training;
and training a final distributed network model based on model parameters obtained by the neural network training of each node.
In order to solve the above technical problem, a second technical solution provided by the present application is: providing a terminal device comprising a memory and a processor coupled to the memory; wherein the memory is configured to store program data and the processor is configured to execute the program data to implement the model training method as described above.
In order to solve the above technical problem, a third technical solution provided by the present application is: a computer-readable storage medium is provided, the storage medium storing program instructions that, when executed, implement the model training method described above.
In the model training method provided by the application, terminal equipment reads a training data set; constructing a distributed network model, wherein the distributed network model comprises a plurality of nodes on which the same neural network is deployed; distributing the training data set to a neural network on each node for training; and training a final distributed network model based on model parameters obtained by the neural network training of each node. By the model training method, the training data set can be distributed to the neural network of each node for training, the training time is shortened, and the model training efficiency is effectively improved.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts. Wherein:
FIG. 1 is a schematic flow chart diagram of a first embodiment of a model training method provided herein;
FIG. 2 is a schematic flow chart diagram of a second embodiment of a model training method provided by the present application;
FIG. 3 is a schematic flow chart diagram of a third embodiment of a model training method provided by the present application;
FIG. 4 is a schematic flow chart diagram of a fourth embodiment of the model training method provided by the present application;
FIG. 5 is a schematic flow chart diagram of a fifth embodiment of a model training method provided by the present application;
FIG. 6 is a schematic flow chart diagram illustrating a sixth embodiment of a model training method provided by the present application;
fig. 7 is a schematic structural diagram of an embodiment of a terminal device provided in the present application;
FIG. 8 is a schematic structural diagram of an embodiment of a computer-readable storage medium provided in the present application.
Detailed Description
In order to make the aforementioned objects, features and advantages of the present application more comprehensible, embodiments accompanying the present application are described in detail below with reference to the accompanying drawings. It is to be understood that the specific embodiments described herein are merely illustrative of the application and are not limiting of the application. It should be further noted that, for the convenience of description, only some of the structures related to the present application are shown in the drawings, not all of the structures. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The terms "first", "second", etc. in this application are used to distinguish between different objects, not to describe a particular order. Furthermore, the terms "include" and "have", as well as any variations thereof, are intended to cover non-exclusive inclusion. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those steps or elements, but may include other steps or elements not listed or inherent to such a process, method, article, or apparatus.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the application. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein can be combined with other embodiments.
Referring to fig. 1, fig. 1 is a schematic flow chart of a first embodiment of a model training method provided in the present application. The model training method is applied to a terminal device, which may be a server or a system in which a server and a local terminal cooperate. Accordingly, the parts included in the terminal device, such as units, sub-units, modules, and sub-modules, may all be disposed in the server, or may be distributed between the server and the local terminal.
The model obtained by training the model training method according to the embodiment of the present application may be applied to disease prediction, for example, the model may predict cancers such as breast cancer and lung cancer based on an input pathological section image or medical image, and an application scenario of the model is not specifically limited herein.
Further, the server may be hardware or software. When the server is hardware, it may be implemented as a distributed server cluster formed by multiple servers, or may be implemented as a single server. When the server is software, it may be implemented as a plurality of software or software modules, for example, software or software modules for providing distributed servers, or as a single software or software module, and is not limited herein. In some possible implementations, the model training method of the embodiments of the present application may be implemented by a processor calling computer-readable instructions stored in a memory.
As shown in fig. 1, the model of the embodiment of the present application is distributed to a plurality of nodes for training, and the specific steps of the model training method are as follows:
step S11: a training data set is read.
In the embodiment of the application, the terminal device reads the training data set to perform model training. Optionally, the training data may be collected for the relevant field using an image acquisition device or the like; it may be obtained from the databases of relevant institutions such as hospitals and schools; or it may come from a public machine learning or deep learning repository, for example the UCI data set or another machine learning or deep learning data set. The source of the training data is not particularly limited here.
Optionally, after reading the training data set, the terminal device divides it into a training set and a test set, where the training set is used for the model to learn from and the test set is used to evaluate the prediction accuracy of the trained model. The split may be performed by stratified sampling, random partitioning, and the like; the method for dividing the training data set is not particularly limited.
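As a minimal, illustrative sketch (not the patent's implementation), such a random split can be written as follows; the 80/20 ratio, the fixed seed, and the helper name `split_dataset` are assumptions made for the example:

```python
import numpy as np

def split_dataset(features, labels, train_ratio=0.8, seed=0):
    """Randomly partition a data set into a training set and a test set."""
    n = len(features)
    order = np.random.default_rng(seed).permutation(n)  # shuffled indices
    cut = int(n * train_ratio)
    train_idx, test_idx = order[:cut], order[cut:]
    return (features[train_idx], labels[train_idx],
            features[test_idx], labels[test_idx])

# Example: 100 samples, 4 features each
X = np.arange(400, dtype=float).reshape(100, 4)
y = np.arange(100)
X_tr, y_tr, X_te, y_te = split_dataset(X, y)
print(X_tr.shape, X_te.shape)  # (80, 4) (20, 4)
```

Shuffling before cutting gives a random partition; stratified sampling would additionally balance label proportions between the two subsets.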
Step S12: and constructing a distributed network model, wherein the distributed network model comprises deploying the same neural network on a plurality of nodes.
Specifically, the model of the application can be a machine learning model or a deep learning model, and the terminal device obtains the model and constructs a distributed network model, wherein the distributed network model comprises a plurality of nodes on which the same neural network is deployed, so that the model is distributed on the plurality of nodes for training.
Step S13: the training data set is distributed to the neural network on each node for training.
Optionally, before training with the training data set, data in the training data set may be cleaned to remove impure data, so as to avoid overfitting during model training and ensure training effect.
After the terminal equipment acquires the training data set, the training data set is distributed to the neural network on each node, network parameters are initialized according to model configuration, and the terminal equipment controls the neural network of each node to respectively train part of the training data set.
Step S14: and training a final distributed network model based on model parameters obtained by the neural network training of each node.
The neural network of each node trains part of the training data sets respectively to obtain model parameters of each node, and the terminal equipment trains a final distributed network model based on the model parameters obtained by the neural network training of each node.
In an embodiment of the application, a terminal device reads a training data set; constructing a distributed network model, wherein the distributed network model comprises a plurality of nodes on which the same neural network is deployed; distributing a training data set to a neural network on each node for training; and training a final distributed network model based on model parameters obtained by the neural network training of each node. By the method, the training data set can be distributed to the neural network of each node for training, the training time is shortened, and the model training efficiency is effectively improved.
Referring to fig. 2, fig. 2 is a schematic flowchart of a second embodiment of the model training method provided in the present application. As shown in fig. 2, step S13 further includes the following steps:
step S21: the training data set is sliced to obtain the same number of training data subsets as the number of nodes.
Specifically, the training data in the training data set includes features and labels, and when the terminal device slices the training data set, the data and the labels are separated, and the training data set is sliced to obtain training data subsets with the same number as the number of nodes.
Optionally, before slicing the training data set, the data may be digitized and standardized. Specifically, the terminal device standardizes the features of the training data so that their magnitudes fall within a reasonable range. This prevents the model, during subsequent training, from over-weighting features with large numeric values while neglecting features with small ones, reduces the influence of excessively large and/or small values on model training, and improves the training effect.
Optionally, after the training data is normalized, the training data may be subjected to dimensionality reduction to project the training data with high dimensionality on low dimensionality, so that the projection points of the training data of each category are as close as possible, and the distances between the category centers of the data of different categories are as large as possible, thereby better distinguishing the training data of different categories. In a specific embodiment, the data dimension reduction method may use a Fisher's Linear Discriminant Analysis (LDA) and/or a Principal Component Analysis (PCA), or may use other dimension reduction methods, which are not limited herein.
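A sketch of those two preprocessing steps in plain NumPy (the patent names LDA and/or PCA; only a basic PCA via eigendecomposition of the feature covariance matrix is shown here, and the helper names are assumptions):

```python
import numpy as np

def standardize(X):
    """Scale each feature (column) to zero mean and unit variance."""
    mu, sigma = X.mean(axis=0), X.std(axis=0)
    sigma = np.where(sigma == 0, 1.0, sigma)  # guard constant features
    return (X - mu) / sigma

def pca(X, n_components):
    """Project data onto its top principal components."""
    cov = np.cov(X, rowvar=False)           # feature covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    top = eigvecs[:, np.argsort(eigvals)[::-1][:n_components]]
    return X @ top

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5)) * np.array([1, 10, 100, 1000, 10000])
Z = pca(standardize(X), n_components=2)
print(Z.shape)  # (50, 2)
```

Standardizing first matters: without it, the components would be dominated by the features with the largest raw magnitudes, which is exactly the imbalance the passage above warns about.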
Step S22: and distributing a plurality of training data subsets to the neural network on the corresponding nodes for training.
After the terminal equipment obtains the training data subsets with the same number as the nodes, the training data subsets are distributed to the neural networks on the corresponding nodes one by one for training.
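The slice-and-distribute step above can be sketched as follows; `np.array_split` is used because it tolerates data set sizes that are not an exact multiple of the node count (the helper name `shard_dataset` is an assumption for the example):

```python
import numpy as np

def shard_dataset(features, labels, num_nodes):
    """Slice a data set into as many subsets as there are nodes."""
    feature_shards = np.array_split(features, num_nodes)
    label_shards = np.array_split(labels, num_nodes)
    return list(zip(feature_shards, label_shards))  # one (X, y) pair per node

X = np.arange(20, dtype=float).reshape(10, 2)
y = np.arange(10)
shards = shard_dataset(X, y, num_nodes=3)
print([s[0].shape[0] for s in shards])  # [4, 3, 3]
```

Each `(X, y)` pair would then be sent to the neural network on the corresponding node for training.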
In the embodiment of the application, the terminal device slices the training data set to obtain training data subsets with the same number as the number of nodes, and distributes the training data subsets to the neural network on the corresponding nodes for training. By the method of the embodiment, the terminal equipment can realize the model training of data parallel, reduce the training time and improve the model training speed.
Referring to fig. 3, fig. 3 is a schematic flow chart of a third embodiment of the model training method provided in the present application. As shown in fig. 3, after step S22, the model training method further includes the following steps:
step S31: and after the training of the training data subsets distributed to the corresponding nodes is finished, distributing the training data subsets of the nodes to the neural networks of other nodes for continuous training.
Specifically, because the neural networks of different nodes train separately, the training progress of different nodes is inconsistent. After the training data subset assigned to a node has been fully trained, the terminal device distributes that node's training data subset to the neural networks of other nodes, which receive it and continue training, improving the training effect of those nodes.
Step S32: and acquiring the training data subsets of other nodes, and continuing training the neural network of the node.
Similarly, after the training of the training data subsets distributed to the corresponding nodes is completed, the terminal device may acquire more training data subsets of other nodes to continue training the neural network of the node, thereby improving the training effect of the node.
In the embodiment of the application, after the training of the training data subsets distributed to the corresponding nodes is completed, the terminal equipment distributes the training data subsets of the nodes to the neural networks of other nodes for continuous training; and simultaneously, acquiring training data subsets of other nodes, and continuing training the neural network of the node. By the method of the embodiment, after each node finishes training the training data subset corresponding to the node, the training data subsets of other nodes can be obtained for cross training, and the training effect of the model is further improved.
Optionally, in an embodiment, the step S14 further includes the following steps: and sharing the model parameters obtained by training the neural network of each node to the neural networks of other nodes so as to update the neural networks of other nodes according to the model parameters of the neural networks of a plurality of nodes.
After each node is trained, the model parameters obtained by training the respective node are shared to the neural networks of other nodes, and each node obtains the model parameters obtained by training the other nodes and updates the model parameters according to the model parameters of the neural networks of a plurality of nodes. In an embodiment, when the training time difference of the plurality of nodes is large, an update time may be preset, and when the training of the node is completed, the model parameters of other nodes that have completed training within the update time are obtained, the mean value of the model parameters of the node and the model parameters of other nodes is calculated, and the model parameters of the node are updated according to the mean value of the plurality of nodes.
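A minimal sketch of this parameter-sharing update, assuming each node exposes its parameters as a dict of NumPy arrays and that a plain element-wise mean is taken, as described above:

```python
import numpy as np

def average_parameters(node_params):
    """Average per-node model parameters, key by key."""
    keys = node_params[0].keys()
    return {k: np.mean([p[k] for p in node_params], axis=0) for k in keys}

# Three nodes, each holding the same parameter layout
params = [
    {"w": np.array([1.0, 2.0]), "b": np.array([0.0])},
    {"w": np.array([3.0, 4.0]), "b": np.array([3.0])},
    {"w": np.array([5.0, 6.0]), "b": np.array([6.0])},
]
avg = average_parameters(params)
print(avg["w"], avg["b"])  # [3. 4.] [3.]
```

In the time-window variant described above, `node_params` would contain only the nodes that finished within the preset update time.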
Referring to fig. 4, fig. 4 is a schematic flowchart illustrating a fourth embodiment of the model training method provided in the present application. As shown in fig. 4, in another embodiment, step S14 includes the steps of:
step S41: and uploading model parameters obtained by the neural network training of each node to the distributed network model.
Specifically, in another embodiment, when the computing capabilities of each node are similar, after the neural network training of each node is completed, the terminal device may upload the model parameters obtained by the neural network training of each node to the parameter server of the distributed network model.
Step S42: and updating the model parameters of the distributed network model according to the uploaded model parameters.
And the terminal equipment updates the model parameters of the distributed network model according to the uploaded model parameters. In order to improve the accuracy of the model, after the parameter server receives the model parameters, the parameter server may perform parameter averaging on the model parameters of each node, synchronously update the average parameters to each node, and continue iterative training. Optionally, the parameter server may have an average period, so as to perform synchronous updating of the parameters when the number of iterations reaches the average period, thereby increasing the rate of model training.
In the embodiment of the application, the terminal equipment uploads model parameters obtained by training the neural network of each node to a distributed network model; and updating the model parameters of the distributed network model according to the uploaded model parameters. By the method of the embodiment, the terminal equipment can perform parameter averaging on the model parameters of each node and synchronously update the average parameters to each node, so that the convergence of the neural network is accelerated, and the accuracy of the model is effectively improved.
Referring to fig. 5, fig. 5 is a schematic flowchart of a fifth embodiment of a model training method provided in the present application. As shown in fig. 5, step S13 further includes the steps of:
step S51: the neural network that assigns the training data set to each node is trained to obtain a training loss value.
After a distributed network model comprising a plurality of nodes is established by the terminal equipment, a training data set is distributed to the neural network of each node for training, the node can output a training loss value during each training, and the terminal equipment acquires the training loss value during each training.
Specifically, in the model training process, an optimization algorithm can be adopted to enable the model to continuously approach the optimal solution, and the training effect of the model is optimized. Optionally, the optimization algorithm may use a gradient descent method, or may use Local Lipschitz function, least square method, or the like, where the gradient descent method is taken as an example to describe the model training process.
After a distributed network model comprising a plurality of nodes is established by the terminal equipment, a random initial parameter is set for the neural network of each node, a training data set is distributed to the neural network of each node for training, the neural network is iterated according to the training data set, a loss function is calculated during each iteration, and the parameter is updated according to the gradient descending direction, so that the node is close to the optimal solution continuously.
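As an illustrative example of this per-node loop (a linear model under mean-squared-error loss is assumed for brevity; the patent does not fix a specific architecture), each iteration computes the loss and steps the parameters against the gradient:

```python
import numpy as np

def train_linear_node(X, y, lr=0.1, epochs=200):
    """Fit y ~ X @ w by gradient descent, returning (w, final_loss)."""
    w = np.zeros(X.shape[1])                 # initial parameters
    for _ in range(epochs):
        residual = X @ w - y
        loss = np.mean(residual ** 2)        # loss function per iteration
        grad = 2 * X.T @ residual / len(y)   # gradient of the loss
        w -= lr * grad                       # step in the descent direction
    return w, loss

X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]])
y = X @ np.array([2.0, -1.0])                # true weights: [2, -1]
w, loss = train_linear_node(X, y)
print(np.round(w, 3), loss)
```

The recovered weights approach the true `[2, -1]` as the loss shrinks toward zero, which is the "continuously approach the optimal solution" behavior described above.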
Step S52: and when the training loss value is larger than a preset loss threshold value, obtaining model parameters of the neural network by adopting a Cauchy variation optimization algorithm.
During iteration, if a large number of local extrema exist in the solution space, the gradient descent direction may no longer point toward the optimal solution, causing the model to fall into a local optimum and degrading the training effect.
Therefore, after the terminal equipment obtains the training loss value, whether the training loss value is larger than a preset loss threshold value is judged. When the training loss value is larger than the preset loss threshold value, the terminal equipment adopts a Cauchy variation optimization algorithm to obtain model parameters of the neural network, the Cauchy variation optimization algorithm can disturb the current position of an individual, the global search capability is improved, and the node is not easy to fall into a local optimal solution in the training process. As shown in the following formula:
M_i = M_i + sign(1 + η) × C(x)

where M_i represents the current position of the individual, C(x) represents the Cauchy variation (a random term drawn from the Cauchy distribution), and sign(1 + η) represents the coefficient of the Cauchy variation.
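A sketch of that perturbation in NumPy, where the value of η and the use of `standard_cauchy` for C(x) are assumptions made for illustration:

```python
import numpy as np

def cauchy_perturb(position, eta=0.5, rng=None):
    """Perturb an individual's position with a Cauchy-distributed jump.

    Mirrors M_i = M_i + sign(1 + eta) * C(x): the heavy-tailed Cauchy
    draw occasionally produces very large steps, which helps the search
    escape local optima. `eta` is an illustrative coefficient here.
    """
    rng = rng or np.random.default_rng(0)
    step = np.sign(1.0 + eta) * rng.standard_cauchy(size=position.shape)
    return position + step

pos = np.zeros(3)
new_pos = cauchy_perturb(pos)
print(new_pos.shape)  # (3,)
```

Unlike a Gaussian perturbation, the Cauchy distribution has no finite variance, so large jumps are far more frequent; this is what gives the strategy its global-search character.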
In the embodiment of the application, the terminal equipment distributes a training data set to a neural network on each node for training so as to obtain a training loss value; and when the training loss value is larger than a preset loss threshold value, obtaining model parameters of the neural network by adopting a Cauchy variation optimization algorithm. By the method, the global search capability of the model training process can be improved, so that the node is not easy to fall into a local optimal solution in the training process, and the convergence accuracy of the model is improved.
Referring to fig. 6, fig. 6 is a schematic flowchart of a sixth embodiment of a model training method provided in the present application. As shown in fig. 6, step S13 further includes the steps of:
step S61: the neural network that assigns the training data set to each node is trained to obtain a training loss value.
Step S61 is the same as step S51, and is not repeated here.
Step S62: and when the training loss value is larger than a preset loss threshold value, acquiring model parameters of the neural network by adopting a forced exploration strategy.
When a node performs optimization, the structural diversity of individuals in the training data set must be maintained so that the search can continue. If that diversity is lost, individuals close to the optimal solution tend to fall into the same extremum as other individuals and stop evolving, so the model falls into a local optimum. Due to the limitations of the algorithm, the node then has difficulty jumping out of the neighborhood of the local optimum; that is, premature convergence occurs.
Therefore, after the terminal equipment obtains the training loss value, whether the training loss value is larger than a preset loss threshold value is judged. When the training loss value is greater than the preset loss threshold value, the terminal equipment acquires the model parameters of the neural network by adopting a forced exploration strategy, and continuously explores in the upper and lower limit areas to maintain the diversity of the training data set, as shown in the following formula:
[The forced-exploration update formula is reproduced in the source only as an image.]
wherein X_max represents the farthest sample individual in the training data set; X_min represents the nearest sample individual in the training data set; r is a random number in (0, 1]; ub is the maximum of the exploration space; lb is the minimum of the exploration space; and q is a division threshold: when q ≥ 0.5, the terminal device searches the upper-limit region, and when q < 0.5, it searches the lower-limit region.
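Since the exact update is given only as an image in the source, the following is merely one plausible reading of the described behavior (jump into the upper region toward ub when q ≥ 0.5, otherwise into the lower region toward lb); every detail of the arithmetic is an assumption:

```python
import numpy as np

def forced_explore(x_max, x_min, ub, lb, rng=None):
    """Illustrative forced-exploration step (assumed form).

    With random threshold q, jump into the upper region [x_max, ub] when
    q >= 0.5, otherwise into the lower region [lb, x_min], keeping the
    population spread over the whole search space.
    """
    rng = rng or np.random.default_rng(0)
    q = rng.random()
    r = rng.random()  # plays the role of r in (0, 1]
    if q >= 0.5:
        return x_max + r * (ub - x_max)   # explore the upper-limit region
    return x_min - r * (x_min - lb)       # explore the lower-limit region

candidate = forced_explore(x_max=4.0, x_min=-2.0, ub=10.0, lb=-10.0)
print(-10.0 <= candidate <= 10.0)  # True
```

Whatever the exact formula, the design intent stated above is preserved: candidates are periodically forced toward the bounds of the exploration space to maintain diversity.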
In the embodiment of the application, the terminal equipment distributes a training data set to a neural network on each node for training so as to obtain a training loss value; and when the training loss value is larger than a preset loss threshold value, acquiring model parameters of the neural network by adopting a forced exploration strategy. By the method, a forced exploration strategy is used when algorithm optimization is carried out in the training process, so that the premature convergence state is avoided, the algorithm is prevented from falling into a local optimal solution, the training effect of the model is improved, and the accuracy of the model is further improved.
Optionally, step S12 further includes the steps of: and building a support vector machine and/or a Bayesian network in the distributed network model.
Specifically, a Support Vector Machine (SVM) model is built in the distributed network model. The problem of SVM parameter setting is particularly critical in the process of building a support vector machine model. Alternatively, taking Radial Basis Function (RBF) as an example, the change in its parameters is essentially a change in the complexity of the feature space projected by the SVM to the high dimension. When the kernel parameter σ is increased, the complexity of the projection space is reduced, and the linear divisibility is also reduced, and when σ tends to 0, the complexity of the feature space tends to be infinite, and at this time, although any data can be mapped into linear divisibility, a serious overfitting problem is caused. Therefore, correct kernel parameters need to be set for the data set so as to be matched with the distribution of the data samples, and the classification effect of the trained SVM is naturally better. In order to improve the performance of the support vector machine model, the common methods for setting the hyper-parameters are roughly divided into two types:
One is to perform a grid search over the parameter domain and take the parameter with the minimum error as the output, as in cross-validation. The other is to employ heuristic optimization. The first method is highly stable and estimates parameters accurately, but its computational cost and complexity are high, and it performs poorly when the number of samples is large. By contrast, the second method can find near-optimal parameters more quickly. Therefore, a bionic (bio-inspired) optimization algorithm is introduced to evaluate the sample distribution directly in the high-dimensional feature space, which avoids repeated computation of the SVM model and improves its classification effect. The algorithm model obtained by fusing the support vector machine with the bionic optimization algorithm is the algorithm model on which the subsequent distributed operation is performed.
In this embodiment, in order to improve the model's capacity to process large data sets, a Bayesian network is built in the distributed network model, and the optimal classification category is selected by calculating the conditional probability and the misjudgment loss, thereby improving the accuracy of the model.
In the embodiment of the present application, the terminal device builds a support vector machine and/or a Bayesian network in the distributed network model so as to improve the model's capacity to process large data sets and improve the accuracy of the model.
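The class-selection rule described above, choosing the category that minimises the expected misjudgment loss given the conditional probabilities, can be sketched as follows. The class names, posteriors, and loss values are hypothetical; the posteriors are assumed to come from the Bayesian network described in the text:

```python
def bayes_decision(posteriors, loss):
    """Pick the class minimising the expected misjudgment loss.

    posteriors: {class: P(class | x)}, the conditional probabilities;
    loss[decided][true]: cost of deciding `decided` when truth is `true`.
    """
    def risk(decided):
        # expected loss of announcing `decided`, averaged over the truth
        return sum(loss[decided][c] * p for c, p in posteriors.items())
    return min(posteriors, key=risk)

# hypothetical two-class example: missing a positive is 10x worse
posteriors = {"positive": 0.2, "negative": 0.8}
loss = {
    "positive": {"positive": 0, "negative": 1},   # false alarm costs 1
    "negative": {"positive": 10, "negative": 0},  # missed case costs 10
}
# risk("positive") = 0*0.2 + 1*0.8 = 0.8
# risk("negative") = 10*0.2 + 0*0.8 = 2.0
```

Even though "negative" is the more probable class, the asymmetric misjudgment loss makes "positive" the minimum-risk decision, which is the point of weighing conditional probability against loss rather than taking the raw maximum-probability class.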
Optionally, after the model is built, the terminal device may evaluate the accuracy and stability of the model by computing an evaluation index function on the model's prediction results over the test set. In a specific embodiment, the evaluation index function may be the Root Mean Square Error (RMSE) of a regression algorithm, as shown in the following equation:

RMSE = √( (1/n) · Σᵢ (yᵢ − ŷᵢ)² )

wherein yᵢ is the actual value on the test set, and ŷᵢ is the value predicted by the model after the test set is input.

The evaluation index function may also be a prediction evaluation index that measures the model effect from the loss value, such as the Mean Absolute Error (MAE), the Mean Absolute Percentage Error (MAPE), or the prediction validity (FVD); the evaluation index function is not specifically limited herein.
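The evaluation indices above follow directly from their definitions; a minimal sketch, with hypothetical toy values standing in for the test set:

```python
import math

def rmse(y_true, y_pred):
    """Root Mean Square Error: sqrt of the mean of (y_i - y_hat_i)^2."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true))

def mae(y_true, y_pred):
    """Mean Absolute Error: mean of |y_i - y_hat_i|."""
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)

def mape(y_true, y_pred):
    """Mean Absolute Percentage Error (actual values must be non-zero)."""
    return sum(abs((a - b) / a) for a, b in zip(y_true, y_pred)) / len(y_true)

# hypothetical actual and predicted values on a test set
y_true = [2.0, 4.0, 6.0]
y_pred = [2.5, 3.5, 6.0]
```

Lower values of all three indices indicate a model whose predictions lie closer to the test-set ground truth.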
Optionally, the model obtained by training with the model training method of any of the above embodiments may be applied to breast cancer prediction. Specifically, an image data set is input into the model to obtain the prediction result output by the model together with its evaluation index, and staff can further assess the disease probability for the image data set based on the evaluation index of the obtained prediction result. Because the algorithm model is operated in a distributed manner, the time required for breast cancer prediction can be effectively shortened, improving the efficiency of disease prediction.
Referring to fig. 7, fig. 7 is a schematic structural diagram of an embodiment of a terminal device provided in the present application. The terminal device comprises a memory 52 and a processor 51 connected to each other.
The memory 52 is used to store program instructions for implementing the model training method of any of the above. The processor 51 is operative to execute program instructions stored in the memory 52.
The processor 51 may also be referred to as a CPU (Central Processing Unit). The processor 51 may be an integrated circuit chip having signal processing capability. The processor 51 may also be a general purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components. A general purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
The memory 52 may be a memory stick, a TF card, etc., and may store all information in the terminal device, including the input raw data, the computer program, intermediate operation results, and final operation results. It stores and retrieves information according to the location specified by the controller. With the memory, the terminal device has a storage function and its normal operation can be guaranteed. The memory of the terminal device may be classified by purpose into main memory (internal memory) and auxiliary memory (external memory), or equivalently into external memory and internal memory. External memory is usually a magnetic medium, an optical disc, or the like, and can store information for a long period of time. Internal memory refers to the storage component on the main board, which holds the data and programs currently being executed; it is only used for temporary storage, and its contents are lost when the power is turned off.
In the several embodiments provided in the present application, it should be understood that the disclosed method and apparatus may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, a division of a module or a unit is merely a logical division, and an actual implementation may have another division, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
Units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application may be substantially implemented or contributed to by the prior art, or all or part of the technical solution may be embodied in a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a system server, a network device, or the like) or a processor (processor) to execute all or part of the steps of the method of the embodiments of the present application.
Referring to fig. 8, fig. 8 is a schematic structural diagram of an embodiment of a computer-readable storage medium provided in the present application. As shown in fig. 8, the computer-readable storage medium 110 is used for storing program data 111, which when executed by a processor is used to implement the model training method as described in the above embodiments.
The unit in which the functional units in the embodiments of the present application are integrated may be stored in the computer-readable storage medium 110 if it is implemented in the form of a software functional unit and sold or used as a separate product. Based on such understanding, the technical solution of the present application may be embodied in the form of a software product, and the computer-readable storage medium 110 includes several instructions for enabling a computer device (which may be a personal computer, a system server, a network device, etc.), an electronic device (such as an MP3 or MP4 player, a mobile terminal such as a mobile phone, tablet computer or wearable device, or a desktop computer), or a processor to execute all or part of the steps of the methods of the embodiments of the present application.
Optionally, in an embodiment, the program data 111, when executed by the processor, is configured to implement the following method: acquiring a plurality of face images; determining the similarity between the unclassified first face image and the rest face images; in response to the existence of at least one image with the similarity of the first face image being greater than or equal to a set first similarity threshold, classifying the first face image and the at least one image into a class of images; and in response to the fact that no image with the similarity greater than or equal to a set first similarity threshold value exists, determining a second face image with the maximum similarity to the first face image, and classifying the first face image and/or the second face image according to the similarity of the first face image and the second face image.
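The threshold-based grouping described above can be sketched as follows. The handling of the below-threshold branch is an assumption (here each unmatched image becomes its own class), since the text only says the most similar pair is classified "according to" their similarity; the similarity function and threshold are likewise hypothetical:

```python
def classify_faces(images, similarity, threshold):
    """Greedily group face images by pairwise similarity.

    For each still-unclassified first image: if at least one remaining
    image reaches the similarity threshold, group them into one class;
    otherwise (an assumption about the under-specified branch) the image
    forms a class of its own.
    """
    classes = []
    remaining = list(images)
    while remaining:
        first = remaining.pop(0)
        matches = [img for img in remaining if similarity(first, img) >= threshold]
        if matches:
            classes.append([first] + matches)
            remaining = [img for img in remaining if img not in matches]
        else:
            classes.append([first])
    return classes
```

With images represented as toy 1-D embeddings and similarity defined as 1 − |a − b|, a threshold of 0.8 splits [0.0, 0.1, 0.9, 1.0] into two classes, {0.0, 0.1} and {0.9, 1.0}.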
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-readable storage media 110 (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It is to be understood that each flow and/or block of the flowcharts and/or block diagrams, and combinations of flows and/or blocks therein, can be implemented by the program data 111. The program data 111 can be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the program data 111, when executed by the processor of the computer or other programmable data processing apparatus, produces means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

The program data 111 may also be stored in the computer-readable storage medium 110, which can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the program data 111 stored in the computer-readable storage medium 110 produces an article of manufacture including instruction means which implement the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

The program data 111 may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus so as to produce a computer-implemented process, such that the program data 111 executing on the computer or other programmable apparatus provides steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
In one embodiment, these programmable data processing devices include a processor and memory thereon. The processor may also be referred to as a CPU (Central Processing Unit). The processor may be an electronic chip having signal processing capabilities. The processor may also be a general purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
The memory may be a memory stick, a TF card, etc., which stores and retrieves information according to the location specified by the processor. By purpose, the memory is classified into main memory (internal memory) and auxiliary memory (external memory), or equivalently into external memory and internal memory. External memory is usually a magnetic medium, an optical disc, or the like, and can store information for a long period of time. Internal memory refers to the storage component on the main board, which holds the data and programs currently being executed; it is only used for temporary storage, and its contents are lost when the power is turned off.
The above description is only for the purpose of illustrating embodiments of the present application and is not intended to limit the scope of the present application, and all modifications of equivalent structures and equivalent processes, which are made by the contents of the specification and the drawings of the present application or are directly or indirectly applied to other related technical fields, are also included in the scope of the present application.

Claims (10)

1. A model training method is characterized in that the model is distributed on a plurality of nodes for training, and the model training method comprises the following steps:
reading a training data set;
constructing a distributed network model, wherein the distributed network model comprises deploying the same neural network on a plurality of nodes;
distributing the training data set to a neural network on each node for training;
and training a final distributed network model based on model parameters obtained by the neural network training of each node.
2. The model training method according to claim 1, wherein
the distributing the training data set to the neural network on each node for training comprises:
slicing the training data set to obtain training data subsets with the same number as the number of nodes;
and distributing a plurality of training data subsets to the neural network on the corresponding nodes for training.
3. The model training method according to claim 2, wherein
after the distributing the plurality of training data subsets to the neural networks on the corresponding nodes for training, the model training method further comprises:
after the training of the training data subsets distributed to the corresponding nodes is finished, distributing the training data subsets of the nodes to the neural networks of other nodes for continuous training;
and acquiring the training data subsets of other nodes, and continuing training the neural network of the node.
4. The model training method according to claim 1, wherein
the training of the final distributed network model based on the model parameters obtained by the neural network training of each node comprises the following steps:
and sharing the model parameters obtained by training the neural network of each node to the neural networks of other nodes so as to update the neural networks of other nodes according to the model parameters of the neural networks of a plurality of nodes.
5. The model training method according to claim 1, wherein
the training of the final distributed network model based on the model parameters obtained by the neural network training of each node comprises the following steps:
uploading model parameters obtained by training the neural network of each node to the distributed network model;
and updating the model parameters of the distributed network model according to the uploaded model parameters.
6. The model training method according to claim 1, wherein
the distributing the training data set to the neural network on each node for training comprises:
distributing the training data set to a neural network on each node for training to obtain a training loss value;
and when the training loss value is larger than a preset loss threshold value, acquiring the model parameters of the neural network by adopting a Cauchy mutation optimization algorithm.
7. The model training method according to claim 1 or claim 6, wherein
the distributing the training data set to the neural network on each node for training comprises:
distributing the training data set to a neural network on each node for training to obtain a training loss value;
and when the training loss value is larger than a preset loss threshold value, acquiring the model parameters of the neural network by adopting a forced exploration strategy.
8. The model training method according to claim 1, wherein
the building of the distributed network model comprises the following steps:
and building a support vector machine and/or a Bayesian network in the distributed network model.
9. A terminal device, comprising a memory and a processor coupled to the memory;
wherein the memory is configured to store program data and the processor is configured to execute the program data to implement the model training method as claimed in any one of claims 1 to 8.
10. A computer-readable storage medium storing program instructions which, when executed, implement the model training method of any one of claims 1 to 8.
CN202111642717.2A 2021-12-29 2021-12-29 Model training method, terminal device and computer readable storage medium Pending CN114417982A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111642717.2A CN114417982A (en) 2021-12-29 2021-12-29 Model training method, terminal device and computer readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111642717.2A CN114417982A (en) 2021-12-29 2021-12-29 Model training method, terminal device and computer readable storage medium

Publications (1)

Publication Number Publication Date
CN114417982A true CN114417982A (en) 2022-04-29

Family

ID=81269065

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111642717.2A Pending CN114417982A (en) 2021-12-29 2021-12-29 Model training method, terminal device and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN114417982A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023230767A1 (en) * 2022-05-30 2023-12-07 华为技术有限公司 Model training system, model training method, training device and training node
CN118093167A (en) * 2024-01-31 2024-05-28 三六零数字安全科技集团有限公司 Model training method and device, electronic equipment and storage medium


Similar Documents

Publication Publication Date Title
US11640563B2 (en) Automated data processing and machine learning model generation
EP3467723B1 (en) Machine learning based network model construction method and apparatus
CN109583332B (en) Face recognition method, face recognition system, medium, and electronic device
CN110196908A (en) Data classification method, device, computer installation and storage medium
CN111898703B (en) Multi-label video classification method, model training method, device and medium
CN114417982A (en) Model training method, terminal device and computer readable storage medium
CN111259647A (en) Question and answer text matching method, device, medium and electronic equipment based on artificial intelligence
CN114283350B (en) Visual model training and video processing method, device, equipment and storage medium
CN110598869B (en) Classification method and device based on sequence model and electronic equipment
JP2021529369A (en) Computer-executed estimation methods, estimation devices, electronic devices and storage media
CN111494964A (en) Virtual article recommendation method, model training method, device and storage medium
CN114565807A (en) Method and device for training target image retrieval model
CN109948680A (en) The classification method and system of medical record data
WO2023020214A1 (en) Retrieval model training method and apparatus, retrieval method and apparatus, device and medium
CN113609337A (en) Pre-training method, device, equipment and medium of graph neural network
CN112785005A (en) Multi-target task assistant decision-making method and device, computer equipment and medium
CN112420125A (en) Molecular attribute prediction method and device, intelligent equipment and terminal
CN115905528A (en) Event multi-label classification method and device with time sequence characteristics and electronic equipment
CN113961765B (en) Searching method, searching device, searching equipment and searching medium based on neural network model
Fu et al. Learning sparse kernel classifiers for multi-instance classification
CN117726884A (en) Training method of object class identification model, object class identification method and device
CN111259975B (en) Method and device for generating classifier and method and device for classifying text
CN111260074B (en) Method for determining hyper-parameters, related device, equipment and storage medium
CN111709473A (en) Object feature clustering method and device
CN112131884A (en) Method and device for entity classification and method and device for entity presentation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination