CN115527090A - Model training method, device, server and storage medium - Google Patents

Model training method, device, server and storage medium Download PDF

Info

Publication number
CN115527090A
CN115527090A
Authority
CN
China
Prior art keywords
model
trained
client devices
training
gradient parameters
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211029747.0A
Other languages
Chinese (zh)
Inventor
胡松 (Hu Song)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guang Dong Ming Chuang Software Technology Corp ltd
Original Assignee
Guang Dong Ming Chuang Software Technology Corp ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guang Dong Ming Chuang Software Technology Corp ltd filed Critical Guang Dong Ming Chuang Software Technology Corp ltd
Priority to CN202211029747.0A
Publication of CN115527090A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/94 Hardware or software architectures specially adapted for image or video understanding
    • G06V10/95 Hardware or software architectures specially adapted for image or video understanding structured as a network, e.g. client-server architectures

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Image Analysis (AREA)

Abstract

Embodiments of the present application disclose a model training method, a model training apparatus, a server, and a storage medium. The method includes: distributing a model to be trained to a plurality of client devices, and instructing the plurality of client devices to train the model to be trained based on their respective image data; obtaining the gradient parameters returned after the client devices each train the model to be trained, so as to obtain a plurality of gradient parameters; and obtaining a target model based on the plurality of gradient parameters. With this method, the model to be trained can be trained using the clients' own image data, without first collecting the image data onto the server and only then starting model training. On one hand, this makes it more convenient to expand the data set in a device-cloud linkage scenario, so that a larger data set can be used to obtain a better deep learning training effect, which in turn improves the image quality evaluation effect of the deep neural network; on the other hand, user privacy is effectively protected while user data is used.

Description

Model training method, device, server and storage medium
Technical Field
The present application relates to the field of data processing technologies, and in particular, to a model training method, an apparatus, a server, and a storage medium.
Background
With the development and popularity of deep learning, training a deep neural network has become the mainstream scheme for solving the image quality evaluation problem, and its effect is greatly superior to schemes without deep learning. The main development directions at present are optimizing the deep learning model or optimizing the data set. Schemes that optimize the deep learning model have difficulty effectively extracting the relevant features of an image and have poor interpretability; schemes that optimize the data set have difficulty directly expanding its scale, because producing a data set consumes considerable manpower, material, and financial resources. Both limitations affect the image quality evaluation effect of the deep neural network.
Disclosure of Invention
In view of the foregoing problems, the present application provides a model training method, apparatus, server and storage medium to improve the foregoing problems.
In a first aspect, an embodiment of the present application provides a model training method, which is applied to a server, where the server is connected to multiple client devices, and the method includes: distributing a model to be trained to the plurality of client devices, and instructing the plurality of client devices to train the model to be trained based on respective image data; obtaining gradient parameters returned after the client devices respectively train the model to be trained so as to obtain a plurality of gradient parameters; a target model is obtained based on the plurality of gradient parameters.
In a second aspect, an embodiment of the present application provides a model training apparatus, which runs on a server, where the server is connected to a plurality of client devices, and the apparatus includes: the model distribution and training module is used for distributing a model to be trained to the plurality of client devices and instructing the plurality of client devices to train the model to be trained based on respective image data; the gradient parameter acquisition module is used for acquiring gradient parameters returned after the client devices respectively train the model to be trained so as to obtain a plurality of gradient parameters; a model acquisition module for acquiring a target model based on the plurality of gradient parameters.
In a third aspect, the present application provides a server comprising one or more processors and a memory; one or more programs stored in the memory and configured to be executed by the one or more processors, the one or more programs configured to perform the method of the first aspect described above.
In a fourth aspect, the present application provides a computer-readable storage medium having program code stored therein, wherein the program code executes the method of the first aspect.
The method distributes a model to be trained to a plurality of client devices and instructs the client devices to train the model to be trained based on their respective image data; then obtains the gradient parameters returned by the client devices after they each train the model to be trained, so as to obtain a plurality of gradient parameters; and then obtains a target model based on the plurality of gradient parameters. In this way, the model to be trained can be trained using the clients' own image data, without first collecting the image data onto the server and only then starting model training. On one hand, this makes it more convenient to expand the data set in a device-cloud linkage scenario, so that a larger data set can be used to obtain a better deep learning training effect, which in turn improves the image quality evaluation effect of the deep neural network; on the other hand, user privacy is effectively protected while user data is used.
Drawings
To illustrate the technical solutions in the embodiments of the present application more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present application, and those skilled in the art can obtain other drawings based on these drawings without creative effort.
Fig. 1 shows an application environment diagram related to a model training method and apparatus provided by the embodiment of the present application.
Fig. 2 shows a flowchart of a model training method according to an embodiment of the present application.
Fig. 3 shows a flowchart of the method of step S130 in fig. 2.
Fig. 4 shows a flowchart of a model training method provided in another embodiment of the present application.
Fig. 5 shows a flowchart of a model training method according to another embodiment of the present application.
Fig. 6 shows a flowchart of a model training method according to yet another embodiment of the present application.
Fig. 7 shows a block diagram of a model training apparatus according to an embodiment of the present application.
Fig. 8 shows a block diagram of a server for executing a model training method according to an embodiment of the present application.
Fig. 9 shows a storage unit, according to an embodiment of the present application, for storing or carrying program code that implements a model training method.
Detailed Description
The technical solutions in the embodiments of the present application will be described clearly and completely below with reference to the drawings in the embodiments of the present application. Obviously, the described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present application without creative effort fall within the protection scope of the present application.
For the sake of easy understanding, terms referred to in the embodiments of the present application will be briefly described below.
(1) Federal Learning (Federated Learning, FL)
Federated learning is also known as federated machine learning, joint learning, or union learning. Federated learning is a machine learning framework for a distributed system based on cloud technology. A federated learning framework includes a server and a plurality of client devices; each client device stores its own training data locally, and the server and every client device are provided with models of the same architecture. Training a machine learning model through the federated learning framework can effectively solve the data-island problem: participants can model jointly without sharing data, breaking down data islands technically and realizing AI collaboration.
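The federated-learning workflow described above (server distributes the model, clients train on local data, only gradients travel back) can be sketched as follows. This is a minimal illustration under assumed names (a linear scorer, `run_rounds`, `per_round`), not the patent's implementation:

```python
import random
import numpy as np

def run_rounds(weights, all_clients, rounds=3, per_round=2, lr=0.1):
    """Minimal federated-learning loop: each round, sample some clients,
    have each compute a local gradient on its own data, and apply the
    count-weighted average of those gradients on the server."""
    for _ in range(rounds):
        chosen = random.sample(all_clients, per_round)
        grads, counts = [], []
        for x, y in chosen:
            pred = x @ weights                       # local forward pass
            grads.append(x.T @ (pred - y) / len(x))  # only the gradient leaves the client
            counts.append(len(x))
        total = float(sum(counts))
        avg = sum(g * c for g, c in zip(grads, counts)) / total
        weights = weights - lr * avg                 # server-side update
    return weights

rng = np.random.default_rng(1)
# Each tuple stands for one client's private (features, labels) data.
clients = [(rng.normal(size=(16, 4)), rng.normal(size=16)) for _ in range(5)]
w = run_rounds(np.zeros(4), clients)
```

The raw data tuples never leave the `for x, y in chosen` loop body, which is the property the data-island discussion above relies on.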
With the development and popularity of deep learning, training a deep neural network has become the mainstream scheme for solving the image quality evaluation problem, and its effect is greatly superior to schemes without deep learning. The main development directions at present are optimizing the deep learning model or optimizing the data set. Schemes that optimize the deep learning model have difficulty effectively extracting the relevant features of an image and have poor interpretability; schemes that optimize the data set have difficulty directly enlarging its scale, because producing a data set consumes considerable manpower, material, and financial resources. Both limitations affect the image quality evaluation effect of the deep neural network.
Therefore, to improve on the above problems, after long study the inventors propose the model training method, apparatus, server, and storage medium provided by the embodiments of the present application. These distribute a model to be trained to a plurality of client devices and instruct the plurality of client devices to train the model to be trained based on their respective image data; then obtain the gradient parameters returned by the client devices after they each train the model to be trained, so as to obtain a plurality of gradient parameters; and then obtain a target model based on the plurality of gradient parameters. In this way, the model to be trained can be trained using the clients' own image data, without first collecting the image data onto the server and only then starting model training. On one hand, this makes it more convenient to expand the data set in a device-cloud linkage scenario, so that a larger data set can be used to obtain a better deep learning training effect, which in turn improves the image quality evaluation effect of the deep neural network; on the other hand, user privacy is effectively protected while user data is used.
The following description is first made on an application environment related to the model training method and apparatus provided in the embodiment of the present application.
Referring to fig. 1, the application environment 10 shown in fig. 1 includes a server 100 and a plurality of client devices 101 (only two of which are shown). In some scenarios, the server 100 may be referred to as a central server (e.g., a data center). Optionally, the server 100 may be an independent physical server, a server cluster or distributed system formed by a plurality of physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, a CDN (Content Delivery Network), and big data and artificial intelligence platforms. The client device 101 may be a mobile phone, a tablet computer, a PC, an intelligent wearable device, or another device with a mobile communication function. The server 100 and the client device 101 may be connected directly or indirectly through a wired or wireless network; the specific network connection manner is not limited.
Embodiments of the present application will be described in detail below with reference to the accompanying drawings.
Referring to fig. 2, a flowchart of a model training method according to an embodiment of the present application is shown. The embodiment provides a model training method applied to a server, where the server is connected to a plurality of client devices, and each client device may be an electronic device capable of running application programs, such as a smartphone, a tablet computer, or a PC. The method includes the following steps:
step S110: distributing a model to be trained to the plurality of client devices, and instructing the plurality of client devices to train the model to be trained based on respective image data.
Because client devices (i.e., mobile terminals) have limited resources in practical federated learning applications, the model to be trained in the embodiment of the present application may be a model with a small size and a small required amount of computation; for example, the model to be trained may be a ShuffleNet model or a SqueezeNet model. The model to be trained can be specified by the server, initialized at the server (i.e., loaded into the training framework), and then distributed to the plurality of client devices.
Within the same round of training, the model to be trained distributed to each of the plurality of client devices is the same model; across different rounds of training, the models distributed to the same client device are different models. Each of the plurality of client devices is instructed to train the model to be trained based on its respective image data (which can be understood as the image data local to each client device; optionally, the local image data differs between client devices). The source of the image data is not limited and may be, for example, pictures taken by the client device, pictures downloaded from a network, screenshots, or pictures from other sources.
Optionally, in actual federated learning training, the set of client devices participating in training may change at any moment due to network, communication, and other factors. In general, devices may join midway, exit midway, fail to stay connected and complete training, or be unable to participate in every round. Therefore, in this embodiment, candidate client devices may be obtained first, and then a specified number of client devices may be selected from the candidates as the plurality of client devices, where the number of candidate client devices is greater than or equal to the specified number. As one implementation, a plurality of client nodes may be simulated in the program, each client node representing a client device; the training data set is divided into several parts, and each client device is assigned one partition of pictures, i.e., only that client node can use those pictures for training. Suppose there are n candidate client devices A_1, A_2, ..., A_n in total; then n_k client devices B_{k1}, B_{k2}, ..., B_{kn_k} may be arbitrarily selected from {A_1, A_2, ..., A_n} as the plurality of client devices.
The training data set is split such that client device A_i is finally assigned the pictures p_{i1}, p_{i2}, ..., p_{im_i}, m_i pictures in total. It can be understood that, in a real scene or data set, the pictures may have non-uniform sizes, non-uniform formats, and other variations. To facilitate training, bicubic interpolation may be used to resize each image to an RGB image of W x H pixels. Further, to prevent overfitting, the resized picture may be randomly cropped so that a region of size W_1 x H_1 is finally retained, where W_1 and H_1 can be set to any size that passes smoothly through the neural network; for example, if the model to be trained uniformly uses ShuffleNet or ResNet-50, then W_1 = H_1 = 224. Denoting the picture after resizing and random cropping as p'_{ij}, client device A_i can then use the pictures p'_{i1}, p'_{i2}, ..., p'_{im_i} to train the model to be trained.
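The resize-then-random-crop preprocessing just described can be sketched as follows. The patent specifies bicubic interpolation; to keep this sketch dependency-free, a nearest-neighbor resize stands in for it (in practice an image library's bicubic mode would be used), and the function name and sizes are illustrative:

```python
import numpy as np

def preprocess(img, w=256, h=256, crop=224, rng=None):
    """Resize an image array to (h, w, 3), then take a random crop of
    crop x crop pixels, mirroring the W x H resize and W_1 x H_1 crop
    in the text. Nearest-neighbor is used here in place of bicubic."""
    rng = rng or np.random.default_rng()
    rows = np.arange(h) * img.shape[0] // h   # nearest source row per output row
    cols = np.arange(w) * img.shape[1] // w   # nearest source column per output column
    resized = img[rows][:, cols]
    top = rng.integers(0, h - crop + 1)       # random crop origin
    left = rng.integers(0, w - crop + 1)
    return resized[top:top + crop, left:left + crop]

img = np.zeros((480, 640, 3), dtype=np.uint8)   # a hypothetical input picture
patch = preprocess(img)
print(patch.shape)  # (224, 224, 3)
```

With W_1 = H_1 = 224, every crop passes through ShuffleNet- or ResNet-50-style input layers unchanged, which is the point of the fixed crop size above.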
While a client device trains the model to be trained on its image data, the client device may be kept plugged into a power supply, both to ensure that the training process is not interrupted by low battery and to avoid the power consumption affecting normal use of the device.
Step S120: and obtaining gradient parameters returned after the client devices respectively train the model to be trained so as to obtain a plurality of gradient parameters.
Each client device may return a gradient parameter after performing one round of training on the model to be trained. Optionally, a client device may obtain its gradient parameter by performing gradient-descent iterations on the model to be trained using its local image data. Denoting a gradient parameter by G_k, the gradient parameters G_{k1}, G_{k2}, ..., G_{kn_k} can be obtained, n_k gradient parameters in total, one returned by each of the n_k client devices. In one embodiment, a gradient parameter may be understood as the gradient by which the model after the current round of training has decreased relative to the model after the previous round of training.
Step S130: a target model is obtained based on the plurality of gradient parameters.
The target model may be understood as the model obtained after training is completed (for example, the target model may be an image quality evaluation model). Optionally, if the number of training rounds is set in advance, the target model is the model obtained after all rounds of training are completed.
Referring to fig. 3, as an alternative, step S130 may include:
step S131: and carrying out weighted average on the gradient parameters and the number of the images corresponding to each client device to obtain an average gradient.
As one way, in a round of training, the plurality of gradient parameters may be weighted by the number of images corresponding to each client device and averaged to obtain an average gradient. For example, assume the n_k client devices respectively hold m_{k1}, m_{k2}, ..., m_{kn_k} pictures; then the average gradient Ḡ_k can be:

Ḡ_k = (m_{k1}·G_{k1} + m_{k2}·G_{k2} + ... + m_{kn_k}·G_{kn_k}) / (m_{k1} + m_{k2} + ... + m_{kn_k})
step S132: and acquiring a target model based on the average gradient and the model to be trained.
As one way, assume the current training round is the k-th round (k is an integer between 0 and the total number of training rounds), and the corresponding model is M_k. The model to be trained is then the model M_{k-1} generated by the server after the previous round of training; if the current round is the first round, the model to be trained is M_0 (i.e., the model immediately after the server completes initialization). In this manner, model M_k can be found as follows:

M_k = M_{k-1} - Ḡ_k

Optionally, after all rounds of training are completed, the model obtained according to the above formula may be used as the target model, where n_k is an integer whose value may lie in the range 0.2n ≤ n_k ≤ 0.3n (n being the total number of candidate client devices).
According to the model training method provided by the present application, a model to be trained is distributed to the plurality of client devices, and the plurality of client devices are instructed to train it based on their respective image data; the gradient parameters returned by the client devices after they each train the model to be trained are then obtained, yielding a plurality of gradient parameters; a target model is then obtained based on the plurality of gradient parameters. In this way, the model to be trained can be trained using the clients' own image data, without first collecting the image data onto the server and only then starting model training. On one hand, this makes it more convenient to expand the data set in a device-cloud linkage scenario, so that a larger data set can be used to obtain a better deep learning training effect, which in turn improves the image quality evaluation effect of the deep neural network; on the other hand, user privacy is effectively protected while user data is used.
Referring to fig. 4, a flowchart of a model training method according to another embodiment of the present application is shown. The embodiment of the application provides a model training method, which is applied to a server, wherein the server is connected with a plurality of client devices, and the method comprises the following steps:
step S211: and if the training is the first round, distributing the initial model serving as a model to be trained to the plurality of client devices, and instructing the plurality of client devices to train the model to be trained based on respective image data.
As one approach, when distributing models to the plurality of client devices, if it is the first round of training, the initial model may be distributed to the plurality of client devices as the model to be trained, and the plurality of client devices may be instructed to train it based on their respective image data. Here, the initial model is the model specified by the server, which may be the model M_0 described in the foregoing embodiments.
In one implementation, the server side can be adapted from the existing industrial-grade framework FedML, while the client side targets Android and is built from scratch; the HTTP protocol is used for the connection between client and server, the MQTT protocol is used for communication, and the Deep Learning for Java (DL4J) framework, which is easy to develop with in Android Studio, is used for training. After the server specifies the learning rate, the number of training rounds (50 rounds in this embodiment; the specific value can be adjusted to actual needs), the total number of client devices, the number of clients required per round, the model type, and other training hyper-parameters, the specified model type (i.e., the model to be trained) can be loaded into the training framework (i.e., the model to be trained is initialized). After the server completes initialization, it can specify an IPv4 address. A client accesses this IPv4 address using the HTTP protocol, so that the server records the user; the client obtains the training hyper-parameters and the model type and performs initialization (initialization here can be understood as registering the client with the server).
After registering with the server, the client can send an initialization request, and the server sends the initial model used for training to the client. Because the client trains on its local training set, the client does not need to request a data set from the server, and the server does not need to actively push data to the client.
Step S212: if the previous round of training is not the first round of training, obtaining the previous round of gradient parameters returned by the plurality of client devices after the previous round of training is finished so as to obtain a plurality of previous round of gradient parameters.
As another way, if the previous round of training is not the first round of training, the server may obtain a previous round of gradient parameters returned by the multiple client devices after the previous round of training is finished, so as to obtain multiple previous rounds of gradient parameters, and the specific obtaining principle and process of the gradient parameters may refer to the description in the foregoing embodiment, which is not described herein again.
Step S213: determining a previous round model based on the plurality of previous round gradient parameters.
The principle of determining the model in the previous round is similar to that of determining the target model described in the previous embodiment, and is not described herein again.
Step S214: and distributing the previous round of model as a model to be trained to the plurality of client devices, and instructing the plurality of client devices to train the model to be trained based on respective image data.
In this way, for every round other than the first, the previous round's model may be distributed to the plurality of client devices as the model to be trained. By continuously performing iterative training on the previous round's model (that is, repeatedly distributing it to the client devices as the model to be trained: for the current round, the model to be trained is the previous round's model, and for the next round, it is the model generated in the current round), the network parameters in the model can be adjusted to optimize its performance, thereby improving the accuracy of image quality evaluation through the model.
As one manner, when instructing the plurality of client devices to train the model to be trained based on their respective image data, each client device may be instructed to: input its image data into the model to be trained to obtain a first quality score, and obtain a first label score determined from that image data; take the absolute value of the difference between the first quality score and the first label score as the loss function and update the parameters of the model to be trained by gradient descent; and obtain the gradient parameters based on the model to be trained before and after the update.
As an embodiment, take client device B_{ki} among the plurality of client devices as an example. Assume the current training round is the k-th round and client device B_{ki} holds the image data p_{i1}, p_{i2}, ..., p_{im_i}. Client device B_{ki} may then be instructed to input p_{i1}, p_{i2}, ..., p_{im_i} in sequence into the model to be trained (i.e., the previous round's model M_{k-1}) and compute the scores of the image data. Let the score the network computes for picture p_{ij} be s_{ij} and its label score be t_{ij}. L1 loss, i.e., the absolute value of the difference between s_{ij} and t_{ij}, can be used as the loss function; model M_{k-1} is updated by gradient descent, and the model obtained after updating is taken as M_{ki} (where i represents the layer number in the model and k represents the training round). The difference between M_{ki} and M_{(k-1)i}, i.e., the gradient G_{ki} by which the model decreased through this training, can then be transmitted back to the server.
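The client-side update just described can be sketched with a linear scorer standing in for the neural network. This is an illustrative sketch under that assumption (the patent uses ShuffleNet-class models; `client_update` and its parameters are hypothetical names), showing the L1 loss, the gradient-descent step, and the returned difference G = M_{k-1} minus M_k:

```python
import numpy as np

def client_update(weights, images, labels, lr=0.01, steps=1):
    """One client's round: score each picture with a linear model,
    use L1 loss |s_ij - t_ij|, and take gradient-descent steps.
    Returns the decrease G = M_{k-1} - M_k sent back to the server."""
    w = weights.copy()
    for _ in range(steps):
        for x, t in zip(images, labels):
            s = float(x @ w)                  # predicted quality score s_ij
            # d|s - t|/dw = sign(s - t) * x   (subgradient of the L1 loss)
            w -= lr * np.sign(s - t) * x
    return weights - w                        # gradient parameter G_{ki}

imgs = [np.array([1.0, 0.5]), np.array([0.2, 1.0])]   # hypothetical feature vectors
labels = [3.0, 1.0]                                    # label scores t_ij
g = client_update(np.zeros(2), imgs, labels)
```

Only `g`, never `imgs` or `labels`, would be transmitted, matching the privacy argument made throughout the description.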
Step S220: and acquiring gradient parameters returned after the plurality of client devices respectively train the model to be trained in turn so as to obtain a plurality of gradient parameters.
In this embodiment, the gradient parameters returned by the same client device in each turn are different, so that, as a manner, the gradient parameters returned by multiple client devices after training the model to be trained respectively in a turn can be obtained to obtain multiple gradient parameters.
In the embodiment of the present application, the client device can serialize the gradient parameters directly into a JSON string according to DL4J's parameter format (a one-dimensional array), tag the message with its type, and transmit it to the server. The server may transmit to the client in the same manner.
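The message packing just described can be sketched as a JSON round trip over a flat parameter array. The field names (`type`, `params`) are assumptions for illustration, not DL4J's or the patent's wire format:

```python
import json

def pack_gradient(grad, msg_type="gradient"):
    """Serialize a gradient, kept as a flat one-dimensional list as the
    text attributes to DL4J, into a type-tagged JSON string."""
    return json.dumps({"type": msg_type, "params": list(grad)})

def unpack_gradient(payload):
    """Recover the message type and the flat parameter list."""
    msg = json.loads(payload)
    return msg["type"], msg["params"]

payload = pack_gradient([0.1, -0.2, 0.3])
kind, params = unpack_gradient(payload)
print(kind, params)  # gradient [0.1, -0.2, 0.3]
```

Tagging each message with its type lets the receiver dispatch gradients, model broadcasts, and control messages over the same MQTT channel.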
Step S230: a target model is obtained based on the plurality of gradient parameters.
After receiving the gradient parameters returned by a preset number (assumed to be m) of client devices, the server may obtain the corresponding models M_1, M_2, ..., M_m and start federated averaging. With the quantity of image data corresponding to each client device being a_1, a_2, ..., a_m, the averaged model is (M_1·a_1 + M_2·a_2 + ... + M_m·a_m) / (a_1 + a_2 + ... + a_m).
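The model-averaging formula above can be written directly as a short function. A minimal sketch (the function name is an assumption), treating each model as a flat parameter vector:

```python
import numpy as np

def federated_average(models, counts):
    """(M_1*a_1 + ... + M_m*a_m) / (a_1 + ... + a_m), with each model a
    flat parameter vector and counts the per-client image quantities."""
    counts = np.asarray(counts, dtype=float)
    stacked = np.stack(models)                       # shape (m, n_params)
    return (stacked * counts[:, None]).sum(axis=0) / counts.sum()

models = [np.array([1.0, 1.0]), np.array([3.0, 5.0])]
avg = federated_average(models, counts=[1, 3])       # equals [2.5, 4.0]
```

The result sits closer to the second model because that client contributed three times as many images, which is the intended effect of the a_i weights.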
In the embodiment of the present application, because the connection between the server and a client device may be interrupted, the server may start federated averaging even when not all gradient parameters have been received. After being interrupted for several rounds, a client device can retrieve the current model parameters, the training round, and other hyper-parameters from the server via the MQTT protocol.
According to the model training method provided by the present application: if it is the first round, the initial model is distributed to the plurality of client devices as the model to be trained; if it is not the first round, the previous-round gradient parameters returned by the plurality of client devices after the previous round of training are obtained, yielding a plurality of previous-round gradient parameters; a previous-round model is determined based on these parameters and distributed to the plurality of client devices as the model to be trained; the gradient parameters returned after the plurality of client devices each train the model to be trained in the current round are acquired, yielding a plurality of gradient parameters; and a target model is then obtained based on the plurality of gradient parameters. In this way, the model to be trained can be trained using the clients' own image data, without first collecting the image data onto the server and only then starting model training. On one hand, this makes it more convenient to expand the data set in a device-cloud linkage scenario, so that a larger data set can be used to obtain a better deep learning training effect, which in turn improves the image quality evaluation effect of the deep neural network; on the other hand, user privacy is effectively protected while user data is used.
Meanwhile, by continuously performing iterative training on the previous-round model, the network parameters in the model can be adjusted to optimize its performance and further improve the accuracy of image quality evaluation.
Referring to fig. 5, a flowchart of a model training method according to another embodiment of the present application is shown. The embodiment of the application provides a model training method, which is applied to a server, wherein the server is connected with a plurality of client devices, and the method comprises the following steps:
step S310: distributing a model to be trained to the plurality of client devices, and instructing the plurality of client devices to train the model to be trained based on respective image data.
Step S320: and obtaining gradient parameters returned after the client devices respectively train the model to be trained so as to obtain a plurality of gradient parameters.
Step S331: and if the training times reach the designated times, taking the model obtained based on the gradient parameters as a target model.
In one mode, the number of training times may be understood as the number of global training rounds. In this mode, if the number of rounds reaches a specified number (the specific value is not limited; it may be, for example, 20, 30, or 50), it may be determined that sufficient rounds have been run, and the model obtained based on the plurality of gradient parameters may be used as the target model.
In another mode, the number of training times may be understood as the number of iterative training passes a client device performs on the model to be trained. In this mode, the specified number may be a value such as 2, 3, 5, or 8 (the specific value is not limited); if the number of iterations reaches the specified number and the training rounds reach the total number of rounds, the model obtained based on the plurality of gradient parameters may be used as the target model.
Step S332: if the training times do not reach the designated times, the step of distributing the model to be trained to the plurality of client devices and instructing the plurality of client devices to train the model to be trained on the basis of respective image data is repeatedly executed, wherein the model to be trained sent each time is generated on the basis of gradient parameters returned by the client devices after the last training is finished.
In one implementation, if the number of training times is understood as the number of training rounds and that number has not yet reached the specified number, it may be determined that more rounds are needed, and the steps of distributing the model to be trained to the plurality of client devices and instructing them to train it based on respective image data are repeated. Note that the model to be trained sent each time is generated based on the gradient parameters returned by the client devices after the previous round of training. A single client device performs one pass of training per global round; that is, all image data corresponding to the client device is put through the model to be trained exactly once.
In another implementation, if the number of training times is understood as the number of iterative passes a client device performs on the model to be trained and that number has not yet been reached, the client device may be instructed to train the model iteratively within one round (that is, the same client device trains the model to be trained for the specified number of iterations and returns the gradient parameters to the server after training completes), until the iteration count reaches the specified number and the training rounds reach the total number of rounds, at which point the model obtained based on the plurality of gradient parameters is taken as the target model.
Step S333: a target model is obtained based on the plurality of gradient parameters.
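The round/iteration control flow of steps S310 to S333 can be sketched with a toy scalar model; the local-update rule, learning rate, and data below are illustrative assumptions used only to show the loop structure.

```python
def local_train(model, data, local_iters, lr=0.5):
    """Client side: a few gradient steps pulling the scalar model
    toward the mean of the client's data (a stand-in for real training)."""
    for _ in range(local_iters):
        grad = sum(model - x for x in data) / len(data)
        model -= lr * grad
    return model

def federated_train(client_data, total_rounds, local_iters):
    model = 0.0
    for _ in range(total_rounds):          # global training rounds
        updates, counts = [], []
        for data in client_data:           # each client trains in turn
            updates.append(local_train(model, data, local_iters))
            counts.append(len(data))
        # federated (weighted) averaging of the returned models
        model = sum(u * c for u, c in zip(updates, counts)) / sum(counts)
    return model

clients = [[1.0, 2.0, 3.0], [10.0, 12.0], [5.0]]
print(round(federated_train(clients, total_rounds=20, local_iters=3), 2))  # → 5.5
```

The loop converges to the data-weighted mean across clients, illustrating how the two "number of training times" interpretations (global rounds versus local iterations) both appear as loop bounds.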
According to the model training method provided by the application, the model to be trained is distributed to the plurality of client devices, which are instructed to train it based on respective image data; the gradient parameters returned after the client devices each train the model are acquired to obtain a plurality of gradient parameters; if the number of training times reaches the specified number, the model obtained based on the plurality of gradient parameters is taken as the target model; if not, the distribution and training steps are repeated, with the model to be trained sent each time generated from the gradient parameters returned after the previous training; and a target model is then obtained based on the plurality of gradient parameters. In this way, the model to be trained can be trained using the image data on the client side, without first collecting the image data onto a server. On one hand, this improves the convenience of expanding the data set in a device-cloud linkage scenario, trading a larger data set for a better deep-learning training effect and thereby improving the image quality evaluation effect of the deep neural network; on the other hand, it effectively protects user privacy while still making use of user data.
Meanwhile, by judging whether the number of training times has reached the specified number, and taking the model obtained based on the plurality of gradient parameters as the target model only once it has, the accuracy of the resulting model can be improved.
Referring to fig. 6, a flowchart of a model training method according to still another embodiment of the present application is shown. The embodiment of the application provides a model training method, which is applied to a server, wherein the server is connected with a plurality of client devices, and the method comprises the following steps:
step S410: distributing a model to be trained to the plurality of client devices, and instructing the plurality of client devices to train the model to be trained based on respective image data.
Step S420: and obtaining gradient parameters returned after the client devices respectively train the model to be trained so as to obtain a plurality of gradient parameters.
Step S430: a target model is obtained based on the plurality of gradient parameters.
Step S440: test image data is acquired.
The content of the test image data may be the same as or different from the content of the image data in the training data set. The test image data may be used to measure the image quality evaluation effect of the target model. As one way, the test image data may be downloaded from a network, or image data taken by the client device may be acquired as the test image data.
Optionally, so that the test image data matches the input expected by the network in the target model, it may be preprocessed (i.e., resized and randomly cropped) as described in the foregoing embodiments; assume the preprocessing yields C test pictures.
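The resize-then-random-crop preprocessing might look like the following nearest-neighbour sketch; the 256/224 sizes are common conventions assumed here, not values taken from the patent.

```python
import numpy as np

def resize_shorter_side(img, target=256):
    """Nearest-neighbour resize so the shorter side equals `target`."""
    h, w = img.shape[:2]
    scale = target / min(h, w)
    nh, nw = int(round(h * scale)), int(round(w * scale))
    rows = (np.arange(nh) / scale).astype(int).clip(0, h - 1)
    cols = (np.arange(nw) / scale).astype(int).clip(0, w - 1)
    return img[rows][:, cols]

def random_crop(img, size=224, rng=None):
    """Cut a random size x size window out of the resized image."""
    rng = rng or np.random.default_rng(0)
    h, w = img.shape[:2]
    top = rng.integers(0, h - size + 1)
    left = rng.integers(0, w - size + 1)
    return img[top:top + size, left:left + size]

img = np.zeros((480, 640, 3), dtype=np.uint8)   # dummy test picture
crop = random_crop(resize_shorter_side(img))
print(crop.shape)  # (224, 224, 3)
```

In practice a library resize (e.g. from an imaging toolkit) would replace the nearest-neighbour indexing; the sketch only shows the two-stage shape transformation.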
Step S450: and inputting the test image data into the target model to obtain a second quality score.
As one way, the C pictures may be input into the target model to obtain the second quality scores, assumed to be t_1, t_2, …, t_C.
Step S460: a second label score determined based on the test image data is obtained.
Optionally, score labeling may be performed in advance on the test image data to obtain the second label scores, assumed to be t_1', t_2', …, t_C'.
Step S470: and obtaining a model evaluation parameter of the target model based on the second quality score and the second label score so as to verify the image quality evaluation effect of the target model.
The model evaluation parameters may include the Pearson linear correlation coefficient (PLCC) and the Spearman rank-order correlation coefficient (SRCC), and may be used to verify the image quality evaluation effect of the target model.
As one approach, the PLCC and SRCC between t_1, t_2, …, t_C and t_1', t_2', …, t_C' may be calculated (the calculation itself is standard and not detailed here) to obtain the final model evaluation result.
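The two correlation coefficients can be computed directly; this stdlib-only sketch assumes no tied scores, in which case SRCC is simply the Pearson correlation of the ranks.

```python
import math

def pearson(xs, ys):
    """Pearson linear correlation coefficient (PLCC)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def spearman(xs, ys):
    """Spearman rank-order correlation coefficient (SRCC), no ties assumed."""
    def ranks(vals):
        order = sorted(range(len(vals)), key=lambda i: vals[i])
        r = [0.0] * len(vals)
        for rank, i in enumerate(order):
            r[i] = float(rank)
        return r
    return pearson(ranks(xs), ranks(ys))

# Predicted quality scores t_i vs label scores t_i' (illustrative values)
pred = [1.0, 2.0, 3.0, 4.0]
label = [1.0, 4.0, 9.0, 16.0]          # monotone but nonlinear relation
print(pearson(pred, label), spearman(pred, label))
```

The example shows why both are reported: a monotone but nonlinear relation gives a perfect SRCC while the PLCC stays below 1.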
In a specific embodiment, to confirm the effectiveness of the target model, 3 mobile devices that are networked, have SIM cards inserted, and are simultaneously running other applications in the background participate in training; their systems are Android 7.0, Android 10.0, and HarmonyOS 2.1 respectively. 666 and 668 pictures from the SPAQ data set, together with the corresponding labels, are placed on the mobile devices, while the server uses 1000 pictures as a verification set. Because the mobile devices have limited compute, a network is built on DL4J following PyTorch's own SqueezeNet v1.1, and the minibatch size (i.e., the number of images per batch) is 6 during training. At the same time, a program that trains a single model directly without federated learning (to compare against the effect of using federated learning) is run on a computer using the DL4J framework, with the data set and other conditions identical. The result without federated learning is a PLCC of 0.8203, whereas the PLCC obtained with this system is 0.8031, and no abnormal exit or out-of-memory condition occurred while the system ran. This verifies the effectiveness of the target model.
To ensure that no cross-platform problems arise (e.g., moving from PyTorch to DL4J), the server still uses the DL4J framework for evaluation. Java code for the evaluation model is therefore written in advance and compiled into a jar file, which is called directly from Python to complete the evaluation.
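Calling a precompiled evaluation jar from Python can be done with the stdlib subprocess module; the jar name and its command-line arguments below are illustrative assumptions, since the patent does not specify the jar's interface.

```python
import subprocess

def build_eval_command(jar_path, model_path, valset_dir):
    """Assemble the java invocation for the (assumed) DL4J evaluation jar."""
    return ["java", "-jar", jar_path, "--model", model_path, "--data", valset_dir]

def evaluate_with_jar(jar_path, model_path, valset_dir):
    cmd = build_eval_command(jar_path, model_path, valset_dir)
    # Runs the DL4J evaluation on the JVM; stdout would carry the
    # PLCC/SRCC result printed by the Java side.
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout

print(build_eval_command("evaluate.jar", "model.zip", "valset/"))
```

Keeping evaluation inside DL4J this way means the model never crosses a framework boundary, which is the cross-platform concern the passage raises.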
According to the model training method provided by the application, the model to be trained is distributed to the plurality of client devices, which are instructed to train it based on respective image data; the gradient parameters returned after the client devices each train the model are acquired to obtain a plurality of gradient parameters; a target model is obtained based on the plurality of gradient parameters; test image data is then acquired and input into the target model to obtain second quality scores; second label scores determined based on the test image data are obtained; and model evaluation parameters of the target model are computed from the second quality scores and second label scores to verify its image quality evaluation effect. In this way, the model to be trained can be trained using the image data on the client side, without first collecting the image data onto a server. On one hand, this improves the convenience of expanding the data set in a device-cloud linkage scenario, trading a larger data set for a better deep-learning training effect and thereby improving the image quality evaluation effect of the deep neural network; on the other hand, it effectively protects user privacy while still making use of user data.
Referring to fig. 7, an embodiment of the present application provides a model training apparatus 500, operating on a server, where the server is connected to a plurality of client devices, and the apparatus 500 includes:
a model distribution and training module 510, configured to distribute a model to be trained to the multiple client devices, and instruct the multiple client devices to train the model to be trained based on respective image data.
As an implementation manner, the model distribution and training module 510 may be configured to, if it is the first round of training, distribute the initial model as the model to be trained to the plurality of client devices. If the previous round of training is not the first round of training, acquiring the previous round of gradient parameters returned by the plurality of client devices after the previous round of training is finished so as to obtain a plurality of previous round of gradient parameters; determining a previous round model based on the plurality of previous round gradient parameters; and distributing the previous round of model as a model to be trained to the plurality of client devices.
Optionally, the model distribution and training module 510 may be configured to instruct the plurality of client devices to input respective image data into the model to be trained, so as to obtain a first quality score, which is used for the plurality of client devices to obtain a first label score determined based on the respective image data; taking the absolute value of the difference between the first quality score and the first label score as a loss function, and updating the parameters of the model to be trained by adopting a gradient descent method; and obtaining gradient parameters based on the model to be trained and the updated model to be trained.
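The local update the module describes (absolute-error loss, parameters updated by gradient descent) can be sketched on a one-parameter model; the learning rate, data, and scalar form are illustrative assumptions standing in for the real neural network.

```python
def local_update(w, features, labels, lr=0.1):
    """One gradient-descent pass with loss |score - label|, where the
    'model' is score = w * x for a scalar feature x."""
    for x, t in zip(features, labels):
        score = w * x
        # d|score - t|/dw = sign(score - t) * x  (subgradient 0 at score == t)
        sign = 1.0 if score > t else (-1.0 if score < t else 0.0)
        w -= lr * sign * x
    return w

w0 = 0.0
xs, ts = [1.0, 1.0, 1.0], [2.0, 2.0, 2.0]
w1 = local_update(w0, xs, ts)
gradient_param = w1 - w0   # what the client would return to the server
print(w1, gradient_param)
```

Note the "gradient parameter" here is taken as the difference between the updated and the original model, matching the module's description of deriving it from the model to be trained and the updated model.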
Optionally, the apparatus 500 may further include a client device selection module, configured to obtain a client device to be selected before distributing the model to be trained to the plurality of client devices and instructing the plurality of client devices to train the model to be trained based on respective image data; and acquiring a specified number of client devices from the client devices to be selected as a plurality of client devices.
A gradient parameter obtaining module 520, configured to obtain gradient parameters returned by the multiple client devices after training the model to be trained, so as to obtain multiple gradient parameters.
As an implementation manner, the gradient parameter obtaining module 520 may be configured to obtain gradient parameters returned after the plurality of client devices respectively train the model to be trained in turn, so as to obtain a plurality of gradient parameters.
A model obtaining module 530 for obtaining a target model based on the plurality of gradient parameters.
As one manner, the model obtaining module 530 may be configured to perform weighted average on the multiple gradient parameters and the number of images corresponding to each client device, so as to obtain an average gradient; and acquiring a target model based on the average gradient and the model to be trained.
Optionally, the apparatus 500 may further include a training frequency determining module, configured to, before obtaining the target model based on the multiple gradient parameters, if the training frequency reaches a specified frequency, take the model obtained based on the multiple gradient parameters as the target model; if the training times do not reach the specified times, the step of distributing the model to be trained to the plurality of client devices and instructing the plurality of client devices to train the model to be trained based on respective image data is repeatedly executed, wherein the model to be trained sent each time is generated based on the gradient parameters returned by the client devices after the last training is finished.
Optionally, the apparatus 500 may further comprise a model testing module for obtaining test image data after obtaining the target model based on the plurality of gradient parameters; inputting the test image data into the target model to obtain a second quality score; obtaining a second label score determined based on the test image data; and obtaining a model evaluation parameter of the target model based on the second quality score and the second label score so as to verify the image quality evaluation effect of the target model.
It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described apparatuses and modules may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In several embodiments provided in the present application, the coupling of the modules to each other may be electrical, mechanical or other forms of coupling.
In addition, functional modules in the embodiments of the present application may be integrated into one processing module, or each module may exist alone physically, or two or more modules are integrated into one module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode.
A server provided by the present application will be described below with reference to fig. 8.
Referring to fig. 8, based on the above model training method and apparatus, the embodiment of the present application further provides a server 100 capable of executing the model training method. The server 100 includes one or more processors 104 (only one shown) and a memory 102 coupled to each other. The memory 102 stores therein a program that can execute the contents of the foregoing embodiments, and the processor 104 executes the program stored in the memory 102, where the memory 102 includes the apparatus 500 described in the foregoing embodiments.
The processor 104 may include one or more processing cores. Using various interfaces and lines to connect the parts of the server 100, the processor 104 performs the various functions of the server 100 and processes data by running or executing instructions, programs, code sets, or instruction sets stored in the memory 102 and calling data stored in the memory 102. Optionally, the processor 104 may be implemented in at least one hardware form among Digital Signal Processing (DSP), Field-Programmable Gate Array (FPGA), and Programmable Logic Array (PLA). The processor 104 may integrate one or more of a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), a modem, and the like. The CPU mainly handles the operating system, user interface, application programs, and so on; the GPU is responsible for rendering and drawing display content; and the modem handles wireless communications. It is understood that the modem may also not be integrated into the processor 104 and instead be implemented by a separate communication chip.
The memory 102 may include Random Access Memory (RAM) or Read-Only Memory (ROM), and may be used to store instructions, programs, code sets, or instruction sets. The memory 102 may include a program storage area and a data storage area, where the program storage area may store instructions for implementing an operating system, instructions for implementing at least one function (such as a touch function, a sound playing function, or a video/image playing function), instructions for implementing the various method embodiments described above, and the like. The data storage area may store data created by the server 100 in use (such as phone books, audio and video data, and chat logs).
Referring to fig. 9, a block diagram of a computer-readable storage medium according to an embodiment of the present application is shown. The computer-readable medium 600 has stored therein a program code that can be called by a processor to execute the method described in the above-described method embodiments.
The computer-readable storage medium 600 may be an electronic memory such as a flash memory, an EEPROM (Electrically Erasable Programmable Read-Only Memory), an EPROM, a hard disk, or a ROM. Optionally, the computer-readable storage medium 600 includes a non-volatile computer-readable storage medium. The computer-readable storage medium 600 has storage space for program code 610 that performs any of the method steps described above. The program code can be read from or written into one or more computer program products, and the program code 610 may, for example, be compressed in a suitable form.
According to the model training method, the model training device, the server and the storage medium, the model to be trained is distributed to the plurality of client devices, and the plurality of client devices are instructed to train the model to be trained based on respective image data; then obtaining gradient parameters returned after the client devices respectively train the model to be trained so as to obtain a plurality of gradient parameters; a target model is then obtained based on the plurality of gradient parameters. Therefore, the model to be trained can be trained by using the image data of the client side through the mode, the image data does not need to be collected to a server and then model training is started, on one hand, convenience of expanding a data set under a terminal cloud linkage scene is improved, a large data set is used for replacing a better deep learning model training effect, and further the image quality evaluation effect of a deep neural network is improved; on the other hand, the privacy of the user is effectively protected while the user data is used.
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solutions of the present application, and not to limit the same; although the present application has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not necessarily depart from the spirit and scope of the corresponding technical solutions in the embodiments of the present application.

Claims (10)

1. A model training method is applied to a server, wherein the server is connected with a plurality of client devices, and the method comprises the following steps:
distributing a model to be trained to the plurality of client devices, and instructing the plurality of client devices to train the model to be trained based on respective image data;
obtaining gradient parameters returned after the client devices respectively train the model to be trained so as to obtain a plurality of gradient parameters;
a target model is obtained based on the plurality of gradient parameters.
2. The method of claim 1, wherein said obtaining a target model based on said plurality of gradient parameters comprises:
carrying out weighted average on the gradient parameters and the number of images corresponding to each client device to obtain an average gradient;
and acquiring a target model based on the average gradient and the model to be trained.
3. The method of claim 1, wherein distributing the model to be trained to the plurality of client devices comprises:
if the training is the first round, the initial model is used as a model to be trained and distributed to the plurality of client devices;
if the previous round of training is not the first round of training, acquiring the previous round of gradient parameters returned by the plurality of client devices after the previous round of training is finished so as to obtain a plurality of previous round of gradient parameters;
determining a previous round model based on the plurality of previous round gradient parameters;
distributing the previous round of models serving as models to be trained to the plurality of client devices;
the obtaining of the gradient parameters returned after the client devices respectively train the model to be trained to obtain a plurality of gradient parameters includes:
and acquiring gradient parameters returned after the plurality of client devices respectively train the model to be trained in turn so as to obtain a plurality of gradient parameters.
4. The method of claim 3, wherein the instructing the plurality of client devices to train the model to be trained based on respective image data comprises:
instructing the plurality of client devices to input respective image data into the model to be trained to obtain a first quality score; obtain a first label score determined based on the respective image data; take the absolute value of the difference between the first quality score and the first label score as a loss function and update the parameters of the model to be trained by gradient descent; and obtain gradient parameters based on the model to be trained and the updated model to be trained.
5. The method of claim 1, wherein prior to obtaining the target model based on the plurality of gradient parameters, further comprising:
if the training times reach the designated times, taking a model obtained based on the gradient parameters as a target model;
if the training times do not reach the designated times, the step of distributing the model to be trained to the plurality of client devices and instructing the plurality of client devices to train the model to be trained on the basis of respective image data is repeatedly executed, wherein the model to be trained sent each time is generated on the basis of gradient parameters returned by the client devices after the last training is finished.
6. The method of claim 1, wherein prior to distributing the model to be trained to the plurality of client devices and instructing the plurality of client devices to train the model to be trained based on respective image data, further comprising:
acquiring client equipment to be selected;
and acquiring a specified number of client devices from the client devices to be selected as a plurality of client devices.
7. The method of any of claims 1-6, wherein after obtaining the target model based on the plurality of gradient parameters, further comprising:
acquiring test image data;
inputting the test image data into the target model to obtain a second quality score;
obtaining a second label score determined based on the test image data;
and obtaining a model evaluation parameter of the target model based on the second quality score and the second label score so as to verify the image quality evaluation effect of the target model.
8. An apparatus for model training, operating on a server connected to a plurality of client devices, the apparatus comprising:
the model distribution and training module is used for distributing a model to be trained to the plurality of client devices and instructing the plurality of client devices to train the model to be trained based on respective image data;
a gradient parameter obtaining module, configured to obtain gradient parameters returned by the multiple client devices after training the model to be trained, so as to obtain multiple gradient parameters;
a model acquisition module for acquiring a target model based on the plurality of gradient parameters.
9. A server, comprising one or more processors and memory;
one or more programs stored in the memory and configured to be executed by the one or more processors, the one or more programs configured to perform the method of any of claims 1-7.
10. A computer-readable storage medium, having program code stored therein, wherein the program code when executed by a processor performs the method of any of claims 1-7.
CN202211029747.0A 2022-08-25 2022-08-25 Model training method, device, server and storage medium Pending CN115527090A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211029747.0A CN115527090A (en) 2022-08-25 2022-08-25 Model training method, device, server and storage medium


Publications (1)

Publication Number Publication Date
CN115527090A true CN115527090A (en) 2022-12-27

Family

ID=84698392

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211029747.0A Pending CN115527090A (en) 2022-08-25 2022-08-25 Model training method, device, server and storage medium

Country Status (1)

Country Link
CN (1) CN115527090A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117371038A (en) * 2023-10-23 2024-01-09 北京智源人工智能研究院 Distributed medical image artificial intelligence model evaluation method and device
CN117857647A (en) * 2023-12-18 2024-04-09 慧之安信息技术股份有限公司 Federal learning communication method and system based on MQTT oriented to industrial Internet of things



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination