CN113159329B - Model training method, device, equipment and storage medium - Google Patents

Model training method, device, equipment and storage medium

Info

Publication number
CN113159329B
CN113159329B
Authority
CN
China
Prior art keywords
model
data
training
model parameters
augmentation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110461573.4A
Other languages
Chinese (zh)
Other versions
CN113159329A (en)
Inventor
侯宪龙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Oppo Mobile Telecommunications Corp Ltd
Original Assignee
Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority to CN202110461573.4A
Publication of CN113159329A
Application granted
Publication of CN113159329B
Legal status: Active

Links

Classifications

    • G06N 20/00: Machine learning
    • G06N 20/10: Machine learning using kernel methods, e.g. support vector machines [SVM]
    • G06F 18/24: Pattern recognition; analysing; classification techniques
    • G06F 18/241: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F 18/2411: Classification techniques based on the proximity to a decision surface, e.g. support vector machines
    • G06N 3/02: Neural networks
    • G06N 3/08: Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Medical Informatics (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Image Analysis (AREA)

Abstract

The embodiments of the application disclose a model training method, apparatus, device, and storage medium, belonging to the technical field of machine learning. The method comprises the following steps: acquiring original data in the participating node and a pre-training model; performing augmentation processing on the original data to obtain first augmentation data and second augmentation data, wherein a feature difference exists between the first augmentation data and the second augmentation data; adjusting model parameters in the pre-training model based on the first augmentation data and the second augmentation data, and reporting the adjusted model parameters to the master node; and receiving global model parameters sent by the master node, and updating the global model parameters into the pre-training model to obtain a classification model. In the embodiments of the application, the first augmentation data and the second augmentation data, between which a feature difference exists, are used to adjust the model parameters of the pre-training model, and one piece of augmentation data is used to generate a supervision signal that supervises the classification learning of the other piece of augmentation data, so that rapid learning of the classification task is achieved, images do not need to be labeled manually, and human resources are saved.

Description

Model training method, device, equipment and storage medium
Technical Field
The embodiment of the application relates to the technical field of machine learning, in particular to a model training method, device, equipment and storage medium.
Background
Nowadays, more and more images are stored in the memory of a terminal, and image classification models are widely applied on terminals to facilitate image search by the user.
In the related art, the unlabeled data in each terminal is first labeled manually, and a server then trains an image classification model in a federal learning manner.
Disclosure of Invention
The embodiment of the application provides a model training method, device, equipment and storage medium. The technical scheme is as follows:
according to an aspect of the present application, there is provided a model training method applied to a node participating in federal learning, the method comprising:
acquiring original data in the participating node and a pre-training model, wherein the pre-training model is obtained by the master node through pre-training with labeled data;
performing augmentation processing on the original data to obtain first augmentation data and second augmentation data, wherein characteristic differences exist between the first augmentation data and the second augmentation data;
based on the first augmentation data and the second augmentation data, adjusting model parameters in the pre-training model, and reporting the adjusted model parameters to the master node;
And receiving global model parameters sent by the master node, updating the global model parameters to the pre-training model to obtain a classification model, wherein the global model parameters are obtained by performing federal learning calculation by the master node based on the adjusted model parameters reported by at least two participating nodes.
According to another aspect of the present application, there is provided a model training method applied to a master node of federal learning, the method comprising:
receiving at least two groups of adjusted model parameters sent by at least two participating nodes, wherein the adjusted model parameters are obtained by the participating nodes adjusting model parameters of a pre-training model based on unlabeled data, the unlabeled data comprises first augmentation data and second augmentation data obtained by performing augmentation processing on the same original data, a feature difference exists between the first augmentation data and the second augmentation data, and the pre-training model is obtained by the master node through pre-training with labeled data;
performing federal learning on the at least two groups of adjusted model parameters to obtain global model parameters;
and issuing the global model parameters to the at least two participating nodes so as to update the global model parameters into a pre-training model in the at least two participating nodes to obtain a classification model.
According to another aspect of the present application, there is provided a model training apparatus, the apparatus being located at a participating node of federal learning, the apparatus comprising:
the acquisition module is used for acquiring the original data in the participating node and a pre-training model, wherein the pre-training model is obtained by the master node through pre-training with labeled data;
the augmentation module is used for performing augmentation processing on the original data to obtain first augmentation data and second augmentation data, wherein a feature difference exists between the first augmentation data and the second augmentation data;
the parameter adjusting module is used for adjusting model parameters in the pre-training model based on the first augmentation data and the second augmentation data, and reporting the adjusted model parameters to the master node;
and the updating module is used for receiving global model parameters sent by the master node, updating the global model parameters to the pre-training model to obtain a classification model, wherein the global model parameters are obtained by performing federal learning calculation by the master node based on the adjusted model parameters reported by at least two participating nodes.
According to another aspect of the present application, there is provided a model training apparatus, the apparatus being located at a master node of federal learning, the apparatus comprising:
the receiving module is used for receiving at least two groups of adjusted model parameters sent by at least two participating nodes, wherein the adjusted model parameters are obtained by the participating nodes adjusting model parameters of a pre-training model based on unlabeled data, the unlabeled data comprises first augmentation data and second augmentation data obtained by performing augmentation processing on the same original data, a feature difference exists between the first augmentation data and the second augmentation data, and the pre-training model is obtained by the master node through pre-training with labeled data;
the learning module is used for performing federal learning on the at least two groups of adjusted model parameters to obtain global model parameters;
and the sending module is used for sending the global model parameters to the at least two participating nodes so as to update the global model parameters to the pre-training model in the at least two participating nodes to obtain a classification model.
According to another aspect of the present application, there is provided an electronic device comprising a processor, and a memory coupled to the processor, and program instructions stored on the memory, which when executed by the processor implement a model training method as provided by the various aspects of the present application.
According to another aspect of the present application, there is provided a computer readable storage medium having stored therein program instructions which when executed by a processor implement a model training method as provided by the various aspects of the present application.
According to another aspect of the present application, there is provided a computer program product comprising computer instructions stored in a computer readable storage medium. A processor of a computer device reads the computer instructions from the computer readable storage medium, the processor executing the computer instructions, causing the computer device to perform the methods provided in various alternative implementations of the model training method described above.
The technical scheme provided by the embodiment of the application has the beneficial effects that:
In the model training method provided by the embodiments, the participating node performs data augmentation on unlabeled original data, where each pair of training data comprises first augmentation data with a smaller feature difference from the original data and second augmentation data with a larger feature difference from the original data. The two pieces of augmentation data expanded from the original data are used to adjust the model parameters of the pre-training model, and the augmentation data with the smaller feature difference from the original data is used to generate a supervision signal that supervises the classification learning of the other piece of augmentation data, so that more classification features can be learned in each round of classification learning of the pre-training model, faster learning of the classification task is realized, the convergence speed is effectively accelerated, and the number of training iterations is reduced. Secondly, a user-side terminal can serve as a participating node and perform federal learning with unlabeled original data, which frees up manpower while meeting the personalized requirements of the user and protecting personal data security, that is, the original data in the user-side terminal does not need to be labeled manually.
Drawings
In order to more clearly describe the technical solutions of the embodiments of the present application, the drawings required for the description of the embodiments of the present application will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments of the present application, and other drawings may be obtained according to these drawings without inventive effort for a person of ordinary skill in the art.
FIG. 1 is a block diagram illustrating a horizontal federal learning system 100 provided in accordance with an exemplary embodiment of the present application;
FIG. 2 illustrates a flow chart of a model training method provided by an exemplary embodiment of the present application;
FIG. 3 illustrates a flow chart of a model training method provided by another exemplary embodiment of the present application;
FIG. 4 illustrates a flow chart of a model training method provided by another exemplary embodiment of the present application;
FIG. 5 illustrates a flowchart of a training method for a pre-training model provided by an exemplary embodiment of the present application;
FIG. 6 illustrates a flow chart of a model training method provided by another exemplary embodiment of the present application;
FIG. 7 illustrates a flow chart of a model training method provided by another exemplary embodiment of the present application;
FIG. 8 illustrates a flow chart of a model training method provided by another exemplary embodiment of the present application;
FIG. 9 illustrates a block diagram of a model training apparatus provided by an exemplary embodiment of the present application;
FIG. 10 shows a block diagram of a model training apparatus provided by another exemplary embodiment of the present application.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present application more apparent, the embodiments of the present application will be described in further detail with reference to the accompanying drawings.
When the following description refers to the accompanying drawings, the same numbers in different drawings refer to the same or similar elements, unless otherwise indicated. The implementations described in the following exemplary examples do not represent all implementations consistent with the application. Rather, they are merely examples of apparatus and methods consistent with aspects of the application as detailed in the accompanying claims.
In the description of the present application, it should be understood that the terms "first," "second," and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance. It should also be noted that, unless explicitly specified and limited otherwise, terms describing a connection relationship, such as "connected" and "coupled," are to be construed broadly: the connection may be a fixed connection, a detachable connection, or an integral connection; it may be mechanical or electrical; and it may be direct, or indirect through an intermediate medium. The specific meaning of the above terms in the present application can be understood by those of ordinary skill in the art on a case-by-case basis. Furthermore, in the description of the present application, unless otherwise indicated, "a plurality" means two or more. "And/or" describes an association relationship between associated objects and indicates that three relationships may exist; for example, "A and/or B" may indicate that A exists alone, A and B exist together, or B exists alone. The character "/" generally indicates that the associated objects are in an "or" relationship.
In order that the scheme shown in the embodiment of the present application can be easily understood, several terms appearing in the embodiment of the present application are described below.
Artificial Intelligence (AI), a branch of computer science, is the theory, methods, techniques, and application systems for simulating, extending, and expanding human intelligence. It attempts to understand the nature of intelligence and to produce new intelligent machines that react in a way similar to human intelligence. The AI field includes machine learning, natural language processing, image recognition, speech recognition, vision techniques, robotics, and the like.
Among them, Machine Learning (ML) is a multi-disciplinary field involving probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory, and other subjects. It specializes in how computers simulate or implement human learning behavior in order to acquire new knowledge or skills and reorganize existing knowledge structures so as to continuously improve their own performance.
Federal learning (Federated Learning), a machine learning framework, can effectively help multiple institutions perform data usage and machine learning modeling while meeting the requirements of user privacy protection, data security, and government regulations. The federal learning is used as a distributed machine learning paradigm, so that the problem of data island can be effectively solved, the participating nodes can be modeled in a combined way on the basis of not sharing data, the data island can be broken technically, and AI cooperation is realized.
Depending on the distribution of the participating data sources, federal learning can be divided into three categories: horizontal federal learning (Horizontal Federated Learning), vertical federal learning (Vertical Federated Learning), and federal transfer learning (Federated Transfer Learning).
Horizontal federal learning applies when the user features of two data sets overlap considerably but the users overlap little: the data sets are split along the horizontal direction (i.e., the user dimension), and the portion of data in which the user features are the same but the users are not identical is taken out for training.
Homomorphic encryption is a cryptographic technique based on the computational complexity theory of mathematical problems. When homomorphically encrypted data is processed to obtain an output and that output is decrypted, the result is the same as the output obtained by processing the unencrypted original data with the same method.
Secure multi-party computation is a sub-field of cryptography that allows multiple data owners to perform collaborative computation without mutual trust and to output the computed result, while ensuring that no party can obtain any information other than the computed result.
Machine learning may be applied to classification tasks such as image classification, mail classification, tag classification, and the like. Aiming at an image classification scene, along with gradual improvement of a camera shooting function on a terminal, more and more images are stored in a memory of the terminal, and a model can be trained to execute an image classification task in a machine learning mode for facilitating user searching.
Considering the security of personal privacy data and the personalized requirements of users, a federal learning approach can be adopted. An exemplary, relatively mature technical scheme is as follows: a global model is trained with a public data set and issued to the terminals; when an application program runs on a terminal, the global model is called to classify the images in the memory, the model parameters are adjusted according to the classification results, and the adjusted model parameters are reported to a server; the server performs federal learning based on the adjusted model parameters reported by at least two terminals, determines the latest model parameters of the global model, and issues the latest model parameters to the terminals. However, the original data in a user's terminal is all unlabeled data. In order to solve this problem, the present application provides a model training method; for details of the implementation of this method, please refer to the following embodiments.
FIG. 1 illustrates a block diagram of a horizontal federal learning system 100 provided in accordance with an exemplary embodiment of the present application. The horizontal federal learning system supports horizontal federal learning in which N participating nodes (also referred to as participants) cooperate, where N is a positive integer greater than 1. The horizontal federal learning system 100 includes a master node P_0 and participating nodes P_1 to P_(N-1).
The master node P_0 and the participating nodes (P_1 to P_(N-1)) are deployed in a multi-drop tree topology. Any one of the participating nodes may be a terminal, a server, multiple servers, or a logical computing module in a cloud computing service. Any two participating nodes belong to different data sources, such as data sources of different companies, data sources of different subsidiaries of the same company, data sources of different users, or different data sources of the same user.
For example, when the participating node is a terminal, the terminal may include a smart phone, a tablet computer, a laptop computer, a desktop computer, a computer all-in-one machine, a smart watch, a digital camera, an MP4 playing terminal, an MP5 playing terminal, a learning machine, a point-to-read machine, an electronic book, an electronic dictionary, a Virtual Reality (VR) playing terminal, an augmented Reality (Augmented Reality, AR) playing terminal, or the like.
The master node P_0 is a participating node that possesses labeled data {x_i, y_i}; alternatively, the master node P_0 is a participating node that possesses unlabeled data {x_i}, or the master node is a participating node that possesses no training data. For example, the master node P_0 is a provider of image processing software that owns a public data set of images, in which an image x_i serves as a sample and its image class y_i serves as a label. Each participating node P_n possesses unlabeled data u_j, where n = 1, 2, ..., N-1 and j is a positive integer.
The above-described horizontal federal learning system 100 supports the N participating nodes in cooperating to securely train a machine learning model. The machine learning model includes, but is not limited to: a linear regression (LR) model, a logistic regression (LogR) model, or a support vector machine (SVM) model.
This machine learning model is also known as the federal model. Each participating node has a federal model deployed locally, and the network parameters in the various federal models may differ. Taking a multiple linear regression model y = W_0 * X_0 + W_1 * X_1 as an example, the network parameter W_0 = W_01 + W_02 + W_03 + ... + W_0(N-1) and the network parameter W_1 = W_11 + W_12 + W_13 + ... + W_1(N-1). The federal model deployed by the first participating node has the network parameters W_01 and W_11 of the multiple linear regression model, the federal model deployed by the second participating node has the network parameters W_02 and W_12, the federal model deployed by the third participating node has the network parameters W_03 and W_13, and so on; all the network parameters in the sub-models deployed by all the participating nodes together constitute all the network parameters of the multiple linear regression model.
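Purely as an illustrative sketch (the variable names and the two-feature setup are assumptions for illustration, not part of the claimed method), the additive split of the network parameters across the N-1 participating nodes can be expressed as follows:

```python
# Illustrative sketch: each participating node holds an additive share of the
# parameters of the multiple linear regression model y = W_0 * X_0 + W_1 * X_1.
import random

N = 4  # one master node and N-1 = 3 participating nodes (assumed for the example)
W0_shares = [random.random() for _ in range(N - 1)]  # W_0n held by participating node n
W1_shares = [random.random() for _ in range(N - 1)]  # W_1n held by participating node n

# The full network parameters are the sums of the per-node shares.
W0 = sum(W0_shares)  # W_0 = W_01 + W_02 + ... + W_0(N-1)
W1 = sum(W1_shares)  # W_1 = W_11 + W_12 + ... + W_1(N-1)

def predict(x0, x1):
    """Model output computed with the recombined global parameters."""
    return W0 * x0 + W1 * x1

print(predict(1.0, 2.0))
```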
The master node P_0 owns a first public key PK_n of each participating node P_n, and the first public keys PK_n of different participating nodes are different. Each participating node P_n owns a first public key PK_0 of the master node P_0, and the first public key PK_0 owned by each participating node is identical. No participating node reveals its own private key to the other participating nodes. The public keys are used to encrypt intermediate calculation results during model training. The encryption algorithm adopted by the application is a homomorphic encryption algorithm or a secure multi-party computing algorithm; the additive homomorphic encryption algorithm may be, for example, the Paillier homomorphic encryption algorithm.
Illustratively, the master node P_0 generates a first random mask R_n for each participating node P_n, and the first random masks corresponding to different participating nodes are different. Each participating node P_n generates a second random mask R_0n for the master node P_0, and the second random masks corresponding to different participating nodes are different. No participating node reveals the plaintext of any random mask to the other participating nodes. The random masks are used to protect intermediate calculation results during model training and to prevent the network parameters of the machine learning model from being solved inversely from the model output values of multiple groups of training samples.
During model training, the two-party joint secure computation between the master node P_0 and a participating node P_n can be executed in parallel across the N-1 participating parties without revealing any training sample, thereby improving model training efficiency.
FIG. 2 illustrates a flow chart of a model training method provided by an exemplary embodiment of the present application. The model training method can be applied to the participating nodes of the federal learning system. In fig. 2, the model training method includes:
step 210: raw data in the participating nodes are acquired, and a pre-trained model is obtained.
The pre-training model is obtained by the master node through pre-training with labeled data.
Illustratively, the participating node receives a pre-trained model issued by the master node; and the participating nodes acquire the original data from their own memories.
For example, the original data in this embodiment may be any data with a classification requirement, for example, the original data may be an image, a tag, an article, or news.
Step 220: and carrying out amplification processing on the original data to obtain first amplification data and second amplification data, wherein characteristic differences exist between the first amplification data and the second amplification data.
Illustratively, the participating node performs augmentation processing on the T-th original data, and uses the obtained first augmentation data and second augmentation data as T-th group samples, where T is a positive integer.
Illustratively, the participating node performs augmentation processing on the original data by adopting a first type of processing mode to obtain first augmentation data; and carrying out augmentation processing on the original data by adopting a second type of processing mode to obtain second augmentation data.
Optionally, the first feature difference of the first augmented data is smaller than the second feature difference of the second augmented data, the first feature difference is the feature difference between the first augmented data and the original data, and the second feature difference is the feature difference between the second augmented data and the original data.
Illustratively, augmentation processing modes for the original data are configured in the participating node according to the required feature difference between the augmented data and the original data, and the augmentation processing modes include a first type of processing mode and a second type of processing mode. The original data is augmented with the first type of processing mode to obtain the first augmentation data, and augmented with the second type of processing mode to obtain the second augmentation data. The feature difference between the first augmentation data obtained with the first type of processing mode and the original data is the first feature difference, the feature difference between the second augmentation data obtained with the second type of processing mode and the original data is the second feature difference, and the first feature difference is smaller than the second feature difference.
For example, the first augmented data and the second augmented data may be distinguished by a difference threshold, e.g., a first characteristic difference of the first augmented data is less than or equal to a first difference threshold and a second characteristic difference of the second augmented data is greater than a second difference threshold, wherein the first difference threshold is less than or equal to the second difference threshold.
Illustratively, to ensure that the first feature difference is smaller than or equal to the first difference threshold, after the first augmentation data is generated, the first feature difference between the first augmentation data and the original data is calculated, and the first augmentation data whose first feature difference is smaller than or equal to the first difference threshold is determined as training data of the model; after the second augmentation data is generated, the second feature difference between the second augmentation data and the original data is calculated, and the second augmentation data whose second feature difference is greater than the second difference threshold is determined as training data of the model. The first difference threshold and the second difference threshold are preset in the participating node, or are configured for the participating node by the master node.
For example, the feature difference may be represented by a feature similarity between the augmented data and the original data, where the feature difference and the feature similarity have a negative correlation, and the feature difference is higher as the feature similarity is lower, and the feature difference is lower as the feature similarity is higher.
For example, a similarity threshold is used as the difference threshold, the first difference threshold is set equal to the second difference threshold, and the value is set to 0.85. Taking the original data being an original image as an example, the participating node rotates the original image to obtain a first augmented image and crops the original image to obtain a second augmented image, where the first feature similarity between the first augmented image and the original image is greater than 0.85 and the second feature similarity between the second augmented image and the original image is less than 0.85. Taking the original data being an original article as an example, the original article is classified based on its abstract: the participating node adjusts the sentence order of the article abstract to obtain a first augmented abstract and extracts sentences from the article abstract to obtain a second augmented abstract, where the feature similarity between the first augmented abstract and the article abstract is greater than 0.85 and the feature similarity between the second augmented abstract and the article abstract is less than 0.85.
Optionally, for the augmentation data, the augmentation data may be divided into at least one category based on the range of feature differences; for example, the first augmentation data may be divided into one type of augmentation data or at least two types of augmentation data according to the characteristic difference range, for example, the first difference threshold is 0.2, the first augmentation data having the characteristic difference of 0.2 or less and 0.1 or more is determined as one type, and the first augmentation data having the characteristic difference of 0.1 or less and 0 or more is determined as another type; the second augmentation data may be classified into one type of augmentation data or at least two types of augmentation data according to a characteristic difference range, for example, the second difference threshold is 0.3, the second augmentation data having a characteristic difference of more than 0.3 and less than or equal to 0.4 is determined as the first type, the second augmentation data having a characteristic difference of more than 0.4 and less than or equal to 0.5 is determined as the second type, and the second augmentation data having a characteristic difference of more than 0.5 and less than or equal to 0.6 is determined as the third type.
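Purely as an illustrative sketch (cosine similarity over feature vectors, the function names, and the threshold value are assumptions, not limitations of the method), the similarity-threshold filtering described above could be realized as follows:

```python
import numpy as np

def cosine_similarity(a, b):
    """Feature similarity between two feature vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def accept_augmented_pair(feat_orig, feat_first, feat_second, sim_threshold=0.85):
    """Accept a (first, second) augmented pair as training data when the first
    augmentation stays similar to the original (similarity above the threshold)
    and the second augmentation differs from it (similarity below the threshold)."""
    first_sim = cosine_similarity(feat_orig, feat_first)
    second_sim = cosine_similarity(feat_orig, feat_second)
    return first_sim > sim_threshold and second_sim < sim_threshold
```

Here the feature difference is modeled as the complement of the feature similarity, consistent with the negative correlation described above.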
Step 230: based on the first augmentation data and the second augmentation data, model parameters in the pre-training model are adjusted, and the adjusted model parameters are reported to the master node.
Illustratively, the participating node invokes the pre-training model to classify the first augmented data to obtain a first predicted value, and invokes the pre-training model to classify the second augmented data to obtain a second predicted value; and then, based on the first predicted value and the second predicted value, calling a loss function to adjust model parameters in the pre-training model, and reporting the adjusted model parameters to the master node.
The participating node invokes the pre-training model to perform feature extraction on the first augmentation data to obtain a first feature vector of the first augmentation data, and calculates a classification predicted value of the original data based on the first feature vector to obtain a first predicted value; and the participating node calls the pre-training model to perform feature extraction on the second augmentation data to obtain a second feature vector of the second augmentation data, and calculates the classification prediction type of the original data based on the second feature vector to obtain a second prediction value.
Illustratively, the participating node builds a loss function based on the first predicted value, the second predicted value, and model parameters of the pre-trained model, adjusts the model parameters to minimize the loss value in the loss function, and reports the model parameters corresponding to the minimum loss value to the master node.
Optionally, a value condition is set in the pre-training model; the participating node determines the first predicted value as a pseudo tag in response to the first predicted value meeting the value condition, and based on the pseudo tag and the second predicted value, calls the loss function to adjust the model parameters in the pre-training model and reports the adjusted model parameters to the master node.
Illustratively, the value condition includes that the predicted value belongs to a value interval, and the participating node determines the first predicted value as a pseudo tag in response to the first predicted value belonging to the value interval. For example, the value interval is [0.85, 1], including both 0.85 and 1, and the participating node determines the first predicted value as a pseudo tag in response to the first predicted value falling within [0.85, 1].
Illustratively, the above-mentioned value condition includes a prediction threshold value; the participating node determines the first predicted value as a pseudo tag in response to the first predicted value being greater than or equal to the prediction threshold. For example, the prediction threshold is 0.9, and the participating node determines the first predicted value as a pseudo tag in response to the first predicted value being greater than or equal to 0.9.
The above-mentioned value interval or prediction threshold is obtained a posteriori, that is, it is an empirical value.
After the pseudo tag is determined, a loss function is constructed based on the pseudo tag, the second predicted value and model parameters in the pre-training model, the model parameters in the loss function are adjusted to minimize the loss value in the loss function, and the model parameters corresponding to the minimum loss value are reported to the master node.
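Purely as an illustrative sketch in PyTorch-style code (the framework choice, the function names, and the cross-entropy form of the loss are assumptions; the embodiments above only require that the confident first predicted value supervise the second predicted value), one parameter-adjustment step could look like this:

```python
import torch
import torch.nn.functional as F

def local_update_step(model, first_aug, second_aug, optimizer, prediction_threshold=0.9):
    """One adjustment step: the confident prediction on the first (weakly augmented)
    data acts as a pseudo tag that supervises the prediction on the second
    (strongly augmented) data."""
    with torch.no_grad():
        first_logits = model(first_aug)                  # first predicted value
        probs = F.softmax(first_logits, dim=-1)
        confidence, pseudo_tag = probs.max(dim=-1)

    second_logits = model(second_aug)                    # second predicted value
    mask = (confidence >= prediction_threshold).float()  # keep only confident pseudo tags
    loss = (F.cross_entropy(second_logits, pseudo_tag, reduction="none") * mask).mean()

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```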
In an alternative embodiment, the master node configures a loss threshold for the participating nodes, or the loss threshold is set in the pre-training model; after the first predicted value is determined to be a pseudo tag, the participating node inputs the pseudo tag and the second predicted value into a loss function, and adjusts model parameters of the pre-training model to obtain a minimum loss value of the loss function; reporting the adjusted model parameters corresponding to the minimum loss value to the master node in response to the minimum loss value being less than or equal to the loss threshold; and in response to the minimum loss value being greater than the loss threshold, updating the adjusted model parameters to the pre-training model, continuing to adjust the model parameters in the pre-training model until the minimum loss value is less than or equal to the loss threshold, and reporting the adjusted model parameters corresponding to the minimum loss value to the master node.
For example, the loss threshold set in the pre-training model is 0.1; the participating node responds to the minimum loss value being smaller than or equal to 0.1, and reports the adjusted model parameters corresponding to the minimum loss value to the master node; and in response to the minimum loss value being greater than 0.1, updating the adjusted model parameters to the pre-training model, continuously adjusting the model parameters in the pre-training model until the minimum loss value is less than or equal to the loss threshold value, and reporting the adjusted model parameters corresponding to the minimum loss value to the master node.
Illustratively, in the T-th round of model training, the participating node reports the adjusted model parameters obtained in the T-th round to the master node in response to the minimum loss value being smaller than or equal to the loss threshold; in response to the minimum loss value being greater than the loss threshold, step 230 is re-executed and the model parameters in the pre-training model are adjusted based on the first augmentation data and the second augmentation data of the (T+1)-th group. Further, the pre-training model is called to classify the first augmentation data in the (T+1)-th group of samples to obtain a first predicted value, the pre-training model is called to classify the second augmentation data in the (T+1)-th group of samples to obtain a second predicted value, and the loss function is called to adjust the model parameters in the pre-training model based on the first predicted value and the second predicted value of the (T+1)-th group; and so on, and the adjusted model parameters are reported to the master node when the minimum loss value is less than or equal to the loss threshold. T is a positive integer.
By setting the loss threshold, the classification loss of the pre-training model is controlled within an allowable range: when the classification loss of the pre-training model falls below the loss threshold, training of the pre-training model is determined to be complete, and the model parameters of the pre-training model are reported to the master node.
In another optional embodiment, the master node configures an iteration-count threshold Q for the participating node, or the iteration-count threshold Q is set in the pre-training model, where Q is a positive integer. In the T-th round of model training, the participating node reports the adjusted model parameters to the master node in response to T being equal to Q; in response to T being smaller than Q, the adjusted model parameters are updated into the pre-training model and the model parameters in the pre-training model continue to be adjusted until Q iterations have been performed, after which the adjusted model parameters obtained in the Q-th iteration are reported to the master node.
Illustratively, in the T-th round of model training, the participating node reports the adjusted model parameters to the master node in response to T being equal to Q; in response to T being smaller than Q, step 230 is re-executed: the pre-training model is called to classify the first augmentation data in the (T+1)-th group of samples to obtain a first predicted value, the pre-training model is called to classify the second augmentation data in the (T+1)-th group of samples to obtain a second predicted value, and the loss function is called to adjust the model parameters in the pre-training model based on the first predicted value and the second predicted value of the (T+1)-th group; and so on, until the Q iterations are completed and the adjusted model parameters obtained after the Q-th iteration are reported to the master node.
By setting the iteration-count threshold, the number of iterations of the pre-training model is controlled, which avoids an excessive number of iterations that would drain the battery of the participating node, occupy its computing resources for a long time, and reduce the training efficiency of the pre-training model; the iteration-count threshold thus ensures both the training efficiency and the training effect of the pre-training model.
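The loss threshold and the iteration-count threshold Q can be combined into a single local training loop. The sketch below is an assumed structure that reuses the hypothetical local_update_step above; it is not the only way the embodiments can be realized:

```python
def train_locally(model, sample_pairs, optimizer, loss_threshold=0.1, max_iterations=None):
    """Adjust the local model parameters until the classification loss falls to the
    loss threshold or the iteration-count threshold Q is reached, then return the
    adjusted model parameters to be reported to the master node."""
    for t, (first_aug, second_aug) in enumerate(sample_pairs, start=1):
        loss = local_update_step(model, first_aug, second_aug, optimizer)
        if loss <= loss_threshold:
            break  # classification loss within the allowed range
        if max_iterations is not None and t >= max_iterations:
            break  # iteration-count threshold Q reached
    return model.state_dict()  # adjusted model parameters
```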
Step 240: and receiving global model parameters sent by the main node, and updating the global model parameters to the pre-training model to obtain a classification model.
The participating node receives the global model parameters sent by the master node and updates the global model parameters into the pre-training model to obtain a classification model for data classification. The global model parameters are obtained by the master node through federal learning calculation based on the adjusted model parameters reported by at least two participating nodes.
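In a PyTorch-style sketch (the framework is an assumption and is not specified by the embodiments), updating the received global model parameters into the local pre-training model amounts to loading the parameter set into the locally deployed model:

```python
def apply_global_parameters(model, global_state_dict):
    """Update the local pre-training model with the global model parameters issued
    by the master node; the updated model is used as the classification model."""
    model.load_state_dict(global_state_dict)
    model.eval()
    return model
```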
In summary, according to the model training method provided by the embodiment, the participating nodes perform data augmentation by adopting the untagged original data, wherein each pair of training data comprises a first augmentation data with smaller feature difference from the original data and a second augmentation data with larger feature difference from the original data, the model parameters of the pre-training model are adjusted by using the two augmentation data expanded by the original data, and the supervision signals are generated by using the augmentation data with smaller feature difference from the original data to supervise the classification learning of the other augmentation data, so that more classification features can be learned for each classification learning of the pre-training model, the faster learning of classification tasks is realized, the convergence speed is effectively accelerated, and the training iteration times are reduced.
Secondly, the user side terminal can be used as a participating node, and the unlabeled original data is adopted to perform federal learning so as to liberate manpower while guaranteeing the personalized requirements of the user and the personal data safety, namely, the original data in the user side terminal is not required to be labeled manually, the waste of manpower resources due to manual labeling of the original data is avoided, and the training cost of the pre-training model is reduced.
For a detailed description of the implementation of the model training method when the original data is the original image, please refer to fig. 3, which shows a flowchart of the model training method provided by an exemplary embodiment of the present application. The model training method can be applied to the participating nodes of the federal learning system. In fig. 3, the model training method includes:
step 310: an original image in the participating node is acquired, and a pre-trained model is obtained.
Illustratively, the participating nodes retrieve the raw images from memory along with the pre-trained model.
Step 320: performing augmentation treatment on the original image by adopting a first type of treatment mode to obtain a first augmentation image; the first type of processing means includes at least one of rotation and scaling.
Illustratively, the participating node performs augmentation processing on the original image in a rotation manner to obtain a first augmented image (i.e., first augmented data); or, the participating node performs augmentation treatment on the original image in a scaling mode to obtain a first augmentation image; or the participating nodes perform augmentation processing on the original image in a rotation and scaling combined mode to obtain a first augmented image.
Step 330: performing augmentation treatment on the original image by adopting a second type of treatment mode to obtain a second augmentation image; the second type of processing means includes at least one of clipping, exposure, and occlusion.
The second feature difference between the second augmented image (i.e., the second augmented data) obtained by the second type of processing method and the original image is larger than the first feature difference of the first augmented image obtained by the first type of processing method. It can also be said that the feature similarity between the first augmented image and the original image is greater than the feature similarity between the second augmented image and the original image.
Illustratively, the participating node performs augmentation processing on the original image in a clipping manner to obtain a second augmented image; or the participating node performs augmentation treatment on the original image in an exposure mode to obtain a second augmentation image; or, the participating node adopts a shielding mode to amplify the original image to obtain a second amplified image; or, the participating nodes perform augmentation treatment on the original image in a cutting and exposure combined mode to obtain a second augmentation image; or, the participating nodes perform augmentation treatment on the original image in a cutting and shielding combined mode to obtain a second augmentation image; or, the participating node performs augmentation treatment on the original image in a mode of combining exposure and shielding to obtain a second augmentation image; or the participating nodes perform augmentation processing on the original image in a mode of combining clipping, exposure and shielding to obtain a second augmentation image.
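As one possible realization (the torchvision transforms named below are standard, but this particular composition and its parameter values are assumptions chosen for illustration), the two types of processing could be built as follows:

```python
from torchvision import transforms

# First-type (weak) processing: small rotations and slight scaling keep the
# augmented image close to the original image.
first_type = transforms.Compose([
    transforms.RandomRotation(degrees=15),
    transforms.RandomAffine(degrees=0, scale=(0.9, 1.1)),
    transforms.ToTensor(),
])

# Second-type (strong) processing: cropping, an exposure-like brightness change,
# and occlusion produce a larger feature difference from the original image.
second_type = transforms.Compose([
    transforms.RandomResizedCrop(size=224, scale=(0.4, 0.8)),
    transforms.ColorJitter(brightness=0.8),                 # exposure change
    transforms.ToTensor(),
    transforms.RandomErasing(p=1.0, scale=(0.1, 0.3)),      # occlusion
])
```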
Step 340: based on the first augmented image and the second augmented image, model parameters in the pre-training model are adjusted, and the adjusted model parameters are reported to the master node.
Illustratively, the participating node invokes the pre-training model to classify the first augmented image to obtain a first predicted value, and invokes the pre-training model to classify the second augmented image to obtain a second predicted value; and based on the first predicted value and the second predicted value, calling a loss function to adjust model parameters in the pre-training model, and reporting the adjusted model parameters to the master node.
The participating node calls a pre-training model to perform feature extraction on the first augmented image to obtain a first feature image, and calculates a classification predicted value of the original image based on the first feature image to obtain a first predicted value; and calling the pre-training model to perform feature extraction on the second augmented image to obtain a second feature image, and calculating a classification predicted value of the original image based on the second feature image to obtain a second predicted value.
Optionally, the pre-training model is built based on a Computer Vision (CV) model. The participating node calls the CV model to perform classification calculation on the first augmented image to obtain a first predicted value, and calls the CV model to perform classification calculation on the second augmented image to obtain a second predicted value.
Optionally, the CV model includes a geometric active contour model (Geometric Active Contour Model). Illustratively, the participating node calls the geometric active contour model to perform classification calculation on the first augmented image to obtain a first predicted value, and calls the geometric active contour model to perform classification calculation on the second augmented image to obtain a second predicted value.
Illustratively, the participating node determines the first predicted value as a pseudo tag in response to the first predicted value being greater than or equal to the prediction threshold; based on the pseudo tag and the second predicted value, calling a loss function to adjust model parameters in the pre-training model, and reporting the adjusted model parameters to a main node;
and executing the step of calling the pre-training model to conduct classified calculation on the first augmented image to obtain a first predicted value again in response to the first predicted value being smaller than the prediction threshold, and then continuously calling a loss function to adjust model parameters in the pre-training model based on the first predicted value and the second predicted value obtained through the re-calculation.
In an optional embodiment, the participating node may further perform augmentation processing on the first augmented image by adopting a first type of processing manner, to obtain an updated first augmented image; and calling the pre-training model to perform classification calculation on the new first augmented image to obtain an updated first predicted value, and then continuously calling a loss function to adjust model parameters in the pre-training model based on the updated first predicted value and the updated second predicted value. The first augmentation image obtained for the first time and the updated first augmentation image obtained for the second time are obtained by adopting different or same first type of processing mode augmentation processing.
In another optional embodiment, the participating node may further perform augmentation processing on the original image by adopting a first type of processing manner, to obtain an updated first augmented image; and calling the pre-training model to perform classified calculation on the updated first augmented image to obtain an updated first predicted value, and then continuously calling a loss function to adjust model parameters in the pre-training model based on the updated first predicted value and the updated second predicted value. The first augmentation image obtained for the first time and the updated first augmentation image obtained for the second time are obtained by adopting different first type processing mode augmentation processing.
Optionally, after the participating node adopts the T-th group sample to adjust the model parameters, responding to the minimum loss value being smaller than or equal to the loss threshold value, and reporting the adjusted model parameters to the master node; and in response to the minimum loss value being greater than the loss threshold, updating the adjusted model parameters to the pre-training model, and adopting the T+1st group of samples to continue iterative training on the pre-training model.
Optionally, after the participating node adopts the T-th group sample to adjust the model parameters, responding to the iteration training frequency equal to the iteration frequency threshold value, and reporting the adjusted model parameters to the master node; and in response to the iteration training times being smaller than the iteration time threshold, updating the adjusted model parameters to the pre-training model, and continuing to iterate the pre-training model by adopting the T+1st group of samples.
Optionally, after the participating node adopts the T-th group sample to adjust the model parameters, responding to the iteration training frequency equal to the iteration frequency threshold value, and reporting the adjusted model parameters to the master node; responding to the iteration training times smaller than the iteration times threshold value and the minimum loss value smaller than or equal to the loss threshold value, and reporting the adjusted model parameters to the main node; and in response to the iteration training times being smaller than the iteration time threshold and the minimum loss value being larger than the loss threshold, updating the adjusted model parameters to the pre-training model, and continuing to iterate the training of the pre-training model by adopting the T+1st group of samples.
The classification of the augmentation data may also be divided based on the way the augmentation process is performed. Illustratively, taking classification of the augmented image as an example, determining the first augmented image obtained by the rotation processing as one type, and determining the first augmented image obtained by the scaling processing as another type; the second augmented image obtained by the clipping process is determined as a first class, the second augmented image obtained by the exposure process is determined as a second class, and the second augmented image obtained by the shielding process is determined as a third class.
Step 350: and receiving global model parameters sent by the main node, and updating the global model parameters to the pre-training model to obtain a classification model.
The global model parameters are obtained by performing federation learning calculation by the master node based on the adjusted model parameters reported by at least two participating nodes.
In summary, according to the model training method provided by the embodiment, the participating nodes adopt the label-free original image to perform data augmentation, wherein each pair of training data comprises a first augmented image with smaller feature difference from the original image and a second augmented image with larger feature difference from the original image, and the augmented image with smaller feature difference from the original image is used for generating a supervision signal to supervise the classification learning of the other augmented image, so that each classification learning of the pre-training model can learn more classification features, thereby realizing faster learning of the image classification task, effectively accelerating the model convergence speed, and reducing the training iteration times.
Secondly, the user side terminal can be used as a participating node, and the unlabeled original image is adopted to perform federal learning so as to liberate manpower while guaranteeing the personalized requirements and personal data safety of the user, namely, the original image in the user side terminal is not required to be labeled with a classification label manually, so that the waste of manpower resources caused by manual labeling of the original image is avoided, and the training cost of the pre-training model is reduced.
Referring to fig. 4, a flowchart of a model training method according to an exemplary embodiment of the present application is shown. The model training method can be applied to the master node of the federated learning system. In fig. 4, the model training method includes:
Step 410: receive at least two sets of adjusted model parameters sent by at least two participating nodes.
The adjusted model parameters are obtained by the participating nodes adjusting the model parameters of the pre-training model based on unlabeled data; the unlabeled data includes first augmented data and second augmented data obtained by applying augmentation processing to the same original data, and there is a feature difference between the first augmented data and the second augmented data. The pre-training model is obtained by the master node through pre-training with labeled data.
In an alternative embodiment, the master node issues the pre-training model to at least two nodes and then receives participation feedback sent by M participating nodes, where M is a positive integer greater than 1. The master node receives the adjusted model parameters sent by each participating node and stores them in memory; in response to receiving M sets of adjusted model parameters, it begins to perform steps 420 through 430.
The master node determines at least two candidate nodes and sends a resource acquisition request to each of them, where the resource acquisition request is used to query the resources a candidate node can provide. It receives the resource information fed back by each candidate node, where the resource information indicates the resources the node can provide; determines at least two target nodes from the candidate nodes based on the resource information; and sends participation requests to the at least two target nodes, requesting them to take part in the federated learning of the classification model. It then receives participation feedback from the M participating nodes and sends the pre-training model to those M nodes. The participation feedback may be determined by a user operation. The resources mentioned above may be at least one of the battery level and the computing resources of a participating node; that is, the master node selects the participating nodes for federated learning of the classification model according to each node's battery level, available computing resources, and even the user's willingness to participate, as sketched below.
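A minimal sketch of this selection step follows; the field names (battery, free_compute) and the ranking rule are illustrative assumptions rather than criteria prescribed by the embodiment.

```python
# Hedged sketch: pick target nodes from candidates according to reported resources.
def select_target_nodes(resource_reports, num_targets, min_battery=0.5):
    """resource_reports maps node_id -> {'battery': float in [0, 1], 'free_compute': float}."""
    eligible = {node: info for node, info in resource_reports.items()
                if info['battery'] >= min_battery}
    # prefer nodes reporting the most spare compute
    ranked = sorted(eligible, key=lambda node: eligible[node]['free_compute'], reverse=True)
    return ranked[:num_targets]

# Example: node B is filtered out by the battery criterion, so only A is selected.
targets = select_target_nodes({'A': {'battery': 0.9, 'free_compute': 3.0},
                               'B': {'battery': 0.2, 'free_compute': 5.0}}, num_targets=1)
```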
Step 420: perform federated learning on the at least two sets of adjusted model parameters to obtain global model parameters.
Illustratively, the master node calculates the average of the at least two sets of adjusted model parameters to obtain the global model parameters. For example, if each set contains D adjusted model parameters, the average of each corresponding parameter across the sets is calculated, finally yielding D averaged model parameters, i.e. the global model parameters, where D is a positive integer greater than 1. For example, with 3 sets of adjusted model parameters, where the 1st set contains D11, D12 and D13, the 2nd set contains D21, D22 and D23, and the 3rd set contains D31, D32 and D33, the computed average model parameters are (D11+D21+D31)/3, (D12+D22+D32)/3 and (D13+D23+D33)/3.
Illustratively, the master node calculates a weighted combination of the at least two sets of adjusted model parameters to obtain the global model parameters. For example, the master node assigns different weights to different participating nodes and computes the weighted sum of the adjusted parameter sets based on the configured weights. For example, with 2 sets of adjusted model parameters, where the 1st participating node reports parameter D1 and has weight R1 and the 2nd participating node reports parameter D2 and has weight R2, the global model parameter is computed as D1×R1+D2×R2. The weights may, for example, be configured based on the historical confidence placed in each participating node. Both aggregation rules are sketched below.
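The following sketch illustrates the two aggregation rules; modeling each set of adjusted parameters as a dict of NumPy arrays is an assumption made purely for illustration.

```python
import numpy as np

def federated_average(param_sets):
    """Plain average: param_sets is a list of dicts mapping parameter name -> np.ndarray."""
    keys = param_sets[0].keys()
    return {k: np.mean([p[k] for p in param_sets], axis=0) for k in keys}

def federated_weighted(param_sets, weights):
    """Weighted combination: one scalar weight per node (e.g. from historical confidence)."""
    keys = param_sets[0].keys()
    return {k: sum(w * p[k] for p, w in zip(param_sets, weights)) for k in keys}

# Example matching the text: three nodes, parameters averaged element-wise.
sets = [{'d': np.array([1.0, 2.0, 3.0])},
        {'d': np.array([4.0, 5.0, 6.0])},
        {'d': np.array([7.0, 8.0, 9.0])}]
global_params = federated_average(sets)   # {'d': array([4., 5., 6.])}
```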
Step 430: issue the global model parameters to the at least two participating nodes, so that the global model parameters are updated into the pre-training model on each participating node to obtain the classification model.
Illustratively, the master node may also issue the global model parameters to other, non-participating nodes, so that the global model parameters are updated on both the participating nodes and the non-participating nodes.
Illustratively, at least two designated times are configured in the master node; after the classification model has been trained, the master node readjusts its model parameters at those designated times using the method provided by the embodiments of the present application. Such periodic retraining continuously improves classification accuracy and adapts the model to different application scenarios; for example, during the Spring Festival most images are of family dinners, and a classification model whose parameters have been readjusted classifies dinner-type images noticeably more accurately. The designated times may further be differentiated by geographic location; for example, when the cherokee rose flowers of a certain place are in full bloom in the third and fourth months of the year, many cherokee rose images exist in the user terminals of that place, and the readjusted classification model classifies those images noticeably more accurately.
For example, the master node may also trigger model retraining according to application feedback on the classification model; e.g., when the master node determines that the classification model's score on an application platform falls below a score threshold, it triggers retraining of the classification model.
In summary, in the model training method provided by this embodiment, the master node issues the pre-training model to the participating nodes, and the participating nodes perform data augmentation on unlabeled original data, where each pair of training data contains first augmented data, whose features differ little from the original data, and second augmented data, whose features differ more. The two augmented versions of the original data are used to adjust the model parameters of the pre-training model, and the augmented data closer to the original generates a supervision signal that supervises classification learning on the other augmented data, so each round of classification learning of the pre-training model captures more classification features. The classification task is therefore learned faster, convergence is effectively accelerated, and the number of training iterations is reduced.
Secondly, the user-side terminal can serve as a participating node and perform federated learning with unlabeled original data, freeing up manpower while preserving the user's personalized requirements and personal data security: the original data in the user-side terminal no longer needs to be manually labeled, which avoids the waste of human resources caused by manual labeling and reduces the training cost of the pre-training model. Thirdly, the master node integrates the adjusted model parameters of at least two participating nodes to obtain the global model parameters, which avoids the low classification accuracy that results from training a classification model directly on a public data set that is too small or unrepresentative, and thus enhances the robustness of the model.
For example, before issuing the pre-training model, the master node first pre-trains the initial machine learning model with a small amount of labeled data so as to provide a pre-training model that already has a certain classification accuracy, which speeds up the subsequent training of the pre-training model on the participating nodes. Referring to fig. 5, a flowchart of a training method for the pre-training model according to an exemplary embodiment of the present application is shown. The method can be applied to the master node of the federated learning system. In fig. 5, the training method of the pre-training model includes:
Step 510: acquire a machine learning model and labeled data, the labeled data being annotated with sample label values.
Illustratively, the master node retrieves the machine learning model and the labeled data from memory, the labeled data being annotated with sample label values.
Step 520: call the machine learning model to classify the labeled data to obtain sample classification values.
Illustratively, the master node invokes the machine learning model to perform feature extraction on the labeled data samples to obtain feature vectors, and calculates the sample classification values of the labeled data based on the feature vectors.
Step 530: based on the sample classification values and the sample label values, call a loss function to adjust the model parameters of the machine learning model to obtain the pre-training model.
Illustratively, the master node constructs a loss function based on the sample classification values, the sample label values and the model parameters of the machine learning model, and adjusts the model parameters so as to minimize the loss value of the loss function; after training on the at least two pieces of labeled data, the pre-training model is obtained.
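A minimal sketch of steps 510 through 530 is given below, written against a PyTorch-style model; the optimiser choice and hyperparameter values are assumptions, not values taken from the embodiment.

```python
import torch
import torch.nn as nn

def pretrain(model, labeled_loader, epochs=5, lr=1e-3):
    """Fit the initial machine learning model on labeled samples to obtain the pre-training model."""
    criterion = nn.CrossEntropyLoss()                      # the loss function of step 530
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    model.train()
    for _ in range(epochs):
        for x, y in labeled_loader:                        # y carries the sample label values
            logits = model(x)                              # step 520: sample classification values
            loss = criterion(logits, y)                    # step 530: compare with the label values
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()                               # adjust parameters to reduce the loss
    return model                                           # returned as the pre-training model
```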
In summary, in the training method for the pre-training model provided by this embodiment, the master node performs model training with a small amount of labeled data before issuing the pre-training model, which improves the classification capability of the pre-training model and thus provides the participating nodes with a set of model parameters that already classify reasonably well, improving the training efficiency of the classification model on the participating nodes.
Taking the original data as original images as an example, the federated learning between the master node and the participating nodes is illustrated in fig. 6. The server (i.e. the master node) trains the initial machine learning model on a limited set of labeled images (i.e. labeled data) to obtain the pre-training model. The pre-training model is issued to K user terminals belonging to user 1, user 2, user 3, ..., user K, each of which runs a client; the clients iteratively train the pre-training model on unlabeled images (i.e. original images). After the models converge, the adjusted model parameters are aggregated at the server based on homomorphic encryption or secure multi-party computation to obtain averaged weight coefficients (i.e. averaged model parameters), and the server issues these averaged parameters to the clients to update their models, completing one iteration; K is a positive integer.
Referring to fig. 7, a detailed description of a complete iteration of the method is as follows:
Step 610: the server trains a machine learning model on the labeled images to obtain a CV model.
The server trains an initial CV model $f_s$ on the labeled-image sample set $\{x_i, y_i\} \in D_s$. The CV model includes at least one of a Visual Geometry Group (VGG) model, a deep residual network (ResNet) model, and an Inception model. The loss function $L_s$ is defined as follows:

$$L_s = \frac{1}{N_s} \sum_{i=1}^{N_s} H\left(y_i,\, f_s(x_i; w_s)\right)$$

where $N_s$ denotes the number of labeled images on the server side, $H$ denotes the cross-entropy loss function, $w_s$ denotes the weight coefficients (i.e. model parameters) of the server CV model $f_s$, and $i$ is a positive integer.
Step 620: the server sends the CV model to the clients as the pre-training model.
Step 630: each client receives the pre-training model and then invokes it to perform semi-supervised learning on unlabeled images, obtaining the adjusted model parameters.
After receiving the pre-training model, the client performs semi-supervised learning based on the unlabeled images (i.e. the original images). The learning scheme, shown in fig. 8, is divided into two routes, weak-augmentation learning and strong-augmentation learning, where the weak-augmentation route generates pseudo labels through the pre-training model to supervise the strong-augmentation route, as described below.
Weak-augmentation learning (i.e. learning on the first augmented image): the user terminal applies a weak augmentation $a(x_j)$ (e.g. rotation) to its local unlabeled images $x_j \in D_c$ and loads the weakly augmented first image into the pre-training model to obtain a first predicted value $q_j = f_c(a(x_j); w_c)$, where $w_c$ denotes the weight coefficients (i.e. model parameters) of the terminal-side CV model $f_c$. If $\max(q_j) \ge \tau$, the category $\hat{q}_j = \arg\max(q_j)$ is set as the pseudo label, where $\tau$ is the prediction threshold set in the pre-training model; otherwise, the condition is met by adjusting the weak-augmentation level or adjusting the model parameters in the pre-training model.
Strong-augmentation learning (i.e. learning on the second augmented image): the user terminal applies a strong augmentation $A(x_j)$ (e.g. cropping, occlusion, exposure) to its local unlabeled images $x_j \in D_c$ and loads the strongly augmented second image into the pre-training model to obtain a second predicted value $f_c(A(x_j); w_c)$. Because the first and second augmented images are derived from the same original image, the pseudo label obtained from the corresponding weak-augmentation learning is used to supervise the strong-augmentation learning, thereby updating the weight coefficients $w_c$ of the local pre-training model and obtaining the updated classification model. Illustratively, the training loss function $L_c$ on the user terminal is defined as follows:

$$L_c = \frac{1}{N_c} \sum_{j=1}^{N_c} \mathbb{1}\left(\max(q_j) \ge \tau\right) H\left(\hat{q}_j,\, f_c(A(x_j); w_c)\right)$$

where $N_c$ denotes the number of unlabeled images on the user-terminal side, $H$ denotes the cross-entropy loss function, $\mathbb{1}(\cdot)$ denotes the indicator function that determines which samples contribute a label to the cross-entropy loss, and $j$ is a positive integer.
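Read as code, the loss $L_c$ amounts to a thresholded pseudo-labeling step followed by a masked cross-entropy. The sketch below assumes PyTorch, treats weak_aug and strong_aug as caller-supplied callables, and uses an illustrative threshold value; none of these names or values are taken from the original embodiment.

```python
import torch
import torch.nn.functional as F

def unlabeled_batch_loss(model, x, weak_aug, strong_aug, tau=0.95):
    """Hedged sketch of L_c for one batch of unlabeled images x."""
    with torch.no_grad():
        weak_logits = model(weak_aug(x))               # first predicted value f_c(a(x_j); w_c)
        probs = torch.softmax(weak_logits, dim=-1)
        conf, pseudo_label = probs.max(dim=-1)         # pseudo label per sample
        mask = (conf >= tau).float()                   # indicator 1(max(q_j) >= tau)
    strong_logits = model(strong_aug(x))               # second predicted value f_c(A(x_j); w_c)
    per_sample = F.cross_entropy(strong_logits, pseudo_label, reduction="none")
    return (mask * per_sample).mean()                  # averaged over the batch, as in L_c
```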
Step 640: the client reports the learned adjusted model parameters to the server.
Step 650: the server performs federated learning based on the adjusted model parameters and transmits the resulting global model parameters to the clients.
Based on homomorphic encryption or secure multi-party computation, the model parameters $EN[w_c]$ learned on each user-terminal side are uploaded to the server for secure aggregation, yielding the global model parameters $EN[w_{avg}]$ over the K clients:

$$EN[w_{avg}] = \frac{1}{K} \sum_{k=1}^{K} EN[w_c^{(k)}]$$

The server issues $EN[w_{avg}]$ to the clients; each client decrypts it to obtain $w_{avg}$ and uses these model parameters to update its local pre-training model, completing one round of iterative computation.
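Of the two aggregation routes named above, the secure multi-party computation route can be illustrated with additive secret sharing. The sketch below is a simplified, single-machine illustration that assumes the client weights have been quantised to non-negative integers; it is not the protocol prescribed by the embodiment.

```python
import random

MODULUS = 2**31 - 1  # illustrative modulus; weights are assumed quantised well below this value

def make_shares(weight, num_parties):
    """Split one integer weight into additive shares that are individually meaningless."""
    shares = [random.randrange(MODULUS) for _ in range(num_parties - 1)]
    shares.append((weight - sum(shares)) % MODULUS)
    return shares

def secure_average(client_weights):
    """Aggregate without revealing any single client's weight: only share sums are combined."""
    n = len(client_weights)
    all_shares = [make_shares(w, n) for w in client_weights]
    # party p sums the p-th share it received from every client
    partial_sums = [sum(all_shares[c][p] for c in range(n)) % MODULUS for p in range(n)]
    total = sum(partial_sums) % MODULUS                # equals the sum of the original weights
    return total / n                                   # the averaged weight w_avg
```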
Step 660: the client updates the global model parameters into the pre-training model to obtain the classification model.
In summary, in the model training method provided by this embodiment, the server first pre-trains the initial machine learning model with labeled images and issues the resulting pre-training model to the user terminals. Each user terminal performs data augmentation on unlabeled images; each pair of training data contains a first augmented image, whose features differ little from the original image, and a second augmented image, whose features differ more. The two augmented versions of the original image are used to adjust the model parameters of the pre-training model: the first predicted value of the first augmented image serves as a pseudo label, the prediction loss of the second predicted value of the second augmented image relative to the first is computed, and the pre-training model is adjusted according to that loss. In other words, the first augmented image generates a supervision signal that supervises classification learning on the second augmented image, so each round of classification learning of the pre-training model captures more classification features, the image classification task is learned faster, and convergence is effectively accelerated. After a user terminal has trained and obtained its adjusted model parameters, the server computes the global model parameters by federated learning and issues them back to the user terminals, ensuring the effectiveness of model classification while also protecting the personal data security and privacy of the user terminals.
The following are examples of the apparatus of the present application that may be used to perform the method embodiments of the present application. For details not disclosed in the embodiments of the apparatus of the present application, please refer to the embodiments of the method of the present application.
Referring to fig. 9, a block diagram of a model training apparatus according to an exemplary embodiment of the present application is shown. The model training means may be implemented as all or part of the electronic device of the participating node by software, hardware or a combination of both. The device comprises:
the acquisition module 710 is configured to acquire the original data in the participating node and the pre-training model, where the pre-training model is obtained by the master node through pre-training with labeled data;
the augmentation module 720 is configured to perform augmentation processing on the original data to obtain first augmentation data and second augmentation data, where a feature difference exists between the first augmentation data and the second augmentation data;
the parameter adjusting module 730 is configured to adjust model parameters in the pre-training model based on the first augmentation data and the second augmentation data, and report the adjusted model parameters to the master node;
the updating module 740 is configured to receive global model parameters sent by the master node, update the global model parameters to a pre-training model, and obtain a classification model, where the global model parameters are obtained by performing federal learning calculation by the master node based on the adjusted model parameters reported by at least two participating nodes.
In an alternative embodiment, the first characteristic difference of the first augmented data is smaller than the second characteristic difference of the second augmented data, the first characteristic difference being the characteristic difference between the first augmented data and the original data, and the second characteristic difference being the characteristic difference between the second augmented data and the original data; the parameter adjustment module 730 is configured to:
invoking the pre-training model to classify the first augmented data to obtain a first predicted value, and invoking the pre-training model to classify the second augmented data to obtain a second predicted value;
determining the first predictor as a pseudo tag in response to the first predictor meeting a value condition;
and based on the pseudo tag and the second predicted value, calling a loss function to adjust model parameters in the pre-training model, and reporting the adjusted model parameters to the master node.
In an alternative embodiment, the parameter adjustment module 730 is configured to:
inputting the pseudo tag and the second predicted value into a loss function, and adjusting model parameters of the pre-training model to obtain a minimum loss value of the loss function;
and responding to the minimum loss value being smaller than the loss threshold value, and reporting the adjusted model parameters corresponding to the minimum loss value to the master node.
In an alternative embodiment, the parameter adjustment module 730 is configured to:
in response to the minimum loss value being greater than the loss threshold, updating the adjusted model parameters into the pre-training model and continuing to adjust the model parameters in the pre-training model.
In an alternative embodiment, the original data comprises original images; the augmentation module 720 is configured to:
performing augmentation processing on the original image using a first-type processing mode to obtain the first augmented data, the first-type processing mode including at least one of rotation and scaling;
performing augmentation processing on the original image using a second-type processing mode to obtain the second augmented data, the second-type processing mode including at least one of cropping, exposure and occlusion (both modes are sketched below).
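A minimal sketch of the two processing modes using torchvision transforms is given below; the specific parameter values (rotation angle, crop scales, jitter strength) are illustrative assumptions rather than values specified by the embodiment.

```python
from torchvision import transforms

# First-type (weak) processing: small geometric changes, features stay close to the original.
weak_augment = transforms.Compose([
    transforms.RandomRotation(degrees=15),                  # rotation
    transforms.RandomResizedCrop(224, scale=(0.9, 1.0)),    # mild scaling
    transforms.ToTensor(),
])

# Second-type (strong) processing: larger feature differences from the original image.
strong_augment = transforms.Compose([
    transforms.RandomResizedCrop(224, scale=(0.3, 0.8)),    # aggressive cropping
    transforms.ColorJitter(brightness=0.8, contrast=0.8),   # exposure-like change
    transforms.ToTensor(),
    transforms.RandomErasing(p=1.0),                        # occlusion
])
```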
In an alternative embodiment, the pre-training model is built based on a computer vision model; the parameter adjustment module 730 is configured to:
invoking a computer vision model to perform classification calculation on the first augmentation data to obtain a first predicted value;
and calling a computer vision model to perform classification calculation on the second augmentation data to obtain a second predicted value.
In summary, the model training apparatus provided by this embodiment performs data augmentation on unlabeled original data, where each pair of training data contains first augmented data, whose features differ little from the original data, and second augmented data, whose features differ more. The model parameters of the pre-training model are adjusted with the two augmented versions of the original data, and the augmented data closer to the original generates a supervision signal that supervises classification learning on the other augmented data, so each round of classification learning of the pre-training model captures more classification features. The classification task is therefore learned faster, convergence is effectively accelerated, and the number of training iterations is reduced.
Secondly, the user-side terminal can serve as a participating node and perform federated learning with unlabeled original data, freeing up manpower while preserving the user's personalized requirements and personal data security: the original data in the user-side terminal no longer needs to be manually labeled, which avoids the waste of human resources caused by manual labeling and reduces the training cost of the pre-training model.
Referring to fig. 10, a block diagram of a model training apparatus according to an exemplary embodiment of the present application is shown. The model training means may be implemented as all or part of the electronic device that becomes the master node by software, hardware or a combination of both. The device comprises:
the receiving module 810 is configured to receive at least two sets of adjusted model parameters sent by at least two participating nodes, where the adjusted model parameters are obtained by the participating nodes adjusting the model parameters of the pre-training model based on unlabeled data, the unlabeled data includes first augmented data and second augmented data obtained by applying augmentation processing to the same original data, there is a feature difference between the first augmented data and the second augmented data, and the pre-training model is obtained by the master node through pre-training with labeled data;
The learning module 820 is configured to perform federal learning on at least two sets of adjusted model parameters to obtain global model parameters;
and a sending module 830, configured to send the global model parameters to at least two participating nodes, so as to update the global model parameters to the pre-training model in the at least two participating nodes, and obtain a classification model.
In an alternative embodiment, learning module 820 is configured to:
and calculating the average value of at least two groups of adjusted model parameters to obtain global model parameters.
In an alternative embodiment, the apparatus further comprises: a training module 840;
the training module 840 is configured to obtain a machine learning model and labeled data, where the labeled data is labeled with a sample label value; calling a machine learning model to classify the labeled data to obtain a sample classification value; and calling a loss function to adjust model parameters in the machine learning model based on the sample classification value and the sample label value to obtain a pre-training model.
In summary, the model training apparatus provided by this embodiment issues the pre-training model to the participating nodes, and the participating nodes perform data augmentation on unlabeled original data, where each pair of training data contains first augmented data, whose features differ little from the original data, and second augmented data, whose features differ more. The pre-training model is adjusted with the two augmented versions of the original data, and the augmented data closer to the original generates a supervision signal that supervises classification learning on the other augmented data, so each round of classification learning of the pre-training model captures more classification features. The classification task is therefore learned faster, convergence is effectively accelerated, and the number of training iterations is reduced.
Secondly, the user-side terminal can serve as a participating node and perform federated learning with unlabeled original data, freeing up manpower while preserving the user's personalized requirements and personal data security: the original data in the user-side terminal no longer needs to be manually labeled, which avoids the waste of human resources caused by manual labeling and reduces the training cost of the pre-training model. Moreover, the apparatus integrates the adjusted model parameters of at least two participating nodes to obtain the global model parameters, which avoids the low classification accuracy that results from training a classification model directly on a public data set that is too small or unrepresentative.
Embodiments of the present application also provide a computer-readable medium storing at least one instruction, the at least one instruction being loaded and executed by a processor to implement the model training method described in the above embodiments.
It should be noted that: in the model training apparatus provided in the above embodiment, when executing the model training method, only the division of the above functional modules is used for illustration, in practical application, the above functional allocation may be performed by different functional modules according to needs, that is, the internal structure of the device is divided into different functional modules, so as to complete all or part of the functions described above. In addition, the model training device and the model training method provided in the foregoing embodiments belong to the same concept, and specific implementation processes thereof are detailed in the method embodiments and are not described herein again.
The foregoing embodiment numbers of the present application are merely for the purpose of description, and do not represent the advantages or disadvantages of the embodiments.
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program for instructing relevant hardware, where the program may be stored in a computer readable storage medium, and the storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.
The above embodiments are merely exemplary embodiments of the present application and are not intended to limit the present application, and any modifications, equivalent substitutions, improvements, etc. that fall within the spirit and principles of the present application should be included in the scope of the present application.

Claims (13)

1. A model training method for use in a federal learning participating node, the method comprising:
obtaining original data in the participating node and a pre-training model, wherein the pre-training model is obtained by a master node through pre-training with labeled data, and the original data comprises images or labels or articles or news;
performing augmentation processing on the original data to obtain first augmentation data and second augmentation data, wherein characteristic differences exist between the first augmentation data and the second augmentation data;
Based on the first augmentation data and the second augmentation data, adjusting model parameters in the pre-training model, and reporting the adjusted model parameters to the master node;
and receiving global model parameters sent by the master node, updating the global model parameters to the pre-training model to obtain a classification model, wherein the global model parameters are obtained by performing federal learning calculation by the master node based on the adjusted model parameters reported by at least two participating nodes.
2. The method of claim 1, wherein a first characteristic difference of the first augmented data is less than a second characteristic difference of the second augmented data, the first characteristic difference being a characteristic difference between the first augmented data and the original data, the second characteristic difference being a characteristic difference between the second augmented data and the original data;
the step of adjusting model parameters in the pre-training model based on the first augmentation data and the second augmentation data, and reporting the adjusted model parameters to the master node, includes:
invoking the pre-training model to classify the first augmented data to obtain a first predicted value, and invoking the pre-training model to classify the second augmented data to obtain a second predicted value;
Determining the first predicted value as a pseudo tag in response to the first predicted value meeting a value condition;
and calling a loss function to adjust model parameters in the pre-training model based on the pseudo tag and the second predicted value, and reporting the adjusted model parameters to the master node.
3. The method of claim 2, wherein the calling a loss function to adjust model parameters in the pre-trained model based on the pseudo tag and the second predicted value, and reporting the adjusted model parameters to the master node comprises:
inputting the pseudo tag and the second predicted value into the loss function, and adjusting model parameters of the pre-training model to obtain a minimum loss value of the loss function;
and responding to the minimum loss value being smaller than a loss threshold value, and reporting the adjusted model parameters corresponding to the minimum loss value to the master node.
4. A method according to claim 3, characterized in that the method further comprises:
and in response to the minimum loss value being greater than the loss threshold, updating the adjusted model parameters to the pre-training model, and continuing to adjust the model parameters in the pre-training model.
5. The method of any one of claims 1 to 4, wherein the raw data comprises raw images;
the step of performing augmentation processing on the original data to obtain first augmentation data includes:
performing augmentation processing on the original image by adopting a first type of processing mode to obtain first augmentation data; the first type of processing mode comprises at least one of rotation and scaling;
the step of performing augmentation processing on the original data to obtain second augmentation data includes:
performing augmentation processing on the original image by adopting a second type of processing mode to obtain second augmentation data; the second type of processing means includes at least one of clipping, exposure and occlusion.
6. The method of any one of claims 2 to 4, wherein the pre-trained model is constructed based on a computer vision model;
the invoking the pre-training model to classify the first augmented data to obtain a first predicted value includes:
invoking the computer vision model to perform classification calculation on the first augmentation data to obtain the first predicted value;
the invoking the pre-training model to classify the second augmented data to obtain a second predicted value includes:
And calling the computer vision model to perform classification calculation on the second augmentation data to obtain the second predicted value.
7. A model training method for use in a master node of federal learning, the method comprising:
receiving at least two groups of adjusted model parameters sent by at least two participating nodes, wherein the adjusted model parameters are obtained by adjusting model parameters of a pre-training model by the participating nodes based on label-free data, the label-free data comprise first augmentation data and second augmentation data which are obtained by carrying out augmentation processing based on the same original data, characteristic differences exist between the first augmentation data and the second augmentation data, the pre-training model is obtained by the master node through pre-training with labeled data, and the original data comprise images, labels, articles or news;
performing federal learning on the at least two groups of adjusted model parameters to obtain global model parameters;
and issuing the global model parameters to the at least two participating nodes so as to update the global model parameters into a pre-training model in the at least two participating nodes to obtain a classification model.
8. The method of claim 7, wherein federally learning the at least two sets of adjusted model parameters to obtain global model parameters comprises:
and calculating the average value of the at least two groups of adjusted model parameters to obtain the global model parameters.
9. The method of claim 7, wherein the training process of the pre-training model comprises:
acquiring a machine learning model and the tagged data, wherein the tagged data is marked with a sample tag value;
invoking the machine learning model to classify the tagged data to obtain a sample classification value;
and calling a loss function to adjust model parameters in the machine learning model based on the sample classification value and the sample label value to obtain the pre-training model.
10. A model training apparatus, the apparatus being located at a federally learned participating node, the apparatus comprising:
the acquisition module is used for acquiring original data in the participating node and a pre-training model, wherein the pre-training model is obtained by a master node through pre-training with labeled data, and the original data comprises images or labels or articles or news;
The amplifying module is used for carrying out amplifying treatment on the original data to obtain first amplifying data and second amplifying data, and characteristic differences exist between the first amplifying data and the second amplifying data;
the parameter adjusting module is used for adjusting model parameters in the pre-training model based on the first augmentation data and the second augmentation data and reporting the adjusted model parameters to the main node;
and the updating module is used for receiving global model parameters sent by the master node, updating the global model parameters to the pre-training model to obtain a classification model, wherein the global model parameters are obtained by performing federal learning calculation by the master node based on the adjusted model parameters reported by at least two participating nodes.
11. A model training apparatus, the apparatus being located at a master node of federal learning, the apparatus comprising:
the system comprises a receiving module, a pre-training module and a processing module, wherein the receiving module is used for receiving at least two groups of adjusted model parameters sent by at least two participating nodes, the adjusted model parameters are obtained by the participating nodes by adjusting model parameters of a pre-training model based on label-free data, the label-free data comprise first augmentation data and second augmentation data which are obtained by carrying out augmentation processing based on the same original data, characteristic differences exist between the first augmentation data and the second augmentation data, the pre-training model is obtained by the main node by adopting label data to be pre-trained, and the original data comprise images, labels, articles or news;
The learning module is used for performing federal learning on the at least two groups of adjusted model parameters to obtain global model parameters;
and the sending module is used for sending the global model parameters to the at least two participating nodes so as to update the global model parameters to the pre-training model in the at least two participating nodes to obtain a classification model.
12. An electronic device comprising a processor, and a memory coupled to the processor, and program instructions stored on the memory, which when executed by the processor implement the model training method of any one of claims 1 to 6, or the model training method of any one of claims 7 to 9.
13. A computer readable storage medium having stored therein program instructions, which when executed by a processor, implement the model training method of any one of claims 1 to 6, or the model training method of any one of claims 7 to 9.
CN202110461573.4A 2021-04-27 2021-04-27 Model training method, device, equipment and storage medium Active CN113159329B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110461573.4A CN113159329B (en) 2021-04-27 2021-04-27 Model training method, device, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN113159329A CN113159329A (en) 2021-07-23
CN113159329B true CN113159329B (en) 2023-10-31

Family

ID=76871548






Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant