WO2023097602A1 - Method, apparatus, device and storage medium for inferring attributes of collaborative training data - Google Patents

Method, apparatus, device and storage medium for inferring attributes of collaborative training data

Info

Publication number
WO2023097602A1
Authority
WO
WIPO (PCT)
Prior art keywords
model
gradient
training
data
attribute
Prior art date
Application number
PCT/CN2021/135055
Other languages
English (en)
French (fr)
Inventor
王艺
杨灏鑫
李斌
Original Assignee
东莞理工学院 (Dongguan University of Technology)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 东莞理工学院 (Dongguan University of Technology)
Priority to PCT/CN2021/135055
Priority to CN202180004174.3A (published as CN114287009A)
Publication of WO2023097602A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00 Computing arrangements using knowledge-based models
    • G06N5/04 Inference or reasoning models
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/94 Hardware or software architectures specially adapted for image or video understanding
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/27 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
    • G10L25/30 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique using neural networks

Definitions

  • The present application relates to the technical field of machine learning, and in particular to a method, apparatus, computing device, and storage medium for inferring attributes of collaborative training data.
  • In centralized training, a central server collects the data required for training and then trains the model centrally. Distributed training (also called collaborative training) does not need to collect the data: the distributed training participants train the model on their local devices (hereinafter referred to as participating devices) using their local data, and then send the training gradients or model parameter information to the central server for aggregation, so that the same model is trained in a distributed manner.
  • The data distribution of the training participants is often unbalanced, which introduces a certain bias into each locally trained model and in turn degrades the performance of the collaboratively trained model.
  • To maximize the performance of a model, the data distribution of its application scenario needs to be similar to that of its training data.
  • Knowing the statistical properties of the training data also makes it possible to deploy the model to more suitable scenarios.
  • In some existing methods, the assumed iterative-update setting (a single sample or a very small batch per update) does not match the general case of collaborative training; other attribute inference methods based on gradient updates cannot obtain the data attributes of a single training sample within a batch, and their inference is less effective.
  • The purpose of the embodiments of the present application is to provide a method, apparatus, computing device and storage medium for inferring attributes of collaborative training data, so as to solve the technical problem in the prior art of reconstructing the sample data of collaborative training such that attribute inference can be performed on each individual sample, and to overcome the current limitation that inference can only be performed on a single sample or an extremely small batch of samples updated by a participating device.
  • An embodiment of the present application provides a method for inferring attributes of collaborative training data, which is applied to the central server of distributed collaborative model training.
  • The method includes: distributing a pre-trained shared model to the participating devices of distributed collaborative training, so that the participating devices train the shared model using sample data; obtaining a first gradient uploaded by a participating device, the first gradient being the gradient of the model loss with respect to the model parameters, calculated when the participating device performs model training; reconstructing, based on the shared model, the deep features of the sample data according to the first gradient; using the shared model to extract the deep features of auxiliary data with attribute labels and training an attribute inference model, wherein the shared model has been updated through several iterations of collaborative training; and performing attribute inference on the reconstructed deep features according to the trained attribute inference model.
  • Reconstructing, based on the shared model, the deep features of the sample data according to the first gradient includes: randomly initializing a first deep feature to be optimized; inputting the first deep feature into the shared model to obtain a second gradient; and
  • optimizing the first deep feature by minimizing the difference between the first gradient and the second gradient.
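  • The shared model is a convolutional neural network model.
  • The shared model includes a feature extractor and a classifier $f_c$.
  • The feature extractor includes $(n+1)$ convolution blocks.
  • Inputting the first deep feature into the shared model to obtain the second gradient includes: inputting the first deep feature into the last convolution block $f_{n+1}$ of the feature extractor, then inputting the feature $E(X)$ output by $f_{n+1}$ into the classifier $f_c$; and calculating, respectively, the gradient $\hat{g}_{n+1}$ of the loss function with respect to the parameters of $f_{n+1}$ and the gradient $\hat{g}_c$ of the loss function with respect to the parameters of $f_c$.
  • The second gradient includes the gradients $\hat{g}_{n+1}$ and $\hat{g}_c$.
  • The first deep feature is a data pair $(\hat{x}, \hat{y})$, where $\hat{x}$ is the deep feature to be reconstructed and $\hat{y}$ is a pseudo-label.
  • The gradients $\hat{g}_{n+1}$ and $\hat{g}_c$ are calculated as
    $$\hat{g}_{n+1} = \frac{\partial \ell\big(f_c(f_{n+1}(\hat{x})), \hat{y}\big)}{\partial \theta_{n+1}}, \qquad \hat{g}_c = \frac{\partial \ell\big(f_c(f_{n+1}(\hat{x})), \hat{y}\big)}{\partial \theta_c},$$
    where $\ell$ is the cross-entropy loss function, and $\theta_{n+1}$ and $\theta_c$ denote the parameters of $f_{n+1}$ and $f_c$.
  • Optimizing the first deep feature includes updating $(\hat{x}, \hat{y})$ according to
    $$\hat{x}' = \hat{x} - \eta \frac{\partial \mathcal{L}}{\partial \hat{x}}, \qquad \hat{y}' = \hat{y} - \eta \frac{\partial \mathcal{L}}{\partial \hat{y}},$$
    where $\hat{x}'$ and $\hat{y}'$ are the optimized values of $\hat{x}$ and $\hat{y}$ obtained while minimizing the objective function $\mathcal{L}$, and $\eta$ is the learning rate.
  • The hyperparameter $\lambda$ of the objective function $\mathcal{L}$ and the learning rate $\eta$ are set to the same value.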
  • The first gradient is obtained as follows: the participating device randomly samples a first sample set for model training and calculates the gradient, with respect to the model parameters, of the back-propagated model loss corresponding to the first sample set.
  • Reconstructing, based on the shared model, the deep features of the sample data according to the first gradient includes: reconstructing, based on the shared model, the deep features of the first sample set according to the first gradient.
  • The sample data and the auxiliary data are images or speech.
  • An embodiment of the present application further provides an apparatus for inferring attributes of collaborative training data, which is applied to the central server of distributed collaborative model training.
  • The apparatus includes: a distribution module, configured to distribute a pre-trained shared model to the participating devices of distributed collaborative training, so that the participating devices train and iteratively update the shared model using batches of local sample data; an obtaining module, configured to obtain the first gradient uploaded by a participating device, the first gradient being the gradient of the model loss with respect to the model parameters, calculated by the participating device during model training; a reconstruction module, configured to reconstruct, based on the shared model, the deep features of the sample data according to the first gradient; a training module, configured to use the current shared model to extract the deep features of auxiliary data with attribute labels and to train an attribute inference model, wherein the shared model is obtained through several iterations of collaborative training; and an inference module, configured to perform data attribute inference on a single local training sample of the participating device according to the trained attribute inference model and the reconstructed deep features.
  • An embodiment of the present application further provides a computing device, including a processor, a memory, a communication interface, and a communication bus, where the processor, the memory, and the communication interface communicate with one another through the communication bus; the memory is configured to store at least one executable instruction, and the executable instruction causes the processor to perform the operations corresponding to the above method for inferring attributes of collaborative training data.
  • An embodiment of the present application further provides a computer-readable storage medium storing at least one executable instruction, and the executable instruction causes a processor to perform the operations corresponding to the above method for inferring attributes of collaborative training data.
  • In the embodiments of the present application, the deep features of the sample data are reconstructed according to the gradient and the shared model, and the deep features of the auxiliary data with attribute labels are used to train an attribute inference model. Finally, attribute inference is performed on the reconstructed deep features according to the trained attribute inference model.
  • The redundant features contained in the reconstructed deep features can be used for additional attribute inference. The relevant attributes of each sample can be inferred without being affected by the batch size of the sample data that the participating devices use in each training update; the performance is stable even for large batches of sample data, and the attributes of a single training sample can be inferred.
  • FIG. 1 is a gender feature map of the sample data of a face classification model;
  • FIG. 2 is a schematic diagram of an application scenario of an embodiment of the present application;
  • FIG. 3 is a flowchart of a method for inferring attributes of collaborative training data provided by an embodiment of the present application;
  • FIG. 4 is a schematic structural diagram of a shared model;
  • FIG. 5 is a statistical chart of the success rate of reconstructing deep features under different sample data batch sizes for the present application and Related Technique 1;
  • FIG. 6 is a structural diagram of an apparatus for inferring attributes of collaborative training data provided by an embodiment of the present application;
  • FIG. 7 is a schematic structural diagram of a computing device provided by an embodiment of the present application.
  • In centralized training, a central server collects the data required for training and then trains the model centrally. Distributed training (also called collaborative training or collaborative learning) means that multiple participants jointly train the same machine learning model using their own local data. In this process, the participants do not need to collect data or exchange their local data; instead, each participant trains the model on its local device using its local data and then sends the training gradient or model parameter information to the central server for aggregation. This is equivalent to the participants exchanging the gradient information used to update the model parameters, so that the same model is trained in a distributed manner.
  • Because participants do not need to upload their local data, collaborative training protects data privacy and offers a high level of data security.
  • However, the data distribution of the training participants is often unbalanced.
  • For example, the ratio of men to women in the data of different training participants may differ, which biases the locally trained models and degrades the performance of the collaboratively trained model.
  • Counting the ratio of men to women in each participant's local data makes it possible to add constraints to the local model based on that participant's data distribution, thereby improving model performance.
  • In addition, to maximize the performance of a model, the data distribution of its application scenario needs to be similar to that of its training data.
  • Knowing the statistical properties of the training data also makes it possible to deploy the model to more suitable scenarios. For example, if most participants of a collaboratively trained face recognition model hold data of young people, it is not appropriate to deploy the model in an application whose users are mostly elderly people. Once the attributes of the training data have been analyzed statistically, the model can be deployed to a more suitable scenario, or deployed after fine-tuning.
  • The features extracted by models trained on different learning tasks generalize to some extent; that is, features extracted for task one can be applied to the learning of task two.
  • To improve model performance and deploy the model to more suitable scenarios, it is necessary to infer the distribution and related attributes of the training data during collaborative training. Deep features encode not only information related to the main task of collaborative training but also other additional information, so they can be used to make relevant inferences about the data.
  • In Related Technique 1, the attributes of the data are inferred using the intermediate-layer features obtained by forward propagation through the model, or the probabilities of the final output.
  • This method takes data with attribute labels, obtains their features through forward propagation of the model or the probabilities output by the model, and uses this information to train an attribute inference classifier that infers the relevant attributes of the data.
  • This data attribute inference method is mostly applied to machine learning as a service (Machine Learning as a Service, MLaaS). In this scenario, data are used to query an already trained model; the participants' data are not used to update the model parameters or to redeploy the model. In addition, such methods generally need to modify the training process of the model so that the intermediate-layer output or the final output of the model encodes information related to data attributes.
  • In Related Technique 2, the gradients of the model during back-propagation are used directly to infer data attributes.
  • This method inputs labeled data into the model, calculates the loss gradients corresponding to the data, and uses the gradient information directly to train an attribute inference classifier that infers the relevant attributes of the data.
  • However, most training uses mini-batches: in one training step, multiple data items are input and the average gradient over those data items is calculated.
  • The gradients distributed in collaborative training are therefore weighted averages of the gradients of multiple data items. Consequently, such methods can only determine the average attribute of the data in a whole batch, not the attribute of a single data point.
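The averaging problem can be seen in a few lines. The plain-Python sketch below uses hypothetical gradient vectors and equal weights (a simplification of the weighted average used in practice): two mini-batches with very different per-sample gradients produce the same batch gradient, so a classifier that only sees uploaded gradients cannot tell which sample carried which attribute.

```python
from typing import List, Sequence

Vector = List[float]


def batch_gradient(per_sample_grads: Sequence[Vector]) -> Vector:
    """Average per-sample gradient vectors, as in mini-batch training."""
    n = len(per_sample_grads)
    dim = len(per_sample_grads[0])
    return [sum(g[j] for g in per_sample_grads) / n for j in range(dim)]


# Batch A: two samples with very different gradients (e.g. different attributes).
batch_a = [[1.0, 0.0], [0.0, 1.0]]
# Batch B: two samples with identical gradients.
batch_b = [[0.5, 0.5], [0.5, 0.5]]

print(batch_gradient(batch_a))  # [0.5, 0.5]
print(batch_gradient(batch_b))  # [0.5, 0.5]
```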
  • In Related Technique 3, the gradients uploaded by the sub-models during collaborative training are used to reconstruct the original training data.
  • This method inputs randomly initialized training data into the model, calculates the loss gradient corresponding to these data, and optimizes the data by minimizing the gap between this loss gradient and the uploaded gradient, thereby reconstructing the original training data; the reconstructed training data are then used for attribute inference.
  • However, this method is strongly affected by the model structure and the batch size of the training data, which degrades the reconstruction and ultimately makes attribute inference on the reconstructed data inaccurate.
  • The embodiments of the present application propose a solution that uses the gradient information distributed during collaborative training to reconstruct the deep features of the training data, and uses the reconstructed deep features to infer additional information about the data. The inferred distribution and related attributes of the training data can then be used to adjust the parameter settings of the training process and to better deploy the trained model to real scenarios.
  • Related Technique 2 above infers data attributes from weighted-average gradient information, so it can only infer the average attribute of a batch and cannot pinpoint the attribute of a specific data item.
  • In contrast, the embodiments of the present application use the gradient to reconstruct the deep feature corresponding to each data point and then infer the attributes of the data from the reconstructed deep features, so the attributes of a specific data point can be inferred accurately.
  • Related Technique 3 above is strongly affected by the model structure and the batch size of the training data, which degrades the reconstruction and ultimately makes attribute inference on the reconstructed data inaccurate.
  • In contrast, the embodiments of the present application use only a few sub-blocks of the model, so the model structure involved is simpler, and reconstructing deep features is a simpler task than reconstructing the original data.
  • Therefore, reconstructing deep features with the reconstruction method provided by the embodiments of the present application avoids the influence of the model structure and performs well and stably with large batches of sample data.
  • FIG. 1 is a gender feature map of the sample data of a face classification model.
  • The t-distributed stochastic neighbor embedding (t-SNE) algorithm maps the high-dimensional features to two dimensions, and the coordinates are then normalized to (0, 1).
  • The horizontal and vertical axes are the normalized coordinates and have no specific meaning.
  • The main task of training an ordinary face classification model is to distinguish people's identities; gender information is not provided during model training.
  • Nevertheless, certain data attributes can be inferred from the deep features, which confirms that the features of a model can be used for data attribute inference.
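The normalization step behind FIG. 1 can be sketched as follows. This assumes the 2-D t-SNE coordinates have already been computed by an off-the-shelf t-SNE implementation and that each axis is min-max scaled independently; `normalize_coordinates` and the sample points are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def normalize_coordinates(points: Sequence[Point]) -> List[Point]:
    """Min-max scale each axis of 2-D points to the unit interval."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    # Guard against a zero range (all points share one coordinate).
    x_span = (x_max - x_min) or 1.0
    y_span = (y_max - y_min) or 1.0
    return [((x - x_min) / x_span, (y - y_min) / y_span) for x, y in points]


embedded = [(-12.0, 4.0), (3.0, -6.0), (18.0, 14.0)]  # hypothetical t-SNE output
print(normalize_coordinates(embedded))  # [(0.0, 0.5), (0.5, 0.0), (1.0, 1.0)]
```

Because the scaling is affine per axis, it preserves the relative layout of the clusters, which is why the normalized axes carry no specific meaning.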
  • The application scenario of the embodiments of the present application is collaborative training in deep learning.
  • The goal of collaborative training is to jointly train a model using the local data of each participant, without the training data leaving the participants' local devices.
  • The deep learning model can be any of various neural network models, such as a convolutional neural network (CNN) model. Deep learning models can be used for data processing, such as feature extraction and classification in image processing, and further for face recognition, object recognition, and the like, where the objects to be recognized can be animals, plants, objects, and so on.
  • FIG. 2 is a schematic diagram of an application scenario of an embodiment of the present application.
  • The central server distributes the shared model to be collaboratively trained to the participating devices (also called training devices) of the collaborative training participants, and the participating devices train the model using locally stored training data. The participating devices send the training gradients or model parameter information to the central server for aggregation, and model training is eventually completed.
  • The shared model is generally pre-trained on a public dataset on the server side.
  • The public dataset is generally assumed to contain samples different from, but with a data distribution similar to, the data of all participating devices in the collaborative training.
  • FIG. 3 is a flowchart of a method for inferring attributes of collaborative training data provided by an embodiment of the present application. The method is applied to the central server of distributed collaborative model training. As shown in the figure, the method includes the following steps:
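  • S11: Distribute the pre-trained shared model to the participating devices of distributed collaborative training, so that the participating devices train and iteratively update the shared model using batches of local sample data;
  • S12: Obtain the first gradient uploaded by a participating device, the first gradient being the gradient of the model loss with respect to the model parameters, calculated when the participating device performs model training;
  • S13: Reconstruct, based on the shared model, the deep features of the sample data according to the first gradient;
  • S14: Use the shared model to extract the deep features of auxiliary data with attribute labels and train an attribute inference model;
  • S15: Perform data attribute inference on a single local training sample of the participating device according to the trained attribute inference model and the reconstructed deep features.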
  • Specifically, the central server distributes the first shared model (that is, the initialized model) to all participating devices.
  • Each participating device randomly selects a batch of sample data from its locally stored sample data for model training.
  • The participating devices then send the updated model parameters to the central server.
  • After the central server aggregates the received parameters, an optimized second shared model is obtained.
  • The central server then distributes the second shared model to all participating devices.
  • The participating devices continue model training. In each subsequent round, every participating device randomly selects a new batch of local sample data for training. After multiple training iterations, a converged, trained model is finally obtained.
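The round structure above can be sketched as follows. This is an illustration under stated assumptions, not the patent's implementation: `local_update` is a hypothetical stand-in for local training (one gradient step on a quadratic loss), the server averages parameters weighted by each device's sample count in the style of FedAvg, and random batch sampling is omitted so that the result is deterministic.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def aggregate(updates: Sequence[Tuple[Vector, int]]) -> Vector:
    """Average device parameters, weighting each device by its sample count."""
    total = sum(count for _, count in updates)
    dim = len(updates[0][0])
    return [sum(params[j] * count for params, count in updates) / total
            for j in range(dim)]


def local_update(weights: Vector, local_data: Sequence[Vector], lr: float = 0.5) -> Vector:
    """Hypothetical local training: one gradient step on 0.5 * ||w - mean(data)||^2."""
    dim = len(weights)
    mean = [sum(x[j] for x in local_data) / len(local_data) for j in range(dim)]
    return [w - lr * (w - m) for w, m in zip(weights, mean)]


def run_rounds(initial: Vector, device_data: Sequence[Sequence[Vector]], rounds: int) -> Vector:
    shared = list(initial)
    for _ in range(rounds):
        # Each device trains the current shared model on its own local data ...
        updates = [(local_update(shared, data), len(data)) for data in device_data]
        # ... and the server aggregates the returned parameters into the next shared model.
        shared = aggregate(updates)
    return shared


devices = [
    [[0.0, 0.0], [2.0, 2.0]],              # device 1: 2 samples, mean (1, 1)
    [[4.0, 0.0], [4.0, 0.0], [4.0, 0.0]],  # device 2: 3 samples, mean (4, 0)
]
shared = run_rounds([0.0, 0.0], devices, rounds=50)
print(shared)  # close to [2.8, 0.4]
```

With this toy loss, each round halves the distance between the shared parameters and the sample-weighted mean of the device means, (2·(1, 1) + 3·(4, 0)) / 5 = (2.8, 0.4), so the loop converges geometrically.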
  • The central server initializes the model to be trained and distributes the initialized shared model to the participating devices of distributed collaborative training.
  • Each participating device locally stores sample data for training the model; the sample data stored on different participating devices usually differ and are unbalanced.
  • The model to be trained can be an image recognition model or a speech recognition model, and the training data are correspondingly images or speech.
  • The central server distributes the updated shared model to the participating devices.
  • FIG. 4 is a schematic structural diagram of a shared model.
  • In this example, the shared model is a convolutional neural network model for image recognition, and it includes a feature extractor $E$ and a classifier $C$.
  • The feature extractor $E$ includes $(n+1)$ convolution blocks, denoted $f_1, f_2, \ldots, f_n, f_{n+1}$, and is used to extract the feature $E(X)$ of the input data $X$.
  • The classifier $C$ includes a convolution block $f_c$, which classifies the extracted feature $E(X)$ according to the task the model is built for. A block such as $f_c$ can also be a binary classification model, trained on deep features extracted from the central server's data and then used to predict the attributes of the reconstructed deep features.
  • The sample data $X$ (an image) is input to the feature extractor $E$, and the deep feature $E(X)$ corresponding to $X$ is obtained through the convolution operations of the $(n+1)$ convolution blocks.
  • The deep feature $E(X)$ is input to the classifier $C$ to obtain the final image recognition result.
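The data flow of FIG. 4 can be sketched with plain functions. The blocks below are hypothetical toy functions standing in for the convolution blocks $f_1, f_2, f_3$ (so $n = 2$), and an arg-max stands in for the classifier $C$; the point is only that $E$ applies the blocks in order and $C$ maps $E(X)$ to a recognition result.

```python
from functools import reduce
from typing import Callable, List, Sequence

Vector = List[float]
Block = Callable[[Vector], Vector]


def relu(v: Vector) -> Vector:
    return [max(0.0, a) for a in v]


def make_feature_extractor(blocks: Sequence[Block]) -> Block:
    """Return E = f_{n+1} o ... o f_1, i.e. apply the blocks in order."""
    def extract(x: Vector) -> Vector:
        return reduce(lambda h, block: block(h), blocks, x)
    return extract


# Hypothetical toy blocks standing in for convolution blocks f_1, f_2, f_3 (n = 2).
def f1(v: Vector) -> Vector:
    return relu([v[0] + v[1], v[0] - v[1]])


def f2(v: Vector) -> Vector:
    return relu([2.0 * v[0], v[1] + 1.0])


def f3(v: Vector) -> Vector:  # plays the role of f_{n+1}
    return [v[0] - v[1], v[0] + v[1]]


def classifier(e: Vector) -> int:
    """Stand-in for C: predict the index of the largest score."""
    return max(range(len(e)), key=lambda i: e[i])


E = make_feature_extractor([f1, f2, f3])
features = E([1.0, 2.0])                # deep feature E(X)
print(features, classifier(features))  # [5.0, 7.0] 1
```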
  • Each participating device receives the shared model sent by the central server, randomly samples a batch of data from its locally stored data, trains locally, calculates the back-propagated loss gradient $g$ corresponding to the sampled data, and shares $g$ with the central server for collaborative model training.
  • The loss gradient $g$ includes $g_c$ and $g_{n+1}$, where $g_c$ is the gradient of the loss function with respect to the parameters of $f_c$ and $g_{n+1}$ is the gradient of the loss function with respect to the parameters of $f_{n+1}$, both calculated by the participating device; both are true gradients. They provide useful information for the subsequent attribute inference, especially when large batches of sample data are used for training updates.
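To illustrate why a classifier gradient such as $g_c$ carries information about the deep feature, the sketch below takes the simplest case, which is an assumption rather than the patent's setting: a single sample and a fully connected classifier layer with bias. There $\partial\ell/\partial W = (p - y)\,h^\top$ and $\partial\ell/\partial b = p - y$, so the classifier input $h$ can be read off exactly. With a mini-batch the uploaded gradient is an average and this shortcut no longer holds, which is one reason an optimization-based reconstruction is needed. All names and values are hypothetical.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[Vector]


def softmax(z: Vector) -> Vector:
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def classifier_gradients(W: Matrix, b: Vector, h: Vector, label: int) -> Tuple[Matrix, Vector]:
    """Cross-entropy gradients w.r.t. W and b of a linear layer, for one sample."""
    logits = [sum(w_ij * h_j for w_ij, h_j in zip(row, h)) + b_i for row, b_i in zip(W, b)]
    p = softmax(logits)
    delta = [p_i - (1.0 if i == label else 0.0) for i, p_i in enumerate(p)]  # dloss/dlogits
    grad_W = [[d * h_j for h_j in h] for d in delta]
    return grad_W, delta  # dloss/db equals dloss/dlogits


def recover_feature(grad_W: Matrix, grad_b: Vector) -> Vector:
    """Read the classifier input back out of its gradient (single-sample case)."""
    i = max(range(len(grad_b)), key=lambda k: abs(grad_b[k]))  # most stable row
    return [g / grad_b[i] for g in grad_W[i]]


W = [[0.2, -0.1, 0.4], [-0.3, 0.5, 0.1]]
b = [0.0, 0.1]
h = [1.5, -0.7, 2.0]  # the "private" deep feature
gW, gb = classifier_gradients(W, b, h, label=1)
print(recover_feature(gW, gb))  # approximately [1.5, -0.7, 2.0]
```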
  • The first gradient is obtained as follows: the participating device randomly samples a first sample set for model training and calculates the gradient, with respect to the model parameters, of the back-propagated model loss corresponding to the first sample set.
  • The data in the first sample set may be a mini-batch, which improves computation speed and efficiency.
  • The training data of the participating devices always remain local and are not shared with other participating devices or the server, which protects the privacy and security of the training data.
  • A mini-batch contains multiple samples; the number of samples depends on the batch size.
  • The content of each sample depends on the target model of collaborative training. For example, if the purpose of collaborative training is to jointly train a face recognition model, each training sample consists of a face image and its corresponding label. The content of the labels depends on the purpose of the collaboratively trained model.
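  • S13 may further include:
  • S131: Randomly initialize the first deep feature to be optimized;
  • S132: Input the first deep feature into the shared model to obtain the second gradient;
  • S133: Minimize the gap between the first gradient and the second gradient to optimize the first deep feature.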
  • The shared model may be a convolutional neural network model that includes a feature extractor and a classifier $f_c$, where the feature extractor includes $(n+1)$ convolution blocks.
  • The first deep feature is the data pair $(\hat{x}, \hat{y})$, an optimizable data pair in which $\hat{x}$ is the deep feature to be reconstructed and $\hat{y}$ is a pseudo-label. Since the true labels of the sample data (raw data) are unknown, an optimizable pseudo-label must be provided to compute the cross-entropy loss. $(\hat{x}, \hat{y})$ is initially obtained by random initialization; after optimization, the reconstructed $\hat{x}$ resembles the original deep feature $x$.
  • The embodiments of the present application use only the information of the last convolution block $f_{n+1}$ and the final classifier $f_c$.
  • The data pair $(\hat{x}, \hat{y})$ is input into the sub-model consisting of $f_{n+1}$ and $f_c$, and the gradients of the loss function with respect to the parameters of $f_{n+1}$ and of $f_c$ are calculated respectively.
  • S132 includes:
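  • The second gradient includes the gradients $\hat{g}_{n+1}$ and $\hat{g}_c$.
  • The gradients $\hat{g}_{n+1}$ and $\hat{g}_c$ are calculated as
    $$\hat{g}_{n+1} = \frac{\partial \ell\big(f_c(f_{n+1}(\hat{x})), \hat{y}\big)}{\partial \theta_{n+1}}, \qquad \hat{g}_c = \frac{\partial \ell\big(f_c(f_{n+1}(\hat{x})), \hat{y}\big)}{\partial \theta_c},$$
    where $\ell$ is the cross-entropy loss function, and $\theta_{n+1}$ and $\theta_c$ denote the parameters of $f_{n+1}$ and $f_c$.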
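The second gradient can be computed for a toy sub-model as sketched below. The sketch assumes $f_{n+1}(z) = \tanh(Az)$ as a stand-in for the last convolution block, a bias-free linear layer $f_c(h) = Wh$ as the classifier, and a soft pseudo-label obtained as $\mathrm{softmax}(\hat{y})$. All of these are illustrative choices, and back-propagation is written out by hand so that only the standard library is needed.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def softmax(z: List[float]) -> List[float]:
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def second_gradient(A: Matrix, W: Matrix, x_hat: List[float],
                    y_hat: List[float]) -> Tuple[float, Matrix, Matrix]:
    """Return (loss, g_hat_{n+1}, g_hat_c) for the toy sub-model f_c(f_{n+1}(x_hat))."""
    # Forward pass. f_{n+1}: h = tanh(A x_hat); f_c: logits = W h (both hypothetical).
    pre = [sum(a * x for a, x in zip(row, x_hat)) for row in A]
    h = [math.tanh(v) for v in pre]
    p = softmax([sum(w * hj for w, hj in zip(row, h)) for row in W])
    t = softmax(y_hat)  # soft pseudo-label derived from y_hat (assumption)
    loss = -sum(ti * math.log(pi) for ti, pi in zip(t, p))  # cross-entropy
    # Backward pass. For softmax + cross-entropy, dloss/dlogits = p - t.
    d_logits = [pi - ti for pi, ti in zip(p, t)]
    g_c = [[d * hj for hj in h] for d in d_logits]  # dloss/dW
    d_h = [sum(W[k][j] * d_logits[k] for k in range(len(W))) for j in range(len(h))]
    d_pre = [dh * (1.0 - hj * hj) for dh, hj in zip(d_h, h)]  # tanh'(v) = 1 - tanh(v)^2
    g_n1 = [[d * x for x in x_hat] for d in d_pre]  # dloss/dA
    return loss, g_n1, g_c


A = [[0.3, -0.2], [0.1, 0.4], [-0.5, 0.2]]  # parameters of f_{n+1} (3 x 2)
W = [[0.6, -0.1, 0.2], [-0.4, 0.3, 0.5]]    # parameters of f_c (2 x 3)
x_hat, y_hat = [0.8, -1.1], [0.2, -0.3]
loss, g_n1, g_c = second_gradient(A, W, x_hat, y_hat)
print(loss, g_n1, g_c)
```

A central finite-difference check on any single entry of `A` or `W` is a quick way to confirm the hand-written backward pass.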
  • Minimizing the gap between the first gradient and the second gradient in S133 may further include:
  • constructing an objective function $\mathcal{L}(\hat{x}, \hat{y})$ that measures the gap between the first gradient and the second gradient and is formed from the distances $d(g_{n+1}, \hat{g}_{n+1})$ and $d(g_c, \hat{g}_c)$,
  • where $\lambda$ is a hyperparameter that weights the two distance terms;
  • $g_{n+1}$ and $g_c$ are the first gradient uploaded by the participating device, and $d$ is the distance function that measures the difference between two gradients $g$ and $\hat{g}$; the width $\sigma$ of its Gaussian kernel satisfies
    $$\sigma^2 = \mathrm{Var}(g),$$
  • where $\mathrm{Var}(g)$ is the variance of the gradient $g$.
  • The distance function $d$ involves two computations: the first is the cosine similarity between $g$ and $\hat{g}$,
  • and the second is a Gaussian kernel function of $g$ and $\hat{g}$.
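The text names the two ingredients of $d$ but not how they are combined. The sketch below assumes $d(g, \hat{g}) = \big(1 - \cos(g, \hat{g})\big) + \big(1 - \exp(-\lVert g - \hat{g}\rVert^2 / 2\sigma^2)\big)$ with $\sigma^2 = \mathrm{Var}(g)$; `gradient_distance` is a hypothetical name. Under this assumption, $d$ is zero for identical gradients and grows as the gradients diverge in direction or magnitude.

```python
import math
from typing import Sequence


def gradient_distance(g: Sequence[float], g_hat: Sequence[float]) -> float:
    """Hypothetical d(g, g_hat) = (1 - cosine similarity) + (1 - Gaussian kernel).

    Gradients are passed as flat vectors; the kernel width follows sigma^2 = Var(g).
    """
    dot = sum(a * b for a, b in zip(g, g_hat))
    norm = math.sqrt(sum(a * a for a in g)) * math.sqrt(sum(b * b for b in g_hat))
    cosine = dot / norm if norm > 0 else 0.0
    mean = sum(g) / len(g)
    var = sum((a - mean) ** 2 for a in g) / len(g)  # population variance of g
    sq_dist = sum((a - b) ** 2 for a, b in zip(g, g_hat))
    if var > 0:
        kernel = math.exp(-sq_dist / (2.0 * var))
    else:
        kernel = 1.0 if sq_dist == 0 else 0.0
    return (1.0 - cosine) + (1.0 - kernel)


g = [1.0, 2.0, 3.0]
print(gradient_distance(g, g))                     # ~0 for identical gradients
print(gradient_distance(g, [1.1 * a for a in g]))  # small: same direction, slightly scaled
print(gradient_distance(g, [-a for a in g]))       # large: opposite direction
```

The cosine term is insensitive to scale, while the kernel term penalizes magnitude differences relative to the spread of the true gradient; combining them is one plausible reason for using both.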
  • Optimizing the first deep feature in S133 may further include updating $(\hat{x}, \hat{y})$ by gradient descent on $\mathcal{L}$:
    $$\hat{x}' = \hat{x} - \eta \frac{\partial \mathcal{L}}{\partial \hat{x}}, \qquad \hat{y}' = \hat{y} - \eta \frac{\partial \mathcal{L}}{\partial \hat{y}},$$
  • where $\eta$ is the learning rate.
  • The hyperparameter $\lambda$ and the learning rate $\eta$ can be set to the same value; for example, both can be set to 0.1. These values are generally tuned empirically and can also be set to other values.
  • The number of optimization iterations can be set empirically, for example to 5000. A larger number brings the optimization result closer to the true value but increases the time cost; a smaller number reduces the time cost, but the result may not be as close to the true value.
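S131 to S133 can be combined into a toy end-to-end sketch under several assumptions. The sub-model is $\mathrm{softmax}(W\tanh(A\hat{x}))$ with a soft pseudo-label $\mathrm{softmax}(\hat{y})$; the distance $d$ is the assumed cosine-plus-Gaussian-kernel form with $\sigma^2 = \mathrm{Var}(g)$; $\lambda$ is assumed to weight the $f_{n+1}$ term. $\partial\mathcal{L}/\partial(\hat{x}, \hat{y})$ is approximated by central finite differences instead of automatic differentiation, and a simple backtracking rule, added here for robustness and not described in the text, accepts only steps that lower $\mathcal{L}$. It uses $\lambda = \eta = 0.1$ as in the example above, but only 300 iterations to keep it fast.

```python
import math
import random
from typing import List, Tuple

Vec = List[float]


def softmax(z: Vec) -> Vec:
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def submodel_grads(A, W, x_hat: Vec, y_hat: Vec) -> Tuple[Vec, Vec]:
    """Flattened (g_{n+1}, g_c) of cross-entropy for the toy sub-model W tanh(A x)."""
    h = [math.tanh(sum(a * x for a, x in zip(row, x_hat))) for row in A]
    p = softmax([sum(w * hj for w, hj in zip(row, h)) for row in W])
    d_logits = [pi - ti for pi, ti in zip(p, softmax(y_hat))]  # soft pseudo-label
    g_c = [d * hj for d in d_logits for hj in h]
    d_pre = [sum(W[k][j] * d_logits[k] for k in range(len(W))) * (1.0 - h[j] ** 2)
             for j in range(len(h))]
    g_n1 = [d * x for d in d_pre for x in x_hat]
    return g_n1, g_c


def distance(g: Vec, g_hat: Vec) -> float:
    """Assumed d: (1 - cosine similarity) + (1 - Gaussian kernel), sigma^2 = Var(g)."""
    dot = sum(a * b for a, b in zip(g, g_hat))
    norm = math.sqrt(sum(a * a for a in g)) * math.sqrt(sum(b * b for b in g_hat))
    cos = dot / norm if norm > 0 else 0.0
    mean = sum(g) / len(g)
    var = max(sum((a - mean) ** 2 for a in g) / len(g), 1e-12)
    sq = sum((a - b) ** 2 for a, b in zip(g, g_hat))
    return (1.0 - cos) + (1.0 - math.exp(-sq / (2.0 * var)))


def reconstruct(A, W, g_n1: Vec, g_c: Vec, dim: int, n_classes: int,
                lam: float = 0.1, lr: float = 0.1, steps: int = 300, seed: int = 0):
    """Gradient matching: optimize (x_hat, y_hat) so the second gradient matches the first."""
    def objective(params: Vec) -> float:
        gh_n1, gh_c = submodel_grads(A, W, params[:dim], params[dim:])
        return distance(g_c, gh_c) + lam * distance(g_n1, gh_n1)  # lam placement assumed

    rng = random.Random(seed)
    params = [rng.gauss(0.0, 1.0) for _ in range(dim + n_classes)]  # S131: random init
    start = current = objective(params)
    eps = 1e-5
    for _ in range(steps):
        grad = []
        for i in range(len(params)):  # central finite differences instead of autodiff
            up, down = params[:], params[:]
            up[i] += eps
            down[i] -= eps
            grad.append((objective(up) - objective(down)) / (2.0 * eps))
        step = lr
        while step > 1e-8:  # backtracking: accept only steps that lower the objective
            trial = [p - step * gi for p, gi in zip(params, grad)]
            value = objective(trial)
            if value < current:
                params, current = trial, value
                break
            step /= 2.0
    return params[:dim], params[dim:], start, current


A = [[0.3, -0.2], [0.1, 0.4], [-0.5, 0.2]]
W = [[0.6, -0.1, 0.2], [-0.4, 0.3, 0.5]]
# The "first gradient" is simulated from a private feature and a near one-hot label.
g_n1, g_c = submodel_grads(A, W, [0.8, -1.1], [5.0, -5.0])
x_rec, y_rec, start, end = reconstruct(A, W, g_n1, g_c, dim=2, n_classes=2)
print(f"objective: {start:.4f} -> {end:.4f}")
print("reconstructed deep feature:", x_rec)
```

In a real system the gradients would come from an automatic-differentiation framework with second-order support, and the reconstructed $\hat{x}$ need not be unique, so only the decrease of $\mathcal{L}$ is guaranteed here.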
  • After optimization, the reconstructed deep features $\hat{x}$ of the sample data are obtained,
  • and the reconstructed deep features can then be used to perform attribute inference on the sample data $X$.
  • S11-S13 do not require any change to the collaborative training process, which proceeds normally. Moreover, S11-S13 can reconstruct the deep features of every training sample of every participating device, so that subsequent attribute inference can be performed on all sample data used for training.
  • The central server stores auxiliary data with attribute labels.
  • An attribute inference model, also called an attribute classification model, identifies or classifies the attributes of data.
  • The attribute inference model is trained on the deep features extracted from the labeled auxiliary data and is then used to infer the attributes of the sample data on the participating devices.
  • The central server inputs the reconstructed deep features of the sample data into the attribute inference model, thereby inferring the attributes of the collaborative training data.
  • The attributes of all local sample data that took part in model training on the participating devices can be inferred.
  • Deep features are reconstructed from the gradients uploaded by the participating devices of collaborative training, and the attribute inference model infers the attributes of the participating devices' training data, thereby achieving attribute inference of collaborative training data.
  • The deep feature corresponding to each data item can be reconstructed accurately regardless of the mini-batch size; compared with reconstructing the input samples, the amount of data to reconstruct is smaller and efficiency is higher. For example, with the approach of reconstructing input samples, once the batch size of the sample data reaches 8, the reconstruction results can hardly be used for attribute inference.
  • The embodiments of the present application propose reconstructing deep features from gradients, which recovers the deep feature corresponding to each training sample for use in inferring data-related attributes.
  • The attribute of each data item in mini-batch training can be inferred, with improved inference accuracy.
  • Some existing methods can only infer whether a certain attribute is present in a batch of sample data, without identifying which specific sample it belongs to, or can only perform attribute inference on one sample at a time.
  • FIG. 6 is a structural diagram of an apparatus for inferring attributes of collaborative training data provided by an embodiment of the present application.
  • The data attribute inference apparatus is applied to the central server of distributed collaborative model training, and the apparatus 500 includes a distribution module 501, an acquisition module 502, a reconstruction module 503, a training module 504 and an inference module 505.
  • The distribution module 501 is configured to distribute the pre-trained shared model to the participating devices of distributed collaborative training, so that the participating devices iteratively train and update the shared model with batches of local samples;
  • the acquisition module 502 is configured to acquire the first gradient uploaded by a participating device, the first gradient being the gradient of the model loss with respect to the model parameters computed by the participating device during model training;
  • the reconstruction module 503 is configured to reconstruct the deep features of the sample data according to the first gradient based on the shared model;
  • the training module 504 is configured to extract, with the current shared model, the deep features of the auxiliary data with attribute labels and to train the attribute inference model, where the shared model is obtained through several iterations of collaborative training updates;
  • the inference module 505 is configured to perform data attribute inference on individual local training samples of a participating device according to the trained attribute inference model and the reconstructed deep features.
  • FIG. 7 is a schematic structural diagram of a computing device provided by an embodiment of the present application.
  • The computing device 600 includes a processor 601, a memory 602, a communication interface 603 and a communication bus 604, and the processor 601, the memory 602 and the communication interface 603 communicate with one another through the communication bus 604.
  • The memory 602 is a non-volatile computer-readable storage medium that can be used to store non-volatile software programs, non-volatile computer-executable programs and modules.
  • The memory 602 is used to store at least one executable instruction, which causes the processor 601 to perform operations corresponding to the above method for inferring attributes of collaborative training data.
  • An embodiment of the present application also provides a computer-readable storage medium storing at least one executable instruction, which causes a processor to perform operations corresponding to the above method for inferring attributes of collaborative training data.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Multimedia (AREA)
  • General Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Acoustics & Sound (AREA)
  • Human Computer Interaction (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Signal Processing (AREA)
  • Image Analysis (AREA)

Abstract

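A method, apparatus, computing device and storage medium for inferring attributes of collaborative training data. The method includes: distributing a shared model pre-trained by a server to participating devices of distributed collaborative training, so that the participating devices iteratively train and update the shared model using batches of local samples (S11); acquiring a first gradient uploaded by a participating device, the first gradient being the gradient of the model loss with respect to the model parameters computed when the participating device performs model training (S12); reconstructing deep features of the sample data according to the first gradient based on the updated shared model (S13); using the current shared model to extract deep features of auxiliary data with attribute labels and training an attribute inference model, where the shared model is obtained through several iterations of collaborative training updates (S14); and performing data attribute inference on individual local training samples of the participating devices according to the trained attribute inference model and the reconstructed deep features (S15).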

Description

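Method, apparatus, device and storage medium for inferring attributes of collaborative training data
Technical Field
The present application relates to the technical field of machine learning, and in particular to a method, apparatus, computing device and storage medium for inferring attributes of collaborative training data.
Background
With the rapid development of hardware and the widespread application of big data, artificial intelligence has attracted extensive attention. Deep learning, as an important data analysis tool, is widely used in biometric recognition, autonomous driving, machine vision and other fields. Deep learning training is performed either centrally or in a distributed manner. In centralized training, a central server collects the data required for training and trains centrally; distributed training (also called collaborative training) does not collect data, but instead trains the model on the local devices of the distributed training participants (hereinafter referred to as participating devices) using their local data, and then sends the training gradients or model parameters to the central server for aggregation, so that the same model is trained in a distributed manner.
During collaborative training, the data distributions of the training participants are often unbalanced, which biases the locally trained models and thus degrades the performance of the collaboratively trained model. In addition, in deep learning, a model performs best when its application scenario resembles the distribution of its training data, so statistics on the attributes of the training data also make it possible to deploy the model in more suitable scenarios. In the prior art, the updated sample data of collaborative training generally has to be reconstructed before attribute inference can be performed on each individual local sample of a participating device, and such techniques apply only when the participating device performs iterative updates with a single sample or a very small batch, which does not match the general case of collaborative training; other attribute inference techniques based on gradient updates cannot obtain the data attributes of individual training samples within a whole batch, and their inference effectiveness is poor.
Summary
The purpose of the embodiments of the present application is to provide a method, apparatus, computing device and storage medium for inferring attributes of collaborative training data, so as to solve the technical problem in the prior art that the sample data of collaborative training must be reconstructed before attribute inference can be performed on each individual sample in the training data, and to overcome the current limitation that inference can only be performed on single or very small batches of update samples from participating devices.
To solve the above technical problem, an embodiment of the present application provides a method for inferring attributes of collaborative training data, applied to a central server of distributed collaborative model training. The method includes: distributing a pre-trained shared model to participating devices of distributed collaborative training, so that the participating devices train the shared model using sample data; acquiring a first gradient uploaded by a participating device, the first gradient being the gradient of the model loss with respect to the model parameters computed when the participating device performs model training; reconstructing deep features of the sample data according to the first gradient based on the shared model; using the shared model to extract deep features of auxiliary data with attribute labels and training an attribute inference model, where the shared model is obtained through several iterations of collaborative training updates; and performing attribute inference on the reconstructed deep features according to the trained attribute inference model.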
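In some embodiments, reconstructing the deep features of the sample data according to the first gradient based on the shared model includes: randomly initializing a first deep feature to be optimized; inputting the first deep feature into the shared model to obtain a second gradient; and minimizing the gap between the first gradient and the second gradient to optimize the first deep feature.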
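In some embodiments, the shared model is a convolutional neural network model that includes a feature extractor and a classifier $f_c$, and the feature extractor includes (n+1) convolutional blocks. Inputting the first deep feature into the shared model to obtain the second gradient includes: inputting the first deep feature into the last convolutional block $f_{n+1}$ of the convolutional blocks of the feature extractor, and then inputting the feature E(X) output by the convolutional block $f_{n+1}$ into the classifier $f_c$; and separately computing the gradient $\hat{g}_{n+1}$ of the loss function with respect to the parameters of $f_{n+1}$ and the gradient $\hat{g}_c$ of the loss function with respect to the parameters of $f_c$, where the second gradient includes the gradient $\hat{g}_{n+1}$ and the gradient $\hat{g}_c$.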
In some embodiments, the first deep feature is a data pair $(\hat{x}, \hat{y})$, where $\hat{x}$ is the deep feature to be reconstructed and $\hat{y}$ is a pseudo-label.
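In some embodiments, the gradient $\hat{g}_{n+1}$ and the gradient $\hat{g}_c$ are computed as:
$$\hat{g}_{n+1} = \nabla_{f_{n+1}} \mathcal{L}_{CE}\big(f_c(f_{n+1}(\hat{x})), \hat{y}\big)$$
$$\hat{g}_c = \nabla_{f_c} \mathcal{L}_{CE}\big(f_c(f_{n+1}(\hat{x})), \hat{y}\big)$$
where $\mathcal{L}_{CE}$ is the cross-entropy loss function.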
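In some embodiments, minimizing the gap between the first gradient and the second gradient includes: minimizing an objective function $\mathcal{L}_{grad}$ so as to minimize the gap between the first gradient and the second gradient, the objective function being:
[equation image not reproduced]
where λ is a hyperparameter, $g_{n+1}$ and $g_c$ are the first gradients uploaded by the participating device, $d(g_{n+1}, \hat{g}_{n+1})$ and $d(g_c, \hat{g}_c)$ are both distance functions measuring the difference between two gradients, and the distance function d measuring the difference between two gradients g and $\hat{g}$ is:
[equation image not reproduced]
where σ² = Var(g), and Var(g) is the variance of the gradient g.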
In some embodiments, optimizing the first deep feature includes updating $(\hat{x}, \hat{y})$ according to:
$$(\hat{x}', \hat{y}') = (\hat{x}, \hat{y}) - \alpha \nabla_{(\hat{x}, \hat{y})} \mathcal{L}_{grad}$$
where $(\hat{x}', \hat{y}')$ is the optimized $(\hat{x}, \hat{y})$, that is, the value of $(\hat{x}, \hat{y})$ after minimizing the objective function $\mathcal{L}_{grad}$, and α is the learning rate.
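In some embodiments, the hyperparameter λ and the learning rate α are set to the same value.
In some embodiments, the first gradient is obtained by the participating device performing model training on a randomly sampled first sample set and computing the gradient of the back-propagated model loss corresponding to the first sample set with respect to the model parameters; reconstructing the deep features of the sample data according to the first gradient based on the shared model includes: reconstructing the deep features of the first sample set according to the first gradient based on the shared model.
In some embodiments, the sample data and the auxiliary data are images or speech.
An embodiment of the present application further provides an apparatus for inferring attributes of collaborative training data, applied to a central server of distributed collaborative model training. The apparatus includes: a distribution module, configured to distribute a pre-trained shared model to participating devices of distributed collaborative training, so that the participating devices iteratively train and update the shared model using batches of local samples; an acquisition module, configured to acquire a first gradient uploaded by a participating device, the first gradient being the gradient of the model loss with respect to the model parameters computed when the participating device performs model training; a reconstruction module, configured to reconstruct deep features of the sample data according to the first gradient based on the shared model; a training module, configured to use the current shared model to extract deep features of auxiliary data with attribute labels and train an attribute inference model, where the shared model is obtained through several iterations of collaborative training updates; and an inference module, configured to perform data attribute inference on individual local training samples of the participating devices according to the trained attribute inference model and the reconstructed deep features.
An embodiment of the present application further provides a computing device, including a processor, a memory, a communication interface and a communication bus, where the processor, the memory and the communication interface communicate with one another through the communication bus; the memory is configured to store at least one executable instruction, and the executable instruction causes the processor to perform operations corresponding to the above method for inferring attributes of collaborative training data.
An embodiment of the present application further provides a computer-readable storage medium storing at least one executable instruction, and the executable instruction causes a processor to perform operations corresponding to the above method for inferring attributes of collaborative training data.
In the embodiments of the present application, the gradient of the model loss with respect to the model parameters, computed during model training and fed back by the device, is acquired; the deep features of the sample data are reconstructed from this gradient and the shared model; an attribute inference model is trained on the deep features of auxiliary data with attribute labels; and finally attribute inference is performed on the reconstructed deep features with the trained attribute inference model. The redundant features contained in the reconstructed deep features can thus be used for additional attribute inference, and the relevant attributes of every sample can be inferred without reconstructing the input samples. The method is not affected by the batch size of the sample data used by the participating device in each training update, performs particularly well and stably with large batches, and can perform attribute inference on individual training samples.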
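Brief Description of the Drawings
One or more embodiments are illustrated by the pictures in the corresponding drawings; these illustrations do not limit the embodiments. Elements with the same reference numerals in the drawings denote similar elements, and unless otherwise stated, the figures in the drawings are not drawn to scale.
FIG. 1 is a gender feature plot of sample data of a face classification model;
FIG. 2 is a schematic diagram of an application scenario of an embodiment of the present application;
FIG. 3 is a flowchart of the method for inferring attributes of collaborative training data provided by an embodiment of the present application;
FIG. 4 is a schematic structural diagram of the shared model;
FIG. 5 is a statistical chart of the success rate of deep feature reconstruction of the present application and related art 1 under different sample batch sizes;
FIG. 6 is a structural diagram of the apparatus for inferring attributes of collaborative training data provided by an embodiment of the present application;
FIG. 7 is a schematic structural diagram of a computing device provided by an embodiment of the present application.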
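Detailed Description
To make the objectives, technical solutions and advantages of the embodiments of the present application clearer, the embodiments are described in detail below with reference to the drawings. However, a person of ordinary skill in the art will understand that many technical details are given in the embodiments to help the reader better understand the present application; the technical solutions claimed in the present application can nevertheless be implemented without these technical details and with various changes and modifications based on the following embodiments.
With the rapid development of hardware and the widespread application of big data, artificial intelligence has attracted extensive attention. Deep learning, as an important data analysis tool, is widely used in biometric recognition, autonomous driving, machine vision and other fields.
Deep learning training is performed either centrally or in a distributed manner. In centralized training, a central server collects the data required for training and trains centrally. Distributed training (also called collaborative training or collaborative learning) means that multiple participants jointly train the same machine learning model using their own local data. In this process, the data is neither collected nor exchanged; instead, each participant trains the model on its local device with its local data and then sends the training gradients or model parameters to the central server for aggregation, which amounts to the participants exchanging the gradient information used to update the model parameters, so that the same model is trained in a distributed manner. Because participants do not need to upload their local data, collaborative training preserves data privacy and offers high data security.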
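During collaborative training, the data distributions of the training participants are often unbalanced. For example, when a face recognition model is trained collaboratively, the male-to-female ratio in the data of different participants may differ, which biases the locally trained models and degrades the performance of the collaboratively trained model. Knowing the male-to-female ratio in each participant's local data makes it possible to add constraints to the local model according to its data distribution, thereby improving model performance.
In addition, in deep learning, a model performs best when its application scenario resembles the distribution of its training data, and statistics on the attributes of the training data make it possible to deploy the model in more suitable scenarios. For example, if most of the participants' data in a collaboratively trained face recognition model comes from young people, it is not appropriate to deploy the model in an application whose users are mostly elderly. With statistics on the attributes of the training data, the model can be deployed in a more suitable scenario, or deployed after fine-tuning.
In deep learning, the features extracted by models trained for different learning tasks generalize to some extent; that is, features extracted for task one can be applied to learning task two. In summary, to improve model performance and deploy the model in more suitable scenarios, the distribution and related attributes of the training data need to be inferred during collaborative training. Because deep features encode not only information related to the main task of collaborative training but also additional information, they can be used to make related inferences about the data.
In related art 1, the intermediate-layer features obtained after forward propagation through the model, or the final output probabilities, are used for data attribute inference. Data with attribute labels is propagated forward through the model to obtain features or output probabilities, which are then used to train an attribute inference classifier that infers the relevant attributes of the data. This kind of attribute inference is mainly applied in Machine Learning as a Service (MLaaS) scenarios, where data is mostly used to query an already trained model, without involving the use of participants' data to update model parameters or redeploy the model. In addition, such methods usually need to modify the training process of the model so that the intermediate or final outputs of the model encode information related to the data attributes.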
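In related art 2, the gradients from back-propagation are used directly for data attribute inference. Data with data labels is input into the model, the loss gradient corresponding to the data is computed, and the gradient information is used directly to train an attribute inference classifier that infers the relevant attributes of the data. Deep learning training mostly uses mini-batch training, in which multiple data items are input in one training step and their average gradient is computed. The gradients shared in collaborative training are therefore weighted averages of the gradients of multiple data items, so such methods can only determine the average attribute of the data in an entire batch, not the attribute of any single data point.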
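As a minimal illustration of the averaging argument above (hypothetical toy values, not taken from the original), the Python sketch below treats the shared mini-batch gradient as the plain mean of per-sample gradient vectors. Two batches with different per-sample gradients then upload exactly the same vector, so the individual samples cannot be told apart from it.

```python
from typing import List


def batch_gradient(per_sample_grads: List[List[float]]) -> List[float]:
    """Average per-sample gradient vectors, as done in mini-batch training."""
    n = len(per_sample_grads)
    dim = len(per_sample_grads[0])
    return [sum(g[i] for g in per_sample_grads) / n for i in range(dim)]


# Two hypothetical batches whose individual samples differ...
batch_a = [[1.0, 0.0], [3.0, 4.0]]
batch_b = [[2.0, 2.0], [2.0, 2.0]]

# ...but whose averaged (uploaded) gradients are identical.
print(batch_gradient(batch_a))  # [2.0, 2.0]
print(batch_gradient(batch_b))  # [2.0, 2.0]
```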
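In related art 3, the gradients uploaded by sub-models during collaborative training are used to reconstruct the original training data. Randomly initialized training data is input into the model, the corresponding loss gradient is computed, and the gap between this loss gradient and the uploaded gradient is minimized to optimize the randomly initialized data, thereby reconstructing the original training data, which is then used for attribute inference. This method is strongly affected by the model structure and the training batch size, which degrades the quality of the reconstructed data and ultimately makes attribute inference on the reconstructed data inaccurate.
Therefore, the embodiments of the present application propose a solution that uses the gradient information shared during collaborative training to reconstruct the deep features of the training data, and uses the reconstructed deep features to infer additional information about the data, thereby estimating the distribution and related attributes of the training data so as to adjust parameter settings during training and better deploy the trained model in real scenarios.
Related art 1 usually needs to modify the training process of the model so that the intermediate or final outputs encode information related to the data attributes. Modifying the training process is often infeasible in collaborative training, because all participants need a common learning objective, and if a single participant modifies the training process, the training of the overall model is affected. The method provided by the embodiments of the present application, which reconstructs deep features from gradients and performs attribute inference on each training data item according to the reconstructed deep features, achieves data attribute inference without modifying the training process of the model.
Related art 2 performs data attribute inference with weighted-averaged gradient information and can therefore only infer the average attribute of a batch, not the attribute of a specific data item. The embodiments of the present application use the gradients to reconstruct the deep feature corresponding to each data point and then perform attribute inference on the reconstructed deep features, so the attributes of specific data points can be inferred accurately.
Related art 3 is strongly affected by the model structure and the training batch size, which degrades the quality of the reconstructed data and ultimately makes attribute inference on the reconstructed data inaccurate. The embodiments of the present application use only some sub-blocks of the model, so the model structure involved is simpler, and reconstructing deep features is a simpler task than reconstructing the original data. Reconstructing deep features with the reconstruction method provided by the embodiments therefore avoids the influence of the model structure and performs particularly well and stably with large batches of sample data.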
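FIG. 1 is a gender feature plot of sample data of a face classification model. The t-distributed stochastic neighbor embedding (t-SNE) algorithm maps high-dimensional features to two dimensions, and the coordinates are then normalized to (0, 1); the horizontal and vertical axes in FIG. 1 show these normalized values and have no specific meaning. The main task of an ordinary face classification model is to identify a person's identity, and gender information is not provided during training. Nevertheless, as shown in the figure, after t-SNE dimensionality reduction and visualization, the features extracted by the model for male and female samples differ noticeably and can be easily distinguished. Deep features can therefore support a degree of data attribute inference, which verifies the feasibility of inferring data attributes from model features.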
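The application scenario of the embodiments of the present application is the collaborative training process in deep learning. The goal of collaborative training is to jointly train a model using the local data of the collaborative training participants, without the training data leaving the participants. The deep learning model may be any of various neural network models, for example a convolutional neural network (CNN) model. The deep learning model may be used for data processing, such as feature extraction and classification in image processing, and further for face recognition, object recognition and the like, where the objects may be animals, plants, articles and so on.
FIG. 2 is a schematic diagram of an application scenario of an embodiment of the present application. As shown in the figure, the central server distributes the shared model to be trained collaboratively to the participating devices (also called training devices) of the collaborative training participants, and the participating devices train the model using locally stored training data. The participating devices send the training gradients or model parameters to the central server for aggregation, and the training of the model is finally completed. To improve the efficiency of collaborative training and avoid large biases, the shared model is generally pre-trained on a public dataset on the server side. The public dataset is generally considered to contain samples different from, but with a similar data distribution to, those of all participating devices of the collaborative training.
FIG. 3 is a flowchart of the method for inferring attributes of collaborative training data provided by an embodiment of the present application. The method is applied to a central server of distributed collaborative model training. As shown in the figure, the method includes the following steps:
S11: distributing a pre-trained shared model to participating devices of distributed collaborative training, so that the participating devices iteratively train and update the shared model using batches of local samples;
S12: acquiring a first gradient uploaded by a participating device, the first gradient being the gradient of the model loss with respect to the model parameters computed when the participating device performs iterative model training;
S13: reconstructing deep features of the sample data according to the first gradient based on the updated shared model;
S14: using the current shared model to extract deep features of auxiliary data with attribute labels and training an attribute inference model, where the shared model is obtained through several iterations of collaborative training updates;
S15: performing data attribute inference on individual local training samples of the participating devices according to the trained attribute inference model and the reconstructed deep features.
In the embodiments of the present application, the gradient of the model loss with respect to the model parameters, computed during model training and fed back by the device, is acquired; the deep features of the sample data are reconstructed from this gradient and the shared model; an attribute inference model is trained on the deep features of auxiliary data with attribute labels; and finally attribute inference is performed on the reconstructed deep features with the trained attribute inference model. The redundant features contained in the reconstructed deep features can thus be used for additional attribute inference, and the relevant attributes of every sample can be inferred without reconstructing the input samples. The method is not affected by the batch size of the sample data used by the participating device in each training update, performs particularly well and stably with large batches, and can perform data attribute inference on individual local training samples of the participating devices.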
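First, the collaborative training process in the embodiments of the present application is briefly introduced. The central server distributes a first shared model (that is, the initialized model) to all participating devices. Each participating device randomly selects a batch of sample data from its locally stored sample data for model training. After this training, the participating device sends the updated model parameters to the central server. The central server averages the updated model parameters obtained from all participating devices to obtain an optimized second shared model, and distributes the second shared model to all participating devices, which continue model training. In subsequent training, each participating device randomly selects a new batch of local sample data each time. After many training iterations, a converged, trained model is finally obtained.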
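The aggregation round described above can be sketched in a few lines of Python. This is a minimal sketch that assumes each participating device returns its parameters as a flat list of floats and that the server weights all devices equally; `average_parameters` and the parameter values are hypothetical.

```python
from typing import List


def average_parameters(client_params: List[List[float]]) -> List[float]:
    """Server-side aggregation: element-wise mean of the participants' parameters."""
    if not client_params:
        raise ValueError("need at least one participating device")
    num_clients = len(client_params)
    return [sum(values) / num_clients for values in zip(*client_params)]


# Hypothetical parameters returned by three participating devices after local training.
updates = [[0.9, 1.2], [1.1, 0.8], [1.0, 1.0]]
second_shared_model = average_parameters(updates)
print(second_shared_model)  # approximately [1.0, 1.0]
```

Equal weighting is the simplest choice; a server could instead weight each device by its number of local samples, which the original text does not specify.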
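In S11, the central server first initializes the model to be trained and distributes the initialized shared model to each participating device of distributed collaborative training. Each participating device stores locally the sample data used to train the model; the sample data stored on different participating devices usually differs and is unbalanced. The model to be trained may be an image recognition model or a speech recognition model, in which case the training data is images or speech. After each iteration, the central server distributes the updated shared model to the participating devices again.
FIG. 4 is a schematic structural diagram of the shared model. As shown in the figure, the shared model is a convolutional neural network model for image recognition and includes a feature extractor E and a classifier C. The feature extractor E includes (n+1) convolutional blocks, denoted $f_1, f_2, \ldots, f_n, f_{n+1}$, and is used to extract the feature E(X) of the input sample data X. The classifier C includes a convolutional block $f_c$; it may be a binary classification model, may be trained on deep features extracted from the central server's data and then predict the attributes of the reconstructed deep features, that is, it recognizes the extracted feature E(X) with a classifier constructed according to the purpose of the model. The sample data X (an image) is input to the feature extractor E and passes through the convolution operations of the (n+1) convolutional blocks to obtain the deep feature E(X) corresponding to X. The deep feature E(X) is input to the classifier C to obtain the final image recognition result.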
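The participating device receives the shared model sent by the central server, randomly samples a batch of data from its locally stored data, trains locally, computes the back-propagated loss gradient g corresponding to the randomly sampled data, and shares g with the central server for collaborative model training. Referring to FIG. 4, the loss gradient g can be computed from the model's loss function and includes $g_c$ and $g_{n+1}$, where $g_c$ is the gradient, computed by the participating device, of the loss function with respect to the parameters of $f_c$, and $g_{n+1}$ is the gradient of the loss function with respect to the parameters of $f_{n+1}$; both are real gradients. They provide useful information for the subsequent attribute inference, which is particularly valuable for training and updates with large batches of sample data.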
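In some embodiments, the first gradient is obtained by the participating device performing model training on a randomly sampled first sample set and computing the gradient of the back-propagated model loss corresponding to the first sample set with respect to the model parameters. The data in the first sample set may be a mini-batch, which improves computation speed and efficiency. When the participating device trains the model on a randomly sampled first sample set, in S13 the deep features of the first sample set are reconstructed according to the first gradient based on the updated shared model.
The training data of a participating device always remains local and is not shared with other participating devices or the server, thereby protecting the privacy of the training data. A mini-batch contains multiple samples, the number of which depends on the batch size. The content of each sample relates to the target model of collaborative training; for example, if the purpose of collaborative training is to jointly train a face recognition model, each training sample is a face image with a corresponding label, whose content depends on the purpose of the collaboratively trained model.
In some embodiments, S13 may further include:
S131: randomly initializing a first deep feature to be optimized;
S132: inputting the first deep feature into the shared model to obtain a second gradient;
S133: minimizing the gap between the first gradient and the second gradient to optimize the first deep feature.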
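Referring to FIG. 4, the shared model may be a convolutional neural network model that includes a feature extractor and a classifier $f_c$, the feature extractor including (n+1) convolutional blocks. The first deep feature is a data pair $(\hat{x}, \hat{y})$, a pair of optimizable data, where $\hat{x}$ is the deep feature to be reconstructed and $\hat{y}$ is a pseudo-label. Because the true labels of the sample data (original data) are unknown, an optimizable pseudo-label is needed to compute the cross-entropy loss. $\hat{x}$ is initially obtained by random initialization; after optimization, the finally reconstructed $\hat{x}$ is similar to the original feature, and $\hat{x}$ is the deep feature.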
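The embodiments of the present application use the information of the last convolutional block $f_{n+1}$ and the final classifier $f_c$. Using the forward-propagation information of $f_{n+1}$ and $f_c$ and the corresponding back-propagated gradient information $g_{n+1}$ and $g_c$ of these two layers, the data pair $(\hat{x}, \hat{y})$ is input into the sub-model composed of $f_{n+1}$ and $f_c$, and the gradients of the loss function with respect to the parameters of $f_{n+1}$ and $f_c$ are computed separately. Specifically, S132 includes: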
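S1321: inputting the first deep feature into the last convolutional block $f_{n+1}$ of the convolutional blocks of the feature extractor, and then inputting the feature E(X) output by $f_{n+1}$ into the classifier $f_c$;
S1322: separately computing the gradient $\hat{g}_{n+1}$ of the loss function with respect to the parameters of $f_{n+1}$ and the gradient $\hat{g}_c$ of the loss function with respect to the parameters of $f_c$, where the second gradient includes the gradient $\hat{g}_{n+1}$ and the gradient $\hat{g}_c$.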
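In some embodiments, the gradient $\hat{g}_{n+1}$ and the gradient $\hat{g}_c$ are computed as:
$$\hat{g}_{n+1} = \nabla_{f_{n+1}} \mathcal{L}_{CE}\big(f_c(f_{n+1}(\hat{x})), \hat{y}\big)$$
$$\hat{g}_c = \nabla_{f_c} \mathcal{L}_{CE}\big(f_c(f_{n+1}(\hat{x})), \hat{y}\big)$$
where $\mathcal{L}_{CE}$ is the cross-entropy loss function, $\nabla_{f_c}$ is the gradient operator of $f_c$ (the total differential over all directions of its parameter space), and $\nabla_{f_{n+1}}$ is the gradient operator of $f_{n+1}$.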
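The gradient formulas above can be checked numerically with a small, dependency-free Python sketch. It assumes, purely for illustration, that $f_{n+1}$ and $f_c$ are single bias-free linear layers (in the embodiment they are convolutional blocks) and that the pseudo-label $\hat{y}$ is a probability vector; all names and values are hypothetical. The analytic gradients $\hat{g}_{n+1}$ and $\hat{g}_c$ are compared against central finite differences of $\mathcal{L}_{CE}$.

```python
import math
from typing import Callable, List, Tuple

Matrix = List[List[float]]
Vector = List[float]


def matvec(w: Matrix, v: Vector) -> Vector:
    return [sum(a * b for a, b in zip(row, v)) for row in w]


def softmax(z: Vector) -> Vector:
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(t - m) for t in z]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(w1: Matrix, w2: Matrix, x: Vector, y: Vector) -> float:
    """L_CE(f_c(f_{n+1}(x)), y) with f_{n+1}(x) = w1 @ x and f_c(h) = w2 @ h."""
    p = softmax(matvec(w2, matvec(w1, x)))
    return -sum(t * math.log(q) for t, q in zip(y, p))


def submodel_gradients(w1: Matrix, w2: Matrix, x: Vector, y: Vector) -> Tuple[Matrix, Matrix]:
    """Analytic gradients of L_CE w.r.t. w1 (g_hat_{n+1}) and w2 (g_hat_c)."""
    h = matvec(w1, x)                                        # output of f_{n+1}
    dz = [q - t for q, t in zip(softmax(matvec(w2, h)), y)]  # dL/dlogits, needs sum(y) == 1
    g_c = [[dzk * hj for hj in h] for dzk in dz]
    dh = [sum(w2[k][j] * dz[k] for k in range(len(dz))) for j in range(len(h))]
    g_n1 = [[dhi * xj for xj in x] for dhi in dh]
    return g_n1, g_c


def numerical_grad(loss: Callable[[], float], w: Matrix, eps: float = 1e-6) -> Matrix:
    """Central differences of loss() w.r.t. each entry of w (perturbed in place, then restored)."""
    grad = []
    for i in range(len(w)):
        row = []
        for j in range(len(w[i])):
            orig = w[i][j]
            w[i][j] = orig + eps
            up = loss()
            w[i][j] = orig - eps
            down = loss()
            w[i][j] = orig
            row.append((up - down) / (2 * eps))
        grad.append(row)
    return grad


def max_abs_diff(a: Matrix, b: Matrix) -> float:
    return max(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb))


W1 = [[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]]  # toy f_{n+1}: 3 -> 2
W2 = [[1.0, -1.0], [0.2, 0.4]]             # toy f_c: 2 -> 2 classes
x_hat, y_hat = [1.0, 2.0, -1.0], [0.7, 0.3]  # data pair; y_hat is a soft pseudo-label


def loss() -> float:
    return cross_entropy(W1, W2, x_hat, y_hat)


g_n1, g_c = submodel_gradients(W1, W2, x_hat, y_hat)
print(max_abs_diff(g_n1, numerical_grad(loss, W1)), max_abs_diff(g_c, numerical_grad(loss, W2)))
```

In practice an automatic-differentiation framework would compute these gradients; the hand-derived version only makes the chain rule through $f_c$ and $f_{n+1}$ explicit.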
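Because the gradients $\hat{g}_{n+1}$ and $\hat{g}_c$ computed from the randomly initialized data pair $(\hat{x}, \hat{y})$ must be matched to the real gradients $g_c$ and $g_{n+1}$ described above, an objective function can be designed as the optimization target. Minimizing the gap between the first gradient and the second gradient in S133 may further include: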
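minimizing the objective function $\mathcal{L}_{grad}$ so as to minimize the gap between the first gradient and the second gradient, the objective function being:
[equation image not reproduced]
where λ is a hyperparameter, $g_{n+1}$ and $g_c$ are the first gradients uploaded by the participating device, $d(g_{n+1}, \hat{g}_{n+1})$ and $d(g_c, \hat{g}_c)$ are both distance functions measuring the difference between two gradients, and the distance function d measuring the difference between two gradients g and $\hat{g}$ is:
[equation image not reproduced]
where σ² = Var(g), and Var(g) is the variance of the gradient g.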
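The distance function d involves two computations. The first is the cosine similarity
$$\cos(g, \hat{g}) = \frac{\langle g, \hat{g} \rangle}{\|g\|\,\|\hat{g}\|}$$
and the second is the Gaussian kernel function
$$k(g, \hat{g}) = \exp\left(-\frac{\|g - \hat{g}\|^2}{2\sigma^2}\right)$$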
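The two quantities used by d can be sketched as follows. Because the exact formula combining them is not reproduced here, this minimal Python sketch assumes a hypothetical combination d = (1 - cos) + (1 - k), which is zero when the two gradients coincide. It also assumes σ² is the population variance of the entries of g and adds a small epsilon to each denominator for numerical safety.

```python
import math
from typing import Sequence


def cosine_similarity(g: Sequence[float], g_hat: Sequence[float], eps: float = 1e-12) -> float:
    dot = sum(a * b for a, b in zip(g, g_hat))
    norms = math.sqrt(sum(a * a for a in g)) * math.sqrt(sum(b * b for b in g_hat))
    return dot / (norms + eps)


def gaussian_kernel(g: Sequence[float], g_hat: Sequence[float], eps: float = 1e-12) -> float:
    mean = sum(g) / len(g)
    var = sum((a - mean) ** 2 for a in g) / len(g)  # sigma^2 = Var(g), population variance
    sq_dist = sum((a - b) ** 2 for a, b in zip(g, g_hat))
    return math.exp(-sq_dist / (2.0 * var + eps))


def gradient_distance(g: Sequence[float], g_hat: Sequence[float]) -> float:
    """Hypothetical combination of the two terms: 0 for identical gradients."""
    return (1.0 - cosine_similarity(g, g_hat)) + (1.0 - gaussian_kernel(g, g_hat))


g = [1.0, 2.0, 3.0]
print(gradient_distance(g, g))                  # close to 0
print(gradient_distance(g, [-1.0, -2.0, -3.0]))  # close to 3 (opposite direction, far apart)
```

The cosine term only compares directions, while the Gaussian kernel term also penalizes differences in magnitude relative to the spread of g, which is one plausible reason to combine the two.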
Optimizing the first deep feature in S133 may further include updating $(\hat{x}, \hat{y})$ according to:
$$(\hat{x}', \hat{y}') = (\hat{x}, \hat{y}) - \alpha \nabla_{(\hat{x}, \hat{y})} \mathcal{L}_{grad}$$
where $(\hat{x}', \hat{y}')$ is the optimized $(\hat{x}, \hat{y})$, that is, the value of $(\hat{x}, \hat{y})$ after minimizing the objective function $\mathcal{L}_{grad}$, and α is the learning rate.
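In some embodiments, the hyperparameter λ and the learning rate α may be set to the same value, for example both to 0.1. It can be understood that the values of the hyperparameter and the learning rate can generally be adjusted empirically and may also be set to other values.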
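After a certain number of optimization steps, the final optimal reconstructed $(\hat{x}, \hat{y})$ is denoted $(\hat{x}^{*}, \hat{y}^{*})$, and the deep feature of each sample obtained through reconstruction can be expressed as:
[equation image not reproduced]
The number of optimization steps can be set empirically, for example to 5000. A larger number brings the optimization result closer to the true value but increases the time cost; a smaller number may leave the result further from the true value but reduces the time cost.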
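The overall loop of S131 to S133 can be sketched on the same kind of toy sub-model. This minimal Python sketch rests on several assumptions that are not from the original: $f_{n+1}$ and $f_c$ are bias-free linear layers, $\hat{y}$ is parameterized as the softmax of optimizable logits, the objective is the hypothetical $(1 - \cos(g_{n+1}, \hat{g}_{n+1})) + \lambda (1 - \cos(g_c, \hat{g}_c))$ using the cosine term only, and the gradient of the objective with respect to $(\hat{x}, \hat{y})$ is approximated by central finite differences instead of automatic differentiation. λ and α are both 0.1 as in the example above, and 2000 steps are used instead of 5000 to keep the toy fast.

```python
import math
import random
from typing import List, Tuple

Matrix = List[List[float]]
Vector = List[float]


def matvec(w: Matrix, v: Vector) -> Vector:
    return [sum(a * b for a, b in zip(row, v)) for row in w]


def softmax(z: Vector) -> Vector:
    m = max(z)
    exps = [math.exp(t - m) for t in z]
    total = sum(exps)
    return [e / total for e in exps]


def submodel_gradients(w1: Matrix, w2: Matrix, x: Vector, y: Vector) -> Tuple[Vector, Vector]:
    """Flattened CE-loss gradients w.r.t. the weights of f_{n+1} (w1) and f_c (w2)."""
    h = matvec(w1, x)
    dz = [p - t for p, t in zip(softmax(matvec(w2, h)), y)]  # dL/dlogits, needs sum(y) == 1
    g_c = [dzk * hj for dzk in dz for hj in h]
    dh = [sum(w2[k][j] * dz[k] for k in range(len(dz))) for j in range(len(h))]
    g_n1 = [dhi * xj for dhi in dh for xj in x]
    return g_n1, g_c


def cosine(a: Vector, b: Vector, eps: float = 1e-12) -> float:
    dot = sum(p * q for p, q in zip(a, b))
    return dot / (math.sqrt(sum(p * p for p in a)) * math.sqrt(sum(q * q for q in b)) + eps)


def reconstruct(w1, w2, g_n1_true, g_c_true, dim_x, num_classes,
                steps=2000, lr=0.1, lam=0.1, fd_eps=1e-5, seed=0):
    """Gradient-matching reconstruction of (x_hat, y_hat) by finite-difference descent."""
    rng = random.Random(seed)
    # S131: random init of x_hat followed by pseudo-label logits (y_hat = softmax(logits)).
    params = [rng.gauss(0.0, 1.0) for _ in range(dim_x + num_classes)]

    def objective(p: Vector) -> float:
        # S132: gradients of the sub-model for the current data pair.
        g_n1, g_c = submodel_gradients(w1, w2, p[:dim_x], softmax(p[dim_x:]))
        # S133 (hypothetical objective): cosine distances, classifier term weighted by lam.
        return (1.0 - cosine(g_n1, g_n1_true)) + lam * (1.0 - cosine(g_c, g_c_true))

    start = objective(params)
    best_value, best_params = start, params[:]
    for _ in range(steps):
        grad = []
        for i in range(len(params)):
            orig = params[i]
            params[i] = orig + fd_eps
            up = objective(params)
            params[i] = orig - fd_eps
            down = objective(params)
            params[i] = orig
            grad.append((up - down) / (2 * fd_eps))
        params = [p - lr * g for p, g in zip(params, grad)]  # (x', y') = (x, y) - alpha * grad
        value = objective(params)
        if value < best_value:
            best_value, best_params = value, params[:]
    return start, best_value, best_params


# Participant side (toy): true intermediate feature and one-hot label.
W1 = [[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]]
W2 = [[1.0, -1.0], [0.2, 0.4]]
g_n1_true, g_c_true = submodel_gradients(W1, W2, [1.0, 2.0, -1.0], [1.0, 0.0])

# Server side: reconstruct from the uploaded gradients only.
start, best, params = reconstruct(W1, W2, g_n1_true, g_c_true, dim_x=3, num_classes=2)
x_hat, y_hat = params[:3], softmax(params[3:])
print(f"objective {start:.4f} -> {best:.4f}; x_hat={x_hat}, y_hat={y_hat}")
```

Only the server-side `reconstruct` call sees the uploaded gradients. A toy of this size illustrates the structure of the loop and makes no claim about how closely $\hat{x}$ approaches the true feature.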
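Through the above steps, the reconstructed deep features of the sample data are obtained, and these reconstructed deep features can then be used to perform attribute inference on the sample data X.
It can be understood that performing S11 to S13 does not require changing the collaborative training process, which proceeds normally. Moreover, S11 to S13 can reconstruct the deep features of the sample data of every training round of every participating device, so that subsequent attribute inference can be performed for all sample data used for training.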
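In S14, the central server stores auxiliary data with attribute labels. To train the attribute inference model (also called an attribute classification model, whose function is to identify or classify the attributes of data), the feature extractor is first used to extract features of the auxiliary data with attribute labels. The extracted deep features of the labeled auxiliary data are then used to train the attribute inference model, which infers the attributes of the sample data on the participating devices.
In S15, the central server inputs the reconstructed deep features of the sample data into the attribute inference model, thereby inferring the attributes of the collaborative training data.
This step can infer the attributes of all local sample data of the participating devices that took part in model training.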
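S14 and S15 can be sketched with a simple binary attribute classifier. The original does not specify the model type, so this minimal Python sketch assumes a logistic-regression classifier trained by batch gradient descent on toy two-dimensional "deep features" of auxiliary data with a binary attribute label (for example gender), which is then applied to reconstructed features. All names and data are hypothetical.

```python
import math
from typing import List, Tuple

Vector = List[float]
Model = Tuple[Vector, float]


def sigmoid(t: float) -> float:
    return 1.0 / (1.0 + math.exp(-t))


def train_attribute_model(features: List[Vector], labels: List[int],
                          lr: float = 0.1, epochs: int = 500) -> Model:
    """S14: fit logistic regression on deep features of labeled auxiliary data."""
    dim, n = len(features[0]), len(features)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(features, labels):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y  # dLoss/dlogit
            for i in range(dim):
                grad_w[i] += err * x[i] / n
            grad_b += err / n
        w = [wi - lr * gi for wi, gi in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def infer_attribute(model: Model, feature: Vector) -> int:
    """S15: predict the binary attribute of one reconstructed deep feature."""
    w, b = model
    return int(sigmoid(sum(wi * xi for wi, xi in zip(w, feature)) + b) >= 0.5)


aux_features = [[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -3.0]]
aux_labels = [1, 1, 0, 0]
model = train_attribute_model(aux_features, aux_labels)

reconstructed = [[2.5, 2.4], [-2.6, -2.5]]  # one entry per individual training sample
print([infer_attribute(model, f) for f in reconstructed])  # [1, 0]
```

Because each reconstructed feature is classified separately, the output has one attribute per training sample rather than one per batch.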
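In summary, in the embodiments of the present application, deep features are reconstructed from the gradients uploaded by the participating devices of collaborative training, and the attributes of the training data of the participating devices of distributed collaborative training are inferred with the attribute inference model, thereby achieving attribute inference of collaborative training data.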
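The success rate of deep feature reconstruction of the method of the embodiments under different sample batch sizes was measured as the proportion of reconstructed deep features whose cosine similarity with the original real features exceeds 0.95. FIG. 5 compares the results with those of related art 1; it is a statistical chart of the success rate of deep feature reconstruction of the present application and related art 1 (labeled method 1 in the figure) under different sample batch sizes. Compared with related art 1, the method of the embodiments reconstructs deep features well at all batch sizes, and performs particularly well and stably at large batch sizes (for example, a batch size of 512).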
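The success criterion used for FIG. 5 (cosine similarity with the original feature above 0.95) can be expressed directly. A minimal Python sketch with hypothetical feature vectors:

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def reconstruction_success_rate(reconstructed: List[List[float]],
                                originals: List[List[float]],
                                threshold: float = 0.95) -> float:
    """Fraction of reconstructed features whose cosine similarity exceeds the threshold."""
    hits = sum(1 for r, o in zip(reconstructed, originals)
               if cosine_similarity(r, o) > threshold)
    return hits / len(originals)


originals = [[1.0, 0.0], [1.0, 0.0], [1.0, 1.0]]
reconstructed = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.9]]  # similarities: 1.0, 0.0, about 0.9986
rate = reconstruction_success_rate(reconstructed, originals)
print(rate)  # 0.6666666666666666 (2 of 3)
```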
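The method of the embodiments of the present application, related art 1 and related art 2 were each used to perform attribute inference of collaborative training data on a first dataset, a second dataset and a third dataset; the corresponding attribute inference accuracies are shown in Table 1:
Table 1
[Table 1 image not reproduced: attribute inference accuracy of the present method, related art 1 and related art 2 on the first, second and third datasets]
It can be seen that the present application improves the accuracy of attribute inference.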
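In summary, compared with the prior art, the embodiments of the present application have the following beneficial effects:
(1) The forward-propagation and back-propagation information from the model training process is used to reconstruct the deep features corresponding to the training data, so the deep feature corresponding to each data item can be reconstructed accurately regardless of the mini-batch size; compared with reconstructing input samples, the amount of data to reconstruct is smaller and efficiency is higher. For example, with the approach of reconstructing input samples, once the batch size of the sample data reaches 8, the reconstruction results can hardly be used for attribute inference.
(2) Fewer model structures are used to reconstruct the deep features, so the reconstruction is less affected by the specific model structure, allowing the method to be applied to many different convolutional neural network models and broadening its applicability.
(3) Compared with other inference methods based on back-propagation information, the embodiments propose reconstructing deep features from gradients, which recovers the deep feature corresponding to each training sample and then uses it to infer data-related attributes, so the attribute of each data item in mini-batch training can be inferred with improved accuracy. Some existing approaches can only infer whether an attribute is present in a batch of sample data without knowing which specific sample it belongs to, or can only perform attribute inference on one sample at a time.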
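FIG. 6 is a structural diagram of the apparatus for inferring attributes of collaborative training data provided by an embodiment of the present application. As shown in the figure, the data attribute inference apparatus is applied to a central server of distributed collaborative model training, and the apparatus 500 includes a distribution module 501, an acquisition module 502, a reconstruction module 503, a training module 504 and an inference module 505, where:
the distribution module 501 is configured to distribute the pre-trained shared model to the participating devices of distributed collaborative training, so that the participating devices iteratively train and update the shared model using batches of local samples;
the acquisition module 502 is configured to acquire the first gradient uploaded by a participating device, the first gradient being the gradient of the model loss with respect to the model parameters computed when the participating device performs model training;
the reconstruction module 503 is configured to reconstruct the deep features of the sample data according to the first gradient based on the shared model;
the training module 504 is configured to use the current shared model to extract deep features of auxiliary data with attribute labels and train the attribute inference model, where the shared model is obtained through several iterations of collaborative training updates;
the inference module 505 is configured to perform data attribute inference on individual local training samples of the participating devices according to the trained attribute inference model and the reconstructed deep features.
For the specific implementation and working principle of the apparatus, refer to the foregoing method embodiments; details are not repeated here.
FIG. 7 is a schematic structural diagram of a computing device provided by an embodiment of the present application. As shown in the figure, the computing device 600 includes a processor 601, a memory 602, a communication interface 603 and a communication bus 604, and the processor 601, the memory 602 and the communication interface 603 communicate with one another through the communication bus 604. The memory 602 is a non-volatile computer-readable storage medium that can be used to store non-volatile software programs, non-volatile computer-executable programs and modules. In the embodiments of the present application, the memory 602 is configured to store at least one executable instruction, and the executable instruction causes the processor 601 to perform operations corresponding to the above method for inferring attributes of collaborative training data.
An embodiment of the present application further provides a computer-readable storage medium storing at least one executable instruction, and the executable instruction causes a processor to perform operations corresponding to the above method for inferring attributes of collaborative training data.
Finally, it should be noted that the above embodiments are only used to illustrate, not to limit, the technical solutions of the present application. Within the concept of the present application, technical features in the above embodiments or in different embodiments may be combined, steps may be performed in any order, and there are many other variations of the different aspects of the present application as described above that, for brevity, are not provided in detail. Although the present application has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be equivalently replaced, and such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present application.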

Claims (13)

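  1. A method for inferring attributes of collaborative training data, applied to a central server of distributed collaborative model training, characterized in that the method includes:
    distributing a pre-trained shared model to participating devices of distributed collaborative training, so that the participating devices iteratively train and update the shared model using batches of local samples;
    acquiring a first gradient uploaded by a participating device, the first gradient being the gradient of the model loss with respect to the model parameters computed when the participating device performs model training;
    reconstructing deep features of the sample data according to the first gradient based on the shared model;
    using the shared model to extract deep features of auxiliary data with attribute labels and training an attribute inference model, where the shared model is obtained through several iterations of collaborative training updates;
    performing data attribute inference on individual local training samples of the participating devices according to the trained attribute inference model and the reconstructed deep features.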
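  2. The method according to claim 1, characterized in that reconstructing the deep features of the sample data according to the first gradient based on the shared model includes:
    randomly initializing a first deep feature to be optimized;
    inputting the first deep feature into the shared model to obtain a second gradient;
    minimizing the gap between the first gradient and the second gradient to optimize the first deep feature.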
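  3. The method according to claim 2, characterized in that the shared model is a convolutional neural network model, the shared model includes a feature extractor and a classifier $f_c$, and the feature extractor includes (n+1) convolutional blocks;
    inputting the first deep feature into the shared model to obtain the second gradient includes:
    inputting the first deep feature into the last convolutional block $f_{n+1}$ of the convolutional blocks of the feature extractor, and then inputting the feature E(X) output by the convolutional block $f_{n+1}$ into the classifier $f_c$;
    separately computing the gradient $\hat{g}_{n+1}$ of the loss function with respect to the parameters of $f_{n+1}$ and the gradient $\hat{g}_c$ of the loss function with respect to the parameters of $f_c$;
    where the second gradient includes the gradient $\hat{g}_{n+1}$ and the gradient $\hat{g}_c$.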
  4. The method according to claim 3, characterized in that the first deep feature is a data pair $(\hat{x}, \hat{y})$, where $\hat{x}$ is the deep feature to be reconstructed and $\hat{y}$ is a pseudo-label.
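  5. The method according to claim 4, characterized in that the gradient $\hat{g}_{n+1}$ and the gradient $\hat{g}_c$ are computed as:
    $$\hat{g}_{n+1} = \nabla_{f_{n+1}} \mathcal{L}_{CE}\big(f_c(f_{n+1}(\hat{x})), \hat{y}\big)$$
    $$\hat{g}_c = \nabla_{f_c} \mathcal{L}_{CE}\big(f_c(f_{n+1}(\hat{x})), \hat{y}\big)$$
    where $\mathcal{L}_{CE}$ is the cross-entropy loss function.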
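  6. The method according to claim 5, characterized in that minimizing the gap between the first gradient and the second gradient includes:
    minimizing an objective function $\mathcal{L}_{grad}$ so as to minimize the gap between the first gradient and the second gradient, the objective function being:
    [equation image not reproduced]
    where λ is a hyperparameter, $g_{n+1}$ and $g_c$ are the first gradients uploaded by the participating device, $d(g_{n+1}, \hat{g}_{n+1})$ and $d(g_c, \hat{g}_c)$ are both distance functions measuring the difference between two gradients, and the distance function d measuring the difference between two gradients g and $\hat{g}$ is:
    [equation image not reproduced]
    where σ² = Var(g), and Var(g) is the variance of the gradient g.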
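  7. The method according to claim 6, characterized in that optimizing the first deep feature includes:
    updating $(\hat{x}, \hat{y})$ according to:
    $$(\hat{x}', \hat{y}') = (\hat{x}, \hat{y}) - \alpha \nabla_{(\hat{x}, \hat{y})} \mathcal{L}_{grad}$$
    where $(\hat{x}', \hat{y}')$ is the optimized $(\hat{x}, \hat{y})$, that is, the value of $(\hat{x}, \hat{y})$ after minimizing the objective function $\mathcal{L}_{grad}$, and α is the learning rate.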
  8. The method according to claim 7, characterized in that the hyperparameter λ and the learning rate α are set to the same value.
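  9. The method according to any one of claims 1 to 8, characterized in that the first gradient is obtained by the participating device performing model training on a randomly sampled first sample set and computing the gradient of the back-propagated model loss corresponding to the first sample set with respect to the model parameters;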
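    reconstructing the deep features of the sample data according to the first gradient based on the shared model includes:
    reconstructing the deep features of the first sample set according to the first gradient based on the shared model.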
  10. The method according to any one of claims 1 to 9, characterized in that the sample data and the auxiliary data are images or speech.
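  11. An apparatus for inferring attributes of collaborative training data, applied to a central server of distributed collaborative model training, characterized in that the apparatus includes:
    a distribution module, configured to distribute a pre-trained shared model to participating devices of distributed collaborative training, so that the participating devices iteratively train and update the shared model using batches of local samples;
    an acquisition module, configured to acquire a first gradient uploaded by a participating device, the first gradient being the gradient of the model loss with respect to the model parameters computed when the participating device performs model training;
    a reconstruction module, configured to reconstruct deep features of the sample data according to the first gradient based on the shared model;
    a training module, configured to use the current shared model to extract deep features of auxiliary data with attribute labels and train an attribute inference model, where the shared model is obtained through several iterations of collaborative training updates;
    an inference module, configured to perform data attribute inference on individual local training samples of the participating devices according to the trained attribute inference model and the reconstructed deep features.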
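  12. A computing device, characterized by including a processor, a memory, a communication interface and a communication bus, where the processor, the memory and the communication interface communicate with one another through the communication bus;
    the memory is configured to store at least one executable instruction, and the executable instruction causes the processor to perform operations corresponding to the method for inferring attributes of collaborative training data according to any one of claims 1 to 10.
  13. A computer-readable storage medium, characterized in that the storage medium stores at least one executable instruction, and the executable instruction causes a processor to perform operations corresponding to the method for inferring attributes of collaborative training data according to any one of claims 1 to 10.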
PCT/CN2021/135055 2021-12-02 2021-12-02 Method, apparatus, device and storage medium for inferring attributes of collaborative training data WO2023097602A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
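PCT/CN2021/135055 WO2023097602A1 (zh) 2021-12-02 2021-12-02 Method, apparatus, device and storage medium for inferring attributes of collaborative training data
CN202180004174.3A CN114287009A (zh) 2021-12-02 2021-12-02 Method, apparatus, device and storage medium for inferring attributes of collaborative training data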

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2021/135055 WO2023097602A1 (zh) 2021-12-02 2021-12-02 Method, apparatus, device and storage medium for inferring attributes of collaborative training data

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US18/613,118 Continuation US20240232665A1 (en) 2024-03-22 Attribute inference method for co-training data, computing device, and storage medium thereof

Publications (1)

Publication Number Publication Date
WO2023097602A1 true WO2023097602A1 (zh) 2023-06-08

Family

ID=80880015

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/135055 WO2023097602A1 (zh) 2021-12-02 2021-12-02 Method, apparatus, device and storage medium for inferring attributes of collaborative training data

Country Status (2)

Country Link
CN (1) CN114287009A (zh)
WO (1) WO2023097602A1 (zh)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110008696A (zh) * 2019-03-29 2019-07-12 武汉大学 User data reconstruction attack method for deep federated learning
US20210209515A1 (en) * 2020-09-25 2021-07-08 Beijing Baidu Netcom Science And Technology Co., Ltd. Joint training method and apparatus for models, device and storage medium
WO2021228404A1 (en) * 2020-05-15 2021-11-18 Huawei Technologies Co., Ltd. Generating high-dimensional, high utility synthetic data

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
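CN112101489A (zh) * 2020-11-18 2020-12-18 天津开发区精诺瀚海数据科技有限公司 Equipment fault diagnosis method driven by the fusion of federated learning and deep learning
CN112600794A (zh) * 2020-11-23 2021-04-02 南京理工大学 Method for detecting GAN attacks in collaborative deep learning
CN112634341B (zh) * 2020-12-24 2021-09-07 湖北工业大学 Method for constructing a depth estimation model with multi-visual-task collaboration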

Also Published As

Publication number Publication date
CN114287009A (zh) 2022-04-05

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21966034

Country of ref document: EP

Kind code of ref document: A1