WO2021114931A1 - Coding model training method and apparatus for preventing privacy data leakage - Google Patents

Coding model training method and apparatus for preventing privacy data leakage

Info

Publication number
WO2021114931A1
Authority
WO
WIPO (PCT)
Prior art keywords
sample
loss
model
training sample
feature vector
Prior art date
Application number
PCT/CN2020/124681
Other languages
English (en)
French (fr)
Inventor
石磊磊
熊涛
Original Assignee
支付宝(杭州)信息技术有限公司
Priority date
Filing date
Publication date
Application filed by 支付宝(杭州)信息技术有限公司
Publication of WO2021114931A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60Protecting data
    • G06F21/62Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6245Protecting personal data, e.g. for financial or medical purposes

Definitions

  • One or more embodiments of this specification relate to the technical field of applying machine learning to data security, and in particular to a coding model training method and apparatus for preventing privacy data leakage, and a target object identity recognition method for preventing privacy data leakage.
  • In many scenarios, privacy data of a target object (such as a user or a device) is collected to identify or verify the identity of that target object.
  • For example, in a face payment scenario, the user's face information can be collected to identify the user's identity (such as the user ID in the payment system), so that the corresponding payment account can be found based on that identity and the payment of the corresponding order can be completed.
  • For another example, in a user behavior analysis scenario, the identity of a device (such as the device ID assigned to it by the data analysis system) can be recognized by collecting sensor data generated by the terminal device during use, so as to establish a mapping relationship between the user and the device. Obviously, these scenarios place high requirements on the accuracy of identity recognition.
  • One or more embodiments of this specification describe a coding model training method and apparatus for preventing privacy data leakage, and a target object identity recognition method and apparatus for preventing privacy data leakage, which can effectively reduce the risk of privacy data leakage while ensuring the accuracy of identity recognition of the target object.
  • According to a first aspect, a coding model training method for preventing privacy data leakage is provided, the method comprising: obtaining multiple training sample groups, including an arbitrary first sample group, the first sample group including a first sample pair and a second sample pair, the first sample pair including a first training sample and a second training sample, where the first training sample includes first privacy data characterizing the identity information of a first target object and a first object identifier;
  • the second training sample has the first object identifier, and the two samples of the second sample pair have different object identifiers;
  • the privacy data corresponding to each training sample in the first sample group is input into the coding model respectively to obtain multiple corresponding feature vectors, including a first feature vector corresponding to the first training sample; the first feature vector is input into a classification model used to determine the identity of the target object to obtain a first classification result, and a first classification loss is determined based on the first classification result and the first object identifier;
  • the first feature vector is input into a decoding model used to invert privacy data to obtain first inverted data, and a first decoding loss is determined based on the first inverted data and the first privacy data; the feature vectors corresponding to the training samples in the first sample group are input into a discrimination model used to distinguish different target objects, to obtain a first sample distance between the samples of the first sample pair and a second sample distance between the samples of the second sample pair, and a first discrimination loss is determined, which is positively correlated with the first sample distance and negatively correlated with the second sample distance; with the goal of maximizing the classification loss and decoding loss corresponding to the multiple training sample groups and minimizing the discrimination loss corresponding to the multiple training sample groups, the model parameters in the coding model are adjusted.
  • the target object includes a user
  • the identity information includes one or more of the following: a face image, a fingerprint image, and an iris image.
  • the target object includes a device
  • the identity information includes one or more of the following: an International Mobile Equipment Identity (IMEI), a SIM (subscriber identity module) card number, and device sensor information.
  • In one embodiment, the second sample pair includes the first training sample and a third training sample; inputting the feature vectors corresponding to the training samples in the first sample group into the discrimination model used to distinguish different target objects, to obtain the first sample distance between the samples of the first sample pair and the second sample distance between the samples of the second sample pair, includes: splicing the feature vectors corresponding to the first training sample, the second training sample and the third training sample in a preset order, and then inputting them into the discrimination model to obtain the first sample distance and the second sample distance.
  • In another embodiment, the second sample pair includes a third training sample and a fourth training sample; inputting the feature vectors corresponding to the training samples in the first sample group into the discrimination model used to distinguish different target objects, to obtain the first sample distance between the samples of the first sample pair and the second sample distance between the samples of the second sample pair, includes: splicing the feature vectors corresponding to the first training sample, the second training sample, the third training sample and the fourth training sample in a preset order, and then inputting them into the discrimination model to obtain the first sample distance and the second sample distance.
  • In one embodiment, the method further includes: adjusting the parameters in the classification model with the goal of minimizing the classification loss corresponding to the multiple training sample groups; and/or adjusting the parameters in the decoding model with the goal of minimizing the decoding loss corresponding to the multiple training sample groups; and/or adjusting the parameters in the discrimination model with the goal of minimizing the discrimination loss corresponding to the multiple training sample groups.
  • In one embodiment, adjusting the model parameters in the coding model includes: performing a weighted summation of the classification loss, decoding loss and discrimination loss based on preset weight parameters to obtain a comprehensive loss, where the comprehensive loss is negatively correlated with the classification loss and the decoding loss, and positively correlated with the discrimination loss; and adjusting the model parameters in the coding model based on the comprehensive loss.
  • According to a second aspect, a target object identity recognition method for preventing privacy data leakage is provided, the execution subject of the method being a server. The recognition method includes: receiving a second feature vector from a terminal, the second feature vector being determined by the terminal inputting collected second privacy data into a coding model, where the coding model is pre-trained based on the method of the first aspect above; and comparing the second feature vector with multiple feature vectors corresponding to multiple target objects pre-stored in the server to obtain a comparison result, which is used to determine whether identity recognition of the target object corresponding to the second privacy data succeeds; where the multiple feature vectors are obtained by inputting multiple pieces of historical privacy data of the multiple target objects into the coding model.
  • According to a third aspect, a target object identity recognition method for preventing privacy data leakage is provided, the execution subject of the method being a terminal. The recognition method includes: collecting second privacy data; inputting the second privacy data into a coding model to obtain a second feature vector, where the coding model is pre-trained based on the method of the first aspect; and sending the second feature vector to a server, so that the server compares the second feature vector with multiple feature vectors corresponding to multiple target objects pre-stored in the server to obtain a comparison result, which is used to determine whether identity recognition of the target object corresponding to the second privacy data succeeds.
  • According to a fourth aspect, a coding model training device for preventing privacy data leakage is provided, including: a sample obtaining unit configured to obtain multiple training sample groups, including an arbitrary first sample group, the first sample group including a first sample pair and a second sample pair;
  • the first sample pair includes a first training sample and a second training sample, where the first training sample includes first privacy data characterizing the identity information of a first target object and a first object identifier;
  • the second training sample has the first object identifier, and the two samples of the second sample pair have different object identifiers;
  • a coding unit configured to input the privacy data corresponding to each training sample in the first sample group into the coding model respectively, to obtain multiple corresponding feature vectors, including a first feature vector corresponding to the first training sample;
  • a classification unit configured to input the first feature vector into a classification model used to determine the identity of the target object to obtain a first classification result, and to determine a first classification loss based on the first classification result and the first object identifier;
  • a decoding unit configured to input the first feature vector into a decoding model used to invert privacy data to obtain first inverted data, and to determine a first decoding loss based on the first inverted data and the first privacy data;
  • a distinguishing unit configured to input the feature vectors corresponding to the training samples in the first sample group into a discrimination model used to distinguish different target objects, to obtain a first sample distance between the samples of the first sample pair and a second sample distance between the samples of the second sample pair, and to determine a first discrimination loss, where the first discrimination loss is positively correlated with the first sample distance and negatively correlated with the second sample distance;
  • a coding model tuning unit configured to adjust the model parameters in the coding model with the goal of maximizing the classification loss and decoding loss corresponding to the multiple training sample groups and minimizing the discrimination loss corresponding to the multiple training sample groups.
  • According to a fifth aspect, a target object identity recognition device for preventing privacy data leakage is provided, the device being integrated in a server. The recognition device includes: a vector receiving unit configured to receive a second feature vector from a terminal, the second feature vector being determined by the terminal inputting collected second privacy data into the coding model, where the coding model is pre-trained based on the device of the fourth aspect; and a vector comparison unit configured to compare the second feature vector with multiple feature vectors corresponding to multiple target objects pre-stored in the server to obtain a comparison result, which is used to determine whether identity recognition of the target object corresponding to the second privacy data succeeds; where the multiple feature vectors are obtained by inputting multiple pieces of historical privacy data of the multiple target objects into the coding model.
  • According to a sixth aspect, a target object identity recognition device for preventing privacy data leakage is provided, the device being integrated in a terminal. The recognition device includes: a data collection unit configured to collect second privacy data; a coding unit configured to input the second privacy data into the coding model to obtain a second feature vector, where the coding model is pre-trained based on the device of the fourth aspect; and a vector sending unit configured to send the second feature vector to a server, so that the server compares the second feature vector with multiple feature vectors corresponding to multiple target objects pre-stored in the server to obtain a comparison result, which is used to determine whether identity recognition of the target object corresponding to the second privacy data succeeds.
  • According to a seventh aspect, a computer-readable storage medium is provided, having a computer program stored thereon; when the computer program is executed in a computer, the computer is caused to execute the method of the first, second or third aspect.
  • According to an eighth aspect, a computing device is provided, including a memory and a processor, the memory storing executable code; when the processor executes the executable code, the method of the first, second or third aspect is implemented.
  • In summary, adjusting the model parameters in the coding model with these goals makes the coding vectors highly discriminative (ensuring the accuracy and effectiveness of subsequent identity recognition); at the same time, the coding vectors become, on the one hand, irreversible, meaning it is difficult for criminals to invert or restore the original privacy data from a coding vector, and, on the other hand, confused, meaning it is difficult for criminals to classify a coding vector or use it to determine the identity of the target object.
  • Furthermore, in the identity recognition method, the privacy data is encoded into a feature vector, and only the feature vector is transmitted, stored and compared, which ensures the accuracy and validity of the identity recognition results.
  • In addition, sending the feature vector to the cloud for comparison, rather than performing the comparison directly on the terminal, means the comparison range is not limited by the storage resources of the terminal.
  • Fig. 1 shows an implementation block diagram of a coding model training method for preventing privacy data leakage according to an embodiment
  • Fig. 2 shows an implementation block diagram of a target object recognition method for preventing privacy data leakage according to an embodiment
  • FIG. 3 shows a flowchart of a coding model training method for preventing privacy data leakage according to an embodiment
  • Fig. 4 shows a schematic diagram of a network structure of a triplet network according to an embodiment
  • Fig. 5 shows an interaction diagram of a target object recognition method for preventing privacy data leakage according to an embodiment
  • Fig. 6 shows a structural diagram of a coding model training device for preventing privacy data leakage according to an embodiment
  • Fig. 7 shows a structural diagram of an identity recognition device for preventing privacy data leakage according to an embodiment
  • Fig. 8 shows a structural diagram of an identity recognition device for preventing privacy data leakage according to another embodiment.
  • However, this method of reducing the recognizability of private data can hardly satisfy both requirements of low recognizability of the private data and accurate identity recognition of the target object.
  • In another solution, the collection and computation of private data can be completed on the device side or the edge side and only the decision result returned, so that the collected private data is neither transmitted nor stored.
  • However, due to the limitations of on-device storage and network resources, the size of the comparison sample library on the device is limited and cannot be updated in real time, resulting in a very limited success rate and coverage of identity recognition.
  • Based on the above observations and analysis, the inventor proposes to introduce the idea of adversarial learning to design a coding model training method that prevents privacy data leakage, and, based on the coding model, a target object identity recognition method that prevents privacy data leakage. With the training method and recognition method, the risk of privacy data leakage can be effectively reduced while the accuracy of identity recognition of the target object is ensured.
  • FIG. 1 shows an implementation block diagram of a coding model training method for preventing privacy data leakage according to an embodiment.
  • In one embodiment, as shown in Fig. 1, first, a batch of training samples is drawn, where each training sample includes the privacy data (X) and the object identifier (Y) of the corresponding target object; then, the batch of training samples is input into the coding model to obtain a corresponding batch of feature vectors (Vx); next, these feature vectors are input into the classification model used to determine the identity of the target object, the decoding model used to invert the privacy data, and the discrimination model used to distinguish different target objects, so as to determine the classification loss, decoding loss and discrimination loss corresponding to the batch of training samples respectively; after that, with the model parameters in the encoder fixed, the model parameters in the classification model, decoding model and discrimination model are adjusted with the goals of minimizing the classification loss, decoding loss and discrimination loss respectively.
  • Further, another batch of training samples is drawn and the above process is repeated to obtain the classification loss, decoding loss and discrimination loss corresponding to that batch; then, with the model parameters in the tuned classification model, decoding model and discrimination model fixed, the parameters in the coding model are adjusted with the goal of maximizing the classification loss and decoding loss corresponding to that batch and minimizing the corresponding discrimination loss. Iterating in this way, the final trained coding model can be obtained.
  • Moreover, the feature vectors produced by this coding model discriminate well between different target objects; at the same time, it is difficult for criminals to restore usable privacy data from a leaked feature vector or to determine the identity of the target object from it, which effectively prevents the leakage of privacy data.
  • FIG. 2 shows an implementation block diagram of a target object recognition method for preventing privacy data leakage according to an embodiment.
  • As shown in Fig. 2, first, the terminal collects privacy data (such as the user's face image) and uses the coding model deployed in the terminal to encode the privacy data to obtain the corresponding feature vector; then, the terminal sends the feature vector to the cloud server; next, the server compares the received feature vector with multiple feature vectors corresponding to multiple target objects stored therein and returns the comparison result to the terminal; finally, the terminal determines the final result of identity recognition according to the comparison result.
  • In this way, throughout the identity recognition process, only the feature vectors output by the coding model are transmitted, stored and used, which can effectively prevent the leakage of privacy data.
  • FIG. 3 shows a flowchart of a coding model training method for preventing privacy data leakage according to an embodiment.
  • The execution subject of the method may be any apparatus, device, platform or device cluster with computing and processing capabilities.
  • As shown in Fig. 3, the method includes the following steps. Step S310: acquire multiple training sample groups, including an arbitrary first sample group, the first sample group including a first sample pair and a second sample pair; the first sample pair includes a first training sample and a second training sample, where the first training sample includes first privacy data characterizing the identity information of a first target object and a first object identifier; the second training sample has the first object identifier, and the two samples of the second sample pair have different object identifiers.
  • Step S320: input the privacy data corresponding to each training sample in the first sample group into the coding model respectively, to obtain multiple corresponding feature vectors, including a first feature vector corresponding to the first training sample.
  • Step S330: input the first feature vector into a classification model used to determine the identity of the target object to obtain a first classification result, and determine a first classification loss based on the first classification result and the first object identifier.
  • Step S340: input the first feature vector into a decoding model used to invert privacy data to obtain first inverted data, and determine a first decoding loss based on the first inverted data and the first privacy data.
  • Step S350: input the feature vectors corresponding to the training samples in the first sample group into a discrimination model used to distinguish different target objects, to obtain a first sample distance between the samples of the first sample pair and a second sample distance between the samples of the second sample pair, and determine a first discrimination loss, which is positively correlated with the first sample distance and negatively correlated with the second sample distance.
  • Step S360: with the goal of maximizing the classification loss and decoding loss corresponding to the multiple training sample groups and minimizing the discrimination loss corresponding to the multiple training sample groups, adjust the model parameters in the coding model.
  • First, in step S310, multiple training sample groups are acquired.
  • the target objects involved in the multiple training sample groups may include users.
  • Accordingly, the identity information of the target objects may include the user's biometric information, such as face images, fingerprint images, iris images, and so on.
  • the identity information of the target object may also include the user's mobile phone number, ID number, and so on.
  • the target objects involved in the multiple training sample groups may include animals, such as horses, cats, dogs, pigs, etc.
  • the identity information of the target objects may include biometric information of the animals.
  • the animal's biometric information may include the animal's facial profile picture, the animal's full-body image, the animal's paw print, and so on.
  • In yet another embodiment, the target objects involved in the multiple training sample groups may include devices; correspondingly, the identity information of the target object may include identification information within the device and device sensor information.
  • In a specific embodiment, the identification information of the device may include the IMEI (International Mobile Equipment Identity) and the SIM (Subscriber Identity Module) card number.
  • the device sensor information may include basic circuit data of the device sensor (such as sensor current, voltage, etc.) and usage status data collected by the device sensor (such as device acceleration, camera noise, etc.).
  • the object identifier of the above-mentioned target object may be a unique identifier assigned to each target object by the system (for example, the execution subject of the training method or the business demander).
  • the object identification may consist of one or more of numbers, letters, or symbols.
  • the object identifiers of two different target objects may be 0011 and 1100, respectively.
  • Each of the multiple training sample groups may include three training samples, four training samples, or some other number of training samples.
  • The key is that each training sample group simultaneously contains a sample pair with the same object identifier and a sample pair with different object identifiers.
  • In one embodiment, the first sample pair includes a first training sample and a second training sample with the same object identifier, and the second sample pair includes the first training sample and a third training sample with different object identifiers.
  • In another embodiment, the first sample pair includes a first training sample and a second training sample with the same object identifier, and the second sample pair includes a third training sample and a fourth training sample with different object identifiers.
  • a batch of training samples may be acquired first, and then the batch of training samples can be divided into the aforementioned multiple training sample groups.
  • Specifically, a sample can be arbitrarily selected from the batch of training samples as an anchor sample (Anchor); then a sample with the same object identifier as the anchor is selected from the other samples as a positive sample (Positive), and a sample with a different object identifier from the anchor is selected as a negative sample (Negative), so that the anchor sample and its corresponding positive and negative samples jointly form a training sample group.
  • The anchor sample and its corresponding positive sample can serve as the aforementioned first sample pair with the same object identifier, and the anchor sample and its corresponding negative sample can serve as the aforementioned second sample pair with different object identifiers. Therefore, by performing this process of selecting anchors and corresponding positive and negative samples multiple times, the multiple training sample groups can be obtained from the batch of training samples.
  • Alternatively, two samples with the same object identifier can be arbitrarily selected from the batch of training samples as one sample pair, and two samples with different object identifiers can be selected from the other training samples as another sample pair.
  • The one sample pair and the other sample pair together form a training sample group. Therefore, by performing this process of selecting two sample pairs multiple times, the multiple training sample groups can be obtained from the batch of training samples; a minimal sketch of the grouping follows below.
  • Through the above process, multiple training sample groups can be obtained. For any first sample group included therein, step S320 is performed: the privacy data corresponding to each training sample in the first sample group is input into the coding model to obtain the corresponding multiple feature vectors. It should be understood that by performing step S320 on every training sample group, the full set of feature vectors corresponding to all training samples in the multiple training sample groups can be obtained.
  • the aforementioned coding model may be implemented by using a neural network.
  • In a specific embodiment, the neural network may include a CNN (Convolutional Neural Network) or a DNN (Deep Neural Network).
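  • For illustration, a minimal PyTorch sketch of such a coding model follows, assuming the privacy data are 64x64 RGB face images; the architecture and dimensions are assumptions, not prescribed by the patent.

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """CNN coding model: maps privacy data (e.g. a face image) to a feature vector."""

    def __init__(self, feature_dim=128):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),    # 64x64 -> 32x32
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),   # 32x32 -> 16x16
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),  # 16x16 -> 8x8
        )
        self.fc = nn.Linear(128 * 8 * 8, feature_dim)

    def forward(self, x):           # x: (batch, 3, 64, 64)
        h = self.conv(x).flatten(1)
        return self.fc(h)           # (batch, feature_dim)
```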
  • step S330, step S340, and step S350 can be performed respectively.
  • In step S330, the first feature vector is input into the classification model used to determine the identity of the target object to obtain a first classification result, and the first classification loss is determined based on the first classification result and the first object identifier.
  • In one embodiment, the classification model can be implemented using algorithms such as neural networks, gradient boosting decision trees, Bayesian classification, and support vector machines.
  • the classification model may be a multi-classification model.
  • the classification model may be multiple binary classification models.
  • a cross-entropy loss function, a hinge loss function, an exponential loss function, etc. may be used to determine the first classification loss.
  • In this way, the first classification loss corresponding to the first training sample can be determined, which means that the classification loss corresponding to each sample in the first sample group, and in turn in the multiple training sample groups, can be determined.
  • Further, the classification losses corresponding to the individual samples are summed, or their expected value is taken, to obtain the classification loss corresponding to the multiple training sample groups.
  • In one embodiment, the cross-entropy loss function in the following formula (1) can be used to determine the classification loss corresponding to the multiple training sample groups.
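  • (The body of formula (1) is not reproduced in this text extract; a standard cross-entropy form consistent with the surrounding description would be:)

$$L_{Classification} = -\frac{1}{N}\sum_{i=1}^{N}\sum_{c} Y_{i,c}\,\log \hat{Y}_{i,c} \qquad (1)$$

where $\hat{Y}_{i,c}$ denotes the probability that the classification model assigns the i-th sample to object class c, and N is the total number of training samples in the multiple training sample groups.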
  • Y represents the corresponding label value, which is determined based on the object identifier of the corresponding training sample.
  • In this way, in step S330, the classification loss corresponding to the multiple training sample groups can be determined.
  • In step S340, the first feature vector is input into the decoding model used to invert privacy data to obtain first inverted data, and the first decoding loss is determined based on the first inverted data and the first privacy data.
  • In one embodiment, the decoding model can be implemented using algorithms such as neural networks, gradient boosting decision trees, Bayesian classification, and support vector machines.
  • loss functions such as MSE (Mean Square Error) and MAE (Mean Absolute Error) may be used to determine the first decoding loss.
  • In this way, the first decoding loss corresponding to the first training sample can be determined, which means that the decoding loss corresponding to each sample in the first sample group, and in turn in the multiple training sample groups, can be determined.
  • Further, the decoding loss corresponding to the multiple training sample groups can be obtained.
  • In one embodiment, the MAE loss function in the following formula (2) can be used to determine the decoding loss corresponding to the multiple training sample groups.
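  • (Formula (2) is likewise missing from this extract; an MAE form consistent with the description would be:)

$$L_{Reconstruction} = \frac{1}{N}\sum_{i=1}^{N}\left\| X_i - \hat{X}_i \right\|_1 \qquad (2)$$

where $X_i$ is the original privacy data of the i-th sample and $\hat{X}_i$ is the corresponding inverted data output by the decoding model.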
  • In this way, in step S340, the decoding loss corresponding to the multiple training sample groups can be determined.
  • In step S350, the multiple feature vectors corresponding to the first sample group determined in step S320 are input into the discrimination model used to distinguish different target objects, to obtain the first sample distance between the samples of the first sample pair and the second sample distance between the samples of the second sample pair, and the first discrimination loss is determined; the first discrimination loss is positively correlated with the first sample distance and negatively correlated with the second sample distance.
  • the above-mentioned discrimination model may be implemented by using a triplet network.
  • In one embodiment, the second sample pair includes the first training sample and a third training sample.
  • Accordingly, this step may include: splicing the feature vectors corresponding to the first training sample, the second training sample and the third training sample in a preset order, and then inputting the spliced vector into the discrimination model to obtain the first sample distance and the second sample distance.
  • The preset order may be any arrangement set for the three types of samples: anchor samples, positive samples and negative samples.
  • For example, the preset order may be: negative sample, anchor sample, positive sample.
  • FIG. 4 shows a schematic diagram of the network structure of a triplet network according to an embodiment.
  • As shown in Fig. 4, the triplet network includes three identical feedforward networks that share parameters (denoted Net in the figure), and X, X+ and X− represent the aforementioned anchor sample, positive sample and negative sample respectively. Sample distance 1 represents the distance between the anchor sample and the negative sample, and sample distance 2 represents the distance between the anchor sample and the positive sample. Further, the above first discrimination loss can be determined using the loss function corresponding to the triplet network.
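  • A minimal PyTorch sketch of such a triplet discrimination model follows, assuming it receives the three feature vectors spliced in the order negative/anchor/positive as described above; the shared feedforward network and its dimensions are illustrative assumptions.

```python
import torch
import torch.nn as nn

class TripletDiscriminator(nn.Module):
    """Discrimination model: three shared-parameter feedforward branches (cf. Fig. 4)."""

    def __init__(self, feature_dim=128, hidden_dim=64):
        super().__init__()
        # One network shared by all three branches, as in the triplet network.
        self.net = nn.Sequential(
            nn.Linear(feature_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim),
        )

    def forward(self, spliced):
        # spliced: (batch, 3 * feature_dim), preset order: negative, anchor, positive.
        neg, anchor, pos = spliced.chunk(3, dim=1)
        e_n, e_a, e_p = self.net(neg), self.net(anchor), self.net(pos)
        first_distance = (e_a - e_p).norm(dim=1)   # anchor-positive (Fig. 4: sample distance 2)
        second_distance = (e_a - e_n).norm(dim=1)  # anchor-negative (Fig. 4: sample distance 1)
        return first_distance, second_distance
```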
  • In another embodiment, the above discrimination model may be implemented using a quadruplet network.
  • the second sample pair includes a third training sample and a fourth training sample.
  • Accordingly, this step may include: splicing the feature vectors corresponding to the first training sample, the second training sample, the third training sample and the fourth training sample in a preset order, and then inputting the spliced vector into the discrimination model to obtain the first sample distance and the second sample distance.
  • In a specific embodiment, the preset order may place the two samples of the sample pair with the same object identifier first (the order of these two samples need not be limited), followed by the two samples of the sample pair with different object identifiers (whose order likewise need not be limited).
  • Further, the above first discrimination loss can be determined using the loss function corresponding to the quadruplet network.
  • In this way, the first discrimination loss corresponding to the first sample group can be determined, which means that the discrimination loss corresponding to each sample group in the multiple training sample groups can be determined.
  • Further, the discrimination loss corresponding to the multiple training sample groups can be obtained.
  • In one embodiment, the triplet loss function in the following formula (3) can be used to determine the discrimination loss corresponding to the multiple training sample groups.
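  • (Formula (3) is not reproduced here; a standard triplet-loss form matching the stated correlations would be:)

$$L_{Recognition} = \sum_{(a,p,n)} \max\bigl(0,\; d(V_a, V_p) - d(V_a, V_n) + \alpha\bigr) \qquad (3)$$

where $d(V_a, V_p)$ is the first sample distance (between the anchor and positive samples of the first sample pair), $d(V_a, V_n)$ is the second sample distance (between the anchor and negative samples of the second sample pair), and $\alpha$ is a margin hyperparameter; the loss is positively correlated with the first sample distance and negatively correlated with the second, as required.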
  • In this way, in step S350, the discrimination loss corresponding to the multiple training sample groups can be determined.
  • Thus, through the above steps S330 to S350, the classification loss, decoding loss and discrimination loss corresponding to the multiple training sample groups can be determined respectively.
  • Next, in step S360, the model parameters in the coding model are adjusted with the goal of maximizing the classification loss and decoding loss corresponding to the multiple training sample groups and minimizing the discrimination loss corresponding to the multiple training sample groups.
  • In one embodiment, a comprehensive loss may be determined based on the classification loss, decoding loss and discrimination loss corresponding to the multiple training sample groups; the model parameters in the coding model are then adjusted based on the comprehensive loss, where the comprehensive loss is negatively correlated with the classification loss and the decoding loss, and positively correlated with the discrimination loss.
  • the following formula (4) can be used to determine the comprehensive loss:
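  • (The body of formula (4) is missing from this extract; a form consistent with the stated correlations would be:)

$$L_{total} = L_{Recognition} - L_{Classification} - L_{Reconstruction} \qquad (4)$$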
  • Here $L_{Recognition}$, $L_{Classification}$ and $L_{Reconstruction}$ respectively represent the discrimination loss, classification loss and decoding loss corresponding to the multiple training sample groups.
  • weight parameters can also be assigned to the classification loss, decoding loss, and discrimination loss to determine the comprehensive loss, as shown in the following formula (5):
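  • (The body of formula (5) is likewise missing; a weighted form consistent with formula (4) and the following description would be:)

$$L_{total} = \lambda_1 L_{Recognition} - \lambda_2 L_{Classification} - \lambda_3 L_{Reconstruction} \qquad (5)$$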
  • In formula (5), $\lambda_1$, $\lambda_2$ and $\lambda_3$ are weight parameters and are hyperparameters.
  • In one example, the values of $\lambda_1$, $\lambda_2$ and $\lambda_3$ may be 0.5, 0.25 and 0.25, respectively.
  • It should be noted that the classification model and the decoding model can be regarded as simulated attacker models. Adjusting the model parameters in the coding model with the goals of minimizing the discrimination loss corresponding to the multiple training sample groups and maximizing the classification loss and decoding loss corresponding to the multiple training sample groups makes the coding vectors highly discriminative (ensuring the accuracy and effectiveness of subsequent identity recognition) while effectively resisting such attacks:
  • the coding vectors become irreversible, meaning it is difficult for criminals to invert or restore the original privacy data from a coding vector;
  • and the coding vectors are confused, meaning it is difficult for criminals to classify a coding vector or use it to determine the identity of the target object.
  • In one embodiment, after determining the first discrimination loss, the training method may further include: adjusting the parameters in the classification model with the goal of minimizing the classification loss corresponding to the multiple training sample groups; and/or adjusting the parameters in the decoding model with the goal of minimizing the decoding loss corresponding to the multiple training sample groups; and/or adjusting the parameters in the discrimination model with the goal of minimizing the discrimination loss corresponding to the multiple training sample groups. In this way, by introducing adversarial learning, the performance of the coding model can be further improved.
  • It should be understood that the above training method requires multiple rounds of iterative training.
  • Each round may include several training iterations of the three models (the classification model, the decoding model and the discrimination model) followed by a training iteration of the coding model.
  • More specifically, in the first round of training, the coding model can be fixed first, and several batches of training samples drawn in sequence to optimize the parameters in the classification model, decoding model and discrimination model; then, based on the classification model, decoding model and discrimination model optimized in this round, a further batch of training samples is used to optimize the parameters in the coding model. In this way, after multiple rounds of iterative training, a finally converged coding model can be obtained for subsequent identity recognition of the target object, as sketched below.
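  • To make the alternating schedule concrete, the following is a minimal PyTorch sketch of one training round, building on the encoder and discriminator sketches above; the optimizer setup (opt_adv over the classification/decoding/discrimination models, opt_enc over the coding model), the batch format and the hyperparameters are illustrative assumptions, not the patent's reference implementation.

```python
import torch
import torch.nn.functional as F

def train_round(encoder, classifier, decoder, discriminator,
                opt_adv, opt_enc, batches_adv, batch_enc,
                lambdas=(0.5, 0.25, 0.25), margin=1.0):
    """One round: first tune classifier/decoder/discriminator with the encoder
    fixed (opt_adv holds only their parameters), then tune the encoder."""
    l1, l2, l3 = lambdas

    def losses(anchor_x, pos_x, neg_x, anchor_y):
        v_a, v_p, v_n = encoder(anchor_x), encoder(pos_x), encoder(neg_x)
        cls_loss = F.cross_entropy(classifier(v_a), anchor_y)        # classification loss
        dec_loss = F.l1_loss(decoder(v_a), anchor_x)                 # decoding loss (MAE)
        d_pos, d_neg = discriminator(torch.cat([v_n, v_a, v_p], 1))  # sample distances
        disc_loss = F.relu(d_pos - d_neg + margin).mean()            # discrimination loss
        return cls_loss, dec_loss, disc_loss

    # Phase 1: encoder fixed; each auxiliary model minimizes its own loss.
    for anchor_x, pos_x, neg_x, anchor_y in batches_adv:
        cls_loss, dec_loss, disc_loss = losses(anchor_x, pos_x, neg_x, anchor_y)
        opt_adv.zero_grad()
        (cls_loss + dec_loss + disc_loss).backward()
        opt_adv.step()

    # Phase 2: auxiliary models fixed; the encoder minimizes the comprehensive
    # loss, i.e. minimizes the discrimination loss while maximizing the
    # classification and decoding losses (cf. formula (5)).
    anchor_x, pos_x, neg_x, anchor_y = batch_enc
    cls_loss, dec_loss, disc_loss = losses(anchor_x, pos_x, neg_x, anchor_y)
    total = l1 * disc_loss - l2 * cls_loss - l3 * dec_loss
    opt_enc.zero_grad()
    total.backward()
    opt_enc.step()
```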
  • The above describes the training method of the coding model.
  • Next, the target object identity recognition method implemented based on the trained coding model is introduced.
  • FIG. 5 shows an interaction diagram of a target object identity recognition method for preventing privacy data leakage according to an embodiment, where the interacting parties include a terminal and a server.
  • the terminal may include a smart phone, a tablet computer, a wearable device, a scanning device, and so on.
  • the server may be a cloud server, and the server may call data records stored in the cloud database.
  • the method includes step S510 to step S550.
  • Step S510: the terminal collects second privacy data.
  • the target object of the identity recognition is the user, and accordingly, the second private data can be collected in response to the collection instruction issued by the user.
  • the face data and mobile phone number may be collected in response to a face-scanning payment instruction issued by the user.
  • In another embodiment, the target object of identity recognition is a device. Accordingly, the identity information of the terminal, such as the IMEI, SIM card number, and sensor information, can be collected from the terminal periodically based on user authorization.
  • In this way, the second privacy data can be collected.
  • In step S520, the terminal inputs the second privacy data into the coding model obtained by the above training method to obtain a second feature vector.
  • In step S530, the terminal sends the second feature vector to the server.
  • It should be noted that a coding model obtained by the above training method is deployed in the terminal. Based on this, the terminal can use the coding model to encode the collected second privacy data to obtain the corresponding second feature vector. In this way, by transmitting, storing and using only the second feature vector, leakage of the privacy data can be effectively prevented.
  • In addition, after the terminal obtains the second feature vector, the collected second privacy data can be deleted to prevent leakage of the privacy data.
  • In step S540, the server compares the second feature vector with multiple feature vectors corresponding to multiple target objects pre-stored in the server to obtain a comparison result, which is used to determine whether identity recognition of the target object corresponding to the second privacy data succeeds.
  • the multiple feature vectors are obtained by inputting multiple pieces of historical privacy data of the multiple target objects into the coding model.
  • In one embodiment, comparing the second feature vector with the multiple feature vectors to obtain a comparison result may include: first calculating the similarity between the second feature vector and each of the multiple feature vectors, and determining the maximum value among them; then, in a specific embodiment, when the maximum value is greater than a preset threshold, determining that identity recognition of the target object corresponding to the second privacy data succeeds, as the comparison result; in another specific embodiment, when the maximum value is not greater than the preset threshold, determining that the identity recognition fails, as the comparison result.
  • It should be noted that the preset threshold can be set according to practical experience and different business needs; for example, it may be set to 0.99 in a payment scenario, 0.90 in an unlocking scenario, and 0.80 in a scenario of establishing a mapping relationship between users and devices. A sketch of this comparison step follows below.
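  • As an illustration of this comparison, the sketch below scores the received feature vector against the pre-stored vectors with cosine similarity and applies a scenario-specific threshold; the choice of cosine similarity and all names are assumptions, since the text does not fix a similarity measure.

```python
import numpy as np

def compare(second_vector, stored_vectors, stored_ids, threshold=0.99):
    """Compare a received feature vector against pre-stored feature vectors.

    stored_vectors: (n, d) array of feature vectors for n target objects.
    Returns (matched_object_id, max_similarity); matched_object_id is None
    when the maximum similarity does not exceed the preset threshold,
    i.e. identity recognition fails.
    """
    v = second_vector / np.linalg.norm(second_vector)
    m = stored_vectors / np.linalg.norm(stored_vectors, axis=1, keepdims=True)
    sims = m @ v                      # cosine similarity to each stored vector
    best = int(np.argmax(sims))       # index of the maximum similarity
    if sims[best] > threshold:
        return stored_ids[best], float(sims[best])
    return None, float(sims[best])
```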
  • Further, in a payment scenario, based on the feature vector corresponding to the maximum value and a pre-stored mapping relationship between the multiple feature vectors and multiple pieces of user information (including payment accounts, etc.), the payment account corresponding to that feature vector can be obtained and the deduction for the current order completed.
  • the identification method may further include step S550, sending the comparison result to the terminal.
  • the comparison result including the success or failure of the above identification may be sent to the terminal.
  • the above-mentioned maximum value may also be sent to the terminal.
  • In other words, the server may, after determining the maximum value, send it as the comparison result to the terminal without judging it; after receiving the maximum value, the terminal determines whether it is greater than the preset threshold and thereby whether the identity recognition succeeds.
  • In summary, with the above identity recognition method, the privacy data is encoded into a feature vector, and only the feature vector is transmitted, stored and compared, which ensures the accuracy and validity of the identity recognition results while effectively preventing leakage of the privacy data.
  • Moreover, sending the feature vector to the cloud for comparison, instead of comparing directly on the terminal, means the comparison range is not limited by the storage resources of the terminal.
  • Corresponding to the above training method and recognition methods, the embodiments of this specification also disclose a training device and recognition devices, as follows.
  • As shown in Fig. 6, the training device 600 may include: a sample obtaining unit 610 configured to obtain multiple training sample groups, including an arbitrary first sample group, the first sample group including a first sample pair and a second sample pair; the first sample pair includes a first training sample and a second training sample, where the first training sample includes first privacy data characterizing the identity information of a first target object and a first object identifier;
  • the second training sample has the first object identifier, and the two samples of the second sample pair have different object identifiers.
  • the coding unit 620 is configured to input the privacy data corresponding to each training sample in the first sample group into the coding model to obtain a plurality of corresponding feature vectors, including the first feature vector corresponding to the first training sample .
  • the classification unit 630 is configured to input the first feature vector into a classification model used to determine the identity of the target object to obtain a first classification result, and determine a first classification loss based on the first classification result and the first object identifier .
  • The decoding unit 640 is configured to input the first feature vector into a decoding model used to invert privacy data to obtain first inverted data, and to determine a first decoding loss based on the first inverted data and the first privacy data.
  • The distinguishing unit 650 is configured to input the feature vectors corresponding to the training samples in the first sample group into a discrimination model used to distinguish different target objects, to obtain a first sample distance between the samples of the first sample pair and a second sample distance between the samples of the second sample pair, and to determine a first discrimination loss, which is positively correlated with the first sample distance and negatively correlated with the second sample distance.
  • The coding model tuning unit 660 is configured to adjust the model parameters in the coding model with the goal of maximizing the classification loss and decoding loss corresponding to the multiple training sample groups and minimizing the discrimination loss corresponding to the multiple training sample groups.
  • the target object includes a user
  • the identity information includes one or more of the following: a face image, a fingerprint image, and an iris image.
  • the target object includes a device
  • the identity information includes one or more of the following: an International Mobile Equipment Identity (IMEI), a SIM (subscriber identity module) card number, and device sensor information.
  • In one embodiment, the second sample pair includes the first training sample and a third training sample; the distinguishing unit 650 is specifically configured to: splice the feature vectors corresponding to the first training sample, the second training sample and the third training sample in a preset order, and then input them into the discrimination model to obtain the first sample distance and the second sample distance.
  • In another embodiment, the second sample pair includes a third training sample and a fourth training sample; the distinguishing unit 650 is specifically configured to: splice the feature vectors corresponding to the first training sample, the second training sample, the third training sample and the fourth training sample in a preset order, and then input them into the discrimination model to obtain the first sample distance and the second sample distance.
  • In one embodiment, the device 600 further includes: a classification model tuning unit 670 configured to adjust the parameters in the classification model with the goal of minimizing the classification loss corresponding to the multiple training sample groups; and/or
  • a decoding model tuning unit 680 configured to adjust the parameters in the decoding model with the goal of minimizing the decoding loss corresponding to the multiple training sample groups; and/or a distinguishing model tuning unit 690 configured to adjust the parameters in the discrimination model with the goal of minimizing the discrimination loss corresponding to the multiple training sample groups.
  • In one embodiment, the coding model tuning unit 660 is specifically configured to: perform a weighted summation of the classification loss, decoding loss and discrimination loss based on preset weight parameters to obtain a comprehensive loss, which is negatively correlated with the classification loss and decoding loss and positively correlated with the discrimination loss; and adjust the model parameters in the coding model based on the comprehensive loss.
  • Fig. 7 shows a structural diagram of an identity recognition device for preventing privacy data leakage according to an embodiment, and the device is integrated in a server.
  • As shown in Fig. 7, the recognition device 700 includes: a vector receiving unit 710 configured to receive a second feature vector from the terminal, the second feature vector being determined by the terminal inputting the collected second privacy data into the coding model, where the coding model is pre-trained based on the device shown in Fig. 6.
  • The vector comparison unit 720 is configured to compare the second feature vector with multiple feature vectors corresponding to multiple target objects pre-stored in the server to obtain a comparison result, which is used to determine whether identity recognition of the target object corresponding to the second privacy data succeeds; the multiple feature vectors are obtained by inputting multiple pieces of historical privacy data of the multiple target objects into the coding model.
  • the identification device 700 further includes: a result sending unit 730 configured to send the comparison result to the terminal.
  • In one embodiment, the vector comparison unit 720 is specifically configured to: calculate the similarity between the second feature vector and each of the multiple feature vectors, and determine the maximum value among them; and, when the maximum value is greater than a preset threshold, determine that identity recognition of the target object corresponding to the second privacy data succeeds, as the comparison result.
  • In another embodiment, the vector comparison unit 720 is specifically configured to calculate the similarity between the second feature vector and each of the multiple feature vectors and determine the maximum value among them as the comparison result; the result sending unit 730 is specifically configured to send the maximum value to the terminal, so that the terminal can determine, according to the maximum value and a preset threshold, whether identity recognition of the target object corresponding to the second privacy data succeeds.
  • Fig. 8 shows a structural diagram of an identity recognition device for preventing privacy data leakage according to another embodiment, and the device is integrated in a terminal.
  • As shown in Fig. 8, the recognition device 800 includes: a data collection unit 810 configured to collect second privacy data; a coding unit 820 configured to input the second privacy data into a coding model to obtain a second feature vector, the coding model being pre-trained based on the device shown in Fig. 6; and a vector sending unit 830 configured to send the second feature vector to a server, so that the server compares the second feature vector with multiple feature vectors corresponding to multiple target objects pre-stored in the server to obtain a comparison result, which is used to determine whether identity recognition of the target object corresponding to the second privacy data succeeds.
  • In one embodiment, the comparison result includes the maximum value of the similarities between the second feature vector and the individual feature vectors of the multiple feature vectors.
  • In this case, the device 800 further includes: a result receiving unit 840 configured to receive the maximum value from the server; and a determining unit 850 configured to determine that identity recognition of the target object corresponding to the second privacy data succeeds when the maximum value is greater than a preset threshold.
  • There is also provided a computer-readable storage medium having a computer program stored thereon; when the computer program is executed in a computer, the computer is caused to execute the method described in conjunction with Fig. 3 or Fig. 5.
  • There is also provided a computing device including a memory and a processor, the memory storing executable code; when the processor executes the executable code, the method described in conjunction with Fig. 3 or Fig. 5 is implemented.
  • the functions described in the present invention can be implemented by hardware, software, firmware, or any combination thereof.
  • When implemented in software, these functions can be stored in a computer-readable medium or transmitted as one or more instructions or code on the computer-readable medium.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Bioethics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Mathematical Physics (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Computer Hardware Design (AREA)
  • Computer Security & Cryptography (AREA)
  • Image Analysis (AREA)
  • Collating Specific Patterns (AREA)

Abstract

A coding model training method is provided, the method comprising: first, obtaining multiple training samples, where each training sample includes privacy data characterizing the identity information of a corresponding target object and an object identifier; then, inputting the multiple training samples into a coding model respectively to obtain multiple feature vectors; next, inputting the multiple feature vectors respectively into a classification model for determining the identity of the target object, a decoding model for inverting the privacy data, and a discrimination model for distinguishing different target objects, so as to correspondingly determine a classification loss, a decoding loss and a discrimination loss; and then adjusting the parameters of the coding model with the goal of maximizing the classification loss and the decoding loss and minimizing the discrimination loss. In addition, a target object identity recognition method is provided, which uses the trained coding model to encode collected privacy data, and the resulting feature vectors are transmitted, stored and used for comparison. In this way, leakage of privacy data can be effectively prevented.

Description

Coding model training method and apparatus for preventing privacy data leakage
Technical Field
One or more embodiments of this specification relate to the technical field of applying machine learning to data security, and in particular to a coding model training method and apparatus for preventing privacy data leakage, and a target object identity recognition method for preventing privacy data leakage.
Background
In many scenarios, privacy data of a target object (such as a user or a device) is collected to identify or verify the identity of the target object. For example, in a face payment scenario, the user's face information can be collected to identify the user's identity (such as the user ID in the payment system), so that the corresponding payment account can be found based on that identity and the payment of the corresponding order can be completed. For another example, in a user behavior analysis scenario, sensor data generated by a terminal device during use can be collected to identify the identity of the device (such as the device ID assigned to the device by the data analysis system), so as to establish a mapping relationship between the user and the device. Obviously, these scenarios place high requirements on the accuracy of identity recognition.
However, the collection, transmission, storage and use of privacy data involved in the above identity recognition processes all carry a risk of privacy data leakage. Therefore, a reasonable and reliable solution is urgently needed that can effectively reduce the risk of privacy data leakage while ensuring the accuracy of identity recognition of the target object.
Summary
One or more embodiments of this specification describe a coding model training method and apparatus for preventing privacy data leakage, and a target object identity recognition method and apparatus for preventing privacy data leakage, which can effectively reduce the risk of privacy data leakage while ensuring the accuracy of identity recognition of the target object.
According to a first aspect, a coding model training method for preventing privacy data leakage is provided, the method including: obtaining multiple training sample groups, including an arbitrary first sample group, the first sample group including a first sample pair and a second sample pair, the first sample pair including a first training sample and a second training sample, where the first training sample includes first privacy data characterizing the identity information of a first target object and a first object identifier; the second training sample has the first object identifier, and the two samples of the second sample pair have different object identifiers; inputting the privacy data corresponding to each training sample in the first sample group into the coding model respectively to obtain multiple corresponding feature vectors, including a first feature vector corresponding to the first training sample; inputting the first feature vector into a classification model used to determine the identity of the target object to obtain a first classification result, and determining a first classification loss based on the first classification result and the first object identifier; inputting the first feature vector into a decoding model used to invert privacy data to obtain first inverted data, and determining a first decoding loss based on the first inverted data and the first privacy data; inputting the feature vectors corresponding to the training samples in the first sample group into a discrimination model used to distinguish different target objects to obtain a first sample distance between the samples of the first sample pair and a second sample distance between the samples of the second sample pair, and determining a first discrimination loss, where the first discrimination loss is positively correlated with the first sample distance and negatively correlated with the second sample distance; and adjusting the model parameters in the coding model with the goal of maximizing the classification loss and decoding loss corresponding to the multiple training sample groups and minimizing the discrimination loss corresponding to the multiple training sample groups.
In one embodiment, the target object includes a user, and the identity information includes one or more of the following: a face image, a fingerprint image, an iris image.
In one embodiment, the target object includes a device, and the identity information includes one or more of the following: an International Mobile Equipment Identity (IMEI), a SIM card number, device sensor information.
In one embodiment, the second sample pair includes the first training sample and a third training sample; inputting the feature vectors corresponding to the training samples in the first sample group into the discrimination model used to distinguish different target objects, to obtain the first sample distance between the samples of the first sample pair and the second sample distance between the samples of the second sample pair, includes: splicing the feature vectors corresponding to the first training sample, the second training sample and the third training sample in a preset order, and then inputting them into the discrimination model to obtain the first sample distance and the second sample distance.
In one embodiment, the second sample pair includes a third training sample and a fourth training sample; inputting the feature vectors corresponding to the training samples in the first sample group into the discrimination model used to distinguish different target objects, to obtain the first sample distance between the samples of the first sample pair and the second sample distance between the samples of the second sample pair, includes: splicing the feature vectors corresponding to the first training sample, the second training sample, the third training sample and the fourth training sample in a preset order, and then inputting them into the discrimination model to obtain the first sample distance and the second sample distance.
In one embodiment, after determining the first discrimination loss, the method further includes: adjusting the parameters in the classification model with the goal of minimizing the classification loss corresponding to the multiple training sample groups; and/or adjusting the parameters in the decoding model with the goal of minimizing the decoding loss corresponding to the multiple training sample groups; and/or adjusting the parameters in the discrimination model with the goal of minimizing the discrimination loss corresponding to the multiple training sample groups.
In one embodiment, adjusting the model parameters in the coding model with the goal of maximizing the classification loss and decoding loss corresponding to the multiple training samples and minimizing the discrimination loss corresponding to the multiple training samples includes: performing a weighted summation of the classification loss, decoding loss and discrimination loss based on preset weight parameters for these losses to obtain a comprehensive loss, the comprehensive loss being negatively correlated with the classification loss and the decoding loss and positively correlated with the discrimination loss; and adjusting the model parameters in the coding model based on the comprehensive loss.
根据第二方面,提供一种防止隐私数据泄漏的目标对象身份识别方法,所述方法的执行主体为服务器,所述识别方法包括:从终端接收第二特征向量,所述第二特征向量由所述终端将采集的第二隐私数据输入编码模型而确定;其中所述编码模型基于上述第一方面所述的方法而预先训练得到;将所述第二特征向量与所述服务器中预先存储的对应于多个目标对象的多个特征向量进行比对,得到比对结果,用于判别针对所述第二隐私数据所对应目标对象的身份识别是否成功;其中所述多个特征向量通过将所述多个目标对象的多条历史隐私数据输入所述编码模型而得到。
根据第三方面,提供一种防止隐私数据泄漏的目标对象识别方法,所述方法的执行主体为终端,所述识别方法包括:采集第二隐私数据;将所述第二隐私数据输入编码模型,得到第二特征向量,所述编码模型基于第一方面所述的方法而预先训练得到;将所述第二特征向量发送至服务器,以使所述服务器将所述第二特征向量与所述服务器中预先存储的对应于多个目标对象的多个特征向量进行比对,得到比对结果,用于判别针对所述第二隐私数据所对应目标对象的身份识别是否成功。
根据第四方面,提供一种防止隐私数据泄漏的编码模型训练装置,包括:样本获取单元,配置为获取多个训练样本组,包括任意的第一样本组,所述第一样本组包括第一样本对和第二样本对,所述第一样本对包括第一训练样本和第二训练样本,其中第一训练样本包括表征第一目标对象身份信息的第一隐私数据和第一对象标识;所述第二训练样本具有所述第一对象标识,所述第二样本对的两个样本具有不同对象标识;编码单元,配置为将所述第一样本组中各训练样本对应的隐私数据分别输入编码模型,得到对应的多个特征向量,其中包括对应于所述第一训练样本的第一特征向量;分类单元,配置为将所述第一特征向量输入用于确定目标对象身份的分类模型,得到第一分类结果,基于所述第一分类结果和所述第一对象标识,确定第一分类损失;解码单元,配置为将所述第一特征向量输入用于反推隐私数据的解码模型,得到第一反推数据,基于所述第一反推数据和所述第一隐私数据,确定第一解码损失;区分单元,配置为将所述第一样本组 中各训练样本对应的特征向量输入用于区分不同目标对象的区分模型,得到所述第一样本对中样本之间的第一样本距离,以及第二样本对中样本之间的第二样本距离,并且,确定第一区分损失,所述第一区分损失与所述第一样本距离正相关,且与所述第二样本距离负相关;编码模型调参单元,配置为以最大化所述多个训练样本组对应的分类损失和解码损失,以及最小化所述多个训练样本对应的区分损失为目标,调整所述编码模型中的模型参数。
According to a fifth aspect, a target object identity recognition device for preventing privacy data leakage is provided, the device being integrated in a server and comprising: a vector receiving unit configured to receive a second feature vector from a terminal, the second feature vector being determined by the terminal by inputting collected second private data into a coding model, wherein the coding model is trained in advance based on the device of the fourth aspect; and a vector comparison unit configured to compare the second feature vector with multiple feature vectors corresponding to multiple target objects pre-stored in the server to obtain a comparison result used to determine whether the identity recognition of the target object corresponding to the second private data succeeds, wherein the multiple feature vectors are obtained by inputting multiple pieces of historical private data of the multiple target objects into the coding model.
According to a sixth aspect, a target object recognition device for preventing privacy data leakage is provided, the device being integrated in a terminal and comprising: a data collection unit configured to collect second private data; an encoding unit configured to input the second private data into a coding model to obtain a second feature vector, the coding model being trained in advance based on the device of the fourth aspect; and a vector sending unit configured to send the second feature vector to a server, so that the server compares the second feature vector with multiple feature vectors corresponding to multiple target objects pre-stored in the server to obtain a comparison result used to determine whether the identity recognition of the target object corresponding to the second private data succeeds.
According to a seventh aspect, a computer-readable storage medium is provided, on which a computer program is stored; when the computer program is executed in a computer, the computer is caused to perform the method of the first, second, or third aspect.
According to an eighth aspect, a computing device is provided, comprising a memory and a processor, the memory storing executable code; when the processor executes the executable code, the method of the first, second, or third aspect is implemented.
In summary, in the training method and device disclosed in the embodiments of this specification, the model parameters of the coding model are adjusted with the goal of minimizing the discrimination loss and maximizing the classification loss and decoding loss. This gives the encoded vectors high discriminability (ensuring the accuracy and effectiveness of subsequent identity recognition), while making the encoded vectors, on the one hand, irreversible, so that it is difficult for an attacker to reconstruct or restore the original private data from an encoded vector, and on the other hand, obfuscated, so that it is difficult for an attacker to classify an encoded vector, that is, to determine the identity of the target object from it.
In addition, in the identity recognition method and device disclosed in the embodiments of this specification, the coding model obtained by the above training method encodes private data into feature vectors, and the feature vectors are what is transmitted, stored, and compared; this ensures the accuracy and effectiveness of the identity recognition results. At the same time, even if a feature vector is leaked, its irreversibility and obfuscation make it difficult for an attacker to obtain usable information from it, thereby effectively preventing the leakage of private data. Moreover, sending feature vectors to the cloud for comparison, rather than comparing directly on the terminal, means the comparison scope is not limited by the terminal's storage resources.
Brief Description of the Drawings
To explain the technical solutions of the embodiments of the present invention more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention; for those of ordinary skill in the art, other drawings can be obtained from these drawings without creative effort.
Fig. 1 shows an implementation block diagram of a coding model training method for preventing privacy data leakage according to one embodiment;
Fig. 2 shows an implementation block diagram of a target object recognition method for preventing privacy data leakage according to one embodiment;
Fig. 3 shows a flowchart of a coding model training method for preventing privacy data leakage according to one embodiment;
Fig. 4 shows a schematic diagram of the network structure of a triplet network according to one embodiment;
Fig. 5 shows an interaction diagram of a target object recognition method for preventing privacy data leakage according to one embodiment;
Fig. 6 shows a structural diagram of a coding model training device for preventing privacy data leakage according to one embodiment;
Fig. 7 shows a structural diagram of an identity recognition device for preventing privacy data leakage according to one embodiment;
Fig. 8 shows a structural diagram of an identity recognition device for preventing privacy data leakage according to another embodiment.
Detailed Description of Embodiments
The solutions provided in this specification are described below with reference to the drawings.
As mentioned above, risks of privacy data leakage exist in the collection, transmission, storage, and use of private data. Currently, in one scheme, the private data of a target object can be encrypted after collection, and the encrypted data is then transmitted and stored, so that data leaked during transmission and storage is unusable to an attacker. However, during use the encrypted data must be decrypted to restore the private data, so the risk of leakage remains; moreover, if the key is leaked or cracked, the private data will also be leaked. In another scheme, noise (such as a watermark) can be added to the collected private data to reduce its recognizability, and the noisy private data is then transmitted, stored, and used. However, such a method can hardly satisfy both requirements at once: low recognizability of the private data and high accuracy of target object identity recognition. In yet another scheme, the collection and computation of private data can be completed on the device or edge side and only the decision result returned, without transmitting or storing the collected private data. However, due to the limitations of on-device storage and network resources, the sample library that can be compared on the device is limited in size and cannot be updated in real time, so the success rate and coverage of identity recognition are very limited.
Based on the above observations and analysis, the inventors propose to introduce the idea of adversarial learning to design a coding model training method for preventing privacy data leakage, and a target object identity recognition method for preventing privacy data leakage implemented based on the coding model. With the training method and the recognition method, the risk of privacy data leakage can be effectively reduced while ensuring the accuracy of identity recognition for target objects.
Specifically, Fig. 1 shows an implementation block diagram of a coding model training method for preventing privacy data leakage according to one embodiment. In one embodiment, as shown in Fig. 1, first, a batch of training samples is drawn, each training sample including the private data (X) and the object identifier (Y) of a corresponding target object; next, this batch of training samples is input into the coding model to obtain a corresponding batch of feature vectors (Vx); then, these feature vectors are input into a classification model for determining target object identity, a decoding model for reconstructing private data, and a discrimination model for distinguishing different target objects, so as to determine the classification loss, decoding loss, and discrimination loss corresponding to this batch of training samples; after that, with the model parameters of the coding model fixed, the model parameters of the classification model, decoding model, and discrimination model are adjusted with the goals of minimizing the classification loss, decoding loss, and discrimination loss, respectively. Further, in one specific embodiment, another batch of training samples is drawn and the above process is repeated to obtain the classification loss, decoding loss, and discrimination loss corresponding to that batch; then, with the adjusted parameters of the classification, decoding, and discrimination models fixed, the parameters of the coding model are adjusted with the goal of maximizing the classification loss and decoding loss of that batch and minimizing the corresponding discrimination loss. Iterating in this way yields the final trained coding model. The feature vectors produced by this coding model discriminate well between different target objects; at the same time, it is difficult for an attacker to restore usable private data from a leaked feature vector, or to determine the identity of the target object from it, thereby effectively preventing the leakage of private data.
Further, using the final trained coding model, leakage of private data can be effectively prevented during target object identity recognition. Fig. 2 shows an implementation block diagram of a target object recognition method for preventing privacy data leakage according to one embodiment. In one embodiment, as shown in Fig. 2, first, the terminal collects private data (such as a user's face image), then encodes the private data using the coding model deployed in the terminal to obtain a corresponding feature vector; the terminal then sends the feature vector to a cloud server; next, the server compares the received feature vector with the stored feature vectors corresponding to multiple target objects and returns the comparison result to the terminal; the terminal then determines the final identity recognition result based on the comparison result. In this way, what is transmitted, stored, and used during identity recognition is always the feature vector output by the coding model, which effectively prevents the leakage of private data.
The implementation steps of the above methods are described below with reference to specific embodiments.
Specifically, Fig. 3 shows a flowchart of a coding model training method for preventing privacy data leakage according to one embodiment; the method may be executed by any apparatus, device, platform, or device cluster with computing and processing capabilities. As shown in Fig. 3, the method includes the following steps. Step S310: obtain multiple training sample groups, including an arbitrary first sample group, the first sample group comprising a first sample pair and a second sample pair, the first sample pair comprising a first training sample and a second training sample, wherein the first training sample includes first private data characterizing the identity information of a first target object and a first object identifier; the second training sample has the first object identifier, and the two samples of the second sample pair have different object identifiers. Step S320: input the private data corresponding to each training sample in the first sample group into the coding model to obtain corresponding feature vectors, including a first feature vector corresponding to the first training sample. Step S330: input the first feature vector into a classification model for determining target object identity to obtain a first classification result, and determine a first classification loss based on the first classification result and the first object identifier. Step S340: input the first feature vector into a decoding model for reconstructing private data to obtain first reconstructed data, and determine a first decoding loss based on the first reconstructed data and the first private data. Step S350: input the feature vectors corresponding to the training samples in the first sample group into a discrimination model for distinguishing different target objects to obtain a first sample distance between the samples of the first sample pair and a second sample distance between the samples of the second sample pair, and determine a first discrimination loss, which is positively correlated with the first sample distance and negatively correlated with the second sample distance. Step S360: adjust the model parameters of the coding model with the goal of maximizing the classification losses and decoding losses corresponding to the multiple training sample groups and minimizing the discrimination losses corresponding to the multiple training sample groups.
It should first be noted that the "first" in the above first sample group, first sample pair, first target object, first object identifier, first feature vector, and so on, and the "second" in the second sample pair, second training sample, and so on, are used only to distinguish things of the same kind and have no other limiting effect.
The above steps are detailed as follows.
First, in step S310, multiple training sample groups are obtained.
In one embodiment, the target objects involved in the multiple training sample groups may include users; accordingly, in one specific embodiment, the identity information of a target object may include the user's biometric information, such as face images, fingerprint images, and iris images. In another specific embodiment, the identity information of a target object may also include the user's mobile phone number, ID card number, and the like.
In another embodiment, the target objects involved in the multiple training sample groups may include animals, such as horses, cats, dogs, and pigs; accordingly, the identity information of a target object may include the animal's biometric information. In one specific embodiment, the animal's biometric information may include the animal's facial image, whole-body image, paw prints, and so on. In yet another embodiment, the target objects involved in the multiple training sample groups may include devices; accordingly, the identity information of a target object may include identification information of components in the device and device sensor information. In one specific embodiment, the component identification information may include the IMEI (International Mobile Equipment Identity) and the card number of the SIM (Subscriber Identity Module). In one specific embodiment, the device sensor information may include basic circuit data of the device sensors (such as sensor current and voltage) and usage-state data collected by the device sensors (such as device acceleration and camera noise).
In one embodiment, the object identifier of a target object may be a unique identifier assigned to each target object by a system (such as the executing party of the training method or the business demander). In one specific embodiment, the object identifier may consist of one or more of digits, letters, or symbols. For example, the object identifiers of two different target objects may be 0011 and 1100, respectively.
In one embodiment, each of the multiple training sample groups may include three training samples, four training samples, or some other number of training samples; the key point is that each training sample group contains both a sample pair with the same object identifier and a sample pair with different object identifiers. Further, an arbitrary first sample group among the multiple training sample groups includes a first sample pair and a second sample pair. In one specific embodiment, the first sample pair includes a first training sample and a second training sample with the same object identifier, and the second sample pair includes the first training sample and a third training sample with different object identifiers. In another specific embodiment, the first sample pair includes a first training sample and a second training sample with the same object identifier, and the second sample pair includes a third training sample and a fourth training sample with different object identifiers.
On the other hand, in one embodiment, a batch of training samples can first be obtained and then divided into the multiple training sample groups. In one specific embodiment, a sample can be arbitrarily selected from the batch as an anchor sample; then a sample with the same object identifier as the anchor is selected from the remaining samples as a positive sample, and a sample with a different object identifier is selected as a negative sample, so that the anchor together with its corresponding positive and negative samples forms a training sample group. It should be understood that the anchor and its positive sample can serve as the above first sample pair with the same object identifier, and the anchor and its negative sample can serve as the above second sample pair with different object identifiers. By performing this selection of anchors and corresponding positive and negative samples multiple times, the multiple training sample groups can be obtained from the batch of training samples.
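The following is a minimal Python sketch of the anchor/positive/negative selection just described. It is an illustration only, not part of the patent text: the function name build_triplet_groups and the (private_data, object_id) sample layout are assumptions.

    import random
    from collections import defaultdict

    def build_triplet_groups(samples, num_groups):
        # samples: list of (private_data, object_id) pairs.
        # Returns num_groups (anchor, positive, negative) training sample groups.
        by_id = defaultdict(list)
        for data, obj_id in samples:
            by_id[obj_id].append(data)
        # Anchors must come from identities with at least two samples.
        anchor_ids = [i for i, items in by_id.items() if len(items) >= 2]
        groups = []
        for _ in range(num_groups):
            pos_id = random.choice(anchor_ids)
            neg_id = random.choice([i for i in by_id if i != pos_id])
            anchor, positive = random.sample(by_id[pos_id], 2)  # same identifier
            negative = random.choice(by_id[neg_id])             # different identifier
            groups.append((anchor, positive, negative))
        return groups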
In another specific embodiment, two samples with the same object identifier can be arbitrarily selected from the batch of training samples as one sample pair, and two samples with different object identifiers can be selected from the other training samples as another sample pair, so that the two sample pairs form a training sample group. By performing this selection of two sample pairs multiple times, the multiple training sample groups can be obtained from the batch of training samples.
In the above way, multiple training sample groups can be obtained, and for an arbitrary first sample group among them, step S320 is performed: the private data corresponding to each training sample in the first sample group is input into the coding model to obtain corresponding feature vectors. It should be understood that performing step S320 on every one of the multiple training sample groups yields the full set of feature vectors corresponding to the full set of training samples in the multiple training sample groups.
In one embodiment, the coding model may be implemented with a neural network. In one specific embodiment, the neural network may include a CNN (Convolutional Neural Network) or a DNN (Deep Neural Network).
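For illustration, the following is a minimal PyTorch sketch of such an encoding model. The patent only states that a CNN or DNN may be used, so the concrete layers, the 3-channel image input, and the 128-dimensional output are assumptions:

    import torch
    import torch.nn as nn

    class Encoder(nn.Module):
        # Maps private data (here assumed to be an image, e.g. a face) to a
        # feature vector Vx; this architecture is illustrative only.
        def __init__(self, embed_dim=128):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv2d(3, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1),
            )
            self.fc = nn.Linear(64, embed_dim)

        def forward(self, x):
            return self.fc(self.features(x).flatten(1))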
It should be understood that, for an arbitrary first training sample included in the first sample group, inputting the private data of the first training sample into the coding model yields the corresponding first feature vector. On this basis, steps S330, S340, and S350 can be performed separately.
Specifically, on the one hand, in step S330, the first feature vector is input into the classification model for determining target object identity to obtain a first classification result, and a first classification loss is determined based on the first classification result and the first object identifier.
In one embodiment, the classification model may be implemented with algorithms such as neural networks, gradient decision trees, Bayesian classification, or support vector machines. In one specific embodiment, the classification model may be a multi-class classification model. In another embodiment, the classification model may be multiple binary classification models. In one embodiment, a cross-entropy loss function, hinge loss function, exponential loss function, or the like may be used to determine the first classification loss.
In this way, the first classification loss corresponding to the first training sample can be determined, which means the classification loss corresponding to each sample in the first sample group, and further in every one of the multiple training sample groups, can be determined. Accordingly, summing or taking the expectation of the classification losses of these samples yields the classification loss corresponding to the multiple training sample groups. In one example, the cross-entropy loss function in the following formula (1) can be used to determine the classification loss corresponding to the multiple training sample groups.
$L_{classification} = -\sum Y \log \hat{Y}$    (1)
where $\hat{Y}$ denotes the predicted value output by the classification model, and Y denotes the corresponding label value, determined from the object identifier of the corresponding training sample; see the related prior art for details, which are not repeated here.
Through step S330 above, the classification loss corresponding to the multiple training sample groups can be determined.
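As a sketch of how the classification loss of formula (1) can be computed, assuming a PyTorch setting with a linear classification head (the batch size and dimensions are arbitrary assumptions):

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    num_identities, embed_dim = 1000, 128            # assumed sizes
    classifier = nn.Linear(embed_dim, num_identities)

    v = torch.randn(32, embed_dim)                   # a batch of encoder outputs Vx
    y = torch.randint(0, num_identities, (32,))      # object identifiers as labels Y
    # F.cross_entropy applies log-softmax internally, matching formula (1).
    cls_loss = F.cross_entropy(classifier(v), y)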
On the other hand, in step S340, the first feature vector is input into the decoding model for reconstructing private data to obtain first reconstructed data, and a first decoding loss is determined based on the first reconstructed data and the first private data.
In one embodiment, the decoding model may be implemented with algorithms such as neural networks, gradient decision trees, Bayesian classification, or support vector machines. In one embodiment, loss functions such as MSE (Mean Square Error) or MAE (Mean Absolute Error) may be used to determine the first decoding loss.
In this way, the first decoding loss corresponding to the first training sample can be determined, which means the decoding loss corresponding to each sample in the first sample group, and further in every one of the multiple training sample groups, can be determined. Accordingly, summing or taking the expectation of the decoding losses of these samples yields the decoding loss corresponding to the multiple training sample groups. In one example, the MAE loss function in the following formula (2) can be used to determine the decoding loss corresponding to the multiple training sample groups.
$L_{Reconstruction} = \sum \lvert \hat{X} - X \rvert$    (2)
where $\hat{X}$ denotes the reconstructed data output by the decoding model, and X denotes the corresponding original private data.
Through step S340 above, the decoding loss corresponding to the multiple training sample groups can be determined.
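Similarly, a sketch for the decoding loss of formula (2), with a simple fully connected network standing in for the decoding model (the sizes are assumptions, and the mean rather than the sum is used, which only rescales the loss):

    import torch
    import torch.nn as nn

    embed_dim, data_dim = 128, 3 * 64 * 64           # assumed sizes
    decoder = nn.Sequential(nn.Linear(embed_dim, 512), nn.ReLU(),
                            nn.Linear(512, data_dim))

    v = torch.randn(32, embed_dim)                   # encoder outputs
    x = torch.rand(32, data_dim)                     # original private data, flattened
    dec_loss = (decoder(v) - x).abs().mean()         # MAE, as in formula (2)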
In yet another aspect, in step S350, the multiple feature vectors corresponding to the first sample group determined in step S320 are input into the discrimination model for distinguishing different target objects to obtain the first sample distance between the samples of the first sample pair and the second sample distance between the samples of the second sample pair, and the first discrimination loss is determined, which is positively correlated with the first sample distance and negatively correlated with the second sample distance.
In one embodiment, the discrimination model may be implemented as a triplet network. Specifically, the second sample pair includes the first training sample and a third training sample. Accordingly, this step may include: concatenating the feature vectors corresponding to the first, second, and third training samples in a preset order and inputting the result into the discrimination model to obtain the first sample distance and the second sample distance. In one specific embodiment, the preset order may be any arrangement set for the three kinds of samples: anchor, negative, and positive. In one example, the order may be: negative sample, anchor sample, positive sample. In one specific embodiment, Fig. 4 shows a schematic diagram of the network structure of a triplet network according to one embodiment; the triplet network includes three identical feed-forward networks (which share parameters), represented in the figure by three Nets, and X, X+, and X- denote the anchor, positive, and negative samples, respectively; sample distance 1 denotes the distance between the anchor and the negative sample, and sample distance 2 denotes the distance between the anchor and the positive sample. Further, the first discrimination loss can be determined using the loss function corresponding to the triplet network.
In another embodiment, the discrimination model may be implemented as a quadruplet network. Specifically, the second sample pair includes a third training sample and a fourth training sample. Accordingly, this step may include: concatenating the feature vectors corresponding to the first, second, third, and fourth training samples in a preset order and inputting the result into the discrimination model to obtain the first sample distance and the second sample distance. In one specific embodiment, the preset order may be: the two samples of the sample pair with the same object identifier first (their mutual order may be arbitrary), followed by the two samples of the sample pair with different object identifiers (their mutual order may also be arbitrary). Further, the first discrimination loss can be determined using the loss function corresponding to the quadruplet network.
In this way, the first discrimination loss corresponding to the first sample group can be determined, which means the discrimination loss corresponding to each of the multiple training sample groups can be determined. Accordingly, summing or taking the expectation of the discrimination losses of these sample groups yields the discrimination loss corresponding to the multiple training sample groups. In one example, the triplet loss function in the following formula (3) can be used to determine the discrimination loss corresponding to the multiple training sample groups.
$L_{Recognition} = \sum \left( \lVert Net(X) - Net(X^{+}) \rVert_2 - \lVert Net(X) - Net(X^{-}) \rVert_2 + \alpha \right)$    (3)
where X, X+, and X- denote the feature vectors corresponding to the anchor, positive, and negative samples, respectively; $\lVert Net(X) - Net(X^{+}) \rVert_2$ denotes the distance between the anchor and positive samples output by the discrimination model; $\lVert Net(X) - Net(X^{-}) \rVert_2$ denotes the distance between the anchor and negative samples output by the discrimination model; and $\alpha$ is a hyperparameter, which may be set to 1, for example.
Through step S350 above, the discrimination loss corresponding to the multiple training sample groups can be determined.
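The discrimination loss of formula (3) can be sketched as below. The inputs are assumed to already be the outputs Net(·) of the shared feed-forward network; note that formula (3) as written has no max(·, 0) clamp, although many triplet-loss variants add one:

    import torch
    import torch.nn.functional as F

    def discrimination_loss(net_anchor, net_positive, net_negative, alpha=1.0):
        d_pos = F.pairwise_distance(net_anchor, net_positive)  # ||Net(X)-Net(X+)||_2
        d_neg = F.pairwise_distance(net_anchor, net_negative)  # ||Net(X)-Net(X-)||_2
        return (d_pos - d_neg + alpha).sum()                   # formula (3)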
From the above, in steps S330, S340, and S350, the classification loss, decoding loss, and discrimination loss corresponding to the multiple training sample groups can be determined, respectively. On this basis, in step S360, the model parameters of the coding model are adjusted with the goal of maximizing the classification losses and decoding losses corresponding to the multiple training sample groups and minimizing the discrimination losses corresponding to the multiple training sample groups.
In one embodiment, a comprehensive loss can first be determined based on the classification loss, decoding loss, and discrimination loss corresponding to the multiple training sample groups, and the model parameters of the coding model are then adjusted based on the comprehensive loss, where the comprehensive loss is negatively correlated with the classification loss and decoding loss and positively correlated with the discrimination loss. In one specific embodiment, the comprehensive loss can be determined with the following formula (4):
$L = L_{Recognition} - L_{classification} - L_{Reconstruction}$    (4)
where $L_{Recognition}$, $L_{classification}$, and $L_{Reconstruction}$ denote the discrimination loss, classification loss, and decoding loss corresponding to the multiple training sample groups, respectively.
In another specific embodiment, different weight parameters can also be assigned to the classification loss, decoding loss, and discrimination loss to determine the comprehensive loss, as shown in the following formula (5):
$L = \alpha_1 L_{Recognition} - \alpha_2 L_{classification} - \alpha_3 L_{Reconstruction}$    (5)
where $\alpha_1$, $\alpha_2$, and $\alpha_3$ are weight parameters and hyperparameters. In one example, the values of $\alpha_1$, $\alpha_2$, and $\alpha_3$ may be 0.5, 0.25, and 0.25, respectively.
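Formulas (4) and (5) then reduce to a small helper; the sign convention follows the stated correlations, and the default weights are the example values given above:

    def combined_loss(recognition, classification, reconstruction,
                      a1=0.5, a2=0.25, a3=0.25):
        # Positively correlated with the discrimination (recognition) loss,
        # negatively correlated with the classification and decoding losses.
        return a1 * recognition - a2 * classification - a3 * reconstruction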
It should be understood that the classification model and the decoding model can be regarded as simulating an attacker's models. Thus, adjusting the model parameters of the coding model with the goal of minimizing the discrimination losses corresponding to the multiple training sample groups and maximizing the classification losses and decoding losses corresponding to the multiple training sample groups gives the encoded vectors high discriminability (ensuring the accuracy and effectiveness of subsequent identity recognition) while effectively resisting attacks: on the one hand the encoded vectors are irreversible, so it is difficult for an attacker to reconstruct or restore the original private data from an encoded vector; on the other hand the encoded vectors are obfuscated, so it is difficult for an attacker to classify an encoded vector, that is, to determine the identity of the target object from it.
It should also be noted that, in one embodiment, after step S360, the training method may further include: adjusting the parameters of the classification model with the goal of minimizing the classification losses corresponding to the multiple training sample groups; and/or adjusting the parameters of the decoding model with the goal of minimizing the decoding losses corresponding to the multiple training sample groups; and/or adjusting the parameters of the discrimination model with the goal of minimizing the discrimination losses corresponding to the multiple training sample groups. In this way, by introducing adversarial learning, the performance of the coding model can be further improved.
The above training method is further explained below with a specific example. In one example, obtaining the final trained coding model requires multiple rounds of iterative training; each round may include multiple training iterations for the classification, decoding, and discrimination models, and one training pass for the coding model. More specifically, in the first round, the coding model can first be fixed and several batches of training samples drawn in turn to optimize the parameters of the classification, decoding, and discrimination models; then, based on the classification, decoding, and discrimination models optimized in this round, another batch of training samples is fetched to optimize the parameters of the coding model. In this way, after multiple rounds of iterative training, a finally converged coding model can be obtained for subsequent target object identity recognition.
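The alternating schedule in this example might look as follows in PyTorch. This is a schematic under several assumptions: encoder, classifier, decoder, discrimination_loss, and combined_loss are as in the sketches above; sample_batch() is a hypothetical helper returning anchor, positive, and negative private data (xa, xp, xn) and anchor labels y; num_rounds and aux_steps are training hyperparameters; and the discrimination model is taken here to be the parameter-free distance of formula (3), so phase 1 trains only the classifier and decoder:

    import torch
    import torch.nn.functional as F

    opt_aux = torch.optim.Adam(list(classifier.parameters())
                               + list(decoder.parameters()), lr=1e-4)
    opt_enc = torch.optim.Adam(encoder.parameters(), lr=1e-4)

    for _ in range(num_rounds):
        for _ in range(aux_steps):                   # phase 1: encoder frozen
            b = sample_batch()
            with torch.no_grad():
                va = encoder(b.xa)                   # no gradient into the encoder
            loss = (F.cross_entropy(classifier(va), b.y)             # formula (1)
                    + (decoder(va) - b.xa.flatten(1)).abs().mean())  # formula (2)
            opt_aux.zero_grad(); loss.backward(); opt_aux.step()

        b = sample_batch()                           # phase 2: heads held fixed
        va, vp, vn = encoder(b.xa), encoder(b.xp), encoder(b.xn)
        enc_loss = combined_loss(discrimination_loss(va, vp, vn),
                                 F.cross_entropy(classifier(va), b.y),
                                 (decoder(va) - b.xa.flatten(1)).abs().mean())
        opt_enc.zero_grad(); enc_loss.backward(); opt_enc.step()    # encoder only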
The above introduces the training method of the coding model. Next, the target object identity recognition method implemented based on the trained coding model is introduced.
Specifically, Fig. 5 shows an interaction diagram of a target object recognition method for preventing privacy data leakage according to one embodiment; the interacting parties include a terminal and a server. It should be noted that, in one embodiment, the terminal may include a smartphone, a tablet computer, a wearable device, a scanning device, and so on. In one embodiment, the server may be a cloud server, and the server may access data records stored in a cloud database.
As shown in Fig. 5, the method includes steps S510 to S550.
Step S510: the terminal collects second private data.
In one embodiment, the target object of identity recognition is a user; accordingly, the second private data can be collected in response to a collection instruction issued by the user. In one specific embodiment, face data and a mobile phone number can be collected in response to a face-payment instruction issued by the user. In another embodiment, the target object of identity recognition is a device; accordingly, with the user's authorization, the identity information of the terminal, such as the IMEI, SIM card number, and sensor information, can be collected from the terminal periodically.
The second private data can thus be collected. Next, in step S520, the terminal inputs the second private data into the coding model obtained by the above training method to obtain a second feature vector. And, in step S530, the terminal sends the second feature vector to the server.
It should be noted that the coding model obtained by the above training method is deployed in the terminal; on this basis, the terminal can use the coding model to encode the collected second private data into the corresponding second feature vector. In this way, by transmitting, storing, and using the second feature vector, leakage of the private data can be effectively prevented. Optionally, after the terminal generates the second feature vector, the collected second private data can be deleted to prevent its leakage.
Then, in step S540, the server compares the second feature vector with multiple feature vectors corresponding to multiple target objects pre-stored in the server to obtain a comparison result used to determine whether the identity recognition of the target object corresponding to the second private data succeeds.
In one embodiment, the multiple feature vectors are obtained by inputting multiple pieces of historical private data of the multiple target objects into the coding model.
In one embodiment, comparing the second feature vector with the multiple feature vectors to obtain the comparison result may include: first calculating the similarity between the second feature vector and each of the multiple feature vectors and determining the maximum similarity; then, in one specific embodiment, when the maximum is greater than a preset threshold, determining that the identity recognition of the target object corresponding to the second private data succeeds, as the comparison result; in another specific embodiment, when the maximum is not greater than the preset threshold, determining that the identity recognition fails, as the comparison result. In one example, the preset threshold can be set according to practical experience and different business requirements, for example, 0.99 in a payment scenario, 0.90 in an access-control scenario, and 0.80 in a scenario of establishing a user-device mapping.
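As a sketch of this server-side comparison (the patent does not fix a similarity measure; cosine similarity, the function name identify, and the NumPy setting are assumptions here):

    import numpy as np

    def identify(v_query, stored_vectors, threshold=0.99):
        # Compare one received feature vector against the pre-stored vectors;
        # 0.99 is the example payment-scenario threshold mentioned above.
        q = v_query / np.linalg.norm(v_query)
        m = stored_vectors / np.linalg.norm(stored_vectors, axis=1, keepdims=True)
        sims = m @ q                                 # cosine similarity to each vector
        best = int(np.argmax(sims))
        if sims[best] > threshold:
            return best, float(sims[best])           # recognition succeeds
        return None, float(sims[best])               # recognition fails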
It should be noted that, in one embodiment, in a payment scenario, after the above identity recognition succeeds, the payment account corresponding to the feature vector with the maximum similarity can be obtained according to that feature vector and the pre-stored mapping between the multiple feature vectors and multiple pieces of user information (including payment accounts, etc.), and the deduction for the current order can be completed.
In addition, in one embodiment, after step S540, the recognition method may further include step S550: sending the comparison result to the terminal. In one specific embodiment, a comparison result indicating that identity recognition succeeded or failed can be sent to the terminal. In another specific embodiment, the above maximum can also be sent to the terminal; in this case, after determining the maximum, the server can send it to the terminal as the comparison result without performing the threshold judgment itself; instead, after receiving the maximum, the terminal judges whether it is greater than the preset threshold and thereby determines whether identity recognition succeeds.
In summary, with the target object identity recognition method disclosed in the embodiments of this specification, the coding model obtained by the above training method encodes private data into feature vectors, and the feature vectors are what is transmitted, stored, and compared, ensuring the accuracy and effectiveness of the identity recognition results. At the same time, even if a feature vector is leaked, its irreversibility and obfuscation make it difficult for an attacker to obtain usable information from it, thereby effectively preventing the leakage of private data. Moreover, sending feature vectors to the cloud for comparison, rather than comparing directly on the terminal, means the comparison scope is not limited by the terminal's storage resources.
Corresponding to the above training method and recognition method, the embodiments of this specification also disclose a training device and recognition devices, as follows:
Fig. 6 shows a structural diagram of a coding model training device for preventing privacy data leakage according to one embodiment. As shown in Fig. 6, the training device 600 may include: a sample obtaining unit 610 configured to obtain multiple training sample groups, including an arbitrary first sample group, the first sample group comprising a first sample pair and a second sample pair, the first sample pair comprising a first training sample and a second training sample, wherein the first training sample includes first private data characterizing the identity information of a first target object and a first object identifier, the second training sample has the first object identifier, and the two samples of the second sample pair have different object identifiers; an encoding unit 620 configured to input the private data corresponding to each training sample in the first sample group into the coding model to obtain corresponding feature vectors, including a first feature vector corresponding to the first training sample; a classification unit 630 configured to input the first feature vector into a classification model for determining target object identity to obtain a first classification result, and determine a first classification loss based on the first classification result and the first object identifier; a decoding unit 640 configured to input the first feature vector into a decoding model for reconstructing private data to obtain first reconstructed data, and determine a first decoding loss based on the first reconstructed data and the first private data; a discrimination unit 650 configured to input the feature vectors corresponding to the training samples in the first sample group into a discrimination model for distinguishing different target objects to obtain the first sample distance between the samples of the first sample pair and the second sample distance between the samples of the second sample pair, and determine a first discrimination loss, which is positively correlated with the first sample distance and negatively correlated with the second sample distance; and a coding model parameter adjustment unit 660 configured to adjust the model parameters of the coding model with the goal of maximizing the classification losses and decoding losses corresponding to the multiple training sample groups and minimizing the discrimination losses corresponding to the multiple training sample groups.
In one embodiment, the target object includes a user, and the identity information includes one or more of the following: a face image, a fingerprint image, an iris image.
In one embodiment, the target object includes a device, and the identity information includes one or more of the following: the International Mobile Equipment Identity (IMEI), the card number of the Subscriber Identity Module (SIM), device sensor information.
In one embodiment, the second sample pair includes the first training sample and a third training sample; the discrimination unit 650 is specifically configured to: concatenate the feature vectors corresponding to the first, second, and third training samples in a preset order and input the result into the discrimination model to obtain the first sample distance and the second sample distance.
In one embodiment, the second sample pair includes a third training sample and a fourth training sample; the discrimination unit 650 is specifically configured to: concatenate the feature vectors corresponding to the first, second, third, and fourth training samples in a preset order and input the result into the discrimination model to obtain the first sample distance and the second sample distance.
In one embodiment, the device 600 further includes: a classification model parameter adjustment unit 670 configured to adjust the parameters of the classification model with the goal of minimizing the classification losses corresponding to the multiple training sample groups; and/or a decoding model parameter adjustment unit 680 configured to adjust the parameters of the decoding model with the goal of minimizing the decoding losses corresponding to the multiple training sample groups; and/or a discrimination model parameter adjustment unit 690 configured to adjust the parameters of the discrimination model with the goal of minimizing the discrimination losses corresponding to the multiple training sample groups.
In one embodiment, the coding model parameter adjustment unit 660 is specifically configured to: perform a weighted summation of the classification loss, decoding loss, and discrimination loss based on preset weight parameters for these losses to obtain a comprehensive loss, the comprehensive loss being negatively correlated with the classification loss and decoding loss and positively correlated with the discrimination loss; and adjust the model parameters of the coding model based on the comprehensive loss.
Fig. 7 shows a structural diagram of an identity recognition device for preventing privacy data leakage according to one embodiment; the device is integrated in a server. As shown in Fig. 7, the recognition device 700 includes: a vector receiving unit 710 configured to receive a second feature vector from a terminal, the second feature vector being determined by the terminal by inputting collected second private data into a coding model, wherein the coding model is trained in advance based on the device shown in Fig. 6; and a vector comparison unit 720 configured to compare the second feature vector with multiple feature vectors corresponding to multiple target objects pre-stored in the server to obtain a comparison result used to determine whether the identity recognition of the target object corresponding to the second private data succeeds, wherein the multiple feature vectors are obtained by inputting multiple pieces of historical private data of the multiple target objects into the coding model.
In one embodiment, the recognition device 700 further includes: a result sending unit 730 configured to send the comparison result to the terminal.
In one embodiment, the vector comparison unit 720 is specifically configured to: calculate the similarity between the second feature vector and each of the multiple feature vectors and determine the maximum thereof; and when the maximum is greater than a preset threshold, determine that the identity recognition of the target object corresponding to the second private data succeeds, as the comparison result.
In one embodiment, the vector comparison unit 720 is specifically configured to: calculate the similarity between the second feature vector and each of the multiple feature vectors and determine the maximum thereof, as the comparison result; and the result sending unit 730 is specifically configured to: send the maximum to the terminal, so that the terminal determines, according to the maximum and a preset threshold, whether the identity recognition of the target object corresponding to the second private data succeeds.
Fig. 8 shows a structural diagram of an identity recognition device for preventing privacy data leakage according to another embodiment; the device is integrated in a terminal. As shown in Fig. 8, the recognition device 800 includes: a data collection unit 810 configured to collect second private data; an encoding unit 820 configured to input the second private data into a coding model to obtain a second feature vector, the coding model being trained in advance based on the device shown in Fig. 6; and a vector sending unit 830 configured to send the second feature vector to a server, so that the server compares the second feature vector with multiple feature vectors corresponding to multiple target objects pre-stored in the server to obtain a comparison result used to determine whether the identity recognition of the target object corresponding to the second private data succeeds.
In one embodiment, the comparison result includes the maximum of the similarities between the second feature vector and each of the multiple feature vectors, and the device 800 further includes: a result receiving unit 840 configured to receive the maximum from the server; and a determination unit 850 configured to determine, when the maximum is greater than a preset threshold, that the identity recognition of the target object corresponding to the second private data succeeds.
According to an embodiment of yet another aspect, a computer-readable storage medium is also provided, on which a computer program is stored; when the computer program is executed in a computer, the computer is caused to perform the method described with reference to Fig. 3 or Fig. 5.
According to an embodiment of still another aspect, a computing device is also provided, comprising a memory and a processor, the memory storing executable code; when the processor executes the executable code, the method described with reference to Fig. 3 or Fig. 5 is implemented.
Those skilled in the art should appreciate that, in one or more of the above examples, the functions described in the present invention may be implemented in hardware, software, firmware, or any combination thereof. When implemented in software, these functions may be stored in a computer-readable medium or transmitted as one or more instructions or code on a computer-readable medium.
The specific embodiments described above further explain the objectives, technical solutions, and beneficial effects of the present invention in detail. It should be understood that the above are only specific embodiments of the present invention and are not intended to limit the scope of protection of the present invention; any modification, equivalent replacement, improvement, or the like made on the basis of the technical solutions of the present invention shall be included within the scope of protection of the present invention.

Claims (28)

  1. A coding model training method for preventing privacy data leakage, comprising:
    obtaining multiple training sample groups, including an arbitrary first sample group, the first sample group comprising a first sample pair and a second sample pair, the first sample pair comprising a first training sample and a second training sample, wherein the first training sample includes first private data characterizing the identity information of a first target object and a first object identifier; the second training sample has the first object identifier, and the two samples of the second sample pair have different object identifiers;
    inputting the private data corresponding to each training sample in the first sample group into a coding model to obtain corresponding feature vectors, including a first feature vector corresponding to the first training sample;
    inputting the first feature vector into a classification model for determining target object identity to obtain a first classification result, and determining a first classification loss based on the first classification result and the first object identifier;
    inputting the first feature vector into a decoding model for reconstructing private data to obtain first reconstructed data, and determining a first decoding loss based on the first reconstructed data and the first private data;
    inputting the feature vectors corresponding to the training samples in the first sample group into a discrimination model for distinguishing different target objects to obtain a first sample distance between the samples of the first sample pair and a second sample distance between the samples of the second sample pair, and determining a first discrimination loss, the first discrimination loss being positively correlated with the first sample distance and negatively correlated with the second sample distance;
    adjusting the model parameters of the coding model with the goal of maximizing the classification losses and decoding losses corresponding to the multiple training sample groups and minimizing the discrimination losses corresponding to the multiple training sample groups.
  2. The method according to claim 1, wherein the target object includes a user, and the identity information includes one or more of the following: a face image, a fingerprint image, an iris image.
  3. The method according to claim 1, wherein the target object includes a device, and the identity information includes one or more of the following: the International Mobile Equipment Identity (IMEI), the card number of the Subscriber Identity Module (SIM), device sensor information.
  4. The method according to claim 1, wherein the second sample pair includes the first training sample and a third training sample; and wherein inputting the feature vectors corresponding to the training samples in the first sample group into the discrimination model for distinguishing different target objects to obtain the first sample distance between the samples of the first sample pair and the second sample distance between the samples of the second sample pair comprises:
    concatenating the feature vectors corresponding to the first training sample, the second training sample, and the third training sample in a preset order, and inputting the result into the discrimination model to obtain the first sample distance and the second sample distance.
  5. The method according to claim 1, wherein the second sample pair includes a third training sample and a fourth training sample; and wherein inputting the feature vectors corresponding to the training samples in the first sample group into the discrimination model for distinguishing different target objects to obtain the first sample distance between the samples of the first sample pair and the second sample distance between the samples of the second sample pair comprises:
    concatenating the feature vectors corresponding to the first training sample, the second training sample, the third training sample, and the fourth training sample in a preset order, and inputting the result into the discrimination model to obtain the first sample distance and the second sample distance.
  6. The method according to claim 1, wherein, after determining the first discrimination loss, the method further comprises:
    adjusting the parameters of the classification model with the goal of minimizing the classification losses corresponding to the multiple training sample groups; and/or,
    adjusting the parameters of the decoding model with the goal of minimizing the decoding losses corresponding to the multiple training sample groups; and/or,
    adjusting the parameters of the discrimination model with the goal of minimizing the discrimination losses corresponding to the multiple training sample groups.
  7. The method according to claim 1, wherein adjusting the model parameters of the coding model with the goal of maximizing the classification losses and decoding losses corresponding to the multiple training sample groups and minimizing the discrimination losses corresponding to the multiple training sample groups comprises:
    performing a weighted summation of the classification loss, the decoding loss, and the discrimination loss based on preset weight parameters for the classification loss, the decoding loss, and the discrimination loss to obtain a comprehensive loss, the comprehensive loss being negatively correlated with the classification loss and the decoding loss and positively correlated with the discrimination loss;
    adjusting the model parameters of the coding model based on the comprehensive loss.
  8. A target object identity recognition method for preventing privacy data leakage, the method being executed by a server, the recognition method comprising:
    receiving a second feature vector from a terminal, the second feature vector being determined by the terminal by inputting collected second private data into a coding model, wherein the coding model is trained in advance based on the method according to claim 1;
    comparing the second feature vector with multiple feature vectors corresponding to multiple target objects pre-stored in the server to obtain a comparison result used to determine whether the identity recognition of the target object corresponding to the second private data succeeds, wherein the multiple feature vectors are obtained by inputting multiple pieces of historical private data of the multiple target objects into the coding model.
  9. The recognition method according to claim 8, wherein, after comparing the second feature vector with the multiple feature vectors corresponding to multiple target objects pre-stored in the server to obtain the comparison result, the recognition method further comprises:
    sending the comparison result to the terminal.
  10. The recognition method according to claim 8 or 9, wherein comparing the second feature vector with the multiple feature vectors corresponding to multiple target objects pre-stored in the server to obtain the comparison result comprises:
    calculating the similarity between the second feature vector and each of the multiple feature vectors, and determining the maximum thereof;
    when the maximum is greater than a preset threshold, determining that the identity recognition of the target object corresponding to the second private data succeeds, as the comparison result.
  11. The method according to claim 9, wherein comparing the second feature vector with the multiple feature vectors of multiple target objects pre-stored in the server to obtain the comparison result comprises:
    calculating the similarity between the second feature vector and each of the multiple feature vectors, and determining the maximum thereof, as the comparison result;
    wherein sending the comparison result to the terminal comprises:
    sending the maximum to the terminal, so that the terminal determines, according to the maximum and a preset threshold, whether the identity recognition of the target object corresponding to the second private data succeeds.
  12. A target object recognition method for preventing privacy data leakage, the method being executed by a terminal, the recognition method comprising:
    collecting second private data;
    inputting the second private data into a coding model to obtain a second feature vector, the coding model being trained in advance based on the method according to claim 1;
    sending the second feature vector to a server, so that the server compares the second feature vector with multiple feature vectors corresponding to multiple target objects pre-stored in the server to obtain a comparison result used to determine whether the identity recognition of the target object corresponding to the second private data succeeds.
  13. The method according to claim 12, wherein the comparison result includes the maximum of the similarities between the second feature vector and each of the multiple feature vectors, and wherein, after sending the second feature vector to the server, the method further comprises:
    receiving the maximum from the server;
    when the maximum is greater than a preset threshold, determining that the identity recognition of the target object corresponding to the second private data succeeds.
  14. A coding model training device for preventing privacy data leakage, comprising:
    a sample obtaining unit configured to obtain multiple training sample groups, including an arbitrary first sample group, the first sample group comprising a first sample pair and a second sample pair, the first sample pair comprising a first training sample and a second training sample, wherein the first training sample includes first private data characterizing the identity information of a first target object and a first object identifier; the second training sample has the first object identifier, and the two samples of the second sample pair have different object identifiers;
    an encoding unit configured to input the private data corresponding to each training sample in the first sample group into a coding model to obtain corresponding feature vectors, including a first feature vector corresponding to the first training sample;
    a classification unit configured to input the first feature vector into a classification model for determining target object identity to obtain a first classification result, and determine a first classification loss based on the first classification result and the first object identifier;
    a decoding unit configured to input the first feature vector into a decoding model for reconstructing private data to obtain first reconstructed data, and determine a first decoding loss based on the first reconstructed data and the first private data;
    a discrimination unit configured to input the feature vectors corresponding to the training samples in the first sample group into a discrimination model for distinguishing different target objects to obtain a first sample distance between the samples of the first sample pair and a second sample distance between the samples of the second sample pair, and determine a first discrimination loss, the first discrimination loss being positively correlated with the first sample distance and negatively correlated with the second sample distance;
    a coding model parameter adjustment unit configured to adjust the model parameters of the coding model with the goal of maximizing the classification losses and decoding losses corresponding to the multiple training sample groups and minimizing the discrimination losses corresponding to the multiple training sample groups.
  15. The device according to claim 14, wherein the target object includes a user, and the identity information includes one or more of the following: a face image, a fingerprint image, an iris image.
  16. The device according to claim 14, wherein the target object includes a device, and the identity information includes one or more of the following: the International Mobile Equipment Identity (IMEI), the card number of the Subscriber Identity Module (SIM), device sensor information.
  17. The device according to claim 14, wherein the second sample pair includes the first training sample and a third training sample; and wherein the discrimination unit is specifically configured to:
    concatenate the feature vectors corresponding to the first training sample, the second training sample, and the third training sample in a preset order, and input the result into the discrimination model to obtain the first sample distance and the second sample distance.
  18. The device according to claim 14, wherein the second sample pair includes a third training sample and a fourth training sample; and wherein the discrimination unit is specifically configured to:
    concatenate the feature vectors corresponding to the first training sample, the second training sample, the third training sample, and the fourth training sample in a preset order, and input the result into the discrimination model to obtain the first sample distance and the second sample distance.
  19. The device according to claim 14, further comprising:
    a classification model parameter adjustment unit configured to adjust the parameters of the classification model with the goal of minimizing the classification losses corresponding to the multiple training sample groups; and/or,
    a decoding model parameter adjustment unit configured to adjust the parameters of the decoding model with the goal of minimizing the decoding losses corresponding to the multiple training sample groups; and/or,
    a discrimination model parameter adjustment unit configured to adjust the parameters of the discrimination model with the goal of minimizing the discrimination losses corresponding to the multiple training sample groups.
  20. The device according to claim 14, wherein the coding model parameter adjustment unit is specifically configured to:
    perform a weighted summation of the classification loss, the decoding loss, and the discrimination loss based on preset weight parameters for the classification loss, the decoding loss, and the discrimination loss to obtain a comprehensive loss, the comprehensive loss being negatively correlated with the classification loss and the decoding loss and positively correlated with the discrimination loss;
    adjust the model parameters of the coding model based on the comprehensive loss.
  21. A target object identity recognition device for preventing privacy data leakage, the device being integrated in a server, the recognition device comprising:
    a vector receiving unit configured to receive a second feature vector from a terminal, the second feature vector being determined by the terminal by inputting collected second private data into a coding model, wherein the coding model is trained in advance based on the device according to claim 14;
    a vector comparison unit configured to compare the second feature vector with multiple feature vectors corresponding to multiple target objects pre-stored in the server to obtain a comparison result used to determine whether the identity recognition of the target object corresponding to the second private data succeeds, wherein the multiple feature vectors are obtained by inputting multiple pieces of historical private data of the multiple target objects into the coding model.
  22. The recognition device according to claim 21, further comprising:
    a result sending unit configured to send the comparison result to the terminal.
  23. The recognition device according to claim 21 or 22, wherein the vector comparison unit is specifically configured to:
    calculate the similarity between the second feature vector and each of the multiple feature vectors, and determine the maximum thereof;
    when the maximum is greater than a preset threshold, determine that the identity recognition of the target object corresponding to the second private data succeeds, as the comparison result.
  24. The recognition device according to claim 22, wherein the vector comparison unit is specifically configured to:
    calculate the similarity between the second feature vector and each of the multiple feature vectors, and determine the maximum thereof, as the comparison result;
    wherein the result sending unit is specifically configured to:
    send the maximum to the terminal, so that the terminal determines, according to the maximum and a preset threshold, whether the identity recognition of the target object corresponding to the second private data succeeds.
  25. A target object recognition device for preventing privacy data leakage, the device being integrated in a terminal, the recognition device comprising:
    a data collection unit configured to collect second private data;
    an encoding unit configured to input the second private data into a coding model to obtain a second feature vector, the coding model being trained in advance based on the device according to claim 14;
    a vector sending unit configured to send the second feature vector to a server, so that the server compares the second feature vector with multiple feature vectors corresponding to multiple target objects pre-stored in the server to obtain a comparison result used to determine whether the identity recognition of the target object corresponding to the second private data succeeds.
  26. The device according to claim 25, wherein the comparison result includes the maximum of the similarities between the second feature vector and each of the multiple feature vectors, the device further comprising:
    a result receiving unit configured to receive the maximum from the server;
    a determination unit configured to determine, when the maximum is greater than a preset threshold, that the identity recognition of the target object corresponding to the second private data succeeds.
  27. A computer-readable storage medium on which a computer program is stored, wherein, when the computer program is executed in a computer, the computer is caused to perform the method according to any one of claims 1 to 13.
  28. A computing device comprising a memory and a processor, wherein the memory stores executable code, and when the processor executes the executable code, the method according to any one of claims 1 to 13 is implemented.
PCT/CN2020/124681 2019-12-09 2020-10-29 Coding model training method and device for preventing privacy data leakage WO2021114931A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201911252327.7A CN111046422B (zh) 2019-12-09 2019-12-09 Coding model training method and device for preventing privacy data leakage
CN201911252327.7 2019-12-09

Publications (1)

Publication Number Publication Date
WO2021114931A1 true WO2021114931A1 (zh) 2021-06-17

Family

ID=70235290

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/124681 WO2021114931A1 (zh) 2019-12-09 2020-10-29 Coding model training method and device for preventing privacy data leakage

Country Status (3)

Country Link
CN (2) CN111046422B (zh)
TW (1) TWI756687B (zh)
WO (1) WO2021114931A1 (zh)


Also Published As

Publication number Publication date
TWI756687B (zh) 2022-03-01
TW202123052A (zh) 2021-06-16
CN111046422A (zh) 2020-04-21
CN113159288B (zh) 2022-06-28
CN111046422B (zh) 2021-03-12
CN113159288A (zh) 2021-07-23

