CN113420322A - Model training and desensitizing method and device, electronic equipment and storage medium - Google Patents

Model training and desensitizing method and device, electronic equipment and storage medium

Info

Publication number
CN113420322A
Authority
CN
China
Prior art keywords
data
network
desensitization
training
output data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110564134.6A
Other languages
Chinese (zh)
Other versions
CN113420322B (en)
Inventor
涂小兵
黄腾
张伟丰
李颖敏
成亮
吴彩娣
尉鲁飞
薛盛可
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alibaba Innovation Co
Original Assignee
Alibaba Singapore Holdings Pte Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba Singapore Holdings Pte Ltd filed Critical Alibaba Singapore Holdings Pte Ltd
Priority to CN202110564134.6A priority Critical patent/CN113420322B/en
Publication of CN113420322A publication Critical patent/CN113420322A/en
Application granted granted Critical
Publication of CN113420322B publication Critical patent/CN113420322B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 - Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 - Protecting data
    • G06F21/62 - Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218 - Protecting access to data via a platform, e.g. using keys or access control rules, to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6245 - Protecting personal data, e.g. for financial or medical purposes
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 - Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 - Protecting data
    • G06F21/64 - Protecting data integrity, e.g. using checksums, certificates or signatures
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D - CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 - Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Bioethics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Hardware Design (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Storage Device Security (AREA)

Abstract

The embodiments of the present application provide a model training method, a data desensitization method, an apparatus, an electronic device, and a storage medium. The model training method includes the following steps: acquiring first desensitization training data; performing initial training on a target neural network using the first desensitization training data; performing security processing on output data of a first partial network in the initially trained target neural network to obtain second desensitization training data; performing fine-tuning training on a second partial network in the initially trained target neural network using the second desensitization training data; and determining a data desensitization model from the initially trained first partial network and the fine-tuned second partial network. Because the two partial networks are trained on different data, the method prevents the original target neural network from being reconstructed by combining the first partial network and the second partial network of the data desensitization model, thereby ensuring data security.

Description

Model training and desensitizing method and device, electronic equipment and storage medium
Technical Field
The embodiments of the present application relate to the technical field of data security, and in particular to a model training and desensitization method and apparatus, an electronic device, and a storage medium.
Background
Data security and privacy: artificial intelligence and data complement and promote each other. Data security is key to artificial intelligence security. Once the capabilities of artificial intelligence are improperly or maliciously exploited, they not only threaten personal privacy and enterprise asset security, but may even affect social stability and national security.
Data desensitization is a data security technology in which certain sensitive information is transformed according to desensitization rules, so that sensitive private data is reliably protected. When customer security data or business-sensitive data is involved, the real data is modified and provided for testing without violating system rules; personal information such as identity information, mobile phone numbers, card numbers, and customer numbers requires data desensitization.
However, the security of current data desensitization schemes has yet to be improved.
Disclosure of Invention
Embodiments of the present application provide a model training method, a data desensitization method, an apparatus, an electronic device, and a storage medium to at least partially solve the above problems.
According to a first aspect of the embodiments of the present application, there is provided a model training method, including: acquiring first desensitization training data; performing initial training on a target neural network using the first desensitization training data, the initial training being used to adjust parameters of the target neural network; performing security processing on output data of a first partial network in the initially trained target neural network to obtain second desensitization training data; performing fine-tuning training on a second partial network in the initially trained target neural network using the second desensitization training data, wherein the output layer of the first partial network is connected to the input layer of the second partial network, and the fine-tuning training further adjusts the previously adjusted parameters of the second partial network; and determining a data desensitization model from the initially trained first partial network and the fine-tuned second partial network.
According to a second aspect of the embodiments of the present application, there is provided a data desensitization method, including: acquiring data to be desensitized; inputting the data to be desensitized into a first partial network in the data desensitization model to obtain output data of the first partial network, the data desensitization model being trained by the method of the first aspect; and performing security processing on the output data, and inputting the securely processed output data into a second partial network in the data desensitization model to obtain desensitization data.
According to a third aspect of the embodiments of the present application, there is provided a model training apparatus, including: an acquisition module, which acquires first desensitization training data; an initial training module, which performs initial training on a target neural network using the first desensitization training data, the initial training being used to adjust parameters of the target neural network; a security processing module, which performs security processing on output data of a first partial network in the initially trained target neural network to obtain second desensitization training data; a fine-tuning training module, which performs fine-tuning training on a second partial network in the initially trained target neural network using the second desensitization training data, wherein the output layer of the first partial network is connected to the input layer of the second partial network and the fine-tuning training further adjusts the previously adjusted parameters of the second partial network; and a model determination module, which determines a data desensitization model from the initially trained first partial network and the fine-tuned second partial network.
According to a fourth aspect of the embodiments of the present application, there is provided a data desensitization apparatus, including: an acquisition module, which acquires data to be desensitized; a first desensitization module, which inputs the data to be desensitized into a first partial network in the data desensitization model to obtain output data of the first partial network, the data desensitization model being trained by the method of the first aspect; and a second desensitization module, which performs security processing on the output data and inputs the securely processed output data into a second partial network in the data desensitization model to obtain desensitization data.
According to a fifth aspect of the embodiments of the present application, there is provided an electronic device, including a processor, a memory, a communication interface, and a communication bus, wherein the processor, the memory, and the communication interface communicate with one another via the communication bus; the memory is configured to store at least one executable instruction, and the executable instruction causes the processor to perform operations corresponding to the method of the first aspect.
According to a sixth aspect of embodiments herein, there is provided a computer storage medium having stored thereon a computer program which, when executed by a processor, implements the method according to the first aspect.
According to the method provided by the embodiments of the present application, the second desensitization training data is obtained by performing security processing on the output data of the first partial network in the initially trained target neural network, and the second partial network is fine-tuned using this second desensitization training data. The first partial network and the second partial network are therefore trained on different data, which prevents the original target neural network from being reconstructed by combining the first partial network and the second partial network of the data desensitization model, thereby ensuring data security.
Drawings
To more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings required for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description show only some of the embodiments of the present application, and those skilled in the art can derive other drawings from them.
FIG. 1A is a flowchart illustrating steps of a model training method according to a first embodiment of the present application;
FIG. 1B is a schematic block diagram of a model training method in the embodiment shown in FIG. 1A;
FIG. 2A is a flow chart of steps of a data desensitization method according to a second embodiment of the present application;
FIG. 2B is a schematic block diagram of a data desensitization method in the embodiment shown in FIG. 2A;
FIG. 3 is a block diagram of a model training apparatus according to a third embodiment of the present application;
FIG. 4 is a block diagram of a data desensitization apparatus according to a fourth embodiment of the present application;
fig. 5 is a schematic structural diagram of an electronic device according to a fifth embodiment of the present application.
Detailed Description
In order to make those skilled in the art better understand the technical solutions in the embodiments of the present application, the technical solutions in the embodiments of the present application are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, rather than all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present application shall fall within the protection scope of the embodiments of the present application.
The following further describes specific implementations of embodiments of the present application with reference to the drawings of the embodiments of the present application.
Regarding data security and privacy: artificial intelligence and data complement and promote each other's development. Meanwhile, data security is also key to artificial intelligence security. Once the capabilities of artificial intelligence are improperly or maliciously exploited, they not only threaten personal privacy and enterprise asset security, but may even affect social stability and national security. Therefore, in the field of artificial intelligence applications, the protection of personal private feature data is particularly important in scenarios such as face recognition.
In general, data desensitization can be classified into recoverable desensitization and non-recoverable desensitization according to the desensitization rules. With recoverable desensitization, after the data is transformed by a desensitization rule, the original pre-desensitization data can be restored through certain processing. With non-recoverable desensitization, the data cannot be restored to the original data once it has been desensitized.
In addition, because an artificial intelligence algorithm can be used to protect sensitive data with a model, and the desensitization rules are reflected in the parameters of that model, the security of the model itself is also critical. Obtaining a high-quality model generally requires a great deal of manpower and development cost; once an externally deployed artificial intelligence model is reverse-engineered or leaked, a very serious indirect data attack, and thus a very serious security problem, results. How to prevent the model from being leaked or stolen therefore becomes an important security issue.
Fig. 1A is a flowchart of steps of a model training method according to a first embodiment of the present application. The solution of this embodiment may be applied to any suitable electronic device with data processing capability, including but not limited to: a server, a mobile terminal (such as a mobile phone or tablet), a PC, and the like. The model training method includes the following steps:
110: first desensitization training data is acquired.
It should be understood that the desensitization training data here may be labeled historical sensitive data, including but not limited to personal privacy information such as passwords, names, and identity information, as well as business data that is critical to the business process in a particular scenario. It should also be understood that the embodiments of the present application are applicable to both recoverable and non-recoverable desensitization.
120: and performing initial training on the target neural network by using the first desensitization training data, wherein the initial training is used for adjusting parameters of the target neural network.
It should be appreciated that a first partial network of the initially trained target neural network is for deployment in a Trusted Execution Environment (TEE).
It should also be understood that the target neural network may be any type of neural network, such as a feed-forward neural network, a convolutional neural network, a recurrent neural network. For example, the target neural network may be a multi-layer neural network.
It should also be understood that adjusting the parameters of the target neural network may mean adjusting all parameters (e.g., weights) of the target neural network based on an objective function until a convergence condition of the objective function is satisfied.
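For illustration only, the following is a minimal sketch of how the initial training might be realized, assuming a PyTorch-style setup; the architecture, layer sizes, and names such as TargetNet, initial_training, and loader are illustrative assumptions and not part of the embodiment:

```python
# Hypothetical sketch: all parameters of the target neural network are adjusted
# on the first desensitization training data until the objective (loss) converges.
import torch
import torch.nn as nn

class TargetNet(nn.Sequential):
    def __init__(self):
        super().__init__(
            nn.Linear(128, 256), nn.ReLU(),   # layers that may later form the first partial network
            nn.Linear(256, 256), nn.ReLU(),   # layers that may later form the second partial network
            nn.Linear(256, 10),
        )

def initial_training(model, loader, epochs=10, lr=1e-3):
    opt = torch.optim.Adam(model.parameters(), lr=lr)  # every parameter is trainable at this stage
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            loss = loss_fn(model(x), y)
            loss.backward()
            opt.step()
    return model
```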
130: and carrying out safety processing on output data of the first part of networks in the initially trained target neural network to obtain second desensitization training data.
It should be understood that the target neural network may be divided into a first partial network and a second partial network at a particular network layer of the target neural network. The combination of the first partial network and the second partial network may be the entire target neural network, or only part of it. The target neural network may be divided into the first partial network and the second partial network before the initial training, or the specific split layer (e.g., a specific number of network layers, or a ratio of the number of network layers) may be selected based on the training result after the initial training (the initial adjustment of all parameters of the target neural network) is completed, as in the sketch below.
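As a non-limiting example of this division, the following sketch splits a sequential network at an assumed layer index k into the first and second partial networks; the helper name split_network and the use of nn.Sequential are assumptions made for illustration:

```python
# One possible way (not prescribed by the embodiment) to divide the initially
# trained target network into two partial networks at a chosen layer index.
import torch.nn as nn

def split_network(model: nn.Sequential, k: int):
    first_part = nn.Sequential(*list(model.children())[:k])   # e.g. deployed in the TEE
    second_part = nn.Sequential(*list(model.children())[k:])  # e.g. fine-tuned and deployed outside the TEE
    return first_part, second_part
```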
It should also be appreciated that the security processing of the output data may be performed in any manner, such as encryption processing, privacy processing, or noise processing.
140: and performing fine tuning training on a second part of network in the initially trained target neural network by using second desensitization training data, wherein the input layer of the first part of network is connected with the input layer of the second part of network, and the fine tuning training is used for further adjusting the parameters of the second part of network through adjustment.
150: a data desensitization model is determined from the initially trained first partial network and the second partial network trained via the fine tuning.
It should be appreciated that the fine-tuned second partial network is intended for deployment in a Rich Execution Environment (REE).
It should also be appreciated that the fine-tuning training further adjusts the already adjusted parameters of the second partial network. In other words, the parameters of the second partial network that were adjusted during the initial training can be used directly as the starting point for continued training, so that the parameters of the second partial network are further adjusted.
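A hedged sketch of this fine-tuning stage is given below; it assumes the first partial network is kept frozen, that a helper secure_process() stands in for the security processing of step 130, and that the labels of the first desensitization training data are reused. All names and hyperparameters are illustrative assumptions:

```python
# Illustrative sketch: fine-tune only the second partial network on the
# securely processed outputs of the frozen first partial network.
import torch
import torch.nn as nn

def finetune_second_part(first_part, second_part, loader, secure_process,
                         epochs=3, lr=1e-4):
    first_part.eval()                                   # parameters of the first part stay fixed
    opt = torch.optim.Adam(second_part.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for x, y in loader:
            with torch.no_grad():
                feat = first_part(x)                    # output data of the first partial network
            desens_feat = secure_process(feat)          # second desensitization training data
            opt.zero_grad()
            loss = loss_fn(second_part(desens_feat), y)
            loss.backward()
            opt.step()
    return second_part
```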
It should also be understood that the model training method and the data desensitization method according to the embodiments of the present application may be applied in scenarios with security requirements, including but not limited to: face door locks, smart communities, access control systems such as access-control all-in-one machines and contactless attendance, smart vending machines, smart class boards, face snapshot, artificial intelligence devices such as AI boxes, smart cabins, and other Internet-of-Things device application scenarios.
The model training method and data desensitization method of the present embodiment are also applicable to: (1) mobile payment: fingerprint verification, PIN code entry, etc.; (2) confidential data: secure storage of private keys, certificates, and the like; (3) content protection: DRM (digital rights management) and the like.
According to the method provided by the embodiments of the present application, the second desensitization training data is obtained by performing security processing on the output data of the first partial network in the initially trained target neural network, and the second partial network is fine-tuned using this second desensitization training data. The first partial network and the second partial network are therefore trained on different data, which prevents the original target neural network from being reconstructed by combining the first partial network and the second partial network of the data desensitization model, thereby ensuring data security.
In another implementation of the present invention, the initially trained first partial network is for deployment in a trusted execution environment, and the fine-tuned second partial network is for deployment in an accelerated execution environment.
Since the first partial network in the initially trained target neural network is deployed in the trusted execution environment, an effective data desensitization process can be performed using a data desensitization model that includes the first partial network.
Specifically, a TEE environment and an REE environment may exist alongside each other. Generally, an operating system such as Linux runs in the REE environment, but some operations have high security requirements, such as fingerprint comparison or payment scenarios that require signing with a private key, and such operations are safer to run in the TEE environment.
Furthermore, the TEE environment has its own execution space, that is, an operating system is also required in the TEE environment. The operating system of the TEE environment has a higher security level than the Rich operating system (the normal operating system).
Furthermore, the software and hardware resources accessed by the TEE can be separated from the Rich operating system. The TEE provides a secure execution environment for authorized trusted applications (TAs), while also protecting the confidentiality, integrity, and access rights of the TAs' resources and data. Typically, to guarantee the root of trust of the TEE itself, the TEE is authenticated and isolated from the Rich operating system during secure boot. In a TEE, the TAs are independent of one another and cannot access each other without authorization.
In other words, corresponding applications (TAs) also run on the operating system of the TEE environment; besides the TEE operating environment being independent of the normal operating system, the TAs in the TEE must also be authorized and must run independently of one another.
In another implementation of the present invention, performing security processing on the output data of the first partial network in the initially trained target neural network to obtain the second desensitization training data includes: performing differential privacy processing on the output data of the initially trained first partial network to obtain a scrambled representation of the output data; and performing noise processing on the scrambled representation to obtain the second desensitization training data.
Based on the above configuration, the scrambled representation together with the noise processing improves the reliability of the security processing.
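As one possible, purely illustrative realization of this two-step security processing (the exact perturbation mechanisms and scale values are assumptions, not fixed by the embodiment), the intermediate features could be perturbed with Laplace noise as a differential-privacy-style scrambling and then overlaid with additional Gaussian noise:

```python
# Minimal sketch of a two-step security processing of the first-part output:
# a scrambled (perturbed) representation followed by additional noise.
import torch

def secure_process(feat, dp_scale=0.1, noise_std=0.05):
    # differential-privacy-style perturbation (Laplace mechanism, assumed scale)
    scrambled = feat + torch.distributions.Laplace(0.0, dp_scale).sample(feat.shape)
    # additional noise processing on the scrambled representation
    return scrambled + noise_std * torch.randn_like(scrambled)
```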
In another implementation of the present invention, performing differential privacy processing on the output data of the initially trained first partial network to obtain the scrambled representation of the output data includes: inputting the output data of the initially trained first partial network and reference training data into a target generative adversarial network for training; and, if the criterion of the discriminator of the target generative adversarial network is met, determining the reference training data as the scrambled representation of the output data.
It should be understood that the second desensitization training data may also be obtained by processing the first desensitization training data separately.
Based on the above configuration, the discriminator of the generative adversarial network can efficiently compare and discriminate between the reference training data and the output data, so the accuracy of the scrambled representation is improved and security is further enhanced.
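The following sketch illustrates, under assumed architectures and hyperparameters, how such a generative adversarial setup could be trained so that the generator's output serves as the scrambled representation once the discriminator's criterion is met; it is not the specific network prescribed by the embodiment:

```python
# Hedged sketch: a generator maps the first-part output to a candidate scrambled
# representation, and a discriminator judges it against the real output data.
import torch
import torch.nn as nn

def train_scrambler(first_part, loader, feat_dim=256, steps=1000):
    generator = nn.Sequential(nn.Linear(feat_dim, feat_dim), nn.ReLU(),
                              nn.Linear(feat_dim, feat_dim))
    discriminator = nn.Sequential(nn.Linear(feat_dim, 64), nn.ReLU(),
                                  nn.Linear(64, 1), nn.Sigmoid())
    g_opt = torch.optim.Adam(generator.parameters(), lr=1e-4)
    d_opt = torch.optim.Adam(discriminator.parameters(), lr=1e-4)
    bce = nn.BCELoss()
    for _, (x, _) in zip(range(steps), loader):
        with torch.no_grad():
            real = first_part(x)                       # output data of the first partial network
        fake = generator(real)                         # candidate scrambled representation
        ones = torch.ones(real.size(0), 1)
        zeros = torch.zeros(real.size(0), 1)
        # discriminator update: distinguish real output data from generated data
        d_opt.zero_grad()
        d_loss = bce(discriminator(real), ones) + bce(discriminator(fake.detach()), zeros)
        d_loss.backward()
        d_opt.step()
        # generator update: try to meet the discriminator's criterion
        g_opt.zero_grad()
        g_loss = bce(discriminator(fake), ones)
        g_loss.backward()
        g_opt.step()
    return generator
```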
FIG. 1B is a schematic block diagram of the model training method in the embodiment shown in FIG. 1A. As shown, the target neural network is initially trained using the training data, resulting in an initially trained target neural network. The first partial network of the initially trained target neural network is designated as the TEE network shown in the figure, and the second partial network is designated as the desensitization network shown in the figure. Fine-tuning training is then performed on the desensitization network to obtain the trained desensitization network shown in the figure.
It is to be understood that the output layer of the first partial network is connected to the input layer of the second partial network. The first partial network and the second partial network may together form all or part of the target neural network. For example, the first partial network may be a subset of the neural network layers in the target neural network.
It should also be understood that the training data for the fine-tuning training of the second partial network may be obtained by inputting training data into the first partial network and taking the data at the output layer of the first partial network (i.e., the output data of the TEE network as shown).
It should also be appreciated that the output data of the initially trained first partial network and the reference training data may be input into the target generative adversarial network for training. For example, as shown, a first output of the TEE network may be connected to the input of the generator of the generative adversarial network, the generator being used to generate a scrambled representation of the output data. A second output of the TEE network and the output of the generator are connected to the discriminator of the generative adversarial network. When the discrimination condition of the discriminator of the target generative adversarial network is met, the reference training data is determined as the scrambled representation of the output data; specifically, when the discrimination condition is satisfied, the scrambled representation of the output data is input to the desensitization network as its training data.
In addition, the parameters of the desensitization network can be adjusted using a target loss function, and the training of the desensitization network ends when the target loss function satisfies a preset convergence condition.
It can be seen that the second partial network, which serves as the desensitization network, uses the same training data as the first partial network during the first-stage initial training, but uses different training data from the first partial network during the second-stage fine-tuning training. Thus, even if the first partial network and the second partial network are leaked, the target neural network cannot be recovered by simply combining them, because of the difference in training data at the different training stages.
Fig. 2A is a flowchart of steps of a data desensitization method according to the second embodiment of the present application. The solution of this embodiment may be applied to any suitable electronic device with data processing capability, including but not limited to: a server, a mobile terminal (such as a mobile phone or tablet), a PC, and the like. The data desensitization method includes:
210: data to be desensitized is acquired.
220: and inputting the data to be desensitized into the first part network in the data desensitization model to obtain output data of the first part network, wherein the data desensitization model is obtained by training through a model training method.
230: and carrying out safety processing on the output data, and inputting the output data into a second part network in the data desensitization model to obtain desensitization data.
According to the method provided by the embodiments of the present application, the second desensitization training data is obtained by performing security processing on the output data of the first partial network in the initially trained target neural network, and the second partial network is fine-tuned using this second desensitization training data. The first partial network and the second partial network are therefore trained on different data, which prevents the original target neural network from being reconstructed by combining the first partial network and the second partial network of the data desensitization model, thereby ensuring data security when the data desensitization model is applied.
In addition, when the data desensitization model is applied, data to be desensitized may be acquired from the REE environment into the TEE environment for input into the TEE network.
Specifically, a TEE environment and an REE environment may exist alongside each other. Generally, an operating system such as Linux runs in the REE environment, but some operations have high security requirements, such as fingerprint comparison or payment scenarios that require signing with a private key, and such operations are safer to run in the TEE environment.
Furthermore, the TEE environment has its own execution space, that is, an operating system is also required in the TEE environment. The operating system of the TEE environment has a higher security level than the Rich operating system (the normal operating system).
Furthermore, the software and hardware resources accessed by the TEE can be separated from the Rich operating system. The TEE provides a secure execution environment for authorized trusted applications (TAs), while also protecting the confidentiality, integrity, and access rights of the TAs' resources and data. Typically, to guarantee the root of trust of the TEE itself, the TEE is authenticated and isolated from the Rich operating system during secure boot. In a TEE, the TAs are independent of one another and cannot access each other without authorization.
In other words, corresponding applications (TAs) also run on the operating system of the TEE environment; besides the TEE operating environment being independent of the normal operating system, the TAs in the TEE must also be authorized and must run independently of one another.
In addition, the output data of the TEE network and the reference data can be input into the generative adversarial network for training; if the criterion of the discriminator of the generative adversarial network is met, the reference data is determined as a scrambled representation of the output data. In particular, the generative adversarial network may run in another TEE environment (e.g., referred to as a second TEE environment) that is independent of the TEE environment (e.g., referred to as a first TEE environment). For example, the generative adversarial network may fetch the output data of the TEE network (the first partial network) from the first TEE environment into the second TEE environment and produce the reference data there. This reference data may then be transferred from the second TEE environment into the REE environment for input into the desensitization network (the second partial network).
In another implementation of the present invention, acquiring the data to be desensitized includes: acquiring the data to be desensitized into the trusted execution environment, where the first partial network is deployed in the trusted execution environment.
Based on the above configuration, the reliability of data desensitization can be improved with the first partial network deployed in the trusted execution environment.
In another implementation of the present invention, performing security processing on the output data includes: obtaining the output data from the trusted execution environment into the accelerated execution environment; and performing security processing on the output data in the accelerated execution environment.
Based on this configuration, since the output data is not the training data itself and its security level is lower than that of the training data, processing the output data in the accelerated execution environment improves data processing efficiency.
FIG. 2B is a schematic block diagram of the data desensitization method in the embodiment shown in FIG. 2A. As shown, when the above model is applied, the TEE network may be deployed in a TEE environment and the desensitization network may be deployed in an REE environment. The TEE environment and the REE environment may use secure communication for data transmission, or may use a general communication mode.
Sensitive data serving as the data to be desensitized can be acquired into the TEE environment and input into the TEE network trained in the manner described above. In one example, the output data of the TEE network may be transferred directly to the REE environment for security processing (e.g., privacy processing or noise processing), resulting in a desensitization feature. In another example, the output data of the TEE network may be securely processed within the TEE environment to obtain the desensitization feature, which is then transferred to the REE environment.
In the REE environment, the desensitization feature obtained above may be input into the desensitization network obtained in the above embodiment, so as to obtain desensitization data.
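For completeness, a minimal sketch of this inference-time flow is given below, assuming that tee_network runs inside the TEE, desensitization_network runs in the REE, and secure_process stands in for the security processing; the transport between the two environments is abstracted away, and all names are illustrative assumptions:

```python
# Minimal end-to-end sketch of the desensitization flow at inference time.
import torch

def desensitize(sample, tee_network, desensitization_network, secure_process):
    with torch.no_grad():
        feat = tee_network(sample)                        # first partial network, inside the TEE
        desens_feat = secure_process(feat)                # security processing of the output data
        return desensitization_network(desens_feat)       # second partial network, in the REE
```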
Fig. 3 is a block diagram of a model training apparatus according to a third embodiment of the present application. The solution of this embodiment may be applied to any suitable electronic device with data processing capability, including but not limited to: a server, a mobile terminal (such as a mobile phone or tablet), a PC, and the like. The model training apparatus includes:
an acquisition module 310 acquires first desensitization training data.
The initial training module 320 performs initial training on the target neural network by using the first desensitization training data, where the initial training is used to adjust parameters of the target neural network.
A security processing module 330 performs security processing on output data of a first partial network in the initially trained target neural network to obtain second desensitization training data.
A fine-tuning training module 340 performs fine-tuning training on a second partial network in the initially trained target neural network using the second desensitization training data, where the output layer of the first partial network is connected to the input layer of the second partial network and the fine-tuning training further adjusts the previously adjusted parameters of the second partial network.
A model determination module 350 determines a data desensitization model from the initially trained first partial network and the fine-tuned second partial network.
According to the method provided by the embodiments of the present application, the second desensitization training data is obtained by performing security processing on the output data of the first partial network in the initially trained target neural network, and the second partial network is fine-tuned using this second desensitization training data. The first partial network and the second partial network are therefore trained on different data, which prevents the original target neural network from being reconstructed by combining the first partial network and the second partial network of the data desensitization model, thereby ensuring data security.
In another implementation of the present invention, the initially trained first partial network is for deployment in a trusted execution environment, and the fine-tuned second partial network is for deployment in an accelerated execution environment.
In another implementation of the present invention, the security processing module is specifically configured to: perform differential privacy processing on the output data of the initially trained first partial network to obtain a scrambled representation of the output data; and perform noise processing on the scrambled representation to obtain the second desensitization training data.
In another implementation of the present invention, the security processing module is specifically configured to: input the output data of the initially trained first partial network and reference training data into a target generative adversarial network for training; and, if the criterion of the discriminator of the target generative adversarial network is met, determine the reference training data as the scrambled representation of the output data.
The apparatus of this embodiment is used to implement the corresponding method in the foregoing method embodiments, and has the beneficial effects of the corresponding method embodiments, which are not described herein again. In addition, the functional implementation of each module in the apparatus of this embodiment can refer to the description of the corresponding part in the foregoing method embodiment, and is not described herein again.
Fig. 4 is a block diagram of a data desensitization apparatus according to a fourth embodiment of the present application. The solution of this embodiment may be applied to any suitable electronic device with data processing capability, including but not limited to: a server, a mobile terminal (such as a mobile phone or tablet), a PC, and the like. The data desensitization apparatus includes:
an acquisition module 410 acquires data to be desensitized.
A first desensitization module 420 inputs the data to be desensitized into a first partial network in the data desensitization model to obtain output data of the first partial network, the data desensitization model being trained by the method of the first aspect.
A second desensitization module 430 performs security processing on the output data and inputs the securely processed output data into a second partial network in the data desensitization model to obtain desensitization data.
According to the method provided by the embodiments of the present application, the second desensitization training data is obtained by performing security processing on the output data of the first partial network in the initially trained target neural network, and the second partial network is fine-tuned using this second desensitization training data. The first partial network and the second partial network are therefore trained on different data, which prevents the original target neural network from being reconstructed by combining the first partial network and the second partial network of the data desensitization model, thereby ensuring data security when the data desensitization model is applied.
In another implementation of the present invention, the acquisition module is specifically configured to acquire the data to be desensitized into a trusted execution environment, where the first partial network is deployed in the trusted execution environment.
In another implementation of the present invention, the second desensitization module is specifically configured to: obtain the output data from the trusted execution environment into an accelerated execution environment; and perform security processing on the output data in the accelerated execution environment.
The apparatus of this embodiment is used to implement the corresponding method in the foregoing method embodiments, and has the beneficial effects of the corresponding method embodiments, which are not described herein again. In addition, the functional implementation of each module in the apparatus of this embodiment can refer to the description of the corresponding part in the foregoing method embodiment, and is not described herein again.
Embodiment Five
Referring to fig. 5, a schematic structural diagram of an electronic device according to a fifth embodiment of the present application is shown, and the specific embodiment of the present application does not limit a specific implementation of the electronic device.
As shown in fig. 5, the electronic device may include: a processor (processor)502, a Communications Interface 504, a memory 506, and a communication bus 508.
Wherein:
the processor 502, communication interface 504, and memory 506 communicate with one another via a communication bus 508.
A communication interface 504 for communicating with other electronic devices or servers.
The processor 502 is configured to execute the program 510, and may specifically perform the relevant steps in the above method embodiments.
In particular, program 510 may include program code that includes computer operating instructions.
The processor 502 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement the embodiments of the present application. The electronic device includes one or more processors, which may be of the same type, such as one or more CPUs, or of different types, such as one or more CPUs and one or more ASICs.
And a memory 506 for storing a program 510. The memory 506 may comprise high-speed RAM memory, and may also include non-volatile memory (non-volatile memory), such as at least one disk memory.
The program 510 may specifically be used to cause the processor 502 to perform the following operations: acquiring first desensitization training data; performing initial training on a target neural network using the first desensitization training data, the initial training being used to adjust parameters of the target neural network; performing security processing on output data of a first partial network in the initially trained target neural network to obtain second desensitization training data; performing fine-tuning training on a second partial network in the initially trained target neural network using the second desensitization training data, wherein the output layer of the first partial network is connected to the input layer of the second partial network and the fine-tuning training further adjusts the previously adjusted parameters of the second partial network; and determining a data desensitization model from the initially trained first partial network and the fine-tuned second partial network;
or: acquiring data to be desensitized; inputting the data to be desensitized into a first partial network in the data desensitization model to obtain output data of the first partial network, the data desensitization model being trained by the above model training method; and performing security processing on the output data, and inputting the securely processed output data into a second partial network in the data desensitization model to obtain desensitization data.
In addition, for specific implementation of each step in the program 510, reference may be made to corresponding steps and corresponding descriptions in units in the foregoing method embodiments, which are not described herein again. It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described devices and modules may refer to the corresponding process descriptions in the foregoing method embodiments, and are not described herein again.
It should be noted that, according to the implementation requirement, each component/step described in the embodiment of the present application may be divided into more components/steps, and two or more components/steps or partial operations of the components/steps may also be combined into a new component/step to achieve the purpose of the embodiment of the present application.
The above-described methods according to the embodiments of the present application may be implemented in hardware or firmware, or as software or computer code that can be stored in a recording medium such as a CD-ROM, RAM, floppy disk, hard disk, or magneto-optical disk, or as computer code originally stored in a remote recording medium or a non-transitory machine-readable medium, downloaded over a network, and stored in a local recording medium, so that the methods described herein can be rendered as such software on a recording medium and processed by a general-purpose computer, a dedicated processor, or programmable or dedicated hardware such as an ASIC or FPGA. It will be appreciated that the computer, processor, microprocessor controller, or programmable hardware includes memory components (e.g., RAM, ROM, flash memory, etc.) that can store or receive software or computer code that, when accessed and executed by the computer, processor, or hardware, implements the methods described herein. Further, when a general-purpose computer accesses code for implementing the methods shown herein, execution of the code transforms the general-purpose computer into a special-purpose computer for performing the methods shown herein.
Those of ordinary skill in the art will appreciate that the various illustrative elements and method steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the embodiments of the present application.
The above embodiments are only used for illustrating the embodiments of the present application, and not for limiting the embodiments of the present application, and those skilled in the relevant art can make various changes and modifications without departing from the spirit and scope of the embodiments of the present application, so that all equivalent technical solutions also belong to the scope of the embodiments of the present application, and the scope of patent protection of the embodiments of the present application should be defined by the claims.

Claims (10)

1. A model training method, comprising:
acquiring first desensitization training data;
performing initial training on a target neural network by using the first desensitization training data, wherein the initial training is used for adjusting parameters of the target neural network;
performing security processing on output data of a first partial network in the initially trained target neural network to obtain second desensitization training data;
performing fine-tuning training on a second partial network in the initially trained target neural network by using the second desensitization training data, wherein the output layer of the first partial network is connected to the input layer of the second partial network, and the fine-tuning training is used for further adjusting the previously adjusted parameters of the second partial network; and
determining a data desensitization model from the initially trained first partial network and the fine-tuned second partial network.
2. The method of claim 1, wherein performing security processing on the output data of the first partial network in the initially trained target neural network to obtain the second desensitization training data comprises:
performing differential privacy processing on the output data of the initially trained first partial network to obtain a scrambled representation of the output data; and
performing noise processing on the scrambled representation to obtain the second desensitization training data.
3. The method of claim 2, wherein performing differential privacy processing on the output data of the initially trained first partial network to obtain the scrambled representation of the output data comprises:
inputting the output data of the initially trained first partial network and reference training data into a target generative adversarial network for training; and
determining the reference training data as the scrambled representation of the output data if a criterion of a discriminator of the target generative adversarial network is met.
4. The method of claim 1, wherein the initially trained first partial network is for deployment in a trusted execution environment and the fine-tuned second partial network is for deployment in an accelerated execution environment.
5. A method of data desensitization, comprising:
acquiring data to be desensitized;
inputting the data to be desensitized into a first partial network in the data desensitization model, resulting in output data of the first partial network, the data desensitization model being trained by the method of any of claims 1-4;
and performing security processing on the output data, and inputting the securely processed output data into a second partial network in the data desensitization model to obtain desensitization data.
6. The method of claim 5, wherein the acquiring data to be desensitized comprises:
acquiring the data to be desensitized into a trusted execution environment, wherein the first partial network is deployed in the trusted execution environment.
7. The method of claim 5, wherein the securely processing the output data comprises:
obtaining the output data from the trusted execution environment into an accelerated execution environment;
and performing security processing on the output data in the accelerated execution environment.
8. A model training apparatus comprising:
the acquisition module acquires first desensitization training data;
the initial training module is used for carrying out initial training on a target neural network by utilizing the first desensitization training data, and the initial training is used for adjusting parameters of the target neural network;
a security processing module, which performs security processing on output data of a first partial network in the initially trained target neural network to obtain second desensitization training data;
a fine-tuning training module, which performs fine-tuning training on a second partial network in the initially trained target neural network by using the second desensitization training data, wherein the output layer of the first partial network is connected to the input layer of the second partial network, and the fine-tuning training is used for further adjusting the previously adjusted parameters of the second partial network; and
a model determination module, which determines a data desensitization model from the initially trained first partial network and the fine-tuned second partial network.
9. A data desensitization apparatus, comprising:
the acquisition module acquires data to be desensitized;
a first desensitization module, inputting the data to be desensitized into a first partial network in the data desensitization model, resulting in output data of the first partial network, the data desensitization model being trained by the method of any one of claims 1-4;
and a second desensitization module, which performs security processing on the output data and inputs the securely processed output data into a second partial network in the data desensitization model to obtain desensitization data.
10. A computer storage medium having stored thereon a computer program which, when executed by a processor, carries out the method of any one of claims 1 to 7.
CN202110564134.6A 2021-05-24 2021-05-24 Model training and desensitizing method and device, electronic equipment and storage medium Active CN113420322B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110564134.6A CN113420322B (en) 2021-05-24 2021-05-24 Model training and desensitizing method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110564134.6A CN113420322B (en) 2021-05-24 2021-05-24 Model training and desensitizing method and device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN113420322A true CN113420322A (en) 2021-09-21
CN113420322B CN113420322B (en) 2023-09-01

Family

ID=77712754

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110564134.6A Active CN113420322B (en) 2021-05-24 2021-05-24 Model training and desensitizing method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113420322B (en)

Citations (23)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109284684A (en) * 2018-08-21 2019-01-29 Oppo广东移动通信有限公司 A kind of information processing method, device and computer storage medium
EP3471005A1 (en) * 2017-10-13 2019-04-17 Nokia Technologies Oy Artificial neural network
CN110175623A (en) * 2019-04-10 2019-08-27 阿里巴巴集团控股有限公司 Desensitization process method and device based on image recognition
CN110378138A (en) * 2019-07-22 2019-10-25 上海鹰瞳医疗科技有限公司 Data encryption, decryption method and neural network training method and equipment
KR102034827B1 (en) * 2019-05-14 2019-11-18 주식회사 뷰노 Method for improving reproducibility of trained deep neural network model and apparatus using the same
US20190392305A1 (en) * 2018-06-25 2019-12-26 International Business Machines Corporation Privacy Enhancing Deep Learning Cloud Service Using a Trusted Execution Environment
CN110807207A (en) * 2019-10-30 2020-02-18 腾讯科技(深圳)有限公司 Data processing method and device, electronic equipment and storage medium
CN111159770A (en) * 2019-12-31 2020-05-15 医渡云(北京)技术有限公司 Text data desensitization method, device, medium and electronic equipment
CN111177792A (en) * 2020-04-10 2020-05-19 支付宝(杭州)信息技术有限公司 Method and device for determining target business model based on privacy protection
CN111191275A (en) * 2019-11-28 2020-05-22 深圳云安宝科技有限公司 Sensitive data identification method, system and device
CN111260053A (en) * 2020-01-13 2020-06-09 支付宝(杭州)信息技术有限公司 Method and apparatus for neural network model training using trusted execution environments
CN111291416A (en) * 2020-05-09 2020-06-16 支付宝(杭州)信息技术有限公司 Method and device for preprocessing data of business model based on privacy protection
CN111832591A (en) * 2019-04-23 2020-10-27 创新先进技术有限公司 Machine learning model training method and device
CN111898145A (en) * 2020-07-22 2020-11-06 苏州浪潮智能科技有限公司 Neural network model training method, device, equipment and medium
CN112100628A (en) * 2020-11-16 2020-12-18 支付宝(杭州)信息技术有限公司 Method and device for protecting safety of neural network model
CN112115509A (en) * 2020-09-11 2020-12-22 青岛海信电子产业控股股份有限公司 Data generation method and device
CN112132270A (en) * 2020-11-24 2020-12-25 支付宝(杭州)信息技术有限公司 Neural network model training method, device and system based on privacy protection
CN112395645A (en) * 2020-11-30 2021-02-23 中国民航信息网络股份有限公司 Data desensitization processing method and device
CN112417414A (en) * 2020-12-04 2021-02-26 支付宝(杭州)信息技术有限公司 Privacy protection method, device and equipment based on attribute desensitization
US20210073393A1 (en) * 2019-09-09 2021-03-11 Kazuhm, Inc. Encryption for machine learning model inputs
CN112528318A (en) * 2020-11-27 2021-03-19 国家电网有限公司大数据中心 Image desensitization method and device and electronic equipment
CN112784990A (en) * 2021-01-22 2021-05-11 支付宝(杭州)信息技术有限公司 Training method of member inference model
US20210150269A1 (en) * 2019-11-18 2021-05-20 International Business Machines Corporation Anonymizing data for preserving privacy during use for federated machine learning

Patent Citations (24)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3471005A1 (en) * 2017-10-13 2019-04-17 Nokia Technologies Oy Artificial neural network
US20190392305A1 (en) * 2018-06-25 2019-12-26 International Business Machines Corporation Privacy Enhancing Deep Learning Cloud Service Using a Trusted Execution Environment
CN112106076A (en) * 2018-06-25 2020-12-18 国际商业机器公司 Privacy-enhanced deep learning cloud service using trusted execution environments
CN109284684A (en) * 2018-08-21 2019-01-29 Oppo广东移动通信有限公司 A kind of information processing method, device and computer storage medium
CN110175623A (en) * 2019-04-10 2019-08-27 阿里巴巴集团控股有限公司 Desensitization process method and device based on image recognition
CN111832591A (en) * 2019-04-23 2020-10-27 创新先进技术有限公司 Machine learning model training method and device
KR102034827B1 (en) * 2019-05-14 2019-11-18 주식회사 뷰노 Method for improving reproducibility of trained deep neural network model and apparatus using the same
CN110378138A (en) * 2019-07-22 2019-10-25 上海鹰瞳医疗科技有限公司 Data encryption, decryption method and neural network training method and equipment
US20210073393A1 (en) * 2019-09-09 2021-03-11 Kazuhm, Inc. Encryption for machine learning model inputs
CN110807207A (en) * 2019-10-30 2020-02-18 腾讯科技(深圳)有限公司 Data processing method and device, electronic equipment and storage medium
US20210150269A1 (en) * 2019-11-18 2021-05-20 International Business Machines Corporation Anonymizing data for preserving privacy during use for federated machine learning
CN111191275A (en) * 2019-11-28 2020-05-22 深圳云安宝科技有限公司 Sensitive data identification method, system and device
CN111159770A (en) * 2019-12-31 2020-05-15 医渡云(北京)技术有限公司 Text data desensitization method, device, medium and electronic equipment
CN111260053A (en) * 2020-01-13 2020-06-09 支付宝(杭州)信息技术有限公司 Method and apparatus for neural network model training using trusted execution environments
CN111177792A (en) * 2020-04-10 2020-05-19 支付宝(杭州)信息技术有限公司 Method and device for determining target business model based on privacy protection
CN111291416A (en) * 2020-05-09 2020-06-16 支付宝(杭州)信息技术有限公司 Method and device for preprocessing data of business model based on privacy protection
CN111898145A (en) * 2020-07-22 2020-11-06 苏州浪潮智能科技有限公司 Neural network model training method, device, equipment and medium
CN112115509A (en) * 2020-09-11 2020-12-22 青岛海信电子产业控股股份有限公司 Data generation method and device
CN112100628A (en) * 2020-11-16 2020-12-18 支付宝(杭州)信息技术有限公司 Method and device for protecting safety of neural network model
CN112132270A (en) * 2020-11-24 2020-12-25 支付宝(杭州)信息技术有限公司 Neural network model training method, device and system based on privacy protection
CN112528318A (en) * 2020-11-27 2021-03-19 国家电网有限公司大数据中心 Image desensitization method and device and electronic equipment
CN112395645A (en) * 2020-11-30 2021-02-23 中国民航信息网络股份有限公司 Data desensitization processing method and device
CN112417414A (en) * 2020-12-04 2021-02-26 支付宝(杭州)信息技术有限公司 Privacy protection method, device and equipment based on attribute desensitization
CN112784990A (en) * 2021-01-22 2021-05-11 支付宝(杭州)信息技术有限公司 Training method of member inference model

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
张煜; 吕锡香; 邹宇聪; 李一戈: "Desensitization of text sequence data sets based on generative adversarial networks", 网络与信息安全学报 (Chinese Journal of Network and Information Security), no. 04 *
毛典辉; 李子沁; 蔡强; 薛子育: "Deep differential privacy protection method based on DCGAN feedback", 北京工业大学学报 (Journal of Beijing University of Technology), no. 06 *
袁秋壮; 魏松杰; 罗娜: "Research on an on-board SAR target recognition system based on deep learning neural networks", 上海航天 (Aerospace Shanghai), no. 05 *

Also Published As

Publication number Publication date
CN113420322B (en) 2023-09-01

Similar Documents

Publication Publication Date Title
US9881150B2 (en) Method and device for verifying the integrity of platform software of an electronic device
US20190182241A1 (en) Anonymizing biometric data for use in a security system
KR101476948B1 (en) System and method for tamper-resistant booting
US20020112183A1 (en) Apparatus and method for authenticating access to a network resource
EP3933624B1 (en) Blockchain-based identity verification method and related hardware
US10373135B2 (en) System and method for performing secure online banking transactions
JP2008502070A (en) Biometric template similarity based on feature location
US9485255B1 (en) Authentication using remote device locking
CN111666591A (en) Online underwriting data security processing method, system, equipment and storage medium
US20220052841A1 (en) Matching system, client and server
CN117972787A (en) Large model knowledge base access control method and system based on JWT
US20210306147A1 (en) Authentication using transformation verification
CN113420322B (en) Model training and desensitizing method and device, electronic equipment and storage medium
CN100442305C (en) Biometric template similarity based on feature locations
Lee et al. A study on a secure USB mechanism that prevents the exposure of authentication information for smart human care services
CN114070571A (en) Method, device, terminal and storage medium for establishing connection
CN109344593B (en) Biological information verification method, verification server and entry and verification client
CN110689351A (en) Financial service verification system and financial service verification method
CN114567451B (en) Identity verification method, identity verification device, computer equipment and storage medium
US20240356752A1 (en) Encoded animated images and methods of generating, displaying, and reading encoded animated images, in particular for authorizing operations on online services
WO2022172096A1 (en) Method and system for processing reference faces
KR100353730B1 (en) Apparatus and method for protection of stolen fingerprint data in fingerprint authentication system
Muhammad Comprehensive Study on Mobile Security Risks and Biometrics Techniques
Karthika et al. Authorization of Aadhar data using Diffie Helman key with enhanced security concerns
Zieja et al. PORTABLE BIOMETRIC MODULE SOFTWARE FOR MILITARY AVIATION SUPPORT SYSTEM

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
REG Reference to a national code

Ref country code: HK

Ref legal event code: DE

Ref document number: 40069599

Country of ref document: HK

GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20240312

Address after: 51 Bras Basah Road, #03-06 Lazada One, Singapore 189554

Patentee after: Alibaba Innovation Co.

Country or region after: Singapore

Address before: Room 01, 45th Floor, AXA Tower, 8 Shenton Way, Singapore

Patentee before: Alibaba Singapore Holdings Pte. Ltd.

Country or region before: Singapore
