Detailed Description
Embodiments of this specification provide a data encryption method, a machine learning model training method, corresponding apparatuses, and electronic devices.
To help those skilled in the art better understand the technical solutions in this specification, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by a person skilled in the art based on the embodiments of this specification without inventive effort shall fall within the scope of protection of the present application.
Fig. 1 is a schematic diagram of an overall architecture involved in the solution of this specification in a practical application scenario. The overall architecture mainly involves three parts: the data to be encrypted, the device hosting the self-encoder, and the encrypted data corresponding to the data to be encrypted. The data to be encrypted is input into the device hosting the self-encoder for processing, and the encrypted data corresponding to the data to be encrypted is obtained.
Fig. 2 is a schematic flowchart of a data encryption method provided in an embodiment of this specification. Possible execution bodies of the flow include, but are not limited to, the following devices, which may serve as servers or terminals: personal computers, medium-sized computers, computer clusters, mobile phones, tablet computers, smart wearable devices, in-vehicle units, and the like.
The flow in fig. 2 may include the following steps:
S202: Input the data to be encrypted into the self-encoder for processing.
In the embodiments of this specification, the self-encoder (also known as an autoencoder) is a neural network model. By training the self-encoder, a nonlinear machine learning algorithm whose output equals its input (in practical applications, a certain error is allowed) can be implemented.
The intermediate states between the input and the output can contain useful information about the input. Based on this principle, data encryption can be implemented using the self-encoder and its intermediate states.
In the embodiment of the present specification, the data to be encrypted may be the original data itself described in the background art; the data to be encrypted may also be data obtained by performing corresponding preprocessing on the original data in order to adapt to the scheme of the present specification, where the preprocessing may be, for example, formatting processing, data cleaning processing, feature extraction processing, and the like.
S204: Acquire the neural network hidden layer data generated by the self-encoder during the processing.
In this embodiment, the self-encoder may include at least one neural network hidden layer. During processing, the input data to be encrypted passes in sequence through the input layer, the hidden layer or layers, and the output layer of the self-encoder. Each layer may include a plurality of nodes, and each node may perform calculations on data from the nodes of the preceding layer, for example by assigning a weight to each piece of data and performing linear or nonlinear operations. The data computed by the nodes of a hidden layer is the neural network hidden layer data, which reflects the intermediate state.
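The layer-by-layer processing described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the embodiments: the sigmoid activation, the 6-4-6 layer sizes, the random untrained parameters, and the helper names `dense_layer`, `forward`, and `random_layer` are all hypothetical.

```python
import math
import random


def sigmoid(z):
    """Logistic activation, one common choice of nonlinear operation."""
    return 1.0 / (1.0 + math.exp(-z))


def dense_layer(inputs, weights, biases):
    """Each node weights the data from the preceding layer, adds a bias,
    and applies a nonlinear operation."""
    return [
        sigmoid(sum(w * x for w, x in zip(node_weights, inputs)) + b)
        for node_weights, b in zip(weights, biases)
    ]


def forward(x, layers):
    """Pass x through every layer after the input layer.

    Returns (hidden_layer_data, output); hidden_layer_data holds one vector
    per hidden layer, i.e. the intermediate states."""
    activations = []
    current = x
    for weights, biases in layers:
        current = dense_layer(current, weights, biases)
        activations.append(current)
    return activations[:-1], activations[-1]


def random_layer(n_in, n_out, rng):
    """Hypothetical helper: untrained random parameters, for illustration only."""
    weights = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]
    biases = [rng.uniform(-1, 1) for _ in range(n_out)]
    return weights, biases


rng = random.Random(0)
# Assumed sizes: 6-dimensional input, one 4-dimensional hidden layer,
# 6-dimensional output.
layers = [random_layer(6, 4, rng), random_layer(4, 6, rng)]
x = [0.2, 0.5, 0.1, 0.9, 1.0, 0.0]
hidden_layer_data, output = forward(x, layers)
```

Here `hidden_layer_data[0]` is the vector produced by the single hidden layer; with more hidden layers, the list would hold one vector per layer.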
It should be noted that, in step S204, it is not necessary to acquire all the neural network hidden layer data generated by the self-encoder during the processing. For example, if the self-encoder includes a plurality of hidden layers, only the neural network hidden layer data generated by one of the hidden layers may be acquired.
S206: Obtain the encrypted data corresponding to the data to be encrypted according to the neural network hidden layer data.
In the embodiments of this specification, the neural network hidden layer data may be used directly as the encrypted data corresponding to the data to be encrypted, or it may be further processed to obtain that encrypted data. In the former case, step S206 need not include any action that is actually performed; it merely indicates that the neural network hidden layer data acquired in step S204 is the encrypted data corresponding to the data to be encrypted. In the latter case, for example, a weighted calculation may be performed on the neural network hidden layer data of a plurality of hidden layers, and the result may be used as the encrypted data corresponding to the data to be encrypted.
When there are a plurality of hidden layers, the hidden layer data of any one of the hidden layers may be used as the obtained encrypted data. In a preferred embodiment, the hidden layer data of the hidden layer closest to the output layer may be used as the obtained encrypted data.
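The two ways of carrying out step S206 can be sketched as below. This is only an illustration: the specification does not say how the weighted calculation is performed, so the element-wise weighted sum (which assumes all hidden layers have the same dimension) and the function name `encrypted_from_hidden` are assumptions.

```python
def encrypted_from_hidden(hidden_layers, mode="last", layer_weights=None):
    """Derive encrypted data from hidden layer data.

    hidden_layers: list of vectors ordered from the input side to the
    output side, one vector per hidden layer.
    mode="last": use the hidden layer closest to the output layer directly.
    mode="weighted": element-wise weighted sum across hidden layers
    (assumed rule; requires equal dimensions).
    """
    if not hidden_layers:
        raise ValueError("at least one hidden layer is required")
    if mode == "last":
        return list(hidden_layers[-1])
    if mode == "weighted":
        dim = len(hidden_layers[0])
        if any(len(v) != dim for v in hidden_layers):
            raise ValueError("weighted mode assumes equal hidden dimensions")
        if layer_weights is None:
            layer_weights = [1.0 / len(hidden_layers)] * len(hidden_layers)
        if len(layer_weights) != len(hidden_layers):
            raise ValueError("one weight per hidden layer is required")
        return [
            sum(w * v[i] for w, v in zip(layer_weights, hidden_layers))
            for i in range(dim)
        ]
    raise ValueError("unknown mode: " + mode)


hidden = [[0.2, 0.4], [0.6, 0.8]]  # hypothetical hidden layer data
direct = encrypted_from_hidden(hidden, mode="last")
combined = encrypted_from_hidden(hidden, mode="weighted", layer_weights=[0.5, 0.5])
```

In the direct mode no further computation takes place, which matches the remark that step S206 may not include an action that is actually performed.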
The encrypted data obtained by the method of fig. 2 can contain useful information of the data to be encrypted, because during the layer-by-layer processing of the data to be encrypted in the self-encoder, the data is transformed but its main characteristics (which are useful information) are retained. The encrypted data obtained in this way is therefore meaningful for external presentation and has good practicability. External presentation here can be understood as output to a user of the data. The user of the data may train a model with the encrypted data, and the trained model can be used in practical scenarios such as risk assessment and credit prediction.
Based on the method of fig. 2, the present specification also provides some specific embodiments of the method, and further provides the following descriptions.
For ease of understanding, the embodiments of this specification provide a schematic structural diagram of a self-encoder, as shown in fig. 3.
The self-encoder in fig. 3 includes an input layer, an output layer, and a hidden layer. $x_1, x_2, x_3, x_4, x_5, x_6$ and "+1" represent the input data for each dimension of the self-encoder, and $x'_1, x'_2, x'_3, x'_4, x'_5, x'_6$ represent the output data for each dimension of the self-encoder. It can be seen that in this self-encoder the dimension of the hidden layer is lower than that of the input layer, so the dimension of the neural network hidden layer data generated by the hidden layer while processing the input data is correspondingly lower than that of the input data. For convenience of calculation, the input data, the neural network hidden layer data, and the output data can generally be expressed in vector form.
In the embodiments of this specification, the original data is not necessarily represented in vector form; it is often represented as a data table or as key-value pairs. In this case, the original data may be vectorized and then input into the self-encoder for processing.
For example, for step S202, before the data to be encrypted is input into the self-encoder for processing, the following steps may also be performed: acquiring original data; and formatting the original data to obtain a vector representing the original data, where the data to be encrypted includes the vector representing the original data.
Assume that the original data is part of a user's information and has 6 dimensions: age, hometown, assets, annual income, whether the user owns a car (or house), and whether the user has a loan. The information in these dimensions can be extracted for each user as original data and formatted to obtain a 6-dimensional vector for each user. For example, the vector of a certain user may contain entries such as (20 to 30 years old, assets of 1 million, annual income of 200 thousand, owns a car, has no loan). For convenience of calculation, the information in each dimension of the vector in this example may be further mapped to a number according to a predetermined mapping rule and then input into the self-encoder.
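The formatting step can be sketched as a mapping from a key-value record to a numeric vector. The mapping rule below is purely hypothetical (the specification only says that a predetermined mapping rule is used), as are the field names, the age bands, the hometown codes, and the scaling by one million.

```python
# Hypothetical predetermined mapping rules; none of these values come from
# the specification.
AGE_BANDS = {"under 20": 0, "20-30": 1, "30-40": 2, "over 40": 3}
HOMETOWN_CODES = {"city A": 0, "city B": 1}
MONEY_SCALE = 1_000_000  # assumed scaling so amounts are of order 1


def format_record(record):
    """Turn one user's key-value record into a 6-dimensional numeric vector."""
    return [
        float(AGE_BANDS[record["age_band"]]),
        float(HOMETOWN_CODES[record["hometown"]]),
        record["assets"] / MONEY_SCALE,
        record["annual_income"] / MONEY_SCALE,
        1.0 if record["has_car"] else 0.0,
        1.0 if record["has_loan"] else 0.0,
    ]


# Hypothetical record loosely following the example in the text.
record = {
    "age_band": "20-30",
    "hometown": "city A",
    "assets": 1_000_000,
    "annual_income": 200_000,
    "has_car": True,
    "has_loan": False,
}
vector = format_record(record)
```

In practice the categorical fields might be one-hot encoded instead of mapped to a single integer, which would change the input dimension of the self-encoder; the single-number mapping is used here only because the text describes mapping each dimension to a number.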
In the embodiments of this specification, the self-encoder used for data encryption may be an already trained self-encoder. The process of training the self-encoder includes: adjusting the number of hidden layers of the self-encoder and the number of nodes in each hidden layer; and inputting sample data into the self-encoder and training the self-encoder with the target that its input equals the corresponding output.
The target, i.e. the function whose output equals its input, can be expressed as $h_{W,b}(x) \approx x$, where $x$ represents the input and $h_{W,b}(x)$ represents the output. The symbol $\approx$ is used rather than $=$ because a certain error is allowed in practical applications: it is generally sufficient for the input to be substantially equal to the output, which helps shorten the training time.
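A training loop with this target can be sketched as follows. This is a toy example under stated assumptions rather than the training procedure of the embodiments: it assumes a single hidden layer, sigmoid activations, a squared reconstruction error, plain stochastic gradient descent, and four hypothetical sample vectors; the class name `TinySelfEncoder` is illustrative.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class TinySelfEncoder:
    """Hypothetical one-hidden-layer self-encoder: n_in -> n_hidden -> n_in."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        self.W1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.W2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_in)]
        self.b2 = [0.0] * n_in

    def forward(self, x):
        """Return (hidden layer data h, output y)."""
        h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.W1, self.b1)]
        y = [sigmoid(sum(w * hi for w, hi in zip(row, h)) + b)
             for row, b in zip(self.W2, self.b2)]
        return h, y

    def loss(self, samples):
        """Mean squared reconstruction error: how far h_{W,b}(x) is from x."""
        total = 0.0
        for x in samples:
            _, y = self.forward(x)
            total += sum((yi - xi) ** 2 for yi, xi in zip(y, x))
        return total / len(samples)

    def train_step(self, x, lr):
        """One gradient step on 0.5 * sum((y - x)^2) for a single sample."""
        h, y = self.forward(x)
        delta_out = [(yi - xi) * yi * (1 - yi) for yi, xi in zip(y, x)]
        # Back-propagate through W2 before it is updated.
        delta_hid = [
            sum(delta_out[k] * self.W2[k][j] for k in range(len(y))) * h[j] * (1 - h[j])
            for j in range(len(h))
        ]
        for k, row in enumerate(self.W2):
            for j in range(len(h)):
                row[j] -= lr * delta_out[k] * h[j]
            self.b2[k] -= lr * delta_out[k]
        for j, row in enumerate(self.W1):
            for i in range(len(x)):
                row[i] -= lr * delta_hid[j] * x[i]
            self.b1[j] -= lr * delta_hid[j]


# Hypothetical sample data: four 6-dimensional vectors.
samples = [
    [0.9, 0.1, 0.9, 0.1, 0.9, 0.1],
    [0.1, 0.9, 0.1, 0.9, 0.1, 0.9],
    [0.9, 0.9, 0.1, 0.1, 0.9, 0.9],
    [0.1, 0.1, 0.9, 0.9, 0.1, 0.1],
]
model = TinySelfEncoder(n_in=6, n_hidden=4)
loss_before = model.loss(samples)
for _ in range(2000):
    for x in samples:
        model.train_step(x, lr=0.5)
loss_after = model.loss(samples)
```

Training can stop once the reconstruction error is acceptably small rather than zero, which is the sense in which the output is only required to be approximately equal to the input.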
Further, the input layer and the output layer of the self-encoder may have the same structure; for example, the input layer and the output layer may include the same number of nodes. Such a symmetrical structure helps speed up training convergence. After training is completed, the single-node input of each node of the input layer is substantially equal to the single-node output of the corresponding node of the output layer. Of course, even if the input layer and the output layer have different structures, it is still possible to successfully learn the above function $h_{W,b}(x)$.
In this embodiment, when the self-encoder includes only one hidden layer, for step S204, at least a part of the hidden layer data generated by the hidden layer in the process may be acquired.
When the self-encoder comprises a plurality of hidden layers, one or more hidden layers can be determined in advance to serve as target hidden layers; or, according to the specific situation of the neural network hidden layer data generated by the hidden layers in the processing process, selecting one or more hidden layers as target hidden layers; further, for step S204, at least part of the neural network hidden layer data generated by the target hidden layer in the processing procedure may be acquired.
For the first approach in the previous paragraph, for example, the hidden layer with the lowest dimension may be selected as the target hidden layer; as another example, depending on the desired degree of encryption, if a higher degree of encryption is desired, a hidden layer closer to the middle may be selected as the target hidden layer. For the second approach, for example, if some dimensions of the neural network hidden layer data corresponding to a certain hidden layer directly expose the corresponding part of the original information, that hidden layer can be excluded and the target hidden layer determined from among the remaining hidden layers.
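One way to combine these selection rules is sketched below. The specification does not define how exposure of original information is detected, so the test used here (a hidden dimension that reproduces an input dimension on every sample, within a tolerance) is an assumption, as are the function names `exposes_raw_input` and `choose_target_layer`.

```python
def exposes_raw_input(hidden_vectors, input_vectors, tol=1e-6):
    """Assumed exposure test: True if some hidden dimension equals some
    input dimension on every sample (within tol)."""
    n_hidden = len(hidden_vectors[0])
    n_in = len(input_vectors[0])
    for j in range(n_hidden):
        for i in range(n_in):
            if all(abs(h[j] - x[i]) <= tol
                   for h, x in zip(hidden_vectors, input_vectors)):
                return True
    return False


def choose_target_layer(hidden_by_layer, input_vectors):
    """hidden_by_layer[k] lists the hidden vectors of hidden layer k, one per
    sample. Excludes layers that expose raw input, then returns the index of
    the lowest-dimensional remaining layer."""
    candidates = [
        k for k, vectors in enumerate(hidden_by_layer)
        if not exposes_raw_input(vectors, input_vectors)
    ]
    if not candidates:
        raise ValueError("every hidden layer exposes part of the input")
    return min(candidates, key=lambda k: len(hidden_by_layer[k][0]))


# Hypothetical data: two samples, two hidden layers.
inputs = [[0.2, 0.7], [0.5, 0.1]]
hidden_by_layer = [
    [[0.2, 0.9], [0.5, 0.4]],            # dimension 0 copies input dimension 0
    [[0.6, 0.1, 0.3], [0.3, 0.8, 0.2]],  # no dimension copies an input
]
target = choose_target_layer(hidden_by_layer, inputs)
```

In this example the first hidden layer has the lower dimension but is excluded because it leaks an input value, so the second hidden layer is chosen.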
In fig. 3, since there is only one hidden layer, the hidden layer is the target hidden layer, and the dimension of the target hidden layer is set to be lower than that of the input layer, which has the advantages that: the features of the input data are gathered, so that the useful information is extracted, and the dimensionality and the data volume of the encrypted data are reduced.
It should be noted that the dimension of the target hidden layer is lower than that of the input layer is not a requirement that must be satisfied in the solution of this specification, and it is also possible that the dimension of the target hidden layer is equal to or greater than that of the input layer. For example, fig. 4 is a schematic structural diagram of another self-encoder provided in an embodiment of this specification, where hidden layers of the self-encoder in fig. 4 include three layers, and a dimension of each hidden layer is higher than a dimension of an input layer, and one or more of the three layers may be used as a target hidden layer.
According to the above description, the embodiments of this specification further provide a schematic diagram of an implementation of the data encryption method in a practical application scenario, as shown in fig. 5a and fig. 5b.
Fig. 5a is a schematic diagram of the process of training the self-encoder, which has been described above; fig. 5b is a schematic flowchart of data encryption based on the trained self-encoder.
In fig. 5b, the data to be encrypted is data containing multiple dimensions of "age", "hometown" and "asset", and may be specifically represented as a multidimensional vector, such as a 6-dimensional vector in the above example. In training the self-encoder, a vector having the same structure as the data to be encrypted may be used as sample data.
The process in fig. 5b mainly comprises the following steps:
The data to be encrypted, in vector form, is input into the trained self-encoder for processing, and the hidden layer data generated in the self-encoder during the processing is acquired as the encrypted data corresponding to the data to be encrypted. Preferably, the hidden layer data is represented as a vector whose dimension is lower than that of the data to be encrypted; the resulting encrypted data can contain useful information of the data to be encrypted.
Assuming that the above-mentioned 6-dimensional vector is inputted into the self-encoder in fig. 3, a 4-dimensional vector can be obtained as corresponding encrypted data, for example, (0.21,0.43,0.23,0.98), etc., each piece of data in the 4-dimensional vector can reflect the main characteristics of 1-dimensional data or multi-dimensional data in the 6-dimensional vector to some extent, which is useful for external presentation purposes.
In the embodiment of the present specification, the encrypted data may be used for training the machine learning model instead of the corresponding data to be encrypted, in addition to being used for external presentation. Based on such a concept, an embodiment of the present specification further provides a machine learning model training method, as shown in fig. 6, and fig. 6 is a schematic flow chart of the machine learning model training method.
The flow in fig. 6 may include the following steps:
S602: Acquire encrypted data, where the encrypted data is obtained by inputting the corresponding data to be encrypted into a self-encoder for processing and according to the neural network hidden layer data generated by the self-encoder during the processing.
S604: training a machine learning model using the encrypted data.
Because the encrypted data contains the main information of the corresponding data to be encrypted, training with the encrypted data can achieve an effect close to that of training the machine learning model with the data to be encrypted itself, and the data to be encrypted need not be exposed during training, which helps protect its privacy.
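Step S604 can be illustrated with a small classifier trained only on encrypted vectors. Everything in this sketch is assumed for illustration: the choice of logistic regression, the six 4-dimensional encrypted vectors (the first one reuses the example vector from the description), and the binary labels, which stand in for something like a risk label.

```python
import math


def train_logistic_regression(features, labels, lr=0.5, epochs=500):
    """Plain stochastic gradient descent on the logistic loss."""
    weights = [0.0] * len(features[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            z = sum(w * xi for w, xi in zip(weights, x)) + bias
            p = 1.0 / (1.0 + math.exp(-z))
            error = p - y
            weights = [w - lr * error * xi for w, xi in zip(weights, x)]
            bias -= lr * error
    return weights, bias


def predict(weights, bias, x):
    """Return 1 if the model scores x above the decision boundary, else 0."""
    return 1 if sum(w * xi for w, xi in zip(weights, x)) + bias > 0 else 0


# Hypothetical encrypted data (hidden layer vectors) and hypothetical labels.
encrypted = [
    [0.21, 0.43, 0.23, 0.98],
    [0.15, 0.50, 0.30, 0.90],
    [0.30, 0.35, 0.20, 0.85],
    [0.80, 0.40, 0.70, 0.10],
    [0.90, 0.55, 0.65, 0.20],
    [0.75, 0.30, 0.80, 0.15],
]
labels = [0, 0, 0, 1, 1, 1]

weights, bias = train_logistic_regression(encrypted, labels)
predictions = [predict(weights, bias, x) for x in encrypted]
```

The training code never sees the original 6-dimensional records, only the hidden layer vectors, which is the property the method relies on.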
The data encryption method and the machine learning model training method provided by the embodiment of the present specification are described above, and based on the same idea, the embodiment of the present specification further provides corresponding apparatuses, as shown in fig. 7 and fig. 8.
Fig. 7 is a schematic structural diagram of a data encryption apparatus corresponding to fig. 2 provided in an embodiment of the present specification, where a dashed box represents an optional module, and the apparatus may be located on an execution body of the flow in fig. 2, and includes:
a processing module 701, configured to input data to be encrypted into the self-encoder for processing;
an obtaining module 702, configured to obtain the neural network hidden layer data generated by the self-encoder in the processing process;
an obtaining module 703, configured to obtain, according to the neural network hidden layer data, encrypted data corresponding to the data to be encrypted.
Optionally, the encrypted data is used to train a machine learning model.
Optionally, the apparatus further comprises:
a formatting module 704, configured to obtain original data before the processing module 701 inputs the data to be encrypted into the self-encoder for processing, and format the original data to obtain a vector representing the original data; the data to be encrypted includes the vector representing the original data.
Optionally, the self-encoder into which the data to be encrypted is input is a trained self-encoder.
Optionally, the obtaining module 702 obtains the neural network hidden layer data generated by the self-encoder in the processing process, specifically including:
the obtaining module 702 determines a target hidden layer among hidden layers included in the self-encoder, and obtains the neural network hidden layer data generated by the target hidden layer in the processing process.
Optionally, the dimension of the target hidden layer is lower than the dimension of the input layer of the self-encoder.
Optionally, the obtaining module 702 obtains the neural network hidden layer data generated by the self-encoder in the processing process, specifically including:
the obtaining module 702 obtains the neural network hidden layer data generated by the hidden layer closest to the output layer in the self-encoder during the processing.
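Optionally, the obtaining module 703 obtains, according to the neural network hidden layer data, encrypted data corresponding to the data to be encrypted, which specifically includes:
the obtaining module 703 uses the neural network hidden layer data as the encrypted data corresponding to the data to be encrypted.
Optionally, the neural network hidden layer data is a vector.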
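The module structure of fig. 7 could be composed in software roughly as follows. This is a hypothetical sketch: the class and method names are illustrative, the self-encoder is modelled as any callable returning hidden layer data and an output, and `stub_self_encoder` is a placeholder rather than a trained self-encoder.

```python
from typing import Callable, List, Optional, Tuple

Vector = List[float]
# Assumed interface: a self-encoder returns (hidden layer data, output).
SelfEncoder = Callable[[Vector], Tuple[List[Vector], Vector]]


class DataEncryptionApparatus:
    """Hypothetical composition of modules 701 to 704."""

    def __init__(self, self_encoder: SelfEncoder,
                 formatter: Optional[Callable[[object], Vector]] = None):
        self.self_encoder = self_encoder
        self.formatter = formatter

    def formatting_module(self, raw) -> Vector:
        """Module 704 (optional): format original data into a vector."""
        return self.formatter(raw) if self.formatter else list(raw)

    def processing_module(self, data: Vector):
        """Module 701: input the data to be encrypted into the self-encoder."""
        return self.self_encoder(data)

    def hidden_data_module(self, hidden_layers: List[Vector]) -> Vector:
        """Module 702: take the hidden layer closest to the output layer."""
        return hidden_layers[-1]

    def encrypted_data_module(self, hidden: Vector) -> Vector:
        """Module 703: use the hidden layer data directly as encrypted data."""
        return list(hidden)

    def encrypt(self, raw) -> Vector:
        data = self.formatting_module(raw)
        hidden_layers, _output = self.processing_module(data)
        return self.encrypted_data_module(self.hidden_data_module(hidden_layers))


def stub_self_encoder(x: Vector):
    """Placeholder only: its single 'hidden layer' averages neighbouring
    pairs of inputs, and its output simply echoes the input."""
    hidden = [(a + b) / 2 for a, b in zip(x[::2], x[1::2])]
    return [hidden], list(x)


apparatus = DataEncryptionApparatus(stub_self_encoder)
encrypted = apparatus.encrypt([1.0, 3.0, 2.0, 6.0])
```

Keeping the modules as separate methods mirrors the functional division in the figure; as noted later in the description, the same functions could equally be merged into fewer software or hardware units.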
Fig. 8 is a schematic structural diagram of a machine learning model training apparatus corresponding to fig. 6 provided in an embodiment of the present specification, where the apparatus may be located on an execution body of the flowchart in fig. 6, and includes:
an obtaining module 801, configured to obtain encrypted data, where the encrypted data is obtained by inputting the corresponding data to be encrypted into a self-encoder for processing and according to the neural network hidden layer data generated by the self-encoder in the processing process;
a training module 802 for training a machine learning model using the encrypted data.
Based on the same idea, embodiments of the present specification further provide an electronic device corresponding to fig. 2, including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to:
inputting data to be encrypted into a self-encoder for processing;
acquiring neural network hidden layer data generated by the self-encoder in the processing process;
and obtaining encrypted data corresponding to the data to be encrypted according to the neural network hidden layer data.
Based on the same idea, embodiments of the present specification further provide an electronic device corresponding to fig. 6, including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to:
acquiring encrypted data, wherein the encrypted data is obtained by inputting corresponding data to be encrypted into a self-encoder for processing and according to neural network hidden layer data generated by the self-encoder in the processing process;
training a machine learning model using the encrypted data.
Based on the same idea, the embodiments of the present specification further provide a non-volatile computer storage medium corresponding to fig. 2, storing computer-executable instructions configured to:
inputting data to be encrypted into a self-encoder for processing;
acquiring neural network hidden layer data generated by the self-encoder in the processing process;
and obtaining encrypted data corresponding to the data to be encrypted according to the neural network hidden layer data.
Based on the same idea, the embodiments of the present specification further provide a non-volatile computer storage medium corresponding to fig. 6, in which computer-executable instructions are stored, and the computer-executable instructions are configured to:
acquiring encrypted data, wherein the encrypted data is obtained by inputting corresponding data to be encrypted into a self-encoder for processing and according to neural network hidden layer data generated by the self-encoder in the processing process;
training a machine learning model using the encrypted data.
The embodiments in this specification are described in a progressive manner. For identical or similar parts, the embodiments may refer to one another, and each embodiment focuses on its differences from the other embodiments. In particular, the embodiments of the apparatus, the electronic device, and the non-volatile computer storage medium are described briefly because they are substantially similar to the method embodiments; for relevant details, refer to the corresponding description of the method embodiments.
The apparatus, the electronic device, the nonvolatile computer storage medium and the method provided in the embodiments of the present description correspond to each other, and therefore, the apparatus, the electronic device, and the nonvolatile computer storage medium also have similar advantageous technical effects to the corresponding method.
The foregoing description has been directed to specific embodiments of this disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.
In the 1990s, an improvement to a technology could be clearly distinguished as either a hardware improvement (for example, an improvement to a circuit structure such as a diode, a transistor, or a switch) or a software improvement (an improvement to a method flow). However, as technology has advanced, many of today's method flow improvements can be regarded as direct improvements to hardware circuit structures. Designers almost always obtain the corresponding hardware circuit structure by programming an improved method flow into a hardware circuit. Therefore, it cannot be said that an improvement to a method flow cannot be implemented by a hardware entity module. For example, a Programmable Logic Device (PLD), such as a Field Programmable Gate Array (FPGA), is an integrated circuit whose logic functions are determined by the user programming the device. A designer "integrates" a digital system onto a single PLD through programming, without asking a chip manufacturer to design and fabricate an application-specific integrated circuit chip. Moreover, instead of manually fabricating integrated circuit chips, such programming is nowadays mostly implemented with "logic compiler" software, which is similar to the software compiler used in program development; the source code to be compiled must likewise be written in a specific programming language, called a Hardware Description Language (HDL). There is not just one HDL but many, such as ABEL (Advanced Boolean Expression Language), AHDL (Altera Hardware Description Language), Confluence, CUPL (Cornell University Programming Language), HDCal, JHDL (Java Hardware Description Language), Lava, Lola, MyHDL, PALASM, and RHDL (Ruby Hardware Description Language); VHDL (Very-High-Speed Integrated Circuit Hardware Description Language) and Verilog are the most commonly used at present.
It will also be apparent to those skilled in the art that a hardware circuit implementing a logical method flow can be readily obtained simply by logically programming the method flow in one of the above hardware description languages and programming it into an integrated circuit.
The controller may be implemented in any suitable manner. For example, the controller may take the form of a microprocessor or processor together with a computer-readable medium storing computer-readable program code (e.g., software or firmware) executable by the (micro)processor, logic gates, switches, an Application Specific Integrated Circuit (ASIC), a programmable logic controller, or an embedded microcontroller. Examples of controllers include, but are not limited to, the following microcontrollers: ARC 625D, Atmel AT91SAM, Microchip PIC18F26K20, and Silicon Labs C8051F320. A memory controller may also be implemented as part of the control logic of a memory. Those skilled in the art will also appreciate that, in addition to implementing the controller as pure computer-readable program code, the same functionality can be achieved by logically programming the method steps so that the controller takes the form of logic gates, switches, application-specific integrated circuits, programmable logic controllers, embedded microcontrollers, and the like. Such a controller may therefore be regarded as a hardware component, and the means included in it for performing various functions may also be regarded as structures within the hardware component. Alternatively, the means for performing the various functions may even be regarded as both software modules implementing the method and structures within the hardware component.
The systems, devices, modules or units illustrated in the above embodiments may be implemented by a computer chip or an entity, or by a product with certain functions. One typical implementation device is a computer. In particular, the computer may be, for example, a personal computer, a laptop computer, a cellular telephone, a camera phone, a smartphone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
For convenience of description, the above apparatus is described as being divided into various units by function, each described separately. Of course, when this specification is implemented, the functions of the units may be implemented in one or more pieces of software and/or hardware.
As will be appreciated by one skilled in the art, the present specification embodiments may be provided as a method, system, or computer program product. Accordingly, embodiments of the present description may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, embodiments of the present description may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and so forth) having computer-usable program code embodied therein.
The description has been presented with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the description. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in a computer-readable medium, such as Random Access Memory (RAM), and/or non-volatile memory, such as Read Only Memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), Digital Versatile Discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in the process, method, article, or apparatus that comprises the element.
This description may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The specification may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
The embodiments in this specification are described in a progressive manner. For identical or similar parts, the embodiments may refer to one another, and each embodiment focuses on its differences from the other embodiments. In particular, the system embodiment is described briefly because it is substantially similar to the method embodiment; for relevant details, refer to the corresponding description of the method embodiment.
The above description is only an example of the present specification, and is not intended to limit the present application. Various modifications and changes may occur to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the scope of the claims of the present application.