CN116633571A - Privacy information protection method and device based on homomorphic encryption and unsupervised feature selection - Google Patents

Info

Publication number
CN116633571A
Authority
CN
China
Prior art keywords
feature selection
matrix
homomorphic encryption
feature
privacy information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210151055.7A
Other languages
Chinese (zh)
Inventor
龙春
魏金侠
万巍
赵静
杨帆
肖建平
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Computer Network Information Center of CAS
Original Assignee
Computer Network Information Center of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Computer Network Information Center of CAS filed Critical Computer Network Information Center of CAS
Priority to CN202210151055.7A priority Critical patent/CN116633571A/en
Publication of CN116633571A publication Critical patent/CN116633571A/en
Pending legal-status Critical Current

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/04 Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks
    • H04L63/0428 Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks wherein the data content is protected, e.g. by encrypting or encapsulating the payload
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L9/00 Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
    • H04L9/008 Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols involving homomorphic encryption

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Computer Hardware Design (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application discloses a privacy information protection method and device based on homomorphic encryption and unsupervised feature selection. When a user outsources the processing of private information, the private information is encrypted with a preset homomorphic encryption algorithm for the feature selection operation and then uploaded to a server for a preset data processing operation; and/or, when the user performs feature selection on the private information, the private information is protected during the feature selection by means of the preset homomorphic encryption algorithm and an unsupervised feature selection model. The application addresses the technical problems that the feature set used for modeling has a high dimension and that information is easily leaked during processing. It enables feature selection when labels are scarce and also protects the privacy of sensitive feature information.

Description

Privacy information protection method and device based on homomorphic encryption and unsupervised feature selection
Technical Field
The application relates to the field of privacy information protection, in particular to a privacy information protection method and device based on homomorphic encryption and unsupervised feature selection.
Background
It is often desirable in a network environment to build an intrusion detection model for traffic data collected in real time.
The inventors found that sensitive information is easily leaked when traffic data is outsourced for processing or subjected to feature selection. Moreover, the traffic data set has a high feature dimension, which degrades the computational efficiency and running time of the classifier.
Aiming at the problems that the dimension of a feature set used for modeling in the related technology is high and information leakage is easily caused in the processing process, no effective solution is proposed at present.
Disclosure of Invention
The application mainly aims to provide a privacy information protection method and device based on homomorphic encryption and unsupervised feature selection, which are used for solving the problems that a feature set for modeling is high in dimension and information leakage is easy to cause in a processing process.
To achieve the above object, according to one aspect of the present application, there is provided a privacy information protection method based on homomorphic encryption and unsupervised feature selection.
The privacy information protection method based on homomorphic encryption and unsupervised feature selection according to the present application comprises: when the user outsources the processing of the private information, encrypting the private information with a preset homomorphic encryption algorithm for the feature selection operation, and then uploading the encrypted private information to a server for a preset data processing operation; and/or, when the user performs feature selection on the private information, protecting the private information during the feature selection by means of the preset homomorphic encryption algorithm and an unsupervised feature selection model.
Further, the unsupervised feature selection model further comprises: target feature data is selected based on the self-representation of homomorphic encryption features.
Further, when the user performs feature selection processing on the private information, protecting the private information while performing the feature selection processing based on the preset homomorphic encryption algorithm and by adopting an unsupervised feature selection model, including:
obtaining a feature matrix $X$ based on the unsupervised feature selection model; and determining the importance of different features from the feature matrix $X$ and a projection matrix $A$ obtained based on the preset homomorphic encryption algorithm, wherein $a_i$ denotes the $i$-th row of matrix $A$, $a_i = (a_{i1}, a_{i2}, \ldots, a_{iM})$, $i \in [1, D]$.
Further, the projection matrix $A$ includes:
the column vectors of the matrix $A$ reflect the importance of different features: when the matrix $A$ has $k$ non-zero columns, the corresponding $k$ features of the feature matrix $X$ are selected and the rest are not; the reconstruction loss term is [formula missing in source]. If $a_i$ is a zero vector, the contribution of the corresponding $i$-th dimension feature $x_i$ is 0.
Further, after the feature selection processing, the method further includes: requiring both that the error between the reconstruction matrix $AX$ and the matrix $X$ be minimal and that the projection matrix $A$ have $k$ non-zero columns, the constrained optimization problem of the reconstruction loss term can be expressed as follows: [formula missing in source]
introducing a square matrix $r$ of dimension $m$ whose diagonal elements are 0 or 1, where the number of 1s on the diagonal is $k$, their row/column indices are those of the non-zero columns of the matrix $A$, and all other elements are 0, and simultaneously introducing a regularization term to balance algorithm complexity against the fit of the parameter optimization, the constrained optimization problem is transformed into the following form: [formula missing in source]
when the $i$-th row of $X$ is selected, $r_{ii} = 1$, otherwise $r_{ii} = 0$; $\lambda_A$ balances algorithm complexity against the fit of the parameter optimization.
Further, when the user outsources the processing of the private information, encrypting the private information with the preset homomorphic encryption algorithm for the feature selection operation and then uploading it to the server for the preset data processing operation further comprises: mapping the actual data of the private information to a plaintext space $M$ through a preset preprocessing algorithm; floating-point data must be converted into binary data in the plaintext space. Let $D'$ be the matrix obtained by transforming the data feature matrix $D$, so that all elements of $D'$ belong to the plaintext space $M$; each row of the matrix $D'$ can then be expressed as
$d_j = (d_{j1}, d_{j2}, \ldots, d_{jN})$, $d_{ij} \in \{0, 1\}$.
Further, when the user outsources the processing of the private information, encrypting the private information with the preset homomorphic encryption algorithm for the feature selection operation and then uploading it to the server for the preset data processing operation further comprises: when the private information is encrypted with the preset homomorphic encryption algorithm, encrypting the same plaintext multiple times yields multiple different ciphertexts.
To achieve the above object, according to another aspect of the present application, there is provided a privacy information protecting apparatus based on homomorphic encryption and unsupervised feature selection.
The privacy information protection apparatus based on homomorphic encryption and unsupervised feature selection according to the present application comprises:
a homomorphic encryption module, configured to, when the user outsources the processing of the private information, encrypt the private information with a preset homomorphic encryption algorithm for the feature selection operation and then upload the encrypted private information to the server for a preset data processing operation;
and an unsupervised feature selection module, configured to, when the user performs feature selection on the private information, protect the private information during the feature selection by means of the preset homomorphic encryption algorithm and an unsupervised feature selection model.
In order to achieve the above object, according to yet another aspect of the present application, there is provided a computer readable storage medium having a computer program stored therein, wherein the computer program is arranged to execute the method when run.
To achieve the above object, according to a further aspect of the present application, there is provided an electronic device comprising a memory, in which a computer program is stored, and a processor arranged to run the computer program to perform the method.
In the embodiments of the application, when the user outsources the processing of the private information, the private information is encrypted with a preset homomorphic encryption algorithm for the feature selection operation and then uploaded to a server for a preset data processing operation; or, when the user performs feature selection on the private information, the private information is protected during the feature selection by means of the preset homomorphic encryption algorithm and an unsupervised feature selection model. Using the preset homomorphic encryption algorithm together with the unsupervised feature selection model achieves homomorphic encryption of sensitive information and unsupervised feature selection of feature data, thereby protecting the user's private data and solving the technical problems that the feature set used for modeling has a high dimension and that information is easily leaked during processing.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application, are incorporated in and constitute a part of this specification. The drawings and their description are illustrative of the application and are not to be construed as unduly limiting the application. In the drawings:
fig. 1 is a schematic hardware structure diagram of a privacy information protection method based on homomorphic encryption and unsupervised feature selection according to an embodiment of the present application;
FIG. 2 is a flow diagram of a privacy information protection method based on homomorphic encryption and unsupervised feature selection in accordance with an embodiment of the present application;
fig. 3 is a schematic structural diagram of a privacy information protecting apparatus based on homomorphic encryption and unsupervised feature selection according to an embodiment of the present application.
Detailed Description
In order that those skilled in the art will better understand the present application, a technical solution in the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings in which it is apparent that the described embodiments are only some embodiments of the present application, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the present application without making any inventive effort, shall fall within the scope of the present application.
It should be noted that the terms "first," "second," and the like in the description and the claims of the present application and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate in order to describe the embodiments of the application herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
In the present application, the terms "upper", "lower", "left", "right", "front", "rear", "top", "bottom", "inner", "outer", "middle", "vertical", "horizontal", "lateral", "longitudinal" and the like indicate an azimuth or a positional relationship based on that shown in the drawings. These terms are only used to better describe the present application and its embodiments and are not intended to limit the scope of the indicated devices, elements or components to the particular orientations or to configure and operate in the particular orientations.
Also, some of the terms described above may be used to indicate other meanings in addition to orientation or positional relationships, for example, the term "upper" may also be used to indicate some sort of attachment or connection in some cases. The specific meaning of these terms in the present application will be understood by those of ordinary skill in the art according to the specific circumstances.
Furthermore, the terms "mounted," "configured," "provided," "connected," "coupled," and "sleeved" are to be construed broadly. For example, it may be a fixed connection, a removable connection, or a unitary construction; may be a mechanical connection, or an electrical connection; may be directly connected, or indirectly connected through intervening media, or may be in internal communication between two devices, elements, or components. The specific meaning of the above terms in the present application can be understood by those of ordinary skill in the art according to the specific circumstances.
It should be noted that, without conflict, the embodiments of the present application and features of the embodiments may be combined with each other. The application will be described in detail below with reference to the drawings in connection with embodiments.
As shown in fig. 1, the hardware structure of the privacy information protection method based on homomorphic encryption and unsupervised feature selection in the embodiment of the present application includes: user 100, local computer 200, cloud server 300. Traffic data is generated when a user 100 accesses the cloud server 300 through the local computer 200. The rapid development of cloud services provides users with more diversified data processing modes. Because of the wide variety of users, the data sets they provide will also cover multiple types of information, such as email messages, background user database information, personal health status, etc., related to personal privacy or business confidentiality. When the user performs the outsourcing process or the feature selection process on the information (both of which are synchronized with the cloud server 300), there is a risk of leakage of the sensitive information contained therein.
As shown in fig. 2, the method includes steps S201 to S202 as follows:
Step S201: when a user outsources the processing of the private information, the private information is encrypted with a preset homomorphic encryption algorithm for the feature selection operation, and the encrypted private information is then uploaded to a server for a preset data processing operation;
Step S202: and/or, when the user performs feature selection on the private information, the private information is protected during the feature selection by means of the preset homomorphic encryption algorithm and an unsupervised feature selection model.
The whole feature selection process is carried out on encrypted feature matrix elements, so the cloud server cannot obtain any feature information while performing the unsupervised feature selection. Although the $k$ non-zero entries reveal which feature dimensions were selected as key content, the specific contents of the features are never disclosed, so privacy-preserving unsupervised feature selection is achieved.
From the above description, it can be seen that the following technical effects are achieved:
When the user outsources the processing of the private information, the private information is encrypted with a preset homomorphic encryption algorithm for the feature selection operation and then uploaded to a server for a preset data processing operation; or, when the user performs feature selection on the private information, the private information is protected during the feature selection by means of the preset homomorphic encryption algorithm and an unsupervised feature selection model. Using the preset homomorphic encryption algorithm together with the unsupervised feature selection model achieves homomorphic encryption of sensitive information and unsupervised feature selection of feature data, thereby protecting the user's private data and solving the technical problems that the feature set used for modeling has a high dimension and that information is easily leaked during processing.
In step S201, when the user needs to outsource the processing of the private information, the private information is encrypted with the preset homomorphic encryption algorithm for the feature selection operation and then uploaded to the server for the preset data processing operation. That is, the data is encrypted with the preset homomorphic encryption algorithm before being uploaded to the server.
As an alternative embodiment, the homomorphic encryption algorithm includes a key generation process, an encryption process, and a decryption process.
As an alternative embodiment, noise introduced by homomorphic encryption algorithms needs to be removed.
In the step S202, when the user performs the feature selection processing on the private information in another scenario, the private information is protected while the feature selection processing is performed based on the preset homomorphic encryption algorithm and by using an unsupervised feature selection model.
As an alternative implementation manner, it is considered that the data collected in the actual network environment does not have enough category labels, so that the correlation between the features and the labels cannot be used for judging the importance of the features, and further, no way is provided for realizing accurate selection of the features. Meanwhile, the privacy of the feature information is considered, so that the method for selecting the unsupervised feature based on homomorphic encryption in the embodiment of the application is adopted to select important features based on the self-representation of the encrypted features.
As a preference in this embodiment, the unsupervised feature selection model further includes: target feature data is selected based on the self-representation of homomorphic encryption features. I.e. the target feature data selected by the unsupervised feature selection model is selected based on the importance level of the encryption feature itself.
As a preferred aspect of the present embodiment, when the user performs feature selection on the private information, protecting the private information during the feature selection by means of the preset homomorphic encryption algorithm and an unsupervised feature selection model includes: obtaining a feature matrix $X$ based on the unsupervised feature selection model; and determining the importance of different features from the feature matrix $X$ and a projection matrix $A$ obtained based on the preset homomorphic encryption algorithm, wherein $a_i$ denotes the $i$-th row of matrix $A$, $a_i = (a_{i1}, a_{i2}, \ldots, a_{iM})$, $i \in [1, D]$.
In a specific implementation, the column vectors of the matrix $A$ reflect the importance of different features: when the matrix $A$ has only $k$ non-zero columns, only the corresponding $k$ features of the feature matrix $X$ are selected and the rest are not.
As a preferred embodiment, the projection matrix $A$ includes: the column vectors of the matrix $A$ reflect the importance of different features; when the matrix $A$ has $k$ non-zero columns, the corresponding $k$ features of the feature matrix $X$ are selected and the rest are not; the reconstruction loss term is [formula missing in source]. If $a_i$ is a zero vector, the contribution of the corresponding $i$-th dimension feature $x_i$ is 0.
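The role of zero columns in the projection matrix can be illustrated with a minimal sketch. The source does not reproduce the loss formula, so the squared Frobenius norm of $X - AX$ used here is an assumption, as are the helper names `matmul` and `reconstruction_loss` and the layout of $X$ with one row per feature. The sketch works on plaintext values only and does not involve encryption.

```python
def matmul(A, X):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * x for a, x in zip(row, col)) for col in zip(*X)] for row in A]


def reconstruction_loss(X, A):
    """Squared Frobenius norm of X - AX (the choice of norm is an assumption)."""
    AX = matmul(A, X)
    return sum((x - y) ** 2
               for row_x, row_ax in zip(X, AX)
               for x, y in zip(row_x, row_ax))


# Hypothetical data: one row per feature (3 features), one column per sample.
X = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
# Column 1 of A is all zeros, so feature 1 never enters the reconstruction AX.
A = [[1.0, 0.0, 0.0],
     [0.5, 0.0, 0.5],
     [0.0, 0.0, 1.0]]

# Changing only feature 1 leaves AX unchanged: its contribution is 0.
X_changed = [[1.0, 2.0], [30.0, 40.0], [5.0, 6.0]]
print(matmul(A, X) == matmul(A, X_changed))
print(reconstruction_loss(X, A))
```

In this toy case features 0 and 2 are enough to rebuild all three rows of `X`, which is the situation in which dropping feature 1 loses nothing.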
As a preferable aspect of the present embodiment, after the feature selection processing, the method further includes:
requiring both that the error between the reconstruction matrix $AX$ and the matrix $X$ be minimal and that the projection matrix $A$ have $k$ non-zero columns, the constrained optimization problem of the reconstruction loss term can be expressed as follows: [formula missing in source]
introducing a square matrix $r$ of dimension $m$ whose diagonal elements are 0 or 1, where the number of 1s on the diagonal is $k$, their row/column indices are those of the non-zero columns of the matrix $A$, and all other elements are 0, and simultaneously introducing a regularization term to balance algorithm complexity against the fit of the parameter optimization, the constrained optimization problem is transformed into the following form: [formula missing in source]
when the $i$-th row of $X$ is selected, $r_{ii} = 1$, otherwise $r_{ii} = 0$; $\lambda_A$ balances algorithm complexity against the fit of the parameter optimization.
Specifically, when the $i$-th row of $X$ is selected, $r_{ii} = 1$; otherwise $r_{ii} = 0$. $\lambda_A$ balances algorithm complexity against the fit of the parameter optimization, avoiding both overfitting and excessive algorithm complexity. The optimization problem is converted into a Lagrangian function using the alternating direction method of multipliers and solved by iteratively optimizing the variables, which yields the value $k$, the matrix $r$, and the matrix $A$. The $k$ rows of the matrix $X$ corresponding to the $k$ non-zero columns of the matrix $A$ are then selected. The whole feature selection process is independent of the class labels corresponding to the features, so the feature selection is unsupervised.
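The final selection step can be sketched as follows, assuming the optimization has already produced a projection matrix. The solver itself is not shown, and the function names, the tolerance `tol`, and the sample matrices are hypothetical. The sketch derives the non-zero columns of $A$, builds the diagonal 0/1 matrix $r$, and picks the corresponding rows of $X$.

```python
def nonzero_columns(A, tol=1e-9):
    """Indices of the columns of A that contain at least one non-zero entry."""
    return [j for j in range(len(A[0])) if any(abs(row[j]) > tol for row in A)]


def selection_matrix(A, tol=1e-9):
    """Diagonal 0/1 matrix r with r[i][i] = 1 exactly when column i of A is non-zero."""
    m = len(A[0])
    keep = set(nonzero_columns(A, tol))
    return [[1 if i == j and i in keep else 0 for j in range(m)] for i in range(m)]


def select_features(X, A, tol=1e-9):
    """Rows of X (one row per feature) that correspond to non-zero columns of A."""
    return [X[i] for i in nonzero_columns(A, tol)]


# Hypothetical solver output: columns 0 and 2 are non-zero, so k = 2.
A = [[0.5, 0.0, 0.2],
     [0.1, 0.0, 0.0],
     [0.0, 0.0, 0.9]]
X = [[1, 2], [3, 4], [5, 6]]

k = len(nonzero_columns(A))
r = selection_matrix(A)
selected = select_features(X, A)
print(k, r, selected)
```

No class labels appear anywhere in this step, which matches the unsupervised nature of the selection described above.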
Preferably, to verify the reliability of the result, the user performs unsupervised feature selection locally on the feature matrix that has not been homomorphically encrypted, and then checks whether the number $k'$ of non-zero columns of the resulting projection matrix $A'$ equals $k$, and whether the positions of the $k'$ non-zero columns in $A'$ coincide exactly with the positions of the $k$ non-zero columns in $A$. If they are consistent, the selected $k$-dimensional features are further input into several detection models to check the validity of the feature selection.
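The consistency check described here can be sketched as a comparison of non-zero column positions. The function names and the tolerance are illustrative assumptions, and both matrices are taken to be available in plaintext on the user's side.

```python
def nonzero_columns(A, tol=1e-9):
    """Indices of the columns of A that contain at least one non-zero entry."""
    return [j for j in range(len(A[0])) if any(abs(row[j]) > tol for row in A)]


def selections_agree(A_enc, A_plain, tol=1e-9):
    """Return (counts_equal, positions_equal) for the two projection matrices."""
    cols_enc = nonzero_columns(A_enc, tol)      # k columns, encrypted pipeline
    cols_plain = nonzero_columns(A_plain, tol)  # k' columns, local plaintext run
    return len(cols_enc) == len(cols_plain), cols_enc == cols_plain


# Hypothetical results: same support, different numeric values.
A = [[0.5, 0.0, 0.2], [0.1, 0.0, 0.0], [0.0, 0.0, 0.9]]
A_prime = [[0.4, 0.0, 0.3], [0.2, 0.0, 0.1], [0.0, 0.0, 1.0]]
print(selections_agree(A, A_prime))
```

Only the support of the two matrices is compared; the entries themselves may differ between the encrypted and plaintext runs.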
As a preferred option in this embodiment, when the user outsources the processing of the private information, encrypting the private information with the preset homomorphic encryption algorithm for the feature selection operation and then uploading it to the server for the preset data processing operation further includes:
mapping the actual data of the private information to a plaintext space $M$ through a preset preprocessing algorithm;
floating-point data must be converted into binary data in the plaintext space. Let $D'$ be the matrix obtained by transforming the data feature matrix $D$, so that all elements of $D'$ belong to the plaintext space $M$; each row of the matrix $D'$ can then be expressed as
$d_j = (d_{j1}, d_{j2}, \ldots, d_{jN})$, $d_{ij} \in \{0, 1\}$.
In a specific implementation, let $D$ be the data feature matrix and $d_j$ the $j$-th row of $D$. Each row represents one sample and the number of rows is the number of samples, $d_j = (d_{j1}, d_{j2}, \ldots, d_{jN})$, $j \in [1, L]$. Each column represents a one-dimensional feature; the column index $i \in [1, N]$ denotes the feature dimension, and each feature dimension is denoted $d_i$. In most practical scenarios the data cannot be used directly as plaintext of the encryption algorithm, and a preprocessing algorithm is required to map the actual data to the plaintext space $M$. Common feature formats include floating-point and binary data, so floating-point data must first be converted into data in the (binary) plaintext space. Let $D'$ be the matrix obtained by transforming the data feature matrix $D$, so that all elements of $D'$ belong to the plaintext space $M$; each row of $D'$ can then be expressed as $d_j = (d_{j1}, d_{j2}, \ldots, d_{jN})$, $d_{ij} \in \{0, 1\}$.
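The source does not specify the preprocessing algorithm. One possible mapping from floating-point features to the binary plaintext space is fixed-point quantization, sketched below. The per-column min/max scaling, the bit width `n_bits`, and the function names are all assumptions made for illustration.

```python
def to_bits(value, lo, hi, n_bits=8):
    """Quantize value from [lo, hi] to an n_bits integer; return its bits, MSB first."""
    if hi == lo:
        level = 0  # constant column: every value maps to the same code
    else:
        clipped = min(max(value, lo), hi)
        level = round((clipped - lo) / (hi - lo) * (2 ** n_bits - 1))
    return [(level >> shift) & 1 for shift in reversed(range(n_bits))]


def encode_matrix(D, n_bits=8):
    """Encode each entry of D (rows = samples, columns = features) as a bit list."""
    bounds = [(min(col), max(col)) for col in zip(*D)]
    return [[to_bits(v, lo, hi, n_bits) for v, (lo, hi) in zip(row, bounds)]
            for row in D]


# Hypothetical data: 2 samples, 2 features, encoded with 2 bits per value.
D = [[0.0, 10.0],
     [1.0, 20.0]]
D_prime = encode_matrix(D, n_bits=2)
print(D_prime)
```

Each resulting bit lies in {0, 1} and can therefore serve as a plaintext for a bitwise homomorphic scheme. The bit width controls the trade-off between precision and the number of ciphertexts per feature value.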
As a preference in this embodiment, when the user outsources the processing of the private information, encrypting the private information with the preset homomorphic encryption algorithm for the feature selection operation and then uploading it to the server for the preset data processing operation further includes: when the private information is encrypted with the preset homomorphic encryption algorithm, encrypting the same plaintext multiple times yields multiple different ciphertexts.
It should be noted that the steps illustrated in the flowcharts of the figures may be performed in a computer system such as a set of computer executable instructions, and that although a logical order is illustrated in the flowcharts, in some cases the steps illustrated or described may be performed in an order other than that illustrated herein.
According to an embodiment of the present application, there is also provided a privacy information protecting apparatus for implementing the above method based on homomorphic encryption and unsupervised feature selection, as shown in fig. 3, the apparatus comprising:
the homomorphic encryption module 301, configured to, when the user outsources the processing of the private information, encrypt the private information with a preset homomorphic encryption algorithm for the feature selection operation and then upload the encrypted private information to the server for a preset data processing operation;
and the unsupervised feature selection module 302, configured to, when the user performs feature selection on the private information, protect the private information during the feature selection by means of the preset homomorphic encryption algorithm and an unsupervised feature selection model.
In the homomorphic encryption module 301 of the embodiment of the present application, when the user needs to outsource the processing of the private information, the private information is encrypted with the preset homomorphic encryption algorithm for the feature selection operation and then uploaded to the server for the preset data processing operation. That is, the data is encrypted with the preset homomorphic encryption algorithm before being uploaded to the server.
As an alternative embodiment, the homomorphic encryption algorithm includes a key generation process, an encryption process, and a decryption process.
As an alternative embodiment, noise introduced by homomorphic encryption algorithms needs to be removed.
In another scenario, that is, when the user performs feature selection processing on the private information, the unsupervised feature selection module 302 of the embodiment of the present application protects the private information while performing feature selection processing on the basis of the preset homomorphic encryption algorithm and using an unsupervised feature selection model.
As an alternative implementation manner, it is considered that the data collected in the actual network environment does not have enough category labels, so that the correlation between the features and the labels cannot be used for judging the importance of the features, and further, no way is provided for realizing accurate selection of the features. Meanwhile, the privacy of the feature information is considered, so that the method for selecting the unsupervised feature based on homomorphic encryption in the embodiment of the application is adopted to select important features based on the self-representation of the encrypted features.
For homomorphic encryption module 301, the following procedure is performed:
Key generation: let λ be the security parameter of the encryption system. Inputting λ into the key generation system outputs three parameters ρ, η and γ, wherein [formula not reproduced]. Let p be an odd integer of η bits, i.e. $p \leftarrow (2\mathbb{Z}+1) \cap (2^{\eta-1}, 2^{\eta})$. p is taken as the private key sk, and $t_0 = q_0 \cdot p$ is computed as the public key, where $q_0 \leftarrow (2\mathbb{Z}+1) \cap (1, 2^{\gamma}/p)$.
Encryption: the plaintext $m \in \{0,1\}$ and the public key $t_0$ are input, random numbers q′ and r are selected from $[1, 2^{\gamma}/p)$ and $(-2^{\rho}, 2^{\rho})$ respectively, and the ciphertext is then computed and output according to the following formula:
$C = E(m, pk) = (m + 2 \cdot r + q' \cdot p) \bmod t_0$
Decryption: the ciphertext C and the private key sk are input, and the plaintext m is then output according to the following formula:
$m = D(C, sk) = (C \bmod p) \bmod 2$
Because random noise is introduced in the homomorphic encryption process, a plurality of different ciphertext results can be obtained by encrypting the same plaintext for a plurality of times.
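The key generation, encryption and decryption steps above can be sketched in a few lines of Python. This is a toy illustration only, not the implementation of the application. The parameter sizes (`ETA`, `GAMMA`, `RHO`) and all function names are assumptions. The sketch passes `p` to `encrypt` because the ciphertext formula uses p directly, and `decrypt` reduces C modulo p into the centered range (-p/2, p/2], an assumption needed so that a negative noise value r does not flip the parity.

```python
import secrets

ETA, GAMMA, RHO = 64, 256, 16  # toy sizes, chosen for illustration only


def keygen(eta=ETA, gamma=GAMMA):
    """Return (sk, pk) = (p, t0) with p an odd eta-bit integer and t0 = q0 * p."""
    p = secrets.randbits(eta) | (1 << (eta - 1)) | 1   # force top bit and oddness
    bound = (1 << gamma) // p                          # floor(2^gamma / p)
    q0 = 2 * secrets.randbelow(bound // 2 - 1) + 3     # odd, 1 < q0 < bound
    return p, q0 * p


def encrypt(m, p, t0, gamma=GAMMA, rho=RHO):
    """C = (m + 2*r + q'*p) mod t0 for a single plaintext bit m."""
    if m not in (0, 1):
        raise ValueError("plaintext must be a single bit")
    q = 1 + secrets.randbelow((1 << gamma) // p - 1)                 # q' in [1, 2^gamma / p)
    r = secrets.randbelow((1 << (rho + 1)) - 1) - ((1 << rho) - 1)   # r in (-2^rho, 2^rho)
    return (m + 2 * r + q * p) % t0


def decrypt(c, p):
    """m = (C mod p) mod 2, with C mod p taken in the centered range (-p/2, p/2]."""
    v = c % p
    if v > p // 2:
        v -= p
    return v % 2


sk, pk = keygen()
c1, c2 = encrypt(1, sk, pk), encrypt(1, sk, pk)
# Fresh randomness q', r is drawn per call, so c1 and c2 generally differ,
# yet both decrypt to the same plaintext bit.
assert decrypt(c1, sk) == decrypt(c2, sk) == 1
```

Because t0 is a multiple of p, reducing modulo t0 leaves the residue modulo p unchanged, so decryption recovers m + 2r exactly as long as the noise stays well below p/2.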
The unsupervised feature selection module 302 is configured to perform the following procedure:
Let D be the data feature matrix and $d_j$ the j-th row of D. Each row represents one sample, and the number of rows L is the number of samples: $d_j = (d_{j1}, d_{j2}, \ldots, d_{jN})$, $j \in [1, L]$. Each column represents one feature dimension; the column index $i \in [1, N]$ runs over the feature dimensions, and the i-th feature is denoted $d_i$. In most practical scenarios the data cannot be used directly as plaintext for the encryption algorithm, so a preprocessing algorithm is required to map the actual data into the plaintext space M. Commonly used feature formats currently include floating-point data and binary data, so floating-point data must first be converted into data in the (binary) plaintext space. Let D′ be the matrix obtained by converting the data feature matrix D, so that all elements of D′ belong to the plaintext space M. Each row of D′ can be written in the form $d_j = (d_{j1}, d_{j2}, \ldots, d_{jN})$, $d_{ji} \in \{0, 1\}$.
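The preprocessing algorithm itself is not specified in the text. As one possible illustration, floating-point features could be mapped into the binary plaintext space by min-max scaling each column and quantizing to a fixed number of bits. The function names, the choice of `n_bits`, and the expansion of each feature into `n_bits` binary columns are all assumptions of this sketch.

```python
def float_to_bits(x, lo, hi, n_bits=8):
    """Quantize x from [lo, hi] to an n_bits unsigned level; return its bits, MSB first."""
    if hi <= lo:
        return [0] * n_bits                      # constant column carries no information
    x = min(max(x, lo), hi)                      # clamp into the column range
    level = round((x - lo) / (hi - lo) * ((1 << n_bits) - 1))
    return [(level >> k) & 1 for k in range(n_bits - 1, -1, -1)]


def to_plaintext_matrix(D, n_bits=8):
    """Map a float matrix D (rows = samples) to a 0/1 matrix with N * n_bits columns."""
    ranges = [(min(col), max(col)) for col in zip(*D)]
    return [
        [bit
         for x, (lo, hi) in zip(row, ranges)
         for bit in float_to_bits(x, lo, hi, n_bits)]
        for row in D
    ]


D = [[0.0, 10.0],
     [1.0, 20.0]]
D_prime = to_plaintext_matrix(D, n_bits=2)
# Every entry of D_prime is 0 or 1, so each one is a valid plaintext bit m.
```

Each bit of the resulting matrix can then be encrypted individually, since the encryption algorithm above takes a single bit as plaintext.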
Let X be the feature matrix obtained by homomorphically encrypting D′ (the elements of D′ are encrypted with the homomorphic encryption algorithm above, so each element of X is the homomorphic ciphertext of the corresponding element of the feature matrix). Let A be the projection matrix and $a_i$ the i-th row of A, $a_i = (a_{i1}, a_{i2}, \ldots, a_{iM})$, $i \in [1, D]$. The column vectors of A reflect the importance of the different features: when A has k non-zero columns, k features of the corresponding feature matrix X are selected and the rest are not. The reconstruction loss term can be expressed as [formula not reproduced]. If $a_i$ is a zero vector, the contribution of the corresponding i-th feature $x_i$ is 0. To minimize the error between the reconstruction matrix AX and the feature matrix X while the projection matrix A has only k non-zero columns, the constrained optimization problem for the reconstruction loss term can be expressed as follows:
Since the above $\|A\|_{2,0}$-constrained problem cannot be solved directly for the global optimum, a square matrix r of dimension m is introduced. The diagonal elements of r are 0 or 1, the number of 1s on the diagonal is k, the row/column indices of the 1s are the indices of the non-zero columns of matrix A, and all other elements are 0. A regularization term is also introduced to balance algorithm complexity against the fit of the parameter optimization, so that the optimization problem can be transformed into the following form:
When the i-th row of X is selected, $r_{ii} = 1$; otherwise $r_{ii} = 0$. $\lambda_A$ balances algorithm complexity against the fit of the parameter optimization, avoiding both overfitting and excessive algorithm complexity. The optimization problem is converted into a Lagrangian function using the alternating direction method of multipliers (ADMM) and solved by iteratively optimizing each variable in turn, yielding the value k, the matrix r and the matrix A. The elements of X are then selected according to the k non-zero columns of A: the k rows of X corresponding to those columns are chosen. The whole feature selection process is independent of the labels corresponding to the features, so unsupervised feature selection is achieved.
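The formulas for the two optimization problems described above do not appear in this text. Purely as an illustration of what such problems can look like, the following LaTeX gives one form consistent with the description. The squared Frobenius norm for the reconstruction error, the Frobenius regularizer weighted by $\lambda_A$, and the reading of $\|A\|_{2,0}$ as the number of non-zero columns of A are all assumptions, not the formulas of the application.

```latex
% Assumed forms for illustration; norms and exact terms are not taken from the source.
\[
  \min_{A}\ \lVert X - AX \rVert_F^{2}
  \quad \text{s.t.} \quad \lVert A \rVert_{2,0} = k
\]
\[
  \min_{A,\,r}\ \lVert X - A\,r\,X \rVert_F^{2} + \lambda_A \lVert A \rVert_F^{2}
  \quad \text{s.t.} \quad
  r = \operatorname{diag}(r_{11}, \ldots, r_{mm}),\quad
  r_{ii} \in \{0, 1\},\quad
  \sum_{i=1}^{m} r_{ii} = k
\]
```

In this reading, multiplying by r zeroes the unselected rows of X, so only the k selected features take part in reconstructing X.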
To verify the reliability of the result, the user performs unsupervised feature selection locally on the feature matrix that has not been homomorphically encrypted, and then checks whether the number k′ of non-zero columns of the resulting projection matrix A′ equals k, and whether the positions of the k′ non-zero columns in A′ coincide exactly with the positions of the k non-zero columns in A. If they are consistent, the selected k-dimensional features are further input into several detection models to check the validity of the feature selection.
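The selection and verification steps can be sketched as follows, assuming the projection matrices A and A′ have already been obtained from some solver (the solver itself is not shown). The helper names and the numerical tolerance `tol` used to decide whether a column counts as zero are assumptions of this sketch.

```python
import math


def nonzero_columns(A, tol=1e-8):
    """Indices of the columns of A whose l2 norm exceeds tol."""
    return [
        j for j, col in enumerate(zip(*A))
        if math.sqrt(sum(v * v for v in col)) > tol
    ]


def selection_matrix(A, tol=1e-8):
    """Diagonal 0/1 matrix r with r[i][i] = 1 exactly for the non-zero columns of A."""
    m = len(A[0])
    chosen = set(nonzero_columns(A, tol))
    return [[1 if i == j and i in chosen else 0 for j in range(m)] for i in range(m)]


def select_rows(X, A, tol=1e-8):
    """Keep the rows of X (one row per feature) that match non-zero columns of A."""
    return [X[j] for j in nonzero_columns(A, tol)]


def selections_agree(A, A_prime, tol=1e-8):
    """True if A and A' have the same number and positions of non-zero columns."""
    return nonzero_columns(A, tol) == nonzero_columns(A_prime, tol)


A = [[0.5, 0.0, 0.1],
     [0.2, 0.0, 0.0],
     [0.0, 0.0, 0.3]]          # column 1 is all zero, so feature 1 is dropped
A_prime = [[1.0, 0.0, 2.0],
           [0.0, 0.0, 0.0],
           [0.0, 0.0, 1.0]]    # hypothetical result of the local plaintext run
X = [[1, 2], [3, 4], [5, 6]]   # stand-in values; one row per feature
selected = select_rows(X, A)
consistent = selections_agree(A, A_prime)
```

Comparing index lists checks both conditions at once: equal length means k′ = k, and equal entries mean the non-zero columns sit at the same positions.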
It will be apparent to those skilled in the art that the modules or steps of the application described above may be implemented in a general purpose computing device, they may be concentrated on a single computing device, or distributed across a network of computing devices, or they may alternatively be implemented in program code executable by computing devices, such that they may be stored in a memory device for execution by the computing devices, or they may be separately fabricated into individual integrated circuit modules, or multiple modules or steps within them may be fabricated into a single integrated circuit module. Thus, the present application is not limited to any specific combination of hardware and software.
The above description is only of the preferred embodiments of the present application and is not intended to limit the present application, but various modifications and variations can be made to the present application by those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the protection scope of the present application.

Claims (10)

1. A privacy information protection method based on homomorphic encryption and unsupervised feature selection, comprising:
when the user outsources the privacy information, encrypting the privacy information with a preset homomorphic encryption algorithm for the feature selection operation, and then uploading the encrypted privacy information to a server for a preset data processing operation;
and/or when the user performs feature selection processing on the privacy information, protecting the privacy information while performing feature selection processing on the privacy information based on the preset homomorphic encryption algorithm and by adopting an unsupervised feature selection model.
2. The method of claim 1, wherein the unsupervised feature selection model further comprises: target feature data is selected based on the self-representation of homomorphic encryption features.
3. The method according to claim 1, wherein when the user performs feature selection processing on the private information, protecting the private information while performing the feature selection processing based on the preset homomorphic encryption algorithm and using an unsupervised feature selection model, comprises:
based on the unsupervised feature selection model, obtaining a feature matrix X;
determining the importance of different features through the feature matrix X and the projection matrix A obtained based on the preset homomorphic encryption algorithm, wherein $a_i$ represents the i-th row of matrix A, $a_i = (a_{i1}, a_{i2}, \ldots, a_{iM})$, $i \in [1, D]$.
4. A method according to claim 3, wherein the projection matrix a comprises:
based on the importance of different features reflected by the column vectors of the matrix A, when the matrix A has k non-zero columns, k features of the corresponding feature matrix X are selected and the rest are not selected, and the reconstruction loss term is [formula not reproduced]; if $a_i$ is a zero vector, the contribution of the corresponding i-th feature $x_i$ is 0.
5. A method according to claim 3, further comprising, after said feature selection process:
to simultaneously minimize the error between the reconstruction matrix AX and the matrix X and ensure that the projection matrix A has k non-zero columns, the constrained optimization problem of the reconstruction loss term can be expressed as follows:
introducing a square matrix r of dimension m, wherein the diagonal elements of r are 0 or 1, the number of 1s on the diagonal is k, the row/column indices of the 1s are the indices of the non-zero columns of matrix A, and all other elements are 0, and simultaneously introducing a regularization term to balance algorithm complexity against the fit of the parameter optimization, so that the constrained optimization problem is transformed into the following form:
when the i-th row of X is selected, $r_{ii} = 1$, otherwise $r_{ii} = 0$; $\lambda_A$ is used to balance algorithm complexity against the fit of the parameter optimization.
6. The method of claim 1, wherein, when the user outsources the privacy information, encrypting the privacy information with the preset homomorphic encryption algorithm for the feature selection operation and then uploading it to a server for a preset data processing operation further comprises:
mapping the actual data of the privacy information to a plaintext space M through a preset preprocessing algorithm;
converting the floating-point data into binary data in the plaintext space, and letting D′ be the matrix obtained by converting the data feature matrix D, so that all elements of the converted feature matrix D′ belong to the plaintext space M, each row of the matrix D′ being expressible as
$d_j = (d_{j1}, d_{j2}, \ldots, d_{jN})$, $d_{ji} \in \{0, 1\}$.
7. The method of claim 1, wherein, when the user outsources the privacy information, encrypting the privacy information with the preset homomorphic encryption algorithm for the feature selection operation and then uploading it to a server for a preset data processing operation further comprises:
when encrypting with the preset homomorphic encryption algorithm for the feature selection operation, encrypting the same plaintext multiple times yields multiple different ciphertext results.
8. A privacy information protection device based on homomorphic encryption and unsupervised feature selection, comprising:
a homomorphic encryption module, configured to, when the user outsources the privacy information, encrypt the privacy information with a preset homomorphic encryption algorithm for the feature selection operation, and then upload the encrypted privacy information to the server for a preset data processing operation;
and an unsupervised feature selection module, configured to, when the user performs feature selection processing on the privacy information, protect the privacy information while performing the feature selection processing, based on the preset homomorphic encryption algorithm and using an unsupervised feature selection model.
9. A computer readable storage medium, characterized in that the computer readable storage medium has stored therein a computer program, wherein the computer program is arranged to execute the method of any of the claims 1 to 7 when run.
10. An electronic device comprising a memory and a processor, characterized in that the memory has stored therein a computer program, the processor being arranged to run the computer program to perform the method of any of the claims 1 to 7.
CN202210151055.7A 2022-02-14 2022-02-14 Privacy information protection method and device based on homomorphic encryption and unsupervised feature selection Pending CN116633571A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210151055.7A CN116633571A (en) 2022-02-14 2022-02-14 Privacy information protection method and device based on homomorphic encryption and unsupervised feature selection

Publications (1)

Publication Number Publication Date
CN116633571A true CN116633571A (en) 2023-08-22

Family

ID=87612244

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210151055.7A Pending CN116633571A (en) 2022-02-14 2022-02-14 Privacy information protection method and device based on homomorphic encryption and unsupervised feature selection

Country Status (1)

Country Link
CN (1) CN116633571A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117201020A (en) * 2023-11-08 2023-12-08 陕西元镁体信息科技有限公司 Network information security encryption method and system
CN117201020B (en) * 2023-11-08 2024-01-26 陕西元镁体信息科技有限公司 Network information security encryption method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination