CN116776351A - Format-preserving encryption method and system for personal information that resist statistical analysis attacks - Google Patents

Publication number
CN116776351A
CN116776351A CN202310742067.1A CN202310742067A CN116776351A CN 116776351 A CN116776351 A CN 116776351A CN 202310742067 A CN202310742067 A CN 202310742067A CN 116776351 A CN116776351 A CN 116776351A
Authority
CN
China
Prior art keywords
personal information
desensitization
statistical analysis
association
encryption
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310742067.1A
Other languages
Chinese (zh)
Inventor
龚丽
张瑞庆
何龙
班新博
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Second Research Institute of CAAC
Original Assignee
Second Research Institute of CAAC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Second Research Institute of CAAC
Priority to CN202310742067.1A
Publication of CN116776351A

Landscapes

  • Storage Device Security (AREA)

Abstract

The invention discloses a format-preserving encryption method and system for personal information that resist statistical analysis attacks, and relates to the technical field of information security. The method comprises the following steps: S1, table construction: acquiring personal information and establishing a personal information association table; S2, calibration: marking the personal information association core field in the association table; S3, desensitization: desensitizing the association core field by format-preserving encryption; S4, recovery: recovering the desensitized association core field by reversing the format-preserving encryption. The invention identifies the fields of personal information that can be inferred from one another and treats all personal information fields as a whole when applying format-preserving encryption desensitization. This keeps both the format and the logic of the data consistent and valid before and after encryption, reduces exposure to statistical analysis attacks and brute-force attacks, and improves the efficiency of desensitizing personal information data sets.

Description

Format-preserving encryption method and system for personal information that resist statistical analysis attacks
Technical Field
The invention relates to the technical field of information security, and in particular to a format-preserving encryption method and system for personal information that resist statistical analysis attacks.
Background
With the arrival of the data age, industries of all kinds have set goals of digital transformation and deep integration with new technologies such as big data, and data now flows faster and more widely through the whole ecosystem. However, the faster and more widely data flows, the more serious the security risks become, especially for operational data and personal information that involve sensitive information. The "2020 China Internet Network Security Report" states that monitoring discovered 107 incidents exposing sensitive personal information such as identification card numbers, mobile phone numbers, home addresses, education and employment, involving nearly 100,000 records of non-desensitized personal information. In recent years, personal information leaks have occurred repeatedly, causing great losses and adverse effects to citizens and enterprises.
Privacy protection research for personal information has made some progress. For sensitive data such as name, gender and detailed address, K-anonymity and L-diversity techniques are often used to mask and replace sensitive fields; for dates such as birth dates and event dates, sensitive fields are often desensitized by generalization; for sensitive information such as identification card numbers, telephone numbers and mailboxes, format-preserving encryption (Format Preserving Encryption, FPE) is often used for desensitization. Because the desensitized data keeps the same format as the original data, format-preserving encryption has become one of the preferred strategies for desensitizing sensitive personal information.
However, existing schemes commonly suffer from statistical recovery vulnerabilities, in two main respects. First, after an attacker obtains a large amount of desensitized personal information, the correspondence between part of the ciphertext and the plaintext can be inferred statistically. Second, most existing desensitization schemes consider only a single field and ignore the association between different fields, so an attacker can infer part of the plaintext by analyzing the association between different fields of the desensitized data, which increases the risk of privacy leakage. Gong et al. carried out a risk assessment of desensitized data covering 800,000 people in China, and the results show that, under a limited data set strategy, 19.58% of the desensitized personal information can be re-identified when combined with background knowledge. Improving the resistance of desensitized data to statistical analysis attacks is therefore one of the key difficulties of data desensitization.
Providing a format-preserving encryption method and system for personal information that resist statistical analysis attacks, and thereby overcoming the difficulties of the prior art, is thus a problem to be solved by those skilled in the art.
Disclosure of Invention
In view of the above, the invention provides a format-preserving encryption method and system for personal information that resist statistical analysis attacks. The method and system take the association between different fields into account and improve the resistance of desensitized data to statistical analysis attacks.
To achieve the above purpose, the invention adopts the following technical scheme:
A format-preserving encryption method for personal information that resists statistical analysis attacks comprises the following steps:
S1, table construction: acquiring personal information and establishing a personal information association table;
S2, calibration: marking the personal information association core field in the association table;
S3, desensitization: desensitizing the association core field by format-preserving encryption;
S4, recovery: recovering the desensitized association core field by reversing the format-preserving encryption.
Optionally, in the above method, step S1 (table construction) specifically comprises: establishing the personal information association table from the 18 direct identifiers and 15 quasi-identifiers of personal information, the table comprising 4 groups of personal information association relationships.
Optionally, in the above method, step S2 (calibration) specifically comprises: marking the field that occurs most often in the personal information association table as the personal information association core field.
Optionally, in the above method, step S3 (desensitization) specifically comprises the following steps:
S31, desensitizing the personal information association core field;
S32, desensitizing the other fields in the personal information association table.
Optionally, in the above method, step S4 (recovery) specifically comprises the following steps:
S41, recovering the personal information association core field;
S42, recovering the other fields in the personal information association table.
A format-preserving encryption system for personal information that resists statistical analysis attacks, which applies any one of the above format-preserving encryption methods, comprises: a table construction module, a calibration module, a desensitization module and a recovery module;
the table construction module is connected to the input of the calibration module and is used for acquiring personal information and establishing the personal information association table;
the calibration module is connected to the input of the desensitization module and is used for marking the personal information association core field in the association table;
the desensitization module is connected to the input of the recovery module and is used for desensitizing the association core field by format-preserving encryption;
the recovery module is used for recovering the desensitized association core field by reversing the format-preserving encryption.
Compared with the prior art, the format-preserving encryption method and system provided by the invention have the following beneficial effects. A personal information association table is established and the personal information association core field is marked in it; the core field is divided into sub-segments and desensitized by SM4-based format-preserving encryption; the other fields in the association table are then desensitized according to the desensitization result of the core field. By sorting out the fields of personal information that can be inferred from one another and treating all personal information fields as a whole during format-preserving encryption desensitization, the invention keeps both the format and the logic of the data consistent and valid before and after encryption, reduces exposure to statistical analysis attacks and brute-force attacks, and improves the efficiency of desensitizing personal information data sets.
Drawings
To illustrate the embodiments of the invention or the technical solutions of the prior art more clearly, the drawings used in describing them are briefly introduced below. The drawings described below are merely embodiments of the invention, and a person skilled in the art can obtain other drawings from them without inventive effort.
FIG. 1 is a flow chart of the format-preserving encryption method for personal information that resists statistical analysis attacks;
FIG. 2 is a schematic diagram of the personal information association table provided by an embodiment of the invention;
FIG. 3 is a block diagram of the format-preserving encryption system for personal information that resists statistical analysis attacks, provided by an embodiment of the invention.
Detailed Description
The technical solutions in the embodiments of the invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by those skilled in the art on the basis of these embodiments without inventive effort fall within the scope of the invention.
Referring to FIG. 1, the invention discloses a format-preserving encryption method for personal information that resists statistical analysis attacks, comprising the following steps:
S1, table construction: acquiring personal information and establishing a personal information association table;
S2, calibration: marking the personal information association core field in the association table;
S3, desensitization: desensitizing the association core field by format-preserving encryption;
S4, recovery: recovering the desensitized association core field by reversing the format-preserving encryption.
Further, step S1 (table construction) specifically comprises: establishing the personal information association table from the 18 direct identifiers and 15 quasi-identifiers of personal information, the table comprising 4 groups of personal information association relationships.
Specifically, the personal information association table is compiled from the 18 direct identifiers and 15 quasi-identifiers of personal information published in GB/T 42460-2023, "Guide for evaluating the effectiveness of personal information de-identification".
Further, step S2 (calibration) specifically comprises: marking the field that occurs most often in the personal information association table as the personal information association core field.
Further, step S3 (desensitization) specifically comprises the following steps:
S31, desensitizing the personal information association core field;
S32, desensitizing the other fields in the personal information association table.
Specifically, the personal information association core field is divided into sub-segments and desensitized by SM4-based format-preserving encryption; the other fields in the personal information association table are then desensitized according to the desensitization result of the core field.
Further, step S4 (recovery) specifically comprises the following steps:
S41, recovering the personal information association core field;
S42, recovering the other fields in the personal information association table.
In one embodiment, the details are as follows:
(I) Referring to FIG. 2, a schematic diagram of the personal information association table provided by the invention, the table is formulated from the 18 direct identifiers and 15 quasi-identifiers of personal information and comprises 4 groups of personal information association relationships.
Group 1: digits 1 to 6 of the identification card number are the administrative division code; personal information such as the telephone number, detailed address and postal code directly or indirectly reflects administrative division information;
Group 2: digits 7 to 14 of the identification card number represent the date of birth and are associated with personal information such as birth date and age;
Group 3: digit 17 of the identification card number represents gender and is associated with personal information such as gender;
Group 4: digit 18 of the identification card number is the check code, generated from the first 17 digits (the body code).
(II) The field that occurs most often in the personal information association table is marked as the personal information association core field. As shown in FIG. 2, the identification card number occurs most often across the 4 groups of association relationships, so the identification card number is marked as the personal information association core field.
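The calibration rule in (II) can be sketched as a simple frequency count. This is a minimal illustration, assuming the association table is held as a mapping from each group to the fields it involves; the dictionary layout, the group labels and the name `core_field` are illustrative assumptions, not part of the disclosure.

```python
from collections import Counter

# Hypothetical encoding of the 4 association groups from FIG. 2:
# each group lists the personal information fields it ties together.
ASSOCIATION_TABLE = {
    "administrative division": ["ID card number", "telephone number",
                                "detailed address", "postal code"],
    "date of birth": ["ID card number", "birth date", "age"],
    "gender": ["ID card number", "gender"],
    "check code": ["ID card number"],
}


def core_field(table):
    """Return the field that occurs in the most association groups."""
    counts = Counter(field for fields in table.values() for field in fields)
    return counts.most_common(1)[0][0]


print(core_field(ASSOCIATION_TABLE))
```

With this encoding the identification card number appears in all four groups and every other field appears once, so it is selected as the core field.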
(III) Desensitizing the personal information association core field
The core field $P$ is divided into $m$ sub-segments according to its characteristics, $P = P_1 \| P_2 \| \dots \| P_m$, and format-preserving encryption desensitization is then performed according to Algorithm 1. In Algorithm 1, $\mathrm{Trunk}(\mathrm{SM4}_K(R_{i-1}), |L_{i-1}|)$ denotes a truncation function: $\mathrm{SM4}_K(R_{i-1})$ encrypts $R_{i-1}$ with the key $K$ using the Chinese national cipher SM4, and the result is converted to hexadecimal and truncated to its highest $|L_{i-1}|$ digits. The identification card number shown in FIG. 2 consists of a 17-digit body code and a 1-digit check code. From left to right it comprises a 6-digit address code, a 4-digit birth year code, a 2-digit birth month code, a 2-digit birth day code, a 3-digit sequence code and a 1-digit check code. Because the check code is generated from the first 17 digits, it does not take part in the sub-segment division. The sub-segments of the identification card number are therefore $P_1 = 510107$, $P_2 = 1990$, $P_3 = 09$, $P_4 = 09$, $P_5 = 714$. $Y_i$ denotes the value domain of $P_i$, and $|Y_i|$ denotes the number of values in that domain. A lookup table of 2850 administrative division codes is established for $P_1$, i.e. $|Y_1| = 2850$, and the value range of $Y_1$ is given in Appendix 1; $Y_2 = \{1900, \dots, 2022\}$, $|Y_2| = 122$; $Y_3 = \{1, \dots, 12\}$, $|Y_3| = 12$; $Y_4 = \{1, \dots, 30\}$, $|Y_4| = 30$; $Y_5 = \{100, \dots, 999\}$, $|Y_5| = 900$.
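The sub-segment division described above can be sketched as follows. This is a minimal illustration using the segment widths stated in the text (6, 4, 2, 2, 3 digits plus one check code); the function name `split_id_number` and the constant `SEGMENT_WIDTHS` are illustrative assumptions.

```python
# Widths of the five sub-segments P_1..P_5 of the 17-digit body code:
# address code, birth year, birth month, birth day, sequence code.
SEGMENT_WIDTHS = (6, 4, 2, 2, 3)


def split_id_number(id_number):
    """Split an 18-character ID card number into (sub-segments, check code).

    The check code is derived from the body code, so it is returned
    separately and does not take part in the sub-segment division.
    """
    if len(id_number) != 18:
        raise ValueError("expected an 18-character ID card number")
    body, check = id_number[:17], id_number[17]
    segments, pos = [], 0
    for width in SEGMENT_WIDTHS:
        segments.append(body[pos:pos + width])
        pos += width
    return segments, check


segments, check = split_id_number("431230201411032156")
print(segments, check)
```

The example input is the desensitized identification card number from this embodiment.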
The personal information association core field (the identification card number) is desensitized according to Algorithm 1, with the SM4 key $K = 34569149873687$, $L_0 = P_1 \| P_2 \| P_3 = 510107 \| 1990 \| 09$, $R_0 = P_4 \| P_5 = 09 \| 714$, and number of rounds $r = 4$:
(1) $i = 1$
$L_1 = R_0 = 09 \| 714$
(2) $i = 2$
$L_2 = R_1 = 410581 \| 1954 \| 02$
(3) $i = 3$
$L_3 = R_2 = 03 \| 215$
(4) $i = 4$
$L_4 = R_3 = 431230 \| 2014 \| 11$
$R_4 = L_3 = 03 \| 215$
Final ciphertext $C = 43123020141103215$.
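The round structure above is that of a Feistel network over the sub-segment value domains. The sketch below illustrates the general idea only and is not Algorithm 1: the round formulas of Algorithm 1 are not reproduced in this text, so the sketch assumes modular addition on the mixed-radix rank of each half, and it substitutes HMAC-SHA256 for the truncated SM4 output because the Python standard library has no SM4. The three-entry `REGIONS` list is a hypothetical stand-in for the 2850-entry administrative division table. The sketch therefore does not reproduce the ciphertext of the worked example.

```python
import hashlib
import hmac
from math import prod

# Value domains Y_1..Y_5 (REGIONS is a hypothetical stand-in table).
REGIONS = ["410581", "431230", "510107"]
YEARS = [f"{y:04d}" for y in range(1900, 2023)]
MONTHS = [f"{m:02d}" for m in range(1, 13)]
DAYS = [f"{d:02d}" for d in range(1, 31)]
SEQS = [f"{s:03d}" for s in range(100, 1000)]
LEFT, RIGHT = [REGIONS, YEARS, MONTHS], [DAYS, SEQS]


def to_int(parts, domains):
    """Mixed-radix rank of a list of sub-segments within their domains."""
    n = 0
    for part, dom in zip(parts, domains):
        n = n * len(dom) + dom.index(part)
    return n


def to_parts(n, domains):
    """Inverse of to_int: rank back to a list of sub-segments."""
    parts = []
    for dom in reversed(domains):
        n, idx = divmod(n, len(dom))
        parts.append(dom[idx])
    return parts[::-1]


def prf(key, rnd, value, modulus):
    """Keyed round function; HMAC-SHA256 stands in for truncated SM4."""
    mac = hmac.new(key, f"{rnd}:{value}".encode(), hashlib.sha256).digest()
    return int.from_bytes(mac, "big") % modulus


def _run(key, body, rounds, sign):
    segs = [body[0:6], body[6:10], body[10:12], body[12:14], body[14:17]]
    a, b = to_int(segs[:3], LEFT), to_int(segs[3:], RIGHT)
    mod_a, mod_b = prod(map(len, LEFT)), prod(map(len, RIGHT))
    order = range(1, rounds + 1) if sign > 0 else range(rounds, 0, -1)
    for i in order:
        if i % 2:  # odd round: mask the left half with a function of the right
            a = (a + sign * prf(key, i, b, mod_a)) % mod_a
        else:      # even round: mask the right half with a function of the left
            b = (b + sign * prf(key, i, a, mod_b)) % mod_b
    return "".join(to_parts(a, LEFT) + to_parts(b, RIGHT))


def encrypt(key, body, rounds=4):
    return _run(key, body, rounds, +1)


def decrypt(key, body, rounds=4):
    return _run(key, body, rounds, -1)


key = b"34569149873687"
ct = encrypt(key, "51010719900909714")
print(ct, decrypt(key, ct))
```

Because every sub-segment is mapped to and from its own value domain, the output always consists of a listed region code, a year, a month, a day and a sequence code, which is the format and logic consistency the method aims for.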
Finally, the ciphertext of the personal information association core field (the identification card number), $C = 43123020141103215$, is obtained, and the final check code of the identification card number is generated according to the following rule:
(1) Weighted sum of the 17-digit body code:
$$S = \sum_{i=1}^{17} A_i \times W_i$$
where:
$i$ is the position index of the identification card number, counted from left to right;
$A_i$ is the digit of the identification card number at position $i$;
$W_i$ is the weight at position $i$, as shown in Table 1.
Table 1 Weights of the identification card number
i    1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
W_i  7 9 10 5 8 4 2 1 6 3 7 9 10 5 8 4 2
(2) Calculation of the final check code:
$$Y = S \bmod 11$$
The value of $Y$ ranges from 0 to 10, and the check code corresponding to each $Y$ is shown in Table 2.
Table 2 Check codes of the identification card number
Y           0 1 2 3 4 5 6 7 8 9 10
Check code  1 0 X 9 8 7 6 5 4 3 2
For the desensitized core field ciphertext $C = 43123020141103215$, the check code is calculated as follows: the weighted sum of the first 17 digits is $S = 182$, so $Y = 182 \bmod 11 = 6$, and the corresponding check code is 6. The complete identification card number after format-preserving encryption desensitization is therefore 431230201411032156.
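The check code rule can be sketched directly from Table 1 and Table 2. This is a minimal illustration; the names `WEIGHTS`, `CHECK_CODES` and `check_code` are illustrative assumptions.

```python
# Table 1: weight W_i for positions i = 1..17.
WEIGHTS = (7, 9, 10, 5, 8, 4, 2, 1, 6, 3, 7, 9, 10, 5, 8, 4, 2)
# Table 2: check code for Y = 0..10, indexed by Y.
CHECK_CODES = "10X98765432"


def check_code(body):
    """Return the check code for a 17-digit body code."""
    if len(body) != 17 or not body.isdigit():
        raise ValueError("expected a 17-digit body code")
    s = sum(int(digit) * weight for digit, weight in zip(body, WEIGHTS))
    return CHECK_CODES[s % 11]


body = "43123020141103215"
print(body + check_code(body))  # 431230201411032156
```

For the ciphertext of this embodiment the weighted sum is 182 and 182 mod 11 = 6, which Table 2 maps to the check code 6.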
Desensitizing the other fields in the personal information association table. The other fields in the personal information association table are desensitized with the desensitization result of the core field (the identification card number) as the reference.
1. Telephone number desensitization. The telephone number is desensitized by format-preserving encryption according to Algorithm 1.
2. Detailed address desensitization. The administrative area represented by the first six digits of the desensitized identification card number is the desensitization result of the detailed address. As shown in FIG. 2, the first six digits of the desensitized identification card number are 431230, so the desensitization result of the detailed address is Tongdao Dong Autonomous County, Huaihua City, Hunan Province.
3. Postal code desensitization. The postal code of the administrative area represented by the first six digits of the desensitized identification card number is the desensitization result of the postal code. As shown in FIG. 2, the first six digits of the desensitized identification card number are 431230, so the corresponding postal code desensitization result is 418500.
4. Birth date desensitization. Digits 7 to 14 of the desensitized identification card number are the desensitization result of the birth date. As shown in FIG. 2, digits 7 to 14 of the desensitized identification card number are 20141103, so the desensitization result of the birth date is November 3, 2014.
5. Age desensitization. The age desensitization result is obtained by subtracting digits 7 to 14 of the desensitized identification card number from the current time. As shown in FIG. 2, digits 7 to 14 of the desensitized identification card number are 20141103, so the age desensitization result is 9 years.
6. Gender desensitization. If digit 17 of the desensitized identification card number is even, the gender desensitization result is female; if it is odd, the gender desensitization result is male. As shown in FIG. 2, digit 17 of the desensitized identification card number is odd, so the gender desensitization result is male.
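The derivation of the dependent fields from the desensitized identification card number can be sketched as follows. This is a minimal illustration under stated assumptions: `REGION_INFO` is a hypothetical one-entry lookup table holding only the region used in this embodiment, the age is taken as the difference between calendar years (which yields the 9 years of the example when the current year is 2023), and the telephone number is omitted because it is encrypted separately according to Algorithm 1.

```python
from datetime import date

# Hypothetical lookup: administrative division code -> (address, postal code).
REGION_INFO = {
    "431230": ("Tongdao Dong Autonomous County, Huaihua City, Hunan Province",
               "418500"),
}


def derive_fields(id_number, current_year):
    """Derive address, postal code, birth date, age and gender from an ID number."""
    address, postal_code = REGION_INFO[id_number[0:6]]
    birth = date(int(id_number[6:10]), int(id_number[10:12]),
                 int(id_number[12:14]))
    return {
        "detailed address": address,
        "postal code": postal_code,
        "birth date": birth,
        # Assumption: age as a difference of calendar years.
        "age": current_year - birth.year,
        # Digit 17 (index 16): odd means male, even means female.
        "gender": "male" if int(id_number[16]) % 2 else "female",
    }


fields = derive_fields("431230201411032156", current_year=2023)
for name, value in fields.items():
    print(f"{name}: {value}")
```

Because every dependent field is computed from the same desensitized core field, the desensitized record stays internally consistent and gives an attacker no cross-field contradiction to exploit.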
(IV) Format-preserving encryption recovery of the association fields:
Recovering the personal information association core field
The personal information association core field (the identification card number) is recovered from its format-preserving encryption according to Algorithm 2. The ciphertext $C = 43123020141103215$ is divided into 5 parts: $P_1 = 431230$, $P_2 = 2014$, $P_3 = 11$, $P_4 = 03$, $P_5 = 215$. The input is $L_4 = 431230 \| 2014 \| 11$, $R_4 = 03 \| 215$, and the key $K$ and the number of rounds $r$ are the same as in the encryption settings.
(1) $i = 3$
$L_3 = R_4 = 03 \| 215$
(2) $i = 2$
$L_2 = R_3 = 410581 \| 1954 \| 02$
(3) $i = 1$
$L_1 = R_2 = 09 \| 714$
(4)
$L_0 = R_1 = 510107 \| 1990 \| 09$
$R_0 = L_1 = 09 \| 714$
The plaintext is finally recovered as $P = L_0 \| R_0 = 51010719900909714$.
Recovering the other fields in the personal information association table. The other fields in the personal information association table are recovered with the recovery result of the core field (the identification card number) as the reference.
Telephone number recovery. The telephone number is recovered from its format-preserving encryption according to Algorithm 2.
Recovery of the other fields. The recovery process mirrors the desensitization of the other fields in the personal information association table and is not described in detail here.
Algorithm 1: encryption algorithm of the SM4-based format-preserving encryption method:
Algorithm 2: decryption algorithm of the SM4-based format-preserving encryption method:
Corresponding to the method shown in FIG. 1, an embodiment of the invention also provides a format-preserving encryption system for personal information that resists statistical analysis attacks and implements the method shown in FIG. 1. Its structure is shown in FIG. 3, and it comprises: a table construction module, a calibration module, a desensitization module and a recovery module;
the table construction module is connected to the input of the calibration module and is used for acquiring personal information and establishing the personal information association table;
the calibration module is connected to the input of the desensitization module and is used for marking the personal information association core field in the association table;
the desensitization module is connected to the input of the recovery module and is used for desensitizing the association core field by format-preserving encryption;
the recovery module is used for recovering the desensitized association core field by reversing the format-preserving encryption.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (6)

1. A format-preserving encryption method for personal information that resists statistical analysis attacks, characterized by comprising the following steps:
S1, table construction: acquiring personal information and establishing a personal information association table;
S2, calibration: marking the personal information association core field in the association table;
S3, desensitization: desensitizing the association core field by format-preserving encryption;
S4, recovery: recovering the desensitized association core field by reversing the format-preserving encryption.
2. The format-preserving encryption method for personal information that resists statistical analysis attacks according to claim 1, characterized in that
step S1 (table construction) specifically comprises: establishing the personal information association table from the 18 direct identifiers and 15 quasi-identifiers of personal information, the table comprising 4 groups of personal information association relationships.
3. The format-preserving encryption method for personal information that resists statistical analysis attacks according to claim 1, characterized in that
step S2 (calibration) specifically comprises: marking the field that occurs most often in the personal information association table as the personal information association core field.
4. The format-preserving encryption method for personal information that resists statistical analysis attacks according to claim 1, characterized in that
step S3 (desensitization) specifically comprises the following steps:
S31, desensitizing the personal information association core field;
S32, desensitizing the other fields in the personal information association table.
5. The format-preserving encryption method for personal information that resists statistical analysis attacks according to claim 1, characterized in that
step S4 (recovery) specifically comprises the following steps:
S41, recovering the personal information association core field;
S42, recovering the other fields in the personal information association table.
6. A format-preserving encryption system for personal information that resists statistical analysis attacks, characterized in that it applies the format-preserving encryption method according to any one of claims 1 to 5 and comprises: a table construction module, a calibration module, a desensitization module and a recovery module;
the table construction module is connected to the input of the calibration module and is used for acquiring personal information and establishing the personal information association table;
the calibration module is connected to the input of the desensitization module and is used for marking the personal information association core field in the association table;
the desensitization module is connected to the input of the recovery module and is used for desensitizing the association core field by format-preserving encryption;
the recovery module is used for recovering the desensitized association core field by reversing the format-preserving encryption.
CN202310742067.1A 2023-06-21 2023-06-21 Preserving format encryption method and system for personal information to resist statistical analysis attack Pending CN116776351A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310742067.1A CN116776351A (en) 2023-06-21 2023-06-21 Preserving format encryption method and system for personal information to resist statistical analysis attack

Publications (1)

Publication Number Publication Date
CN116776351A true CN116776351A (en) 2023-09-19

Family

ID=87990860

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310742067.1A Pending CN116776351A (en) 2023-06-21 2023-06-21 Preserving format encryption method and system for personal information to resist statistical analysis attack

Country Status (1)

Country Link
CN (1) CN116776351A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107480549A (en) * 2017-06-28 2017-12-15 银江股份有限公司 A kind of shared sensitive information desensitization method of data-oriented and system
CN112231747A (en) * 2020-09-25 2021-01-15 中国建设银行股份有限公司 Data desensitization method, data desensitization apparatus, and computer readable medium
CN114626092A (en) * 2022-03-10 2022-06-14 上海上讯信息技术股份有限公司 Desensitization method, system, device and computer storage medium for multi-field data with incidence relation
CN115422584A (en) * 2022-08-31 2022-12-02 中国工商银行股份有限公司 Data deformation method and device
CN116032464A (en) * 2022-11-01 2023-04-28 江西科技师范大学 Property data encryption system based on quantum communication

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
张玉磊;骆广萍;张永洁;张雪微;刘祥震;王彩芬;: "基于格式保留的敏感信息加密方案", 计算机工程与科学, no. 02 *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination