CN105825141A - Database Chinese name desensitization method based on complementary mapping - Google Patents

Database Chinese name desensitization method based on complementary mapping

Info

Publication number
CN105825141A
Authority
CN
China
Prior art keywords
chinese
database
name
code
desensitization
Prior art date
Legal status
Pending
Application number
CN201610072405.5A
Other languages
Chinese (zh)
Inventor
Luo Jianfeng (罗建峰)
Yuan Yubo (袁玉波)
Current Assignee
Shanghai Jianqing Information Technology Co Ltd
Original Assignee
Shanghai Jianqing Information Technology Co Ltd
Priority date
2016-02-02
Filing date
2016-02-02
Publication date
2016-08-03
Application filed by Shanghai Jianqing Information Technology Co Ltd

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6245 Protecting personal data, e.g. for financial or medical purposes
    • G06F21/6254 Protecting personal data, e.g. for financial or medical purposes by anonymising data, e.g. decorrelating personal data from the owner's identification

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Bioethics (AREA)
  • General Health & Medical Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Databases & Information Systems (AREA)
  • Computer Security & Cryptography (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Medical Informatics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention provides a new database Chinese name desensitization method based on complementary mapping. Common approaches to Chinese names in a database either remove the name field outright or replace the names with garbled characters. Both cause serious information loss in the database. The proposed method effectively preserves data uniqueness and identifiability, and the database loses no information during processing. The method works as follows:

1. Split each Chinese name in the database into single Chinese characters.
2. Encode the characters.
3. Scramble the code order with a two-step elementary transformation.
4. Apply complementary mapping to obtain the desensitized codes.
5. Combine the codes to obtain the desensitization result for every Chinese name.

Extensive database experiments show that the method is effective and meets the technical requirements of lossless database desensitization.

Description

Database Chinese name desensitization method based on complementary mapping
Technical field
The invention is mainly used for database privacy protection. Specifically, it relates to a method for transforming Chinese names in a database that involves Chinese character encoding, data scrambling and complementary mapping.
Background
Name desensitization is a major problem in privacy protection research. In this era of information explosion, privacy protection has become a technical barrier to big-data applications. How to protect private information in databases is a pressing technical problem.

Privacy refers to personal information that a person does not wish others to know. It covers an individual's inner thoughts, lifestyle, health, family relationships, background, living environment and space, and other personal matters unrelated to the public interest.

On April 1, 2013, the national guideline "Information Security Technology: Guidelines for Personal Information Protection within Public and Commercial Services Information Systems", compiled by the Ministry of Industry and Information Technology, formally came into effect. The guideline divides personal information into general personal information and sensitive personal information. It requires that personal information be processed only for a specific, explicit and reasonable purpose, with the consent of a data subject who has been informed.

- General personal information may be processed on the basis of tacit consent: as long as the data subject does not explicitly object, it may be collected and used.
- Sensitive personal information requires express consent: the data subject's explicit authorization must be obtained before it is collected and used.

Among such sensitive information, the name is important and attracts great attention from users and the public. Across China's five thousand years of history, the name has been an important way of carrying on cultural lineage, and a social and cultural marker based on blood descent. It is an indispensable symbol in social relations. It is also a necessary tool for representing, exchanging and propagating personal information in social and cultural exchange.

In big-data settings, the sensitive personal information involved often exceeds a million records, and frequently reaches tens or even hundreds of millions. Obtaining the consent of every individual before aggregating and using the data is therefore impossible. Name desensitization has thus become an important technical problem in database privacy protection.
Chinese character encoding of names is an important technique for name desensitization. There are many Chinese character encoding methods, such as region-position codes, internal codes, external (input) codes and ASCII codes.

This patent uses the "Code of Chinese Graphic Character Set for Information Interchange: Primary Set" (GB 2312-80), published by the National Standards Bureau in 1981. It is referred to here as the Chinese character standard interchange code. The code is divided into two levels: 3755 characters in level 1 and 3008 characters in level 2, for a total of 6763 Chinese characters. It serves as the basis of the computer's internal code and provides a unified standard for the design of input and output devices. This gives information exchange between different systems a common basis and ensures that information resources can be shared.

For desensitizing name information in big data, desensitization efficiency is a key factor, so overly complex encoding techniques should be avoided. Unlike those complex techniques, the standard interchange code has the main advantage of being simple and efficient to use.
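The codes quoted later in this patent match the four-digit GB 2312 region-position code: 2511 for 公, 4379 for 孙 and 2108 for 胆. Assuming that is the code meant here, a minimal Python sketch can derive it from the built-in `gb2312` codec. That codec stores each character as two bytes, each offset by 0xA0 from its region number and position number. The function name `region_position_code` is illustrative and does not come from the patent.

```python
def region_position_code(ch: str) -> int:
    """Return the four-digit GB 2312 region-position code of one character.

    In the GB 2312 (EUC-CN) byte encoding, the first byte is the region
    number plus 0xA0 and the second byte is the position number plus 0xA0.
    """
    encoded = ch.encode("gb2312")  # raises UnicodeEncodeError outside GB 2312
    if len(encoded) != 2:
        raise ValueError(f"{ch!r} is not a double-byte GB 2312 character")
    region, position = encoded[0] - 0xA0, encoded[1] - 0xA0
    return region * 100 + position


for ch in "公孙聚云胆":
    print(ch, f"{region_position_code(ch):04d}")
```

The 6763 characters occupy regions 16 to 87, so every character code falls between 1601 and 8794. That range fits the fixed four-digit width the later steps rely on.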
Data scrambling is an essential step in name desensitization. Scrambling is a common information-desensitization technique. Its goal is to replace data with values from which a reader cannot easily recover the original distribution, while keeping the size and scale of the data unchanged.
Complement mapping safeguards name desensitization. The idea of a complement rests on a conservation principle: two complementary quantities always have a constant sum. In this patent each Chinese character corresponds to a four-digit region-position code. The sum of the original code and its complement is therefore fixed at the constant 9999.
Summary of the invention
The object of the invention is to propose a database Chinese name desensitization method based on complementary mapping. Its purpose is to reduce the identifying information carried by Chinese names in a database while preserving data validity. The method must also be reversible, so that the original database can be restored from the desensitized one. The entire desensitization is performed automatically by computer: the user only supplies the original database, and the computer desensitizes its Chinese names and outputs the desensitized database.
The technical solution is as follows:
Step 1: Decompose the Chinese name into individual characters, splitting it character by character: $N = \{x_1, x_2, x_3, \ldots, x_k\}$;
Step 2: Encode each character using the national standard Chinese character code: $u_i = c(x_i)$, $i = 1, 2, \ldots, k$;
Step 3: Scramble the code of each character with a two-step elementary matrix transformation: $v_i = l(u_i)$, $i = 1, 2, \ldots, k$;
Step 4: Map each scrambled code to its complement. The complementary mapping is
$E_i = F(v_i) = 9999 - v_i$, $i = 1, 2, \ldots, k$; for example, $F(8021) = 9999 - 8021 = 1978$;
Step 5: Concatenate the complement codes to produce the desensitized name data $E = E_1 E_2 \cdots E_k$.
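Steps 1 to 5 fit together as the minimal end-to-end Python sketch below. It makes two assumptions:

- Step 2 uses the GB 2312 region-position code.
- Step 3 is the fixed digit permutation produced by the two elementary matrices of the detailed description (d1d2d3d4 → d3d4d1d2 → d4d3d1d2).

All function names are hypothetical.

```python
def region_position_code(ch: str) -> str:
    """Step 2: four-digit GB 2312 region-position code of one character."""
    hi, lo = ch.encode("gb2312")  # two bytes, each offset by 0xA0
    return f"{hi - 0xA0:02d}{lo - 0xA0:02d}"


def scramble(code: str) -> str:
    """Step 3: swap the two halves, then swap the first two digits."""
    halves_swapped = code[2:] + code[:2]  # d1d2d3d4 -> d3d4d1d2
    return halves_swapped[1] + halves_swapped[0] + halves_swapped[2:]  # -> d4d3d1d2


def complement(code: str) -> str:
    """Step 4: E = 9999 - v, zero-padded to four digits."""
    return f"{9999 - int(code):04d}"


def desensitize_name(name: str) -> str:
    """Steps 1 and 5: split the name into characters and concatenate the results."""
    return "".join(complement(scramble(region_position_code(ch))) for ch in name)


print(desensitize_name("公孙聚云"))  # 8874025604691647
```

The output matches the worked example in the detailed description. The mapping is deterministic, so the same character always yields the same four digits wherever it appears.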
Description of the drawings
After reading the detailed description of the invention with reference to the accompanying drawings, the reader will understand its various aspects more clearly. The drawings show the desensitization results for 1000 records. The first three columns are the original data and the last three columns are the data after desensitization.

Figures 1 to 19 are all application examples of the method of the invention. We selected 1000 records from a database as the objects of privacy protection. The columns are as follows:

- Column 1 is the Chinese name in the database, a sensitive attribute. To protect privacy, the given name is concealed with "certain" or "so-and-so", and only the surname is kept.
- Columns 2 to 4 are "sex", "age" and "date of birth".
- Column 5 is the name code after desensitization.
- Columns 6 to 8 are again "sex", "age" and "date of birth".

Figures 1 to 19 show that personal information is difficult to identify after desensitization, which achieves the goal of data desensitization.
Detailed description of the embodiments
Step 1: First extract the name field from the input database record. Then decompose the name in that field into individual Chinese characters, e.g. "公孙聚云" (Gongsun Juyun) = {"公", "孙", "聚", "云"}.
Step 2: Assign each Chinese character its unique code, e.g. 2511 = c("公"), 4379 = c("孙"), 3059 = c("聚"), 5238 = c("云"). In implementation, a rare character may not be present in the current code table. The table is then extended automatically: the rare character receives the current maximum code in the table plus 1.
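The rare-character fallback in step 2 can be sketched as a small code table. It returns GB 2312 codes directly and hands out "current maximum plus 1" codes for any other character. Several details are assumptions for illustration:

- the class name;
- the in-memory dictionary;
- the starting maximum of 8794, which is the last GB 2312 character code (region 87, position 94).

In practice the extra assignments would have to be stored alongside the desensitized database, or the mapping could not be reversed. Only 1205 extra codes (8795 to 9999) fit before the four-digit complement 9999 - v stops working.

```python
class CodeTable:
    """Hypothetical code table: GB 2312 codes plus on-demand codes for rare characters."""

    GB2312_MAX = 8794  # region 87, position 94: the last GB 2312 character

    def __init__(self) -> None:
        self.max_code = self.GB2312_MAX
        self.extra: dict[str, int] = {}  # rare character -> assigned code

    def code(self, ch: str) -> int:
        try:
            encoded = ch.encode("gb2312")
        except UnicodeEncodeError:
            encoded = b""  # not representable in GB 2312
        if len(encoded) == 2:
            return (encoded[0] - 0xA0) * 100 + (encoded[1] - 0xA0)
        if ch not in self.extra:
            if self.max_code >= 9999:
                raise OverflowError("no four-digit codes left")
            self.max_code += 1  # "current maximum code plus 1"
            self.extra[ch] = self.max_code
        return self.extra[ch]


table = CodeTable()
# U+20000 and U+20001 (CJK Extension B) are outside GB 2312.
print(table.code("公"), table.code("\U00020000"), table.code("\U00020001"), table.code("\U00020000"))
# 2511 8795 8796 8795
```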
Step 3: Scramble the encoded character codes. In implementation, the scrambling works as follows. For example, the character "孙" is encoded as 4379. To scramble 4379, treat it as a 4-dimensional row vector and perform the scrambling with fourth-order elementary matrices. This patent uses:
1) swapping the two halves:
$$(7\ 9\ 4\ 3) = (4\ 3\ 7\ 9)\begin{pmatrix}0&0&1&0\\0&0&0&1\\1&0&0&0\\0&1&0&0\end{pmatrix},$$
2) swapping the first and second entries:
$$(9\ 7\ 4\ 3) = (7\ 9\ 4\ 3)\begin{pmatrix}0&1&0&0\\1&0&0&0\\0&0&1&0\\0&0&0&1\end{pmatrix}.$$
The result, 9743 = l(4379), is the scrambled code.
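The two products above can be checked numerically. This plain-Python sketch uses no matrix library. It treats a code as a row vector of four digits and multiplies it by the two matrices in order. The names `P`, `Q`, `row_times_matrix` and `scramble` are chosen here for illustration.

```python
# The two fourth-order matrices used in step 3, as lists of rows.
P = [  # swap the two halves: (d1, d2, d3, d4) -> (d3, d4, d1, d2)
    [0, 0, 1, 0],
    [0, 0, 0, 1],
    [1, 0, 0, 0],
    [0, 1, 0, 0],
]
Q = [  # swap the first two entries: (a, b, c, d) -> (b, a, c, d)
    [0, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
]


def row_times_matrix(row: list[int], m: list[list[int]]) -> list[int]:
    """Row vector times square matrix: result[j] = sum_i row[i] * m[i][j]."""
    n = len(row)
    return [sum(row[i] * m[i][j] for i in range(n)) for j in range(n)]


def scramble(code: str) -> str:
    digits = [int(c) for c in code]  # view the four-digit code as a 4-vector
    v = row_times_matrix(row_times_matrix(digits, P), Q)
    return "".join(str(d) for d in v)


print(scramble("4379"))  # 9743
```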
Step 4: In application, the complement of each character's code is generated simply by subtracting the scrambled code from 9999.
For example: 0256 = F(9743) = 9999 - 9743.
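Because 9999 - v never needs a borrow when v has four digits, the complement can be computed either arithmetically or digit by digit (9 - d). Applying it twice returns the original code. The short Python check below confirms both properties over every four-digit code; the function names are hypothetical. The result must stay zero-padded (0256, not 256) so that every character keeps exactly four digits.

```python
def complement(code: str) -> str:
    """F(v) = 9999 - v, kept as a zero-padded four-digit string."""
    return f"{9999 - int(code):04d}"


def digitwise_complement(code: str) -> str:
    """Equivalent form: replace every digit d with 9 - d."""
    return "".join(str(9 - int(d)) for d in code)


# Check over all 10,000 four-digit codes: both forms agree and F is its own inverse.
for v in range(10_000):
    code = f"{v:04d}"
    assert complement(code) == digitwise_complement(code)
    assert complement(complement(code)) == code

print(complement("9743"))  # 0256
```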
Step 5: In implementation, the codes are combined without changing their order, by direct concatenation.
For example, applying the steps above gives "公" → 8874, "孙" → 0256, "聚" → 0469 and "云" → 1647. The desensitized data for "公孙聚云" is therefore 8874025604691647.
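The method is required to be reversible, meaning the original database must be recoverable from the desensitized one. Because every character contributes exactly four digits, the concatenated string can be split back into groups and each step undone:

- The complement is its own inverse.
- The scramble d1d2d3d4 → d4d3d1d2 is undone by reading the digits back in the order v3 v4 v2 v1.

The sketch below assumes only GB 2312 characters; codes assigned to rare characters would need the stored extension table. The function names are hypothetical.

```python
def restore_codes(desensitized: str) -> list[str]:
    """Undo steps 5, 4 and 3, returning the original region-position codes."""
    if len(desensitized) % 4:
        raise ValueError("length must be a multiple of four")
    codes = []
    for k in range(0, len(desensitized), 4):
        e = desensitized[k:k + 4]
        v = f"{9999 - int(e):04d}"               # the complement is its own inverse
        codes.append(v[2] + v[3] + v[1] + v[0])  # inverse of d1d2d3d4 -> d4d3d1d2
    return codes


def code_to_char(code: str) -> str:
    """Undo step 2: region-position code -> GB 2312 bytes -> character."""
    region, position = int(code[:2]), int(code[2:])
    return bytes([region + 0xA0, position + 0xA0]).decode("gb2312")


codes = restore_codes("8874025604691647")
print(codes)                                     # ['2511', '4379', '3059', '5238']
print("".join(code_to_char(c) for c in codes))  # 公孙聚云
```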

Claims (3)

1. A database Chinese name desensitization method based on complementary mapping, characterized in that:
when Chinese names in a database are desensitized, the following method steps are claimed, specifically:
Step 1: Decompose the Chinese name into individual characters, splitting it character by character: $N = \{x_1, x_2, x_3, \ldots, x_k\}$;
Step 2: Encode the Chinese characters using the national standard Chinese character code: $u_i = c(x_i)$, $i = 1, 2, \ldots, k$. For example:
胆 (gallbladder): 2108; 弹 (bullet): 2115; 蛋 (egg): 2116;
Step 3: Scramble the code of each Chinese character with a two-step elementary matrix transformation: $v_i = l(u_i)$, $i = 1, 2, \ldots, k$;
Step 4: Map the scrambled character codes to their complements. The complementary mapping is:
$E_i = F(v_i) = 9999 - v_i$, $i = 1, 2, \ldots, k$; for example, $F(8021) = 9999 - 8021 = 1978$;
Step 5: Combine the complement codes to generate the desensitized name data $E = E_1 E_2 \cdots E_k$.
2. The database Chinese name desensitization method based on complementary mapping, in which the claimed method of scrambling the codes is as follows:
In step 3, the codes are scrambled by
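$v_i = u_i P Q$, $i = 1, 2, \ldots, k$,
where the first elementary matrix applied, $P$, is:
$$P = \begin{pmatrix}0&0&1&0\\0&0&0&1\\1&0&0&0\\0&1&0&0\end{pmatrix}$$
and the second elementary matrix applied, $Q$, is:
$$Q = \begin{pmatrix}0&1&0&0\\1&0&0&0\\0&0&1&0\\0&0&0&1\end{pmatrix}$$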
For example:
2108 scrambles to 8021; 2115 scrambles to 5121; 2116 scrambles to 6121.
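Take P and Q to be the two matrices used in the detailed description; they reproduce the examples above. Both are permutation matrices, so their product PQ is again a permutation matrix. The two-step scramble is therefore equivalent to multiplying by a single matrix. The inverse of a permutation matrix is its transpose, which is what makes step 3 reversible.

The plain-Python sketch below forms PQ, reproduces the three examples and restores them with the transpose. The helper names are hypothetical.

```python
P = [[0, 0, 1, 0], [0, 0, 0, 1], [1, 0, 0, 0], [0, 1, 0, 0]]  # swap halves
Q = [[0, 1, 0, 0], [1, 0, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]  # swap entries 1 and 2


def matmul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def transpose(m):
    return [list(col) for col in zip(*m)]


def vec_mat(code: str, m) -> str:
    """Row vector of digits times matrix m, returned as a digit string."""
    d = [int(c) for c in code]
    n = len(d)
    return "".join(str(sum(d[i] * m[i][j] for i in range(n))) for j in range(n))


M = matmul(P, Q)      # single matrix equivalent to the two-step scramble
M_inv = transpose(M)  # permutation matrices are orthogonal: inverse = transpose

for u in ("2108", "2115", "2116"):
    v = vec_mat(u, M)
    print(u, "->", v, "->", vec_mat(v, M_inv))
```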
3. The database Chinese name desensitization method based on complementary mapping, in which the claimed method of generating the complement codes is as follows:
$E_i = F(v_i) = 9999 - v_i$, $i = 1, 2, \ldots, k$;
that is, $E_i$ and $v_i$ are required to be complementary: $E_i + v_i = 9999$.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610072405.5A CN105825141A (en) 2016-02-02 2016-02-02 Database Chinese name desensitization method based on complementary mapping

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610072405.5A CN105825141A (en) 2016-02-02 2016-02-02 Database Chinese name desensitization method based on complementary mapping

Publications (1)

Publication Number Publication Date
CN105825141A true CN105825141A (en) 2016-08-03

Family

ID=56987017

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610072405.5A Pending CN105825141A (en) 2016-02-02 2016-02-02 Database Chinese name desensitization method based on complementary mapping

Country Status (1)

Country Link
CN (1) CN105825141A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113051357A (en) * 2021-03-08 2021-06-29 China University of Geosciences (Wuhan) Vector map optimization local desensitization method based on game theory

Citations (4)

Publication number Priority date Publication date Assignee Title
CN1353354A (en) * 2001-12-19 2002-06-12 Zhong Lin Method for encrypting digitalized Chinese-character information
CN101000604A (en) * 2007-01-16 2007-07-18 Beijing Founder International Software System Co Ltd Literal encipher method and system based on logical character
CN101551711A (en) * 2009-05-21 2009-10-07 South China University of Technology Chinese character coding input method based on structure and primitive
CN103049096A (en) * 2012-12-13 2013-04-17 Liu Tao Method for achieving random coding of words, terms and sentences by displacing word code list of three kinds of Chinese character messages

Non-Patent Citations (2)

Title
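Yang Yixian et al.: Applied Cryptography, 2nd ed. (《应用密码学(第2版)》), 30 June 2013 *
Xiong Ting et al.: Fundamentals of University Computer Applications (《大学计算机应用基础教程》), 31 December 2015 *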

Similar Documents

Publication Publication Date Title
CN103279499B (en) Privacy of user guard method in personalized information retrieval
CN106161006B (en) Digital encryption algorithm
CN107301350B (en) Data processing method and system
KR101704702B1 (en) Tagging based personal data de-identification system and de-identification method of personal data
CN103279697B (en) Based on details in fingerprint Information hiding and the restoration methods of orthogonal matrix and modular arithmetic
McAteer et al. Integration of biometrics and steganography: a comprehensive review
CN1336051A (en) Method and system for the application of a safety marking
CN103838753B (en) A kind of storage of redemption code, verification method and device
Abduljaleel et al. A lightweight hybrid scheme for hiding text messages in colour images using LSB, Lah transform and Chaotic techniques
Kamal et al. Facilitating and securing offline e‐medicine service through image steganography
CN106375083A (en) Encryption-decryption method based on Base64 and device thereof
CN106778520A (en) A kind of fuzzy safety box encryption method of finger vena
CN105825141A (en) Database Chinese name desensitization method based on complementary mapping
CN112084531B (en) Data sensitivity grading method, device, equipment and storage medium
Ade-Ibijola Synthesis of social media profiles using a probabilistic context-free grammar
CN104346547A (en) Intelligent identity identification system
CN116055067B (en) Weak password detection method, device, electronic equipment and medium
CN115860768B (en) Source tracing method and device based on blockchain and electronic equipment thereof
CN108134799B (en) Novel coding and decoding method and device thereof
CN113378226A (en) Biological data processing method, device, equipment and computer readable storage medium
Andreasyan et al. Security Issues of Scientific based Big Data Circulation Analysis.
CN210836203U (en) Tracing anti-counterfeiting application system based on block chain, RFID and NFC
Ge et al. High-Capacity Reversible Data Hiding in Encrypted Images Based on 2D-HS Chaotic System and Full Bit-Plane Searching
Gleni et al. DNA Smart Card for Financial Transactions
CN104580234A (en) Protection method of behavior characteristics in social network

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
CB02 Change of applicant information

Address after: 201203 Pudong New Area Zhang Heng Road, Shanghai, Lane 1000, Lane 22

Applicant after: Shanghai Jianqing Information Technology Co., Ltd.

Address before: 200232 Xuhui District Shilong Road, Shanghai, room 581, No. 217

Applicant before: Shanghai Jianqing Information Technology Co., Ltd.

CB03 Change of inventor or designer information

Inventor after: Luo Jianfeng

Inventor before: Luo Jianfeng

Inventor before: Yuan Yubo

SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20160803
