CN112541196B - Dynamic data desensitization method and system - Google Patents

Dynamic data desensitization method and system

Info

Publication number
CN112541196B
Authority
CN
China
Prior art keywords
data
desensitization
user
access
accessed
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011535750.0A
Other languages
Chinese (zh)
Other versions
CN112541196A (en)
Inventor
柴森春
王昭洋
唐嘉
崔灵果
李慧芳
姚分喜
张百海
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Institute of Technology BIT
Original Assignee
Beijing Institute of Technology BIT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Institute of Technology BIT
Priority to CN202011535750.0A
Publication of CN112541196A
Application granted
Publication of CN112541196B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6245 Protecting personal data, e.g. for financial or medical purposes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/30 Authentication, i.e. establishing the identity or authorisation of security principals
    • G06F21/31 User authentication

Abstract

The invention relates to a dynamic data desensitization method and a system. The dynamic data desensitization method provided by the invention is used for desensitizing different types of data in real time according to the data access requirement and the identity of an accessor. The dynamic desensitization system provided by the invention is different from the previous desensitization system in that different desensitization models are set for different data types, and meanwhile visitor information is specifically divided, so that the level judgment can be carried out according to the keys and the identities of different visitors, different desensitization degrees are set according to different access levels, and further the access authority control of different visitors on sensitive data is realized, and the purpose of improving the security of the sensitive data is achieved.

Description

Dynamic data desensitization method and system
Technical Field
The invention relates to the technical field of data processing, in particular to a dynamic data desensitization method and a dynamic data desensitization system.
Background
With the rapid development of informatization, most paper records are now stored digitally, so an enterprise generates a large amount of sensitive information and data as its internal databases accumulate. The MES system emphasizes interconnecting and sharing information across industries, which inevitably involves sensitive information inside the enterprise. Because such data runs through the daily operation of every enterprise, its loss, improper use, or unauthorized access or modification causes the enterprise huge losses.
Sensitive enterprise data includes trade secrets, intellectual property, key business information, business partner information, user information, and the like. Once such data is leaked or damaged, the enterprise suffers not only great economic loss but also serious damage to its image.
Data desensitization is a technique that processes sensitive information in data by replacing or transforming it. The processed data looks real but exposes no sensitive information, and has no value to anyone who wants to abuse it.
Data desensitization falls into two categories: static and dynamic. Static data desensitization is the traditional mode: the system exports data from the original database in one pass and performs the desensitization operation on it, and the desensitized data can be exported as a database file or stored in a mirror library for testing, development, or external release. In dynamic data desensitization, the system does not store desensitized data, but desensitizes the data in real time according to the data access request and the identity of the accessor.
Although static data desensitization works well and can serve testing and development, it has several drawbacks. With the arrival of the big data era and the rapid growth of data volume, people use stream computing to process ever-increasing data, and traditional static desensitization can no longer meet the needs of application testing and development. In a production environment, moreover, maintaining the mirrored desensitized database becomes harder as the data volume grows.
Dynamic data desensitization was proposed mainly because static desensitization adapts poorly to growing, changing data and updates slowly. It allows desensitization rules and strategies to be set for different data types, and different desensitization granularities to be set for different visitor identities, so as to control access to sensitive data. However, prior-art dynamic desensitization applies the same granularity to every visitor, which makes it hard to differentiate access among visitors.
Therefore, providing a new dynamic desensitization method or system that improves the security of an enterprise's sensitive information is a technical problem to be solved in this field.
Disclosure of Invention
In order to solve the problems in the prior art, the invention provides an MES-oriented dynamic data desensitization method and system for improving the security of sensitive information of enterprises.
In order to achieve the purpose, the invention provides the following scheme:
a method of dynamic data desensitization, comprising:
acquiring identity information of an access user and a data type of accessed data; the identity information comprises an access key; the data types include: non-numerical data, time data, numerical data and text data;
determining the access level of the access user according to the identity information; the access levels include: a client user, a low-level internal user, and a high-level internal user;
calling a desensitization model according to the access level and the data type of the accessed data; the desensitization model includes: a client user desensitization model, a low-level internal user desensitization model, and a high-level internal user desensitization model;
and performing desensitization operation on the data accessed by the access user by adopting the desensitization model.
Preferably, the desensitizing operation performed on the data accessed by the access user by using the desensitizing model specifically includes:
when the access level of the access user is a client user, desensitizing operation is carried out on the data accessed by the access user by adopting a client user desensitizing model;
when the access level of the access user is a low-level internal user, performing desensitization operation on data accessed by the access user by adopting a low-level internal user desensitization model;
and when the access level of the access user is a high-level internal user, performing desensitization operation on the data accessed by the access user by adopting a high-level internal user desensitization model.
Preferably, when the access level of the access user is a client user, performing desensitization operation on the data accessed by the access user by using a client user desensitization model specifically includes:
when the data type of the data accessed by the client user is non-numerical data, numerical data or text data, desensitizing the accessed data by adopting a suppression processing method;
and when the data type of the data accessed by the client user is time-class data, performing desensitization operation on the accessed data by adopting a generalization operation method.
Preferably, when the data type of the data accessed by the client user is non-numerical data, numerical data or text data, the desensitization operation is performed on the accessed data by using a suppression processing method, which specifically includes:
when the data type of the data accessed by the client user is non-numerical data, numerical data or text data, replacing the data at a specific position in the non-numerical data, the numerical data or the text data by a special symbol; the special symbols include: "", "%" or "#".
Preferably, when the data type of the data accessed by the client user is time-class data, a generalization operation method is adopted to perform desensitization operation on the accessed data, which specifically includes:
when the data type of the data accessed by the client user is time class data, discarding the data at a specific position in the time class data.
Preferably, when the access level of the access user is a low-level internal user, performing desensitization operation on the data accessed by the access user by using a low-level internal user desensitization model specifically includes:
when the data type of the data accessed by the low-level internal user is non-numerical data, desensitizing the non-numerical data by adopting a normalization method;
when the data type of the data accessed by the low-level internal user is time-class data, desensitizing the time-class data by adopting a noise adding method;
when the data type of the data accessed by the low-level internal user is numerical data, performing the desensitization operation on the numerical data by adopting the formula $X' = (X \cdot S') \,\%\, 10^{N}$ or $X' = [X \cdot f(S)] \,\%\, 10^{N}$; wherein $X$ represents the original data, $S'$ represents the numerical value of the key, $\%$ represents the remainder operation, $X'$ represents the desensitized data, $f(S)$ represents the key mapping function, and $N$ represents the length of the original data;
when the data type of the data accessed by the low-level internal user is text data, desensitizing the text data by adopting a text replacement mode.
Preferably, when the access level of the access user is a high-level internal user, performing desensitization operation on data accessed by the access user by using a high-level internal user desensitization model specifically includes:
when the data type of the data accessed by the high-level internal user is non-numerical data, desensitizing the non-numerical data by adopting a normalization method;
when the data type of the data accessed by the high-level internal user is time-class data, performing desensitization operation on the time-class data by adopting an encryption algorithm;
when the data type of the data accessed by the high-level internal user is numerical data, desensitizing the numerical data by adopting the formula $X' = X \cdot S'$ or $X' = X \cdot f(S)$; wherein $X$ represents the original data, $S'$ represents the numerical value of the key, $X'$ represents the desensitized data, and $f(S)$ represents the key mapping function;
and when the data type of the data accessed by the high-level internal user is text data, performing desensitization operation on the text data in a text replacement mode.
According to the specific embodiment provided by the invention, the invention discloses the following technical effects:
according to the dynamic data desensitization method provided by the invention, desensitization operation is carried out on different types of data in real time according to the data access requirement and the identity of an accessor. The dynamic desensitization system provided by the invention is different from the previous desensitization system in that different desensitization models are set for different data types, and meanwhile visitor information is specifically divided, so that the level judgment can be carried out according to the keys and the identities of different visitors, different desensitization degrees are set according to different access levels, and further the access authority control of different visitors on sensitive data is realized, and the purpose of improving the security of the sensitive data is achieved.
Corresponding to the dynamic data desensitization method, the invention also provides two dynamic data desensitization systems.
A dynamic data desensitization system, comprising:
the acquisition module is used for acquiring the identity information of the access user and the data type of the accessed data; the identity information comprises an access key; the data types include: non-numerical data, time data, numerical data and text data;
the access level determining module is used for determining the access level of the access user according to the identity information; the access levels include: client users, low-level internal users, and high-level internal users;
the desensitization model calling module is used for calling a desensitization model according to the access level and the data type of the accessed data; the desensitization model includes: a client user desensitization model, a low-level internal user desensitization model, and a high-level internal user desensitization model;
and the desensitization operation module is used for performing desensitization operation on the data accessed by the access user by adopting the desensitization model.
Another dynamic data desensitization system, comprising:
the data source interface module is used for acquiring data to be desensitized inside an enterprise;
the sensitive information classification module is connected with the data source interface module and is used for carrying out data classification on the data to be desensitized to obtain the classified data to be desensitized; the classified data to be desensitized comprises: non-numerical data, time data, numerical data and text data;
the information anomaly analysis module is connected with the sensitive information classification module and is used for eliminating the anomalous data in the classified data to be desensitized;
the user interface module is used for acquiring the identity information of the access user; the identity information comprises an access key;
the key matching module is connected with the user interface module and used for determining the access level of the access user according to the identity information; the access levels include: a client user, a low-level internal user, and a high-level internal user;
and the sensitive data desensitization module is respectively connected with the key matching module and the information anomaly analysis module and is used for calling different desensitization models according to the access levels so as to perform desensitization operation on the classified data to be desensitized after abnormal data are removed.
The technical effects and purposes of the dynamic data desensitization system provided by the invention are the same as those of the dynamic data desensitization method provided by the invention, and therefore, the details are not repeated.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings required in the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art that other drawings can be obtained according to these drawings without creative efforts.
FIG. 1 is a flow chart of a method of dynamic data desensitization provided by the present invention;
FIG. 2 is a schematic structural diagram of a first dynamic data desensitization system provided by the present invention;
FIG. 3 is a schematic diagram of a second dynamic data desensitization system provided by the present invention;
FIG. 4 is a block diagram of a process for performing data desensitization by the dynamic data desensitization system in an embodiment of the present invention;
FIG. 5 is a schematic diagram of the base-36 conversion process in the embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The invention aims to provide a dynamic data desensitization method and a dynamic data desensitization system to improve the security of sensitive information of an enterprise.
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in further detail below.
Fig. 1 is a flowchart of a dynamic data desensitization method provided by the present invention, and as shown in fig. 1, a dynamic data desensitization method includes:
Step 100: Obtain the identity information of the access user and the data type of the accessed data. The identity information includes an access key. The data types include: non-numerical data, time data, numerical data and text data.
Step 110: Determine the access level of the access user according to the identity information. The access levels include: a client user, a low-level internal user, and a high-level internal user.
Step 120: Invoke the desensitization model according to the access level and the data type of the accessed data. The desensitization models include: a client user desensitization model, a low-level internal user desensitization model, and a high-level internal user desensitization model.
Step 130: Perform the desensitization operation on the data accessed by the access user using the desensitization model. This specifically comprises the following steps:
Step 1301: When the access level of the access user is a client user, perform the desensitization operation on the data accessed by the access user using the client user desensitization model, which specifically comprises the following:
When the data type of the data accessed by the client user is non-numerical data, numerical data or text data, the accessed data is desensitized by suppression: the data at specific positions in the non-numerical, numerical or text data is replaced with a special symbol. The special symbols include: "", "%" or "#". The data is protected by replacing the real value, or part of it, with special symbols, e.g. replacing certain values (or text) with "+", so that the real data cannot be seen. The data length remains unchanged, and part of the data can be retained, with the desensitization operation applied only at specific positions.
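The suppression step can be illustrated with a minimal Python sketch. The function name `suppress`, the choice of "#" as the default symbol, and the sample value are assumptions made for illustration; the text only specifies that characters at specific positions are replaced and that the length stays unchanged.

```python
def suppress(value: str, start: int, end: int, symbol: str = "#") -> str:
    """Replace the characters at positions [start, end) with `symbol`.

    The output has the same length as the input, and characters outside
    the masked span are kept as they are.
    """
    if len(symbol) != 1:
        raise ValueError("symbol must be a single character")
    start = max(0, start)
    end = min(len(value), end)
    if start >= end:
        return value  # nothing to mask
    return value[:start] + symbol * (end - start) + value[end:]


# Hypothetical 11-digit number with positions 3..6 masked.
masked = suppress("13812345678", 3, 7)
print(masked)  # 138####5678
```

Which positions are masked for a given field would be part of the client user desensitization model; the sketch leaves that choice to the caller.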
When the data type of the data accessed by the client user is time data, the accessed data is desensitized by generalization, specifically:
One form of generalization converts the original precise value of the time data to be desensitized into a range, i.e., a fuzzy value. Another form arranges the data top-down by dependency so that only the attribute values of certain layers are retained and the remaining attribute values are discarded. In short, the core of generalization is to reduce the precision of the data by dropping some values so that the real data varies within a certain range, thereby protecting the sensitive information. Sensitive time data has an obvious dependency hierarchy: the top layer is the year, followed in order by the month, the date, and the time of day. For this type of information, only the year and the month are retained, and the specific time, comprising the specific date and the time of day, is omitted.
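As a minimal sketch of the second form of generalization, the following Python function keeps the year and month layers of a time value and drops the lower layers. The function name and the "YYYY-MM" output format are assumptions for illustration; the text does not prescribe an output format.

```python
from datetime import datetime


def generalize_time(t: datetime) -> str:
    """Keep only the year and month; discard day, hour, minute and second."""
    return t.strftime("%Y-%m")


print(generalize_time(datetime(2020, 12, 23, 14, 30, 5)))  # 2020-12
```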
Step 1302: when the access level of the access user is a low-level internal user, performing desensitization operation on data accessed by the access user by adopting a low-level internal user desensitization model, which specifically comprises the following steps:
When the data type of the data accessed by the low-level internal user is non-numerical data, the non-numerical data is desensitized by a normalization method, specifically: the data is scaled to fall within a small specified interval $[X_{\min}, X_{\max}]$, and its unit is removed so that it becomes a dimensionless pure number, which facilitates the subsequent desensitization processing. The main purpose of this step is to keep the desensitized data consistent with the original data. With the length of the data denoted $N$, the conversion formula is
[Equation: Figure BDA0002853385450000081]
It should be noted that the data must be converted into the interval designated for the number of digits to be stored; see Table 1-1.
TABLE 1-1
Original data length | 1 | 2 | k | n
Transition interval | [0, 36) | [36, 36^2) | [36^(k-1), 36^k) | [36^(n-1), 36^n)
The data is then converted into a base-36 number: the dimensionless value is divided by 36 to obtain a quotient and a remainder, the quotient is divided by 36 again to obtain another quotient and remainder, and so on until the quotient is 0. Arranging all the remainders in reverse order gives the base-36 number. The calculation process is shown in FIG. 5. Meanwhile, each key corresponds to one confusion table, and one confusion table can correspond to several keys. Each confusion table maps every remainder to one of the characters 0-9 and A-Z. The converted base-36 number is written $A_1 A_2 \dots A_n$. One such confusion table is shown in Table 1-2.
TABLE 1-2
Remainder | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Corresponding character | P | T | H | X | 5 | C | D | 4 | G | U
Remainder | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19
Corresponding character | I | 3 | K | B | 2 | R | V | O | W | 8
Remainder | 20 | 21 | 22 | 23 | 24 | 25 | 26 | 27 | 28 | 29
Corresponding character | 7 | J | A | 9 | M | 1 | S | F | 0 | Y
Remainder | 30 | 31 | 32 | 33 | 34 | 35
Corresponding character | N | Z | L | Q | 6 | E
Each character of the resulting base-36 number is then converted to its ASCII code and reduced modulo 10, which makes the transformation irreversible while preserving the characteristics of the original data. The conversion may be expressed as:
$X_i' = \mathrm{ASCII}(A_i) \bmod 10$
where $\mathrm{ASCII}(\cdot)$ denotes taking the ASCII code of the character at that position.
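The base-36 conversion, the confusion-table lookup and the ASCII remainder step can be sketched in Python as follows. The confusion table is the one listed in Table 1-2; the function names are hypothetical. The initial scaling into the transition interval is left out because its formula is not reproduced in the text, so the sketch assumes the input integer has already been scaled.

```python
from typing import List

# Confusion table from Table 1-2: the character at index r is the one
# assigned to remainder r.
CONFUSION_TABLE = (
    "PTHX5CD4GU"  # remainders 0-9
    "I3KB2RVOW8"  # remainders 10-19
    "7JA9M1SF0Y"  # remainders 20-29
    "NZLQ6E"      # remainders 30-35
)


def to_base36_remainders(value: int) -> List[int]:
    """Repeatedly divide by 36 and return the remainders in reverse order."""
    if value < 0:
        raise ValueError("value must be non-negative")
    if value == 0:
        return [0]
    remainders = []
    while value > 0:
        value, r = divmod(value, 36)
        remainders.append(r)
    return remainders[::-1]


def desensitize_scaled_value(value: int, table: str = CONFUSION_TABLE) -> str:
    """Map each base-36 digit through the table, then take ASCII code mod 10."""
    characters = [table[r] for r in to_base36_remainders(value)]
    return "".join(str(ord(ch) % 10) for ch in characters)


# 1000 = 27 * 36 + 28, so the remainders are [27, 28] -> "F", "0" -> 70 % 10, 48 % 10.
print(to_base36_remainders(1000))      # [27, 28]
print(desensitize_scaled_value(1000))  # 08
```

Because several characters share the same ASCII code modulo 10, the last step cannot be inverted, which matches the irreversibility described above.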
When the data type of the data accessed by the low-level internal user is time data, the time data is desensitized by a noise-adding method.
Desensitizing time data requires irreversible processing, so a noise-adding method is chosen. However, the fields of time data use different radices: months are base 12, dates are base 30 or 31, hours are base 24, and minutes and seconds are base 60, so the time data must be converted first. The invention converts it into the timestamp format, defined as the total number of seconds elapsed since 00:00:00 on January 1, 1970, Greenwich Mean Time, which is widely used in data processing. The specific process is as follows:
The time data is converted into the timestamp format, denoted $F$.
A perturbation value for the sensitive data is generated from the key so that the time data fluctuates. The desensitization formula is $F' = F \pm (S' \times \dots) \,\%\, (M \times 3600 \times 24)$, where $F'$ denotes the timestamp after desensitization, $S'$ denotes the numerical value of the key, and the remainder operation limits the shift to within $M$ days, i.e., the date fluctuates by up to about $M$ days.
The timestamp is then restored to the original time data format.
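The three steps above can be sketched in Python as follows. This is only an illustration under stated assumptions: the perturbation is taken to be the numerical key value reduced modulo M days of seconds (the exact expression inside the parentheses is not fully legible in the text), the numerical key value is assumed to be formed by reading each character's ASCII code modulo 10 as a decimal digit, and the function names and the `sign` parameter are hypothetical.

```python
from datetime import datetime, timezone


def key_value(key: str) -> int:
    """Assumed numeric key value: each character's ASCII code mod 10, read as digits."""
    if not key:
        raise ValueError("key must not be empty")
    return int("".join(str(ord(ch) % 10) for ch in key))


def add_time_noise(t: datetime, key: str, max_days: int, sign: int = 1) -> datetime:
    """Shift a time by a key-derived offset of less than `max_days` days."""
    if max_days <= 0:
        raise ValueError("max_days must be positive")
    timestamp = int(t.timestamp())                         # step 1: to timestamp F
    offset = key_value(key) % (max_days * 3600 * 24)       # step 2: bounded perturbation
    shifted = timestamp + (1 if sign >= 0 else -1) * offset
    return datetime.fromtimestamp(shifted, tz=timezone.utc)  # step 3: back to a time


original = datetime(2020, 12, 23, 10, 0, 0, tzinfo=timezone.utc)
# Key "A1": ord("A") % 10 = 5 and ord("1") % 10 = 9, so the assumed key value is 59.
print(add_time_noise(original, "A1", max_days=3))  # 2020-12-23 10:00:59+00:00
```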
When the data type of the data accessed by the low-level internal user is numerical data, the numerical data is desensitized using the formula $X' = (X \cdot S') \,\%\, 10^{N}$ or $X' = [X \cdot f(S)] \,\%\, 10^{N}$, where $X$ represents the original data, $S'$ represents the numerical value of the key, $\%$ represents the remainder operation, $X'$ represents the desensitized data, $f(S)$ represents the key mapping function, and $N$ represents the length of the original data.
Desensitizing numerical data requires irreversible processing. The method combines the key with the value range $[X_{\min}, X_{\max}]$ of the numerical data, and two cases need to be distinguished:
For numerical data without negative values, the desensitization formula is:
[Equation: Figure BDA0002853385450000091]
$X' = (X \cdot S') \,\%\, 10^{N}$ or $X' = [X \cdot f(S)] \,\%\, 10^{N}$
For numerical data containing negative values, the desensitization formula is:
[Equation: Figure BDA0002853385450000092]
$X' = (X \cdot S') \,\%\, 10^{N}$ or $X' = [X \cdot f(S)] \,\%\, 10^{N}$
where $X$ represents the original data, $S'$ represents the numerical value of the key, $\%$ represents the remainder operation, and $X'$ represents the desensitized data.
It should be noted that the key may be designed to contain letters, so when performing numerical operations, by default the ASCII code of each character in the key is taken and reduced modulo 10, that is:
$S_i' = \mathrm{ASCII}(S_i) \bmod 10$.
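A minimal Python sketch of the formula X' = (X · S') % 10^N for non-negative integers follows. It assumes that the per-character values S_i' are concatenated as decimal digits to form S', which the text does not state explicitly. The range-dependent preprocessing shown in the equation figures is not reproduced here, and the function names are hypothetical.

```python
def key_value(key: str) -> int:
    """Assumed numeric key value: S_i' = ASCII(S_i) mod 10, digits concatenated."""
    if not key:
        raise ValueError("key must not be empty")
    return int("".join(str(ord(ch) % 10) for ch in key))


def desensitize_number(x: int, key: str) -> str:
    """Compute (x * S') % 10**N, where N is the number of digits of x.

    The result is zero-padded to N digits so its length matches the original.
    """
    if x < 0:
        raise ValueError("this sketch only covers non-negative values")
    n = len(str(x))
    return str((x * key_value(key)) % 10 ** n).zfill(n)


# Key "A1" gives S' = 59; 12345 * 59 = 728355, and 728355 % 10**5 = 28355.
print(desensitize_number(12345, "A1"))  # 28355
```

Taking the remainder modulo 10^N discards the high-order digits of the product, which keeps the output at the original length.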
When the data type of the data accessed by the low-level internal user is text data, the text data is desensitized by text replacement.
The text replacement performed when desensitizing text data is as follows:
The first letter of each piece of data in the text data is obtained and looked up in a dictionary, in which the Chinese characters under each first letter are sorted and numbered. The dictionary may be an existing electronic dictionary or one constructed manually according to actual requirements.
A function of the key is evaluated to generate a set of random numbers, and for each first letter the Chinese character numbered by the corresponding random number is looked up.
The Chinese characters obtained are then arranged in the order of the first letters of the original text data, which completes the desensitization of the text.
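The replacement procedure can be sketched in Python with a toy dictionary. Everything concrete here is an assumption for illustration: the three-letter dictionary, the reverse index used to find a character's first letter, and the way numbers are derived from the key (ASCII code of each key character modulo 10, then modulo the number of candidates). The text only says that a function of the key generates the numbers.

```python
from typing import Dict, List

# Toy dictionary: characters grouped and ordered under their first letter.
DICTIONARY: Dict[str, List[str]] = {
    "Z": ["张", "赵", "周"],
    "W": ["王", "吴", "魏"],
    "L": ["李", "刘", "林"],
}
# Reverse index from a character to its first letter.
INITIAL_OF = {ch: initial for initial, chars in DICTIONARY.items() for ch in chars}


def key_numbers(key: str) -> List[int]:
    """Assumed key function: one number per key character (ASCII code mod 10)."""
    if not key:
        raise ValueError("key must not be empty")
    return [ord(ch) % 10 for ch in key]


def replace_text(text: str, key: str) -> str:
    """Replace each character with another one listed under the same first letter."""
    numbers = key_numbers(key)
    result = []
    for position, ch in enumerate(text):
        initial = INITIAL_OF.get(ch)
        if initial is None:
            result.append(ch)  # not in the toy dictionary: left unchanged
            continue
        candidates = DICTIONARY[initial]
        index = numbers[position % len(numbers)] % len(candidates)
        result.append(candidates[index])
    return "".join(result)


# Key "A7" gives numbers [5, 5]; 5 % 3 = 2 selects the third candidate each time.
print(replace_text("王李", "A7"))  # 魏林
```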
Step 1303: When the access level of the access user is a high-level internal user, the data accessed by the access user is desensitized using the high-level internal user desensitization model.
And when the data type of the data accessed by the high-level internal user is non-numerical data, performing desensitization operation on the non-numerical data by adopting a normalization method.
The desensitization operation process is the same as the desensitization operation method performed when the data type of the data accessed by the low-level internal user is non-numerical data, and details are not repeated here.
When the data type of the data accessed by the high-level internal user is time data, the time data is desensitized with an encryption algorithm, specifically:
The time data is converted into the timestamp format.
The timestamp is multiplied by the numerical value of the key or by a function mapping of the key:
$F' = F \cdot S'$ or $F' = F \cdot f(S)$
A normalization step then remaps the data to a valid timestamp, which completes the encryption.
When the data type of the data accessed by the high-level internal user is numerical data, the numerical data is desensitized using the formula $X' = X \cdot S'$ or $X' = X \cdot f(S)$, where $X$ represents the original data, $S'$ represents the numerical value of the key, $X'$ represents the desensitized data, and $f(S)$ represents the key mapping function.
This process requires encrypting the values, and the desensitization formula is
$X' = X \cdot S'$ or $X' = X \cdot f(S)$.
Here $X$ represents the original data, $S'$ represents the numerical value of the key, $X'$ represents the desensitized data, and $f(S)$ represents a mapping function designed for the key. As before, the key may be designed to contain letters, so when performing numerical operations, by default the ASCII code of each character in the key is taken and reduced modulo 10, or a mapping function is designed for the key.
Finally, a normalization step remaps the data into a valid value range, which completes the encryption.
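A minimal Python sketch of the multiplication step X' = X · S' follows. The final normalization is left out because the text does not give its formula. Recovering X by dividing by S' is an assumption of this sketch, included only to show why a multiplication without a remainder step can be undone by someone holding the key; the text does not describe a decryption procedure. Function names are hypothetical, and S' is assumed to be formed by concatenating each key character's ASCII code modulo 10.

```python
def key_value(key: str) -> int:
    """Assumed numeric key value: each character's ASCII code mod 10, read as digits."""
    if not key:
        raise ValueError("key must not be empty")
    return int("".join(str(ord(ch) % 10) for ch in key))


def encrypt_number(x: int, key: str) -> int:
    """Multiply the original value by the numeric key value."""
    return x * key_value(key)


def decrypt_number(x_encrypted: int, key: str) -> int:
    """Invert encrypt_number; fails if the key value is 0 or does not divide the input."""
    s = key_value(key)
    if s == 0:
        raise ValueError("a key value of 0 cannot be inverted")
    quotient, remainder = divmod(x_encrypted, s)
    if remainder != 0:
        raise ValueError("value was not produced with this key")
    return quotient


encrypted = encrypt_number(12345, "A1")  # key value 59
print(encrypted)                         # 728355
print(decrypt_number(encrypted, "A1"))   # 12345
```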
When the data type of the data accessed by the high-level internal user is text data, the text data is desensitized by text replacement, specifically:
A lookup table between the initials of the text data and the text data names is established, in which each letter has a corresponding Chinese character. Different keys may have different lookup tables, and one lookup table can correspond to several keys.
The original data is desensitized by looking up its initials in the lookup table.
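The lookup-table replacement can be sketched in Python as follows. The table contents, the key names and the table identifiers are all hypothetical; the sketch only illustrates that several keys may share one table and that each initial maps to one character.

```python
from typing import Dict

# Hypothetical lookup tables: each initial letter maps to one Chinese character.
LOOKUP_TABLES: Dict[str, Dict[str, str]] = {
    "table_a": {"Z": "周", "W": "吴", "L": "林"},
    "table_b": {"Z": "赵", "W": "魏", "L": "刘"},
}
# Several keys may share one table.
KEY_TO_TABLE: Dict[str, str] = {
    "key-001": "table_a",
    "key-002": "table_a",
    "key-003": "table_b",
}


def replace_by_initials(initials: str, key: str) -> str:
    """Replace each initial with the character assigned to it in the key's table."""
    table = LOOKUP_TABLES[KEY_TO_TABLE[key]]
    # Initials missing from the table are passed through unchanged.
    return "".join(table.get(initial, initial) for initial in initials)


print(replace_by_initials("WL", "key-001"))  # 吴林
print(replace_by_initials("WL", "key-003"))  # 魏刘
```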
The non-numerical data above is essentially fixed-length, non-numerical sensitive data. Such data always keeps a specific length, is associated with certain information, and does not represent a numerical quantity. Even if its leading digits are 0, they cannot be omitted. Examples are identification numbers, telephone numbers, and order numbers.
Time data generally includes customer order times, customer birth dates, and the like. Such data has a specific meaning, representing a moment such as month XX, day XX, XX minutes and XX seconds.
Numerical data may include the composition ratios of certain products, which may involve an enterprise's confidential content. Such data therefore needs to be modified so that information can be shared without leaking the numerical data and causing the enterprise huge economic loss.
Thus, the present invention provides different desensitization procedures based on different data types and different access levels.
In conclusion, the dynamic data desensitization method provided by the invention desensitizes different types of data in real time according to the data access requirement and the identity of the visitor. The dynamic desensitization system provided by the invention differs from previous desensitization systems in two respects: different desensitization models are set for different data types, and visitors are divided into specific categories. The access level can thus be determined from the key and identity of each visitor, and a different degree of desensitization is applied for each access level. This controls the access rights of different visitors to sensitive data and improves the security of the sensitive data.
In addition, corresponding to the provided dynamic data desensitization method, the invention also provides two dynamic data desensitization systems:
One of the dynamic data desensitization systems, as shown in fig. 2, includes: an acquisition module 200, an access level determination module 210, a desensitization model calling module 220, and a desensitization operation module 230.
The acquisition module 200 is used for obtaining the identity information of the accessing user and the data type of the accessed data. The identity information includes an access key. The data types include: non-numeric class data, temporal class data, numeric class data, and textual class data.
The access level determination module 210 is configured to determine an access level of the accessing user according to the identity information. The access levels include: a client user, a low-level internal user, and a high-level internal user.
The desensitization model calling module 220 is used to call a desensitization model based on the access level and the data type of the accessed data. Desensitization models include: a client user desensitization model, a low-level internal user desensitization model, and a high-level internal user desensitization model.
Desensitization operation module 230 is used to perform desensitization operations on data accessed by an accessing user using a desensitization model.
The specific implementation process of this dynamic data desensitization system is the same as that of the dynamic data desensitization method provided by the invention and is therefore not described again here.
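The cooperation of the four modules can be sketched as a lookup from (access level, data type) to a desensitization model. This is an illustrative outline under assumptions, not the system of fig. 2: the key registry, the enum names, and the stub models are hypothetical stand-ins, and treating an unknown key as a client user is a design assumption that the description does not state.

```python
from enum import Enum
from typing import Callable, Dict, Tuple


class AccessLevel(Enum):
    CLIENT = "client"
    LOW_INTERNAL = "low-level internal"
    HIGH_INTERNAL = "high-level internal"


class DataType(Enum):
    NON_NUMERIC = "non-numeric"
    TIME = "time"
    NUMERIC = "numeric"
    TEXT = "text"


# Hypothetical key registry used by the access level determination step.
KEY_LEVELS: Dict[str, AccessLevel] = {
    "guest-key": AccessLevel.CLIENT,
    "staff-key": AccessLevel.LOW_INTERNAL,
    "admin-key": AccessLevel.HIGH_INTERNAL,
}

Model = Callable[[str], str]


def make_stub(level: AccessLevel, dtype: DataType) -> Model:
    """Stand-in model: tags the output and hides the value entirely."""
    def model(value: str) -> str:
        return f"[{level.value}/{dtype.value}] " + "*" * len(value)
    return model


# One model per (access level, data type) pair.
MODELS: Dict[Tuple[AccessLevel, DataType], Model] = {
    (lvl, dt): make_stub(lvl, dt) for lvl in AccessLevel for dt in DataType
}


def determine_access_level(access_key: str) -> AccessLevel:
    # Assumption: unknown keys fall back to the most restrictive level.
    return KEY_LEVELS.get(access_key, AccessLevel.CLIENT)


def desensitize(access_key: str, data_type: DataType, value: str) -> str:
    level = determine_access_level(access_key)   # module 210
    model = MODELS[(level, data_type)]           # module 220
    return model(value)                          # module 230
```

In a real deployment each stub would be replaced by the model described for that level and data type; the table form keeps the selection logic independent of the models themselves.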
Another dynamic data desensitization system, as shown in fig. 3, includes: a data source interface module 300, a sensitive information classification module 310, an information anomaly analysis module 320, a user interface module 330, a key matching module 340, and a sensitive data desensitization module 350.
The data source interface module 300 is used for acquiring data to be desensitized inside an enterprise. Specifically, it is used to acquire data and text, import the information that needs desensitization, and provide a function for viewing the details of the data source.
The sensitive information classification module 310 is connected to the data source interface module 300, and is configured to perform data classification on data to be desensitized, so as to obtain the classified data to be desensitized. The classified data to be desensitized includes: non-numeric class data, temporal class data, numeric class data, and textual class data.
The information anomaly analysis module 320 is connected to the sensitive information classification module 310 and is configured to remove anomalous data from the classified data to be desensitized. Specifically, after the information to be desensitized is classified, each type of information has its own expected requirements; records with missing data or insufficient length can be placed directly into an abnormal data set.
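The rejection rule (missing data or insufficient length goes to the abnormal data set) can be sketched as a simple partition. The field names and required lengths below are hypothetical examples chosen for illustration; the description does not list them.

```python
from typing import Dict, List, Optional, Tuple

Record = Dict[str, Optional[str]]

# Hypothetical minimum lengths for fixed-length fields.
REQUIRED_LENGTH: Dict[str, int] = {"id_number": 18, "phone": 11}


def split_anomalies(records: List[Record]) -> Tuple[List[Record], List[Record]]:
    """Return (clean, abnormal); a record is abnormal if any required
    field is missing, empty, or shorter than its required length."""
    clean: List[Record] = []
    abnormal: List[Record] = []
    for record in records:
        is_valid = True
        for field, length in REQUIRED_LENGTH.items():
            value = record.get(field)
            if value is None or len(value) < length:
                is_valid = False
                break
        (clean if is_valid else abnormal).append(record)
    return clean, abnormal
```

Only the clean list would be passed on to the sensitive data desensitization module; the abnormal list can be kept for inspection.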
The user interface module 330 is used to obtain identity information of the accessing user. The identity information includes an access key.
The key matching module 340 is connected to the user interface module 330, and is configured to determine an access level of the accessing user according to the identity information. The access levels include: a client user, a low-level internal user, and a high-level internal user. Different keys correspond to different identity information and determine different access levels, so the key needs to be identified to determine which identity the access belongs to.
The sensitive data desensitization module 350 is connected to the key matching module 340 and to the information anomaly analysis module 320, and is configured to invoke different desensitization models according to the access level, so as to perform the desensitization operation on the classified data to be desensitized from which the abnormal data has been rejected. Based on the different access of different identities to different data, a desensitization approach is provided for each identity and each type of sensitive information.
As shown in fig. 4, when a data source is imported into the database through the data source interface module 300, the sensitive information classification module 310 classifies it into non-numerical fixed length data information, time data information, numerical fixed length data information, and text fixed length data information. After accurate classification, abnormal data is eliminated from each kind of sensitive information, which completes the processing of the data part. When a user accesses a certain type of data, the key matching module 340 determines the user's access authority, the sensitive data desensitization module 350 is called, and a desensitization model is selected according to the user's authority to access the sensitive information.
In this system, the permissions of data access personnel are divided into roughly three categories: customers (visitors), low-level insiders, and high-level insiders. Personnel at different levels can access data with different degrees of desensitization, and identities are distinguished by the keys bound to the users. The three types of users are explained below.
For customers (visitors), the access right is the smallest and no subsequent data processing is needed, so this type of user is placed at the lowest level and the degree of data and text retention is the lowest. Desensitization for this type of user therefore mostly removes sensitive characters directly or masks them with special symbols.
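Masking with special symbols can be sketched as below. The function name, the keep_head and keep_tail parameters, and the rule that values too short to retain anything are masked completely are assumptions for illustration; "#" is used as the default symbol because it is one of the special symbols named in the claims.

```python
def mask(value: str, keep_head: int = 0, keep_tail: int = 0, symbol: str = "#") -> str:
    """Replace the middle of `value` with `symbol`, keeping the requested
    number of leading and trailing characters visible."""
    if len(symbol) != 1:
        raise ValueError("symbol must be a single character")
    if keep_head < 0 or keep_tail < 0:
        raise ValueError("keep_head and keep_tail must be non-negative")
    if keep_head + keep_tail >= len(value):
        # Assumption: if nothing would be hidden, hide everything instead.
        return symbol * len(value)
    hidden = len(value) - keep_head - keep_tail
    return value[:keep_head] + symbol * hidden + value[len(value) - keep_tail:]
```

With the defaults the whole value is suppressed, which corresponds to direct removal of the real content while keeping the original length; a call such as `mask(phone, 3, 4)` leaves only the first three and last four characters readable.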
For low-level insiders, the access authority is higher. These users need to process the data further, for example for order downloading and order processing, so the business must be supported by data, but the authenticity of the data does not matter: fake data in the same format is sufficient. The fake data that this type of user can access is irreversible, i.e. the original customer information cannot be deduced even with knowledge of the system. The desensitization for this type of user is therefore an irreversible process that preserves the original characteristics of the data while discarding its real values, and the data can never be reverse-decrypted.
For high-level insiders, the still higher access rights mean that the data can be mined further, so the data is only encrypted. This type of user can obtain the true values of the private data, but there are few such users. The desensitization for this type of user is therefore reversible; however, the specific content of the encryption mechanism is not available to the users themselves, and the data can be decrypted only by means of the key.
In addition, because different kinds of sensitive information call for different desensitization modes, different processing methods are designed for the types of sensitive information that may exist in the MES system. They mainly comprise non-numerical fixed-length sensitive data desensitization, time sensitive data desensitization, numerical sensitive data desensitization, and text sensitive data desensitization. Each piece of sensitive information is classified according to its characteristics, which are summarized in Tables 1 to 3 below:
Tables 1 to 3 (provided as image BDA0002853385450000141 in the original publication; not reproduced here)
The embodiments in this description are described in a progressive manner: each embodiment focuses on its differences from the other embodiments, and the parts that are the same or similar among the embodiments may be referred to each other. Because the system disclosed in the embodiments corresponds to the method disclosed in the embodiments, its description is relatively brief, and the relevant points can be found in the description of the method.
Specific examples are used herein to explain the principle and embodiments of the present invention, and the above description of the embodiments is only intended to help in understanding the method and core idea of the present invention. A person skilled in the art may, following the idea of the present invention, make changes to the specific embodiments and the scope of application. In summary, the content of this description should not be construed as limiting the present invention.

Claims (7)

1. A method of dynamic data desensitization, comprising:
acquiring identity information of an access user and a data type of accessed data; the identity information comprises an access key; the data types include: non-numerical data, time data, numerical data and text data;
determining the access level of the access user according to the identity information; the access levels include: client users, low-level internal users, and high-level internal users; different keys correspond to different identity information and determine different access levels, so the key needs to be identified to determine which identity the access belongs to;
calling a desensitization model according to the access level and the data type of the accessed data; the desensitization model includes: a client user desensitization model, a low-level internal user desensitization model, and a high-level internal user desensitization model;
desensitizing the data accessed by the access user by adopting the desensitizing model;
the desensitizing of the data accessed by the access user by adopting the desensitization model specifically comprises:
when the access level of the access user is a client user, desensitizing operation is carried out on the data accessed by the access user by adopting a client user desensitizing model;
when the access level of the access user is a low-level internal user, performing desensitization operation on data accessed by the access user by adopting a low-level internal user desensitization model;
when the access level of the access user is a high-level internal user, performing desensitization operation on data accessed by the access user by adopting a high-level internal user desensitization model;
when the access level of the access user is a low-level internal user, performing desensitization operation on the data accessed by the access user by using a low-level internal user desensitization model, specifically comprising:
when the data type of the data accessed by the low-level internal user is non-numerical data, performing desensitization operation on the non-numerical data by normalizing the data and then converting it into base 36;
when the data type of the data accessed by the low-level internal user is time-class data, desensitizing the time-class data by adopting a noise adding method;
when the data type of the data accessed by the low-level internal user is numerical data, performing desensitization operation on the numerical data by adopting one of two formulas (formula images DEST_PATH_IMAGE001 and DEST_PATH_IMAGE002 in the original publication; not recoverable from the text); wherein X represents the original data, S' represents the numerical size of the key, % represents the remainder operation, X' represents the desensitized data, f(S) represents a key mapping function, and N represents the length of the original data;
when the data type of the data accessed by the low-level internal user is text type data, desensitizing the text type data by adopting a text replacement mode.
2. The dynamic data desensitization method according to claim 1, wherein when the access level of the accessing user is a client user, performing desensitization operation on the data accessed by the accessing user by using a client user desensitization model, specifically includes:
when the data type of the data accessed by the client user is non-numerical data, numerical data or text data, desensitizing the accessed data by adopting a suppression processing method;
and when the data type of the data accessed by the client user is time-class data, desensitizing the accessed data by adopting a generalization operation method.
3. The dynamic data desensitization method according to claim 2, wherein when the data type of the data accessed by the client user is non-numerical data, numerical data or text data, the desensitization operation is performed on the accessed data by using a suppression processing method, specifically including:
when the data type of the data accessed by the client user is non-numerical data, numerical data or text data, replacing a real value or a part of a real value in the non-numerical data, the numerical data or the text data by a special symbol; the special symbols include: "+", "%" or "#".
4. The dynamic data desensitization method according to claim 2, wherein when the data type of the data accessed by the client user is time-class data, a generalization operation method is used to perform desensitization operation on the accessed data, which specifically includes:
one generalization operation is to convert the original precise value in the time-class data to be desensitized into a fuzzy range value; another generalization operation is to arrange the data hierarchically from top to bottom according to the dependency relationship, retain only the values of some layers of attributes, and discard the values of the remaining attributes; in summary, the core of the generalization operation method is to reduce the precision of the data by discarding some values in the time-class data, so that the real data fluctuates within a preset range, thereby protecting sensitive information.
5. The dynamic data desensitization method according to claim 1, wherein when the access level of the access user is a high-level internal user, performing desensitization operation on the data accessed by the access user by using a high-level internal user desensitization model specifically includes:
when the data type of the data accessed by the high-level internal user is non-numerical data, desensitizing the non-numerical data by adopting a normalization method;
when the data type of the data accessed by the high-level internal user is time-class data, performing desensitization operation on the time-class data by adopting an encryption algorithm;
when the data type of the data accessed by the high-level internal user is numerical data, performing desensitization operation on the numerical data by adopting the formula X' = X · S' or X' = X · f(S); wherein X represents the original data, S' represents the numerical size of the key, X' represents the desensitized data, and f(S) represents a key mapping function;
and when the data type of the data accessed by the high-level internal user is text data, performing desensitization operation on the text data in a text replacement mode.
6. A dynamic data desensitization system, comprising:
the acquisition module is used for acquiring the identity information of the access user and the data type of the accessed data; the identity information comprises an access key; the data types include: non-numerical class data, time class data, numerical class data and text class data;
the access level determining module is used for determining the access level of the access user according to the identity information; the access levels include: a client user, a low-level internal user, and a high-level internal user; different keys correspond to different identity information and determine different access levels, so the key needs to be identified to determine which identity the access belongs to;
a desensitization model calling module used for calling a desensitization model according to the access level and the data type of the accessed data; the desensitization model includes: a client user desensitization model, a low-level internal user desensitization model, and a high-level internal user desensitization model;
the desensitization operation module is used for performing desensitization operation on the data accessed by the access user by adopting the desensitization model;
the desensitization operation module comprises:
the first desensitization operation unit is used for performing desensitization operation on the data accessed by the access user by adopting a client user desensitization model when the access level of the access user is a client user;
the second desensitization operation unit is used for performing desensitization operation on the data accessed by the access user by adopting a low-level internal user desensitization model when the access level of the access user is a low-level internal user;
the third desensitization operation unit is used for performing desensitization operation on the data accessed by the access user by adopting a high-level internal user desensitization model when the access level of the access user is a high-level internal user;
the second desensitization operating unit includes:
a first desensitization operation subunit, configured to, when the data type of the data accessed by the low-level internal user is non-numerical data, perform desensitization operation on the non-numerical data by normalizing the data and then converting it into base 36;
the second desensitization operation subunit is used for performing desensitization operation on the time-class data by adopting a method of adding noise when the data type of the data accessed by the low-level internal user is the time-class data;
a third desensitization operation subunit, configured to, when the data type of the data accessed by the low-level internal user is numerical data, perform desensitization operation on the numerical data by adopting one of two formulas (formula images DEST_PATH_IMAGE009 and DEST_PATH_IMAGE011 in the original publication; not recoverable from the text); wherein X represents the original data, S' represents the numerical size of the key, % represents the remainder operation, X' represents the desensitized data, f(S) represents a key mapping function, and N represents the length of the original data;
and the fourth desensitization operation subunit is used for performing desensitization operation on the text-type data in a text replacement mode when the data type of the data accessed by the low-level internal user is text-type data.
7. A dynamic data desensitization system, comprising:
the data source interface module is used for acquiring data to be desensitized inside an enterprise;
the sensitive information classification module is connected with the data source interface module and is used for carrying out data classification on the data to be desensitized to obtain the classified data to be desensitized; the classified data to be desensitized comprises: non-numerical data, time data, numerical data and text data;
the information anomaly analysis module is connected with the sensitive information classification module and is used for eliminating the anomalous data in the classified data to be desensitized;
the user interface module is used for acquiring the identity information of the access user; the identity information comprises an access key;
the key matching module is connected with the user interface module and used for determining the access level of the access user according to the identity information; the access levels include: client users, low-level internal users, and high-level internal users; different keys correspond to different identity information and determine different access levels, so the key needs to be identified to determine which identity the access belongs to;
the sensitive data desensitization module is respectively connected with the key matching module and the information anomaly analysis module and is used for calling different desensitization models according to the access levels so as to perform desensitization operation on the classified data to be desensitized after abnormal data are removed;
the sensitive data desensitization module comprises:
the first desensitization operation unit is used for performing desensitization operation on the data accessed by the access user by adopting a client user desensitization model when the access level of the access user is a client user;
the second desensitization operation unit is used for performing desensitization operation on the data accessed by the access user by adopting a low-level internal user desensitization model when the access level of the access user is a low-level internal user;
a third desensitization operation unit, configured to perform desensitization operation on the data accessed by the access user by using a high-level internal user desensitization model when the access level of the access user is a high-level internal user;
the second desensitization operating unit includes:
the first desensitization operation subunit is used for, when the data type of the data accessed by the low-level internal user is non-numerical data, performing desensitization operation on the non-numerical data by normalizing the data and then converting it into base 36;
the second desensitization operation subunit is used for performing desensitization operation on the time-class data by adopting a method of adding noise when the data type of the data accessed by the low-level internal user is the time-class data;
a third desensitization operation subunit, configured to, when the data type of the data accessed by the low-level internal user is numerical data, perform desensitization operation on the numerical data by adopting one of two formulas (formula images DEST_PATH_IMAGE009A and DEST_PATH_IMAGE011A in the original publication; not recoverable from the text); wherein X represents the original data, S' represents the numerical size of the key, % represents the remainder operation, X' represents the desensitized data, f(S) represents a key mapping function, and N represents the length of the original data;
and the fourth desensitization operation subunit is used for performing desensitization operation on the text data in a text replacement mode when the data type of the data accessed by the low-level internal user is the text data.
CN202011535750.0A 2020-12-23 2020-12-23 Dynamic data desensitization method and system Active CN112541196B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011535750.0A CN112541196B (en) 2020-12-23 2020-12-23 Dynamic data desensitization method and system


Publications (2)

Publication Number Publication Date
CN112541196A CN112541196A (en) 2021-03-23
CN112541196B true CN112541196B (en) 2022-10-21

Family

ID=75017608


Country Status (1)

Country Link
CN (1) CN112541196B (en)



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant