CN112541196A - Dynamic data desensitization method and system - Google Patents


Info

Publication number
CN112541196A
Authority
CN
China
Legal status
Granted
Application number
CN202011535750.0A
Other languages
Chinese (zh)
Other versions
CN112541196B (en)
Inventor
柴森春
王昭洋
唐嘉
崔灵果
李慧芳
姚分喜
张百海
Current Assignee
Beijing Institute of Technology BIT
Original Assignee
Beijing Institute of Technology BIT
Application filed by Beijing Institute of Technology BIT
Priority to CN202011535750.0A
Publication of CN112541196A
Application granted
Publication of CN112541196B
Legal status: Active


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6245 Protecting personal data, e.g. for financial or medical purposes
    • G06F21/30 Authentication, i.e. establishing the identity or authorisation of security principals
    • G06F21/31 User authentication

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Computer Hardware Design (AREA)
  • Health & Medical Sciences (AREA)
  • Bioethics (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Storage Device Security (AREA)

Abstract

The invention relates to a dynamic data desensitization method and system. The method desensitizes different types of data in real time according to the data access requirements and the identity of the visitor. Previous desensitization systems did not do the following, but this system does: it sets a separate desensitization model for each data type and classifies visitor information, so that each visitor's access level can be determined from the visitor's key and identity. Different access levels receive different degrees of desensitization. This enforces per-visitor access control over sensitive data and improves its security.

Description

Dynamic data desensitization method and system
Technical Field
The invention relates to the technical field of data processing, in particular to a dynamic data desensitization method and a dynamic data desensitization system.
Background
With the rapid progress of informatization, most paper records are now stored digitally. As an enterprise's internal databases keep growing, they accumulate a large amount of sensitive information. Manufacturing execution systems (MES) emphasize interconnection and information sharing across industries, so this process inevitably involves sensitive internal enterprise information. Such data runs through every enterprise's daily operations. If it is lost, misused, or accessed or modified without authorization, the enterprise can suffer heavy losses.
Sensitive enterprise data includes trade secrets, intellectual property, key business information, business-partner information and user information. If it is leaked or damaged, the enterprise suffers serious economic loss and also lasting damage to its reputation.
Data desensitization is a technique that processes sensitive information by replacing or transforming it. The processed data still looks realistic, but it exposes no sensitive information and is of no value to anyone seeking to abuse it.
Data desensitization falls into two categories: static and dynamic. Static data desensitization is the traditional approach. The system exports data from the original database in one pass and desensitizes it. The desensitized data can then be exported to files or stored in a mirror database for test development or external release. Dynamic data desensitization does not store desensitized data. Instead, the system desensitizes data in real time according to the data access requirements and the identity of the visitor.
Static data desensitization is effective and suitable for test development, but it has several drawbacks. In the big data era, data volumes grow rapidly and are increasingly processed with stream computing, and traditional static desensitization cannot keep up with application testing and development requirements. In production environments, growing data volumes also make the mirrored desensitized database harder and harder to maintain.
Dynamic data desensitization was proposed mainly because static desensitization adapts poorly to growing and changing data and is slow to update. Dynamic desensitization can define rules and strategies for different data types. In principle, it can also apply different desensitization granularities to different visitor identities and so control access to sensitive data. In the prior art, however, dynamic desensitization applies the same granularity to every visitor, which makes it difficult to differentiate access between visitors.
A technical problem to be solved in this field is therefore to provide a new dynamic desensitization method or system that improves the security of enterprises' sensitive information.
Disclosure of Invention
In order to solve the problems in the prior art, the invention provides an MES-oriented dynamic data desensitization method and system for improving the security of sensitive information of enterprises.
In order to achieve the purpose, the invention provides the following scheme:
a method of dynamic data desensitization, comprising:
acquiring identity information of an access user and a data type of accessed data; the identity information comprises an access key; the data types include: non-numerical data, time data, numerical data and text data;
determining the access level of the access user according to the identity information; the access levels include: a client user, a low-level internal user, and a high-level internal user;
calling a desensitization model according to the access level and the data type of the accessed data; the desensitization model includes: a client user desensitization model, a low-level internal user desensitization model, and a high-level internal user desensitization model;
and performing desensitization operation on the data accessed by the access user by adopting the desensitization model.
Preferably, the desensitizing operation performed on the data accessed by the access user by using the desensitizing model specifically includes:
when the access level of the access user is a client user, performing desensitization operation on the data accessed by the access user by adopting a client user desensitization model;
when the access level of the access user is a low-level internal user, performing desensitization operation on data accessed by the access user by adopting a low-level internal user desensitization model;
and when the access level of the access user is a high-level internal user, performing desensitization operation on the data accessed by the access user by adopting a high-level internal user desensitization model.
Preferably, when the access level of the access user is a client user, performing desensitization operation on the data accessed by the access user by using a client user desensitization model specifically includes:
when the data type of the data accessed by the client user is non-numerical data, numerical data or text data, desensitizing the accessed data by adopting a suppression processing method;
and when the data type of the data accessed by the client user is time-class data, performing desensitization operation on the accessed data by adopting a generalization operation method.
Preferably, when the data type of the data accessed by the client user is non-numerical data, numerical data or text data, the desensitization operation is performed on the accessed data by using a method of suppression processing, which specifically includes:
when the data type of the data accessed by the client user is non-numerical data, numerical data or text data, replacing the data at a specific position in the non-numerical data, the numerical data or the text data by a special symbol; the special symbols include: "+", "%" or "#".
Preferably, when the data type of the data accessed by the client user is time-class data, desensitizing the accessed data by using a generalization operation method specifically includes:
when the data type of the data accessed by the client user is time class data, the data at a specific position in the time class data is discarded.
Preferably, when the access level of the access user is a low-level internal user, performing desensitization operation on the data accessed by the access user by using a low-level internal user desensitization model specifically includes:
when the data type of the data accessed by the low-level internal user is non-numerical data, desensitizing the non-numerical data by adopting a normalization method;
when the data type of the data accessed by the low-level internal user is time-class data, desensitizing the time-class data by adopting a noise adding method;
when the data type of the data accessed by the low-level internal user is numerical data, desensitizing the numerical data with the formula $X' = (X \cdot S') \,\%\, 10^{N}$ or $X' = [X \cdot f(S)] \,\%\, 10^{N}$, where $X$ represents the original data, $S'$ represents the numerical value of the key, $\%$ represents the remainder operation, $X'$ represents the desensitized data, $f(S)$ represents the key mapping function, and $N$ represents the length of the original data;
when the data type of the data accessed by the low-level internal user is text type data, desensitizing the text type data by adopting a text replacement mode.
Preferably, when the access level of the access user is a high-level internal user, performing desensitization operation on data accessed by the access user by using a high-level internal user desensitization model specifically includes:
when the data type of the data accessed by the high-level internal user is non-numerical data, desensitizing the non-numerical data by adopting a normalization method;
when the data type of the data accessed by the high-level internal user is time-class data, performing desensitization operation on the time-class data by adopting an encryption algorithm;
when the data type of the data accessed by the high-level internal user is numerical data, desensitizing the numerical data with the formula $X' = X \cdot S'$ or $X' = X \cdot f(S)$, where $X$ represents the original data, $S'$ represents the numerical value of the key, $X'$ represents the desensitized data, and $f(S)$ represents the key mapping function;
and when the data type of the data accessed by the high-level internal user is text data, performing desensitization operation on the text data in a text replacement mode.
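The claims above reduce to a lookup from an (access level, data type) pair to a desensitization method. The following is a minimal Python sketch of that lookup. The enum names and method labels are illustrative. `KEY_REGISTRY` is a hypothetical stand-in for the key-matching step, which the text does not specify.

```python
from enum import Enum


class AccessLevel(Enum):
    CLIENT = "client user"
    LOW_INTERNAL = "low-level internal user"
    HIGH_INTERNAL = "high-level internal user"


class DataType(Enum):
    NON_NUMERICAL = "non-numerical"
    TIME = "time"
    NUMERICAL = "numerical"
    TEXT = "text"


# Hypothetical key registry: the patent does not say how an access key is matched to a level.
KEY_REGISTRY = {
    "client-key": AccessLevel.CLIENT,
    "staff-key": AccessLevel.LOW_INTERNAL,
    "admin-key": AccessLevel.HIGH_INTERNAL,
}

# Desensitization method per (access level, data type), as listed in the claims.
MODELS = {
    AccessLevel.CLIENT: {
        DataType.NON_NUMERICAL: "suppression",
        DataType.TIME: "generalization",
        DataType.NUMERICAL: "suppression",
        DataType.TEXT: "suppression",
    },
    AccessLevel.LOW_INTERNAL: {
        DataType.NON_NUMERICAL: "normalization",
        DataType.TIME: "noise addition",
        DataType.NUMERICAL: "(X * S') % 10^N",
        DataType.TEXT: "text replacement",
    },
    AccessLevel.HIGH_INTERNAL: {
        DataType.NON_NUMERICAL: "normalization",
        DataType.TIME: "encryption",
        DataType.NUMERICAL: "X * S'",
        DataType.TEXT: "text replacement",
    },
}


def select_model(access_key: str, data_type: DataType) -> str:
    """Determine the access level from the key, then pick the method for the data type."""
    level = KEY_REGISTRY[access_key]  # raises KeyError for an unknown key
    return MODELS[level][data_type]
```

For example, `select_model("client-key", DataType.TIME)` returns `"generalization"`. Keeping the table as data lets the level-to-method policy change without touching the desensitization routines themselves.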
According to the specific embodiment provided by the invention, the invention discloses the following technical effects:
According to the dynamic data desensitization method provided by the invention, different types of data are desensitized in real time according to the data access requirements and the identity of the visitor. Previous desensitization systems did not do the following: different desensitization models are set for different data types, and visitor information is classified, so the access level of each visitor can be determined from the visitor's key and identity. Different access levels receive different degrees of desensitization. This enforces per-visitor access control over sensitive data and thus improves its security.
Corresponding to the dynamic data desensitization method, the invention also provides two dynamic data desensitization systems.
A dynamic data desensitization system, comprising:
the acquisition module is used for acquiring the identity information of the access user and the data type of the accessed data; the identity information comprises an access key; the data types include: non-numerical data, time data, numerical data and text data;
the access level determining module is used for determining the access level of the access user according to the identity information; the access levels include: a client user, a low-level internal user, and a high-level internal user;
the desensitization model calling module is used for calling a desensitization model according to the access level and the data type of the accessed data; the desensitization model includes: a client user desensitization model, a low-level internal user desensitization model, and a high-level internal user desensitization model;
and the desensitization operation module is used for performing desensitization operation on the data accessed by the access user by adopting the desensitization model.
Another dynamic data desensitization system, comprising:
the data source interface module is used for acquiring data to be desensitized in the enterprise;
the sensitive information classification module is connected with the data source interface module and is used for carrying out data classification on the data to be desensitized to obtain the classified data to be desensitized; the classified data to be desensitized comprises: non-numerical data, time data, numerical data and text data;
the information anomaly analysis module is connected with the sensitive information classification module and is used for eliminating the anomalous data in the classified data to be desensitized;
the user interface module is used for acquiring the identity information of the access user; the identity information comprises an access key;
the key matching module is connected with the user interface module and used for determining the access level of the access user according to the identity information; the access levels include: a client user, a low-level internal user, and a high-level internal user;
and the sensitive data desensitization module is respectively connected with the key matching module and the information anomaly analysis module and is used for calling different desensitization models according to the access levels so as to perform desensitization operation on the classified data to be desensitized after abnormal data are removed.
The technical effects and purposes of the dynamic data desensitization system provided by the invention are the same as those of the dynamic data desensitization method provided by the invention, and therefore, the details are not repeated.
Drawings
To describe the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention, and those skilled in the art can derive other drawings from them without inventive effort.
FIG. 1 is a flow chart of a method of dynamic data desensitization provided by the present invention;
FIG. 2 is a schematic structural diagram of a first dynamic data desensitization system provided by the present invention;
FIG. 3 is a schematic diagram of a second dynamic data desensitization system provided by the present invention;
FIG. 4 is a block diagram of a process for performing data desensitization by the dynamic data desensitization system in an embodiment of the present invention;
FIG. 5 is a schematic diagram of the base-36 conversion process in an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments that a person skilled in the art obtains from these embodiments without creative effort fall within the protection scope of the present invention.
The invention aims to provide a dynamic data desensitization method and a dynamic data desensitization system to improve the security of sensitive information of an enterprise.
To make the above objects, features and advantages of the present invention easier to understand, the invention is described in further detail below with reference to the drawings and specific embodiments.
Fig. 1 is a flowchart of a dynamic data desensitization method provided by the present invention, and as shown in fig. 1, a dynamic data desensitization method includes:
Step 100: obtain the identity information of the accessing user and the data type of the accessed data. The identity information includes an access key. The data types include non-numerical data, time data, numerical data and text data.
Step 110: determine the access level of the accessing user from the identity information. The access levels include client user, low-level internal user and high-level internal user.
Step 120: invoke a desensitization model according to the access level and the data type of the accessed data. The desensitization models include a client user desensitization model, a low-level internal user desensitization model and a high-level internal user desensitization model.
Step 130: desensitize the data accessed by the accessing user with the invoked desensitization model. This step specifically comprises the following sub-steps:
Step 1301: when the access level of the accessing user is client user, the data accessed by the user is desensitized with the client user desensitization model, which specifically comprises the following steps:
when the data type of the data accessed by the client user is non-numerical data, numerical data or text data, desensitizing the accessed data by adopting a suppression processing method, specifically:
when the data type of the data accessed by the client user is non-numerical data, numerical data or text data, special symbols replace the data at specific positions in that data. The special symbols include "+", "%" or "#". Suppression protects the data by replacing the real values, or parts of them, with special symbols. For example, some values (or text) are replaced with "+" so that the real data cannot be seen. The data length stays the same. Only specific positions are desensitized, so the rest of the data can be retained.
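The suppression step can be sketched as a length-preserving mask over one position range. This is a minimal Python illustration. The function name and the choice to mask a single contiguous range `[start, end)` are assumptions, because the text does not say which positions are masked.

```python
def suppress(value: str, start: int, end: int, symbol: str = "+") -> str:
    """Replace value[start:end] with `symbol`, keeping the data length unchanged."""
    if symbol not in {"+", "%", "#"}:
        raise ValueError("symbol must be one of '+', '%', '#'")
    if not 0 <= start <= end <= len(value):
        raise ValueError("masked range must lie inside the value")
    # Characters outside the masked range are retained as-is.
    return value[:start] + symbol * (end - start) + value[end:]
```

For example, `suppress("13812345678", 3, 7)` yields `"138++++5678"`, and the result keeps the original 11 characters.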
When the data type of the data accessed by the client user is time data, desensitizing operation is performed on the accessed data by adopting a generalization operation method, which specifically comprises the following steps:
One generalization operation converts the precise original value of the time to be desensitized into a range or fuzzy value. Another arranges the data hierarchically from top to bottom by membership, keeps only the attribute values at certain levels, and truncates the rest. In both cases, the core of generalization is to reduce the precision of the data by discarding certain values, so that the real data can lie anywhere within a range and the sensitive information is protected. Time-type sensitive data has an obvious hierarchy: year at the top, followed by month, day and time of day. For this type of data, only the year and month are retained, and the specific day and time of day are omitted.
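The hierarchical form of generalization for time data can be sketched in a few lines of Python. The output only keeps the top two levels, year and month. The `"YYYY-MM"` output format and the function name are illustrative assumptions.

```python
from datetime import datetime


def generalize_time(ts: datetime) -> str:
    """Keep year and month (the top two levels); drop day and time of day."""
    return f"{ts.year:04d}-{ts.month:02d}"
```

Every timestamp within the same month maps to the same output. The result therefore only identifies a range of roughly 30 days rather than a specific moment.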
Step 1302: when the access level of the access user is a low-level internal user, performing desensitization operation on data accessed by the access user by adopting a low-level internal user desensitization model, which specifically comprises the following steps:
when the data type of the data accessed by the low-level internal user is non-numerical data, the non-numerical data is desensitized by normalization, specifically: the data is scaled into a small specified interval $[X_{\min}, X_{\max}]$. This removes the units of the data and converts it into a dimensionless pure number that is easier to desensitize further. The main purpose is to keep the length characteristics of the desensitized data consistent with those of the original data. With the length of the data denoted $N$, the conversion formula is:
(The conversion formula appears only as an image in the original publication.)
Note that if the number of digits of the data must be preserved, the data must be converted into the interval that corresponds to its length, as shown in Table 1-1.
TABLE 1-1
Original data length | 1 | 2 | k | n
Conversion interval | $[0, 36)$ | $[36, 36^{2})$ | $[36^{k-1}, 36^{k})$ | $[36^{n-1}, 36^{n})$
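The conversion formula survives only as an image, so the following is a minimal Python sketch under a stated assumption: plain min-max scaling onto the integer interval that Table 1-1 assigns to length $N$. The function names and the integer floor rounding are illustrative choices and are not taken from the patent.

```python
def target_interval(n: int) -> tuple[int, int]:
    """Conversion interval for data length n per Table 1-1: [0, 36) for n = 1, else [36^(n-1), 36^n)."""
    if n < 1:
        raise ValueError("data length must be at least 1")
    low = 0 if n == 1 else 36 ** (n - 1)
    return low, 36 ** n


def normalize(x: int, x_min: int, x_max: int, n: int) -> int:
    """Assumed min-max scaling of x from [x_min, x_max] onto the integers of the length-n interval."""
    if not x_min <= x <= x_max or x_min == x_max:
        raise ValueError("x must lie in a non-degenerate range [x_min, x_max]")
    low, high = target_interval(n)
    # Floor division keeps the result an exact integer in [low, high - 1].
    return low + (x - x_min) * (high - 1 - low) // (x_max - x_min)
```

The length `n` is passed explicitly. A fixed-length identifier such as `"0123"` can therefore be read as the integer 123 with `n=4` and still land in the 4-digit base-36 interval.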
The normalized value is then converted into a base-36 number. Divide the value by 36 to obtain a quotient and a remainder, then divide the quotient by 36 to obtain another quotient and remainder. Continue until the quotient is 0. Reading all remainders in reverse order gives the base-36 number; FIG. 5 shows the calculation process. In addition, each key corresponds to one confusion table, while one confusion table may correspond to several keys. Each confusion table maps every remainder (0 to 35) to one of the characters 0-9 and A-Z, which yields the converted base-36 number $A_1A_2\ldots A_n$. Table 1-2 shows one such table.
Table 1-2
Remainder | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Character | P | T | H | X | 5 | C | D | 4 | G | U
Remainder | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19
Character | I | 3 | K | B | 2 | R | V | O | W | 8
Remainder | 20 | 21 | 22 | 23 | 24 | 25 | 26 | 27 | 28 | 29
Character | 7 | J | A | 9 | M | 1 | S | F | 0 | Y
Remainder | 30 | 31 | 32 | 33 | 34 | 35
Character | N | Z | L | Q | 6 | E
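The repeated division by 36 and the confusion-table lookup can be sketched in Python as shown below. The table string is Table 1-2 read in remainder order. The function name is illustrative. The sketch uses only this one table, whereas the text selects a table per key.

```python
# Table 1-2 in remainder order: CONFUSION_TABLE[r] is the character for remainder r.
CONFUSION_TABLE = "PTHX5CD4GUI3KB2RVOW87JA9M1SF0YNZLQ6E"


def to_base36(value: int, table: str = CONFUSION_TABLE) -> str:
    """Divide by 36 until the quotient is 0, map each remainder through the table, reverse."""
    if value < 0:
        raise ValueError("value must be non-negative")
    if len(table) != 36 or len(set(table)) != 36:
        raise ValueError("confusion table must contain 36 distinct characters")
    if value == 0:
        return table[0]
    chars = []
    while value > 0:
        value, remainder = divmod(value, 36)
        chars.append(table[remainder])
    return "".join(reversed(chars))
```

For example, $1296 = 36^2$ has base-36 digits 1, 0, 0, which Table 1-2 maps to `"TPP"`. Every value in the length-$n$ interval of Table 1-1 produces exactly $n$ characters.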
The ASCII code of each character of the resulting base-36 number is then taken modulo 10. This makes the transformation irreversible while preserving the length characteristics of the original data. The ASCII conversion can be expressed as:
$X_i' = \mathrm{ASCII}(A_i) \bmod 10$
where $\mathrm{ASCII}(\cdot)$ denotes the ASCII code of the character in position $i$.
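The formula $X_i' = \mathrm{ASCII}(A_i) \bmod 10$ applies character by character. A one-function Python sketch follows; the function name is illustrative.

```python
def ascii_mod10(base36_digits: str) -> str:
    """Apply X_i' = ASCII(A_i) mod 10 to every character; output length equals input length."""
    return "".join(str(ord(ch) % 10) for ch in base36_digits)
```

Continuing the example above, `"TPP"` (ASCII 84, 80, 80) becomes `"400"`. Many characters share the same last digit, so the original base-36 string cannot be recovered from the output.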
When the data type of the data accessed by the low-level internal user is time class data, desensitizing the time class data by adopting a noise adding method.
Desensitizing time data requires irreversible processing, so a noise-adding method is chosen. The components of time data also have different ranges: 12 months in a year, 30 or 31 days in a month, 60 minutes in an hour and 60 seconds in a minute. The time data therefore has to be converted into a uniform representation first. The invention converts it into timestamp format, defined as the total number of seconds elapsed since 00:00:00 GMT on 1 January 1970, which is widely used in data processing. The specific process is as follows:
the time data is converted into timestamp format, denoted $F$.
A perturbation value is generated from the key so that the time data fluctuates, using the desensitization formula $F' = F \pm [S' \,\%\, (M \times 3600 \times 24)]$. Here $F'$ is the desensitized timestamp, $S'$ is the numerical value of the key, and $M$ is a number of days. The remainder is a time shift of less than $M$ days, so the desensitized date lies within $M$ days of the original date.
Finally, the timestamp is converted back into the original time data format.
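The three noise-adding steps map directly onto Python's `datetime`. This is a minimal sketch under two assumptions. First, it takes the key's numeric value $S'$ as an integer that has already been derived. Second, the caller chooses the sign of $\pm$ through the hypothetical `subtract` flag, because the text does not say how the sign is chosen. Inputs should be timezone-aware, since naive datetimes are interpreted as local time by `timestamp()`.

```python
from datetime import datetime, timezone

SECONDS_PER_DAY = 3600 * 24


def add_time_noise(t: datetime, key_value: int, m_days: int, subtract: bool = False) -> datetime:
    """F' = F +/- [S' % (M * 3600 * 24)], then convert back to a datetime (UTC)."""
    f = int(t.timestamp())                              # step 1: timestamp F
    shift = key_value % (m_days * SECONDS_PER_DAY)      # step 2: shift of less than M days
    f_new = f - shift if subtract else f + shift
    return datetime.fromtimestamp(f_new, tz=timezone.utc)  # step 3: back to time format
```

For example, with $S' = 100000$ and $M = 3$, midnight UTC on 1 January 2020 shifts forward by 100000 seconds to 03:46:40 on 2 January 2020.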
When the data type of the data accessed by the low-level internal user is numerical data, the numerical data is desensitized with the formula $X' = (X \cdot S') \,\%\, 10^{N}$ or $X' = [X \cdot f(S)] \,\%\, 10^{N}$. Here $X$ represents the original data, $S'$ the numerical value of the key, $\%$ the remainder operation, $X'$ the desensitized data, $f(S)$ the key mapping function, and $N$ the length of the original data.
Desensitizing numerical data also requires irreversible processing. The method combines the key with the value range $[X_{\min}, X_{\max}]$ of the numerical data, and two cases must be distinguished:
For numerical data without negative values, the desensitization formula is:
(Formula shown as an image in the original publication.)
$X' = (X \cdot S') \,\%\, 10^{N}$ or $X' = [X \cdot f(S)] \,\%\, 10^{N}$
For numerical data containing negative values, the desensitization formula is:
(Formula shown as an image in the original publication.)
$X' = (X \cdot S') \,\%\, 10^{N}$ or $X' = [X \cdot f(S)] \,\%\, 10^{N}$
where $X$ represents the original data, $S'$ represents the numerical value of the key, $\%$ represents the remainder operation, and $X'$ represents the desensitized data.
Note that the key may contain letters by design. For numerical operations, the ASCII code of each character in the key is therefore taken by default and reduced modulo 10:
$S_i' = \mathrm{ASCII}(S_i) \bmod 10$
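Key derivation and the non-negative case of $X' = (X \cdot S') \,\%\, 10^{N}$ can be sketched together in Python under a few assumptions. The digits $S_i'$ are concatenated in key order to form $S'$. $N$ is the number of decimal digits of $X$. The result is zero-padded back to $N$ digits. The negative-value case is rejected because its formula survives only as an image. The function names are illustrative.

```python
def key_value(key: str) -> int:
    """S' built from S_i' = ASCII(S_i) mod 10, assuming the digits are concatenated in key order."""
    if not key:
        raise ValueError("key must not be empty")
    return int("".join(str(ord(ch) % 10) for ch in key))


def desensitize_numeric_low(x: int, key: str) -> str:
    """X' = (X * S') % 10^N for non-negative X, zero-padded to the original length N."""
    if x < 0:
        raise ValueError("the negative-value formula is not recoverable from the source")
    n = len(str(x))
    result = (x * key_value(key)) % 10 ** n
    return str(result).zfill(n)
```

For example, the key `"a1"` gives $S' = 79$ (ASCII 97 and 49), and $1234 \cdot 79 = 97486$, so the output is `"7486"`. If $S'$ shares no factor with 10, multiplication modulo $10^N$ is a bijection. Anyone who knows $S'$ can then invert it, so irreversibility depends on keeping the key secret.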
When the data type of the data accessed by the low-level internal user is text data, the text data is desensitized by text replacement.
The text replacement is performed as follows:
Obtain the initial letter of each item in the text data and look it up in a dictionary in which the Chinese characters under each initial letter are sorted and numbered. The dictionary can be an existing electronic dictionary or one built manually for the actual requirements.
Apply a functional operation to the key to generate a set of random numbers. For each initial letter, the Chinese character whose number corresponds to the random number is then selected.
Arrange the selected Chinese characters in the order of the initial letters of the original text data, which completes the desensitization of the text.
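The dictionary-based replacement can be sketched in Python as follows. Several parts are assumptions. The tiny `DICTIONARY` (pinyin initial mapped to numbered characters) and the reverse `INITIAL_OF` lookup are hypothetical stand-ins for a real electronic dictionary and pinyin converter. The unspecified "functional operation" on the key is modelled as seeding `random.Random` with a SHA-256 digest of the key.

```python
import hashlib
import random

# Hypothetical dictionary: pinyin initial -> characters, numbered by list position.
DICTIONARY = {
    "w": ["王", "吴", "魏", "汪"],
    "z": ["张", "赵", "周", "郑"],
    "l": ["李", "刘", "林", "罗"],
}
# Hypothetical reverse lookup from character to initial (a real system would use a pinyin library).
INITIAL_OF = {ch: ini for ini, chars in DICTIONARY.items() for ch in chars}


def key_random_numbers(key: str, count: int) -> list[int]:
    """Assumed key function: a PRNG seeded from the key's SHA-256 digest."""
    seed = int.from_bytes(hashlib.sha256(key.encode("utf-8")).digest(), "big")
    rng = random.Random(seed)
    return [rng.randrange(1 << 30) for _ in range(count)]


def replace_text(text: str, key: str) -> str:
    """Replace each character with a key-selected character sharing its initial, in original order."""
    initials = [INITIAL_OF[ch] for ch in text]  # KeyError for characters outside the dictionary
    numbers = key_random_numbers(key, len(initials))
    out = []
    for ini, r in zip(initials, numbers):
        candidates = DICTIONARY[ini]
        out.append(candidates[r % len(candidates)])
    return "".join(out)
```

The same key always gives the same replacement, and each output character keeps the initial letter of the character it replaces.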
Step 1303: when the access level of the accessing user is high-level internal user, the data accessed by the user is desensitized with the high-level internal user desensitization model.
And when the data type of the data accessed by the high-level internal user is non-numerical data, performing desensitization operation on the non-numerical data by adopting a normalization method.
The desensitization operation process is the same as the desensitization operation method performed when the data type of the data accessed by the low-level internal user is non-numerical data, and details are not repeated here.
When the data type of the data accessed by the high-level internal user is time-class data, performing desensitization operation on the time-class data by adopting an encryption algorithm, specifically:
the time data is converted into a time stamp format.
The timestamp is multiplied by the numerical value of the key or by a mapping function of the key:
$F' = F \cdot S'$ or $F' = F \cdot f(S)$
Normalization then remaps the result into a valid timestamp, which completes the encryption.
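The text does not define the normalization that remaps $F \cdot S'$ into a valid timestamp. The following Python sketch assumes reduction modulo a hypothetical upper bound `T_MAX` (1 January 2100, UTC), which keeps results between the 1970 epoch and that bound. The key value $S'$ is taken as an already-derived integer.

```python
from datetime import datetime, timezone

# Hypothetical valid-timestamp range used for normalization: [epoch, 2100-01-01 UTC).
T_MAX = int(datetime(2100, 1, 1, tzinfo=timezone.utc).timestamp())


def encrypt_time(t: datetime, key_value: int) -> datetime:
    """F' = F * S', then an assumed modulo normalization back into [0, T_MAX)."""
    f = int(t.timestamp())       # timestamp F (use timezone-aware inputs)
    product = f * key_value      # F' = F * S'
    f_norm = product % T_MAX     # assumed normalization into a valid timestamp
    return datetime.fromtimestamp(f_norm, tz=timezone.utc)
```

Unlike the low-level noise method, the result is not close to the original date. For instance, midnight on 1 January 2020 with $S' = 7$ lands at timestamp 2839968000.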
When the data type of the data accessed by the high-level internal user is numerical data, the values are encrypted with the desensitization formula
$X' = X \cdot S'$ or $X' = X \cdot f(S)$
where $X$ represents the original data, $S'$ represents the numerical value of the key, $X'$ represents the desensitized data, and $f(S)$ represents a mapping function designed for the key. As before, the key may contain letters. For numerical operations, the ASCII code of each character of the key is therefore taken by default and reduced modulo 10; alternatively, a mapping function is designed for the key.
Finally, normalization remaps the data into the valid value domain, which completes the encryption.
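The normalization "into the valid value domain" is not specified. This Python sketch assumes a wrap-around into a caller-supplied range `[x_min, x_max]` after $X' = X \cdot S'$. $S'$ is built from the key exactly as in the low-level case, with the digits concatenated in key order (an assumption). The function names are illustrative.

```python
def key_value(key: str) -> int:
    """S' built from S_i' = ASCII(S_i) mod 10, assuming the digits are concatenated in key order."""
    if not key:
        raise ValueError("key must not be empty")
    return int("".join(str(ord(ch) % 10) for ch in key))


def encrypt_numeric_high(x: int, key: str, x_min: int, x_max: int) -> int:
    """X' = X * S', then an assumed wrap-around normalization into [x_min, x_max]."""
    if x_min > x_max:
        raise ValueError("x_min must not exceed x_max")
    product = x * key_value(key)
    width = x_max - x_min + 1
    # Python's % is non-negative for a positive width, so negatives also land in range.
    return x_min + (product - x_min) % width
```

With the key `"a1"` ($S' = 79$) and a valid domain of $[0, 5000]$, the value 1234 becomes $97486 \bmod 5001 = 2467$.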
When the data type of the data accessed by the high-level internal user is text data, desensitizing the text data by adopting a text replacement mode, specifically:
a lookup table is established that maps the initial letters of the text data to Chinese characters. Each letter corresponds to one Chinese character. Different keys have different lookup tables, and one lookup table may correspond to several keys.
The original data is desensitized by looking up each initial letter in this table.
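The high-level variant differs from the low-level one in one respect: each key fixes a single character per initial letter, rather than drawing a fresh random number for every position. This is a minimal Python sketch built on assumptions. `CANDIDATES` is a hypothetical candidate pool. Deriving the table from a SHA-256-seeded `random.Random` is an assumed construction. The input is the already-extracted string of initial letters.

```python
import hashlib
import random

# Hypothetical candidate characters per initial letter (a real system would use a full dictionary).
CANDIDATES = {
    "l": ["李", "刘", "林", "罗"],
    "w": ["王", "吴", "魏", "汪"],
    "z": ["张", "赵", "周", "郑"],
}


def build_lookup_table(key: str) -> dict[str, str]:
    """Derive one character per initial letter from the key; different keys give different tables."""
    seed = int.from_bytes(hashlib.sha256(key.encode("utf-8")).digest(), "big")
    rng = random.Random(seed)
    return {letter: rng.choice(CANDIDATES[letter]) for letter in sorted(CANDIDATES)}


def desensitize_initials(initials: str, key: str) -> str:
    """Map each initial letter through the key's lookup table."""
    table = build_lookup_table(key)
    return "".join(table[letter] for letter in initials)
```

The table is fixed per key, so repeated initials always map to the same character. Two keys can also produce the same table, which matches the statement that one table may serve several keys.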
The non-numeric data mentioned above is, in essence, non-numeric fixed-length sensitive data. Such data always keeps a specific length and is tied to particular information rather than representing a numerical value, so leading zeros must be preserved and cannot be dropped. Examples include ID numbers, telephone numbers, and order numbers.
Time-class data generally includes customer order times, customer birth dates, and similar values. Such data carries a specific meaning: it denotes a particular month, day, minute and second.
Numerical data may include the composition ratios of certain products, which can be confidential to an enterprise. This data must therefore be modified before it is shared, so that sharing information does not leak the values and cause major economic loss to the enterprise.
The present invention therefore provides different desensitization procedures for different data types and different access levels.
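For time-class data, claims 5 and 6 describe generalization (discarding the data at specific positions) for client users and noise addition for low-level internal users. The sketch below illustrates both under assumed parameters: the generalized form keeps only year and month, and the noise is a uniform random offset of up to `max_minutes` minutes. The function names and both choices are hypothetical.

```python
import random
from datetime import datetime, timedelta
from typing import Optional


def generalize_time(ts: datetime) -> str:
    """Client model: keep year and month, discard day and clock time (assumed cut)."""
    return ts.strftime("%Y-%m")


def add_time_noise(ts: datetime, max_minutes: int = 60,
                   rng: Optional[random.Random] = None) -> datetime:
    """Low-level model: shift the timestamp by a random offset in [-max, +max] minutes."""
    if rng is None:
        rng = random.Random()
    return ts + timedelta(minutes=rng.randint(-max_minutes, max_minutes))
```

Passing a seeded `random.Random` makes the noise reproducible, which can help when testing the desensitization pipeline.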
In summary, the dynamic data desensitization method of the invention desensitizes different types of data in real time according to the data access request and the identity of the visitor. The dynamic desensitization system differs from earlier desensitization systems in that a separate desensitization model is set up for each data type and visitors are divided into specific categories. The access level is determined from each visitor's key and identity, and a different degree of desensitization is applied at each access level. This enforces per-visitor access control over sensitive data and thereby improves its security.
Corresponding to the dynamic data desensitization method described above, the invention further provides two dynamic data desensitization systems.
The first dynamic data desensitization system, shown in Fig. 2, includes an acquisition module 200, an access level determination module 210, a desensitization model invocation module 220, and a desensitization operation module 230.
The acquisition module 200 is used to obtain the identity information of the accessing user and the data type of the accessed data. The identity information includes an access key. The data types include: non-numeric data, time data, numerical data, and text data.
The access level determination module 210 is used to determine the access level of the accessing user according to the identity information. The access levels include: a client user, a low-level internal user, and a high-level internal user.
The desensitization model invocation module 220 is used to invoke a desensitization model based on the access level and the data type of the accessed data. The desensitization models include: a client user desensitization model, a low-level internal user desensitization model, and a high-level internal user desensitization model.
The desensitization operation module 230 is used to desensitize the data accessed by the accessing user using the invoked desensitization model.
The implementation of this dynamic data desensitization system is the same as that of the dynamic data desensitization method provided by the invention, so it is not described again here.
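The four modules of the first system can be pictured as a lookup keyed by (access level, data type). The sketch below is a hypothetical wiring, not the patent's implementation: the key registry `KEY_LEVELS`, the rule that unknown keys fall back to the client level, and the use of full `#` masking for client users are assumptions, and the remaining models are placeholders that only report which model was selected.

```python
from enum import Enum
from typing import Callable, Dict, Tuple


class Level(Enum):
    CLIENT = "client"
    LOW_INTERNAL = "low"
    HIGH_INTERNAL = "high"


class DataType(Enum):
    NON_NUMERIC = "non_numeric"
    TIME = "time"
    NUMERIC = "numeric"
    TEXT = "text"


Model = Callable[[str], str]

# Hypothetical key registry consulted by the access level determination module.
KEY_LEVELS: Dict[str, Level] = {
    "k-low-01": Level.LOW_INTERNAL,
    "k-high-01": Level.HIGH_INTERNAL,
}


def determine_level(access_key: str) -> Level:
    """Module 210: unknown keys fall back to the most restricted level."""
    return KEY_LEVELS.get(access_key, Level.CLIENT)


def make_placeholder(level: Level, dtype: DataType) -> Model:
    """Stand-in model that only reports which (level, type) pair was selected."""
    def model(value: str) -> str:
        return f"<{level.value}/{dtype.value}>"
    return model


def mask_all(value: str) -> str:
    """Simple suppression used here for client users."""
    return "#" * len(value)


# Module 220's lookup table: one model per (access level, data type) pair.
MODELS: Dict[Tuple[Level, DataType], Model] = {
    (lvl, dt): make_placeholder(lvl, dt) for lvl in Level for dt in DataType
}
for dt in (DataType.NON_NUMERIC, DataType.NUMERIC, DataType.TEXT):
    MODELS[(Level.CLIENT, dt)] = mask_all


def desensitize(access_key: str, dtype: DataType, value: str) -> str:
    """Modules 200-230 chained: identify the user, pick the model, apply it."""
    level = determine_level(access_key)
    return MODELS[(level, dtype)](value)
```

Keeping the models in a table makes it straightforward to swap in a real model for any single (level, type) pair without touching the dispatch logic.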
The second dynamic data desensitization system, shown in Fig. 3, includes a data source interface module 300, a sensitive information classification module 310, an information anomaly analysis module 320, a user interface module 330, a key matching module 340, and a sensitive data desensitization module 350.
The data source interface module 300 is used to acquire the data to be desensitized within the enterprise. Specifically, it acquires data and text, imports the information that needs desensitization, and provides a detailed view of the data source.
The sensitive information classification module 310 is connected to the data source interface module 300 and is configured to classify the data to be desensitized, yielding the classified data to be desensitized. The classified data to be desensitized includes: non-numeric data, time data, numerical data, and text data.
The information anomaly analysis module 320 is connected to the sensitive information classification module 310 and is configured to remove anomalous data from the classified data to be desensitized. Specifically, once the information to be desensitized has been classified, each type of information has its own requirements; records with missing data or insufficient length can be moved directly into an anomalous data set.
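A minimal sketch of the anomaly filter, assuming records are dictionaries of string fields and that each field has a minimum length. The field names and the lengths in `MIN_LENGTH` are hypothetical illustrations; the patent does not give concrete values.

```python
from typing import Dict, List, Optional, Tuple

Record = Dict[str, Optional[str]]

# Hypothetical minimum lengths per field; the patent does not give values.
MIN_LENGTH: Dict[str, int] = {"phone": 11, "id_number": 18, "order_no": 10}


def is_anomalous(record: Record) -> bool:
    """A record is anomalous if a required field is missing, empty, or too short."""
    return any(
        not record.get(field) or len(record[field]) < min_len
        for field, min_len in MIN_LENGTH.items()
    )


def split_anomalies(records: List[Record]) -> Tuple[List[Record], List[Record]]:
    """Return (clean records, anomalous data set)."""
    clean = [r for r in records if not is_anomalous(r)]
    anomalous = [r for r in records if is_anomalous(r)]
    return clean, anomalous
```

Only the clean records are passed on to desensitization; the anomalous set can be reviewed separately.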
The user interface module 330 is used to obtain identity information of the accessing user. The identity information includes an access key.
The key matching module 340 is connected to the user interface module 330 and is configured to determine the access level of the accessing user according to the identity information. The access levels include: a client user, a low-level internal user, and a high-level internal user. Because different keys correspond to different identities and grant different degrees of access, the key must be identified and the access rights belonging to that identity determined.
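One way the key matching step could be realized is sketched below, assuming the system stores only SHA-256 digests of issued keys and compares them in constant time with `hmac.compare_digest`. The registry contents and level names are hypothetical, and treating any unmatched key as a client user is an assumption rather than something the text specifies.

```python
import hashlib
import hmac
from typing import Dict


def _digest(key: str) -> bytes:
    """SHA-256 digest of a key, so plaintext keys need not be stored."""
    return hashlib.sha256(key.encode("utf-8")).digest()


# Hypothetical registry: digest of each issued key -> access level.
KEY_REGISTRY: Dict[bytes, str] = {
    _digest("low-insider-key"): "low_internal",
    _digest("high-insider-key"): "high_internal",
}


def match_key(presented_key: str) -> str:
    """Return the access level bound to the key; unmatched keys are clients."""
    presented = _digest(presented_key)
    for stored, level in KEY_REGISTRY.items():
        # Constant-time comparison avoids leaking match position via timing.
        if hmac.compare_digest(stored, presented):
            return level
    return "client"
```
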
The sensitive data desensitization module 350 is connected to both the key matching module 340 and the information anomaly analysis module 320, and is configured to invoke different desensitization models according to the access level, so as to desensitize the classified data from which anomalous data has been removed. Because different identities access different data, a desensitization scheme is provided for each identity for every type of information.
As shown in Fig. 4, when a data source is imported into the database through the data source interface module 300, the sensitive information classification module 310 classifies it into non-numeric fixed-length data, time data, numeric fixed-length data, and text fixed-length data. After classification, anomalous data is removed from each type of sensitive information, which completes the data processing stage. When a user accesses a given type of data, the key matching module 340 determines that user's access rights, and the sensitive data desensitization module 350 is called, which invokes the desensitization model matching the access rights of the user requesting the sensitive information.
In this system, data access permissions are divided into three broad categories: customers (visitors), low-level insiders, and high-level insiders. Users at different levels may access data at different degrees of desensitization, and identities are distinguished by the keys bound to users. The three user types are explained below.
Customers (visitors) have the fewest access rights and do not need to process the data further. Their data is therefore desensitized most heavily and retains the least of the original data and text: sensitive characters are mostly removed outright or masked with special symbols.
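Claim 4 describes the client model as replacing the data at specific positions with a special symbol. A minimal sketch, assuming the "specific positions" are everything except a configurable number of leading and trailing characters; `suppress` and its defaults are hypothetical, and `#` is used as one of the symbols the claim lists.

```python
def suppress(value: str, keep_head: int = 3, keep_tail: int = 4, symbol: str = "#") -> str:
    """Client model: keep the first keep_head and last keep_tail characters
    and replace everything in between with `symbol`."""
    if keep_head + keep_tail >= len(value):
        return symbol * len(value)  # too short to keep anything safely
    hidden = len(value) - keep_head - keep_tail
    return value[:keep_head] + symbol * hidden + value[len(value) - keep_tail:]
```

With the defaults, an 11-digit telephone number such as `13800138000` becomes `138####8000`; the same function with `keep_head=1, keep_tail=0` masks a name down to its first letter. Because the masked length equals the original length, the output still shows the format of the field.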
Low-level insiders have broader access rights and must process the data further, for example for order downloading and order processing. These tasks need supporting data but not authentic data; fake data in the same format is sufficient. The fake data these users access must be irreversible, meaning that the original customer information cannot be derived even by someone who understands the system. Data for these users is therefore processed irreversibly, which preserves the original format features of the data while removing its authenticity, so that it can never be reversed to recover the original.
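Claim 6 gives the low-level numeric model as $X' = (X \cdot S') \,\%\, 10^{N}$, with $N$ the length of the original data. A minimal sketch, assuming the value arrives as a digit string so that $N$ counts leading zeros and the output can be zero-padded back to the same length; `fake_numeric` is a hypothetical name and `key_value` stands for $S'$ or $f(S)$.

```python
def fake_numeric(x: str, key_value: int) -> str:
    """Low-level internal user model: X' = (X * S') % 10**N, N = len(X).

    `x` is kept as a digit string so its length N (including leading zeros)
    is known; the result is zero-padded back to N digits.
    """
    if not x or not all("0" <= ch <= "9" for ch in x):
        raise ValueError("expected a non-empty string of ASCII digits")
    n = len(x)
    return str((int(x) * key_value) % 10 ** n).zfill(n)
```

For instance, `fake_numeric("000123", 77)` yields `"009471"`, keeping the six-digit format. One caveat on irreversibility: when $S'$ shares a factor with 10 (e.g. an even key), distinct inputs collide and cannot be told apart, but when $S'$ is coprime to 10 the map is a bijection on $N$-digit strings and could in principle be inverted by someone who knows $S'$, so key choice and key secrecy both matter.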
High-level insiders have the widest access rights because they may need to mine the data further, so their data is only encrypted. These users can therefore obtain the true values of the private data, but there are few such users. This type of desensitization is reversible; however, the users themselves are not given the details of the encryption mechanism, and decryption is possible only with the key.
Because different kinds of sensitive information call for different desensitization, separate processing methods are designed for the types of sensitive information that may occur in an MES system. The desensitization methods mainly cover non-numeric fixed-length sensitive data, time sensitive data, numerical sensitive data, and text sensitive data. Each piece of sensitive information is classified according to its characteristics, which are summarized in Tables 1-3:
[Tables 1-3: characteristics of each sensitive information type (image not reproduced)]
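Since the classification criteria of Tables 1-3 are not reproduced, the following is only a heuristic sketch of what the sensitive information classification step might look like. The rules are assumptions: values parseable as a date or timestamp are time data, long digit strings (optionally ending in `X`, as in ID numbers) are non-numeric fixed-length data, signed decimals or percentages are numerical data, and everything else is text.

```python
import re
from datetime import datetime

# Assumed heuristics only; these are not the criteria of Tables 1-3.
TIME_FORMATS = ("%Y-%m-%d %H:%M:%S", "%Y-%m-%d")
ID_LIKE = re.compile(r"[0-9]{6,}[0-9Xx]?")        # phone, ID, order numbers
NUMBER = re.compile(r"[+-]?[0-9]+(\.[0-9]+)?%?")  # amounts, ratios


def classify(value: str) -> str:
    """Return one of: time, non_numeric, numeric, text."""
    v = value.strip()
    for fmt in TIME_FORMATS:
        try:
            datetime.strptime(v, fmt)
            return "time"
        except ValueError:
            continue
    if ID_LIKE.fullmatch(v):
        return "non_numeric"
    if NUMBER.fullmatch(v):
        return "numeric"
    return "text"
```

Checking the fixed-length identifier pattern before the numeric pattern keeps values such as telephone numbers out of the numeric class, so their leading zeros are never lost to integer conversion.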
The embodiments in this description are described progressively; each embodiment focuses on its differences from the others, and the same or similar parts of the embodiments may be referred to one another. Since the system disclosed in the embodiments corresponds to the disclosed method, its description is relatively brief; for related details, refer to the description of the method.
The principles and embodiments of the present invention have been described herein using specific examples, which are provided only to help understand the method and core concept of the invention. A person skilled in the art may, following the idea of the invention, change the specific embodiments and the scope of application. In view of the above, this description should not be construed as limiting the invention.

Claims (9)

1. A method of dynamic data desensitization, comprising:
acquiring identity information of an access user and a data type of accessed data; the identity information comprises an access key; the data types include: non-numerical data, time data, numerical data and text data;
determining the access level of the access user according to the identity information; the access levels include: a client user, a low-level internal user, and a high-level internal user;
calling a desensitization model according to the access level and the data type of the accessed data; the desensitization model includes: a client user desensitization model, a low-level internal user desensitization model, and a high-level internal user desensitization model;
and performing desensitization operation on the data accessed by the access user by adopting the desensitization model.
2. The dynamic data desensitization method according to claim 1, wherein said applying the desensitization model to desensitize data accessed by the access user comprises:
when the access level of the access user is a client user, performing desensitization operation on the data accessed by the access user by adopting a client user desensitization model;
when the access level of the access user is a low-level internal user, performing desensitization operation on data accessed by the access user by adopting a low-level internal user desensitization model;
and when the access level of the access user is a high-level internal user, performing desensitization operation on the data accessed by the access user by adopting a high-level internal user desensitization model.
3. The dynamic data desensitization method according to claim 2, wherein when the access level of the accessing user is a client user, performing desensitization operation on the data accessed by the accessing user by using a client user desensitization model specifically includes:
when the data type of the data accessed by the client user is non-numerical data, numerical data or text data, desensitizing the accessed data by adopting a suppression processing method;
and when the data type of the data accessed by the client user is time-class data, performing desensitization operation on the accessed data by adopting a generalization operation method.
4. The dynamic data desensitization method according to claim 3, wherein when the data type of the data accessed by the client user is non-numerical data, numerical data or text data, the desensitization operation is performed on the accessed data by using a suppression processing method, specifically including:
when the data type of the data accessed by the client user is non-numerical data, numerical data or text data, replacing the data at a specific position in the non-numerical data, the numerical data or the text data by a special symbol; the special symbols include: "+", "%" or "#".
5. The dynamic data desensitization method according to claim 3, wherein when the data type of the data accessed by the client user is time-class data, a generalization operation method is used to perform desensitization operation on the accessed data, which specifically includes:
when the data type of the data accessed by the client user is time class data, the data at a specific position in the time class data is discarded.
6. The dynamic data desensitization method according to claim 2, wherein when the access level of the accessing user is a low-level internal user, performing desensitization operation on the data accessed by the accessing user by using a low-level internal user desensitization model, specifically comprising:
when the data type of the data accessed by the low-level internal user is non-numerical data, desensitizing the non-numerical data by adopting a normalization method;
when the data type of the data accessed by the low-level internal user is time-class data, desensitizing the time-class data by adopting a noise adding method;
when the data type of the data accessed by the low-level internal user is numerical data, performing a desensitization operation on the numerical data using the formula $X' = (X \cdot S') \,\%\, 10^{N}$ or $X' = [X \cdot f(S)] \,\%\, 10^{N}$, wherein $X$ represents the original data, $S'$ represents the numeric value of the key, $\%$ represents the remainder operation, $X'$ represents the desensitized data, $f(S)$ represents the key mapping function, and $N$ represents the length of the original data;
when the data type of the data accessed by the low-level internal user is text type data, desensitizing the text type data by adopting a text replacement mode.
7. The dynamic data desensitization method according to claim 2, wherein when the access level of the access user is a high-level internal user, performing desensitization operation on the data accessed by the access user by using a high-level internal user desensitization model specifically includes:
when the data type of the data accessed by the high-level internal user is non-numerical data, desensitizing the non-numerical data by adopting a normalization method;
when the data type of the data accessed by the high-level internal user is time-class data, performing desensitization operation on the time-class data by adopting an encryption algorithm;
when the data type of the data accessed by the high-level internal user is numerical data, performing a desensitization operation on the numerical data using the formula $X' = X \cdot S'$ or $X' = X \cdot f(S)$, wherein $X$ represents the original data, $S'$ represents the numeric value of the key, $X'$ represents the desensitized data, and $f(S)$ represents a key mapping function;
and when the data type of the data accessed by the high-level internal user is text data, performing desensitization operation on the text data in a text replacement mode.
8. A dynamic data desensitization system, comprising:
the acquisition module is used for acquiring the identity information of the access user and the data type of the accessed data; the identity information comprises an access key; the data types include: non-numerical data, time data, numerical data and text data;
the access level determining module is used for determining the access level of the access user according to the identity information; the access levels include: a client user, a low-level internal user, and a high-level internal user;
the desensitization model calling module is used for calling a desensitization model according to the access level and the data type of the accessed data; the desensitization model includes: a client user desensitization model, a low-level internal user desensitization model, and a high-level internal user desensitization model;
and the desensitization operation module is used for performing desensitization operation on the data accessed by the access user by adopting the desensitization model.
9. A dynamic data desensitization system, comprising:
the data source interface module is used for acquiring data to be desensitized in the enterprise;
the sensitive information classification module is connected with the data source interface module and is used for carrying out data classification on the data to be desensitized to obtain the classified data to be desensitized; the classified data to be desensitized comprises: non-numerical data, time data, numerical data and text data;
the information anomaly analysis module is connected with the sensitive information classification module and is used for eliminating the anomalous data in the classified data to be desensitized;
the user interface module is used for acquiring the identity information of the access user; the identity information comprises an access key;
the key matching module is connected with the user interface module and used for determining the access level of the access user according to the identity information; the access levels include: a client user, a low-level internal user, and a high-level internal user;
and the sensitive data desensitization module is respectively connected with the key matching module and the information anomaly analysis module and is used for calling different desensitization models according to the access levels so as to perform desensitization operation on the classified data to be desensitized after abnormal data are removed.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011535750.0A CN112541196B (en) 2020-12-23 2020-12-23 Dynamic data desensitization method and system

Publications (2)

Publication Number Publication Date
CN112541196A true CN112541196A (en) 2021-03-23
CN112541196B CN112541196B (en) 2022-10-21

Family

ID=75017608

Country Status (1)

Country Link
CN (1) CN112541196B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080270370A1 (en) * 2007-04-30 2008-10-30 Castellanos Maria G Desensitizing database information
CN108009443A (en) * 2017-11-30 2018-05-08 广州天鹏计算机科技有限公司 The access method and system of data
CN108418676A (en) * 2018-01-26 2018-08-17 山东超越数控电子股份有限公司 A kind of data desensitization method based on permission
CN110532797A (en) * 2019-07-24 2019-12-03 方盈金泰科技(北京)有限公司 The desensitization method and system of big data
CN112115482A (en) * 2020-09-16 2020-12-22 安徽长泰信息安全服务有限公司 Big data-based data security monitoring system for protecting data
CN112115512A (en) * 2020-09-22 2020-12-22 安徽长泰信息安全服务有限公司 Dynamic desensitization system and method based on database plug-in

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
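陈天莹 (Chen Tianying) et al.: "An Intelligent Data Desensitization System in a Big Data Environment" (大数据环境下的智能数据脱敏系统), Communications Technology (《通信技术》) *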

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113836580A (en) * 2021-09-26 2021-12-24 中国电信股份有限公司 Data desensitization method, system, equipment and storage medium
CN113988226A (en) * 2021-12-29 2022-01-28 深圳红途科技有限公司 Data desensitization validity verification method and device, computer equipment and storage medium
CN113988226B (en) * 2021-12-29 2022-04-19 深圳红途科技有限公司 Data desensitization validity verification method and device, computer equipment and storage medium
CN115470509A (en) * 2022-11-14 2022-12-13 优铸科技(北京)有限公司 Display method, device and medium for workshop billboard
CN115879156A (en) * 2022-12-27 2023-03-31 北京明朝万达科技股份有限公司 Dynamic desensitization method, device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN112541196B (en) 2022-10-21

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant