CN111737703A - Method for realizing data lake security based on dynamic data desensitization technology - Google Patents

Method for realizing data lake security based on dynamic data desensitization technology

Info

Publication number
CN111737703A
Authority
CN
China
Prior art keywords
data
desensitization
dynamic
technology
lake
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911030621.3A
Other languages
Chinese (zh)
Inventor
吴奇锋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
iReadyIT Beijing Co Ltd
Original Assignee
iReadyIT Beijing Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2019-10-28
Filing date
2019-10-28
Publication date
2020-10-02
Application filed by iReadyIT Beijing Co Ltd
Priority to CN201911030621.3A
Publication of CN111737703A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/602 Providing cryptographic facilities or services
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6245 Protecting personal data, e.g. for financial or medical purposes

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Bioethics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Hardware Design (AREA)
  • Computer Security & Cryptography (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Storage Device Security (AREA)

Abstract

The invention discloses a method for realizing data lake security based on dynamic data desensitization technology. The method comprises: embedding the dynamic data desensitization technology in a data lake framework; understanding, based on the working principle of the technology, how the embedded technology is applied in practice under different scenarios; and understanding, on the same basis, the methods by which dynamic desensitization is applied to data lake security. Data can thereby be desensitized, which avoids the leakage of sensitive and key information that would otherwise cause information loss in the data lake.

Description

Method for realizing data lake security based on dynamic data desensitization technology
Technical Field
The invention belongs to the technical field of data lake security, and particularly relates to a method for realizing data lake security based on dynamic data desensitization technology.
Background
A data lake architecture stores information from multiple data sources, including the Internet of Things. Big data analysis or archiving workloads access the data lake to process subsets of the data or deliver them to requesting users. Information security is a challenge in a data lake architecture and is often overlooked. Security matters more here than for other types of storage because a data lake, by definition, puts all the eggs in one basket: if the security of one of its repositories is breached, an unknown party may gain access to all of the data. Much of that data is stored in ready-to-read formats such as JPEG or PDF files, so a data lake architecture that is not sufficiently secure can easily lose information.
Existing data lake security technology has the following problem. Data desensitization is an important part of data security and has gained acceptance and attention in the industry. It is currently divided into static data desensitization and dynamic data desensitization. In deployments, static desensitization predominates, while dynamic desensitization has low adoption and usage. As a result, dynamic desensitization technology lags behind static desensitization technology, which hinders the diversified development of desensitization technology.
Disclosure of Invention
The invention aims to provide a method for realizing data lake security based on dynamic data desensitization technology, in order to solve the problem raised in the Background: data desensitization is accepted and valued by the industry as an important part of data lake security and is divided into static and dynamic data desensitization, but deployments rely mainly on static desensitization, and dynamic desensitization has low adoption and usage. Dynamic desensitization technology therefore lags behind static desensitization technology, which hinders the diversified development of desensitization technology.
To achieve this purpose, the invention provides the following technical solution:
a method for realizing data lake security based on dynamic data desensitization technology, comprising the following steps: based on data desensitization technology, understanding the working principles and usage scenarios of dynamic and static data desensitization and comparing their similarities and differences in use; embedding the dynamic data desensitization technology in a data lake framework; understanding, based on the working principle of dynamic data desensitization, how the embedded technology is applied in practice under different scenarios; and understanding, on the same basis, the methods by which dynamic desensitization is applied to data lake security.
Preferably, static data desensitization is generally used in non-production environments: sensitive data is extracted from the production environment, desensitized, and then handed to the non-production environment for use. It is commonly applied to the databases of non-production systems used for training, analysis, testing, and development. Dynamic data desensitization is commonly used in the production environment, where sensitive data is desensitized at the moment it is accessed. It generally addresses scenarios in which the same sensitive data must be desensitized to different levels depending on the circumstances under which it is read. The two data desensitization technologies are compared, and the differences between the two data processing methods are noted.
Preferably, dynamic data desensitization is a process of masking, encrypting, hiding, auditing, or blocking access paths to data at the user layer. When an application program or a maintenance or development tool issues a request that passes through dynamic data desensitization, the requested SQL statement is screened in real time and sensitive data is masked according to the user's role, permissions, and other desensitization rules. Horizontal or vertical security levels can be applied, and the number of rows returned in response to a query can be limited. In this way, dynamic data desensitization ensures that business personnel, operation and maintenance personnel, and outsourced developers access sensitive data strictly in accordance with their work requirements and security levels.
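The role-based masking and row limiting described above can be illustrated with a minimal Python sketch. It is not taken from the patent: the role names, the `POLICY` table, the `SENSITIVE` column set, and the masking rule in `mask_value` are all hypothetical, and the sketch masks rows that have already been fetched rather than intercepting a live query.

```python
from typing import Dict, List

# Hypothetical policy: which sensitive columns each role may see in clear text,
# and how many rows a single query may return for that role.
POLICY = {
    "business": {"clear": {"name", "phone"}, "max_rows": 100},
    "ops": {"clear": {"name"}, "max_rows": 10},
    "outsourced_dev": {"clear": set(), "max_rows": 2},
}
# Hypothetical set of columns treated as sensitive.
SENSITIVE = {"name", "phone", "id_card"}


def mask_value(value: str) -> str:
    """Keep the first and last character and replace the rest with '*'."""
    if len(value) <= 2:
        return "*" * len(value)
    return value[0] + "*" * (len(value) - 2) + value[-1]


def desensitize(rows: List[Dict[str, str]], role: str) -> List[Dict[str, str]]:
    """Return masked copies of the rows, capped at the role's row limit."""
    policy = POLICY[role]
    limited = rows[: policy["max_rows"]]
    result = []
    for row in limited:
        masked = {}
        for col, val in row.items():
            visible = col not in SENSITIVE or col in policy["clear"]
            masked[col] = val if visible else mask_value(val)
        result.append(masked)
    return result


# Made-up sample rows for illustration only.
ROWS = [
    {"name": "Alice", "phone": "13800001111", "id_card": "110101199001011234", "city": "Beijing"},
    {"name": "Bob", "phone": "13900002222", "id_card": "110101198502023456", "city": "Shanghai"},
    {"name": "Carol", "phone": "13700003333", "id_card": "110101197503034567", "city": "Shenzhen"},
]
```

Calling `desensitize(ROWS, "outsourced_dev")` returns only two rows with every sensitive column masked, while `desensitize(ROWS, "business")` returns all three rows with only `id_card` masked. The stored rows themselves are never modified, which is the property that distinguishes dynamic from static desensitization.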
Preferably, in business desensitization, the first problem the dynamic desensitization system solves is controlling data permissions when ordinary users of a business system access the application system. Normally, the business system is developed so that, after a user is authenticated by user identity, different users are given restricted access to data. In data exchange desensitization, data passed between business systems must be desensitized to satisfy privacy protection. Unlike traditional static desensitization, the data is not exported, desensitized, and then handed over; it is called directly through an interface between the business systems. This is data exchange between application systems in which the data is never landed (never written to intermediate storage), and the exchanged data must be desensitized.
Preferably, the following desensitization methods are used. The replacement method replaces real data with fictitious data, for example by building a large dictionary table, generating a random factor for each real-value record, and substituting dictionary table content for the original data content. The invalidation method replaces all or part of a true value with special symbols. The shuffling (disorder) method randomly redistributes the values of a sensitive data column, obscuring the relationship between the original values and other fields without affecting the statistical properties of the original data. The mean value method, for numerical data, first computes the mean and then distributes the desensitized values randomly around it, so that the sum of the data remains unchanged. The inverse correlation method looks for mappings by which one sensitive field might be inferred from other fields, and desensitizes those fields. The symmetric encryption method encrypts the original data with an encryption key and algorithm; the ciphertext follows the same logical format rules as the original data, and the original data can be recovered with the decryption key. The dynamic environment control method changes only part of the response data according to predefined rules; for example, when business data is accessed outside the agreed conditions, the data content is controlled and the content of specific fields is masked.
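Three of the methods listed above (replacement, invalidation, and shuffling) are simple enough to sketch in a few lines of Python. The function names, the `NAME_DICTIONARY` table, and the choice to keep a three-character prefix are assumptions made for illustration and do not come from the patent.

```python
import random
from typing import List

# Hypothetical dictionary table of fictitious surnames used for substitution.
NAME_DICTIONARY = ["Zhang", "Li", "Wang", "Zhao", "Chen", "Liu"]


def replace_with_dictionary(values: List[str], rng: random.Random) -> List[str]:
    """Replacement: draw a random factor per record and use it to index the dictionary."""
    return [NAME_DICTIONARY[rng.randrange(len(NAME_DICTIONARY))] for _ in values]


def invalidate(value: str, keep_prefix: int = 3, symbol: str = "*") -> str:
    """Invalidation: overwrite everything after the first keep_prefix characters."""
    return value[:keep_prefix] + symbol * max(len(value) - keep_prefix, 0)


def shuffle_column(values: List[str], rng: random.Random) -> List[str]:
    """Shuffling: permute a column so values no longer line up with their rows."""
    shuffled = list(values)
    rng.shuffle(shuffled)
    return shuffled
```

For example, `invalidate("13800001111")` yields `138********`. `shuffle_column` returns the same multiset of values in a different order, so column-level statistics such as counts and frequencies are preserved while the link between each value and its row is broken.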
Compared with the prior art, the method for realizing data lake security based on dynamic data desensitization technology provided by the invention has the following beneficial effect:
dynamic data desensitization technology provides layered security protection for the data lake and desensitizes the data, avoiding the leakage of sensitive and key information that would cause information loss in the data lake.
Detailed Description
All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The invention provides a technical solution, a method for realizing data lake security based on dynamic data desensitization technology, as follows:
The method comprises understanding, based on data desensitization technology, the working principles and usage scenarios of dynamic and static data desensitization and comparing their similarities and differences in use. Static data desensitization is generally used in non-production environments: sensitive data is extracted from the production environment, desensitized, and then handed to the non-production environment for use. It is commonly applied to the databases of non-production systems used for training, analysis, testing, and development. Dynamic data desensitization is commonly used in the production environment, where sensitive data is desensitized at the moment it is accessed. It generally addresses scenarios in which the same sensitive data must be desensitized to different levels depending on the circumstances under which it is read. The two data desensitization technologies are compared, and the differences between the two data processing methods are noted.
The method further comprises embedding the dynamic data desensitization technology in a data lake framework. Dynamic data desensitization is a process of masking, encrypting, hiding, auditing, or blocking access paths to data at the user layer. When an application program or a maintenance or development tool issues a request that passes through dynamic data desensitization, the requested SQL statement is screened in real time and sensitive data is masked according to the user's role, permissions, and other desensitization rules. Horizontal or vertical security levels can be applied, and the number of rows returned in response to a query can be limited.
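One way to picture the real-time screening of SQL statements is a small query rewriter that substitutes masking expressions for sensitive columns and appends a row limit before the statement reaches the database. The Python sketch below uses the standard `sqlite3` module and is only an assumption-laden illustration: the `customers` table, the `MASK_EXPR` and `ROLE_RULES` tables, and the role names are hypothetical, and the sketch builds the statement from a list of requested column names instead of parsing arbitrary SQL as a production proxy would have to.

```python
import sqlite3
from typing import List, Tuple

# Hypothetical rules: a masking SQL expression per sensitive column,
# plus per-role lists of unmasked columns and row limits.
MASK_EXPR = {
    "phone": "substr(phone, 1, 3) || '****' || substr(phone, -4)",
    "id_card": "'<hidden>'",
}
ROLE_RULES = {
    "admin": {"unmasked": {"phone", "id_card"}, "limit": 1000},
    "analyst": {"unmasked": set(), "limit": 2},
}
ALLOWED_COLUMNS = {"name", "phone", "id_card"}


def rewrite_query(columns: List[str], role: str) -> str:
    """Build a SELECT in which masked columns are replaced by masking expressions."""
    rules = ROLE_RULES[role]
    select_items = []
    for col in columns:
        if col not in ALLOWED_COLUMNS:
            raise ValueError(f"unknown column: {col}")
        if col in MASK_EXPR and col not in rules["unmasked"]:
            select_items.append(f"{MASK_EXPR[col]} AS {col}")
        else:
            select_items.append(col)
    return (
        f"SELECT {', '.join(select_items)} FROM customers "
        f"ORDER BY name LIMIT {rules['limit']}"
    )


def run_as(conn: sqlite3.Connection, columns: List[str], role: str) -> List[Tuple]:
    """Execute the rewritten statement on behalf of the given role."""
    return conn.execute(rewrite_query(columns, role)).fetchall()


def demo_connection() -> sqlite3.Connection:
    """Create an in-memory table with made-up rows for the example."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (name TEXT, phone TEXT, id_card TEXT)")
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?, ?)",
        [
            ("Alice", "13800001111", "110101199001011234"),
            ("Bob", "13900002222", "110101198502023456"),
            ("Carol", "13700003333", "110101197503034567"),
        ],
    )
    return conn
```

With `conn = demo_connection()`, the call `run_as(conn, ["name", "phone"], "analyst")` returns two rows whose phone numbers have their middle digits masked, whereas the same request as `"admin"` returns all three rows unmasked. Column names are checked against an allow-list before being interpolated into the statement, since they cannot be passed as bound parameters.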
The method further comprises understanding, based on the working principle of dynamic data desensitization, the practical application of the embedded technology in the data lake framework under different scenarios. Normally, the business system is developed so that, after a user is authenticated by user identity, different users are given restricted access to data. In data exchange desensitization, data passed between business systems must be desensitized to satisfy privacy protection. Unlike traditional static desensitization, the data is not exported, desensitized, and then handed over; it is transferred directly through an interface between the business systems. This is data exchange between application systems in which the data is never landed, and the exchanged data must be desensitized.
The method further comprises the following desensitization methods. The replacement method replaces real data with fictitious data, for example by building a large dictionary table, generating a random factor for each real-value record, and substituting dictionary table content for the original data content. The invalidation method replaces all or part of a true value with special symbols. The shuffling (disorder) method randomly redistributes the values of a sensitive data column, obscuring the relationship between the original values and other fields. The mean value method, for numerical data, first computes the mean and then distributes the desensitized values randomly around it, so that the sum of the data remains unchanged. The inverse correlation method looks for mappings by which one sensitive field might be inferred from other fields, and desensitizes those fields. The symmetric encryption method encrypts the original data with an encryption key and algorithm; the ciphertext follows the same logical format rules as the original data, and the original data can be recovered with the decryption key. The dynamic environment control method changes only part of the response data according to predefined rules; for example, when business data is accessed outside the agreed conditions, the data content is controlled and the content of specific fields is masked.
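The mean value method can be made concrete with a short Python sketch. The text only states that desensitized values are scattered around the mean while the sum stays constant; the particular construction here (uniform random offsets that are re-centred so they sum to zero) and the `spread` parameter are assumptions chosen to satisfy that description.

```python
import random
from typing import List


def mean_value_desensitize(values: List[float], spread: float, rng: random.Random) -> List[float]:
    """Scatter values around the column mean while keeping the column sum."""
    n = len(values)
    if n == 0:
        return []
    mean = sum(values) / n
    # Draw one random offset per record, then re-centre the offsets so they sum to zero.
    offsets = [rng.uniform(-spread, spread) for _ in range(n)]
    shift = sum(offsets) / n
    return [mean + (offset - shift) for offset in offsets]
```

Because the re-centred offsets sum to zero, the output sums to `n * mean`, which equals the original total up to floating-point rounding, and every output lies within `2 * spread` of the mean. A column with a single value is returned unchanged by this construction, so the method only hides individual values when several records are averaged together.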
The working principle and usage process of the invention are as follows:
1. The method starts from the working principles and usage scenarios of data desensitization technology. Based on data desensitization technology, the dynamic and static data desensitization technologies are understood and their differences in use are compared. Static data desensitization is generally used in non-production environments: sensitive data is extracted from the production environment, desensitized, and then handed to the non-production environment for use. It is commonly applied to the databases of non-production systems used for training, analysis, testing, and development. Dynamic data desensitization is commonly used in the production environment, where sensitive data is desensitized at the moment it is accessed. It generally addresses scenarios in which the same sensitive data must be desensitized to different levels depending on the circumstances under which it is read. The two data desensitization technologies are compared, and the differences between them are noted.
2. The dynamic data desensitization technology is embedded in a data lake framework. Dynamic data desensitization is a process of masking, encrypting, hiding, auditing, or blocking access paths to data at the user layer. When an application program or a maintenance or development tool issues a request that passes through dynamic data desensitization, the requested SQL statement is screened in real time and sensitive data is masked according to the user's role, permissions, and other desensitization rules. Horizontal or vertical security levels can be applied, and the number of rows returned in response to a query can be limited. In this way, dynamic data desensitization ensures that business personnel, operation and maintenance personnel, and outsourced developers access sensitive data strictly in accordance with their work requirements and security levels.
3. The practical application of the embedded dynamic data desensitization technology in the data lake framework is understood for different scenarios. In business desensitization, the first problem the dynamic desensitization system solves is controlling data permissions when ordinary users of a business system access the application system. Normally, the business system is developed so that, after a user is authenticated by user identity, different users are given restricted access to data. In data exchange desensitization, data passed between business systems must be desensitized to satisfy privacy protection. Unlike traditional static desensitization, the data is not exported, desensitized, and then handed over; it is called directly through an interface between the business systems. This is data exchange between application systems in which the data is never landed, and the exchanged data must be desensitized.
Although embodiments of the present invention have been shown and described, it will be appreciated by those skilled in the art that changes, modifications, substitutions and alterations can be made in these embodiments without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.

Claims (5)

1. A method for realizing data lake security based on dynamic data desensitization technology, characterized by comprising the following steps:
step 1: based on data desensitization technology, understanding the working principles and usage scenarios of dynamic data desensitization and static data desensitization, and comparing their similarities and differences in use;
step 2: embedding the dynamic data desensitization technology in a data lake framework;
step 3: based on the working principle of dynamic data desensitization, understanding the practical application of the embedded technology in the data lake framework under different scenarios;
step 4: based on the working principle of dynamic data desensitization, understanding the methods by which dynamic desensitization is applied to data lake security.
2. The method for realizing data lake security based on dynamic data desensitization technology according to claim 1, wherein understanding the working principles and usage scenarios of dynamic data desensitization and static data desensitization based on data desensitization technology comprises the following steps:
step 11: static data desensitization is generally used in non-production environments: sensitive data is extracted from the production environment, desensitized, and then handed to the non-production environment for use; it is commonly applied to the databases of non-production systems used for training, analysis, testing, and development;
step 12: dynamic data desensitization is commonly used in the production environment, where sensitive data is desensitized at the moment it is accessed; it generally addresses scenarios in which the same sensitive data must be desensitized to different levels depending on the circumstances under which it is read;
step 13: the two data desensitization technologies are compared, and the differences between the two data processing methods are noted.
3. The method for realizing data lake security based on dynamic data desensitization technology according to claim 1, wherein embedding the dynamic data desensitization technology in the data lake framework comprises the following step:
step 21: dynamic data desensitization is a process of masking, encrypting, hiding, auditing, or blocking access paths to data at the user layer; when an application program or a maintenance or development tool issues a request that passes through dynamic data desensitization, the requested SQL statement is screened in real time and sensitive data is masked according to the user's role, permissions, and other desensitization rules; horizontal or vertical security levels can be applied, and the number of rows returned in response to a query can be limited; in this way, dynamic data desensitization ensures that business personnel, operation and maintenance personnel, and outsourced developers access sensitive data strictly in accordance with their work requirements and security levels.
4. The method for realizing data lake security based on dynamic data desensitization technology according to claim 1, wherein understanding the practical application of the embedded dynamic data desensitization technology in the data lake framework under different scenarios, based on the working principle of dynamic data desensitization, comprises the following steps:
step 31: in business desensitization, the first problem the dynamic desensitization system solves is controlling data permissions when ordinary users of a business system access the application system; normally, the business system is developed so that, after a user is authenticated by user identity, different users are given restricted access to data;
step 32: in data exchange desensitization, data passed between business systems must be desensitized to satisfy privacy protection; unlike traditional static desensitization, the data is not exported, desensitized, and then handed over, but is called directly through an interface between the business systems; this is data exchange between application systems in which the data is never landed, and the exchanged data must be desensitized.
5. The method for realizing data lake security based on dynamic data desensitization technology according to claim 1, wherein understanding the methods by which dynamic desensitization is applied to data lake security, based on the working principle of dynamic data desensitization, comprises the following steps:
step 41: by the replacement method, replacing real data with fictitious data, for example by building a large dictionary table, generating a random factor for each real-value record, and substituting dictionary table content for the original data content;
step 42: by the invalidation method, replacing all or part of a true value with special symbols;
step 43: by the shuffling (disorder) method, randomly redistributing the values of a sensitive data column, obscuring the relationship between the original values and other fields without affecting the statistical properties of the original data;
step 44: by the mean value method, for numerical data, first computing the mean and then distributing the desensitized values randomly around it, so that the sum of the data remains unchanged;
step 45: by the inverse correlation method, looking for mappings by which one sensitive field might be inferred from other fields, and desensitizing those fields;
step 46: by the symmetric encryption method, encrypting the original data with an encryption key and algorithm, wherein the ciphertext follows the same logical format rules as the original data and the original data can be recovered with the decryption key;
step 47: by the dynamic environment control method, changing only part of the response data according to predefined rules; for example, when business data is accessed outside the agreed conditions, the data content is controlled and the content of specific fields is masked.

Priority Applications (1)

Application Number: CN201911030621.3A
Priority Date: 2019-10-28
Filing Date: 2019-10-28
Title: Method for realizing data lake security based on dynamic data desensitization technology

Publications (1)

Publication Number: CN111737703A
Publication Date: 2020-10-02

Family ID: 72646057

Country Status (1)

Country: CN (CN111737703A)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20201002