CN111199048A - Big data grading desensitization method and system based on container with life cycle - Google Patents

Info

Publication number
CN111199048A
Authority
CN
China
Prior art keywords
data
desensitization
container
sensitive data
sensitive
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010000740.0A
Other languages
Chinese (zh)
Other versions
CN111199048B (en)
Inventor
顾津
潘竞旭
任钦正
孙少平
鲁龙
宋颖
陈晓敏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Aisino Corp
Original Assignee
Aisino Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Aisino Corp
Priority to CN202010000740.0A
Publication of CN111199048A
Application granted
Publication of CN111199048B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/602 Providing cryptographic facilities or services
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2221/00 Indexing scheme relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F2221/21 Indexing scheme relating to G06F21/00 and subgroups addressing additional information or applications relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F2221/2113 Multi-level security, e.g. mandatory access control
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2221/00 Indexing scheme relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F2221/21 Indexing scheme relating to G06F21/00 and subgroups addressing additional information or applications relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F2221/2137 Time limited access, e.g. to a computer or data
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Bioethics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Hardware Design (AREA)
  • Computer Security & Cryptography (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention provides a big data grading desensitization method and system based on a container with a life cycle. The method and system manage data desensitization across its life cycle by grading sensitive data and by creating containers with life cycles, in which a different graded desensitization model is established for each level of sensitive data. Container technology with a life cycle reduces system resource consumption and the operation and maintenance cost of data transmission services, and improves data processing and transmission efficiency. By establishing a graded desensitization model, different encryption algorithms are used to desensitize sensitive data of different levels and different types, which greatly reduces the risk of sensitive data being stolen or cracked without changing the characteristics of the original data. This remedies the data security shortcomings of the big data platform, improves its security, and effectively reduces the risk of big data information leakage.

Description

Big data grading desensitization method and system based on container with life cycle
Technical Field
The invention relates to the technical field of data encryption, in particular to a big data grading desensitization method and system based on a container with a life cycle.
Background
With the rapid development of big data technology, enterprise information systems have accumulated large amounts of sensitive information. The normal production and operation of an enterprise depend heavily on the data security of its information system. If data in the information system is improperly protected, business secrets such as business information, key personnel information, customer information and core product technical information may be leaked or stolen, creating major risks and hidden dangers for the enterprise's production and operation. In particular, the large volume of sensitive data in enterprise information systems, such as information on the clients an enterprise deals with and supply chain transaction details, has become a main target of attention and attack for cybercrime groups and individuals pursuing political or economic gain.
Sensitive data is at risk of leakage and attack at every stage of its life cycle, including data generation, storage, application and exchange. Strengthening the protection of data in enterprise information systems is therefore a necessary precondition and an important means for an enterprise to safeguard its own rights and interests and to preserve and increase its value.
On the one hand, traditional desensitization methods mainly use static desensitization: the design flow is fixed, tool capability is limited, the tools are highly specialized, the configuration rules are complex, and maintenance is difficult. Meanwhile, computer hardware and cracking software keep advancing, so traditional desensitization algorithms can be cracked easily. Once a desensitization algorithm is cracked, the real data can be recovered in bulk, and sensitive data is seriously leaked.
On the other hand, as informatization deepens, the data volume of business systems keeps growing and data is generated ever faster; the data produced within a few days can reach the total information capacity of the previous 10 years. The volume of sensitive information has likewise grown rapidly, reaching the TB and PB scale. Traditional desensitization technologies and products, however, mostly target relational databases and struggle to desensitize sensitive data effectively at this scale.
Disclosure of Invention
To address the technical problems in the prior art that static desensitization algorithms are easy to crack, that sensitive data is large in volume, and that sensitive data desensitization is difficult to handle effectively, the invention provides a big data grading desensitization method and system based on a container with a life cycle, so as to solve the insufficient security and reliability of existing big data desensitization methods.
In a first aspect, the present invention provides a method of hierarchical desensitization of big data based on containers having a lifecycle, the method comprising:
carrying out data cleaning on pre-generated pre-desensitization data to generate sensitive data, identifying the sensitive data to determine a first type of the sensitive data, and grading the sensitive data to determine a sensitivity grade;
classifying the first type of sensitive data of each sensitivity level according to different data use objects and data content values, and determining a second type of the first type of sensitive data of each sensitivity level;
creating a container with a life cycle, allocating a network address to the container based on a virtualized network layer, and determining and storing a mapping relation between a port of the container and the network address according to port information of the container and the network address allocated to the container;
storing the sensitive data with the determined level to the container with the life cycle based on the mapping relation between the port of the container and the network address;
respectively configuring desensitization algorithms according to a second type of sensitive data stored in a container with a life cycle and then establishing respective corresponding data desensitization models;
desensitizing the sensitive data corresponding to the second type according to the established data desensitization model, and storing the desensitized data in a container with a life cycle;
and responding to a data acquisition request sent by a designated object, and transmitting the desensitized data stored in the container to the designated object, wherein when the storage time of the data in the container reaches a preset time and/or after the data stored in the container is transmitted to the designated object, the life cycle of the container is ended, the container is destroyed, and the data stored in the container is deleted.
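The life-cycle rule in the last step (destroy the container when the preset storage duration is reached and/or after the data has been transmitted) can be illustrated with a minimal in-memory sketch. The class name `LifecycleContainer`, its methods and the injectable clock are hypothetical illustrations, not part of the patent, and a real deployment would manage actual containers rather than a Python dictionary.

```python
import time
from typing import Callable, Dict, Optional


class LifecycleContainer:
    """In-memory stand-in for a container with a bounded life cycle (hypothetical)."""

    def __init__(self, max_age_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._created_at = clock()
        self._max_age = max_age_seconds
        self._data: Optional[Dict[str, str]] = {}

    @property
    def destroyed(self) -> bool:
        return self._data is None

    def _expire_if_due(self) -> None:
        # End the life cycle once the preset storage duration is reached.
        if self._data is not None and self._clock() - self._created_at >= self._max_age:
            self.destroy()

    def store(self, key: str, desensitized_value: str) -> None:
        self._expire_if_due()
        if self._data is None:
            raise RuntimeError("container has been destroyed")
        self._data[key] = desensitized_value

    def transmit(self) -> Dict[str, str]:
        """Hand the stored data to the requester, then destroy the container."""
        self._expire_if_due()
        if self._data is None:
            raise RuntimeError("container has been destroyed")
        payload = dict(self._data)
        self.destroy()
        return payload

    def destroy(self) -> None:
        # Deleting the data together with the container ends the life cycle.
        self._data = None
```

Passing the clock in as a parameter is a design choice of this sketch: it lets the expiry rule be exercised without waiting for real time to pass.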
Further, the method also includes generating pre-desensitization data beforehand, the generating pre-desensitization data comprising:
extracting source data in a distributed and heterogeneous data source in a business system by a big data extraction tool, wherein the source data comprises structured data and unstructured data;
and carrying out cleaning, conversion, integration and structuring operations on the source data to generate pre-desensitization data, and transmitting the pre-desensitization data to a desensitization database in a big data storage system.
Further, performing data cleaning on pre-generated pre-desensitization data to generate sensitive data, identifying the sensitive data to determine a first type of the sensitive data, and grading the sensitive data to determine a sensitivity level comprises:
carrying out data cleaning on the pre-generated pre-desensitization data, and generating sensitive data after eliminating repeated values, missing values and abnormal values in the pre-desensitization data;
dividing the sensitive data according to different data attributes, and determining a first type of the sensitive data;
and evaluating the security value of the sensitive data according to the confidentiality, integrity and availability of the sensitive data, and determining the sensitivity level of the sensitive data.
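The cleaning sub-step (eliminating repeated, missing and abnormal values) might look like the following sketch. The patent does not say how abnormal values are detected, so the median-absolute-deviation rule, the `mad_factor` threshold and the record layout are all assumptions made for illustration.

```python
from statistics import median
from typing import Dict, List


def clean_records(records: List[Dict[str, object]], field: str,
                  mad_factor: float = 5.0) -> List[Dict[str, object]]:
    """Drop records with missing values, exact duplicates, and outliers in `field`."""
    # 1. Missing values: discard any record that has a None entry.
    complete = [r for r in records if all(v is not None for v in r.values())]

    # 2. Repeated values: keep the first occurrence of each identical record.
    seen = set()
    unique = []
    for record in complete:
        key = tuple(sorted(record.items()))
        if key not in seen:
            seen.add(key)
            unique.append(record)
    if not unique:
        return []

    # 3. Abnormal values (assumed rule): distance from the median, measured in
    #    units of the median absolute deviation (MAD), must not exceed mad_factor.
    values = [r[field] for r in unique]
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return unique
    return [r for r in unique if abs(r[field] - med) <= mad_factor * mad]
```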
Further, evaluating the security value of the sensitive data according to its confidentiality, integrity and availability and determining its sensitivity level comprises:
scoring the sensitive data according to the preset score interval of each security value scoring item, wherein the security value scoring items include whether the sensitive data can directly identify a specific enterprise object, whether it is closely related to the actual operating state of that enterprise object, whether other related information can be obtained through the data, and whether the data may cause potential economic loss or pose a potential information threat to the enterprise;
summing the scores of all security value scoring items of the sensitive data to determine the security value score of the sensitive data;
and determining the sensitivity level of the sensitive data according to the correspondence between sensitivity levels and security value scores.
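The score-then-map procedure above can be sketched as follows. The item names, the 0 to 3 score interval and the level thresholds are hypothetical values chosen for illustration; the patent gives no concrete numbers. The four level names follow the four-level scheme described later in the detailed description.

```python
from typing import Dict, List, Tuple

# Hypothetical scoring items, each scored within the preset interval [0, 3].
SCORE_RANGE: Tuple[int, int] = (0, 3)
ITEMS: Tuple[str, ...] = (
    "identifies_enterprise",
    "reflects_operating_state",
    "reveals_related_info",
    "economic_loss_or_threat",
)

# Hypothetical correspondence between minimum total score and sensitivity level.
LEVEL_THRESHOLDS: List[Tuple[int, str]] = [
    (10, "core trade secret"),
    (7, "ordinary trade secret"),
    (4, "internal sensitive"),
    (0, "non-sensitive"),
]


def security_value_score(item_scores: Dict[str, int]) -> int:
    """Sum the per-item scores after checking each lies in the preset interval."""
    low, high = SCORE_RANGE
    total = 0
    for item in ITEMS:
        score = item_scores[item]
        if not low <= score <= high:
            raise ValueError(f"score for {item!r} is outside [{low}, {high}]")
        total += score
    return total


def sensitivity_level(total: int) -> str:
    """Map a total security value score to a sensitivity level."""
    for minimum, level in LEVEL_THRESHOLDS:
        if total >= minimum:
            return level
    raise ValueError("score is below every threshold")
```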
Further, storing the sensitive data with the determined level to the container with the life cycle based on the mapping relationship between the port and the network address of the container comprises:
analyzing a protocol field of a received sensitive data message, and determining a destination network address of the sensitive data message;
determining a container port corresponding to a destination network address of the sensitive data message based on a mapping relation between the network address and the container port;
and distributing the sensitive data to a corresponding storage position of a container with a life cycle according to the container port corresponding to the destination network address of the sensitive data message.
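The three routing steps above (parse the destination address, look up the container port, store the data) can be sketched with a small lookup table. The message layout (a `DST=<address>` header line followed by the payload), the class name `ContainerRouter` and the example address and port are hypothetical; the patent does not specify a wire format.

```python
import ipaddress
from typing import Dict, List


class ContainerRouter:
    """Maps network addresses to container ports and stores payloads per port."""

    def __init__(self) -> None:
        self._port_by_address: Dict[ipaddress.IPv4Address, int] = {}
        self._storage_by_port: Dict[int, List[bytes]] = {}

    def register(self, address: str, port: int) -> None:
        # Record the mapping between an allocated network address and a container port.
        self._port_by_address[ipaddress.IPv4Address(address)] = port
        self._storage_by_port.setdefault(port, [])

    def dispatch(self, message: bytes) -> int:
        # Step 1: parse the (assumed) header field to get the destination address.
        header, _, payload = message.partition(b"\n")
        text = header.decode("ascii")
        if not text.startswith("DST="):
            raise ValueError("missing DST header field")
        destination = ipaddress.IPv4Address(text[len("DST="):])
        # Step 2: look up the container port mapped to that address.
        port = self._port_by_address[destination]
        # Step 3: place the data in the storage location behind that port.
        self._storage_by_port[port].append(payload)
        return port

    def stored(self, port: int) -> List[bytes]:
        return list(self._storage_by_port[port])
```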
Further, configuring desensitization algorithms according to the second type of the sensitive data stored in the container with a life cycle and then establishing the corresponding data desensitization models comprises:
configuring a desensitization algorithm for each second type of sensitive data stored in the containers with a life cycle, wherein the desensitization algorithm is irreversible, automated and repeatable;
and establishing a data desensitization model based on the desensitization algorithm configured for each second type of sensitive data, wherein the data desensitization model ensures that the desensitized data retains the characteristics of the original data, that data integrity is preserved as far as possible, that all related non-sensitive fields from which sensitive data could be derived are also desensitized, and that the desensitized data can be marked with its desensitization level.
In a second aspect, the present invention provides a big data grading desensitization system based on containers having a lifecycle, the system comprising:
a sensitivity level unit, used for carrying out data cleaning on pre-generated pre-desensitization data to generate sensitive data, identifying the sensitive data to determine a first type of the sensitive data, and grading the sensitive data to determine a sensitivity level;
the data classification unit is used for classifying the sensitive data of the first type of each sensitivity level according to different data use objects and data content values and determining a second type of the sensitive data of the first type of each sensitivity level;
a container establishing unit, used for creating a container with a life cycle, allocating a network address to the container based on a virtualized network layer, and determining and storing a mapping relation between a port of the container and the network address according to port information of the container and the network address allocated to the container;
the data storage unit is used for storing the sensitive data with the determined level to the container with the life cycle based on the mapping relation between the port of the container and the network address;
the desensitization model unit is used for respectively configuring desensitization algorithms according to the second type of the sensitive data stored in the container with the life cycle and then establishing data desensitization models corresponding to the desensitization algorithms;
the data desensitization unit is used for desensitizing the sensitive data corresponding to the second type according to the established data desensitization model and storing the desensitized data in a container with a life cycle;
and the data transmission unit is used for responding to a data acquisition request sent by a specified object and transmitting the desensitized data stored in the container to the specified object, wherein when the storage time length of the data in the container reaches a preset time length and/or after the data stored in the container is transmitted to the specified object, the life cycle of the container is ended, the container is destroyed, and the data stored in the container is deleted.
Further, the system further comprises a data pre-processing unit for generating pre-desensitization data, wherein the data pre-processing unit comprises:
the data extraction unit is used for extracting source data in heterogeneous data sources distributed in a business system through a big data extraction tool, wherein the source data comprises structured data and unstructured data;
and the data processing unit is used for generating pre-desensitization data after the source data are subjected to cleaning, conversion, integration and structuring operation, and transmitting the pre-desensitization data to a desensitization database in the big data storage system.
Further, the sensitivity level unit includes:
the sensitive data unit is used for carrying out data cleaning on the pre-generated pre-desensitization data and generating sensitive data after eliminating repeated values, missing values and abnormal values in the pre-desensitization data;
the data dividing unit is used for dividing the sensitive data according to different data attributes and determining a first type of the sensitive data;
and the grade determining unit is used for evaluating the security value of the sensitive data according to the confidentiality, the integrity and the availability of the sensitive data and determining the sensitivity grade of the sensitive data.
Further, the level determination unit evaluating the security value of the sensitive data according to its confidentiality, integrity and availability and determining its sensitivity level comprises:
scoring the sensitive data according to the preset score interval of each security value scoring item, wherein the security value scoring items include whether the sensitive data can directly identify a specific enterprise object, whether it is closely related to the actual operating state of that enterprise object, whether other related information can be obtained through the data, and whether the data may cause potential economic loss or pose a potential information threat to the enterprise;
summing the scores of all security value scoring items of the sensitive data to determine the security value score of the sensitive data;
and determining the sensitivity level of the sensitive data according to the correspondence between sensitivity levels and security value scores.
Further, the data storage unit includes:
the data analysis unit is used for analyzing the protocol field of the received sensitive data message and determining the destination network address of the sensitive data message;
a port determining unit, configured to determine, based on a mapping relationship between the network address and the container port, a container port corresponding to a destination network address of the sensitive data packet;
and the data distribution unit is used for distributing the sensitive data to the corresponding storage position of the container with the life cycle according to the container port corresponding to the destination network address of the sensitive data message.
Further, the desensitization model unit comprises:
an algorithm configuration unit, used for configuring a desensitization algorithm for each second type of sensitive data stored in the containers with a life cycle, wherein the desensitization algorithm is irreversible, automated and repeatable;
and a model establishing unit, used for establishing a data desensitization model based on the desensitization algorithm configured for each second type of sensitive data, wherein the data desensitization model ensures that the desensitized data retains the characteristics of the original data, that data integrity is preserved as far as possible, that all related non-sensitive fields from which sensitive data could be derived are also desensitized, and that the desensitized data can be marked with its sensitivity level.
In summary, the present invention provides a big data grading desensitization method and system based on a container with a life cycle. The method and system manage data desensitization across its life cycle by grading sensitive data and by creating containers with life cycles, in which a different graded desensitization model is established for each level of sensitive data. Container technology with a life cycle reduces system resource consumption and the operation and maintenance cost of data transmission services, and improves data processing and transmission efficiency. By establishing a graded desensitization model, different encryption algorithms are used to desensitize sensitive data of different levels and different types, which greatly reduces the risk of sensitive data being stolen or cracked without changing the characteristics of the original data. This remedies the data security shortcomings of the big data platform, improves its security, and effectively reduces the risk of big data information leakage.
Drawings
A more complete understanding of exemplary embodiments of the present invention may be had by reference to the following drawings in which:
FIG. 1 is a schematic flow diagram of a big data grading desensitization method based on containers with life cycles according to a preferred embodiment of the present invention;
FIG. 2 is a schematic structural diagram of a big data grading desensitization system based on containers with life cycles according to a preferred embodiment of the present invention.
Detailed Description
Exemplary embodiments of the present invention will now be described with reference to the accompanying drawings. The present invention may, however, be embodied in many different forms and is not limited to the embodiments described herein, which are provided so that this disclosure is thorough and complete and fully conveys the scope of the invention to those skilled in the art. The terminology used in the exemplary embodiments illustrated in the accompanying drawings is not intended to limit the invention. In the drawings, the same units/elements are denoted by the same reference numerals.
Unless otherwise defined, terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. Further, it will be understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the relevant art and will not be interpreted in an idealized or overly formal sense.
FIG. 1 is a schematic flow diagram of a big data grading desensitization method based on a container with a life cycle according to a preferred embodiment of the invention. As shown in FIG. 1, the big data graded desensitization method based on containers with a life cycle according to the preferred embodiment begins at step 101.
At step 101, pre-desensitization data is generated.
In the preferred embodiment, source data in the distributed, heterogeneous data sources of the business system, comprising structured data and unstructured data, is extracted by a big data extraction tool. Unlike a relational database, which holds structured data, the big data platform of the preferred embodiment also processes a large amount of unstructured data. Unstructured data is commonly converted into structured data by indexing or tagging, so that the data has a definite meaning and definite association relationships. The source data is therefore cleaned, converted, integrated and structured, and then loaded into a desensitization database in the big data storage system. Data for query and use can be obtained only from the big data desensitization database.
In step 102, data cleaning is performed on the pre-desensitization data to generate sensitive data, the sensitive data is identified to determine a first type of the sensitive data, and the sensitive data is graded to determine a sensitivity grade.
After the pre-desensitization data is cleaned to generate sensitive data, the first step of performing hierarchical management on the sensitive data is to identify the sensitive data to clarify the type of the sensitive data, such as enterprise core personnel information, bank account information, customer information, and the like.
Graded management of the sensitive data also evaluates its security value according to its confidentiality, integrity and availability, and determines its sensitivity level. During grading, the data provider and the data management department should be fully consulted to ensure that the grading is reasonable and to prevent overuse, excessive protection, wasted resources and loss of data information. Generally, sensitive data is graded into four sensitivity levels: core trade secret, ordinary trade secret, internal sensitive, and non-sensitive.
In step 103, the sensitive data of the first type of each sensitivity level is classified according to different data use objects and data content values, and a second type of the sensitive data of the first type of each sensitivity level is determined.
The classification of sensitive data should build on its grading. After grading is finished, the sensitive data of each level is classified using a machine learning model combined with practical application experience. For example, when classifying sensitive enterprise information, two factors are mainly considered, the nature of the data use object and the information content:
a) For data use objects divided by type, a limited, differentiated data service strategy is adopted. On the premise that the data service is legally compliant and data security is guaranteed, the business requirements of use objects at each level are met and the value of data assets is realized, which helps improve the comprehensive data service capability of the big data platform, reduce business waste, and obtain higher added value. Classification by the nature of the use object is therefore a very important and fundamental way of partitioning.
b) Because data use objects differ, the value of the information content inevitably differs, as does the impact of a leak. Applying different management measures to the information of different use objects reduces or avoids the direct use of key data in development and testing, and can effectively reduce the risk of data leakage. Dividing the client information by its content is therefore an important prerequisite for classified, differentiated protection of the information of data use objects.
In step 104, a container with a life cycle is created, a network address is allocated to the container based on the virtualized network layer, and a mapping relation between a port of the container and the network address is determined and stored according to the port information of the container and the network address allocated to the container.
At step 105, the sensitive data with the determined level is stored in the container with the life cycle based on the mapping relation between the port and the network address of the container.
In step 106, desensitization algorithms are configured according to the second type of sensitive data stored in the containers with the life cycles, and then corresponding data desensitization models are established.
In the preferred embodiment, enterprise information is divided, according to its use objects, into enterprise basic attribute data and enterprise transaction numerical data. Enterprise basic attribute data is basic information such as the enterprise name, various codes (taxpayer identification number, unified social credit code, invoice code/number, commodity code and the like), and the enterprise address, telephone number and account-opening bank. This data is desensitized by traditional methods: an algorithm matching the data attribute is selected from the desensitization strategy configuration options. For example, the commodity code 1040201240000000000 can be desensitized with an irreversible hash algorithm (MD5) and converted into ef5c11c555b5e09fe75bf466b57338bcee11c40b; the purchaser's QQ mailbox 3279248039@qq.com can be desensitized with a masking algorithm and converted into ×@qq.com; the seller address "Beijing Tongzhou district Luzhou town New City Industrial district No. 9" can be desensitized with a truncation algorithm and converted into "Beijing Tongzhou district".
Enterprise transaction numerical data is numerical transaction data such as amount, tax amount, total of price and tax, commodity unit price and tax rate. This data is desensitized with a homomorphic encryption algorithm: a specific algebraic operation performed on the encrypted data yields a result that is still encrypted, and decrypting that result gives the same value as performing the same operation on the plaintext. The homomorphic encryption algorithm comprises the following operation steps:
(1) Scale the original data so that it falls within a particular interval.
(2) Perturb the scaled data by adding noise, using a scrambling technique, so that the original data is distorted and altered. The noise term is calculated by normalizing the original data to a dimensionless value and then forming a weighted combination with the original data.
In practical application, the noise term of the encryption algorithm model is calculated by either a max-min normalization method or an inverse cotangent conversion normalization method, depending on the application scenario. If data is collected and updated over an observation period (year/quarter/month), the max-min normalization method is used; if data is collected and updated in real time, the inverse cotangent conversion normalization method is used. The specific formulas are as follows:
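The three traditional operations named in the examples above (hashing a code, masking an account, truncating an address) can be sketched with the Python standard library. The function names are hypothetical, and the choice to mask the whole part before the `@` is an assumption. Note that MD5 is a one-way hash function; it is kept here only because the text names it, and codes drawn from a small value space can be recovered from their digests by enumeration.

```python
import hashlib


def hash_code(value: str) -> str:
    """Irreversibly replace a code with its MD5 digest (32 hex characters)."""
    return hashlib.md5(value.encode("utf-8")).hexdigest()


def mask_local_part(email: str, mask_char: str = "*") -> str:
    """Mask everything before the '@' while keeping the mail domain."""
    local, sep, domain = email.partition("@")
    if not sep:
        raise ValueError("expected an address containing '@'")
    return mask_char * len(local) + sep + domain


def truncate(value: str, keep: int) -> str:
    """Keep only the first `keep` characters, e.g. the coarse part of an address."""
    return value[:keep]
```

For instance, `mask_local_part("3279248039@qq.com")` keeps only the `@qq.com` domain visible, and `truncate` applied to an address keeps its leading, least specific portion.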
The maximum-minimum normalization method is:
Figure BDA0002353286870000101
where
Figure BDA0002353286870000102
is the converted value encrypted by the maximum-minimum normalization method, ω is the weighting coefficient of the normalized interference term, σ is the random interference term, and X is the full set of sample values in the observation period.
The inverse cotangent conversion normalization method is:
Figure BDA0002353286870000103
where
Figure BDA0002353286870000104
is the converted value encrypted by the inverse cotangent conversion normalization method, ω is the weighting coefficient of the normalized interference term, σ is the random interference term, and τ is the scaling coefficient of the original value.
The advantages of the encryption algorithm are as follows: it removes the unit and dimension constraints of the original data by converting the data into dimensionless pure values, so that indicators with different units or orders of magnitude can be compared and weighted. The data after noise perturbation still retains the distribution characteristics of the original data.
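The two normalization choices can be illustrated with a short sketch. The formulas themselves are given only as figure images that are not reproduced in this text, so everything below is an assumption made for illustration: the additive combination rule in `perturb` (original value plus ω·σ times the normalized value), the scaling of the inverse cotangent to the interval (0, 1), and all function names. The sketch shows noise perturbation only and does not by itself provide homomorphic properties.

```python
import math
import random


def min_max_norm(x: float, samples: list) -> float:
    """Min-max normalization of x over the full sample set X of a period."""
    lo, hi = min(samples), max(samples)
    if hi == lo:  # degenerate sample set: avoid division by zero
        return 0.0
    return (x - lo) / (hi - lo)


def arccot_norm(x: float, tau: float) -> float:
    """Inverse-cotangent normalization of the scaled value tau * x.

    Uses arccot(t) = pi/2 - atan(t), whose range is (0, pi), then divides
    by pi so that the result lies in (0, 1).
    """
    return (math.pi / 2 - math.atan(tau * x)) / math.pi


def perturb(x: float, norm_value: float, omega: float, sigma: float) -> float:
    """ASSUMED combination rule: add the weighted, normalized noise term."""
    return x + omega * sigma * norm_value


rng = random.Random(0)           # fixed seed so the sketch is repeatable
sigma = rng.gauss(0.0, 1.0)      # random interference term
samples = [120.0, 450.0, 300.0, 980.0]

# Periodic data: normalize against the whole observation period.
periodic = [perturb(x, min_max_norm(x, samples), 0.05, sigma) for x in samples]

# Real-time data: no full sample set is available, so use arccot with tau.
realtime = perturb(300.0, arccot_norm(300.0, 0.001), 0.05, sigma)
```

The min-max variant needs the minimum and maximum of the whole period, which is why it suits batch updates, while the inverse cotangent variant needs only the current value and τ, which suits real-time collection.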
In step 107, the sensitive data of each second type is desensitized according to the corresponding data desensitization model, and the desensitized data is stored in the container with a life cycle.
In the preferred embodiment, the data desensitization model can automatically identify the level of the desensitized data in the container and mark the sensitive data with a sensitivity level identifier according to its level. After desensitization, sensitive data of three levels (core commercial secret, ordinary commercial secret, and internal sensitive) is stored in the container.
After the desensitized data is stored in the container, a mirror container can be created to back up the data in the container. If processing of the data in the container goes wrong, the data does not have to be reacquired from the system, which avoids the resulting loss of processing efficiency.
In step 108, in response to a data acquisition request sent by a designated object, the desensitized data stored in the container is transmitted to the designated object. When the storage time of the data in the container reaches a preset duration and/or after the data stored in the container has been transmitted to the designated object, the life cycle of the container ends, the container is destroyed, and the data stored in the container is deleted.
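The life cycle rule of step 108 can be sketched as a small in-memory model. This is a minimal sketch under stated assumptions, not the patented container: the class `LifecycleContainer`, its methods, and the injectable `clock` parameter are hypothetical names, and a real deployment would destroy an actual container rather than clear a dictionary.

```python
import time


class LifecycleContainer:
    """Holds desensitized data until it expires or has been transmitted."""

    def __init__(self, ttl_seconds: float, clock=time.monotonic):
        self._clock = clock
        self._expires_at = clock() + ttl_seconds  # preset storage duration
        self._data = {}
        self.destroyed = False

    def _expire_if_due(self) -> None:
        # First end-of-life condition: the preset duration has elapsed.
        if not self.destroyed and self._clock() >= self._expires_at:
            self.destroy()

    def store(self, key: str, value) -> None:
        self._expire_if_due()
        if self.destroyed:
            raise RuntimeError("container has been destroyed")
        self._data[key] = value

    def transmit(self) -> dict:
        """Hand the data to the requester, then end the life cycle."""
        self._expire_if_due()
        if self.destroyed:
            raise RuntimeError("container has been destroyed")
        payload = dict(self._data)
        self.destroy()  # second end-of-life condition: data was transmitted
        return payload

    def destroy(self) -> None:
        self._data.clear()  # delete the stored data
        self.destroyed = True
```

Passing the clock in as a parameter keeps the expiry rule testable without waiting for real time to pass.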
Preferably, the generating pre-desensitization data comprises:
extracting structured and unstructured data in distributed and heterogeneous data sources in a business system through a big data extraction tool;
and carrying out cleaning, conversion, integration and structuring operations on the source data to generate pre-desensitization data, and transmitting the pre-desensitization data to a desensitization database in a big data storage system.
Preferably, performing data cleaning on the pre-generated pre-desensitization data to generate sensitive data, identifying the sensitive data to determine its first type, and grading the sensitive data to determine its sensitivity level comprises:
carrying out data cleaning on the pre-generated pre-desensitization data, and generating sensitive data after eliminating repeated values, missing values and abnormal values in the pre-desensitization data;
dividing the sensitive data according to different data attributes, and determining a first type of the sensitive data;
and evaluating the security value of the sensitive data according to the confidentiality, integrity and availability of the sensitive data, and determining the sensitivity level of the sensitive data.
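The cleaning step (removing repeated, missing, and abnormal values) can be sketched as below. The text does not say how abnormal values are detected, so the range check is an assumption, and `clean_records` together with its parameters is a hypothetical name used only for illustration.

```python
def clean_records(records, required, amount_field, valid_range):
    """Drop records with missing fields, out-of-range amounts, or duplicates."""
    lo, hi = valid_range
    seen = set()
    cleaned = []
    for rec in records:
        # Missing values: any required field absent or empty.
        if any(rec.get(field) in (None, "") for field in required):
            continue
        # Abnormal values: ASSUMED to mean an amount outside valid_range.
        amount = rec.get(amount_field)
        if amount is None or not (lo <= amount <= hi):
            continue
        # Repeated values: identical field/value pairs seen before.
        key = tuple(sorted(rec.items()))
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(rec)
    return cleaned


rows = [
    {"name": "A Co.", "amount": 100.0},
    {"name": "A Co.", "amount": 100.0},   # duplicate
    {"name": "", "amount": 50.0},         # missing name
    {"name": "B Co.", "amount": -5.0},    # abnormal amount
    {"name": "C Co.", "amount": 70.0},
]
result = clean_records(rows, required=("name", "amount"),
                       amount_field="amount", valid_range=(0.0, 1e9))
```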
Preferably, evaluating the security value of the sensitive data according to its confidentiality, integrity and availability and determining its sensitivity level comprises:
scoring the sensitive data according to a preset score interval for each security value scoring item, wherein the security value scoring items comprise: whether the sensitive data can directly identify a specific enterprise object; whether it is closely related to the actual operating state of the enterprise object; whether other related information can be obtained through it; whether it may cause potential economic loss; and whether it may pose a potential information threat to the enterprise;
summing the scores of all security value scoring items of the sensitive data to determine the security value score of the sensitive data;
and determining the sensitivity level of the sensitive data according to the correspondence between sensitivity levels and security value scores.
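The score-and-sum grading can be sketched as follows. The score interval (0 to 3 per item), the thresholds, the short level labels, and the fallback label "unclassified" are all hypothetical values chosen for illustration; the text specifies only that each item has a preset score interval and that the total maps to a sensitivity level.

```python
# Scoring items, named after the five criteria listed in the text.
SCORING_ITEMS = (
    "identifies_enterprise",
    "reflects_operations",
    "reveals_related_info",
    "economic_loss",
    "information_threat",
)

# ASSUMED correspondence between total score and level, highest first.
LEVEL_THRESHOLDS = [(12, "core"), (8, "ordinary"), (4, "internal")]


def security_value_score(scores: dict, max_per_item: int = 3) -> int:
    """Validate each item against its score interval, then sum the items."""
    for item in SCORING_ITEMS:
        if not 0 <= scores[item] <= max_per_item:
            raise ValueError(f"score for {item} outside 0..{max_per_item}")
    return sum(scores[item] for item in SCORING_ITEMS)


def sensitivity_level(total: int) -> str:
    """Map a total security value score to a sensitivity level."""
    for threshold, level in LEVEL_THRESHOLDS:
        if total >= threshold:
            return level
    return "unclassified"


invoice_scores = {
    "identifies_enterprise": 3,
    "reflects_operations": 2,
    "reveals_related_info": 2,
    "economic_loss": 1,
    "information_threat": 1,
}
total = security_value_score(invoice_scores)
level = sensitivity_level(total)
```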
Preferably, storing the sensitive data with the determined level to the container with the life cycle based on the mapping relationship between the port and the network address of the container comprises:
analyzing a protocol field of a received sensitive data message, and determining a destination network address of the sensitive data message;
determining a container port corresponding to a destination network address of the sensitive data message based on a mapping relation between the network address and the container port;
and distributing the sensitive data to a corresponding storage position of a container with a life cycle according to the container port corresponding to the destination network address of the sensitive data message.
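The three routing steps above (parse the destination address, look up the container port, distribute the data) can be sketched as below. This is a simplified in-memory model under stated assumptions: the class `ContainerRouter`, the message layout with `dst` and `payload` keys, and the per-port list used as a storage location are all hypothetical.

```python
import ipaddress


class ContainerRouter:
    """Routes sensitive data messages to container ports by network address."""

    def __init__(self):
        self._port_by_address = {}   # network address -> container port
        self._store_by_port = {}     # container port -> stored payloads

    def register(self, address: str, port: int) -> None:
        """Record the mapping between a container's address and its port."""
        key = ipaddress.ip_address(address)
        self._port_by_address[key] = port
        self._store_by_port.setdefault(port, [])

    def dispatch(self, message: dict) -> int:
        """Parse the destination, find the port, and store the payload."""
        destination = ipaddress.ip_address(message["dst"])
        try:
            port = self._port_by_address[destination]
        except KeyError:
            raise LookupError(f"no container mapped to {destination}") from None
        self._store_by_port[port].append(message["payload"])
        return port

    def stored(self, port: int) -> list:
        """Return a copy of the payloads held at the given port."""
        return list(self._store_by_port.get(port, []))


router = ContainerRouter()
router.register("10.0.0.5", 8080)
used_port = router.dispatch({"dst": "10.0.0.5", "payload": "tax amount record"})
```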
Preferably, configuring desensitization algorithms according to the second type of the sensitive data stored in the container with a life cycle and then establishing the corresponding data desensitization models comprises:
configuring a desensitization algorithm for each second type of sensitive data stored in the container with a life cycle, wherein the desensitization algorithm is irreversible, automated, and repeatable;
and establishing a data desensitization model based on the desensitization algorithm configured for each second type of sensitive data, wherein the data desensitization model ensures that the desensitized data retains the characteristics of the original data, that the integrity of the data is preserved as far as possible, that all non-sensitive fields that are correlated and from which sensitive data could be derived are also desensitized, and that the desensitized data can be marked with its sensitivity level.
Fig. 2 is a schematic structural diagram of a big data grading desensitization system based on containers with life cycles according to a preferred embodiment of the present invention. As shown in FIG. 2, the big data grading desensitization system 200 based on containers with life cycles according to the preferred embodiment includes:
a data preprocessing unit 201 for generating pre-desensitization data;
the sensitivity level unit 202 is used for performing data cleaning on pre-generated pre-desensitization data to generate sensitive data, identifying the sensitive data to determine a first type of the sensitive data, and grading the sensitive data to determine a sensitivity level;
the data classification unit 203 is used for classifying the sensitive data of the first type of each sensitivity level according to different data use objects and data content values and determining a second type of the sensitive data of the first type of each sensitivity level;
a container establishing unit 204, configured to create a container with a life cycle, assign a network address to the container based on a virtualized network layer, and determine and store a mapping relationship between a port of the container and the network address according to port information of the container and the network address assigned to the container;
a data storage unit 205, configured to store the sensitive data with the determined level to the container with the lifecycle based on a mapping relationship between a port of the container and a network address;
desensitization model unit 206, which is used to build the respective corresponding data desensitization models after respectively configuring desensitization algorithms according to the second type of sensitive data stored in the container with life cycle;
a data desensitization unit 207 for desensitizing the sensitive data corresponding to the second type according to the established data desensitization model and storing the desensitized data in a container having a life cycle;
and the data transmission unit 208 is used for responding to a data acquisition request sent by a specified object and transmitting the desensitized data stored in the container to the specified object, wherein when the storage time length of the data in the container reaches a preset time length and/or after the data stored in the container is transmitted to the specified object, the life cycle of the container is ended, the container is destroyed, and the data stored in the container is deleted.
Preferably, the data preprocessing unit 201 includes:
a data extraction unit 211, configured to extract source data in heterogeneous data sources distributed in a business system through a big data extraction tool, where the source data includes structured data and unstructured data;
and the data processing unit 212 is used for generating pre-desensitization data after the source data is subjected to cleaning, conversion, integration and structuring operation, and transmitting the pre-desensitization data to a desensitization database in the big data storage system.
Preferably, the sensitivity level unit 202 includes:
a sensitive data unit 221, configured to perform data cleaning on the pre-generated pre-desensitization data, and generate sensitive data after eliminating a repetition value, a missing value, and an abnormal value in the pre-desensitization data;
the data dividing unit 222 is configured to divide the sensitive data according to different data attributes, and determine a first type of the sensitive data;
and a grade determining unit 223 for evaluating the security value of the sensitive data according to the confidentiality, integrity and availability of the sensitive data, and determining the sensitivity grade of the sensitive data.
Preferably, the level determination unit 223 evaluating the security value of the sensitive data according to its confidentiality, integrity and availability and determining its sensitivity level comprises:
scoring the sensitive data according to a preset score interval for each security value scoring item, wherein the security value scoring items comprise: whether the sensitive data can directly identify a specific enterprise object; whether it is closely related to the actual operating state of the enterprise object; whether other related information can be obtained through it; whether it may cause potential economic loss; and whether it may pose a potential information threat to the enterprise;
summing the scores of all security value scoring items of the sensitive data to determine the security value score of the sensitive data;
and determining the sensitivity level of the sensitive data according to the correspondence between sensitivity levels and security value scores.
Preferably, the data storage unit 205 includes:
a data parsing unit 251, configured to parse a protocol field of a received sensitive data packet, and determine a destination network address of the sensitive data packet;
a port determining unit 252, configured to determine, based on a mapping relationship between the network address and the container port, a container port corresponding to a destination network address of the sensitive data packet;
and the data distribution unit 253 is configured to distribute the sensitive data to the corresponding storage location of the container with the life cycle according to the container port corresponding to the destination network address of the sensitive data packet.
Preferably, the desensitization model unit 206 comprises:
an algorithm configuration unit 261 for configuring a desensitization algorithm for each second type of sensitive data stored in the container with a life cycle, wherein the desensitization algorithm is irreversible, automated, and repeatable;
and a model establishing unit 262 for establishing a data desensitization model based on the desensitization algorithm configured for each second type of sensitive data, wherein the data desensitization model ensures that the desensitized data retains the characteristics of the original data, that the integrity of the data is preserved as far as possible, that all non-sensitive fields that are correlated and from which sensitive data could be derived are also desensitized, and that the desensitized data can be marked with its sensitivity level.
The steps of desensitization of the big data grading desensitization system based on the container with the life cycle according to the preferred embodiment are the same as those of the big data grading desensitization method based on the container with the life cycle, so that the technical effects are the same, and further description is omitted.
The above description is only for the preferred embodiment of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present invention are included in the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.
The invention has been described above by reference to a few embodiments. However, other embodiments of the invention than the one disclosed above are equally possible within the scope of the invention, as would be apparent to a person skilled in the art from the appended patent claims.
Generally, all terms used in the claims are to be interpreted according to their ordinary meaning in the technical field, unless explicitly defined otherwise herein. All references to "a/an/the [device, component, etc.]" are to be interpreted openly as referring to at least one instance of the device, component, etc., unless explicitly stated otherwise. The steps of any method disclosed herein do not have to be performed in the exact order disclosed, unless explicitly stated.

Claims (12)

1. A method for hierarchical desensitization of big data based on containers having a life cycle, the method comprising:
carrying out data cleaning on pre-generated pre-desensitization data to generate sensitive data, identifying the sensitive data to determine a first type of the sensitive data, and grading the sensitive data to determine a sensitivity grade;
classifying the first type of sensitive data of each sensitivity level according to different data use objects and data content values, and determining a second type of the first type of sensitive data of each sensitivity level;
creating a container with a life cycle, allocating a network address to the container based on a virtualized network layer, and determining and storing a mapping relation between a port of the container and the network address according to port information of the container and the network address allocated to the container;
storing the sensitive data with the determined level to the container with the life cycle based on the mapping relation between the port of the container and the network address;
respectively configuring desensitization algorithms according to a second type of sensitive data stored in a container with a life cycle and then establishing respective corresponding data desensitization models;
desensitizing the sensitive data corresponding to the second type according to the established data desensitization model, and storing the desensitized data in a container with a life cycle;
and responding to a data acquisition request sent by a designated object, and transmitting the desensitized data stored in the container to the designated object, wherein when the storage time of the data in the container reaches a preset time and/or after the data stored in the container is transmitted to the designated object, the life cycle of the container is ended, the container is destroyed, and the data stored in the container is deleted.
2. The method of claim 1, further comprising generating the pre-desensitization data beforehand, wherein generating the pre-desensitization data comprises:
extracting source data in a distributed and heterogeneous data source in a business system by a big data extraction tool, wherein the source data comprises structured data and unstructured data;
and carrying out cleaning, conversion, integration and structuring operations on the source data to generate pre-desensitization data, and transmitting the pre-desensitization data to a desensitization database in a big data storage system.
3. The method of claim 1, wherein performing data cleaning on the pre-generated pre-desensitization data to generate sensitive data, identifying the sensitive data to determine its first type, and grading the sensitive data to determine its sensitivity level comprises:
carrying out data cleaning on the pre-generated pre-desensitization data, and generating sensitive data after eliminating repeated values, missing values and abnormal values in the pre-desensitization data;
dividing the sensitive data according to different data attributes, and determining a first type of the sensitive data;
and evaluating the security value of the sensitive data according to the confidentiality, integrity and availability of the sensitive data, and determining the sensitivity level of the sensitive data.
4. The method of claim 3, wherein evaluating the security value of the sensitive data according to its confidentiality, integrity and availability and determining its sensitivity level comprises:
scoring the sensitive data according to a preset score interval for each security value scoring item, wherein the security value scoring items comprise: whether the sensitive data can directly identify a specific enterprise object; whether it is closely related to the actual operating state of the enterprise object; whether other related information can be obtained through it; whether it may cause potential economic loss; and whether it may pose a potential information threat to the enterprise;
summing the scores of all security value scoring items of the sensitive data to determine the security value score of the sensitive data;
and determining the sensitivity level of the sensitive data according to the correspondence between sensitivity levels and security value scores.
5. The method of claim 1, wherein storing the sensitive data with the determined level to the container with the life cycle based on the mapping relationship between the port and the network address of the container comprises:
analyzing a protocol field of a received sensitive data message, and determining a destination network address of the sensitive data message;
determining a container port corresponding to a destination network address of the sensitive data message based on a mapping relation between the network address and the container port;
and distributing the sensitive data to a corresponding storage position of a container with a life cycle according to the container port corresponding to the destination network address of the sensitive data message.
6. The method of claim 1, wherein configuring desensitization algorithms according to the second type of the sensitive data stored in the container with a life cycle and then establishing the corresponding data desensitization models comprises:
configuring a desensitization algorithm for each second type of sensitive data stored in the container with a life cycle, wherein the desensitization algorithm is irreversible, automated, and repeatable;
and establishing a data desensitization model based on the desensitization algorithm configured for each second type of sensitive data, wherein the data desensitization model ensures that the desensitized data retains the characteristics of the original data, that the integrity of the data is preserved as far as possible, that all non-sensitive fields that are correlated and from which sensitive data could be derived are also desensitized, and that the desensitized data can be marked with its sensitivity level.
7. A big data grading desensitization system based on containers having a lifecycle, the system comprising:
a sensitivity level unit for performing data cleaning on pre-generated pre-desensitization data to generate sensitive data, identifying the sensitive data to determine a first type of the sensitive data, and grading the sensitive data to determine a sensitivity level;
the data classification unit is used for classifying the sensitive data of the first type of each sensitivity level according to different data use objects and data content values and determining a second type of the sensitive data of the first type of each sensitivity level;
a container establishing unit for creating a container with a life cycle, allocating a network address to the container based on a virtualized network layer, and determining and storing a mapping relation between a port of the container and the network address according to port information of the container and the network address allocated to the container;
the data storage unit is used for storing the sensitive data with the determined level to the container with the life cycle based on the mapping relation between the port of the container and the network address;
the desensitization model unit is used for respectively configuring desensitization algorithms according to the second type of the sensitive data stored in the container with the life cycle and then establishing data desensitization models corresponding to the desensitization algorithms;
the data desensitization unit is used for desensitizing the sensitive data corresponding to the second type according to the established data desensitization model and storing the desensitized data in a container with a life cycle;
and the data transmission unit is used for responding to a data acquisition request sent by a specified object and transmitting the desensitized data stored in the container to the specified object, wherein when the storage time length of the data in the container reaches a preset time length and/or after the data stored in the container is transmitted to the specified object, the life cycle of the container is ended, the container is destroyed, and the data stored in the container is deleted.
8. The system of claim 7, further comprising a data pre-processing unit for generating pre-desensitization data, wherein the data pre-processing unit comprises:
the data extraction unit is used for extracting source data in heterogeneous data sources distributed in a business system through a big data extraction tool, wherein the source data comprises structured data and unstructured data;
and the data processing unit is used for generating pre-desensitization data after the source data are subjected to cleaning, conversion, integration and structuring operation, and transmitting the pre-desensitization data to a desensitization database in the big data storage system.
9. The system of claim 7, wherein the sensitivity level unit comprises:
the sensitive data unit is used for carrying out data cleaning on the pre-generated pre-desensitization data and generating sensitive data after eliminating repeated values, missing values and abnormal values in the pre-desensitization data;
the data dividing unit is used for dividing the sensitive data according to different data attributes and determining a first type of the sensitive data;
and the grade determining unit is used for evaluating the security value of the sensitive data according to the confidentiality, the integrity and the availability of the sensitive data and determining the sensitivity grade of the sensitive data.
10. The system of claim 9, wherein the level determining unit evaluating the security value of the sensitive data according to its confidentiality, integrity and availability and determining its sensitivity level comprises:
scoring the sensitive data according to a preset score interval for each security value scoring item, wherein the security value scoring items comprise: whether the sensitive data can directly identify a specific enterprise object; whether it is closely related to the actual operating state of the enterprise object; whether other related information can be obtained through it; whether it may cause potential economic loss; and whether it may pose a potential information threat to the enterprise;
summing the scores of all security value scoring items of the sensitive data to determine the security value score of the sensitive data;
and determining the sensitivity level of the sensitive data according to the correspondence between sensitivity levels and security value scores.
11. The system of claim 7, wherein the data storage unit comprises:
the data analysis unit is used for analyzing the protocol field of the received sensitive data message and determining the destination network address of the sensitive data message;
a port determining unit, configured to determine, based on a mapping relationship between the network address and the container port, a container port corresponding to a destination network address of the sensitive data packet;
and the data distribution unit is used for distributing the sensitive data to the corresponding storage position of the container with the life cycle according to the container port corresponding to the destination network address of the sensitive data message.
12. The system of claim 7, wherein the desensitization model unit comprises:
an algorithm configuration unit for configuring a desensitization algorithm for each second type of sensitive data stored in the container with a life cycle, wherein the desensitization algorithm is irreversible, automated, and repeatable;
and a model establishing unit for establishing a data desensitization model based on the desensitization algorithm configured for each second type of sensitive data, wherein the data desensitization model ensures that the desensitized data retains the characteristics of the original data, that the integrity of the data is preserved as far as possible, that all non-sensitive fields that are correlated and from which sensitive data could be derived are also desensitized, and that the desensitized data can be marked with its sensitivity level.
CN202010000740.0A 2020-01-02 2020-01-02 Big data hierarchical desensitization method and system based on container with life cycle Active CN111199048B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010000740.0A CN111199048B (en) 2020-01-02 2020-01-02 Big data hierarchical desensitization method and system based on container with life cycle

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010000740.0A CN111199048B (en) 2020-01-02 2020-01-02 Big data hierarchical desensitization method and system based on container with life cycle

Publications (2)

Publication Number Publication Date
CN111199048A (en) 2020-05-26
CN111199048B (en) 2023-07-25

Family

ID=70746763

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010000740.0A Active CN111199048B (en) 2020-01-02 2020-01-02 Big data hierarchical desensitization method and system based on container with life cycle

Country Status (1)

Country Link
CN (1) CN111199048B (en)

Also Published As

Publication number Publication date
CN111199048B (en) 2023-07-25

Similar Documents

Publication Publication Date Title
CN111199048B (en) Big data hierarchical desensitization method and system based on container with life cycle
Van Wegberg et al. Plug and prey? Measuring the commoditization of cybercrime via online anonymous markets
CN111079174A (en) Power consumption data desensitization method and system based on anonymization and differential privacy technology
US8412712B2 (en) Grouping methods for best-value determination from values for an attribute type of specific entity
CN111738549A (en) Food safety risk assessment method, device, equipment and storage medium
JP2008262601A (en) Distributed risk assessment system and method for assessing financial fraud risk
CN110502638B (en) Enterprise news risk classification method based on target entity
CN108304725A (en) A kind of method and system to the desensitization of government data resource
Singh Towards data privacy and security framework in big data governance
CN112417492A (en) Service providing method based on data classification and classification
Singh et al. Design and implementation of continuous monitoring and auditing in SAP enterprise resource planning
Plaksiy et al. Applying big data technologies to detect cases of money laundering and counter financing of terrorism
Turner et al. Follow the money: Revealing risky nodes in a Ransomware-Bitcoin network
Domashova et al. Identification of non-typical international transactions on bank cards of individuals using machine learning methods
Chang et al. A review paper on the application of big data by banking institutions and related ethical issues and responses
Rjaibi et al. Developing a novel holistic taxonomy of security requirements
Gupta et al. Security measures in data mining
CN115471258A (en) Violation behavior detection method and device, electronic equipment and storage medium
Madyatmadja et al. The effectiveness of security and customer convenience in the use of e-commerce
CN111861699B (en) Anti-fraud index generation method based on operator data
CN113469701A (en) Block chain asset transaction risk analysis method
Faccia et al. Financial Big Data security and privacy in X-accounting. A step further to implement the triple-entry accounting
Gise-Sproģe et al. MONEY LAUNDERING INVESTIGATION: THE CASE OF LATVIA.
Yang et al. Analysis of platform economic supervision mode from the perspective of blockchain
Varma et al. MS excel functions as supply chain fraud detector

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant