CN116541887A

CN116541887A - Data security protection method for big data platform

Info

Publication number: CN116541887A
Application number: CN202310831904.8A
Authority: CN
Inventors: 胡琦; 严鹤; 王俊; 杨权
Original assignee: Yunqi Intelligent Technology Co ltd
Current assignee: Yunqi Intelligent Technology Co ltd
Priority date: 2023-07-07
Filing date: 2023-07-07
Publication date: 2023-08-04
Anticipated expiration: 2043-07-07
Also published as: CN116541887B

Abstract

The invention provides a data security protection method for a big data platform, relating to the technical field of computers and comprising the following steps: the big data platform gathers all data of the service systems and, during data development and governance, stores all data tables in a data warehouse by category; data lineage relationships among the data tables are captured automatically from the ETL scheduling job dependencies of the data management platform, a relationship graph of the data tables and their data lineage is formed, and the relationship graph is stored in a metadata database; security protection measures are adopted according to different service requirements, the service requirements and their corresponding security protection measures form a plurality of data security protection strategies, and the strategies are stored in a data security management platform; the user inputs the current data and the current service requirement, the security protection measure for the current data is queried according to the current service requirement, and security protection is executed on the current data based on that measure. The invention uses data lineage to identify data rapidly, greatly improving the efficiency of data identification.

Description

Data security protection method for big data platform

Technical Field

The invention relates to the technical field of computers, in particular to a data security protection method for a big data platform.

Background

A database, in short, can be considered an electronic filing cabinet. In the prior art, metadata is a very important class of data generated during database management. Metadata is also called intermediate data, relay data, which is data describing data, or structural data for providing information about a certain resource. Metadata is primarily information describing data attributes to support functions such as indicating storage locations, history data, resource lookups, file records, etc. In terms of the data structure, metadata is an electronic catalog, and in order to achieve the purpose of cataloging, the content or characteristics of the data must be described and collected, so as to achieve the purpose of assisting in data retrieval.

Data warehouses in big data platforms are typically managed in layers, with sensitive data stored across the different data layers. A large number of new data tables are generated in each data layer during data acquisition, data development and data governance. These data tables contain sensitive data, and many methods for protecting sensitive data exist. Chinese patent application No. 201511026582.1 discloses a sensitive data protection system and method for data circulation and transaction on a big data platform; it protects sensitive data across the whole data circulation link and also provides an automatic sensitive data discovery method based on an expert system and natural language processing, which can effectively verify the correctness and authenticity of a desensitization result. However, data security protection in the prior art relies on a large amount of manual labor and is inefficient.

Disclosure of Invention

In view of the above, the invention provides a data security protection method for a big data platform, which combines data lineage with the data tables to form a relationship graph and uses the properties of that graph to mark and protect sensitive data in batches, thereby greatly improving data identification efficiency and reducing errors and omissions.

The technical scheme of the invention is realized as follows: the invention provides a data security protection method for a big data platform, which comprises the following steps:

S1, acquiring all data tables in a big data platform, and storing all the data tables in a data warehouse by category, wherein the data warehouse comprises a plurality of data layers, and the data tables in one data layer have the same category;

S2, automatically capturing data lineage relationships among the data tables according to the ETL scheduling job dependencies of the data management platform, forming a relationship graph of the data tables and the data lineage relationships, and storing the relationship graph in a metadata database;

S3, adopting security protection measures according to different service requirements, forming a plurality of data security protection strategies from the service requirements and the corresponding security protection measures, and storing the data security protection strategies in a data security management platform;

S4, the user inputs the current data and the current service requirement, the security protection measure for the current data is queried according to the current service requirement, and security protection is executed on the current data based on the security protection measure.

On the basis of the above technical solution, preferably, in step S2, the process of forming the relationship diagram includes:

performing SQL statement parsing on the table header of a data table to obtain a syntax tree of the header, determining semantic information of the header from the syntax tree, and taking the semantic information as the table name information;

performing SQL statement parsing on each field of the data table to obtain a syntax tree of each field, determining semantic information of each field from the syntax tree, and taking the semantic information as the field information of that field;

linking each piece of field information with the corresponding table name information to obtain a table field, and taking the table field as a node of the relationship graph;

and storing the data lineage relationships between the data tables as the edges of the relationship graph, wherein each data lineage relationship is a directed relationship between table fields, and each directed relationship divides the two table fields it connects into an upstream table field and a downstream table field.

On the basis of the above technical solution, preferably, step S3 includes:

assigning a data security level to the data in the data tables according to the security management specification, wherein the data security levels are divided into a plurality of security levels;

dividing service requirements into data access and service operations;

determining the security protection measure to adopt according to the service requirement, the data layer where the data is located and the data security level of the data;

and constructing data security protection strategies in which data, service requirement, data security level, data layer and security protection measure are in one-to-one correspondence, and storing the data security protection strategies in the data security management platform.
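A minimal sketch of how such a strategy table might be stored and queried, assuming a plain in-memory dictionary in place of the data security management platform. The layer names (`ods`, `dwd`), the measure names and the function `lookup_measure` are hypothetical and do not come from the patent; the data item itself is left out of the key for brevity.

```python
from typing import NamedTuple, Optional


class StrategyKey(NamedTuple):
    requirement: str     # "data_access" or "service_operation"
    security_level: int  # 1..3 in the embodiment described later
    data_layer: str      # hypothetical layer names such as "ods" or "dwd"


# Hypothetical strategy table: each key maps to exactly one protection measure.
STRATEGIES = {
    StrategyKey("data_access", 3, "dwd"): "hash",
    StrategyKey("data_access", 2, "dwd"): "mask",
    StrategyKey("service_operation", 3, "ods"): "symmetric_encrypt",
}


def lookup_measure(requirement: str, security_level: int, data_layer: str) -> Optional[str]:
    """Return the stored measure for this combination, or None if no strategy exists."""
    return STRATEGIES.get(StrategyKey(requirement, security_level, data_layer))
```

Keying on the full combination keeps the correspondence one-to-one: a given requirement, level and layer can never resolve to two different measures.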

Still more preferably, step S3 further includes:

and identifying the data security protection strategies and their corresponding data in the big data platform using an identification method based on the relationship graph, linking the identification process and result with the corresponding data security protection strategy, and storing the linked result in the data security management platform.

Still further preferably, the identification method includes:

step one, an expert randomly selects data in the big data platform as target data, the target table field and the data security level of the target data are extracted, and the expert judges the sensitivity of the target data; if the target data is sensitive data, the expert specifies a corresponding desensitization algorithm; the data security level of the target data, the sensitivity judgment result and the desensitization algorithm are marked to obtain the marking result of the target data;

step two, taking the node corresponding to the target table field as a starting point in the relationship graph, recursively traversing the relationship graph from the starting point along the directed relationships with a depth-first algorithm, searching for the downstream table fields related to the starting point, and storing the result in a first list;

step three, taking the node corresponding to the target table field as a starting point in the relationship graph, recursively traversing the relationship graph from the starting point against the directed relationships with a depth-first algorithm, searching for the upstream table fields related to the starting point, and storing the result in the first list;

step four, the table fields in the first list are collated to obtain the associated data of the target data, the expert identifies the associated data manually, and the data security level of the associated data, its sensitivity judgment result and its desensitization algorithm are marked to obtain the marking result of the associated data;

step five, repeating steps one to four until all the data in the big data platform are marked, and storing the final marking results of the target data and the associated data in the data security management platform.
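Steps two and three can be sketched as two recursive depth-first traversals over the same edge list, one following the edges and one against them. This is an illustrative sketch only: the edge list, the field names and the helper names (`build_adjacency`, `dfs`, `associated_fields`) are assumptions, not part of the patent.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Index directed (upstream, downstream) edges in both directions."""
    downstream, upstream = defaultdict(list), defaultdict(list)
    for src, dst in edges:
        downstream[src].append(dst)
        upstream[dst].append(src)
    return downstream, upstream


def dfs(start, adjacency, visited):
    """Recursively collect every node reachable from start (start itself excluded)."""
    found = []
    for nxt in adjacency.get(start, []):
        if nxt not in visited:
            visited.add(nxt)
            found.append(nxt)
            found.extend(dfs(nxt, adjacency, visited))
    return found


def associated_fields(start, edges):
    """Return the 'first list': all downstream fields, then all upstream fields."""
    downstream, upstream = build_adjacency(edges)
    first_list = dfs(start, downstream, {start})   # step two
    first_list += dfs(start, upstream, {start})    # step three
    return first_list


edges = [("A.f1", "D.f1"), ("B.f2", "D.f1"), ("D.f1", "E.f1"), ("A.f1", "C.f9")]
print(associated_fields("D.f1", edges))
```

Note that `C.f9` is not returned for `D.f1`: it is a sibling reached through `A.f1`, not an upstream or downstream field of the starting point. For very deep lineage chains, Python's recursion limit would argue for an explicit stack instead of recursion.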

Still further preferably, the current data is access data, and the current service requirement is data access, and step S4 includes:

the user executes a data access operation and inputs access data, the access data being sensitive data;

retrieving the desensitization algorithm for the access data from the data security management platform;

executing the desensitization algorithm on the access data.

Still further preferably, the current data is service data, the current service requirement is service operation, and step S4 includes:

the user executes the business operation and inputs business data;

calling the data security level of the service data from the data security management platform;

querying a data layer of service data from a metadata database;

inquiring security protection measures of the service data from a data security management platform according to the service operation, the data layer of the service data and the data security level of the service data;

executing the security protection measures on the service data.

Still further preferably, the method further comprises:

when the big data platform detects that the relationship graph has been updated, the data security protection strategies in the data security management platform are identified automatically, and the result is updated and stored in the data security management platform.

Still further preferably, the automatically identifying the data security protection policy in the data security management platform includes:

step one, searching the updated relationship graph for the updated data lineage relationships and traversing them, comparing them with the original relationship graph to obtain a plurality of target data tables that have a direct or indirect link to the updated data lineage relationships, and storing the target data tables in a second list;

step two, traversing each target data table in the second list, obtaining all table fields of each target data table in the updated relationship graph by graph query, taking these table fields as a first table field set, and storing them in a third list;

step three, traversing the third list, determining the directed relationships between the first table fields according to the updated data lineage relationships, forming a plurality of updated paths from the updated data lineage relationships and the first table fields, finding the most upstream table field in each updated path based on the directed relationships between the first table fields, taking the most upstream table field as a second table field, and storing the second table field in a fourth list;

step four, traversing the fourth list, and querying in turn the data security level and the marking result of each second table field in the data security management platform;

step five, traversing the fourth list, recursively searching all downstream table fields of each second table field in the updated relationship graph to obtain a third table field set for each second table field, and storing the second table field, the corresponding third table field set, the corresponding data security level and the corresponding marking result in a fifth list;

step six, traversing the fifth list, automatically assigning the data security level and the marking result of each second table field to its third table field set until all table fields in the fifth list carry a data security level and a marking result, and storing the traversed fifth list in the data security management platform.
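The last three steps amount to copying the label of each most-upstream field onto everything downstream of it. A minimal sketch follows, assuming the labels are already available in a dictionary and the updated graph is given as a downstream adjacency mapping. The function name `propagate_labels` and the tuple layout of a label are hypothetical, and an explicit stack is used where the text says "recursively".

```python
def propagate_labels(second_fields, downstream, labels):
    """Assign each second table field's label to all of its downstream fields.

    second_fields: the 'fourth list' of most-upstream table fields
    downstream:    dict mapping a table field to its direct downstream fields
    labels:        dict mapping a table field to its (level, sensitive, algorithm) label;
                   every second table field is assumed to be labelled already
    """
    fifth_list = []
    for root in second_fields:                       # step five
        stack, seen, third_set = [root], {root}, []
        while stack:
            node = stack.pop()
            for nxt in downstream.get(node, []):
                if nxt not in seen:
                    seen.add(nxt)
                    third_set.append(nxt)
                    stack.append(nxt)
        fifth_list.append((root, third_set, labels[root]))

    for _root, third_set, label in fifth_list:       # step six
        for field in third_set:
            labels[field] = label
    return labels
```

If a downstream field is reachable from two second table fields with different labels, this sketch lets the later one win; the patent text does not say how such a conflict should be resolved.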

Still more preferably, the desensitization algorithm is a method for hiding sensitive information, and includes mask-type, hash-type, truncation-type and symmetric-encryption-type desensitization algorithms.

Compared with the prior art, the method has the following beneficial effects:

(1) The data tables and the data lineage are analyzed to form a relationship graph, so that the data is managed and used at a deeper level and its utilization is greatly increased;

(2) By using the graph properties of the relationship graph, sensitive data identified manually can be marked and protected in batches, which improves both the efficiency of data identification and the level of security protection;

(3) An automatic update and verification mechanism is provided: after the relationship graph is updated, the data in the platform undergoes the relevant security protection checks based on the data lineage, ensuring that the security of sensitive data is not compromised.

Drawings

In order to more clearly illustrate the embodiments of the invention or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, it being obvious that the drawings in the following description are only some embodiments of the invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.

FIG. 1 is a flow chart of a method according to an embodiment of the present invention;

FIG. 2 is a schematic diagram of a directed relationship according to an embodiment of the present invention;

FIG. 3 is a schematic diagram of a manual identification method according to an embodiment of the present invention;

FIG. 4 is a diagram illustrating the execution of data access according to an embodiment of the present invention;

FIG. 5 is a diagram illustrating the operation of a business according to another embodiment of the present invention;

FIG. 6 is a schematic diagram of an automatic identification method according to an embodiment of the present invention;

FIG. 7 is a schematic diagram of an architecture according to an embodiment of the present invention.

Detailed Description

The following description of the embodiments of the present invention will clearly and fully describe the technical aspects of the embodiments of the present invention, and it is apparent that the described embodiments are only some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the present invention without making any inventive effort, are intended to fall within the scope of the present invention.

As shown in fig. 1, the present invention provides a data security protection method for a big data platform, including:

Specifically, in an embodiment of the present invention, step S1 includes:

The big data platform comprises a data management platform, a data warehouse and a data security management platform. All data in the big data platform is stored in the data warehouse in layers, and the layering scheme of the data warehouse is determined by the specific data content. Data is stored in different data layers according to its type, so that the data within each data layer is of the same type. During data governance and data development, new data tables are created in each data layer.

Specifically, in an embodiment of the present invention, step S2 includes:

The data management platform comprises a metadata management module. Metadata is data that describes information resources, data or other objects; it mainly describes data attributes and supports functions such as indicating storage locations, historical data, resource lookup and file records. In this embodiment, the metadata management module includes a metadata database, which stores the relationship graph (that is, the data lineage) together with the metadata itself. The metadata in this embodiment is diverse: when a data table is created, metadata is generated that describes its table name, field information, field types, field lengths and so on, as well as its storage location, namely the data layer of the data warehouse in which the table resides. Once the relationship graph is formed, its nodes (the table fields) and its edges (the relationships between table fields) are likewise described by metadata. New data tables, new data and updated relationship graphs produced during subsequent data identification are also described by newly generated metadata.

In the embodiment of the invention, data lineage relationships are captured automatically from the ETL scheduling job dependencies of the data management platform; in a specific example, the automatic capture can be implemented with a preset lineage hook function. A data lineage relationship is a directed relationship, formed during data processing on the big data platform, from a field of one data table to a field of another data table. After the data lineage relationships are obtained, the metadata management module stores them in the metadata database.
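One way to picture a lineage hook is as a callback that the scheduler invokes when an ETL job completes and that writes field-level edges to the metadata store. The sketch below is an assumption-laden illustration: the job dictionary layout (`name`, `field_mappings`), the list standing in for the metadata database, and the names `make_lineage_hook` and `run_job` are all hypothetical; the patent does not describe the hook's interface.

```python
def make_lineage_hook(metadata_db):
    """Build a hook that records one directed edge per declared field mapping."""
    def hook(job):
        for upstream, downstream in job["field_mappings"]:
            metadata_db.append(
                {"upstream": upstream, "downstream": downstream, "job": job["name"]}
            )
    return hook


def run_job(job, hooks):
    """Hypothetical scheduler entry point: run the job, then fire every hook."""
    # The actual extract/transform/load work would happen here.
    for hook in hooks:
        hook(job)


metadata_db = []  # stands in for the metadata database
job = {"name": "load_D", "field_mappings": [("A-field1", "D-field1")]}
run_job(job, [make_lineage_hook(metadata_db)])
print(metadata_db)
```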

In one embodiment of the present invention, the process of forming the relationship graph includes:

A syntax tree captures the grammar of an SQL statement: it converts the character strings of the statement into a structured representation, so that a computer can more easily interpret what each string in the statement means. In a specific implementation, an SQL statement parser may be used to parse each SQL statement in the SQL statement set separately to obtain a syntax tree for each statement; for example, a guide or other parser may be used to parse the SQL statements.
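To make the goal of the parsing step concrete, the toy example below extracts field-level lineage from a single, very restricted statement shape (`INSERT INTO t (cols) SELECT cols FROM s`). It deliberately swaps the syntax tree for a regular expression, so it is a stand-in for illustration rather than a real SQL parser; the function name `extract_lineage` and the `table-field` naming are assumptions.

```python
import re

# Matches only: INSERT INTO <target> (<cols>) SELECT <cols> FROM <source>
_INSERT_SELECT = re.compile(
    r"insert\s+into\s+(?P<target>\w+)\s*\((?P<tcols>[^)]*)\)\s*"
    r"select\s+(?P<scols>.+?)\s+from\s+(?P<source>\w+)",
    re.IGNORECASE | re.DOTALL,
)


def extract_lineage(sql):
    """Return (upstream table field, downstream table field) pairs, or [] if unsupported."""
    match = _INSERT_SELECT.search(sql)
    if match is None:
        return []
    target_cols = [c.strip() for c in match.group("tcols").split(",")]
    source_cols = [c.strip() for c in match.group("scols").split(",")]
    if len(target_cols) != len(source_cols):
        return []
    source, target = match.group("source"), match.group("target")
    return [(f"{source}-{s}", f"{target}-{t}") for s, t in zip(source_cols, target_cols)]


print(extract_lineage("INSERT INTO D (field1, field2) SELECT field1, field3 FROM A"))
```

Joins, expressions, aliases and subqueries all defeat this pattern, which is exactly why a real implementation would walk a parser-generated syntax tree instead.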

After the syntax tree of each SQL statement is obtained, the syntax tree is traversed to obtain the field information and table name information involved in that statement. The field information and table name information extracted from each syntax tree are first linked: the table name of a data table is linked once to each of its fields. For example, if the table name information of a data table is Y1 and the data table has three pieces of field information w1, w2 and w3, the table name and field information are linked into the table fields Y1-w1, Y1-w2 and Y1-w3 before subsequent operations are executed. The table fields are then used as the nodes of the relationship graph. Directed relationships are marked between the table fields according to the captured data lineage; each directed relationship divides the two associated table fields into an upstream table field and a downstream table field and is stored as an edge of the relationship graph. The relationship graph of the data is formed from these nodes and directed relationships and is stored in the metadata database. FIG. 2 is a simplified diagram of the directed relationships of table fields among data tables in one embodiment of the invention, provided to aid understanding of the directed relationship. In FIG. 2, there is a directed relationship from field 1 of table A to field 1 of table D, in which the upstream table field is field 1 of table A and the downstream table field is field 1 of table D.
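The linking and graph-building described above can be sketched in a few lines. The class `LineageGraph` and the helper `table_fields` are hypothetical names, and the in-memory sets stand in for whatever graph storage the metadata database actually uses.

```python
def table_fields(table_name, field_names):
    """Link a table name to each of its fields, e.g. Y1 + w1 -> 'Y1-w1'."""
    return [f"{table_name}-{field}" for field in field_names]


class LineageGraph:
    """Nodes are table fields; edges are directed (upstream, downstream) pairs."""

    def __init__(self):
        self.nodes = set()
        self.edges = set()

    def add_table(self, table_name, field_names):
        self.nodes.update(table_fields(table_name, field_names))

    def add_lineage(self, upstream, downstream):
        self.nodes.update((upstream, downstream))
        self.edges.add((upstream, downstream))

    def downstream_of(self, node):
        return sorted(d for u, d in self.edges if u == node)

    def upstream_of(self, node):
        return sorted(u for u, d in self.edges if d == node)


graph = LineageGraph()
graph.add_table("Y1", ["w1", "w2", "w3"])
graph.add_lineage("A-field1", "D-field1")   # the edge described for FIG. 2
print(sorted(graph.nodes))
print(graph.downstream_of("A-field1"), graph.upstream_of("D-field1"))
```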

Specifically, in an embodiment of the present invention, step S3 includes:

dividing service requirements into data access and service operations;

The data security level is an identification for classifying and grading the data according to the security management standards, and the number of the security levels is different according to different data contents. In this embodiment, the data security level includes 3 security levels.

The importance and privacy of the data are analyzed to determine which security level the data corresponds to, and different security levels correspond to different security measures. Therefore, in the embodiment of the invention, three factors are considered together, namely the service requirement, the data layer where the data is located and the data security level of the data, and the security protection measure for the data is then determined.

Specifically, in an embodiment of the present invention, after the data security protection strategies are set, an identification method based on the relationship graph is used to identify the data security protection strategies and their corresponding data in the big data platform, and the identification process and result are linked with the corresponding data security protection strategy and then stored in the data security management platform.

Referring to fig. 3, the identification method includes:

The identification process is a manual marking process. It makes full use of the data lineage between data items to quickly identify the sensitive data in the big data platform and decide on its security protection, which greatly improves data identification efficiency and contributes substantially to the security protection of the data.

In step four of the identification method, manual identification of the associated data by the expert is very fast. Because the target data and the associated data are linked by directed lineage edges, they share the same security level, sensitivity judgment result and desensitization algorithm. The expert can therefore mark the associated data in bulk, which greatly reduces the time required for identification.

It should be understood that, in the manual identification method of the present invention, the expert begins by randomly selecting one data item as the target data and then uses the data lineage to accelerate identification, which reduces the expert's time; this is a preferred embodiment of the present invention. Alternatively, the expert may first classify the data in the big data platform, pick out the data suspected of being sensitive, and then randomly select one item from this preliminary selection as the target data.

In this embodiment, the desensitization algorithm is a method for hiding sensitive information, and includes mask-type, hash-type, truncation-type and symmetric-encryption-type desensitization algorithms. Specifically: the mask type masks sensitive information such as names, ID numbers and phone numbers; the hash type desensitizes sensitive information using SM3/MD5/SHA-1; the truncation type truncates data such as dates and numerical values; and the symmetric encryption type desensitizes data using SM4/DES/AES.
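The first three families can be illustrated with the Python standard library alone. The sketch below is not the patent's implementation: the function names, the default of keeping three leading and four trailing characters, and the choice of SHA-1 as the default digest are assumptions. Symmetric encryption (SM4/DES/AES) is left out because the standard library has no cipher module, and SM3 is available through `hashlib.new("sm3", ...)` only when the underlying OpenSSL build provides it.

```python
import hashlib


def mask(value, keep_head=3, keep_tail=4, fill="*"):
    """Mask type: hide the middle of a string, keeping a few leading/trailing characters."""
    if len(value) <= keep_head + keep_tail:
        return fill * len(value)
    hidden = len(value) - keep_head - keep_tail
    return value[:keep_head] + fill * hidden + value[len(value) - keep_tail:]


def hash_desensitize(value, algorithm="sha1"):
    """Hash type: replace the value with a hex digest (irreversible)."""
    return hashlib.new(algorithm, value.encode("utf-8")).hexdigest()


def truncate_date(iso_date):
    """Truncation type: keep only the year and month of a YYYY-MM-DD string."""
    return iso_date[:7]


print(mask("13812345678"))
print(hash_desensitize("abc"))
print(truncate_date("2023-07-07"))
```

Masking and truncation keep part of the value readable, while hashing keeps only equality comparisons possible; which trade-off is acceptable depends on the security level assigned to the field.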

The identification method will be described with a specific example:

The big data platform contains many data items, some of which are sensitive. In the manual identification mode adopted by the embodiment of the invention, an expert randomly selects one data item as the target data, and the target table field of that data in the relationship graph is determined from the target data.

Taking the target table field as a starting point, and performing the following two operations:

1. According to the position of the starting point in the relationship graph, the starting point is treated as an upstream table field and a first direction is determined from the directed relationships; the first direction points toward the downstream table fields of the starting point. With the first direction as the search direction, the relationship graph is traversed with a depth-first algorithm to obtain a first search path. The nodes on the first search path are directly or indirectly associated with the starting point, that is, they have a lineage relationship with it. The nodes on the first search path are saved to a first list. There may be several first directions: for example, when the starting point, as an upstream table field, has three downstream table fields, there are three first directions; during the search, each first direction is selected in turn for a depth-first search, finally yielding three first search paths, all of which are saved to the first list. Because a depth-first algorithm follows one route until it can go no deeper, then backtracks to an earlier node and continues searching downward, a first search path is generally tree-shaped: besides its deepest trunk path it has several branch paths of different depths.

2. According to the position of the starting point in the relationship graph, the starting point is treated as a downstream table field and a second direction is determined from the directed relationships; the second direction points toward the upstream table fields of the starting point. With the second direction as the search direction, the relationship graph is traversed with a depth-first algorithm to obtain a second search path. The nodes on the second search path are directly or indirectly associated with the starting point, that is, they have a lineage relationship with it. The nodes on the second search path are saved to the first list. There may be several second directions: for example, when the starting point, as a downstream table field, has five upstream table fields, there are five second directions; during the search, each second direction is selected in turn for a depth-first search, finally yielding five second search paths, all of which are saved to the first list. A second search path is likewise tree-shaped: besides its deepest trunk path it has several branch paths of different depths.

All table fields in the first list are then collated, that is, the trees of all first and second search paths are merged, to obtain the associated data of the target data. The expert marks the associated data in a batch, taking the data security level of the associated data, its sensitivity judgment result and its desensitization algorithm as the marking result of the associated data. The marking results of the target data and of the associated data are then stored in the data security management platform.

Specifically, the manual identification method is performed many times: the expert goes on to randomly select further data items as target data and uses the properties of the relationship graph to mark them in batches, which greatly improves the efficiency of data security protection.
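Batch marking can be pictured as writing the target's label to every field in the first list in one pass. In this sketch the `Label` dataclass, the function `batch_label` and the dictionary standing in for the data security management platform are hypothetical; leaving already-labelled fields untouched is an assumption made here, not something the patent specifies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Label:
    security_level: int
    sensitive: bool
    algorithm: str  # desensitization algorithm name, "" if the data is not sensitive


def batch_label(target, target_label, first_list, store):
    """Store the target's label, then apply it to every associated table field."""
    store[target] = target_label
    for field in first_list:
        # Assumption: a label assigned in an earlier round is not overwritten.
        store.setdefault(field, target_label)
    return store


store = {"A-field1": Label(2, True, "mask")}
batch_label("D-field1", Label(3, True, "hash"), ["E-field1", "A-field1"], store)
print(store)
```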

Specifically, referring to fig. 4, in an embodiment of the present invention, the current data is access data, the current service requirement is data access, the data access includes data query, data open API service, and data batch exchange service, and step S4 includes:

a desensitization algorithm is performed on the access data.

Specifically, the content of the access data input by the user is analyzed, and the desensitization algorithm assigned to the access data is looked up in the data security management platform. For example, if the access data is level-3 sensitive data whose assigned desensitization algorithm is a hash-type algorithm, the data access operation applies SM3/MD5/SHA-1 to desensitize the access data.
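The data access flow, look up the marking result and then apply the assigned algorithm, could be sketched as below. The `MARKINGS` and `ALGORITHMS` dictionaries, the field name `user-phone` and the function `access` are illustrative assumptions; SHA-1 is used as the hash only because it is one of the algorithms the text names and is available in the standard library.

```python
import hashlib

# Hypothetical registry of desensitization algorithms, keyed by the name stored in a marking.
ALGORITHMS = {
    "hash": lambda v: hashlib.sha1(v.encode("utf-8")).hexdigest(),
    "mask": lambda v: v[:1] + "*" * (len(v) - 1),
}

# Stand-in for the marking results held by the data security management platform.
MARKINGS = {
    "user-phone": {"security_level": 3, "sensitive": True, "algorithm": "hash"},
    "user-city": {"security_level": 1, "sensitive": False, "algorithm": ""},
}


def access(field, value):
    """Return the value as the user should see it for a data access request."""
    marking = MARKINGS.get(field)
    if marking is None or not marking["sensitive"]:
        return value  # unmarked or non-sensitive data passes through unchanged
    return ALGORITHMS[marking["algorithm"]](value)


print(access("user-phone", "13812345678"))
print(access("user-city", "Wuhan"))
```

Passing unmarked fields through unchanged is a choice made for this sketch; a stricter deployment might refuse access until the field has been labelled.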

Specifically, referring to fig. 5, in an embodiment of the present invention, the current data is service data, the current service requirement is service operation, the service operation includes a data resource application, and step S4 includes:

the user executes the business operation and inputs business data;

querying a data layer of service data from a metadata database;

executing the security protection measures on the service data.

It should be noted that, in the two embodiments above, data access and service operation follow different execution processes. Data access emphasizes querying the data: the content of the access data must first be determined, the desensitization algorithm marked for that data is then found in the data security management platform according to the content, and the sensitive information in the access data is desensitized and displayed to the user. A service operation emphasizes using the data: after the security level of the data is determined, the corresponding metadata must also be used to look up, in the metadata database, which data layer the service data resides in; once the security protection measures have desensitized the sensitive information in the service data, a download service is provided to the user based on that addressing.
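The service operation flow differs from data access mainly in the extra metadata lookup. A rough sketch under stated assumptions: the two dictionaries stand in for the data security management platform and the metadata database, and the names `handle_service_operation`, `levels`, `strategies` and `layer` are invented for illustration.

```python
def handle_service_operation(field, value, security_platform, metadata_db, measures):
    """Resolve level and layer, look up the measure, apply it, and return (result, layer)."""
    level = security_platform["levels"][field]          # from the security platform
    layer = metadata_db[field]["layer"]                 # addressing via the metadata database
    measure_name = security_platform["strategies"][("service_operation", layer, level)]
    protected = measures[measure_name](value)
    return protected, layer


security_platform = {
    "levels": {"order-id_card": 3},
    "strategies": {("service_operation", "dwd", 3): "mask"},
}
metadata_db = {"order-id_card": {"layer": "dwd"}}
measures = {"mask": lambda v: v[:3] + "*" * (len(v) - 3)}

print(handle_service_operation("order-id_card", "420101199001011234",
                               security_platform, metadata_db, measures))
```

The returned layer is what a download service would use to locate the protected data in the warehouse.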

Specifically, in one embodiment of the present invention, when the metadata management module performs an update operation on the relationship graph, the platform automatically identifies the data security protection policy, and referring to fig. 6, the automatic identification process includes:

In the embodiment of the invention, the relationship graph is updated when, for example, new data is added to the platform, errors in the original data are corrected, or errors in the blood relationship between original data are corrected.

The above verification process will be described with a specific example:

The changes between the original relationship graph and the updated relationship graph can be analyzed with a change detection graph model, which identifies and extracts the changed data blood-edge relationships (data lineage) to obtain the updated data blood-edge relationship. The updated data blood-edge relationship may be either a newly added or a corrected data blood-edge relationship.
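The text does not specify the change detection graph model. As a simplified substitute, the sketch below represents each relationship graph as a set of directed edges and takes the set difference. The edge representation and the function name are assumptions made for illustration.

```python
# A lineage edge is modelled as (upstream field, downstream field).
Edge = tuple[str, str]


def changed_lineage(original: set[Edge], updated: set[Edge]) -> set[Edge]:
    """Return the edges present in the updated graph but not in the original.

    Both newly added edges and the new form of corrected edges end up here,
    because a corrected edge differs from the edge it replaces.
    """
    return updated - original


if __name__ == "__main__":
    original = {("t1.id", "t2.id"), ("t2.id", "t3.id")}
    # ("t2.id", "t3.id") was corrected to ("t2.id", "t4.id"); one edge is new.
    updated = {("t1.id", "t2.id"), ("t2.id", "t4.id"), ("t4.id", "t5.id")}
    print(sorted(changed_lineage(original, updated)))
```

A plain set difference ignores attribute changes on nodes and edges, so a real change detection model would likely need to compare more than edge identity.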

The target data tables affected by the updated data blood-edge relationship are then searched for; the field contents of these tables are directly or indirectly related to the updated data blood-edge relationship. The target data tables are stored in a second list.

Table field information is extracted from each target data table in the second list, the table fields are located in the updated relationship graph by graph query, and the table fields are stored in a third list as a first table field set.

Because the updated data blood-edge relationship affects all of the first table fields, it determines the directed relationships among them. When the third list is traversed, only the first table fields that are connected to the updated data blood-edge relationship in the relationship graph are searched. The updated data blood-edge relationship and the first table fields thus form a plurality of updated paths, each of which contains the directed relationships between consecutive first table fields. The most upstream table field of each updated path is located, taken as a second table field, and stored in a fourth list.
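One way to locate the most upstream table fields is to look for fields that appear as the source of an updated edge but never as a target. The sketch below makes that assumption, and further assumes that the updated paths contain no cycles. The function and parameter names are hypothetical.

```python
Edge = tuple[str, str]  # (upstream field, downstream field)


def most_upstream_fields(updated_edges: set[Edge], first_fields: set[str]) -> list[str]:
    """Return the second table fields, i.e. the heads of the updated paths.

    Only edges whose two endpoints belong to the first table field set are
    considered. A field is most upstream when no such edge points into it.
    """
    relevant = [(u, d) for u, d in updated_edges if u in first_fields and d in first_fields]
    sources = {u for u, _ in relevant}
    targets = {d for _, d in relevant}
    return sorted(sources - targets)


if __name__ == "__main__":
    edges = {("t1.id", "t2.id"), ("t2.id", "t3.id"), ("t4.name", "t5.name")}
    fields = {"t1.id", "t2.id", "t3.id", "t4.name", "t5.name"}
    fourth_list = most_upstream_fields(edges, fields)
    print(fourth_list)
```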

The data security protection policy corresponding to each second table field is queried in the data security management platform, the data security level of that policy is determined, and the marking result of that policy is queried.

The fourth list is traversed, and each second table field in it is taken in turn as a starting point. According to its position in the updated relationship graph, the starting point is treated as an upstream table field, and the travelling direction is determined from the directed relationships as the direction of the starting point's downstream table fields. Using the travelling direction as the search direction, the whole updated relationship graph is traversed with a depth-first algorithm to obtain a third search path, and the nodes on the third search path are taken as the third table field set of the starting point. A third table field set is thus obtained for each second table field.
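The downstream depth-first traversal can be sketched as below, as an iterative search with an explicit stack. The edge-list representation is an assumption, and so is the choice to exclude the starting point itself from the third table field set.

```python
Edge = tuple[str, str]  # (upstream field, downstream field)


def downstream_fields(edges: list[Edge], start: str) -> list[str]:
    """Collect every field reachable downstream of `start`, depth first."""
    children: dict[str, list[str]] = {}
    for upstream, downstream in edges:
        children.setdefault(upstream, []).append(downstream)

    order: list[str] = []
    seen: set[str] = set()
    stack = [start]
    while stack:
        node = stack.pop()
        if node in seen:
            continue  # a field reachable by two paths is visited once
        seen.add(node)
        if node != start:
            order.append(node)
        # Reversed so that children are visited in their original order.
        stack.extend(reversed(children.get(node, [])))
    return order


if __name__ == "__main__":
    edges = [("a", "b"), ("a", "c"), ("b", "d"), ("x", "y")]
    print(downstream_fields(edges, "a"))  # ['b', 'd', 'c']
```

Running the function once per entry in the fourth list yields one third table field set per second table field.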

The second table field, its data security level and marking result, and its third table field set are stored in a fifth list.

The fifth list is traversed. Table fields from the same data blood source have the same properties; that is, a second table field and its corresponding third table field set have the same security level, sensitivity judgment result, and desensitization algorithm. The data security level and marking result of the second table field are therefore automatically assigned in batches to the corresponding third table field set, which achieves rapid batch identification and improves data identification efficiency.
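The batch assignment can be sketched as a single pass over the fifth list. The record layout (`field`, `level`, `marking`, `downstream`) is hypothetical. The text does not say what happens when one field is downstream of two second table fields with different policies; in this sketch the later entry simply overwrites the earlier one.

```python
def propagate_policies(fifth_list: list[dict]) -> dict[str, dict]:
    """Copy each second table field's level and marking to its downstream set."""
    assigned: dict[str, dict] = {}
    for entry in fifth_list:
        policy = {"level": entry["level"], "marking": entry["marking"]}
        for field in [entry["field"], *entry["downstream"]]:
            # Each field gets its own copy so later edits do not leak across fields.
            assigned[field] = dict(policy)
    return assigned


if __name__ == "__main__":
    fifth_list = [
        {
            "field": "t1.phone",
            "level": 3,
            "marking": {"sensitive": True, "algorithm": "hash"},
            "downstream": ["t2.phone", "t3.phone"],
        }
    ]
    for field, policy in propagate_policies(fifth_list).items():
        print(field, policy)
```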

Referring to fig. 7, an architecture diagram of an embodiment of the present invention is shown to illustrate a big data platform:

In fig. 7, the big data platform includes a data security management platform, a data management platform, and a data warehouse. The requirements served by the big data platform are data services, which fall into two types: data access, including data query, data open API service, data batch exchange service, and the like; and service operation, including data resource application and the like.

In addition to the initially set data security protection policies (that is, the data security levels and security protection measures) and the data security protection policies obtained through manual identification and automatic verification during data processing, the data security management platform also includes data security level management, desensitization algorithm management, and data identification.

The data security level management module stores the data security levels marked by the big data platform while it executes its various processing operations; a data security level is an identifier that classifies and grades data according to security management specifications. In a specific embodiment, a desensitization algorithm is a method for hiding sensitive information, and the desensitization algorithm management module in the data security management platform stores the desensitization algorithms marked during the manual identification and automatic identification processes. Data identification refers to identifying and judging the data content and marking its data security level and desensitization algorithm.

In addition to the metadata management module, the data management platform includes data acquisition, data standard, main data, data quality, and data asset modules. The data acquisition module acquires data from multiple sources and transmits the acquired data as original data to the original library of the data warehouse for storage. The data standard module standardizes the original data and transmits the standard data to the standard library of the data warehouse for storage. The main data module classifies the original data or the standard data by subject and transmits the classified data to the subject database of the data warehouse for storage. The data quality module detects and records the quality of the data. The data asset module records and displays all data in the big data platform. The metadata management module constructs the relationship graph and stores the data blood-edge relationships and metadata to form the metadata database. The metadata database provides technical support for the data identification process of the data security management platform, the verification process of the platform, and the execution of user requirements and access processes, so as to implement the data security protection method.

The foregoing description of the preferred embodiments of the invention is not intended to be limiting, but rather is intended to cover all modifications, equivalents, alternatives, and improvements that fall within the spirit and scope of the invention.

Claims

1. The data security protection method for the big data platform is characterized by comprising the following steps of:

2. The method of claim 1, wherein in step S2, the process of forming the relationship map includes:

3. The method of claim 2, wherein step S3 comprises:

dividing service requirements into data access and service operations;

4. The method of claim 3, wherein step S3 further comprises:

5. The method of claim 4, wherein the identifying method comprises:

6. The method of claim 5, wherein the current data is access data and the current business requirement is data access, and step S4 comprises:

a desensitization algorithm is performed on the access data.

7. The method of claim 5, wherein the current data is service data and the current service requirement is a service operation, and step S4 comprises:

the user executes the service operation and inputs service data;

querying the data layer of the service data from the metadata database;

performing security protection measures on the service data.

8. The method of claim 5, wherein the method further comprises:

9. The method of claim 8, wherein automatically identifying the data security protection policy in the data security management platform comprises:

10. The method of claim 5, wherein the desensitization algorithm is a method of hiding sensitive information and includes a mask-type desensitization algorithm, a hash-type desensitization algorithm, a truncation-type desensitization algorithm, and a symmetric-encryption-type desensitization algorithm.