CN117171800B

CN117171800B - Sensitive data identification method and device based on zero trust protection system

Info

Publication number: CN117171800B
Application number: CN202311371138.8A
Authority: CN
Inventors: 王理朝; 张立杰; 何涛; 谢坚; 史晓婧
Original assignee: Shenzhen Zhuyun Technology Co ltd
Current assignee: Shenzhen Zhuyun Technology Co ltd
Priority date: 2023-10-23
Filing date: 2023-10-23
Publication date: 2024-02-06
Anticipated expiration: 2043-10-23
Also published as: CN117171800A

Abstract

The invention relates to the technical field of data processing, and discloses a sensitive data identification method and device based on a zero trust protection system, wherein the method is used for a sensitive data identification system based on the zero trust protection system, and the sensitive data identification system based on the zero trust protection system is connected with a terminal access system; the invention identifies the sensitive data in the sensitive data identification system based on the zero trust protection system, and the sensitive data identification system based on the zero trust protection system and the terminal access system are mutually independent, so that the user access experience is not affected. Further, different target desensitization rules can be selected according to the acquired target message data to be identified, one or a plurality of target desensitization rules can be adopted, most data types and sensitive data identification requirements can be met, the sensitive data can be fully covered and analyzed as far as possible, and the identification accuracy and the identification efficiency of the sensitive data are improved.

Description

Sensitive data identification method and device based on zero trust protection system

Technical Field

The invention relates to the technical field of data processing, in particular to a sensitive data identification method and device based on a zero trust protection system.

Background

Sensitive data discovery is an important technical field in the field of information security, and aims to identify, locate and protect sensitive data. With the advent of the digital age, various types of sensitive data such as personal identification information, financial data, medical records, etc. are widely stored and transmitted, and thus it has become critical to protect the privacy and security of these data. The aim of the sensitive data discovery technology is to identify sensitive data through an automatic method and tool, help organizations discover potential data leakage risks and take corresponding security measures for protection.

In the past, identifying and processing sensitive data relied primarily on manual methods, such as manual audits and manual sorting, but such methods are inefficient and prone to error. With the dramatic increase in data volume and the continual strengthening of privacy regulations, traditional manual processing methods can no longer meet the need. Therefore, the development of sensitive data discovery technology has become an inevitable trend.

With the rapid development of machine learning and artificial intelligence, sensitive data discovery techniques have gradually introduced automated and intelligent methods. Currently, the most widely used method of sensitive data discovery performs identification by combining machine learning with natural language processing (NLP). However, sensitive data may be stored in different formats and systems, including databases, file systems, cloud storage, etc. To process and access such heterogeneous data, the problems of data integration and format conversion need to be solved; as a result, the sensitive data cannot be covered and analyzed comprehensively, false alarms and missed reports of sensitive data may occur, and the identification accuracy and identification efficiency of the sensitive data cannot be ensured.

Disclosure of Invention

In view of this, the invention provides a method and a device for identifying sensitive data based on a zero trust protection system, so as to solve the following problems of the prior art: sensitive data may be stored in different formats and systems, so that data integration and format conversion need to be solved; the sensitive data therefore cannot be covered and analyzed comprehensively, false alarms and missed reports of sensitive data may occur, and the identification accuracy and identification efficiency of the sensitive data cannot be ensured.

In a first aspect, the invention provides a sensitive data identification method based on a zero trust protection system, which is used for a sensitive data identification system based on the zero trust protection system, wherein the sensitive data identification system based on the zero trust protection system is connected with a terminal access system; the sensitive data identification method based on the zero trust protection system comprises the following steps:

when a sensitive data identification request exists, the terminal access system generates at least one target message data based on the sensitive data identification request; the sensitive data identification system based on the zero trust protection system acquires target message data to be identified from each target message data generated by the terminal access system; selecting at least one target desensitization rule in a preset desensitization rule set based on target message data to be identified; and sequentially carrying out desensitization processing on the target message data to be identified by utilizing each target desensitization rule, and determining target sensitive data existing in the target message data to be identified.

According to the sensitive data identification method based on the zero trust protection system, sensitive data identification is carried out in the sensitive data identification system based on the zero trust protection system, the sensitive data identification system based on the zero trust protection system and the terminal access system are mutually independent, user access experience is not affected, and management of sensitive data identification processes by management staff is facilitated. Further, different target desensitization rules can be selected according to the acquired target message data to be identified, one or a plurality of target desensitization rules can be adopted, most data types and sensitive data identification requirements can be met, the sensitive data can be fully covered and analyzed as far as possible, and the identification accuracy and the identification efficiency of the sensitive data are improved.

In an alternative embodiment, a terminal access system includes a user terminal, a gateway proxy, a storage system, and at least one application service; when a sensitive data identification request exists, the terminal access system generates at least one target message data based on the sensitive data identification request, and the method comprises the following steps:

the user terminal establishes a sensitive data identification request and sends the sensitive data identification request to the gateway proxy; the gateway proxy acquires at least one initial message data of each application service from the corresponding application service based on the sensitive data identification request; the gateway agent analyzes each initial message data to obtain at least one target message data, and sends each target message data to the storage system for storage.

The invention generates the message data in the terminal access system, is mutually independent from the sensitive data identification process, and does not influence the user access experience.

In an optional implementation manner, the sensitive data identification system based on the zero trust protection system acquires target message data to be identified from each target message data generated by the terminal access system, and the method includes:

the sensitive data identification system based on the zero trust protection system acquires initial message data to be identified from each target message data generated by the terminal access system; judging whether the initial message data to be identified is already identified by a sensitive data identification system based on a zero trust protection system; if the initial message data to be identified is not identified by the sensitive data identification system based on the zero trust protection system, determining the initial message data to be identified as target message data to be identified; if the initial message data to be identified is already identified by the sensitive data identification system based on the zero trust protection system, continuing to acquire the initial message data to be identified in each target message data generated by the terminal access system until the acquired initial message data to be identified is not identified by the sensitive data identification system based on the zero trust protection system, and acquiring the target message data to be identified.

When the sensitive data identification system based on the zero trust protection system acquires message data to be identified from the terminal access system, each acquired message data is checked, and only message data that has not yet undergone sensitive data identification is processed, so that the sensitive data identification efficiency is improved.

In an alternative embodiment, selecting at least one target desensitization rule in a preset desensitization rule set based on target message data to be identified, includes:

when the target message data to be identified exists, selecting a data item type desensitization rule from a preset desensitization rule set as a target desensitization rule; when the target message data to be identified is structured data, selecting a data set type desensitization rule from a preset desensitization rule set as a target desensitization rule; when the target message data to be identified contains file transmission, selecting a data file type desensitization rule from a preset desensitization rule set as a target desensitization rule.

According to the method, different target desensitization rules are selected according to the acquired target message data to be identified, one or a plurality of target desensitization rules can be adopted, most data types and sensitive data identification requirements can be met, the sensitive data can be fully covered and analyzed as far as possible, and the identification accuracy and the identification efficiency of the sensitive data are improved.
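The selection logic described above can be sketched as follows. This is a minimal illustration rather than the patented implementation: the `Message` container, its fields, and the `RuleType` names are assumptions introduced here, and how a real system decides that a message is structured or carries a file is not specified by the text.

```python
from dataclasses import dataclass, field
from enum import Enum


class RuleType(Enum):
    """The three rule families named in the text (identifiers are hypothetical)."""
    DATA_ITEM = "data_item"
    DATA_SET = "data_set"
    DATA_FILE = "data_file"


@dataclass
class Message:
    """Hypothetical container for one target message to be identified."""
    url: str
    body: str
    is_structured: bool = False                 # assumed flag, e.g. body parsed as JSON or XML
    files: list = field(default_factory=list)   # names of files carried by the message


def select_rule_types(message: Message) -> list:
    """Return the rule types that apply to this message, in application order."""
    selected = [RuleType.DATA_ITEM]             # chosen whenever a message exists
    if message.is_structured:
        selected.append(RuleType.DATA_SET)      # structured data
    if message.files:
        selected.append(RuleType.DATA_FILE)     # message contains file transmission
    return selected
```

A structured message that also carries a file would thus be checked against all three rule types, while a plain text message is checked against the data item rule only.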

In an alternative embodiment, the desensitizing processing is sequentially performed on the target to-be-identified message data by using each target desensitizing rule, and determining the target sensitive data existing in the target to-be-identified message data includes:

judging whether the target message data to be identified contains data conforming to the desensitization rule of the data item type; and if the target message data to be identified contains data conforming to the data item type desensitization rule, determining the data conforming to the data item type desensitization rule as target sensitive data.

According to the invention, different desensitization rules are selected for sensitive data identification processing according to the target message data to be identified, so that most data types and sensitive data identification requirements are met, the sensitive data can be fully covered and analyzed as much as possible, and the identification accuracy and identification efficiency of the sensitive data are improved.
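As an illustration of data item type matching, the sketch below assumes that each data item rule is a regular expression; the text does not state how such rules are expressed, so the rule names and patterns are hypothetical examples only.

```python
import re

# Hypothetical data item rules: rule name -> regular expression (assumed format).
DATA_ITEM_RULES = {
    "mobile_number": re.compile(r"(?<!\d)1[3-9]\d{9}(?!\d)"),
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
}


def find_data_item_matches(text: str) -> list:
    """Return (rule_name, matched_text) pairs for every rule hit in the text."""
    hits = []
    for name, pattern in DATA_ITEM_RULES.items():
        for match in pattern.finditer(text):
            hits.append((name, match.group(0)))
    return hits
```

Each returned pair corresponds to one piece of data that conforms to a data item rule and would therefore be treated as target sensitive data.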

In an optional implementation manner, the desensitizing processing is sequentially performed on the target message data to be identified by using each target desensitizing rule, so as to determine target sensitive data existing in the target message data to be identified, and the method further includes:

judging whether the target message data to be identified is structured data or not; when the target message data to be identified is structured data, analyzing the data structure of the target message data to be identified to obtain a target data structure; judging whether the data set type desensitization rule contains a target data structure or not; if the data set type desensitization rule contains a target data structure, judging whether the target message data to be identified contains a preset mark characteristic value or not; if the target message data to be identified contains the preset mark characteristic value, the target message data to be identified is determined to be target sensitive data.

judging whether the target message data to be identified contains file transmission or not; when the target message data to be identified contains file transmission data, judging whether the file type corresponding to the file transmission data is consistent with the file suffix; if the file type corresponding to the file transmission data is consistent with the file suffix, judging whether the data file type desensitization rule contains the file name corresponding to the file transmission data; if the data file type desensitization rule contains a file name, determining the file transmission data as target sensitive data.
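The two checks above can be sketched as follows. All concrete values are assumptions made for illustration: the JSON message format, the field layout, the marker value, the registered file name, and the use of leading file signature bytes to compare file type with suffix are not taken from the text.

```python
import json
from pathlib import PurePosixPath

# Hypothetical data set rule: known field layouts plus preset marker values.
DATA_SET_RULE = {
    "structures": [frozenset({"name", "id_card", "level"})],
    "marker_values": {"CONFIDENTIAL"},
}

# Hypothetical data file rule: file names registered as sensitive.
DATA_FILE_RULE = {"file_names": {"payroll.pdf"}}

# Assumed signature table: suffix -> leading bytes of a genuine file of that type.
MAGIC_BYTES = {".pdf": b"%PDF", ".zip": b"PK\x03\x04", ".png": b"\x89PNG"}


def is_sensitive_data_set(body: str) -> bool:
    """Check the data structure first, then look for a preset marker value."""
    try:
        record = json.loads(body)
    except json.JSONDecodeError:
        return False                               # not structured data
    if not isinstance(record, dict):
        return False
    if frozenset(record) not in DATA_SET_RULE["structures"]:
        return False                               # rule does not contain this structure
    return any(
        value in DATA_SET_RULE["marker_values"]
        for value in record.values()
        if isinstance(value, str)
    )


def is_sensitive_file(file_name: str, content: bytes) -> bool:
    """Check that type and suffix agree, then look the file name up in the rule."""
    suffix = PurePosixPath(file_name).suffix.lower()
    magic = MAGIC_BYTES.get(suffix)
    if magic is None or not content.startswith(magic):
        return False                               # unknown suffix, or type and suffix disagree
    return file_name in DATA_FILE_RULE["file_names"]
```

In this sketch a file whose content does not match its suffix is simply not reported by the data file rule; the text does not say how such a mismatch is handled.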

In an alternative embodiment, the method further comprises:

acquiring a preset desensitization algorithm; and performing desensitization processing on the target sensitive data by using the preset desensitization algorithm to obtain desensitized data.

The invention carries out desensitization treatment on the identified sensitive data, and can avoid the leakage of the sensitive data.
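The text does not name a particular desensitization algorithm. As one common possibility, the following sketch masks the middle of each identified value; the function names and the default numbers of visible characters are assumptions.

```python
def mask_middle(value: str, keep_head: int = 3, keep_tail: int = 4, mask_char: str = "*") -> str:
    """Replace the middle of value with mask_char, keeping the head and tail visible."""
    if len(value) <= keep_head + keep_tail:
        return mask_char * len(value)              # too short to leave anything visible
    tail_start = len(value) - keep_tail
    return value[:keep_head] + mask_char * (tail_start - keep_head) + value[tail_start:]


def desensitize(text: str, sensitive_values: list) -> str:
    """Return the text with every identified sensitive value masked."""
    for value in sensitive_values:
        text = text.replace(value, mask_middle(value))
    return text
```

For example, `desensitize("call 13812345678", ["13812345678"])` yields `"call 138****5678"`.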

In a second aspect, the invention provides a sensitive data identification device based on a zero-trust protection system, which is used for a sensitive data identification system based on the zero-trust protection system, wherein the sensitive data identification system based on the zero-trust protection system is connected with a terminal access system; the sensitive data identification device based on the zero trust protection system comprises:

the generation module is used for generating at least one target message data based on the sensitive data identification request by the terminal access system when the sensitive data identification request exists; the acquisition module is used for acquiring target message data to be identified from each target message data generated by the terminal access system based on the sensitive data identification system of the zero trust protection system; the selection module is used for selecting at least one target desensitization rule in a preset desensitization rule set based on target message data to be identified; the processing and determining module is used for sequentially desensitizing the target message data to be identified by utilizing each target desensitizing rule, and determining target sensitive data existing in the target message data to be identified.

In a third aspect, the present invention provides a computer readable storage medium having stored thereon computer instructions for causing a computer to perform the sensitive data identification method based on the zero trust protection architecture of the first aspect or any one of its corresponding embodiments.

In a fourth aspect, the present invention provides a computer device comprising a memory and a processor, wherein the memory and the processor are communicatively connected, the memory stores computer instructions, and the processor executes the computer instructions to perform the sensitive data identification method based on the zero trust protection system according to the first aspect or any implementation manner corresponding to the first aspect.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are needed in the description of the embodiments or the prior art will be briefly described, and it is obvious that the drawings in the description below are some embodiments of the present invention, and other drawings can be obtained according to the drawings without inventive effort for a person skilled in the art.

FIG. 1 is a schematic diagram of a connection structure between a sensitive data identification system and a terminal access system based on a zero trust protection system according to an embodiment of the present invention;

FIG. 2 is a flow diagram of a method for identifying sensitive data based on a zero trust protection architecture according to an embodiment of the present invention;

FIG. 3 is a flow chart of another method for identifying sensitive data based on a zero trust protection architecture according to an embodiment of the present invention;

FIG. 4 is a flow chart of yet another method for identifying sensitive data based on a zero trust protection architecture according to an embodiment of the present invention;

FIG. 5A is a flow chart of yet another method for identifying sensitive data based on a zero trust protection architecture according to an embodiment of the present invention;

FIG. 5B is a flow chart of yet another method for identifying sensitive data based on a zero trust protection architecture according to an embodiment of the present invention;

FIG. 5C is a flow chart of yet another method for identifying sensitive data based on a zero trust protection architecture according to an embodiment of the present invention;

FIG. 6 is a schematic illustration of an end user access flow in accordance with an embodiment of the invention;

FIG. 7 is a schematic diagram of a desensitization algorithm according to an embodiment of the invention;

FIG. 8 is a schematic diagram of a desensitization rule according to an embodiment of the invention;

FIG. 9 is a block diagram of a sensitive data identification device based on a zero trust protection architecture in accordance with an embodiment of the present invention;

FIG. 10 is a schematic diagram of a hardware structure of a computer device according to an embodiment of the present invention.

Detailed Description

For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments of the present invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.

Currently, the most widely used sensitive data discovery method performs identification by combining machine learning with natural language processing (NLP). However, because of the following problems, the prior art cannot guarantee the accuracy and efficiency of sensitive data identification:

complexity of data acquisition and processing: sensitive data discovery requires the acquisition and processing of large amounts of data, which may involve spanning multiple data sources and systems. Ensuring data integrity, consistency, and accuracy is a challenge. In addition, sensitive data discovery tools and platforms need to be able to efficiently process large-scale data and provide high performance and scalability.

False alarm and missing report problems: sensitive data discovery techniques may face false positives and false negatives. False positives refer to erroneously identifying non-sensitive data as sensitive data, while false negatives refer to failure to correctly identify and discover truly sensitive data. Accuracy and reliability are key indicators of sensitive data discovery, and algorithms and models need to be continuously improved and optimized to reduce false alarm and missing report situations.

Challenges of heterogeneous data: an organization's sensitive data may be stored in different formats and systems, including databases, file systems, cloud storage, and the like. To process and access such heterogeneous data, the problems of data integration and format conversion need to be solved to ensure that the sensitive data can be comprehensively covered and analyzed.

In view of the above problems, the invention provides a sensitive data identification method based on a zero trust protection system, which improves the identification accuracy and efficiency of sensitive data through means such as automation and multidimensional analysis, helps organizations discover and protect sensitive data, and copes with growing data security challenges.

In accordance with an embodiment of the present invention, there is provided an embodiment of a sensitive data identification method based on a zero trust protection architecture, it being noted that the steps illustrated in the flowchart of the figures may be performed in a computer system, such as a set of computer executable instructions, and that, although a logical order is illustrated in the flowchart, in some cases, the steps illustrated or described may be performed in an order other than that illustrated herein.

In this embodiment, a method for identifying sensitive data based on a zero trust protection system is provided, which can be used for a sensitive data identification system 1 based on a zero trust protection system, as shown in fig. 1, where the sensitive data identification system 1 based on the zero trust protection system is connected with a terminal access system 2; FIG. 2 is a flow chart of a method for identifying sensitive data based on a zero trust protection architecture according to an embodiment of the present invention, as shown in FIG. 2, the flow includes the following steps:

Step S201, when there is a sensitive data identification request, the terminal access system generates at least one target message data based on the sensitive data identification request.

Specifically, when the terminal access system 2 receives the sensitive data identification request, the terminal access system 2 generates at least one corresponding target message data according to the received sensitive data identification request.

Step S202, a sensitive data identification system based on a zero trust protection system acquires target message data to be identified from each target message data generated by a terminal access system.

Specifically, the sensitive data identification system 1 based on the zero trust protection system and the terminal access system 2 are independent from each other, so when sensitive data identification is performed in the sensitive data identification system 1 based on the zero trust protection system, firstly, message data of sensitive data to be identified, namely target message data to be identified, needs to be obtained from the terminal access system 2.

Further, the sensitive data identification system 1 and the terminal access system 2 based on the zero trust protection system are independent from each other, so that the user access experience is not affected.

Step S203, selecting at least one target desensitization rule in a preset desensitization rule set based on the target to-be-identified message data.

The preset desensitization rule set may include a data item type desensitization rule, a data set type desensitization rule, a data file type desensitization rule, and the like.

Specifically, one or more proper desensitization rules can be selected in a preset desensitization rule set according to the obtained target message data to be identified, and further, most data types and sensitive data identification requirements can be met, the sensitive data can be fully covered and analyzed as far as possible, and the identification accuracy and identification efficiency of the sensitive data are improved.

Step S204, desensitizing the target message data to be identified by utilizing each target desensitizing rule in turn, and determining target sensitive data existing in the target message data to be identified.

Specifically, the target message data to be identified is desensitized by using one or more selected target desensitization rules, so that sensitive data in the target message data to be identified, namely target sensitive data, can be identified and found.

According to the sensitive data identification method based on the zero trust protection system, sensitive data identification is performed in the sensitive data identification system based on the zero trust protection system, and the sensitive data identification system based on the zero trust protection system and the terminal access system are mutually independent, so that user access experience is not affected. Further, different target desensitization rules can be selected according to the acquired target message data to be identified, one or a plurality of target desensitization rules can be adopted, most data types and sensitive data identification requirements can be met, the sensitive data can be fully covered and analyzed as far as possible, and the identification accuracy and the identification efficiency of the sensitive data are improved.
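Steps S203 and S204 can be sketched together as a small pipeline: select the applicable rules for a message, apply each in turn, and collect what they find. The sketch is illustrative only; the dictionary message format, the two rule functions, and the `.xlsx` criterion are hypothetical stand-ins for a real preset desensitization rule set.

```python
import re


def data_item_rule(message: dict) -> list:
    """Hypothetical data item rule: report mobile-number-like strings in the body."""
    return re.findall(r"(?<!\d)1[3-9]\d{9}(?!\d)", message["body"])


def data_file_rule(message: dict) -> list:
    """Hypothetical data file rule: report transmitted spreadsheet files."""
    return [name for name in message.get("files", []) if name.endswith(".xlsx")]


# Assumed shape of the preset rule set: rule type -> rule function.
RULE_SET = {"data_item": data_item_rule, "data_file": data_file_rule}


def select_rules(message: dict) -> list:
    """Step S203: pick one or more target rules based on the message."""
    rules = [RULE_SET["data_item"]]
    if message.get("files"):
        rules.append(RULE_SET["data_file"])
    return rules


def identify(message: dict) -> list:
    """Step S204: apply each selected rule in turn and collect the sensitive data."""
    found = []
    for rule in select_rules(message):
        found.extend(rule(message))
    return found
```

Because the pipeline runs on messages already produced by the terminal access system, it can execute in the independent identification system without sitting in the user's request path.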

In this embodiment, a method for identifying sensitive data based on a zero trust protection system is provided, which can be used for a sensitive data identification system 1 based on a zero trust protection system, as shown in fig. 1, where the sensitive data identification system 1 based on the zero trust protection system is connected with a terminal access system 2; FIG. 3 is a flow chart of a method for identifying sensitive data based on a zero trust protection architecture according to an embodiment of the present invention, as shown in FIG. 3, the flow includes the following steps:

Step S301, when there is a sensitive data identification request, the terminal access system generates at least one target message data based on the sensitive data identification request.

As shown in fig. 1, the terminal access system 2 includes: a user terminal 21, a gateway proxy 22, a storage system 23 and application services 24. Further, the number of application services 24 may be one or more.

Specifically, the step S301 includes:

Step S3011, the user terminal establishes a sensitive data identification request, and sends the sensitive data identification request to the gateway proxy.

Specifically, the user of the user terminal 21 creates a sensitive data identification task, which may include information such as the task deadline, the application systems (services) on which the task is to be performed, and the data types. Further, the user terminal establishes a corresponding sensitive data identification request based on the sensitive data identification task.

Further, after the sensitive data identification request is established, the corresponding sensitive data identification task can continuously run in the background of the terminal access system 2.

Further, the user terminal 21 sends the sensitive data identification request to the gateway proxy 22.

Step S3012, the gateway proxy obtains at least one initial message data of each application service from the corresponding application service based on the sensitive data identification request.

Specifically, after receiving the sensitive data identification request, the gateway proxy 22 first parses the request to obtain at least one corresponding access address (uniform resource locator, URL). The gateway proxy 22 then accesses the corresponding application service 24 by proxy according to each access address, and acquires the response data of each application service 24, i.e., the initial message data. Here, a URL, also called a "web address", is an identifier that fully describes the address of a web page or other resource on the Internet.

Step S3013, the gateway agent analyzes each initial message data to obtain at least one target message data, and sends each target message data to a storage system for storage.

Specifically, the gateway proxy 22 parses each obtained initial message data to obtain corresponding target message data, and then asynchronously sends each obtained target message data to the storage system 23 for storage in order to reduce the user access latency.

Further, gateway proxy 22 feeds back the response result to user terminal 21.

Step S302, a sensitive data identification system based on a zero trust protection system acquires target message data to be identified from each target message data generated by a terminal access system.

Specifically, the step S302 includes:

Step S3021, the sensitive data identification system based on the zero trust protection system acquires initial message data to be identified from each target message data generated by the terminal access system.

Specifically, when the sensitive data identification request starts to run, the sensitive data identification system 1 based on the zero trust protection system continuously acquires the message data, i.e. the initial message data to be identified, from the storage system 23 in the terminal access system 2.

Step S3022, determining whether the initial message data to be identified has been identified by the sensitive data identification system based on the zero trust protection system.

Specifically, whether the initial message data to be identified has already undergone sensitive data identification, that is, whether it has already been identified by the sensitive data identification system based on the zero trust protection system, can be judged according to the access URL.

Step S3023, if the initial message data to be identified is not identified by the sensitive data identification system based on the zero trust protection system, the initial message data to be identified is determined as the target message data to be identified.

Specifically, if it is determined that the initial message data to be identified has not yet undergone sensitive data identification, the initial message data to be identified is used as the message data whose sensitive data is to be identified, namely the target message data to be identified.

Step S3024, if the initial message data to be identified has been identified by the sensitive data identification system based on the zero trust protection system, continuing to acquire the initial message data to be identified from each target message data generated by the terminal access system until the acquired initial message data to be identified is not identified by the sensitive data identification system based on the zero trust protection system, thereby obtaining the target message data to be identified.

Specifically, if it is determined that the initial message data to be identified has already undergone sensitive data identification, new message data continues to be obtained from the target message data generated by the terminal access system until target message data to be identified, which has not been identified by the sensitive data identification system based on the zero trust protection system, is obtained.
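Steps S3021 to S3024 amount to skipping messages whose access URL has already been processed. A minimal sketch follows, assuming that each message is a dictionary with a `url` key and that the record of identified URLs is an in-memory set; both are illustrative choices, and marking a URL as identified at the moment it is selected is likewise an assumption.

```python
from typing import Iterable, Optional

identified_urls = set()         # URLs whose messages were already identified


def next_unidentified(messages: Iterable[dict]) -> Optional[dict]:
    """Return the first message whose URL has not been identified yet, or None."""
    for message in messages:
        if message["url"] in identified_urls:
            continue                         # already identified: keep fetching (S3024)
        identified_urls.add(message["url"])  # assumed bookkeeping: mark on selection
        return message                       # becomes the target message (S3023)
    return None
```

Passing an iterator over the stored messages lets repeated calls resume where the previous call stopped, so each stored message is examined at most once.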

Step S303, selecting at least one target desensitization rule in a preset desensitization rule set based on the target to-be-identified message data. For details, refer to step S203 in the embodiment shown in FIG. 2; they are not repeated here.

Step S304, desensitizing the target message data to be identified by utilizing each target desensitizing rule in turn, and determining target sensitive data existing in the target message data to be identified. For details, refer to step S204 in the embodiment shown in FIG. 2; they are not repeated here.

In the sensitive data identification method based on the zero trust protection system provided by this embodiment, the message data is generated in the terminal access system independently of the sensitive data identification process, so the user access experience is not affected. Further, when the sensitive data identification system based on the zero trust protection system acquires the message data to be identified from the terminal access system, each acquired piece of message data is checked, and only message data that has not yet undergone sensitive data identification is processed, which improves the sensitive data identification efficiency.

In this embodiment, a method for identifying sensitive data based on a zero trust protection system is provided, which can be used for a sensitive data identification system 1 based on a zero trust protection system, as shown in fig. 1, where the sensitive data identification system 1 based on the zero trust protection system is connected with a terminal access system 2; FIG. 4 is a flowchart of a method for identifying sensitive data based on a zero trust protection architecture according to an embodiment of the present invention, as shown in FIG. 4, the flowchart comprising the steps of:

Step S401, when there is a sensitive data identification request, the terminal access system generates at least one target message data based on the sensitive data identification request. For details, refer to step S301 in the embodiment shown in fig. 3; the details are not repeated here.

Step S402, the sensitive data identification system based on the zero trust protection system acquires target message data to be identified from each target message data generated by the terminal access system. For details, refer to step S302 in the embodiment shown in fig. 3; the details are not repeated here.

Step S403, selecting at least one target desensitization rule in a preset desensitization rule set based on the target message data to be identified.

Specifically, the step S403 includes:

Step S4031, when the target message data to be identified exists, selecting the data item type desensitization rule from the preset desensitization rule set as the target desensitization rule.

Specifically, as long as there is message data to be identified by a target, a data item type desensitization rule in a preset desensitization rule set may be used as a target desensitization rule.

Step S4032, when the target message data to be identified is structured data, selecting the data set type desensitization rule from the preset desensitization rule set as the target desensitization rule.

Specifically, when the target message data to be identified is structured data, the data set type desensitization rule in the preset desensitization rule set can be used as the target desensitization rule.

In step S4033, when the target message data to be identified includes file transmission, a data file type desensitization rule is selected from the preset desensitization rule set as a target desensitization rule.

Specifically, when the target message data to be identified includes file transmission, the data file type desensitization rule in the preset desensitization rule set may be used as the target desensitization rule.

Further, when the target message data to be identified simultaneously accords with the plurality of conditions described above, each desensitization rule corresponding to each condition can be used as the target desensitization rule.
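The selection logic of steps S4031 to S4033 can be sketched as below. This is a minimal sketch under stated assumptions: a message is a dictionary with an optional `body` string and an optional `files` list, and "structured data" is taken to mean a body that parses as a JSON object or array. The function names and the rule type labels are hypothetical.

```python
import json
from typing import List, Optional


def is_structured(body: str) -> bool:
    """Assumption: structured data means a JSON object or array."""
    try:
        parsed = json.loads(body)
    except (ValueError, TypeError):
        return False
    return isinstance(parsed, (dict, list))


def select_rule_types(message: Optional[dict]) -> List[str]:
    """Return every rule type whose condition the message satisfies."""
    selected = []
    if message is None:
        return selected
    selected.append("data_item")            # S4031: any message qualifies
    if is_structured(message.get("body", "")):
        selected.append("data_set")         # S4032: structured data
    if message.get("files"):
        selected.append("data_file")        # S4033: contains a file transmission
    return selected


all_three = select_rule_types({"body": '{"a": 1}', "files": ["report.pdf"]})
only_item = select_rule_types({"body": "plain text"})
```

A message that meets several conditions at once receives several rule types, which matches the statement that each corresponding rule can be used as a target desensitization rule.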

Step S404, performing desensitization processing on the target message data to be identified by using each target desensitization rule in turn, and determining the target sensitive data existing in the target message data to be identified. For details, refer to step S204 in the embodiment shown in fig. 2; the details are not repeated here.

Step S405, acquiring a preset desensitizing algorithm.

Specifically, the preset desensitization algorithm may include three algorithms: replacement with a fixed character, replacement with a specified value, and replacement with a calculated value.

Replacement with a specified value includes three strategies: a fixed value, erasing the data, and a data label. Replacement with a calculated value includes algorithms such as BASE64, MD5, CRC32, and Chinese address masking.

And step S406, performing desensitization processing on the target sensitive data by using a preset desensitization algorithm to obtain desensitized data.

Specifically, the desensitization treatment flow is as follows:

(1) Fixed character: according to the replacement order and the replacement strategy set by the algorithm, the positions of the part of the sensitive data to be replaced are calculated; the sensitive data is then traversed, and the content at those positions is replaced with the replacement character set by the algorithm;

(2) Specified value: the sensitive data part is replaced with the value specified by the algorithm; under the erase strategy the sensitive data part is cleared, and under the data label strategy it is replaced with the data label to which the rule belongs;

(3) Calculated value: the sensitive data content is processed by the preset calculation algorithm; for example, a message digest algorithm may be applied to the data, or the address information or numbers in the sensitive data content may be replaced with encrypted character strings.
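The three kinds of replacement can be illustrated with the following sketch. It assumes that the fixed character strategy keeps a configurable number of leading and trailing characters and masks the rest; the embodiment does not specify the position calculation, so `keep_head`, `keep_tail` and all function names are hypothetical. Chinese address masking is omitted because its format is not described.

```python
import base64
import hashlib
import zlib


def mask_fixed_char(value: str, keep_head: int = 1, keep_tail: int = 1, char: str = "*") -> str:
    """Fixed character: replace the middle positions with a replacement character."""
    if len(value) <= keep_head + keep_tail:
        return char * len(value)
    middle = len(value) - keep_head - keep_tail
    return value[:keep_head] + char * middle + value[len(value) - keep_tail:]


def replace_specified(value: str, strategy: str, fixed: str = "", label: str = "") -> str:
    """Specified value: fixed value, erase, or data label."""
    if strategy == "fixed":
        return fixed
    if strategy == "erase":
        return ""
    if strategy == "label":
        return label
    raise ValueError(f"unknown strategy: {strategy}")


def replace_calculated(value: str, algorithm: str) -> str:
    """Calculated value: BASE64, MD5 or CRC32 of the UTF-8 bytes."""
    data = value.encode("utf-8")
    if algorithm == "base64":
        return base64.b64encode(data).decode("ascii")
    if algorithm == "md5":
        return hashlib.md5(data).hexdigest()
    if algorithm == "crc32":
        return format(zlib.crc32(data), "08x")
    raise ValueError(f"unknown algorithm: {algorithm}")


masked = mask_fixed_char("abc123")                 # "a****3"
erased = replace_specified("abc123", "erase")      # ""
encoded = replace_calculated("abc123", "base64")   # "YWJjMTIz"
```

Note that BASE64 is a reversible encoding and CRC32 is a checksum, so whether either is sufficient as a desensitization result depends on how the desensitized data is used.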

According to the sensitive data identification method based on the zero trust protection system, different target desensitization rules can be selected according to the acquired target message data to be identified, one or more target desensitization rules can be selected, most data types and sensitive data identification requirements can be met, the sensitive data can be fully covered and analyzed as much as possible, and the identification accuracy and the identification efficiency of the sensitive data are improved. Further, the desensitization processing is carried out on the identified sensitive data, so that the leakage of the sensitive data can be avoided.

In this embodiment, a method for identifying sensitive data based on a zero trust protection system is provided, which can be used for a sensitive data identification system 1 based on a zero trust protection system, as shown in fig. 1, where the sensitive data identification system 1 based on the zero trust protection system is connected with a terminal access system 2; fig. 5A-5C are flowcharts of a method for identifying sensitive data based on a zero trust protection architecture according to an embodiment of the present invention, as shown in fig. 5A-5C, the flowchart comprising the steps of:

Step S501, when there is a sensitive data identification request, the terminal access system generates at least one target message data based on the sensitive data identification request. For details, refer to step S301 in the embodiment shown in fig. 3; the details are not repeated here.

Step S502, the sensitive data identification system based on the zero trust protection system acquires target message data to be identified from each target message data generated by the terminal access system. For details, refer to step S302 in the embodiment shown in fig. 3; the details are not repeated here.

Step S503, selecting at least one target desensitization rule from a preset desensitization rule set based on the target message data to be identified. For details, refer to step S403 in the embodiment shown in fig. 4; the details are not repeated here.

Step S504, desensitizing the target message data to be identified by utilizing each target desensitizing rule in turn, and determining target sensitive data existing in the target message data to be identified.

Preferably, as shown in fig. 5A, the step S504 includes:

Step S5041A, it is determined whether the target message data to be identified includes data conforming to the data item type desensitization rule.

The data item type desensitization rule judges the sensitivity of part of the content in the target message data to be identified by means of a configured regular expression or keyword. For example, under a data item type desensitization rule, the data part containing the keyword password is sensitive data.

Specifically, it is judged whether the target message data to be identified contains the regular expression match or keyword set in the data item type desensitization rule.

In step S5042A, if the target message data to be identified includes data conforming to the data item type desensitization rule, the data conforming to the data item type desensitization rule is determined as target sensitive data.

Specifically, if the target message data to be identified contains a keyword set in a data item type desensitization rule, the data item identification corresponding to the keyword in the target message data to be identified is marked as target sensitive data.

For example, when the message data is {"username": "wangcl", "password": "abc123"}, the keyword password conforms to the configured data item type desensitization rule, and abc123 is therefore the target sensitive data.
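The keyword example above can be reproduced with a short sketch. It assumes the message body is JSON and that a data item rule consists of a set of key names plus optional compiled regular expressions applied to string values; `find_data_items` and its parameters are hypothetical names.

```python
import json
import re


def find_data_items(body, keywords, patterns=()):
    """Return (rule, value) pairs for keys matching a keyword or strings matching a regex."""
    hits = []

    def walk(node):
        if isinstance(node, dict):
            for key, value in node.items():
                if key in keywords:
                    hits.append((key, value))   # keyword hit: the value is sensitive
                else:
                    walk(value)
        elif isinstance(node, list):
            for item in node:
                walk(item)
        elif isinstance(node, str):
            for pattern in patterns:
                hits.extend((pattern.pattern, match) for match in pattern.findall(node))

    walk(json.loads(body))
    return hits


keyword_hits = find_data_items('{"username": "wangcl", "password": "abc123"}', {"password"})
regex_hits = find_data_items('{"note": "call 13800000000"}', set(), [re.compile(r"\d{11}")])
```

For the message in the example, `keyword_hits` is `[("password", "abc123")]`, so `abc123` is what would be marked as target sensitive data.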

Preferably, as shown in fig. 5B, the step S504 further includes:

Step S5041B, it is determined whether the target message data to be identified is structured data.

Specifically, it can be judged whether the target message data to be identified is structured data, so that false positives and missed detections of sensitive data are avoided.

In step S5042B, when the target to-be-identified message data is structured data, the data structure of the target to-be-identified message data is parsed to obtain the target data structure.

Specifically, when the target to-be-identified message data is structured data, the data structure of the target to-be-identified message data needs to be analyzed, and the data structure corresponding to the target to-be-identified message data, namely, the target data structure, is obtained.

In step S5043B, it is determined whether the target data structure is included in the data set type desensitization rule.

Specifically, the data set type desensitization rule includes a data structure template uploaded in advance. Therefore, it can be further determined whether the target data structure of the target message data to be identified is included in the data set type desensitization rule.

Step S5044B, if the data set type desensitization rule includes a target data structure, determining whether the target message data to be identified includes a preset flag feature value.

Specifically, if the target data structure is consistent with the pre-uploaded data structure template in the dataset type desensitization rule, the dataset type desensitization rule includes the target data structure.

Further, the data set type desensitization rule judges the integral characteristic value of the request message data, so that whether the target message data to be identified contains the preset mark characteristic value needs to be further judged.

In step S5045B, if the target to-be-identified message data includes the preset flag feature value, the target to-be-identified message data is determined as the target sensitive data.

Specifically, if the target to-be-identified message data contains a preset marking characteristic value, the overall target to-be-identified message data is identified and marked as target sensitive data.
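Steps S5041B to S5045B might look like the following sketch. The assumptions are loud ones: structured data is taken to be JSON, the "data structure" is reduced to key names and value types, the pre-uploaded template is a sample object of the same shape, and the preset flag feature value is a single field/value pair. None of these representations is fixed by the embodiment, and all names are hypothetical.

```python
import json


def structure_of(node):
    """Reduce parsed JSON to its shape: key names and value types, without values."""
    if isinstance(node, dict):
        return {key: structure_of(value) for key, value in node.items()}
    if isinstance(node, list):
        return [structure_of(node[0])] if node else []
    return type(node).__name__


def matches_data_set_rule(body, template, flag_field, flag_value):
    """True when the whole message should be marked as target sensitive data."""
    try:
        parsed = json.loads(body)                        # S5041B: structured?
    except ValueError:
        return False
    if structure_of(parsed) != structure_of(template):   # S5042B, S5043B: same structure?
        return False
    # S5044B, S5045B: does the message carry the preset flag feature value?
    return isinstance(parsed, dict) and parsed.get(flag_field) == flag_value


template = {"name": "", "id_card": "", "level": ""}
hit = matches_data_set_rule(
    '{"name": "wang", "id_card": "1234", "level": "confidential"}',
    template, "level", "confidential",
)
miss = matches_data_set_rule(
    '{"name": "wang", "id_card": "1234", "level": "public"}',
    template, "level", "confidential",
)
```

Unlike the data item rule, a hit here marks the entire message rather than a single field.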

Preferably, as shown in fig. 5C, the step S504 further includes:

S5041C, judging whether the target message data to be identified contains file transmission.

Specifically, it can be judged whether the target message data to be identified contains a file transmission, so that false positives and missed detections of sensitive data are avoided.

S5042C, when the target message data to be identified contains file transmission data, judging whether the file type corresponding to the file transmission data is consistent with the file suffix.

Specifically, when the target message data to be identified includes file transmission data, it is further required to determine whether the file type corresponding to the file transmission data is consistent with the file suffix.

S5043C, if the file type corresponding to the file transmission data is consistent with the file suffix, judging whether the data file type desensitization rule contains the file name corresponding to the file transmission data.

Specifically, if the file type corresponding to the file transmission data is consistent with the file suffix, the data file type desensitization rule is executed in sequence, and whether the file name corresponding to the file transmission data is hit by the data file type desensitization rule is checked, namely whether the file name corresponding to the file transmission data is contained in the data file type desensitization rule is judged.

S5044C, if the file name is included in the data file type desensitization rule, determines the file transfer data as target sensitive data.

Specifically, if the data file type desensitization rule includes the file name, the corresponding file transmission data in the target message data to be identified is used as target sensitive data.
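A possible reading of steps S5042C to S5044C is sketched below. It assumes the file type is detected from leading magic bytes, that only three illustrative types are known, and that the data file rule holds shell-style file name patterns; the embodiment does not say how the type is detected or how names are matched, and it does not say what happens when type and suffix disagree (the sketch simply returns False). All names are hypothetical.

```python
import fnmatch
import os

# Well-known leading bytes for a few file types (illustrative subset).
MAGIC = {
    ".pdf": b"%PDF",
    ".png": b"\x89PNG\r\n\x1a\n",
    ".zip": b"PK\x03\x04",
}


def type_matches_suffix(file_name: str, content: bytes) -> bool:
    """S5042C: is the detected file type consistent with the file suffix?"""
    suffix = os.path.splitext(file_name)[1].lower()
    magic = MAGIC.get(suffix)
    return magic is not None and content.startswith(magic)


def is_sensitive_file(file_name: str, content: bytes, name_patterns) -> bool:
    """S5043C, S5044C: check the file name against the data file rule patterns."""
    if not type_matches_suffix(file_name, content):
        return False
    return any(fnmatch.fnmatchcase(file_name.lower(), p) for p in name_patterns)


patterns = ["salary*"]
sensitive = is_sensitive_file("salary_2023.pdf", b"%PDF-1.7 ...", patterns)
disguised = is_sensitive_file("salary_2023.pdf", b"PK\x03\x04", patterns)
unrelated = is_sensitive_file("notes.pdf", b"%PDF-1.4", patterns)
```

Only the first call returns True: the second file's content does not match its suffix, and the third file's name is not hit by the rule.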

According to the sensitive data identification method based on the zero trust protection system, different desensitization rules are selected according to the target message data to be identified to conduct sensitive data identification processing, so that most data types and sensitive data identification requirements are met, sensitive data can be covered and analyzed as far as possible, and identification accuracy and identification efficiency of the sensitive data are improved.

In one example, a sensitive data discovery method is provided, which is mainly used to protect the sensitive data of multiple application systems. In a real scenario, the data formats and request modes of the application systems differ, and an administrator cannot fully know which applications contain sensitive data. Therefore, for convenience and security, the sensitive data discovery process is automated, and the discovery task is independent of the application access process.

In practical application, the end user access flow is shown in fig. 6.

For the data discovery part, the method mainly comprises a preparation stage and an identification stage, and comprises the following specific steps:

1. Preparation stage:

1. Desensitization algorithm setup

The desensitization algorithm is the way in which sensitive data is processed, and comprises three modes: replacement with a fixed character, replacement with a specified value, and replacement with a calculated value. Replacement with a specified value comprises three strategies: a fixed value, erasing the data, and a data label. Replacement with a calculated value comprises algorithms such as BASE64, MD5, CRC32, and Chinese address masking, as shown in fig. 7. BASE64 is an encoding algorithm that converts binary data into a textual representation of printable characters; MD5 is a common hash function that maps data of any length to a hash value of fixed length; CRC32 is a cyclic redundancy check algorithm used to detect errors during data transmission or storage.

2. Desensitization rule set

The desensitization rule defines what counts as sensitive data; whether data is sensitive is identified mainly by judging the characteristic value of the data source. As shown in fig. 8, the desensitization rules are mainly classified into the following three categories:

(1) Data item

After the request message data is parsed, the sensitivity of part of the content in the interaction data is judged: a regular expression or keyword is set to determine whether data meeting the condition is contained, and such data is identified and marked as sensitive data. For example, the manager sets a data item type desensitization rule that treats the data part containing the keyword password as sensitive data;

(2) Data set

The data set is used for judging the integral characteristic value of the request message data, a manager sets the characteristic value through uploading a data template, and when the request message data has the characteristic value, the integral request message data is identified and marked as sensitive data;

(3) Data file

The data file rule parses the request message data and checks whether it contains a file transmission; the file is then checked against the data file desensitization rule, and if the rule is met, the file is identified and marked as sensitive data.

2. Identification stage:

Sensitive data identification and discovery uses all desensitization rules and is mainly divided into the following steps:

1. Creating a sensitive data discovery task

The manager needs to create a discovery task on the management platform, and the task mainly comprises information such as task deadline, an application system needing to execute the task, data types and the like. When the task is started to run, all desensitization rules are used by default, and message data are continuously acquired from a storage system.

2. Request message parsing

When a terminal request arrives and the returned data has been obtained, the platform parses the message data, asynchronously sends it to the storage system, and returns the response. For example, when the user accesses http:// www.sys.com/userinfocus=wangcc, the detailed steps are as follows:

1) The user network request arrives at the gateway, and the gateway acquires a corresponding application system according to the access domain name;

2) The gateway analyzes the user request, and the proxy accesses the application system to acquire the response data of the application system;

3) The gateway parses the response message of the application system; to reduce the time the user waits for access, the gateway sends the message to the storage system asynchronously and then returns the response result to the user.
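The asynchronous hand-off in step 3) can be sketched with an in-process queue and a background writer thread. This is only an illustration of the idea that the user response does not wait for storage; the embodiment does not name a mechanism, and `storage_queue`, `storage_writer`, `handle_request` and the stand-in application callback are hypothetical.

```python
import queue
import threading

storage_queue = queue.Queue()   # unbounded, so put() does not block the request path
stored = []                     # stands in for the storage system


def storage_writer():
    """Background worker: persist messages until a None sentinel arrives."""
    while True:
        message = storage_queue.get()
        if message is None:
            break
        stored.append(message)


def handle_request(url, fetch_from_app):
    """Gateway path: proxy the request, enqueue the message, return at once."""
    response = fetch_from_app(url)                      # step 2: proxy access
    storage_queue.put({"url": url, "body": response})   # step 3: asynchronous send
    return response                                     # user is not kept waiting


writer = threading.Thread(target=storage_writer, daemon=True)
writer.start()
result = handle_request("/userinfo", lambda url: '{"username": "wangcl"}')
storage_queue.put(None)   # shut the worker down for this demonstration
writer.join()
```

After `writer.join()` returns, `stored` holds the one message that was enqueued, while `result` was already available to return to the user before the write completed.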

3. Identification marking stage

The data discovery program continuously reads the messages from the storage system and sequentially executes the desensitization rules to determine whether sensitive data is contained, and if the data meeting the conditions exists, the data is marked. The detailed steps are as follows:

1) The data discovery program obtains a piece of response message data after analysis from the storage system;

2) Judging, according to the access URL, whether sensitive data discovery has already been performed for the request; if not, continuing with the subsequent steps; otherwise, skipping the subsequent steps, acquiring new message data from the storage system, and restarting the judgment;

3) Acquiring the message data, sequentially executing the desensitization rules of all data item types, and checking whether the data contains a keyword set in the rules; if a keyword is hit, that part is sensitive data, and the URL of the request, the hit data and the corresponding desensitization rule information are recorded. For example, when the message data is {"username": "wangcl", "password": "abc123"}, the keyword password meets the desensitization rule set above, and abc123 is sensitive data.

4) Judging whether the message data is structured data, if so, executing the desensitization rule of all data set types in turn, analyzing the data structure, judging whether the data structure is consistent with the template file uploaded by the data set rule, if so, checking whether the data contains the characteristic value marked by the manager, and if so, marking the data.

5) Judging whether the message data contains file transmission or not, if so, judging whether the file type is consistent with the file suffix, then sequentially executing the desensitization rule of the data file type, checking whether the file name is hit by the desensitization rule, and if so, marking the data.

6) When the deadline of the sensitive data discovery task set by the manager is reached, the discovery program automatically stops running.
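The overall shape of the discovery program in steps 1) to 6) could be sketched as a loop bounded by the task deadline that records the URL, the hit data and the rule for each hit. The rule shown is a single hypothetical keyword rule; `run_discovery`, `password_rule` and the record fields are assumptions, and the sketch stops when storage is empty, whereas a real program would presumably wait for new messages.

```python
import json
import time


def password_rule(message):
    """Hypothetical data item rule: yield the value stored under 'password'."""
    body = json.loads(message["body"])
    if "password" in body:
        yield body["password"]


def run_discovery(fetch_next, rules, deadline, now=time.time):
    """Read messages until the deadline and record every rule hit."""
    records = []
    seen_urls = set()
    while now() < deadline:                  # step 6: stop at the task deadline
        message = fetch_next()               # step 1: read one stored message
        if message is None:
            break                            # assumption: stop when storage is empty
        if message["url"] in seen_urls:      # step 2: already discovered, skip
            continue
        seen_urls.add(message["url"])
        for rule_name, rule in rules:        # steps 3 to 5: run each rule in turn
            for hit in rule(message):
                records.append({"url": message["url"], "rule": rule_name, "data": hit})
    return records


pending = iter([
    {"url": "/login", "body": '{"username": "wangcl", "password": "abc123"}'},
    {"url": "/login", "body": '{"username": "wangcl", "password": "abc123"}'},
    {"url": "/ping", "body": '{"ok": true}'},
])
records = run_discovery(
    lambda: next(pending, None),
    [("password-keyword", password_rule)],
    deadline=time.time() + 60,
)
```

The duplicate `/login` message is skipped, so `records` contains a single entry for `abc123`.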

With the sensitive data discovery method provided by this embodiment, data items, data sets, data files and the like can be covered by flexibly setting the desensitization rules, so that the data types and desensitization requirements of most application systems can be met, and managers can combine the rules at will, which greatly improves extensibility. Further, the sensitive data discovery process is independent of the user access system and does not affect the user access experience.

In this embodiment, a sensitive data identification device based on a zero trust protection system is further provided, and the device is used to implement the foregoing embodiments and preferred embodiments, which have been described and will not be repeated. As used below, the term "module" may be a combination of software and/or hardware that implements a predetermined function. While the means described in the following embodiments are preferably implemented in software, implementation in hardware, or a combination of software and hardware, is also possible and contemplated.

The embodiment provides a sensitive data identification device based on a zero trust protection system, which is used for a sensitive data identification system based on the zero trust protection system, wherein the sensitive data identification system based on the zero trust protection system is connected with a terminal access system; as shown in fig. 9, includes:

The generating module 901 is configured to generate, when a sensitive data identification request exists, at least one target packet data based on the sensitive data identification request by the terminal access system.

The acquiring module 902 is configured to cause the sensitive data identification system based on the zero trust protection system to acquire target message data to be identified from each target message data generated by the terminal access system.

The selecting module 903 is configured to select at least one target desensitization rule from a preset desensitization rule set based on the target to-be-identified message data.

The processing and determining module 904 is configured to sequentially perform desensitization processing on the target to-be-identified message data by using each target desensitization rule, so as to determine target sensitive data existing in the target to-be-identified message data.

In some alternative embodiments, a terminal access system includes a user terminal, a gateway proxy, a storage system, and at least one application service; the generating module 901 includes:

The sending unit is used for the user terminal to establish a sensitive data identification request and send the sensitive data identification request to the gateway proxy.

The first acquisition unit is used for acquiring at least one initial message data of each application service from the corresponding application service based on the sensitive data identification request by the gateway agent.

The analyzing unit is used for analyzing each initial message data by the gateway agent to obtain at least one target message data, and sending each target message data to the storage system for storage.

In some alternative embodiments, the obtaining module 902 includes:

The second acquisition unit is used for causing the sensitive data identification system based on the zero trust protection system to acquire initial message data to be identified from each target message data generated by the terminal access system.

The first judging unit is used for judging whether the initial message data to be identified is already identified by the sensitive data identification system based on the zero trust protection system.

The first determining unit is used for determining the initial message data to be identified as target message data to be identified if the initial message data to be identified is not identified by the sensitive data identification system based on the zero trust protection system.

The third acquisition unit is used for, if the initial message data to be identified has already been identified by the sensitive data identification system based on the zero trust protection system, continuing to acquire initial message data to be identified from each target message data generated by the terminal access system until the acquired initial message data to be identified has not been identified by the sensitive data identification system based on the zero trust protection system, thereby obtaining the target message data to be identified.

In some alternative embodiments, the selection module 903 includes:

The first selection unit is used for selecting the data item type desensitization rule from the preset desensitization rule set as the target desensitization rule when the target message data to be identified exists.

The second selection unit is used for selecting the data set type desensitization rule from the preset desensitization rule set as the target desensitization rule when the target message data to be identified is structured data.

The third selection unit is used for selecting the data file type desensitization rule from the preset desensitization rule set as the target desensitization rule when the target message data to be identified contains file transmission.

In some alternative embodiments, the processing and determining module 904 includes:

The second judging unit is used for judging whether the target message data to be identified contains data conforming to the data item type desensitization rule.

The second determining unit is used for determining the data conforming to the data item type desensitization rule as target sensitive data if the target message data to be identified contains the data conforming to the data item type desensitization rule.

In some alternative embodiments, the processing and determining module 904 further includes:

The third judging unit is used for judging whether the target message data to be identified is structured data.

The analysis unit is used for parsing the data structure of the target message data to be identified to obtain a target data structure when the target message data to be identified is structured data.

The fourth judging unit is used for judging whether the data set type desensitization rule contains the target data structure.

The fifth judging unit is used for judging whether the target message data to be identified contains a preset flag feature value if the data set type desensitization rule contains the target data structure.

The third determining unit is used for determining the target message data to be identified as target sensitive data if the target message data to be identified contains the preset flag feature value.

The sixth judging unit is used for judging whether the target message data to be identified contains file transmission.

The seventh judging unit is used for judging, when the target message data to be identified contains file transmission data, whether the file type corresponding to the file transmission data is consistent with the file suffix.

The eighth judging unit is used for judging whether the data file type desensitization rule contains the file name corresponding to the file transmission data if the file type corresponding to the file transmission data is consistent with the file suffix.

The fourth determining unit is used for determining the file transmission data as the target sensitive data if the file name is contained in the data file type desensitization rule.

In some alternative embodiments, the sensitive data identification device based on the zero trust protection architecture further comprises:

The first acquisition module is used for acquiring a preset desensitization algorithm.

The processing module is used for carrying out desensitization processing on the target sensitive data by utilizing a preset desensitization algorithm to obtain desensitized data.

Further functional descriptions of the above respective modules and units are the same as those of the above corresponding embodiments, and are not repeated here.

The sensitive data identification device based on the zero trust protection system in this embodiment is presented in the form of functional units, where a unit refers to an ASIC (Application Specific Integrated Circuit), a processor and memory executing one or more software or firmware programs, and/or other devices that can provide the above-described functionality.

The embodiment of the invention also provides computer equipment, which is provided with the sensitive data identification device based on the zero trust protection system shown in the figure 9.

Referring to fig. 10, fig. 10 is a schematic structural diagram of a computer device according to an alternative embodiment of the present invention. As shown in fig. 10, the computer device includes: one or more processors 10, memory 20, and interfaces for connecting the various components, including high-speed interfaces and low-speed interfaces. The various components are communicatively coupled to each other using different buses and may be mounted on a common motherboard or in other manners as desired. The processor may process instructions executed within the computer device, including instructions stored in or on memory to display graphical information of a GUI on an external input/output device, such as a display device coupled to the interface. In some alternative embodiments, multiple processors and/or multiple buses may be used, if desired, along with multiple memories. Also, multiple computer devices may be connected, each providing a portion of the necessary operations (e.g., as a server array, a set of blade servers, or a multiprocessor system). One processor 10 is illustrated in fig. 10.

The processor 10 may be a central processor, a network processor, or a combination thereof. The processor 10 may further include a hardware chip. The hardware chip may be an application specific integrated circuit, a programmable logic device, or a combination thereof. The programmable logic device may be a complex programmable logic device, a field programmable gate array, generic array logic, or any combination thereof.

The memory 20 stores instructions executable by the at least one processor 10 to cause the at least one processor 10 to implement the method of the embodiments described above.

The memory 20 may include a storage program area that may store an operating system, at least one application program required for functions, and a storage data area; the storage data area may store data created according to the use of the computer device, etc. In addition, the memory 20 may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid-state storage device. In some alternative embodiments, memory 20 may optionally include memory located remotely from processor 10, which may be connected to the computer device via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.

Memory 20 may include volatile memory, such as random access memory; the memory may also include non-volatile memory, such as flash memory, hard disk, or solid state disk; the memory 20 may also comprise a combination of the above types of memories.

The computer device also includes a communication interface 30 for the computer device to communicate with other devices or communication networks.

The embodiments of the present invention also provide a computer-readable storage medium. The method according to the embodiments of the present invention described above may be implemented in hardware or firmware, or as computer code recorded on a storage medium, or as computer code that is originally stored in a remote storage medium or a non-transitory machine-readable storage medium, downloaded through a network, and stored in a local storage medium, so that the method described herein can be carried out by such software stored on a storage medium using a general-purpose computer, a special-purpose processor, or programmable or special-purpose hardware. The storage medium may be a magnetic disk, an optical disk, a read-only memory, a random access memory, a flash memory, a hard disk, a solid state disk, or the like; further, the storage medium may also comprise a combination of memories of the kinds described above. It will be appreciated that a computer, processor, microprocessor controller, or programmable hardware includes a storage element that can store or receive software or computer code that, when accessed and executed by the computer, processor, or hardware, implements the methods illustrated by the above embodiments.

Although embodiments of the present invention have been described in connection with the accompanying drawings, various modifications and variations may be made by those skilled in the art without departing from the spirit and scope of the invention, and such modifications and variations fall within the scope of the invention as defined by the appended claims.

Claims

1. The sensitive data identification method based on the zero trust protection system is characterized by being used for a sensitive data identification system based on the zero trust protection system, wherein the sensitive data identification system based on the zero trust protection system is connected with a terminal access system; the method comprises the following steps:

when a sensitive data identification request exists, the terminal access system generates at least one target message data based on the sensitive data identification request;

the sensitive data identification system based on the zero trust protection system acquires target message data to be identified from each target message data generated by the terminal access system;

selecting at least one target desensitization rule in a preset desensitization rule set based on the target message data to be identified;

sequentially desensitizing the target message data to be identified by utilizing each target desensitizing rule, and determining target sensitive data existing in the target message data to be identified;

wherein selecting at least one target desensitization rule in a preset desensitization rule set based on the target message data to be identified comprises:

when the target message data to be identified exists, selecting a data item type desensitization rule from the preset desensitization rule set as the target desensitization rule;

when the target message data to be identified is structured data, selecting a data set type desensitization rule from the preset desensitization rule set as the target desensitization rule;

when the target message data to be identified contains file transmission, selecting a data file type desensitization rule from the preset desensitization rule set as the target desensitization rule;

wherein sequentially desensitizing the target message data to be identified by utilizing each target desensitization rule, and determining target sensitive data existing in the target message data to be identified, comprises:

judging whether the target message data to be identified is the structured data or not;

when the target message data to be identified is the structured data, analyzing the data structure of the target message data to be identified to obtain a target data structure;

judging whether the data set type desensitization rule contains the target data structure or not;

if the data set type desensitization rule contains the target data structure, judging whether the target message data to be identified contains a preset mark characteristic value or not;

if the target message data to be identified contains the preset mark characteristic value, determining the target message data to be identified as the target sensitive data;

wherein sequentially desensitizing the target message data to be identified by utilizing each target desensitization rule, and determining target sensitive data existing in the target message data to be identified, further comprises:

judging whether the target message data to be identified contains the file transmission or not;

when the target message data to be identified contains the file transmission data, judging whether the file type corresponding to the file transmission data is consistent with a file suffix;

if the file type corresponding to the file transmission data is consistent with the file suffix, judging whether the data file type desensitization rule contains the file name corresponding to the file transmission data;

and if the file name is contained in the data file type desensitization rule, determining the file transmission data as the target sensitive data.
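
The rule selection and the two decision chains recited in claim 1 can be illustrated by the following non-limiting sketch. It is not the patented implementation: the `Message` and `RuleSet` containers, their field names, the use of substring matching for the preset mark characteristic value, and the comparison of the file suffix against a separately detected file type are all assumptions made only for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set, Tuple


@dataclass
class Message:
    """Hypothetical container for target message data to be identified."""
    body: str
    structure: Optional[Tuple[str, ...]] = None  # field names, if structured
    file_name: Optional[str] = None              # set when a file is transmitted
    file_type: Optional[str] = None              # detected type, e.g. "pdf"


@dataclass
class RuleSet:
    """Hypothetical preset desensitization rule set."""
    data_set_structures: Set[Tuple[str, ...]] = field(default_factory=set)
    flag_values: Set[str] = field(default_factory=set)
    file_names: Set[str] = field(default_factory=set)


def select_rules(msg: Message) -> List[str]:
    """Pick the rule types that apply to one message."""
    rules = ["data_item"]            # chosen whenever a message exists
    if msg.structure is not None:    # structured data
        rules.append("data_set")
    if msg.file_name is not None:    # message carries a file transmission
        rules.append("data_file")
    return rules


def is_sensitive_data_set(msg: Message, rules: RuleSet) -> bool:
    """Structured-data chain: structure known to rule, then flag value present."""
    if msg.structure is None:
        return False
    if msg.structure not in rules.data_set_structures:
        return False
    # Assumption: a "mark characteristic value" is matched as a substring.
    return any(flag in msg.body for flag in rules.flag_values)


def is_sensitive_file(msg: Message, rules: RuleSet) -> bool:
    """File chain: type matches suffix, then file name listed in the rule."""
    if msg.file_name is None or msg.file_type is None:
        return False
    suffix = msg.file_name.rsplit(".", 1)[-1].lower() if "." in msg.file_name else ""
    if suffix != msg.file_type.lower():
        return False
    return msg.file_name in rules.file_names
```

In this sketch a message may select one, two, or all three rule types, which mirrors the statement that one or a plurality of target desensitization rules can be adopted.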

2. The method of claim 1, wherein the terminal access system comprises a user terminal, a gateway proxy, a storage system, and at least one application service; and wherein, when a sensitive data identification request exists, the terminal access system generating at least one target message data based on the sensitive data identification request comprises:

the user terminal establishes the sensitive data identification request and sends the sensitive data identification request to the gateway proxy;

the gateway proxy acquires at least one initial message data of each application service from the corresponding application service based on the sensitive data identification request;

and the gateway proxy analyzes each initial message data to obtain at least one target message data, and sends each target message data to the storage system for storage.
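
The message-generation flow of claim 2 can be pictured with the following minimal sketch, given under stated assumptions only. The function name `gateway_collect`, the representation of application services as callables returning raw strings, the JSON encoding of the initial message data, and the in-memory list standing in for the storage system are all hypothetical.

```python
import json
from typing import Callable, Dict, List


def gateway_collect(
    request: dict,
    services: Dict[str, Callable[[], List[str]]],
    storage: List[dict],
) -> List[dict]:
    """Fetch initial messages per requested service, parse them, store them."""
    targets = []
    for name in request["services"]:
        for raw in services[name]():            # initial message data
            payload = json.loads(raw)           # assumed JSON; parsing step
            targets.append({"service": name, "payload": payload})
    storage.extend(targets)                     # hand over to the storage system
    return targets
```

Because the gateway only writes parsed messages to storage, the identification system can read them later without sitting on the user's access path.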

3. The method according to claim 1, wherein the sensitive data identification system based on the zero trust protection system obtains target message data to be identified from each target message data generated by the terminal access system, including:

the sensitive data identification system based on the zero trust protection system acquires initial message data to be identified from each target message data generated by the terminal access system;

judging whether the initial message data to be identified is already identified by the sensitive data identification system based on the zero trust protection system;

if the initial message data to be identified is not identified by the sensitive data identification system based on the zero trust protection system, determining the initial message data to be identified as the target message data to be identified;

if the initial message data to be identified is already identified by the sensitive data identification system based on the zero trust protection system, continuing to acquire the initial message data to be identified in each target message data generated by the terminal access system until the acquired initial message data to be identified is not identified by the sensitive data identification system based on the zero trust protection system, and acquiring the target message data to be identified.
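
The acquisition loop of claim 3 amounts to skipping messages that have already been identified. A minimal sketch follows; the `id` key on each message and the `identified_ids` set used to record prior identification are assumptions, since the claim does not state how the identified state is tracked.

```python
from typing import Iterable, Optional, Set


def next_unidentified(messages: Iterable[dict], identified_ids: Set[str]) -> Optional[dict]:
    """Return the first message not yet identified, or None if all were."""
    for msg in messages:                      # initial message data to be identified
        if msg["id"] not in identified_ids:   # not yet identified
            return msg                        # becomes the target message data
    return None
```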

4. The method according to claim 1, wherein sequentially desensitizing the target message data to be identified by utilizing each target desensitization rule, and determining target sensitive data existing in the target message data to be identified, comprises:

judging whether the target message data to be identified contains data conforming to the data item type desensitization rule;

and if the target message data to be identified contains data conforming to the data item type desensitization rule, determining the data conforming to the data item type desensitization rule as the target sensitive data.
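
One way to picture a data item type desensitization rule is as a named pattern applied to the message text. The sketch below assumes regular-expression rules; the two patterns (an 11-digit mobile number and an e-mail address) and the name `DATA_ITEM_RULES` are hypothetical examples and are not taken from the claims.

```python
import re
from typing import List, Tuple

# Hypothetical data item rules: rule name -> compiled pattern.
DATA_ITEM_RULES = {
    "mobile_phone": re.compile(r"(?<!\d)1[3-9]\d{9}(?!\d)"),
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
}


def find_data_items(text: str) -> List[Tuple[str, str]]:
    """Return (rule name, matched value) pairs found in the message text."""
    hits = []
    for name, pattern in DATA_ITEM_RULES.items():
        for match in pattern.finditer(text):
            hits.append((name, match.group()))
    return hits
```

Each returned value corresponds to data conforming to a data item rule, and would therefore be treated as target sensitive data.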

5. The method according to claim 1, wherein the method further comprises:

acquiring a preset desensitization algorithm;

and performing desensitization processing on the target sensitive data by using the preset desensitization algorithm to obtain desensitized data.
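
The claim leaves the preset desensitization algorithm open. As one illustrative possibility only, the sketch below masks the middle of a value while keeping a short prefix and suffix; the function name `mask_middle` and its default parameters are assumptions, and other algorithms (hashing, truncation, substitution) would fit the claim wording equally well.

```python
def mask_middle(value: str, keep_head: int = 3, keep_tail: int = 4, fill: str = "*") -> str:
    """Replace the middle of a value with a fill character."""
    if len(value) <= keep_head + keep_tail:
        return fill * len(value)              # too short: mask everything
    hidden = len(value) - keep_head - keep_tail
    tail = value[len(value) - keep_tail:]     # safe even when keep_tail is 0
    return value[:keep_head] + fill * hidden + tail
```

For example, `mask_middle("13812345678")` yields `138****5678`.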

6. The sensitive data identification device based on the zero trust protection system is characterized by being used for a sensitive data identification system based on the zero trust protection system, wherein the sensitive data identification system based on the zero trust protection system is connected with a terminal access system; the device comprises:

the generation module is used for generating at least one target message data based on the sensitive data identification request by the terminal access system when the sensitive data identification request exists;

the acquisition module is used for acquiring target message data to be identified from each target message data generated by the terminal access system by the sensitive data identification system based on the zero trust protection system;

the selection module is used for selecting at least one target desensitization rule in a preset desensitization rule set based on the target message data to be identified;

the processing and determining module is used for sequentially carrying out desensitization processing on the target message data to be identified by utilizing each target desensitization rule, and determining target sensitive data existing in the target message data to be identified;

wherein the selection module comprises:

the first selection unit is used for selecting the data item type desensitization rule from the preset desensitization rule set as the target desensitization rule when the target message data to be identified exists;

the second selection unit is used for selecting a data set type desensitization rule from a preset desensitization rule set as a target desensitization rule when the target message data to be identified is structured data;

the third selecting unit is used for selecting a data file type desensitization rule from a preset desensitization rule set as a target desensitization rule when the target message data to be identified contains file transmission;

wherein the processing and determining module comprises:

the third judging unit is used for judging whether the target message data to be identified is structured data or not;

the analysis unit is used for analyzing the data structure of the target message data to be identified to obtain a target data structure when the target message data to be identified is structured data;

a fourth judging unit, configured to judge whether the data set type desensitization rule contains the target data structure;

a fifth judging unit, configured to judge whether the target message data to be identified contains a preset flag feature value if the data set type desensitization rule contains the target data structure;

the third determining unit is used for determining the target message data to be identified as target sensitive data if the target message data to be identified contains a preset mark characteristic value;

wherein the processing and determining module further comprises:

a sixth judging unit, configured to judge whether the target message data to be identified includes file transmission;

a seventh judging unit, configured to judge, when the target message data to be identified includes file transmission data, whether a file type corresponding to the file transmission data is consistent with a file suffix;

an eighth judging unit, configured to judge whether the data file type desensitization rule includes a file name corresponding to the file transmission data if the file type corresponding to the file transmission data is consistent with the file suffix;

7. A computer-readable storage medium having stored thereon computer instructions for causing a computer to perform the zero trust protection system based sensitive data identification method of any one of claims 1 to 5.