CN117131236A - Sensitive data detection method and system - Google Patents

Publication number
CN117131236A
Authority
CN
China
Prior art keywords
sensitive data
data
mode
fragment
logic
Legal status
Granted
Application number
CN202311410918.9A
Other languages
Chinese (zh)
Other versions
CN117131236B (en)
Inventor
谢朝海
齐大伟
谢朝战
雷德诚
彭波
Current Assignee
Shenzhen Secidea Network Security Technology Co ltd
Original Assignee
Shenzhen Secidea Network Security Technology Co ltd
Application filed by Shenzhen Secidea Network Security Technology Co ltd
Priority to CN202311410918.9A
Publication of CN117131236A
Application granted
Publication of CN117131236B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/903 Querying
    • G06F16/90335 Query processing
    • G06F16/90344 Query processing by using string matching techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/901 Indexing; Data structures therefor; Storage structures
    • G06F16/9017 Indexing; Data structures therefor; Storage structures using directory or table look-up
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/64 Protecting data integrity, e.g. using checksums, certificates or signatures
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Software Systems (AREA)
  • Computer Security & Cryptography (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Bioethics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Hardware Design (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The invention relates to a method and a system for detecting sensitive data. The method comprises: defining sensitive data patterns and constructing a pattern library and a pattern mapping table; performing static analysis and dynamic analysis on a target application to obtain target instrumentation points; defining a class and, in that class, a method containing instrumentation logic; and inserting the class and the method at the target instrumentation points to detect sensitive data. Through the interaction of the custom instrumentation logic, the pattern library, and the pattern mapping table, multiple kinds of sensitive data can be detected, and sensitive data fragments scattered across different data sources can be associated and reassembled within the life cycle of the data and restored to complete sensitive data, which ensures both the integrity of the sensitive data and the timeliness of monitoring.

Description

Sensitive data detection method and system
Technical Field
The invention belongs to the technical field of sensitive data detection, and particularly relates to a sensitive data detection method and system.
Background
Enterprises generate large amounts of valuable sensitive data in their production and management activities. Mishandling this data can cause data security incidents, so discovering sensitive data effectively and desensitizing it promptly is a precondition for sharing data securely.
Conventional sensitive data identification methods are generally better suited to static data. They typically rely on static rules or pattern matching applied while data is stored or transmitted, which fits batch processing or periodic inspection of static data. They usually struggle with data that changes in real time, in particular data that has a life cycle or streaming data. Because they generally do not take the life cycle of the data into account, they can also incur unnecessary detection overhead on data that is no longer sensitive after a period of time.
Conventional methods also face the following difficulties when detecting sensitive data fragments scattered across different texts or data sources. First, such fragments may be missed, because they usually do not appear in complete form and are hard to detect by simple matching. Second, conventional methods usually have no built-in mechanism for reassembling scattered fragments into complete sensitive data, which makes the data harder to analyze and process. Third, context information may be lost in scattered fragments, so the meaning and background of the sensitive data are hard to understand, which in some cases leads to misinterpretation or to an inability to process the sensitive data effectively.
Disclosure of Invention
The invention provides a method and a system for detecting sensitive data, which aim to solve the problems mentioned in the background art.
The invention is implemented as a method for detecting sensitive data, comprising the following steps:
defining sensitive data patterns and constructing a pattern library and a pattern mapping table, wherein the sensitive data patterns comprise complete sensitive data patterns and sensitive data fragment patterns, each pattern comprises a corresponding data structure and attributes, the attributes of a sensitive data fragment pattern store the globally unique identifier of that pattern, and the pattern mapping table records each sensitive data fragment pattern and its globally unique identifier, the associated sensitive data fragment patterns and their globally unique identifiers, the relationship type between the sensitive data fragments, and the reassembly logic;
performing static analysis on the target application while it is not running and dynamic analysis while it is running, to obtain static potential instrumentation points and dynamic potential instrumentation points respectively, and determining target instrumentation points by combining the static potential instrumentation points and the dynamic potential instrumentation points;
creating a new class in the target application and defining in the new class a method containing instrumentation logic, wherein the instrumentation logic is: receiving streaming data from each data source as parameters; traversing each sensitive data pattern in the pattern library to match and detect whether the streaming data contains sensitive data; determining from the type of the matched sensitive data pattern whether the sensitive data is complete sensitive data or a sensitive data fragment; for a sensitive data fragment, assigning the corresponding globally unique identifier to the fragment and adding the fragment marked with the globally unique identifier to a buffer; matching the sensitive data fragments in the buffer against one another within the life cycle of each fragment; and calling, in a combination area, the reassembly logic in the pattern mapping table to associate and reassemble the matched fragments, so as to obtain restored complete sensitive data;
compiling the source code of the class and the method containing the instrumentation logic into bytecode, and inserting the bytecode at the target instrumentation points of the target application;
when the target application runs, monitoring the data stream in real time through the instrumentation logic at the target instrumentation points, and, when the data stream passes through a target instrumentation point, analyzing the data stream with the instrumentation logic at that point, detecting whether sensitive data is present in the data stream, and processing the data accordingly;
and if the instrumentation logic at a target instrumentation point detects sensitive data, triggering an alarm mechanism and desensitizing the sensitive data.
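The detect, alarm, and desensitize steps above can be sketched as follows. This is only an illustration under stated assumptions: the patent inserts compiled bytecode into the target application, whereas this sketch wraps a Python function instead, and the names `PATTERNS`, `mask`, `inspect`, and `instrument`, as well as the two regular expressions and the masking rule, are hypothetical and do not come from the patent.

```python
import re
from typing import Callable

# Hypothetical complete-pattern library: names and regexes are illustrative only.
PATTERNS = {
    "mobile_number": re.compile(r"\b1\d{10}\b"),
    "id_number": re.compile(r"\b\d{17}[\dXx]\b"),
}

Alarm = Callable[[str, str], None]


def mask(value: str) -> str:
    """Desensitize a value: keep the first 3 and last 2 characters."""
    if len(value) <= 5:
        return "*" * len(value)
    return value[:3] + "*" * (len(value) - 5) + value[-2:]


def inspect(chunk: str, alarm: Alarm) -> str:
    """Match every pattern against the chunk, raise an alarm, mask each hit."""
    for name, regex in PATTERNS.items():
        def replace(match, name=name):
            alarm(name, match.group(0))   # alarm mechanism
            return mask(match.group(0))   # desensitization
        chunk = regex.sub(replace, chunk)
    return chunk


def instrument(func, alarm: Alarm):
    """Stand-in for bytecode insertion: route func's output through inspect."""
    def wrapper(*args, **kwargs):
        return inspect(func(*args, **kwargs), alarm)
    return wrapper
```

A caller would wrap a data-producing function, for example `read = instrument(read, alarm)`, so that every value it returns is checked before it flows on.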
Further, before the step of adding the sensitive data fragment marked with the globally unique identifier to the buffer, the method further comprises:
after a sensitive data fragment is detected, allocating to the fragment, in the buffer, an initialized timer-start function, wherein the timer-start function creates a timer object and defines a timer callback function, the timer object starts counting down once the trigger time has elapsed and reaches 0 when the countdown time has elapsed, and the timer callback function contains the processing logic to run when the timer object reaches 0;
defining attributes for the detected sensitive data fragment, the attributes comprising a data attribute, an ID attribute, and a time attribute, storing the data content of the fragment in the data attribute, storing the globally unique identifier of the fragment in the ID attribute, and storing a preset trigger time and a preset countdown time in the time attribute, wherein the preset trigger time is set to 0 and the preset countdown time is set to the life cycle of the fragment;
the step of adding the sensitive data fragment marked with the globally unique identifier to the buffer comprises:
extracting the data attribute, the ID attribute, and the time attribute of the sensitive data fragment, adding the data attribute and the ID attribute to the list of the buffer, and at the same time passing the data attribute, the ID attribute, and the time attribute as input parameters to the timer-start function, so that the timer object in the timer-start function starts counting down after the preset trigger time and reaches 0 after the preset countdown time.
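A minimal sketch of the buffer and per-fragment timer described above, assuming Python's `threading.Timer` as the timer object. The class and field names (`Fragment`, `FragmentBuffer`, `trigger_after`, `lifetime`) are hypothetical, and the buffer is keyed by identifier for brevity, so it holds at most one fragment per globally unique identifier, which is a simplification of the list described in the text.

```python
import threading
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Fragment:
    data: str                    # data attribute: the fragment content
    guid: str                    # ID attribute: globally unique identifier
    trigger_after: float = 0.0   # time attribute: preset trigger time
    lifetime: float = 1.0        # time attribute: countdown = life cycle (s)


class FragmentBuffer:
    """Holds fragments and drops each one when its timer reaches 0."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._items: Dict[str, Fragment] = {}
        self._timers: Dict[str, threading.Timer] = {}

    def add(self, frag: Fragment) -> threading.Timer:
        """Store the fragment and start its countdown timer."""
        timer = threading.Timer(
            frag.trigger_after + frag.lifetime, self._expire, args=(frag.guid,)
        )
        timer.daemon = True
        with self._lock:
            old = self._timers.pop(frag.guid, None)
            if old is not None:
                old.cancel()     # a newer fragment replaces the older one
            self._items[frag.guid] = frag
            self._timers[frag.guid] = timer
        timer.start()
        return timer

    def _expire(self, guid: str) -> None:
        """Timer callback: the life cycle ended, so forget the fragment."""
        with self._lock:
            self._items.pop(guid, None)
            self._timers.pop(guid, None)

    def remove(self, guid: str) -> Optional[Fragment]:
        """Take a fragment out early (for example after a match)."""
        with self._lock:
            timer = self._timers.pop(guid, None)
            frag = self._items.pop(guid, None)
        if timer is not None:
            timer.cancel()
        return frag

    def contains(self, guid: str) -> bool:
        with self._lock:
            return guid in self._items
```

Folding the trigger time and the countdown into one `Timer` interval is a design shortcut; with the preset trigger time of 0 given in the text, the two are equivalent.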
Further, the step of matching the sensitive data fragments in the buffer against one another within the life cycle of each fragment, and then calling, in the combination area, the reassembly logic in the pattern mapping table to associate and reassemble the matched fragments to obtain restored complete sensitive data comprises:
before the timer object corresponding to the current sensitive data fragment reaches 0, looking up in the pattern mapping table, according to the ID attribute of the current fragment, the globally unique identifier of the sensitive data fragment associated with the current fragment;
performing a first traversal of the current buffer list to determine whether the list contains a sensitive data fragment whose ID attribute equals the globally unique identifier of the associated fragment;
if such a fragment exists, the match succeeds: moving the data attributes and ID attributes of the current fragment and the matched fragment into the combination area, deleting the data attributes and ID attributes of both fragments from the buffer list, and re-initializing the timer-start functions corresponding to both fragments in the buffer;
and obtaining the relationship type and the reassembly logic for the corresponding fragments from the pattern mapping table, and associating and reassembling the fragments in the combination area according to that relationship type and reassembly logic, so as to obtain the restored complete sensitive data.
Further, after the step of performing a first traversal of the current buffer list to determine whether the list contains a sensitive data fragment whose ID attribute equals the globally unique identifier of the associated fragment, the method further comprises:
if no such fragment exists and the timer object corresponding to the current sensitive data fragment has not reached 0, performing a second traversal or further traversals, and stopping the matching when the match succeeds or when the timer object corresponding to the current fragment reaches 0;
if the fragment is still unmatched when the matching stops, deleting the data attribute and the ID attribute of the current fragment from the buffer list, and re-initializing the timer-start function corresponding to the current fragment in the buffer.
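The repeated traversal described above can be sketched as a loop that stops on a match or on expiry. This is a simplified illustration, not the patented implementation: the timer is modelled as a deadline checked against an injectable clock, the buffer is a plain list of `(guid, data)` tuples, and `ASSOCIATED`, `match_fragment`, `clock`, and `poll` are hypothetical names.

```python
import time
from typing import Callable, Dict, List, Optional, Tuple

Item = Tuple[str, str]  # (ID attribute, data attribute)

# Hypothetical extract of the pattern mapping table: GUID -> associated GUID.
ASSOCIATED: Dict[str, str] = {"frag-head": "frag-tail", "frag-tail": "frag-head"}


def match_fragment(
    current_guid: str,
    buffer: List[Item],
    deadline: float,
    clock: Callable[[], float] = time.monotonic,
    poll: Callable[[], None] = lambda: time.sleep(0.01),
) -> Optional[Tuple[Item, Item]]:
    """Traverse the buffer until the partner is found or the deadline passes."""
    wanted = ASSOCIATED.get(current_guid)
    while True:
        current = next((i for i in buffer if i[0] == current_guid), None)
        if current is None:
            return None                      # fragment already gone
        partner = next((i for i in buffer if i[0] == wanted), None)
        if partner is not None:
            buffer.remove(current)           # both leave the buffer list and
            buffer.remove(partner)           # go to the combination area
            return current, partner
        if clock() >= deadline:
            buffer.remove(current)           # life cycle over, still unmatched
            return None
        poll()                               # wait, then traverse again
```

Passing `clock` and `poll` as parameters keeps the loop testable without real waiting; in a running system the deadline would simply be the fragment's arrival time plus its life cycle.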
Further, the step of associating and reassembling the sensitive data fragments in the combination area according to the relationship type and the reassembly logic between the fragments to obtain restored complete sensitive data comprises:
if the relationship type is a chain relationship, determining the sensitive data fragment whose relationship attribute is primary parent node and the sensitive data fragment whose relationship attribute is child node, and obtaining the data of the primary parent node and the data of the child node respectively;
extracting a child association field from the data of the child node and a parent association field from the data of the primary parent node;
associating the child association field with the parent association field to obtain the chain relationship;
and connecting the data of the child node to the data of the primary parent node according to the chain relationship, to obtain the reassembled sensitive data.
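As a rough illustration of the chain-relationship case, the sketch below joins a child record to its primary parent when their association fields agree. The dictionary representation, the function name `reassemble_chain`, and the example field names are assumptions made for this example only; the patent does not specify how the fields are stored or joined.

```python
from typing import Any, Dict


def reassemble_chain(
    parent: Dict[str, Any],
    child: Dict[str, Any],
    parent_key: str,
    child_key: str,
) -> Dict[str, Any]:
    """Attach the child's data to the parent's data via their association fields."""
    # Associate the parent association field with the child association field.
    if parent[parent_key] != child[child_key]:
        raise ValueError("association fields do not match")
    merged = dict(parent)
    for key, value in child.items():
        if key != child_key:                 # the link field itself is redundant
            merged.setdefault(key, value)    # parent values win on conflicts
    return merged
```

For example, a parent `{"order_id": "A1", "name": "Zhang San"}` and a child `{"ref_order": "A1", "card_tail": "5678"}` joined on `order_id` and `ref_order` yield one record that carries the name and the card tail together.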
Further, the step of performing static analysis on the target application while it is not running and dynamic analysis while it is running, to obtain static potential instrumentation points and dynamic potential instrumentation points respectively, and determining the target instrumentation points by combining the two comprises:
when the target application is not running, using a static code analysis tool to statically analyze the source code of the target application, so as to identify the static potential instrumentation points in the code and mark them, wherein the static potential instrumentation points are key points of database queries, data processing, file reading and writing, or network communication;
when the target application is running, running a dynamic analysis tool on the target application to trace its execution and its run-time data flow, so as to identify the paths along which data flows between different methods and capture the operation points related to sensitive data on those paths, wherein these operation points are taken as dynamic potential instrumentation points and marked, and the operations related to sensitive data include, but are not limited to, input, output, storage, and transmission of the sensitive data;
identifying the potential instrumentation points that have been marked twice, that is, in both the static analysis and the dynamic analysis, and taking them as the target instrumentation points, so that sensitive information in the flowing data can be monitored and processed in real time at the target instrumentation points.
The invention also provides a sensitive data detection system for executing the above sensitive data detection method, the system comprising:
a construction module, used for defining sensitive data patterns and constructing a pattern library and a pattern mapping table, wherein the sensitive data patterns comprise complete sensitive data patterns and sensitive data fragment patterns, each pattern comprises a corresponding data structure and attributes, the attributes of a sensitive data fragment pattern store the globally unique identifier of that pattern, and the pattern mapping table records each sensitive data fragment pattern and its globally unique identifier, the associated sensitive data fragment patterns and their globally unique identifiers, the relationship type between the sensitive data fragments, and the reassembly logic;
an instrumentation point module, used for performing static analysis on the target application while it is not running and dynamic analysis while it is running, to obtain static potential instrumentation points and dynamic potential instrumentation points respectively, and determining target instrumentation points by combining the static potential instrumentation points and the dynamic potential instrumentation points;
an instrumentation logic module, used for creating a new class in the target application and defining in the new class a method containing instrumentation logic, wherein the instrumentation logic is: receiving streaming data from each data source as parameters; traversing each sensitive data pattern in the pattern library to match and detect whether the streaming data contains sensitive data; determining from the type of the matched sensitive data pattern whether the sensitive data is complete sensitive data or a sensitive data fragment; for a sensitive data fragment, assigning the corresponding globally unique identifier to the fragment and adding the fragment marked with the globally unique identifier to a buffer; matching the sensitive data fragments in the buffer against one another within the life cycle of each fragment; and calling, in a combination area, the reassembly logic in the pattern mapping table to associate and reassemble the matched fragments, so as to obtain restored complete sensitive data;
an insertion module, used for compiling the source code of the class and the method containing the instrumentation logic into bytecode and inserting the bytecode at the target instrumentation points of the target application;
a detection module, used for, when the target application runs, monitoring the data stream in real time through the instrumentation logic at the target instrumentation points, and, when the data stream passes through a target instrumentation point, analyzing the data stream with the instrumentation logic at that point, detecting whether sensitive data is present in the data stream, and processing the data accordingly;
a desensitization module, used for triggering an alarm mechanism and desensitizing the sensitive data if the instrumentation logic at a target instrumentation point detects sensitive data.
Compared with the prior art, the sensitive data detection method and system have the following advantages. Instrumentation logic is inserted at instrumentation points obtained by combined static and dynamic analysis, so the data stream of the running target application can be monitored in real time and the presence of sensitive data can be detected promptly. The instrumentation logic executes in real time according to the actual execution of the application, so detection is flexible, suits dynamically changing application environments, and captures and analyzes sensitive data more accurately. Monitoring streaming data also reveals much of the context of the data, such as the data source, the data type, the direction of data flow, and the data associated with it within the valid time period, which makes it easier to capture sensitive data fragments that are associated within that period. Through the interaction of the custom instrumentation logic, the pattern library, and the pattern mapping table, multiple kinds of sensitive data can be detected, such as complete sensitive data and split sensitive data fragments, and fragments scattered across different data sources can be associated and reassembled within the life cycle of the data and restored to complete sensitive data. This ensures the integrity and timeliness of the sensitive data and is particularly useful for analyzing and processing scattered data that is only valid for a limited time.
Drawings
FIG. 1 is a flow chart of a method for detecting sensitive data provided by the invention;
FIG. 2 is a system block diagram of a sensitive data detection system provided by the present invention.
Detailed Description
The present invention will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
Example 1
Referring to FIG. 1, the first embodiment provides a method for detecting sensitive data, comprising steps S101 to S106:
S101, defining sensitive data patterns and constructing a pattern library and a pattern mapping table, wherein the sensitive data patterns comprise complete sensitive data patterns and sensitive data fragment patterns, each pattern comprises a corresponding data structure and attributes, the attributes of a sensitive data fragment pattern store the globally unique identifier of that pattern, and the pattern mapping table records each sensitive data fragment pattern and its globally unique identifier, the associated sensitive data fragment patterns and their globally unique identifiers, the relationship type between the sensitive data fragments, and the reassembly logic.
It should be noted that the construction of the pattern library and the pattern mapping table provides a basis for subsequent instrumentation logic and detection, allowing the system to monitor the data flow in real time during operation, and detect and reorganize sensitive data according to the information in the pattern library and the pattern mapping table.
Sensitive data patterns explicitly specify the structure and attributes of sensitive data and fall into two types: complete sensitive data patterns and sensitive data fragment patterns. A complete sensitive data pattern describes the structure of a complete sensitive data object, including information such as data fields, data types, and data formats. A sensitive data fragment pattern describes a partial fragment of sensitive data and includes the structure of the fragment and the corresponding attributes, such as its globally unique identifier. The definitions of these patterns should describe the structure and attributes of the sensitive data in detail, for use in subsequent detection and reassembly.
Once the sensitive data patterns are defined, they are stored in a collection. The pattern library is the repository that holds the sensitive data patterns; both complete sensitive data patterns and sensitive data fragment patterns can be stored in it for subsequent matching and comparison.
The pattern mapping table stores the information that relates different sensitive data fragment patterns to one another, and contains the following: the sensitive data fragment patterns and their globally unique identifiers, which uniquely identify the different fragment patterns; the associated sensitive data fragment patterns and their globally unique identifiers, which describe the association between different fragment patterns, such as a parent-child relationship or a chain relationship; the relationship type, which defines the type of relationship between different fragment patterns and supports the subsequent reassembly logic; and the reassembly logic, which describes how the fragments are reassembled according to their association to restore the complete sensitive data.
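One possible in-memory shape for the pattern library and the pattern mapping table is sketched below. It is an assumption-laden illustration: the patent does not prescribe regular expressions or these field names, and `SensitivePattern`, `MappingEntry`, `PATTERN_LIBRARY`, `MAPPING_TABLE`, `lookup_associated`, and the example patterns (a card number split into a head and a tail) are all hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass(frozen=True)
class SensitivePattern:
    name: str
    regex: re.Pattern            # stands in for the pattern's data structure
    is_fragment: bool            # complete pattern vs. fragment pattern
    guid: Optional[str] = None   # only fragment patterns carry an identifier


@dataclass(frozen=True)
class MappingEntry:
    fragment_guid: str                       # this fragment pattern
    associated_guid: str                     # the pattern it pairs with
    relation_type: str                       # e.g. "chain" or "parent-child"
    reassemble: Callable[[str, str], str]    # reassembly logic (own, partner)


# Pattern library: complete patterns and fragment patterns side by side.
PATTERN_LIBRARY: List[SensitivePattern] = [
    SensitivePattern("id_number", re.compile(r"\b\d{18}\b"), is_fragment=False),
    SensitivePattern("card_head", re.compile(r"\bH:(\d{8})\b"), True, "frag-card-head"),
    SensitivePattern("card_tail", re.compile(r"\bT:(\d{8})\b"), True, "frag-card-tail"),
]

# Pattern mapping table, keyed by the fragment pattern's identifier.
MAPPING_TABLE: Dict[str, MappingEntry] = {
    "frag-card-head": MappingEntry(
        "frag-card-head", "frag-card-tail", "chain", lambda head, tail: head + tail
    ),
    "frag-card-tail": MappingEntry(
        "frag-card-tail", "frag-card-head", "chain", lambda tail, head: head + tail
    ),
}


def lookup_associated(guid: str) -> Optional[str]:
    """Return the identifier of the fragment pattern associated with guid."""
    entry = MAPPING_TABLE.get(guid)
    return entry.associated_guid if entry else None
```

Storing an entry in both directions lets the lookup start from whichever fragment arrives first, at the cost of keeping the two entries consistent by hand.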
S102, performing static analysis on the target application while it is not running and dynamic analysis while it is running, to obtain static potential instrumentation points and dynamic potential instrumentation points respectively, and determining the target instrumentation points by combining the static potential instrumentation points and the dynamic potential instrumentation points.
It should be noted that this step is intended to provide a target instrumentation point for implementation of instrumentation logic, involving both static and dynamic analysis to ensure efficient sensitive data detection and reorganization. The following is a detailed analysis:
Static analysis examines the target application without running it. Its goal is to find potential instrumentation points, that is, places in the code that statically show the characteristics of sensitive data operations or data flow, such as specific function calls, data processing operations, or conditional branches. This helps to determine possible instrumentation points ahead of time.
Dynamic analysis is an analysis of the target application at runtime, which includes monitoring the execution of the application at actual runtime to identify potential instrumentation points. Dynamic analysis typically involves tracking data flows, function calls, code execution paths, etc. to find dynamic behavior related to sensitive data, which helps determine instrumentation points at actual run-time.
Using static analysis and dynamic analysis together covers the possible instrumentation points comprehensively and ensures the accuracy and validity of the target instrumentation points, so that the system can accurately monitor the data stream, detect sensitive data, and record and reassemble it in real time. Static analysis provides the theoretically possible instrumentation points, while dynamic analysis verifies whether those points are actually involved in sensitive data operations at run time; by combining the information from both, the final list of target instrumentation points can be determined. The key to the whole process is ensuring that the instrumentation points cover the operations that process sensitive data, so that the system can monitor and reassemble the data in real time during operation, thereby improving data security.
Combining the static and dynamic information determines the target instrumentation points, which are used to apply the instrumentation logic, monitor the data stream in real time, and detect and reassemble sensitive data. These points are typically located around sensitive data processing operations, to ensure detection as sensitive data enters and leaves the application.
The step of performing static analysis on the target application while it is not running and dynamic analysis while it is running, to obtain static potential instrumentation points and dynamic potential instrumentation points respectively, and determining the target instrumentation points by combining the two comprises:
when the target application is not running, using a static code analysis tool to statically analyze the source code of the target application, so as to identify the static potential instrumentation points in the code and mark them, wherein the static potential instrumentation points are key points of database queries, data processing, file reading and writing, or network communication;
when the target application is running, running a dynamic analysis tool on the target application to trace its execution and its run-time data flow, so as to identify the paths along which data flows between different methods and capture the operation points related to sensitive data on those paths, wherein these operation points are taken as dynamic potential instrumentation points and marked, and the operations related to sensitive data include, but are not limited to, input, output, storage, and transmission of the sensitive data;
identifying the potential instrumentation points that have been marked twice, that is, in both the static analysis and the dynamic analysis, and taking them as the target instrumentation points, so that sensitive information in the flowing data can be monitored and processed in real time at the target instrumentation points.
It should be noted that this step describes how to obtain the target instrumentation point in detail, and the key of the whole process is to ensure that the instrumentation point can cover operations related to sensitive data processing, so that the instrumentation logic can monitor and reorganize data in real time during operation, and improve data security. The combination of static and dynamic analysis is to ensure the effectiveness and accuracy of the stake points.
When the target application is not running, the source code of the target application is analyzed using a static code analysis tool, such as a static analyzer or decompiler, with the goal of identifying potential static instrumentation points in the source code, which are code segments associated with sensitive data processing, which may cover key operations such as database queries, data processing, file reading and writing, or network communications.
While the target application is running, trace analysis is performed on the execution of the application using a dynamic analysis tool, such as a debugger or code execution tracker, which is intended to monitor the execution of the application at actual runtime in order to capture dynamic behavior related to sensitive data, including the tracking of paths that data streams pass through between different methods and capture the operating points related to sensitive data on these paths.
The results of the static and dynamic analysis are combined to identify the twice-marked potential instrumentation points, which are both marked in the static analysis and verified as relevant to sensitive data in the dynamic analysis. This process ensures the accuracy and validity of the instrumentation points, because they exist statically in the code and are dynamically verified at runtime.
These twice-marked potential instrumentation points are taken as the target instrumentation points. They are used to implement the instrumentation logic that monitors the data stream in real time and detects and reorganizes sensitive data, and they are typically located around sensitive data operations to ensure detection as sensitive data enters and exits the application.
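The selection of twice-marked points described above can be read as a set intersection. The following is a minimal sketch under that reading; the `Point` type and the class and method names are hypothetical and are not taken from the patent or from any particular analysis tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Point:
    """A candidate instrumentation point (hypothetical representation)."""
    cls: str
    method: str


def target_points(static_marked, dynamic_marked):
    """Keep only the points marked by both the static and the dynamic analysis."""
    return set(static_marked) & set(dynamic_marked)


# Points flagged by the static analysis (database, file, and network key points).
static_marked = {
    Point("UserDao", "queryById"),
    Point("ReportWriter", "writeFile"),
    Point("ConfigLoader", "readFile"),
}
# Points where the dynamic analysis observed sensitive data at runtime.
dynamic_marked = {
    Point("UserDao", "queryById"),
    Point("ReportWriter", "writeFile"),
    Point("HttpClient", "send"),
}

targets = target_points(static_marked, dynamic_marked)
```

In this sketch `ConfigLoader.readFile` and `HttpClient.send` are each marked only once, so they are not selected as target instrumentation points.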
S103, creating a new class in the target application program, and defining a method containing instrumentation logic in the newly created class, wherein the method containing the instrumentation logic comprises the following steps: receiving streaming data from each data source as parameters; traversing each sensitive data mode in the mode library to match and detect whether the streaming data contains sensitive data; judging, according to the type of the sensitive data mode, whether the sensitive data is complete sensitive data or a sensitive data fragment; if the sensitive data is a sensitive data fragment, assigning the corresponding globally unique identifier to the sensitive data fragment and adding the sensitive data fragment marked with the globally unique identifier into a buffer area; matching each sensitive data fragment in the buffer area within the life cycle of each sensitive data fragment; and calling, in a combination area, the recombination logic in the mode mapping table to perform association recombination on the matched sensitive data fragments so as to obtain the restored complete sensitive data.
It should be noted that the purpose of this step is to construct a powerful data detection and processing system that can process data from various sources, identify and correlate sensitive data, and reorganize it in order to maintain the security and integrity of the data. The instrumentation logic is responsible for receiving data from various data sources, performing detection and processing operations on sensitive data to ensure data security. By judging according to the pattern matching and the data type, the sensitive data fragments can be marked and added into a buffer area to prepare for the subsequent association reorganization. And carrying out association recombination on the matched fragments by recombination logic in the mode mapping table so as to restore the matched fragments into complete sensitive data. This step is the core of the whole data detection and processing process, ensuring real-time monitoring and processing of sensitive data.
A new class is created in the target application, the class is used for realizing the instrumentation logic, and a method containing the instrumentation logic is defined in the new class and used for receiving streaming data from each data source as parameters. The method receives streaming data from different data sources, which may include user input, network data, file reads and writes, and the like. The instrumentation logic passes these data to subsequent processing steps.
In the instrumentation logic, the instrumentation code traverses a pattern library that contains definitions of sensitive data patterns, including complete sensitive data patterns and sensitive data fragment patterns. Pattern matching is performed on the streaming data to detect whether sensitive data is contained in the data. The instrumentation logic may determine whether the streaming data belongs to complete sensitive data or sensitive data fragments based on the sensitive data pattern type. This step is to distinguish whether the data requires further processing, and if it is a sensitive data fragment, the next operation will be performed.
If the streaming data is determined to be a sensitive data segment, the instrumentation logic assigns a corresponding Globally Unique Identifier (GUID) to tag the sensitive data segment and adds it to the buffer. The buffer is used to store these sensitive data fragments so that they can be matched with the associated fragments later. The instrumentation logic may match each sensitive data segment in the buffer to find out whether there is an associated sensitive data segment. This is to find the fragment with which it is associated for subsequent associative reorganization.
Finally, the instrumentation logic will invoke the reassembly logic in the pattern mapping table in the assembly zone. The pattern mapping table includes a pattern of sensitive data fragments and their globally unique identifiers, an associated pattern of sensitive data fragments and their globally unique identifiers, a type of relationship between the sensitive data fragments, and reassembly logic. And according to the reorganization logic defined in the mode mapping table, the matched sensitive data fragments are associated and reorganized to obtain restored complete sensitive data.
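The detection part of the instrumentation logic described above can be sketched as follows. This is only an illustration under stated assumptions: the two regular expressions, the pattern names, and the `ID-HEAD:` marker are invented for the example, and the fragment is tagged with the identifier stored in its pattern, which is one reading of "corresponding globally unique identifier".

```python
import re
import uuid
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Pattern:
    name: str
    kind: str                   # "complete" or "fragment"
    regex: "re.Pattern"
    guid: Optional[str] = None  # only fragment patterns carry an identifier


# Hypothetical pattern library with one complete pattern and one fragment pattern.
FULL_ID = Pattern("id_full", "complete", re.compile(r"\b\d{18}\b"))
HEAD_ID = Pattern("id_head", "fragment", re.compile(r"ID-HEAD:(\d{9})\b"),
                  guid=str(uuid.uuid4()))
PATTERN_LIBRARY = [FULL_ID, HEAD_ID]


def inspect(chunk, buffer):
    """Scan one chunk of streaming data.

    Complete sensitive values are returned directly; fragments are tagged
    with their pattern's identifier and appended to the buffer.
    """
    complete = []
    for pattern in PATTERN_LIBRARY:
        for match in pattern.regex.finditer(chunk):
            if pattern.kind == "complete":
                complete.append(match.group(0))
            else:
                buffer.append({"id": pattern.guid, "data": match.group(1)})
    return complete


buffer = []
found = inspect("user 123456789012345678 logged in; ID-HEAD:123456789", buffer)
```

Here `found` holds the complete value, while the nine-digit fragment waits in `buffer` for its associated fragment.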
Wherein, before the step of adding the sensitive data segment marked with the globally unique identifier to the buffer, the method further comprises:
after detecting a sensitive data fragment, distributing an initialized function for starting a timer to the sensitive data fragment in a buffer zone, wherein a timer object is created in the function for starting the timer and a timer callback function is defined, the timer object triggers countdown after triggering time, time returns to 0 after the countdown time, and the timer callback function comprises processing logic when the timer object counts down to 0;
defining attributes for the detected sensitive data fragments, wherein the attributes comprise data attributes, ID attributes and time attributes, storing the data content of the sensitive data fragments into the data attributes, storing the globally unique identifiers of the sensitive data fragments into the ID attributes, storing the preset trigger time and the preset countdown time into the time attributes, setting the preset trigger time to be 0, and setting the preset countdown time to be the life cycle of the sensitive data fragments;
the step of adding the sensitive data segment marked with the globally unique identifier to the buffer comprises:
and extracting the data attribute, the ID attribute and the time attribute of the sensitive data fragment, correspondingly adding the data attribute and the ID attribute of the sensitive data fragment into a list of the buffer area, and simultaneously, taking the data attribute, the ID attribute and the time attribute of the sensitive data fragment as input parameters to be input into a function for starting a timer, so that a timer object in the function for starting the timer triggers counting down after a preset triggering time and returns to 0 after the preset counting down time.
It should be noted that the purpose of this step is to allocate an independent timer function to each segment after a sensitive data segment is detected, so that the life cycle of each segment can be managed independently. The timer is triggered when the sensitive data segment is added to the buffer and returns to 0 when the life cycle of the segment ends, so that segments can be paired within their life cycle, that is, within the countdown time of the timer. If the pairing succeeds, valid and complete sensitive data can be recombined. If the pairing fails, the sensitive data segment is deleted from the buffer once its life cycle ends, which releases resources and keeps data management safe and efficient.
Whenever a sensitive data fragment is detected, a separate timer-start function is assigned to that fragment. The timer is triggered after a specific trigger time, here set to 0, which means it starts immediately when the sensitive data fragment is added to the buffer, and it then counts down to 0 over the preset countdown time (the life cycle of the sensitive data fragment).
Inside the function that starts the timer, a timer object is created. This timer object is a tool for managing timing and is typically provided by a development environment or programming language.
A callback function is also defined that contains processing logic that should be executed when the timer counts down to 0, such as deleting the relevant sensitive data segment in the buffer.
The timer object needs a trigger time, which here is the moment the segment is added to the buffer. Because a sensitive data segment is added to the buffer immediately after it is detected, the preset trigger time is 0, and the timer is triggered immediately.
To ensure that stale sensitive data segments in the buffer are deleted after the life cycle of the sensitive data segments has ended, the countdown time of the timer object is typically set to the life cycle of the sensitive data segments. Thus, when the countdown time is reached, the timer will trigger a callback function to delete the stale sensitive data segment in the buffer. This arrangement allows efficient management of buffers and freeing up storage resources, ensuring that data will not continue to be retained after its lifecycle has ended, which is important for efficient processing of sensitive data fragments.
The relevant data attributes, ID attributes and time attributes are extracted before adding the sensitive data segment tagged with the globally unique identifier to the buffer, including the content of the sensitive data segment, the globally unique identifier, the trigger time and the countdown time. Passing these attributes as input parameters to the function of starting the timer ensures that the timer object can correctly set the trigger time and the countdown time.
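A per-fragment timer of this kind can be sketched with Python's standard `threading.Timer`, whose callback runs once after the given interval. The `FragmentBuffer` class and its field names are assumptions made for illustration; keying the buffer by ID attribute is a simplification of the list described above.

```python
import threading


class FragmentBuffer:
    """Buffer whose entries are deleted when their life cycle ends."""

    def __init__(self):
        self._lock = threading.Lock()
        self.items = {}   # ID attribute -> data attribute
        self.timers = {}  # ID attribute -> running timer

    def add(self, frag_id, data, lifetime_s):
        # Trigger time is 0: the countdown starts as soon as the fragment is added.
        timer = threading.Timer(lifetime_s, self._expire, args=(frag_id,))
        timer.daemon = True
        with self._lock:
            self.items[frag_id] = data
            self.timers[frag_id] = timer
        timer.start()
        return timer

    def _expire(self, frag_id):
        # Timer callback: runs when the countdown reaches 0 and drops the stale fragment.
        with self._lock:
            self.items.pop(frag_id, None)
            self.timers.pop(frag_id, None)


buf = FragmentBuffer()
timer = buf.add("frag-A", "123456789", lifetime_s=0.2)
present_before = "frag-A" in buf.items
timer.join()  # wait for the countdown to finish
present_after = "frag-A" in buf.items
```

A successful match would call `cancel()` on the timer instead of waiting for the callback; that path is left out here to keep the sketch short.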
Further, the step of matching each sensitive data segment in the buffer area in the life cycle of each sensitive data segment, and then calling the reorganization logic in the pattern mapping table in the combination area to perform association reorganization on the matched sensitive data segments to obtain the restored complete sensitive data includes:
before the time of the timer object corresponding to the current sensitive data fragment returns to 0, searching a global unique identifier of the sensitive data fragment related to the current sensitive data fragment from a mode mapping table according to the ID attribute of the current sensitive data fragment;
performing one-time traversal matching on the global unique identifier of the associated sensitive data fragment in the list of the current buffer area to find out whether the sensitive data fragment with the same ID attribute as the global unique identifier of the associated sensitive data fragment exists in the list of the current buffer area;
if the data exists, matching is successful, the data attribute and the ID attribute of the current sensitive data fragment and the matched sensitive data fragment are called to a combined area, the data attribute and the ID attribute of the current sensitive data fragment and the matched sensitive data fragment in a list of a buffer area are deleted, and a function of a starting timer corresponding to the sensitive data fragment and the matched sensitive data fragment in the buffer area is initialized;
And acquiring the relation type and the reorganization logic between the corresponding sensitive data fragments from the mode mapping table, and carrying out association reorganization on the sensitive data fragments in the combination area according to the relation type and the reorganization logic between the sensitive data fragments so as to obtain the restored complete sensitive data.
It should be noted that, the purpose of this step is to perform association and reorganization on the successfully matched sensitive data segments according to the relationship between the sensitive data segments and the reorganization logic before the time of the timer object corresponding to the current sensitive data segment is reset to 0, which is in the life cycle of the sensitive data segments, so as to obtain the restored complete sensitive data, thereby ensuring the integrity and timely validity of the sensitive data, which is very important in some application scenarios. Taking a medical system as an example, in a medical system, sensitive data may be distributed among multiple data segments, and different sensing devices may generate different types of data, such as vital signs of a patient, drug records, laboratory test results, etc., which may need to be reorganized into complete patient records in order to be able to obtain comprehensive patient information to support medical decisions. Meanwhile, vital signs of a patient need to be monitored in real time, and timely response is made to abnormal conditions, if data fragments are not recombined in time, inaccurate monitoring can be caused, and medical care is delayed.
From the pattern mapping table, the ID properties of the current sensitive data fragment are used to find globally unique identifiers of other sensitive data fragments associated with the current sensitive data fragment, which identifiers are used to indicate which sensitive data fragments are relevant.
Performing one-time traversal matching on the globally unique identifier of the associated sensitive data fragment and the sensitive data fragment in the list of the buffer area, wherein the goal is to find out whether the sensitive data fragment with the same ID attribute as the globally unique identifier of the associated sensitive data fragment exists in the list of the buffer area.
If so, the matching is successful, i.e. the associated sensitive data fragment is found in the buffer, and the following operations are performed:
and retrieving the data attribute and the ID attribute of the current sensitive data fragment and the successfully matched sensitive data fragment to a combined area. And delete the current sensitive data segments and the successfully matched sensitive data segments in the list of buffers to ensure that they are not processed multiple times.
The function of the start timer in the buffer corresponding to the current sensitive data segment and the successfully matched sensitive data segment is initialized in order to subsequently reassign the newly detected sensitive data segments to the timers for the next round of matching and processing.
The relation type and the reorganization logic between the current sensitive data fragment and the associated sensitive data fragment are obtained from the mode mapping table, and the information comprises the operation logic of reorganizing the associated fragment into complete sensitive data according to the corresponding relation type.
Depending on the relationship type and reassembly logic between sensitive data fragments, associative reassembly is performed in the combined region, which may involve concatenation, merging, stitching or other operations, depending on the relationship type and reassembly logic, and finally, the restored complete sensitive data may be extracted from the combined region for use.
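The single traversal and the hand-off to the combination area can be sketched as below. The mapping table layout, the identifiers `guid-head` and `guid-tail`, and the concatenating reassembly function are all hypothetical; timer re-initialization is omitted.

```python
# Hypothetical pattern mapping table: for each fragment identifier it records
# the associated fragment's identifier, the relation type, and the reassembly logic.
MAPPING_TABLE = {
    "guid-head": {
        "associated": "guid-tail",
        "relation": "parallel",
        "reassemble": lambda current, other: current + other,
    },
}


def match_once(current, buffer, combine_area):
    """Traverse the buffer once looking for the fragment associated with `current`.

    On success both fragments move to the combination area, are removed from
    the buffer, and the restored value is returned. Otherwise None is returned.
    """
    entry = MAPPING_TABLE.get(current["id"])
    if entry is None:
        return None
    for fragment in buffer:
        if fragment["id"] == entry["associated"]:
            combine_area.extend([current, fragment])
            # Remove both so they are not processed a second time.
            buffer.remove(current)
            buffer.remove(fragment)
            return entry["reassemble"](current["data"], fragment["data"])
    return None


head = {"id": "guid-head", "data": "123456789"}
tail = {"id": "guid-tail", "data": "012345678"}
buffer = [head, tail]
combine_area = []
restored = match_once(head, buffer, combine_area)
```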
Further, the step of performing a traversal matching on the global unique identifier of the associated sensitive data segment in the current list of the buffer area to find out whether the sensitive data segment with the same ID attribute as the global unique identifier of the associated sensitive data segment exists in the current list of the buffer area further includes:
if it does not exist, performing a second traversal or multiple traversals until the matching succeeds or the time of the timer object corresponding to the current sensitive data fragment returns to 0, and then stopping the matching;
if the matching is unsuccessful when the matching is stopped, deleting the data attribute and the ID attribute of the current sensitive data segment in the list of the buffer area, and initializing the function of a starting timer corresponding to the current sensitive data segment in the buffer area.
It should be noted that, during the countdown time, if one traversal of the buffer list does not find a matching associated sensitive data segment, a second traversal or even multiple traversals may be performed. Matching stops when it succeeds or when the time of the timer object corresponding to the current sensitive data segment returns to 0. Associated sensitive data segments do not always appear in perfect synchrony and may arrive with a slight time difference, but as long as they are matched and reorganized within their respective life cycles, the reorganized complete sensitive data is still valid. This approach copes with the time differences and incomplete synchronization that occur in practice and ensures data integrity and validity. It is especially important in scenarios that require real-time data analysis and reassembly, since data may not arrive at the same time but must eventually be reassembled correctly to meet the needs of analysis and processing.
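Repeated traversal within the life cycle can be sketched as a polling loop with a deadline. The function name, the `find_match` callable, and the polling interval are assumptions for illustration; the patent does not say how traversals are scheduled.

```python
import time


def match_within_lifetime(find_match, lifetime_s, poll_s=0.01):
    """Repeat the traversal until it succeeds or the life cycle runs out.

    `find_match` performs one traversal and returns the matched fragment or None.
    """
    deadline = time.monotonic() + lifetime_s
    while True:
        found = find_match()
        if found is not None:
            return found
        if time.monotonic() >= deadline:
            # Life cycle over: the caller deletes the fragment from the buffer.
            return None
        time.sleep(poll_s)


# The associated fragment shows up only on the third traversal.
arrivals = iter([None, None, "tail-fragment"])
result = match_within_lifetime(lambda: next(arrivals), lifetime_s=1.0)

# The associated fragment never arrives, so matching stops at the deadline.
missed = match_within_lifetime(lambda: None, lifetime_s=0.05)
```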
Further, the step of performing association recombination on the sensitive data fragments in the combination area according to the relationship types and the recombination logic between the sensitive data fragments to obtain the restored complete sensitive data comprises the following steps:
If the relationship type is a chain type relationship, determining a sensitive data segment with a relationship attribute of a main father node and a sensitive data segment with a relationship attribute of a child node, and respectively acquiring data of the main father node and data of the child node;
respectively extracting a child association field and a parent association field from data of the child node and data of a main parent node;
associating the child association field with the parent association field to obtain a chain relationship;
and connecting the data of the child nodes to the data of the main parent node according to the chain relation to obtain the recombined sensitive data.
It should be noted that such a chained relationship is typically used in a parent-child relationship of data, such as a relationship between an order and an order item, the order being the primary parent node and the order item being the child node.
For other relationship types, such as tree relationships, parallel relationships, etc., the manner of processing may vary, depending on the relationship type and reassembly logic. If there is a tree relationship between sensitive data fragments, then the primary parent node may have multiple child nodes, in which case the reassembly logic needs to consider how to connect the multiple child nodes to the primary parent node to restore the complete data. Each node may be peer if there is a parallel relationship between sensitive data segments. The reassembly logic needs to handle how the nodes are connected in parallel to restore the complete data.
Different relationship types require different logic to perform the association reorganization to ensure the integrity of the sensitive data, which also depends on the actual structure and relationship of the data in the application.
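For the chain relationship, the order and order item example can be sketched as follows. The field names `order_id` and `order_ref` and the `children` key are hypothetical; the patent only states that a child association field is associated with a parent association field and that the child data is connected to the parent data.

```python
def reassemble_chain(parent, child, parent_field, child_field):
    """Attach the child's data to the parent's data when their association fields agree."""
    if parent[parent_field] != child[child_field]:
        raise ValueError("association fields do not match")
    merged = dict(parent)  # copy so the original fragment is left untouched
    merged["children"] = list(parent.get("children", [])) + [child]
    return merged


order = {"order_id": 42, "customer": "Alice"}             # main parent node
item = {"item_id": 7, "order_ref": 42, "sku": "X-100"}    # child node
record = reassemble_chain(order, item, "order_id", "order_ref")
```

A tree relationship could reuse the same function by calling it once per child node, since each call appends to the existing `children` list.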
S104, compiling the source code of the class and the method in which the instrumentation logic is written into byte codes, and inserting the byte codes into a target instrumentation point of a target application program.
After the classes and methods of the instrumentation logic are compiled, the instrumentation logic in the classes and methods monitors the data flow, detects the sensitive data, and processes the discovered sensitive data.
The class and method in which the source code contains instrumentation logic must be compiled into bytecode, an intermediate representation that can be executed on a Java Virtual Machine (JVM).
The compiled bytecode must be inserted into the target instrumentation point of the target application, which typically requires modification of the bytecode or class file of the target application in order to inject instrumentation logic at runtime.
S105, when the target application program runs, monitoring the data flow in real time through the instrumentation logic in the target instrumentation point; when the data flow passes through the target instrumentation point, analyzing the data flow through the instrumentation logic in the target instrumentation point, detecting whether sensitive data is present in the data flow, and performing corresponding data processing according to the presence of the sensitive data.
It should be noted that the target application program starts to run, and the instrumentation logic is activated accordingly. The instrumentation logic begins monitoring the data stream as it passes through the target instrumentation point, which may be user input, database queries, file reads and writes, network communications, etc.
The instrumentation logic analyzes the data stream and detects whether the data contains sensitive information, which typically involves interaction with a pattern library and pattern mapping table to identify sensitive data and its type, and performing association reorganization of identified sensitive data segments.
S106, if the instrumentation logic in the target instrumentation point detects sensitive data, triggering an alarm mechanism and performing desensitization processing on the sensitive data.
It should be noted that if the instrumentation logic detects sensitive data, it triggers an alarm mechanism. This may include generating an alarm, logging, sending a notification, or other security measures.
Depending on the presence of the detected sensitive data, the instrumentation logic may need to take further action, possibly including data desensitization processing to ensure that the sensitive data does not leak, which may be encryption, substitution, deletion, etc. of the data.
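The alarm and desensitization step can be sketched with logging as the alarm and masking as the substitution. The 18-digit pattern, the logger name, and the keep-four-characters masking rule are assumptions chosen for the example, not requirements stated in the patent.

```python
import logging
import re

logger = logging.getLogger("sensitive_data_monitor")

# Hypothetical pattern for a complete sensitive value (an 18-digit identifier).
ID_PATTERN = re.compile(r"\b\d{18}\b")


def mask(match):
    """Keep the first and last four characters and replace the rest with asterisks."""
    value = match.group(0)
    return value[:4] + "*" * (len(value) - 8) + value[-4:]


def alert_and_desensitize(text):
    """Raise an alarm (here, a log record) and return the text with sensitive values masked."""
    hits = ID_PATTERN.findall(text)
    if hits:
        logger.warning("sensitive data detected: %d value(s)", len(hits))
    return ID_PATTERN.sub(mask, text)


masked = alert_and_desensitize("id=123456789012345678 ok")
```

Encryption or deletion could be substituted for `mask` without changing the surrounding flow.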
According to this sensitive data detection method, instrumentation logic is inserted at the instrumentation points obtained through combined static and dynamic analysis, so that the data flow of the running target application program can be monitored in real time and the presence of sensitive data can be detected promptly. Because the instrumentation logic executes according to the actual execution of the application program, detection is flexible, suits dynamically changing application environments, and captures and analyzes sensitive data more accurately. Monitoring streaming data also reveals much of the data's context, such as the data source, data type, data flow direction, and associated data within the validity period, which helps capture sensitive data segments that are associated within that period. Through the interaction of the custom instrumentation logic, the mode library, and the mode mapping table, various kinds of sensitive data can be detected, including complete sensitive data and split sensitive data fragments. Sensitive data fragments dispersed across different data sources can be associated and recombined within their life cycle and restored into complete sensitive data, which ensures the integrity and timeliness of the sensitive data and is very useful for analyzing and processing time-sensitive scattered data.
Example two
Referring to fig. 2, a second embodiment provides a system for detecting sensitive data, including:
a construction module: used for defining sensitive data modes and constructing a mode library and a mode mapping table, wherein the sensitive data modes comprise a complete sensitive data mode and a sensitive data fragment mode, each sensitive data mode comprises a corresponding data structure and corresponding attributes, the attributes of the sensitive data fragment mode store the globally unique identifier of the sensitive data fragment mode, and the information of the mode mapping table comprises the sensitive data fragment mode and its globally unique identifier, the associated sensitive data fragment mode and its globally unique identifier, the relation type among the sensitive data fragments, and the recombination logic;
an instrumentation point module: used for performing static analysis on a target application program when it is not running and dynamic analysis when it is running, so as to obtain static potential instrumentation points and dynamic potential instrumentation points respectively, and determining target instrumentation points by combining the static potential instrumentation points and the dynamic potential instrumentation points;
an instrumentation logic module: used for creating a new class in the target application program and defining a method containing instrumentation logic in the newly created class, the method comprising: receiving streaming data from each data source as parameters; traversing each sensitive data mode in the mode library to match and detect whether the streaming data contains sensitive data; judging, according to the type of the sensitive data mode, whether the sensitive data is complete sensitive data or a sensitive data fragment; if the sensitive data is a sensitive data fragment, assigning the corresponding globally unique identifier to the sensitive data fragment and adding the sensitive data fragment marked with the globally unique identifier into a buffer area; matching each sensitive data fragment in the buffer area within the life cycle of each sensitive data fragment; and calling, in a combination area, the recombination logic in the mode mapping table to perform association recombination on the matched sensitive data fragments so as to obtain the restored complete sensitive data;
an insertion module: used for compiling the source code of the class and method in which the instrumentation logic is written into byte codes, and inserting the byte codes into the target instrumentation points of the target application program;
a detection module: used for monitoring the data flow in real time through the instrumentation logic in the target instrumentation point when the target application program runs, and, when the data flow passes through the target instrumentation point, analyzing the data flow through the instrumentation logic in the target instrumentation point, detecting whether sensitive data is present in the data flow, and performing corresponding data processing according to the presence of the sensitive data;
a desensitization module: used for triggering an alarm mechanism and performing desensitization processing on the sensitive data if the instrumentation logic in the target instrumentation point detects sensitive data.
The foregoing description of the preferred embodiments of the invention is not intended to be limiting, but rather is intended to cover all modifications, equivalents, and alternatives falling within the spirit and principles of the invention.

Claims (7)

1. A method for detecting sensitive data, comprising the steps of:
defining a sensitive data mode and constructing a mode library and a mode mapping table, wherein the sensitive data mode comprises a complete sensitive data mode and a sensitive data fragment mode, each of the sensitive data mode and the sensitive data fragment mode comprises a corresponding data structure and attribute, the attribute of the sensitive data fragment mode stores a global unique identifier of the sensitive data fragment mode, and the information of the mode mapping table comprises the sensitive data fragment mode and the global unique identifier thereof, the associated sensitive data fragment mode and the global unique identifier thereof, the relation type among the sensitive data fragments and recombination logic;
performing static analysis on a target application program when it is not running and dynamic analysis when it is running, so as to obtain static potential instrumentation points and dynamic potential instrumentation points respectively, and determining target instrumentation points by combining the static potential instrumentation points and the dynamic potential instrumentation points;
creating a new class in the target application program, and defining a method containing instrumentation logic in the newly created class, the method containing the instrumentation logic comprising: receiving streaming data from each data source as parameters; traversing each sensitive data mode in the mode library to match and detect whether the streaming data contains sensitive data; judging, according to the type of the sensitive data mode, whether the sensitive data is complete sensitive data or a sensitive data fragment; if the sensitive data is a sensitive data fragment, assigning the corresponding globally unique identifier to the sensitive data fragment and adding the sensitive data fragment marked with the globally unique identifier into a buffer area; matching each sensitive data fragment in the buffer area within the life cycle of each sensitive data fragment; and calling, in a combination area, the recombination logic in the mode mapping table to perform association recombination on the matched sensitive data fragments so as to obtain the restored complete sensitive data;
Compiling a source code of a class and a method in which the instrumentation logic is written into byte codes, and inserting the byte codes into a target instrumentation point of a target application program;
when the target application program runs, the data flow is monitored in real time through pile inserting logic in the target pile inserting point, and when the data flow passes through the target pile inserting point, the data flow is analyzed through the pile inserting logic in the target pile inserting point, the existence condition of sensitive data in the data flow is detected, and corresponding data processing is carried out according to the existence condition of the sensitive data;
and if the pile inserting logic in the target pile inserting point detects the sensitive data, triggering an alarm mechanism and performing desensitization processing on the sensitive data.
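As an illustration only, the detection, alarm and desensitization steps of claim 1 might be sketched as follows. The claim describes compiling the logic to bytecode and inserting it at the instrumentation point; this sketch swaps in a plain Python decorator as a stand-in for that insertion. The pattern names, the regular expressions, the `alerts` list and the masking rule are all hypothetical and are not taken from the patent.

```python
import re
from functools import wraps

# Hypothetical pattern library: name -> (pattern type, compiled regex).
PATTERN_LIBRARY = {
    "cn_mobile": ("complete", re.compile(r"\b1[3-9]\d{9}\b")),
    "card_head": ("fragment", re.compile(r"\bcard_head=\d{8}\b")),
}

alerts = []  # stand-in for the alarm mechanism


def mask(value):
    """Desensitize a value by keeping its first 3 characters and masking the rest."""
    return value[:3] + "*" * (len(value) - 3)


def instrumentation_logic(stream_data):
    """Match the data against every pattern; alert on each hit and mask complete matches."""
    for name, (kind, regex) in PATTERN_LIBRARY.items():
        def on_match(match, name=name, kind=kind):
            alerts.append((name, kind))
            if kind == "complete":
                return mask(match.group(0))
            return match.group(0)  # fragments would go to the buffer instead

        stream_data = regex.sub(on_match, stream_data)
    return stream_data


def instrument(func):
    """Route the return value of a function through the instrumentation logic."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        return instrumentation_logic(func(*args, **kwargs))
    return wrapper


@instrument
def read_record():
    # Hypothetical data source sitting at a target instrumentation point.
    return "contact 13812345678 for details"
```

Calling `read_record()` returns the text with the phone number masked and records one entry in `alerts`. Fragment handling is deliberately left out here because the dependent claims describe it separately.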
2. The sensitive data detection method according to claim 1, wherein, before the step of adding the sensitive data fragment marked with the globally unique identifier to the buffer, the method further comprises:
after a sensitive data fragment is detected, allocating to it, in the buffer, an initialized start-timer function, wherein a timer object is created and a timer callback function is defined in the start-timer function, the timer object starts its countdown after a trigger time and its time returns to 0 once the countdown time has elapsed, and the timer callback function contains the processing logic to run when the timer object has counted down to 0;
defining attributes for the detected sensitive data fragment, the attributes comprising a data attribute, an ID attribute and a time attribute; storing the data content of the fragment in the data attribute, storing the globally unique identifier of the fragment in the ID attribute, and storing a preset trigger time and a preset countdown time in the time attribute, wherein the preset trigger time is set to 0 and the preset countdown time is set to the life cycle of the sensitive data fragment;
and the step of adding the sensitive data fragment marked with the globally unique identifier to the buffer comprises:
extracting the data attribute, the ID attribute and the time attribute of the sensitive data fragment, adding the data attribute and the ID attribute to the list of the buffer, and at the same time passing the data attribute, the ID attribute and the time attribute as input parameters to the start-timer function, so that the timer object in the start-timer function starts counting down after the preset trigger time and returns to 0 after the preset countdown time.
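The fragment attributes and the per-fragment timer of claim 2 could look roughly like the sketch below. It assumes Python's standard `threading.Timer` as the timer object and assumes that the callback simply drops the expired fragment; the claim does not specify what the callback's processing logic is. The class names, the field names and the dictionary-based buffer are illustrative choices, not the patent's.

```python
import threading
from dataclasses import dataclass


@dataclass
class Fragment:
    data: str                   # data attribute: content of the fragment
    frag_id: str                # ID attribute: globally unique identifier
    trigger_time: float = 0.0   # time attribute: delay before the countdown starts
    countdown: float = 1.0      # time attribute: life cycle in seconds


class FragmentBuffer:
    def __init__(self):
        self.items = {}         # frag_id -> data (the buffer's list of fragments)
        self.timers = {}        # frag_id -> running timer object
        self.expired = []       # identifiers dropped by the timer callback
        self._lock = threading.Lock()

    def _on_timeout(self, frag_id):
        """Timer callback: assumed here to remove the fragment once its life cycle ends."""
        with self._lock:
            if self.items.pop(frag_id, None) is not None:
                self.expired.append(frag_id)
            self.timers.pop(frag_id, None)

    def add(self, frag):
        """Store the data and ID attributes and start the fragment's timer."""
        timer = threading.Timer(
            frag.trigger_time + frag.countdown, self._on_timeout, args=(frag.frag_id,)
        )
        with self._lock:
            self.items[frag.frag_id] = frag.data
            self.timers[frag.frag_id] = timer
        timer.start()
        return timer
```

With the preset trigger time at 0, the timer interval equals the fragment's life cycle, so a fragment that is never matched leaves the buffer when its countdown ends.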
3. The sensitive data detection method according to claim 2, wherein the step of matching the sensitive data fragments in the buffer within the life cycle of each fragment and then invoking, in the combination area, the reorganization logic in the pattern mapping table to associate and reorganize the matched fragments so as to obtain the restored complete sensitive data comprises:
before the time of the timer object corresponding to the current sensitive data fragment returns to 0, looking up in the pattern mapping table, according to the ID attribute of the current fragment, the globally unique identifier of the sensitive data fragment associated with the current fragment;
performing one traversal of the list of the current buffer to check whether it contains a sensitive data fragment whose ID attribute equals the globally unique identifier of the associated fragment;
if such a fragment exists, the match succeeds: moving the data attributes and ID attributes of the current fragment and the matched fragment into the combination area, deleting the data attributes and ID attributes of both fragments from the list of the buffer, and initializing the start-timer functions corresponding to both fragments in the buffer;
and obtaining the relationship type and the reorganization logic for the corresponding fragments from the pattern mapping table, and associating and reorganizing the fragments in the combination area according to that relationship type and reorganization logic, so as to obtain the restored complete sensitive data.
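A single matching pass as described in claim 3 might be sketched like this. The mapping table contents, the `"concat"` relationship type and the dictionary layout of a buffered fragment are assumptions made for the example; timer handling is omitted to keep the focus on lookup, traversal and reorganization.

```python
# Hypothetical pattern mapping table:
# fragment ID -> (associated fragment ID, relationship type, reorganization function).
PATTERN_MAPPING = {
    "card_head": ("card_tail", "concat", lambda current, other: current + other),
    "card_tail": ("card_head", "concat", lambda current, other: other + current),
}


def match_and_reorganize(current_id, buffer_list, combination_area):
    """Traverse the buffer once for the fragment associated with current_id.

    Returns the reorganized data on success, or None when no partner is buffered.
    """
    associated_id, _relation, reorganize = PATTERN_MAPPING[current_id]
    current = next((f for f in buffer_list if f["id"] == current_id), None)
    partner = next((f for f in buffer_list if f["id"] == associated_id), None)
    if current is None or partner is None:
        return None
    # Move both fragments to the combination area and delete them from the buffer.
    combination_area.extend([current, partner])
    buffer_list.remove(current)
    buffer_list.remove(partner)
    return reorganize(current["data"], partner["data"])
```

For example, with a buffer holding a `card_tail` fragment `"5678"` and a `card_head` fragment `"62221234"`, calling `match_and_reorganize("card_tail", buffer, area)` empties the buffer and yields the concatenated value in head-then-tail order.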
4. The sensitive data detection method according to claim 3, wherein, after the step of performing one traversal of the list of the current buffer to check whether it contains a sensitive data fragment whose ID attribute equals the globally unique identifier of the associated fragment, the method further comprises:
if the time of the timer object corresponding to the current sensitive data fragment is not 0, performing a second traversal or further traversals until either the match succeeds or the time of the timer object corresponding to the current fragment reaches 0, and then stopping the matching;
if the match has not succeeded when the matching stops, deleting the data attribute and the ID attribute of the current fragment from the list of the buffer, and initializing the start-timer function corresponding to the current fragment in the buffer.
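The repeated traversal of claim 4 amounts to retrying until a deadline. The sketch below replaces the countdown timer with a deadline computed from an injectable clock, which is a substitution made here so the behaviour is easy to test; `find_partner`, `clock` and `pause` are hypothetical parameters.

```python
import time


def match_with_retries(fragment_id, buffer_list, find_partner, lifetime,
                       clock=time.monotonic, pause=lambda: time.sleep(0.01)):
    """Traverse repeatedly until a partner is found or the life cycle runs out.

    On expiry the current fragment is deleted from the buffer and None is returned.
    """
    deadline = clock() + lifetime
    while True:
        partner = find_partner(fragment_id, buffer_list)  # one traversal
        if partner is not None:
            return partner
        if clock() >= deadline:
            # Life cycle over without a match: drop the current fragment.
            buffer_list[:] = [f for f in buffer_list if f["id"] != fragment_id]
            return None
        pause()  # wait briefly before the next traversal
```

Passing a fake `clock` makes the number of traversals deterministic, which is useful when checking that an unmatched fragment really is removed at the end of its life cycle.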
5. The sensitive data detection method according to claim 3, wherein the step of associating and reorganizing the sensitive data fragments in the combination area according to the relationship type and the reorganization logic between the fragments, so as to obtain the restored complete sensitive data, comprises:
if the relationship type is a chain relationship, determining the sensitive data fragment whose relationship attribute is main parent node and the sensitive data fragment whose relationship attribute is child node, and obtaining the data of the main parent node and the data of the child node respectively;
extracting a child association field from the data of the child node and a parent association field from the data of the main parent node;
associating the child association field with the parent association field to obtain the chain relationship;
and connecting the data of the child node to the data of the main parent node according to the chain relationship to obtain the reorganized sensitive data.
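One possible reading of the chain-relationship reorganization in claim 5 is a join on an association field shared by parent and child. In the sketch below the field name `order_id` and the dictionary layout are hypothetical; the patent does not name concrete fields.

```python
def reorganize_chain(parent, children, parent_key="order_id", child_key="order_id"):
    """Attach child node data to the main parent node where association fields match."""
    parent_value = parent[parent_key]  # parent association field
    # The chain relationship: children whose association field equals the parent's.
    linked = [child for child in children if child.get(child_key) == parent_value]
    return {**parent, "children": linked}
```

For instance, a parent record with `order_id` 7 picks up only the child records carrying `order_id` 7, and unrelated children are left out of the reorganized result.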
6. The sensitive data detection method according to claim 1, wherein the step of performing static analysis on the target application program while it is not running and dynamic analysis while it is running, so as to obtain static potential instrumentation points and dynamic potential instrumentation points respectively, and determining the target instrumentation point by combining the static potential instrumentation points and the dynamic potential instrumentation points comprises:
when the target application program is not running, performing static analysis on its source code with a static code analysis tool to identify the static potential instrumentation points in the code and mark them, wherein the static potential instrumentation points are key points of database queries, data processing, file reading and writing, or network communication;
when the target application program is running, running a dynamic analysis tool against it to trace its execution and its run-time data flow, so as to identify the paths along which data flows between different methods and capture the operation points on those paths that involve sensitive data, taking those operation points as dynamic potential instrumentation points and marking them, wherein the operations involving sensitive data include, but are not limited to, input, output, storage and transmission of sensitive data;
and identifying the potential instrumentation points that have been marked twice and taking them as target instrumentation points, so that sensitive information in the flowing data can be monitored and processed in real time at the target instrumentation points.
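If the last step of claim 6 is read as keeping the points marked by both analyses, which is an interpretation rather than something the translated text states outright, the selection reduces to a set intersection. The method names below are invented for the example.

```python
def select_target_points(static_marked, dynamic_marked):
    """Return the points marked by both static and dynamic analysis, in sorted order."""
    return sorted(set(static_marked) & set(dynamic_marked))


# Hypothetical analysis results.
static_points = {"UserDao.query", "FileStore.write", "HttpClient.send"}
dynamic_points = {"UserDao.query", "HttpClient.send", "Cache.put"}
targets = select_target_points(static_points, dynamic_points)
```

Here `targets` contains only `HttpClient.send` and `UserDao.query`, the two points that appear in both sets.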
7. A sensitive data detection system, comprising:
a construction module, configured to define sensitive data patterns and construct a pattern library and a pattern mapping table, wherein the sensitive data patterns comprise complete sensitive data patterns and sensitive data fragment patterns, each sensitive data pattern comprises a corresponding data structure and corresponding attributes, the attributes of a sensitive data fragment pattern store the globally unique identifier of that pattern, and the pattern mapping table records each sensitive data fragment pattern and its globally unique identifier, the associated sensitive data fragment pattern and its globally unique identifier, and the relationship type and the reorganization logic between the sensitive data fragments;
an instrumentation point module, configured to perform static analysis on a target application program while it is not running and dynamic analysis while it is running, so as to obtain static potential instrumentation points and dynamic potential instrumentation points respectively, and to determine a target instrumentation point by combining the static potential instrumentation points and the dynamic potential instrumentation points;
an instrumentation logic module, configured to create a new class in the target application program and define, in the newly created class, a method containing instrumentation logic, wherein the instrumentation logic is: receiving streaming data from each data source as a parameter; traversing each sensitive data pattern in the pattern library to match against the streaming data and detect whether it contains sensitive data; determining, according to the type of the matched sensitive data pattern, whether the sensitive data is complete sensitive data or a sensitive data fragment; if it is a sensitive data fragment, assigning the corresponding globally unique identifier to the fragment and adding the fragment marked with the globally unique identifier to a buffer; matching the sensitive data fragments in the buffer against one another within the life cycle of each fragment; and invoking, in a combination area, the reorganization logic in the pattern mapping table to associate and reorganize the matched fragments, so as to obtain the restored complete sensitive data;
an insertion module, configured to compile the source code of the class and method in which the instrumentation logic is written into bytecode and insert the bytecode at the target instrumentation point of the target application program;
a detection module, configured to monitor the data flow in real time through the instrumentation logic at the target instrumentation point while the target application program is running, and, when the data flow passes through the target instrumentation point, to analyze the data flow with that instrumentation logic, detect whether sensitive data is present in the data flow, and perform corresponding data processing according to whether sensitive data is present;
and a desensitization module, configured to trigger an alarm mechanism and desensitize the sensitive data if the instrumentation logic at the target instrumentation point detects sensitive data.
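The data held by the construction module could be modelled along the following lines. This is a sketch under assumptions: the class names, the `kind` values, the identifier format and the consistency check in `build_tables` are illustrative and do not come from the patent.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class SensitivePattern:
    name: str
    kind: str                    # "complete" or "fragment"
    regex: re.Pattern            # the pattern's data structure, here a regex
    guid: Optional[str] = None   # set only for fragment patterns


@dataclass(frozen=True)
class MappingEntry:
    fragment_guid: str           # fragment pattern identifier
    associated_guid: str         # identifier of the associated fragment pattern
    relation_type: str           # relationship type between the fragments
    reorganize: Callable[[str, str], str]  # reorganization logic


def build_tables(patterns, entries):
    """Build the pattern library and the pattern mapping table, checking identifiers."""
    library = {p.name: p for p in patterns}
    fragment_guids = {p.guid for p in patterns if p.kind == "fragment"}
    mapping = {}
    for entry in entries:
        if not {entry.fragment_guid, entry.associated_guid} <= fragment_guids:
            raise ValueError("mapping entry refers to an unknown fragment pattern")
        mapping[entry.fragment_guid] = entry
    return library, mapping
```

Keying the mapping table by the fragment identifier keeps the lookup from a buffered fragment to its associated fragment a single dictionary access.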
CN202311410918.9A 2023-10-28 2023-10-28 Sensitive data detection method and system Active CN117131236B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311410918.9A CN117131236B (en) 2023-10-28 2023-10-28 Sensitive data detection method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311410918.9A CN117131236B (en) 2023-10-28 2023-10-28 Sensitive data detection method and system

Publications (2)

Publication Number Publication Date
CN117131236A (en) 2023-11-28
CN117131236B CN117131236B (en) 2024-02-02

Family

ID=88861360

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311410918.9A Active CN117131236B (en) 2023-10-28 2023-10-28 Sensitive data detection method and system

Country Status (1)

Country Link
CN (1) CN117131236B (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030103451A1 (en) * 2001-11-30 2003-06-05 Lutgen Craig L. Method and apparatus for managing congestion in a data communication network
CN105183642A (en) * 2015-08-18 2015-12-23 中国人民解放军信息工程大学 Instrumentation based program behavior acquisition and structural analysis method
US20160277306A1 (en) * 2013-11-29 2016-09-22 Huawei Technologies Co., Ltd. Data Stream Identifying Method and Device
CN106055980A (en) * 2016-05-30 2016-10-26 南京邮电大学 Rule-based JavaScript security testing method
CN107666486A (en) * 2017-09-27 2018-02-06 清华大学 A kind of network data flow restoration methods and system based on message protocol feature
CN109739946A (en) * 2018-12-25 2019-05-10 华联世纪工程咨询股份有限公司 The generation method and device of project data packet
CN115600241A (en) * 2022-10-07 2023-01-13 北京中安星云软件技术有限公司(Cn) Data stream real-time desensitization method and system based on big data technology
CN116722994A (en) * 2023-03-22 2023-09-08 北京站酷网络科技有限公司 Data detection method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN117131236B (en) 2024-02-02

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant