CN117668908A - Data desensitizing method, device, electronic equipment and storage medium - Google Patents

Publication number
CN117668908A
CN117668908A CN202311662330.2A CN202311662330A CN117668908A CN 117668908 A CN117668908 A CN 117668908A CN 202311662330 A CN202311662330 A CN 202311662330A CN 117668908 A CN117668908 A CN 117668908A
Authority
CN
China
Prior art keywords
data
data set
identification
desensitization
desensitized
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311662330.2A
Other languages
Chinese (zh)
Inventor
郑磊
唐守忠
张晶
王隆玉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Agricultural Bank of China
Original Assignee
Agricultural Bank of China
Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a data desensitization method, a data desensitization device, electronic equipment and a storage medium. The data desensitization method comprises the following steps: acquiring a data set to be desensitized, wherein the data set to be desensitized is a full metadata set; determining a data identification rule in response to a rule setting operation for the data set to be desensitized, and iteratively identifying the data set to be desensitized based on the data identification rule to obtain a target desensitization data set within the data set to be desensitized; and determining a target desensitization strategy in response to a strategy setting operation for the target desensitization data set, and desensitizing the target desensitization data set based on the target desensitization strategy to obtain the target desensitization data set with desensitization completed. The technical scheme of the embodiments of the invention can improve the efficiency, flexibility and accuracy of data desensitization.

Description

Data desensitizing method, device, electronic equipment and storage medium
Technical Field
The present invention relates to the field of data processing technologies, and in particular, to a data desensitizing method, a data desensitizing device, an electronic device, and a storage medium.
Background
To manage enterprise data in a unified and secure way, most enterprises desensitize data as part of data management. However, because enterprise-level data is highly redundant, performing sensitive-data identification and desensitization on tens of millions of form-level data items is quite complex.
In the related art, data is usually first classified roughly, and the classified data is then manually checked for sensitivity and desensitized by a person skilled in the relevant field. Such manual desensitization suffers from poor efficiency, flexibility and accuracy.
Disclosure of Invention
The invention provides a data desensitization method, a device, electronic equipment and a storage medium, which are used for solving the technical problems of poor data desensitization efficiency, flexibility and accuracy.
According to an aspect of the present invention, there is provided a data desensitization method, wherein the method comprises:
acquiring a data set to be desensitized, wherein the data set to be desensitized is a full metadata set;
determining a data identification rule in response to rule setting operation for the data set to be desensitized, and carrying out iterative identification on the data set to be desensitized based on the data identification rule to obtain a target desensitized data set in the data set to be desensitized;
and determining a target desensitization strategy in response to a strategy setting operation aiming at the target desensitization data set, and carrying out data desensitization on the target desensitization data set based on the target desensitization strategy to obtain the target desensitization data set with the desensitization completed.
According to another aspect of the present invention, there is provided a data desensitizing apparatus, wherein the apparatus comprises:
the data acquisition module is used for acquiring a data set to be desensitized, wherein the data set to be desensitized is a full metadata set;
the iteration identification module is used for responding to the rule setting operation aiming at the data set to be desensitized, determining a data identification rule, and carrying out iteration identification on the data set to be desensitized based on the data identification rule to obtain a target desensitized data set in the data set to be desensitized;
and the data desensitization module is used for responding to the strategy setting operation aiming at the target desensitization data set, determining a target desensitization strategy, and carrying out data desensitization on the target desensitization data set based on the target desensitization strategy to obtain the target desensitization data set with the desensitization completed.
According to another aspect of the present invention, there is provided an electronic apparatus including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores a computer program executable by the at least one processor to enable the at least one processor to perform the data desensitization method according to any of the embodiments of the invention.
According to another aspect of the present invention, there is provided a computer readable storage medium storing computer instructions for causing a processor to perform a data desensitization method according to any embodiment of the present invention.
According to the technical scheme, the data set to be desensitized is obtained, wherein the data set to be desensitized is a full metadata set; determining a data identification rule in response to rule setting operation for the data set to be desensitized, and carrying out iterative identification on the data set to be desensitized based on the data identification rule to obtain a target desensitized data set in the data set to be desensitized; and determining a target desensitization strategy in response to a strategy setting operation aiming at the target desensitization data set, and carrying out data desensitization on the target desensitization data set based on the target desensitization strategy to obtain the target desensitization data set with the desensitization completed. The data desensitization scheme (namely the data identification rule and the target desensitization strategy) can be customized based on the user requirement and the data characteristics of the data to be desensitized, so that the data desensitization scheme is formulated by human-computer interaction, the full-automatic data desensitization based on the formulated data desensitization scheme is realized, and the data desensitization efficiency, flexibility and accuracy are improved.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the invention or to delineate the scope of the invention. Other features of the present invention will become apparent from the description that follows.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of a method for desensitizing data according to a first embodiment of the present invention;
FIG. 2 is a flow chart of a method for desensitizing data according to a second embodiment of the invention;
FIG. 3 is a flow chart of an iterative identification provided in accordance with an embodiment of the present invention;
FIG. 4 is an overall flow chart of a method for desensitizing data provided in accordance with an embodiment of the present invention;
FIG. 5 is a schematic diagram of a data desensitizing apparatus according to a third embodiment of the present invention;
FIG. 6 is a schematic structural diagram of an electronic device implementing a data desensitizing method according to an embodiment of the present invention.
Detailed Description
In order that those skilled in the art may better understand the present invention, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those skilled in the art based on the embodiments of the present invention without inventive effort shall fall within the scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and the claims of the present invention and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the invention described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
Example 1
Fig. 1 is a flowchart of a data desensitizing method according to a first embodiment of the present invention. The method is applicable to multi-segment progressive identification of sensitive data and may be performed by a data desensitizing device, which may be implemented in hardware and/or software and may be configured in a computer. As shown in fig. 1, the method includes:
S110, acquiring a data set to be desensitized, wherein the data set to be desensitized is a full metadata set.
The data set to be desensitized may be understood as a data set on which data desensitization is to be performed. Optionally, the data set to be desensitized may be a full metadata set. In an embodiment of the invention, the data set to be desensitized comprises a plurality of pieces of data to be desensitized. The data to be desensitized may be preset according to the scene requirement and is not specifically limited herein. The data to be desensitized may be, for example, enterprise data.
Optionally, the data set to be desensitized includes sensitive data and non-sensitive data.
The full metadata set may be understood as the data set made up of all of the descriptive data (metadata).
S120, determining a data identification rule in response to a rule setting operation for the data set to be desensitized, and iteratively identifying the data set to be desensitized based on the data identification rule to obtain a target desensitization data set within the data set to be desensitized.
The rule setting operation may be understood as an operation of setting the data identification rule. In the embodiment of the present invention, the rule setting operation may be preset according to the scene requirement, which is not specifically limited herein. Optionally, the rule setting operation may be a selection operation on a preset rule option.
The data identification rule may be understood as a rule for iteratively identifying the data set to be desensitized. In the embodiment of the present invention, the data identification rule may be preset according to the scene requirement, which is not specifically limited herein. Optionally, the data identification rule may include at least one of an identifiable metadata field, a number of iterative identification times, and a sensitive data identification mode corresponding to each iterative identification operation.
The identifiable metadata field may be understood as a metadata field that is examined during iterative identification. In the embodiment of the present invention, the identifiable metadata field may be preset based on the rule setting operation, which is not specifically limited herein. Optionally, the identifiable metadata field may include a Chinese field name, an English field name, a field type and length, and the like.
The number of iterative identification times may be understood as the number of times the iterative identification operation is performed. In the embodiment of the present invention, the number of iterative identification times may be preset based on the rule setting operation, which is not specifically limited herein. The number of iterative identification times may be, for example, 3, 6, 8, or the like.
The sensitive data identification mode may be understood as a mode of identifying sensitive data. In the embodiment of the present invention, the sensitive data identification mode may be preset according to the scene requirement, which is not specifically limited herein. Optionally, the sensitive data identification mode may include at least one of list identification, regular expression identification, machine learning model identification, combined field name identification and field content identification.
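As an illustration only, two of the identification modes named above (list identification and regular expression identification) might be sketched as follows. The function names, the list of sensitive field names, and the regular expression are hypothetical and do not come from the embodiment; they merely show how an English field name could be tested against each mode.

```python
import re

# Hypothetical list of field names known to be sensitive (an assumption).
SENSITIVE_NAMES = {"id_card_no", "mobile_phone", "bank_account"}

# Hypothetical pattern for password-like field names (an assumption).
SENSITIVE_PATTERN = re.compile(r"(passw(or)?d|pwd|secret)", re.IGNORECASE)


def identify_by_list(field_name: str) -> bool:
    """List identification: exact, case-insensitive match against a known list."""
    return field_name.lower() in SENSITIVE_NAMES


def identify_by_regex(field_name: str) -> bool:
    """Regular expression identification: pattern match anywhere in the name."""
    return SENSITIVE_PATTERN.search(field_name) is not None
```

A list match is cheap and precise but only covers names already known, whereas a pattern can catch naming variants that the list does not contain; this is one reason different modes may be assigned to different iterations.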
The target desensitization data set may be understood as the set of sensitive data determined by the iterative identification. Optionally, the target desensitization data set may comprise one or more items of sensitive data.
In this embodiment of the present invention, optionally, each iteration identification operation corresponds to at least one sensitive data identification manner, and the sensitive data identification manners corresponding to different iteration identification operations are the same or different.
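One possible way to hold a data identification rule in memory is sketched below, under stated assumptions: the class name `DataIdentificationRule`, its attribute names, and the string labels for the identification modes are all hypothetical. The sketch only captures the three components described above and the constraint that each iterative identification operation has at least one identification mode.

```python
from dataclasses import dataclass


@dataclass
class DataIdentificationRule:
    """Hypothetical container for the three rule components."""
    identifiable_fields: list[str]        # metadata fields to examine
    iteration_count: int                  # number of iterative identification times
    modes_per_iteration: list[list[str]]  # identification modes for each iteration

    def __post_init__(self) -> None:
        if len(self.modes_per_iteration) != self.iteration_count:
            raise ValueError("one list of modes is required per iteration")
        if any(not modes for modes in self.modes_per_iteration):
            raise ValueError("each iteration needs at least one identification mode")


# Modes may be the same or different across iterations.
rule = DataIdentificationRule(
    identifiable_fields=["field_cn_name", "field_en_name", "field_type_length"],
    iteration_count=3,
    modes_per_iteration=[["list"], ["regex"], ["list", "field_content"]],
)
```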
S130, determining a target desensitization strategy in response to a strategy setting operation aiming at the target desensitization data set, and carrying out data desensitization on the target desensitization data set based on the target desensitization strategy to obtain the target desensitization data set with the desensitization completed.
The policy setting operation may be understood as an operation of setting the target desensitization policy. In the embodiment of the present invention, the policy setting operation may be preset according to the scene requirement, which is not specifically limited herein. Optionally, the policy setting operation may be a selection operation on a preset policy option.
The target desensitization policy may be understood as a policy for data desensitization of the target desensitization dataset. In the embodiment of the present invention, the target desensitization policy may be preset according to the scene requirement, which is not specifically limited herein. Alternatively, the target desensitization policy may include content deletion, preset string replacement, string emulation replacement, or digital scaling.
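The four optional policies listed above could look roughly like the following sketch. The function names, the default preset string, and the choice to preserve only ASCII digit and letter positions in the emulation replacement are assumptions made for illustration; the embodiment does not specify how each policy is implemented.

```python
import random
import string


def delete_content(value: str) -> str:
    """Content deletion: drop the value entirely."""
    return ""


def replace_with_preset(value: str, preset: str = "***") -> str:
    """Preset string replacement: substitute a fixed placeholder."""
    return preset


def emulate_string(value: str, rng: random.Random) -> str:
    """String emulation replacement (assumed reading): keep the format,
    replacing each ASCII digit with a random digit and each ASCII letter
    with a random letter; all other characters are kept unchanged."""
    out = []
    for ch in value:
        if ch in string.digits:
            out.append(rng.choice(string.digits))
        elif ch in string.ascii_letters:
            out.append(rng.choice(string.ascii_letters))
        else:
            out.append(ch)
    return "".join(out)


def scale_number(value: float, factor: float) -> float:
    """Digital scaling: multiply a numeric value by a fixed factor."""
    return value * factor
```

Emulation replacement keeps the masked value usable in tests that depend on its shape (for example, a card number with separators), while deletion and preset replacement discard the shape altogether.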
According to the technical scheme, the data set to be desensitized is obtained, wherein the data set to be desensitized is a full metadata set; determining a data identification rule in response to rule setting operation for the data set to be desensitized, and carrying out iterative identification on the data set to be desensitized based on the data identification rule to obtain a target desensitized data set in the data set to be desensitized; and determining a target desensitization strategy in response to a strategy setting operation aiming at the target desensitization data set, and carrying out data desensitization on the target desensitization data set based on the target desensitization strategy to obtain the target desensitization data set with the desensitization completed. The data desensitization scheme (namely the data identification rule and the target desensitization strategy) can be customized based on the user requirement and the data characteristics of the data to be desensitized, so that the data desensitization scheme is formulated by human-computer interaction, the full-automatic data desensitization based on the formulated data desensitization scheme is realized, and the data desensitization efficiency, flexibility and accuracy are improved.
Example 2
Fig. 2 is a flowchart of a data desensitizing method according to a second embodiment of the present invention. This embodiment refines the step, in the foregoing embodiment, of iteratively identifying the data set to be desensitized based on the data identification rule. As shown in fig. 2, the method includes:
S210, acquiring a data set to be desensitized, wherein the data set to be desensitized is a full metadata set.
S220, determining a data identification rule in response to a rule setting operation for the data set to be desensitized.
In the embodiment of the invention, the data identification rule comprises at least one of identifiable metadata fields, iteration identification times and sensitive data identification modes corresponding to each iteration identification operation.
The sensitive data identification mode corresponding to each iteration identification operation may include the sensitive data identification mode corresponding to a first iteration identification operation, the sensitive data identification mode corresponding to a second iteration identification operation, the sensitive data identification mode corresponding to a third iteration identification operation, and so on.
S230, identifying the identifiable metadata fields in the data set to be desensitized through the sensitive data identification mode corresponding to the first iterative identification operation to obtain a marked first sensitive data set and an unmarked first non-sensitive data set.
Wherein the first sensitive data set may be understood as a set of sensitive data in the data set to be desensitized identified by a first iterative identification operation. In the embodiment of the invention, the identified sensitive data can be marked in the iterative identification operation, so that the obtained first sensitive data set is the marked first sensitive data set.
The first non-sensitive data set may be understood as a set of non-sensitive data in the data set to be desensitized identified by a first iterative identification operation.
In the embodiment of the invention, the input corresponding to the first iterative identification operation is the data set to be desensitized, and the output is a marked first sensitive data set and an unmarked first non-sensitive data set.
S240, identifying the identifiable metadata fields in the first non-sensitive data set by the sensitive data identification mode corresponding to the second iteration identification operation to obtain a marked second sensitive data set and an unmarked second non-sensitive data set.
Wherein the second set of sensitive data may be understood as a set of sensitive data in the first set of non-sensitive data identified by a second iterative identification operation.
The second set of non-sensitive data may be understood as a set of non-sensitive data in the first set of non-sensitive data identified by a second iterative identification operation.
In the embodiment of the invention, the input corresponding to the second iteration identification operation is the first non-sensitive data set, and the output is the marked second sensitive data set and the unmarked second non-sensitive data set.
S250, returning to execute the operation of identifying the identifiable metadata fields in the first non-sensitive data set through the sensitive data identification mode corresponding to the second iterative identification operation to obtain a marked second sensitive data set and an unmarked second non-sensitive data set, and terminating the iterative identification when the number of return executions reaches the number of iterative identification times.
In the embodiment of the present invention, the input corresponding to the current iteration identification operation is the unlabeled non-sensitive data set output by the previous iteration identification operation (refer to fig. 3). Fig. 3 is a flow chart of an iterative identification provided in accordance with an embodiment of the present invention.
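The chaining of inputs and outputs described above can be sketched as a simple loop. This is a minimal illustration, assuming that each identification mode is represented by a predicate over one metadata record; the names `iterative_identify`, `Field` and `Rule` are hypothetical.

```python
from typing import Callable

Field = dict[str, str]          # one metadata record (hypothetical shape)
Rule = Callable[[Field], bool]  # returns True if the record is sensitive


def iterative_identify(
    dataset: list[Field], rules: list[Rule]
) -> tuple[list[list[Field]], list[Field]]:
    """Run one identification pass per rule.

    Each pass takes the unmarked output of the previous pass as its input
    and splits it into a marked (sensitive) set and an unmarked set.
    """
    marked_sets: list[list[Field]] = []
    unmarked = list(dataset)
    for rule in rules:
        marked: list[Field] = []
        remaining: list[Field] = []
        for record in unmarked:
            (marked if rule(record) else remaining).append(record)
        marked_sets.append(marked)
        unmarked = remaining
    return marked_sets, unmarked
```

Because each pass only sees records that earlier passes left unmarked, the input shrinks from one iteration to the next, and a record is marked at most once.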
Optionally, before the iterative identification is terminated, the method further includes:
determining a modifiable identification rule, wherein the modifiable identification rule is the sensitive data identification mode corresponding to each non-executed iterative identification operation;
determining a modified modifiable identification rule in response to a modification operation for the modifiable identification rule;
and continuing to execute the iterative recognition operation based on the modified modifiable recognition rule.
Wherein the modifiable identification rule may be understood as a modifiable data identification rule. Optionally, the modifiable identification rule may be the sensitive data identification mode corresponding to each iteration identification operation that is not performed.
The modifying operation may be understood as an operation of modifying the modifiable identification rule. Optionally, the modification operation may be an operation of adding, deleting or modifying the sensitive data identification mode corresponding to each iteration identification operation which is not performed.
In the embodiment of the invention, the sensitive data identification mode corresponding to each unexecuted iterative identification operation can be modified based on the client requirement before the iterative identification is terminated, namely in the iterative identification process, so that the effect of flexibly modifying the data identification rule is realized, and the user experience is improved.
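A minimal sketch of this behaviour is given below, assuming a stepwise identifier that exposes which iterations have already run. The class `IterativeIdentifier` and its methods are hypothetical; the only property taken from the text is that the identification mode of an iteration may be changed as long as that iteration has not yet been executed.

```python
from typing import Callable

Field = dict[str, str]
Rule = Callable[[Field], bool]


class IterativeIdentifier:
    """Hypothetical stepwise identifier whose pending rules stay modifiable."""

    def __init__(self, dataset: list[Field], rules: list[Rule]) -> None:
        self.unmarked = list(dataset)
        self.rules = list(rules)
        self.marked_sets: list[list[Field]] = []
        self.next_index = 0  # index of the next iteration to execute

    def modify_rule(self, index: int, new_rule: Rule) -> None:
        """Replace the rule of an iteration that has not been executed yet."""
        if index < self.next_index:
            raise ValueError("iteration already executed; rule is not modifiable")
        if index >= len(self.rules):
            raise IndexError("no such iteration")
        self.rules[index] = new_rule

    def step(self) -> bool:
        """Execute one iteration; return False once all iterations are done."""
        if self.next_index >= len(self.rules):
            return False
        rule = self.rules[self.next_index]
        marked = [f for f in self.unmarked if rule(f)]
        self.unmarked = [f for f in self.unmarked if not rule(f)]
        self.marked_sets.append(marked)
        self.next_index += 1
        return True
```

For example, after the first `step()` a user could inspect the remaining unmarked fields and call `modify_rule(1, ...)` to tighten the second iteration before it runs.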
S260, obtaining the target desensitization data set in the data set to be desensitized.
Optionally, the obtaining the target desensitization data set in the data set to be desensitized includes:
taking the marked sensitive data sets obtained through all of the iterative identification operations together as the target desensitization data set.
In the embodiment of the invention, multiple rounds of iterative recognition are performed based on different sensitive data recognition modes, so that the comprehensiveness of the recognized target desensitization data set, namely the marked sensitive data set, is improved.
S270, determining a target desensitization strategy in response to the strategy setting operation aiming at the target desensitization data set, and carrying out data desensitization on the target desensitization data set based on the target desensitization strategy to obtain the target desensitization data set with the desensitization completed.
According to the technical scheme of this embodiment, the identifiable metadata fields in the data set to be desensitized are identified through the sensitive data identification mode corresponding to the first iterative identification operation to obtain a marked first sensitive data set and an unmarked first non-sensitive data set; the identifiable metadata fields in the first non-sensitive data set are identified through the sensitive data identification mode corresponding to the second iterative identification operation to obtain a marked second sensitive data set and an unmarked second non-sensitive data set; and the operation of identifying the identifiable metadata fields in the first non-sensitive data set through the sensitive data identification mode corresponding to the second iterative identification operation is executed again to obtain a marked second sensitive data set and an unmarked second non-sensitive data set, with the iterative identification terminating when the number of return executions reaches the number of iterative identification times. This improves the efficiency and comprehensiveness of the iterative identification.
Fig. 4 is an overall flow chart of a data desensitization method provided in accordance with an embodiment of the present invention. As shown in figs. 3 and 4, the overall flow of the data desensitizing method may be:
1. Initialize the full metadata set, i.e., determine the range of fields on which sensitive marking is to be performed.
2. Summarize the identification rules for sensitive data, namely the data identification rules, such as list identification, regular expression identification, machine learning model identification, combined field name identification, field content identification and the like.
3. Specify the metadata fields to which the data identification rules can be applied, i.e., the identifiable metadata fields, such as the Chinese field name, the English field name, the field type and length, and the like.
4. Define the overall target desensitization strategy, such as content deletion, preset character string replacement, character string simulation replacement, digital scaling and the like.
5. Configure a first-segment identification rule R1 in the sensitive identification engine. The input at this point is the metadata set O (initial full set), and the output is a metadata set C1 (marked) and a metadata set D1 (unmarked), wherein R1 represents the sensitive data identification mode corresponding to the first iterative identification operation, O represents the data set to be desensitized, C1 represents the first sensitive data set, and D1 represents the first non-sensitive data set.
6. Configure a second-segment identification rule R2 in the sensitive identification engine. The input is the metadata set D1 (unmarked), and the output is a metadata set C2 (marked) and a metadata set D2 (unmarked), wherein R2 represents the sensitive data identification mode corresponding to the second iterative identification operation, C2 represents the second sensitive data set, and D2 represents the second non-sensitive data set.
7. Configure a third-segment identification rule R3 in the sensitive identification engine. The input is the metadata set D2 (unmarked), and the output is a metadata set C3 (marked) and a metadata set D3 (unmarked), wherein R3 represents the sensitive data identification mode corresponding to the third iterative identification operation, C3 represents the third sensitive data set, and D3 represents the third non-sensitive data set.
8. Repeat the iteration multiple times, configuring identification rules R4, ..., Rn in the sensitive identification engine, and finally output a metadata set Cn (marked) and a metadata set Dn (unmarked).
9. Generate a target desensitization strategy for the metadata set C1 + C2 + ... + Cn (marked), i.e., the target desensitization data set, to obtain the target desensitization data set with desensitization finally completed.
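The overall flow above can be condensed into a short end-to-end sketch. Several points are assumptions rather than statements of the embodiment: metadata records are modelled as dictionaries with hypothetical `en_name` and `cn_name` keys, each rule R1..Rn is a predicate, and step 9 is read as applying the chosen strategy to the values of the columns whose metadata was marked.

```python
from typing import Callable

Meta = dict[str, str]  # hypothetical metadata record


def identify_targets(metadata: list[Meta], rules: list[Callable[[Meta], bool]]) -> list[Meta]:
    """Steps 5-8: run R1..Rn, each on the unmarked output of the previous round."""
    unmarked = list(metadata)  # O, the initial full metadata set
    target: list[Meta] = []    # accumulates C1 + C2 + ... + Cn
    for rule in rules:
        marked = [m for m in unmarked if rule(m)]        # Ci (marked)
        unmarked = [m for m in unmarked if not rule(m)]  # Di (unmarked)
        target.extend(marked)
    return target


def desensitize_row(row: dict[str, str], target: list[Meta],
                    strategy: Callable[[str], str]) -> dict[str, str]:
    """Step 9 (assumed reading): mask the values of the marked columns only."""
    sensitive = {m["en_name"] for m in target}
    return {col: (strategy(val) if col in sensitive else val) for col, val in row.items()}


metadata = [
    {"en_name": "mobile_phone", "cn_name": "手机号"},
    {"en_name": "login_pwd", "cn_name": "登录密码"},
    {"en_name": "branch_code", "cn_name": "网点代码"},
]
rules = [
    lambda m: m["en_name"] in {"mobile_phone"},  # R1: list identification on the English name
    lambda m: "密码" in m["cn_name"],             # R2: keyword match on the Chinese name
]
target = identify_targets(metadata, rules)
row = {"mobile_phone": "13800000000", "login_pwd": "hunter2", "branch_code": "0421"}
masked = desensitize_row(row, target, lambda value: "***")
```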
Based on the technical scheme of the embodiment of the invention, the identification rule of customized sensitive data, namely the data identification rule, is realized, the target desensitization strategy is customized, and the flexibility and the accuracy of data desensitization are improved.
Example 3
Fig. 5 is a schematic structural diagram of a data desensitizing device according to a third embodiment of the present invention. As shown in fig. 5, the apparatus includes a data acquisition module 310, an iteration identification module 320, and a data desensitization module 330, wherein:
the data acquisition module 310 is configured to acquire a data set to be desensitized, where the data set to be desensitized is a full metadata set;
the iteration identification module 320 is configured to determine a data identification rule in response to a rule setting operation for the data set to be desensitized, and perform iterative identification on the data set to be desensitized based on the data identification rule, so as to obtain a target desensitization data set in the data set to be desensitized;
the data desensitization module 330 is configured to determine a target desensitization policy in response to a policy setting operation for the target desensitization data set, and perform data desensitization on the target desensitization data set based on the target desensitization policy, so as to obtain the target desensitization data set with desensitization completed.
According to the technical scheme, the data set to be desensitized is obtained, wherein the data set to be desensitized is a full metadata set; determining a data identification rule in response to rule setting operation for the data set to be desensitized, and carrying out iterative identification on the data set to be desensitized based on the data identification rule to obtain a target desensitized data set in the data set to be desensitized; and determining a target desensitization strategy in response to a strategy setting operation aiming at the target desensitization data set, and carrying out data desensitization on the target desensitization data set based on the target desensitization strategy to obtain the target desensitization data set with the desensitization completed. The data desensitization scheme (namely the data identification rule and the target desensitization strategy) can be customized based on the user requirement and the data characteristics of the data to be desensitized, so that the data desensitization scheme is formulated by human-computer interaction, the full-automatic data desensitization based on the formulated data desensitization scheme is realized, and the data desensitization efficiency, flexibility and accuracy are improved.
Optionally, the data identification rule includes at least one of identifiable metadata fields, iteration identification times and sensitive data identification modes corresponding to each iteration identification operation, and the sensitive data identification modes include at least one of list identification, regular expression identification, machine learning model identification, combined field name identification and field content identification.
Optionally, the iteration identification module 320 is configured to:
identifying the identifiable metadata fields in the data set to be desensitized by the sensitive data identification mode corresponding to the first iterative identification operation to obtain a marked first sensitive data set and an unmarked first non-sensitive data set;
identifying the identifiable metadata fields in the first non-sensitive data set by the sensitive data identification mode corresponding to the second iteration identification operation to obtain a marked second sensitive data set and an unmarked second non-sensitive data set;
and repeating the above identification operation on the most recently obtained non-sensitive data set, using the sensitive data identification mode corresponding to the next iteration identification operation, to obtain a further marked sensitive data set and a further unmarked non-sensitive data set, and terminating the iteration identification when the number of executions reaches the iteration identification times.
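The iterative identification described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the implementation of the embodiment: metadata entries are assumed to be dictionaries of strings, each iteration is simplified to a single identifier function, and the names identify_iteratively, by_list, and by_regex as well as the sample data are hypothetical.

```python
import re
from typing import Callable, Dict, List, Tuple

Record = Dict[str, str]             # one metadata entry (assumed shape)
Identifier = Callable[[str], bool]  # returns True if a field value looks sensitive


def identify_iteratively(
    records: List[Record],
    identifiable_fields: List[str],
    passes: List[Identifier],
) -> Tuple[List[List[Record]], List[Record]]:
    """Run one identification pass per identifier.

    Each pass only examines the records left unmarked by the previous pass.
    Returns the marked records of every pass and the final unmarked records.
    """
    marked_per_pass: List[List[Record]] = []
    remaining = list(records)
    for identifier in passes:
        marked: List[Record] = []
        unmarked: List[Record] = []
        for record in remaining:
            hit = any(identifier(record.get(name, "")) for name in identifiable_fields)
            (marked if hit else unmarked).append(record)
        marked_per_pass.append(marked)
        remaining = unmarked  # the next pass sees only the non-sensitive set
    return marked_per_pass, remaining


# Pass 1: list identification; pass 2: regular expression identification.
SENSITIVE_NAMES = {"id_card_no", "mobile"}
PATTERN = re.compile(r"phone|passw(or)?d|account", re.IGNORECASE)


def by_list(value: str) -> bool:
    return value.lower() in SENSITIVE_NAMES


def by_regex(value: str) -> bool:
    return PATTERN.search(value) is not None


metadata = [
    {"table": "customer", "column": "id_card_no"},
    {"table": "customer", "column": "home_phone"},
    {"table": "customer", "column": "created_at"},
]
marked_per_pass, non_sensitive = identify_iteratively(metadata, ["column"], [by_list, by_regex])

# The marked sets of all passes together form the target desensitization data set.
target = [record for marked in marked_per_pass for record in marked]
```

In this example the first pass marks id_card_no, the second pass marks home_phone from what was left over, and created_at remains unmarked. Narrowing the input after each pass means later, possibly more expensive, identification modes only examine data that earlier modes did not already flag.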
Optionally, the data desensitizing device further includes: a modifiable determining module, a rule modifying module, and a continuing iteration module; wherein,
the modifiable determining module is configured to determine a modifiable identification rule before the iterative identification is terminated, where the modifiable identification rule is the sensitive data identification mode corresponding to each iteration identification operation that has not yet been performed;
the rule modification module is used for responding to the modification operation of the modifiable identification rule and determining the modifiable identification rule after modification;
and the continuing iteration module is used for continuing to execute the iteration identification operation based on the modified modifiable identification rule.
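The interplay of these three modules can be illustrated with a short Python sketch. It is a hedged example rather than the embodiment's implementation: the class IterativeIdentification and its methods modifiable_passes, modify, and run_next are hypothetical names, each iteration is simplified to one identifier function applied to one metadata field, and the sample data is invented.

```python
from typing import Callable, Dict, List

Record = Dict[str, str]
Identifier = Callable[[str], bool]


class IterativeIdentification:
    """Runs identification passes one at a time so pending passes can be edited."""

    def __init__(self, records: List[Record], field_name: str,
                 passes: List[Identifier]) -> None:
        self._remaining = list(records)
        self._field_name = field_name
        self._passes = list(passes)
        self._next_pass = 0  # index of the first pass not yet executed
        self.marked: List[Record] = []

    def modifiable_passes(self) -> List[int]:
        """Indices of passes that have not been executed and may still be modified."""
        return list(range(self._next_pass, len(self._passes)))

    def modify(self, index: int, identifier: Identifier) -> None:
        """Replace the identification mode of a pending pass."""
        if index not in self.modifiable_passes():
            raise ValueError(f"pass {index} was already executed or does not exist")
        self._passes[index] = identifier

    def run_next(self) -> bool:
        """Execute the next pending pass; return False if no pass was left to execute."""
        if self._next_pass >= len(self._passes):
            return False
        identifier = self._passes[self._next_pass]
        unmarked: List[Record] = []
        for record in self._remaining:
            if identifier(record.get(self._field_name, "")):
                self.marked.append(record)
            else:
                unmarked.append(record)
        self._remaining = unmarked
        self._next_pass += 1
        return True


columns = [{"column": "mobile"}, {"column": "email_addr"}, {"column": "created_at"}]
job = IterativeIdentification(
    columns, "column",
    [lambda v: v == "mobile", lambda v: v == "never_matches"],
)
job.run_next()                         # first pass marks "mobile"
job.modify(1, lambda v: "email" in v)  # the user edits the pending second pass
while job.run_next():                  # continue with the modified rule
    pass
```

Only passes that have not yet run are exposed as modifiable, so results already produced by earlier passes are never invalidated by a later edit.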
Optionally, the iteration identification module 320 is configured to:
and taking a marked sensitive data set obtained through each iteration of the identification operation as the target desensitization data set.
Optionally, the target desensitization strategy comprises content deletion, preset character string replacement, character string simulation replacement or digital scaling.
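For illustration, the four strategies listed above might look as follows in Python. The embodiment does not specify their exact behavior, so everything here is an assumption: the function names are hypothetical, "character string simulation replacement" is interpreted as replacing each character with a random character of the same class so that the format is preserved, and "digital scaling" is interpreted as multiplying a numeric value by a factor.

```python
import random
import string


def delete_content(value: str) -> str:
    """Content deletion: the sensitive value is removed entirely."""
    return ""


def replace_with_preset(value: str, preset: str = "******") -> str:
    """Preset character string replacement: every value becomes the same mask."""
    return preset


def simulate_string(value: str, rng: random.Random) -> str:
    """Simulation replacement (assumed meaning): keep the format, randomize the content.

    ASCII digits and letters are replaced by random characters of the same
    class; all other characters (separators, non-ASCII text) are kept as they
    are, which a production implementation would need to handle separately.
    """
    result = []
    for ch in value:
        if ch in string.digits:
            result.append(rng.choice(string.digits))
        elif ch in string.ascii_uppercase:
            result.append(rng.choice(string.ascii_uppercase))
        elif ch in string.ascii_lowercase:
            result.append(rng.choice(string.ascii_lowercase))
        else:
            result.append(ch)
    return "".join(result)


def scale_number(value: float, factor: float) -> float:
    """Numeric scaling (assumed meaning): multiply the value by a fixed factor."""
    return value * factor


rng = random.Random(0)  # fixed seed only to make the example repeatable
masked_plate = simulate_string("AB-1234", rng)  # same shape, different characters
masked_balance = scale_number(200.0, 0.5)
```

A policy setting operation could then map each marked field to one of these functions, for example deleting free-text remarks while scaling monetary amounts so that aggregate statistics keep their relative proportions.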
Optionally, each iteration identification operation corresponds to at least one sensitive data identification mode, and the sensitive data identification modes corresponding to different iteration identification operations are the same or different.
The data desensitizing device provided by the embodiment of the invention can execute the data desensitizing method provided by any embodiment of the invention, and has the corresponding functional modules and beneficial effects of the executing method.
Example IV
Fig. 6 shows a schematic diagram of the structure of an electronic device 10 that may be used to implement an embodiment of the invention. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. Electronic devices may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smartphones, wearable devices (e.g., helmets, glasses, watches, etc.), and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the inventions described and/or claimed herein.
As shown in fig. 6, the electronic device 10 includes at least one processor 11, and a memory, such as a Read Only Memory (ROM) 12, a Random Access Memory (RAM) 13, etc., communicatively connected to the at least one processor 11, in which the memory stores a computer program executable by the at least one processor, and the processor 11 may perform various appropriate actions and processes according to the computer program stored in the Read Only Memory (ROM) 12 or the computer program loaded from the storage unit 18 into the Random Access Memory (RAM) 13. In the RAM 13, various programs and data required for the operation of the electronic device 10 may also be stored. The processor 11, the ROM 12 and the RAM 13 are connected to each other via a bus 14. An input/output (I/O) interface 15 is also connected to bus 14.
Various components in the electronic device 10 are connected to the I/O interface 15, including: an input unit 16 such as a keyboard, a mouse, etc.; an output unit 17 such as various types of displays, speakers, and the like; a storage unit 18 such as a magnetic disk, an optical disk, or the like; and a communication unit 19 such as a network card, modem, wireless communication transceiver, etc. The communication unit 19 allows the electronic device 10 to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunication networks.
The processor 11 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of processor 11 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various processors running machine learning model algorithms, Digital Signal Processors (DSPs), and any suitable processor, controller, microcontroller, etc. The processor 11 performs the various methods and processes described above, such as the data desensitization method.
In some embodiments, the data desensitization method may be implemented as a computer program tangibly embodied on a computer-readable storage medium, such as storage unit 18. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device 10 via the ROM 12 and/or the communication unit 19. One or more of the steps of the data desensitization method described above may be performed when the computer program is loaded into RAM 13 and executed by processor 11. Alternatively, in other embodiments, the processor 11 may be configured to perform the data desensitization method by any other suitable means (e.g., by means of firmware).
Various implementations of the systems and techniques described above may be realized in digital electronic circuitry, integrated circuit systems, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems On Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be a special purpose or general-purpose programmable processor, and which may receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
A computer program for carrying out methods of the present invention may be written in any combination of one or more programming languages. These computer programs may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the computer programs, when executed by the processor, cause the functions/acts specified in the flowchart and/or block diagram block or blocks to be implemented. The computer program may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of the present invention, a computer-readable storage medium may be a tangible medium that can contain, or store a computer program for use by or in connection with an instruction execution system, apparatus, or device. The computer readable storage medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. Alternatively, the computer readable storage medium may be a machine readable signal medium. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on an electronic device having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) through which a user can provide input to the electronic device. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: Local Area Networks (LANs), Wide Area Networks (WANs), blockchain networks, and the internet.
The computing system may include clients and servers. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, also called a cloud computing server or a cloud host, which is a host product in a cloud computing service system and overcomes the high management difficulty and weak service scalability of traditional physical hosts and VPS services.
It should be appreciated that steps may be reordered, added, or deleted using the various forms of flows shown above. For example, the steps described in the present invention may be performed in parallel, sequentially, or in a different order, so long as the desired results of the technical solution of the present invention are achieved, and no limitation is imposed herein.
The above embodiments do not limit the scope of the present invention. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present invention should be included in the scope of the present invention.

Claims (10)

1. A method of desensitizing data, comprising:
acquiring a data set to be desensitized, wherein the data set to be desensitized is a full metadata set;
determining a data identification rule in response to rule setting operation for the data set to be desensitized, and carrying out iterative identification on the data set to be desensitized based on the data identification rule to obtain a target desensitized data set in the data set to be desensitized;
and determining a target desensitization strategy in response to a strategy setting operation aiming at the target desensitization data set, and carrying out data desensitization on the target desensitization data set based on the target desensitization strategy to obtain the target desensitization data set with the desensitization completed.
2. The method of claim 1, wherein the data recognition rules include at least one of a recognizable metadata field, a number of iterative recognition operations, and a sensitive data recognition pattern corresponding to each iterative recognition operation, the sensitive data recognition pattern including at least one of a list recognition, a regular expression recognition, a machine learning model recognition, a combined field name recognition, and a field content recognition.
3. The method of claim 2, wherein iteratively identifying the set of data to be desensitized based on the data identification rule comprises:
identifying the identifiable metadata fields in the data set to be desensitized by the sensitive data identification mode corresponding to the first iterative identification operation to obtain a marked first sensitive data set and an unmarked first non-sensitive data set;
identifying the identifiable metadata fields in the first non-sensitive data set by the sensitive data identification mode corresponding to the second iteration identification operation to obtain a marked second sensitive data set and an unmarked second non-sensitive data set;
and returning to execute the operation of identifying the identifiable metadata fields in the first non-sensitive data set through a sensitive data identification mode corresponding to the second iteration identification operation to obtain a marked second sensitive data set and an unmarked second non-sensitive data set, and terminating the iteration identification when the return execution times reach the iteration identification times.
4. A method according to claim 3, further comprising, prior to terminating the iterative recognition:
determining a modifiable identification rule, wherein the modifiable identification rule is the sensitive data identification mode corresponding to each non-executed iterative identification operation;
determining a modified modifiable identification rule in response to a modification operation for the modifiable identification rule;
and continuing to execute the iterative recognition operation based on the modified modifiable recognition rule.
5. A method according to claim 3, wherein said deriving a target desensitisation dataset of said dataset to be desensitised comprises:
and taking a marked sensitive data set obtained through each iteration of the identification operation as the target desensitization data set.
6. The method of claim 1, wherein the target desensitization policy comprises content deletion, preset string replacement, string emulation replacement, or digital scaling.
7. The method of claim 2, wherein each iterative identification operation corresponds to at least one of the sensitive data identification patterns, and wherein the sensitive data identification patterns corresponding to different iterative identification operations are the same or different.
8. A data desensitizing apparatus, comprising:
the data acquisition module is used for acquiring a data set to be desensitized, wherein the data set to be desensitized is a full metadata set;
the iteration identification module is used for responding to the rule setting operation aiming at the data set to be desensitized, determining a data identification rule, and carrying out iteration identification on the data set to be desensitized based on the data identification rule to obtain a target desensitized data set in the data set to be desensitized;
and the data desensitization module is used for responding to the strategy setting operation aiming at the target desensitization data set, determining a target desensitization strategy, and carrying out data desensitization on the target desensitization data set based on the target desensitization strategy to obtain the target desensitization data set with the desensitization completed.
9. An electronic device, the electronic device comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores a computer program executable by the at least one processor to enable the at least one processor to perform the data desensitization method according to any one of claims 1-7.
10. A computer readable storage medium storing computer instructions for causing a processor to perform the data desensitization method according to any one of claims 1-7.
CN202311662330.2A 2023-12-06 2023-12-06 Data desensitizing method, device, electronic equipment and storage medium Pending CN117668908A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311662330.2A CN117668908A (en) 2023-12-06 2023-12-06 Data desensitizing method, device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311662330.2A CN117668908A (en) 2023-12-06 2023-12-06 Data desensitizing method, device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN117668908A true CN117668908A (en) 2024-03-08

Family

ID=90084085

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311662330.2A Pending CN117668908A (en) 2023-12-06 2023-12-06 Data desensitizing method, device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN117668908A (en)

Similar Documents

Publication Publication Date Title
CN114218931A (en) Information extraction method and device, electronic equipment and readable storage medium
CN116009847A (en) Code generation method, device, electronic equipment and storage medium
CN114564149B (en) Data storage method, device, equipment and storage medium
CN116126719A (en) Interface testing method and device, electronic equipment and storage medium
CN116303013A (en) Source code analysis method, device, electronic equipment and storage medium
CN116185389A (en) Code generation method and device, electronic equipment and medium
CN112887426B (en) Information stream pushing method and device, electronic equipment and storage medium
CN115169316A (en) Data processing template generation method and device, electronic equipment and storage medium
CN113590447B (en) Buried point processing method and device
CN117668908A (en) Data desensitizing method, device, electronic equipment and storage medium
CN115328736A (en) Probe deployment method, device, equipment and storage medium
CN113343064B (en) Data processing method, apparatus, device, storage medium, and computer program product
CN112560462B (en) Event extraction service generation method, device, server and medium
CN113434508B (en) Method and apparatus for storing information
CN117573561B (en) Automatic test system, method, electronic equipment and storage medium
CN117271840B (en) Data query method and device of graph database and electronic equipment
CN114359904B (en) Image recognition method, image recognition device, electronic equipment and storage medium
CN115983222A (en) EasyExcel-based file data reading method, device, equipment and medium
CN117742545A (en) Page turning information processing method, device, equipment and medium
CN118093965A (en) Information processing method, device, equipment and storage medium
CN117520092A (en) Log data determining method and device, electronic equipment and medium
CN116991737A (en) Software testing method, system, electronic equipment and storage medium
CN116303071A (en) Interface testing method and device, electronic equipment and storage medium
CN117591177A (en) Instruction storage method and device, electronic equipment and storage medium
CN115481090A (en) File processing method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination