CN112231759A - Log desensitization method, device, equipment and storage medium - Google Patents


Info

Publication number
CN112231759A
Authority
CN
China
Prior art keywords
data
desensitized
data table
desensitization
log
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011311978.1A
Other languages
Chinese (zh)
Inventor
廖有志
江旻
杨杨
吴磊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
WeBank Co Ltd
Original Assignee
WeBank Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by WeBank Co Ltd
Priority to CN202011311978.1A
Publication of CN112231759A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6245 Protecting personal data, e.g. for financial or medical purposes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/602 Providing cryptographic facilities or services

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Bioethics (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Computer Security & Cryptography (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Hardware Design (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The method obtains a data table to be desensitized from a log to be desensitized, and then determines, according to performance parameters of a system, the minimum number K of data items contained in each type of data after the data table is desensitized, where K is greater than 1 and any data item in a type of data cannot be distinguished from at least K-1 other data items of that type. The data to be desensitized in the data table is then desensitized according to K. That is, the data to be desensitized is turned into one or more types of data, each type containing at least K data items, and any data item in a type cannot be distinguished from at least K-1 other data items of that type. This addresses the problem that desensitized data remains open to attack, for example a chain attack, which leaves the log incompletely desensitized and exposes sensitive information to leakage.

Description

Log desensitization method, device, equipment and storage medium
Technical Field
The application relates to the technical field of financial technology, in particular to a log desensitization method, a device, equipment and a storage medium.
Background
With the development of computer technology, more and more technologies are being applied in the financial field, and the traditional financial industry is gradually shifting toward financial technology (Fintech). Information technology is no exception. As computer networks continue to expand, more and more organizations and hackers collect and use personal information, and illegal collection, abuse, and leakage of personal information occur, so personal information security faces serious threats. The security and real-time requirements of the financial industry place even higher demands on personal information security. During operation, the transaction system of an existing financial system receives a large number of customer-initiated requests carrying sensitive information, and these requests are typically output to a log. If the log file is leaked, the customer's sensitive information is at risk of being leaked as well. The log file therefore needs to undergo data desensitization.
In the related art, data desensitization of a log file is generally achieved by matching sensitive data against keywords (such as identity card number, identification number (idNo), mobile phone number, and city district) and then replacing the digits of the matched sensitive data.
However, data desensitized in this way can still be attacked in many ways, for example by a chain attack, so the log is not fully desensitized and sensitive information is at risk of being leaked.
Disclosure of Invention
In order to solve the problems in the prior art, the application provides a log desensitization method, a device, equipment and a storage medium.
In a first aspect, an embodiment of the present application provides a log desensitization method, where the method includes:
acquiring a log to be desensitized;
obtaining a data table to be desensitized according to the log to be desensitized;
determining, according to performance parameters of a system, the minimum number K of data items contained in each type of data after the data table to be desensitized is desensitized, wherein K is greater than 1, and any data item in each type of data cannot be distinguished from at least K-1 other data items of that type;
and desensitizing the data to be desensitized in the data table to be desensitized according to the K.
In one possible implementation manner, desensitizing the data to be desensitized in the data table to be desensitized according to the K includes:
determining the size of a maximum data table processed by the system according to the performance parameter;
and if the size of the data table to be desensitized is greater than or equal to a preset lower limit of the data table size and less than or equal to the size of the maximum data table processed by the system, desensitizing the data to be desensitized in the data table to be desensitized according to the K, wherein the size of the maximum data table processed by the system is greater than the preset lower limit of the data table size.
In one possible implementation manner, the desensitizing the data to be desensitized in the data table to be desensitized according to the K includes:
and according to the K, generalizing or suppressing the data to be desensitized in the data table to be desensitized to obtain one or more types of data, wherein each type of data comprises at least K data items.
In a possible implementation manner, after the determining, according to the performance parameter, the size of the maximum data table processed by the system, the method further includes:
and if the size of the data table to be desensitized is larger than the size of the maximum data table processed by the system, deleting the data in the data table to be desensitized, so that the size of the data table to be desensitized after data deletion is larger than or equal to the lower limit of the size of the preset data table and smaller than or equal to the size of the maximum data table processed by the system.
In a possible implementation manner, after obtaining the to-be-desensitized data table according to the to-be-desensitized log, the method further includes:
if the size of the data table to be desensitized is smaller than the lower limit of the size of the preset data table, determining desensitized data corresponding to the data to be desensitized in the data table to be desensitized according to the corresponding relationship between pre-stored data to be desensitized and desensitized data;
and replacing the data to be desensitized in the data table to be desensitized with the desensitized data corresponding to the data to be desensitized in the data table to be desensitized.
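The three size-dependent branches described above (table below the preset lower limit, table within range, table above the system maximum) can be sketched as follows. This is a minimal illustration, not the claimed implementation: it assumes "size" means a row count, and the names `choose_strategy`, `trim_table`, and `lookup_replace`, the choice to drop trailing rows, and the `***` fallback are all hypothetical.

```python
from typing import Dict, List

Row = Dict[str, str]


def choose_strategy(table_size: int, lower_limit: int, max_size: int) -> str:
    """Pick a desensitization path from the table size (size = row count, an assumption)."""
    if max_size <= lower_limit:
        raise ValueError("max_size must be greater than lower_limit")
    if table_size < lower_limit:
        return "lookup"        # too small: use the pre-stored replacement mapping
    if table_size > max_size:
        return "trim_then_k"   # too large: delete data first, then desensitize by K
    return "k_desensitize"     # within range: desensitize according to K


def trim_table(rows: List[Row], max_size: int) -> List[Row]:
    # Assumption: trailing rows are dropped; the source does not say which data is deleted.
    return rows[:max_size]


def lookup_replace(rows: List[Row], field: str, mapping: Dict[str, str]) -> List[Row]:
    # Replace each value with its pre-stored desensitized counterpart.
    # Assumption: values missing from the mapping are fully masked rather than left in clear.
    return [{**row, field: mapping.get(row[field], "***")} for row in rows]
```

For example, with a lower limit of 3 rows and a system maximum of 100 rows, a 2-row table would go through `lookup_replace`, while a 101-row table would first be passed through `trim_table`.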
In a possible implementation manner, before desensitizing the data to be desensitized in the data table to be desensitized according to the K, the method further includes:
acquiring the idle core number of a Central Processing Unit (CPU) of the system;
dividing the data in the data table to be desensitized according to the number of idle cores of the CPU;
according to the K, desensitizing the data to be desensitized in the data table to be desensitized comprises the following steps:
desensitizing the data to be desensitized in the partitioned data table to be desensitized according to the idle core of the CPU and the K.
In one possible implementation manner, the number of idle cores of the CPU is greater than or equal to a preset threshold;
the dividing the data in the data table to be desensitized according to the number of idle cores of the CPU comprises the following steps:
and dividing the data in the data table to be desensitized according to the number of idle cores of the CPU to obtain n groups of data, wherein n is an integer greater than zero and is less than the number of idle cores of the CPU.
In a possible implementation, one of the n sets of data corresponds to an idle core of the CPU;
desensitizing the data to be desensitized in the partitioned data table to be desensitized according to the idle core of the CPU and the K, comprising:
desensitizing data to be desensitized in a group of data corresponding to each idle core on each idle core of the CPU according to the K.
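The partitioning step can be sketched as below. The number of idle cores is taken as an input because the source does not say how it is measured; `split_into_groups`, `desensitize_in_parallel`, and the default threshold of 2 are hypothetical. The sketch only models "one worker per group" with a thread pool and does not pin work to specific CPU cores.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def split_into_groups(rows: Sequence[T], idle_cores: int,
                      min_idle_cores: int = 2) -> List[List[T]]:
    """Split rows into n contiguous groups with 0 < n < idle_cores."""
    if idle_cores < max(min_idle_cores, 2):
        return [list(rows)]                 # below the threshold: keep a single group
    n = idle_cores - 1                      # assumption: use one fewer group than idle cores
    n = max(1, min(n, len(rows)))           # never create more groups than rows
    size, extra = divmod(len(rows), n)
    groups, start = [], 0
    for i in range(n):
        end = start + size + (1 if i < extra else 0)  # spread the remainder over the first groups
        groups.append(list(rows[start:end]))
        start = end
    return groups


def desensitize_in_parallel(groups: List[List[T]],
                            worker: Callable[[List[T]], R]) -> List[R]:
    # One worker per group; results come back in group order.
    with ThreadPoolExecutor(max_workers=len(groups)) as pool:
        return list(pool.map(worker, groups))
```

Note that if each group is desensitized independently according to K, each group presumably needs at least K rows for the result to be meaningful; the sketch does not enforce this.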
In one possible implementation, the performance parameters include free memory, CPU utilization, and data desensitization average elapsed time of the system;
determining the data quantity K at least contained in each type of data after desensitization of the data table to be desensitized according to the performance parameters of the system, wherein the determining comprises the following steps:
determining a first performance value of the system according to the free memory of the system, determining a second performance value of the system according to the CPU utilization rate, and determining a third performance value of the system according to the average time consumption of data desensitization;
and determining the data quantity K at least contained in each type of data after desensitization of the data table to be desensitized according to the first performance value, the second performance value and the third performance value.
In a possible implementation manner, determining, according to the first performance value, the second performance value, and the third performance value, a data number K at least included in each type of data desensitized to the data table to be desensitized includes:
determining a total value of performance of the system based on the first, second, and third performance values;
and determining the data quantity K at least contained in each type of data after desensitization of the data table to be desensitized according to the total performance value, the upper performance limit value of the system, the upper preset data quantity limit value and the lower preset data quantity limit value.
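The passage above names the inputs to K (three performance values, a total, a performance upper limit, and preset upper and lower limits on the data quantity) but not the formula that combines them. The sketch below is therefore purely an assumption: it scores each parameter in [0, 1], sums the scores, and interpolates linearly between the lower and upper limits, so that a system with more headroom gets a larger K. The function names, the scoring rules, and the direction of the mapping are all hypothetical.

```python
from typing import Tuple


def clamp01(x: float) -> float:
    return max(0.0, min(1.0, x))


def performance_values(free_mem_ratio: float, cpu_utilization: float,
                       avg_desens_ms: float, max_desens_ms: float) -> Tuple[float, float, float]:
    """Score each performance parameter in [0, 1]; higher means more headroom (assumed scoring)."""
    first = clamp01(free_mem_ratio)                        # from free memory
    second = clamp01(1.0 - cpu_utilization)                # from CPU utilization
    third = clamp01(1.0 - avg_desens_ms / max_desens_ms)   # from average desensitization time
    return first, second, third


def choose_k(values: Tuple[float, float, float], perf_upper: float,
             k_lower: int, k_upper: int) -> int:
    """Map the total performance value onto [k_lower, k_upper] (assumed linear mapping)."""
    if k_upper < k_lower or perf_upper <= 0:
        raise ValueError("invalid limits")
    ratio = clamp01(sum(values) / perf_upper)
    k = k_lower + int(ratio * (k_upper - k_lower))
    return max(2, k)   # the source requires K > 1
```

With all three scores at 0.5, a performance upper limit of 3.0, and limits of 2 and 10, this sketch yields K = 6.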
In a possible implementation manner, the determining a size of a maximum data table processed by the system according to the performance parameter includes:
and determining the size of the maximum data table processed by the system according to the minimum number K of data items contained in each type of data after the data table to be desensitized is desensitized, a preset upper limit of the data quantity, and a preset upper limit of the data table size.
In one possible implementation, the obtaining the log to be desensitized includes:
obtaining at least one log;
judging whether the at least one log contains sensitive data;
and if the at least one log contains the sensitive data, determining that the at least one log is the log to be desensitized.
In a possible implementation manner, after the desensitizing the data to be desensitized in the data table to be desensitized according to the K, the method further includes:
and outputting the desensitized data table to a log file of the system, and re-executing the step of acquiring a log to be desensitized.
In a second aspect, an embodiment of the present application provides a log desensitization apparatus, including:
the log obtaining module is used for obtaining a log to be desensitized;
the data table obtaining module is used for obtaining a data table to be desensitized according to the log to be desensitized;
the data quantity determining module is used for determining, according to the performance parameters of the system, the minimum number K of data items contained in each type of data after the data table to be desensitized is desensitized, wherein K is greater than 1, and any data item in each type of data cannot be distinguished from at least K-1 other data items of that type;
and the data desensitization module is used for desensitizing the data to be desensitized in the data table to be desensitized according to the K.
In one possible implementation, the data desensitization module is specifically configured to:
determining the size of a maximum data table processed by the system according to the performance parameter;
and if the size of the data table to be desensitized is greater than or equal to a preset lower limit of the data table size and less than or equal to the size of the maximum data table processed by the system, desensitizing the data to be desensitized in the data table to be desensitized according to the K, wherein the size of the maximum data table processed by the system is greater than the preset lower limit of the data table size.
In one possible implementation, the data desensitization module is specifically configured to:
and according to the K, generalizing or suppressing the data to be desensitized in the data table to be desensitized to obtain one or more types of data, wherein each type of data comprises at least K data items.
In one possible implementation, the data desensitization module is further configured to:
and if the size of the data table to be desensitized is larger than the size of the maximum data table processed by the system, deleting the data in the data table to be desensitized, so that the size of the data table to be desensitized after data deletion is larger than or equal to the lower limit of the size of the preset data table and smaller than or equal to the size of the maximum data table processed by the system.
In a possible implementation manner, after the data table obtaining module obtains the data table to be desensitized according to the log to be desensitized, the data desensitization module is further configured to:
if the size of the data table to be desensitized is smaller than the lower limit of the size of the preset data table, determining desensitized data corresponding to the data to be desensitized in the data table to be desensitized according to the corresponding relationship between pre-stored data to be desensitized and desensitized data;
and replacing the data to be desensitized in the data table to be desensitized with the desensitized data corresponding to the data to be desensitized in the data table to be desensitized.
In one possible implementation, the data desensitization module is further configured to:
acquiring the number of idle cores of a CPU of the system;
dividing the data in the data table to be desensitized according to the number of idle cores of the CPU;
the data desensitization module is specifically configured to:
desensitizing the data to be desensitized in the partitioned data table to be desensitized according to the idle core of the CPU and the K.
In one possible implementation, the number of idle cores of the CPU is greater than or equal to a preset threshold.
The data desensitization module is specifically configured to:
and dividing the data in the data table to be desensitized according to the number of idle cores of the CPU to obtain n groups of data, wherein n is an integer greater than zero and is less than the number of idle cores of the CPU.
In one possible implementation, one of the n sets of data corresponds to an idle core of the CPU.
The data desensitization module is specifically configured to:
desensitizing data to be desensitized in a group of data corresponding to each idle core on each idle core of the CPU according to the K.
In one possible implementation, the performance parameters include free memory, CPU utilization, and data desensitization average elapsed time of the system.
The data quantity determination module is specifically configured to:
determining a first performance value of the system according to the free memory of the system, determining a second performance value of the system according to the CPU utilization rate, and determining a third performance value of the system according to the average time consumption of data desensitization;
and determining the data quantity K at least contained in each type of data after desensitization of the data table to be desensitized according to the first performance value, the second performance value and the third performance value.
In a possible implementation manner, the data quantity determining module is specifically configured to:
determining a total value of performance of the system based on the first, second, and third performance values;
and determining the data quantity K at least contained in each type of data after desensitization of the data table to be desensitized according to the total performance value, the upper performance limit value of the system, the upper preset data quantity limit value and the lower preset data quantity limit value.
In one possible implementation, the data desensitization module is specifically configured to:
and determining the size of the maximum data table processed by the system according to the minimum number K of data items contained in each type of data after the data table to be desensitized is desensitized, a preset upper limit of the data quantity, and a preset upper limit of the data table size.
In a possible implementation manner, the log obtaining module is specifically configured to:
obtaining at least one log;
judging whether the at least one log contains sensitive data;
and if the at least one log contains the sensitive data, determining that the at least one log is the log to be desensitized.
In one possible implementation manner, after desensitizing the data to be desensitized in the data table to be desensitized according to the K, the data desensitizing module is further configured to:
outputting the desensitized data table to a log file of the system;
and the log acquisition module re-executes the step of acquiring the log to be desensitized.
In a third aspect, an embodiment of the present application provides log desensitization equipment, including:
a processor;
a memory; and
a computer program;
wherein the computer program is stored in the memory and configured to be executed by the processor, the computer program comprising instructions for performing the method of the first aspect.
In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium, where a computer program is stored, and the computer program causes a server to execute the method according to the first aspect.
According to the log desensitization method, device, equipment and storage medium provided by the embodiments of the present application, a log to be desensitized is acquired and a data table to be desensitized is obtained from it. Then, according to performance parameters of a system, the minimum number K of data items contained in each type of data after the data table is desensitized is determined, where K is greater than 1 and any data item in a type of data cannot be distinguished from at least K-1 other data items of that type. The data to be desensitized in the data table is then desensitized according to K. That is, the data to be desensitized is turned into one or more types of data, each type containing at least K data items, and any data item in a type cannot be distinguished from at least K-1 other data items of that type. This addresses the problem that desensitized data remains open to attack, for example a chain attack, which leaves the log incompletely desensitized and exposes sensitive information to leakage.
Drawings
In order to illustrate the embodiments of the present application or the technical solutions in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the present application, and those skilled in the art can obtain other drawings from them without inventive effort.
Fig. 1 is a schematic diagram of an architecture of a log desensitization system according to an embodiment of the present application;
Fig. 2 is a schematic flowchart of a log desensitization method according to an embodiment of the present application;
Fig. 3 is a schematic diagram of obtaining a data table to be desensitized according to an embodiment of the present application;
Fig. 4 is a schematic flowchart of another log desensitization method according to an embodiment of the present application;
Fig. 5 is a schematic diagram of data desensitization according to an embodiment of the present application;
Fig. 6 is a schematic structural diagram of a log desensitization apparatus according to an embodiment of the present application;
Fig. 7 is a diagram of a possible basic hardware architecture of a log desensitization apparatus according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The terms "first," "second," "third," and "fourth," if any, in the description and claims of this application and the above-described figures are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the application described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
With the development of computer technology, more and more technologies are being applied in the financial field, and the traditional financial industry is gradually shifting toward financial technology (Fintech). Information technology is no exception. As computer networks continue to expand, more and more organizations and hackers collect and use personal information, and illegal collection, abuse, and leakage of personal information occur, so personal information security faces serious threats. The security and real-time requirements of the financial industry place even higher demands on personal information security. During operation, the transaction system of an existing financial system receives a large number of customer-initiated requests carrying sensitive information, and these requests are typically output to a log. If the log file is leaked, the customer's sensitive information is at risk of being leaked as well. The log file therefore needs to undergo data desensitization.
In the related art, data desensitization of a log file is generally achieved by matching sensitive data against keywords (such as identity card number, idNo, mobile phone number, and city district) and then applying x replacement, invalidation, or similar processing to the matched sensitive data. Here, data desensitization (data masking), also called data bleaching, data privacy removal, or data deformation, refers to transforming sensitive information according to desensitization rules so that sensitive private data is reliably protected.
Illustratively, as shown in Table 1, a raw user data table contains a code as an identifier, that is, the code can directly identify an individual.
To protect user privacy and prevent the user's bank card information from being leaked when the information is published, the data desensitization method described above may be applied. For example, the codes are all replaced by x, giving the desensitized table shown in Table 2, which is intended to protect privacy.
Table 1: Raw user data table
No.  Code   Age  Bank card used
1    47677  29   Construction bank
2    47602  22   Construction bank
3    47678  27   Construction bank
4    47905  43   Agricultural bank
5    47909  52   Construction bank
6    47906  47   Traffic bank
7    47605  30   Construction bank
8    47673  36   Traffic bank
9    47607  32   Traffic bank
Table 2: Data table after existing desensitization
[Table 2 is provided as an image in the original publication and is not reproduced here.]
However, data desensitized in this way can still be attacked in many ways, for example by a chain attack. Here, a chain attack (also known as a linking attack) means that an attacker links published data with external data obtained through other channels in order to infer private data, which causes privacy disclosure; in effect, the attack expands the dimensions of the personal information available to the attacker. Illustratively, suppose the attacker also holds a table relating the ages, addresses, and numbers of a number of users, as shown in Table 3. The attacker can then deduce from the age which numbered user holds which bank card, for example the sensitive information that the user numbered 47677 holds a Construction bank card. This is a chain attack. As a result, the log is not fully desensitized and sensitive information is at risk of being leaked.
Table 3: Correspondence between the age, address, and number of multiple users
No.  Age  Address         Number
1    29   A1 cell         47677
2    22   A2 cell         47602
3    27   A3 cell         47678
4    43   B city B1 cell  47905
5    52   B city B2 cell  47909
6    47   B city B3 cell  47906
7    30   C city C1 cell  47605
8    36   C city C2 cell  47673
9    32   C city C3 cell  47607
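The chain attack described above amounts to a join on a shared attribute. The sketch below is an illustration under stated assumptions: since Table 2 is only available as an image, the published table is assumed to hide the code while keeping age and bank card in clear, and the function name `link_on_age` is hypothetical. The external data is taken from Table 3.

```python
# Assumed published table: code masked, age and bank card still in clear.
PUBLISHED = [
    (29, "Construction bank"), (22, "Construction bank"), (27, "Construction bank"),
    (43, "Agricultural bank"), (52, "Construction bank"), (47, "Traffic bank"),
    (30, "Construction bank"), (36, "Traffic bank"), (32, "Traffic bank"),
]

# External data held by the attacker (age -> number), taken from Table 3.
EXTERNAL = [
    (29, "47677"), (22, "47602"), (27, "47678"),
    (43, "47905"), (52, "47909"), (47, "47906"),
    (30, "47605"), (36, "47673"), (32, "47607"),
]


def link_on_age(published, external):
    """Join the two tables on age; a unique match re-identifies the user."""
    banks_by_age = {}
    for age, bank in published:
        banks_by_age.setdefault(age, []).append(bank)
    inferred = {}
    for age, number in external:
        candidates = banks_by_age.get(age, [])
        if len(candidates) == 1:          # exactly one published row has this age
            inferred[number] = candidates[0]
    return inferred


inferred = link_on_age(PUBLISHED, EXTERNAL)
```

Because every age in the published table is unique, each of the nine users is re-identified, including the user numbered 47677 mentioned in the text.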
Therefore, an embodiment of the present application provides a log desensitization method in which the data to be desensitized in the data table to be desensitized is desensitized to form one or more types of data, where each type of data includes at least K data items and K is greater than 1, so that any data item in a type of data cannot be distinguished from at least K-1 other data items of that type. This addresses the problem that desensitized data remains open to attack, for example a chain attack, which leaves the log incompletely desensitized and exposes sensitive information to leakage.
Optionally, Fig. 1 is a schematic diagram of an architecture of a log desensitization system according to an embodiment of the present application. Fig. 1 takes a transaction system of a financial system as an example. The architecture comprises a receiving device 11 and a log desensitization device 12. The receiving device 11 and the log desensitization device 12 may be deployed in the financial system or independently of it, as determined by the practical situation; the embodiment of the present application does not specifically limit this.
It is to be understood that the illustrated structure of the embodiment of the present application does not constitute a specific limitation to the log desensitization architecture. In other possible embodiments of the present application, the foregoing architecture may include more or less components than those shown in the drawings, or combine some components, or split some components, or arrange different components, which may be determined according to practical application scenarios, and is not limited herein. The components shown in fig. 1 may be implemented in hardware, software, or a combination of software and hardware.
In a specific implementation, a transaction system of a financial system receives a large number of customer-initiated requests with sensitive information during operation, and the transaction system outputs a log based on the sensitive information.
The receiving device 11 may be an input/output interface or a communication interface and may be used to receive the log output by the transaction system.
The log desensitization device 12 may first obtain a log to be desensitized from the logs received by the receiving device 11, then obtain a data table to be desensitized from that log, and desensitize the data to be desensitized in the data table to form one or more types of data, where each type of data includes at least K data items and K is greater than 1, so that any data item in a type of data cannot be distinguished from at least K-1 other data items of that type. Illustratively, K is 3 and the log desensitization device 12 desensitizes the zip codes and ages in Table 1 above; the desensitized table is shown in Table 4.
Table 4: Data table after desensitization according to the present application
No.  Zip code  Age   Bank card used
1    476**     2*    Construction bank
2    476**     2*    Construction bank
3    476**     2*    Construction bank
4    4790*     >=40  Agricultural bank
5    4790*     >=40  Construction bank
6    4790*     >=40  Traffic bank
7    476**     3*    Construction bank
8    476**     3*    Traffic bank
9    476**     3*    Traffic bank
Thus, even if an attacker knows that a particular user has zip code 47906 and age 47, the attacker cannot determine which bank card that user holds. That is, the zip code and age in Table 1 above are desensitized to form multiple types of data, each containing at least 3 data items, so that any data item in a type cannot be distinguished from at least 2 other data items of that type. This addresses the problem that desensitized data remains open to attack, for example a chain attack, which leaves the log incompletely desensitized and exposes sensitive information to leakage.
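The step from Table 1 to Table 4 can be reproduced with a small generalization routine and a check that every (zip code, age) group holds at least K rows. This is a sketch only: the generalization rules are hard-coded to mirror Table 4 rather than derived automatically, and the names `generalize_code`, `generalize_age`, `generalize`, and `satisfies_k` are hypothetical.

```python
from collections import Counter

# Rows of Table 1: (code, age, bank card used).
TABLE_1 = [
    ("47677", 29, "Construction bank"), ("47602", 22, "Construction bank"),
    ("47678", 27, "Construction bank"), ("47905", 43, "Agricultural bank"),
    ("47909", 52, "Construction bank"), ("47906", 47, "Traffic bank"),
    ("47605", 30, "Construction bank"), ("47673", 36, "Traffic bank"),
    ("47607", 32, "Traffic bank"),
]


def generalize_code(code: str) -> str:
    # Illustrative rule mirroring Table 4: keep four digits for 4790x, three otherwise.
    return code[:4] + "*" if code.startswith("4790") else code[:3] + "**"


def generalize_age(age: int) -> str:
    # Illustrative rule mirroring Table 4: one bucket for 40 and over, decades below that.
    return ">=40" if age >= 40 else f"{age // 10}*"


def generalize(rows):
    return [(generalize_code(code), generalize_age(age), bank) for code, age, bank in rows]


def satisfies_k(rows, k: int) -> bool:
    """True if every (code, age) combination appears in at least k rows."""
    counts = Counter((code, age) for code, age, _ in rows)
    return all(n >= k for n in counts.values())


TABLE_4 = generalize(TABLE_1)
```

Here `satisfies_k(TABLE_4, 3)` holds, while `satisfies_k(TABLE_1, 2)` does not, since every raw (code, age) pair is unique. A user known to have zip code 47906 and age 47 falls into the `("4790*", ">=40")` group, which contains three different bank cards.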
After desensitization is completed, the log desensitization device 12 can output the desensitized table to a log file of the system, and can re-execute the above steps, so that data security of the financial system can be continuously guaranteed.
In addition, the system architecture and the service scenario described in the embodiment of the present application are for more clearly illustrating the technical solution of the embodiment of the present application, and do not constitute a limitation to the technical solution provided in the embodiment of the present application, and it can be known by a person skilled in the art that along with the evolution of the system architecture and the appearance of a new service scenario, the technical solution provided in the embodiment of the present application is also applicable to similar technical problems.
The technical solutions of the present application are described below with several embodiments as examples, and the same or similar concepts or processes may not be described in detail in some embodiments.
Fig. 2 is a schematic flow diagram of a log desensitization method according to an embodiment of the present application. The method may be applied to log desensitization processing and may be executed by any apparatus that executes the log desensitization method, and the apparatus may be implemented by software and/or hardware. As shown in Fig. 2, based on the system architecture shown in Fig. 1, the log desensitization method provided in the embodiment of the present application includes the following steps:
S201: acquiring a log to be desensitized.
Here, in the embodiment of the present application, the above-described log desensitization apparatus 12 is taken as an example of an execution subject. The log desensitization device 12 may obtain at least one log and determine whether the at least one log contains sensitive data. If the at least one log contains sensitive data, the log desensitization device 12 determines that the at least one log is a log to be desensitized.
The sensitive data refers to data which may bring serious harm to the society or individuals after leakage, and may include personal privacy data, such as names, identification numbers, addresses, telephones, bank accounts, mailboxes, passwords, medical information, education backgrounds and the like; data that is not suitable for publishing by a business or social organization may also be included, such as business operations, business network configurations, IP address lists, etc.
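Illustratively, the judgment in S201 can be sketched as a pattern check over each log line. The following Python sketch is only an assumption about how such a check might look: the field labels (`zip code:`, `phone:`, `account:`), the digit lengths, and the function names are hypothetical and are not specified by the application.

```python
import re

# Hypothetical patterns; a real deployment would define its own rules.
SENSITIVE_PATTERNS = {
    "zip_code": re.compile(r"zip code:\s*\d{5}\b"),
    "phone": re.compile(r"phone:\s*\d{11}\b"),
    "bank_account": re.compile(r"account:\s*\d{16,19}\b"),
}


def contains_sensitive_data(log_line):
    """Return True if any sensitive-data pattern matches the log line."""
    return any(p.search(log_line) for p in SENSITIVE_PATTERNS.values())


def select_logs_to_desensitize(logs):
    """Keep only the logs that contain sensitive data (the logs to be desensitized)."""
    return [line for line in logs if contains_sensitive_data(line)]
```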
S202: obtaining a data table to be desensitized according to the log to be desensitized.
In the embodiment of the present application, the log desensitization apparatus 12 may collect logs to be desensitized, and form them into a data table to be desensitized. Illustratively, taking the transaction system of the financial system in fig. 1 as an example, the log desensitization device 12 may collect logs to be desensitized output by the transaction system through a log bucket, and form a data table to be desensitized.
Illustratively, as shown in fig. 3, the transaction system outputs the log to be desensitized to a log bucket, and the log desensitization apparatus 12 forms a data table to be desensitized by using the log to be desensitized in the log bucket. Therein, the data table to be desensitized may comprise a plurality of information sets, e.g. information set 1, information set 2, … …, information set K. Each information group may include a plurality of logs to be desensitized, and here, it is exemplified that each information group includes three logs to be desensitized. The to-be-desensitized logs output by the transaction system into the log bucket may include a to-be-desensitized log 1, a to-be-desensitized log 2, a to-be-desensitized log 3, … …, and a to-be-desensitized log 3K + 3.
S203: determining, according to the performance parameters of the system, the data quantity K at least contained in each type of data after desensitization of the data table to be desensitized, wherein K is greater than 1, and any data in each type of data cannot be distinguished from at least K-1 other data in that type.
Here, the above system may be understood as the system in which the log desensitization device 12 is located. Before desensitizing the data to be desensitized in the data table to be desensitized, the log desensitization device 12 takes into account the performance of the system in which it is located; that is, it seeks a better balance between performance and security in order to reduce the impact on system services.
The performance parameters may include free memory, CPU utilization, and average time for data desensitization of the system. The log desensitization apparatus 12 may determine a first performance value of the system according to the free memory of the system, determine a second performance value of the system according to the CPU utilization, determine a third performance value of the system according to the average time consumption for desensitization of the data, and further determine the data number K at least included in each type of data after desensitization of the data table to be desensitized according to the first performance value, the second performance value, and the third performance value.
Here, the determining, by the log desensitization device 12, of the first performance value of the system according to the free memory of the system may include: determining a first performance value corresponding to the free memory of the system according to a pre-stored correspondence between free memory and performance values. The pre-stored correspondence between free memory and performance values can be set according to the actual situation, for example: free memory >= 0.5 times the total memory, the performance value is 5 points; 0.5 times the total memory > free memory >= 0.4 times the total memory, the performance value is 4 points; 0.4 times the total memory > free memory >= 0.3 times the total memory, the performance value is 3 points; 0.3 times the total memory > free memory >= 0.2 times the total memory, the performance value is 2 points; 0.2 times the total memory > free memory >= 0.1 times the total memory, the performance value is 1 point; 0.1 times the total memory > free memory >= 0, the performance value is 0 points.
Likewise, the determining, by the log desensitization device 12, of the second performance value of the system according to the CPU utilization may include: determining a second performance value corresponding to the CPU utilization of the system according to a pre-stored correspondence between CPU utilization and performance values. Here, the pre-stored correspondence between CPU utilization and performance values may also be set according to the actual situation, for example: CPU utilization <= 10%, the performance value is 5 points; 10% < CPU utilization <= 20%, the performance value is 4 points; 20% < CPU utilization <= 30%, the performance value is 3 points; 30% < CPU utilization <= 40%, the performance value is 2 points; 40% < CPU utilization <= 50%, the performance value is 1 point; 50% < CPU utilization, the performance value is 0 points.
The average time consumption of data desensitization may be determined by the log desensitization apparatus 12 from the most recent data desensitizations, for example, by calculating the average time consumption of the most recent 50 data desensitizations. The determining, by the log desensitization device 12, of the third performance value of the system according to the average time consumption of data desensitization may include: determining a third performance value corresponding to the average time consumption of data desensitization according to a pre-stored correspondence between average desensitization time consumption and performance values. Here, the pre-stored correspondence between average desensitization time consumption and performance values may also be set according to the actual situation, for example: average desensitization time <= 1 ms, the performance value is 10 points; 1 ms < average desensitization time <= 1.5 ms, the performance value is 9 points; 1.5 ms < average desensitization time <= 2 ms, the performance value is 8 points; ...; 5 ms < average desensitization time, the performance value is 0 points.
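Illustratively, the example correspondences for free memory and CPU utilization can be sketched as simple threshold lookups. The following Python sketch is an illustration only: the function names are assumptions, and treating a CPU utilization of exactly 10% as belonging to the 5-point band is likewise an assumption. The correspondence for average desensitization time would follow the same lookup pattern, but it is not reproduced here because its middle rows are elided in the example above.

```python
def memory_score(free_bytes, total_bytes):
    """Score 0-5 from the fraction of total memory that is free."""
    ratio = free_bytes / total_bytes
    # Thresholds checked from highest to lowest; first match wins.
    for threshold, points in ((0.5, 5), (0.4, 4), (0.3, 3), (0.2, 2), (0.1, 1)):
        if ratio >= threshold:
            return points
    return 0


def cpu_score(cpu_percent):
    """Score 0-5 from CPU utilization in percent; lower utilization scores higher."""
    for threshold, points in ((10, 5), (20, 4), (30, 3), (40, 2), (50, 1)):
        if cpu_percent <= threshold:
            return points
    return 0
```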
In this embodiment, after determining the first performance value, the second performance value, and the third performance value, the log desensitization apparatus 12 may further determine a total performance value of the system according to the first performance value, the second performance value, and the third performance value, and further determine a data quantity K at least included in each type of data after desensitization of the data table to be desensitized according to the total performance value, an upper performance limit of the system, an upper preset data quantity limit, and a lower preset data quantity limit.
The log desensitization device 12 may determine a weighted proportion of parameters such as a first performance value corresponding to the free memory, a second performance value corresponding to the CPU utilization, and a third performance value corresponding to the average time consumption for data desensitization by using a genetic algorithm, and further determine a total performance value of the system according to the weighted proportion, the first performance value, the second performance value, and the third performance value.
The preset data quantity upper limit value and the preset data quantity lower limit value may be set according to actual conditions, for example, the preset data quantity lower limit value is 3, and the preset data quantity upper limit value is 100.
The determining, by the log desensitization apparatus 12, of the data quantity K at least contained in each type of data after desensitization of the data table to be desensitized according to the total performance value, the performance upper limit value of the system, the preset data quantity upper limit value, and the preset data quantity lower limit value may include: determining the data quantity K according to the expression $K = k_1 + \frac{k_2 - k_1}{s_1} \times s_2$, where $k_1$ represents the preset data quantity lower limit value, $k_2$ represents the preset data quantity upper limit value, $s_1$ represents the performance upper limit value of the system, and $s_2$ represents the total performance value.
In addition, if $K$ is greater than $k_2$, then $K$ is taken to be equal to $k_2$.
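Illustratively, the expression for K can be sketched in Python as follows. This is a sketch under stated assumptions, not the implementation of the application: combining the three performance values as a plain weighted sum, rounding K down to an integer, and the function names are all assumptions, and the weights (which the application obtains with a genetic algorithm) are simply passed in. The defaults use the example limits 3 and 100 as the lower and upper limit respectively.

```python
import math


def total_performance(values, weights):
    """Weighted sum of the individual performance values (assumed combination rule)."""
    return sum(v * w for v, w in zip(values, weights))


def data_quantity_k(s2, s1, k1=3, k2=100):
    """K = k1 + (k2 - k1) / s1 * s2, rounded down and capped at k2.

    s2 is the total performance value and s1 the performance upper limit.
    """
    k = k1 + (k2 - k1) * s2 / s1
    return min(math.floor(k), k2)
```

With a performance upper limit of 20, a total performance value of 10 gives K = 51, a value of 0 gives the lower limit 3, and any value at or above 20 gives the upper limit 100, so a more lightly loaded system applies stronger anonymization.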
S204: desensitizing the data to be desensitized in the data table to be desensitized according to the data quantity K.
The data to be desensitized can be determined according to actual conditions, such as the zip codes and ages in Table 1.
Here, the log desensitization apparatus 12 determines the data number K at least included in each type of data after desensitization of the data table to be desensitized according to the performance parameters of the system, and then desensitizes the data to be desensitized in the data table to be desensitized according to the data number K. That is, the log desensitization apparatus 12 generalizes or suppresses the data to be desensitized in the data table to be desensitized according to the data number K, so as to obtain one or more types of data, where each type of data includes at least K data, and any data in each type of data cannot be distinguished from at least other K-1 data in the type of data.
For example, the log desensitization device 12 may desensitize the data to be desensitized in the data table to be desensitized according to the data quantity K by using the Mondrian algorithm. For example, as shown in Table 3, K is 3, and the log desensitization apparatus 12 may use the Mondrian algorithm to generalize or suppress the zip codes in Table 1 to obtain one or more types of data, where each type of data includes at least 3 data, so that any data in each type of data cannot be distinguished from at least 2 other data in that type. This addresses the problem that desensitized data can still be attacked, for example by a chain attack, in which case the desensitization of the log is incomplete and sensitive information may be leaked.
Here, a type can be understood as an equivalence class: for example, the data having the same quasi-identifier 476** form one type, and the data having the same quasi-identifier 4790* form another type.
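Illustratively, the partitioning idea behind the Mondrian algorithm can be sketched in Python as follows. This is a simplified, Mondrian-style median partitioning written for illustration and is not asserted to be the exact procedure of the application; the function names and the record layout (numeric quasi-identifiers at the given column indices) are assumptions. Records are cut recursively at the median of the widest-ranging quasi-identifier, and a cut is only accepted if both sides keep at least K records; each final group is one equivalence class.

```python
def mondrian(records, k, dims):
    """Partition records (tuples) into groups of size >= k by recursive median cuts."""
    def spread(d):
        values = [r[d] for r in records]
        return max(values) - min(values)

    # Try quasi-identifier columns from widest to narrowest value range.
    for d in sorted(dims, key=spread, reverse=True):
        ordered = sorted(records, key=lambda r: r[d])
        median = ordered[len(ordered) // 2][d]
        left = [r for r in ordered if r[d] < median]
        right = [r for r in ordered if r[d] >= median]
        if len(left) >= k and len(right) >= k:
            return mondrian(left, k, dims) + mondrian(right, k, dims)
    return [records]  # no allowable cut: this group is one equivalence class


def generalize(group, dims):
    """Replace each quasi-identifier by the (min, max) range observed in its group."""
    ranges = {d: (min(r[d] for r in group), max(r[d] for r in group)) for d in dims}
    return [tuple(ranges.get(i, v) for i, v in enumerate(r)) for r in group]
```

Because a cut is rejected whenever one side would fall below K records, every returned group satisfies the "at least K data per type" condition by construction.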
In this embodiment, when desensitizing the data to be desensitized in the data table to be desensitized according to the data quantity K, the log desensitizing apparatus 12 further considers system performance, that is, determines the size of the data table processed by the system according to the performance parameter, and desensitizes the data to be desensitized in the data table to be desensitized according to the data quantity K if the size of the data table to be desensitized is smaller than or equal to the size of the data table processed by the system, so as to obtain a better balance between performance and security.
The log desensitization device 12 may determine the size of the data table to be desensitized according to the data number K at least included in each type of data after desensitization of the data table to be desensitized, a preset data number upper limit value, and a preset data table size upper limit value.
Illustratively, the log desensitization apparatus 12 may determine the size $d$ of the data table to be desensitized according to the expression $d = \frac{K}{k_2} \times D$, where $K$ represents the data quantity at least contained in each type of data after desensitization of the data table to be desensitized, $k_2$ represents the preset data quantity upper limit value, and $D$ represents the preset data table size upper limit value.
Here, $d$ may be rounded to the hundreds place, and if $d$ is greater than $D$, $d$ is taken to be equal to $D$. The value of $D$ can be determined according to the actual situation, for example, 1,000,000.
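Illustratively, the size expression can be sketched in Python as follows. The function name is an assumption, the result is rounded down to an integer as a simplification, and any further rounding of the result is left out.

```python
def data_table_size(k, k2, size_upper_limit):
    """d = K / k2 * D, rounded down to an integer and capped at D."""
    d = k * size_upper_limit // k2
    return min(d, size_upper_limit)
```

For example, with K = 51, an upper data quantity limit of 100, and D = 1,000,000, the sketch yields a table size of 510,000 rows, so a smaller K (a more heavily loaded system) also means a smaller table is processed.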
In addition, when system performance is considered, the embodiment of the application may further set a lower limit on the size of the data table processed by the system. If the size of the data table to be desensitized is greater than or equal to the preset data table size lower limit and less than or equal to the size of the maximum data table processed by the system, the log desensitization device 12 desensitizes the data to be desensitized in the data table to be desensitized according to the data quantity K. If the size of the data table to be desensitized is smaller than the preset data table size lower limit, the log desensitization device 12 can determine the desensitized data corresponding to the data to be desensitized in the data table to be desensitized according to a pre-stored correspondence between data to be desensitized and desensitized data, and then replace the data to be desensitized in the data table to be desensitized with the corresponding desensitized data, for example, replace the data to be desensitized with "*". Therefore, for different conditions of the data table to be desensitized, the embodiment of the application desensitizes the data to be desensitized in different ways, so that various application requirements are met and a better balance between performance and security is obtained.
In this embodiment, after the log desensitization apparatus 12 determines the size of the data table processed by the system according to the performance parameter, if the size of the data table to be desensitized is larger than the size of the largest data table processed by the system, the log desensitization apparatus 12 may delete the data in the data table to be desensitized, so that the size of the data table to be desensitized after deleting the data is larger than or equal to the lower size limit of the preset data table and smaller than or equal to the size of the largest data table processed by the system.
Illustratively, the log desensitization device 12 may adopt a FIFO (first in first out) manner to eliminate old sensitive data of the data table to be desensitized, so as to keep the data table updated. Taking the transaction system of the financial system in fig. 1 as an example, the log desensitization device 12 may collect the logs to be desensitized output by the transaction system through a log bucket. When the sensitive information in the bucket is more, the log desensitization device 12 can eliminate the old sensitive data in the bucket in an FIFO manner.
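Illustratively, the choice between the three cases described above (table smaller than the lower limit, table within range, table larger than the maximum) and the FIFO elimination of old data can be sketched in Python as follows. The function names and the string labels for the three paths are assumptions made for illustration.

```python
from collections import deque


def choose_strategy(table_size, lower_limit, max_size):
    """Pick the desensitization path for a data table of the given size."""
    if table_size < lower_limit:
        return "lookup"                 # replace via a pre-stored correspondence
    if table_size <= max_size:
        return "k_anonymize"            # generalize or suppress according to K
    return "trim_then_k_anonymize"      # eliminate the oldest rows first


def trim_fifo(rows, max_size):
    """Keep only the newest max_size rows, discarding the oldest first."""
    return list(deque(rows, maxlen=max_size))
```

A `deque` with `maxlen` drops items from the opposite end as new ones are appended, which matches the first-in-first-out elimination described above.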
Here, the log desensitization device 12 may desensitize at least one digit of the data to be desensitized in the data table to be desensitized according to the data quantity K, so that each type of data in the desensitized data table contains at least K data, which reduces data distortion while preventing chain attacks. Illustratively, as shown in Table 3, where K is 3, the log desensitization device 12 may generalize or suppress at least one digit of the zip codes in Table 1 to obtain multiple types of data, each type including at least 3 data, such that any zip code in each type is indistinguishable from at least 2 other zip codes in that type. For example, 476** is one type containing 6 data, and each 476** is indistinguishable from the other 5 data of that type; 4790* is another type containing 3 data, and each 4790* is indistinguishable from the other 2 data of that type.
For example, when desensitizing at least one digit of the data to be desensitized in the data table to be desensitized according to the data quantity K, the log desensitization apparatus 12 may use a preset mapping set. For example, there are three pieces of address data (zip code: 47905; zip code: 47909; zip code: 47906), which, when K is 3, are generalized using the mapping set to: zip code: 4790*; zip code: 4790*; zip code: 4790*. This reduces data distortion, because the zip code information in the log is not masked in its entirety, and at the same time prevents chain attacks.
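Illustratively, the effect of generalizing as few digits as possible can be sketched in Python as follows. Note that this sketch swaps the preset mapping set for a simple loop that masks trailing digits at one global level, which is an assumption made for illustration; the function names are likewise hypothetical, and all codes are assumed to have the same length.

```python
from collections import Counter


def mask(code, n):
    """Replace the last n characters of code with '*'."""
    return code[: len(code) - n] + "*" * n


def generalize_zip_codes(codes, k):
    """Mask as few trailing digits as possible so each masked value occurs >= k times."""
    width = len(codes[0])
    for n in range(width + 1):
        masked = [mask(c, n) for c in codes]
        if all(count >= k for count in Counter(masked).values()):
            return masked
    return masked  # fewer than k codes: even full masking cannot reach k
```

For the three zip codes 47905, 47909, and 47906 with K = 3, masking one digit already suffices, so the result keeps the first four digits of each code.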
After desensitizing the data to be desensitized in the data table to be desensitized according to the data number K, the log desensitizing device 12 can also output the desensitized data table to be desensitized to a log file of a system, and then re-execute the step of obtaining the log to be desensitized, so that data desensitization processing can be continuously performed on the log to be desensitized, and the risk of sensitive information leakage is reduced.
In the embodiment of the present application, a log to be desensitized is acquired, a data table to be desensitized is obtained according to the log to be desensitized, and the data quantity K at least contained in each type of data after desensitization of the data table to be desensitized is then determined according to performance parameters of the system, where K is greater than 1 and any data in each type of data cannot be distinguished from at least K-1 other data in that type. The data to be desensitized in the data table to be desensitized is then desensitized according to the data quantity K. That is, the data to be desensitized in the data table to be desensitized is desensitized to form one or more types of data, each type includes at least K data, and any data in each type cannot be distinguished from at least K-1 other data in that type. This addresses the problem that desensitized data can still be attacked, for example by a chain attack, in which case the desensitization of the log is incomplete and sensitive information may be leaked.
In addition, in the embodiment of the present application, before desensitizing the data to be desensitized in the data table to be desensitized according to the data number K, the number of idle cores of the CPU of the system is also considered, so that the data in the data table to be desensitized is divided according to the number of idle cores, and the data to be desensitized in the divided data table to be desensitized is desensitized. Fig. 4 is a schematic flowchart of another log desensitization method according to an embodiment of the present application. As shown in fig. 4, the method includes:
S401: acquiring a log to be desensitized.
S402: obtaining a data table to be desensitized according to the log to be desensitized.
S403: determining, according to the performance parameters of the system, the data quantity K at least contained in each type of data after desensitization of the data table to be desensitized, wherein K is greater than 1, and any data in each type of data cannot be distinguished from at least K-1 other data in that type.
The steps S401 to S403 are the same as the steps S201 to S203, and are not described herein again.
S404: acquiring the number of idle cores of the CPU of the system.
Here, the above-described log desensitization apparatus 12 may determine whether the core idle rate of the CPU reaches a preset idle threshold, for example, 80%. If so, the core is an idle core. The log desensitization means 12 takes the number n of free cores of the CPU.
S405: dividing the data in the data table to be desensitized according to the number of idle cores of the CPU.
The number n of idle cores of the CPU is greater than or equal to a preset threshold value. The preset threshold may be set according to the actual situation, for example, 3, that is, n >= 3. The log desensitization device 12 may divide the data in the data table to be desensitized according to the number n of idle cores of the CPU to obtain n sets of data, where n is an integer greater than zero and is less than the number n of idle cores of the CPU. For example, n-2.
If the number n of idle cores of the CPU is less than the preset threshold, for example, n <3, the log desensitization apparatus 12 may not divide the data in the data table to be desensitized, but desensitize the data in the data table to be desensitized directly according to the number K of data.
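Illustratively, S404 and S405 can be sketched in Python as follows. This is a sketch under stated assumptions: per-core utilization is passed in as a list of percentages rather than read from the operating system, the data is split into contiguous chunks, and the number of groups is taken as one less than the number of idle cores (the text only requires it to be less than that number). The function names are hypothetical.

```python
def count_idle_cores(per_core_utilization, idle_threshold=80.0):
    """Count cores whose idle rate (100 - utilization) reaches the threshold."""
    return sum(1 for u in per_core_utilization if 100.0 - u >= idle_threshold)


def split_into_groups(rows, groups):
    """Split rows into `groups` contiguous chunks of near-equal size."""
    base, extra = divmod(len(rows), groups)
    chunks, start = [], 0
    for i in range(groups):
        end = start + base + (1 if i < extra else 0)
        chunks.append(rows[start:end])
        start = end
    return chunks


def plan_partition(rows, idle_cores, min_idle_cores=3):
    """Return the groups to desensitize; a single group when too few cores are idle."""
    if idle_cores < min_idle_cores:
        return [rows]
    return split_into_groups(rows, idle_cores - 1)
```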
S406: desensitizing the data to be desensitized in the partitioned data table to be desensitized according to the idle cores of the CPU and the data number K.
Here, one of the n sets of data corresponds to an idle core of the CPU. The log desensitization device 12 desensitizes data to be desensitized in a set of data corresponding to each idle core on each idle core of the CPU according to the data number K.
That is, the embodiment of the application makes an improvement for the multi-core CPUs of existing servers: by acquiring the number n of idle cores of the CPU, an approach of coarse division followed by parallel subdivision is adopted. The data in the data table to be desensitized is divided according to the number n of idle cores of the CPU to obtain n groups of data, the n groups of divided data are desensitized in parallel, and finally the results are merged by a parent thread.
The log desensitization device 12 can make the above improvement for the multi-core CPUs of existing servers based on the kd-tree algorithm in the Mondrian algorithm.
Illustratively, as shown in fig. 5, the log desensitization device 12 divides the data in the data table to be desensitized according to the number of idle cores of the CPU to obtain 4 sets of data (a first set of data, a second set of data, a third set of data, and a fourth set of data), desensitizes the divided 4 sets of data in parallel, and finally merges the results by a parent thread.
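Illustratively, desensitizing the divided groups in parallel and merging the results in the parent thread can be sketched in Python as follows. The per-group worker here is only a placeholder that masks the last zip-code digit, and the function names are assumptions. A thread pool is used to keep the sketch portable; for CPU-bound work that should actually occupy separate cores in CPython, a process pool (`concurrent.futures.ProcessPoolExecutor`) would be the closer analogue.

```python
from concurrent.futures import ThreadPoolExecutor


def desensitize_group(group):
    """Placeholder per-group step: mask the last zip-code digit of each row."""
    return [(zip_code[:-1] + "*", age) for zip_code, age in group]


def desensitize_in_parallel(groups, worker=desensitize_group):
    """Run `worker` on every group concurrently and merge the results in order."""
    with ThreadPoolExecutor(max_workers=len(groups)) as pool:
        results = pool.map(worker, groups)  # results keep the order of `groups`
        return [row for part in results for row in part]
```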
In the embodiment of the application, the log desensitization device 12 decides whether to divide the data in the data table to be desensitized according to the number of idle cores of the CPU, and if so, desensitizes the data to be desensitized in the divided data table in parallel, thereby ensuring security while reducing the consumption of system resources. Moreover, the log desensitization device 12 determines, according to the performance parameters of the system, the data quantity K at least contained in each type of data after desensitization of the data table to be desensitized, where K is greater than 1, and any data in each type of data cannot be distinguished from at least K-1 other data in that type, and then desensitizes the data to be desensitized in the data table to be desensitized according to the data quantity K. This keeps the distortion of the desensitized log low while addressing the problem that desensitized data can still be attacked, for example by a chain attack, in which case the desensitization of the log is incomplete and sensitive information may be leaked.
Corresponding to the log desensitization method in the foregoing embodiments, Fig. 6 is a schematic structural diagram of a log desensitization apparatus provided in an embodiment of the present application. For convenience of explanation, only portions related to the embodiments of the present application are shown. The log desensitization apparatus 60 includes: a log obtaining module 601, a data table obtaining module 602, a data quantity determining module 603 and a data desensitization module 604. The log desensitization apparatus here may be the log desensitization device 12 itself described above, or a chip or an integrated circuit that implements the functions of the log desensitization device 12. It should be noted that the division into the log obtaining module, the data table obtaining module, the data quantity determining module, and the data desensitization module is only a division of logical functions, and these modules may be physically integrated or independent.
The log obtaining module 601 is configured to obtain a log to be desensitized.
A data table obtaining module 602, configured to obtain a data table to be desensitized according to the log to be desensitized.
A data quantity determining module 603, configured to determine, according to a performance parameter of the system, a data quantity K at least included in each type of data after desensitization of the data table to be desensitized, where K is greater than 1, and any data in each type of data is indistinguishable from at least other K-1 data in the type of data.
And a data desensitization module 604, configured to desensitize the data to be desensitized in the data table to be desensitized according to K.
In one possible implementation, the data desensitization module 604 is specifically configured to:
determining the size of a maximum data table processed by the system according to the performance parameter;
and if the size of the data table to be desensitized is larger than or equal to the lower limit of the size of a preset data table, and the size of the data table to be desensitized is smaller than or equal to the size of the maximum data table processed by the system, wherein the size of the maximum data table processed by the system is larger than the lower limit of the size of the preset data table, desensitizing the data to be desensitized in the data table to be desensitized according to the K.
In one possible implementation, the data desensitization module 604 is specifically configured to:
and according to the K, generalizing or suppressing the data to be desensitized in the data table to be desensitized to obtain one or more types of data, wherein each type of data comprises at least K data.
In one possible implementation, the data desensitization module 604 is further configured to:
and if the size of the data table to be desensitized is larger than the size of the maximum data table processed by the system, deleting the data in the data table to be desensitized, so that the size of the data table to be desensitized after data deletion is larger than or equal to the lower limit of the size of the preset data table and smaller than or equal to the size of the maximum data table processed by the system.
In one possible implementation manner, after the data table obtaining module obtains the data table to be desensitized according to the log to be desensitized, the data desensitization module 604 is further configured to:
if the size of the data table to be desensitized is smaller than the lower limit of the size of the preset data table, determining desensitized data corresponding to the data to be desensitized in the data table to be desensitized according to the corresponding relationship between pre-stored data to be desensitized and desensitized data;
and replacing the data to be desensitized in the data table to be desensitized with the desensitized data corresponding to the data to be desensitized in the data table to be desensitized.
In one possible implementation, the data desensitization module 604 is further configured to:
acquiring the number of idle cores of a CPU of the system;
dividing the data in the data table to be desensitized according to the number of idle cores of the CPU;
the data desensitization module 604 is specifically configured to:
desensitizing the data to be desensitized in the partitioned data table to be desensitized according to the idle core of the CPU and the K.
In one possible implementation, the number of idle cores of the CPU is greater than or equal to a preset threshold.
The data desensitization module 604 is specifically configured to:
and dividing the data in the data table to be desensitized according to the number of idle cores of the CPU to obtain n groups of data, wherein n is an integer greater than zero and is less than the number of idle cores of the CPU.
In one possible implementation, one of the n sets of data corresponds to an idle core of the CPU.
The data desensitization module 604 is specifically configured to:
desensitizing data to be desensitized in a group of data corresponding to each idle core on each idle core of the CPU according to the K.
In one possible implementation, the performance parameters include the free memory of the system, the CPU utilization, and the average time consumed by data desensitization.
The data quantity determining module 603 is specifically configured to:
determine a first performance value of the system according to the free memory of the system, determine a second performance value of the system according to the CPU utilization, and determine a third performance value of the system according to the average time consumed by data desensitization;
determine the data quantity K at least contained in each type of data after desensitization of the data table to be desensitized according to the first performance value, the second performance value and the third performance value.
In a possible implementation manner, the data quantity determining module 603 is specifically configured to:
determine a total performance value of the system based on the first, second, and third performance values;
determine the data quantity K at least contained in each type of data after desensitization of the data table to be desensitized according to the total performance value, the upper performance limit value of the system, the preset data quantity upper limit value and the preset data quantity lower limit value.
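One way to read the derivation of K from the three performance values is sketched below. This passage does not state the scoring or mapping formulas, so everything concrete here is an assumption: the linear `score` helper, the reference ranges (4096 MB, 500 ms), the total taken as a plain sum with an upper performance limit of 3.0, and the linear interpolation between the preset lower and upper limits of K.

```python
def score(value: float, worst: float, best: float) -> float:
    """Linearly map value to [0, 1], with `worst` -> 0 and `best` -> 1."""
    if best == worst:
        raise ValueError("worst and best must differ")
    ratio = (value - worst) / (best - worst)
    return min(1.0, max(0.0, ratio))


def determine_k(
    free_mem_mb: float,
    cpu_usage: float,
    avg_desens_ms: float,
    k_lower: int = 2,   # preset data quantity lower limit (K must exceed 1)
    k_upper: int = 10,  # preset data quantity upper limit
) -> int:
    """Derive K from three performance values (all ranges are assumed)."""
    first = score(free_mem_mb, worst=0.0, best=4096.0)   # more free memory is better
    second = score(cpu_usage, worst=1.0, best=0.0)       # lower CPU utilization is better
    third = score(avg_desens_ms, worst=500.0, best=0.0)  # faster desensitization is better
    total = first + second + third
    performance_upper = 3.0  # assumed upper performance limit of the system
    k = k_lower + (k_upper - k_lower) * total / performance_upper
    return max(k_lower, min(k_upper, int(k)))
```

Under these assumptions a lightly loaded system yields a larger K (stronger anonymity at higher cost), while a heavily loaded system falls back toward the lower limit.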
In one possible implementation, the data desensitization module 604 is specifically configured to:
determine the size of the data table to be desensitized according to the data quantity K at least contained in each type of data after desensitization of the data table to be desensitized, a preset data quantity upper limit value and a preset data table size upper limit value.
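The size computation above, together with the size-based branches recited in claims 2, 4 and 5, can be sketched as follows. The proportional formula in `max_table_size` is an assumption (this passage names the inputs but not the formula), and the function names and strategy labels are hypothetical.

```python
def max_table_size(k: int, k_upper: int, table_size_upper: int) -> int:
    """Assumed rule: the maximum table size scales linearly with K / K_upper."""
    if not 1 < k <= k_upper:
        raise ValueError("expected 1 < k <= k_upper")
    return max(1, table_size_upper * k // k_upper)


def choose_strategy(table_size: int, size_lower: int, size_max: int) -> str:
    """Pick a desensitization path from the table size (claims 2, 4 and 5)."""
    if size_max <= size_lower:
        raise ValueError("maximum table size must exceed the preset lower limit")
    if table_size < size_lower:
        # Claim 5: replace values using the pre-stored correspondence.
        return "lookup_replace"
    if table_size > size_max:
        # Claim 4: delete data until the size falls within range, then desensitize.
        return "trim_then_k_anonymize"
    # Claim 2: size within [size_lower, size_max], desensitize according to K.
    return "k_anonymize"
```

For example, with K = 6, a K upper limit of 10 and a table size upper limit of 1000 rows, this sketch allows tables of up to 600 rows.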
In a possible implementation manner, the log obtaining module 601 is specifically configured to:
obtain at least one log;
judge whether the at least one log contains sensitive data;
if the at least one log contains the sensitive data, determine that the at least one log is the log to be desensitized.
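The check for sensitive data could look like the sketch below. The text does not say how sensitive data is recognized, so the regular expressions (an 11-digit mainland China mobile number and an 18-character resident ID number) and the per-log filtering are illustrative assumptions, and the names are hypothetical.

```python
import re
from typing import Iterable, List

# Assumed patterns; a real deployment would configure its own rules.
SENSITIVE_PATTERNS = {
    "phone": re.compile(r"(?<!\d)1[3-9]\d{9}(?!\d)"),
    "id_card": re.compile(r"(?<!\d)\d{17}[\dXx](?!\d)"),
}


def contains_sensitive_data(log_line: str) -> bool:
    """Return True if any assumed sensitive pattern occurs in the log line."""
    return any(p.search(log_line) for p in SENSITIVE_PATTERNS.values())


def select_logs_to_desensitize(logs: Iterable[str]) -> List[str]:
    """Keep only the logs that contain sensitive data."""
    return [line for line in logs if contains_sensitive_data(line)]
```

Logs that pass this filter would be the logs to be desensitized; the others can be written out unchanged.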
In one possible implementation manner, after desensitizing the data to be desensitized in the data table to be desensitized according to the K, the data desensitization module 604 is further configured to:
output the desensitized data table to a log file of the system;
the log obtaining module 601 then re-executes the step of acquiring the log to be desensitized.
The apparatus provided in the embodiment of the present application may be used to implement the technical solution of the method embodiment described above. Its implementation principle and technical effect are similar and are not described again here.
Optionally, fig. 7 schematically shows a possible basic hardware architecture of the log desensitization device described in the present application.
Referring to fig. 7, the log desensitization device includes at least one processor 701 and a communication interface 703. Further optionally, a memory 702 and a bus 704 may also be included.
In the log desensitization device, the number of processors 701 may be one or more, and fig. 7 illustrates only one of them. Optionally, the processor 701 may be a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), or a Digital Signal Processor (DSP). If the log desensitization device has multiple processors 701, the types of the multiple processors 701 may be the same or different. Optionally, the multiple processors 701 of the log desensitization device may also be integrated as a multi-core processor.
Memory 702 stores computer instructions and data; the memory 702 may store computer instructions and data required to implement the above-described log desensitization methods provided herein, e.g., the memory 702 stores instructions for implementing the steps of the above-described log desensitization methods. Memory 702 can be any one or any combination of the following storage media: nonvolatile memory (e.g., Read Only Memory (ROM), Solid State Disk (SSD), hard disk (HDD), optical disk), volatile memory.
The communication interface 703 may provide information input/output for the at least one processor. The communication interface 703 may also include any one or any combination of the following devices: a network interface (e.g., an Ethernet interface), a wireless network card, or another device having a network access function.
Optionally, the communication interface 703 may also be used for data communication between the log desensitization device and other computing devices or terminals.
Further optionally, fig. 7 shows the bus 704 as a thick line. The bus 704 may connect the processor 701 with the memory 702 and the communication interface 703. Thus, via the bus 704, the processor 701 may access the memory 702 and may also interact with other computing devices or terminals using the communication interface 703.
In this application, the log desensitization device executes computer instructions in the memory 702, so that the log desensitization device implements the log desensitization method provided in this application, or the log desensitization device deploys the log desensitization apparatus.
From the viewpoint of logical functional division, illustratively, as shown in fig. 7, the memory 702 may include therein a log obtaining module 601, a data table obtaining module 602, a data quantity determining module 603, and a data desensitization module 604. The inclusion herein merely refers to that the instructions stored in the memory, when executed, may implement the functions of the log obtaining module, the data table obtaining module, the data quantity determining module and the data desensitization module, respectively, and is not limited to a physical structure.
In addition, the above-mentioned log desensitization device can be implemented by software as in fig. 7, or can be implemented by hardware as a hardware module or as a circuit unit.
A computer-readable storage medium is provided that includes computer instructions that instruct a computing device to perform the aforementioned log desensitization methods provided herein.
The present application provides a chip comprising at least one processor and a communication interface providing information input and/or output for the at least one processor. Further, the chip may also include at least one memory for storing computer instructions. The at least one processor is configured to invoke and execute the computer instructions to perform the above-described log desensitization method provided herein.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative: the division into units is only a logical division, and other divisions are possible in practice. For example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices, or units, and may be electrical, mechanical, or in another form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units; that is, they may be located in one place or distributed across a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional unit.
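As an illustration of the desensitization that the data desensitization module 604 performs according to K (generalizing or suppressing values until every type of data holds at least K records, as recited in claims 1 and 3), a minimal k-anonymity sketch is given below. The record layout (a postal code and an age), the generalization ladder, and the choice to suppress by dropping records are assumptions made for the example, not details taken from the application.

```python
from collections import Counter
from typing import List, Tuple

Record = Tuple[str, int]        # (postal code, age): assumed quasi-identifiers
Generalized = Tuple[str, str]


def generalize(record: Record, level: int) -> Generalized:
    """Mask `level` trailing postal-code characters and bucket the age."""
    code, age = record
    kept = max(0, len(code) - level)
    masked = code[:kept] + "*" * (len(code) - kept)
    if level == 0:
        return masked, str(age)
    width = 10 * level  # assumed bucket width per generalization level
    low = age // width * width
    return masked, f"{low}-{low + width - 1}"


def k_anonymize(records: List[Record], k: int, max_level: int = 3) -> List[Generalized]:
    """Generalize step by step; suppress (drop) records still in small groups."""
    for level in range(max_level + 1):
        generalized = [generalize(r, level) for r in records]
        counts = Counter(generalized)
        if all(c >= k for c in counts.values()):
            return generalized
    # Suppression: remove records whose group is still smaller than k.
    return [g for g in generalized if counts[g] >= k]
```

After the call, every remaining record shares its generalized values with at least K-1 other records, which is the indistinguishability property stated in claim 1.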

Claims (16)

1. A method of log desensitization, comprising:
acquiring a log to be desensitized;
obtaining a data table to be desensitized according to the log to be desensitized;
determining the data quantity K at least contained in each type of data after desensitization of the data table to be desensitized according to performance parameters of a system, wherein the K is greater than 1, and any piece of data in each type of data cannot be distinguished from at least K-1 other pieces of data in that type;
and desensitizing the data to be desensitized in the data table to be desensitized according to the K.
2. The method according to claim 1, wherein desensitizing the data to be desensitized in the data table to be desensitized according to the K comprises:
determining the size of a maximum data table processed by the system according to the performance parameter;
and if the size of the data table to be desensitized is larger than or equal to the lower limit of the size of a preset data table, and the size of the data table to be desensitized is smaller than or equal to the size of the maximum data table processed by the system, wherein the size of the maximum data table processed by the system is larger than the lower limit of the size of the preset data table, desensitizing the data to be desensitized in the data table to be desensitized according to the K.
3. The method according to claim 1 or 2, wherein desensitizing the data to be desensitized in the data table to be desensitized according to the K comprises:
and according to the K, generalizing or suppressing the data to be desensitized in the data table to be desensitized to obtain one or more types of data, wherein each type of data comprises at least K pieces of data.
4. The method of claim 2, further comprising, after said determining a size of a maximum data table processed by said system based on said performance parameter:
and if the size of the data table to be desensitized is larger than the size of the maximum data table processed by the system, deleting the data in the data table to be desensitized, so that the size of the data table to be desensitized after data deletion is larger than or equal to the lower limit of the size of the preset data table and smaller than or equal to the size of the maximum data table processed by the system.
5. The method according to claim 2 or 4, further comprising, after the obtaining a data table to be desensitized according to the log to be desensitized:
if the size of the data table to be desensitized is smaller than the lower limit of the size of the preset data table, determining desensitized data corresponding to the data to be desensitized in the data table to be desensitized according to the corresponding relationship between pre-stored data to be desensitized and desensitized data;
and replacing the data to be desensitized in the data table to be desensitized with the desensitized data corresponding to the data to be desensitized in the data table to be desensitized.
6. The method according to claim 1 or 2, before desensitizing the data to be desensitized in the data table to be desensitized according to the K, further comprising:
acquiring the number of idle cores of a Central Processing Unit (CPU) of the system;
dividing the data in the data table to be desensitized according to the number of idle cores of the CPU;
according to the K, desensitizing the data to be desensitized in the data table to be desensitized comprises the following steps:
desensitizing the data to be desensitized in the partitioned data table to be desensitized according to the idle core of the CPU and the K.
7. The method of claim 6, wherein the number of idle cores of the CPU is greater than or equal to a preset threshold;
the dividing the data in the data table to be desensitized according to the number of idle cores of the CPU comprises the following steps:
and dividing the data in the data table to be desensitized according to the number of idle cores of the CPU to obtain n groups of data, wherein n is an integer greater than zero and is less than the number of idle cores of the CPU.
8. The method of claim 7, wherein one of said n sets of data corresponds to an idle core of said CPU;
desensitizing the data to be desensitized in the partitioned data table to be desensitized according to the idle core of the CPU and the K, comprising:
desensitizing data to be desensitized in a group of data corresponding to each idle core on each idle core of the CPU according to the K.
9. The method of claim 2, wherein the performance parameters include free memory, CPU usage, and data desensitization average elapsed time of the system;
determining the data quantity K at least contained in each type of data after desensitization of the data table to be desensitized according to the performance parameters of the system, wherein the determining comprises the following steps:
determining a first performance value of the system according to the free memory of the system, determining a second performance value of the system according to the CPU utilization rate, and determining a third performance value of the system according to the average time consumption of data desensitization;
and determining the data quantity K at least contained in each type of data after desensitization of the data table to be desensitized according to the first performance value, the second performance value and the third performance value.
10. The method of claim 9, wherein determining the data quantity K at least contained in each type of data after desensitization of the data table to be desensitized according to the first, second, and third performance values comprises:
determining a total value of performance of the system based on the first, second, and third performance values;
and determining the data quantity K at least contained in each type of data after desensitization of the data table to be desensitized according to the total performance value, the upper performance limit value of the system, the upper preset data quantity limit value and the lower preset data quantity limit value.
11. The method of claim 9 or 10, wherein said determining a size of a maximum data table processed by said system according to said performance parameter comprises:
and determining the size of the data table to be desensitized according to the data quantity K at least contained in each type of data after desensitization of the data table to be desensitized, a preset data quantity upper limit value and a preset data table size upper limit value.
12. The method according to claim 1 or 2, wherein the obtaining of the log to be desensitized comprises:
obtaining at least one log;
judging whether the at least one log contains sensitive data;
and if the at least one log contains the sensitive data, determining that the at least one log is the log to be desensitized.
13. The method according to claim 1 or 2, further comprising, after desensitizing the data to be desensitized in the data table to be desensitized according to the K:
and outputting the desensitized data table to a log file of the system, and re-executing the step of acquiring the log to be desensitized.
14. A log desensitization apparatus, comprising:
the log obtaining module is used for obtaining a log to be desensitized;
the data table obtaining module is used for obtaining a data table to be desensitized according to the log to be desensitized;
the data quantity determining module is used for determining the data quantity K at least contained in each type of data after desensitization of the data table to be desensitized according to the performance parameters of the system, wherein the K is greater than 1, and any piece of data in each type of data cannot be distinguished from at least K-1 other pieces of data in that type;
and the data desensitization module is used for desensitizing the data to be desensitized in the data table to be desensitized according to the K.
15. A log desensitization device, comprising:
a processor;
a memory; and
a computer program;
wherein the computer program is stored in the memory and configured to be executed by the processor, the computer program comprising instructions for performing the method of any of claims 1-13.
16. A computer-readable storage medium, characterized in that it stores a computer program that causes a server to execute the method of any of claims 1-13.
CN202011311978.1A 2020-11-20 2020-11-20 Log desensitization method, device, equipment and storage medium Pending CN112231759A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011311978.1A CN112231759A (en) 2020-11-20 2020-11-20 Log desensitization method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN112231759A true CN112231759A (en) 2021-01-15

Family

ID=74124401

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011311978.1A Pending CN112231759A (en) 2020-11-20 2020-11-20 Log desensitization method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN112231759A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107301350A (en) * 2017-05-24 2017-10-27 国信优易数据有限公司 A kind of data processing method and system
CN110990869A (en) * 2019-11-29 2020-04-10 国家电网有限公司客户服务中心 Electric power big data desensitization method applied to privacy protection
CN111125769A (en) * 2019-12-27 2020-05-08 上海轻维软件有限公司 Mass data desensitization method based on ORACLE database
CN111695153A (en) * 2020-06-08 2020-09-22 中南大学 K-anonymization method, system, equipment and readable storage medium for multi-branch forest

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
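Liu Zhenpeng: "PDMP: εk personalized data desensitization protection method", Application Research of Computers (《计算机应用研究》), 31 October 2020 (2020-10-31), pages 3068 - 3082 *
Zhou Qianyi; Wang Yamin; Wang Chuang: "Research on desensitization analysis technology based on Internet big data", Data Analysis and Knowledge Discovery (数据分析与知识发现), no. 02, 25 February 2018 (2018-02-25) *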

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination