CN114398665A - Data desensitization method, device, storage medium and terminal - Google Patents


Publication number
CN114398665A
Authority
CN
China
Prior art keywords
data
sensitive
fields
field
sensitive field
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111525829.XA
Other languages
Chinese (zh)
Inventor
李震宇
王振众
张哲�
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangxiao Steel Structure Co Ltd
Original Assignee
Hangxiao Steel Structure Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangxiao Steel Structure Co Ltd
Priority to CN202111525829.XA
Publication of CN114398665A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6245 Protecting personal data, e.g. for financial or medical purposes
    • G06F21/6254 Protecting personal data, e.g. for financial or medical purposes by anonymising data, e.g. decorrelating personal data from the owner's identification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/602 Providing cryptographic facilities or services
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/64 Protecting data integrity, e.g. using checksums, certificates or signatures
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/04 Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks
    • H04L63/0407 Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks wherein the identity of one or more communicating identities is hidden

Abstract

The invention discloses a data desensitization method, a data desensitization device, a storage medium and a terminal, which are applied to a server side, wherein the method comprises the following steps: when a target data text to be distributed is received, acquiring a plurality of fields of the target data text; identifying sensitive fields and non-sensitive fields in a plurality of fields, and generating a sensitive field set and a non-sensitive field set; desensitizing data of each sensitive field in the sensitive field set to generate desensitized data; and combining the desensitized data with the data of the non-sensitive field, and distributing the combined data to the client. According to the method and the device, the sensitive field existing in the target data text is automatically identified at the server side for automatic desensitization treatment, so that automatic identification, desensitization and distribution of data can be realized, the risk of data leakage caused by data circulation is reduced, and the efficiency of data distribution is improved.

Description

Data desensitization method, device, storage medium and terminal
Technical Field
The invention relates to the technical field of computers, in particular to a data desensitization method, a data desensitization device, a storage medium and a terminal.
Background
Sensitive data, also called private data, commonly includes names, identification numbers, addresses, telephone numbers, bank accounts, e-mail addresses, passwords, medical information, educational background and the like. This information is closely tied to personal life and work and is governed by various industry and government data privacy regulations. An enterprise or government body that stores and publishes such information but cannot guarantee data privacy faces serious financial, legal and accountability risks, as well as a severe loss of user trust. In the course of business development and the daily activities of enterprises and governments, data inevitably has to be issued level by level, and this data often includes the personal data of citizens. How to distribute data normally while ensuring that citizens' personal information is not leaked has become a difficult problem in the daily work of governments and enterprises.
In existing technical schemes, data containing citizens' personal information is distributed mainly through manual intervention: before distribution, a dedicated data security manager audits the data, or a sensitive data identification tool identifies whether the data is sensitive, and a fuzzification/tagging opinion is given; the data is distributed only after a functional department has fuzzified/tagged it. Current data distribution has the following defects. First, it depends heavily on manpower: because of the particularity of sensitive data, the data cannot be distributed directly and automatically, but must first be identified, classified and graded by a person or a sensitive data identification tool and then distributed manually. Second, no effective knowledge is accumulated: the process relies on the personal experience of functional personnel, which consumes human resources, is inefficient, and harms the timeliness of the data. Third, because no effective knowledge base is built up, personnel changes easily cause the audit records for sensitive data distribution to become invalid, which reduces the efficiency of data distribution.
Disclosure of Invention
The embodiment of the application provides a data desensitization method, a data desensitization device, a storage medium and a terminal. The following presents a simplified summary in order to provide a basic understanding of some aspects of the disclosed embodiments. This summary is not an extensive overview and is intended to neither identify key/critical elements nor delineate the scope of such embodiments. Its sole purpose is to present some concepts in a simplified form as a prelude to the more detailed description that is presented later.
In a first aspect, an embodiment of the present application provides a data desensitization method, which is applied to a server, and the method includes:
when a target data text to be distributed is received, acquiring a plurality of fields of the target data text;
identifying sensitive fields and non-sensitive fields in a plurality of fields, and generating a sensitive field set and a non-sensitive field set;
desensitizing data of each sensitive field in the sensitive field set to generate desensitized data;
and combining the desensitized data with the data of the non-sensitive field, and distributing the combined data to the client.
Optionally, obtaining a plurality of fields of the target data text includes:
analyzing the target data text according to a preset key value rule, and generating an analyzed data text;
traversing the parsed data text one by one to obtain a key-value pair set;
dividing each key-value pair in the key-value pair set into attribute characteristics and value characteristics to obtain the attribute characteristics and the value characteristics of each key-value pair;
a plurality of fields of the target data text are determined based on the attribute features and the value features of each key-value pair.
Optionally, determining a plurality of fields of the target data text based on the attribute feature and the value feature of each key-value pair includes:
performing characteristic splicing on the attribute characteristics and the value characteristics of each key value pair to obtain a plurality of splicing characteristics;
converting each splicing characteristic into a vector to obtain a vector set;
combining the vectors in the vector set to generate a term matrix;
singular value decomposition is carried out on the lexical item matrix to obtain a plurality of semantic indexes;
and inquiring a field corresponding to each semantic index in the plurality of semantic indexes according to a preset semantic field table to obtain a plurality of fields of the target data text.
Optionally, identifying the sensitive field and the non-sensitive field in the plurality of fields, and generating a set of sensitive fields and a set of non-sensitive fields, includes:
acquiring a sensitive field type table set for a sensitive field;
analyzing a data type of each of the plurality of fields;
mapping one by one according to the sensitive field type table and the data type of each field to judge whether each field is a sensitive field, marking the sensitive field as 1 and marking the non-sensitive field as 0;
counting the fields marked as 1 and determining the fields as a sensitive field set;
the field marked 0 is counted and determined as the non-sensitive field set.
Optionally, analyzing the data type of each of the plurality of fields includes:
inputting each field in the plurality of fields into a pre-trained data type recognition model, and outputting the data type of each field;
wherein the pre-trained data type recognition model is generated according to the following steps:
creating a data type identification model by adopting a convolutional neural network;
acquiring a plurality of pieces of pre-marked field-type label data;
inputting a plurality of field-type label data into a data type identification model for training, and outputting a loss value;
when the loss value reaches the minimum, a pre-trained data type recognition model is generated.
Optionally, desensitizing data of each sensitive field in the sensitive field set to generate desensitized data, including:
obtaining a desensitization rule corresponding to each sensitive field according to the type information of each sensitive field in the sensitive field set;
desensitizing the data of the corresponding sensitive field according to the desensitizing rule corresponding to each sensitive field to obtain desensitized data;
performing desensitized replication on the desensitization data to generate the desensitized data.
Optionally, after the desensitized data is generated by desensitized replication of the desensitization data, the method further includes:
deleting the sensitive field set and the desensitization data.
In a second aspect, an embodiment of the present application provides a data desensitization apparatus, which is applied to a server, and includes:
the field acquisition module is used for acquiring a plurality of fields of a target data text when the target data text to be distributed is received;
the field identification module is used for identifying sensitive fields and non-sensitive fields in a plurality of fields and generating a sensitive field set and a non-sensitive field set;
the field desensitization module is used for desensitizing data of each sensitive field in the sensitive field set to generate desensitized data;
and the data distribution module is used for combining the desensitized data with the data of the non-sensitive field and distributing the combined data to the client.
In a third aspect, embodiments of the present application provide a computer storage medium having stored thereon a plurality of instructions adapted to be loaded by a processor and to perform the above-mentioned method steps.
In a fourth aspect, an embodiment of the present application provides a terminal, which may include: a processor and a memory; wherein the memory stores a computer program adapted to be loaded by the processor and to perform the above-mentioned method steps.
The technical scheme provided by the embodiment of the application can have the following beneficial effects:
in the embodiment of the application, a data desensitization device firstly acquires a plurality of fields of a target data text when receiving the target data text to be distributed, then identifies sensitive fields and non-sensitive fields in the plurality of fields to generate a sensitive field set and a non-sensitive field set, secondly desensitizes data of each sensitive field in the sensitive field set to generate desensitized data, and finally combines the desensitized data with the data of the non-sensitive fields and distributes the combined data to a client. According to the method and the device, the sensitive field existing in the target data text is automatically identified at the server side for automatic desensitization treatment, so that automatic identification, desensitization and distribution of data can be realized, the risk of data leakage caused by data circulation is reduced, and the efficiency of data distribution is improved.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention, as claimed.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and together with the description, serve to explain the principles of the invention.
FIG. 1 is a schematic flow chart of a data desensitization method according to an embodiment of the present application;
FIG. 2 is a schematic flow chart of a data desensitization process provided herein;
FIG. 3 is a schematic flow chart of another data desensitization method provided by embodiments of the present application;
FIG. 4 is a schematic structural diagram of a data desensitization apparatus according to an embodiment of the present application;
FIG. 5 is a schematic structural diagram of a terminal according to an embodiment of the present application.
Detailed Description
The following description and the drawings sufficiently illustrate specific embodiments of the invention to enable those skilled in the art to practice them.
It should be understood that the described embodiments are only some embodiments of the invention, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present invention. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the invention, as detailed in the appended claims.
In the description of the present invention, it is to be understood that the terms "first," "second," and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance. The specific meanings of the above terms in the present invention can be understood in specific cases to those skilled in the art. In addition, in the description of the present invention, "a plurality" means two or more unless otherwise specified. "and/or" describes the association relationship of the associated objects, meaning that there may be three relationships, e.g., a and/or B, which may mean: a exists alone, A and B exist simultaneously, and B exists alone. The character "/" generally indicates that the former and latter associated objects are in an "or" relationship.
The application provides a data desensitization method, a data desensitization device, a storage medium and a terminal, which are applied to a server to solve the problems in the related art. In the technical scheme provided by the application, the sensitive fields in the target data text are automatically identified and desensitized at the server side, so that automatic identification, desensitization and distribution of data are realized, the risk of data leakage caused by data circulation is reduced, and the efficiency of data distribution is improved. This is described in detail below through exemplary embodiments.
The data desensitization method provided by the embodiments of the present application will be described in detail below with reference to FIGS. 1-3. The method may be implemented by a computer program and may run on a data desensitization device based on the von Neumann architecture. The computer program may be integrated into an application or may run as a separate tool-like application.
Referring to fig. 1, a flow diagram of a data desensitization method is provided for the embodiment of the present application, and is applied to a server. As shown in fig. 1, the method of the embodiment of the present application may include the following steps:
S101, when a target data text to be distributed is received, acquiring a plurality of fields of the target data text;
In general, a data issuer is an organization or an individual terminal that intends to share data with another party and has database access authority.
In a possible implementation manner, when multiple fields of a target data text are obtained, the target data text is firstly analyzed according to a preset key value rule, the analyzed data text is generated, then a key value pair set is obtained through traversal one by one in the analyzed data text, then each key value pair in the key value pair set is divided into attribute characteristics and value characteristics, the attribute characteristics and the value characteristics of each key value pair are obtained, and finally the multiple fields of the target data text are determined based on the attribute characteristics and the value characteristics of each key value pair.
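The parsing steps above can be sketched as follows. This is a minimal sketch under the assumption that the target data text is JSON; the application only speaks of "a preset key value rule", so the JSON format and the function names `extract_key_value_pairs` and `split_features` are illustrative assumptions, not part of the source.

```python
import json

def extract_key_value_pairs(text):
    """Parse the text (assumed to be JSON) and traverse it into a flat list of (key, value) pairs."""
    parsed = json.loads(text)
    pairs = []

    def walk(node, prefix=""):
        if isinstance(node, dict):
            for key, value in node.items():
                walk(value, f"{prefix}.{key}" if prefix else key)
        elif isinstance(node, list):
            for item in node:
                walk(item, prefix)
        else:
            pairs.append((prefix, node))

    walk(parsed)
    return pairs

def split_features(pairs):
    """Split each pair into an attribute feature (the key) and a value feature (the value as text)."""
    return [{"attribute": key, "value": str(value)} for key, value in pairs]

# Made-up sample record for illustration only.
text = '{"name": "Zhang San", "contact": {"phone": "13651300000"}, "age": 30}'
features = split_features(extract_key_value_pairs(text))
```

Nested keys are joined with a dot here so that each attribute feature stays unique; any other naming rule would serve equally well.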
Specifically, when determining a plurality of fields of a target data text based on the attribute features and the value features of each key value pair, firstly, feature splicing is carried out on the attribute features and the value features of each key value pair to obtain a plurality of splicing features, then, each splicing feature is converted into a vector to obtain a vector set, then, each vector in the vector set is combined to generate a term matrix, then, the term matrix is subjected to singular value decomposition to obtain a plurality of semantic indexes, and finally, the fields corresponding to each semantic index in the plurality of semantic indexes are inquired according to a preset semantic field table to obtain a plurality of fields of the target data text.
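The splicing, vectorization and decomposition steps can be illustrated roughly as below. The sketch assumes character-bigram counts as the vector representation and uses power iteration to obtain only the leading singular triplet of the term matrix; both choices are assumptions made for a dependency-free example, and a full singular value decomposition from a numerical library would normally be used instead. How the resulting components map to entries of the preset semantic field table is not specified in the text, so that lookup is left out.

```python
import math
from collections import Counter

def bigram_counts(text):
    """Count character bigrams in one spliced 'attribute:value' string."""
    return Counter(text[i:i + 2] for i in range(len(text) - 1))

def build_term_matrix(spliced):
    """One row per spliced feature, one column per bigram in the shared vocabulary."""
    counts = [bigram_counts(s) for s in spliced]
    vocab = sorted(set().union(*counts))
    matrix = [[float(c[term]) for term in vocab] for c in counts]
    return matrix, vocab

def leading_singular_triplet(matrix, iterations=200):
    """Power iteration on A^T A; returns (sigma, u, v) for the largest singular value."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    v = [1.0 / math.sqrt(n_cols)] * n_cols
    for _ in range(iterations):
        av = [sum(a * x for a, x in zip(row, v)) for row in matrix]          # A v
        atav = [sum(matrix[i][j] * av[i] for i in range(n_rows))
                for j in range(n_cols)]                                      # A^T (A v)
        norm = math.sqrt(sum(x * x for x in atav))
        v = [x / norm for x in atav]
    av = [sum(a * x for a, x in zip(row, v)) for row in matrix]
    sigma = math.sqrt(sum(x * x for x in av))
    u = [x / sigma for x in av]
    return sigma, u, v

# Spliced features: attribute and value joined into one string (made-up samples).
spliced = ["name:Zhang San", "phone:13651300000"]
matrix, vocab = build_term_matrix(spliced)
sigma, u, v = leading_singular_triplet(matrix)
```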
S102, identifying sensitive fields and non-sensitive fields in the fields, and generating a sensitive field set and a non-sensitive field set;
wherein a sensitive field is a field requiring desensitization processing, which field relates to private data. The non-sensitive field is a field which is not required to be encrypted and can be directly provided to a third party.
In a possible implementation manner, when a sensitive field set and a non-sensitive field set are generated, a sensitive field type table set for a sensitive field is obtained, a data type of each field in a plurality of fields is analyzed, then mapping is performed one by one according to the sensitive field type table and the data type of each field to judge whether each field is a sensitive field or not, the sensitive field is marked as 1, the non-sensitive field is marked as 0, and finally, the field marked as 1 is counted and determined as the sensitive field set, the field marked as 0 is counted and determined as the non-sensitive field set.
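A minimal sketch of this mapping and marking step is given below. The contents of `SENSITIVE_TYPES` and the field and type names are hypothetical; the application does not list the entries of the sensitive field type table.

```python
# Hypothetical sensitive field type table: data types that count as sensitive.
SENSITIVE_TYPES = {"id_number", "phone", "bank_account"}

def partition_fields(field_types):
    """field_types maps field name -> recognized data type.

    Returns the 1/0 marks plus the sensitive and non-sensitive field sets.
    """
    marks = {name: 1 if dtype in SENSITIVE_TYPES else 0
             for name, dtype in field_types.items()}
    sensitive = [name for name, mark in marks.items() if mark == 1]
    non_sensitive = [name for name, mark in marks.items() if mark == 0]
    return marks, sensitive, non_sensitive

marks, sensitive, non_sensitive = partition_fields(
    {"id_card": "id_number", "mobile": "phone", "city": "text", "create_time": "datetime"}
)
```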
Specifically, when the data type of each of the plurality of fields is analyzed, each of the plurality of fields is input into a pre-trained data type recognition model, and the data type of each field is output.
Further, a pre-trained data type recognition model can be generated according to the following steps, firstly, a convolutional neural network is adopted to create the data type recognition model, then, a plurality of pieces of field-type label data which are marked in advance are obtained, then, the plurality of pieces of field-type label data are input into the data type recognition model to be trained, a loss value is output, and finally, when the loss value reaches the minimum value, the pre-trained data type recognition model is generated.
Further, when the loss value has not reached the minimum value, the pre-marked pieces of field-type label data continue to be input into the data type recognition model for training until the loss value no longer changes, at which point the pre-trained data type recognition model is generated.
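The stopping rule described above (train until the loss no longer changes) can be sketched independently of the network itself. In the example below a toy one-parameter linear model stands in for the convolutional neural network, which is an assumption made purely to keep the sketch self-contained; the learning rate, tolerance and function name are likewise hypothetical.

```python
def train_until_plateau(samples, lr=0.05, tol=1e-9, max_epochs=10_000):
    """Fit y = w * x by gradient descent; stop once the loss changes by less than tol."""
    w = 0.0
    prev_loss = float("inf")
    for epoch in range(max_epochs):
        # Mean squared error over the labeled samples.
        loss = sum((w * x - y) ** 2 for x, y in samples) / len(samples)
        if abs(prev_loss - loss) < tol:
            break  # the loss value can no longer be changed: training is finished
        grad = sum(2 * (w * x - y) * x for x, y in samples) / len(samples)
        w -= lr * grad
        prev_loss = loss
    return w, loss, epoch

w, loss, epochs = train_until_plateau([(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)])
```

With a real network the same loop shape applies, with the loss and the parameter update supplied by the training framework.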
S103, desensitizing the data of each sensitive field in the sensitive field set to generate desensitized data;
In the embodiment of the application, when desensitization processing is performed, a desensitization rule corresponding to each sensitive field is first obtained according to the type information of each sensitive field in the sensitive field set; desensitization processing is then performed on the data of the corresponding sensitive field according to that rule to obtain desensitization data; and finally a desensitized copy of the desensitization data is made to generate the desensitized data.
Further, after desensitized data is generated after desensitized data is copied, the sensitive field set and the desensitized data need to be deleted, so that the security of the sensitive data can be greatly increased.
It should be noted that, generally, in a production environment, sensitive data needs to be desensitized in real time, because sometimes, for reading the same sensitive data under different conditions, desensitization processing needs to be performed at different levels, for example: the desensitization schemes performed by different roles, different authorities, may differ.
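As an illustration of desensitizing the same value at different levels for different roles, consider the sketch below. The role names and the number of masked digits per role are hypothetical; the application does not define any particular role model.

```python
def mask_phone(phone, role):
    """Mask a trailing portion of the phone number; how much depends on the caller's role."""
    levels = {"admin": 0, "analyst": 4, "guest": 8}  # hypothetical role -> masked digit count
    hidden = levels.get(role, len(phone))            # unknown roles see no digits at all
    hidden = min(hidden, len(phone))
    return phone[:len(phone) - hidden] + "*" * hidden

admin_view = mask_phone("13651300000", "admin")
analyst_view = mask_phone("13651300000", "analyst")
guest_view = mask_phone("13651300000", "guest")
```

Defaulting unknown roles to full masking keeps the function on the safe side when the role table is incomplete.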
Specifically, the desensitization rule can be divided into an invalidation scheme, a random value scheme, a data replacement scheme, a symmetric encryption scheme, an average scheme, and an offset and rounding scheme.
In one possible implementation, the desensitization rule is invalidation. Invalidation desensitizes the data to be processed by truncating, encrypting or hiding the data value of the sensitive field, so that the sensitive data no longer has any utilization value. Hiding sensitive data in this way is simple, but has the disadvantage that the user cannot know the format of the original data and must be authorized to query it in order to obtain the complete information. For example, after the real digits are replaced with special characters, an identification number becomes "220724 a 3523".
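A minimal masking sketch for the invalidation scheme follows. The choice of keeping six leading and four trailing characters, the asterisk as mask character and the function name `mask_middle` are assumptions; the sample identification number is made up.

```python
def mask_middle(value, keep_head=6, keep_tail=4, mask_char="*"):
    """Hide the middle of a value, keeping only a short head and tail visible."""
    if len(value) <= keep_head + keep_tail:
        # Too short to keep anything safely: hide the whole value.
        return mask_char * len(value)
    hidden = len(value) - keep_head - keep_tail
    return value[:keep_head] + mask_char * hidden + value[len(value) - keep_tail:]

masked_id = mask_middle("220724000000003523")
```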
In another possible implementation, the desensitization rule is random value replacement, which alters the sensitive data by replacing letters with random letters, digits with random digits, and other characters with random characters.
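Random value replacement can be sketched as below. The sketch only randomizes ASCII letters and digits and leaves every other character (separators, non-ASCII text) unchanged, which is a simplifying assumption; the function name and the optional `rng` parameter are hypothetical.

```python
import random
import string

def randomize(value, rng=None):
    """Replace each letter with a random letter of the same case and each digit with a random digit."""
    rng = rng or random.Random()
    out = []
    for ch in value:
        if ch in string.digits:
            out.append(rng.choice(string.digits))
        elif ch in string.ascii_lowercase:
            out.append(rng.choice(string.ascii_lowercase))
        elif ch in string.ascii_uppercase:
            out.append(rng.choice(string.ascii_uppercase))
        else:
            out.append(ch)  # separators and other characters are kept as they are
    return "".join(out)

# A seeded generator makes the result repeatable for testing.
randomized = randomize("AB-12cd", random.Random(0))
```

Because the character class at each position is preserved, the output keeps the format of the original value.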
In another possible implementation, when the desensitization rule is data replacement, the data replacement is similar to the previous invalidation, except that instead of masking with special characters, the true values are replaced with a set virtual value. For example, we set the mobile phone number to "13651300000" uniformly.
In another possible implementation, the desensitization rule is symmetric encryption. Symmetric encryption is a special, reversible desensitization method: the sensitive data is encrypted with an encryption key and algorithm, the ciphertext keeps the same format as the original data in terms of logical rules, and the original data can be recovered by decrypting with the key. A symmetric cipher such as AES meets this requirement; SHA256, by contrast, is a one-way hash function and cannot be decrypted, so it is suitable only where recovery of the original data is not needed.
In another possible implementation, when the desensitization rule is an average method, the average method needs to calculate the average of numerical data, and then randomly distribute desensitized values around the average, so as to keep the sum of the data constant.
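One way to realize the average scheme is sketched below: random noise is drawn around the mean and then re-centered so that the total is preserved exactly. The `spread` parameter, the uniform noise distribution and the sample salary values are assumptions made for illustration.

```python
import random

def average_desensitize(values, spread=0.1, rng=None):
    """Replace each value with mean + noise, re-centering the noise so the sum is unchanged."""
    rng = rng or random.Random()
    mean = sum(values) / len(values)
    bound = spread * abs(mean)
    noise = [rng.uniform(-bound, bound) for _ in values]
    correction = sum(noise) / len(noise)   # subtracting this makes the noise sum to zero
    return [mean + n - correction for n in noise]

salaries = [5000.0, 7200.0, 6100.0, 8800.0]
desensitized_salaries = average_desensitize(salaries, rng=random.Random(42))
```

The individual values no longer correspond to any original record, while aggregate statistics such as the sum and the mean remain usable.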
In another possible implementation, the desensitization rule is offset and rounding. This scheme alters numerical data by a random shift followed by rounding, which preserves the approximate range of the data while maintaining data security. The result is closer to the real data than with the previous schemes, which is of greater value in big data analysis scenarios. For example, the value 2020-12-08 15:12:25 in the date field create_time becomes 2018-01-02 15:00:00.
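For a timestamp field, offset and rounding might look like the following sketch. The size of the shift window (`max_offset_days`) and the choice of rounding down to the hour are assumptions; the application does not state either.

```python
import random
from datetime import datetime, timedelta

def offset_and_round(ts, max_offset_days=30, rng=None):
    """Shift a timestamp by a random offset, then round it down to the whole hour."""
    rng = rng or random.Random()
    window = max_offset_days * 86400                       # window size in seconds
    shifted = ts + timedelta(seconds=rng.randint(-window, window))
    return shifted.replace(minute=0, second=0, microsecond=0)

original_time = datetime(2020, 12, 8, 15, 12, 25)
shifted_time = offset_and_round(original_time, rng=random.Random(1))
```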
And S104, combining the desensitized data with the data of the non-sensitive field, and distributing the combined data to the client.
In one possible implementation, after the desensitized target field is obtained, the desensitized data is combined with the data of the non-sensitive field, and the combined data is distributed to the client.
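The combination step can be sketched as follows, under the assumption that the record is rebuilt in its original field order and serialized as JSON before distribution; the function name `combine`, the field names and the sample values are hypothetical.

```python
import json

def combine(field_order, non_sensitive, desensitized):
    """Rebuild one record in its original field order from the two data sets."""
    merged = {}
    for name in field_order:
        if name in desensitized:
            merged[name] = desensitized[name]
        elif name in non_sensitive:
            merged[name] = non_sensitive[name]
        else:
            # Refuse to emit a record with a field that was never classified.
            raise KeyError(f"field {name!r} has neither desensitized nor non-sensitive data")
    return merged

merged = combine(
    ["name", "mobile", "city"],
    non_sensitive={"city": "Hangzhou"},
    desensitized={"name": "Z***", "mobile": "13651300000"},
)
payload = json.dumps(merged, ensure_ascii=False)  # what would be sent to the client
```

Raising an error for an unclassified field, rather than falling back to the original value, avoids leaking raw data through the merge.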
For example, FIG. 2 is a schematic flow chart of a data desensitization process provided in the present application. The issuing area user is a database user created in the database specifically for the data issuing party, so that the issued data is isolated from other data in the database. The delivered data (the target data text to be distributed) is the data that the issuing party intends to share, and may be sensitive or non-sensitive. The sensitive data identification device classifies the data according to its text characteristics, using the organization's internal data classification standard, judges from the resulting type whether the data is sensitive, and records and marks that type. The type information of the data is the basis both for judging whether the data is sensitive and for desensitizing it; it is configured separately and stored in a table containing information such as table, field and sensitive type. The desensitization copying device makes a desensitized copy of the data according to the data type information. In the desensitized copy, the data of the sensitive fields is fuzzified according to the organization's internal data classification standard, for example by replacement or SHA256 processing, so that the fuzzified data cannot be restored or distinguished. The receiving area user is a database user created in the database for the receiving party; only desensitized copies are stored there, for data isolation. The data receiving party is downstream of the data distribution and receives the data distributed from upstream.
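The SHA256 fuzzification mentioned for the desensitized copy can be sketched with the Python standard library. The plain digest follows the text; the keyed variant using HMAC is an addition not found in the source, shown because short, guessable values such as phone numbers can otherwise be recovered by hashing every candidate. The function names and the secret are hypothetical.

```python
import hashlib
import hmac

def sha256_hex(value):
    """Plain SHA-256 digest of a field value; one-way, so the original cannot be decrypted."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()

def keyed_sha256_hex(value, secret):
    """HMAC-SHA256 variant (an assumption, not from the source): needs the secret to reproduce."""
    return hmac.new(secret, value.encode("utf-8"), hashlib.sha256).hexdigest()

fuzzed_phone = sha256_hex("13651300000")
keyed_phone = keyed_sha256_hex("13651300000", b"hypothetical-secret")
```

Equal inputs always produce equal digests, so the fuzzified column can still be used for joins and equality checks in the receiving area.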
Referring to fig. 3, a schematic flow chart of a data desensitization method is provided for the embodiment of the present application, and is applied to a server. As shown in fig. 3, the method of the embodiment of the present application may include the following steps:
S201, analyzing a target data text according to a preset key value rule, and generating an analyzed data text;
S202, traversing the parsed data text one by one to obtain a key-value pair set;
S203, dividing each key-value pair in the key-value pair set into attribute characteristics and value characteristics to obtain the attribute characteristics and the value characteristics of each key-value pair;
S204, determining a plurality of fields of the target data text based on the attribute characteristics and the value characteristics of each key value pair;
S205, acquiring a sensitive field type table set for a sensitive field;
S206, analyzing the data type of each field in the plurality of fields;
S207, mapping one by one according to the sensitive field type table and the data type of each field to judge whether each field is a sensitive field, marking the sensitive field as 1 and marking the non-sensitive field as 0;
S208, counting the field marked as 1 and determining the field as a sensitive field set, and counting the field marked as 0 and determining the field as a non-sensitive field set;
S209, acquiring desensitization rules corresponding to each sensitive field according to the type information of each sensitive field in the sensitive field set;
S210, desensitizing the data of the corresponding sensitive fields according to desensitization rules corresponding to each sensitive field to obtain desensitization data, and desensitizing and copying the desensitization data to generate desensitized data;
and S211, combining the desensitized data with the data of the non-sensitive field, and distributing the combined data to the client.
The following are embodiments of the apparatus of the present invention that may be used to perform embodiments of the method of the present invention. For details which are not disclosed in the embodiments of the apparatus of the present invention, reference is made to the embodiments of the method of the present invention.
Referring to fig. 4, a schematic structural diagram of a data desensitization apparatus according to an exemplary embodiment of the present invention is shown; the apparatus is applied to a server. The data desensitization apparatus may be implemented as all or part of the terminal in software, hardware, or a combination of both. The apparatus 1 comprises a field acquisition module 10, a field identification module 20, a field desensitization module 30 and a data distribution module 40.
The field acquisition module 10 is configured to acquire a plurality of fields of a target data text when the target data text to be distributed is received;
the field identification module 20 is configured to identify the sensitive fields and non-sensitive fields in the plurality of fields, and generate a sensitive field set and a non-sensitive field set;
the field desensitization module 30 is configured to perform desensitization processing on the data of each sensitive field in the sensitive field set, and generate desensitized data;
and the data distribution module 40 is configured to combine the desensitized data with the data of the non-sensitive fields and distribute the combined data to the client.
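One way to picture the four-module division is as a small class whose members mirror modules 10 to 40. The sketch below is illustrative only: the class name, the `Record` alias, and the callable signatures are assumptions, and each module is left pluggable because the application itself notes that the function allocation may differ in practice.

```python
from dataclasses import dataclass
from typing import Callable

Record = dict[str, str]  # field name -> field data (assumed representation)


@dataclass
class DataDesensitizationDevice:
    """Hypothetical wiring of the four modules described for the apparatus."""

    acquire_fields: Callable[[str], Record]                          # module 10
    identify_fields: Callable[[Record], tuple[set[str], set[str]]]   # module 20
    desensitize: Callable[[Record, set[str]], Record]                # module 30
    distribute: Callable[[Record], None]                             # module 40

    def handle(self, target_text: str) -> Record:
        # Module 10: extract the fields of the received text.
        fields = self.acquire_fields(target_text)
        # Module 20: split field names into sensitive and non-sensitive sets.
        sensitive, non_sensitive = self.identify_fields(fields)
        # Module 30: desensitize only the sensitive fields.
        masked = self.desensitize(fields, sensitive)
        # Module 40: combine with the untouched fields and send to the client.
        combined = {**masked, **{name: fields[name] for name in non_sensitive}}
        self.distribute(combined)
        return combined
```

Any concrete parser, classifier, or masking routine can be supplied as the corresponding callable without changing `handle`.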
It should be noted that, when the data desensitization apparatus provided in the foregoing embodiment performs the data desensitization method, the division into the above functional modules is only an example. In practical applications, the above functions may be allocated to different functional modules as needed; that is, the internal structure of the apparatus may be divided into different functional modules to complete all or part of the functions described above. In addition, the data desensitization apparatus provided by the above embodiment and the data desensitization method embodiment belong to the same concept; details of its implementation are given in the method embodiment and are not repeated here.
The above-mentioned serial numbers of the embodiments of the present application are merely for description and do not represent the merits of the embodiments.
The present invention also provides a computer readable medium having stored thereon program instructions which, when executed by a processor, implement the data desensitization method provided by the various method embodiments described above.
The present invention also provides a computer program product containing instructions which, when run on a computer, cause the computer to perform the data desensitization methods of the various method embodiments described above.
Please refer to fig. 5, which provides a schematic structural diagram of a terminal according to an embodiment of the present application. As shown in fig. 5, terminal 1000 can include: at least one processor 1001, at least one network interface 1004, a user interface 1003, a memory 1005, and at least one communication bus 1002.
The communication bus 1002 is used to enable communication between these components.
The user interface 1003 may include a display screen (Display) and a camera (Camera); optionally, the user interface 1003 may also include a standard wired interface and a wireless interface.
The network interface 1004 may optionally include a standard wired interface and a wireless interface (e.g., a WI-FI interface).
The processor 1001 may include one or more processing cores. The processor 1001 connects the various parts of the terminal 1000 using various interfaces and lines, and performs the functions of the terminal 1000 and processes data by running or executing instructions, programs, code sets, or instruction sets stored in the memory 1005 and by calling data stored in the memory 1005. Optionally, the processor 1001 may be implemented in at least one hardware form of a Digital Signal Processor (DSP), a Field-Programmable Gate Array (FPGA), and a Programmable Logic Array (PLA). The processor 1001 may integrate one or more of a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), a modem, and the like. The CPU mainly handles the operating system, the user interface, application programs and the like; the GPU renders and draws the content to be displayed by the display screen; the modem handles wireless communications. It is understood that the modem may also not be integrated into the processor 1001 and may instead be implemented as a separate chip.
The memory 1005 may include a Random Access Memory (RAM) or a Read-Only Memory (ROM). Optionally, the memory 1005 includes a non-transitory computer-readable medium. The memory 1005 may be used to store an instruction, a program, code, a set of codes, or a set of instructions. The memory 1005 may include a program storage area and a data storage area, wherein the program storage area may store instructions for implementing an operating system, instructions for at least one function (such as a touch function, a sound playing function, an image playing function, etc.), instructions for implementing the various method embodiments described above, and the like; the data storage area may store the data referred to in the above method embodiments. The memory 1005 may optionally be at least one memory device located remotely from the processor 1001. As shown in fig. 5, the memory 1005, which is one type of computer storage medium, may include an operating system, a network communication module, a user interface module, and a data desensitization application.
In the terminal 1000 shown in fig. 5, the user interface 1003 is mainly used as an interface for providing input for a user, and acquiring data input by the user; and the processor 1001 may be configured to invoke a data desensitization application stored in the memory 1005 and specifically perform the following operations:
when a target data text to be distributed is received, acquiring a plurality of fields of the target data text;
identifying sensitive fields and non-sensitive fields in a plurality of fields, and generating a sensitive field set and a non-sensitive field set;
desensitizing data of each sensitive field in the sensitive field set to generate desensitized data;
and combining the desensitized data with the data of the non-sensitive field, and distributing the combined data to the client.
In one embodiment, when acquiring the plurality of fields of the target data text, the processor 1001 specifically performs the following operations:
parsing the target data text according to a preset key-value rule to generate a parsed data text;
traversing the parsed data text item by item to obtain a key-value pair set;
dividing each key-value pair in the key-value pair set into an attribute feature and a value feature to obtain the attribute feature and the value feature of each key-value pair;
determining a plurality of fields of the target data text based on the attribute feature and the value feature of each key-value pair.
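The parsing and splitting operations above can be illustrated with a short sketch. The application does not disclose the preset key-value rule, so the rule used here (pairs separated by `;`, key and value separated by `=`) is purely an assumption, as are the names `KeyValueRule`, `KeyValueFeatures`, `parse_text` and `split_pairs`.

```python
from typing import NamedTuple


class KeyValueRule(NamedTuple):
    """Assumed form of the preset key-value rule: two separator strings."""
    pair_sep: str = ";"
    kv_sep: str = "="


class KeyValueFeatures(NamedTuple):
    attribute: str  # the key side of a pair
    value: str      # the value side of a pair


def parse_text(text: str, rule: KeyValueRule = KeyValueRule()) -> list[str]:
    """Parse the raw text into non-empty 'key<sep>value' segments."""
    return [seg.strip() for seg in text.split(rule.pair_sep) if seg.strip()]


def split_pairs(
    segments: list[str], rule: KeyValueRule = KeyValueRule()
) -> list[KeyValueFeatures]:
    """Traverse the segments and split each into attribute and value features."""
    pairs = []
    for seg in segments:
        key, sep, value = seg.partition(rule.kv_sep)
        if not sep:
            continue  # segment does not match the rule; skipped in this sketch
        pairs.append(KeyValueFeatures(key.strip(), value.strip()))
    return pairs
```

For example, `split_pairs(parse_text("name=Zhang San; phone=13800000000"))` yields two pairs whose attribute features are `name` and `phone`.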
In one embodiment, when determining the plurality of fields of the target data text based on the attribute feature and the value feature of each key-value pair, the processor 1001 specifically performs the following operations:
splicing the attribute feature and the value feature of each key-value pair to obtain a plurality of spliced features;
converting each spliced feature into a vector to obtain a vector set;
combining the vectors in the vector set to generate a term matrix;
performing singular value decomposition on the term matrix to obtain a plurality of semantic indexes;
and querying the field corresponding to each semantic index in the plurality of semantic indexes according to a preset semantic field table to obtain the plurality of fields of the target data text.
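The splice, vectorize, decompose and look-up chain can be sketched with NumPy. Several parts are assumptions rather than disclosed details: the character-bigram hashing used to turn a spliced feature into a vector, the choice of taking the dominant latent dimension as the "semantic index", and the contents of the semantic field table. Only the use of singular value decomposition on the stacked matrix comes from the text.

```python
import zlib

import numpy as np


def splice(attribute: str, value: str) -> str:
    """Splice the attribute and value features into one string."""
    return f"{attribute}:{value}"


def to_vector(text: str, dim: int = 32) -> np.ndarray:
    """Hash character bigrams into a fixed-length count vector (assumed encoding)."""
    vec = np.zeros(dim)
    for i in range(len(text) - 1):
        bucket = zlib.crc32(text[i:i + 2].encode("utf-8")) % dim
        vec[bucket] += 1.0
    return vec


def semantic_indices(pairs, k: int = 2, dim: int = 32) -> list[int]:
    """Return, per key-value pair, the index of its dominant latent dimension."""
    # Term matrix: one row per spliced feature.
    matrix = np.stack([to_vector(splice(a, v), dim) for a, v in pairs])
    u, s, _vt = np.linalg.svd(matrix, full_matrices=False)
    k = min(k, len(s))
    latent = u[:, :k] * s[:k]  # rows projected onto the top-k singular directions
    # Assumption: the semantic index is the strongest latent dimension of each row.
    return [int(i) for i in np.argmax(np.abs(latent), axis=1)]


def lookup_fields(indices: list[int], semantic_field_table: dict[int, str]) -> list[str]:
    """Map semantic indexes to field names via a (hypothetical) preset table."""
    return [semantic_field_table.get(i, "unknown") for i in indices]
```

In a real deployment the semantic field table would have to be built against the same decomposition, since singular vectors change whenever the term matrix changes; the sketch does not address that.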
In one embodiment, when identifying the sensitive field and the non-sensitive field in the plurality of fields and generating the sensitive field set and the non-sensitive field set, the processor 1001 specifically performs the following operations:
acquiring a sensitive field type table preset for sensitive fields;
analyzing the data type of each of the plurality of fields;
mapping the data type of each field against the sensitive field type table one by one to judge whether each field is a sensitive field, marking each sensitive field as 1 and each non-sensitive field as 0;
collecting the fields marked as 1 into a sensitive field set;
and collecting the fields marked as 0 into a non-sensitive field set.
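The marking and partitioning operations amount to a table lookup followed by a split on the 1/0 mark. The sketch below assumes the data type of each field is already known; the table contents and the names `SENSITIVE_FIELD_TYPES`, `mark_fields` and `partition_fields` are hypothetical.

```python
# Hypothetical sensitive field type table; the real table is not disclosed.
SENSITIVE_FIELD_TYPES = frozenset({"phone_number", "id_card", "email", "bank_card"})


def mark_fields(
    field_types: dict[str, str],
    sensitive_types: frozenset[str] = SENSITIVE_FIELD_TYPES,
) -> dict[str, int]:
    """Mark each field 1 if its data type is in the table, otherwise 0."""
    return {
        name: 1 if dtype in sensitive_types else 0
        for name, dtype in field_types.items()
    }


def partition_fields(marks: dict[str, int]) -> tuple[list[str], list[str]]:
    """Collect fields marked 1 and fields marked 0 into two separate sets."""
    sensitive = [name for name, mark in marks.items() if mark == 1]
    non_sensitive = [name for name, mark in marks.items() if mark == 0]
    return sensitive, non_sensitive
```

Keeping the intermediate 1/0 marks, rather than partitioning directly, mirrors the two-stage wording of the text and makes the decision for each field easy to audit.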
In one embodiment, the processor 1001 performs the following operations when performing the analysis of the data type of each of the plurality of fields:
inputting each field in the plurality of fields into a pre-trained data type recognition model, and outputting the data type of each field;
wherein the pre-trained data type recognition model is generated by the following steps:
creating a data type recognition model by adopting a convolutional neural network;
acquiring a plurality of pieces of pre-labeled field-type label data;
inputting the plurality of pieces of field-type label data into the data type recognition model for training, and outputting a loss value;
and generating the pre-trained data type recognition model when the loss value reaches its minimum.
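The text does not say how "the loss value reaches the minimum" is detected. One common reading is early stopping: training ends once the loss has failed to improve for a number of consecutive epochs. The sketch below shows only that control logic and is model-agnostic; the convolutional network itself is not shown, and `train_one_epoch`, `patience` and `min_delta` are hypothetical names introduced for illustration.

```python
from typing import Callable


def train_until_loss_minimum(
    train_one_epoch: Callable[[], float],
    max_epochs: int = 100,
    patience: int = 3,
    min_delta: float = 1e-4,
) -> tuple[int, float]:
    """Run epochs until the loss stops improving; return (best_epoch, best_loss).

    `train_one_epoch` is a hypothetical callable that trains the model on the
    labeled field-type data for one epoch and returns the resulting loss.
    """
    best_loss = float("inf")
    best_epoch = -1
    stale = 0  # consecutive epochs without a meaningful improvement
    for epoch in range(max_epochs):
        loss = train_one_epoch()
        if loss < best_loss - min_delta:
            best_loss, best_epoch, stale = loss, epoch, 0
        else:
            stale += 1
            if stale >= patience:
                break  # treat the best loss seen so far as the minimum
    return best_epoch, best_loss
```

In practice the model weights from `best_epoch` would be the ones kept as the pre-trained model; checkpointing is omitted here for brevity.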
In an embodiment, when performing desensitization processing on data in each sensitive field in the sensitive field set and generating desensitized data, the processor 1001 specifically performs the following operations:
obtaining a desensitization rule corresponding to each sensitive field according to the type information of each sensitive field in the sensitive field set;
desensitizing the data of the corresponding sensitive field according to the desensitization rule corresponding to each sensitive field to obtain desensitization data;
and performing desensitized copying on the desensitization data to generate the desensitized data.
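A rule-per-type lookup followed by a copy can be sketched as below. The concrete masking rules (keep the first 3 and last 4 characters of a phone number, and so on) are assumptions chosen for illustration, as is the interpretation of the copying step: the intermediate result is deep-copied and then cleared, so that only the copy is passed on for distribution.

```python
import copy
from typing import Callable


def mask_middle(value: str, keep_head: int, keep_tail: int) -> str:
    """Replace everything except the first and last few characters with '*'."""
    if len(value) <= keep_head + keep_tail:
        return "*" * len(value)
    hidden = len(value) - keep_head - keep_tail
    return value[:keep_head] + "*" * hidden + value[len(value) - keep_tail:]


def mask_email(value: str) -> str:
    """Keep the first character of the local part and the full domain."""
    local, at, domain = value.partition("@")
    if not at:
        return "*" * len(value)
    return local[:1] + "*" * max(len(local) - 1, 0) + at + domain


# Hypothetical desensitization rules keyed by sensitive field type.
DESENSITIZATION_RULES: dict[str, Callable[[str], str]] = {
    "phone_number": lambda v: mask_middle(v, 3, 4),
    "id_card": lambda v: mask_middle(v, 6, 4),
    "email": mask_email,
}


def desensitize_fields(data: dict[str, str], sensitive_types: dict[str, str]) -> dict[str, str]:
    """Apply each field's rule, then return a copy of the masked result."""
    working = {}
    for name, dtype in sensitive_types.items():
        # Unknown types fall back to full masking (a conservative assumption).
        rule = DESENSITIZATION_RULES.get(dtype, lambda v: "*" * len(v))
        working[name] = rule(data[name])
    result = copy.deepcopy(working)  # the copy that goes on to distribution
    working.clear()                  # drop the intermediate data afterwards
    return result
```

Falling back to full masking for an unrecognized type errs on the side of leaking nothing when the rule table is incomplete.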
It will be understood by those skilled in the art that all or part of the processes of the methods of the above embodiments may be implemented by a computer program instructing the associated hardware. The data desensitization program may be stored in a computer-readable storage medium and, when executed, may include the processes of the method embodiments described above. The storage medium may be a magnetic disk, an optical disk, a read-only memory or a random access memory.
The above disclosure describes only preferred embodiments of the present application and is not to be construed as limiting its scope; equivalent variations and modifications made in accordance with the claims of the present application therefore still fall within the scope of the present application.

Claims (10)

1. A data desensitization method, applied to a server, the method comprising:
when a target data text to be distributed is received, acquiring a plurality of fields of the target data text;
identifying sensitive fields and non-sensitive fields in the fields, and generating a sensitive field set and a non-sensitive field set;
desensitizing the data of each sensitive field in the sensitive field set to generate desensitized data;
and combining the desensitized data with the data of the non-sensitive field, and distributing the combined data to a client.
2. The method of claim 1, wherein obtaining the plurality of fields of the target data text comprises:
parsing the target data text according to a preset key-value rule to generate a parsed data text;
traversing the parsed data text one by one to obtain a key-value pair set;
dividing each key-value pair in the key-value pair set into attribute characteristics and value characteristics to obtain the attribute characteristics and the value characteristics of each key-value pair;
determining a plurality of fields of the target data text based on the attribute features and the value features of each key-value pair.
3. The method of claim 2, wherein determining the plurality of fields of the target data text based on the attribute features and the value features of each key-value pair comprises:
splicing the attribute feature and the value feature of each key-value pair to obtain a plurality of spliced features;
converting each spliced feature into a vector to obtain a vector set;
combining the vectors in the vector set to generate a term matrix;
performing singular value decomposition on the term matrix to obtain a plurality of semantic indexes;
and querying the field corresponding to each semantic index in the plurality of semantic indexes according to a preset semantic field table to obtain the plurality of fields of the target data text.
4. The method of claim 1, wherein identifying sensitive fields and non-sensitive fields in the plurality of fields, generating a set of sensitive fields and a set of non-sensitive fields comprises:
acquiring a sensitive field type table preset for sensitive fields;
analyzing the data type of each of the plurality of fields;
mapping the data type of each field against the sensitive field type table one by one to judge whether each field is a sensitive field, marking each sensitive field as 1 and each non-sensitive field as 0;
collecting the fields marked as 1 into a sensitive field set;
and collecting the fields marked as 0 into a non-sensitive field set.
5. The method of claim 4, wherein analyzing the data type of each of the plurality of fields comprises:
inputting each field in the plurality of fields into a pre-trained data type recognition model, and outputting the data type of each field;
wherein the pre-trained data type recognition model is generated by the following steps:
creating a data type recognition model by adopting a convolutional neural network;
acquiring a plurality of pieces of pre-labeled field-type label data;
inputting the plurality of pieces of field-type label data into the data type recognition model for training, and outputting a loss value;
and generating the pre-trained data type recognition model when the loss value reaches its minimum.
6. The method according to claim 1, wherein the desensitizing the data of each sensitive field in the set of sensitive fields to generate desensitized data comprises:
obtaining a desensitization rule corresponding to each sensitive field according to the type information of each sensitive field in the sensitive field set;
desensitizing the data of the corresponding sensitive field according to the desensitization rule corresponding to each sensitive field to obtain desensitization data;
and performing desensitized copying on the desensitization data to generate the desensitized data.
7. The method of claim 6, further comprising, after performing desensitized copying on the desensitization data to generate the desensitized data:
deleting the sensitive field set and the desensitization data.
8. A data desensitization apparatus, applied to a server, the apparatus comprising:
the field acquisition module is used for acquiring a plurality of fields of a target data text when the target data text to be distributed is received;
the field identification module is used for identifying sensitive fields and non-sensitive fields in the fields and generating a sensitive field set and a non-sensitive field set;
the field desensitization module is used for desensitizing data of each sensitive field in the sensitive field set to generate desensitized data;
and the data distribution module is used for combining the desensitized data and the data of the non-sensitive field and distributing the combined data to the client.
9. A computer storage medium, characterized in that it stores a plurality of instructions adapted to be loaded by a processor and to perform the method steps according to any of claims 1-7.
10. A terminal, comprising: a processor and a memory; wherein the memory stores a computer program adapted to be loaded by the processor and to perform the method steps of any of claims 1-7.
CN202111525829.XA 2021-12-14 2021-12-14 Data desensitization method, device, storage medium and terminal Pending CN114398665A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111525829.XA CN114398665A (en) 2021-12-14 2021-12-14 Data desensitization method, device, storage medium and terminal

Publications (1)

Publication Number Publication Date
CN114398665A true CN114398665A (en) 2022-04-26

Family

ID=81226329

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111525829.XA Pending CN114398665A (en) 2021-12-14 2021-12-14 Data desensitization method, device, storage medium and terminal

Country Status (1)

Country Link
CN (1) CN114398665A (en)

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115174246A (en) * 2022-07-18 2022-10-11 中国银行股份有限公司 Information processing method and system
CN115174246B (en) * 2022-07-18 2024-03-19 中国银行股份有限公司 Information processing method and system
CN115514564A (en) * 2022-09-22 2022-12-23 窦彦彬 Data security processing method and system based on data sharing
CN115618398A (en) * 2022-12-20 2023-01-17 吉林省信息技术研究所 System and method for encrypting user information of network database
CN116070205A (en) * 2023-03-07 2023-05-05 北京和升达信息安全技术有限公司 Data clearing method and device, electronic equipment and storage medium
CN116361858A (en) * 2023-04-10 2023-06-30 广西南宁玺北科技有限公司 User session resource data protection method and software product applying AI decision
CN116361858B (en) * 2023-04-10 2024-01-26 北京无限自在文化传媒股份有限公司 User session resource data protection method and software product applying AI decision
CN117094033A (en) * 2023-10-19 2023-11-21 南京怡晟安全技术研究院有限公司 Security destruction evaluation system and method based on key data sensitivity
CN117094033B (en) * 2023-10-19 2024-01-09 南京怡晟安全技术研究院有限公司 Security destruction evaluation system and method based on key data sensitivity

Similar Documents

Publication Publication Date Title
CN114398665A (en) Data desensitization method, device, storage medium and terminal
CN109815742B (en) Data desensitization method and device
KR102430649B1 (en) Computer-implemented system and method for automatically identifying attributes for anonymization
CN110598442A (en) Sensitive data self-adaptive desensitization method and system
CN116506217B (en) Analysis method, system, storage medium and terminal for security risk of service data stream
CN111079174A (en) Power consumption data desensitization method and system based on anonymization and differential privacy technology
US20230410177A1 (en) Distributed database structures for anonymous information exchange
CN115168887B (en) Mobile terminal stealth processing method and device based on differential authority privacy protection
CN112417492A (en) Service providing method based on data classification and classification
CN112598513B (en) Method and device for identifying stockholder risk transaction behaviors
CN113111369B (en) Data protection method and system in data annotation
CN111639179B (en) Batch customer information privacy control method and device for bank front-end query system
CN111402120A (en) Method and device for processing annotated image
CN114186275A (en) Privacy protection method and device, computer equipment and storage medium
CN111782719B (en) Data processing method and device
CN114818000B (en) Privacy protection set confusion intersection method, system and related equipment
CN110197078B (en) Data processing method and device, computer readable medium and electronic equipment
CN110879808A (en) Information processing method and device
GB2600823A (en) Masking sensitive information in a document
CN115758435A (en) External sharing security processing method for company marketing data and related equipment
CN112765673A (en) Sensitive data statistical method and related device
CN112651039A (en) Electric power data differentiation desensitization method and device fusing service scenes
CN112000727B (en) Desensitization display method for dynamically configured service data
WO2023031938A1 (en) System and method for managing data access requests
CN114237798A (en) Data processing method, device, server and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination