CN115310127A - Data desensitization method and device - Google Patents


Info

Publication number
CN115310127A
Authority
CN
China
Prior art keywords
desensitization
data
external table
warehouse tool
data warehouse
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210947754.2A
Other languages
Chinese (zh)
Inventor
秦胜勇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Bank of China Ltd
Original Assignee
Bank of China Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Bank of China Ltd
Priority to CN202210947754.2A
Publication of CN115310127A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6245 Protecting personal data, e.g. for financial or medical purposes
    • G06F21/6254 Protecting personal data, e.g. for financial or medical purposes by anonymising data, e.g. decorrelating personal data from the owner's identification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/17 Details of further file system functions
    • G06F16/172 Caching, prefetching or hoarding of files
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/17 Details of further file system functions
    • G06F16/1737 Details of further file system functions for reducing power consumption or coping with limited storage space, e.g. in mobile devices
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/18 File system types
    • G06F16/182 Distributed file systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Bioethics (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Computer Hardware Design (AREA)
  • Computer Security & Cryptography (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application discloses a data desensitization method and device, applicable to the technical field of data processing or the financial field. In the method, data to be desensitized is first obtained from a source heterogeneous database. The data is then converted into a first memory data stream by using a preset rule and written into a distributed file of an external table of a source data warehouse tool. The external table of the source data warehouse tool is then desensitized by using a preset desensitization rule to obtain a corresponding desensitization result. Finally, the desensitization result is written into a distributed file of an external table of a target data warehouse tool. Because the data to be desensitized is converted directly into the first memory data stream, no intermediate data is written to a local disk and the database system does not need to access disk files frequently, which reduces disk I/O overhead.

Description

Data desensitization method and device
Technical Field
The present application relates to the field of data processing technologies, and in particular, to a data desensitization method and apparatus.
Background
Data desensitization refers to transforming sensitive information according to desensitization rules so that sensitive private data is reliably protected. Where customer security data or commercially sensitive data is involved, the real data is modified, without violating system rules, before it is provided for test use.
In the prior art, desensitizing data of a heterogeneous database requires exporting the data to a file, uploading the file to the HDFS distributed file system, executing a database command to import the data into an external table of the Hive data warehouse tool, and finally desensitizing the data in that external table.
After the data to be desensitized is exported as a local file, the file is stored on a local disk and must be read back from that disk when it is uploaded to the HDFS distributed file system. The Hive database system therefore accesses disk files frequently, the disk I/O overhead is high, and the disk is prone to damage.
Disclosure of Invention
In view of this, embodiments of the present application provide a data desensitization method and apparatus that convert the data to be desensitized into a first memory data stream, so that the data does not need to be exported to a local file, thereby reducing disk I/O overhead.
In a first aspect, an embodiment of the present application provides a data desensitization method, including:
acquiring data to be desensitized in a source heterogeneous database;
converting the data to be desensitized into a first memory data stream by using a preset rule, and writing the data to be desensitized into a distributed file of an external table of a source data warehouse tool;
desensitizing the external table of the source data warehouse tool by using a preset desensitization rule to obtain a desensitization result corresponding to the external table of the source data warehouse tool;
and writing the desensitization result into a distributed file of an external table of a target data warehouse tool.
Preferably, writing the data to be desensitized into the distributed file of the external table of the source data warehouse tool, after the data has been converted into the first memory data stream, includes:
converting the first memory data stream into a corresponding text;
and writing the text into the distributed file of the external table of the source data warehouse tool through the system interface of the distributed file.
Preferably, the performing, by using a preset desensitization rule, desensitization processing on the source data warehouse tool external table to obtain a desensitization result corresponding to the source data warehouse tool external table includes:
generating a database desensitization statement by using a preset desensitization rule;
and desensitizing fields of the external table of the source data warehouse tool by using the database desensitization statement to obtain a desensitization result corresponding to the external table of the source data warehouse tool.
Preferably, after writing the desensitization result in the distributed file of the target data warehouse tool external table, the method further comprises:
obtaining a desensitization result in a distributed file of the target data warehouse tool external table;
converting the desensitization result into a second memory data stream;
and writing the second memory data stream into a target heterogeneous database.
In a second aspect, an embodiment of the present application provides a data desensitization apparatus, including:
the first acquisition module is used for acquiring data to be desensitized in the source heterogeneous database;
the first writing module is used for converting the data to be desensitized into a first memory data stream by using a preset rule and then writing the data into a distributed file of an external table of a source data warehouse tool;
the data desensitization module is used for performing desensitization treatment on the external table of the source data warehouse tool by using a preset desensitization rule to obtain a desensitization result corresponding to the external table of the source data warehouse tool;
and the second writing module is used for writing the desensitization result into a distributed file of an external table of a target data warehouse tool.
Preferably, the first writing module is specifically configured to convert the first memory data stream into a corresponding text;
and writing the text into the distributed file of the external table of the source data warehouse tool through the system interface of the distributed file.
Preferably, the data desensitization module is specifically configured to generate a database desensitization statement by using a preset desensitization rule;
and carrying out desensitization processing on the fields of the external table of the source data warehouse tool by using the database desensitization statement to obtain a desensitization result corresponding to the external table of the source data warehouse tool.
Preferably, for use after the desensitization result is written into the distributed file of the external table of the target data warehouse tool, the apparatus further comprises:
the second acquisition module is used for acquiring a desensitization result in the distributed file of the external table of the target data warehouse tool;
the first conversion module is used for converting the desensitization result into a second memory data stream;
and the third writing module is used for writing the second memory data stream into a target heterogeneous database.
In a third aspect, an embodiment of the present application provides an electronic device, where the electronic device includes:
a memory for storing one or more programs;
a processor; the one or more programs, when executed by the processor, implement the data desensitization method according to any implementation of the first aspect.
In a fourth aspect, an embodiment of the present application provides a computer storage medium in which a program is stored, and when the program is executed by a processor, the data desensitization method according to any implementation of the first aspect is implemented.
The technical scheme has the following beneficial effects:
The embodiments of the present application provide a data desensitization method and device. In the method, data to be desensitized is first obtained from a source heterogeneous database. The data is then converted into a first memory data stream by using a preset rule and written into a distributed file of an external table of a source data warehouse tool. The external table of the source data warehouse tool is then desensitized by using a preset desensitization rule to obtain a corresponding desensitization result. Finally, the desensitization result is written into a distributed file of an external table of a target data warehouse tool. Because the data to be desensitized is converted directly into the first memory data stream, no intermediate data is written to a local disk and the database system does not need to access disk files frequently, which reduces disk I/O overhead.
Drawings
To illustrate the technical solutions of the present embodiments or of the prior art more clearly, the drawings used in describing them are briefly introduced below. The drawings described below show only some embodiments of the present application, and those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a flowchart of a data desensitization method according to an embodiment of the present application;
FIG. 2 is a schematic structural diagram of a data desensitization apparatus according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, and not all, of the embodiments of the present application. All other embodiments that a person skilled in the art can derive from the embodiments given herein without creative effort fall within the protection scope of the present application.
In the prior art, desensitizing data of a heterogeneous database requires exporting the data to a file, uploading the file to the Hadoop Distributed File System (HDFS), executing a database command to import the data into an external table of the Hive data warehouse tool, and finally desensitizing the data in that external table.
After the data to be desensitized is exported as a local file, the file is stored on a local disk and must be read back from that disk when it is uploaded to the HDFS distributed file system. The Hive database system therefore accesses disk files frequently and the disk I/O overhead is high; in addition, frequent reads and writes shorten the service life of the disk, so the disk is prone to damage.
In order to overcome the technical problem described above, embodiments of the present application provide a data desensitization method, which may be executed by an operating system of a database.
It should be noted that the data desensitization method and apparatus provided by the present application can be used in the technical field of data processing or the financial field. The above is merely an example, and the application field of the data desensitization method and apparatus provided in the present application is not limited.
Terms used in the embodiments of the present application are explained as follows:
Hive: Hive is a data warehouse tool based on Hadoop. It is used for data extraction, transformation, and loading, and provides a mechanism for storing, querying, and analyzing large-scale data stored in Hadoop. The Hive data warehouse tool can map structured data files to database tables, provide an SQL query function, and convert SQL statements into MapReduce tasks for execution.
HDFS (Hadoop Distributed File System): HDFS is a distributed file system designed to run on general-purpose (commodity) hardware. HDFS provides high-throughput data access and is well suited to applications with large-scale datasets.
Heterogeneous databases: a heterogeneous database system is a collection of related database systems that supports sharing of, and transparent access to, data. Each member database system already exists before it joins the heterogeneous database system and has its own database management system. Each component is autonomous: while sharing data, each database system retains its own application characteristics, integrity control, and security control.
Referring to FIG. 1, which is a flowchart of a data desensitization method according to an embodiment of the present application, the method may include:
Step S101: Acquire the data to be desensitized in the source heterogeneous database.
Specifically, in this embodiment of the present application, the data to be desensitized in the source heterogeneous database may be obtained by an operating system of the database.
It should be noted that the embodiment of the present application does not limit the manner of obtaining the data to be desensitized in the source heterogeneous database. For example, the data may be obtained by connecting to the source heterogeneous database through the JDBC (Java Database Connectivity) database interface specification, or through an interface provided by the source heterogeneous database itself.
Once the data to be desensitized has been obtained from the source heterogeneous database, it can subsequently be converted into a first memory data stream by using a preset rule.
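As an illustration of step S101, the following minimal sketch reads rows from a source database as an in-memory stream instead of exporting them to a local file. It is not the applicant's implementation: Python's built-in `sqlite3` stands in for the source heterogeneous database (the text above mentions JDBC), and the function names, the `customer` table, and its sample rows are hypothetical.

```python
import sqlite3
from typing import Iterator, Tuple


def stream_rows(conn: sqlite3.Connection, query: str,
                batch_size: int = 1000) -> Iterator[Tuple]:
    """Yield rows one at a time, fetching them from the database in batches.

    Rows stay in memory only; nothing is exported to a local file.
    """
    cursor = conn.cursor()
    cursor.execute(query)
    while True:
        batch = cursor.fetchmany(batch_size)
        if not batch:
            break
        for row in batch:
            yield row


def make_demo_source() -> sqlite3.Connection:
    """Build a small in-memory stand-in for a source database (hypothetical data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customer (id TEXT, name TEXT, age INTEGER)")
    conn.executemany(
        "INSERT INTO customer VALUES (?, ?, ?)",
        [("111222", "Zhang San", 22), ("111333", "Li Si", 35)],
    )
    conn.commit()
    return conn
```

Because `stream_rows` is a generator, a caller can pass it directly to the next stage and only one batch of rows is held in memory at a time.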
Step S102: Convert the data to be desensitized into a first memory data stream by using a preset rule, and write it into a distributed file of an external table of a source data warehouse tool.
In the embodiment of the application, after the data to be desensitized in the source heterogeneous database is obtained in step S101, the database operating system may convert it into the first memory data stream by using a script. For example, the database operating system may write a corresponding script according to the data to be desensitized and use that script to perform the conversion.
It should be noted that, depending on how much memory is idle, the embodiment of the present application may convert all of the data to be desensitized into a first memory data stream at once. Alternatively, it may convert part of the data into a first memory data stream, write that stream into a distributed file of an external table of the source Hive data warehouse tool, and then convert the remaining data into a first memory data stream, thereby relieving memory pressure.
For example, assume that the data to be desensitized is 0.5 GB and 6 GB of memory is idle. The database operating system can write a corresponding script according to the data to be desensitized and use it to convert all of the data into a first memory data stream. Now assume that the data to be desensitized is 2 GB and 2 GB of memory is idle. The database operating system can write a corresponding script and use it to convert part of the data (for example, 1 GB) into a first memory data stream; after that stream has been written into a distributed file of an external table of the source Hive data warehouse tool, the remaining data is converted into a first memory data stream according to how much memory is idle, which reduces memory pressure.
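The batching decision in the example above can be sketched as follows. The policy used here, spending at most half of the idle memory on each batch, is an assumption chosen only so that the 2 GB case splits into 1 GB batches as in the example; the source does not specify a policy. The sketch also takes a single snapshot of idle memory, whereas the description re-checks idle memory before converting the remainder. The names `plan_batches`, `max_fraction`, and `GIB` are hypothetical.

```python
from typing import List

GIB = 1024 ** 3  # bytes in one gibibyte


def plan_batches(total_bytes: int, free_bytes: int,
                 max_fraction: float = 0.5) -> List[int]:
    """Split the data to be desensitized into batch sizes that fit in memory.

    Each batch uses at most `max_fraction` of the idle memory (an assumed
    policy). Returns the size in bytes of each batch, in order.
    """
    if total_bytes <= 0:
        return []
    budget = int(free_bytes * max_fraction)
    if budget <= 0:
        raise ValueError("no idle memory available for conversion")
    batches = []
    remaining = total_bytes
    while remaining > 0:
        size = min(remaining, budget)
        batches.append(size)
        remaining -= size
    return batches
```

With 0.5 GB of data and 6 GB idle, the plan is a single batch; with 2 GB of data and 2 GB idle, it is two batches of 1 GB each.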
Accordingly, after the data to be desensitized is converted into the first memory data stream, the database operating system may write the first memory data stream into the HDFS file of the external table of the source data warehouse tool.
It should be noted that writing the first memory data stream into the HDFS file of an external table of a source Hive data warehouse tool is only an example; the present application does not limit the source data warehouse tool to Hive.
It can be understood that, because the first memory data stream is written directly into the HDFS file of the external table of the source Hive data warehouse tool, no intermediate data is written to a local disk, which reduces disk I/O overhead. Furthermore, no database command needs to be executed to load a file into the Hive database, which saves system resources.
It should be noted that in the embodiment of the application, the data to be desensitized in the heterogeneous databases is transferred into the Hive database. This decouples the heterogeneous databases from the Hive database and hides the differences among the heterogeneous databases.
As a preferred embodiment, writing the data to be desensitized into the distributed file of the external table of the source data warehouse tool, after the data has been converted into the first memory data stream, may include: the database operating system converts the first memory data stream into a corresponding text, and writes the text into the distributed file of the external table of the source data warehouse tool through a system interface of the distributed file.
It can be understood that, because of the format requirements of the external table of the source data warehouse tool, the first memory data stream needs to be converted into a corresponding text, and the text is written into the HDFS file of the external table of the source data warehouse tool by calling a system interface of the HDFS file.
For example, if the table structure of an external table of the source data warehouse tool is 'ID, name, age, ethnicity', the first memory data stream can be converted into the corresponding text '111222, Zhang San, 22', and the text is written into the HDFS file of the external table of the source data warehouse tool by calling a system interface of the HDFS file.
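The conversion of the memory data stream into text can be sketched as below. This is an illustrative sketch only: the `sink` argument is any writable binary stream and stands in for the output stream that an HDFS client would return, since the source names no particular HDFS interface. The comma delimiter, UTF-8 encoding, and the function name are assumptions, and the sample record is hypothetical.

```python
import csv
import io
from typing import BinaryIO, Iterable, Sequence


def write_rows_as_text(rows: Iterable[Sequence], sink: BinaryIO,
                       delimiter: str = ",") -> int:
    """Serialize rows to delimited text lines and write them to `sink`.

    `sink` stands in for an HDFS output stream (assumption). Rows are
    encoded one at a time, so no local file is created along the way.
    Returns the number of rows written.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter=delimiter, lineterminator="\n")
    count = 0
    for row in rows:
        writer.writerow(row)
        sink.write(buffer.getvalue().encode("utf-8"))
        # Reset the scratch buffer so memory use stays bounded per row.
        buffer.seek(0)
        buffer.truncate(0)
        count += 1
    return count
```

The delimiter and any quoting have to match the format declared for the external table; values that contain the delimiter would need handling agreed with that table definition.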
Step S103: Desensitize the external table of the source data warehouse tool by using a preset desensitization rule to obtain a desensitization result corresponding to the external table of the source data warehouse tool.
In the embodiment of the application, the database system desensitizes the external table of the source data warehouse tool by using a preset desensitization rule, thereby obtaining the desensitization result corresponding to that external table.
It can be understood that in step S102 the first memory data stream is written directly into the distributed file of the external table of the source data warehouse tool. Once the writing is complete, the data is already in the database, so desensitization can be performed on it directly without any additional operation, which saves system resources.
As a preferred embodiment, desensitizing the external table of the source data warehouse tool by using a preset desensitization rule to obtain the corresponding desensitization result may include: first, the database operating system generates a database desensitization statement by using the preset desensitization rule; then, the fields of the external table of the source data warehouse tool are desensitized by using the database desensitization statement, thereby obtaining the desensitization result corresponding to the external table of the source data warehouse tool.
For example, the database operating system can formulate a corresponding desensitization rule according to the fields of an external table of the source data warehouse tool, generate a database desensitization statement from that rule, and use the statement to desensitize the fields of the external table, obtaining the desensitization result corresponding to the external table of the source data warehouse tool.
Step S104: Write the desensitization result into a distributed file of an external table of a target data warehouse tool.
In this embodiment of the application, after the desensitization result corresponding to the external table of the source data warehouse tool is obtained in step S103, the desensitization result may be written into a distributed file of the external table of the target data warehouse tool.
In this technical scheme, data to be desensitized is first obtained from a source heterogeneous database. The data is then converted into a first memory data stream by using a preset rule and written into a distributed file of an external table of a source data warehouse tool. The external table of the source data warehouse tool is then desensitized by using a preset desensitization rule to obtain a corresponding desensitization result. Finally, the desensitization result is written into a distributed file of an external table of a target data warehouse tool. Because the data to be desensitized is converted directly into the first memory data stream, no intermediate data is written to a local disk and the database system does not need to access disk files frequently, which reduces disk I/O overhead.
As a preferred embodiment, after the desensitization result is written into the distributed file of the external table of the target data warehouse tool, the method further comprises: obtaining the desensitization result from the distributed file of the external table of the target data warehouse tool; converting the desensitization result into a second memory data stream; and writing the second memory data stream into a target heterogeneous database.
Specifically, the database operating system first obtains the desensitization result from the distributed file of the external table of the target data warehouse tool, then converts the desensitization result into a second memory data stream, and finally writes the second memory data stream into the target heterogeneous database. No intermediate data is written to a local disk, which reduces disk I/O overhead. Moreover, because the desensitization result is read directly from the HDFS file of the external table of the target data warehouse tool, no database command needs to be executed to export the external table to a local file, which further reduces disk I/O overhead.
It should be noted that, in the embodiment of the present application, the desensitization result in the database of the target data warehouse tool is transferred to the target heterogeneous database. This decouples the database of the target data warehouse tool from the heterogeneous database and hides the differences among heterogeneous databases.
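The return path, reading the desensitization result and loading it into the target heterogeneous database through a second memory data stream, can be sketched as follows. As before, this is an assumption-laden illustration: `source` is any readable text stream standing in for an HDFS input stream, `sqlite3` stands in for the target heterogeneous database, and the function names and the `insert_sql` parameter are hypothetical.

```python
import io
import sqlite3
from typing import Iterable, Iterator, List, TextIO


def read_result_rows(source: TextIO, delimiter: str = ",") -> Iterator[List[str]]:
    """Parse delimited text lines into rows, lazily, as a memory data stream."""
    for line in source:
        line = line.rstrip("\n")
        if line:
            yield line.split(delimiter)


def load_into_target(conn: sqlite3.Connection, insert_sql: str,
                     rows: Iterable[List[str]], batch_size: int = 500) -> int:
    """Insert the streamed rows into the target database in batches.

    `insert_sql` is a parameterized INSERT statement supplied by the caller.
    Returns the number of rows inserted.
    """
    total = 0
    batch: List[tuple] = []
    for row in rows:
        batch.append(tuple(row))
        if len(batch) >= batch_size:
            conn.executemany(insert_sql, batch)
            total += len(batch)
            batch = []
    if batch:  # flush the final partial batch
        conn.executemany(insert_sql, batch)
        total += len(batch)
    conn.commit()
    return total
```

Both functions work on iterators, so the desensitized rows flow from the input stream into the target database without being exported to a local file in between.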
Some specific implementation manners of the data desensitization method are provided above for the embodiments of the present application, and based on this, the present application further provides a corresponding apparatus. The device provided by the embodiment of the present application will be described in terms of functional modularity.
Referring to FIG. 2, an exemplary structure of a data desensitization apparatus is shown, which includes a first acquisition module 100, a first write module 200, a data desensitization module 300, and a second write module 400.
a first acquisition module 100, configured to obtain data to be desensitized in a source heterogeneous database;
a first writing module 200, configured to convert the data to be desensitized into a first memory data stream by using a preset rule and then write it into a distributed file of an external table of a source data warehouse tool;
a data desensitization module 300, configured to desensitize the external table of the source data warehouse tool by using a preset desensitization rule to obtain a desensitization result corresponding to the external table of the source data warehouse tool;
a second writing module 400, configured to write the desensitization result into a distributed file of an external table of a target data warehouse tool.
Optionally, the first writing module is specifically configured to convert the first memory data stream into a corresponding text;
and writing the text into the distributed file of the external table of the source data warehouse tool through the system interface of the distributed file.
Optionally, the data desensitization module is specifically configured to generate a database desensitization statement by using a preset desensitization rule;
and carrying out desensitization processing on the fields of the external table of the source data warehouse tool by using the database desensitization statement to obtain a desensitization result corresponding to the external table of the source data warehouse tool.
Optionally, after writing the desensitization result to the distributed file of the target data warehouse tool external table, the apparatus further comprises:
the second acquisition module is used for acquiring a desensitization result in the distributed file of the external table of the target data warehouse tool;
the first conversion module is used for converting the desensitization result into a second memory data stream;
and the third writing module is used for writing the second memory data stream into a target heterogeneous database.
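The functional modules described above can be pictured as a simple composition of four callables. This is only a structural sketch under assumptions: the class name, field names, and the use of plain Python callables are hypothetical, and in a real deployment each callable would wrap the database, HDFS, and data warehouse operations of the corresponding module.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Sequence

Row = Sequence[str]


@dataclass
class DesensitizationDevice:
    """Hypothetical composition of the four modules of the apparatus."""
    acquire: Callable[[], Iterable[Row]]           # first acquisition module 100
    write_source: Callable[[Iterable[Row]], None]  # first writing module 200
    desensitize: Callable[[], Iterable[Row]]       # data desensitization module 300
    write_target: Callable[[Iterable[Row]], None]  # second writing module 400

    def run(self) -> None:
        """Run the modules in the order of steps S101 to S104."""
        self.write_source(self.acquire())
        self.write_target(self.desensitize())
```

Expressing the modules as interchangeable callables reflects the statement in the description that the division into units is only a logical one.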
The application discloses a data desensitization method and device. In the method, data to be desensitized is first obtained from a source heterogeneous database. The data is then converted into a first memory data stream by using a preset rule and written into a distributed file of an external table of a source data warehouse tool. The external table of the source data warehouse tool is then desensitized by using a preset desensitization rule to obtain a corresponding desensitization result. Finally, the desensitization result is written into a distributed file of an external table of a target data warehouse tool. Because the data to be desensitized is converted directly into the first memory data stream, no intermediate data is written to a local disk and the database system does not need to access disk files frequently, which reduces disk I/O overhead.
An embodiment of the present application further provides an electronic device, including: a memory for storing one or more programs;
a processor; the one or more programs, when executed by the processor, implement the data desensitization method of the above embodiments.
The embodiment of the application also provides a computer storage medium, wherein the computer storage medium stores a program, and when the program is executed by a processor, the data desensitization method in the embodiment is realized.
In the embodiments of the present application, the terms "first" and "second" (where present) are used only to distinguish names and do not indicate an order.
The embodiments in this description are described in a progressive manner. Each embodiment focuses on its differences from the other embodiments, and for identical or similar parts the embodiments may be referred to one another.
Those skilled in the art can understand that the flowchart shown in the figure is only one example in which the embodiments of the present application can be implemented, and the application scope of the embodiments of the present application is not limited in any aspect by the flowchart.
In the several embodiments provided in the present application, it should be understood that the disclosed method, apparatus, and device may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative: the division into units is only a logical division, and other divisions are possible in practice. For instance, a plurality of units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection between devices or units through communication interfaces, and may be electrical, mechanical, or in another form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one position, or may be distributed on multiple network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment. In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, or any other medium capable of storing program code.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. A method of data desensitization, the method comprising:
acquiring data to be desensitized in a source heterogeneous database;
converting the data to be desensitized into a first memory data stream by using a preset rule, and writing the data to be desensitized into a distributed file of an external table of a source data warehouse tool;
desensitizing the external table of the source data warehouse tool by using a preset desensitization rule to obtain a desensitization result corresponding to the external table of the source data warehouse tool;
and writing the desensitization result into a distributed file of an external table of the target data warehouse tool.
2. The method of claim 1, wherein writing to the distributed file in the external table of the source data warehouse tool after converting the data to be desensitized to the first in-memory data stream comprises:
converting the first memory data stream into a corresponding text;
and writing the text into the distributed file of the external table of the source data warehouse tool through the system interface of the distributed file.
3. The method according to claim 1, wherein the desensitization processing of the external table of the source data warehouse tool using a preset desensitization rule to obtain a desensitization result corresponding to the external table of the source data warehouse tool comprises:
generating a database desensitization statement by using a preset desensitization rule;
and desensitizing fields of the external table of the source data warehouse tool by using the database desensitization statement to obtain a desensitization result corresponding to the external table of the source data warehouse tool.
4. The method of claim 1, wherein after writing the desensitization result to the distributed file in the target data warehouse tool external table, the method further comprises:
obtaining a desensitization result in a distributed file of the target data warehouse tool external table;
converting the desensitization result into a second memory data stream;
and writing the second memory data stream into a target heterogeneous database.
5. A data desensitization apparatus, characterized in that the apparatus comprises:
the first acquisition module is used for acquiring data to be desensitized in the source heterogeneous database;
the first writing module is used for converting the data to be desensitized into a first memory data stream by using a preset rule and then writing the data into a distributed file of an external table of a source data warehouse tool;
the data desensitization module is used for performing desensitization processing on the source data warehouse tool external table by using a preset desensitization rule to obtain a desensitization result corresponding to the source data warehouse tool external table;
and the second writing module is used for writing the desensitization result into a distributed file of an external table of the target data warehouse tool.
6. The apparatus according to claim 5, wherein the first writing module is specifically configured to convert the first in-memory data stream into a corresponding text; and write the text into the distributed file of the external table of the source data warehouse tool through the system interface of the distributed file.
7. The apparatus according to claim 5, wherein the data desensitization module is specifically configured to generate a database desensitization statement using a preset desensitization rule; and desensitizing fields of the external table of the source data warehouse tool by using the database desensitization statement to obtain a desensitization result corresponding to the external table of the source data warehouse tool.
8. The apparatus of claim 5, wherein after writing the desensitization result to the distributed file in the target data warehouse tool external table, the apparatus further comprises:
the second acquisition module is used for acquiring a desensitization result in the distributed file of the external table of the target data warehouse tool;
the first conversion module is used for converting the desensitization result into a second memory data stream;
and the third writing module is used for writing the second memory data stream into a target heterogeneous database.
9. An electronic device, comprising:
a memory for storing one or more programs;
a processor; the one or more programs, when executed by the processor, implement the method of any of claims 1-4.
10. A storage medium, characterized in that the storage medium has stored thereon a program which, when executed by a processor, implements the method of any one of claims 1 to 4.
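
As a non-limiting illustration of generating a database desensitization statement from a preset desensitization rule (claims 3 and 7), a minimal sketch is given below. It assumes a Hive-style SQL dialect in which `sha2`, `concat`, and `substr` are available; the rule names, the templates, the table names, and the function `build_desensitization_statement` are hypothetical and are not taken from the application.

```python
import re
from typing import Dict

# Hypothetical rule names mapped to SQL expression templates (assumption).
RULE_TEMPLATES = {
    "keep": "{col}",                      # leave the field unchanged
    "hash": "sha2({col}, 256)",           # replace the value with a SHA-256 digest
    "mask_middle": "concat(substr({col}, 1, 3), '****', substr({col}, 8))",
}

_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def _check_identifier(name: str) -> str:
    """Reject names that are not plain identifiers before splicing them into SQL."""
    if not _IDENTIFIER.match(name):
        raise ValueError(f"invalid identifier: {name!r}")
    return name


def build_desensitization_statement(source_table: str, target_table: str,
                                    field_rules: Dict[str, str]) -> str:
    """Build one INSERT ... SELECT statement that applies a rule to each field."""
    select_items = []
    for column, rule in field_rules.items():
        expr = RULE_TEMPLATES[rule].format(col=_check_identifier(column))
        select_items.append(f"{expr} AS {column}")
    return (
        f"INSERT OVERWRITE TABLE {_check_identifier(target_table)} "
        f"SELECT {', '.join(select_items)} "
        f"FROM {_check_identifier(source_table)}"
    )


# Usage with made-up table and field names.
statement = build_desensitization_statement(
    "src_ext", "dst_ext", {"id": "keep", "phone": "mask_middle"}
)
print(statement)
```

Under these assumptions the generated statement reads from the source external table and writes the masked fields to the target external table in a single pass. A real deployment would choose rule templates that match the functions of the data warehouse tool actually in use.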
CN202210947754.2A 2022-08-05 2022-08-05 Data desensitization method and device Pending CN115310127A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210947754.2A CN115310127A (en) 2022-08-05 2022-08-05 Data desensitization method and device

Publications (1)

Publication Number Publication Date
CN115310127A (en) 2022-11-08

Family

ID=83861450

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210947754.2A Pending CN115310127A (en) 2022-08-05 2022-08-05 Data desensitization method and device

Country Status (1)

Country Link
CN (1) CN115310127A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117725623A (en) * 2024-02-18 2024-03-19 北京安华金和科技有限公司 Data desensitization processing method and system based on database bottom file
CN117725623B (en) * 2024-02-18 2024-05-17 北京安华金和科技有限公司 Data desensitization processing method and system based on database bottom file

Similar Documents

Publication Publication Date Title
US10169437B2 (en) Triplestore replicator
CN111324610A (en) Data synchronization method and device
CN111258966A (en) Data deduplication method, device, equipment and storage medium
CN108334609B (en) Method, device, equipment and storage medium for realizing JSON format data access in Oracle
CN106648569B (en) Target serialization realization method and device
CN113268500B (en) Service processing method and device and electronic equipment
CN110795697A (en) Logic expression obtaining method and device, storage medium and electronic device
CN110362630B (en) Data management method, device, equipment and computer readable storage medium
KR20220088958A (en) Systems and methods for managing connections in a scalable cluster
CN116561146A (en) Database log recording method, device, computer equipment and computer readable storage medium
CN115310127A (en) Data desensitization method and device
US11580251B1 (en) Query-based database redaction
CN110888972A (en) Sensitive content identification method and device based on Spark Streaming
CN111259038A (en) Database query and data export method, system, medium and equipment
CN113722296A (en) Agricultural information processing method and device, electronic equipment and storage medium
CN113703777A (en) Code generation method and device based on database table, storage medium and equipment
CN110704635B (en) Method and device for converting triplet data in knowledge graph
CN111046636A (en) Method and device for screening PDF file information, computer equipment and storage medium
US9286348B2 (en) Dynamic search system
US9201936B2 (en) Rapid provisioning of information for business analytics
CN115934537A (en) Interface test tool generation method, device, equipment, medium and product
CN117312420A (en) Data sharing method and related system
CN113792048B (en) Form verification rule generation method and system for non-relational database
EP1183596B1 (en) Generating optimized computer data field conversion routines
CN113157726B (en) Database processing method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination