CN111291403B - Data desensitizing device based on distributed cluster - Google Patents

Info

Publication number: CN111291403B
Authority: CN (China)
Prior art keywords: data, thread, server, threads, queue
Legal status: Active
Application number: CN202010042550.5A
Other languages: Chinese (zh)
Other versions: CN111291403A (en)
Inventors: 程永新, 宋辉, 郭振宇
Current assignee: Shanghai New Torch Network Information Technology Ltd By Share Ltd
Original assignee: Shanghai New Torch Network Information Technology Ltd By Share Ltd
Priority date: 2020-01-15
Filing date: 2020-01-15
Publication date: 2023-09-19

Events
2020-01-15  Application filed by Shanghai New Torch Network Information Technology Ltd By Share Ltd; priority to CN202010042550.5A
2020-06-16  Publication of CN111291403A
2023-09-19  Application granted; publication of CN111291403B

Classifications

    • G PHYSICS
      • G06 COMPUTING; CALCULATING OR COUNTING
        • G06F ELECTRIC DIGITAL DATA PROCESSING
          • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
            • G06F21/60 Protecting data
              • G06F21/62 Protecting access to data via a platform, e.g. using keys or access control rules
                • G06F21/6218 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
                  • G06F21/6245 Protecting personal data, e.g. for financial or medical purposes
          • G06F9/00 Arrangements for program control, e.g. control units
            • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
              • G06F9/46 Multiprogramming arrangements
                • G06F9/48 Program initiating; Program switching, e.g. by interrupt
                  • G06F9/4806 Task transfer initiation or dispatching
                    • G06F9/4843 Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
                      • G06F9/4881 Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
                • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
                  • G06F9/5005 Allocation of resources, e.g. of the central processing unit [CPU] to service a request
                    • G06F9/5027 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals

Abstract

The invention discloses a data desensitizing device based on a distributed cluster, comprising a master server, a thread master scheduler, and a plurality of slave servers, each slave server being provided with a thread scheduler that allocates that server's threads. The master server slices each source data table in the database that needs desensitization and places the slices into that table's slice queue. Through the thread master scheduler, the master server allocates a defined set of thread pipelines to each source data table; via the thread schedulers of the slave servers, the thread master scheduler dispatches threads that pull data from the slice queues, desensitize it, and load it into the target data table. Because the thread master scheduler coordinates the thread schedulers of the slave servers, threads are allocated dynamically and loading performance improves; the distributed master/slave cluster arrangement scales well; and slicing the table data enables high-speed data extraction.

Description

Data desensitizing device based on distributed cluster
Technical Field
The invention relates to data desensitizing devices, and in particular to a data desensitizing device based on a distributed cluster.
Background
Data desensitization (data masking) transforms sensitive information according to desensitization rules so that sensitive private data is reliably protected. The desensitized but realistic data set can then be used safely in development, testing, other non-production environments, and outsourcing environments. Relational databases hold large amounts of sensitive information that must be desensitized.
Existing desensitization approaches fall into two schemes:
Scheme 1: desensitization is performed with a plain JDBC approach.
Scheme 2: multiple tables are desensitized on a single machine.
These schemes have the following problems:
Scheme 1: JDBC can extract and load data, but when a single table reaches hundreds of millions of rows, extraction and loading become very slow; queries may even time out, so the desensitization task cannot be completed.
Scheme 2: the CPU and memory of a single machine are limited. If the database to be desensitized has thousands of tables totaling a terabyte or more, memory may overflow and the CPU cannot keep up with the processing load.
Production environments are mostly desensitized as a whole database or a whole set of tables, so massive data from many tables is desensitized at the same time. Desensitization algorithms are complex and CPU-hungry, which makes desensitization on the server a CPU-intensive workload with limited CPU resources. If the number of threads started is not limited, CPU context switching becomes frequent and consumes a large amount of CPU time; at the same time the CPU may be unable to process data as fast as it arrives, so data backs up in the pipeline (the pipeline is a memory-based queue, ArrayBlockingQueue) and the JVM runs out of memory. For a single server with, for example, a 16-core CPU, a thread count of 16 × 2 is appropriate, but for data volumes of a terabyte and above, the CPU and memory of a single server are clearly insufficient. The prior art therefore needs improvement.
Disclosure of Invention
The invention aims to provide a data desensitizing device based on a distributed cluster that solves the above problems.
The invention provides a data desensitizing device based on a distributed cluster, comprising a master server, a thread master scheduler, and a plurality of slave servers, each slave server being provided with a thread scheduler that allocates that server's threads. The master server slices each source data table in the database that needs to be desensitized and places the slices into that table's slice queue. Through the thread master scheduler, the master server allocates to each source data table a defined number of thread pipeline groups and threads; via the thread schedulers of the slave servers, the thread master scheduler dispatches threads that pull data from the slice queues, desensitize it, and load it into a target data table.
Further, the threads of each slave server comprise extraction threads, desensitization threads, and loading threads. Each thread pipeline consists of one extraction thread, one desensitization thread, and one loading thread arranged together, which pass data to each other through queues and thereby form a serial thread pipeline: the extraction thread and the desensitization thread exchange data through pipeline queue I, and the desensitization thread and the loading thread exchange data through pipeline queue II.
Further, the step in which a slave server pulls data from the slice queue for desensitization specifically comprises: the extraction thread of the thread pipeline reads data from the slice queue and sends it to pipeline queue I; the desensitization thread of the thread pipeline pulls data from pipeline queue I, desensitizes it, and sends the desensitized data to pipeline queue II; and the loading thread of the thread pipeline pulls data from pipeline queue II and loads it into the target data table.
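The three-stage pipeline above can be sketched as follows. This is a minimal Python illustration, not the patented implementation (which runs on the JVM with JDBC): `run_pipeline`, the `_DONE` sentinel, and the representation of a slice as an already-fetched list of rows are assumptions, and `target.append` stands in for a write to the target data table.

```python
import queue
import threading

_DONE = object()  # end-of-stream marker passed from one stage to the next


def run_pipeline(slice_queue, desensitize, target, queue_size=10_000):
    """Run one thread pipeline: extraction -> queue I -> desensitization -> queue II -> loading."""
    queue_1 = queue.Queue(maxsize=queue_size)  # pipeline queue I
    queue_2 = queue.Queue(maxsize=queue_size)  # pipeline queue II

    def extract():
        # Assumption: each slice is already a list of rows; in the device the
        # extraction thread would execute the slice's SQL instead.
        while True:
            try:
                rows = slice_queue.get_nowait()
            except queue.Empty:
                break
            for row in rows:
                queue_1.put(row)  # blocks while queue I is full
        queue_1.put(_DONE)

    def desensitize_stage():
        while (row := queue_1.get()) is not _DONE:
            queue_2.put(desensitize(row))
        queue_2.put(_DONE)

    def load():
        while (row := queue_2.get()) is not _DONE:
            target.append(row)  # stand-in for a write to the target data table

    stages = [threading.Thread(target=f) for f in (extract, desensitize_stage, load)]
    for t in stages:
        t.start()
    for t in stages:
        t.join()
```

Because each stage talks to its neighbours only through a bounded queue, a slow desensitization or loading stage makes `queue_1.put` block, which throttles extraction instead of letting rows pile up in memory.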
Further, each slave server has 32 thread pipelines in total, and the thread master scheduler, through the thread scheduler of each slave server, schedules threads that pull data from the tables' slice queues in table order for desensitization.
Further, the first thread pipeline of each slave server pulls slice data in turn from the slice queue of the first source data table until the first thread pipelines of all slave servers have been allocated; whenever a first thread pipeline finishes desensitizing the slice it pulled, it pulls the next slice from the same queue, until the first source data table is fully desensitized, and then it moves on to a new source data table. Following the table order, the second thread pipeline of each slave server pulls data in turn from the slice queue of the second source data table, the third thread pipeline from the slice queue of the third source data table, and so on until all thread pipelines of the slave servers have been allocated. When a source data table has been desensitized, the thread pipelines that served it are reassigned to the next table in table order, until all source data tables have been desensitized.
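The table-ordered assignment described above can be simulated to see how pipeline groups move from table to table. The sketch below is a simplified Python model under stated assumptions: every slice takes one tick, pipeline k on all slave servers forms one group, and a group moves on as soon as its table's slice queue is empty. The function name and the log format are hypothetical.

```python
from collections import deque


def simulate_table_order_schedule(slices_per_table, num_slaves, pipes_per_slave):
    """Simulate table-ordered pipeline assignment; each slice takes one tick.

    Returns a log of (tick, slave, pipe, table, slice_no) tuples.
    """
    if num_slaves < 1 or pipes_per_slave < 1:
        raise ValueError("need at least one slave and one pipeline per slave")
    queues = [deque(range(n)) for n in slices_per_table]
    # Pipeline k of every slave starts on table k, following the table order.
    table_of_pipe = [k if k < len(queues) else None for k in range(pipes_per_slave)]
    next_table = min(pipes_per_slave, len(queues))  # first table not yet assigned
    log, tick = [], 0
    while any(queues):
        for p in range(pipes_per_slave):
            # When the table served by pipeline group p is drained, move the
            # group on to the next unassigned table in table order.
            while table_of_pipe[p] is not None and not queues[table_of_pipe[p]]:
                if next_table < len(queues):
                    table_of_pipe[p] = next_table
                    next_table += 1
                else:
                    table_of_pipe[p] = None  # no tables left for this group
            t = table_of_pipe[p]
            for s in range(num_slaves):  # slaves take turns pulling from the same table
                if t is not None and queues[t]:
                    log.append((tick, s, p, t, queues[t].popleft()))
        tick += 1
    return log
```

With three tables of 4, 2, and 3 slices, two slaves, and two pipelines each, the group of pipeline 1 finishes table 2 first and is the one that picks up table 3.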
Further, the master server slices the source data tables in the database uniformly. If the database is an ORACLE relational database, for each source data table ORACLE's sample() function is used to take N physical storage addresses (ROWIDs) uniformly from the table, with N adjusted dynamically according to the table size until a suitable number has been extracted; the ROWIDs are then sorted and paired into intervals that divide the table into multiple slices, and once slicing is complete all slices of the table are put into the table's slice queue. If the database is a MYSQL relational database, the maximum and minimum primary key values of each source data table are obtained with max(id) and min(id) on the primary key, the id range is divided uniformly into multiple slices, and the slices are put into the table's slice queue. A normal table is sliced once as a whole; for a partitioned table, each partition is sliced separately.
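For the MYSQL branch, uniform slicing by primary key can be sketched as below. This is a minimal Python sketch assuming that `min_id` and `max_id` have already been fetched with `SELECT MIN(id), MAX(id)` and that the slice count is a tuning parameter; the function name and the use of a `deque` as the slice queue are illustrative assumptions.

```python
from collections import deque


def id_range_slices(table, min_id, max_id, num_slices, pk="id"):
    """Split the primary-key range [min_id, max_id] into contiguous half-open slices.

    Returns a deque usable as the table's slice queue; each entry is one slice SQL.
    """
    if num_slices < 1 or max_id < min_id:
        raise ValueError("need num_slices >= 1 and max_id >= min_id")
    step = -(-(max_id - min_id + 1) // num_slices)  # ceiling division so every id is covered
    slice_queue = deque()
    lo = min_id
    while lo <= max_id:
        hi = min(lo + step, max_id + 1)
        slice_queue.append(f"SELECT * FROM {table} WHERE {pk} >= {lo} AND {pk} < {hi}")
        lo = hi
    return slice_queue
```

Equal key ranges only give equally sized slices when the ids are reasonably dense; large gaps in the key space produce some near-empty slices, and a very small range may yield fewer slices than requested. For a partitioned table the same splitting could be applied to each partition.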
Further, when the slave server loads the desensitized data into the target data table: if the database is an ORACLE relational database, batch direct-path loading is used, in which rows are cached in the JDBC driver's batch and multiple rows are sent together to the database server; if the database is a MYSQL relational database, text loading is used, in which the data is written to a text file and loaded over a JDBC connection with compression enabled.
Furthermore, when the slave servers load desensitized data into the target data tables, the target data tables are stored in different tablespaces, with different target data tables mapped to tablespaces on different disks to balance I/O; in addition, each target data table is partitioned and its partitions are mapped to different disks to balance I/O, so that multiple disks are written at high speed simultaneously.
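The tablespace and partition layout above can be expressed as generated ORACLE DDL. This Python sketch assumes hash partitioning on a key column with one partition per tablespace (the text does not name a partitioning method), assumes each tablespace sits on its own disk, and uses hypothetical table and tablespace names.

```python
def partitioned_table_ddl(table, columns, key, tablespaces):
    """ORACLE DDL that hash-partitions `table` on `key`, one partition per tablespace."""
    cols = ",\n  ".join(f"{name} {sql_type}" for name, sql_type in columns)
    return (
        f"CREATE TABLE {table} (\n  {cols}\n)\n"
        f"PARTITION BY HASH ({key})\n"
        f"PARTITIONS {len(tablespaces)} STORE IN ({', '.join(tablespaces)})"
    )


def assign_tablespaces(tables, tablespaces):
    """Round-robin target tables over tablespaces so different tables land on different disks."""
    return {t: tablespaces[i % len(tablespaces)] for i, t in enumerate(tables)}
```

Oracle distributes rows most evenly across hash partitions when the partition count is a power of two, so four tablespaces (disks) would be a natural choice here.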
Further, each slave server periodically reports its execution state and heartbeat information to the master server; the master server collects the execution states of all slave servers and computes real-time statistics until all data of the source data tables has been processed, after which the master server aggregates all performance data and persists it to the database.
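The master-side bookkeeping for these reports might look like the following Python sketch. The class names, the cumulative row counter as the "execution state", and the 30-second heartbeat timeout are assumptions for illustration; the clock is injectable so the logic can be checked without sleeping.

```python
import time
from dataclasses import dataclass


@dataclass
class SlaveStatus:
    rows_done: int = 0           # cumulative rows processed, as reported by the slave
    last_heartbeat: float = 0.0  # master-side clock reading at the last report


class MasterMonitor:
    """Collects slave reports and flags slaves whose heartbeat is overdue."""

    def __init__(self, heartbeat_timeout=30.0, clock=time.monotonic):
        self.heartbeat_timeout = heartbeat_timeout
        self.clock = clock  # injectable for testing
        self.slaves = {}

    def report(self, slave_id, rows_done):
        """Called when a slave sends its periodic execution state and heartbeat."""
        status = self.slaves.setdefault(slave_id, SlaveStatus())
        status.rows_done = rows_done
        status.last_heartbeat = self.clock()

    def total_rows(self):
        """Real-time progress across the whole cluster."""
        return sum(s.rows_done for s in self.slaves.values())

    def stale_slaves(self):
        """Slaves that have not reported within the timeout."""
        now = self.clock()
        return sorted(sid for sid, s in self.slaves.items()
                      if now - s.last_heartbeat > self.heartbeat_timeout)
```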
Compared with the prior art, the invention has the following beneficial effects: in the data desensitizing device based on a distributed cluster, the thread master scheduler coordinates the thread schedulers of the slave servers, allocating threads dynamically and improving loading performance; the distributed master/slave cluster arrangement scales well; table data slicing is uniform, which improves extraction performance and enables high-speed data extraction; and database-specific methods achieve high-speed loading.
Drawings
Fig. 1 is a schematic diagram of desensitization performed by the data desensitizing device based on a distributed cluster according to an embodiment of the invention.
Detailed Description
The invention is further described below with reference to the drawings and examples.
Fig. 1 is a schematic diagram of desensitization performed by the data desensitizing device based on a distributed cluster according to an embodiment of the invention.
Referring to Fig. 1, a data desensitizing device based on a distributed cluster according to an embodiment of the invention includes a master server, a thread master scheduler, and a plurality of slave servers, each slave server being provided with a thread scheduler that allocates that server's threads. The master server slices each source data table in the database that needs to be desensitized and places the slices into that table's slice queue. Through the thread master scheduler, the master server allocates to each source data table a defined number of thread pipeline groups and threads; via the thread schedulers of the slave servers, the thread master scheduler dispatches threads that pull data from the slice queues, desensitize it, and load it into a target data table.
In this embodiment, the data desensitizing device uses one master server and three slave servers to desensitize 100 tables. The master server host coordinates cluster scheduling as a whole and can be deployed as an active/standby pair; it is responsible for scheduling, slicing the tables, and managing the states of the slave servers. The three slave server hosts execute the desensitization tasks; further executors can be added without a fixed limit, and each slave server has a thread scheduler that allocates its threads.
Each slave server has 32 pipelines in total for desensitization tasks, and each pipeline starts 3 threads: the extraction thread queries data, the desensitization thread desensitizes it, and the loading thread loads it. The master server, in turn, cuts each table's data into multiple slices.
The table data slicing principle is as follows:
ORACLE database: multiple ROWIDs are randomly sampled from the table with the sample() function (ORACLE's SAMPLE clause) and packaged into multiple query SQL statements, e.g., SELECT * FROM A WHERE ROWID >= xxxx AND ROWID < xxxx.
Other relational databases: the maximum and minimum primary key values of the table are obtained with max(id) and min(id), and the id range is packaged into multiple query SQL statements, e.g., SELECT * FROM A WHERE id >= XXXXX AND id < XXXXXXX.
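The ORACLE slicing principle can be sketched as SQL generation in Python. The assumptions are labeled: the sample() function is taken to be ORACLE's `SAMPLE (percent)` clause, the sampled ROWIDs are sorted by the database with `ORDER BY ROWID` (their text form is not guaranteed to sort in ROWID order on the client), and `CHARTOROWID` makes the literal comparisons explicit. The helper names and the way the sample percentage is derived from the table size are hypothetical.

```python
def sample_percent(num_rows, wanted_boundaries):
    """SAMPLE percentage that should return roughly `wanted_boundaries` ROWIDs."""
    pct = 100.0 * wanted_boundaries / max(num_rows, 1)
    return min(max(pct, 0.000001), 99.999999)  # SAMPLE accepts .000001 up to (not incl.) 100


def rowid_sample_sql(table, percent):
    """Sample ROWIDs and let ORACLE put them in ROWID order."""
    return f"SELECT ROWID FROM {table} SAMPLE ({percent:.6f}) ORDER BY ROWID"


def _rowid(literal):
    return f"CHARTOROWID('{literal}')"


def rowid_range_slices(table, sorted_rowids):
    """Pair consecutive sampled ROWIDs into half-open ranges that together cover the table."""
    bounds = list(sorted_rowids)
    if not bounds:
        return [f"SELECT * FROM {table}"]  # nothing sampled: one slice for the whole table
    slices = [f"SELECT * FROM {table} WHERE ROWID < {_rowid(bounds[0])}"]
    for lo, hi in zip(bounds, bounds[1:]):
        slices.append(f"SELECT * FROM {table} WHERE ROWID >= {_rowid(lo)} AND ROWID < {_rowid(hi)}")
    slices.append(f"SELECT * FROM {table} WHERE ROWID >= {_rowid(bounds[-1])}")
    return slices
```

N sampled ROWIDs yield N + 1 slices, and the open-ended first and last ranges ensure that rows before the first and after the last sample are not lost.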
The master server slices a table and puts all of its slices into a slice queue; each table has its own slice queue. The slice queues are all kept in a shared space that every slave server can access. The master server notifies all slave servers that they can pull slice SQL and process the data for desensitization, then continues slicing the next table, and so on until all 100 tables have been sliced.
The first thread pipeline of each slave server pulls slice data in turn from the slice queue of the first source data table until the first thread pipelines of all slave servers have been allocated; whenever a first thread pipeline finishes desensitizing the slice it pulled, it pulls the next slice from the same queue, until the first source data table is fully desensitized, and then it moves on to a new source data table. Following the table order, the second thread pipeline of each slave server pulls data in turn from the slice queue of the second source data table, the third thread pipeline from the slice queue of the third source data table, and so on until all thread pipelines of the slave servers have been allocated. When a source data table has been desensitized, the thread pipelines that served it are reassigned to the next table in table order, until all source data tables have been desensitized.
Thread pipelines on the slave servers may also be assigned in other ways. For example, slave server 1 has 32 pipelines: it starts pipeline 1 to pull one slice from table 1, pipeline 2 to pull one slice from table 2, and so on, decrementing the number of free pipelines by 1 each time until all 32 pipelines are in use (free pipelines = 0). As soon as a pipeline finishes, a new pipeline is started to pull one slice from table 33, and so on until no slice remains to be pulled. The other slave servers poll the slices of the 100 tables by the same rule until all slices of the 100 tables have been pulled.
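This alternative round-robin scheme can be sketched in Python with a shared picker that hands out one slice per call while rotating over the tables. The class and function names are hypothetical, and the free-pipeline counter from the text is modelled implicitly by a fixed number of threads, each of which takes the next slice as soon as it finishes the previous one.

```python
import threading
from collections import deque


class RoundRobinSlicePicker:
    """Shared picker: each call hands out one slice, rotating across tables."""

    def __init__(self, slices_per_table):
        self._queues = [deque((t, i) for i in range(n)) for t, n in enumerate(slices_per_table)]
        self._cursor = 0
        self._lock = threading.Lock()  # pipelines on every slave call this concurrently

    def next_slice(self):
        """Return (table, slice_no) from the next table that still has slices, or None."""
        with self._lock:
            for _ in range(len(self._queues)):
                t = self._cursor
                self._cursor = (self._cursor + 1) % len(self._queues)
                if self._queues[t]:
                    return self._queues[t].popleft()
            return None


def run_slave(picker, pipes=32, desensitize=lambda item: item):
    """Start `pipes` pipelines; each pulls one slice at a time until none remain."""
    done, done_lock = [], threading.Lock()

    def pipeline():
        while (item := picker.next_slice()) is not None:
            result = desensitize(item)  # placeholder for extract/desensitize/load
            with done_lock:
                done.append(result)

    threads = [threading.Thread(target=pipeline) for _ in range(pipes)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return done
```

Sharing one picker across slave servers gives the "poll all tables by the same rule" behaviour; the lock guarantees every slice is handed out exactly once.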
Because desensitization and loading are usually the slowest stages and consume relatively more CPU and memory, the three threads in each of the 32 pipelines on a slave server are throttled by memory queues based on ArrayBlockingQueue. By default such a queue throttles by a count, here 10,000 records: once 10,000 records are queued, the extraction thread blocks. A count cannot measure the actual data size, however; when records contain large CLOB/BLOB fields, memory usage may exceed its limit and cause an overflow. The queue is therefore extended with a capacity counter that throttles by size, with a default limit of 64 MB (10,000 records are counted as 64 MB); when the queued data would exceed 64 MB, the extraction thread is blocked, which effectively prevents memory overflow. After querying data, the extraction thread puts it into the memory queue, and the desensitization thread takes it from the memory queue and desensitizes it.
For example, consider desensitizing an 18-digit resident identity card number. Digits 1 to 6 are the address code; digits 7 to 14 are the date of birth; digits 15 to 17 are the sequence code, which orders people born in the same area on the same date, with odd numbers assigned to men and even numbers to women; digit 18 is the check code, computed with the ISO 7064:1983 MOD 11-2 check character system.
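The size-aware throttling described above can be sketched as a Python analog of an ArrayBlockingQueue extended with a capacity counter. The class name and the `size_of` hook (here simply `len` of a bytes payload) are assumptions; a real implementation would estimate each record's size, including its CLOB/BLOB fields.

```python
import threading
from collections import deque


class CapacityBoundedQueue:
    """Blocking FIFO bounded by both item count and total payload bytes."""

    def __init__(self, max_items=10_000, max_bytes=64 * 1024 * 1024, size_of=len):
        self._items = deque()
        self._bytes = 0
        self._max_items = max_items
        self._max_bytes = max_bytes
        self._size_of = size_of  # estimates one record's memory footprint
        self._cond = threading.Condition()

    def put(self, item):
        n = self._size_of(item)
        with self._cond:
            # Block the producer (extraction thread) while either limit would be
            # exceeded. An oversized item is still admitted into an empty queue,
            # otherwise it could never be enqueued.
            while self._items and (len(self._items) >= self._max_items
                                   or self._bytes + n > self._max_bytes):
                self._cond.wait()
            self._items.append((item, n))
            self._bytes += n
            self._cond.notify_all()

    def get(self):
        with self._cond:
            while not self._items:
                self._cond.wait()
            item, n = self._items.popleft()
            self._bytes -= n
            self._cond.notify_all()  # wake producers waiting for capacity
            return item

    def byte_size(self):
        with self._cond:
            return self._bytes
```

With a 10-byte limit holding 9 bytes, a 3-byte `put` blocks until a consumer removes an item, which mirrors how the 64 MB limit stalls extraction when large LOB rows accumulate.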
Digits 1 to 6 are desensitized with a seed plus an address dictionary: the seed selects an address code from a fixed dictionary. Digits 7 to 14 use the seed plus a reasonable date range: the year is chosen by the seed within a 100-year span, the month from the 12 months, and the day from 30 days, adjusted automatically to the month. Digits 15 to 17 (the sequence code) are chosen by the seed from the sequence codes issued for that place, and the check code in digit 18 is computed from the first 17 digits with the standard formula. Complex desensitization algorithms like this consume a great deal of CPU, which is where the combined computing capacity and high performance of multiple servers in a distributed cluster pay off. When it finishes desensitizing, the desensitization thread puts the data into the second memory queue.
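The identity-number rule can be sketched in Python. The MOD 11-2 check digit follows ISO 7064:1983 (weights 2^(17-i) mod 11 for the first 17 digits). Everything else is an assumption for illustration: the seed is derived with HMAC-SHA256 over the original number and a secret key, the three-entry address dictionary is hypothetical, the day is drawn from the actual length of the month, and the sequence code is drawn from 001 to 999 rather than from a per-place code list.

```python
import calendar
import hashlib
import hmac
import random

# ISO 7064:1983 MOD 11-2 weights: the i-th digit (0-based) has weight 2^(17-i) mod 11.
WEIGHTS = [2 ** (17 - i) % 11 for i in range(17)]
CHECK_CHARS = "10X98765432"  # check character indexed by (weighted sum mod 11)

# Hypothetical address dictionary; a real deployment would load the official code list.
ADDRESS_CODES = ["110105", "310104", "440305"]


def check_digit(first17: str) -> str:
    """ISO 7064:1983 MOD 11-2 check character for the first 17 digits."""
    total = sum(int(d) * w for d, w in zip(first17, WEIGHTS))
    return CHECK_CHARS[total % 11]


def desensitize_id(id_number: str, key: bytes, start_year: int = 1924) -> str:
    """Replace an 18-digit ID number with a well-formed, deterministic substitute."""
    # Seed derived from the original value and a secret key, so the same input
    # always maps to the same output (keeps joins between tables consistent).
    digest = hmac.new(key, id_number.encode("ascii"), hashlib.sha256).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    address = rng.choice(ADDRESS_CODES)                         # digits 1-6
    year = start_year + rng.randrange(100)                      # within a 100-year span
    month = rng.randint(1, 12)
    day = rng.randint(1, calendar.monthrange(year, month)[1])   # valid for that month
    sequence = rng.randint(1, 999)                              # digits 15-17
    first17 = f"{address}{year:04d}{month:02d}{day:02d}{sequence:03d}"
    return first17 + check_digit(first17)                       # digit 18
```

For the widely used sample number 11010519491231002X, `check_digit("11010519491231002")` reproduces the final `X`, and every substitute produced by `desensitize_id` passes the same check.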
The loading thread of each pipeline on each slave server takes data from memory queue II and applies a high-speed loading technique suited to the characteristics of each database.
ORACLE database: batch loading is combined with direct-path loading. Using the JDBC driver's batch cache, every 5,000 rows are assembled into one batch of statements of the form INSERT /*+ APPEND_VALUES */ INTO T_APPEND A (ID, NAME) VALUES (3, 'ABC') and sent to the ORACLE server together, which achieves high-speed loading.
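A Python sketch of the ORACLE loading path, written against any PEP 249 (DB-API) connection: `executemany` plays the role of JDBC's `addBatch`/`executeBatch`. The function names are hypothetical, and the per-batch commit is an assumption motivated by ORACLE's rule that a transaction must commit after a direct-path insert before it touches the same table again (ORA-12838).

```python
from itertools import islice


def batched_rows(rows, size=5000):
    """Yield lists of at most `size` rows (the text batches 5,000 rows per round trip)."""
    it = iter(rows)
    while batch := list(islice(it, size)):
        yield batch


def load_direct_path(conn, table, columns, rows, batch_size=5000, placeholder=":{n}"):
    """Insert rows in batches with the APPEND_VALUES direct-path hint via a DB-API connection.

    `placeholder` defaults to ORACLE-style numbered binds (:1, :2, ...); pass "?"
    for drivers that use qmark binds.
    """
    cols = ", ".join(columns)
    binds = ", ".join(placeholder.format(n=i + 1) for i in range(len(columns)))
    sql = f"INSERT /*+ APPEND_VALUES */ INTO {table} ({cols}) VALUES ({binds})"
    cur = conn.cursor()
    try:
        for batch in batched_rows(rows, batch_size):
            cur.executemany(sql, batch)  # one round trip per batch
            # Commit before the next direct-path insert into the same table.
            conn.commit()
    finally:
        cur.close()
    return sql
```

Because the hint is an ordinary SQL comment, the same code can be exercised against SQLite (with `placeholder="?"`), where the hint is simply ignored.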
MYSQL database: text-file loading is used for high speed: the data is written to a text file, compression is enabled on the JDBC connection, and the following SQL statement is executed: LOAD DATA INFILE 'f/Book1.csv' INTO TABLE test_Book FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"' LINES TERMINATED BY '\r\n' IGNORE 1 LINES (id, name, data), which achieves high-speed loading.
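A Python sketch of the MYSQL text-loading path: write the desensitized rows in exactly the format the LOAD DATA statement expects (header line, comma-separated, `"`-quoted when needed, CRLF line endings) and build the matching statement. The function names are hypothetical. When the file is on the client rather than the database server, MySQL's `LOAD DATA LOCAL INFILE` variant would be the one to use.

```python
import csv


def write_load_file(path, header, rows):
    """Write rows as CSV: header line, ',' separators, '"' quoting when needed, CRLF."""
    # Note: MySQL's default ESCAPED BY '\\' treats backslashes specially; data
    # containing backslashes or NULLs would need extra handling (not shown).
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f, lineterminator="\r\n")
        writer.writerow(header)  # skipped on load by IGNORE 1 LINES
        writer.writerows(rows)


def load_data_sql(path, table, columns):
    """Build the LOAD DATA statement matching the file format written above."""
    return (
        f"LOAD DATA INFILE '{path}' INTO TABLE {table} "
        "FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"' "
        "LINES TERMINATED BY '\\r\\n' IGNORE 1 LINES "
        f"({', '.join(columns)})"
    )
```

Keeping the writer and the statement side by side makes it harder for the file format and the LOAD DATA options to drift apart.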
In addition, each table is partitioned and different partitions are mapped to different disks to balance I/O, so that several disks are written at high speed simultaneously; data is loaded concurrently, achieving the highest performance for distributed desensitization.
Each slave server periodically reports its execution state and heartbeat information to the master server; the master server collects the execution states of all slave servers and computes real-time statistics until the data of all 100 tables has been processed. The master server then aggregates all performance data and persists it to a database, and the task ends.
In summary, in the data desensitizing device based on a distributed cluster according to the embodiment of the invention, the thread master scheduler coordinates the thread schedulers of the slave servers, allocating threads dynamically and improving loading performance; the distributed master/slave cluster arrangement scales well; table data slicing is uniform, which improves extraction performance and enables high-speed data extraction; and database-specific methods achieve high-speed loading.
While the invention has been described with reference to preferred embodiments, these are not intended to limit it; those skilled in the art may make modifications and improvements without departing from the spirit and scope of the invention, which is therefore defined by the appended claims.

Claims (6)

1. A data desensitizing device based on a distributed cluster, characterized by comprising a master server, a thread master scheduler, and a plurality of slave servers, wherein each slave server is provided with a thread scheduler for allocating the threads of that slave server, and the master server slices each source data table needing desensitization in a database and places the table slices into a slice queue of that source data table; the master server allocates, through the thread master scheduler, a defined number of thread pipeline groups and threads to each source data table, and the thread master scheduler, through the thread schedulers of the slave servers, dispatches threads that pull data from the slice queues, desensitize it, and then load it into a target data table;
the threads of each slave server comprise extraction threads, desensitization threads, and loading threads; each group of thread pipelines consists of a correspondingly arranged extraction thread, desensitization thread, and loading thread, which transmit data through queues to form a serial thread pipeline, wherein the extraction thread and the desensitization thread transmit data through pipeline queue I, and the desensitization thread and the loading thread transmit data through pipeline queue II;
the step in which the slave server pulls data from the slice queues for desensitization specifically comprises: the extraction thread of the thread pipeline reads data from the slice queue and sends it to pipeline queue I; the desensitization thread of the thread pipeline pulls data from pipeline queue I, desensitizes it, and sends the desensitized data to pipeline queue II; and the loading thread of the thread pipeline pulls data from pipeline queue II and loads it into the target data table;
when the slave server loads the desensitized data into a target data table: if the database is an ORACLE relational database, batch direct-path loading is used, in which rows are cached in the JDBC driver's batch and multiple rows are sent together to the database server to perform the load; if the database is a MYSQL relational database, text loading is used, in which the data is written to a text file and loaded over a JDBC connection with compression enabled.
2. The distributed cluster-based data desensitizing device according to claim 1, wherein each slave server has 32 thread pipelines in total, and the thread master scheduler, through the thread scheduler of each slave server, schedules threads to pull data from the tables' slice queues in table order for desensitization.
3. The distributed cluster-based data desensitizing device according to claim 2, wherein the first thread pipeline of each slave server pulls slice data in turn from the slice queue of the first source data table until the first thread pipelines of all slave servers have been allocated; whenever a first thread pipeline finishes desensitizing the slice it pulled, it continues to pull slices in turn from the slice queue of the first source data table until the first source data table is fully desensitized, and then desensitizes a new source data table; following the table order, the second thread pipeline of each slave server pulls data in turn from the slice queue of the second source data table, and the third thread pipeline of each slave server pulls data in turn from the slice queue of the third source data table, until all thread pipelines of the slave servers have been allocated; when a source data table has been desensitized, the thread pipelines that served it are reassigned to the next table in table order, until all source data tables have been desensitized.
4. The distributed cluster-based data desensitizing device according to claim 1, wherein the master server slices the source data tables in the database uniformly; if the database is an ORACLE relational database, for each source data table ORACLE's sample() function is used to take N physical storage addresses (ROWIDs) uniformly from the table, N being adjusted dynamically according to the table size until a suitable number has been extracted, the ROWIDs are sorted and paired into intervals that divide the table into a plurality of slices, and once slicing is complete all slices of the table are put into the slice queue of the table; if the database is a MYSQL relational database, the maximum and minimum primary key values of each source data table are obtained with max(id) and min(id) on the primary key, and the id range is divided uniformly into a plurality of slices that are put into the slice queue of the table; a normal table is sliced once as a whole, and for a partitioned table each partition is sliced separately.
5. The distributed cluster-based data desensitizing device according to claim 1, wherein the slave servers load the desensitized data into the target data tables concurrently; the target data tables are stored in different tablespaces, with different target data tables mapped to tablespaces on different disks to balance I/O; and each target data table is partitioned, with different partitions mapped to different disks to balance I/O, so that a plurality of disks are written at high speed simultaneously.
6. The distributed cluster-based data desensitizing device according to claim 1, wherein each slave server periodically reports its execution status and heartbeat information to the master server; the master server collects the execution status of each slave server and computes real-time statistics until all data of all source data tables has been processed; and the master server then aggregates all performance data and persists it to a database.
Priority Applications (1)

Application number    Priority date    Filing date    Title
CN202010042550.5A     2020-01-15       2020-01-15     Data desensitizing device based on distributed cluster

Publications (2)

Publication Number | Publication Date
CN111291403A | 2020-06-16
CN111291403B | 2023-09-19

Family ID: 71024204

Family Applications (1)

Application Number | Status | Publication | Priority Date | Filing Date | Title
CN202010042550.5A | Active | CN111291403B | 2020-01-15 | 2020-01-15 | Data desensitizing device based on distributed cluster

Country Status (1)

Country | Link
CN (1) | CN111291403B

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication Number | Priority Date | Publication Date | Assignee | Title
CN112131027A * | 2020-09-29 | 2020-12-25 | 中国银行股份有限公司 | Distributed application cluster and data desensitization method
CN112506632A * | 2020-12-03 | 2021-03-16 | 中国人寿保险股份有限公司 | Method and system for scheduling batch tasks in distributed system
CN112800447A * | 2021-01-27 | 2021-05-14 | 北京明略软件系统有限公司 | Data meaning configuration format based reserved encryption method and system
CN113987049A * | 2021-12-27 | 2022-01-28 | 北京安华金和科技有限公司 | Sensitive data discovery processing method and system
CN114546610B * | 2022-01-17 | 2022-11-18 | 山西省信息通信网络技术保障中心 | Mass data distributed desensitization device
CN114896295B * | 2022-07-12 | 2022-10-04 | 云启智慧科技有限公司 | Data desensitization method, desensitization device and desensitization system in big data scene

Citations (6)

* Cited by examiner, † Cited by third party
Publication Number | Priority Date | Publication Date | Assignee | Title
CN107766741A * | 2017-10-23 | 2018-03-06 | 中恒华瑞(北京)信息技术有限公司 | Data desensitization system and method
CN108197486A * | 2017-12-20 | 2018-06-22 | 北京天融信网络安全技术有限公司 | Big data desensitization method, system, computer-readable medium and equipment
CN109933631A * | 2019-03-20 | 2019-06-25 | 江苏瑞中数据股份有限公司 | Distributed parallel database system and data processing method based on Infiniband network
CN109960944A * | 2017-12-14 | 2019-07-02 | 中兴通讯股份有限公司 | A kind of data desensitization method, server, terminal and computer readable storage medium
CN110019251A * | 2019-03-22 | 2019-07-16 | 深圳市腾讯计算机系统有限公司 | A kind of data processing system, method and apparatus
WO2019219010A1 * | 2018-05-14 | 2019-11-21 | 杭州海康威视数字技术股份有限公司 | Data migration method and device and computer readable storage medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
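Title
王映东; 匡艺; 费江涛. Bigtable系统的负载平衡技术研究 (Research on load-balancing techniques for the Bigtable system). 计算机安全 (Computer Security), 2009, no. 02, full text. *
艾解清; 魏理豪; 梁承东; 陈亮. 客户隐私数据流转安全管理系统 (A security management system for the circulation of customer privacy data). 信息安全研究 (Journal of Information Security Research), 2018, no. 02, full text. *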

Also Published As

Publication Number | Publication Date
CN111291403A | 2020-06-16

Similar Documents

Publication | Title
CN111291403B | Data desensitizing device based on distributed cluster
CN111125769B | Mass data desensitization method based on ORACLE database
US9805077B2 | Method and system for optimizing data access in a database using multi-class objects
EP2885728B1 | Hardware implementation of the aggregation/group by operation: hash-table method
KR101806055B1 | Generating a multi-column index for relational databases by interleaving data bits for selectivity
US9235590B1 | Selective data compression in a database system
US7146365B2 | Method, system, and program for optimizing database query execution
US20100042587A1 | Method for Laying Out Fields in a Database in a Hybrid of Row-Wise and Column-Wise Ordering
US20150142733A1 | System and method for efficient management of big data in a database using streaming tables
US8108411B2 | Methods and systems for merging data sets
CN112437916A | Incremental clustering of database tables
Eltabakh et al. | Eagle-eyed elephant: split-oriented indexing in Hadoop
US7890480B2 | Processing of deterministic user-defined functions using multiple corresponding hash tables
US8505015B2 | Placing a group work item into every prioritized work queue of multiple parallel processing units based on preferred placement of the work queues
EP3329393A1 | Materializing expressions within in-memory virtual column units to accelerate analytic queries
GB2508503A | Batch evaluation of remote method calls to an object oriented database
US10127281B2 | Dynamic hash table size estimation during database aggregation processing
US20130339395A1 | Parallel operation on b+ trees
US11526960B2 | GPU-based data join
CN107209768A | Method and apparatus for the expansible sequence of data set
CN109918450A | Based on the distributed parallel database and storage method under analysis classes scene
Chambi et al. | Optimizing druid with roaring bitmaps
CN105550180A | Data processing method, device and system
US7502778B2 | Apparatus, system, and method for efficient adaptive parallel data clustering for loading data into a table
CA2415018C | Adaptive parallel data clustering when loading a data structure containing data clustered along one or more dimensions
CA2415018C (en) Adaptive parallel data clustering when loading a data structure containing data clustered along one or more dimensions

Legal Events

Code | Title
PB01 | Publication
SE01 | Entry into force of request for substantive examination
GR01 | Patent grant