CN111241171A - Full-amount data extraction method for database

Info

Publication number: CN111241171A
Application number: CN201911033825.2A
Authority: CN
Inventors: 陈慧慧; 柳遵梁; 闻建霞
Original assignee: Hangzhou Meichuang Technology Co ltd
Current assignee: Hangzhou Meichuang Technology Co ltd
Priority date: 2019-10-28
Filing date: 2019-10-28
Publication date: 2020-06-05

Abstract

The invention discloses a method for extracting full data from a database, comprising the following steps. Full data extraction configuration: configuring the data source information and the extraction content, mode and objects. Full data extraction: reading the data dictionary to obtain the objects to be extracted, extracting concurrently with multiple threads in units of tables, and storing the data in a cache using a producer-consumer pattern. Full data storage and recording: storing the cached data in a local file, recording the number of rows extracted so far from the current table, the primary key field value, and whether the status is complete, and saving these records locally for the breakpoint resume function. The invention is characterized by fast, accurate and timely extraction of full data.

Description

Full-amount data extraction method for database

Technical Field

The invention relates to the technical field of database data processing, and in particular to a database full data extraction method capable of extracting full data quickly, accurately and in a timely manner.

Background

In recent years, with the progress of information technology and the rapid development of the internet, a large amount of database business data has accumulated. The full data must be extracted, converted, loaded and backed up in order to cope with abnormal conditions such as natural disasters and to ensure that no data is lost.

At present, the Oracle database migrates full data through Data Pump (expdp/impdp). In actual business, however, data needs to be filtered, converted and mapped; simple data migration cannot meet these requirements and is not flexible enough, so a full data extraction scheme is needed.

In existing full data extraction, the integrity, accuracy and consistency of the data cannot be guaranteed at large data volumes. In addition, after the server fails and returns to normal, the data has to be extracted again from the beginning, so the speed of data extraction cannot be guaranteed.

Disclosure of Invention

The invention aims to overcome the defects of the prior art, in which the integrity, accuracy and consistency of large volumes of data and the speed of data extraction cannot be guaranteed during full data extraction, and provides a database full data extraction method capable of extracting full data quickly, accurately and in a timely manner.

To achieve this purpose, the invention adopts the following technical solution:

A method for extracting full data from a database comprises the following steps:

(1-1) full data extraction configuration: configuring the data source information and the extraction content, mode and objects;

(1-2) full data extraction: reading the data dictionary to obtain the objects to be extracted, extracting concurrently with multiple threads in units of tables, and storing the data in a cache using a producer-consumer pattern;

(1-3) full data storage and recording: storing the cached data in a local file, recording the number of rows extracted so far from the current table, the primary key field value, and whether the status is complete, and saving these records locally for the breakpoint resume function.

The method has the following advantages: full data can be extracted quickly, accurately and in a timely manner, and extraction can be completed even with massive data volumes; implementation difficulty is low, which saves user cost; and after the server fails and returns to normal, extraction continues from where it stopped rather than being repeated.

Preferably, the data source information includes the IP address, port, instance, user name and password; the extraction content is either data only or data together with structure; the extraction mode is user level or table level; and the extraction objects are the tables to be extracted under the user, or the tables to be excluded.
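As a non-limiting illustration, the configuration described above might be modeled as follows. This is a minimal sketch: the class and field names (`DataSource`, `ExtractionConfig`, `include_tables`, `exclude_tables`) and the exact include/exclude semantics are assumptions made for the example and are not specified by the invention.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class ExtractContent(Enum):
    DATA = "data"                          # row data only
    DATA_AND_STRUCTURE = "data+structure"  # row data plus table structure


class ExtractMode(Enum):
    USER_LEVEL = "user"    # every table owned by the configured user
    TABLE_LEVEL = "table"  # only an explicit set of tables


@dataclass
class DataSource:
    ip: str
    port: int
    instance: str
    user: str
    password: str


@dataclass
class ExtractionConfig:
    source: DataSource
    content: ExtractContent = ExtractContent.DATA
    mode: ExtractMode = ExtractMode.USER_LEVEL
    include_tables: List[str] = field(default_factory=list)  # hypothetical field
    exclude_tables: List[str] = field(default_factory=list)  # hypothetical field

    def select_tables(self, all_tables: List[str]) -> List[str]:
        """Apply the include/exclude lists to the tables owned by the user."""
        if self.mode is ExtractMode.TABLE_LEVEL:
            candidates = [t for t in all_tables if t in self.include_tables]
        else:
            candidates = list(all_tables)
        return [t for t in candidates if t not in self.exclude_tables]
```

In this sketch, table-level mode starts from the included tables and user-level mode starts from every table of the user; the excluded tables are removed in both cases.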

Preferably, in the multi-threaded concurrent extraction the number of concurrent threads is configurable, so that multiple tables are extracted simultaneously and the thread count can be adjusted dynamically according to the performance of the server;

in the producer-consumer pattern, an extractor thread and a worker thread are started for each table; the extractor thread reads the data dictionary and puts the row data it obtains into a queue, and the worker thread reads and processes the row data in the queue in batches.
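The extractor/worker pairing for one table can be sketched with a bounded queue, as below. This is an illustrative sketch only: the sentinel object `DONE` stands in for the completed status that the extractor sends to the worker, and the function names, batch size and queue size are assumptions chosen for the example.

```python
import queue
import threading

DONE = object()  # sentinel standing in for the "completed" status message


def extractor(rows, q):
    """Producer: push each row into the queue, then signal completion."""
    for row in rows:
        q.put(row)
    q.put(DONE)


def worker(q, cache, batch_size=100, process=lambda row: row):
    """Consumer: read the queue in batches, process, append to the cache."""
    batch = []
    while True:
        item = q.get()
        if item is DONE:
            break
        batch.append(item)
        if len(batch) >= batch_size:
            cache.extend(process(r) for r in batch)
            batch.clear()
    cache.extend(process(r) for r in batch)  # flush the final partial batch


def extract_table(rows, batch_size=100, queue_size=1000):
    """Run one extractor/worker pair for a single table and return the cache."""
    q = queue.Queue(maxsize=queue_size)  # bounded: the producer blocks when full
    cache = []
    producer = threading.Thread(target=extractor, args=(rows, q))
    consumer = threading.Thread(target=worker, args=(q, cache, batch_size))
    producer.start()
    consumer.start()
    producer.join()
    consumer.join()
    return cache
```

A bounded queue is used here so that a fast extractor cannot fill memory ahead of a slower worker; the invention itself does not state whether the queue is bounded.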

Preferably, the data is compressed and encrypted before being stored locally; the breakpoint resume function means that after the program is interrupted and restarted, full data extraction continues from where the last extraction stopped, without starting over.
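One possible shape for the compress-then-encrypt storage step is sketched below. The invention does not name a compression format, a cipher or a file layout, so everything here is an assumption: gzip and JSON are used for convenience, each batch is written with a 4-byte length prefix, and `encrypt`/`decrypt` are pluggable callables whose default (`identity`) performs no encryption at all and would have to be replaced by a real cipher in practice.

```python
import gzip
import json
import struct


def identity(data: bytes) -> bytes:
    """Placeholder only: returns the input unchanged (NOT real encryption)."""
    return data


def store_batch(rows, path, encrypt=identity):
    """Compress one batch, pass it through `encrypt`, append it to the file."""
    payload = encrypt(gzip.compress(json.dumps(rows).encode("utf-8")))
    with open(path, "ab") as f:
        f.write(struct.pack(">I", len(payload)))  # 4-byte big-endian length
        f.write(payload)
    return len(payload)


def load_batches(path, decrypt=identity):
    """Read every batch back, reversing the encrypt and compress steps."""
    rows = []
    with open(path, "rb") as f:
        while True:
            header = f.read(4)
            if not header:
                break
            (size,) = struct.unpack(">I", header)
            rows.extend(json.loads(gzip.decompress(decrypt(f.read(size)))))
    return rows
```

Appending length-prefixed batches means that an interrupted run leaves the earlier batches readable, which fits the breakpoint resume function described above.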

Preferably, the recorded primary key field value is used for breakpoint resume: after the program is restarted, the extractor thread reads the data dictionary starting from the recorded primary key field value.
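Resuming from a recorded primary key value can be illustrated with a key-ordered range query, as in the sketch below. Several assumptions are made: SQLite (from the Python standard library) stands in for the source database, the columns `a`, `b`, `c` follow the example table structure given later in this description, the primary key is assumed to be orderable, and the query reads the table directly rather than modeling how the extractor consults the data dictionary.

```python
import sqlite3


def read_from_key(conn, table, last_key, batch_size=1000):
    """Yield rows whose primary key `a` is greater than `last_key`, in key order.

    `table` is interpolated into the SQL text, so it must come from trusted
    configuration, never from user input.
    """
    while True:
        rows = conn.execute(
            f"SELECT a, b, c FROM {table} WHERE a > ? ORDER BY a LIMIT ?",
            (last_key, batch_size),
        ).fetchall()
        if not rows:
            return
        yield from rows
        last_key = rows[-1][0]  # advance to the last key read in this batch
```

Because each query filters on the key rather than on a row offset, a restart only needs the last recorded key value to continue without re-reading earlier rows.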

Preferably, whether the status is complete is recorded for breakpoint resume; when the status is not started or incomplete, extraction continues.
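A per-table record file carrying the status could look like the following sketch. The JSON layout, the field names (`status`, `last_key`, `rows_extracted`) and the status strings are assumptions for illustration; the invention only states that the status, the primary key field value and the extracted count are recorded locally.

```python
import json
import os

NOT_STARTED, INCOMPLETE, COMPLETED = "not_started", "incomplete", "completed"


def load_record(path):
    """Return the saved record, or a fresh 'not started' record if none exists."""
    if not os.path.exists(path):
        return {"status": NOT_STARTED, "last_key": None, "rows_extracted": 0}
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)


def save_record(path, status, last_key, rows_extracted):
    """Write the record to a temporary file, then rename it over the old one."""
    tmp = path + ".tmp"
    with open(tmp, "w", encoding="utf-8") as f:
        json.dump({"status": status, "last_key": last_key,
                   "rows_extracted": rows_extracted}, f)
    os.replace(tmp, path)  # replaces the old record in a single rename step


def should_extract(record):
    """Extraction continues unless the previous run marked the table completed."""
    return record["status"] in (NOT_STARTED, INCOMPLETE)
```

Writing to a temporary file and renaming it is a design choice of this sketch, intended to avoid a half-written record if the program is interrupted while saving.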

Preferably, the number of rows extracted so far from the current table is compared, when extraction finishes, with the total number of rows of the table in the database, to determine whether any data has been lost and thereby ensure the accuracy of the data.
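The row count comparison can be sketched as a single query against the source table. SQLite again stands in for the source database, and the function name `verify_row_count` is hypothetical; the sketch also assumes the table is not being modified while it is extracted, since otherwise the two counts could legitimately differ.

```python
import sqlite3


def verify_row_count(conn, table, rows_extracted):
    """Compare the recorded extracted count with the table's total row count.

    Returns (matches, total). `table` must come from trusted configuration.
    """
    total = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    return rows_extracted == total, total
```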

Therefore, the invention has the following beneficial effects: full data can be extracted quickly, accurately and in a timely manner, and extraction can be completed even with massive data volumes; implementation difficulty is low, which saves user cost; and after the server fails and returns to normal, extraction continues from where it stopped rather than being repeated.

Drawings

FIG. 1 is a schematic view of the present invention;

FIG. 2 is a flow chart of the present invention.

Detailed Description

The invention is further described with reference to the following figures and detailed description.

As shown in FIG. 1, the present invention provides a device for extracting full data from a database, which mainly comprises: full data extraction configuration, multi-threaded parallel extraction, and full data storage and recording.

The method comprises the following steps:

A. The application configures the data source information through a database connection, together with the extraction content, mode and objects;

B. Multi-threaded parallel extraction: multiple threads are started in units of tables, and the maximum number of threads extracting in parallel is configurable. The extractor thread reads the data dictionary according to the configuration to obtain the objects to be extracted and puts the row data into a queue; the worker thread reads the row data from the queue in batches, processes it, and stores the processed rows in the cache;

C. Full data storage and recording: the cached data is compressed, encrypted and stored in a local file; the number of rows extracted so far from the current table, the primary key field value, and the completion status are recorded and saved locally for the breakpoint resume function.

The following is a detailed description based on the above steps.

As shown in FIG. 2, the source database information is configured first, including the IP address, port, instance, user name and password; the extraction content can be set to data only or to data and structure; the extraction mode can be set to user level or table level; and the extraction objects specify which tables under the user are extracted or which are excluded.

Then, the full data extraction thread starts multiple extractor and worker threads, one pair per table. Each extractor thread reads the table's extraction record file saved locally by the previous run to determine whether extraction has finished. If it has, the extractor thread sends the completed status to the worker thread and stops, and the worker thread stops on receiving it. If the status is not started or incomplete, the extractor thread obtains the primary key field value reached by the previous extraction and continues reading the data dictionary from that value, putting the row data into the queue; the worker thread reads the row data from the queue in batches, processes it, and stores the processed rows in the cache. When the extractor thread has finished reading the data dictionary, it sends the completed status to the worker thread and stops, and the worker thread stops once it receives the completed status.

Finally, the row data in the cache is encrypted and compressed and stored in a local file, and the primary key value of the row data, the number of rows extracted and the extraction status are recorded in a local file.

An example follows:

Suppose a user of the Oracle database has N large tables of 1,000,000 rows each, tables 1-N, and suppose the table structure is a integer primary key, b varchar, c number(10).

1) First, the source database information is configured as described in the steps above; the extraction content is set to data, the extraction mode is set to table level, and the extraction objects are set to tables 1-N only, with all other tables excluded.

2) Then, a thread pool is created, and the extractor 1-N and worker 1-N threads generated for tables 1-N are placed in the thread pool's queue, with at most 5 threads extracting concurrently. The extractor 1-N and worker 1-N threads begin extracting tables 1-N respectively. Each extractor thread first reads the local record file of its table to obtain the value of primary key a reached by the previous extraction and the previous status, lastStatus. If lastStatus is completed, the extractor sends the completed status to its worker thread and stops, and the worker thread stops after receiving the completed status. If lastStatus is incomplete or not started, the extractor thread reads the data dictionary of its table starting from the previously extracted value of primary key a and puts the data into the queue; the worker thread reads data from the queue in batches, processes it, and stores it in the cache once processing is finished.

3) Finally, the row data in the cache is encrypted and compressed and stored in a local file, and the primary key value of the row data, the number of rows extracted and the extraction status are recorded in a local file.
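The scheduling in step 2) of this example, with at most 5 tables extracted at once and completed tables skipped, might be sketched as follows. This is a simplification under stated assumptions: each table is submitted to the pool as a single task through a caller-supplied `extract_one` function, rather than as a separate extractor/worker thread pair, and `records` is a hypothetical in-memory mapping from table name to its saved record.

```python
from concurrent.futures import ThreadPoolExecutor


def extract_all(tables, records, extract_one, max_workers=5):
    """Extract every table not yet completed, at most `max_workers` at a time.

    `records` maps a table name to its saved record (hypothetical layout with
    a "status" key); tables without a record are treated as not started.
    """
    pending = [t for t in tables
               if records.get(t, {}).get("status") != "completed"]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(extract_one, pending))  # results keep input order
    return dict(zip(pending, results))
```

Tables beyond the first 5 wait in the pool's internal queue until a thread becomes free, which corresponds to the thread pool queue mentioned in the example.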

It should be understood that this example is for illustrative purposes only and is not intended to limit the scope of the present invention. Further, it should be understood that various changes or modifications of the present invention may be made by those skilled in the art after reading the teaching of the present invention, and such equivalents may fall within the scope of the present invention as defined in the appended claims.

Claims

1. A method for extracting full data of a database is characterized by comprising the following steps:

2. The method for extracting full data of a database according to claim 1, wherein the data source information comprises the IP address, port, instance, user name and password; the extraction content is either data only or data together with structure; the extraction mode is user level or table level; and the extraction objects are the tables to be extracted under the user, or the tables to be excluded.

3. The method for extracting full data of a database according to claim 1, wherein in the multi-threaded concurrent extraction the number of concurrent threads is configurable, so that multiple tables are extracted simultaneously and the thread count can be adjusted dynamically according to the performance of the server;

4. The method for extracting full data of a database according to claim 1, wherein the data is compressed and encrypted before being stored locally; and the breakpoint resume function means that after the program is interrupted and restarted, full data extraction continues from where the last extraction stopped, without starting over.

5. The method for extracting full data of a database according to claim 1, wherein the recorded primary key field value is used for breakpoint resume, and after the program is restarted the extractor thread reads the data dictionary starting from the recorded primary key field value.

6. The method for extracting full data of a database according to claim 1, wherein whether the status is complete is recorded for breakpoint resume, and when the status is not started or incomplete, extraction continues.

7. The method for extracting full data of a database according to claim 1, 2, 3, 4, 5 or 6, wherein the number of rows extracted so far from the current table is compared, when extraction finishes, with the total number of rows of the table in the database, to determine whether any data has been lost and thereby ensure the accuracy of the data.