CN111241171A - Full-amount data extraction method for database - Google Patents

Full-amount data extraction method for database Download PDF

Info

Publication number
CN111241171A
CN111241171A CN201911033825.2A CN201911033825A CN111241171A CN 111241171 A CN111241171 A CN 111241171A CN 201911033825 A CN201911033825 A CN 201911033825A CN 111241171 A CN111241171 A CN 111241171A
Authority
CN
China
Prior art keywords
data
extraction
extracting
full
database
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911033825.2A
Other languages
Chinese (zh)
Inventor
陈慧慧
柳遵梁
闻建霞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Meichuang Technology Co ltd
Original Assignee
Hangzhou Meichuang Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Meichuang Technology Co ltd filed Critical Hangzhou Meichuang Technology Co ltd
Priority to CN201911033825.2A priority Critical patent/CN111241171A/en
Publication of CN111241171A publication Critical patent/CN111241171A/en
Pending legal-status Critical Current

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25Integrating or interfacing systems involving database management systems
    • G06F16/254Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21Design, administration or maintenance of databases
    • G06F16/214Database migration support

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Storage Device Security (AREA)

Abstract

The invention discloses a method for extracting full data of a database, which comprises the following steps: and (3) full data extraction configuration: configuring data source information, extracting contents, modes and objects; and (3) extracting full data: reading a data dictionary to obtain object contents to be extracted, taking a table as a unit, extracting a plurality of threads concurrently, and storing the data contents in a cache in a mode of a producer and a consumer; full data storage and recording: and storing the data in the cache into a local file, recording the number of the currently extracted tables, the field value of the main key and whether the state is finished, and storing the data locally for a breakpoint resume function. The invention has the characteristics.

Description

Full-amount data extraction method for database
Technical Field
The invention relates to the technical field of database data processing, in particular to a database full data extraction method capable of quickly, accurately and timely extracting full data.
Background
In recent years, with the progress of information technology and the rapid development of the internet, a large amount of database business data is accumulated. Data extraction, conversion, loading and backup are required to be carried out on the whole data so as to deal with abnormal conditions such as natural disasters and the like and ensure that the data are not lost.
At present, the oracle database implements migration of full data through a data pump (expdp/impdp), however, in an actual service, data needs to be filtered, converted and mapped, and a simple data migration cannot meet requirements of the actual service, and is not flexible enough, so that a full data extraction scheme is needed.
In the process of extracting the full data, the integrity, the accuracy and the consistency of the data under the large data volume cannot be ensured, and meanwhile, after the server fails and is recovered to be normal again, the data needs to be extracted again, so that the speed of data extraction cannot be ensured.
Disclosure of Invention
The invention aims to overcome the defects that the integrity, the accuracy and the consistency of big data cannot be ensured and the data extraction speed cannot be ensured in the whole data extraction process in the prior art, and provides a database whole data extraction method capable of extracting whole data quickly, accurately and timely.
In order to achieve the purpose, the invention adopts the following technical scheme:
a method for extracting full data of a database comprises the following steps:
(1-1) full data extraction configuration: configuring data source information, extracting contents, modes and objects;
(1-2) full data extraction: reading a data dictionary to obtain object contents to be extracted, taking a table as a unit, extracting a plurality of threads concurrently, and storing the data contents in a cache in a mode of a producer and a consumer;
(1-3) full data storage and recording: and storing the data in the cache into a local file, recording the number of the currently extracted tables, the field value of the main key and whether the state is finished, and storing the data locally for a breakpoint resume function.
The method has the advantages that the full data can be extracted quickly, accurately and timely, especially under the condition of mass data, the extraction of the full data can be completed, meanwhile, the realization difficulty is low, the cost of a user is saved, and the extraction can be continued after the server fails and is recovered to be normal again; the extraction is not repeated.
Preferably, the data source information includes ip address, port, instance, user name and password; the extraction content is data or data and a structure, and the extraction mode is a user level or a table level; the extraction object is a table under the extracted user or an excluded table.
Preferably, the concurrent extraction of the multiple threads is to set the number of concurrent threads to be configurable, extract multiple tables simultaneously, and dynamically modify the tables according to the performance of the server;
the generator and consumer mode is characterized in that an extrater extraction thread and a worker thread which take a table as a unit are started, the extrater thread reads a data dictionary and stores row data content acquired from the data dictionary into a queue, and the worker thread reads and processes row data in the queue in batch.
Preferably, the data is stored in a way that the data is compressed and encrypted and then is stored locally; the breakpoint resume function is that after the program is interrupted and restarted, the full data extraction is continued on the basis of the last extraction, and the extraction does not need to be started again.
Preferably, the recorded main key field value is used for breakpoint transmission, and the extractor extraction thread reads the data dictionary on the basis of the main key field value after the program is restarted.
Preferably, the recording is to record whether the status is complete for breakpoint transmission, and when the status is not started or is not complete, the extraction is continued.
Preferably, the number of the currently extracted tables is used for comparing the time of finishing extraction with the total row number of the tables in the database, judging whether data are lost or not, and ensuring the accuracy of the data.
Therefore, the invention has the following beneficial effects: the method has the advantages that the full data can be extracted quickly, accurately and timely, the full data can be extracted under the condition of mass data, the realization difficulty is low, the cost of a user is saved, and the extraction can be continued after the server fails and is recovered to be normal again; the extraction is not repeated.
Drawings
FIG. 1 is a schematic view of the present invention;
fig. 2 is a flow chart of the present invention.
Detailed Description
The invention is further described with reference to the following figures and detailed description.
As shown in fig. 1, the present invention provides a device for extracting full data of a database, which mainly comprises: performing full data extraction configuration, performing multi-thread parallel extraction, and storing and recording full data;
the method comprises the following steps:
A. the application program links and configures data source information through a database, extracts content, modes and objects;
B. multithread parallel extraction, a plurality of threads are realized by taking a table as a unit, the maximum thread number of the parallel extraction is configurable, an extracter thread reads a data dictionary according to configuration information, object contents to be extracted are obtained, row data contents are stored in a queue, worker threads read row data in the queue in batch and process the row data, and the processed row data are stored in a cache;
C. and storing and recording the full data, compressing and encrypting the data in the cache and storing the data in a local file, recording the number of the currently extracted tables, the field value of the primary key and whether state information is finished, and storing the data locally for a breakpoint resume function.
The following is a detailed description based on the above steps.
As shown in fig. 2, first configuring source database information including ip address, port, instance, user name and password; the configuration extraction content can select only data or data and structure, and the configuration extraction mode can select user level or table level; the configuration extraction object can select which user-under tables or excluded tables to extract.
Then, a full data extraction thread starts a plurality of extractors and worker extraction threads by taking a table as a unit, the extractor thread reads the table extraction record file which is extracted and stored in the local last time to acquire whether extraction is finished or not, if the extraction is finished, the extractor thread sends the state to the worker thread and stops the thread, and if the extraction is finished, the worker thread stops the thread; if the state is an incomplete state or an unfinished state, acquiring a main key field value extracted last time, reading a data dictionary by an extracter thread on the basis of the main key field value, reading row data contents stored in a queue by a worker thread in batch, processing the row data contents, and storing the processed row data contents in a cache; and stopping the thread when the data dictionary is read by the extractor thread, sending the completion state to the worker thread, and stopping the thread when the worker thread acquires the completion state.
And finally, encrypting and compressing the line data in the cache to store the line data in a local file, and recording the primary key value of the line data, wherein the extracted number and the extracted state are in the local file.
The following examples illustrate:
suppose a user of the oracle database has N100 ten thousand rows of large tables, tables 1-N, and suppose the table structure is ainteger primary key, b varchar, c number (10).
1) Firstly, according to the configuration source database information in the steps, the extraction content is configured into data, the extraction mode is configured into a table level, the extraction object is configured into only the tables 1-N, and other tables are completely excluded.
2) Then, a thread pool is created, and the extra 1-N and worker1-N threads generated in tables 1-N are placed in a thread pool cache queue, and at most 5 threads are concurrently extracted. The extractor 1-N thread and the worker1-N thread respectively start to extract tables 1-N, the extractor 1-N thread respectively reads local table 1-N record files firstly, the value of the main key a extracted last time in the tables 1-N and the lastStatus extracted last time are obtained, if the lastStatus is finished, the extractor 1-N sends the finished status to the worker1-N thread and stops the thread, and the worker1-N thread stops the thread after receiving the finished status. If lastStatus is in an incomplete or non-beginning state, the extractor 1-N thread reads the data dictionaries in tables 1-N respectively on the basis of the value of the primary key a extracted last time, and puts the data into the queue, and the worker1-N thread reads and processes the data from the queue in batches each time and stores the data in the cache after the processing is finished.
3) And finally, encrypting and compressing the line data in the cache to store the line data in a local file, and recording the primary key value of the line data, wherein the extracted number and the extracted state are in the local file.
It should be understood that this example is for illustrative purposes only and is not intended to limit the scope of the present invention. Further, it should be understood that various changes or modifications of the present invention may be made by those skilled in the art after reading the teaching of the present invention, and such equivalents may fall within the scope of the present invention as defined in the appended claims.

Claims (7)

1. A method for extracting full data of a database is characterized by comprising the following steps:
(1-1) full data extraction configuration: configuring data source information, extracting contents, modes and objects;
(1-2) full data extraction: reading a data dictionary to obtain object contents to be extracted, taking a table as a unit, extracting a plurality of threads concurrently, and storing the data contents in a cache in a mode of a producer and a consumer;
(1-3) full data storage and recording: and storing the data in the cache into a local file, recording the number of the currently extracted tables, the field value of the main key and whether the state is finished, and storing the data locally for a breakpoint resume function.
2. The method for extracting the full data of the database according to claim 1, wherein the data source information comprises ip addresses, ports, instances, user names and passwords; the extraction content is data or data and a structure, and the extraction mode is a user level or a table level; the extraction object is a table under the extracted user or an excluded table.
3. The method for extracting the full data of the database according to claim 1, wherein the concurrent extraction of the multiple threads is to set the number of concurrent threads to be configurable, and multiple tables are extracted simultaneously and dynamically modified according to the performance of the server;
the generator and consumer mode is characterized in that an extrater extraction thread and a worker thread which take a table as a unit are started, the extrater thread reads a data dictionary and stores row data content acquired from the data dictionary into a queue, and the worker thread reads and processes row data in the queue in batch.
4. The method for extracting the full data of the database according to claim 1, wherein the data is stored in a way that the data is compressed and encrypted and then is stored locally; the breakpoint resume function is that after the program is interrupted and restarted, the full data extraction is continued on the basis of the last extraction, and the extraction does not need to be started again.
5. The method for extracting the full data of the database as claimed in claim 1, wherein the value of the main key field is recorded for breakpoint resume, and the extractor extraction thread reads the data dictionary based on the value of the main key field after the program is restarted.
6. The method for extracting the full data of the database as claimed in claim 1, wherein the record is recorded whether the status is completed or not for breakpoint transmission, and when the status is not started or completed, the extraction is continued.
7. The method for extracting the full-scale data of the database according to claim 1, 2, 3, 4, 5 or 6, wherein the number of the currently extracted tables is used for comparing the number of the rows of the tables in the database with the total number of the rows of the tables when the extraction is finished, judging whether the data is lost or not, and ensuring the accuracy of the data.
CN201911033825.2A 2019-10-28 2019-10-28 Full-amount data extraction method for database Pending CN111241171A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911033825.2A CN111241171A (en) 2019-10-28 2019-10-28 Full-amount data extraction method for database

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911033825.2A CN111241171A (en) 2019-10-28 2019-10-28 Full-amount data extraction method for database

Publications (1)

Publication Number Publication Date
CN111241171A true CN111241171A (en) 2020-06-05

Family

ID=70871298

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911033825.2A Pending CN111241171A (en) 2019-10-28 2019-10-28 Full-amount data extraction method for database

Country Status (1)

Country Link
CN (1) CN111241171A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113672624A (en) * 2021-08-25 2021-11-19 中汽创智科技有限公司 Data processing method, system and storage medium
CN113760862A (en) * 2021-01-20 2021-12-07 北京沃东天骏信息技术有限公司 Incremental data breakpoint continuous transmission method, device, equipment and storage medium

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101697126A (en) * 2009-10-28 2010-04-21 山东中创软件商用中间件股份有限公司 ETL realization method for incremental data of Excel file
CN102375891A (en) * 2011-11-15 2012-03-14 山东浪潮金融信息系统有限公司 Implementation tool for unloading and loading incremental data
CN104182502A (en) * 2014-08-18 2014-12-03 浪潮(北京)电子信息产业有限公司 Data extraction method and device
US20150032758A1 (en) * 2013-07-29 2015-01-29 Sybase, Inc. High Performance Index Creation
CN104778182A (en) * 2014-01-14 2015-07-15 博雅网络游戏开发(深圳)有限公司 Data import method and system based on HBase (Hadoop Database)
CN106294886A (en) * 2016-10-17 2017-01-04 北京集奥聚合科技有限公司 A kind of method and system of full dose extracted data from HBase
CN109271435A (en) * 2018-09-14 2019-01-25 南威软件股份有限公司 A kind of data pick-up method and system for supporting breakpoint transmission

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101697126A (en) * 2009-10-28 2010-04-21 山东中创软件商用中间件股份有限公司 ETL realization method for incremental data of Excel file
CN102375891A (en) * 2011-11-15 2012-03-14 山东浪潮金融信息系统有限公司 Implementation tool for unloading and loading incremental data
US20150032758A1 (en) * 2013-07-29 2015-01-29 Sybase, Inc. High Performance Index Creation
CN104778182A (en) * 2014-01-14 2015-07-15 博雅网络游戏开发(深圳)有限公司 Data import method and system based on HBase (Hadoop Database)
CN104182502A (en) * 2014-08-18 2014-12-03 浪潮(北京)电子信息产业有限公司 Data extraction method and device
CN106294886A (en) * 2016-10-17 2017-01-04 北京集奥聚合科技有限公司 A kind of method and system of full dose extracted data from HBase
CN109271435A (en) * 2018-09-14 2019-01-25 南威软件股份有限公司 A kind of data pick-up method and system for supporting breakpoint transmission

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
段成;王增平;吴克河;: "一种轻量级电网实时数据ETL系统的设计与实现" *
陈鸿雁;: "保险数据仓库数据抽取的设计与实现" *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113760862A (en) * 2021-01-20 2021-12-07 北京沃东天骏信息技术有限公司 Incremental data breakpoint continuous transmission method, device, equipment and storage medium
CN113672624A (en) * 2021-08-25 2021-11-19 中汽创智科技有限公司 Data processing method, system and storage medium

Similar Documents

Publication Publication Date Title
US11455217B2 (en) Transaction consistency query support for replicated data from recovery log to external data stores
CN110569214B (en) Index construction method and device for log file and electronic equipment
CN108563535B (en) Recovery method for MySQL database whole database
US9256665B2 (en) Creation of inverted index system, and data processing method and apparatus
US20080091698A1 (en) Optimal data storage and access for clustered data in a relational database
CN109213432B (en) Storage device for writing data using log structured merge tree and method thereof
CN111241171A (en) Full-amount data extraction method for database
US11526465B2 (en) Generating hash trees for database schemas
CN108628885B (en) Data synchronization method and device and storage equipment
CN102591864B (en) Data updating method and device in comparison system
US6708185B2 (en) SQL execution analysis
CN113297182B (en) Data migration method, device, storage medium and program product
CN108470071A (en) A kind of data processing method and device
US11615012B2 (en) Preprocessing in database system workload capture and replay
CN110765325A (en) Operation and maintenance analysis method and system of CEPH distributed storage system
CN111177119A (en) Database-based full-data comparison method, device, equipment and storage medium
CN113377719B (en) System abnormal shutdown time acquisition method and system
CN109857792A (en) A kind of method and system of asynchronous big data cleaning conversion
US9405786B2 (en) System and method for database flow management
CN112860376B (en) Snapshot chain manufacturing method and device, electronic equipment and storage medium
CN108052524B (en) Method and system for acquiring file metadata of electron microscope data storage system
CN115774745A (en) Extraction method and system for high-capacity Excel file data
CN113672556A (en) Batch file migration method and device
CN112612812A (en) Method and device for processing multi-system data
CN112685431A (en) Asynchronous caching method, device, system, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
RJ01 Rejection of invention patent application after publication

Application publication date: 20200605