CN104376082B

CN104376082B - A method of the data in data source file are imported into database

Info

Publication number: CN104376082B
Application number: CN201410658208.2A
Authority: CN
Inventors: 王懋成; 刘迪; 吴文青
Original assignee: China Construction Bank Corp
Current assignee: China Construction Bank Corp
Priority date: 2014-11-18
Filing date: 2014-11-18
Publication date: 2019-06-18
Anticipated expiration: 2034-11-18
Also published as: CN104376082A

Abstract

The invention discloses a kind of data in data source file to imported into the method in database, comprising the following steps: A. reads the data in the data source file；B. the data are stored in line by line into initial data queue；C. multiple data processing inbound threads are set；D. the data in the initial data queue are pre-processed by the multiple data processing inbound thread, and the pretreated data parallel is written in the database by the multiple data processing inbound thread.The present invention realizes that data parallel imports by multithreading, takes full advantage of hardware resource, so that storage efficiency is improved, so that the speed that data import greatly improves.

Description

A method of the data in data source file are imported into database

Technical field

The present invention relates to field of data storage, in particular it relates to which a kind of data in data source file import Method into database.

Background technique

With the continuous growth of application system quantity, a large amount of data import demand and give birth to therewith, and existing database (example Such as, INFORMIX, MYSQL etc.) simple regular derivative tool in order is only provided, not to the utmost such as people in efficiency and scalability Meaning.

Database used in current each bank is mostly INFORMIX database, and the frequently-used data that official provides is led Entering method has load and two kinds of dbload, but all there is significant limitations in terms of data processing for both methods:

Firstly, the boot speed of both methods is slower.By taking the data file that the every row 27 of 1,000,000 rows arranges as an example: passing through When load mode imported into the data file in database, reporting an error in any importing process can all be caused because data volume is excessive Lead to Database lock resource exhaustion and rollback.When the data file being imported into database by dbload mode, need It is 184440 milliseconds time-consuming.

It is imported secondly, both methods can only implement linear read to text file.Load and dbload is due to being official The tool of offer, therefore not expansible, can only by every row order linear carry out data importing.

The wrong output format of third, both methods not can customize.The error message of load method is output at screen It on curtain, and once malfunctions, data are accordingly to be regarded as malfunctioning by the gross, and program does not continue to processing data and imports.Although dbload Can with customized error log name, but format be it is fixed, cannot achieve customized.

Based on the above reasons, it is badly in need of a kind of with higher storage efficiency, and drawbacks described above can be overcome, data are led Enter method, to meet the needs of big data quantity importing.

Summary of the invention

With the enterprise application system of big data era, Data Migration or processing become more frequently, traditional data Lead-in mode is no longer satisfied existing demand.The importing of millions data volume needs to take several hours easily, therefore number According to importing efficiency it is in urgent need to be improved.In consideration of it, the present invention provides a kind of data in data source file to imported into database In method.

The invention discloses a kind of data in data source file to imported into the method in database, including following step Rapid: A. reads the data in the data source file；B. the data are stored in line by line into initial data queue；C. it sets more A data processing inbound thread；D. by the multiple data processing inbound thread to the data in the initial data queue into Row pretreatment, and the pretreated data parallel is written to by the multiple data processing inbound thread by the number According in library.

Particularly, the step D is specifically included: each data processing inbound thread of a. is respectively from the initial data The data of predetermined quantity are read in queue, and are pre-processed one by one；B. each data processing inbound thread is respectively institute Pretreated data are stated to be stored in into respective pretreatment queue；C. each data processing inbound thread is the pre- place It is written in the database with managing the data parallel in queue；D. each data processing inbound thread empties the pre- place Manage the data in queue；E. each data processing inbound thread confirms whether there are also not in the initial data queue respectively Processed data: if there is untreated data, a is thened follow the steps.

Particularly, specifically include in the step c: I. successively extracts the data from the pretreatment queue；II. really The data being extracted are recognized with the presence or absence of mistake；When there is an error, implementation steps:

Vi. position of the wrong data in the pretreatment queue is determined；

Vii. the data before the position are written in the database；

Viii. the object information of the wrong data on the position is written in error queue；

Ix. the data on the position with before are deleted in the pretreatment queue；

X. step I is executed；

When there is no mistake: the data of extraction are written in the database.

Particularly, the object information includes: location information, content, error reason.

Particularly, it further comprises the steps of: through error handling processing thread and handles the error queue.

Particularly, the information in the error queue is exported according to user-defined format.

Particularly, the depth of the initial data queue and the depth of the pretreatment queue are configurableization.

Particularly, in step a, the predetermined quantity is determined based on the depth of the pretreatment queue.

Particularly, the depth of the pretreatment queue is 50.

Particularly, the database is the database for meeting JDBC specification.

Particularly, the database is INFORMIX database or MYSQL database.

In conclusion disclosed method has the advantages that according to the present invention

1) present invention realizes that data parallel imports by multithreading, takes full advantage of hardware resource, to improve storage effect Rate, so that the speed that data import greatly improves.By taking the data file that the every row 27 of 1,000,000 rows arranges as an example, institute is public through the invention The method opened, which completes data importing, only needs 59473 milliseconds；

2) all databases for meeting JDBC specification be present invention can be suitably applied to, to realize that the unified data of integration across database are led Enter processing；

3) present invention personalized on demand can be handled every data line, get rid of the reading by column that database carries tool Take non-expandable drawback；And the present invention can customize output as a result, for example direct output screen, preservation file, preservation number According to library etc..

Detailed description of the invention

By reading a detailed description of non-restrictive embodiments in the light of the attached drawings below, of the invention other Feature, objects and advantages will become more apparent:

Fig. 1 shows the method stream data in data source file being imported into database disclosed according to the present invention Cheng Tu；

Fig. 2 shows the processes that the data in data source file are imported into database disclosed according to the present invention to show It is intended to；

Fig. 3 shows data processing threads disclosed according to the present invention and pretreated data is written in database Schematic diagram；And

Fig. 4 shows the method flow diagram that processing data disclosed according to the present invention import error.

Specific embodiment

In the following detailed description of the preferred embodiment, reference is constituted to the appended attached drawing of present invention a part.Institute Attached attached drawing, which has been illustrated by way of example, can be realized specific embodiment.Exemplary embodiment is not intended to Exhaustive all embodiments according to the present invention.It is appreciated that without departing from the scope of the present invention, can use other Embodiment can also carry out the modification of structure or logic.Therefore, it is below specific descriptions and it is unrestricted, and this The range of invention is defined by the claims appended hereto.

Below with reference to attached drawing 1-4, illustrate that a kind of data in data source file imported into the method in database.It needs It is noted that although the step of describing method in attached drawing with particular order, this does not require that or implies must be by These operations are executed according to the particular order, or have to carry out shown in whole operation be just able to achieve it is desired as a result, on the contrary, The step of describing in flow chart, which can change, executes sequence.Additionally or alternatively, it is convenient to omit certain steps, by multiple steps It merges into a step to execute, and/or a step is decomposed into execution of multiple steps.

As shown in Figure 1, 2, in step 110, file in file read module reads thread 210 and passes through reads line by line Mode reads data from data source file 200.Wherein, since file reading speed in this step is far longer than subsequent The speed of data processing and storage in step, therefore be configured to only a file in this programme and read thread to read data source Data in file 200.

In the step 120, file reads thread 210 and the data read from data source file 200 is written to simultaneously In the initial data queue 220 of starting.The depth of the queue 220 can be customized by the user.In a preferred embodiment, The depth of the initial data queue 220, which can be according to the hardware configuration of data source file and gatherer, to be determined.Work as number According to the comparison that when source file is larger and/or the memory of gatherer is larger, the depth of the initial data queue 220 can be set It is deep.In a specific embodiment, the depth of the initial data queue 220 is 5000, it may be assumed that the initial data queue can be protected Deposit 5000 row data.

It, should after the data in initial data queue 220 are read by subsequent thread (data processing inbound thread 230) Data will be deleted, and file reading thread 210 will obtain new data from data source file 200 and be put into initial data In queue 220, which will be implemented repeatedly until the data of data source file 200 are finished by all readings.

In step 130, entering to set multiple data processing inbound threads 230 in library module in batches, at multiple data It is all identical for managing the function of inbound thread 230.The data processing inbound thread 230 from initial data queue 220 for reading Access evidence, and data are written to after the pre-treatment in database 240, which will be implemented repeatedly until initial data queue Data are all imported into and finish in 220.In the present invention, which is the database for meeting JDBC specification, such as: INFORMIX database or MYSQL database.

Referring to Fig. 3, a pretreatment queue 330 is respectively equipped in each data processing inbound thread 230.At data The processing speed for managing inbound thread 230 is slower, therefore the depth for pre-processing queue 330 is much smaller than the depth of initial data queue 220 Degree.

In a preferred embodiment, the depth of queue 330 and/or the number of data processing inbound thread 230 are pre-processed Amount can be configured according to system performance.For example, the depth and/or data processing inbound thread 230 of pretreatment queue 330 Quantity can be set according to the data processing speed of data processing inbound thread 230.When the number of data processing inbound thread 230 When very fast according to processing speed, the depth for pre-processing queue 330 is deeper, the negligible amounts of data processing inbound thread 230；Work as data When the data processing speed of processing inbound thread 230 is slower, the depth for pre-processing queue 330 is shallower, data processing inbound thread 230 quantity is more.

It is to be noted that there is no specific sequencing between step 110 and step 130, the two can be real simultaneously It applies, can also successively implement.

In step 140, each data processing inbound thread 230 reads predetermined number from initial data queue 220 respectively The data of amount, and pre-processed one by one.Wherein, the predetermined quantity can be determined according to the depth of pretreatment queue 330, The predetermined quantity is equal to the depth of pretreatment queue 330 in the present embodiment.The pretreatment includes according to pretreatment interface The data read from initial data queue 220 are converted to required format.In the present embodiment, the pretreatment can be with Data are assembled into JDBC parameter list required for PreparedStatement.Pretreated data will be more readily It is directed into JDBC database.

In step 150, each data processing inbound thread 230 is respectively stored in pretreated data to respective pre- It handles in queue 330.And after the pretreatment queue 330 has been expired, in a step 160,230 meeting of data processing inbound thread Data in the pretreatment queue are written in the database 240.Wherein, multiple data processing inbound threads 230 it Between be that concurrently pretreated data are written in database 240, in such a way that this is written in parallel to, data processing and The time of write-in is compressed, to improve the efficiency of data importing.

In step 170, after the data in respective pretreatment queue 330 are imported into database 240, data processing Inbound thread 230 empties the data in the pretreatment queue 330 respectively.And in step 180, data processing inbound thread Whether there are also untreated data in 230 confirmation initial data queues 220: if there is untreated data, then repeating Step 140 is executed to 180, until the data in initial data queue 220 are all directed in database 240.

In a step 160, when each data processing inbound thread 230 writes data into database 240, sometimes There is a situation where write-ins to malfunction.For the situation, Fig. 4 shows the side that processing data disclosed according to the present invention import error Method flow chart, the step 160 specifically includes the following steps:

In the step 161, each data processing inbound thread 230 can be extracted from respective pretreatment queue 330 It pre-processes and needs to be written into the data in database 240.

Confirm the data with the presence or absence of mistake in step 162.

It is implemented the steps of when finding that the data being extracted have mistake: in step 163, confirming the mistake pre- Handle the position in queue 330；In step 164, the data not malfunctioned before the position are written to database 240 In；In step 165, the object information of the error data is written in error queue 340, wherein the object information includes: Location information (such as: the data in data source file 200 line number or the wrong data pretreatment queue 330 in Location number), content, any information relevant to the wrong data of error reason etc.；In step 166, recorded mistake Data dump, it may be assumed that the data in pretreatment queue 330 on deletion error position with before.It finally returns into step 161, To continue to extract remaining data.

In a specific embodiment, which is handled by subsequent error handling processing thread, the processing Mode includes: that the object information is written in specified file, database or JMS, or according to user-defined format Export the information in the error queue.

When finding implementation steps 167: the data of extraction there is no when mistake, are then written to data by the data being extracted In library 240.

Finally, the result of the data write-in and the result of error message can be exported by customized mode, such as Direct output screen, preservation file, preservation database etc..

Illustrate processing of the data processing inbound thread to wrong data below by way of a specific example: for example, one In the pretreatment queue that a depth is 50, there are mistakes for the data on the 20th position in the queue, then data processing is put in storage Thread can successively extract data from pretreatment queue, and when extracting the 20th data, the discovery of data processing inbound thread should There are mistakes for data, then the thread can stop continuing to extract data, but the data on the position 1-19 before are written to In database, and the object information of the data on the 20th position is written to error queue, then on the 1-20 position Data delete.Data processing inbound thread continues to extract data since on the 21st position later, until all data mention It takes and finishes and be written in database.

It is obvious to a person skilled in the art that invention is not limited to the details of the above exemplary embodiments, Er Qie In the case where without departing substantially from spirit or essential attributes of the invention, the present invention can be realized in other specific forms.Therefore, no matter How from the point of view of, the present embodiments are to be considered as illustrative and not restrictive.In addition, it will be evident that one word of " comprising " not Exclude other elements and steps, and wording "one" be not excluded for plural number.The multiple element stated in device claim can also To be implemented by one element.The first, the second equal words are used to indicate names, and are not indicated any particular order.

Claims

1. a kind of data in data source file imported into the method in database, comprising the following steps:

A. thread is read via a file, reads the data in the data source file；

B. the data are stored in line by line into initial data queue；

C. the identical data processing inbound thread of multiple functions is set；

D. each data processing inbound thread respectively from the initial data queue read predetermined quantity data, and by Item is pre-processed, and then the pretreated data is stored in into respective pretreatment queue, then respectively described pre- It is written in the database with handling the data parallel in queue, wherein the depth of the pretreatment queue is much smaller than the original The depth of beginning data queue, and the pretreatment team is set according to the data processing speed of the data processing inbound thread The depth of column；And

It is read, deletes described original by the multiple data processing inbound thread in response to the data in the initial data queue The data in data queue；Thread, which is read, via file reads the new data in the data source file；

Wherein the data parallel in the pretreatment queue is written in the database described in the step D and is specifically wrapped It includes:

I. data successively are extracted from the pretreatment queue, wherein the data being extracted are by the data processing inbound thread Deposit is into the pretreatment queue；

II. the data being extracted described in confirmation are with the presence or absence of mistake；

When there is an error, implementation steps:

I. position of the wrong data in the pretreatment queue is determined；

Ii. the data before the position are written in the database；

Iii. the object information of the wrong data on the position is written in error queue；

Iv. the data on the position with before are deleted in the pretreatment queue；

V. step I is executed；

When there is no mistake: the data of extraction are written in the database.

2. according to the method described in claim 1, wherein, the step D is also specifically included:

Each data processing inbound thread empties the data in the pretreatment queue；And each data processing enters Library thread confirms whether there are also untreated data in the initial data queue respectively: if there is untreated number According to, then each data processing inbound thread respectively from the initial data queue read predetermined quantity data, and by Item is pre-processed.

3. according to the method described in claim 1, wherein, the object information includes: location information, content, error reason.

4. according to the method described in claim 1, wherein, further comprising the steps of: through error handling processing thread and handling the error Queue.

5. according to the method described in claim 1, wherein, exporting the letter in the error queue according to user-defined format Breath.

6. according to the method described in claim 1, wherein, the depth of the depth of the initial data queue and the pretreatment queue Degree is configurableization.

7. according to the method described in claim 6, wherein reading predetermined number from the initial data queue in the step D The data of amount, and carry out pretreatment one by one and include:

The predetermined quantity is determined based on the depth of the pretreatment queue.

8. according to the method described in claim 6, wherein, the depth of the pretreatment queue is 50.

9. the method according to claim 1, wherein the database is the database for meeting JDBC specification.

10. according to the method described in claim 9, it is characterized in that, the database is INFORMIX database or MYSQL Database.