CN104376082B - A method of the data in data source file are imported into database - Google Patents
A method of the data in data source file are imported into database Download PDFInfo
- Publication number
- CN104376082B CN104376082B CN201410658208.2A CN201410658208A CN104376082B CN 104376082 B CN104376082 B CN 104376082B CN 201410658208 A CN201410658208 A CN 201410658208A CN 104376082 B CN104376082 B CN 104376082B
- Authority
- CN
- China
- Prior art keywords
- data
- queue
- database
- pretreatment
- thread
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Links
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/25—Integrating or interfacing systems involving database management systems
- G06F16/258—Data format conversion from or to a database
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a kind of data in data source file to imported into the method in database, comprising the following steps: A. reads the data in the data source file;B. the data are stored in line by line into initial data queue;C. multiple data processing inbound threads are set;D. the data in the initial data queue are pre-processed by the multiple data processing inbound thread, and the pretreated data parallel is written in the database by the multiple data processing inbound thread.The present invention realizes that data parallel imports by multithreading, takes full advantage of hardware resource, so that storage efficiency is improved, so that the speed that data import greatly improves.
Description
Technical field
The present invention relates to field of data storage, in particular it relates to which a kind of data in data source file import
Method into database.
Background technique
With the continuous growth of application system quantity, a large amount of data import demand and give birth to therewith, and existing database (example
Such as, INFORMIX, MYSQL etc.) simple regular derivative tool in order is only provided, not to the utmost such as people in efficiency and scalability
Meaning.
Database used in current each bank is mostly INFORMIX database, and the frequently-used data that official provides is led
Entering method has load and two kinds of dbload, but all there is significant limitations in terms of data processing for both methods:
Firstly, the boot speed of both methods is slower.By taking the data file that the every row 27 of 1,000,000 rows arranges as an example: passing through
When load mode imported into the data file in database, reporting an error in any importing process can all be caused because data volume is excessive
Lead to Database lock resource exhaustion and rollback.When the data file being imported into database by dbload mode, need
It is 184440 milliseconds time-consuming.
It is imported secondly, both methods can only implement linear read to text file.Load and dbload is due to being official
The tool of offer, therefore not expansible, can only by every row order linear carry out data importing.
The wrong output format of third, both methods not can customize.The error message of load method is output at screen
It on curtain, and once malfunctions, data are accordingly to be regarded as malfunctioning by the gross, and program does not continue to processing data and imports.Although dbload
Can with customized error log name, but format be it is fixed, cannot achieve customized.
Based on the above reasons, it is badly in need of a kind of with higher storage efficiency, and drawbacks described above can be overcome, data are led
Enter method, to meet the needs of big data quantity importing.
Summary of the invention
With the enterprise application system of big data era, Data Migration or processing become more frequently, traditional data
Lead-in mode is no longer satisfied existing demand.The importing of millions data volume needs to take several hours easily, therefore number
According to importing efficiency it is in urgent need to be improved.In consideration of it, the present invention provides a kind of data in data source file to imported into database
In method.
The invention discloses a kind of data in data source file to imported into the method in database, including following step
Rapid: A. reads the data in the data source file;B. the data are stored in line by line into initial data queue;C. it sets more
A data processing inbound thread;D. by the multiple data processing inbound thread to the data in the initial data queue into
Row pretreatment, and the pretreated data parallel is written to by the multiple data processing inbound thread by the number
According in library.
Particularly, the step D is specifically included: each data processing inbound thread of a. is respectively from the initial data
The data of predetermined quantity are read in queue, and are pre-processed one by one;B. each data processing inbound thread is respectively institute
Pretreated data are stated to be stored in into respective pretreatment queue;C. each data processing inbound thread is the pre- place
It is written in the database with managing the data parallel in queue;D. each data processing inbound thread empties the pre- place
Manage the data in queue;E. each data processing inbound thread confirms whether there are also not in the initial data queue respectively
Processed data: if there is untreated data, a is thened follow the steps.
Particularly, specifically include in the step c: I. successively extracts the data from the pretreatment queue;II. really
The data being extracted are recognized with the presence or absence of mistake;When there is an error, implementation steps:
Vi. position of the wrong data in the pretreatment queue is determined;
Vii. the data before the position are written in the database;
Viii. the object information of the wrong data on the position is written in error queue;
Ix. the data on the position with before are deleted in the pretreatment queue;
X. step I is executed;
When there is no mistake: the data of extraction are written in the database.
Particularly, the object information includes: location information, content, error reason.
Particularly, it further comprises the steps of: through error handling processing thread and handles the error queue.
Particularly, the information in the error queue is exported according to user-defined format.
Particularly, the depth of the initial data queue and the depth of the pretreatment queue are configurableization.
Particularly, in step a, the predetermined quantity is determined based on the depth of the pretreatment queue.
Particularly, the depth of the pretreatment queue is 50.
Particularly, the database is the database for meeting JDBC specification.
Particularly, the database is INFORMIX database or MYSQL database.
In conclusion disclosed method has the advantages that according to the present invention
1) present invention realizes that data parallel imports by multithreading, takes full advantage of hardware resource, to improve storage effect
Rate, so that the speed that data import greatly improves.By taking the data file that the every row 27 of 1,000,000 rows arranges as an example, institute is public through the invention
The method opened, which completes data importing, only needs 59473 milliseconds;
2) all databases for meeting JDBC specification be present invention can be suitably applied to, to realize that the unified data of integration across database are led
Enter processing;
3) present invention personalized on demand can be handled every data line, get rid of the reading by column that database carries tool
Take non-expandable drawback;And the present invention can customize output as a result, for example direct output screen, preservation file, preservation number
According to library etc..
Detailed description of the invention
By reading a detailed description of non-restrictive embodiments in the light of the attached drawings below, of the invention other
Feature, objects and advantages will become more apparent:
Fig. 1 shows the method stream data in data source file being imported into database disclosed according to the present invention
Cheng Tu;
Fig. 2 shows the processes that the data in data source file are imported into database disclosed according to the present invention to show
It is intended to;
Fig. 3 shows data processing threads disclosed according to the present invention and pretreated data is written in database
Schematic diagram;And
Fig. 4 shows the method flow diagram that processing data disclosed according to the present invention import error.
Specific embodiment
In the following detailed description of the preferred embodiment, reference is constituted to the appended attached drawing of present invention a part.Institute
Attached attached drawing, which has been illustrated by way of example, can be realized specific embodiment.Exemplary embodiment is not intended to
Exhaustive all embodiments according to the present invention.It is appreciated that without departing from the scope of the present invention, can use other
Embodiment can also carry out the modification of structure or logic.Therefore, it is below specific descriptions and it is unrestricted, and this
The range of invention is defined by the claims appended hereto.
Below with reference to attached drawing 1-4, illustrate that a kind of data in data source file imported into the method in database.It needs
It is noted that although the step of describing method in attached drawing with particular order, this does not require that or implies must be by
These operations are executed according to the particular order, or have to carry out shown in whole operation be just able to achieve it is desired as a result, on the contrary,
The step of describing in flow chart, which can change, executes sequence.Additionally or alternatively, it is convenient to omit certain steps, by multiple steps
It merges into a step to execute, and/or a step is decomposed into execution of multiple steps.
As shown in Figure 1, 2, in step 110, file in file read module reads thread 210 and passes through reads line by line
Mode reads data from data source file 200.Wherein, since file reading speed in this step is far longer than subsequent
The speed of data processing and storage in step, therefore be configured to only a file in this programme and read thread to read data source
Data in file 200.
In the step 120, file reads thread 210 and the data read from data source file 200 is written to simultaneously
In the initial data queue 220 of starting.The depth of the queue 220 can be customized by the user.In a preferred embodiment,
The depth of the initial data queue 220, which can be according to the hardware configuration of data source file and gatherer, to be determined.Work as number
According to the comparison that when source file is larger and/or the memory of gatherer is larger, the depth of the initial data queue 220 can be set
It is deep.In a specific embodiment, the depth of the initial data queue 220 is 5000, it may be assumed that the initial data queue can be protected
Deposit 5000 row data.
It, should after the data in initial data queue 220 are read by subsequent thread (data processing inbound thread 230)
Data will be deleted, and file reading thread 210 will obtain new data from data source file 200 and be put into initial data
In queue 220, which will be implemented repeatedly until the data of data source file 200 are finished by all readings.
In step 130, entering to set multiple data processing inbound threads 230 in library module in batches, at multiple data
It is all identical for managing the function of inbound thread 230.The data processing inbound thread 230 from initial data queue 220 for reading
Access evidence, and data are written to after the pre-treatment in database 240, which will be implemented repeatedly until initial data queue
Data are all imported into and finish in 220.In the present invention, which is the database for meeting JDBC specification, such as:
INFORMIX database or MYSQL database.
Referring to Fig. 3, a pretreatment queue 330 is respectively equipped in each data processing inbound thread 230.At data
The processing speed for managing inbound thread 230 is slower, therefore the depth for pre-processing queue 330 is much smaller than the depth of initial data queue 220
Degree.
In a preferred embodiment, the depth of queue 330 and/or the number of data processing inbound thread 230 are pre-processed
Amount can be configured according to system performance.For example, the depth and/or data processing inbound thread 230 of pretreatment queue 330
Quantity can be set according to the data processing speed of data processing inbound thread 230.When the number of data processing inbound thread 230
When very fast according to processing speed, the depth for pre-processing queue 330 is deeper, the negligible amounts of data processing inbound thread 230;Work as data
When the data processing speed of processing inbound thread 230 is slower, the depth for pre-processing queue 330 is shallower, data processing inbound thread
230 quantity is more.
It is to be noted that there is no specific sequencing between step 110 and step 130, the two can be real simultaneously
It applies, can also successively implement.
In step 140, each data processing inbound thread 230 reads predetermined number from initial data queue 220 respectively
The data of amount, and pre-processed one by one.Wherein, the predetermined quantity can be determined according to the depth of pretreatment queue 330,
The predetermined quantity is equal to the depth of pretreatment queue 330 in the present embodiment.The pretreatment includes according to pretreatment interface
The data read from initial data queue 220 are converted to required format.In the present embodiment, the pretreatment can be with
Data are assembled into JDBC parameter list required for PreparedStatement.Pretreated data will be more readily
It is directed into JDBC database.
In step 150, each data processing inbound thread 230 is respectively stored in pretreated data to respective pre-
It handles in queue 330.And after the pretreatment queue 330 has been expired, in a step 160,230 meeting of data processing inbound thread
Data in the pretreatment queue are written in the database 240.Wherein, multiple data processing inbound threads 230 it
Between be that concurrently pretreated data are written in database 240, in such a way that this is written in parallel to, data processing and
The time of write-in is compressed, to improve the efficiency of data importing.
In step 170, after the data in respective pretreatment queue 330 are imported into database 240, data processing
Inbound thread 230 empties the data in the pretreatment queue 330 respectively.And in step 180, data processing inbound thread
Whether there are also untreated data in 230 confirmation initial data queues 220: if there is untreated data, then repeating
Step 140 is executed to 180, until the data in initial data queue 220 are all directed in database 240.
In a step 160, when each data processing inbound thread 230 writes data into database 240, sometimes
There is a situation where write-ins to malfunction.For the situation, Fig. 4 shows the side that processing data disclosed according to the present invention import error
Method flow chart, the step 160 specifically includes the following steps:
In the step 161, each data processing inbound thread 230 can be extracted from respective pretreatment queue 330
It pre-processes and needs to be written into the data in database 240.
Confirm the data with the presence or absence of mistake in step 162.
It is implemented the steps of when finding that the data being extracted have mistake: in step 163, confirming the mistake pre-
Handle the position in queue 330;In step 164, the data not malfunctioned before the position are written to database 240
In;In step 165, the object information of the error data is written in error queue 340, wherein the object information includes:
Location information (such as: the data in data source file 200 line number or the wrong data pretreatment queue 330 in
Location number), content, any information relevant to the wrong data of error reason etc.;In step 166, recorded mistake
Data dump, it may be assumed that the data in pretreatment queue 330 on deletion error position with before.It finally returns into step 161,
To continue to extract remaining data.
In a specific embodiment, which is handled by subsequent error handling processing thread, the processing
Mode includes: that the object information is written in specified file, database or JMS, or according to user-defined format
Export the information in the error queue.
When finding implementation steps 167: the data of extraction there is no when mistake, are then written to data by the data being extracted
In library 240.
Finally, the result of the data write-in and the result of error message can be exported by customized mode, such as
Direct output screen, preservation file, preservation database etc..
Illustrate processing of the data processing inbound thread to wrong data below by way of a specific example: for example, one
In the pretreatment queue that a depth is 50, there are mistakes for the data on the 20th position in the queue, then data processing is put in storage
Thread can successively extract data from pretreatment queue, and when extracting the 20th data, the discovery of data processing inbound thread should
There are mistakes for data, then the thread can stop continuing to extract data, but the data on the position 1-19 before are written to
In database, and the object information of the data on the 20th position is written to error queue, then on the 1-20 position
Data delete.Data processing inbound thread continues to extract data since on the 21st position later, until all data mention
It takes and finishes and be written in database.
It is obvious to a person skilled in the art that invention is not limited to the details of the above exemplary embodiments, Er Qie
In the case where without departing substantially from spirit or essential attributes of the invention, the present invention can be realized in other specific forms.Therefore, no matter
How from the point of view of, the present embodiments are to be considered as illustrative and not restrictive.In addition, it will be evident that one word of " comprising " not
Exclude other elements and steps, and wording "one" be not excluded for plural number.The multiple element stated in device claim can also
To be implemented by one element.The first, the second equal words are used to indicate names, and are not indicated any particular order.
Claims (10)
1. a kind of data in data source file imported into the method in database, comprising the following steps:
A. thread is read via a file, reads the data in the data source file;
B. the data are stored in line by line into initial data queue;
C. the identical data processing inbound thread of multiple functions is set;
D. each data processing inbound thread respectively from the initial data queue read predetermined quantity data, and by
Item is pre-processed, and then the pretreated data is stored in into respective pretreatment queue, then respectively described pre-
It is written in the database with handling the data parallel in queue, wherein the depth of the pretreatment queue is much smaller than the original
The depth of beginning data queue, and the pretreatment team is set according to the data processing speed of the data processing inbound thread
The depth of column;And
It is read, deletes described original by the multiple data processing inbound thread in response to the data in the initial data queue
The data in data queue;Thread, which is read, via file reads the new data in the data source file;
Wherein the data parallel in the pretreatment queue is written in the database described in the step D and is specifically wrapped
It includes:
I. data successively are extracted from the pretreatment queue, wherein the data being extracted are by the data processing inbound thread
Deposit is into the pretreatment queue;
II. the data being extracted described in confirmation are with the presence or absence of mistake;
When there is an error, implementation steps:
I. position of the wrong data in the pretreatment queue is determined;
Ii. the data before the position are written in the database;
Iii. the object information of the wrong data on the position is written in error queue;
Iv. the data on the position with before are deleted in the pretreatment queue;
V. step I is executed;
When there is no mistake: the data of extraction are written in the database.
2. according to the method described in claim 1, wherein, the step D is also specifically included:
Each data processing inbound thread empties the data in the pretreatment queue;And each data processing enters
Library thread confirms whether there are also untreated data in the initial data queue respectively: if there is untreated number
According to, then each data processing inbound thread respectively from the initial data queue read predetermined quantity data, and by
Item is pre-processed.
3. according to the method described in claim 1, wherein, the object information includes: location information, content, error reason.
4. according to the method described in claim 1, wherein, further comprising the steps of: through error handling processing thread and handling the error
Queue.
5. according to the method described in claim 1, wherein, exporting the letter in the error queue according to user-defined format
Breath.
6. according to the method described in claim 1, wherein, the depth of the depth of the initial data queue and the pretreatment queue
Degree is configurableization.
7. according to the method described in claim 6, wherein reading predetermined number from the initial data queue in the step D
The data of amount, and carry out pretreatment one by one and include:
The predetermined quantity is determined based on the depth of the pretreatment queue.
8. according to the method described in claim 6, wherein, the depth of the pretreatment queue is 50.
9. the method according to claim 1, wherein the database is the database for meeting JDBC specification.
10. according to the method described in claim 9, it is characterized in that, the database is INFORMIX database or MYSQL
Database.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410658208.2A CN104376082B (en) | 2014-11-18 | 2014-11-18 | A method of the data in data source file are imported into database |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410658208.2A CN104376082B (en) | 2014-11-18 | 2014-11-18 | A method of the data in data source file are imported into database |
Publications (2)
Publication Number | Publication Date |
---|---|
CN104376082A CN104376082A (en) | 2015-02-25 |
CN104376082B true CN104376082B (en) | 2019-06-18 |
Family
ID=52554989
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201410658208.2A Active CN104376082B (en) | 2014-11-18 | 2014-11-18 | A method of the data in data source file are imported into database |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN104376082B (en) |
Families Citing this family (17)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106469152A (en) * | 2015-08-14 | 2017-03-01 | 阿里巴巴集团控股有限公司 | A kind of document handling method based on ETL and system |
CN105045917B (en) | 2015-08-20 | 2019-06-18 | 北京百度网讯科技有限公司 | A kind of the distributed data restoration methods and device of Case-based Reasoning |
CN105279261B (en) * | 2015-10-23 | 2019-07-02 | 北京京东尚科信息技术有限公司 | Dynamic scalable database filing method and system |
CN105468705A (en) * | 2015-11-18 | 2016-04-06 | 广东南方通信建设有限公司 | Mobile communication background data file importing method |
CN106909554B (en) * | 2015-12-22 | 2020-08-04 | 亿阳信通股份有限公司 | Method and device for loading database text table data |
CN107665195A (en) * | 2016-07-27 | 2018-02-06 | 北京京东尚科信息技术有限公司 | Import the method, apparatus and system of mass data |
CN106354788A (en) * | 2016-08-23 | 2017-01-25 | 无锡天脉聚源传媒科技有限公司 | File reading method and device |
CN106599222B (en) * | 2016-12-19 | 2020-09-04 | 广州四三九九信息科技有限公司 | Method and equipment for processing logs in streaming parallel |
CN108959292B (en) * | 2017-05-19 | 2021-03-30 | 北京京东尚科信息技术有限公司 | Data uploading method, system and computer readable storage medium |
CN107357868A (en) * | 2017-07-03 | 2017-11-17 | 华通信安(北京)科技发展有限公司 | A kind of fast conversion method and device of matlab data formats |
CN107862095B (en) * | 2017-12-07 | 2021-06-01 | 中国银行股份有限公司 | Data processing method and device |
CN108228730A (en) * | 2017-12-11 | 2018-06-29 | 深圳市买买提信息科技有限公司 | Data lead-in method, device, computer equipment and readable storage medium storing program for executing |
CN109033184B (en) * | 2018-06-27 | 2021-08-17 | 中国建设银行股份有限公司 | Data processing method and device |
CN109857832A (en) * | 2019-01-03 | 2019-06-07 | 中国银行股份有限公司 | A kind of preprocess method and device of payment data |
CN110362617B (en) * | 2019-06-24 | 2023-06-13 | 北京人大金仓信息技术股份有限公司 | Method and system for rapidly exporting batch data from database based on multiple concurrency technologies |
CN112328542A (en) * | 2020-11-25 | 2021-02-05 | 天津凯发电气股份有限公司 | Method for importing data in heterogeneous data file into database |
CN115510020A (en) * | 2021-06-23 | 2022-12-23 | 比亚迪股份有限公司 | Data archiving method, electronic equipment and readable storage medium |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6477535B1 (en) * | 1998-11-25 | 2002-11-05 | Computer Associates Think Inc. | Method and apparatus for concurrent DBMS table operations |
CN101515291A (en) * | 2009-03-26 | 2009-08-26 | 北京泰合佳通信息技术有限公司 | Method for leading data into database in a batch way and system thereof |
CN103049533A (en) * | 2012-12-23 | 2013-04-17 | 北京人大金仓信息技术股份有限公司 | Method for quickly loading data into database |
CN103092840A (en) * | 2011-10-28 | 2013-05-08 | 上海邮电设计咨询研究院有限公司 | Method for acquiring self-increment mass data files from multiple sources |
-
2014
- 2014-11-18 CN CN201410658208.2A patent/CN104376082B/en active Active
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6477535B1 (en) * | 1998-11-25 | 2002-11-05 | Computer Associates Think Inc. | Method and apparatus for concurrent DBMS table operations |
CN101515291A (en) * | 2009-03-26 | 2009-08-26 | 北京泰合佳通信息技术有限公司 | Method for leading data into database in a batch way and system thereof |
CN103092840A (en) * | 2011-10-28 | 2013-05-08 | 上海邮电设计咨询研究院有限公司 | Method for acquiring self-increment mass data files from multiple sources |
CN103049533A (en) * | 2012-12-23 | 2013-04-17 | 北京人大金仓信息技术股份有限公司 | Method for quickly loading data into database |
Non-Patent Citations (1)
Title |
---|
"Oracle中如何定位报错的行";lianjie1229;《Linux公社》;20120805;第1-2页 * |
Also Published As
Publication number | Publication date |
---|---|
CN104376082A (en) | 2015-02-25 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN104376082B (en) | A method of the data in data source file are imported into database | |
US11328003B2 (en) | Data relationships storage platform | |
Zhao et al. | Schema conversion model of SQL database to NoSQL | |
CN106844682B (en) | Method for interchanging data, apparatus and system | |
CN103886011B (en) | Social-relation network creation and retrieval system and method based on index files | |
CN111712809A (en) | Learning ETL rules by example | |
US9037525B2 (en) | Correlating data from multiple business processes to a business process scenario | |
EP3182304A1 (en) | Computer-implemented method for storing unlimited amount of data as a mind map in relational database systems | |
CN106164865A (en) | Affairs batch processing for the dependency perception that data replicate | |
CN105260464B (en) | The conversion method and device of data store organisation | |
CN106547918B (en) | Statistical data integration method and system | |
CN106354817B (en) | Log processing method and device | |
CN103778239B (en) | Multi-database data management method and system | |
CN109635024A (en) | A kind of data migration method and system | |
CN107784026A (en) | A kind of ETL data processing methods and device | |
CN103092997B (en) | For interlock inquiry system and the interlock querying method of statement analysis | |
CN103440265A (en) | MapReduce-based CDC (Change Data Capture) method of MYSQL database | |
CN104391891B (en) | A kind of database isomery clone method | |
US20140280218A1 (en) | Techniques for data integration | |
JP2016530646A (en) | Metadata automation system | |
CN108073705B (en) | Distributed mass data aggregation acquisition method | |
US20150178367A1 (en) | System and method for implementing online analytical processing (olap) solution using mapreduce | |
CN109376154B (en) | Data reading and writing method and data reading and writing system | |
US20180196858A1 (en) | Api driven etl for complex data lakes | |
US10289711B2 (en) | Integrated data analysis |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |