CN103049533A - Method for quickly loading data into database - Google Patents
- Publication number
- CN103049533A
- Authority
- CN
- China
- Prior art keywords
- data
- database
- loading
- file
- fast
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Abstract
The invention discloses a method for quickly loading data into a database. When a data file is written into the database, the data file is loaded in parallel, and a direct-write mode is used during loading: once tuples are generated, they are written directly into the data file of the database. Parallel threads raise CPU utilization, and the configured direct-write mode skips the various transaction checks normally performed when a data file is written into the database, so the write efficiency of data files is effectively improved.
Description
Technical field
The present invention relates to a method for quickly loading data files into a database, and belongs to the field of database technology.
Background art
With the widespread adoption of Internet applications, accessing and storing massive data has become a bottleneck in database system design. The data write interfaces of traditional databases mostly work in a single-threaded manner, which is inefficient when writing massive data. Existing database servers, however, generally use multi-core CPUs, so single-threaded writing wastes a large share of CPU resources. In addition, when external data is written into the database through the data write interface, the database system usually performs multiple transaction checks. These checks also reduce the write efficiency of data files.
Chinese patent application No. 200910080927.X discloses a method and system for importing batch data into a database. In that technical solution, the process that analyzes the data in a data file runs concurrently with the process that writes the analyzed data into the database. Analyzed data is stored in a cache until analysis is complete. Whenever the data in the cache reaches a preset amount, that data is written to the database in one operation and deleted from the cache; after analysis is complete, all data remaining in the cache is written to the database in one operation. With this solution, data is analyzed and written quickly, which makes it particularly suitable for importing massive data into a database.
In addition, Ma Li et al., in the paper "A Method for Quickly Reading Mass Data in a Multi-core Environment" (published in the Proceedings of the 16th National Conference on Information Storage Technology (IST2010), 2010), point out that with the development of multi-core computers, multi-core PCs can already complete many large-scale computing tasks. When processing massive data, however, reading data from memory and secondary storage often becomes the bottleneck that limits application speed, so the superior hardware performance of multi-core systems cannot be fully exploited. The paper proposes a method for fast extraction of massive data in a multi-core environment. It is based on memory-mapped files and uses partitioning at view-mapping granularity together with load balancing that combines dynamic and static strategies, achieving high-speed parallel extraction of massive data on multi-core platforms.
Summary of the invention
The technical problem to be solved by the present invention is to provide a method for quickly loading data into a database. The method can significantly improve the loading efficiency of data files through technical means such as parallel threads and direct writing of data.
To achieve the above object, the present invention adopts the following technical solution:
A method for quickly loading data into a database: in the process of writing a data file into the database, the data file is loaded in parallel; during loading, a direct-write mode is adopted, in which tuples, once generated, are written directly into the data file of the database.
Preferably, before the data file is loaded, a configuration file is first prepared, and the number of parallel threads is set in the configuration file according to the hardware conditions of the database server.
Preferably, in the process of writing the data file into the database, the indexes of the database table are either maintained in real time or regenerated after data loading is complete.
Preferably, after the database management system (DBMS) parses the configuration file and creates the loading environment, the data file is parsed first: according to the type of the data file, a reading thread reads one record that conforms to the database table, writes the record into a data slot, and hands the data slot to a parsing thread; the parsing thread parses the data into basic types recognized by the database and places them into a data slot; a composing thread reads the data from the data slot and composes a tuple; after the tuple is composed, it is written according to the write mode and index handling mode specified in the configuration file.
Preferably, if an exception occurs while the data file is being loaded, the configuration file determines whether the exception is ignored. If it is ignored, the reading thread skips the offending record and continues with the next record; if it is not ignored, the data loading process exits.
Preferably, after the data loading process exits, if the loaded data is not to be kept, the database table is operated on so that the loaded data becomes invisible.
The fast database loading method provided by the present invention improves CPU utilization through parallel threads and, by configuring the direct-write mode, skips the various transaction checks performed when a data file is written, so the write efficiency of data files can be effectively improved.
Description of drawings
Fig. 1 is a schematic diagram of the overall architecture of the fast database loading method provided by the present invention;
Fig. 2 is a schematic diagram of the configuration file parsing process;
Fig. 3 is a schematic diagram of the data file writing process.
Embodiment
The technical idea of the present invention is to improve the loading efficiency of data files through technical means such as parallel threads and direct writing of data. In a specific implementation, the configuration file on the one hand specifies that data is loaded in parallel and sets the number of parallel threads; on the other hand, it specifies the direct-write (Direct) mode, which removes the layers of checks in the data file writing process. Together these measures effectively raise the loading speed of data files. A detailed explanation follows with reference to the accompanying drawings.
As shown in Fig. 1, after the database starts, if a data file needs to be written quickly into the database, a configuration file for the writing process is first prepared according to the characteristics of the data file, the database table, and the hardware conditions. The configuration file can set the degree of parallelism, the write mode, the write target, the address of the data file, index support, log support, and the way exceptions that occur during loading are handled.
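The configuration items listed above can be pictured as a small record type. The following Python sketch is illustrative only: the patent names the items but not their field names, value encodings, or file syntax, so every identifier here (`LoadConfig`, `WriteMode`, `IndexMode`, and all fields) is a hypothetical choice.

```python
from dataclasses import dataclass
from enum import Enum


class WriteMode(Enum):
    BUFFER = "buffer"   # tuples pass through transaction checks first
    DIRECT = "direct"   # tuples go straight into the table's data file


class IndexMode(Enum):
    MAINTAIN = "maintain"  # update indexes while loading
    REBUILD = "rebuild"    # regenerate indexes after loading finishes


@dataclass
class LoadConfig:
    # All field names are hypothetical; the patent lists the items only.
    parallelism: int            # number of parallel threads
    write_mode: WriteMode       # buffered or direct write
    target_table: str           # write target
    data_file: str              # address of the data file
    file_format: str            # "CSV", "TEXT" or "BINARY"
    index_mode: IndexMode       # index support
    logging: bool               # log support
    skip_errors: bool           # skip bad records or exit on the first one
    max_skipped: int            # upper limit on skipped records
    keep_loaded_on_abort: bool  # keep or discard rows loaded before an exit

    def validate(self) -> None:
        """Reject settings that cannot describe a usable load."""
        if self.parallelism < 1:
            raise ValueError("parallelism must be at least 1")
        if self.file_format not in {"CSV", "TEXT", "BINARY"}:
            raise ValueError(f"unsupported file format: {self.file_format}")
```

Grouping the settings in one validated object mirrors the role the configuration file plays in the description: everything the loader needs is decided before any data is read.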
The degree of parallelism (single-threaded or multi-threaded) is configured according to the hardware conditions of the database server on which the database runs, such as the number of CPU cores. Configuring the degree of parallelism appropriately improves CPU utilization. In the present invention, data is preferably written by multiple threads in parallel, and the number of parallel threads is determined by a configuration parameter in the configuration file.
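One way to relate the configured thread count to the hardware is sketched below in Python. The function name `choose_parallelism` and the policy of capping the configured value at the number of CPU cores are assumptions made for illustration; the patent only states that the value is chosen according to hardware conditions such as the core count.

```python
import os
from typing import Optional


def choose_parallelism(configured: Optional[int] = None) -> int:
    """Return the number of loader threads to start."""
    cores = os.cpu_count() or 1   # cpu_count() may return None
    if configured is None:
        return cores              # no setting: one thread per core
    # Assumed policy: at least one thread, at most one per core.
    return max(1, min(configured, cores))
```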
The write mode is selected according to the required write efficiency and the condition of the data file itself; for example, the buffered-write (Buffer) mode or the direct-write (Direct) mode can be selected. In buffered-write mode, a tuple is written into the data file only after operations such as transaction checks. In direct-write mode, operations such as transaction checks are not needed, and the tuple can be written directly into the data file. If the data file to be written already satisfies the transaction check requirements of the write target (the database table) and higher loading efficiency is needed, the direct-write mode is preferred: once generated, tuples are written directly into the data file of the database.
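The difference between the two write modes can be illustrated with a toy model in Python. This is a sketch under stated assumptions, not the patented implementation: the `Table` class, the representation of transaction checks as predicate functions, and the in-memory list standing in for the table's data file are all hypothetical.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]


class Table:
    """Toy stand-in for a database table backed by a data file."""

    def __init__(self, checks: List[Callable[[Row], bool]]):
        self.checks = checks        # stand-ins for transaction checks
        self.rows: List[Row] = []   # stand-in for the table's data file
        self.checks_run = 0         # counts how many checks were executed

    def write_buffered(self, row: Row) -> bool:
        # Buffer mode: every check must pass before the tuple is stored.
        for check in self.checks:
            self.checks_run += 1
            if not check(row):
                return False
        self.rows.append(row)
        return True

    def write_direct(self, row: Row) -> bool:
        # Direct mode: the tuple is appended without running any check.
        self.rows.append(row)
        return True


def write(table: Table, row: Row, mode: str) -> bool:
    """Dispatch a tuple to the write path selected in the configuration."""
    if mode == "direct":
        return table.write_direct(row)
    if mode == "buffer":
        return table.write_buffered(row)
    raise ValueError(f"unknown write mode: {mode}")
```

The sketch also shows why the description restricts direct writing to data that already satisfies the target table's requirements: in direct mode a row that would fail a check is stored anyway.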
In the present invention, the data file must conform to a fixed standard format, which can be one of three: CSV, TEXT, or Binary. Specifying the file format in advance in the configuration file determines how the data is parsed in subsequent operations.
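Selecting the parsing routine from the configured format might look like the following Python sketch. The patent does not define the TEXT or Binary layouts, so two assumptions are made here purely for illustration: TEXT is treated as tab-separated lines, and Binary as fixed-width records described by a `struct` layout string. The function names and the `READERS` table are likewise hypothetical.

```python
import csv
import io
import struct
from typing import Iterator, List, Tuple


def read_csv(data: bytes) -> Iterator[List[str]]:
    # Comma-separated values, handled by the standard csv module.
    yield from csv.reader(io.StringIO(data.decode("utf-8")))


def read_text(data: bytes) -> Iterator[List[str]]:
    # Assumption: one record per line, fields separated by tabs.
    for line in data.decode("utf-8").splitlines():
        yield line.split("\t")


def read_binary(data: bytes, layout: str = "<i8s") -> Iterator[Tuple]:
    # Assumption: fixed-width records described by a struct layout.
    yield from struct.iter_unpack(layout, data)


READERS = {"CSV": read_csv, "TEXT": read_text, "BINARY": read_binary}


def read_records(data: bytes, file_format: str):
    """Pick the reader that matches the format named in the configuration."""
    try:
        reader = READERS[file_format]
    except KeyError:
        raise ValueError(f"unsupported file format: {file_format}") from None
    return reader(data)
```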
Index support covers writing data to a database table that already has indexes. In the present invention, one of two modes can be selected: maintaining the indexes in real time while data is written, or regenerating the indexes after data loading is complete.
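The two index handling modes can be compared with a small Python sketch. It assumes a single-column index modeled as a dictionary from key value to row positions; `IndexedTable`, `load`, and the mode names "maintain" and "rebuild" are hypothetical and only illustrate that both modes end with the same index.

```python
from typing import Dict, Iterable, List, Tuple


class IndexedTable:
    """Toy table with one index on a chosen column."""

    def __init__(self, key_column: int):
        self.key_column = key_column
        self.rows: List[Tuple] = []
        self.index: Dict[object, List[int]] = {}  # key value -> row positions

    def append(self, row: Tuple, maintain_index: bool) -> None:
        position = len(self.rows)
        self.rows.append(row)
        if maintain_index:
            # Real-time maintenance: update the index with every row.
            self.index.setdefault(row[self.key_column], []).append(position)

    def rebuild_index(self) -> None:
        # Deferred mode: build the whole index in one pass after loading.
        self.index = {}
        for position, row in enumerate(self.rows):
            self.index.setdefault(row[self.key_column], []).append(position)


def load(table: IndexedTable, rows: Iterable[Tuple], index_mode: str) -> None:
    if index_mode not in ("maintain", "rebuild"):
        raise ValueError(f"unknown index mode: {index_mode}")
    for row in rows:
        table.append(row, maintain_index=(index_mode == "maintain"))
    if index_mode == "rebuild":
        table.rebuild_index()
```

In this model the deferred mode keeps index work out of the per-row path, at the cost of the index being out of date until loading ends.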
After the configuration file has specified information such as the target database table, the write mode, and the storage address of the data to be written, a database system function passes the configuration information to the database management system (DBMS). Referring to Fig. 1, the DBMS uses a parser to parse the data according to the information in the configuration file; the data being parsed can be of any of the three file types CSV, TEXT, and Binary. A writer then composes the data into tuples and writes them. The parsing and composing steps are preferably executed by multiple threads in parallel to improve the efficiency of data generation and thereby the loading speed of the data file.
As shown in Fig. 2, the configuration file enters through the database system interface, and the DBMS is responsible for parsing it. The DBMS obtains the configuration information, creates the loading environment according to the configured content, creates the thread information and starts the parallel threads, and creates the data slot information used for loading. The parallel threads each have a different subtask and include a reading thread, a parsing thread, a composing thread, and a writing thread, among others.
As shown in Fig. 3, after the DBMS parses the configuration file and creates the loading environment, the data file is parsed first. According to the type of the data file, the reading thread reads one record that conforms to the database table, writes it into a data slot, and hands the data slot to the parsing thread. The parsing thread parses the data into basic types that the database can recognize and places them into a data slot; the composing thread then reads the data from the data slot and combines it into a tuple. After the tuple is composed, it is written according to the write mode and index handling mode specified in the configuration file.
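The read, parse, compose, and write stages can be sketched as a thread pipeline in Python. This is a minimal sketch under assumptions, not the patented implementation: the data slots are modeled as bounded `queue.Queue` objects, records are comma-separated lines, one thread runs per stage so that record order is preserved, and the names `stage`, `load_lines`, and `DONE` are hypothetical.

```python
import queue
import threading
from typing import Callable, List, Tuple

DONE = object()  # sentinel marking the end of the stream


def stage(inbox: queue.Queue, outbox: queue.Queue, work: Callable) -> threading.Thread:
    """Start a thread that applies `work` to each item moving between slots."""
    def run() -> None:
        while True:
            item = inbox.get()
            if item is DONE:
                outbox.put(DONE)   # pass the end marker downstream
                return
            outbox.put(work(item))
    thread = threading.Thread(target=run)
    thread.start()
    return thread


def load_lines(lines: List[str], column_types: List[Callable]) -> List[Tuple]:
    # Three bounded queues stand in for the data slots between stages.
    raw, fields, tuples = queue.Queue(8), queue.Queue(8), queue.Queue(8)
    table: List[Tuple] = []   # stand-in for the table's data file

    def read() -> None:       # reading thread: one record per line
        for line in lines:
            raw.put(line)
        raw.put(DONE)

    def parse(line: str) -> list:   # parsing thread: text -> basic types
        return [cast(text) for cast, text in zip(column_types, line.split(","))]

    def write() -> None:      # writing thread: append tuples to the table
        while (item := tuples.get()) is not DONE:
            table.append(item)

    threads = [
        threading.Thread(target=read),
        stage(raw, fields, parse),
        stage(fields, tuples, tuple),   # composing thread: values -> tuple
        threading.Thread(target=write),
    ]
    threads[0].start()
    threads[3].start()
    for thread in threads:
        thread.join()
    return table
```

For example, `load_lines(["1,alice", "2,bob"], [int, str])` passes each line through all four stages. Error handling is left out of this sketch, so a record that fails to parse would stall the pipeline.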
If an exception occurs during the above loading process, the configuration in the configuration file determines whether the exception is ignored. If it is ignored, the reading thread skips the offending record and continues with the next record; if not, the data loading process exits. After the loading process exits, the configuration file also determines whether the data already loaded is kept. If it is not kept, the database table is operated on so that the loaded data becomes invisible. In addition, if the number of skipped exceptions exceeds the configured upper limit during loading, the data file contains many errors. In that case, the data already loaded is handled in the way the configuration file specifies: it is either kept or removed.
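The exception policy described above can be sketched in Python as follows. The function and parameter names are hypothetical, parse failures are modeled as `ValueError`, and making loaded data invisible is approximated by truncating an in-memory list back to its length before the load; the patent does not say how invisibility is achieved inside the database.

```python
from typing import Callable, Iterable, List, Tuple


class LoadAborted(Exception):
    """Raised when the load exits because of an error that is not skipped."""


def load_with_error_policy(
    records: Iterable[str],
    parse: Callable[[str], Tuple],
    table: List[Tuple],
    skip_errors: bool,
    max_skipped: int,
    keep_loaded_on_abort: bool,
) -> int:
    """Append parsed records to `table`; return how many records were skipped."""
    first_new = len(table)   # rows before this position predate the load
    skipped = 0
    for record in records:
        try:
            row = parse(record)
        except ValueError as exc:
            if skip_errors and skipped < max_skipped:
                skipped += 1   # ignore the error, move to the next record
                continue
            if not keep_loaded_on_abort:
                # Stand-in for making the loaded data invisible.
                del table[first_new:]
            raise LoadAborted(f"load aborted at record {record!r}") from exc
        table.append(row)
    return skipped
```

Rows that were in the table before the load started are never touched, which matches the description: only the data loaded by the aborted run is kept or hidden.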
The fast database loading method proposed by the present invention has been described in detail above. For a person skilled in the art, any obvious modification made without departing from the essence of the present invention will constitute an infringement of the patent right of the present invention and will incur the corresponding legal liability.
Claims (9)
1. A method for quickly loading data into a database, characterized in that:
in the process of writing a data file into the database, the data file is loaded in parallel; during loading, a direct-write mode is adopted, in which tuples, once generated, are written directly into the data file of the database.
2. The method for quickly loading data into a database as claimed in claim 1, characterized in that:
before the data file is loaded, a configuration file is first prepared, and the number of parallel threads is set in the configuration file according to the hardware conditions of the database server.
3. The method for quickly loading data into a database as claimed in claim 1, characterized in that:
when the direct-write mode is adopted, no transaction checks are performed.
4. The method for quickly loading data into a database as claimed in claim 1, characterized in that:
in the process of writing the data file into the database, the indexes of the database table are maintained in real time.
5. The method for quickly loading data into a database as claimed in claim 1, characterized in that:
in the process of writing the data file into the database, the indexes of the database table are regenerated after data loading is complete.
6. The method for quickly loading data into a database as claimed in claim 2, 4 or 5, characterized in that:
the index handling mode is set in the configuration file.
7. The method for quickly loading data into a database as claimed in any one of claims 2 to 5, characterized in that:
after the database management system (DBMS) parses the configuration file and creates the loading environment, the data file is parsed first: according to the type of the data file, a reading thread reads one record that conforms to the database table, writes the record into a data slot, and hands the data slot to a parsing thread; the parsing thread parses the data into basic types recognized by the database and places them into a data slot; a composing thread reads the data from the data slot and composes a tuple; after the tuple is composed, it is written according to the write mode and index handling mode specified in the configuration file.
8. The method for quickly loading data into a database as claimed in claim 1 or 2, characterized in that:
if an exception occurs while the data file is being loaded, the configuration file determines whether the exception is ignored; if it is ignored, the reading thread skips the offending record and continues with the next record; if it is not ignored, the data loading process exits.
9. The method for quickly loading data into a database as claimed in claim 8, characterized in that:
after the data loading process exits, if the loaded data is not to be kept, the database table is operated on so that the loaded data becomes invisible.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN2012105660757A CN103049533A (en) | 2012-12-23 | 2012-12-23 | Method for quickly loading data into database |
Publications (1)
Publication Number | Publication Date |
---|---|
CN103049533A true CN103049533A (en) | 2013-04-17 |
Family
ID=48062174
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN2012105660757A Pending CN103049533A (en) | 2012-12-23 | 2012-12-23 | Method for quickly loading data into database |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN103049533A (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20070214167A1 (en) * | 2006-02-16 | 2007-09-13 | Sushil Nair | Method for fast bulk loading data into a database while bypassing exit routines |
CN101515291A (en) * | 2009-03-26 | 2009-08-26 | 北京泰合佳通信息技术有限公司 | Method for leading data into database in a batch way and system thereof |
CN101719143A (en) * | 2009-12-01 | 2010-06-02 | 北京中科创元科技有限公司 | Method for parallel processing compare increment data extraction |
CN101996244A (en) * | 2010-11-09 | 2011-03-30 | 中兴通讯股份有限公司 | Device, system and method for inputting batch data into database |
CN102004743A (en) * | 2009-09-02 | 2011-04-06 | 中国银联股份有限公司 | System and method for copying data among heterogeneous databases |
Cited By (22)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103246537A (en) * | 2013-05-08 | 2013-08-14 | 汉柏科技有限公司 | Method and device for quickly loading configuration files by multiple threads |
CN103246537B (en) * | 2013-05-08 | 2017-06-06 | 汉柏科技有限公司 | The method and device of the quick loading configuration file of multithreading |
CN103593440A (en) * | 2013-11-15 | 2014-02-19 | 北京国双科技有限公司 | Method and device for reading and writing log file |
CN103841196B (en) * | 2014-03-07 | 2017-05-17 | 长沙裕邦软件开发有限公司 | File uploading system and method based on multithreading |
CN104376082A (en) * | 2014-11-18 | 2015-02-25 | 中国建设银行股份有限公司 | Method for importing data in data source file to database |
CN104376082B (en) * | 2014-11-18 | 2019-06-18 | 中国建设银行股份有限公司 | A method of the data in data source file are imported into database |
CN104484456A (en) * | 2014-12-29 | 2015-04-01 | 哈尔滨工业大学 | Multi-threading parallel-based rapid loading method for SQLite database |
CN104700217A (en) * | 2015-03-17 | 2015-06-10 | 歌尔声学股份有限公司 | Method and system for collecting test data |
CN105354320A (en) * | 2015-11-16 | 2016-02-24 | 天津南大通用数据技术股份有限公司 | Method and device for rapidly loading multiple data files |
CN105468705A (en) * | 2015-11-18 | 2016-04-06 | 广东南方通信建设有限公司 | Mobile communication background data file importing method |
CN106909554A (en) * | 2015-12-22 | 2017-06-30 | 亿阳信通股份有限公司 | A kind of loading method and device of database text table data |
CN106909554B (en) * | 2015-12-22 | 2020-08-04 | 亿阳信通股份有限公司 | Method and device for loading database text table data |
CN108279943B (en) * | 2017-01-05 | 2020-09-11 | 腾讯科技(深圳)有限公司 | Index loading method and device |
CN108279943A (en) * | 2017-01-05 | 2018-07-13 | 腾讯科技(深圳)有限公司 | Index loading method and device |
CN106934037A (en) * | 2017-03-15 | 2017-07-07 | 郑州云海信息技术有限公司 | A kind of high concurrent realizes the method that database quickly loads data |
CN107885460A (en) * | 2017-10-12 | 2018-04-06 | 北京人大金仓信息技术股份有限公司 | A kind of data access method of cluster |
CN110362617A (en) * | 2019-06-24 | 2019-10-22 | 北京人大金仓信息技术股份有限公司 | Batch data method and system is quickly exported from database based on more concurrent technologies |
CN110347440A (en) * | 2019-06-24 | 2019-10-18 | 北京人大金仓信息技术股份有限公司 | Data method and system are quickly loaded to database based on multi-course concurrency and plug-in unit |
CN110347440B (en) * | 2019-06-24 | 2022-06-03 | 北京人大金仓信息技术股份有限公司 | Method and system for rapidly loading data to database based on multi-process concurrence and plug-in |
CN111104373A (en) * | 2019-12-24 | 2020-05-05 | 天地伟业技术有限公司 | Database performance optimization method |
CN111104373B (en) * | 2019-12-24 | 2023-09-19 | 天地伟业技术有限公司 | Database performance optimization method |
CN116257493A (en) * | 2022-12-29 | 2023-06-13 | 北京京桥热电有限责任公司 | OPC (optical clear control) network gate penetrating interface based on caching mechanism |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20130417 |
RJ01 | Rejection of invention patent application after publication |