CN103049533A - Method for quickly loading data into database - Google Patents

Method for quickly loading data into database

Info

Publication number
CN103049533A
Authority
CN
China
Prior art keywords
data
database
loading
file
fast
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN2012105660757A
Other languages
Chinese (zh)
Inventor
张树杰
王颖泽
冯玉
李祥凯
任永杰
王珊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Kingbase Information Technologies Co Ltd
Original Assignee
Beijing Kingbase Information Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Kingbase Information Technologies Co Ltd filed Critical Beijing Kingbase Information Technologies Co Ltd
Priority to CN2012105660757A priority Critical patent/CN103049533A/en
Publication of CN103049533A publication Critical patent/CN103049533A/en
Pending legal-status Critical Current

Abstract

The invention discloses a method for quickly loading data into a database. When a data file is written into the database, the data file is loaded in parallel; during loading, a direct-write mode is used, in which tuples are generated and then written directly into the data files of the database. Parallel threads raise CPU (central processing unit) utilization, and the configured direct-write mode skips the transaction checks normally performed when a data file is written into the database, so the write efficiency of the data file can be effectively improved.

Description

A method for quickly loading data into a database
Technical field
The present invention relates to a method for quickly loading data files into a database, and belongs to the field of database technology.
Background art
With the widespread adoption of Internet applications, accessing and storing massive amounts of data has become a bottleneck in database system design. The data write interfaces of traditional databases mostly work in a single-threaded manner, which is inefficient when writing large volumes of data. Existing database servers generally use multi-core CPUs, so single-threaded writing wastes a large share of CPU resources. In addition, when external data is written into the database through the data write interface, the database system usually performs a number of transaction checks, which further reduce the write efficiency of the data file.
Chinese patent application No. 200910080927.X discloses a method and system for importing batch data into a database. In that technical solution, the process that analyzes the data in the data file runs concurrently with the process that writes the analyzed data into the database; analyzed data is stored in a cache until the analysis is complete; whenever the data in the cache reaches a preset amount, it is written into the database in one operation and deleted from the cache; once the analysis is complete, all data remaining in the cache is written into the database in one operation. According to that application, data is analyzed and written quickly, and the solution is particularly suited to importing massive data into a database.
In addition, Ma Li et al., in the paper "A Method for Quickly Reading Mass Data Based on a Multi-core Environment" (published in the proceedings of the 16th National Conference on Information Storage Technology (IST2010), 2010), point out that with the development of multi-core computers, multi-core PCs can now complete many large-scale computing tasks; when processing massive data, however, reading data from memory and auxiliary storage often becomes the bottleneck limiting application speed, so the superior hardware performance of multi-core systems cannot be fully exploited. The paper proposes a fast extraction method for massive data in a multi-core environment which, building on memory-mapped files, uses a partitioning scheme based on view-mapping granularity and a load-balancing strategy that combines dynamic and static balancing to achieve high-speed parallel extraction of massive data on multi-core platforms.
Summary of the invention
The technical problem to be solved by the present invention is to provide a method for quickly loading data into a database. By means of technical measures such as parallel threads and direct data writing, the method can significantly improve the loading efficiency of data files.
To achieve the above object, the present invention adopts the following technical solution:
A method for quickly loading data into a database, in which, in the process of writing a data file into the database, the data file is loaded in parallel; during loading, a direct-write mode is used, and after tuples are generated they are written directly into the data files of the database.
Preferably, before the data file is loaded, a configuration file is first prepared, and the number of parallel threads is set in the configuration file according to the hardware conditions of the database server.
Preferably, in the process of writing the data file into the database, the indexes of the database table are either maintained in real time or regenerated after the data has been loaded.
Preferably, after the database management system (DBMS) parses the configuration file and creates the loading environment, the data file is parsed first: according to the type of the data file, a reading thread reads one record that conforms to the database table, writes it into a data slot, and passes the data slot to a parsing thread; the parsing thread parses the data into basic types recognized by the database and places them into a data slot; a synthesis thread reads the data from the data slot and synthesizes a tuple; after the tuple is synthesized, it is written according to the write mode and index processing mode specified in the configuration file.
Preferably, if an exception occurs while the data file is being loaded, whether to ignore the exception is determined according to the configuration file: if the exception is ignored, the reading thread skips the offending record and continues with the next record; if it is not ignored, the data loading process exits.
Preferably, after the data loading process exits, if the loaded data is not to be retained, the database table is operated on so that the loaded data becomes invisible.
The method for quickly loading database data provided by the present invention improves CPU utilization through parallel threads and, through a configurable direct-write mode, dispenses with the various transaction checks performed when a data file is written, thereby effectively improving the write efficiency of the data file.
Brief description of the drawings
Fig. 1 is a schematic diagram of the overall architecture of the method for quickly loading database data provided by the present invention;
Fig. 2 is a schematic diagram of the parsing process of the configuration file;
Fig. 3 is a schematic diagram of the writing process of the data file.
Detailed description of embodiments
The technical idea of the present invention is to improve the loading efficiency of data files by means of parallel threads and direct data writing. In a specific implementation, on the one hand the configuration file specifies that data is to be loaded in parallel and sets the number of parallel threads; on the other hand, it specifies the direct-write (Direct) mode, which dispenses with the successive layers of checks in the data file writing process. These measures effectively improve the loading speed of data files. A detailed explanation follows with reference to the accompanying drawings.
As shown in Fig. 1, after the database has started, if a data file needs to be written into the database quickly, a configuration file for the writing process is first prepared according to the data file itself, the database table, and the hardware conditions. This configuration file specifies the degree of parallelism, the write mode, the write target, the address of the data file, index support, log support, and how exceptions occurring during loading are handled.
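The items carried by the configuration file can be pictured as a small record type. The following is only a minimal sketch: the patent does not give a file syntax or option names, so every field name here (`parallelism`, `write_mode`, `max_skipped`, and so on), the default values, and the example path are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class WriteMode(Enum):
    BUFFER = "buffer"   # tuples pass through transaction checks before being written
    DIRECT = "direct"   # tuples are written straight into the table's data file


class IndexMode(Enum):
    MAINTAIN = "maintain"  # keep indexes up to date while loading
    REBUILD = "rebuild"    # regenerate indexes once loading has finished


@dataclass(frozen=True)
class LoadConfig:
    """Hypothetical in-memory form of the loader's configuration file."""
    data_file: str                         # address of the data file
    target_table: str                      # write target
    file_format: str = "CSV"               # CSV, TEXT or BINARY
    parallelism: int = 1                   # number of parallel threads
    write_mode: WriteMode = WriteMode.DIRECT
    index_mode: IndexMode = IndexMode.REBUILD
    logging: bool = False                  # log support
    skip_errors: bool = False              # ignore records that fail to load?
    max_skipped: int = 0                   # upper limit on skipped records
    keep_loaded_on_abort: bool = False     # keep rows loaded before an abort?

    def validate(self) -> None:
        """Reject settings that the loader could not act on."""
        if self.file_format not in {"CSV", "TEXT", "BINARY"}:
            raise ValueError(f"unsupported file format: {self.file_format}")
        if self.parallelism < 1:
            raise ValueError("parallelism must be at least 1")
        if self.max_skipped < 0:
            raise ValueError("max_skipped must be non-negative")


# Example: load a (hypothetical) CSV file with four threads in direct-write mode.
config = LoadConfig(data_file="/data/orders.csv", target_table="orders", parallelism=4)
config.validate()
```

Validating the whole configuration before any thread is started keeps configuration mistakes from surfacing halfway through a long load.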
The degree of parallelism (single-threaded or multi-threaded) is configured according to the hardware conditions of the database server on which the database runs, such as the number of CPU cores. Configuring the degree of parallelism appropriately improves CPU utilization. In the present invention, data is preferably written by multiple threads in parallel, and the number of parallel threads is determined by a configuration parameter in the configuration file.
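One way to pick a value for that parameter is sketched below. The patent only says that the thread count is set in the configuration file according to the hardware; the fallback rule used here (all cores minus one reserved for the rest of the server) and the function name `choose_thread_count` are assumptions, not part of the disclosure.

```python
import os
from typing import Optional


def choose_thread_count(configured: Optional[int] = None,
                        cores: Optional[int] = None,
                        reserve: int = 1) -> int:
    """Return the number of loader threads to start.

    An explicit value from the configuration file always wins. Otherwise the
    count is derived from the number of CPU cores, leaving `reserve` cores free
    (an assumed heuristic). The result is never less than 1.
    """
    if configured is not None:
        return max(1, configured)
    if cores is None:
        cores = os.cpu_count() or 1  # os.cpu_count() may return None
    return max(1, cores - reserve)
```

For example, `choose_thread_count(cores=8)` yields 7, while `choose_thread_count(configured=4)` honors the configured value regardless of the machine.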
The write mode is selected according to the required write efficiency and the condition of the data file itself; for example, a buffered-write (Buffer) mode or a direct-write (Direct) mode can be selected. In the buffered-write mode, tuples are written into the data file only after operations such as transaction checks. In the direct-write mode, no transaction checks or similar operations are needed, and tuples can be written into the data file directly. If the data file to be written already satisfies the transaction check requirements of the write target (the database table) and higher loading efficiency is required, the direct-write mode is preferred: after tuples are generated, they are written directly into the data files of the database.
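The difference between the two write modes can be illustrated with a toy table. This is a sketch under stated assumptions: `ToyTable`, its methods, and the use of a Python list as the "data file" are hypothetical stand-ins, and the per-row predicates merely represent whatever checks a real database would run.

```python
from typing import Callable, Iterable, List, Tuple

Row = Tuple[object, ...]
Check = Callable[[Row], bool]


class ToyTable:
    """Toy stand-in for a database table with a list as its data file."""

    def __init__(self, checks: List[Check]) -> None:
        self.checks = checks
        self.data_file: List[Row] = []
        self.checks_run = 0  # counts how many checks were executed

    def write_buffered(self, rows: Iterable[Row]) -> None:
        """Buffer mode: every tuple passes all checks before it is stored."""
        for row in rows:
            for check in self.checks:
                self.checks_run += 1
                if not check(row):
                    raise ValueError(f"row rejected: {row!r}")
            self.data_file.append(row)

    def write_direct(self, rows: Iterable[Row]) -> None:
        """Direct mode: tuples are appended without running any check.

        Only safe when the input is already known to satisfy the checks.
        """
        self.data_file.extend(rows)
```

The direct path does strictly less work per tuple, which is where the speed-up comes from; the cost is that the caller takes over responsibility for the validity of the data.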
In the present invention, the data file must conform to a fixed standard format, which may be one of three formats: CSV, TEXT, or Binary. Specifying the file format in advance in the configuration file determines how the data is parsed in subsequent operations.
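Selecting the parser from the configured format can be sketched as a lookup table. The patent names the three formats but not their layouts, so the details below are assumptions: TEXT is taken to be tab-delimited lines, and Binary is taken to be fixed-width records of a little-endian 32-bit integer followed by a 64-bit float.

```python
import csv
import io
import struct
from typing import Callable, Dict, Iterator


def parse_csv(payload: bytes) -> Iterator[list]:
    """Parse comma-separated records using the standard csv module."""
    yield from csv.reader(io.StringIO(payload.decode("utf-8")))


def parse_text(payload: bytes) -> Iterator[list]:
    """Parse tab-delimited lines (assumed meaning of the TEXT format)."""
    for line in payload.decode("utf-8").splitlines():
        yield line.split("\t")


def parse_binary(payload: bytes) -> Iterator[list]:
    """Parse fixed-width records: int32 + float64, little-endian (assumed)."""
    for record in struct.iter_unpack("<id", payload):
        yield list(record)


PARSERS: Dict[str, Callable[[bytes], Iterator[list]]] = {
    "CSV": parse_csv,
    "TEXT": parse_text,
    "BINARY": parse_binary,
}


def parse(file_format: str, payload: bytes) -> Iterator[list]:
    """Dispatch to the parser named by the configured file format."""
    try:
        parser = PARSERS[file_format.upper()]
    except KeyError:
        raise ValueError(f"unsupported file format: {file_format}") from None
    return parser(payload)
```

Because the format is fixed before loading starts, the lookup happens once and the chosen parser is reused for every record.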
Index support covers writing data into a database table on which indexes already exist. In the present invention, one of two modes can be selected: maintaining the indexes in real time while data is being written, or regenerating the indexes after the data has been loaded.
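The two index modes can be compared on a toy structure. In this sketch the "index" is a plain dictionary from key to row positions and `IndexedTable` is a hypothetical class; a real database index (for example a B-tree) is far more involved, so only the ordering of the work is illustrated.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Row = Tuple[int, str]  # (key, payload)


class IndexedTable:
    """Toy table with a single index on the first column."""

    def __init__(self) -> None:
        self.rows: List[Row] = []
        self.index: Dict[int, List[int]] = defaultdict(list)  # key -> positions

    def load_maintaining_index(self, rows: Iterable[Row]) -> None:
        """Update the index for every row as it is written."""
        for row in rows:
            self.index[row[0]].append(len(self.rows))
            self.rows.append(row)

    def load_then_rebuild_index(self, rows: Iterable[Row]) -> None:
        """Write all rows first, then regenerate the index in one pass."""
        self.rows.extend(rows)
        self.rebuild_index()

    def rebuild_index(self) -> None:
        rebuilt: Dict[int, List[int]] = defaultdict(list)
        for position, row in enumerate(self.rows):
            rebuilt[row[0]].append(position)
        self.index = rebuilt
```

Both paths end with the same index contents; they differ in when the index work is done and whether the index is usable while loading is still in progress.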
After the configuration file has specified information such as the target database table, the write mode, and the storage address of the data to be written, a database system function passes the configuration information to the database management system (DBMS). Referring to Fig. 1, the DBMS uses a parser to parse the data according to the configuration; the data being parsed may be of any of the three file types CSV, TEXT, and Binary. A writer then synthesizes the data (that is, tuples) and writes it. The parsing and synthesis steps are preferably executed by multiple threads in parallel, which improves the efficiency of data generation and thus the loading speed of the data file.
As shown in Fig. 2, the configuration file enters through the database system interface and is then parsed by the DBMS, which obtains the configuration information, creates the loading environment according to the configured content, creates thread information and starts the parallel threads, and creates the data slot information used for loading. The parallel threads each have a different subtask and include reading threads, parsing threads, synthesis threads, writing threads, and so on.
As shown in Fig. 3, after the DBMS has parsed the configuration file and created the loading environment, the data file is parsed first: according to the type of the data file, a reading thread reads one record that conforms to the database table, writes it into a data slot, and passes the data slot to a parsing thread. The parsing thread parses the data into basic types that the database can recognize and places them into a data slot; a synthesis thread then reads the data from the data slot and combines it into a tuple. After the tuple is synthesized, it is written according to the write mode and index processing mode given in the configuration file.
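The hand-off between reading, parsing, and synthesis threads can be sketched with bounded queues standing in for the data slots. This is an illustration under assumptions, not the patented implementation: one thread per stage is used for simplicity, records are assumed to be `key,name` text lines, the final write is a list append, and error handling is deliberately left out (a record that fails to parse would stall this simple version).

```python
import queue
import threading
from typing import Iterable, List, Tuple

DONE = object()  # sentinel that marks the end of the record stream


def read_stage(lines: Iterable[str], raw_slot: queue.Queue) -> None:
    """Reading thread: put one record at a time into the first data slot."""
    for line in lines:
        raw_slot.put(line)
    raw_slot.put(DONE)


def parse_stage(raw_slot: queue.Queue, parsed_slot: queue.Queue) -> None:
    """Parsing thread: convert raw text into basic typed values."""
    while True:
        line = raw_slot.get()
        if line is DONE:
            parsed_slot.put(DONE)
            return
        key, name = line.split(",")
        parsed_slot.put([int(key), name])


def compose_stage(parsed_slot: queue.Queue, table: List[Tuple[int, str]]) -> None:
    """Synthesis thread: build a tuple from the parsed values and write it."""
    while True:
        fields = parsed_slot.get()
        if fields is DONE:
            return
        table.append(tuple(fields))


def load(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Run the three stages concurrently and return the loaded tuples."""
    raw_slot: queue.Queue = queue.Queue(maxsize=64)
    parsed_slot: queue.Queue = queue.Queue(maxsize=64)
    table: List[Tuple[int, str]] = []
    stages = [
        threading.Thread(target=read_stage, args=(lines, raw_slot)),
        threading.Thread(target=parse_stage, args=(raw_slot, parsed_slot)),
        threading.Thread(target=compose_stage, args=(parsed_slot, table)),
    ]
    for stage in stages:
        stage.start()
    for stage in stages:
        stage.join()
    return table
```

Bounding the queues keeps a fast reader from filling memory when a later stage is slower. Running several parsing threads on the same slot would raise throughput further, at the price of no longer preserving input order without extra bookkeeping.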
If an exception occurs while the data file is being loaded, the configuration file determines whether the exception is ignored: if it is ignored, the reading thread skips the offending record and continues with the next record; if it is not ignored, the data loading process exits. After the data loading process exits, the configuration file determines whether the data already loaded is retained. If it is not retained, the database table is operated on so that the loaded data becomes invisible. In addition, if the upper limit on the number of skipped exceptions is exceeded during loading, the data file contains many errors. In that case the data already loaded is handled in the manner specified in the configuration file: it is either retained or removed.
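The exception policy can be condensed into a short single-threaded sketch. The names `load_records`, `ignore_errors`, `max_skipped`, and `keep_on_abort` are hypothetical, one flag is used for both abort cases as a simplification, and clearing a list stands in for the database operation that makes already loaded rows invisible.

```python
from typing import Callable, Iterable, List, Tuple


class LoadAborted(Exception):
    """Raised internally when loading has to stop early."""


def load_records(records: Iterable[str],
                 parse: Callable[[str], object],
                 ignore_errors: bool,
                 max_skipped: int,
                 keep_on_abort: bool) -> Tuple[List[object], int, bool]:
    """Load records and return (loaded rows, skipped count, completed flag)."""
    loaded: List[object] = []
    skipped = 0
    try:
        for record in records:
            try:
                loaded.append(parse(record))
            except ValueError as exc:
                if not ignore_errors:
                    # Exceptions are not ignored: exit the loading process.
                    raise LoadAborted("bad record") from exc
                skipped += 1
                if skipped > max_skipped:
                    # Too many bad records: the file is considered faulty.
                    raise LoadAborted("skip limit exceeded") from exc
    except LoadAborted:
        if not keep_on_abort:
            loaded.clear()  # stand-in for making loaded rows invisible
        return loaded, skipped, False
    return loaded, skipped, True
```

With `parse=int`, the input `["1", "x", "3"]` loads two rows and skips one when errors are ignored, and stops at the second record when they are not.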
The method for quickly loading database data proposed by the present invention has been described in detail above. For those of ordinary skill in the art, any obvious modification made to it without departing from the essence of the present invention will constitute an infringement of the patent right of the present invention, and the party concerned will bear the corresponding legal liability.

Claims (9)

1. A method for quickly loading data into a database, characterized in that:
in the process of writing a data file into the database, the data file is loaded in parallel; during loading, a direct-write mode is used, and after tuples are generated they are written directly into the data files of the database.
2. The method for quickly loading data into a database according to claim 1, characterized in that:
before the data file is loaded, a configuration file is first prepared, and the number of parallel threads is set in the configuration file according to the hardware conditions of the database server.
3. The method for quickly loading data into a database according to claim 1, characterized in that:
when the direct-write mode is used, no transaction checks are performed.
4. The method for quickly loading data into a database according to claim 1, characterized in that:
in the process of writing the data file into the database, the indexes of the database table are maintained in real time.
5. The method for quickly loading data into a database according to claim 1, characterized in that:
in the process of writing the data file into the database, the indexes of the database table are regenerated after the data has been loaded.
6. The method for quickly loading data into a database according to claim 2, 4 or 5, characterized in that:
the processing mode of the indexes is set in the configuration file.
7. The method for quickly loading data into a database according to any one of claims 2 to 5, characterized in that:
after the database management system (DBMS) parses the configuration file and creates the loading environment, the data file is parsed first: according to the type of the data file, a reading thread reads one record that conforms to the database table, writes it into a data slot, and passes the data slot to a parsing thread; the parsing thread parses the data into basic types recognized by the database and places them into a data slot; a synthesis thread reads the data from the data slot and synthesizes a tuple; after the tuple is synthesized, it is written according to the write mode and index processing mode specified in the configuration file.
8. The method for quickly loading data into a database according to claim 1 or 2, characterized in that:
if an exception occurs while the data file is being loaded, whether to ignore the exception is determined according to the configuration file: if the exception is ignored, the reading thread skips the offending record and continues with the next record; if it is not ignored, the data loading process exits.
9. The method for quickly loading data into a database according to claim 8, characterized in that:
after the data loading process exits, if the loaded data is not to be retained, the database table is operated on so that the loaded data becomes invisible.
CN2012105660757A 2012-12-23 2012-12-23 Method for quickly loading data into database Pending CN103049533A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2012105660757A CN103049533A (en) 2012-12-23 2012-12-23 Method for quickly loading data into database

Publications (1)

Publication Number Publication Date
CN103049533A true CN103049533A (en) 2013-04-17

Family

ID=48062174

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2012105660757A Pending CN103049533A (en) 2012-12-23 2012-12-23 Method for quickly loading data into database

Country Status (1)

Country Link
CN (1) CN103049533A (en)

Cited By (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103246537A (en) * 2013-05-08 2013-08-14 汉柏科技有限公司 Method and device for quickly loading configuration files by multiple threads
CN103593440A (en) * 2013-11-15 2014-02-19 北京国双科技有限公司 Method and device for reading and writing log file
CN104376082A (en) * 2014-11-18 2015-02-25 中国建设银行股份有限公司 Method for importing data in data source file to database
CN104484456A (en) * 2014-12-29 2015-04-01 哈尔滨工业大学 Multi-threading parallel-based rapid loading method for SQLite database
CN104700217A (en) * 2015-03-17 2015-06-10 歌尔声学股份有限公司 Method and system for collecting test data
CN105354320A (en) * 2015-11-16 2016-02-24 天津南大通用数据技术股份有限公司 Method and device for rapidly loading multiple data files
CN105468705A (en) * 2015-11-18 2016-04-06 广东南方通信建设有限公司 Mobile communication background data file importing method
CN103841196B (en) * 2014-03-07 2017-05-17 长沙裕邦软件开发有限公司 File uploading system and method based on multithreading
CN106909554A (en) * 2015-12-22 2017-06-30 亿阳信通股份有限公司 A kind of loading method and device of database text table data
CN106934037A (en) * 2017-03-15 2017-07-07 郑州云海信息技术有限公司 A kind of high concurrent realizes the method that database quickly loads data
CN107885460A (en) * 2017-10-12 2018-04-06 北京人大金仓信息技术股份有限公司 A kind of data access method of cluster
CN108279943A (en) * 2017-01-05 2018-07-13 腾讯科技(深圳)有限公司 Index loading method and device
CN110347440A (en) * 2019-06-24 2019-10-18 北京人大金仓信息技术股份有限公司 Data method and system are quickly loaded to database based on multi-course concurrency and plug-in unit
CN110362617A (en) * 2019-06-24 2019-10-22 北京人大金仓信息技术股份有限公司 Batch data method and system is quickly exported from database based on more concurrent technologies
CN111104373A (en) * 2019-12-24 2020-05-05 天地伟业技术有限公司 Database performance optimization method
CN116257493A (en) * 2022-12-29 2023-06-13 北京京桥热电有限责任公司 OPC (optical clear control) network gate penetrating interface based on caching mechanism

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070214167A1 (en) * 2006-02-16 2007-09-13 Sushil Nair Method for fast bulk loading data into a database while bypassing exit routines
CN101515291A (en) * 2009-03-26 2009-08-26 北京泰合佳通信息技术有限公司 Method for leading data into database in a batch way and system thereof
CN101719143A (en) * 2009-12-01 2010-06-02 北京中科创元科技有限公司 Method for parallel processing compare increment data extraction
CN101996244A (en) * 2010-11-09 2011-03-30 中兴通讯股份有限公司 Device, system and method for inputting batch data into database
CN102004743A (en) * 2009-09-02 2011-04-06 中国银联股份有限公司 System and method for copying data among heterogeneous databases

Cited By (22)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103246537A (en) * 2013-05-08 2013-08-14 汉柏科技有限公司 Method and device for quickly loading configuration files by multiple threads
CN103246537B (en) * 2013-05-08 2017-06-06 汉柏科技有限公司 The method and device of the quick loading configuration file of multithreading
CN103593440A (en) * 2013-11-15 2014-02-19 北京国双科技有限公司 Method and device for reading and writing log file
CN103841196B (en) * 2014-03-07 2017-05-17 长沙裕邦软件开发有限公司 File uploading system and method based on multithreading
CN104376082A (en) * 2014-11-18 2015-02-25 中国建设银行股份有限公司 Method for importing data in data source file to database
CN104376082B (en) * 2014-11-18 2019-06-18 中国建设银行股份有限公司 A method of the data in data source file are imported into database
CN104484456A (en) * 2014-12-29 2015-04-01 哈尔滨工业大学 Multi-threading parallel-based rapid loading method for SQLite database
CN104700217A (en) * 2015-03-17 2015-06-10 歌尔声学股份有限公司 Method and system for collecting test data
CN105354320A (en) * 2015-11-16 2016-02-24 天津南大通用数据技术股份有限公司 Method and device for rapidly loading multiple data files
CN105468705A (en) * 2015-11-18 2016-04-06 广东南方通信建设有限公司 Mobile communication background data file importing method
CN106909554A (en) * 2015-12-22 2017-06-30 亿阳信通股份有限公司 A kind of loading method and device of database text table data
CN106909554B (en) * 2015-12-22 2020-08-04 亿阳信通股份有限公司 Method and device for loading database text table data
CN108279943B (en) * 2017-01-05 2020-09-11 腾讯科技(深圳)有限公司 Index loading method and device
CN108279943A (en) * 2017-01-05 2018-07-13 腾讯科技(深圳)有限公司 Index loading method and device
CN106934037A (en) * 2017-03-15 2017-07-07 郑州云海信息技术有限公司 A kind of high concurrent realizes the method that database quickly loads data
CN107885460A (en) * 2017-10-12 2018-04-06 北京人大金仓信息技术股份有限公司 A kind of data access method of cluster
CN110362617A (en) * 2019-06-24 2019-10-22 北京人大金仓信息技术股份有限公司 Batch data method and system is quickly exported from database based on more concurrent technologies
CN110347440A (en) * 2019-06-24 2019-10-18 北京人大金仓信息技术股份有限公司 Data method and system are quickly loaded to database based on multi-course concurrency and plug-in unit
CN110347440B (en) * 2019-06-24 2022-06-03 北京人大金仓信息技术股份有限公司 Method and system for rapidly loading data to database based on multi-process concurrence and plug-in
CN111104373A (en) * 2019-12-24 2020-05-05 天地伟业技术有限公司 Database performance optimization method
CN111104373B (en) * 2019-12-24 2023-09-19 天地伟业技术有限公司 Database performance optimization method
CN116257493A (en) * 2022-12-29 2023-06-13 北京京桥热电有限责任公司 OPC (optical clear control) network gate penetrating interface based on caching mechanism

Similar Documents

Publication Publication Date Title
CN103049533A (en) Method for quickly loading data into database
Bende et al. Dealing with small files problem in hadoop distributed file system
Mackey et al. Improving metadata management for small files in HDFS
US20170206212A1 (en) Partial snapshot creation
CN106126601A (en) A kind of social security distributed preprocess method of big data and system
TW201530328A (en) Method and device for constructing NoSQL database index for semi-structured data
CN103177059A (en) Split processing paths for database calculation engine
CN105701190A (en) Data synchronizing method and device
JP5478526B2 (en) Data analysis and machine learning processing apparatus, method and program
CN104731896A (en) Data processing method and system
CN106909554B (en) Method and device for loading database text table data
CN103699656A (en) GPU-based mass-multimedia-data-oriented MapReduce platform
CN104750720A (en) Method for achieving high-performance data processing under multithread concurrent access environment
Pal et al. A performance analysis of mapreduce task with large number of files dataset in big data using hadoop
US10558391B2 (en) Data processing system and data processing method
CN104571946A (en) Memory device supporting quick query of logical circuit and access method of memory device
CN101783814A (en) Metadata storing method for mass storage system
CN105426119A (en) Storage apparatus and data processing method
Wang et al. Federated MapReduce to transparently run applications on multicluster environment
Masouleh et al. Optimization of ETL process in data warehouse through a combination of parallelization and shared cache memory
CN105243099A (en) Large data real-time storage method based on translation document
Kulkarni Hadoop mapreduce over lustre
Zhao et al. Metadata-Aware small files storage architecture on hadoop
CN108132970A (en) Big data distributed approach and system based on cloud computing
CN104283909A (en) Cloud computing method and device compatible with desktop applications

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20130417