CN106790489A - Parallel data loading method and system

Parallel data loading method and system

Info

Publication number: CN106790489A (application CN201611150991.7A; granted as CN106790489B)
Authority: CN (China)
Prior art keywords: data node, master node, loading, information, loaded
Legal status: Granted; currently active (the legal status is an assumption and is not a legal conclusion)
Other languages: Chinese (zh)
Inventor: 杨卓慧
Current Assignee: Chengdu Huawei Technology Co Ltd
Original Assignee: Chengdu Huawei Technology Co Ltd
Priority date: 2016-12-14
Filing date: 2016-12-14
Publication date: 2017-05-31 (CN106790489A); grant published 2020-12-22 (CN106790489B)
Application filed by Chengdu Huawei Technology Co Ltd

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/06 Protocols specially adapted for file transfer, e.g. file transfer protocol [FTP]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G06F16/254 Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses

Abstract

Embodiments of the present invention provide a parallel data loading method and system. Data to be loaded are stored on an FTP server, and each data node downloads the data block identified by its load file information from the FTP server. Because the data are downloaded as FTP files, download efficiency is improved, which in turn improves the efficiency of parallel data loading. The master node sends a loading instruction to multiple data nodes so that they load the data stored on the FTP server in parallel. Because the data nodes actively request task assignments, data nodes with stronger processing capability load more data blocks, loading tasks are distributed on demand, and the efficiency of parallel data loading is further improved.

Description

Parallel data loading method and system
Technical field
Embodiments of the present invention relate to computer technology, and in particular to a parallel data loading method and system.
Background
With the rapid development of computer technology, databases are used ever more widely, and the efficiency of data loading directly affects the overall performance of a database.
In the prior art, data are loaded by connecting to the database through a Java Database Connectivity (JDBC) or Open Database Connectivity (ODBC) driver and executing standard SQL statements; SQL Server, Oracle, and PostgreSQL, for example, all load data in this way.
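For illustration only, the sketch below shows this prior-art path in Python, assuming a hypothetical DB-API 2.0 driver module named dbdriver and illustrative table and column names; it is not code from the patent. Every row travels through a single driver connection as an SQL INSERT, which is what limits loading throughput for large files.

    # Prior-art style loading: push rows through a database driver
    # (the JDBC/ODBC pattern) using standard SQL statements.
    # 'dbdriver' is a hypothetical DB-API 2.0 module; the '%s' placeholder
    # style depends on the actual driver.
    import csv
    import dbdriver  # hypothetical stand-in for a JDBC/ODBC connection

    def load_with_sql(csv_path, dsn):
        conn = dbdriver.connect(dsn)
        cur = conn.cursor()
        with open(csv_path, newline="") as f:
            rows = csv.reader(f, delimiter="|")
            # Every row passes through this single client connection,
            # which becomes the bottleneck for large files.
            cur.executemany(
                "INSERT INTO table2 (idx, description) VALUES (%s, %s)", rows)
        conn.commit()
        conn.close()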
However, the data loading efficiency of the prior art is low.
Summary of the invention
Embodiments of the present invention provide a parallel data loading method and system to improve the efficiency of parallel data loading.
In a first aspect, an embodiment of the present invention provides a parallel data loading method. The method is applied to a parallel data loading system, and the system includes M master nodes, N data nodes, and R File Transfer Protocol (FTP) servers, where M is an integer greater than or equal to 1, N is an integer greater than or equal to 2, and R is an integer greater than or equal to 1; the M master nodes are communicatively connected to the N data nodes and the R FTP servers, and the N data nodes are communicatively connected to the R FTP servers.
The method includes: a master node sends a loading instruction to at least two data nodes, where the loading instruction instructs the at least two data nodes to load the data stored on the FTP server; each data node sends a task assignment request to the master node; the master node sends each data node its corresponding load file information; and each data node downloads the data block identified by the load file information from the FTP server and loads it. Because the master node sends the loading instruction to multiple data nodes, the data stored on the FTP server are loaded by multiple data nodes in parallel; and because the data nodes actively request task assignments, data nodes with stronger processing capability load more data blocks, loading tasks are distributed on demand, and the efficiency of parallel data loading is further improved.
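As a rough, non-authoritative illustration of the pull-based assignment just described, the following Python sketch models the master node handing out blocks to data nodes that ask for them; the names Master, LoadFileInfo, and data_node_worker are illustrative and not taken from the patent. Because a faster node finishes its block sooner, it asks again sooner and therefore receives more blocks, which is the on-demand distribution described above.

    # Minimal in-process sketch of pull-based task assignment.
    from dataclasses import dataclass
    from typing import Optional
    import threading

    @dataclass
    class LoadFileInfo:
        # identifies one data block stored on an FTP server
        ftp_url: str
        file_name: str
        offset: int
        length: int

    class Master:
        def __init__(self, blocks):
            self._blocks = list(blocks)
            self._lock = threading.Lock()

        def request_task(self) -> Optional[LoadFileInfo]:
            # Called by a data node; hands out the next unassigned block,
            # or None once every block has been assigned (loading complete).
            with self._lock:
                return self._blocks.pop(0) if self._blocks else None

    def data_node_worker(master: Master, download_and_load):
        # Nodes with more processing capability return sooner and ask again
        # sooner, so they naturally receive more blocks.
        while (task := master.request_task()) is not None:
            download_and_load(task)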
Optionally, before the master node sends each data node its corresponding load file information, the method further includes:
the master node determines the size of the load file assigned to each data node according to the frequency at which the at least two data nodes send task assignment requests. In this way, assignment follows the actual processing capability of each data node, further improving the utilization of processing resources and the efficiency of parallel loading.
Optionally, the method further includes:
if the master node determines that all files to be loaded have been loaded, the master node sends a load-complete indication to the at least two data nodes, so that the at least two data nodes stop sending task assignment requests to the master node.
Optionally, before the master node sends the loading instruction to the at least two data nodes, the method further includes:
the master node receives a loading instruction sent by a client, where the loading instruction contains information about the file to be loaded.
Optionally, the method further includes:
if the master node determines that all files to be loaded have been loaded, the master node sends a load-complete indication to the client.
Optionally, before the master node determines the size of the load file assigned to each data node according to the frequency at which the at least two data nodes send task assignment requests, the method further includes:
the master node splits a file to be loaded into multiple data blocks, where each data block corresponds to one piece of load file information.
In another aspect, an embodiment of the present invention provides a parallel data loading system, including:
M master nodes, N data nodes, and R File Transfer Protocol (FTP) servers, where M is an integer greater than or equal to 1, N is an integer greater than or equal to 2, and R is an integer greater than or equal to 1; the M master nodes are communicatively connected to the N data nodes and the R FTP servers, and the N data nodes are communicatively connected to the R FTP servers.
The FTP server is configured to store a file to be loaded.
The master node is configured to send a loading instruction to at least two data nodes, where the loading instruction instructs the at least two data nodes to load the data stored on the FTP server.
The master node is further configured to receive the task assignment requests sent by the at least two data nodes, where a task assignment request asks the master node to assign load file information to the requesting data node.
The master node is further configured to send each data node its corresponding load file information.
The data node is configured to download the data block identified by the load file information from the FTP server according to the load file information and load it.
Optionally, the master node is further configured to determine the size of the load file assigned to each data node according to the frequency at which the at least two data nodes send task assignment requests.
Optionally, the master node is further configured to send a load-complete indication to the at least two data nodes after determining that all files to be loaded have been loaded.
Optionally, the master node is further configured to receive a loading instruction sent by a client, where the loading instruction contains information about the file to be loaded.
Optionally, the master node is further configured to send a load-complete indication to the client after determining that all files to be loaded have been loaded.
Optionally, the master node is further configured to split a file to be loaded into multiple data blocks, where each data block corresponds to one piece of load file information.
Brief description of the drawings
Fig. 1 is an architecture diagram of a parallel data loading system according to an embodiment of the present invention;
Fig. 2 is a schematic flowchart of an embodiment of the parallel data loading method of the present invention;
Fig. 3 is a schematic diagram of a parallel data loading interaction according to the present invention.
Detailed description of the embodiments
The terms "first", "second", "third", "fourth", and so on (if present) in the description, the claims, and the accompanying drawings are used to distinguish between similar objects and are not necessarily used to describe a particular order or sequence. It should be understood that data so termed are interchangeable in appropriate circumstances, so that the embodiments of the invention described herein can be implemented in orders other than those illustrated or described herein. Moreover, the terms "comprising" and "having", and any variations thereof, are intended to cover a non-exclusive inclusion; for example, a process, method, system, product, or device that comprises a series of steps or units is not necessarily limited to those steps or units expressly listed, but may include other steps or units that are not expressly listed or that are inherent to such a process, method, product, or device.
Fig. 1 is an architecture diagram of a parallel data loading system according to an embodiment of the present invention. As shown in Fig. 1, the system of this embodiment includes M master nodes, N data nodes, and an external data source, where the external data source includes R File Transfer Protocol (FTP) servers, M is an integer greater than or equal to 1, N is an integer greater than or equal to 2, and R is an integer greater than or equal to 1; Fig. 1 takes M=2, N=3, and R=2 as an example. Fig. 1 may also include a client; the client may be a part of the system or may be independent of the system, which is not limited in this embodiment of the present invention.
The client is communicatively connected to the master node. The master node is communicatively connected to the FTP servers and the data nodes. The data nodes are communicatively connected to the FTP servers. The FTP servers store the files to be loaded. The data nodes load the files to be loaded that are stored on the FTP servers. The master node controls the data nodes to load the files to be loaded that are stored on the FTP servers.
The technical solution of the present invention is described in detail below with specific embodiments. The following specific embodiments may be combined with one another, and the same or similar concepts or processes may not be repeated in some of the embodiments.
Fig. 2 is a schematic flowchart of an embodiment of the parallel data loading method of the present invention. The method of this embodiment is applied to the parallel data loading system shown in Fig. 1 and proceeds as follows:
S200: The master node creates a foreign table.
The master node may create the foreign table with an SQL statement.
An example SQL statement is:
create FOREIGN table foreign_table(index int, description varchar(100))
SERVER file_server OPTIONS(format 'text', ftpserver 'ftp://10.123.1.100/data', delimiter '|', null '')
S201: The client sends a loading instruction to the master node.
The loading instruction contains information about the file to be loaded. The file to be loaded is stored on an FTP server; it may be stored on one FTP server or on multiple FTP servers, which is not limited in this embodiment of the present invention.
Optionally, the client may send the loading instruction to the master node with an SQL statement.
An example SQL statement is:
Insert into table2 select * from foreign_table
After receiving the loading instruction sent by the client, the master node performs S202.
S202: The master node sends a loading instruction to at least two data nodes.
The loading instruction instructs the at least two data nodes to load the data stored on the FTP server.
That is, the master node sends the loading instruction to the at least two data nodes at the same time, so that the at least two data nodes load the data stored on the FTP server in parallel.
A data node parses the loading instruction, finds that it is an instruction for parallel loading of data on an FTP server, and then performs S203.
S203: The data node sends a task assignment request to the master node.
The task assignment request asks the master node to assign load file information to the data node.
S204: The data node receives the load file information, corresponding to that data node, sent by the master node.
S205: The data node downloads, from the FTP server according to the load file information, the data block identified by the load file information and loads it.
After receiving the loading instruction, the data node performs S203-S205. Once the data block identified by the load file information has been loaded, the data node performs S203-S205 again, and repeats this until it receives a load-complete indication from the master node, at which point it stops sending task assignment requests to the master node. Because the data nodes actively request task assignments, data nodes with stronger processing capability load more data blocks, loading tasks are distributed on demand, and the efficiency of parallel data loading is improved. Fig. 3 is a schematic diagram of the parallel data loading interaction of the present invention. As shown in Fig. 3, there are two FTP servers, FTP server 1 and FTP server 2, and three data nodes, data node 1, data node 2, and data node 3; the grid cells on the FTP servers represent data blocks. The left part shows the state before loading, and the right part shows the loading process.
Before the master node sends load file information to a data node, the master node recursively enumerates the files on the FTP servers and splits each file to be loaded into multiple data blocks in advance. Each data block must preserve row integrity. The data blocks may be of the same size or of different sizes, which is not limited in this embodiment of the present invention.
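A minimal sketch of such row-aligned splitting is given below. It reads the file locally for simplicity, whereas the master node described here works from the recursively enumerated FTP listing; the function name split_into_blocks and the byte-offset representation are illustrative assumptions.

    # Split a file into blocks of roughly 'target_size' bytes, pushing each
    # block boundary forward to the next newline so no row is cut in half.
    def split_into_blocks(path, target_size):
        blocks = []  # list of (offset, length) pairs
        with open(path, "rb") as f:
            f.seek(0, 2)
            file_size = f.tell()
            start = 0
            while start < file_size:
                end = min(start + target_size, file_size)
                if end < file_size:
                    f.seek(end)
                    f.readline()      # advance to the end of the current row
                    end = f.tell()
                blocks.append((start, end - start))
                start = end
        return blocks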
Each data block corresponds to a piece of load file information, and the load file information may be expressed by the name of the load file and an offset.
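The sketch below shows one way a data node could fetch a block identified by a file name and an offset, using Python's standard ftplib, which exposes the FTP restart (REST) facility through the rest argument of retrbinary. The function name fetch_block and the read-to-end-then-trim simplification are assumptions for illustration; a production loader would stop the transfer once the block length has been received.

    from ftplib import FTP

    def fetch_block(host, user, password, file_name, offset, length):
        # Resume the RETR transfer at 'offset' (FTP REST), then keep only
        # the first 'length' bytes of what was received.
        buf = bytearray()
        with FTP(host) as ftp:
            ftp.login(user, password)
            ftp.retrbinary("RETR " + file_name, buf.extend, rest=offset)
        return bytes(buf[:length])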
When the master node assigns load file information to data nodes, it may optionally determine the size of the load file assigned to each data node according to the frequency at which the at least two data nodes send task assignment requests. If a data node sends task assignment requests frequently, its processing capability is strong, and a large data block can be assigned to it to make full use of its processing resources. If a data node sends task assignment requests infrequently, its processing capability is weak, and a small data block can be assigned to it. In this way, assignment follows the actual processing capability of each data node, further improving the utilization of processing resources and the efficiency of parallel loading.
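One possible way to turn the observed request frequency into a block size is sketched below; the proportional-to-average rule and the size bounds are illustrative assumptions, since the description above only requires that nodes which request more frequently receive larger load files.

    from collections import defaultdict

    class FrequencyAwareMaster:
        # Assigns larger blocks to data nodes that have sent proportionally
        # more task assignment requests than the average node.
        def __init__(self, base_size=4 << 20, min_size=1 << 20, max_size=64 << 20):
            self.base = base_size
            self.min_size = min_size
            self.max_size = max_size
            self.request_counts = defaultdict(int)

        def next_block_size(self, node_id):
            self.request_counts[node_id] += 1
            total = sum(self.request_counts.values())
            share = self.request_counts[node_id] / total  # this node's share of all requests
            # A node with an average share gets roughly base_size; busier nodes get more.
            size = int(self.base * share * len(self.request_counts))
            return max(self.min_size, min(self.max_size, size))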
When the master node assigns load file information to different data nodes, it may assign load file information corresponding to data blocks on different FTP servers, to make full use of the network bandwidth of each FTP server and further improve the efficiency of parallel loading.
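A simple round-robin over per-server block queues, as sketched below, is one way to realize this; the class name MultiServerScheduler and the specific policy are assumptions rather than details from the patent.

    from collections import deque
    from itertools import cycle

    class MultiServerScheduler:
        # blocks_by_server maps an FTP server URL to its list of block
        # descriptors; successive assignments rotate across servers so that
        # no single server's network link becomes the bottleneck.
        def __init__(self, blocks_by_server):
            self.queues = {srv: deque(blocks) for srv, blocks in blocks_by_server.items()}
            self.order = cycle(list(self.queues))

        def next_block(self):
            for _ in range(len(self.queues)):
                srv = next(self.order)
                if self.queues[srv]:
                    return srv, self.queues[srv].popleft()
            return None  # every server's queue is drained: loading complete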
S206: If the master node determines that all files to be loaded have been loaded, it sends a load-complete indication to the at least two data nodes.
By sending the load-complete indication to the at least two data nodes, the master node causes the at least two data nodes to stop sending task assignment requests to the master node.
S207: If the master node determines that all files to be loaded have been loaded, it sends a load-complete indication to the client.
After finishing loading, each data node returns its loading result to the master node, and the master node returns the loading result to the client so that the user learns the result through the client.
In this embodiment, the data to be loaded are stored on FTP servers, and the data nodes download the data blocks identified by their load file information from the FTP servers. Because the data are downloaded as FTP files, download efficiency is improved, and therefore the efficiency of parallel data loading is improved. The master node sends the loading instruction to multiple data nodes so that they load the data stored on the FTP servers in parallel; because the data nodes actively request task assignments, data nodes with stronger processing capability load more data blocks, loading tasks are distributed on demand, and the efficiency of parallel data loading is further improved. The master node determines the size of the load file assigned to each data node according to the frequency at which the data nodes send task assignment requests, so assignment follows the actual processing capability of each data node, further improving the utilization of processing resources and the efficiency of parallel loading. The master node assigns data on different FTP servers to different data nodes, making full use of the network bandwidth of each FTP server and further improving the efficiency of parallel loading. When the bottleneck of parallel data loading is network I/O, the bottleneck can also be removed by adding FTP servers, further improving the efficiency of parallel loading.
The present invention also provides an embodiment of a parallel data loading system. As shown in Fig. 1, the system includes M master nodes, N data nodes, and R File Transfer Protocol (FTP) servers, where M is an integer greater than or equal to 1, N is an integer greater than or equal to 2, and R is an integer greater than or equal to 1; the M master nodes are communicatively connected to the N data nodes and the R FTP servers, and the N data nodes are communicatively connected to the R FTP servers.
The FTP server is configured to store a file to be loaded.
The master node is configured to send a loading instruction to at least two data nodes, where the loading instruction instructs the at least two data nodes to load the data stored on the FTP server.
The master node is further configured to receive the task assignment requests sent by the at least two data nodes, where a task assignment request asks the master node to assign load file information to the requesting data node.
The master node is further configured to send each data node its corresponding load file information.
The data node is configured to download, from the FTP server according to the load file information, the data block identified by the load file information and load it.
Optionally, the master node is further configured to determine the size of the load file assigned to each data node according to the frequency at which the at least two data nodes send task assignment requests.
Optionally, the master node is further configured to send a load-complete indication to the at least two data nodes after determining that all files to be loaded have been loaded.
Optionally, the master node is further configured to receive a loading instruction sent by a client, where the loading instruction contains information about the file to be loaded.
Optionally, the master node is further configured to send a load-complete indication to the client after determining that all files to be loaded have been loaded.
Optionally, the master node is further configured to split a file to be loaded into multiple data blocks, where each data block corresponds to one piece of load file information.
The above system embodiment can be used to carry out the technical solution of the method embodiment shown in Fig. 2; its principle of implementation and technical effect are similar and are not repeated here.
A person of ordinary skill in the art will understand that all or part of the steps of the above method embodiments may be carried out by hardware under the control of program instructions. The program may be stored in a computer-readable storage medium; when executed, it performs the steps of the above method embodiments. The storage medium includes any medium that can store program code, such as a ROM, a RAM, a magnetic disk, or an optical disc.

Claims (11)

1. A parallel data loading method, wherein the method is applied to a parallel data loading system, the system comprising: M master nodes, N data nodes, and R File Transfer Protocol (FTP) servers, wherein the M is an integer greater than or equal to 1, the N is an integer greater than or equal to 2, and the R is an integer greater than or equal to 1, the M master nodes are communicatively connected to the N data nodes and the R FTP servers, and the N data nodes are communicatively connected to the R FTP servers;
the method comprising:
sending, by a master node, a loading instruction to at least two data nodes, wherein the loading instruction instructs the at least two data nodes to load the data stored on the FTP server;
receiving, by the master node, task assignment requests sent by the at least two data nodes, wherein a task assignment request asks the master node to assign load file information to the requesting data node;
sending, by the master node, to each data node the load file information corresponding to that data node; and
downloading, by the data node, from the FTP server according to the load file information, the data block identified by the load file information, and loading it.
2. The method according to claim 1, wherein before the master node sends to each data node the load file information corresponding to that data node, the method further comprises:
determining, by the master node, the size of the load file assigned to each data node according to the frequency at which the at least two data nodes send task assignment requests.
3. The method according to claim 2, further comprising:
sending, by the master node, a load-complete indication to the at least two data nodes if the master node determines that all files to be loaded have been loaded.
4. The method according to claim 3, further comprising:
sending, by the master node, a load-complete indication to the client if the master node determines that all files to be loaded have been loaded.
5. The method according to any one of claims 2-4, wherein before the master node determines, according to the frequency at which the at least two data nodes send task assignment requests, the size of the load file assigned to each data node, the method further comprises:
splitting, by the master node, a file to be loaded into multiple data blocks, wherein each data block corresponds to one piece of load file information.
6. A parallel data loading system, comprising:
M master nodes, N data nodes, and R File Transfer Protocol (FTP) servers, wherein the M is an integer greater than or equal to 1, the N is an integer greater than or equal to 2, and the R is an integer greater than or equal to 1, the M master nodes are communicatively connected to the N data nodes and the R FTP servers, and the N data nodes are communicatively connected to the R FTP servers;
wherein:
the FTP server is configured to store a file to be loaded;
the master node is configured to send a loading instruction to at least two data nodes, wherein the loading instruction instructs the at least two data nodes to load the data stored on the FTP server;
the master node is further configured to receive the task assignment requests sent by the at least two data nodes, wherein a task assignment request asks the master node to assign load file information to the requesting data node;
the master node is further configured to send each data node the load file information corresponding to that data node; and
the data node is configured to download, from the FTP server according to the load file information, the data block identified by the load file information and load it.
7. The system according to claim 6, wherein
the master node is further configured to determine the size of the load file assigned to each data node according to the frequency at which the at least two data nodes send task assignment requests.
8. The system according to claim 7, wherein
the master node is further configured to send a load-complete indication to the at least two data nodes after determining that all files to be loaded have been loaded.
9. The system according to claim 8, wherein
the master node is further configured to receive a loading instruction sent by a client, wherein the loading instruction contains information about the file to be loaded.
10. The system according to claim 9, wherein
the master node is further configured to send a load-complete indication to the client after determining that all files to be loaded have been loaded.
11. The system according to any one of claims 7-10, wherein
the master node is further configured to split a file to be loaded into multiple data blocks, wherein each data block corresponds to one piece of load file information.
CN201611150991.7A (priority date 2016-12-14, filing date 2016-12-14) Parallel data loading method and system. Status: Active. Granted as CN106790489B (en).

Priority Applications (1)

Application Number: CN201611150991.7A | Priority Date: 2016-12-14 | Filing Date: 2016-12-14 | Title: Parallel data loading method and system | Granted publication: CN106790489B (en)


Publications (2)

Publication Number | Publication Date
CN106790489A | 2017-05-31
CN106790489B | 2020-12-22

Family

ID=58887772

Family Applications (1)

Application Number: CN201611150991.7A | Title: Parallel data loading method and system | Priority Date: 2016-12-14 | Filing Date: 2016-12-14 | Status: Active | Granted publication: CN106790489B (en)

Country Status (1)

Country: CN | Publication: CN106790489B (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107885460A * 2017-10-12 2018-04-06 北京人大金仓信息技术股份有限公司 A data access method for a cluster
CN109547253A * 2018-11-28 2019-03-29 广东海格怡创科技有限公司 File downloading method and apparatus, computer device and storage medium
CN109815295A * 2019-02-18 2019-05-28 国家计算机网络与信息安全管理中心 Distributed cluster data import method and device
CN109902065A * 2019-02-18 2019-06-18 国家计算机网络与信息安全管理中心 Method and device for accessing data external to a distributed cluster

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1939036A (en) * 2004-06-08 2007-03-28 国际商业机器公司 Optimized concurrent data download within a grid computing environment
CN103544285A (en) * 2013-10-28 2014-01-29 华为技术有限公司 Data loading method and device

Also Published As

Publication number Publication date
CN106790489B (en) 2020-12-22

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant