CN106407306A

CN106407306A - Data persistence distribution method and device

Info

Publication number: CN106407306A
Application number: CN201610777564.5A
Authority: CN
Inventors: 崔维力; 武新; 刘威; 郑黎辉
Original assignee: TIANJIN NANKAI UNIVERSITY GENERAL DATA TECHNOLOGIES Co Ltd
Current assignee: TIANJIN NANKAI UNIVERSITY GENERAL DATA TECHNOLOGIES Co Ltd
Priority date: 2016-08-31
Filing date: 2016-08-31
Publication date: 2017-02-15

Abstract

The invention provides a data persistence distribution method and device. The method comprises the following steps: acquiring a data storage requirement of a target side; processing data to be distributed according to the storage requirement; and sending the processed data to be distributed to the target side. The amount of data processing at the target side is reduced, data distribution is accelerated, and the data distribution time is shortened.

Description

Method and device for data persistence distribution

Technical field

The invention belongs to the field of distributed databases, and in particular relates to a method and device for data persistence distribution.

Background art

A distributed database connects multiple physically dispersed data storage units through a high-speed information network to form a single, logically unified database. The basic idea is to spread the data of an originally centralized database across multiple data storage nodes connected by a network, in order to obtain larger storage capacity and support a higher volume of concurrent access. In recent years, with the rapid growth of data volumes, distributed database technology has developed quickly. Traditional relational databases have begun to evolve from the centralized model to a distributed architecture: a relational distributed database retains the data model and basic features of a traditional database while moving from centralized storage to distributed storage and from centralized computing to distributed computing.

In a distributed database, each node is typically an independent database that can operate entirely on its own as a DBMS. When the number of database nodes changes, the existing data must be redistributed: data is extracted from all or some of the nodes and dumped onto other nodes, usually by transmitting it through network equipment. In addition, when part of the data in the database needs to be dumped into other tables, a similar operation is required, and if a new data distribution rule is used, the data must also be reorganized.

The traditional implementation is mainly based on the following steps:

1. Extract the data at the source.

2. Send the data to the target node over the network.

3. The target node organizes the data as required.

4. Write the data to a persistent storage device.

This implementation faces the following problems:

1. The extracted data is generally unprocessed and occupies a large amount of network bandwidth.

2. When there are fewer target nodes than source nodes, the computational load of organizing the data is concentrated on the relatively few target nodes, which slows down the whole process.

Summary of the invention

Embodiments of the present invention provide a method and device for data persistence distribution, to solve the technical problem that the computation involved in data distribution is concentrated.

In one aspect, an embodiment of the present invention provides a method for data persistence distribution, including:

obtaining the data storage requirement of the target side;

processing the data to be distributed according to the storage requirement;

sending the processed data to be distributed to the target side.

Further, the storage requirement includes:

the data attributes of the table at the target side.

Further, processing the data to be distributed according to the storage requirement includes:

performing type conversion on the data to be distributed according to the storage requirement, so that the converted data approximates the target-side storage format.

Further, the method also includes:

compressing the converted data.

Further, the method also includes:

generating metadata of the data to be distributed.

In another aspect, an embodiment of the present invention provides a device for data persistence distribution, including:

an acquiring unit, configured to obtain the data storage requirement of the target side;

a processing unit, configured to process the data to be distributed according to the storage requirement;

a sending unit, configured to send the processed data to be distributed to the target side.

Further, the storage requirement includes:

the data attributes of the table at the target side.

Further, the processing unit is further configured to:

perform type conversion on the data to be distributed according to the storage requirement, so that the converted data conforms to the target-side storage format.

Further, the data persistence distribution device also includes:

a compression unit, configured to compress the converted data.

Further, the compression unit also includes:

a metadata generation unit, configured to generate metadata of the data to be distributed.

With the method and device for data persistence distribution provided by the present invention, the data to be distributed is preprocessed at the source node according to the storage requirement of the target node, and the preprocessed data is sent to the target side. This reduces the amount of data processing at the target side, speeds up data distribution, and shortens the data distribution time.

Brief description of the drawings

In order to explain the technical solutions of the embodiments of the present invention more clearly, the drawings needed in the description of the embodiments or of the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention; a person of ordinary skill in the art can obtain other drawings from them without creative effort.

Fig. 1 is a schematic flowchart of the method for data persistence distribution provided by Embodiment one of the present invention;

Fig. 2 is a schematic flowchart of the method for data persistence distribution provided by Embodiment two of the present invention;

Fig. 3 is a schematic flowchart of the method for data persistence distribution provided by Embodiment three of the present invention;

Fig. 4 is a schematic structural diagram of the data persistence distribution device provided by Embodiment four of the present invention.

Detailed description of the embodiments

The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings. Obviously, the described embodiments are only some of the embodiments of the present invention, not all of them. All other embodiments obtained by a person of ordinary skill in the art on the basis of the embodiments of the present invention without creative effort fall within the scope of protection of the present invention.

Embodiment one

Fig. 1 is a flowchart of the method for data persistence distribution provided by Embodiment one of the present invention. This embodiment is applicable to the persistent distribution of data within a cluster. The method can be executed by a data persistence distribution device, which can be implemented in software and/or hardware and can be integrated into a source node of a distributed database system.

Referring to Fig. 1, the method for data persistence distribution includes:

S110, obtaining the data storage requirement of the target side.

If part of the data in the database needs to be dumped into other tables, the data must be extracted from all or some of the nodes and dumped onto other nodes. Before the dump, the source node obtains the data storage requirement of the target side. Obtaining the target-side data storage requirement can be carried out in parallel with the extraction of the node's own data, so as to reduce the distribution time. For example, the source node may interact with the target node over the network to obtain the storage requirement of the target table on the target node. For example, the storage requirement includes the data attributes of the table at the target side.

S120, processing the data to be distributed according to the storage requirement.

The source node completes the processing of the data to be distributed locally, so that the data fully or nearly conforms to the data storage format set by the target node. For example, the processing may include type conversion, by which the source node turns the data into data that can be added directly to the target table on the target node.

S130, sending the processed data to be distributed to the target side.

On receiving the data, the target side stores the processed data directly at the specified location. Because the received data has already been processed at the source node, it can be appended directly after the existing data, which reduces the amount of computation at the target side.
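Steps S110 to S130 can be illustrated with a minimal Python sketch. It is not the implementation described in this document: the names `ColumnSpec`, `CONVERTERS`, `TargetNode` and `distribute` are hypothetical, the set of column types is assumed, and an in-memory object stands in for a target node that would in practice be reached over the network.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ColumnSpec:
    """Hypothetical attribute of one column of the target table."""
    name: str
    type_name: str  # assumed type names: "int", "float", "str", "date"


# Converters keyed by target type name (the supported types are an assumption).
CONVERTERS = {
    "int": int,
    "float": float,
    "str": str,
    "date": lambda v: v if isinstance(v, date) else date.fromisoformat(v),
}


class TargetNode:
    """In-memory stand-in for a target node."""

    def __init__(self, schemas):
        self._schemas = schemas
        self.tables = {name: [] for name in schemas}

    def describe_table(self, table):
        # Answers the source node's request for the storage requirement.
        return self._schemas[table]

    def receive(self, table, rows):
        # Rows arrive already converted, so they are simply appended
        # after the existing data.
        self.tables[table].extend(rows)


def acquire_storage_requirement(target, table):
    """S110: obtain the data attributes of the target table."""
    return target.describe_table(table)


def process_rows(rows, requirement):
    """S120: convert every value to the type the target column expects."""
    return [
        tuple(CONVERTERS[col.type_name](value)
              for col, value in zip(requirement, row))
        for row in rows
    ]


def distribute(rows, target, table):
    """Run S110, S120 and S130 on the source side."""
    requirement = acquire_storage_requirement(target, table)
    target.receive(table, process_rows(rows, requirement))  # S130
```

In this sketch all conversion work happens inside `process_rows` on the source side, and `TargetNode.receive` only appends, which mirrors the division of labour described for S120 and S130.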

With the method for data persistence distribution provided by this embodiment, the data to be distributed is preprocessed at the source node according to the storage requirement of the target node, and the preprocessed data is sent to the target side. This reduces the amount of data processing at the target side, speeds up data distribution, and shortens the data distribution time.

In a preferred implementation of this embodiment, after receiving a distribution task, the source node determines from the task which side performs the computation. For example, when the number of source nodes is much larger than the number of target nodes, the computation can be placed on the source node side as described above, reducing the data processing load on the target nodes. When the number of target nodes is larger, for example when migrating data between clusters, the computation is handed to the target nodes. This configurable approach adapts to different scenarios and is more flexible.
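The choice of the side that performs the computation could be sketched as follows. The function name `choose_compute_side` and the `min_ratio` threshold are hypothetical: the text only says "much larger" without giving a figure, so the default of 2.0 is an assumption, as is assigning every remaining case to the target side.

```python
def choose_compute_side(source_nodes: int, target_nodes: int,
                        min_ratio: float = 2.0) -> str:
    """Return "source" or "target" for a distribution task.

    min_ratio is an assumed threshold for "much larger"; the source
    document does not specify one.
    """
    if source_nodes <= 0 or target_nodes <= 0:
        raise ValueError("node counts must be positive")
    # Many sources feeding few targets: preprocess at the sources so the
    # conversion work is spread over more machines.
    if source_nodes / target_nodes >= min_ratio:
        return "source"
    # Otherwise (for example migration to a larger cluster) let the
    # target nodes do the work.
    return "target"
```

Keeping the threshold as a parameter reflects the configurable behaviour described above; a deployment would tune it, or replace the rule entirely, to suit its own workloads.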

Embodiment two

Fig. 2 is a schematic flowchart of the method for data persistence distribution provided by Embodiment two of the present invention. This embodiment is based on the above embodiment and further adds the following step to the method: compressing the converted data.

Referring to Fig. 2, the method includes:

S210, obtaining the data storage requirement of the target side.

S220, performing type conversion on the data to be distributed according to the storage requirement, so that the converted data approximates the target-side storage format.

S230, compressing the converted data.

Data compression is a technique that, without losing useful information, reduces the amount of data in order to save storage space and improve the efficiency of transmission, storage and processing, or that reorganizes the data according to some algorithm to reduce its redundancy and storage footprint. Compressing the converted data improves the efficiency of network transmission from the source node to the target node and reduces the network bandwidth occupied.

S240, sending the processed data to be distributed to the target side.

This embodiment adds the step of compressing the converted data. Compressing the converted data improves the efficiency of network transmission from the source node to the target node and reduces the network bandwidth occupied.
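The compression step S230 can be sketched in Python as below. The document does not name a compression algorithm or a wire format, so the use of zlib (DEFLATE) and a JSON encoding of the rows are both assumptions, and the helper names `compress_rows` and `decompress_rows` are hypothetical.

```python
import json
import zlib


def compress_rows(rows, level=6):
    """Serialize converted rows and compress them before sending.

    JSON and zlib are stand-ins chosen for this sketch only.
    """
    payload = json.dumps(rows, separators=(",", ":")).encode("utf-8")
    return zlib.compress(payload, level)


def decompress_rows(blob):
    """Inverse of compress_rows, as the target side would apply it."""
    return json.loads(zlib.decompress(blob).decode("utf-8"))
```

Because the compression is lossless, `decompress_rows(compress_rows(rows))` returns the original rows (as lists, since JSON has no tuple type). How much bandwidth is saved depends on how repetitive the data is.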

Embodiment three

Fig. 3 is a schematic flowchart of the method for data persistence distribution provided by Embodiment three of the present invention. This embodiment is based on the above embodiments, and the method further includes the following step: generating metadata of the data to be distributed.

Referring to Fig. 3, the method includes:

S310, obtaining the data storage requirement of the target side.

S320, processing the data to be distributed according to the storage requirement.

S330, sending the processed data to be distributed to the target side.

S340, generating metadata of the data to be distributed.

Metadata is defined as data that describes data: descriptive information about data and information resources. Because metadata is itself data, it can be stored in and retrieved from a database in a similar way to ordinary data. Because the data at the target side has changed, corresponding metadata must be generated to provide the location and description of the stored data. Once the metadata has been generated, the target node can provide the corresponding services.

This embodiment adds the step of generating metadata of the data to be distributed. The metadata provides the location and description of the stored data; once it has been generated, the target node can provide the corresponding services.
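A minimal sketch of step S340 is given below. The document only states that the metadata provides the location and description of the stored data; the concrete fields chosen here (`row_count`, `byte_size`, `checksum`) and the names `BlockMetadata` and `generate_metadata` are hypothetical illustrations.

```python
import hashlib
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class BlockMetadata:
    """Hypothetical metadata record for one distributed block of data."""
    table: str
    location: str   # where the block is stored (assumed identifier format)
    row_count: int
    byte_size: int
    checksum: str   # SHA-256 of the stored bytes, added for illustration


def generate_metadata(table, location, payload, row_count):
    """S340: describe a block that has been written at the target side."""
    return BlockMetadata(
        table=table,
        location=location,
        row_count=row_count,
        byte_size=len(payload),
        checksum=hashlib.sha256(payload).hexdigest(),
    )
```

Since metadata is itself data, a record like this could be turned into a plain dictionary with `asdict(...)` and stored in a catalog table alongside ordinary data.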

Embodiment four

Fig. 4 is a schematic structural diagram of the data persistence distribution device provided by Embodiment four of the present invention. As shown in Fig. 4, the device includes:

an acquiring unit 410, configured to obtain the data storage requirement of the target side;

a processing unit 420, configured to process the data to be distributed according to the storage requirement;

a sending unit 430, configured to send the processed data to be distributed to the target side.

Further, the storage requirement includes:

the data attributes of the table at the target side.

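Further, the processing unit is further configured to:

perform type conversion on the data to be distributed according to the storage requirement, so that the converted data conforms to the target-side storage format.

Further, the data persistence distribution device also includes:

a compression unit, configured to compress the converted data.

Further, the compression unit also includes:

a metadata generation unit, configured to generate metadata of the data to be distributed.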

A person of ordinary skill in the art will understand that all or some of the steps of the above method embodiments may be carried out by hardware under the control of program instructions. The program may be stored in a computer-readable storage medium. When executed, the program performs the steps of the above method embodiments. The storage medium includes any medium that can store program code, such as ROM, RAM, a magnetic disk, or an optical disc.

Finally, it should be noted that the above embodiments are intended only to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some or all of their technical features may be replaced by equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present invention.

Claims

1. A method for data persistence distribution, characterized by including:

obtaining the data storage requirement of the target side;

processing the data to be distributed according to the storage requirement;

sending the processed data to be distributed to the target side.

2. The method according to claim 1, characterized in that the storage requirement includes:

the data attributes of the table at the target side.

3. The method according to claim 1, characterized in that processing the data to be distributed according to the storage requirement includes:

performing type conversion on the data to be distributed according to the storage requirement, so that the converted data conforms to the target-side storage format.

4. The method according to claim 3, characterized in that the method also includes:

compressing the converted data.

5. The method according to claim 4, characterized in that the method also includes:

generating metadata of the data to be distributed.

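6. A device for data persistence distribution, characterized by including:

an acquiring unit, configured to obtain the data storage requirement of the target side;

a processing unit, configured to process the data to be distributed according to the storage requirement;

a sending unit, configured to send the processed data to be distributed to the target side.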

7. The device according to claim 6, characterized in that the storage requirement includes:

the data attributes of the table at the target side.

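8. The device according to claim 6, characterized in that the processing unit is configured to:

perform type conversion on the data to be distributed according to the storage requirement, so that the converted data conforms to the target-side storage format.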

9. The device according to claim 8, characterized in that the device also includes:

a compression unit, configured to compress the converted data.

10. The device according to claim 9, characterized in that the device also includes:

a metadata generation unit, configured to generate metadata of the data to be distributed.