CN104572862A

CN104572862A - Mass data storage access method and system

Info

Publication number: CN104572862A
Application number: CN201410796023.8A
Authority: CN
Inventors: 阳珍秀
Original assignee: Individual
Current assignee: Individual
Priority date: 2014-12-19
Filing date: 2014-12-19
Publication date: 2015-04-29

Abstract

The invention discloses a mass data storage access method and system. The method comprises the following steps: obtaining source data from a data source file; dividing the obtained source data into a plurality of independent data blocks; storing the divided data blocks in a distributed file system of a cloud platform; and, for different data access requests, accessing the data blocks in the distributed file system in parallel. With this method, mass data can be divided into a plurality of data blocks and stored in the distributed file system, and the divided data blocks can be accessed in parallel. This improves the efficiency of storing and accessing mass data and avoids the impact that an excessive data volume would otherwise have on the performance of a storage device.

Description

Mass data storage access method and system

Technical field

The present invention relates to the technical field of data storage and access, and in particular to a mass data storage access method and system.

Background art

Existing large-scale application software systems store, query, and analyze the data of the sets of managed objects they serve. The volume of data stored and queried is usually so large that query efficiency declines. At present, the only remedy for this problem is to shorten and simplify the SQL query statements, which in turn increases the complexity of the business-logic processing of the data. With this approach, processing mass data takes a long time, and both processing efficiency and processing speed are low.

Summary of the invention

The technical problem to be solved by the present invention is to provide a mass data storage access method and system that can improve the efficiency of storing and accessing mass data.

The technical solution by which the present invention solves the above technical problem is as follows:

According to one aspect of the present invention, a mass data storage access method is provided, comprising:

obtaining source data from a data source file;

dividing the obtained source data into several independent data blocks;

storing the divided independent data blocks in a distributed file system of a cloud platform; and

for different data access requests, accessing the data blocks in the distributed file system in parallel.

According to another aspect of the present invention, a mass data storage access system is provided, comprising:

a source data acquisition module, configured to obtain source data from a data source file;

a first division module, configured to divide the obtained source data into several independent data blocks;

a data storage module, configured to store the divided independent data blocks in a distributed file system of a cloud platform; and

a data access module, configured to access, for different data access requests, the data blocks in the distributed file system in parallel.

The mass data storage access method and system provided by the present invention divide the source data into several independent data blocks and distribute these data blocks across the distributed file system of a cloud platform. When data needs to be accessed, the data blocks in the distributed file system can be accessed in parallel, which improves the efficiency of storing and accessing mass data and avoids the impact that an excessive data volume would otherwise have on the performance of a storage device.

Brief description of the drawings

Fig. 1 is a flowchart of a mass data storage access method according to embodiment one of the present invention;

Fig. 2 is a schematic diagram of a mass data storage access system according to embodiment two of the present invention.

Detailed description of the embodiments

The principles and features of the present invention are described below with reference to the accompanying drawings. The examples given serve only to explain the present invention and are not intended to limit its scope.

Embodiment one: a mass data storage access method. The method provided by this embodiment is described in detail below with reference to Fig. 1.

In Fig. 1, S101: obtain source data from a data source file.

Specifically, a business process usually produces a large amount of business data (in the form of data source files), which may even be described as mass business data. If all of this business data were stored on a single storage device, that device would need very large storage space and very high performance. This embodiment therefore provides a method of storing mass data in a distributed manner. Before storage, the business data must first be obtained, that is, the source data in the data source file is obtained. Because the volume of data in the data source file is very large, the data can be obtained in parallel: multiple acquisition links collect the data in the data source file at the same time, which improves the efficiency of data acquisition.
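The parallel acquisition in S101 can be illustrated with a minimal sketch. It assumes, purely for illustration, that the source data is spread over several line-oriented text files and that one "acquisition link" corresponds to one worker thread; the function names `read_source_file` and `acquire_source_data` are hypothetical and do not come from the patent.

```python
from concurrent.futures import ThreadPoolExecutor


def read_source_file(path):
    """One acquisition link: read every non-empty line of one source file."""
    with open(path, encoding="utf-8") as handle:
        return [line.rstrip("\n") for line in handle if line.strip()]


def acquire_source_data(paths, max_links=4):
    """Read several data source files in parallel and return all records.

    `max_links` is an assumed tuning knob for the number of parallel links.
    """
    with ThreadPoolExecutor(max_workers=max_links) as pool:
        # pool.map returns results in the same order as `paths`.
        per_file = list(pool.map(read_source_file, paths))
    # Flatten the per-file lists into one list of records.
    return [record for records in per_file for record in records]
```

Threads are used here because reading files is I/O-bound; a real deployment could equally use separate processes or machines as acquisition links.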

S102: divide the obtained source data into several independent data blocks.

Specifically, step S101 obtains the data in the data source file in parallel, and this step divides the obtained data into several independent data blocks. The data obtained from the data source file may have been produced by multiple business processes, so all of the obtained data can be divided by business subject, with one data block per business subject; that is, data blocks and business subjects are in one-to-one correspondence. For example, the data belonging to the telecom billing business is divided into one independent data block. These data blocks are logically related to one another but physically independent.
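The division by business subject in S102 can be sketched as a simple grouping step. The sketch assumes that each record is a dictionary carrying its business subject under a `subject` key; that record layout and the name `divide_by_subject` are illustrative assumptions, not something the patent specifies.

```python
from collections import defaultdict


def divide_by_subject(records, subject_key="subject"):
    """Group records into one independent data block per business subject."""
    blocks = defaultdict(list)
    for record in records:
        # Every record with the same subject lands in the same block.
        blocks[record[subject_key]].append(record)
    # Return a plain dict: subject -> list of records (the data block).
    return dict(blocks)
```

For instance, all records whose subject is `"telecom_billing"` would end up in a single block keyed by that subject, which gives the one-to-one correspondence between blocks and subjects described above.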

S103: store the divided independent data blocks in the distributed file system of the cloud platform.

Specifically, after the obtained source data has been divided into multiple independent data blocks by business subject, these data blocks are distributed across the distributed file system of the cloud platform. The distributed file system contains several data nodes, and the divided data blocks are stored on these data nodes, one data block per data node; that is, data blocks and data nodes are in one-to-one correspondence. Because data blocks also correspond to business subjects, data nodes, data blocks, and business subjects all correspond to one another.
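The one-to-one placement of data blocks on data nodes can be sketched as follows. The patent does not say how a node is chosen for a block, so the assignment rule here (subjects in sorted order paired with nodes in the given order) and the name `assign_blocks_to_nodes` are assumptions made only to keep the example deterministic.

```python
def assign_blocks_to_nodes(blocks, node_names):
    """Map each business subject (and its data block) to its own data node.

    `blocks` maps subject -> data block; `node_names` lists available nodes.
    """
    # Drop duplicate node names while keeping their order.
    nodes = list(dict.fromkeys(node_names))
    if len(nodes) < len(blocks):
        raise ValueError("need at least one data node per data block")
    # Pair subjects with nodes so that no node receives two blocks.
    return dict(zip(sorted(blocks), nodes))
```

The returned mapping doubles as the lookup table that a later access step could use to go from a business subject directly to its data node.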

In addition, after the divided data blocks have been stored on the data nodes of the distributed file system of the cloud platform, each data block stored on a data node is divided again according to the time at which its data was produced (this time can be obtained when the source data is obtained from the data source file), yielding smaller sub-blocks. In a specific implementation, the data block can first be divided by year, and each yearly sub-block can then be divided by month into finer sub-blocks. If a monthly sub-block still holds a very large amount of data, it can be divided again by day. The result is a hierarchy of sub-blocks, that is, sub-blocks organized as a tree.

After a data block has been divided into tree-structured sub-blocks, the sub-blocks are stored in data tables on the corresponding data node. To match the tree structure of the sub-blocks, the data tables are also organized as a tree.
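The year, month, and day hierarchy can be pictured as nested dictionaries. The sketch below is a simplification: it always splits down to the day level, whereas the text only splits by day when a monthly sub-block is still too large. The `created` field and the name `build_time_tree` are hypothetical.

```python
from datetime import date


def build_time_tree(block, time_key="created"):
    """Split one data block into a year -> month -> day tree of sub-blocks."""
    tree = {}
    for record in block:
        stamp = record[time_key]  # assumed to be a datetime.date
        day_bucket = (
            tree.setdefault(stamp.year, {})
            .setdefault(stamp.month, {})
            .setdefault(stamp.day, [])
        )
        day_bucket.append(record)
    return tree


example = build_time_tree([{"id": 1, "created": date(2014, 12, 19)}])
```

In `example`, the single record is reachable as `example[2014][12][19]`, so each leaf list plays the role of one data table on the data node.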

S104: for different data access requests, access the data blocks in the distributed file system in parallel.

Specifically, when a client needs to access data, it sends a data access request to the cloud platform. The request carries the business subject of the required data and the time at which the required data was produced.

When the cloud platform receives the data access request sent by the client, it first uses the business subject carried in the request to find the corresponding data node in the distributed file system of the cloud platform (data nodes and business subjects were put in one-to-one correspondence when the data was stored).

After the data node matching the data access request has been found, the corresponding data table is located on that data node according to the production time carried in the request, and the data matching the request is then looked up in that data table. Because the data tables are organized as a tree with hierarchical levels, the lookup can proceed level by level in time until the data matching the request is found. Compared with searching for specific data across the entire mass of data, such a level-by-level lookup follows a clear rule and is more targeted, which improves the efficiency of data lookup.
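The two-stage lookup (business subject first, then time, one level at a time) can be sketched over an in-memory stand-in for the data nodes. Here `nodes` maps each business subject to a year -> month -> day tree whose leaves are lists of records; this layout and the names `find_data` and `collect` are illustrative assumptions rather than the patent's storage format.

```python
def collect(level):
    """Gather every record stored at or below one level of the tree."""
    if isinstance(level, list):
        return list(level)
    records = []
    for key in sorted(level):
        records.extend(collect(level[key]))
    return records


def find_data(nodes, subject, year, month=None, day=None):
    """Locate the data node by subject, then descend by year, month, day."""
    level = nodes.get(subject)  # stage 1: subject -> data node
    if level is None:
        return []
    for key in (year, month, day):  # stage 2: one time level per step
        if key is None:
            break  # coarser request: return everything below this level
        level = level.get(key)
        if level is None:
            return []
    return collect(level)
```

Each step discards all branches that cannot contain the requested data, which is the sense in which the level-by-level lookup is more targeted than scanning the whole data set.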

Embodiment two: a mass data storage access system. The system provided by this embodiment is described in detail below with reference to Fig. 2.

In Fig. 2, the system provided by this embodiment comprises a source data acquisition module 201, a first division module 202, a data storage module 203, and a data access module 204. The data storage module 203 comprises a second division module 2031.

The source data acquisition module 201 is mainly configured to obtain source data from a data source file.

Specifically, because the volume of data in the data source file is very large, the source data acquisition module 201 can obtain the data in parallel: multiple acquisition links collect the data in the data source file at the same time, which improves the efficiency of data acquisition.

The first division module 202 is mainly configured to divide the obtained source data into several independent data blocks.

Specifically, the source data acquisition module 201 obtains the data in the data source file in parallel, and the first division module 202 divides the obtained data into several independent data blocks. The data obtained from the data source file may have been produced by multiple business processes, so the first division module 202 can divide all of the obtained data by business subject, with one data block per business subject; that is, data blocks and business subjects are in one-to-one correspondence. For example, the data belonging to the telecom billing business is divided into one independent data block. These data blocks are logically related to one another but physically independent.

The data storage module 203 is mainly configured to store the divided independent data blocks in the distributed file system of the cloud platform.

Specifically, after the first division module 202 has divided the obtained source data into multiple independent data blocks by business subject, the data storage module 203 distributes these data blocks across the distributed file system of the cloud platform. The distributed file system contains several data nodes, and the data storage module 203 stores the divided data blocks on these data nodes, one data block per data node; that is, data blocks and data nodes are in one-to-one correspondence. Because data blocks also correspond to business subjects, data nodes, data blocks, and business subjects all correspond to one another.

The data storage module 203 further comprises the second division module 2031, which is specifically configured to divide the data block on each data node according to the time at which the data was produced, forming several sub-blocks; the data storage module 203 stores the resulting sub-blocks in the data tables of the corresponding data node.

Specifically, the second division module 2031 divides each data block stored on a data node again according to the time at which its data was produced (this time can be obtained when the source data is obtained from the data source file), yielding smaller sub-blocks. In a specific implementation, the second division module 2031 can first divide the data block by year and then divide each yearly sub-block by month into finer sub-blocks. If a monthly sub-block still holds a very large amount of data, it can be divided again by day. The result is a hierarchy of sub-blocks, that is, sub-blocks organized as a tree.

After the second division module 2031 has divided a data block into tree-structured sub-blocks, the sub-blocks are stored in data tables on the corresponding data node. To match the tree structure of the sub-blocks, the data tables are also organized as a tree.

The data access module 204 is mainly configured to access, for different data access requests, the data blocks in the distributed file system in parallel.

When the cloud platform receives a data access request sent by a client, the data access module 204 in the cloud platform uses the business subject carried in the request to find the corresponding data node in the distributed file system of the cloud platform (data nodes and business subjects were put in one-to-one correspondence when the data was stored).

After the data access module 204 has found the data node matching the data access request, it locates the corresponding data table on that data node according to the production time carried in the request and then looks up the data matching the request in that data table. Because the data tables are organized as a tree with hierarchical levels, the lookup can proceed level by level in time until the data matching the request is found.
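How a data access module might serve several requests in parallel can be sketched with a thread pool. The request format (a `(subject, year, month)` tuple), the in-memory `nodes` mapping, and the names `handle_request` and `serve_requests` are all assumptions for illustration; the patent does not prescribe a concurrency mechanism.

```python
from concurrent.futures import ThreadPoolExecutor


def handle_request(nodes, request):
    """Serve one request: find the node by subject, then the table by time."""
    subject, year, month = request
    table = nodes.get(subject, {}).get(year, {}).get(month, [])
    return list(table)  # copy so callers cannot mutate stored data


def serve_requests(nodes, requests, max_workers=4):
    """Handle different data access requests in parallel, keeping their order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda req: handle_request(nodes, req), requests))
```

Because requests for different business subjects touch different data nodes, they do not contend for the same block, which is what makes this kind of parallel access attractive.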

The mass data storage access method and system provided by the present invention divide the obtained mass data into several independent data blocks by business subject and distribute the divided data blocks across the data nodes of the distributed file system of the cloud platform. For the different data access requests of clients, the data in the distributed file system can be accessed in parallel, which improves the efficiency of storing and accessing mass data. In addition, the data block stored on each data node of the distributed file system is further subdivided into several sub-blocks according to the time at which the data was produced, and the sub-blocks are stored in the data tables of the corresponding data node, forming a tree-structured data store. When data is accessed, the data node of the distributed file system is located first according to the business subject in the data access request. The specific data table on that data node is then located according to the production time of the requested data, and the data matching the request is looked up in that data table, level by level, until it is found. Compared with searching for specific data across the entire mass of data, such a level-by-level lookup follows a clear rule and is more targeted, which improves the efficiency of data lookup.

The foregoing describes only preferred embodiments of the present invention and is not intended to limit it. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims

1. A mass data storage access method, characterized by comprising:

Step S101: obtaining source data from a data source file;

Step S102: dividing the obtained source data into several independent data blocks;

Step S103: storing the divided independent data blocks in a distributed file system of a cloud platform;

Step S104: for different data access requests, accessing the data blocks in the distributed file system in parallel.

2. The mass data storage access method according to claim 1, characterized in that step S102 comprises:

dividing the obtained source data into several independent data blocks according to the business subject to which the source data corresponds, the data blocks being in one-to-one correspondence with business subjects;

and step S103 comprises:

storing the divided independent data blocks in the distributed file system of the cloud platform according to the business subject to which each data block corresponds.

3. The mass data storage access method according to claim 2, characterized in that step S103 further comprises:

storing each independent data block on a corresponding data node of the distributed file system, the data blocks being in one-to-one correspondence with data nodes.

4. The mass data storage access method according to claim 3, characterized in that step S103 further comprises:

dividing the data block on each data node according to time to form several sub-blocks, and storing the resulting sub-blocks in the data tables of the corresponding data node.

5. The mass data storage access method according to claim 4, characterized in that step S104 comprises:

for different data access requests, finding, according to the business subject to which a data access request corresponds, the data node in the distributed file system that corresponds to the request;

finding, according to the time to which the data access request corresponds, the data table on that data node that corresponds to the request, and looking up in the data table the data matching the request.

6. A mass data storage access system, characterized by comprising:

7. The mass data storage access system according to claim 6, characterized in that the first division module being configured to divide the obtained source data into several independent data blocks specifically comprises:

The data storage module being configured to store the divided independent data blocks in the distributed file system of the cloud platform specifically comprises:

8. The mass data storage access system according to claim 7, characterized in that the data storage module is further configured to:

9. The mass data storage access system according to claim 8, characterized in that the data storage module further comprises:

a second division module, configured to divide the data block on each data node according to time, forming several sub-blocks;

the data storage module storing the resulting sub-blocks in the data tables of the corresponding data node.

10. The mass data storage access system according to claim 9, characterized in that the data access module being configured to access, for different data access requests, the data blocks in the distributed file system in parallel specifically comprises: