CN104572862A - Mass data storage access method and system - Google Patents

Mass data storage access method and system

Info

Publication number
CN104572862A
Authority
CN
China
Prior art keywords
data
block
access request
file system
source
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201410796023.8A
Other languages
Chinese (zh)
Inventor
阳珍秀
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual
Priority to CN201410796023.8A
Publication of CN104572862A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/18 File system types
    • G06F16/182 Distributed file systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/13 File access structures, e.g. distributed indices

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a mass data storage access method and system. The method comprises: obtaining source data from a data source file; dividing the obtained source data into a plurality of independent data blocks; storing the divided data blocks in a distributed file system of a cloud platform; and accessing the data blocks in the distributed file system in parallel in response to different data access requests. With this method, mass data can be divided into data blocks and stored in the distributed file system, and the divided data blocks can be accessed in parallel. This improves the efficiency of storing and accessing mass data and avoids the performance impact that an excessive data volume would otherwise have on a storage device.

Description

Mass data storage access method and system
Technical field
The present invention relates to the technical field of data storage and access, and in particular to a mass data storage access method and system.
Background art
Existing large-scale application software systems store, query, and analyze the data of the managed objects they serve. The volume of data stored and queried is usually so large that query efficiency declines. At present, the only remedy is to shorten and simplify the SQL query statements, which in turn increases the complexity of the business logic that processes the data. With this approach, processing mass data still takes a long time, with low processing efficiency and slow processing speed.
Summary of the invention
The technical problem to be solved by the present invention is to provide a mass data storage access method and system that improve the efficiency of storing and accessing mass data.
The technical solution by which the present invention solves this problem is as follows.
According to one aspect of the present invention, a mass data storage access method is provided, comprising:
obtaining source data from a data source file;
dividing the obtained source data into a plurality of independent data blocks;
storing the divided independent data blocks in a distributed file system of a cloud platform;
accessing the data blocks in the distributed file system in parallel in response to different data access requests.
According to another aspect of the present invention, a mass data storage access system is provided, comprising:
a source data acquisition module, configured to obtain source data from a data source file;
a first dividing module, configured to divide the obtained source data into a plurality of independent data blocks;
a data storage module, configured to store the divided independent data blocks in a distributed file system of a cloud platform;
a data access module, configured to access the data blocks in the distributed file system in parallel in response to different data access requests.
In the mass data storage access method and system provided by the present invention, the source data is divided into a plurality of independent data blocks, and the independent data blocks are stored in a distributed manner in the distributed file system of the cloud platform. When data needs to be accessed, the data blocks in the distributed file system can be accessed in parallel, which improves the efficiency of storing and accessing mass data and avoids the impact that an excessive data volume would otherwise have on the performance of a storage device.
Brief description of the drawings
Fig. 1 is a flowchart of a mass data storage access method according to embodiment one of the present invention;
Fig. 2 is a schematic diagram of a mass data storage access system according to embodiment two of the present invention.
Detailed description
The principles and features of the present invention are described below with reference to the accompanying drawings. The examples given serve only to explain the present invention and are not intended to limit its scope.
Embodiment one: a mass data storage access method. The method provided by this embodiment is described in detail below with reference to Fig. 1.
In Fig. 1, S101: obtain source data from a data source file.
Specifically, a business process usually produces a large amount of business data (in the form of data source files), often enough to be called mass business data. Storing all of it on a single storage device would place very high demands on both the storage space and the performance of that device. This embodiment therefore provides a method for storing mass data in a distributed manner. Before the data can be stored, the business data must first be obtained, that is, the source data in the data source file must be obtained. Because the volume of data in the data source file is very large, the data can be obtained in parallel: multiple acquisition links collect the data in the data source file, which improves the efficiency of data acquisition.
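The parallel acquisition described above can be sketched roughly as follows. This is a minimal illustration, not the patent's implementation: it assumes each "acquisition link" is a worker thread that reads one plain-text source file, and the names `read_source`, `acquire_parallel`, and the `links` parameter are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Iterable


def read_source(path: Path) -> list[str]:
    """One acquisition link: read every non-empty line of a single source file."""
    with path.open(encoding="utf-8") as handle:
        return [line.rstrip("\n") for line in handle if line.strip()]


def acquire_parallel(paths: Iterable[Path], links: int = 4) -> list[str]:
    """Read several source files concurrently and concatenate their records."""
    paths = list(paths)
    with ThreadPoolExecutor(max_workers=links) as pool:
        # map() yields results in the order of `paths`, so the output is deterministic.
        per_file = pool.map(read_source, paths)
        return [record for records in per_file for record in records]
```

Threads suit this sketch because file reads are I/O-bound; a real deployment might instead use separate collector processes or hosts.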
S102: divide the obtained source data into a plurality of independent data blocks.
Specifically, step S101 obtains the data in the data source file in parallel, and this step divides the obtained data into a plurality of independent data blocks. The data obtained from the data source file may have been produced by several business processes, so all of the obtained data can be divided by business subject: each data block corresponds to one business subject, that is, data blocks and business subjects are in one-to-one correspondence. For example, the data of the telecom billing service is placed in one independent data block. These independent data blocks are logically related but physically independent of one another.
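As an illustration of dividing by business subject, the sketch below groups records so that each subject yields exactly one data block. The `Record` type, its field names, and the subject strings are assumptions made for the example; the source does not define a record format.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Record:
    subject: str       # business subject, e.g. "telecom_billing" (hypothetical label)
    produced_on: date  # time at which the record was produced
    payload: str


def partition_by_subject(records):
    """Return {subject: [records]}: one independent data block per business subject."""
    blocks = defaultdict(list)
    for record in records:
        blocks[record.subject].append(record)
    return dict(blocks)
```

Each value in the returned dictionary is self-contained, which is what allows the blocks to be stored and accessed independently later on.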
S103: store the divided independent data blocks in the distributed file system of the cloud platform.
Specifically, after the obtained source data has been divided into independent data blocks by business subject, the blocks are stored in a distributed manner in the distributed file system of the cloud platform. The distributed file system contains a plurality of data nodes, and the independent data blocks are stored on these data nodes, one data block per data node, so that data blocks and data nodes are in one-to-one correspondence. Because each data block also corresponds to one business subject, data nodes, data blocks, and business subjects all correspond to one another.
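A one-to-one placement of data blocks on data nodes might look like the sketch below. The source does not say how a node is chosen for a block, so assigning sorted subjects to nodes in list order is purely an assumption, as are the function name and the node identifiers.

```python
def assign_blocks_to_nodes(subjects, nodes):
    """Map each business subject (one data block each) to its own data node."""
    subjects = sorted(subjects)
    if len(subjects) > len(nodes):
        # One-to-one placement is impossible without a node per block.
        raise ValueError("need at least one data node per data block")
    # zip() stops at the shorter sequence, so surplus nodes simply stay unused.
    return dict(zip(subjects, nodes))
```

For example, `assign_blocks_to_nodes({"billing", "crm"}, ["node-1", "node-2", "node-3"])` places the two blocks on the first two nodes and leaves the third free.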
In addition, after the independent data blocks have been stored on the data nodes of the distributed file system of the cloud platform, each data block stored on a data node is divided again, according to the time at which the data in it was produced (this time can be obtained when the source data is obtained from the data source file), into finer sub-blocks. In a specific implementation, the data block can first be divided by year, and each yearly sub-block can then be divided by month into finer sub-blocks. If a monthly sub-block still holds a very large amount of data, it can be divided again by day. The result is a hierarchy of sub-blocks, that is, sub-blocks organized as a tree.
After a data block has been divided into a tree of sub-blocks, the sub-blocks are stored in data tables on the corresponding data node. To match the tree of sub-blocks, the data tables are also organized as a tree.
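The year, month, and optional day split can be sketched with nested dictionaries standing in for the tree of data tables. The threshold `max_month_size` that decides when a monthly sub-block is "still very large" is an assumption; the source gives no concrete limit, and the function name is hypothetical.

```python
from collections import defaultdict
from datetime import date


def build_time_tree(records, max_month_size=2):
    """Build {year: {month: leaf}} from (produced_on, payload) pairs.

    A leaf is a list of payloads, or {day: [payloads]} when the month holds
    more than `max_month_size` records and is therefore split again by day.
    """
    by_month = defaultdict(list)
    for produced_on, payload in records:
        by_month[(produced_on.year, produced_on.month)].append((produced_on, payload))

    tree = {}
    for (year, month), items in sorted(by_month.items()):
        if len(items) > max_month_size:
            by_day = defaultdict(list)
            for produced_on, payload in items:
                by_day[produced_on.day].append(payload)
            leaf = dict(by_day)
        else:
            leaf = [payload for _, payload in items]
        tree.setdefault(year, {})[month] = leaf
    return tree
```

Splitting by day only when a month is oversized keeps the tree shallow for sparse periods while still bounding the size of any single table.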
S104: access the data blocks in the distributed file system in parallel in response to different data access requests.
Specifically, when a client needs to access data, it sends a data access request to the cloud platform. The request carries the business subject of the data to be accessed and the time at which that data was produced.
When the cloud platform receives a data access request from a client, it first uses the business subject carried in the request to look up the corresponding data node in the distributed file system of the cloud platform (data nodes and business subjects were placed in one-to-one correspondence when the data was stored).
After finding the data node that matches the request, the cloud platform uses the production time carried in the request to look up the corresponding data table on that data node, and then looks up the data matching the request in that data table. Because the data tables are organized as a tree with hierarchical levels, the lookup can descend level by level according to time until the data matching the request is found. Compared with searching for specific data across the entire mass of data, a level-by-level lookup along the hierarchy follows a regular, targeted path and improves the efficiency of data lookup.
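The two-stage lookup (business subject first, then time, one level at a time) can be illustrated as below. For simplicity the sketch assumes every data table tree goes down to the day level and is held in nested dictionaries keyed by subject, year, month, and day; that layout and the name `find_records` are assumptions, not the patent's storage format.

```python
from datetime import date


def find_records(nodes, subject, produced_on):
    """Look up data by subject, then descend year -> month -> day.

    nodes: {subject: {year: {month: {day: [payload, ...]}}}}
    Returns an empty list as soon as any level has no matching entry.
    """
    level = nodes.get(subject)  # stage 1: pick the data node for the subject
    for key in (produced_on.year, produced_on.month, produced_on.day):
        if level is None:
            return []
        level = level.get(key)  # stage 2: one tree level per time unit
    return list(level) if level is not None else []
```

Each step discards everything outside the requested subject, year, or month, so the amount of data examined shrinks at every level rather than being scanned as a whole.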
Embodiment two: a mass data storage access system. The system provided by this embodiment is described in detail below with reference to Fig. 2.
In Fig. 2, the system provided by this embodiment comprises a source data acquisition module 201, a first dividing module 202, a data storage module 203, and a data access module 204. The data storage module 203 comprises a second dividing module 2031.
The source data acquisition module 201 is mainly configured to obtain source data from a data source file.
Specifically, because the volume of data in the data source file is very large, the source data acquisition module 201 can obtain the data in parallel: multiple acquisition links collect the data in the data source file, which improves the efficiency of data acquisition.
The first dividing module 202 is mainly configured to divide the obtained source data into a plurality of independent data blocks.
Specifically, the source data acquisition module 201 obtains the data in the data source file in parallel, and the first dividing module 202 divides the obtained data into a plurality of independent data blocks. The data obtained from the data source file may have been produced by several business processes, so the first dividing module 202 can divide all of the obtained data by business subject: each data block corresponds to one business subject, that is, data blocks and business subjects are in one-to-one correspondence. For example, the data of the telecom billing service is placed in one independent data block. These independent data blocks are logically related but physically independent of one another.
The data storage module 203 is mainly configured to store the divided independent data blocks in the distributed file system of the cloud platform.
Specifically, after the first dividing module 202 has divided the obtained source data into independent data blocks by business subject, the data storage module 203 stores the blocks in a distributed manner in the distributed file system of the cloud platform. The distributed file system contains a plurality of data nodes, and the data storage module 203 stores the independent data blocks on these data nodes, one data block per data node, so that data blocks and data nodes are in one-to-one correspondence. Because each data block also corresponds to one business subject, data nodes, data blocks, and business subjects all correspond to one another.
The data storage module 203 further comprises the second dividing module 2031, which is configured to divide the data block on each data node according to the time at which the data was produced, forming a plurality of sub-blocks. The data storage module 203 stores the resulting sub-blocks in data tables on the corresponding data node.
Specifically, the second dividing module 2031 divides each data block stored on a data node again, according to the time at which the data in it was produced (this time can be obtained when the source data is obtained from the data source file), into finer sub-blocks. In a specific implementation, the second dividing module 2031 can first divide the data block by year and then divide each yearly sub-block by month into finer sub-blocks. If a monthly sub-block still holds a very large amount of data, it can be divided again by day. The result is a hierarchy of sub-blocks, that is, sub-blocks organized as a tree.
After the second dividing module 2031 has divided a data block into a tree of sub-blocks, the sub-blocks are stored in data tables on the corresponding data node. To match the tree of sub-blocks, the data tables are also organized as a tree.
The data access module 204 is mainly configured to access the data blocks in the distributed file system in parallel in response to different data access requests.
Specifically, when a client needs to access data, it sends a data access request to the cloud platform. The request carries the business subject of the data to be accessed and the time at which that data was produced.
When the cloud platform receives a data access request from a client, the data access module 204 in the cloud platform uses the business subject carried in the request to look up the corresponding data node in the distributed file system of the cloud platform (data nodes and business subjects were placed in one-to-one correspondence when the data was stored).
After the data access module 204 finds the data node that matches the request, it uses the production time carried in the request to look up the corresponding data table on that data node, and then looks up the data matching the request in that data table. Because the data tables are organized as a tree with hierarchical levels, the lookup can descend level by level according to time until the data matching the request is found.
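Because requests for different business subjects resolve to different data nodes, independent requests can be served concurrently. The sketch below shows one way this could look, using a thread pool as a stand-in for parallel access across nodes. The `AccessRequest` type, the flattened `{subject: {date: [payloads]}}` store, and the function names are all assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from datetime import date
from typing import NamedTuple


class AccessRequest(NamedTuple):
    subject: str       # business subject of the requested data
    produced_on: date  # production time of the requested data


def handle(store, request):
    """Resolve one request: the subject selects the node, the date selects the table."""
    node = store.get(request.subject, {})
    return list(node.get(request.produced_on, []))


def serve_parallel(store, requests, workers=4):
    """Handle independent requests concurrently; results keep the request order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda request: handle(store, request), requests))
```

The store is only read here, so the worker threads need no locking; a system that also accepts concurrent writes would need additional coordination.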
In the mass data storage access method and system provided by the present invention, the obtained mass data is divided into a plurality of independent data blocks by business subject, and the divided blocks are stored in a distributed manner on the data nodes of the distributed file system of the cloud platform. In response to different data access requests from clients, the data in the distributed file system can be accessed in parallel, which improves the efficiency of storing and accessing mass data. In addition, each data block stored on a data node of the distributed file system is further divided into a plurality of sub-blocks according to the time at which the data was produced, and the sub-blocks are stored in data tables on the corresponding data node, forming a tree-structured data store. When data is accessed, the data node of the distributed file system is first located according to the business subject in the data access request; the specific data table on that data node is then located according to the production time of the requested data; and the data matching the request is looked up in that data table. The lookup proceeds level by level until the data matching the request is found. Compared with searching for specific data across the entire mass of data, a level-by-level lookup along the hierarchy follows a regular, targeted path and improves the efficiency of data lookup.
The foregoing describes only preferred embodiments of the present invention and is not intended to limit it. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims (10)

1. A mass data storage access method, characterized by comprising:
Step S101: obtaining source data from a data source file;
Step S102: dividing the obtained source data into a plurality of independent data blocks;
Step S103: storing the divided independent data blocks in a distributed file system of a cloud platform;
Step S104: accessing the data blocks in the distributed file system in parallel in response to different data access requests.
2. The mass data storage access method according to claim 1, characterized in that step S102 comprises:
dividing the obtained source data into a plurality of independent data blocks according to the business subjects to which the source data corresponds, the data blocks being in one-to-one correspondence with the business subjects;
and step S103 comprises:
storing the divided independent data blocks in the distributed file system of the cloud platform according to the business subject to which each data block corresponds.
3. The mass data storage access method according to claim 2, characterized in that step S103 further comprises:
storing each independent data block on a corresponding data node of the distributed file system, the data blocks being in one-to-one correspondence with the data nodes.
4. The mass data storage access method according to claim 3, characterized in that step S103 further comprises:
dividing the data block on each data node according to time to form a plurality of sub-blocks, and storing the resulting sub-blocks in data tables on the corresponding data node.
5. The mass data storage access method according to claim 4, characterized in that step S104 comprises:
for each data access request, looking up in the distributed file system, according to the business subject to which the request corresponds, the data node corresponding to the request;
looking up on that data node, according to the time to which the request corresponds, the data table corresponding to the request, and looking up in the data table the data matching the request.
6. A mass data storage access system, characterized by comprising:
a source data acquisition module, configured to obtain source data from a data source file;
a first dividing module, configured to divide the obtained source data into a plurality of independent data blocks;
a data storage module, configured to store the divided independent data blocks in a distributed file system of a cloud platform;
a data access module, configured to access the data blocks in the distributed file system in parallel in response to different data access requests.
7. The mass data storage access system according to claim 6, characterized in that the first dividing module being configured to divide the obtained source data into a plurality of independent data blocks specifically comprises:
dividing the obtained source data into a plurality of independent data blocks according to the business subjects to which the source data corresponds, the data blocks being in one-to-one correspondence with the business subjects;
and the data storage module being configured to store the divided independent data blocks in the distributed file system of the cloud platform specifically comprises:
storing the divided independent data blocks in the distributed file system of the cloud platform according to the business subject to which each data block corresponds.
8. The mass data storage access system according to claim 7, characterized in that the data storage module is further configured to:
store each independent data block on a corresponding data node of the distributed file system, the data blocks being in one-to-one correspondence with the data nodes.
9. The mass data storage access system according to claim 8, characterized in that the data storage module further comprises:
a second dividing module, configured to divide the data block on each data node according to time to form a plurality of sub-blocks;
the data storage module storing the resulting sub-blocks in data tables on the corresponding data node.
10. The mass data storage access system according to claim 9, characterized in that the data access module being configured to access the data blocks in the distributed file system in parallel in response to different data access requests specifically comprises:
for each data access request, looking up in the distributed file system, according to the business subject to which the request corresponds, the data node corresponding to the request;
looking up on that data node, according to the time to which the request corresponds, the data table corresponding to the request, and looking up in the data table the data matching the request.
CN201410796023.8A 2014-12-19 2014-12-19 Mass data storage access method and system Pending CN104572862A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410796023.8A CN104572862A (en) 2014-12-19 2014-12-19 Mass data storage access method and system

Publications (1)

Publication Number Publication Date
CN104572862A (en) 2015-04-29

Family

ID=53088924

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410796023.8A Pending CN104572862A (en) 2014-12-19 2014-12-19 Mass data storage access method and system

Country Status (1)

Country Link
CN (1) CN104572862A (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1858735A (en) * 2005-12-30 2006-11-08 华为技术有限公司 Method for processing mass data
CN101996250A (en) * 2010-11-15 2011-03-30 中国科学院计算技术研究所 Hadoop-based mass stream data storage and query method and system
CN102521383A (en) * 2011-12-22 2012-06-27 南京烽火星空通信发展有限公司 Method for storing and accessing mass files in distributed system
CN103514205A (en) * 2012-06-27 2014-01-15 中国电信股份有限公司 Mass data processing method and system
CN102906751A (en) * 2012-07-25 2013-01-30 华为技术有限公司 Method and device for data storage and data query

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105100806A (en) * 2015-07-23 2015-11-25 柳州龙辉科技有限公司 Compression and issuing method of super-large scale audio and video data
WO2017092384A1 (en) * 2015-12-01 2017-06-08 深圳市华讯方舟软件技术有限公司 Clustered database distributed storage method and device
CN105843955A (en) * 2016-04-13 2016-08-10 曙光信息产业(北京)有限公司 Data migration system
CN106202261A (en) * 2016-06-29 2016-12-07 浪潮(北京)电子信息产业有限公司 The distributed approach of a kind of data access request and engine
CN106372256A (en) * 2016-09-30 2017-02-01 浙江大学 Distributed storage method for massive Argo data
CN112069148A (en) * 2019-06-10 2020-12-11 贵阳海信网络科技有限公司 Scenic spot data access method and device
CN115617763A (en) * 2022-09-23 2023-01-17 中电金信软件有限公司 Data processing method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20150429