CN102801784A - Distributed type data storing method and equipment - Google Patents


Info

Publication number
CN102801784A
Authority
CN
China
Prior art keywords
data
burst
storage
information
size
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN2012102274257A
Other languages
Chinese (zh)
Other versions
CN102801784B (en
Inventor
王兴勇
韩霆军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd filed Critical Huawei Technologies Co Ltd
Priority to CN201210227425.7A priority Critical patent/CN102801784B/en
Publication of CN102801784A publication Critical patent/CN102801784A/en
Application granted granted Critical
Publication of CN102801784B publication Critical patent/CN102801784B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a distributed data storage method and equipment. The method comprises the following steps: calling a data writing interface provided by a client and acquiring a data access handle through the data writing interface; acquiring size information of the data to be stored; acquiring size information of the data already stored; generating fragmentation information with a first preset algorithm from the size information of the data to be stored and the size information of the data already stored; generating a unique fragmentation mark for each fragmentation according to the generated fragmentation information and a fragmentation mark generating rule; and selecting, according to the fragmentation mark and a second preset algorithm, a corresponding data storage node for each fragmentation to carry out data storage. With the method or the equipment disclosed by the invention, the storage efficiency and the access efficiency of distributed data can be improved, and the fragmentation information does not need to be stored, so that the load on the server and the hardware cost are reduced.

Description

Distributed Storage method and apparatus
Technical field
The present invention relates to the field of data storage, and in particular to a Distributed Storage method and apparatus.
Background technology
A traditional network storage system keeps all data on a centralized storage server, which becomes the bottleneck of system performance and cannot meet the needs of large-scale storage applications. The prior art therefore stores data with Distributed Storage methods.
Distributed Storage spreads data across many independent devices. A distributed network storage system uses a scalable architecture in which many storage servers share the storage load and a location server locates the stored information. This improves the reliability, availability and access efficiency of the system and also makes it easy to expand.
The Distributed Storage method of the prior art works mainly as follows: a file is divided into a plurality of bursts, and the burst information of the file is stored on a server in the form of metadata. The burst information may include the number of bursts, their names, their memory locations, and so on. To read data, the client sends a request for the burst information to the server storing the metadata and, from the response returned by that server, determines which bursts need to be read. Finally, according to the positional information of each burst, the client reads the actual file data from the storage node where the burst resides.
Because a file is usually divided into many bursts, the prior-art Distributed Storage method has to send the request for burst information many times when reading data, and has to repeat the process of saving burst information on the meta data server when storing data. Because the network has latency and the server also needs time to respond, repeatedly obtaining burst information makes the prior-art method inefficient when reading data. In addition, because the burst information must be saved, the load on the server increases and the hardware cost is higher.
Summary of the invention
The embodiments of the invention provide a Distributed Storage method and apparatus that can improve the storage and reading efficiency of distributed data without saving burst information, thereby reducing the load on the server and the hardware cost.
To achieve this purpose, the embodiments of the invention provide the following schemes:
On the one hand, an embodiment of the invention provides a Distributed Storage method, comprising: calling the write data interface provided by the client, and obtaining a data access handle through the write data interface; obtaining, through said data access handle, the size information of the data that need to be stored; obtaining the size information of the data already stored, which represents the storage space occupied by the data already stored in the file to which the data to be stored belong; generating burst information according to a first preset algorithm from the size information of the data to be stored and the size information of the data already stored, said burst information comprising at least the number of bursts that the data to be stored need to occupy and the storage space that the data stored in each burst occupy within that burst; generating, according to said burst information and a segmental identification create-rule, a unique segmental identification for each burst; and selecting, according to said segmental identification and a second preset algorithm, a corresponding data memory node for each burst and storing the data there.
On the other hand, an embodiment of the invention provides a distributed data read method, comprising: calling the read data interface provided by the client, and obtaining a data access handle through said read data interface; obtaining, through said data access handle, the size information of the data that need to be read; obtaining the size information of the data already read, which represents the storage space occupied by the data already read in the file to which the data to be read belong; generating burst information according to the first preset algorithm from the size information of the data to be read and the size information of the data already read, said burst information comprising at least the number of bursts that the data to be read occupy and the storage space that the data stored in each burst occupy within that burst; generating, according to said burst information and the segmental identification create-rule, a unique segmental identification for each burst; and selecting, according to said segmental identification and the second preset algorithm, a corresponding data memory node for each burst and reading the data from it.
On the other hand, the embodiment of the invention provides a kind of Distributed Storage equipment, comprising:
A write data interface interchange module, used to call the write data interface provided by the client and to obtain a data access handle through the write data interface;
A size information acquisition module of data, used to obtain, through said data access handle, the size information of the data that need to be stored, and to send the size information of said data to the burst information generation module;
A storage space information acquisition module, used to obtain the size information of the data already stored and to send it to the burst information generation module, where the size information of the data already stored represents the storage space occupied by the data already stored in the file to which the data to be stored belong;
A burst information generation module, used to receive the size information of the data to be stored sent by the size information acquisition module of data and the size information of the data already stored sent by the storage space information acquisition module, to generate burst information from them according to a first preset algorithm, and to send said burst information to the segmental identification generation module, where said burst information comprises at least the number of bursts that the data to be stored need to occupy and the storage space that the data stored in each burst occupy within that burst;
A segmental identification generation module, used to receive the burst information sent by the burst information generation module, to generate a unique segmental identification for each burst according to said burst information, and to send said segmental identification to the data memory module;
A data memory module, used to receive the segmental identification sent by the segmental identification generation module and, according to said segmental identification and a second preset algorithm, to select a corresponding data memory node for each burst and store the data there.
On the other hand, the embodiment of the invention provides another kind of distributed data fetch equipment, comprising:
A read data interface interchange module, used to call the read data interface provided by the client and to obtain a data access handle through said read data interface;
A size information acquisition module of data, used to obtain, through said data access handle, the size information of the data that need to be read, and to send the size information of said data to the burst information generation module;
A storage space information acquisition module, used to obtain the size information of the data already read and to send it to the burst information generation module, where the size information of the data already read represents the storage space occupied by the data already read in the file to which the data to be read belong;
A burst information generation module, used to receive the size information of the data to be read sent by the size information acquisition module of data and the size information of the data already read sent by the storage space information acquisition module, to generate burst information from them according to the first preset algorithm, and to send said burst information to the segmental identification generation module, where said burst information comprises at least the number of bursts that the data to be read occupy and the storage space that the data stored in each burst occupy within that burst;
A segmental identification generation module, used to receive the burst information sent by the burst information generation module, to generate a unique segmental identification for each burst according to said burst information and the segmental identification create-rule, and to send said segmental identification to the data read module;
A data read module, used to receive the segmental identification sent by the segmental identification generation module and, according to said segmental identification and the second preset algorithm, to select a corresponding data memory node for each burst and read the data from it.
The Distributed Storage method and apparatus of the embodiments of the invention calculate the burst information of the data to be stored with the first preset algorithm but do not save it; instead, the data to be stored are stored directly to the corresponding data memory nodes according to the calculated burst information and the second preset algorithm. Because burst information need not be saved, the load on the server and the hardware cost are reduced. Moreover, when data are read, the same algorithm can be used to calculate the burst information, so requests for burst information need not be sent repeatedly to a meta data server, which can greatly improve the storage and reading efficiency of the distributed system.
Description of drawings
To illustrate the technical schemes of the embodiments of the invention or of the prior art more clearly, the accompanying drawings needed by the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the invention; those of ordinary skill in the art can obtain other drawings from them without creative work.
Fig. 1 is the flow chart of Distributed Storage method embodiment one of the present invention;
Fig. 2 is for generating the flow chart of burst information among the present invention according to first preset algorithm;
Fig. 3 is the flow chart of Distributed Storage method embodiment two of the present invention;
Fig. 4 is the flow chart of Distributed Storage method embodiment three of the present invention;
Fig. 5 is a distributed data read method flow chart of the present invention;
Fig. 6 is the structure chart of Distributed Storage apparatus embodiments one of the present invention;
Fig. 7 is the structure chart of burst information generation module of the present invention;
Fig. 8 is the structure chart of Distributed Storage apparatus embodiments two of the present invention;
Fig. 9 is a distributed data fetch equipment structure chart of the present invention.
Embodiment
The technical schemes in the embodiments of the invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some of the embodiments of the invention, not all of them. All other embodiments obtained by those of ordinary skill in the art from the embodiments of the invention without creative work fall within the scope of protection of the invention.
To make the above purposes, features and advantages of the invention easier to understand, the invention is explained in further detail below with reference to the accompanying drawings and embodiments.
Fig. 1 is the flow chart of Distributed Storage method embodiment one of the present invention. As shown in Figure 1, this method can comprise:
Step S101: call the write data interface that client provides, obtain the data access handle through the write data interface;
A handle is a unique integer value, i.e. a four-byte value, that identifies different objects in an application program and different instances of the same kind of object, such as a window, a button, an icon, a scroll bar, an output device, a control or a file. An application program can access the information of the corresponding object through its handle, but a handle is not a pointer, and a program cannot use a handle to read the information in a file directly.
Step S102: obtain, through said data access handle, the size information of the data that need to be stored; the data to be stored are a part of the total data of the file to be stored;
A complete file usually holds a large amount of data and needs to be divided into a plurality of smaller data blocks for storage. For example, a file of size 100M can be divided logically into two data blocks of 50M and stored in two passes. In this example the size of the data to be stored each time is 50M, and the total data of the file to be stored is 100M. Each 50M data block is a part of the 100M file.
Step S103: obtain the size information of the data already stored, which represents the storage space occupied by the data already stored in the file to which the data to be stored belong;
Because storing a file may be divided into storing a plurality of data blocks, before a data block is stored it is necessary to obtain the storage space occupied by the data blocks written before it. Logically, the storage location of the data block to be stored now must be associated with the storage location of the previously stored data block; the most common approach is to make the two logical storage locations contiguous.
Step S104: generate burst information according to a first preset algorithm from the size information of the data to be stored and the size information of the data already stored; said burst information comprises at least the number of bursts that the data to be stored need to occupy and the storage space that the data stored in each burst occupy within that burst;
A burst is the storage unit for a data block. One data block can be divided into a plurality of bursts for storage. So when a data block is actually stored, it must be known how many bursts the data block is divided into, and also where the data are placed within each burst (i.e. the storage space that the data stored in each burst occupy). For example, suppose a burst has a memory space of 5000 bytes and the current data block is not the first data block: if the preceding data block has taken the first 1000 bytes of this burst, the current data block can take the remaining 4000 bytes.
Step S105:, generate unique segmental identification for each burst according to the segmental identification create-rule according to said burst information;
The segmental identification create-rule can take many forms. For example, a segmental identification can be generated by appending a suffix to the name of the file currently being stored. If the current file is named A, the first burst of the file can be named A-1, the second burst A-2, and so on. Any rule is acceptable as long as the identification of each burst is unique.
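The suffix rule described here can be sketched in a few lines of Python. This is only an illustration: the function name `burst_ids` and the parameter `first_index` are hypothetical and do not appear in the patent text, which leaves the concrete rule open.

```python
def burst_ids(file_name: str, first_index: int, count: int) -> list[str]:
    """Return identifiers such as 'A-1', 'A-2' for consecutive bursts of one file.

    first_index is the 1-based number of the first burst touched by the
    current write (an assumption of this sketch).
    """
    return [f"{file_name}-{i}" for i in range(first_index, first_index + count)]


# File "A", starting at its first burst, three bursts in total.
print(burst_ids("A", 1, 3))  # ['A-1', 'A-2', 'A-3']
```

Because the identifier depends only on the file name and the burst number, any client that knows both can regenerate it without looking it up anywhere.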
Step S106: according to said segmental identification,, select corresponding data memory node, carry out storage for each burst according to second preset algorithm.
In this step, a method of the prior art can be used to select the corresponding data memory node for each burst according to its segmental identification. Said data memory node is the server that actually stores the data. Said second preset algorithm can be a distributed hash algorithm.
In summary, the Distributed Storage method of the present embodiment calculates the burst information of the data to be stored with the first preset algorithm but does not save it; instead, the data to be stored are stored directly to the corresponding data memory nodes according to the calculated burst information and the second preset algorithm. Because burst information need not be saved, the load on the server and the hardware cost are reduced. Moreover, when data are read, the same algorithm can be used to calculate the burst information, so requests for burst information need not be sent repeatedly to a meta data server, which can greatly improve the storage and reading efficiency of the distributed system.
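The text only says that the second preset algorithm can be a distributed hash algorithm and does not fix one. As a minimal sketch, assuming a plain hash-modulo placement (not a full DHT or consistent-hashing ring), node selection could look like the following; `select_node` and the node names are hypothetical.

```python
import hashlib


def select_node(burst_id: str, nodes: list[str]) -> str:
    """Map a segmental identification to one node, deterministically.

    Assumption of this sketch: SHA-256 of the identifier, reduced modulo
    the number of nodes. The patent does not prescribe this choice.
    """
    digest = hashlib.sha256(burst_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(nodes)
    return nodes[index]


nodes = ["node-a", "node-b", "node-c"]
# The same identifier always resolves to the same node.
assert select_node("A-1", nodes) == select_node("A-1", nodes)
```

The property that matters for the method is determinism: a reader that recomputes the same identifier reaches the same node without consulting stored burst information.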
Fig. 2 is for generating the flow chart of burst information among the present invention according to first preset algorithm.As shown in Figure 2, this flow process comprises:
Step S201: calculate said burst number according to the formula n=n2-n1, where n2=RoundDown((offset+size+k)/k) and n1=RoundDown(offset/k);
RoundDown denotes rounding down; offset denotes the storage space occupied by the data already stored in the file to which the data to be stored belong; size denotes the size of the data to be stored; k denotes the free memory of each burst; and n denotes said burst number;
Step S202: calculate the offset within the burst of the data stored in the first burst according to the formula o1=offset-n1*k, where o1 denotes that offset;
The offset within the burst of the data stored in every other burst is zero;
Step S203: calculate the storage space that the data stored in the first burst occupy within the burst according to the formula s1=k-o1, where s1 denotes that storage space;
Step S204: calculate the storage space that the data stored in the last burst occupy within the burst according to the formula sn=size+offset-k*(n2-1), where sn denotes that storage space;
The storage space that the data stored in every other burst occupy within the burst is k.
The flow above is illustrated below with a concrete example (all quantities in bytes). Suppose the data already stored in the file to which the data to be stored belong occupy 5000, i.e. offset=5000; the size of the data to be stored is 5000, i.e. size=5000; and the free memory of each burst is 4096, i.e. k=4096. Then:
n2=RoundDown((5000+5000+4096)/4096)=RoundDown(3.44)=3
n1=RoundDown(5000/4096)=RoundDown(1.22)=1
n=n2-n1=2
That is, the number of bursts needed is 2.
o1=5000-1*4096=904, so the first burst is written starting at byte offset 904. o2=0, so the second burst is written starting at byte 0.
s1=4096-904=3192, i.e. the data stored in the first burst occupy 3192 bytes of that burst.
s2=5000+5000-4096*(3-1)=1808, i.e. the data stored in the second burst occupy 1808 bytes of that burst. As a check, s1+s2=3192+1808=5000=size.
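The formulas of steps S201 to S204 translate directly into integer arithmetic. The sketch below applies them literally to the example inputs offset=5000, size=5000, k=4096; the names `BurstInfo` and `burst_info` are hypothetical, and the sketch does not special-case data that fit entirely inside a single burst.

```python
from dataclasses import dataclass


@dataclass
class BurstInfo:
    n: int   # number of bursts needed
    n1: int  # RoundDown(offset / k)
    n2: int  # RoundDown((offset + size + k) / k)
    o1: int  # offset of the data inside the first burst
    s1: int  # bytes occupied in the first burst
    sn: int  # bytes occupied in the last burst


def burst_info(offset: int, size: int, k: int) -> BurstInfo:
    """Apply the formulas of steps S201-S204 with integer floor division."""
    n2 = (offset + size + k) // k
    n1 = offset // k
    n = n2 - n1
    o1 = offset - n1 * k
    s1 = k - o1
    sn = size + offset - k * (n2 - 1)
    return BurstInfo(n=n, n1=n1, n2=n2, o1=o1, s1=s1, sn=sn)


info = burst_info(offset=5000, size=5000, k=4096)
print(info)  # BurstInfo(n=2, n1=1, n2=3, o1=904, s1=3192, sn=1808)
```

With two bursts there are no middle bursts, so the sizes of the first and last burst add up to the 5000 bytes being stored.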
Fig. 3 is the flow chart of Distributed Storage method embodiment two of the present invention.As shown in Figure 3, this method comprises:
Step S301: call the write data interface that client provides, obtain the data access handle through the write data interface;
Step S302:, obtain the size information of the data that need storage through said data access handle; The data that need store belong to the part of the total data that needs files stored;
Step S303: obtain the size information of the data already stored, which represents the storage space occupied by the data already stored in the file to which the data to be stored belong;
Step S304: according to the size information of the data of said needs storage and the size information of said data of having stored; Generate burst information according to first preset algorithm, said burst information comprises the storage space information that data that burst number information that the data of said needs storage need take and each burst are stored take at least in burst;
Step S305:, generate unique segmental identification for each burst according to the segmental identification create-rule according to said burst information;
Step S306: according to said segmental identification, according to second preset algorithm, for each burst is selected corresponding data memory node;
Step S307: when the data to be stored need to occupy a plurality of bursts, select a plurality of data memory node servers and send the write requests for the plurality of bursts simultaneously.
In the present embodiment, step S307 makes concurrent storage of the data possible.
Further, to balance the load across servers, step S307 can also comprise:
Obtaining the load information of each data memory node server;
According to said load information, selecting a plurality of data memory node servers whose load is less than a preset threshold. The threshold can be configured flexibly by those skilled in the art from empirical values or from the performance requirements of the distributed system, and is not specially limited here.
In addition, when said method is applied to a system with a plurality of data memory node servers, the servers can be managed as a distributed cluster in DHT or P2P mode.
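Step S307 with the load filter can be sketched as below. Everything here is an assumption made for illustration: `pick_nodes`, `write_bursts_concurrently` and `fake_send` are hypothetical names, load is modelled as a number per node, and bursts are assigned round-robin to the filtered nodes. The text does not say how load-based selection is combined with the second preset algorithm, so this sketch does not attempt to.

```python
from concurrent.futures import ThreadPoolExecutor


def pick_nodes(loads: dict[str, float], threshold: float) -> list[str]:
    """Keep only the node servers whose load is below the preset threshold."""
    return sorted(node for node, load in loads.items() if load < threshold)


def write_bursts_concurrently(bursts: dict[str, bytes], nodes: list[str], send) -> dict[str, bool]:
    """Issue one write request per burst in parallel; return success per burst id."""
    ids = list(bursts)
    with ThreadPoolExecutor(max_workers=max(1, len(ids))) as pool:
        futures = {
            bid: pool.submit(send, nodes[i % len(nodes)], bid, bursts[bid])
            for i, bid in enumerate(ids)
        }
        return {bid: future.result() for bid, future in futures.items()}


# In-memory stand-in for a network write request (hypothetical).
store: dict[tuple[str, str], bytes] = {}


def fake_send(node: str, burst_id: str, payload: bytes) -> bool:
    store[(node, burst_id)] = payload
    return True


candidates = pick_nodes({"n1": 0.2, "n2": 0.9, "n3": 0.5}, threshold=0.8)
result = write_bursts_concurrently({"A-1": b"x", "A-2": b"y"}, candidates, fake_send)
```

A real client would replace `fake_send` with its own request function and would need a policy for retries and timeouts, which the text does not describe.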
Fig. 4 is the flow chart of Distributed Storage method embodiment three of the present invention.As shown in Figure 4, this method comprises:
Step S401: call the write data interface that client provides, obtain the data access handle through the write data interface;
Step S402:, obtain the size information of the data that need storage through said data access handle; The data that need store belong to the part of the total data that needs files stored;
Step S403: obtain the size information of the data already stored, which represents the storage space occupied by the data already stored in the file to which the data to be stored belong;
Step S404: according to the size information of the data of said needs storage and the size information of said data of having stored; Generate burst information according to first preset algorithm, said burst information comprises the storage space information that data that burst number information that the data of said needs storage need take and each burst are stored take at least in burst;
Step S405:, generate unique segmental identification for each burst according to the segmental identification create-rule according to said burst information;
Step S406: according to said segmental identification, according to second preset algorithm, for each burst is selected corresponding data memory node;
Step S407: when the data to be stored need to occupy a plurality of bursts, select a plurality of data memory node servers and send the write requests for the plurality of bursts simultaneously.
Step S408: judge whether all the bursts that the data to be stored need to occupy were written successfully; if so, execute step S409; otherwise, execute step S410.
Step S409: return the final number of bytes written to the application layer;
Step S410: return to the application layer the number of bytes written successfully and contiguously, counted from the end of the storage space occupied by the data already written.
In the present embodiment, even if the storage of some burst fails during data storage, the subsequent storage process can still be carried out.
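The return value of steps S409 and S410 can be computed from the per-burst sizes and results. A minimal sketch, assuming the bursts are listed in write order and that a hypothetical helper `bytes_written` receives one success flag per burst:

```python
def bytes_written(burst_sizes: list[int], succeeded: list[bool]) -> int:
    """Count bytes written contiguously from the start of this write.

    Counting stops at the first failed burst, so bursts that succeeded
    after a failure are not reported to the application layer.
    """
    total = 0
    for size, ok in zip(burst_sizes, succeeded):
        if not ok:
            break
        total += size
    return total


print(bytes_written([3192, 1808], [True, True]))   # 5000
print(bytes_written([3192, 1808], [True, False]))  # 3192
print(bytes_written([3192, 1808], [False, True]))  # 0
```

Reporting only the contiguous prefix lets the caller resume from a known position, in the same way a partial write is usually handled.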
The invention also discloses a distributed data read method. Fig. 5 is a distributed data read method flow chart of the present invention. As shown in Figure 5, this method comprises:
Step S501: call the read data interface provided by the client, and obtain the data access handle through the read data interface;
Step S502: obtain, through said data access handle, the size information of the data that need to be read; the data to be read are a part of the total data of the file to be read;
Step S503: obtain the size information of the data already read, which represents the storage space occupied by the data already read in the file to which the data to be read belong;
Step S504: generate burst information according to the first preset algorithm from the size information of the data to be read and the size information of the data already read; said burst information comprises at least the number of bursts that the file data to be read occupy and the storage space that the data stored in each burst occupy within that burst;
Step S505: according to said burst information, generate a unique segmental identification for each burst according to the segmental identification create-rule;
Step S506: according to said segmental identification and the second preset algorithm, select a corresponding data memory node for each burst and read the data from it.
The distributed data read method of the present embodiment uses the same first preset algorithm as the Distributed Storage method of the previous embodiments, so for the file to be read it can calculate the burst information by itself, and that burst information is identical to the burst information used during storage. The same segmental identifications are thereby obtained, and applying the same second preset algorithm to these segmental identifications yields the data memory nodes where the data are actually stored, from which the data are then read. Because the distributed data read method of the present embodiment need not send requests for burst information to a meta data server, it avoids the delay caused by server responses and can greatly improve the file reading efficiency of the distributed system.
In the distributed data read method of the present embodiment, the burst information is generated in a way similar to Fig. 2, which is not repeated here.
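The key property of the read path is that the reader recomputes the burst layout, the identifiers and the nodes from offset and size alone. The in-memory round trip below sketches this under several assumptions: the cluster is a dictionary of dictionaries, `layout` walks the byte range with a loop instead of using the closed-form formulas of Fig. 2, identifiers follow the suffix rule, and node selection is hash-modulo. All names are hypothetical.

```python
import hashlib

K = 4096  # assumed burst capacity in bytes


def layout(offset: int, size: int, k: int = K):
    """Yield (burst_index, offset_in_burst, length) for a byte range."""
    end = offset + size
    pos = offset
    while pos < end:
        index = pos // k
        in_burst = pos - index * k
        length = min(k - in_burst, end - pos)
        yield index, in_burst, length
        pos += length


def burst_id(file_name: str, index: int) -> str:
    return f"{file_name}-{index + 1}"  # suffix rule: A-1, A-2, ...


def select_node(bid: str, nodes: list[str]) -> str:
    digest = hashlib.sha256(bid.encode("utf-8")).digest()
    return nodes[int.from_bytes(digest[:8], "big") % len(nodes)]


def write(cluster: dict, file_name: str, offset: int, data: bytes) -> None:
    nodes = sorted(cluster)
    cursor = 0
    for index, in_burst, length in layout(offset, len(data)):
        bid = burst_id(file_name, index)
        buf = cluster[select_node(bid, nodes)].setdefault(bid, bytearray(K))
        buf[in_burst:in_burst + length] = data[cursor:cursor + length]
        cursor += length


def read(cluster: dict, file_name: str, offset: int, size: int) -> bytes:
    nodes = sorted(cluster)
    out = bytearray()
    for index, in_burst, length in layout(offset, size):
        bid = burst_id(file_name, index)
        out += cluster[select_node(bid, nodes)][bid][in_burst:in_burst + length]
    return bytes(out)


cluster = {"node-a": {}, "node-b": {}, "node-c": {}}
payload = bytes(range(256)) * 20  # 5120 bytes
write(cluster, "A", 5000, payload)
assert read(cluster, "A", 5000, len(payload)) == payload
```

No burst information is stored anywhere in this sketch; the read succeeds only because both sides derive the same layout, identifiers and nodes from the same inputs.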
The invention also discloses a kind of Distributed Storage equipment.Fig. 6 is the structure chart of Distributed Storage apparatus embodiments one of the present invention.As shown in Figure 6, this equipment comprises:
Write data interface interchange module 601, used to call the write data interface provided by the client and to obtain the data access handle through the write data interface;
The size information acquisition module 602 of file data, used to obtain, through said data access handle, the size information of the data that need to be stored; the data to be stored are a part of the total data of the file to be stored;
Storage space information acquisition module 603, used to obtain the size information of the data already stored, which represents the storage space occupied by the data already stored in the file to which the data to be stored belong;
Burst information generation module 604, used to generate burst information according to a first preset algorithm from the size information of the data to be stored and the size information of the data already stored; said burst information comprises at least the number of bursts that the data to be stored need to occupy and the storage space that the data stored in each burst occupy within that burst;
Segmental identification generation module 605, used to generate a unique segmental identification for each burst according to said burst information;
Data memory module 606, used to select, according to said segmental identification and a second preset algorithm, a corresponding data memory node for each burst and to store the data there.
Fig. 7 is a structural diagram of the fragment information generation module of the present invention. As shown in Fig. 7, the fragment information generation module may comprise:
Fragment count calculation unit 701, configured to calculate the number of fragments according to the formula n = n2 - n1, where n2 = RoundDown((offset + size + k) / k) and n1 = RoundDown(offset / k);
RoundDown denotes rounding down, offset denotes the storage space occupied by the data already written, size denotes the size of the data to be stored, k denotes the available storage space of each fragment, and n denotes the number of fragments;
First fragment offset calculation unit 702, configured to calculate, according to the formula o1 = offset - n1 * k, the offset within the fragment of the data stored in the first fragment, where o1 denotes that offset;
The offset within the fragment of the data stored in every other fragment is zero;
First fragment storage space calculation unit 703, configured to calculate, according to the formula s1 = k - o1, the storage space that the data stored in the first fragment occupies within that fragment, where s1 denotes that storage space;
Second fragment storage space calculation unit 704, configured to calculate, according to the formula sn = size + offset - k * (n2 - 1), the storage space that the data stored in the last fragment occupies within that fragment, where sn denotes that storage space;
The storage space that the data stored in every other fragment occupies within its fragment is k.
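The calculations of units 701 to 704 can be sketched as follows. The names Fragment and plan_fragments are illustrative, not taken from the source. The handling of the case n = 1 is an assumption: the first and the last fragment are then the same fragment, so the sketch lets it hold the whole request.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Fragment:
    index: int   # absolute fragment index within the file
    offset: int  # offset of the data inside the fragment
    size: int    # bytes of this request held by the fragment


def plan_fragments(offset: int, size: int, k: int) -> List[Fragment]:
    n1 = offset // k               # RoundDown(offset / k)
    n2 = (offset + size + k) // k  # RoundDown((offset + size + k) / k)
    n = n2 - n1                    # number of fragments
    o1 = offset - n1 * k           # offset inside the first fragment
    fragments = []
    for i in range(n):
        if n == 1:
            # Assumption: a single fragment holds the entire request.
            frag_offset, frag_size = o1, size
        elif i == 0:
            frag_offset, frag_size = o1, k - o1                       # s1
        elif i == n - 1:
            frag_offset, frag_size = 0, size + offset - k * (n2 - 1)  # sn
        else:
            frag_offset, frag_size = 0, k
        fragments.append(Fragment(n1 + i, frag_offset, frag_size))
    return fragments
```

For offset = 150, size = 300 and k = 100, this gives n1 = 1, n2 = 5 and n = 4, with fragments 1 to 4 holding 50, 100, 100 and 50 bytes. When offset + size is an exact multiple of k, the formulas as written yield a final fragment with sn = 0, which an implementation would presumably skip.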
In addition, the data storage module 606 in the above embodiment may comprise:
A concurrent writing unit, configured to, when the data to be stored needs to occupy multiple fragments, select multiple data storage node servers and send the write requests for the multiple fragments simultaneously.
The concurrent writing unit comprises:
A load information acquisition subunit, configured to obtain load information of each data storage node server;
A data storage node server selection subunit, configured to select, according to the load information, multiple data storage node servers with lower load.
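A minimal sketch of the concurrent writing unit follows. The function names, the load dictionary, and the send_write callable are hypothetical. This section does not spell out how the load-based choice combines with the second preset algorithm, so the sketch simply assigns fragments in turn to the least-loaded nodes.

```python
from concurrent.futures import ThreadPoolExecutor


def pick_least_loaded(loads: dict, count: int) -> list:
    """Return up to `count` node names with the smallest reported load."""
    return sorted(loads, key=loads.get)[:count]


def write_concurrently(fragments, loads, send_write):
    """Issue one write request per fragment in parallel.

    `send_write(node, fragment)` is a hypothetical callable that performs
    the write on the given node and returns its result.
    """
    nodes = pick_least_loaded(loads, len(fragments))
    with ThreadPoolExecutor(max_workers=max(1, len(fragments))) as pool:
        futures = [
            pool.submit(send_write, nodes[i % len(nodes)], frag)
            for i, frag in enumerate(fragments)
        ]
        # Results are collected in fragment order, not completion order.
        return [f.result() for f in futures]
```

Collecting results in fragment order keeps the outcome of each write aligned with its position in the file, which a later success check can rely on.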
Fig. 8 is a structural diagram of embodiment two of the distributed data storage device of the present invention. As shown in Fig. 8, the device comprises:
Write-data interface calling module 601, configured to call the write-data interface provided by the client and obtain a data access handle;
File data size information acquisition module 602, configured to obtain, through the data access handle, size information of the data to be stored; the data to be stored is part of the total data of the file to be stored;
Storage space information acquisition module 603, configured to obtain size information of the stored data, where the size information of the stored data indicates the storage space occupied by the data already stored in the file to which the data to be stored belongs;
Fragment information generation module 604, configured to generate fragment information according to a first preset algorithm, based on the size information of the data to be stored and the size information of the stored data; the fragment information includes at least the number of fragments that the data to be stored needs to occupy and the storage space that the data stored in each fragment occupies within that fragment;
Fragment identifier generation module 605, configured to generate, according to the fragment information, a unique fragment identifier for each fragment;
Data storage module 606, configured to select, according to the fragment identifier and a second preset algorithm, a corresponding data storage node for each fragment and store the data;
Write result judgment module 801, configured to judge whether all the fragments that the data to be stored needs to occupy have been written successfully;
First written-byte feedback module 802, configured to return the final number of bytes written to the application layer when the judgment result of the write result judgment module is yes;
Second written-byte feedback module 803, configured to, when the judgment result of the write result judgment module is no, return to the application layer the number of bytes written successfully and contiguously, counted from the storage space occupied by the data already written.
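The behaviour of modules 801 to 803 can be sketched as follows. The function name and its two arguments are illustrative: fragment_sizes lists the bytes assigned to each fragment in file order, and results records whether each fragment was written successfully.

```python
def bytes_to_report(fragment_sizes, results):
    """Return the byte count to hand back to the application layer.

    If every fragment succeeded this is the full request size; otherwise it
    is the length of the contiguous successful prefix, so that bytes written
    after a failed fragment are not counted.
    """
    total = 0
    for size, ok in zip(fragment_sizes, results):
        if not ok:
            break
        total += size
    return total
```

Reporting only the contiguous prefix mirrors the usual contract of a partial write: the caller can retry from offset plus the returned count without leaving a gap in the file.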
The invention also discloses a distributed data reading device. Fig. 9 is a structural diagram of the distributed data reading device of the present invention. As shown in Fig. 9, the device comprises:
Read-data interface calling module 901, configured to call the read-data interface provided by the client and obtain a data access handle;
File data size information acquisition module 902, configured to obtain, through the data access handle, size information of the data to be read; the file data to be read is part of the total data of the file to be read;
Storage space information acquisition module 903, configured to obtain size information of the data already read, where the size information of the data already read indicates the storage space occupied by the data already read in the file to which the file data to be read belongs;
Fragment information generation module 904, configured to generate fragment information according to the first preset algorithm, based on the size information of the data to be read and the size information of the data already read; the fragment information includes at least the number of fragments occupied by the file data to be read and the storage space that the data stored in each fragment occupies within that fragment;
Fragment identifier generation module 905, configured to generate, according to the fragment information and a fragment identifier generation rule, a unique fragment identifier for each fragment;
Data read module 906, configured to select, according to the fragment identifier and the second preset algorithm, a corresponding data storage node for each fragment and read the data.
In the distributed data reading device of this embodiment, the structure of the fragment information generation module is similar to that shown in Fig. 7, and the details are not repeated here.
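A minimal sketch of the read path follows, assuming a hypothetical fetch(index, frag_offset, length) callable that returns the requested bytes of one fragment from its storage node. It walks the same fragment layout iteratively instead of using the closed-form formulas of Fig. 7.

```python
def read_range(offset, size, k, fetch):
    """Read `size` bytes starting at `offset` from fragments of capacity `k`.

    `fetch(index, frag_offset, length)` is a hypothetical callable returning
    `length` bytes from fragment `index`, starting at `frag_offset`.
    """
    data = bytearray()
    pos = offset
    end = offset + size
    while pos < end:
        index = pos // k               # fragment containing the current position
        frag_offset = pos - index * k  # non-zero only for the first fragment
        length = min(k - frag_offset, end - pos)
        data += fetch(index, frag_offset, length)
        pos += length
    return bytes(data)
```

Only the first fragment is read from a non-zero offset and only the last one is read partially, which matches the layout computed on the write path.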
The embodiments in this specification are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and for identical or similar parts the embodiments may be referred to one another. Because the devices disclosed in the embodiments correspond to the methods disclosed in the embodiments, their description is relatively brief; for relevant details, refer to the description of the methods.
Specific examples are used herein to explain the principles and implementations of the present invention; the description of the above embodiments is intended only to help in understanding the method of the present invention and its core idea. Meanwhile, a person of ordinary skill in the art may, in accordance with the idea of the present invention, make changes to the specific implementations and the scope of application. In summary, the content of this specification should not be construed as limiting the present invention.

Claims (13)

1. A distributed data storage method, characterized by comprising:
calling a write-data interface provided by a client, and obtaining a data access handle through the write-data interface;
obtaining, through the data access handle, size information of the data to be stored;
obtaining size information of the stored data, where the size information of the stored data indicates the storage space occupied by the data already stored in the file to which the data to be stored belongs;
generating fragment information according to a first preset algorithm, based on the size information of the data to be stored and the size information of the stored data, where the fragment information includes at least the number of fragments that the data to be stored needs to occupy and the storage space that the data stored in each fragment occupies within that fragment;
generating, according to the fragment information and a fragment identifier generation rule, a unique fragment identifier for each fragment;
selecting, according to the fragment identifier and a second preset algorithm, a corresponding data storage node for each fragment to store the data.
2. The method according to claim 1, characterized in that generating fragment information according to the first preset algorithm comprises:
calculating the number of fragments according to the formula n = n2 - n1, where n2 = RoundDown((offset + size + k) / k) and n1 = RoundDown(offset / k);
RoundDown denotes rounding down, offset denotes the storage space occupied by the data already stored, size denotes the size of the data to be stored, k denotes the available storage space of each fragment, and n denotes the number of fragments;
calculating, according to the formula o1 = offset - n1 * k, the offset within the fragment of the data stored in the first fragment, where o1 denotes that offset;
the offset within the fragment of the data stored in every other fragment is zero;
calculating, according to the formula s1 = k - o1, the storage space that the data stored in the first fragment occupies within that fragment, where s1 denotes that storage space;
calculating, according to the formula sn = size + offset - k * (n2 - 1), the storage space that the data stored in the last fragment occupies within that fragment, where sn denotes that storage space;
the storage space that the data stored in every other fragment occupies within its fragment is k.
3. The method according to claim 1 or 2, characterized in that selecting a corresponding data storage node for each fragment to store the data comprises:
when the data to be stored needs to occupy multiple fragments, obtaining load information of each data storage node server; and selecting, according to the load information, multiple data storage node servers whose load is below a preset threshold, to store the data concurrently.
4. The method according to claim 1 or 2, characterized in that the second preset algorithm comprises a distributed hash algorithm.
5. The method according to claim 1 or 2, characterized in that the method further comprises:
judging whether all the fragments that the data to be stored needs to occupy have been written successfully;
if so, returning the final number of bytes written to the application layer;
otherwise, returning to the application layer the number of bytes written successfully and contiguously, counted from the storage space occupied by the data already written.
6. A distributed data reading method, characterized by comprising:
calling a read-data interface provided by a client, and obtaining a data access handle through the read-data interface;
obtaining, through the data access handle, size information of the data to be read;
obtaining size information of the data already read, where the size information of the data already read indicates the storage space occupied by the data already read in the file to which the data to be read belongs;
generating fragment information according to a first preset algorithm, based on the size information of the data to be read and the size information of the data already read, where the fragment information includes at least the number of fragments occupied by the data to be read and the storage space that the data stored in each fragment occupies within that fragment;
generating, according to the fragment information and a fragment identifier generation rule, a unique fragment identifier for each fragment;
selecting, according to the fragment identifier and a second preset algorithm, a corresponding data storage node for each fragment to read the data.
7. The method according to claim 6, characterized in that generating fragment information according to the first preset algorithm comprises:
calculating the number of fragments according to the formula n = n2 - n1, where n2 = RoundDown((offset + size + k) / k) and n1 = RoundDown(offset / k);
RoundDown denotes rounding down, offset denotes the storage space occupied by the data already read, size denotes the size of the data to be read, k denotes the available storage space of each fragment, and n denotes the number of fragments;
calculating, according to the formula o1 = offset - n1 * k, the offset within the fragment of the data stored in the first fragment, where o1 denotes that offset;
the offset within the fragment of the data stored in every other fragment is zero;
calculating, according to the formula s1 = k - o1, the storage space that the data stored in the first fragment occupies within that fragment, where s1 denotes that storage space;
calculating, according to the formula sn = size + offset - k * (n2 - 1), the storage space that the data stored in the last fragment occupies within that fragment, where sn denotes that storage space;
the storage space that the data stored in every other fragment occupies within its fragment is k.
8. A distributed data storage device, characterized by comprising:
a write-data interface calling module, configured to call a write-data interface provided by a client and obtain a data access handle through the write-data interface;
a data size information acquisition module, configured to obtain, through the data access handle, size information of the data to be stored, and send the size information of the data to a fragment information generation module;
a storage space information acquisition module, configured to obtain size information of the stored data and send the size information of the stored data to the fragment information generation module, where the size information of the stored data indicates the storage space occupied by the data already stored in the file to which the data to be stored belongs;
the fragment information generation module, configured to receive the size information of the data to be stored sent by the data size information acquisition module and the size information of the stored data sent by the storage space information acquisition module, generate fragment information according to a first preset algorithm based on the size information of the data to be stored and the size information of the stored data, and send the fragment information to a fragment identifier generation module, where the fragment information includes at least the number of fragments that the data to be stored needs to occupy and the storage space that the data stored in each fragment occupies within that fragment;
the fragment identifier generation module, configured to receive the fragment information sent by the fragment information generation module, generate a unique fragment identifier for each fragment according to the fragment information, and send the fragment identifier to a data storage module;
the data storage module, configured to receive the fragment identifier sent by the fragment identifier generation module, and select, according to the fragment identifier and a second preset algorithm, a corresponding data storage node for each fragment to store the data.
9. The device according to claim 8, characterized in that the fragment information generation module comprises:
a fragment count calculation unit, configured to calculate the number of fragments according to the formula n = n2 - n1, where n2 = RoundDown((offset + size + k) / k) and n1 = RoundDown(offset / k);
RoundDown denotes rounding down, offset denotes the storage space occupied by the data already written, size denotes the size of the data to be written, k denotes the available storage space of each fragment, and n denotes the number of fragments;
a first fragment offset calculation unit, configured to calculate, according to the formula o1 = offset - n1 * k, the offset within the fragment of the data stored in the first fragment, where o1 denotes that offset, and to set the offset within the fragment of the data stored in every other fragment to zero;
a first fragment storage space calculation unit, configured to calculate, according to the formula s1 = k - o1, the storage space that the data stored in the first fragment occupies within that fragment, where s1 denotes that storage space;
a second fragment storage space calculation unit, configured to calculate, according to the formula sn = size + offset - k * (n2 - 1), the storage space that the data stored in the last fragment occupies within that fragment, where sn denotes that storage space; the storage space that the data stored in every other fragment occupies within its fragment is k.
10. The device according to claim 8 or 9, characterized in that the data storage module comprises:
a concurrent writing unit, configured to, when the data to be stored needs to occupy multiple fragments, obtain load information of each data storage node server, and select, according to the load information, multiple data storage node servers whose load is below a preset threshold, to store the data concurrently.
11. The device according to claim 8 or 9, characterized in that the device further comprises:
a write result judgment module, configured to judge whether all the fragments that the data to be stored needs to occupy have been written successfully;
a first written-byte feedback module, configured to return the final number of bytes written to the application layer when the judgment result of the write result judgment module is yes;
a second written-byte feedback module, configured to, when the judgment result of the write result judgment module is no, return to the application layer the number of bytes written successfully and contiguously, counted from the storage space occupied by the data already written.
12. A distributed data reading device, characterized by comprising:
a read-data interface calling module, configured to call a read-data interface provided by a client and obtain a data access handle through the read-data interface;
a data size information acquisition module, configured to obtain, through the data access handle, size information of the data to be read, and send the size information of the data to a fragment information generation module;
a storage space information acquisition module, configured to obtain size information of the data already read and send the size information of the data already read to the fragment information generation module, where the size information of the data already read indicates the storage space occupied by the data already read in the file to which the data to be read belongs;
the fragment information generation module, configured to receive the size information of the data to be read sent by the data size information acquisition module and the size information of the data already read sent by the storage space information acquisition module, generate fragment information according to a first preset algorithm based on the size information of the data to be read and the size information of the data already read, and send the fragment information to a fragment identifier generation module, where the fragment information includes at least the number of fragments occupied by the data to be read and the storage space that the data stored in each fragment occupies within that fragment;
the fragment identifier generation module, configured to receive the fragment information sent by the fragment information generation module, generate a unique fragment identifier for each fragment according to the fragment information and a fragment identifier generation rule, and send the fragment identifier to a data read module;
the data read module, configured to receive the fragment identifier sent by the fragment identifier generation module, and select, according to the fragment identifier and a second preset algorithm, a corresponding data storage node for each fragment to read the data.
13. The device according to claim 12, characterized in that the fragment information generation module comprises:
a fragment count calculation unit, configured to calculate the number of fragments according to the formula n = n2 - n1, where n2 = RoundDown((offset + size + k) / k) and n1 = RoundDown(offset / k);
RoundDown denotes rounding down, offset denotes the storage space occupied by the data already read, size denotes the size of the data to be read, k denotes the available storage space of each fragment, and n denotes the number of fragments;
a first fragment offset calculation unit, configured to calculate, according to the formula o1 = offset - n1 * k, the offset within the fragment of the data stored in the first fragment, where o1 denotes that offset, and to set the offset within the fragment of the data stored in every other fragment to zero;
a first fragment storage space calculation unit, configured to calculate, according to the formula s1 = k - o1, the storage space that the data stored in the first fragment occupies within that fragment, where s1 denotes that storage space;
a second fragment storage space calculation unit, configured to calculate, according to the formula sn = size + offset - k * (n2 - 1), the storage space that the data stored in the last fragment occupies within that fragment, where sn denotes that storage space; the storage space that the data stored in every other fragment occupies within its fragment is k.
CN201210227425.7A 2012-07-03 2012-07-03 A kind of distributed data storage method and equipment Active CN102801784B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201210227425.7A CN102801784B (en) 2012-07-03 2012-07-03 A kind of distributed data storage method and equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201210227425.7A CN102801784B (en) 2012-07-03 2012-07-03 A kind of distributed data storage method and equipment

Publications (2)

Publication Number Publication Date
CN102801784A true CN102801784A (en) 2012-11-28
CN102801784B CN102801784B (en) 2015-11-25

Family

ID=47200744

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201210227425.7A Active CN102801784B (en) 2012-07-03 2012-07-03 A kind of distributed data storage method and equipment

Country Status (1)

Country Link
CN (1) CN102801784B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101673271A (en) * 2008-09-09 2010-03-17 青岛海信传媒网络技术有限公司 Distributed file system and file sharding method thereof
US20110022640A1 (en) * 2009-07-21 2011-01-27 International Business Machines Corporation Web distributed storage system
CN101996217A (en) * 2009-08-24 2011-03-30 华为技术有限公司 Method for storing data and memory device thereof
CN102194014A (en) * 2011-06-28 2011-09-21 无锡永中软件有限公司 Document storage method and document recovery method

Also Published As

Publication number Publication date
CN102801784B (en) 2015-11-25

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant