CN1971562A - Distributing method of object faced to object storage system - Google Patents
- Publication number
- CN1971562A
- Authority
- CN
- China
- Prior art keywords
- file
- burst
- equipment
- value
- storage system
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
An object distribution method for an object-oriented storage system is disclosed. It belongs to the field of object-oriented computer storage systems with a distributed storage architecture. Its aim is to combine the advantages of the hash algorithm and the fragment mapping algorithm to improve the load balance, concurrency and reliability of the devices. The steps are: (1) determine the file length boundary value L; (2) obtain the total number of devices N when the metadata server is initialized; (3) obtain the file length from the client and compare it with the boundary value L to judge whether the file is large or small; (4) if the file is small, hash it directly into one object and map that object to a device; (5) if the file is large, split it into several objects and place them on different devices. The invention reduces the overhead of distributing objects in a system with a large number of devices and improves the concurrency and reliability of the devices. The invention can be used to distribute objects onto one or more storage devices.
Description
Technical field
The invention belongs to the field of object-oriented computer storage systems and specifically relates to an object distribution method in an object-oriented storage system, used to distribute objects onto one or more object storage devices in a distributed storage architecture.
Background technology
An object-oriented storage system treats a file as a set of objects. These objects are distributed across object-based storage devices (OSDs). Placing objects on different devices gives the system higher capacity, throughput, reliability and scalability. Relatively little research has addressed how to make object placement in such systems more efficient. Because object distribution happens at the start of the system's communication process, the distribution method largely determines system performance, and it also affects load balancing and parallel efficiency across devices.
An object-oriented storage system consists of three parts: clients, a metadata server and storage devices. Their communication proceeds as follows: a client sends a file read or write request to the metadata server, the metadata server returns the object information for that file, and the client accesses the corresponding devices according to that information. In some distributed file systems, data is stored on intelligent devices that can be accessed directly over the network, and the metadata describing the data is managed by a dedicated metadata server. The metadata server plays a critical role in an object-oriented storage system: one of its tasks is to map a file to one or more objects stored on devices, so the efficiency of the distribution method directly affects system performance.
Existing object distribution methods mainly use two algorithms. The first, the hash algorithm, uses a hash function to map a whole file to a single object placed on one device. The second, the fragment mapping algorithm, divides the file evenly across all devices. Each has its own strengths and weaknesses. The hash algorithm distributes file data approximately at random across different devices, which achieves load balancing and distributes data efficiently. However, this one-to-one mapping cannot exploit device parallelism for large files. The hash algorithm also has a scalability problem: the result of the hash function depends on the total number of devices, so the algorithm usually assumes that the number of storage devices is constant. The fragment mapping algorithm exploits the parallelism of the devices and has two advantages. First, it simplifies the client, which does not need to know the concrete distribution method. Second, changing the total number of devices does not affect the mapping information already held on the metadata server, so scalability is good. For small files, however, parallelism is wasteful, because the overhead of connecting to every device exceeds the benefit of parallel transfer. This is especially true when the number of devices is large: the file is divided into many small objects, creating objects and establishing connections are both time-consuming, and reliability also drops, because a fault in any one device affects the whole system.
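The scalability problem of plain hashing described in the background can be illustrated with a short sketch. It assumes a modulo-based placement (`device_for` is a hypothetical helper; the source does not name a specific hash function) and counts how many files change device when the device count goes from 16 to 17.

```python
import hashlib


def device_for(name: str, device_count: int) -> int:
    """Map a whole file to one device: hash of the name modulo the device count."""
    digest = hashlib.sha256(name.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % device_count


def remapped_fraction(names, old_count: int, new_count: int) -> float:
    """Fraction of files whose device changes when the device count changes."""
    moved = sum(
        1 for name in names
        if device_for(name, old_count) != device_for(name, new_count)
    )
    return moved / len(names)


# Hypothetical file names, used only to exercise the mapping.
names = [f"file-{i}" for i in range(2000)]
fraction = remapped_fraction(names, 16, 17)
print(f"{fraction:.1%} of files move when going from 16 to 17 devices")
```

With a modulo mapping, most files are expected to move after such a change, which is why a hash-only scheme usually assumes a fixed device count.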
Summary of the invention
The invention proposes an object distribution method for an object-oriented storage system. Its purpose is to combine the advantages of the hash algorithm and the fragment mapping algorithm while avoiding their shortcomings as far as possible, improving the load balance of the devices, the parallelism between devices and device reliability, and thereby improving system performance.
The object distribution method in an object-oriented storage system according to the invention comprises the following steps, in order:
(1) determine the file length boundary value L, where L is 256 KB, 512 KB or 1 MB;
(2) obtain the total number of devices N when the metadata server is initialized, where N is a positive integer;
(3) obtain the file length from the client and compare it with the boundary value L defined in the algorithm to judge whether the file is large or small;
(4) if the file is small, hash it directly into one object and map that object to a device;
(5) if the file is large, split it into several objects and place them on different devices.
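Steps (1) to (5) can be sketched as a small dispatch function. This is a minimal illustration under stated assumptions, not the patented implementation: the function names, the SHA-256-modulo-N hash, the treatment of a file exactly at the boundary as small, and the placeholder `naive_fragmenter` are all hypothetical.

```python
import hashlib

L_BOUNDARY = 512 * 1024  # boundary value L in bytes; 512 KB is one of the values named in step (1)


def hash_to_device(file_name: str, n_devices: int) -> int:
    """Map a whole file to one device index (hypothetical choice: SHA-256 mod N)."""
    digest = hashlib.sha256(file_name.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % n_devices


def distribute(file_name, file_length, n_devices, fragmenter):
    """Return a list of (object_index, device_index) pairs for one file."""
    if n_devices < 1:
        raise ValueError("N must be a positive integer")
    # Assumption: a file exactly at the boundary is treated as small.
    if file_length <= L_BOUNDARY:
        return [(0, hash_to_device(file_name, n_devices))]
    return fragmenter(file_name, file_length, n_devices)


def naive_fragmenter(file_name, file_length, n_devices):
    """Placeholder for step (5): one object on each of the first min(4, N) devices."""
    return [(i, i) for i in range(min(4, n_devices))]


small = distribute("notes.txt", 100 * 1024, 16, naive_fragmenter)
large = distribute("video.bin", 8 * 1024 * 1024, 16, naive_fragmenter)
print(len(small), len(large))
```

Passing the fragmenter in as a parameter keeps the small-file path independent of how large files are split.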
The object distribution method in the object-oriented storage system is characterized in that the large-file fragmentation step comprises the following process:
(1) determine the reference fragment count $n_{\text{ref}}$ for files of each length;
(2) determine the file fragment count $n_p$ from the total number of devices N and the reference fragment count $n_{\text{ref}}$: if $n_{\text{ref}}$ is greater than N, then $n_p$ is N; if $n_{\text{ref}}$ is less than N, then $n_p$ is $n_{\text{ref}}$;
(3) sort the devices from low to high by the load parameter value obtained from the storage devices, and select the first $n_p$ devices with the lowest load parameter values;
(4) fragment the file evenly and map the fragments onto the $n_p$ selected devices.
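The fragmentation process can be sketched as follows. The sketch assumes the load parameter is a single number per device and that "evenly" means byte ranges whose lengths differ by at most one; the function names and the tie-breaking by device id are illustrative choices, not taken from the source.

```python
def choose_fragment_count(n_ref: int, n_devices: int) -> int:
    """Step (2): n_p = N if n_ref > N, otherwise n_ref."""
    return min(n_ref, n_devices)


def place_fragments(file_length: int, n_ref: int, device_loads: dict):
    """Return (device_id, offset, length) for each fragment of one large file.

    device_loads maps a device id to its load parameter (lower is better).
    """
    if not device_loads or n_ref < 1:
        raise ValueError("need at least one device and n_ref >= 1")
    n_p = choose_fragment_count(n_ref, len(device_loads))
    # Step (3): sort by load ascending; ties are broken by device id.
    chosen = sorted(device_loads, key=lambda d: (device_loads[d], d))[:n_p]
    # Step (4): split into n_p contiguous ranges of nearly equal length.
    base, extra = divmod(file_length, n_p)
    placements, offset = [], 0
    for index, device in enumerate(chosen):
        length = base + (1 if index < extra else 0)
        placements.append((device, offset, length))
        offset += length
    return placements


loads = {"osd0": 0.9, "osd1": 0.1, "osd2": 0.5, "osd3": 0.3}
print(place_fragments(10, 3, loads))
```

In this example the three least-loaded devices are `osd1`, `osd3` and `osd2`, and the 10 bytes are split into ranges of 4, 3 and 3 bytes.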
The object distribution method in the object-oriented storage system is characterized in that: (1) the file length boundary value L is determined according to the proportion of files of each length in the concrete environment; (2) the reference fragment count $n_{\text{ref}}$ for each file length is determined from how the cost performance of parallel fragment transmission for files of that length varies with the fragment count n, choosing the fragment count that gives the best cost performance as $n_{\text{ref}}$.
The object distribution method in the object-oriented storage system is characterized in that, when determining the reference fragment count $n_{\text{ref}}$ at which the cost performance is optimal, the first value of n that satisfies the following inequality is taken as $n_{\text{ref}}$:

$$\left| \frac{a - \dfrac{b}{n^2}}{a + b} \right| \le \varepsilon$$

where a is the sender overhead, b is the transmission time, and ε is the expected magnitude of the performance gain obtained by increasing the fragment count, with a reference value of $0.7 \times 10^{-3}$. If no n in the interval $[1, n_{\text{theory}}]$ satisfies the inequality, then $n_{\text{ref}} = n_{\text{theory}}$.
In the invention, the file length boundary value is determined by the proportion of files of each length in the concrete application environment: the file length that accounts for the largest proportion is taken as the boundary value, mainly to exploit the strengths of the device-side file system.
The network environment refers to the sender overhead and the network bandwidth. The theoretical best fragment count $n_{\text{theory}}$ for files of each length can be obtained by the following analysis:
The relationship between the number of object storage devices (OSDs) and the degree of parallelism is fairly complex. More devices mean more transmission channels, which favors parallel transmission. However, the time spent establishing connections affects overall system performance, and when the number of connections is large this effect degrades performance considerably. Data synchronization is also time-consuming. If many objects correspond to the same large file, reassembling the fragments into a file also costs considerable CPU time. Therefore, as the number of devices corresponding to one file grows, transmission parallelism increases, but extra system resources are consumed as well. This relationship can be described by the following formula:

$$\frac{T_p}{T} = \frac{n \cdot a + \dfrac{b}{n} + \delta(t)}{a + b} \qquad (1)$$

where
Tp: overall delay of transmitting the file in parallel.
T: overall delay of transmitting the file serially.
n: number of devices across which the file is distributed.
a: sender overhead.
b: transmission time.
δ(t): coordination time between objects, including synchronization, verification and so on; it is a function of n.
The numerator of formula (1) consists of three parts: n × a, b/n and δ(t). The term n × a means that the sender overhead of connecting to n devices is n times that of connecting to a single device. The term b/n reflects that the bandwidth of n devices transmitting in parallel is n times that of a single device. Although δ(t) is a function of n, it varies very little with n and can be treated approximately as a constant. Replacing δ(t) with a constant c, formula (1) simplifies and rearranges to:

$$\frac{T_p}{T} = \frac{a}{a + b}\,n + \frac{b}{a + b}\cdot\frac{1}{n} + \frac{c}{a + b} \qquad (2)$$
T/Tp is the speed-up ratio of parallel transmission relative to serial transmission, so maximizing the speed-up ratio is equivalent to minimizing Tp/T in formula (2). The method proposed by the invention is mainly concerned with how to map one file to multiple objects, that is, with the influence of n on Tp/T, so finding the best n yields the maximum speed-up ratio. Let

$$f(n) = \frac{T_p}{T} = \frac{n \cdot a + \dfrac{b}{n} + c}{a + b}$$

Differentiating with respect to n gives

$$f'(n) = \frac{a - \dfrac{b}{n^2}}{a + b}$$

Setting $f'(n) = 0$, we obtain

$$n = \sqrt{\frac{b}{a}}$$

Because

$$f''(n) = \frac{2b}{(a + b)\,n^3} > 0$$

f(n) has a minimum at $n = \sqrt{b/a}$. That is, the transmission speed-up ratio is largest when $n = \sqrt{b/a}$. If the transmission time b is very large or the sender overhead a is very small, n becomes very large. Conversely, if the transmission time is very small, there is no need to transmit the object in parallel.
The sender overhead a is determined by network conditions, and the transmission time b is determined by file size and network bandwidth: b = file size / network bandwidth. So in a given network environment, the theoretical best fragment count for the optimal transmission performance of a file of each length can be obtained from $n_{\text{theory}} = \sqrt{b/a}$. In practice, to maximize the system's cost performance, the cost of adding devices should also be considered, so the reference fragment count $n_{\text{ref}}$ that gives the best cost performance is determined by the following analysis:
Let

$$f(n) = \frac{T_p}{T} = \frac{n \cdot a + \dfrac{b}{n} + c}{a + b}$$

Differentiating gives

$$f'(n) = \frac{a - \dfrac{b}{n^2}}{a + b}$$

$$\left| \frac{a - \dfrac{b}{n^2}}{a + b} \right| \le \varepsilon \qquad (3)$$

ε is the expected magnitude of the performance gain obtained by increasing the fragment count; its reference value is $0.7 \times 10^{-3}$. The first value of n that satisfies inequality (3) is taken as $n_{\text{ref}}$. If no n in the interval $[1, n_{\text{theory}}]$ satisfies inequality (3), then $n_{\text{ref}} = n_{\text{theory}}$. Substituting $n_{\text{ref}}$ into (3) has the physical meaning that, at a fragment count of $n_{\text{ref}}$, Tp/T no longer changes noticeably with the fragment count, so the system's cost performance can be regarded as highest at that point.
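The search for the reference fragment count can be sketched as below. The sketch assumes that inequality (3) has the form |d(Tp/T)/dn| ≤ ε with Tp/T = (n·a + b/n + c)/(a + b), and that the theoretical count is sqrt(b/a) rounded to an integer; both forms are inferred from the surrounding derivation rather than quoted from the source, and the function names are hypothetical.

```python
import math

EPSILON = 0.7e-3  # reference value of epsilon given in the text


def theoretical_fragment_count(a: float, b: float) -> int:
    """Assumed n_theory: sqrt(b / a), rounded, and at least 1."""
    return max(1, round(math.sqrt(b / a)))


def slope(n: int, a: float, b: float) -> float:
    """d(Tp/T)/dn under the assumed model Tp/T = (n*a + b/n + c) / (a + b)."""
    return (a - b / n ** 2) / (a + b)


def reference_fragment_count(a: float, b: float, epsilon: float = EPSILON) -> int:
    """First n in [1, n_theory] whose slope magnitude is at most epsilon."""
    n_theory = theoretical_fragment_count(a, b)
    for n in range(1, n_theory + 1):
        if abs(slope(n, a, b)) <= epsilon:
            return n
    return n_theory  # fallback stated in the text


# a = 80 us; b = 8000 us for a 1 MB file at 1000 Mbit/s (values from the embodiment)
print(reference_fragment_count(80.0, 8000.0))
# b for a 1 GB file, counted as 1024 * 8 Mbit
print(reference_fragment_count(80.0, 8_192_000.0))
```

Under these assumptions the two calls give 10 and 38, which is close to the values of 10 and 40 that the embodiment reads from Fig. 3.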
The invention reduces the overhead of distributing objects in a system with a large number of devices and improves the concurrency and reliability of the devices.
Description of drawings
Fig. 1 is a flow chart of the invention;
Fig. 2 is a flow chart of the large-file fragmentation of the invention;
Fig. 3 shows the relationship between the time ratio Tp/T and the object count n;
Fig. 4 shows the overall transmission delay of the algorithm of the invention in each virtual system: (a) 16 OSDs, (b) 32 OSDs, (c) 64 OSDs;
Fig. 5 compares the overall transmission delay of the algorithm of the invention with that of the hash algorithm and the fragment mapping algorithm in each virtual system: (a) 16 OSDs, (b) 32 OSDs, (c) 64 OSDs.
Embodiment
The present invention is described in further detail below in conjunction with embodiment.
The file length boundary value L is set to 512 KB. Statistical analysis shows that in the LLNL workload about 85% of objects are 512 KB long and 15% are shorter than 512 KB, so the design sets the boundary value to 512 KB, which better exploits the strengths of the device-side file system.
For large files, the most common sizes were chosen: 1 MB, 2 MB, 4 MB, 8 MB, 16 MB, 32 MB, 64 MB, 128 MB, 256 MB, 512 MB and 1 GB. A typical network configuration was chosen: a bandwidth of 1000 Mbit/s and an overhead factor of 80 μs. For a 1 MB file, message size = 8 Mbit, bandwidth = 1000 Mbit/s, b = 1 × 8 / 1000 = 0.008 s = 8000 μs and a = 80 μs, so $n_{\text{theory}} = \sqrt{b/a} = \sqrt{8000/80} = 10$.
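The arithmetic for the 1 MB file extends directly to the other file sizes. The sketch below assumes the theoretical optimum is n = sqrt(b/a), the minimizer of n·a + b/n, and follows the text in counting 1 MB as 8 Mbit; the constant and function names are illustrative.

```python
import math

OVERHEAD_US = 80.0             # sender overhead a, in microseconds
BANDWIDTH_MBIT_PER_S = 1000.0  # network bandwidth


def transmission_time_us(size_mb: float) -> float:
    """b = file size / bandwidth, counting 1 MB as 8 Mbit as in the worked example."""
    return size_mb * 8.0 / BANDWIDTH_MBIT_PER_S * 1e6


def theoretical_fragments(size_mb: float) -> float:
    """Assumed n_theory = sqrt(b / a), the minimizer of n*a + b/n."""
    return math.sqrt(transmission_time_us(size_mb) / OVERHEAD_US)


for size_mb in (1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024):
    print(f"{size_mb:>5} MB  b = {transmission_time_us(size_mb):>10.0f} us  "
          f"n_theory = {theoretical_fragments(size_mb):6.1f}")
```

The theoretical count grows with the square root of the file size, so it quickly exceeds the device counts used in the simulation.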
As shown in Fig. 3, for files of 1 MB and 2 MB, Tp/T changes little once n ≥ 10. For files of 4 MB, 8 MB and 16 MB, Tp/T changes little once n ≥ 20. Likewise, for files of 32 MB and larger, Tp/T hardly changes with n once n ≥ 40. This gives the following conclusion: $n_{\text{ref}} = 10$ (file size 1 MB to 2 MB), $n_{\text{ref}} = 20$ (file size 4 MB to 16 MB), $n_{\text{ref}} = 40$ (file size 32 MB to 1 GB).
The experimental environment is a PC with a 2.4 GHz Intel Celeron CPU and 512 MB of RAM, running Red Hat Linux 9.0 with kernel version 2.4.20. Matlab is used as the simulation software. First, a random array representing file lengths is generated from an exponential distribution; the object distribution algorithm of the invention is then applied to these files to calculate the corresponding transmission delays. The algorithm is implemented as a Matlab M-file. The overhead factor is assumed to be 80 μs and the network bandwidth 1000 Mb/s. Devices selected after sorting are also expected to be faster than devices selected at random. According to the number of OSDs, the simulation covers three cases: a system with 16 OSDs, a system with 32 OSDs and a system with 64 OSDs.
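A simulation of this kind can be sketched in Python (the original was a Matlab M-file, which is not reproduced in the source). Everything specific here is an assumption: the delay model n·a + b/n with the coordination term dropped, the mean file size of 4 MB, the number of files, and the size thresholds used to pick the reference fragment count for sizes the text does not list.

```python
import random

A_US = 80.0                     # overhead factor a, in microseconds
BANDWIDTH_BITS_PER_US = 1000.0  # 1000 Mbit/s equals 1000 bits per microsecond
L_BOUNDARY = 512 * 1024         # boundary value L in bytes
MB = 1024 * 1024


def parallel_delay_us(size_bytes: int, n: int) -> float:
    """Assumed delay model n*a + b/n; n = 1 is the serial (hash) case."""
    b = size_bytes * 8 / BANDWIDTH_BITS_PER_US
    return n * A_US + b / n


def reference_fragments(size_bytes: int) -> int:
    """Reference counts 10 / 20 / 40 from the embodiment; the cut-offs are assumptions."""
    if size_bytes <= 2 * MB:
        return 10
    if size_bytes <= 16 * MB:
        return 20
    return 40


def proposed_delay_us(size_bytes: int, n_devices: int) -> float:
    if size_bytes <= L_BOUNDARY:
        return parallel_delay_us(size_bytes, 1)
    return parallel_delay_us(size_bytes, min(reference_fragments(size_bytes), n_devices))


def simulate(n_devices: int, n_files: int = 1000, mean_size: int = 4 * MB, seed: int = 0):
    """Total delay of each strategy over exponentially distributed file sizes."""
    rng = random.Random(seed)
    sizes = [max(1, int(rng.expovariate(1 / mean_size))) for _ in range(n_files)]
    return {
        "proposed": sum(proposed_delay_us(s, n_devices) for s in sizes),
        "hash": sum(parallel_delay_us(s, 1) for s in sizes),
        "fragment mapping": sum(parallel_delay_us(s, n_devices) for s in sizes),
    }


for n_devices in (16, 32, 64):
    print(n_devices, {name: round(total) for name, total in simulate(n_devices).items()})
```

The totals depend on the assumed workload and are not expected to reproduce the figures in Tables 1 to 3.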
Table 1. Statistics of the object distribution algorithm of the invention
Number of devices | Mean (μs) | Standard deviation (μs) | Maximum (μs) | Minimum (μs) |
---|---|---|---|---|
16 | 2.229474e+06 | 3.303443e+04 | 2.309076e+06 | 2.132241e+06 |
32 | 2.230677e+06 | 3.514477e+04 | 2.315964e+06 | 2.152755e+06 |
64 | 2.235837e+06 | 3.870051e+04 | 2.311283e+06 | 2.160869e+06 |
Table 2. Statistics of the hash algorithm
Number of devices | Mean (μs) | Standard deviation (μs) | Maximum (μs) | Minimum (μs) |
---|---|---|---|---|
16 | 4.274640e+06 | 1.382437e+05 | 4.644530e+06 | 3.883163e+06 |
32 | 4.255894e+06 | 1.180060e+05 | 4.511304e+06 | 4.010331e+06 |
64 | 4.187441e+06 | 1.448126e+05 | 4.512065e+06 | 3.857762e+06 |
Table 3. Statistics of the fragment mapping algorithm
Number of devices | Mean (μs) | Standard deviation (μs) | Maximum (μs) | Minimum (μs) |
---|---|---|---|---|
16 | 1.572765e+06 | 8.640234e+03 | 1.595883e+06 | 1.548298e+06 |
32 | 2.751877e+06 | 3.687687e+03 | 2.759858e+06 | 2.744203e+06 |
64 | 5.308309e+06 | 2.262698e+03 | 5.313381e+06 | 5.303158e+06 |
As the values in the tables and Fig. 5 show, the standard deviation of the overall delay of the object distribution algorithm of the invention is smaller than that of the hash algorithm and the fragment mapping algorithm. Overall, the object distribution algorithm of the invention is more stable than the other two algorithms and performs best when the object-oriented storage system has a large number of OSDs.
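The mean delays in Tables 1 to 3 can be compared directly. The short sketch below copies the mean values from the tables and reports, for each algorithm, how the mean changes from 16 to 64 OSDs and which algorithm has the lowest mean at each device count; the dictionary layout and names are illustrative.

```python
# Mean overall delay in microseconds, copied from Tables 1 to 3.
MEAN_DELAY_US = {
    "proposed": {16: 2.229474e06, 32: 2.230677e06, 64: 2.235837e06},
    "hash": {16: 4.274640e06, 32: 4.255894e06, 64: 4.187441e06},
    "fragment mapping": {16: 1.572765e06, 32: 2.751877e06, 64: 5.308309e06},
}


def growth_16_to_64(means: dict) -> float:
    """Ratio of the mean delay at 64 OSDs to the mean delay at 16 OSDs."""
    return means[64] / means[16]


def fastest_at(n_devices: int) -> str:
    """Algorithm with the lowest mean delay for a given device count."""
    return min(MEAN_DELAY_US, key=lambda name: MEAN_DELAY_US[name][n_devices])


for name, means in MEAN_DELAY_US.items():
    print(f"{name:>16}: mean at 64 OSDs / mean at 16 OSDs = {growth_16_to_64(means):.3f}")
for n_devices in (16, 32, 64):
    print(f"lowest mean delay at {n_devices} OSDs: {fastest_at(n_devices)}")
```

On these figures the mean delay of the proposed algorithm stays nearly constant across the three device counts, while that of the fragment mapping algorithm more than triples, and the proposed algorithm has the lowest mean at 32 and 64 OSDs.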
Claims (4)
1. An object distribution method in an object-oriented storage system, comprising the following steps in order:
(1) determine the file length boundary value L, where L is 256 KB, 512 KB or 1 MB;
(2) obtain the total number of devices N when the metadata server is initialized, where N is a positive integer;
(3) obtain the file length from the client and compare it with the boundary value L defined in the algorithm to judge whether the file is large or small;
(4) if the file is small, hash it directly into one object and map that object to a device;
(5) if the file is large, split it into several objects and place them on different devices.
2. The object distribution method in an object-oriented storage system as claimed in claim 1, characterized in that the large-file fragmentation step comprises the following process:
(1) determine the reference fragment count $n_{\text{ref}}$ for files of each length;
(2) determine the file fragment count $n_p$ from the total number of devices N and the reference fragment count $n_{\text{ref}}$: if $n_{\text{ref}}$ is greater than N, then $n_p$ is N; if $n_{\text{ref}}$ is less than N, then $n_p$ is $n_{\text{ref}}$;
(3) sort the devices from low to high by the load parameter value obtained from the storage devices, and select the first $n_p$ devices with the lowest load parameter values;
(4) fragment the file evenly and map the fragments onto the $n_p$ selected devices.
3. The object distribution method in an object-oriented storage system as claimed in claim 1 or 2, characterized in that: (1) the file length boundary value L is determined according to the proportion of files of each length in the concrete environment; (2) the reference fragment count $n_{\text{ref}}$ for each file length is determined from how the cost performance of parallel fragment transmission for files of that length varies with the fragment count n, choosing the fragment count that gives the best cost performance as $n_{\text{ref}}$.
4. The object distribution method in an object-oriented storage system as claimed in claim 3, characterized in that, when determining the reference fragment count $n_{\text{ref}}$ at which the cost performance is optimal, the first value of n that satisfies the following inequality is taken as $n_{\text{ref}}$:

$$\left| \frac{a - \dfrac{b}{n^2}}{a + b} \right| \le \varepsilon$$

where a is the sender overhead, b is the transmission time, and ε is the expected magnitude of the performance gain obtained by increasing the fragment count, with a reference value of $0.7 \times 10^{-3}$. If no n in the interval $[1, n_{\text{theory}}]$ satisfies the inequality, then $n_{\text{ref}} = n_{\text{theory}}$.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN 200610125189 CN1971562A (en) | 2006-11-29 | 2006-11-29 | Distributing method of object faced to object storage system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN1971562A true CN1971562A (en) | 2007-05-30 |
Family
ID=38112385
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN 200610125189 Pending CN1971562A (en) | 2006-11-29 | 2006-11-29 | Distributing method of object faced to object storage system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN1971562A (en) |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C12 | Rejection of a patent application after its publication | ||
RJ01 | Rejection of invention patent application after publication |