CN103246700A - Mass small file low latency storage method based on HBase - Google Patents

Info

Publication number
CN103246700A
Authority
CN
China
Prior art keywords
small documents
hbase
row
section
file
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201310112130XA
Other languages
Chinese (zh)
Other versions
CN103246700B (en)
Inventor
魏超
鄢小征
栾江霞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xiamen Meiya Pico Information Co Ltd
Original Assignee
Xiamen Meiya Pico Information Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xiamen Meiya Pico Information Co Ltd filed Critical Xiamen Meiya Pico Information Co Ltd
Priority to CN201310112130.XA priority Critical patent/CN103246700B/en
Publication of CN103246700A publication Critical patent/CN103246700A/en
Application granted granted Critical
Publication of CN103246700B publication Critical patent/CN103246700B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Abstract

The invention provides an HBase-based low-latency storage method for massive numbers of small files. In a Hadoop and HBase environment, a small file table comprising a row key and two column families is created to establish a storage environment suited to small files, and an application flow covering small file writing, small file appending, and small file reading is provided, thereby achieving reasonable storage and low-latency reading and writing of massive numbers of small files and meeting practical requirements.

Description

Low-latency storage method for massive small files based on HBase
Technical field
The present invention relates in particular to a low-latency storage method for massive numbers of small files based on HBase.
Background technology
The Hadoop Distributed File System (HDFS), provided by Hadoop, has been widely used in recent years as a low-cost distributed storage solution.
HDFS is designed mainly for large files, on the order of hundreds of megabytes or gigabytes, and is not suitable for storing small files. The reason is that the NameNode, which acts as Hadoop's master data service, must keep the key information of all files in memory. It is estimated that 10 million files occupy 2 to 3 GB of memory, so a larger number of files puts great pressure on current mainstream server hardware, and the problem may not be solvable even by adding hardware.
In addition, HDFS is optimized for high-throughput reads and writes, with the side effect of some latency: for example, after a file is written, the change is not immediately visible to other clients reading that file, which must wait until HDFS has synchronized. HDFS itself is therefore not suitable for storing data with low-latency requirements.
Several methods exist for addressing the unsuitability of HDFS for small files. Hadoop's official solutions include HAR (Hadoop Archives), which archives a batch of small files into one large file for storage, but it has several problems. First, the URI (Uniform Resource Identifier) of each small file changes, which is unfriendly to application systems built on HDFS, since they must be modified to adapt to the change. Second, the archive does not support compression. Third, the contents of an archive cannot be modified: adding a small file to the archive or deleting one from it requires re-archiving.
Another solution is SequenceFile and MapFile, provided by Hadoop, which first combine small files into one large file before storing them. This also has problems: for example, the list of small files cannot be enumerated simply and quickly, and the solution mainly targets the Java language and supports other languages poorly.
Among existing patents, "A Hadoop-based associated storage method for massive non-independent small files" (patent application no. 201110312671.8) and "A Hadoop-based associated storage method for massive classifiable small files" (patent application no. 201110312694.9) also adopt the approach of first merging small files into a large file, which imposes requirements on the development of upper-layer application systems; in other words, it cannot be fully transparent to upper-layer applications. These methods also lack low latency.
Summary of the invention
The object of the present invention is to overcome the above defects and provide an HBase-based low-latency storage method for massive small files that achieves reasonable storage and low latency and is fully transparent to upper-layer application systems.
The object of the present invention is achieved as follows: an HBase-based low-latency storage method for massive small files, characterized in that, after a small file storage environment is established in a Hadoop and HBase environment, steps for small file writing, small file appending, and small file reading are provided; wherein establishing the small file storage environment comprises:
a step of creating the HBase small file table, that is, creating a small file table in the HBase database; the small file table is as follows:
Figure BDA00002996884900021
the value of the row key is a unique, randomly generated character string;
column family i stores, for each of the N slices into which a small file is divided according to a predefined rule, the slice's length information and the byte stream of its file content;
column family m records the lock information of the row, which is used during concurrent writes to determine whether the row is locked by another thread or process;
a step of pre-splitting the small file table into regions (the split strategy uses hexadecimal strings); and
a step of enabling a row-level Bloom filter on the small file table.
In the above method, the value of the row key is obtained by taking a UUID value, computing the MD5 digest of that value, and taking the hexadecimal string representation of the digest; the column name in column family i consists of the slice sequence number and the current slice length, and the column value stores the byte array of the slice content corresponding to that sequence number;
In the above method, the small file write application comprises the steps of:
A1) initializing a small file output stream: after a reference to the target small file is obtained, the output stream allocates an internal memory buffer, and the URI of the target small file is passed to the output stream; the URI contains the target row key to be written in the small file table; the buffer is no smaller than the defined small file slice size;
A2) checking the value of the column named lock in column family m of the small file table; if no lock is present, the row can be operated on, and a lock is added;
A3) reading the byte stream from the source file into the buffer; once the buffer is full, its content forms a new slice, which is assigned a slice sequence number; the actual byte length of the slice is obtained and combined with the sequence number to form the column name under column family i of the HBase small file table, and the slice content is stored in the corresponding column value;
A4) connecting to HBase and writing the new complete slice to the small file table, in which column family i is dynamically extended with a new column to store the new slice;
A5) resetting the buffer and returning to step A2;
A6) after all content of the source file has been read and before the small file output stream is closed, assembling the remaining buffer content that has not reached the defined slice size threshold into the last slice of this write, and writing it to the same row of the HBase small file table;
A7) releasing the lock;
Preferably, the following steps may also be included after step A7:
A8) using the row key under which the small file is stored, together with HBase's KeyOnlyFilter, fetching all column names under column family i of the corresponding row, and parsing from them all slice sequence numbers that currently exist;
A9) determining whether a lock is present under column family m of the row corresponding to the small file's row key, and continuing only if it is not;
A10) sorting to obtain the current maximum slice sequence number;
A11) writing slices starting from the incremented maximum sequence number; the remaining steps are the same as in the complete small file write flow;
In the above method, the small file read application comprises the steps of:
B1) initializing a small file input stream, which allocates an internal buffer, and obtaining access permission to the HBase small file table; the buffer is no smaller than the defined maximum small file slice size;
B2) using the row key of the small file to be read, querying all column information under column family i of the corresponding row in the small file table; HBase's KeyOnlyFilter is used to read only the column names of column family i from the small file table;
B3) when the application layer reads data from the small file input stream, the stream checks whether its buffer contains available data; if not, the stream is triggered to request a new slice of data from HBase;
B4) once the application layer has consumed the data in the current buffer, returning to step B2;
B5) after the small file input stream has located and processed the last slice of the small file, returning an end signal to finish reading the whole small file;
In the above method, the small file delete operation relies on the row key of the small file to be deleted for fast location in HBase, and then deletes the corresponding row directly through the HBase API;
In the above method, a file storage abstraction layer is created, which receives the file size reported by the application layer, constructs the corresponding URI according to a predefined threshold that distinguishes large files from small files, and returns it to the application layer; it then receives the file input stream to be read or written and the URI passed by the application layer, parses the URI, and automatically directs the read or write operation to the HDFS file system or to the HBase small file table accordingly;
In the above, the threshold is 10 megabytes.
The beneficial effect of the present invention is that a storage environment suited to small files is established in a Hadoop and HBase environment, together with application flows for small file writing, small file appending, and small file reading, thereby achieving reasonable storage and low-latency reading and writing of massive small files and meeting practical requirements.
Description of drawings
The specific structure of the present invention is described in detail below with reference to the accompanying drawings.
Fig. 1 is an abstract diagram of how the upper-layer application stores large and small files in the method of the invention;
Fig. 2 is a schematic flow chart of writing a small file in the method of the invention;
Fig. 3 is a schematic flow chart of reading a small file in the method of the invention.
Embodiment
To explain the technical content, structural features, objects, and effects of the present invention in detail, a description is given below with reference to the embodiments and the accompanying drawings.
The invention provides an HBase-based low-latency storage method for massive small files, which, after a small file storage environment is established in a Hadoop and HBase environment, provides small file write, small file append, and small file read applications.
Building the small file storage environment comprises the following steps:
I. Prepare the Hadoop and HBase environment
This method is implemented on Hadoop and HBase, so a Hadoop and HBase environment must be deployed first.
This method is built on HBase. HBase is a distributed, column-oriented database built on Hadoop; it was originally part of Hadoop and is now an independent Apache top-level project. HBase is not suited to storing files directly: it is designed for structured data, and it offers low latency. HBase data can be kept in many file systems, of which HDFS is the best choice.
II. Create the HBase small file table
Create a small file table in the HBase database, with the following table structure:
Figure BDA00002996884900051
The value of the row key (key name: rowkey) is a randomly generated, unique character string: a UUID (Universally Unique Identifier) value is taken, its MD5 digest is computed, and the hexadecimal string representation of the digest is used. The purpose of handling the row key this way is to take full advantage of HBase's existing table partitioning (region splitting) strategy so that rows are distributed across different regions, avoiding region hotspots.
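As an illustration only, the row key derivation HEX(MD5(UUID)) can be sketched in Python as follows. The function name new_row_key is hypothetical, and two details are assumptions not stated in the text: that the UUID is a random (version 4) UUID, and that the digest is computed over the UUID's textual form.

```python
import hashlib
import uuid


def new_row_key() -> str:
    """Return HEX(MD5(UUID)) as a 32-character hexadecimal string."""
    raw = uuid.uuid4()  # assumption: a random (version 4) UUID
    # assumption: the digest is computed over the UUID's textual form
    digest = hashlib.md5(str(raw).encode("ascii"))
    return digest.hexdigest()


if __name__ == "__main__":
    print(new_row_key())
```

Because an MD5 digest is 16 bytes, every key produced this way is 32 hexadecimal characters long, and the leading characters are spread roughly uniformly, which is what lets pre-split regions share the load.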
The table has exactly two column families:
Column family i: stores the slice byte streams (file content) and length information after a small file is sliced.
Column family m: records the lock information of the row, which is used during concurrent writes to determine whether the row is locked by another thread or process.
A complete small file is represented by exactly one row in this table. When written to the table, a file is cut into N slices according to a predefined rule, and each slice is assigned an incrementing slice sequence number. The slice sequence number, slice size, and complete slice byte array are together represented by one column under column family i.
The slice sequence number and slice size are stored as the column name. The slice content (byte array) is stored as the column value.
Figure BDA00002996884900061
The first two bytes of the column name are the slice sequence number. The sequence number is a short integer (short type), whose positive range provides sequence numbers up to 32767, a sufficient upper limit on slices for a small file. The incrementing sequence number distinguishes the slices from one another and also gives the assembly order when slices are recombined into the complete file during reading.
The remaining bytes of the column name store the actual length of the current slice. Because the slice size is stored here rather than in the column value, the size of the whole small file can be computed by reading only the column names from the table, without reading the actual slice values, which avoids the memory cost of loading the complete slice byte arrays.
The column value stores the slice content (byte array). Each small file is cut into multiple slices of equal size, which are stored here. The slice size can be customized as needed, for example 4 KB, 128 KB, 512 KB, or 1 MB.
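The column name layout described above can be sketched as follows, for illustration only. The text fixes the sequence number at two bytes but does not state how the length is encoded; this sketch assumes a 4-byte big-endian integer. The function names are hypothetical.

```python
import struct

MAX_SEQ = 32767  # largest positive value of a 2-byte signed short


def encode_column_name(seq: int, length: int) -> bytes:
    """Pack the slice sequence number and slice length into a column name."""
    if not 0 <= seq <= MAX_SEQ:
        raise ValueError("slice sequence number out of range")
    # ">h": big-endian 2-byte short; "i": 4-byte int (assumed width)
    return struct.pack(">hi", seq, length)


def decode_column_name(name: bytes) -> tuple:
    """Return (sequence number, slice length) from a column name."""
    return struct.unpack(">hi", name)
```

Under these assumptions, slice 3 of a file cut into 128 KB slices would carry a 6-byte column name, and decoding it returns the pair (3, 131072) without touching the slice content.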
Column family m contains only one column, named lock, which provides the row lock information. The column exists while the small file is being written; when the write finishes, the lock is released and the column is deleted. Storing the row lock information in a column of a user-defined column family is more flexible than HBase's native row lock mechanism.
III. Pre-split the small file table
This method applies to the small file table the pre-split strategy based on hexadecimal strings (RegionSplitter.HexStringSplit). Because row keys of the small file table are generated as HEX(MD5(UUID)), small files stored in HBase are distributed evenly across different regions for processing. Pre-splitting prevents a single region from overheating, and reasonable pre-splitting spreads read and write requests effectively over different regions, improving concurrent read/write capacity and hence the concurrent read/write performance for small files.
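To make the idea of hexadecimal pre-splitting concrete, the following is a simplified sketch of evenly spaced split points over a hexadecimal key space. It is not HBase's implementation of HexStringSplit; the function name, the region count, and the 8-character prefix width are all assumptions chosen for illustration.

```python
def hex_split_points(num_regions: int, width: int = 8) -> list:
    """Return num_regions - 1 evenly spaced hexadecimal boundary keys."""
    if num_regions < 2:
        raise ValueError("need at least two regions to split")
    span = 16 ** width  # size of the key space covered by the prefix
    return [
        format(span * i // num_regions, "0{}x".format(width))
        for i in range(1, num_regions)
    ]


if __name__ == "__main__":
    print(hex_split_points(4))
```

With four regions the boundaries fall at one quarter, one half, and three quarters of the key space, so uniformly distributed hexadecimal row keys land in each region with roughly equal probability.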
As storage grows, existing regions can be further split manually as needed.
IV. Enable a Bloom filter on the small file table
A Bloom filter is a fast, hash-based lookup method that can give a definite exclusion or a probable match. HBase supports enabling Bloom filters on a table. This method enables a row-level Bloom filter on the small file table to ensure random read performance.
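The "definite exclusion, probable match" behaviour can be illustrated with a minimal, generic Bloom filter. This is a teaching sketch and not HBase's internal implementation; the class name, bit-array size, number of hash functions, and the use of salted SHA-256 are all assumptions.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, size_bits: int = 1024, num_hashes: int = 3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((size_bits + 7) // 8)

    def _positions(self, key: bytes):
        # Derive several bit positions by salting one hash function.
        for salt in range(self.num_hashes):
            digest = hashlib.sha256(bytes([salt]) + key).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, key: bytes) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key: bytes) -> bool:
        # False means "definitely absent"; True means "possibly present".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(key)
        )
```

A lookup for a row key that was never added usually returns False, letting the store skip a file without reading it; a True answer still has to be confirmed by an actual read.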
Once the small file storage environment has been built, read and write applications for massive small files on HBase can be carried out. To make read and write operations on massive small files fully transparent to the upper-layer application system, an abstraction layer for storing large and small files is also needed on top of the environment built above. Specifically:
Referring to Fig. 1, a file storage abstraction layer is provided for the upper-layer application system. Whether a file ends up in the HBase small file table or in the HDFS file system is transparent to the upper-layer application, which only needs to deal with a uniform, general-purpose file storage interface. The steps are as follows:
1. The application layer informs the storage abstraction layer of the file size. The abstraction layer constructs the corresponding URI according to its predefined threshold for distinguishing large and small files, and returns it to the application layer.
2. The application layer need not concern itself with the specific content of the URI; it passes the file input stream to be stored and the URI to the storage abstraction layer. The abstraction layer parses the URI internally and automatically stores the file in the HDFS file system or the HBase small file table.
For example, the URI of a file stored in HDFS begins with hdfs://, and the URI of a file stored in the HBase small file table begins with hbase://.
In summary, a file storage abstraction layer is provided for the upper-layer application system and decides the storage location automatically from the file size: a file larger than the set threshold (for example 10 MB) is stored in HDFS, and its URI is marked with hdfs://; a file smaller than or equal to the set threshold is stored in HBase, and its URI is marked with hbase://. The following explains how files no larger than the threshold (that is, small files) are stored in HBase. Small files are directed to HBase, and HBase itself has low latency, so this method also achieves low-latency storage of small files.
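The routing rule of the abstraction layer can be sketched as follows, for illustration only. The 10 MB threshold and the two URI schemes come from the text; the function names and the layout of the URI after the scheme are hypothetical.

```python
SMALL_FILE_THRESHOLD = 10 * 1024 * 1024  # 10 MB example threshold from the text


def build_uri(file_size: int, name: str) -> str:
    """Choose the backend from the file size and return a URI for it."""
    if file_size > SMALL_FILE_THRESHOLD:
        return "hdfs://" + name   # large file: stored in HDFS
    return "hbase://" + name      # small file: name would carry the row key


def backend_for(uri: str) -> str:
    """Parse the URI scheme to decide where a read or write is routed."""
    scheme, sep, _rest = uri.partition("://")
    if not sep or scheme not in ("hdfs", "hbase"):
        raise ValueError("unsupported URI: " + uri)
    return scheme
```

The application only ever holds the URI string; a file of exactly the threshold size is treated as small, matching the "smaller than or equal to" rule above.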
The present invention also provides a standard set of small file I/O interface methods, including creating, reading, appending, deleting, obtaining the file size, checking whether the file can be written, checking whether it can be read, and checking whether it exists. Inheriting HBase's low-latency read/write characteristics, it provides a streaming I/O write interface with low memory overhead (the native HBase API does not support streaming writes). Specifically:
Small file write application
A small file is written to the HBase small file table through a custom small file output stream, which slices the small file's byte stream and stores the slices in HBase. The write process can use the standard file stream form; the memory overhead equals the slice size, which is equivalent to the buffer size of standard I/O.
Referring to Fig. 2, the logic flow of writing a small file is as follows:
1. Obtain a reference to the target small file and prepare to write.
2. Initialize the small file output stream. The output stream allocates an internal memory buffer whose size equals the defined small file slice size. The URI of the target small file (produced by the upper-layer file storage abstraction) is passed to the output stream. The URI contains the target row key (rowkey) to be written in the small file table.
3. Check the row lock; if the row can be written, add a row lock, that is, the lock column in column family m of the small file table.
4. Read the byte stream from the source file into the buffer. Once the buffer is full (that is, a new complete slice has been formed), its content becomes a new slice and is assigned a slice sequence number. The actual byte length of the slice is obtained and combined with the sequence number to form the column name under column family i of the HBase small file table, and the slice content is stored in the corresponding column value.
5. Connect to HBase and write the new complete slice to the small file table. The corresponding row of the small file table is dynamically extended with a new column (in column family i) to store the new slice.
6. Reset the buffer and return to step 3.
7. After all content of the source file has been read and before the input stream is closed, assemble the remaining buffer content that has not reached the defined slice size threshold into the last slice of this write, and write it to the same row of the HBase small file table.
8. Release the row lock.
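The write flow above can be sketched against an in-memory stand-in for the small file table. Everything here is an assumption made for illustration: the table is a plain dictionary rather than an HBase client, the function name is hypothetical, the length is encoded as a 4-byte integer, and the lock check is a simple test-then-set, whereas a real deployment would need an atomic conditional write.

```python
import io
import struct


def write_small_file(table: dict, row_key: str, source, slice_size: int,
                     start_seq: int = 0) -> int:
    """Slice `source` into the row's column family i; return the next seq."""
    row = table.setdefault(row_key, {"i": {}, "m": {}})
    if "lock" in row["m"]:                      # step 3: check the row lock
        raise RuntimeError("row is locked by another writer")
    row["m"]["lock"] = b"1"                     # step 3: add the lock column
    try:
        seq = start_seq
        while True:
            chunk = source.read(slice_size)     # step 4: fill the buffer
            if not chunk:
                break                           # source exhausted
            # Column name: 2-byte sequence number + assumed 4-byte length.
            name = struct.pack(">hi", seq, len(chunk))
            row["i"][name] = chunk              # step 5: one column per slice
            seq += 1                            # step 7: a short last chunk
    finally:                                    #         is stored the same way
        del row["m"]["lock"]                    # step 8: release the lock
    return seq


if __name__ == "__main__":
    demo = {}
    write_small_file(demo, "rowkey-1", io.BytesIO(b"abcdefghij"), slice_size=4)
    print(sorted(demo["rowkey-1"]["i"].values()))
```

Only one slice is held in memory at a time, which is the property the text attributes to the streaming interface. The sketch relies on the source returning full chunks until the end of the stream, as io.BytesIO does.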
Small file append application
In addition, this method supports appending to an existing small file, which works around the restriction that an ordinary HBase column can only be stored and retrieved as a whole.
The logic flow of appending to a small file continues as follows:
9. Using the row key under which the small file is stored, together with HBase's KeyOnlyFilter, fetch all column names under column family i of the corresponding row, and parse from them all slice sequence numbers that currently exist.
10. Determine whether the small file can be written. By design of the small file table, the row lock is recorded under column family m of the HBase small file table. Checking whether the row lock exists under column family m of the row for the small file's row key directly determines whether the small file can be written.
11. Sort to obtain the current maximum slice sequence number.
12. Write slices starting from the incremented maximum sequence number; the remaining steps are the same as in the complete small file write flow.
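Steps 9, 11, and 12 amount to recovering the next free sequence number from the existing column names. A minimal sketch follows, assuming the column names have already been fetched (for example by a key-only read) and that the sequence number occupies the first two bytes as a big-endian short; the function name is hypothetical.

```python
import struct


def next_slice_seq(column_names) -> int:
    """Return the sequence number at which an append should start."""
    # The first two bytes of each column name are the slice sequence number.
    seqs = [struct.unpack(">h", name[:2])[0] for name in column_names]
    return max(seqs) + 1 if seqs else 0
```

An append then proceeds exactly like a fresh write, except that numbering starts from the returned value instead of zero.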
Small file read application
Reading a small file means reading all columns (slices) in the corresponding row of the HBase small file table and reassembling the slices in order into the complete small file. The read process can use the standard file stream form; the memory overhead equals the slice size, which is equivalent to the buffer size of standard I/O.
Referring to Fig. 3, the logic flow of reading a small file is as follows:
1. Initialize the small file input stream. The input stream allocates an internal buffer whose size equals the defined maximum small file slice size. Obtain access permission to the HBase small file table.
2. Using the row key of the small file to be read, query all column information under column family i of the corresponding row.
HBase's KeyOnlyFilter is used to read only the column names (column keys) from the small file table, which reduces data traffic and avoids the cost of loading entire slice contents into memory. From the data obtained, the slice sequence of the small file is constructed, and a mapping between slice sequence numbers and column names is maintained for subsequently reading the small file slice by slice.
3. When the upper-layer application attempts to read data from the small file input stream, the stream checks whether its buffer holds available data; if not, the stream is triggered to request a new slice of data from HBase. The mapping between slice sequence numbers and column names obtained earlier is used to locate and read the corresponding slice content directly. The newly obtained slice is written to the buffer for the upper-layer application to consume.
4. When the upper-layer application has consumed the data in the current buffer, step 2 is triggered again.
5. After the input stream has located and processed the last slice of the small file, it returns an end signal to finish reading the whole small file.
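The read flow can be sketched as a generator over an in-memory stand-in for the table. As before, the dictionary layout, the function name, and the 2-byte big-endian sequence number prefix are assumptions for illustration; no HBase client is involved.

```python
import struct


def read_small_file(table: dict, row_key: str):
    """Yield the slices of one small file in sequence-number order."""
    columns = table[row_key]["i"]
    # Step 2: from the column names alone, map sequence number -> column name.
    index = {struct.unpack(">h", name[:2])[0]: name for name in columns}
    # Steps 3 and 4: fetch one slice at a time as the consumer asks for more.
    for seq in sorted(index):
        yield columns[index[seq]]
    # Step 5: the generator ending is the end-of-file signal.


if __name__ == "__main__":
    demo = {"k": {"i": {struct.pack(">hi", 1, 2): b"cd",
                        struct.pack(">hi", 0, 2): b"ab"}, "m": {}}}
    print(b"".join(read_small_file(demo, "k")))
```

Because slices are yielded lazily, a consumer that copies each slice to its destination never holds more than one slice in memory.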
Before reading, the upper-layer application can obtain the file size, and this scheme does so without reading the complete file content. By design of the small file table, when a small file is stored and sliced, the length of each slice is recorded in the column name of the corresponding column under column family i.
Using HBase's KeyOnlyFilter, only the column names of all columns under column family i of the corresponding row are fetched, which yields the information for all slices. After the first two bytes of each column name are discarded, the remaining bytes are converted to obtain the corresponding slice size. The sum of all slice sizes is the complete and exact size of the small file.
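The size calculation can be sketched as follows, assuming the column names have already been fetched and that the bytes after the 2-byte sequence number hold the length as a 4-byte big-endian integer (the width is an assumption; the text does not specify it). The function name is hypothetical.

```python
import struct


def small_file_size(column_names) -> int:
    """Sum the slice lengths encoded in the column names."""
    total = 0
    for name in column_names:
        # Skip the 2-byte sequence number; decode the remaining length bytes.
        (length,) = struct.unpack(">i", name[2:])
        total += length
    return total
```

No slice content is read at any point, so the cost depends on the number of slices rather than on the size of the file.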
Before step 1 above, the upper-layer application needs to determine whether the small file can be read. HBase supports concurrent reads, so the small file table has no concurrent-read problems beyond those of HBase itself. The only thing that needs to be checked is whether the small file exists. The HBase API can be used to check directly whether the small file's row key exists in the HBase small file table; if it exists, the file can be read.
A complete small file is stored in the HBase small file table as a row. One row represents one complete small file, so checking through the HBase API whether the small file's row key exists in the small file table determines whether the small file exists.
Small file delete application
A small file is stored in the HBase small file table in the form of slices, and all slices of a single file are stored in the same row. Relying on HBase's fast location by rowkey, the corresponding row is deleted directly through the HBase API, which means the corresponding small file is deleted.
In summary, the method of the invention solves the two major problems described in the background, achieving reasonable storage and low latency for massive small files while remaining fully transparent to the upper-layer application system.
The actual effects achieved: first, a small file storage environment is established in a Hadoop and HBase environment, together with application flows for small file writing, appending, and reading, thereby achieving reasonable storage and low latency for massive small files.
Furthermore, a file storage abstraction layer is provided for the upper-layer application system, through which upper-layer applications write and read files, whether large or small. The method can store massive numbers of small files and solves the problem that HDFS is unsuitable for small files: growth in the number of files no longer depends on increasing NameNode memory. Files are written and read through standard I/O streams, and thanks to the abstraction layer the upper-layer application does not perceive the different handling of large and small files underneath, which minimizes the modification work required in upper-layer applications. At the same time, because files are operated on through standard I/O streams rather than byte arrays, the demand on server memory is very low: for example, writing or reading a 10 MB file does not require allocating 10 MB of memory at once to cache the file content, since the content is always handled as an I/O stream. In addition, small file storage has low latency.
The above are only embodiments of the present invention and do not thereby limit its claims. Any equivalent structure or equivalent process transformation made using the content of the specification and drawings of the present invention, or any direct or indirect application in other related technical fields, is likewise included within the scope of patent protection of the present invention.

Claims (8)

1. An HBase-based low-latency storage method for massive small files, characterized in that, after a small file storage environment is established in a Hadoop and HBase environment, steps for small file writing, small file appending, and small file reading are provided; wherein establishing the small file storage environment comprises:
a step of creating the HBase small file table, that is, creating a small file table in the HBase database; the small file table is as follows:
Figure FDA00002996884800011
The value of described capable major key is a unique character string that generates at random;
The described row i of family is used for the corresponding stored small documents and is divided into the length information of N each section after the section and the byte stream of file content according to predefined rule;
The described row m of family for the lock information of minute book line item, is used for whether writing fashionable decision line by other thread or process locking concurrent;
The small documents table is carried out the step of pre-subregion (character string that partitioning strategies adopts 16 systems to represent); And
The small documents table is opened the step of the Bloom filter of row.
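The pre-partitioning step can be sketched as follows, assuming that split points are evenly spaced hexadecimal prefixes of the row key. The function name `hex_split_keys` and the `width` parameter are hypothetical; the claim only states that the partitioning strategy uses hexadecimal strings, not how the split points are chosen.

```python
def hex_split_keys(num_regions, width=2):
    """Return num_regions - 1 evenly spaced hexadecimal split points.

    Each split point is a lowercase hex string of `width` characters.
    Row keys that are hex digests spread uniformly over these ranges.
    """
    space = 16 ** width
    if not 1 <= num_regions <= space:
        raise ValueError("num_regions must be between 1 and 16**width")
    return [
        format(i * space // num_regions, "0{}x".format(width))
        for i in range(1, num_regions)
    ]

# Sixteen regions keyed by the first hex character of the row key.
split_keys = hex_split_keys(16, width=1)
```

In an HBase deployment such a list would typically be supplied as the split keys when the table is created, so that writes are spread over all regions from the start.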
2. The low-latency storage method for massive small files based on HBase as claimed in claim 1, characterized in that: the value of the row key is obtained by taking a UUID value, computing the MD5 digest of that value, and then taking the character string formed by the hexadecimal representation of the digest; the key name in column family i is composed of the slice sequence number and the current slice length, and the key value stores the byte array of the slice content corresponding to the current slice sequence number.
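A minimal sketch of the row key and key name construction described in claim 2 is given below. Two details are assumptions, since the claim does not fix them: the MD5 digest is computed over the ASCII hex form of the UUID, and the key name uses a zero-padded sequence number, an underscore, and the length. The function names are hypothetical.

```python
import hashlib
import uuid

def new_row_key():
    """Row key: hex MD5 digest of a freshly generated UUID."""
    raw = uuid.uuid4().hex.encode("ascii")  # assumed input encoding
    return hashlib.md5(raw).hexdigest()

def make_qualifier(seq, length):
    """Key name in column family i: slice sequence number plus slice length.

    The "<seq>_<length>" layout is a hypothetical encoding.
    """
    return "{:010d}_{:d}".format(seq, length)

def parse_qualifier(qualifier):
    """Recover (sequence number, length) from a key name."""
    seq, length = qualifier.split("_")
    return int(seq), int(length)

row_key = new_row_key()
qualifier = make_qualifier(3, 4096)
```

Zero-padding the sequence number keeps the key names in slice order when they are sorted as strings, which is how a key-value store would normally return them.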
3. The low-latency storage method for massive small files based on HBase as claimed in claim 1, characterized in that: the small file writing application comprises the steps of:
A1) initializing the small file output stream: after a reference to the target small file is obtained, the small file output stream allocates an in-memory buffer internally, and the URI of the target small file is passed to the output stream; the URI contains the target row key to be written in the small file table; the size of the buffer is not less than the defined small file slice size;
A2) checking the key value of the key name lock in column family m of the small file table; if the value is "no", the row can be operated on and a lock is added;
A3) reading the byte stream from the source file into the buffer; once the buffer is full, forming the current content of the buffer into a new slice, obtaining the slice sequence number and the actual byte length of the slice, assembling the length together with the slice sequence number in advance into the key name under column family i of the HBase small file table, and storing the content of the slice in the corresponding key value;
A4) connecting to HBase and outputting the whole new slice to the small file table, where column family i of the small file table dynamically expands a new column to store the new slice;
A5) resetting the buffer and returning to step A2;
A6) after all content of the source file has been read and before the small file output stream is closed, assembling the remaining content of the buffer, which has not reached the defined slice size threshold, into the last slice of this write and outputting it to the same row of the HBase small file table;
A7) releasing the lock.
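The write steps above can be sketched against an in-memory stand-in for the small file table. Everything named here is an assumption for illustration: the nested dictionary replaces the HBase table, `write_small_file` is a hypothetical helper, the `"<seq>_<length>"` key name layout is one possible encoding, and the tiny slice size is chosen only to keep the example readable. The lock check on a dictionary is not atomic; a real deployment would need an atomic check-and-set on the lock cell.

```python
import io

def write_small_file(table, row_key, source, slice_size):
    """Slice `source` into the row `row_key` and return the slice count."""
    row = table.setdefault(row_key, {"i": {}, "m": {"lock": "no"}})
    # A2: the row may only be written when the lock value is "no".
    if row["m"]["lock"] != "no":
        raise RuntimeError("row is locked by another writer")
    row["m"]["lock"] = "yes"
    try:
        seq = 0
        while True:
            # A3: fill the buffer; a buffered stream returns a short
            # read only at end of file, which yields the last slice (A6).
            buf = source.read(slice_size)
            if not buf:
                break
            # A4: one new column per slice, named by sequence and length.
            row["i"]["{:010d}_{:d}".format(seq, len(buf))] = buf
            seq += 1  # A5: the buffer is reused for the next slice
    finally:
        row["m"]["lock"] = "no"  # A7: release the lock
    return seq

table = {}
count = write_small_file(table, "rowkey-1", io.BytesIO(b"abcdefghij"), 4)
```

Here ten bytes with a slice size of four produce two full slices and one final two-byte slice in the same row.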
4. The low-latency storage method for massive small files based on HBase as claimed in claim 3, characterized in that: the following steps may further be included after step A7:
A8) using the row key under which the small file is stored, together with the KeyOnlyFilter of HBase, fetching all column names under column family i of the corresponding row, and then parsing from the column names the sequence numbers of all slices that exist in the current state;
A9) determining whether the key value of lock under column family m for the row key of the small file is "yes" or "no", and continuing only if it is "no";
A10) sorting to obtain the current maximum slice sequence number;
A11) incrementing the maximum sequence number and starting to write slices from it; the subsequent flow is identical to the complete small file writing flow.
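The append steps can be sketched in the same spirit. The dictionary row, the helper names and the `"<seq>_<length>"` key name layout are assumptions for illustration, and the appended data is passed as a byte string rather than a stream to keep the example short.

```python
def next_slice_seq(qualifiers):
    """A8 and A10: parse sequence numbers, sort, return the next one."""
    seqs = sorted(int(q.split("_")[0]) for q in qualifiers)
    return seqs[-1] + 1 if seqs else 0

def append_slices(row, data, slice_size):
    """Append `data` to an existing row and return the next free sequence."""
    names = list(row["i"])  # A8: only the column names are needed
    if row["m"]["lock"] != "no":  # A9: continue only when unlocked
        raise RuntimeError("row is locked by another writer")
    row["m"]["lock"] = "yes"
    try:
        seq = next_slice_seq(names)  # A10 and A11
        for offset in range(0, len(data), slice_size):
            chunk = data[offset:offset + slice_size]
            row["i"]["{:010d}_{:d}".format(seq, len(chunk))] = chunk
            seq += 1
    finally:
        row["m"]["lock"] = "no"
    return seq

row = {
    "i": {"0000000000_4": b"abcd", "0000000001_2": b"ef"},
    "m": {"lock": "no"},
}
next_seq = append_slices(row, b"xyz12", 4)
```

Because each key name carries the slice length, a short slice left by an earlier write does not need to be filled before new slices are added.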
5. The low-latency storage method for massive small files based on HBase as claimed in claim 1, characterized in that: the small file reading application comprises the steps of:
B1) initializing the small file input stream: the small file input stream allocates a buffer internally and obtains read/write permission on the HBase small file table; the size of the buffer is not less than the defined maximum small file slice size;
B2) according to the row key in the small file table of the small file to be read, querying all column information under column family i of the corresponding row; using the KeyOnlyFilter of HBase, only the key names of column family i are read from the small file table;
B3) when the application layer reads data from the small file input stream, the small file input stream checks whether data is available in its buffer; if no data is available, the small file input stream is triggered to request new slice data from HBase;
B4) once the application layer has consumed the data in the current buffer, returning to step B2;
B5) when the small file input stream has located and processed the last slice of the small file, returning an end signal, which completes the reading of the whole small file.
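The read steps can be sketched as a standard input stream that fetches one slice at a time. The class name `SmallFileInputStream`, the dictionary standing in for the HBase row, and the `"<seq>_<length>"` key name layout are assumptions for illustration only.

```python
import io
from collections import deque

class SmallFileInputStream(io.RawIOBase):
    """Reads a sliced small file lazily, one slice per fetch."""

    def __init__(self, row):
        super().__init__()
        self._row = row
        # B2: only the key names are needed to plan the read order.
        names = sorted(row["i"], key=lambda q: int(q.split("_")[0]))
        self._pending = deque(names)
        self._buffer = b""  # B1: holds at most one slice

    def readable(self):
        return True

    def readinto(self, target):
        # B3: fetch the next slice only when the buffer is empty.
        while not self._buffer and self._pending:
            self._buffer = self._row["i"][self._pending.popleft()]
        if not self._buffer:
            return 0  # B5: no slices left, signal end of stream
        n = min(len(target), len(self._buffer))
        target[:n] = self._buffer[:n]
        self._buffer = self._buffer[n:]  # B4: consume the buffer
        return n

row = {
    "i": {"0000000001_2": b"ef", "0000000000_4": b"abcd"},
    "m": {"lock": "no"},
}
stream = SmallFileInputStream(row)
first = stream.read(3)
rest = stream.readall()
```

Since the class implements the standard raw stream interface, it can be wrapped in `io.BufferedReader` or passed to any code that expects a readable binary stream.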
6. The low-latency storage method for massive small files based on HBase as claimed in claim 1, characterized in that: the small file deletion operation relies on the row key of the small file to be deleted to locate it quickly in HBase, and then directly uses the HBase API to delete the corresponding row.
7. The low-latency storage method for massive small files based on HBase as claimed in any one of claims 1-6, characterized in that: a file storage abstraction layer is created, which receives the file size report from the application layer, constructs the corresponding URI according to a predefined threshold that distinguishes large files from small files and returns it to the application layer, then parses the URI when it receives from the application layer the file stream to be read or written together with the URI, and automatically directs the read/write operation on the file to the HDFS file system or to the small file table in HBase accordingly.
8. The low-latency storage method for massive small files based on HBase as claimed in claim 7, characterized in that: the threshold size is 10 megabytes.
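The routing performed by the file storage abstraction layer in claims 7 and 8 can be sketched as follows. The URI schemes and paths, the function names, the reading of 10 megabytes as 10 × 1024 × 1024 bytes, and the treatment of a file exactly at the threshold as large are all assumptions; the claims do not specify them.

```python
from urllib.parse import urlparse

THRESHOLD_BYTES = 10 * 1024 * 1024  # assumed meaning of "10 megabytes"

def build_uri(file_size, file_id, threshold=THRESHOLD_BYTES):
    """Return a URI that encodes which backend should store the file."""
    if file_size < threshold:
        # Hypothetical scheme: the path carries the HBase row key.
        return "hbase://smallfiles/{}".format(file_id)
    # Hypothetical HDFS location for large files.
    return "hdfs:///bigfiles/{}".format(file_id)

def resolve_backend(uri):
    """Parse a URI into (backend name, row key or HDFS path)."""
    parsed = urlparse(uri)
    if parsed.scheme == "hbase":
        return "hbase", parsed.path.lstrip("/")
    if parsed.scheme == "hdfs":
        return "hdfs", parsed.path
    raise ValueError("unsupported URI scheme: {!r}".format(parsed.scheme))

small_uri = build_uri(4096, "0f3a")
large_uri = build_uri(64 * 1024 * 1024, "video.bin")
```

The application only ever handles the returned URI and a stream, so the choice between HDFS and the HBase small file table stays hidden behind the abstraction layer.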

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310112130.XA CN103246700B (en) 2013-04-01 2013-04-01 Mass small file low latency storage method based on HBase

Publications (2)

Publication Number Publication Date
CN103246700A true CN103246700A (en) 2013-08-14
CN103246700B CN103246700B (en) 2016-08-10

Family

ID=48926220

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310112130.XA Active CN103246700B (en) 2013-04-01 2013-04-01 Mass small file low latency storage method based on HBase

Country Status (1)

Country Link
CN (1) CN103246700B (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110106806A1 (en) * 2009-11-02 2011-05-05 Stg Interactive Process for optimizing file storage systems
CN101866359A (en) * 2010-06-24 2010-10-20 北京航空航天大学 Small file storage and visit method in avicade file system
CN102662992A (en) * 2012-03-14 2012-09-12 北京搜狐新媒体信息技术有限公司 Method and device for storing and accessing massive small files
CN102902716A (en) * 2012-08-27 2013-01-30 苏州两江科技有限公司 Storage system based on Hadoop distributed computing platform

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
XUHUI LIU et al.: "Implementing WebGIS on Hadoop: A Case Study of Improving Small File I/O Performance on HDFS", 《CLUSTER COMPUTER AND WORKSHOPS, 2009》, 4 September 2009 (2009-09-04), pages 1 - 8 *
Zhao Yuelong (赵跃龙) et al.: "Research on a performance-optimized small file storage and access strategy" (一种性能优化的小文件存储访问策略的研究), 《计算机研究与发展》, vol. 49, no. 7, 31 December 2012 (2012-12-31), pages 1579 - 1585 *

Cited By (24)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103473365A (en) * 2013-09-25 2013-12-25 北京奇虎科技有限公司 File storage method and device based on HDFS (Hadoop Distributed File System) and distributed file system
CN103646073A (en) * 2013-12-11 2014-03-19 浪潮电子信息产业股份有限公司 Condition query optimizing method based on HBase table
US10210173B2 (en) 2014-06-27 2019-02-19 International Business Machines Corporation File storage processing in HDFS
CN104391910A (en) * 2014-11-17 2015-03-04 西安交通大学 HBase-based tax statistic report storage and calculation method
CN105988995B (en) * 2015-01-27 2019-05-24 杭州海康威视数字技术股份有限公司 A method of based on HFile batch load data
CN105988995A (en) * 2015-01-27 2016-10-05 杭州海康威视数字技术股份有限公司 HFile based data batch loading method
CN105094695B (en) * 2015-06-29 2018-09-04 浪潮(北京)电子信息产业有限公司 A kind of storage method and system
CN105094695A (en) * 2015-06-29 2015-11-25 浪潮(北京)电子信息产业有限公司 Storing method and system
CN105354323A (en) * 2015-11-16 2016-02-24 天津南大通用数据技术股份有限公司 Method and device for increasing precise inquiry speed of columnar storage database by using two-stage filtration
CN105426466A (en) * 2015-11-16 2016-03-23 天津南大通用数据技术股份有限公司 Method and apparatus for increasing accurate query speed of packing database
CN105677826A (en) * 2016-01-04 2016-06-15 博康智能网络科技股份有限公司 Resource management method for massive unstructured data
CN105956106B (en) * 2016-05-04 2019-12-13 北京思特奇信息技术股份有限公司 method and system for accessing big data based on memory database and Hbase
CN105956106A (en) * 2016-05-04 2016-09-21 北京思特奇信息技术股份有限公司 Method and system for accessing big data based on memory database and Hbase
CN106021491A (en) * 2016-05-20 2016-10-12 天津海量信息技术股份有限公司 Quasi real-time data storage method based on hdfs (Hadoop Distributed File System)
CN106469225A (en) * 2016-09-28 2017-03-01 厦门嵘拓物联科技有限公司 A kind of method that in intelligent workshop management, magnanimity manufaturing data accesses
CN106469225B (en) * 2016-09-28 2019-04-16 厦门嵘拓物联科技有限公司 It is a kind of intelligence workshop management in magnanimity manufaturing data access method
CN106528819A (en) * 2016-11-16 2017-03-22 北京集奥聚合科技有限公司 Method and system for reading and writing time series data by HBase
CN108932287A (en) * 2018-05-22 2018-12-04 广东技术师范学院 A kind of mass small documents wiring method based on Hadoop
CN110532425A (en) * 2019-08-19 2019-12-03 深圳市网心科技有限公司 Video data placement formula storage method, device, computer equipment and storage medium
CN110532425B (en) * 2019-08-19 2022-04-01 深圳市网心科技有限公司 Video data distributed storage method and device, computer equipment and storage medium
CN112256634A (en) * 2020-10-14 2021-01-22 杭州当虹科技股份有限公司 Low-memory large file analysis method based on http
CN112256634B (en) * 2020-10-14 2024-03-26 杭州当虹科技股份有限公司 Http-based low-memory large file analysis method
CN115658626A (en) * 2022-12-26 2023-01-31 成都数默科技有限公司 Distributed network small file storage management method
CN115658626B (en) * 2022-12-26 2023-03-07 成都数默科技有限公司 Distributed network small file storage management method

Also Published As

Publication number Publication date
CN103246700B (en) 2016-08-10

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant