CN103246700A

CN103246700A - Mass small file low latency storage method based on HBase

Info

Publication number: CN103246700A
Application number: CN201310112130XA
Authority: CN
Inventors: 魏超; 鄢小征; 栾江霞
Original assignee: Xiamen Meiya Pico Information Co Ltd
Current assignee: Xiamen Meiya Pico Information Co Ltd
Priority date: 2013-04-01
Filing date: 2013-04-01
Publication date: 2013-08-14
Anticipated expiration: 2033-04-01
Also published as: CN103246700B

Abstract

The invention provides an HBase-based low-latency storage method for massive numbers of small files. In a Hadoop and HBase environment, a small-file table comprising a row key and two column families is created to establish a storage environment suited to small files, and application flows for writing, appending to, and reading small files are provided. The method thereby achieves reasonable storage and low-latency reading and writing of massive numbers of small files and meets practical requirements.

Description

HBase-based low-latency storage method for massive numbers of small files

Technical field

The present invention relates to, and in particular refers to, an HBase-based low-latency storage method for massive numbers of small files.

Background technology

The Hadoop Distributed File System (HDFS), provided by Hadoop, is a low-cost distributed storage solution that has been widely adopted in recent years.

HDFS is designed primarily for large files, on the order of hundreds of megabytes to gigabytes, and is poorly suited to storing small files. The reason is that the NameNode, which serves as Hadoop's master metadata service, must keep the key information of every file in memory. It is estimated that 10 million files occupy 2 to 3 GB of memory, so larger file counts put heavy memory pressure on current mainstream server hardware, to the point where the problem cannot be solved simply by adding hardware.

In addition, HDFS is optimized for high-throughput reads and writes, at the cost of some latency. For example, after a file is written, the change is not immediately visible to other clients that are reading the file; they must wait until HDFS has synchronized. HDFS itself is therefore not suited to storing data with low-latency requirements.

Several methods already exist for addressing the unsuitability of HDFS for small files. Hadoop's official solutions include HAR (Hadoop Archives), which archives a batch of small files into one large file for storage. HAR has several problems: first, the URI (Uniform Resource Identifier) of each small file changes, which is unfriendly to application systems built on HDFS, since they must be modified to accommodate the change; second, the archive does not support compression; third, the contents of an archive cannot be modified, so adding or deleting a small file requires re-archiving.

Another solution uses the SequenceFile and MapFile formats provided by Hadoop, which first combine small files into one large file for storage. This also has problems: for example, the directory of small files cannot be listed simply and quickly, and the solution is mainly oriented toward Java and supports other languages poorly.

Among existing patents, "A Hadoop-based associated storage method for massive non-independent small files" (patent application no. 201110312671.8) and "A Hadoop-based associated storage method for massive classifiable small files" (patent application no. 201110312694.9) likewise combine small files into large files first. They impose requirements on the development of upper-layer application systems; in other words, they cannot be fully transparent to upper-layer applications. They also lack low latency.

Summary of the invention

The object of the invention is to overcome the above defects and provide an HBase-based low-latency storage method for massive small files that achieves reasonable storage and low latency for massive numbers of small files and is fully transparent to upper-layer application systems.

The object of the invention is achieved as follows: an HBase-based low-latency storage method for massive small files, characterized in that, after a small-file storage environment is established in a Hadoop and HBase environment, application steps for writing, appending to, and reading small files are provided; wherein establishing the small-file storage environment comprises:

a step of establishing the HBase small-file table, in which a small-file table is created in the HBase database; the small-file table is as follows:

the value of the row key is a unique, randomly generated string;

column family i stores, for a small file divided into N slices according to a predefined rule, the length information of each slice and the byte stream of the file content;

column family m records the lock information of the row, which is used during concurrent writes to determine whether the row is locked by another thread or process;

a step of pre-splitting the small-file table into regions (the split strategy uses hexadecimal string representation); and

a step of enabling the row-level Bloom filter on the small-file table.

In the above method, the value of the row key is obtained by taking a UUID value, computing the MD5 digest of that value, and then taking the hexadecimal string representation of the digest; each column name in column family i consists of a slice sequence number and the current slice length, and the column value stores the byte array of the slice content corresponding to that sequence number;

In the above method, the small-file write application comprises the steps:

A1) Initialize the small-file output stream: after a reference to the target small file is obtained, the output stream allocates an in-memory buffer internally, and the URI of the target small file is passed to the output stream; the URI contains the target row key to be written in the small-file table; the buffer is no smaller than the defined small-file slice size;

A2) Check the value of the column named lock in column family m of the small-file table; if no lock is present, the row can be operated on, and a lock is added;

A3) Read the byte stream from the source file into the buffer; once the buffer is full, its content forms a new slice; obtain the slice sequence number and the actual byte length of the slice, which together are assembled in advance into the column name under column family i of the HBase small-file table, and store the slice content in the corresponding column value;

A4) Connect to HBase and write the new complete slice to the small-file table; column family i of the small-file table is dynamically extended with a new column to store the new slice;

A5) Reset the buffer and return to step A2;

A6) After the entire source file has been read and before the small-file output stream is closed, assemble the remaining buffer content, which has not reached the defined slice size, into the last slice of this write, and write it to the same row of the HBase small-file table;

A7) Release the lock;

Preferably, the following steps may also be included after step A7:

A8) Using the row key under which the small file is stored, together with HBase's KeyOnlyFilter, fetch all column names under column family i of the corresponding row, and parse from the column names all slice sequence numbers that currently exist;

A9) Determine whether a lock value exists under column family m for the row key of the small file; if not, continue with the following steps;

A10) Sort to obtain the current maximum slice sequence number;

A11) Increment the maximum sequence number and begin writing slices from there; the rest is identical to the flow for writing a complete small file;

In the above method, the small-file read application comprises the steps:

B1) Initialize the small-file input stream: the input stream allocates a buffer internally and obtains access permission to the HBase small-file table; the buffer is no smaller than the defined maximum small-file slice size;

B2) Using the row key of the small file to be read, query all column information under column family i of the corresponding row; with HBase's KeyOnlyFilter, only the column names of column family i are read from the small-file table;

B3) When the application layer reads data from the small-file input stream, the input stream checks whether its buffer holds available data; if not, this triggers the input stream to request new slice data from HBase;

B4) When the application layer has consumed the data in the current buffer, return to step B2;

B5) When the small-file input stream has located and processed the last slice of the small file, it returns an end signal and the read of the whole small file is complete;

In the above method, the small-file delete operation relies on HBase to locate quickly the row key of the small file to be deleted, and then deletes the corresponding row directly through the HBase API;

In the above method, a file storage abstraction layer is created to receive file-size reports from the application layer, construct the corresponding URI according to a predefined threshold that distinguishes large files from small files, and return the URI to the application layer; when the application layer subsequently passes in the file input stream to be read or written together with the URI, the layer parses the URI and automatically directs the file's read/write operation to the HDFS file system or to the HBase small-file table accordingly;

In the above, the threshold size is 10 megabytes.

The beneficial effect of the invention is that, by establishing a storage environment suited to small files in a Hadoop and HBase environment and providing the accompanying application flows for writing, appending to, and reading small files, it achieves reasonable storage and low-latency reading and writing of massive numbers of small files and meets practical requirements.

Description of drawings

The specific structure of the invention is described in detail below with reference to the accompanying drawings.

Fig. 1 is a schematic diagram of the storage abstraction for large and small files presented to the upper-layer application in the method of the invention;

Fig. 2 is a schematic flow diagram of writing a small file in the method of the invention;

Fig. 3 is a schematic flow diagram of reading a small file in the method of the invention.

Embodiment

To describe the technical content, structural features, objects, and effects of the invention in detail, the following explanation is given with reference to embodiments and the accompanying drawings.

The invention provides an HBase-based low-latency storage method for massive numbers of small files. After a small-file storage environment is established in a Hadoop and HBase environment, the method provides applications for writing, appending to, and reading small files.

Building the small-file storage environment comprises the following steps:

One. Prepare the Hadoop and HBase environment

The method is implemented on Hadoop and HBase, so a Hadoop and HBase environment must be deployed first.

The method is built on HBase. HBase is a distributed, column-oriented database built on Hadoop. It was originally part of Hadoop and is now an independent top-level Apache project. HBase is not suited to storing files directly; it is designed for structured data, and it offers low latency. HBase data can be kept in many file systems, of which HDFS is the best choice.

Two. Establish the HBase small-file table

Create a small-file table in the HBase database. The table structure is designed as follows:

The value of the row key (key name: rowkey) is a unique, randomly generated string: take a UUID (Universally Unique Identifier) value, compute the MD5 digest of that value, and then take the hexadecimal string representation of the digest. The purpose of generating the row key this way is to take full advantage of HBase's existing table partitioning (region splitting) strategy, so that rows are distributed across different regions and region hotspots are avoided.
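The row key construction HEX(MD5(UUID)) can be sketched in a few lines of Python. This is only an illustration: the text does not say which UUID version is used or whether the digest is taken over the UUID's raw bytes or its string form, so the random (version 4) UUID and the raw-byte input below are assumptions, and `new_row_key` is a hypothetical name.

```python
import hashlib
import uuid


def new_row_key() -> str:
    """Return HEX(MD5(UUID)) as a 32-character lowercase hex string."""
    raw = uuid.uuid4()  # assumption: a random (version 4) UUID
    # Assumption: the digest is computed over the 16 raw UUID bytes.
    return hashlib.md5(raw.bytes).hexdigest()


key = new_row_key()
print(key, len(key))
```

Because an MD5 digest is close to uniformly distributed, the leading hex characters of such keys spread evenly over the key space, which is what allows the rows to land in different regions.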

The table has exactly two column families:

Column family i: stores the slice byte streams (file content) and the length information produced by slicing the small file.

Column family m: records the lock information of the row, which is used during concurrent writes to determine whether the row is locked by another thread or process.

A complete small file is represented in this table by one row. When written to the table, the file is cut into N slices according to a predefined rule, and each slice is given an increasing sequence number. The slice sequence number, the slice size, and the complete byte array of the slice are together represented by one column under column family i.

The slice sequence number and slice size are stored as the column name. The slice content (byte array) is stored as the column value.

The first two bytes of the column name are the slice sequence number. The sequence number is a short; the positive range of a short provides up to 32767 slice sequence numbers, which is a sufficient upper limit for small files. The increasing sequence number distinguishes the slices from one another and also gives the assembly order when the slices are recombined into the complete file on reading.

The remaining bytes of the column name store the actual length of the current slice. Because the slice size is stored in the column name, the size of a whole small file can be computed by reading only the column names from the table, without reading the slice values themselves, which avoids the memory cost of loading the complete byte arrays of all slices.
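The column-name layout can be illustrated with Python's `struct` module. The text fixes the first two bytes as a short sequence number but does not specify how the length is encoded in the remaining bytes; the sketch below assumes a 4-byte signed integer and big-endian byte order, and the function names are hypothetical.

```python
import struct

# Assumed layout: 2-byte signed short (sequence number) followed by a
# 4-byte signed int (slice length), both big-endian, 6 bytes in total.
_COLUMN_NAME = struct.Struct(">hi")


def encode_column_name(seq: int, length: int) -> bytes:
    """Pack a slice sequence number and slice length into a column name."""
    if not 0 <= seq <= 32767:
        raise ValueError("sequence number must fit the positive range of a short")
    return _COLUMN_NAME.pack(seq, length)


def decode_column_name(name: bytes):
    """Return (sequence number, slice length) parsed from a column name."""
    seq, length = _COLUMN_NAME.unpack(name)
    return seq, length


name = encode_column_name(3, 4096)
print(name.hex(), decode_column_name(name))
```

With big-endian encoding and non-negative sequence numbers, sorting the column names as raw bytes gives the same order as sorting by sequence number, which is convenient because HBase orders column names byte-wise.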

The column value stores the slice content (byte array): each small file is cut into multiple equal-sized slices, which are stored there. The slice size can be customized as needed, for example 4 KB, 128 KB, 512 KB, or 1 MB.

Column family m contains only one column, named lock, which provides the row lock information. The column exists while the small file is being written; when the write finishes, the lock is released and the column is deleted. Storing the row lock in a custom column under a column family is more flexible than HBase's native row lock mechanism.
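The lock column can be modelled with an in-memory dictionary standing in for one table row. This is a minimal sketch and not an HBase client: the row layout `{"i": {...}, "m": {...}}`, the `LockedError` exception, and the stored owner value are all assumptions made for illustration. The text does not say how the check and the write are made atomic, so a real deployment would need an atomic check-and-set on the lock column.

```python
class LockedError(Exception):
    """Raised when the row is already locked by another writer."""


def acquire_lock(row: dict, owner: str) -> None:
    """Add the 'lock' column under column family m, or fail if it exists."""
    family_m = row.setdefault("m", {})
    if "lock" in family_m:
        raise LockedError("row is locked by " + family_m["lock"])
    family_m["lock"] = owner


def release_lock(row: dict) -> None:
    """Delete the 'lock' column so that other writers may proceed."""
    row.get("m", {}).pop("lock", None)


row = {"i": {}, "m": {}}
acquire_lock(row, "writer-1")
try:
    acquire_lock(row, "writer-2")
except LockedError as err:
    print("second writer rejected:", err)
release_lock(row)
```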

Three. Pre-split the small-file table

The method pre-splits the small-file table using the hexadecimal-string split strategy (RegionSplitter.HexStringSplit). Because row keys of the small-file table are generated as HEX(MD5(UUID)), small files stored in HBase are distributed evenly across different regions for processing. Pre-splitting prevents any single region from becoming a hotspot, and with a reasonable pre-split, read and write requests are spread across the partitions, which improves concurrent read/write capacity and therefore concurrent small-file read/write performance.

As storage grows, existing partitions can be split further by hand as needed.
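The effect of pre-splitting on hexadecimal keys can be approximated as follows. This sketch does not call HBase and is not guaranteed to reproduce the exact boundaries that RegionSplitter.HexStringSplit produces; it simply divides an 8-hex-digit key space evenly, and both function names and the 8-digit width are assumptions for illustration.

```python
from bisect import bisect_right


def hex_split_points(num_regions: int, width: int = 8) -> list:
    """Return num_regions - 1 evenly spaced hex boundaries of the given width."""
    if num_regions < 2:
        raise ValueError("need at least two regions to have a split point")
    space = 16 ** width
    return [format(space * i // num_regions, "0%dx" % width)
            for i in range(1, num_regions)]


def region_index(row_key: str, split_points: list) -> int:
    """Index of the region whose key range contains row_key (lowercase hex)."""
    return bisect_right(split_points, row_key)


points = hex_split_points(4)
print(points)
print(region_index("9e107d9d372bb6826bd81d3542a419d6", points))
```

Since HEX(MD5(UUID)) row keys are spread roughly uniformly over the hex key space, each of the pre-created regions receives a similar share of rows.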

Four. Enable the Bloom filter on the small-file table

A Bloom filter is a hash-based fast lookup structure: it can say with certainty that an item is absent, and it can say that an item is possibly present. HBase supports enabling Bloom filters on a table. The method enables the row-level Bloom filter on the small-file table to ensure random-read performance.
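The "definite exclusion, possible inclusion" behaviour can be seen in a toy Bloom filter. This is a generic illustration of the data structure, not HBase's implementation; the bit count, the number of hash functions, and the use of salted MD5 as the hash are arbitrary choices made for the sketch.

```python
import hashlib


class BloomFilter:
    """A small Bloom filter over byte-string keys."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key: bytes):
        # Derive num_hashes bit positions by salting the key with an index.
        for i in range(self.num_hashes):
            digest = hashlib.md5(bytes([i]) + key).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key: bytes) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key: bytes) -> bool:
        """False means definitely absent; True means possibly present."""
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(key))


bloom = BloomFilter()
bloom.add(b"rowkey-a")
print(bloom.might_contain(b"rowkey-a"))
```

A lookup for a row key that the filter rules out can skip the underlying storage file entirely, which is why a row-level filter helps random reads by row key.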

After the small-file storage environment has been built as described above, read and write operations on massive numbers of small files can be carried out on HBase. To make these operations fully transparent to the upper-layer application system, a storage abstraction layer covering both large and small files is additionally provided on top of this environment, as follows:

Referring to Fig. 1, a file storage abstraction layer is provided for the upper-layer application system. Whether a file is ultimately stored in the HBase small-file table or in the HDFS file system is transparent to the upper-layer application, which only needs to deal with a uniform, general-purpose file storage interface. The steps are as follows:

1. The application layer reports the file size to the storage abstraction layer. The abstraction layer constructs the corresponding URI according to its predefined large/small file threshold and returns it to the application layer.

2. The application layer does not need to inspect the URI; it passes the input stream of the file to be stored, together with the URI, to the abstraction layer's storage method. The abstraction layer parses the URI internally and automatically stores the file in the HDFS file system or in the HBase small-file table.

For example, the URI of a file stored in HDFS begins with hdfs://, and the URI of a file stored in the HBase small-file table begins with hbase://.

In the above method, the file storage abstraction layer provided for the upper-layer application system determines the storage location automatically from the file size. A file larger than the set threshold (for example, 10 MB) is stored in HDFS, with a URI identified by hdfs://; a file smaller than or equal to the threshold is stored in HBase, with a URI identified by hbase://. The following describes how files that do not exceed the threshold (that is, small files) are stored in HBase. Small files are directed to HBase, and HBase itself has low latency, so the method also achieves low-latency small-file storage.
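The size-based routing can be sketched as two small functions. Only the hdfs:// and hbase:// prefixes, the 10 MB example threshold, and the presence of the row key in the HBase URI come from the text; the rest of the URI layout, the `name` parameter, and the function names are hypothetical.

```python
import hashlib
import uuid

THRESHOLD_BYTES = 10 * 1024 * 1024  # the 10 MB example threshold from the text


def build_uri(file_size: int, name: str) -> str:
    """Return an hdfs:// URI for large files, an hbase:// URI otherwise."""
    if file_size > THRESHOLD_BYTES:
        return "hdfs://" + name  # assumed layout: scheme plus a caller-chosen name
    # Small file: the URI carries the target row key of the small-file table.
    row_key = hashlib.md5(uuid.uuid4().bytes).hexdigest()
    return "hbase://" + row_key


def backend_for(uri: str) -> str:
    """Resolve which store a URI points at, as the abstraction layer would."""
    if uri.startswith("hdfs://"):
        return "hdfs"
    if uri.startswith("hbase://"):
        return "hbase"
    raise ValueError("unsupported URI scheme: " + uri)


print(backend_for(build_uri(4 * 1024, "note.txt")))
print(backend_for(build_uri(64 * 1024 * 1024, "video.bin")))
```

The application only ever holds the returned URI, so the choice of backend stays inside the abstraction layer.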

The invention also provides a standard set of small-file I/O interface methods: create, read, append, delete, get file size, and checks for whether a small file can be written, can be read, and exists. These inherit the low-latency read/write characteristics of HBase and provide a stream-based write interface with low memory overhead (the native HBase API does not support streamed writes). Specifically:

Small-file write application

A custom small-file output stream writes to the HBase small-file table, slicing the small file's byte stream and storing the slices in HBase. Writing can use the standard file stream form; the memory overhead equals the slice size, comparable to the buffer size of standard I/O.

Referring to Fig. 2, the logical flow for writing a small file is as follows:

1. Obtain a reference to the target small file and prepare to write.

2. Initialize the small-file output stream. The output stream allocates an in-memory buffer internally; the buffer size equals the defined small-file slice size. Pass the URI of the target small file (produced by the upper-layer file storage abstraction) to the output stream. The URI contains the target row key (rowkey) to be written in the small-file table.

3. Check the row lock; if the row can be written, add a new row lock, that is, the lock column in column family m of the small-file table.

4. Read the byte stream from the source file into the buffer. Once the buffer is full (that is, a new complete slice has formed), its content becomes a new slice, which is given a slice sequence number. The actual byte length of the slice is obtained and assembled with the sequence number into the column name under column family i of the HBase small-file table, and the slice content is stored in the corresponding column value.

5. Connect to HBase and write the new complete slice to the small-file table. The corresponding row of the small-file table is dynamically extended with a new column (in column family i) to store the new slice.

6. Reset the buffer and return to step 3.

7. After the entire source file has been read and before the input stream is closed, assemble the remaining buffer content, which has not reached the defined slice size, into the last slice of this write, and write it to the same row of the HBase small-file table.

8. Release the row lock.
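The write flow above can be sketched with an in-memory dictionary standing in for the small-file table. This is a simplified model, not an HBase client: the class name, the row layout, the 6-byte column-name encoding (2-byte sequence number plus an assumed 4-byte length), and the tiny slice size are all assumptions, and the lock is taken once on open and released on close rather than rechecked per slice.

```python
import io
import struct

SLICE_SIZE = 4  # tiny for demonstration; the text suggests sizes such as 4 KB to 1 MB


class SmallFileOutputStream:
    """Buffers written bytes and stores them slice by slice in one table row."""

    def __init__(self, table: dict, row_key: str, first_seq: int = 0):
        self.row = table.setdefault(row_key, {"i": {}, "m": {}})
        if "lock" in self.row["m"]:            # step 3: check the row lock
            raise RuntimeError("row is locked by another writer")
        self.row["m"]["lock"] = b"1"           # step 3: add the lock column
        self.buffer = bytearray()              # step 2: in-memory buffer
        self.seq = first_seq

    def write(self, data: bytes) -> None:
        self.buffer.extend(data)               # step 4: fill the buffer
        while len(self.buffer) >= SLICE_SIZE:
            self._store(bytes(self.buffer[:SLICE_SIZE]))
            del self.buffer[:SLICE_SIZE]       # step 6: reset the buffer

    def _store(self, chunk: bytes) -> None:
        # Steps 4 and 5: column name = sequence number + actual length.
        name = struct.pack(">hi", self.seq, len(chunk))
        self.row["i"][name] = chunk
        self.seq += 1

    def close(self) -> None:
        if self.buffer:                        # step 7: final, shorter slice
            self._store(bytes(self.buffer))
            self.buffer.clear()
        self.row["m"].pop("lock", None)        # step 8: release the row lock


table = {}
out = SmallFileOutputStream(table, "rowkey-1")
source = io.BytesIO(b"0123456789")
while True:
    block = source.read(3)
    if not block:
        break
    out.write(block)
out.close()
print([table["rowkey-1"]["i"][n] for n in sorted(table["rowkey-1"]["i"])])
```

Memory use stays bounded by the slice size plus one incoming block, regardless of how large the source is.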

Small-file append application

The method also supports appending to an existing small file, working around the restriction that an ordinary HBase column can only be stored and retrieved as a whole.

The logical flow for appending to a small file is as follows:

9. Using the row key under which the small file is stored, together with HBase's KeyOnlyFilter, fetch all column names under column family i of the corresponding row, and parse from the column names all slice sequence numbers that currently exist.

10. Determine whether the small file can be written. By design of the small-file table, the row lock is recorded under column family m. Whether the small file can be written is determined directly by checking whether the lock column exists under column family m for the row key of the small file;

11. Sort to obtain the current maximum slice sequence number.

12. Increment the maximum sequence number and begin writing slices from there; the rest is identical to the flow for writing a complete small file.
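The append steps can be sketched against the same kind of in-memory row. As before, this is an illustration under assumptions: the row layout, the 6-byte column-name encoding (2-byte sequence number plus an assumed 4-byte length), and the function names are hypothetical, and no real HBase filter is invoked; iterating over the dictionary keys stands in for a names-only scan.

```python
import struct


def next_slice_seq(row: dict) -> int:
    """Steps 9 to 11: parse the column names and return max sequence + 1."""
    if "lock" in row.get("m", {}):             # step 10: is the row writable?
        raise RuntimeError("row is locked; cannot append")
    seqs = [struct.unpack(">hi", name)[0] for name in row.get("i", {})]
    return max(seqs) + 1 if seqs else 0


def append(row: dict, data: bytes, slice_size: int = 4) -> None:
    """Step 12: write new slices starting after the current maximum."""
    seq = next_slice_seq(row)
    row.setdefault("m", {})["lock"] = b"1"
    try:
        for start in range(0, len(data), slice_size):
            chunk = data[start:start + slice_size]
            row.setdefault("i", {})[struct.pack(">hi", seq, len(chunk))] = chunk
            seq += 1
    finally:
        row["m"].pop("lock", None)


row = {"i": {struct.pack(">hi", 0, 4): b"0123",
             struct.pack(">hi", 1, 2): b"45"},
       "m": {}}
append(row, b"abcdef")
print(b"".join(row["i"][name] for name in sorted(row["i"])))
```

An append can leave a shorter slice in the middle of the file; reading still reconstructs the content correctly because each column name records the actual length of its slice.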

Small-file read application

Reading a small file means reading all columns (slices) of the corresponding row from the HBase small-file table and reassembling the slices, in order, into the complete small file. Reading can use the standard file stream form; the memory overhead equals the slice size, comparable to the buffer size of standard I/O.

Referring to Fig. 3, the logical flow for reading a small file is as follows:

1. Initialize the small-file input stream. The input stream allocates a buffer internally; the buffer size equals the defined maximum small-file slice size. Obtain access permission to the HBase small-file table.

2. Using the row key of the small file to be read, query all column information under column family i of the corresponding row.

With the KeyOnlyFilter provided by HBase, only the column names (the keys of the columns) are read from the small-file table, which reduces data traffic and avoids the cost of loading whole slice contents into memory. Construct the slice sequence of the small file from the data obtained and maintain a mapping from slice sequence number to column name, for subsequently reading the small file slice by slice.

3. When the upper-layer application attempts to read data from the small-file input stream, the input stream checks whether its buffer holds available data. If not, this triggers the input stream to request new slice data from HBase. At this point the previously obtained mapping from slice sequence number to column name is used to locate and read the corresponding slice content directly. The newly fetched slice is written into the buffer for the upper-layer application to consume.

4. When the upper-layer application has consumed the data in the current buffer, step 2 is triggered again.

5. When the input stream has located and processed the last slice of the small file, it returns an end signal and the read of the whole small file is complete.
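The read flow can be sketched with an input stream over the same in-memory model. This is an illustration only: the class name, the row layout, and the 6-byte column-name encoding (2-byte sequence number plus an assumed 4-byte length) are assumptions, and sorting the dictionary keys stands in for a names-only query. In this simplified version the column names are fetched once at construction and later reads only fetch the next slice.

```python
import struct


class SmallFileInputStream:
    """Serves reads from one buffered slice at a time."""

    def __init__(self, table: dict, row_key: str):
        self.cells = table[row_key]["i"]
        # Step 2: read only the column names and order them by sequence number.
        self.names = sorted(self.cells,
                            key=lambda name: struct.unpack(">hi", name)[0])
        self.next_index = 0
        self.buffer = b""                      # step 1: holds the current slice

    def read(self, size: int) -> bytes:
        """Return up to size bytes; b'' signals the end of the small file."""
        out = bytearray()
        while len(out) < size:
            if not self.buffer:                # step 3: buffer is exhausted
                if self.next_index == len(self.names):
                    break                      # step 5: last slice consumed
                self.buffer = self.cells[self.names[self.next_index]]
                self.next_index += 1
            take = size - len(out)
            out += self.buffer[:take]
            self.buffer = self.buffer[take:]
        return bytes(out)


table = {"rowkey-1": {"i": {struct.pack(">hi", 0, 4): b"0123",
                            struct.pack(">hi", 1, 4): b"4567",
                            struct.pack(">hi", 2, 2): b"89"},
                      "m": {}}}
stream = SmallFileInputStream(table, "rowkey-1")
print(stream.read(3), stream.read(100), stream.read(1))
```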

Before reading, the upper-layer application can obtain the file size, and the scheme does so without reading the complete file content. By design of the small-file table, when a small file is stored and sliced, the length of each slice is recorded in the name of the corresponding column under column family i.

With HBase's KeyOnlyFilter, only the names of all columns under column family i of the corresponding row are fetched, which yields the information for all slices. After the first two bytes of a column name are discarded, converting the remaining bytes gives the size of the corresponding slice. The sum of all slice sizes is the complete and exact size of the small file.
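The size computation reduces to decoding and summing the trailing bytes of each column name. The sketch below assumes, as an illustration, that the remaining bytes are a 4-byte big-endian signed integer; the text does not fix that encoding, and the function name is hypothetical.

```python
import struct


def file_size_from_column_names(column_names) -> int:
    """Sum the slice lengths stored after the 2-byte sequence number."""
    total = 0
    for name in column_names:
        (length,) = struct.unpack(">i", name[2:])  # drop the first two bytes
        total += length
    return total


names = [struct.pack(">hi", seq, length)
         for seq, length in [(0, 4096), (1, 4096), (2, 1808)]]
print(file_size_from_column_names(names))
```

No slice value is touched, so the cost is proportional to the number of slices rather than to the file size.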

Before step 1 above, the upper-layer application needs to determine whether the small file can be read. HBase supports concurrent reads, so the small-file table introduces no concurrent-read issues beyond those of HBase itself. The only thing that needs to be checked is whether the small file exists. The HBase API can be used to check directly whether the row key of the small file exists in the HBase small-file table; if it exists, the file can be read.

A complete small file is stored in the HBase small-file table as a row. One row represents one complete small file, so checking through the HBase API whether the row key exists in the small-file table determines whether the small file exists.

Small-file delete application

A small file is stored in the HBase small-file table as slices, and all slices of a single file are stored in the same row. Relying on HBase's fast lookup by rowkey, the corresponding row is deleted directly through the HBase API, which means the corresponding small file is deleted.

In summary, the method of the invention solves the two main problems described in the background: it achieves reasonable storage and low latency for massive numbers of small files, and it is fully transparent to upper-layer application systems.

Effects achieved in practice: first, a small-file storage environment is established in a Hadoop and HBase environment, and the accompanying application flows for writing, appending to, and reading small files are provided. Reasonable storage and low latency for massive numbers of small files are thereby achieved.

Further, a file storage abstraction layer is provided for the upper-layer application system, through which upper-layer applications write and read files, whether large or small. The method can store massive numbers of small files and solves the unsuitability of HDFS for small files; that is, growth in the number of files does not have to be supported by adding NameNode memory. Files are written and read through standard I/O streams, and through the abstraction layer the upper-layer application is unaware of the different handling of large and small files underneath, so the work of adapting upper-layer applications is reduced to a minimum. Moreover, because files are operated on through standard I/O streams rather than byte arrays, the demand on server memory is very low: writing or reading a 10 MB file, for example, does not require allocating 10 MB of memory at once to cache the file content, since the content is always handled as an I/O stream. In addition, small-file storage has low latency.

The above are only embodiments of the invention and do not limit the scope of its claims. Any equivalent structure or equivalent flow transformation made using the description and drawings of the invention, or any direct or indirect application in other related technical fields, is likewise included in the scope of patent protection of the invention.

Claims

1. An HBase-based low-latency storage method for massive small files, characterized in that: after a small-file storage environment is established in a Hadoop and HBase environment, application steps for writing, appending to, and reading small files are provided; wherein establishing the small-file storage environment comprises,

a step of enabling the row-level Bloom filter on the small-file table.

2. The HBase-based low-latency storage method for massive small files as claimed in claim 1, characterized in that: the value of the row key is obtained by taking a UUID value, computing the MD5 digest of that value, and then taking the hexadecimal string representation of the digest; each column name in column family i consists of a slice sequence number and the current slice length, and the column value stores the byte array of the slice content corresponding to that sequence number.

3. The HBase-based low-latency storage method for massive small files as claimed in claim 1, characterized in that: the small-file write application comprises the steps,

A4) Connect to HBase and write the new complete slice to the small-file table; column family i of the small-file table is dynamically extended with a new column to store the new slice;

A5) Reset the buffer and return to step A2;

A7) Release the lock.

4. The HBase-based low-latency storage method for massive small files as claimed in claim 3, characterized in that: the following steps may also be included after step A7,

A10) Sort to obtain the current maximum slice sequence number;

A11) Increment the maximum sequence number and begin writing slices from there; the rest is identical to the flow for writing a complete small file.

5. The HBase-based low-latency storage method for massive small files as claimed in claim 1, characterized in that: the small-file read application comprises the steps,

B5) When the small-file input stream has located and processed the last slice of the small file, it returns an end signal and the read of the whole small file is complete.

6. The HBase-based low-latency storage method for massive small files as claimed in claim 1, characterized in that: the small-file delete operation relies on HBase to locate quickly the row key of the small file to be deleted, and then deletes the corresponding row directly through the HBase API.

7. The HBase-based low-latency storage method for massive small files as claimed in any one of claims 1-6, characterized in that: a file storage abstraction layer is created to receive file-size reports from the application layer, construct the corresponding URI according to a predefined threshold that distinguishes large files from small files, and return the URI to the application layer; when the application layer subsequently passes in the file input stream to be read or written together with the URI, the layer parses the URI and automatically directs the file's read/write operation to the HDFS file system or to the HBase small-file table accordingly.

8. The HBase-based low-latency storage method for massive small files as claimed in claim 7, characterized in that: the threshold size is 10 megabytes.