CN103246700B - HBase-based low-latency storage method for massive numbers of small files



Publication number
CN103246700B
Authority
CN
China
Prior art keywords
small documents
row
hbase
file
write
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201310112130.XA
Other languages
Chinese (zh)
Other versions
CN103246700A (en)
Inventor
魏超
鄢小征
栾江霞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xiamen Meiya Pico Information Co Ltd
Original Assignee
Xiamen Meiya Pico Information Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xiamen Meiya Pico Information Co Ltd
Priority to CN201310112130.XA
Publication of CN103246700A
Application granted
Publication of CN103246700B
Legal status: Active


Abstract

The invention provides an HBase-based low-latency storage method for massive numbers of small files. In a Hadoop and HBase environment, it creates a small-file table consisting of one row key and two column families, thereby establishing a storage environment suited to small files, and provides the accompanying application flows for writing, appending to, and reading small files. This achieves sensible storage and low-latency reads and writes of massive numbers of small files and meets practical needs.

Description

HBase-based low-latency storage method for massive numbers of small files
Technical field
The present invention relates to a storage method, and in particular to an HBase-based low-latency storage method for massive numbers of small files.
Background
The Hadoop Distributed File System (HDFS) provided by Hadoop has been widely adopted in recent years as a low-cost distributed storage solution.
HDFS is designed primarily for large files, typically hundreds of megabytes to several gigabytes, and is poorly suited to storing small files. The reason is that the NameNode, which serves Hadoop's master metadata, must keep the key information of every file in memory. By one estimate, 10 million files occupy 2 to 3 GB of memory. The memory demand of even more files puts great pressure on current mainstream server hardware and may not be solvable even by adding hardware.
In addition, HDFS is optimized for high-throughput reads and writes, with the side effect of a certain latency. For example, after a file is written, other clients reading the file do not see the change immediately but only after HDFS has synchronized. HDFS itself is therefore unsuited to storing data with low-latency requirements.
Several methods already exist to address HDFS's unsuitability for small files. Hadoop's official solutions include HAR (Hadoop Archives), which archives a batch of small files into one large file for storage. HAR has several problems. First, the URI (Uniform Resource Identifier) of each small file changes, which is unfriendly to application systems built on HDFS, since they must be modified to adapt. Second, the archive does not support compression. Third, the contents of an archive cannot be modified, so adding a small file to an archive or deleting one from it requires re-archiving.
Other solutions are SequenceFile and MapFile, provided by Hadoop, which likewise first combine small files into one large file for storage. These also have problems: for example, the directory of small files cannot be listed simply and quickly, and the solutions are aimed mainly at Java, with poor support for other languages.
Among existing patents, "A Hadoop-based associated storage method for massive related small files" (patent application no. 201110312671.8) and "A Hadoop-based associated storage method for massive classifiable small files" (patent application no. 201110312694.9) also first combine small files into a large file. They impose requirements on the development of upper-layer application systems; in other words, they cannot be fully transparent to upper-layer applications. Nor do they offer low latency.
Summary of the invention
The object of the invention is to overcome the above drawbacks and to provide an HBase-based low-latency storage method for massive numbers of small files that stores them sensibly, with low latency, and is fully transparent to upper-layer application systems.
The object of the invention is achieved as follows: an HBase-based low-latency storage method for massive numbers of small files, characterized in that, after a small-file storage environment is set up in a Hadoop and HBase environment, the method provides small-file write, small-file append, and small-file read applications. Setting up the small-file storage environment includes:
A step of creating the HBase small-file table, in which one small-file table is created in the HBase database. The small-file table is as follows:
The value of the row key is a randomly generated unique string;
Column family i stores, for each of the N slices into which a small file is divided according to a predefined rule, the slice's length information and the byte stream of its file content;
Column family m records the lock information of the row, used during concurrent writes to determine whether the row is locked by another thread or process;
A step of pre-splitting the small-file table (the split strategy uses hexadecimal strings); and
A step of enabling the row-level Bloom filter on the small-file table.
In the above method, the row key value is obtained by taking a UUID value, computing its MD5 digest, and taking the hexadecimal string representation of the digest. Each key (column name) in column family i consists of the slice sequence number and the current slice length, and the value holds the byte array of the slice content for that sequence number;
In the above method, the small-file write application includes the steps:
A1) Initialize the small-file output stream. After obtaining a reference to the target small file, allocate a memory buffer inside the small-file output stream and pass the URI of the target small file to the output stream. The URI contains the target row key to be written in the small-file table. The buffer is no smaller than the defined small-file slice size;
A2) Check the value of the lock key in column family m of the small-file table; if there is none, the row is writable, and a lock is added;
A3) Read the byte stream from the source file into the buffer. When the buffer is full, its content forms a new slice, which is assigned a slice sequence number. The slice's actual byte length is obtained and, together with the sequence number, assembled into the key (column name) under column family i of the HBase small-file table; the slice content is stored in the corresponding value;
A4) Connect to HBase and write the new complete slice to the small-file table; column family i dynamically gains a new column to store the new slice;
A5) Reset the buffer and return to step A2;
A6) After all content of the source file has been read and before the small-file output stream is closed, assemble whatever content remains in the buffer below the defined slice size into the last slice of this write, and write it to the same row of the HBase small-file table;
A7) Release the lock;
Preferably, the following steps may also be included after step A7:
A8) Using the row key under which the small file is stored, together with HBase's KeyOnlyFilter, fetch all column names under column family i of the corresponding row, and parse from the column names all slice sequence numbers that currently exist;
A9) Determine whether the lock key exists under column family m for the small file's row key; if it does not, continue;
A10) Sort to obtain the current maximum slice sequence number;
A11) Increment the maximum sequence number and start writing slices; the remainder is the same as the complete small-file write flow;
In the above method, the small-file read application includes the steps:
B1) Initialize the small-file input stream, allocate a buffer inside it, and obtain access permission to the HBase small-file table. The buffer is no smaller than the defined maximum small-file slice size;
B2) Using the row key of the small file to be read, query all column information under column family i of the corresponding row, using HBase's KeyOnlyFilter to read only the keys of column family i from the small-file table;
B3) When the application layer reads data from the small-file input stream, the stream checks whether its buffer currently holds available data; if not, this triggers the small-file input stream to request new slice data from HBase;
B4) When the application layer has consumed the data in the current buffer, return to step B2;
B5) After the small-file input stream has located and processed the last slice of the complete small file, it returns an end-of-stream signal, completing the read of the whole small file;
In the above method, the small-file delete operation relies on HBase's fast lookup of the row key of the small file to be deleted, and then deletes the corresponding row directly through the HBase API;
In the above method, a file storage abstraction layer is set up. It receives the file size reported by the application layer, builds the corresponding URI according to a predefined large/small file threshold, and returns the URI to the application layer. When it later receives from the application layer the file input stream to be read or written together with the URI, it parses the URI and automatically directs the read or write operation to the HDFS file system or to the HBase small-file table;
In the above, the threshold is 10 megabytes.
The beneficial effect of the invention is that, by setting up a storage environment suited to small files in a Hadoop and HBase environment and providing the accompanying small-file write, append, and read application flows, it achieves sensible storage and low-latency reads and writes of massive numbers of small files and meets practical needs.
Brief description of the drawings
The specific structure of the invention is described in detail below with reference to the drawings.
Fig. 1 is a diagram of the large/small file storage abstraction for upper-layer applications in the method of the invention;
Fig. 2 is a flowchart of writing a small file in the method of the invention;
Fig. 3 is a flowchart of reading a small file in the method of the invention.
Detailed description
To describe the technical content, structural features, objects, and effects of the invention in detail, an embodiment is explained below with reference to the drawings.
The invention provides an HBase-based low-latency storage method for massive numbers of small files. After a small-file storage environment is set up in a Hadoop and HBase environment, it provides small-file write, small-file append, and small-file read applications.
Building the small-file storage environment includes the following steps:
I. Prepare the Hadoop and HBase environment
The method is implemented on top of Hadoop and HBase, so a Hadoop and HBase environment must be deployed first.
The method is built on HBase. HBase is a distributed, column-oriented database built on Hadoop; it was once part of Hadoop and is now a standalone Apache top-level project. HBase is not suited to storing files directly: its design target is structured data, and it offers low latency. HBase data can be kept on many file systems, of which HDFS is the best choice.
II. Create the HBase small-file table
Create one small-file table in the HBase database, with the following structure:
The value of the row key (key name: rowkey) is a randomly generated, unique string: take a UUID (Universally Unique Identifier) value, compute its MD5 digest, and then take the hexadecimal string representation of the digest. The purpose of generating the row key this way is to make full use of HBase's existing region-splitting strategy so that rows are spread across different regions, avoiding region hotspots.
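The HEX(MD5(UUID)) row-key rule described above can be sketched in a few lines of Python. This is only an illustration: the function name new_row_key is hypothetical, and hashing the UUID's raw 16 bytes (rather than its text form) is an assumption, since the source does not say which representation is digested.

```python
import hashlib
import uuid


def new_row_key() -> str:
    """Return HEX(MD5(UUID)) as a 32-character lowercase hex string."""
    raw = uuid.uuid4().bytes          # assumption: digest the raw UUID bytes
    return hashlib.md5(raw).hexdigest()


key = new_row_key()
other = new_row_key()
```

Because the digest is effectively uniform over the hexadecimal key space, consecutive keys land in unrelated parts of the key range, which is what lets pre-split regions share the load.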
The table has exactly two column families:
Column family i: stores the slice byte streams (file content) produced by slicing the small file, together with the slice length information.
Column family m: records the lock information of the row, used during concurrent writes to determine whether the row is locked by another thread or process.
A complete small file is represented by a single row in the table. When written to the table, the file is cut into N slices according to a predefined rule, and each slice is assigned an incrementing slice sequence number. The slice sequence number, the slice size, and the complete slice byte array are together represented by one column under column family i.
The slice sequence number and slice size are stored as the column name. The slice content (byte array) is stored as the column value.
The first two bytes of the column name hold the slice sequence number. The sequence number is a short; the positive range of a short allows at most 32767 slice sequence numbers, which is a sufficient upper limit for the slices of a small file. The incrementing sequence number distinguishes the slices and also gives the order in which slices are reassembled into the complete file when the small file is read.
The remaining bytes of the column name store the actual length of the current slice, that is, the size of the slice held in the column value. When computing the size of the whole small file, only the column names need to be read from the table, not the actual slice values, which avoids the memory cost of loading the complete byte array of every slice.
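As a rough sketch of this column-name layout, the Python below packs the sequence number into the first two bytes and the slice length into the bytes that follow. The function names are hypothetical, and encoding the length as a 4-byte big-endian integer is an assumption: the source only says that the remaining bytes hold the length.

```python
import struct
from typing import Tuple

# 2-byte signed short (sequence number) followed by an assumed
# 4-byte signed int (slice length), both big-endian, no padding.
_COLUMN_NAME = struct.Struct(">hi")


def encode_column_name(seq: int, length: int) -> bytes:
    """Build the column name for one slice."""
    return _COLUMN_NAME.pack(seq, length)


def decode_column_name(name: bytes) -> Tuple[int, int]:
    """Return (sequence number, slice length) from a column name."""
    return _COLUMN_NAME.unpack(name)


name = encode_column_name(3, 4096)
seq, length = decode_column_name(name)
```

Big-endian encoding is chosen here so that, for non-negative sequence numbers, the byte-wise order of column names matches the numeric order of the slices.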
The column value stores the slice content (byte array). Each small file is cut into multiple slices of equal size, which are saved there. The slice size can be customized as needed, for example 4 KB, 128 KB, 512 KB, or 1 MB.
Column family m contains only one column, named lock, which provides the row-lock information. The column exists while the small file is being written; when the write ends the lock is released and the column is deleted. Storing the row-lock information in a column of a custom column family is more flexible than HBase's native row-lock mechanism.
III. Pre-split the small-file table
The method applies the hexadecimal-string pre-splitting strategy (RegionSplitter.HexStringSplit) to the small-file table. Because row keys of the small-file table are generated as HEX(MD5(UUID)), small files stored in HBase are spread evenly across different regions. Pre-splitting prevents any single region from overheating; with sensible pre-splitting, read and write requests are distributed effectively over different regions, which raises concurrent read/write capacity and therefore small-file concurrent read/write performance.
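To make the idea of hexadecimal pre-splitting concrete, the sketch below divides a fixed-width hexadecimal key space into equal ranges and returns the boundary keys. It is an approximation for illustration only, not HBase's RegionSplitter.HexStringSplit implementation; the function name and the default width of 8 characters are assumptions.

```python
from typing import List


def hex_split_points(num_regions: int, width: int = 8) -> List[str]:
    """Return the num_regions - 1 boundary keys that divide the
    hexadecimal key space of the given width into equal ranges."""
    top = 16 ** width
    return [
        format(top * i // num_regions, "0{}x".format(width))
        for i in range(1, num_regions)
    ]


points = hex_split_points(4)
```

A row key such as HEX(MD5(UUID)) falls into the range whose boundaries bracket its leading characters, so uniformly distributed keys fill the ranges evenly.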
As the stored data grows, existing regions can be split manually as needed.
IV. Enable the Bloom filter on the small-file table
A Bloom filter is a hash-based fast lookup method: it can rule an item out with certainty, and can report that an item is probably present. HBase supports enabling Bloom filters on a table. The method enables the row-level Bloom filter on the small-file table to preserve random-read performance.
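The "definite exclusion, probable inclusion" behaviour can be seen in a minimal Bloom filter sketch. This is a generic teaching example with hypothetical names and parameters; it is not how HBase implements its Bloom filters.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter: no false negatives, possible false positives."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, item: bytes):
        # Derive num_hashes bit positions by salting MD5 with an index.
        for i in range(self.num_hashes):
            digest = hashlib.md5(bytes([i]) + item).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: bytes) -> None:
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, item: bytes) -> bool:
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[p // 8] & (1 << (p % 8))
                   for p in self._positions(item))


empty = BloomFilter()
bf = BloomFilter()
bf.add(b"row-1")
```

For random reads by row key, a negative answer lets a store file be skipped without touching it, which is the benefit the method relies on.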
Once the small-file storage environment has been built, HBase-based read and write operations on massive numbers of small files can be carried out. To make these operations fully transparent to upper-layer application systems, large and small file storage is additionally placed behind an abstraction layer on top of the environment above. Specifically:
See Fig. 1. A file storage abstraction layer is provided for the upper-layer application system. Whether a file is ultimately stored in the HDFS file system or in the HBase small-file table is transparent to the upper-layer application, which only needs to deal with a uniform, general-purpose file storage interface. The steps are as follows:
1. The application layer informs the storage abstraction layer of the file size. The abstraction layer builds the corresponding URI according to its predefined large/small file threshold and returns it to the application layer.
2. The application layer need not concern itself with the content of the URI; it passes the file input stream to be stored, together with the URI, to the abstract storage method. The abstraction layer parses the URI and automatically stores the file in the HDFS file system or in the HBase small-file table.
For example, the URI of a file stored in HDFS begins with hdfs://, and the URI of a file stored in the HBase small-file table begins with hbase://.
In the scheme above, the file storage abstraction layer provided for the upper-layer application system chooses the storage location automatically from the file size. Files larger than the configured threshold (for example 10 MB) are stored in HDFS, with a URI marked hdfs://; files less than or equal to the threshold are stored in HBase, with a URI marked hbase://. The following describes how files that do not exceed the threshold (small files) are stored in HBase. Small files are directed to HBase, and HBase itself offers low latency, so the method also achieves low-latency storage of small files.
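The routing rule of the abstraction layer can be sketched as follows. The function names and the layout of the URI after the scheme are hypothetical; only the hdfs:// and hbase:// prefixes and the 10 MB example threshold come from the text.

```python
SMALL_FILE_THRESHOLD = 10 * 1024 * 1024  # 10 MB, the example threshold


def build_uri(file_size: int, name: str) -> str:
    """Choose a storage backend from the reported file size."""
    if file_size > SMALL_FILE_THRESHOLD:
        return "hdfs://" + name    # large file: HDFS
    return "hbase://" + name       # small file: HBase small-file table


def backend_for(uri: str) -> str:
    """Resolve the backend a URI points at when a read or write arrives."""
    if uri.startswith("hdfs://"):
        return "hdfs"
    if uri.startswith("hbase://"):
        return "hbase"
    raise ValueError("unsupported URI scheme: " + uri)


small = build_uri(4096, "0123abcd")
large = build_uri(SMALL_FILE_THRESHOLD + 1, "videos/a.bin")
```

Because the application only ever passes the URI back, the storage decision stays inside the abstraction layer.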
The invention also provides a standard set of small-file I/O interface methods: create, read, append, delete, get file size, can-write, can-read, and exists. They inherit HBase's low-latency read/write behaviour and provide a stream-based write interface with low memory overhead (the native HBase API does not support streamed writes). Specifically:
Small-file write application
Writing to the HBase small-file table is the process of slicing the small file's byte stream and storing the slices in HBase through a custom small-file output stream. The write can use the ordinary file-stream form, and the memory overhead equals the slice size, comparable to the buffer size of standard I/O.
See Fig. 2. The logical flow of a small-file write is as follows:
1. Obtain a reference to the target small file and prepare to write.
2. Initialize the small-file output stream. Allocate a memory buffer inside the output stream, with the same size as the defined small-file slice size. Pass the URI of the target small file (produced by the upper-layer file storage abstraction) to the output stream. The URI contains the target row key (rowkey) to be written in the small-file table.
3. Check the row lock; if the row is writable, add a row lock, that is, the lock column in column family m of the small-file table.
4. Read the byte stream from the source file into the buffer. When the buffer is full (that is, a new complete slice has formed), its content becomes a new slice and is assigned a slice sequence number. The slice's actual byte length is obtained and, together with the sequence number, assembled into the column name (key) under column family i of the HBase small-file table; the slice content is stored in the corresponding column value.
5. Connect to HBase and write the new complete slice to the small-file table. The corresponding row dynamically gains a new column (in column family i) to store the new slice.
6. Reset the buffer and return to step 3.
7. After all content of the source file has been read and before the input stream is closed, assemble whatever content remains in the buffer below the defined slice size into the last slice of this write, and write it to the same row of the HBase small-file table.
8. Release the row lock.
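The write flow above can be illustrated with an in-memory stand-in for the table. Everything here is a sketch under stated assumptions: the nested dict replaces HBase, the function name is hypothetical, the lock is taken once for the whole write, and the column name is assumed to be a 2-byte sequence number followed by a 4-byte length.

```python
import io
import struct


def write_small_file(table, row_key, source, slice_size=4, start_seq=0):
    """Slice `source` into the row `row_key`; return the next free sequence number."""
    row = table.setdefault(row_key, {"i": {}, "m": {}})
    if "lock" in row["m"]:
        raise RuntimeError("row is locked by another writer")
    row["m"]["lock"] = b"1"                      # add the row lock
    try:
        seq = start_seq
        while True:
            chunk = source.read(slice_size)      # fill the buffer
            if not chunk:
                break
            # Column name = sequence number + actual slice length.
            name = struct.pack(">hi", seq, len(chunk))
            row["i"][name] = chunk               # one new column per slice
            seq += 1
    finally:
        del row["m"]["lock"]                     # release the row lock
    return seq


table = {}
count = write_small_file(table, "row-1", io.BytesIO(b"abcdefghij"))
```

Only one slice is held in memory at a time. Note that the check-then-set on the lock column in this sketch is not atomic; the source does not describe how atomicity is obtained against a real HBase cluster.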
Small-file append application
The method also supports appending to an existing small file, working around the limitation that an ordinary HBase column can only be stored and retrieved as a whole.
The logical flow for appending to a small file is as follows:
9. Using the row key under which the small file is stored, together with HBase's KeyOnlyFilter, fetch all column names under column family i of the corresponding row, and parse from the column names all slice sequence numbers that currently exist.
10. Determine whether the small file can be written. According to the small-file table structure, row locks are recorded under column family m of the HBase small-file table. Whether the small file can be written is determined directly by checking whether a row lock exists under column family m for the small file's row key;
11. Sort to obtain the current maximum slice sequence number.
12. Increment the maximum sequence number and start writing slices; the remainder is the same as the complete small-file write flow.
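The two checks that precede an append, finding the next sequence number from the column names alone and testing the lock column, might look like this. The helper names are hypothetical, and the column name is assumed to begin with a 2-byte big-endian sequence number.

```python
import struct


def next_sequence(column_names) -> int:
    """Return the sequence number the next appended slice should use."""
    seqs = [struct.unpack(">h", name[:2])[0] for name in column_names]
    return max(seqs) + 1 if seqs else 0


def can_write(row) -> bool:
    """A row is writable when column family m holds no lock column."""
    return "lock" not in row.get("m", {})


names = [struct.pack(">hi", s, 4) for s in (0, 2, 1)]
nxt = next_sequence(names)
```

Only column names are inspected, so no slice content has to be loaded to decide where the append starts.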
Small-file read application
Reading a small file is the process of reading all columns (slices) of the corresponding row from the HBase small-file table and assembling the slices, in order, into the complete small file. The read can use the ordinary file-stream form, and the memory overhead equals the slice size, comparable to the buffer size of standard I/O.
See Fig. 3. The logical flow of a small-file read is as follows:
1. Initialize the small-file input stream. Allocate a buffer inside the input stream, with the same size as the defined maximum small-file slice size. Obtain access permission to the HBase small-file table.
2. Using the row key of the small file to be read, query all column information under column family i of the corresponding row.
Use the KeyOnlyFilter provided by HBase to read only the column names (column keys) from the small-file table, which reduces data traffic and avoids the cost of loading whole slice contents into memory. From the data obtained, construct the slice sequence of the small file and maintain a mapping from slice sequence number to column name, for subsequently reading the small file slice by slice.
3. When the upper-layer application tries to read data from the small-file input stream, the stream checks whether its buffer holds available data; if not, this triggers the input stream to request new slice data from HBase. Using the previously obtained mapping from slice sequence number to column name, the corresponding slice content can be located and read directly. The new slice is loaded into the buffer for the upper-layer application to consume.
4. When the upper-layer application has consumed the data in the current buffer, step 2 is triggered again.
5. After the input stream has located and processed the last slice of the complete small file, it returns an end-of-stream signal, completing the read of the whole small file.
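A lazy, slice-at-a-time reader in the spirit of the flow above can be sketched against the same kind of in-memory stand-in. The class name and the dict-based table are hypothetical, and the column name is assumed to start with a 2-byte big-endian sequence number.

```python
import struct


class SmallFileReader:
    """Streams a row's slices in sequence order, one slice in memory at a time."""

    def __init__(self, table, row_key):
        self._cells = table[row_key]["i"]
        # Stand-in for a key-only scan: collect and order column names only.
        self._names = sorted(
            self._cells, key=lambda n: struct.unpack(">h", n[:2])[0]
        )
        self._next = 0
        self._buffer = b""

    def read(self, n: int) -> bytes:
        out = bytearray()
        while len(out) < n:
            if not self._buffer:
                if self._next >= len(self._names):
                    break                        # last slice consumed
                # Buffer is empty: fetch the next slice by column name.
                self._buffer = self._cells[self._names[self._next]]
                self._next += 1
            take = n - len(out)
            out += self._buffer[:take]
            self._buffer = self._buffer[take:]
        return bytes(out)                        # b"" signals end of file


table = {"r": {"i": {struct.pack(">hi", 1, 2): b"cd",
                     struct.pack(">hi", 0, 2): b"ab",
                     struct.pack(">hi", 2, 1): b"e"}, "m": {}}}
reader = SmallFileReader(table, "r")
first = reader.read(3)
rest = reader.read(10)
end = reader.read(1)
```

The reader never holds more than one slice, which mirrors the claim that memory overhead equals the slice size.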
Before reading, the upper-layer application can obtain the file size, and the scheme does so without reading the complete file content. According to the table structure used for small-file storage, when a small file is stored and sliced, the length of each slice is recorded in the name of its column under column family i.
Using HBase's KeyOnlyFilter, only the names of all columns under column family i of the corresponding row are fetched, which yields the information for all slices. Discarding the first two bytes of each column name and converting the remaining bytes gives the size of the corresponding slice. The sum of all slice sizes is the complete, exact size of the small file.
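The size calculation from column names alone can be sketched as below. The function name is hypothetical, and treating the bytes after the first two as a 4-byte big-endian integer is an assumption about the encoding.

```python
import struct


def file_size(column_names) -> int:
    """Sum the slice lengths encoded in the column names."""
    total = 0
    for name in column_names:
        # Skip the 2-byte sequence number; the rest is the slice length.
        (length,) = struct.unpack(">i", name[2:])
        total += length
    return total


names = [struct.pack(">hi", 0, 4),
         struct.pack(">hi", 1, 4),
         struct.pack(">hi", 2, 2)]
size = file_size(names)
```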
Before step 1 above, the upper-layer application needs to determine whether the small file can be read. HBase supports concurrent reads, so the small-file table introduces no concurrency problem beyond HBase's own. The only thing that needs to be determined is whether the small file exists.
A complete small file is stored as one row in the HBase small-file table, and one row represents one complete small file. Whether the small file exists can therefore be determined directly through the HBase API by checking whether its row key exists in the small-file table; if it exists, the file can be read.
Small-file delete application
A small file is stored in the HBase small-file table in the form of slices, and all slices of a single file are stored in the same row. Relying on HBase's fast lookup by rowkey, the corresponding row is deleted directly through the HBase API, which means the corresponding small file has been deleted.
In summary, the method of the invention solves the two major problems described in the background: it stores massive numbers of small files sensibly and with low latency, and it is fully transparent to upper-layer application systems.
The practical effects achieved are as follows. First, a small-file storage environment is set up in a Hadoop and HBase environment, and the accompanying small-file write, append, and read application flows are provided, achieving sensible storage and low latency for massive numbers of small files.
Second, a file storage abstraction layer is provided for the upper-layer application system, through which the upper-layer application writes and reads files, whether large or small. The method can store massive numbers of small files and solves HDFS's unsuitability for small files; that is, growth in the number of files no longer depends on adding NameNode memory. Files are written and read through standard I/O streams, and through the abstraction layer the upper-layer application does not perceive the different handling of large and small files underneath, so the rework required in upper-layer applications is reduced to a minimum. Moreover, because file operations go through standard I/O streams rather than byte arrays, the demand on server memory is low: writing or reading a 10 MB file, for example, does not require allocating 10 MB of memory at once to cache the file content, since all content is handled as an I/O stream. In addition, small-file storage offers low latency.
The foregoing is only an embodiment of the invention and does not limit the scope of its claims. Any equivalent structure or equivalent process transformation made using the description and drawings of the invention, or any direct or indirect use in other related technical fields, is likewise included within the scope of patent protection of the invention.

Claims (6)

1. An HBase-based low-latency storage method for massive numbers of small files, characterized in that: after a small-file storage environment is set up in a Hadoop and HBase environment and large and small file storage is placed behind an abstraction layer, the method provides a small-file write application, a small-file append application, and a small-file read application; wherein setting up the small-file storage environment includes:
Creating the HBase small-file table, in which one small-file table is created in the HBase database; the small-file table is as follows:
The value of the row key is a string;
The value of the row key is obtained by taking a UUID value, computing its MD5 digest, and taking the hexadecimal string representation of the digest;
Column family i stores, for each of the N slices into which a small file is divided according to a predefined rule, the slice's length information and the byte stream of its file content;
Each key in column family i consists of the slice sequence number and the current slice length, and the value holds the byte array of the slice content for that sequence number;
Column family m records the lock information of the row, used during concurrent writes to determine whether the row is locked by another thread or process;
Placing large and small file storage behind an abstraction layer includes:
Setting up a file storage abstraction layer that receives the file size reported by the application layer, builds the corresponding URI according to a predefined large/small file threshold, and returns the URI to the application layer; and that, on later receiving from the application layer the file input stream to be read or written together with the URI, parses the URI and automatically directs the read or write operation to the HDFS file system or to the HBase small-file table;
Pre-splitting the small-file table; and
Enabling the row-level Bloom filter on the small-file table.
2. The HBase-based low-latency storage method for massive numbers of small files as claimed in claim 1, characterized in that the small-file write application comprises the steps of:
A1), initializing a small-file output stream; after a reference to the target small file is obtained, allocating a memory buffer inside the small-file output stream and passing the URI of the target small file to the output stream; the URI contains the target row key to be written in the small-file table; the memory buffer is no smaller than the defined small-file slice size;
A2), checking the key value stored under the key name "lock" in column family m of the small-file table; if there is none, the row is operable and a new lock is added;
A3), reading the byte stream from the source file into the buffer; once the buffer is full, forming a new slice from the buffer's current content, obtaining the slice sequence number and the actual byte length of the slice, pre-assembling the two into the key name under column family i of the HBase small-file table, and storing the slice content in the corresponding key value;
A4), connecting to HBase and outputting the whole new slice to the small-file table, column family i of the small-file table being dynamically extended with a new column to store the new slice;
A5), resetting the buffer and returning to step A2;
A6), after all content of the source file has been read and before the small-file output stream is closed, assembling the content remaining in the buffer that has not reached the defined slice-size threshold into the last slice of this write, and outputting it to the same row of the HBase small-file table;
A7), releasing the lock.
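The write flow of steps A1 to A7 can be sketched in Python against an in-memory stand-in for the small-file table. This is an illustration under stated assumptions, not the patented implementation: the table is a plain dict rather than an HBase table, the key-name layout `seq_length` is assumed, and `write_small_file` is a hypothetical name. On a real cluster the lock check and the lock insert in step A2 would also need to be atomic.

```python
import io


def write_small_file(table, row_key, source, slice_size):
    """Split a byte stream into slices and store them in one row.

    `table` is a dict standing in for the small-file table:
    table[row_key] = {"i": {column name: slice bytes}, "m": {"lock": ...}}.
    Returns the number of slices written.
    """
    row = table.setdefault(row_key, {"i": {}, "m": {}})
    # A2: refuse to write if another writer holds the row lock.
    if row["m"].get("lock"):
        raise RuntimeError("row is locked by another writer")
    row["m"]["lock"] = True
    try:
        seq = 0
        while True:
            # A3: fill the buffer; a short final read is the last slice (A6).
            chunk = source.read(slice_size)
            if not chunk:
                break
            # A4: one new column per slice, named by sequence number and length.
            row["i"][f"{seq:08d}_{len(chunk)}"] = chunk
            seq += 1  # A5: buffer reset, continue with the next slice
        return seq
    finally:
        # A7: release the lock even if the write failed part-way.
        row["m"].pop("lock", None)
```

For example, `write_small_file({}, "k", io.BytesIO(b"abcdefghij"), 4)` stores three slices of 4, 4 and 2 bytes. The sketch relies on `io.BytesIO.read` returning full slices until the end of the data; a stream that can return short reads would need an explicit fill loop.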
3. The HBase-based low-latency storage method for massive numbers of small files as claimed in claim 2, characterized in that the following steps are further included after step A7):
A8), using the row key under which the small file is stored, together with HBase's KeyOnlyFilter, fetching all column names under column family i of the corresponding row, and parsing from the column names all slice sequence numbers that exist in the current state;
A9), determining whether the key value of the lock under column family m for the small file's row key is yes or no, and continuing with the next step if it is no;
A10), sorting to obtain the current maximum slice sequence number;
A11), starting slice writes from the incremented maximum sequence number, the subsequent steps being identical to the complete small-file write flow.
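The append flow of steps A8 to A11 amounts to recovering the highest existing slice sequence number from the column names and continuing from the next one. The Python sketch below shows this on an in-memory row; the dict layout, the `seq_length` key-name format, and the names `next_slice_seq` and `append_slices` are assumptions made for illustration.

```python
def next_slice_seq(qualifiers):
    """A8 + A10: parse sequence numbers from column names, return max + 1."""
    seqs = sorted(int(q.split("_")[0]) for q in qualifiers)
    return seqs[-1] + 1 if seqs else 0


def append_slices(row, data, slice_size):
    """Append `data` to an existing row as further slices.

    `row` is a dict standing in for one HBase row:
    {"i": {column name: slice bytes}, "m": {"lock": ...}}.
    """
    # A9: only proceed when the row is not locked.
    if row["m"].get("lock"):
        raise RuntimeError("row is locked by another writer")
    row["m"]["lock"] = True
    try:
        # A11: continue numbering after the current maximum.
        seq = next_slice_seq(row["i"].keys())
        for offset in range(0, len(data), slice_size):
            chunk = data[offset:offset + slice_size]
            row["i"][f"{seq:08d}_{len(chunk)}"] = chunk
            seq += 1
    finally:
        row["m"].pop("lock", None)
```

Because each column name carries its own slice length, a short final slice from an earlier write does not have to be merged with the appended data; a reader simply follows the sequence numbers.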
4. The HBase-based low-latency storage method for massive numbers of small files as claimed in claim 1, characterized in that the small-file read application comprises the steps of:
B1), initializing a small-file input stream, allocating a buffer inside the small-file input stream, and obtaining read/write permission on the HBase small-file table; the buffer is no smaller than the defined maximum small-file slice size;
B2), according to the row key of the small file to be read in the small-file table, querying all column information under column family i of the corresponding row; using HBase's KeyOnlyFilter, only the key names of column family i are read from the small-file table;
B3), when the application layer reads data from the small-file input stream, the small-file input stream checks whether its buffer holds available data and, if none is available, is triggered to request new slice data from HBase;
B4), after the application layer has consumed the data in the current buffer, returning to step B2;
B5), after the small-file input stream has located and processed the last slice of the complete small file, returning an end signal, which completes the reading of the whole small file.
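The read flow of steps B1 to B5 can be sketched as a small buffered stream that fetches one slice at a time in sequence order. As before, this is an illustration on an in-memory row rather than a real HBase client: the dict layout, the `seq_length` key-name format and the class name `SmallFileInputStream` are assumptions.

```python
class SmallFileInputStream:
    """Reads a sliced small file back in slice-sequence order."""

    def __init__(self, row):
        # B1: empty internal buffer.
        self._row = row
        self._buffer = b""
        # B2: only the column names are needed to plan the read order.
        self._pending = sorted(row["i"], key=lambda q: int(q.split("_")[0]))

    def read(self, n):
        """Return up to n bytes; b"" is the end signal (B5)."""
        # B3: refill from the next slice only when the buffer is empty.
        while not self._buffer and self._pending:
            qualifier = self._pending.pop(0)
            self._buffer = self._row["i"][qualifier]
        # B4: hand out buffered data; the next call refills if needed.
        out, self._buffer = self._buffer[:n], self._buffer[n:]
        return out
```

Sorting by the parsed integer rather than by the raw column name keeps the order correct even if the sequence numbers were not zero-padded.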
5. The HBase-based low-latency storage method for massive numbers of small files as claimed in claim 1, characterized in that it further comprises a small-file delete operation, which relies on HBase to quickly locate the row key of the small file to be deleted and then directly uses the HBase API to delete the corresponding row.
6. The HBase-based low-latency storage method for massive numbers of small files as claimed in claim 1, characterized in that the threshold is 10 megabytes.
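The threshold-based routing performed by the file storage abstraction layer of claim 1, with the 10-megabyte threshold of claim 6, might look like the following Python sketch. The URI schemes, the table and directory names, the function names, the reading of 10 megabytes as 10 × 1024 × 1024 bytes, and the choice to treat a file of exactly the threshold size as large are all assumptions; the claims fix none of these details.

```python
from urllib.parse import urlparse

SMALL_FILE_THRESHOLD = 10 * 1024 * 1024  # assumed: 10 MB = 10 * 2**20 bytes
SMALL_FILE_TABLE = "smallfiles"          # hypothetical HBase table name
LARGE_FILE_DIR = "largefiles"            # hypothetical HDFS directory


def build_uri(file_size, identifier):
    """Return a URI telling the application layer where the file will live."""
    if file_size < SMALL_FILE_THRESHOLD:
        # Small file: identifier is used as the row key in the HBase table.
        return f"smallfile://{SMALL_FILE_TABLE}/{identifier}"
    # Large file: stored directly as an HDFS file.
    return f"hdfs:///{LARGE_FILE_DIR}/{identifier}"


def resolve(uri):
    """Parse a URI and report which backend should serve the read/write."""
    parsed = urlparse(uri)
    if parsed.scheme == "smallfile":
        return ("hbase", parsed.netloc, parsed.path.lstrip("/"))
    if parsed.scheme == "hdfs":
        return ("hdfs", parsed.path)
    raise ValueError(f"unsupported URI scheme: {parsed.scheme!r}")
```

Encoding the backend in the URI lets the application layer stay unaware of where a file is stored: it reports a size once, keeps the returned URI, and passes it back on every later read or write.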
CN201310112130.XA 2013-04-01 2013-04-01 Mass small documents low delay based on HBase storage method Active CN103246700B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310112130.XA CN103246700B (en) 2013-04-01 2013-04-01 Mass small documents low delay based on HBase storage method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201310112130.XA CN103246700B (en) 2013-04-01 2013-04-01 Mass small documents low delay based on HBase storage method

Publications (2)

Publication Number Publication Date
CN103246700A CN103246700A (en) 2013-08-14
CN103246700B true CN103246700B (en) 2016-08-10

Family

ID=48926220

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310112130.XA Active CN103246700B (en) 2013-04-01 2013-04-01 Mass small documents low delay based on HBase storage method

Country Status (1)

Country Link
CN (1) CN103246700B (en)

Families Citing this family (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103473365B (en) * 2013-09-25 2017-06-06 北京奇虎科技有限公司 A kind of file memory method based on HDFS, device and distributed file system
CN103646073A (en) * 2013-12-11 2014-03-19 浪潮电子信息产业股份有限公司 Condition query optimizing method based on HBase table
CN105205082A (en) 2014-06-27 2015-12-30 国际商业机器公司 Method and system for processing file storage in HDFS
CN104391910B (en) * 2014-11-17 2016-06-08 西安交通大学 A kind of taxation statistics form based on HBase stores and the method calculated
CN105988995B (en) * 2015-01-27 2019-05-24 杭州海康威视数字技术股份有限公司 A method of based on HFile batch load data
CN105094695B (en) * 2015-06-29 2018-09-04 浪潮(北京)电子信息产业有限公司 A kind of storage method and system
CN105426466A (en) * 2015-11-16 2016-03-23 天津南大通用数据技术股份有限公司 Method and apparatus for increasing accurate query speed of packing database
CN105354323A (en) * 2015-11-16 2016-02-24 天津南大通用数据技术股份有限公司 Method and device for increasing precise inquiry speed of columnar storage database by using two-stage filtration
CN105677826A (en) * 2016-01-04 2016-06-15 博康智能网络科技股份有限公司 Resource management method for massive unstructured data
CN105956106B (en) * 2016-05-04 2019-12-13 北京思特奇信息技术股份有限公司 method and system for accessing big data based on memory database and Hbase
CN106021491B (en) * 2016-05-20 2019-10-08 天津海量信息技术股份有限公司 Near-realtime data storage method based on hdfs
CN106469225B (en) * 2016-09-28 2019-04-16 厦门嵘拓物联科技有限公司 It is a kind of intelligence workshop management in magnanimity manufaturing data access method
CN106528819A (en) * 2016-11-16 2017-03-22 北京集奥聚合科技有限公司 Method and system for reading and writing time series data by HBase
CN108932287B (en) * 2018-05-22 2019-11-29 广东技术师范大学 A kind of mass small documents wiring method based on Hadoop
CN110532425B (en) * 2019-08-19 2022-04-01 深圳市网心科技有限公司 Video data distributed storage method and device, computer equipment and storage medium
CN112256634B (en) * 2020-10-14 2024-03-26 杭州当虹科技股份有限公司 Http-based low-memory large file analysis method
CN115658626B (en) * 2022-12-26 2023-03-07 成都数默科技有限公司 Distributed network small file storage management method

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101866359A (en) * 2010-06-24 2010-10-20 北京航空航天大学 Small file storage and visit method in avicade file system
CN102662992A (en) * 2012-03-14 2012-09-12 北京搜狐新媒体信息技术有限公司 Method and device for storing and accessing massive small files
CN102902716A (en) * 2012-08-27 2013-01-30 苏州两江科技有限公司 Storage system based on Hadoop distributed computing platform

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8768980B2 (en) * 2009-11-02 2014-07-01 Stg Interactive S.A. Process for optimizing file storage systems

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
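Xuhui Liu et al. Implementing WebGIS on Hadoop: A Case Study of Improving Small File I/O Performance on HDFS. Cluster Computing and Workshops, 2009. 2009, pp. 1-8. *
Zhao Yuelong et al. Research on a performance-optimized small file storage access strategy. Journal of Computer Research and Development. 2012, Vol. 49 (No. 7), pp. 1579-1585. *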

Also Published As

Publication number Publication date
CN103246700A (en) 2013-08-14

Similar Documents

Publication Publication Date Title
CN103246700B (en) Mass small documents low delay based on HBase storage method
CN106484877B (en) A kind of document retrieval system based on HDFS
CN108053863B (en) Mass medical data storage system and data storage method suitable for large and small files
CN104536959B (en) A kind of optimization method of Hadoop accessing small high-volume files
US10037341B1 (en) Nesting tree quotas within a filesystem
WO2018064962A1 (en) Data storage method, electronic device and computer non-volatile storage medium
CN103327052B (en) Date storage method and system and data access method and system
CN110383261A (en) Stream for multithread storage device selects
CN101674334B (en) Access control method of network storage equipment
CN110291518A (en) Merge tree garbage index
CN110268394A (en) KVS tree
KR100856245B1 (en) File system device and method for saving and seeking file thereof
CN110268399A (en) Merging tree for attended operation is modified
CN103812939A (en) Big data storage system
CN105787093B (en) A kind of construction method of the log file system based on LSM-Tree structure
WO2011053843A3 (en) Fixed content storage within a partitioned content platform using namespaces
US8095678B2 (en) Data processing
CN111427847B (en) Indexing and querying method and system for user-defined metadata
CN102024019B (en) Suffix tree based catalog organizing method in distributed file system
CN101641695A (en) Resource inserts filtering system and for the database structure that uses with resource access filtering system
CN100424699C (en) Attribute extensible object file system
CN106407355A (en) Data storage method and device
CN109542861A (en) File management method, device and system
CN104182487A (en) Unified storage method supporting various storage modes
GB2439577A (en) Storing data in streams of varying size

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant