CN103246700B - Low-latency storage method for massive numbers of small files based on HBase - Google Patents
- Publication number
- CN103246700B (application number CN201310112130.XA)
- Authority
- CN
- China
- Prior art keywords
- small files
- column
- hbase
- file
- write
- Legal status (as listed by Google Patents; not a legal conclusion)
- Active
Abstract
The invention provides a low-latency storage method for massive numbers of small files based on HBase. In a Hadoop/HBase environment, a small-file table with one row key and two column families is created, establishing a storage environment suited to small files, together with supporting application flows for writing, appending to, and reading small files. This achieves sensible storage and low-latency reading and writing of massive numbers of small files, meeting practical needs.
Description
Technical field
The present invention relates to a storage method, and in particular to a low-latency storage method for massive numbers of small files based on HBase.
Background technology
The Hadoop Distributed File System (HDFS) provided by Hadoop has been widely adopted in recent years as a low-cost distributed storage solution.
HDFS is designed primarily for large files, typically hundreds of MB to several GB in size, and is not suited to storing small files. The reason is that the NameNode, which provides Hadoop's master metadata service, must keep key metadata for every file in memory; it is estimated that 10,000,000 files occupy 2 to 3 GB of NameNode memory. As the number of files grows, this memory demand places heavy pressure on current mainstream server hardware, and the problem may not be solvable even by adding hardware.
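As a rough check of the 2 to 3 GB figure, assume the commonly cited rule of thumb that each file, directory, or block object occupies about 150 bytes of NameNode heap, and that each small file occupies exactly one block (both are assumptions, not figures from this patent):

$$
10^{7}\ \text{files} \times 2\ \tfrac{\text{objects}}{\text{file}} \times 150\ \tfrac{\text{B}}{\text{object}} = 3 \times 10^{9}\ \text{B} \approx 3\ \text{GB}
$$

which lands at the upper end of the estimate quoted above.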
In addition, HDFS is optimized for high-throughput reads and writes, at the cost of some latency. For example, after a file is written, the change is not immediately visible to other clients reading that file; they must wait until HDFS has synchronized. HDFS by itself is therefore unsuitable for data with low-latency requirements.
Several methods currently exist to address HDFS's unsuitability for small files. Hadoop's official solutions include HAR (Hadoop Archives), which archives a batch of small files into one large file. This approach has several problems. First, the URIs (Uniform Resource Identifiers) of the small files change, which is unfriendly to application systems built on HDFS: they must be modified to accommodate the change. Second, archives do not support compression. Third, the contents of an archive cannot be modified; adding or deleting even one small file requires re-archiving.
Other solutions are Hadoop's SequenceFile and MapFile, which first combine small files into a large file for storage. These also have problems: for example, there is no simple, fast way to list the directory of small files, and the solutions are mainly oriented toward Java, with poor support for other languages.
Existing patents, "A Hadoop-based association storage method for massive related small files" (patent application number 201110312671.8) and "A Hadoop-based association storage method for massive classifiable small files" (patent application number 201110312694.9), likewise first combine small files into large files. They impose requirements on the development of upper-layer application systems; in other words, they cannot be fully transparent to upper-layer applications. Nor do they provide low latency.
Summary of the invention
The object of the invention is to overcome the above drawbacks by providing an HBase-based low-latency storage method for massive numbers of small files that stores them sensibly, with low latency, and fully transparently to upper-layer application systems.
The object of the invention is achieved as follows: an HBase-based low-latency storage method for massive numbers of small files, characterized in that, after a small-file storage environment is established in a Hadoop/HBase environment, steps are provided for writing, appending to, and reading small files. Establishing the small-file storage environment comprises:
a step of creating an HBase small-file table in the HBase database, the small-file table being as follows:
the value of the row key is a randomly generated unique string;
column family i stores, for the small file divided into N slices according to a predefined rule, the length information of each slice and the byte stream of the file content;
column family m records the lock information of this row, used during concurrent writes to determine whether the row is locked by another thread or process;
a step of pre-splitting the small-file table (the split strategy uses hexadecimal strings); and
a step of enabling a row-level Bloom filter on the small-file table.
In the above method, the row key is computed by taking a UUID value, computing its MD5 digest, and taking the hexadecimal string representation of the digest. The column names in column family i consist of the slice sequence number and the current slice length, and the cell value holds the byte array of the slice content for that sequence number.
In the above method, writing a small file comprises the steps of:
A1) initializing the small-file output stream: after obtaining a reference to the target small file, allocating a memory buffer inside the output stream and passing the URI of the target small file to the output stream; the URI contains the row key of the target row in the small-file table; the buffer is no smaller than the defined small-file slice size;
A2) checking the value of the lock column in column family m of the small-file table; if there is none, the row is writable and a lock is added;
A3) reading the byte stream from the source file into the buffer; once the buffer is full, forming a new slice from its contents, obtaining the slice sequence number and the actual byte length of the slice, combining the two into a column name under column family i of the HBase small-file table, and storing the slice content in the corresponding cell value;
A4) connecting to HBase and writing the new full slice to the small-file table, where column family i is dynamically extended with a new column to store it;
A5) resetting the buffer and returning to step A2;
A6) after all contents of the source file have been read and before the output stream is closed, assembling the remaining buffer contents, which fall short of the defined slice size, into the last slice of this write and writing it to the same row of the HBase small-file table;
A7) releasing the lock.
Preferably, the following steps may also follow step A7:
A8) using the row key of the stored small file together with HBase's KeyOnlyFilter to fetch all column names under column family i of the corresponding row, and parsing from them all slice sequence numbers existing in the current state;
A9) checking whether the lock value exists under column family m for the small file's row key, and continuing only if it does not;
A10) sorting to obtain the current maximum slice sequence number;
A11) starting slice writes from the incremented maximum sequence number; the rest of the flow is the same as writing a complete small file.
In the above method, reading a small file comprises the steps of:
B1) initializing the small-file input stream, allocating a buffer inside it, and obtaining access to the HBase small-file table; the buffer is no smaller than the defined maximum small-file slice size;
B2) using the row key of the small file to be read, querying all column information under column family i of the corresponding row; HBase's KeyOnlyFilter is used so that only the column names of family i are read from the table;
B3) when the application layer reads data from the small-file input stream, the stream checks whether its buffer holds available data; if not, it requests a new slice from HBase;
B4) once the application layer has consumed the data in the current buffer, returning to step B2;
B5) after the input stream has located and processed the last slice of the complete small file, returning an end signal, which completes the read of the whole small file.
In the above method, deleting a small file relies on HBase's fast lookup of the row key of the small file to be deleted, and then uses the HBase API directly to delete the corresponding row.
In the above method, a file storage abstraction layer is established. It receives the file size reported by the application layer, constructs the corresponding URI according to a predefined large/small-file threshold, and returns it to the application layer; then, on receiving the file input stream and URI for the read or write to be performed, it parses the URI and automatically directs the read/write operation to either the HDFS file system or the HBase small-file table.
In the above, the threshold is 10 MB.
The beneficial effect of the invention is that, by establishing a storage environment suited to small files in a Hadoop/HBase environment, with supporting application flows for writing, appending to, and reading small files, it achieves sensible storage and low-latency reading and writing of massive numbers of small files, meeting practical needs.
Brief description of the drawings
The specific structure of the invention is described in detail below with reference to the drawings.
Fig. 1 is a diagram of the large/small-file storage abstraction for upper-layer applications in the method of the invention;
Fig. 2 is a flow diagram of writing a small file in the method of the invention;
Fig. 3 is a flow diagram of reading a small file in the method of the invention.
Detailed description of embodiments
To describe the technical content, structural features, objects, and effects of the invention in detail, embodiments are explained below with reference to the drawings.
The invention provides an HBase-based low-latency storage method for massive numbers of small files which, after establishing a small-file storage environment in a Hadoop/HBase environment, provides applications for writing, appending to, and reading small files.
Establishing the small-file storage environment comprises the following steps:
I. Prepare the Hadoop and HBase environment
The method is built on Hadoop and HBase, so a Hadoop and HBase environment must be deployed first.
The method is implemented on HBase, a distributed, column-oriented database built on Hadoop. Originally part of the Hadoop project, HBase is now an independent Apache top-level project. HBase is not designed for storing files directly; it targets structured data, while also offering low latency. HBase data can be stored on many file systems, of which HDFS is the best choice.
II. Create the HBase small-file table
Create a small-file table in the HBase database, with the following table structure:
The value of the row key (key name: rowkey) is a randomly generated unique string: take a UUID (Universally Unique Identifier) value, compute its MD5 digest, and take the hexadecimal string representation of the digest. Processing the row key this way makes full use of HBase's existing region-splitting strategy, spreading rows across different regions and avoiding region hotspots.
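The row-key scheme can be sketched in Python as follows. This is an illustrative sketch, not the patent's implementation; in particular, hashing the UUID's canonical string form (rather than its 16 raw bytes) is an assumption, since the text does not say which representation is digested.

```python
import hashlib
import uuid


def new_row_key() -> str:
    """Return HEX(MD5(UUID)) as a 32-character lowercase hex string.

    Assumption: the MD5 digest is taken over the UUID's canonical string form.
    """
    u = uuid.uuid4()  # random (version 4) UUID
    return hashlib.md5(str(u).encode("ascii")).hexdigest()
```

Because MD5 output is close to uniformly distributed, the leading hex digits of successive keys are effectively random, which is what spreads new rows across regions instead of piling them onto the region holding the newest keys.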
The table has exactly two column families:
Column family i: stores the slice byte streams (file content) and length information after the small file is sliced.
Column family m: records the lock information of this row, used during concurrent writes to determine whether the row is locked by another thread or process.
A complete small file is represented entirely by one row of the table. When the table is written, the file is cut into N slices according to a predefined rule, and each slice receives an incrementing slice sequence number. The slice sequence number, slice size, and complete byte array of each slice are represented by one column under family i. The slice sequence number and slice size are stored as the column name; the slice content (byte array) is stored as the column value.
The first two bytes of the column name are the slice sequence number. The sequence number is a short; its positive range provides up to 32767 slice sequence numbers, which is ample as an upper limit for a small file. The incrementing sequence number both distinguishes slices and, when the small file is read, provides the order in which slices are reassembled into the complete file.
The remaining bytes of the column name store the actual length of the slice held in the column value. To compute the size of a whole small file, only the column names need to be read from the table, not the slice values, which avoids the memory cost of loading complete slice byte arrays.
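The column-name layout can be sketched as below. The two-byte big-endian sequence number follows the text; encoding the remaining length field as a 4-byte big-endian integer (the layout HBase's `Bytes.toBytes(int)` produces for a Java `int`) is an assumption, as are the helper names `encode_qualifier` and `decode_qualifier`.

```python
import struct

MAX_SLICE_SEQ = 32767  # largest positive value of a signed 16-bit short


def encode_qualifier(seq: int, length: int) -> bytes:
    """Column name under family i: 2-byte slice number + 4-byte slice length.

    Assumption: both fields are big-endian; the length width is not specified
    in the text and is taken here to be 4 bytes.
    """
    if not 0 <= seq <= MAX_SLICE_SEQ:
        raise ValueError("slice sequence number out of short range")
    return struct.pack(">hi", seq, length)  # ">" = big-endian, no padding


def decode_qualifier(qualifier: bytes) -> tuple[int, int]:
    """Split a column name back into (slice number, slice length)."""
    seq = struct.unpack(">h", qualifier[:2])[0]
    length = struct.unpack(">i", qualifier[2:6])[0]
    return seq, length
```

With the 10 MB threshold given in the text and a 4 KB slice size, a small file needs at most 10 × 1024 × 1024 / 4096 = 2560 slices, well within the 32767 limit.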
The column value stores the slice content (byte array). Each small file is cut into multiple equal-sized slices, which are stored there. The slice size can be customized as needed, e.g. 4 KB, 128 KB, 512 KB, or 1 MB.
Column family m has only one column, named lock, which provides row lock information. The column exists while the small file is being written; when the write finishes, the lock is released and the column is deleted. Storing the row lock in a column of a custom column family is more flexible than HBase's native row lock mechanism.
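The lock column can be modeled as a conditional put: add `m:lock` only if it is absent, and delete it when the write ends. In a real deployment the check and the put must happen atomically (for example with HBase's conditional single-row mutations such as `checkAndMutate`); the patent does not say how it achieves this, so the sketch below simply simulates atomicity with a Python mutex over a hypothetical in-memory table.

```python
import threading


class SmallFileTable:
    """Hypothetical in-memory stand-in for the small-file table.

    rows maps row key -> {"i": {qualifier: bytes}, "m": {qualifier: bytes}}.
    """

    def __init__(self):
        self.rows = {}
        # Simulates the atomicity of a single-row check-and-mutate
        self._mutex = threading.Lock()

    def try_lock(self, row_key: str) -> bool:
        """Put m:lock only if it is absent; return True if we took the lock."""
        with self._mutex:
            row = self.rows.setdefault(row_key, {"i": {}, "m": {}})
            if b"lock" in row["m"]:
                return False  # another thread or process is writing this file
            row["m"][b"lock"] = b"1"
            return True

    def release_lock(self, row_key: str) -> None:
        """Delete m:lock when the write finishes."""
        with self._mutex:
            self.rows[row_key]["m"].pop(b"lock", None)
```

A plain "read, then put" sequence would let two writers both see the column as absent; doing the check and the put as one step closes that window.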
III. Pre-split the small-file table
The method pre-splits the small-file table using the hexadecimal-string strategy (RegionSplitter.HexStringSplit). Because row keys are generated as HEX(MD5(UUID)), small files stored in HBase are spread evenly across regions. Pre-splitting prevents any single region from becoming overheated: with sensible pre-splitting, read and write requests are distributed across different regions, improving concurrent read/write capability and hence small-file concurrent read/write performance. As storage grows, existing regions can additionally be split manually as needed.
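The idea behind hexadecimal pre-splitting can be illustrated by computing evenly spaced split keys over an 8-hex-digit key space, which is the default range of `HexStringSplit` ("00000000" to "FFFFFFFF"). The functions below are a simplified model written for illustration, not HBase's code; in practice the table would be pre-split through HBase tooling, such as the `RegionSplitter` utility or split keys supplied at table creation.

```python
import bisect

MIN_HEX, MAX_HEX = 0x00000000, 0xFFFFFFFF  # default HexStringSplit key range


def hex_split_points(num_regions: int) -> list[str]:
    """Return num_regions - 1 evenly spaced 8-digit hex split keys."""
    step = (MAX_HEX - MIN_HEX) // num_regions
    return [format(MIN_HEX + step * i, "08x") for i in range(1, num_regions)]


def region_of(row_key: str, split_keys: list[str]) -> int:
    """Index of the region holding row_key; a region's start key is inclusive."""
    return bisect.bisect_right(split_keys, row_key)
```

Because HEX(MD5(UUID)) keys are close to uniformly distributed, each of the resulting key ranges receives roughly the same share of rows.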
IV. Enable a Bloom filter on the small-file table
A Bloom filter is a hash-based fast lookup structure that can definitively rule out membership but can only probabilistically confirm it. HBase supports enabling Bloom filters on tables; this method enables a row-level Bloom filter on the small-file table to ensure random-read performance.
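A toy Bloom filter shows the "definitely absent / possibly present" behaviour. This is a generic sketch using salted SHA-256 bit positions; HBase's built-in row Bloom filters (configured per column family, e.g. with `BloomType.ROW`) use their own hashing and block-level layout.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter: k hash positions in an m-bit array."""

    def __init__(self, m_bits: int = 1024, k: int = 3):
        self.m = m_bits
        self.k = k
        self.bits = bytearray((m_bits + 7) // 8)

    def _positions(self, key: bytes):
        # Derive k bit positions by salting SHA-256 with the hash index
        for i in range(self.k):
            digest = hashlib.sha256(bytes([i]) + key).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, key: bytes) -> None:
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, key: bytes) -> bool:
        # False: definitely absent. True: possibly present (false positives occur).
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(key))
```

For random reads, a negative answer lets the store skip a file entirely without touching disk, which is why a row-level filter helps point lookups by row key.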
Once the small-file storage environment is in place, HBase-based reads and writes of massive numbers of small files can be performed. To make these operations fully transparent to upper-layer application systems, an abstraction layer over large- and small-file storage is additionally built on top of this environment, as follows:
Referring to Fig. 1, a file storage abstraction layer is provided for upper-layer application systems. Whether a file is ultimately stored in the HDFS file system or in the HBase small-file table is transparent to the upper-layer application, which only needs to use a general-purpose file storage interface that makes no distinction between the two. The steps are:
1. The application layer reports the file size to the storage abstraction layer. Inside the abstraction layer, the corresponding URI is built according to the predefined large/small-file threshold and returned to the application layer.
2. The application layer, without needing to know the content of the URI, passes the input stream of the file to be stored, together with the URI, to the abstract storage method. Inside the abstraction layer, the URI is parsed and the file is automatically stored in the HDFS file system or the HBase small-file table.
For example, URIs of files stored in HDFS begin with hdfs://, and URIs of files stored in the HBase small-file table begin with hbase://.
In the above scheme, a file storage abstraction layer for upper-layer application systems automatically decides the storage location from the file size: files larger than the set threshold (e.g. 10 MB) are stored in HDFS, with URIs marked hdfs://; files smaller than or equal to the threshold are stored in HBase, with URIs marked hbase://. The following describes how files at or below the threshold (i.e. small files) are stored in HBase. Since small files are stored in HBase, which itself has low latency, the method also achieves low-latency storage of small files.
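The routing rule of the abstraction layer can be sketched as follows. Only the `hdfs://` and `hbase://` prefixes and the 10 MB example threshold come from the text; the HDFS path layout, the function names, and the choice to place the row key directly after `hbase://` are hypothetical.

```python
import hashlib
import uuid

SIZE_THRESHOLD = 10 * 1024 * 1024  # 10 MB example threshold, taken as binary MB


def build_uri(file_size: int, file_name: str) -> str:
    """Return an hdfs:// URI for large files and an hbase:// URI for small ones."""
    if file_size > SIZE_THRESHOLD:
        return f"hdfs:///files/{file_name}"  # hypothetical HDFS path layout
    # Small file: hypothetical URI form carrying the HEX(MD5(UUID)) row key
    row_key = hashlib.md5(str(uuid.uuid4()).encode("ascii")).hexdigest()
    return f"hbase://{row_key}"


def resolve(uri: str) -> tuple[str, str]:
    """Split a URI into (backend, location) so reads and writes can be dispatched."""
    scheme, sep, rest = uri.partition("://")
    if not sep or scheme not in ("hdfs", "hbase"):
        raise ValueError(f"unsupported URI: {uri}")
    return scheme, rest
```

The application only ever handles the opaque string returned by `build_uri`; the dispatch on the scheme stays inside the abstraction layer.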
The invention also provides a standard set of small-file I/O interfaces, covering creating, reading, appending to, and deleting small files, getting the file size, and checking whether a file is writable, readable, or exists. It inherits HBase's low-latency reads and writes and provides an I/O stream write interface with low memory overhead (the native HBase API does not support stream writes). Specifically:
Writing small files
Writing to the HBase small-file table is done through a custom small-file output stream, which slices the small file's byte stream and stores the slices in HBase. Writing can use ordinary file-stream semantics, with memory overhead equal to the slice size, comparable to the buffer size of standard I/O.
Referring to Fig. 2, the logical flow for writing a small file is as follows:
1. Obtain a reference to the target small file in preparation for writing.
2. Initialize the small-file output stream. A memory buffer the same size as the defined small-file slice size is allocated inside the output stream. The URI of the target small file (produced by the upper-layer file storage abstraction) is passed to the output stream; it contains the row key (rowkey) of the target row in the small-file table.
3. Check the row lock; if the row is writable, add a row lock, i.e. the lock column of family m in the small-file table.
4. Read the byte stream from the source file into the buffer. Once the buffer is full (forming a new full slice), its content becomes a new slice and receives a slice sequence number. The actual byte length of the slice is combined with the sequence number to form the column name (key) under family i of the HBase small-file table, and the slice content is stored in the corresponding column value.
5. Connect to HBase and write the new full slice to the small-file table. The corresponding row is dynamically extended with a new column (in family i) to store the new slice.
6. Reset the buffer and return to step 3.
7. After all contents of the source file have been read and before the output stream is closed, assemble the remaining buffer contents, which fall short of the defined slice size, into the last slice of this write, and write it to the same row of the HBase small-file table.
8. Release the row lock.
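Steps 2 to 8 can be sketched as a buffered output stream over a hypothetical in-memory stand-in for the table. The qualifier layout (2-byte sequence number plus 4-byte length), numbering slices from 0, and the `first_seq` parameter are assumptions; a real implementation would issue HBase `Put`s instead of dictionary assignments. For simplicity the sketch checks the lock once, when the stream is opened, rather than on every pass, and the check is not atomic.

```python
import struct


def qualifier(seq: int, length: int) -> bytes:
    """Assumed column-name layout: 2-byte sequence number + 4-byte length."""
    return struct.pack(">hi", seq, length)


class SmallFileOutputStream:
    """Buffers writes and stores every full buffer as one slice (sketch).

    `table` is a hypothetical in-memory stand-in for the HBase table:
    {row_key: {"i": {qualifier: slice_bytes}, "m": {b"lock": b"1"}}}.
    """

    def __init__(self, table: dict, row_key: str,
                 slice_size: int = 4096, first_seq: int = 0):
        self.table, self.row_key = table, row_key
        self.slice_size = slice_size
        self.buf = bytearray()      # step 2: staging buffer, drained per slice
        self.seq = first_seq        # next slice sequence number
        row = table.setdefault(row_key, {"i": {}, "m": {}})
        if b"lock" in row["m"]:     # step 3: check the row lock
            raise RuntimeError("row is locked by another writer")
        row["m"][b"lock"] = b"1"    # step 3: take the lock

    def _store_slice(self, data: bytes) -> None:
        # Steps 4-5: column name = (sequence number, real length); value = content
        cells = self.table[self.row_key]["i"]
        cells[qualifier(self.seq, len(data))] = bytes(data)
        self.seq += 1

    def write(self, data: bytes) -> None:
        self.buf.extend(data)
        while len(self.buf) >= self.slice_size:   # a full slice is ready
            self._store_slice(self.buf[: self.slice_size])
            del self.buf[: self.slice_size]        # step 6: reset the buffer

    def close(self) -> None:
        if self.buf:                               # step 7: trailing partial slice
            self._store_slice(self.buf)
            self.buf.clear()
        self.table[self.row_key]["m"].pop(b"lock", None)  # step 8: release the lock
```

Writing 10 bytes with a 4-byte slice size, for example, produces two full slices and one 2-byte final slice, and memory use never exceeds one slice plus the caller's current chunk.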
Appending to small files
The method also supports appending to an existing small file, working around the restriction of ordinary HBase columns that a value can only be stored and retrieved as a whole.
The logical flow for appending to a small file is as follows:
9. Using the row key of the stored small file together with HBase's KeyOnlyFilter, fetch all column names under family i of the corresponding row, and parse from them all slice sequence numbers existing in the current state.
10. Determine whether the small file is writable. According to the table structure, the row lock is recorded under family m of the HBase small-file table, so checking whether the lock exists under family m for the small file's row key directly tells whether the file can be written.
11. Sort to obtain the current maximum slice sequence number.
12. Start writing slices from the incremented maximum sequence number; the rest of the flow is the same as writing a complete small file.
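Steps 9 to 12 reduce to parsing slice numbers from column names and resuming after the largest one. The sketch below assumes a hypothetical in-memory table of the form `{row_key: {"i": {...}, "m": {...}}}` and a 2-byte big-endian sequence number at the start of each column name; its result would serve as the starting sequence number for the appended slices.

```python
import struct


def existing_slice_numbers(row: dict) -> list[int]:
    """Step 9: parse slice numbers from the column names of family i only."""
    return [struct.unpack(">h", q[:2])[0] for q in row["i"]]


def next_append_seq(table: dict, row_key: str) -> int:
    """Steps 10-12: refuse while locked, else continue after the largest number."""
    row = table[row_key]
    if b"lock" in row["m"]:                  # step 10: a writer holds the lock
        raise RuntimeError("small file is locked; cannot append now")
    seqs = existing_slice_numbers(row)
    return max(seqs) + 1 if seqs else 0      # steps 11-12
```

Only column names are inspected, so an append never has to load the existing slice contents.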
Reading small files
Reading a small file means reading all columns (slices) of the corresponding row from the HBase small-file table and reassembling the slices in order into the complete small file. Reading can use ordinary file-stream semantics, with memory overhead equal to the slice size, comparable to the buffer size of standard I/O.
Referring to Fig. 3, the logical flow for reading a small file is as follows:
1. Initialize the small-file input stream. A buffer the same size as the defined maximum small-file slice size is allocated inside the input stream, and access to the HBase small-file table is obtained.
2. Using the row key of the small file to be read, query all column information under family i of the corresponding row. HBase's KeyOnlyFilter is used to read only the column names (column keys) from the table, reducing data traffic and avoiding the cost of loading whole slice contents into memory. From the retrieved data, the slice order of the small file is constructed, and a mapping from slice sequence numbers to column names is maintained for reading the small file slice by slice.
3. When the upper-layer application tries to read data from the small-file input stream, the stream checks whether its buffer holds available data. If not, it requests new slice data from HBase, using the previously obtained mapping from sequence numbers to column names to locate and read the corresponding slice content directly. The new slice is loaded into the buffer for the upper-layer application to consume.
4. When the upper-layer application has consumed the data in the current buffer, step 2 is triggered again.
5. Once the input stream has located and processed the last slice of the complete small file, it returns an end signal, completing the read of the whole small file.
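The read flow can be sketched as an input stream that keeps at most one slice in memory. As elsewhere, the in-memory table and the 2-byte big-endian sequence prefix are assumptions; in HBase each slice fetch would be a `Get` restricted to a single column of family `i`.

```python
import struct


class SmallFileInputStream:
    """Reads a small file slice by slice (sketch over an in-memory table).

    `table` is a hypothetical stand-in: {row_key: {"i": {qualifier: bytes}}}.
    """

    def __init__(self, table: dict, row_key: str):
        self.cells = table[row_key]["i"]
        # Step 2: order slices using the column names alone
        self.order = sorted(self.cells, key=lambda q: struct.unpack(">h", q[:2])[0])
        self.next_index = 0
        self.buf = b""

    def read(self, n: int = -1) -> bytes:
        """Return up to n bytes (all remaining bytes if n < 0); b"" at end of file."""
        out = bytearray()
        while n < 0 or len(out) < n:
            if not self.buf:                            # step 3: buffer is empty
                if self.next_index == len(self.order):
                    break                               # step 5: no slices left
                self.buf = self.cells[self.order[self.next_index]]  # fetch one slice
                self.next_index += 1
            take = len(self.buf) if n < 0 else min(n - len(out), len(self.buf))
            out += self.buf[:take]
            self.buf = self.buf[take:]
        return bytes(out)
```

Slices are reassembled by sequence number rather than by the order in which columns happen to come back, so the stream is correct regardless of how the column names sort as raw bytes.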
Before reading, the upper-layer application can obtain the file size; this scheme obtains it without reading the complete file content. Under the small-file table structure, when a small file is sliced during storage, the length of each slice is recorded in the column name of the corresponding column under family i. Using HBase's KeyOnlyFilter to fetch only the column names of all columns under family i of the corresponding row yields information on all slices. Discarding the first two bytes of each column name and converting the remaining bytes gives the corresponding slice size; summing all slice sizes gives the exact size of the small file.
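Computing the size from column names alone might look like this. It assumes the bytes after the 2-byte sequence number are a 4-byte big-endian length, which the text does not specify.

```python
import struct


def small_file_size(column_names: list[bytes]) -> int:
    """Sum the slice lengths stored in the column names; values are never read."""
    # Drop the 2-byte sequence number, decode the assumed 4-byte length
    return sum(struct.unpack(">i", q[2:6])[0] for q in column_names)
```

Only the column names (as returned by a key-only scan) are needed, so the cost is independent of how large the slices are.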
Before step 1 above, the upper-layer application needs to determine whether the small file can be read. HBase supports concurrent reads, so the small-file table has no concurrent-read issues beyond those of HBase itself. The only thing to check is whether the small file exists: the HBase API can directly determine whether the small file's row key exists in the HBase small-file table, and if it exists, the file can be read.
A complete small file is stored as one row of the HBase small-file table; one row represents one complete small file. Therefore, checking via the HBase API whether the small file's row key exists in the table determines whether the small file exists.
Deleting small files
Small files are stored in the HBase small-file table as slices, and all slices of a single file are stored in the same row. Relying on HBase's fast rowkey lookup, the corresponding row is deleted directly with the HBase API, which deletes the corresponding small file.
In summary, the method solves the two major problems described in the background, achieving sensible, low-latency storage of massive numbers of small files that is fully transparent to upper-layer application systems.
Effects achieved: first, a small-file storage environment is established in a Hadoop/HBase environment, with supporting application flows for writing, appending to, and reading small files, achieving sensible, low-latency storage of massive numbers of small files.
Second, a file storage abstraction layer is provided for upper-layer application systems, through which they write and read files, whether large or small. It can store massive numbers of small files, solving HDFS's unsuitability for small files: growth in the number of files no longer depends on adding NameNode memory. Files are written and read through standard I/O streams, and through the abstraction layer the upper-layer application cannot perceive that large and small files are handled differently underneath, which minimizes the work of adapting upper-layer applications. Moreover, because file operations use standard I/O streams rather than byte arrays, server memory requirements are low: writing or reading a 10 MB file does not require allocating 10 MB at once to cache the file content, since all content is processed through I/O streams. In addition, small-file storage has low latency.
The above are only embodiments of the invention and do not limit its patent scope. Any equivalent structure or equivalent process transformation made using the description and drawings of the invention, or any direct or indirect application in other related technical fields, is likewise included in the scope of patent protection of the invention.
Claims (6)
1. An HBase-based low-latency storage method for massive numbers of small files, characterized in that: after a small-file storage environment is established in a Hadoop/HBase environment and an abstraction layer is built over large- and small-file storage, a small-file write application, a small-file append application, and a small-file read application are provided; wherein establishing the small-file storage environment comprises:
creating an HBase small-file table in the HBase database, the small-file table being as follows:
the value of the row key is a string;
the value of the row key is obtained by taking a UUID value, computing its MD5 digest, and taking the hexadecimal string representation of the digest;
column family i stores, for the small file divided into N slices according to a predefined rule, the length information of each slice and the byte stream of the file content;
the column names of column family i consist of the slice sequence number and the current slice length, and the value holds the byte array of the slice content for that sequence number;
column family m records the lock information of this row, used during concurrent writes to determine whether the row is locked by another thread or process;
building the abstraction layer over large- and small-file storage comprises:
establishing a file storage abstraction layer that receives the file size reported by the application layer, constructs the corresponding URI according to a predefined large/small-file threshold and returns it to the application layer, then receives the file input stream and URI for the read or write to be performed, parses the URI, and automatically directs the read/write operation to the HDFS file system or the HBase small-file table;
pre-splitting the small-file table; and
enabling a row-level Bloom filter on the small-file table.
2. The HBase-based low-latency storage method for massive small files as claimed in claim 1, characterized in that the small-file write application comprises the steps:
A1) Initialize the small-file output stream: after obtaining a reference to the target small file, allocate a memory buffer inside the output stream and pass the URI of the target small file to the stream. The URI contains the row key of the target row in the small-file table, and the buffer is no smaller than the defined small-file slice size;
A2) Check the value of the key named lock in column family m of the small-file table; if no lock is held, the row is writable and a new lock is added;
A3) Read a byte stream from the source file into the buffer. When the buffer is full, turn its current contents into a new slice, obtain the slice sequence number and the slice's actual byte length, combine the two into the key name (column qualifier) under column family i of the HBase small-file table, and store the slice content as the corresponding value;
A4) Connect to HBase and write the new slice to the small-file table; column family i dynamically adds a new column to store each new slice;
A5) Reset the buffer and return to step A2;
A6) After all content of the source file has been read, and before the small-file output stream is closed, assemble the content remaining in the buffer (smaller than the defined slice-size threshold) into the last slice of this write and output it to the same row of the HBase small-file table;
A7) Release the lock.
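The write steps A1 to A7 can be modelled in memory to show how one file becomes a row of slices. In this hypothetical sketch a nested Python dict stands in for the HBase small-file table, the column qualifier encodes the slice sequence number and byte length as `<seq>:<length>` (an assumed format), and the lock is a `Y`/`N` cell in family `m`. The sketch takes the lock once for the whole write rather than re-checking it per slice, and its check-then-set is not atomic: a real multi-writer deployment would need an atomic conditional update (HBase offers check-and-mutate for this), and real client calls would replace the dict writes.

```python
import io
from typing import BinaryIO, Dict

# In-memory stand-in for the HBase small-file table:
# row key -> {"m": {lock cell}, "i": {column qualifier: slice bytes}}
Table = Dict[str, Dict[str, Dict[str, bytes]]]


def make_qualifier(seq: int, length: int) -> str:
    # Hypothetical column name combining slice sequence number and true byte length.
    return f"{seq:010d}:{length}"


def fill_buffer(source: BinaryIO, size: int) -> bytes:
    # A3: keep reading until the buffer holds `size` bytes or the source is exhausted.
    buf = bytearray()
    while len(buf) < size:
        part = source.read(size - len(buf))
        if not part:
            break
        buf += part
    return bytes(buf)


def write_small_file(table: Table, row_key: str, source: BinaryIO, slice_size: int) -> int:
    """Store `source` under `row_key` as slices in family i; return the number of slices.

    Assumes a new row: overwriting an existing file would first need its old slices deleted.
    """
    row = table.setdefault(row_key, {"m": {}, "i": {}})
    # A2: refuse to write if another writer holds the lock in family m, else take it.
    if row["m"].get("lock") == b"Y":
        raise RuntimeError(f"row {row_key!r} is locked by another writer")
    row["m"]["lock"] = b"Y"
    try:
        seq = 0
        while True:
            chunk = fill_buffer(source, slice_size)  # A1/A3: buffer of slice_size bytes
            if not chunk:
                break
            # A4/A6: each buffer becomes a new column in family i; the last one may be short.
            row["i"][make_qualifier(seq, len(chunk))] = chunk
            seq += 1  # A5: the buffer is reset on the next iteration
        return seq
    finally:
        row["m"]["lock"] = b"N"  # A7: release the lock, even if the write failed
```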
3. The HBase-based low-latency storage method for massive small files as claimed in claim 2, characterized in that step A7) is followed by the further steps:
A8) Using the row key of the stored small file together with HBase's KeyOnlyFilter, fetch all column names under column family i of the corresponding row, and parse from them all slice sequence numbers that currently exist;
A9) Check whether the lock value under column family m for the small file's row key is Yes or No; if it is No, continue;
A10) Sort the sequence numbers to obtain the current maximum slice sequence number;
A11) Increment the maximum sequence number and start writing slices from it; the rest of the flow is identical to a complete small-file write.
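The append steps A8 to A11 reduce to finding the largest existing slice sequence number and continuing from the next one. The sketch below assumes the same hypothetical in-memory table and `<seq>:<length>` qualifier format as above; iterating over the keys of family `i` plays the role that HBase's KeyOnlyFilter plays in the claim (fetching column names without values), and `max()` stands in for the sort in A10 since only the maximum is needed.

```python
import io
from typing import BinaryIO, Dict, List

# Hypothetical in-memory table model: row key -> {"m": {...}, "i": {qualifier: bytes}}
Table = Dict[str, Dict[str, Dict[str, bytes]]]


def parse_seq(qualifier: str) -> int:
    # Hypothetical column-name format "<seq>:<length>", e.g. "0000000002:3".
    return int(qualifier.split(":", 1)[0])


def append_small_file(table: Table, row_key: str, source: BinaryIO, slice_size: int) -> List[str]:
    """Append `source` to an existing small file; return the qualifiers of the new slices."""
    row = table[row_key]
    # A8: only the column names of family i are needed, not the slice contents.
    seqs = [parse_seq(q) for q in row["i"]]
    # A9: continue only if the row is not locked by another writer.
    if row["m"].get("lock") == b"Y":
        raise RuntimeError(f"row {row_key!r} is locked by another writer")
    row["m"]["lock"] = b"Y"
    try:
        # A10: current maximum sequence number (-1 for a row with no slices).
        next_seq = max(seqs, default=-1) + 1
        added: List[str] = []
        # A11: same slicing loop as a full write, numbered from next_seq.
        # (io.BytesIO.read returns full slices until EOF; other streams may need a fill loop.)
        while chunk := source.read(slice_size):
            qualifier = f"{next_seq:010d}:{len(chunk)}"
            row["i"][qualifier] = chunk
            added.append(qualifier)
            next_seq += 1
        return added
    finally:
        row["m"]["lock"] = b"N"  # release the lock as in A7
```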
4. The HBase-based low-latency storage method for massive small files as claimed in claim 1, characterized in that the small-file read application comprises the steps:
B1) Initialize the small-file input stream, allocate a buffer inside it, and obtain access to the HBase small-file table; the buffer is no smaller than the defined maximum small-file slice size;
B2) Using the row key of the small file to be read, query all column information under column family i of the corresponding row; HBase's KeyOnlyFilter is used so that only the key names of column family i are read from the small-file table;
B3) When the application layer reads data from the small-file input stream, the stream checks whether its buffer holds available data; if it does not, the stream requests the next slice of data from HBase;
B4) After the application layer has consumed the data in the current buffer, return to step B2;
B5) After the small-file input stream has located and processed the last slice of the complete small file, it returns an end signal, completing the read of the whole small file.
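The read steps B1 to B5 amount to a lazily refilled buffer: the stream lists the slice columns once, then pulls one slice at a time only when its buffer runs dry. This is a hypothetical in-memory sketch; the class name `SmallFileInputStream`, the `<seq>:<length>` qualifier format, and the dict standing in for HBase are assumptions, and a real version would issue one HBase get per slice instead of a dict lookup.

```python
from typing import Dict, List


class SmallFileInputStream:
    """Reads a small file slice by slice from an in-memory model of the small-file table."""

    def __init__(self, table: Dict[str, Dict[str, Dict[str, bytes]]], row_key: str):
        self._cells = table[row_key]["i"]
        # B2: list only the column names of family i, ordered by the
        # hypothetical "<seq>:<length>" sequence-number prefix.
        self._pending: List[str] = sorted(self._cells, key=lambda q: int(q.split(":", 1)[0]))
        self._buf = b""  # B1: the stream's internal buffer
        self.slices_fetched = 0

    def read(self, n: int = -1) -> bytes:
        """Return up to n bytes (all remaining if n < 0); b"" signals end of file."""
        out = bytearray()
        while n < 0 or len(out) < n:
            if not self._buf:
                if not self._pending:
                    break  # B5: last slice already consumed, so signal end of file
                # B3: buffer empty, so request the next slice ("fetch" from HBase).
                self._buf = self._cells[self._pending.pop(0)]
                self.slices_fetched += 1
            # B4: hand buffered bytes to the caller, leaving any remainder for later.
            take = len(self._buf) if n < 0 else min(n - len(out), len(self._buf))
            out += self._buf[:take]
            self._buf = self._buf[take:]
        return bytes(out)
```

Fetching slices on demand keeps memory use bounded by one slice per open stream, which matches the buffer sizing rule in B1.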
5. The HBase-based low-latency storage method for massive small files as claimed in claim 1, characterized in that it further comprises a small-file deletion operation, which relies on HBase to quickly locate the row key of the small file to be deleted and then deletes the corresponding row directly through the HBase API.
6. The HBase-based low-latency storage method for massive small files as claimed in claim 1, characterized in that the threshold size is 10 megabytes.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201310112130.XA CN103246700B (en) | 2013-04-01 | 2013-04-01 | Mass small documents low delay based on HBase storage method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN103246700A CN103246700A (en) | 2013-08-14 |
CN103246700B true CN103246700B (en) | 2016-08-10 |
Family ID: 48926220
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201310112130.XA Active CN103246700B (en) | 2013-04-01 | 2013-04-01 | Mass small documents low delay based on HBase storage method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN103246700B (en) |
Families Citing this family (17)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103473365B (en) * | 2013-09-25 | 2017-06-06 | 北京奇虎科技有限公司 | A kind of file memory method based on HDFS, device and distributed file system |
CN103646073A (en) * | 2013-12-11 | 2014-03-19 | 浪潮电子信息产业股份有限公司 | Condition query optimizing method based on HBase table |
CN105205082A (en) | 2014-06-27 | 2015-12-30 | 国际商业机器公司 | Method and system for processing file storage in HDFS |
CN104391910B (en) * | 2014-11-17 | 2016-06-08 | 西安交通大学 | A kind of taxation statistics form based on HBase stores and the method calculated |
CN105988995B (en) * | 2015-01-27 | 2019-05-24 | 杭州海康威视数字技术股份有限公司 | A method of based on HFile batch load data |
CN105094695B (en) * | 2015-06-29 | 2018-09-04 | 浪潮(北京)电子信息产业有限公司 | A kind of storage method and system |
CN105426466A (en) * | 2015-11-16 | 2016-03-23 | 天津南大通用数据技术股份有限公司 | Method and apparatus for increasing accurate query speed of packing database |
CN105354323A (en) * | 2015-11-16 | 2016-02-24 | 天津南大通用数据技术股份有限公司 | Method and device for increasing precise inquiry speed of columnar storage database by using two-stage filtration |
CN105677826A (en) * | 2016-01-04 | 2016-06-15 | 博康智能网络科技股份有限公司 | Resource management method for massive unstructured data |
CN105956106B (en) * | 2016-05-04 | 2019-12-13 | 北京思特奇信息技术股份有限公司 | method and system for accessing big data based on memory database and Hbase |
CN106021491B (en) * | 2016-05-20 | 2019-10-08 | 天津海量信息技术股份有限公司 | Near-realtime data storage method based on hdfs |
CN106469225B (en) * | 2016-09-28 | 2019-04-16 | 厦门嵘拓物联科技有限公司 | It is a kind of intelligence workshop management in magnanimity manufaturing data access method |
CN106528819A (en) * | 2016-11-16 | 2017-03-22 | 北京集奥聚合科技有限公司 | Method and system for reading and writing time series data by HBase |
CN108932287B (en) * | 2018-05-22 | 2019-11-29 | 广东技术师范大学 | A kind of mass small documents wiring method based on Hadoop |
CN110532425B (en) * | 2019-08-19 | 2022-04-01 | 深圳市网心科技有限公司 | Video data distributed storage method and device, computer equipment and storage medium |
CN112256634B (en) * | 2020-10-14 | 2024-03-26 | 杭州当虹科技股份有限公司 | Http-based low-memory large file analysis method |
CN115658626B (en) * | 2022-12-26 | 2023-03-07 | 成都数默科技有限公司 | Distributed network small file storage management method |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101866359A (en) * | 2010-06-24 | 2010-10-20 | 北京航空航天大学 | Small file storage and visit method in avicade file system |
CN102662992A (en) * | 2012-03-14 | 2012-09-12 | 北京搜狐新媒体信息技术有限公司 | Method and device for storing and accessing massive small files |
CN102902716A (en) * | 2012-08-27 | 2013-01-30 | 苏州两江科技有限公司 | Storage system based on Hadoop distributed computing platform |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8768980B2 (en) * | 2009-11-02 | 2014-07-01 | Stg Interactive S.A. | Process for optimizing file storage systems |
Non-Patent Citations (2)
Title |
---|
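Xuhui Liu et al. Implementing WebGIS on Hadoop: A Case Study of Improving Small File I/O Performance on HDFS. Cluster Computing and Workshops, 2009, pp. 1-8. * |
Zhao Yuelong (赵跃龙) et al. Research on a Performance-Optimized Storage and Access Strategy for Small Files (一种性能优化的小文件存储访问策略的研究). Journal of Computer Research and Development (计算机研究与发展), 2012, Vol. 49, No. 7, pp. 1579-1585. * |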
Also Published As
Publication number | Publication date |
---|---|
CN103246700A (en) | 2013-08-14 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN103246700B (en) | Mass small documents low delay based on HBase storage method | |
CN106484877B (en) | A kind of document retrieval system based on HDFS | |
CN108053863B (en) | Mass medical data storage system and data storage method suitable for large and small files | |
CN104536959B (en) | A kind of optimization method of Hadoop accessing small high-volume files | |
US10037341B1 (en) | Nesting tree quotas within a filesystem | |
WO2018064962A1 (en) | Data storage method, electronic device and computer non-volatile storage medium | |
CN103327052B (en) | Date storage method and system and data access method and system | |
CN110383261A (en) | Stream for multithread storage device selects | |
CN101674334B (en) | Access control method of network storage equipment | |
CN110291518A (en) | Merge tree garbage index | |
CN110268394A (en) | KVS tree | |
KR100856245B1 (en) | File system device and method for saving and seeking file thereof | |
CN110268399A (en) | Merging tree for attended operation is modified | |
CN103812939A (en) | Big data storage system | |
CN105787093B (en) | A kind of construction method of the log file system based on LSM-Tree structure | |
WO2011053843A3 (en) | Fixed content storage within a partitioned content platform using namespaces | |
US8095678B2 (en) | Data processing | |
CN111427847B (en) | Indexing and querying method and system for user-defined metadata | |
CN102024019B (en) | Suffix tree based catalog organizing method in distributed file system | |
CN101641695A (en) | Resource inserts filtering system and for the database structure that uses with resource access filtering system | |
CN100424699C (en) | Attribute extensible object file system | |
CN106407355A (en) | Data storage method and device | |
CN109542861A (en) | File management method, device and system | |
CN104182487A (en) | Unified storage method supporting various storage modes | |
GB2439577A (en) | Storing data in streams of varying size |
Legal Events
Code | Title |
---|---|
C06 | Publication |
PB01 | Publication |
C10 | Entry into substantive examination |
SE01 | Entry into force of request for substantive examination |
C14 | Grant of patent or utility model |
GR01 | Patent grant |