CN109614373A - Small-file storage architecture and storage and read methods - Google Patents
- Publication number
- CN109614373A
- Authority
- CN
- China
- Legal status
- Pending
Abstract
The invention discloses a small-file storage architecture and methods for storing and reading small files, and relates to the field of big data technology. The storage architecture of the invention comprises a client layer, an access layer and a storage layer. The storage method comprises: S01: a user uploads a file to be stored; S02: the file size is checked, and if it is greater than a set threshold, go to step S04; otherwise go to step S03; S03: the file request is parsed further; S04: the HDFS file write interface is called; S05: the file information is written into the corresponding table in HBase storage. When a file is written, the invention stores small files directly in the HBase table structure, and when a file is read, it uses HBase's own indexing mechanism, so reading efficiency is high; compared with HDFS, it reduces the metadata-management work for large numbers of small files, giving high storage and reading efficiency.
Description
Technical field
The invention belongs to the field of big data technology, and more particularly relates to a small-file storage architecture and methods for storing and reading small files.
Background art
With the continuous development of the Internet, digital information is growing explosively, and how to process and store massive data efficiently has become an urgent problem. Owing to its open-source and easy-to-use nature, the Hadoop distributed platform has become the preferred option for managing and processing massive data. It uses the Hadoop Distributed File System (HDFS) as its basic storage and provides various services through the MapReduce computing framework, allowing users to build Hadoop clusters on inexpensive hardware and to complete the processing and analysis of massive data with tools such as HBase, thereby providing a low-cost distributed storage solution for processing and applying very large data sets.
HDFS is designed primarily for large files and is not suitable for storing massive numbers of small files. Small files are files much smaller than the HDFS block size (64 MB by default in early Hadoop releases; later releases default to 128 MB). Storing small files in HDFS causes the following specific problems: 1. Massive numbers of small files inevitably produce huge metadata, consuming the memory of the metadata server. 2. Small-file storage and reading efficiency is poor, because HDFS was originally designed to store very large files with high data throughput at the cost of some latency. In practical applications, however, small files are everywhere, such as the daily files produced by personal applications and the small files generated by web applications. When massive numbers of small files are stored in HDFS, their metadata occupies a large amount of memory on the NameNode and places an extreme load on it. At the same time, every small-file request must first obtain information such as the file's block locations from the NameNode, so concurrent access to large numbers of small files inevitably turns the NameNode into a bottleneck and seriously degrades the performance of the entire HDFS file system. Providing a small-file storage architecture and methods for storing and reading small files that solve the above problems is therefore of great significance.
Summary of the invention
The purpose of the invention is to provide a small-file storage architecture and methods for storing and reading small files. When a file is written, small files are stored directly in the HBase table structure, and when a file is read, HBase's own indexing mechanism is used. This solves the problem of low reading efficiency and, compared with HDFS, solves the problems of the metadata management required for large numbers of small files and of low storage and reading efficiency.
In order to solve the above technical problems, the present invention is achieved by the following technical solutions:
A small-file storage architecture of the invention comprises, from top to bottom, a client layer, an access layer and a storage layer, the storage layer comprising HBase storage and HDFS storage;
The client layer interacts with the access layer over the HTTP or HTTPS protocol;
The access layer comprises two classes of server, the Up/Down Server and the CGI Proxy Server, as well as an HBase file interface;
The access layer calls the HBase file interface, which is encapsulated externally as a VFS-like interface, to interact with the HBase storage. The access layer is responsible for receiving users' file operation requests, forwarding each request to the corresponding server, and interacting with the storage layer to read and write files;
The HBase storage is used to store file information and file content.
Further, the Up/Down Server is mainly responsible for handling the user terminal's first class of file operations, including upload and download; it exchanges files with the storage layer through the HBase file interface, and at the same time reads and writes data with the HDFS storage of the storage layer.
Further, the CGI Proxy Server is responsible for handling the user terminal's second class of file operations, including traversing file directories, creating directories, deleting directories, copying files, deleting files and renaming files; it interacts with the HBase file interface, which performs the read and write interaction with the storage layer.
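The division of work between the two access-layer servers can be sketched as a simple routing table. This is a minimal illustration, not the patent's implementation: the operation names (`upload`, `list_dir`, `mkdir` and so on) and the `route` function are hypothetical labels for the operations listed above, and a real access layer would receive these operations as HTTP or HTTPS requests.

```python
from enum import Enum


class Server(Enum):
    """Access-layer servers named in the architecture."""
    UP_DOWN = "Up/Down Server"      # first class: upload, download
    CGI_PROXY = "CGI Proxy Server"  # second class: directory and file management


# Hypothetical operation names for the operations described in the text.
FIRST_CLASS_OPS = frozenset({"upload", "download"})
SECOND_CLASS_OPS = frozenset({"list_dir", "mkdir", "rmdir", "copy", "delete", "rename"})


def route(operation: str) -> Server:
    """Return the access-layer server that should handle a client file operation."""
    if operation in FIRST_CLASS_OPS:
        return Server.UP_DOWN
    if operation in SECOND_CLASS_OPS:
        return Server.CGI_PROXY
    raise ValueError(f"unsupported file operation: {operation!r}")
```

For example, `route("upload")` returns `Server.UP_DOWN` and `route("rename")` returns `Server.CGI_PROXY`; an unknown operation raises `ValueError` instead of being forwarded silently.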
A storage method for the small-file storage architecture comprises the following steps:
S01: the user uploads a file to be stored;
S02: the Up/Down Server of the access layer checks the file size; if the file size is greater than a set threshold, go to step S04; otherwise go to step S03;
S03: the Up/Down Server parses the file request further and obtains from it information such as the user identifier and the file upload path, then uses the user's "identifier_file" as the row key to search the HBase record table of the storage layer. If a record exists, a file with the same name has already been uploaded to the target upload path, and a prompt is returned to the client layer to inform the user terminal; otherwise, the Up/Down Server sends the HBase storage a request to create the corresponding file record and write the file, and goes to step S05;
S04: the Up/Down Server calls the HDFS file write interface directly;
S05: the file information is written into the corresponding table in HBase storage.
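Steps S02 to S05 amount to a size-based router in front of two stores. The sketch below is a minimal in-memory illustration under stated assumptions: plain dictionaries stand in for the HBase record table and for HDFS, the column names `info:size`, `info:path` and `content:data` and the HDFS path layout are hypothetical, and `threshold_bytes` is a parameter because the source does not give the threshold value. Consistent with read step T02 falling back to HDFS when no record is found, it assumes only small files receive an HBase record.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class SmallFileWriter:
    """In-memory sketch of storage steps S02-S05 (dicts stand in for HBase and HDFS)."""
    threshold_bytes: int  # the "set threshold"; its value is not given in the source
    hbase_table: Dict[str, Dict[str, bytes]] = field(default_factory=dict)
    hdfs: Dict[str, bytes] = field(default_factory=dict)

    @staticmethod
    def row_key(user_id: str, path: str) -> str:
        # "identifier_file": user identifier joined to the upload path.
        return f"{user_id}_{path}"

    def upload(self, user_id: str, path: str, data: bytes) -> str:
        # S02: compare the file size with the threshold.
        if len(data) > self.threshold_bytes:
            # S04: a large file goes straight to HDFS (hypothetical path layout).
            self.hdfs[f"/{user_id}{path}"] = data
            return "hdfs"
        # S03: look up the row key; an existing record means a same-name file
        # was already uploaded to this path, so the client gets a prompt.
        key = self.row_key(user_id, path)
        if key in self.hbase_table:
            return "exists"
        # S05: create the record holding file information and file content.
        self.hbase_table[key] = {
            "info:size": str(len(data)).encode(),
            "info:path": path.encode(),
            "content:data": data,
        }
        return "hbase"
```

With `SmallFileWriter(threshold_bytes=1024)` (an illustrative value only), a 10-byte upload returns `"hbase"`, a second upload to the same path returns `"exists"`, and a 2,048-byte upload returns `"hdfs"`. Against a real cluster the lookup and the write would be an HBase `Get` and `Put` on the row key; a conditional mutation such as HBase's `checkAndPut`/`checkAndMutate` would keep two concurrent uploads of the same path from both passing the existence check.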
A read method for the small-file storage architecture comprises the following steps:
T01: the Up/Down Server receives a file read request and obtains the user's "identifier_file" path information;
T02: the Up/Down Server searches the HBase data table using the obtained information; if it locates the record of the file the user requested, go to step T03, otherwise go to step T04;
T03: the Up/Down Server reads the column family of the HBase file record and returns the data of the "file content" column directly to the user terminal through the access layer;
T04: the traditional HDFS file read interface is called.
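The read steps T01 to T04 can be sketched the same way. Again this is an illustration under assumptions: dictionaries replace the HBase data table and HDFS, `content:data` is a hypothetical name for the "file content" column, and the HDFS path layout is invented for the example.

```python
from typing import Dict, Optional


def read_file(user_id: str, path: str,
              hbase_table: Dict[str, Dict[str, bytes]],
              hdfs: Dict[str, bytes]) -> Optional[bytes]:
    """In-memory sketch of read steps T01-T04 (dicts stand in for HBase and HDFS)."""
    # T01: rebuild the "identifier_file" row key from the request.
    row_key = f"{user_id}_{path}"
    # T02: single keyed lookup (an HBase Get in a real deployment).
    record = hbase_table.get(row_key)
    if record is not None and "content:data" in record:
        # T03: return the "file content" column directly.
        return record["content:data"]
    # T04: fall back to the traditional HDFS read path (stubbed); None if absent.
    return hdfs.get(f"/{user_id}{path}")
```

Because the row key is rebuilt from the request, T02 is one keyed lookup rather than a scan, which is presumably the indexing advantage the text refers to; only on a miss does the request reach HDFS and, with it, the NameNode.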
The invention has the following advantages:
When a file is written, the invention stores small files directly in the HBase table structure, and when a file is read, it uses HBase's own indexing mechanism, which gives high reading efficiency; compared with HDFS, it reduces the metadata-management work for large numbers of small files, which gives high storage and reading efficiency.
Of course, a product implementing the invention does not necessarily need to achieve all of the above advantages at the same time.
Description of the drawings
To explain the technical solutions of the embodiments of the invention more clearly, the drawings needed to describe the embodiments are briefly introduced below. Obviously, the drawings described below are only some embodiments of the invention, and a person of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic structural diagram of the small-file storage architecture of the invention;
Fig. 2 is a schematic diagram of the steps of the working method of the CGI Proxy Server of the invention;
Fig. 3 is a schematic diagram of the steps of the working method of the Up/Down Server of the invention;
Fig. 4 is a schematic diagram of the steps of the file read method of the Up/Down Server of the invention;
Fig. 5 is a bar chart of the experimental results for small-file writes in the invention;
Fig. 6 is a bar chart of the experimental results for small-file reads in the invention.
Detailed embodiments
The technical solutions in the embodiments of the invention are described clearly and completely below with reference to the drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by a person of ordinary skill in the art from the embodiments of the invention without creative effort fall within the protection scope of the invention.
Referring to Fig. 1, the small-file storage architecture of the invention comprises, from top to bottom, a client layer, an access layer and a storage layer, the storage layer comprising HBase storage and HDFS storage;
The client layer interacts with the access layer over the HTTP or HTTPS protocol;
The access layer comprises two classes of server, the Up/Down Server and the CGI Proxy Server, as well as an HBase file interface. The access layer calls the HBase file interface, which is encapsulated externally as a VFS-like interface, to interact with the HBase storage; it is responsible for receiving users' file operation requests, forwarding each request to the corresponding server, and interacting with the storage layer to read and write files;
The HBase storage is used to store file information and file content.
Here, the Up/Down Server is mainly responsible for handling the user terminal's first class of file operations, including upload and download; it exchanges files with the storage layer through the HBase file interface, and at the same time reads and writes data with the HDFS storage of the storage layer.
The CGI Proxy Server is responsible for handling the user terminal's second class of file operations, including traversing file directories, creating directories, deleting directories, copying files, deleting files and renaming files; it interacts with the HBase file interface, which performs the read and write interaction with the storage layer.
As shown in Fig. 2, the CGI Proxy Server receives the client's file operation requests over HTTP or HTTPS and then internally calls the HBase file interface API to carry out these operations;
As shown in Fig. 3, for each upload request it receives, the Up/Down Server first checks whether the file size exceeds the set threshold. If the uploaded file is judged to be a large file, the HDFS file write API is called to upload it to HDFS; if it is a small file, the HBase file interface is called internally to upload it;
The HBase file interface wraps HBase's file and directory operations into VFS-like interface functions such as mkdir and rm, so that a change in the underlying file system does not force a change in the upper-layer calling interface. The interface receives file operation requests from the Up/Down Server and the CGI Proxy Server and performs the necessary preprocessing, including collecting the file information; it then calls the corresponding HBase file read/write API to write the file information and file content into HBase, or to read the file content and return it to the upper-layer service;
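A VFS-like facade of this kind might look as follows. This is a hypothetical sketch, not the patent's interface: a dictionary keyed by absolute path stands in for an HBase table, the class, method and column names are invented for illustration, and in HBase a directory listing would typically be a row-prefix scan. Because callers see only `mkdir`, `write`, `read`, `ls` and `rm`, the backing store could be swapped without changing them, which is the decoupling the paragraph above describes.

```python
from typing import Dict, List


class VfsLikeTable:
    """Hypothetical VFS-style facade over a row-keyed table (a dict stands in for HBase)."""

    def __init__(self) -> None:
        # One row per absolute path; the root directory always exists.
        self._rows: Dict[str, Dict[str, bytes]] = {"/": {"info:type": b"dir"}}

    @staticmethod
    def _parent(path: str) -> str:
        return path.rstrip("/").rsplit("/", 1)[0] or "/"

    def _require_dir(self, path: str) -> None:
        if self._rows.get(path, {}).get("info:type") != b"dir":
            raise FileNotFoundError(f"no such directory: {path}")

    def mkdir(self, path: str) -> None:
        if path in self._rows:
            raise FileExistsError(path)
        self._require_dir(self._parent(path))
        self._rows[path] = {"info:type": b"dir"}

    def write(self, path: str, data: bytes) -> None:
        self._require_dir(self._parent(path))
        self._rows[path] = {"info:type": b"file", "content:data": data}

    def read(self, path: str) -> bytes:
        return self._rows[path]["content:data"]

    def ls(self, path: str) -> List[str]:
        # Direct children only; against HBase this would be a row-prefix scan.
        self._require_dir(path)
        prefix = path.rstrip("/") + "/"
        return sorted(p for p in self._rows
                      if p != path and p.startswith(prefix) and "/" not in p[len(prefix):])

    def rm(self, path: str) -> None:
        # Delete the entry and, for a directory, every row beneath it.
        if path == "/":
            raise PermissionError("refusing to remove the root directory")
        if path not in self._rows:
            raise FileNotFoundError(path)
        prefix = path.rstrip("/") + "/"
        for p in [p for p in self._rows if p == path or p.startswith(prefix)]:
            del self._rows[p]
```

Keying rows by full path keeps each file a single-row lookup, at the cost of making directory renames rewrite every row under the directory; that trade-off is a property of this sketch, not something the source specifies.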
A storage method for the small-file storage architecture comprises the following steps:
S01: the user uploads a file to be stored;
S02: the Up/Down Server of the access layer checks the file size; if the file size is greater than a set threshold, go to step S04; otherwise go to step S03;
S03: the Up/Down Server parses the file request further and obtains from it information such as the user identifier and the file upload path, then uses the user's "identifier_file" as the row key to search the HBase record table of the storage layer. If a record exists, a file with the same name has already been uploaded to the target upload path, and a prompt is returned to the client layer to inform the user terminal; otherwise, the Up/Down Server sends the HBase storage a request to create the corresponding file record and write the file, and goes to step S05;
S04: the Up/Down Server calls the HDFS file write interface directly;
S05: the file information is written into the corresponding table in HBase storage.
As shown in Fig. 4, a read method for the small-file storage architecture comprises the following steps:
T01: the Up/Down Server receives a file read request and obtains the user's "identifier_file" path information;
T02: the Up/Down Server searches the HBase data table using the obtained information; if it locates the record of the file the user requested, go to step T03, otherwise go to step T04;
T03: the Up/Down Server reads the column family of the HBase file record and returns the data of the "file content" column directly to the user terminal through the access layer;
T04: the traditional HDFS file read interface is called.
To verify the efficiency of the above small-file storage scheme, this embodiment ran comparative read and write tests of traditional HDFS and of the HBase-based efficient small-file storage method.
This embodiment uses a large number of small files, with sizes from 1 KB to 16 KB, as the experimental test data set: 5,000,000 files each of 1 KB, 2 KB, 4 KB, 8 KB and 16 KB;
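The size of this test set can be checked with a little arithmetic, and the same numbers give a rough idea of the NameNode load the background section warns about. The figure of about 150 bytes of NameNode heap per namespace object (file, directory or block) is a commonly quoted rule of thumb, not a number from this document, and 1 KB is taken as 1,024 bytes.

```python
KIB = 1024                      # assumption: 1 KB = 1,024 bytes
SIZES_KIB = [1, 2, 4, 8, 16]    # file sizes in the test set
FILES_PER_SIZE = 5_000_000      # 5,000,000 files of each size

# Assumption: ~150 bytes of NameNode heap per namespace object (rule of thumb).
BYTES_PER_OBJECT = 150
# Each file is far below the block size: one file object plus one block object.
OBJECTS_PER_SMALL_FILE = 2

total_files = FILES_PER_SIZE * len(SIZES_KIB)
total_bytes = FILES_PER_SIZE * sum(SIZES_KIB) * KIB
namenode_heap_bytes = total_files * OBJECTS_PER_SMALL_FILE * BYTES_PER_OBJECT

print(f"files: {total_files:,}")
print(f"data: {total_bytes / 2**30:.1f} GiB")
print(f"NameNode heap (estimate): {namenode_heap_bytes / 1e9:.1f} GB")
```

This gives 25,000,000 files and about 148 GiB of data; under the 150-byte assumption, keeping them as individual HDFS files would cost roughly 7.5 GB of NameNode heap for metadata alone, even though the average file is only about 6 KB.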
The experimental environment of this embodiment is a 4-node Hadoop cluster. Each node is a server with a 16-core Intel Xeon CPU at 2.4 GHz and 32 GB of memory, connected by a gigabit fiber network. One machine acts as the NameNode, and the remaining 3 machines all act as DataNodes and RegionServers.
The HBase Master and the HDFS NameNode run on the same node, and the ZooKeeper cluster runs on the 3 machines other than the NameNode. Every node runs CentOS 5.4, with Hadoop 2.0, HBase 0.96 and JDK 1.6.0_35.
The HDFS-based and HBase-based storage methods were analyzed experimentally for small-file storage; the results are shown in Fig. 5 and Fig. 6;
The experimental results in Fig. 5 show that the write speed of HDFS is 20-30 MB/s, while that of the HBase-based efficient storage method is 60-80 MB/s. This is because the HBase-based method writes small files directly into the HBase data table, whereas HDFS must also manage information such as metadata and data blocks; the method proposed by the invention therefore clearly improves write speed;
The experimental results in Fig. 6 show that for small-file reads, HDFS reaches 50-60 MB/s, while the HBase-based efficient storage method reaches 111-120 MB/s. HDFS must first access the metadata and then interact with a DataNode to read the data, whereas HBase uses its own indexing mechanism to read the information in the data table quickly. The experimental results show that, for small-file reads and writes, the HBase-based efficient small-file storage method proposed by the invention significantly increases the average read speed compared with HDFS, and also clearly improves the write speed.
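The reported ranges can be reduced to rough speedup factors. The sketch below simply compares range midpoints and range extremes taken from the text; these ratios are derived here and are not figures stated by the authors.

```python
from typing import Dict, Tuple

# Throughput ranges in MB/s as reported for Fig. 5 (write) and Fig. 6 (read).
RESULTS: Dict[str, Dict[str, Tuple[float, float]]] = {
    "write": {"HDFS": (20, 30), "HBase": (60, 80)},
    "read": {"HDFS": (50, 60), "HBase": (111, 120)},
}


def midpoint(bounds: Tuple[float, float]) -> float:
    low, high = bounds
    return (low + high) / 2


def midpoint_speedup(op: str) -> float:
    """HBase midpoint throughput divided by HDFS midpoint throughput."""
    r = RESULTS[op]
    return midpoint(r["HBase"]) / midpoint(r["HDFS"])


def speedup_bounds(op: str) -> Tuple[float, float]:
    """Smallest and largest ratios consistent with the two reported ranges."""
    r = RESULTS[op]
    return r["HBase"][0] / r["HDFS"][1], r["HBase"][1] / r["HDFS"][0]
```

Comparing midpoints gives about 2.8x for writes (70 / 25 MB/s) and 2.1x for reads (115.5 / 55 MB/s); the extremes of the ranges bound the write speedup between 2.0x and 4.0x and the read speedup between 1.85x and 2.4x.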
The HBase-based efficient small-file storage method proposed by the invention stores small files directly in the HBase table structure when they are written, and uses HBase's own indexing mechanism to read them efficiently. Compared with HDFS, the method reduces HDFS's metadata-management work for large numbers of small files. In conclusion, the method is efficient and feasible for reading and writing massive numbers of small files; future work will continue to study efficient storage schemes that handle both large and small files in massive data.
In the description of this specification, references to the terms "one embodiment", "example", "specific example" and the like mean that a particular feature, structure, material or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of the invention. In this specification, schematic uses of these terms do not necessarily refer to the same embodiment or example. Moreover, the particular features, structures, materials or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
The preferred embodiments of the invention disclosed above are intended only to help explain the invention. They do not describe every detail exhaustively, nor do they limit the invention to the specific implementations described. Obviously, many modifications and variations can be made in light of this specification. These embodiments were chosen and described in detail to better explain the principles and practical application of the invention, so that those skilled in the art can better understand and use it. The invention is limited only by the claims and their full scope and equivalents.
Claims (5)
1. A small-file storage architecture, characterized in that the storage architecture comprises, from top to bottom, a client layer, an access layer and a storage layer, the storage layer comprising HBase storage and HDFS storage;
the client layer interacts with the access layer over the HTTP or HTTPS protocol;
the access layer comprises two classes of server, the Up/Down Server and the CGI Proxy Server, as well as an HBase file interface; the access layer calls the HBase file interface, which is encapsulated externally as a VFS-like interface, to interact with the HBase storage, and the access layer is responsible for receiving users' file operation requests, forwarding each request to the corresponding server, and interacting with the storage layer to read and write files;
the HBase storage is used to store file information and file content.
2. The small-file storage architecture according to claim 1, characterized in that the Up/Down Server is mainly responsible for handling the user terminal's first class of file operations, including upload and download; it exchanges files with the storage layer through the HBase file interface, and at the same time reads and writes data with the HDFS storage of the storage layer.
3. The small-file storage architecture according to claim 1, characterized in that the CGI Proxy Server is responsible for handling the user terminal's second class of file operations, including traversing file directories, creating directories, deleting directories, copying files, deleting files and renaming files, and interacts with the HBase file interface, which performs the read and write interaction with the storage layer.
4. A storage method for the small-file storage architecture according to any one of claims 1-3, characterized by comprising the following steps:
S01: the user uploads a file to be stored;
S02: the Up/Down Server of the access layer checks the file size; if the file size is greater than a set threshold, go to step S04; otherwise go to step S03;
S03: the Up/Down Server parses the file request further and obtains from it information such as the user identifier and the file upload path, then uses the user's "identifier_file" as the row key to search the HBase record table of the storage layer; if a record exists, a file with the same name has already been uploaded to the target upload path, and a prompt is returned to the client layer to inform the user terminal; otherwise, the Up/Down Server sends the HBase storage a request to create the corresponding file record and write the file, and goes to step S05;
S04: the Up/Down Server calls the HDFS file write interface directly;
S05: the file information is written into the corresponding table in HBase storage.
5. A read method for the small-file storage architecture according to any one of claims 1-3, characterized by comprising the following steps:
T01: the Up/Down Server receives a file read request and obtains the user's "identifier_file" path information;
T02: the Up/Down Server searches the HBase data table using the obtained information; if it locates the record of the file the user requested, go to step T03, otherwise go to step T04;
T03: the Up/Down Server reads the column family of the HBase file record and returns the data of the "file content" column directly to the user terminal through the access layer;
T04: the traditional HDFS file read interface is called.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811392837.XA CN109614373A (en) | 2018-11-21 | 2018-11-21 | Small-file storage architecture and storage and read methods |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109614373A (en) | 2019-04-12 |
Family ID: 66004783
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811392837.XA Pending CN109614373A (en) | Small-file storage architecture and storage and read methods | 2018-11-21 | 2018-11-21 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109614373A (en) |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20110276538A1 (en) * | 2010-05-06 | 2011-11-10 | The Go Daddy Group, Inc. | Cloud storage solution for reading and writing files |
CN106484821A (en) * | 2016-09-27 | 2017-03-08 | 浪潮软件集团有限公司 | Hybrid cloud storage method under cloud computing architecture |
- 2018-11-21: application CN201811392837.XA filed in CN (published as CN109614373A), status Pending
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111831208A (en) * | 2019-04-16 | 2020-10-27 | 中移(苏州)软件技术有限公司 | Information processing method and device, terminal equipment and storage medium |
CN111367857A (en) * | 2020-03-03 | 2020-07-03 | 中国联合网络通信集团有限公司 | Data storage method and device, FTP server and storage medium |
CN112684985A (en) * | 2021-01-04 | 2021-04-20 | 北京金山云网络技术有限公司 | Data writing method and device |
CN112684985B (en) * | 2021-01-04 | 2024-04-05 | 北京金山云网络技术有限公司 | Data writing method and device |
CN112835864A (en) * | 2021-02-03 | 2021-05-25 | 北京联创信安科技股份有限公司 | File storage method, device, equipment and storage medium |
CN112835864B (en) * | 2021-02-03 | 2024-02-20 | 北京联创信安科技股份有限公司 | File storage method, device, equipment and storage medium |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20190412 |