CN109614373A

CN109614373A - Small-file storage architecture and methods for storing and reading small files

Info

Publication number: CN109614373A
Application number: CN201811392837.XA
Authority: CN
Inventors: Hu Xiang (胡翔)
Original assignee: Anhui Cloud Finance Information Technology Co Ltd
Current assignee: Anhui Cloud Finance Information Technology Co Ltd
Priority date: 2018-11-21
Filing date: 2018-11-21
Publication date: 2019-04-12

Abstract

The invention discloses a small-file storage architecture and corresponding methods for storing and reading small files, and relates to the field of big data technology. The storage architecture of the invention comprises a client layer, an access layer, and a storage layer. The storage method comprises: S01: the user uploads a file to be stored; S02: the file size is checked, and if it is greater than a set threshold, go to step S04, otherwise go to step S03; S03: the file request is parsed further; S04: the HDFS file-write interface is called to write the file; S05: the file information is written to the corresponding table in HBase storage. When a file is written, small files are stored directly in the HBase table structure; when a file is read, HBase's own indexing mechanism is used, so read efficiency is high. Compared with HDFS, far less metadata management is needed for large numbers of small files, so storage and read efficiency are high.

Description

Small-file storage architecture and methods for storing and reading small files

Technical field

The invention belongs to the field of big data technology, and more particularly relates to a small-file storage architecture and to methods for storing and reading small files with it.

Background art

With the continuous development of the Internet, digital information is growing explosively, and processing and storing massive data efficiently has become an urgent problem. Because it is open source and easy to use, the Hadoop distributed platform has become the preferred option for managing and processing massive data. It uses the Hadoop Distributed File System (HDFS) as its underlying storage and provides various services through the MapReduce computing framework. It allows users to build Hadoop clusters on inexpensive hardware and to use components such as HBase to carry out the processing and analysis of massive data, thereby providing a low-cost distributed storage solution for processing and applying very large data sets.

HDFS is designed primarily for large files and is ill-suited to storing massive numbers of small files. Small files are files much smaller than the HDFS block size (64 MB by default). Storing small files in HDFS raises the following specific problems: (1) massive numbers of small files inevitably make the metadata huge and consume the memory of the metadata server; (2) small-file storage and read efficiency is poor, because HDFS was originally designed to store very large files with high data throughput, at the cost of some latency. In practice, however, small files are everywhere, for example log files produced by individual applications and small files produced by web applications. When massive numbers of small files are stored in HDFS, their metadata occupies a large amount of memory on the NameNode and places an extreme load on it. Moreover, every access to a small file must first obtain information such as the file location from the NameNode, so concurrent access to large numbers of small files inevitably makes the NameNode a bottleneck and seriously degrades the performance of the whole HDFS file system. A small-file storage architecture with corresponding storage and reading methods that solves the above problems is therefore of great significance.
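To make the NameNode memory argument concrete, the following Python sketch estimates NameNode heap usage. It assumes the commonly cited rule of thumb of roughly 150 bytes of heap per namespace object, where each file inode and each block counts as one object; that figure does not come from this patent, and real usage varies with the Hadoop version and path lengths. The 25 million files match the size of the test data set described in the embodiment.

```python
# Rough NameNode heap estimate for storing many small files directly in HDFS.
# ASSUMPTION: ~150 bytes of NameNode heap per namespace object (file inode or
# block). This is a rule of thumb, not a figure taken from the patent.
BYTES_PER_OBJECT = 150


def namenode_heap_bytes(num_files: int, blocks_per_file: int = 1) -> int:
    """Estimate NameNode heap used by num_files files of blocks_per_file blocks each."""
    objects = num_files * (1 + blocks_per_file)  # one inode plus its blocks
    return objects * BYTES_PER_OBJECT


if __name__ == "__main__":
    # Each small file is smaller than one block, so it occupies exactly one block.
    n = 25_000_000
    gib = namenode_heap_bytes(n) / 2**30
    print(f"{n:,} small files -> about {gib:.1f} GiB of NameNode heap")
```

The point of the estimate is that the cost scales with the number of files, not with their total size, which is why many tiny files are disproportionately expensive for the NameNode.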

Summary of the invention

The purpose of the present invention is to provide a small-file storage architecture and methods for storing and reading small files in which small files are stored directly in the HBase table structure when they are written and are read using HBase's own indexing mechanism. This solves the problem of low read efficiency and, compared with HDFS, avoids the metadata management otherwise required for large numbers of small files, which makes storage and reading inefficient.

To solve the above technical problems, the present invention is implemented through the following technical solutions:

In the small-file storage architecture of the invention, the storage architecture comprises, from top to bottom, a client layer, an access layer, and a storage layer; the storage layer includes HBase storage and HDFS storage;

The client layer interacts with the access layer using the HTTP or HTTPS protocol;

The access layer includes two classes of servers, the Up/Down Server and the CGI Proxy Server, as well as an HBase file interface. Through the HBase file interface, the access layer calls an externally exposed, VFS-like encapsulated interface and interacts with the HBase storage. The access layer is responsible for receiving users' file operation requests, forwarding each request to the corresponding server, and interacting with the storage layer to read and write files;

The HBase storage is used to store file information and file content.

Further, the Up/Down Server is mainly responsible for handling first-class file operations from the user side, including upload and download. It exchanges files with the storage layer through the HBase file interface and also reads and writes data with the HDFS storage of the storage layer.

Further, the CGI Proxy Server is responsible for handling second-class file operations from the user side, including traversing file directories, creating directories, deleting directories, copying files, deleting files, and renaming files. It interacts with the HBase file interface, which in turn reads from and writes to the storage layer.
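The division of work between the two access-layer servers can be sketched as a simple dispatch table. This is a minimal Python illustration, not the patent's implementation; the operation names (`upload`, `mkdir`, `rename`, and so on) are hypothetical labels for the operations listed above.

```python
from enum import Enum


class Server(Enum):
    UP_DOWN = "Up/Down Server"
    CGI_PROXY = "CGI Proxy Server"


# Hypothetical operation names; the patent describes these operations only in prose.
FIRST_CLASS_OPS = {"upload", "download"}
SECOND_CLASS_OPS = {"list_dir", "mkdir", "rmdir", "copy", "delete", "rename"}


def route(operation: str) -> Server:
    """Pick the access-layer server responsible for a client file operation."""
    if operation in FIRST_CLASS_OPS:
        return Server.UP_DOWN  # data transfer: may touch HBase or HDFS
    if operation in SECOND_CLASS_OPS:
        return Server.CGI_PROXY  # namespace operations via the HBase file interface
    raise ValueError(f"unsupported file operation: {operation!r}")
```

Keeping the two sets disjoint means every supported request reaches exactly one server, which mirrors the split between data transfer and namespace management described above.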

A storage method for the small-file storage architecture comprises the following steps:

S01: the user uploads a file to be stored;

S02: the Up/Down Server of the access layer checks the file size; if the file size is greater than the set threshold, go to step S04; otherwise, go to step S03;

S03: the Up/Down Server further parses the file request and obtains information such as the user identifier and the file upload path. Using "user identifier_file" as the row key, it searches the record table in the HBase storage of the storage layer. If a record exists, a file with the same name has already been uploaded to the target upload path; in this case, a prompt is returned to the client layer notifying the user side that the file has already been uploaded. Otherwise, the Up/Down Server sends the HBase storage a request to create the corresponding file record and write the file, and goes to step S05;

S04: the Up/Down Server directly calls the HDFS file-write interface to write the file;

S05: the file information is written to the corresponding table in the HBase storage.
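Steps S01 to S05 can be sketched as follows. This minimal Python illustration uses in-memory dictionaries in place of the HBase record table and HDFS, a hypothetical 10 MiB size threshold (the patent does not give a value), and hypothetical column names `file:size` and `file:content`. It also assumes that large files are kept only in HDFS, which is consistent with the read method falling back to HDFS when no HBase record is found.

```python
from dataclasses import dataclass, field

# HYPOTHETICAL threshold; the patent only speaks of "a set threshold".
SMALL_FILE_THRESHOLD = 10 * 1024 * 1024  # 10 MiB


@dataclass
class Storage:
    """In-memory stand-ins for the HBase record table and for HDFS (assumption)."""
    hbase_table: dict = field(default_factory=dict)  # row key -> {column: value}
    hdfs: dict = field(default_factory=dict)         # key -> file bytes


def row_key(user_id: str, path: str) -> str:
    # Row key of the form "user identifier_file", as described in step S03.
    return f"{user_id}_{path}"


def store(storage: Storage, user_id: str, path: str, data: bytes,
          threshold: int = SMALL_FILE_THRESHOLD) -> str:
    key = row_key(user_id, path)
    # S02: route the upload by file size.
    if len(data) > threshold:
        # S04: a large file takes the ordinary HDFS write path.
        storage.hdfs[key] = data
        return "stored in HDFS"
    # S03: look for an existing record under the same row key.
    if key in storage.hbase_table:
        return "file already uploaded"  # prompt returned to the client layer
    # S05: write file information and file content into one HBase record.
    storage.hbase_table[key] = {"file:size": len(data), "file:content": data}
    return "stored in HBase"
```

Keying both stores by the same `user_path` string is a simplification for illustration; a real deployment would go through the HBase and HDFS client libraries instead of dictionaries.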

A reading method for the small-file storage architecture comprises the following steps:

T01: the Up/Down Server receives a file read request and obtains the "user identifier_file" path information;

T02: the Up/Down Server searches the HBase data table using the obtained information; if it locates the record of the file the user requested, go to step T03; otherwise, go to step T04;

T03: the Up/Down Server reads the column family in the HBase file record and returns the data of the "file content" column directly to the user side through the access layer;

T04: the conventional HDFS file-read interface is called to read the file.
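A matching sketch of steps T01 to T04, under the same assumptions as a dictionary-based illustration: dictionaries in place of HBase and HDFS, a hypothetical `file:content` column, and the same `user_path` key for both stores.

```python
from typing import Optional


def read_file(hbase_table: dict, hdfs: dict, user_id: str, path: str) -> Optional[bytes]:
    # T01: build the "user identifier_file" lookup key from the request.
    key = f"{user_id}_{path}"
    # T02: look the key up in the HBase data table (a dict in this sketch).
    record = hbase_table.get(key)
    if record is not None:
        # T03: return the "file content" column directly to the caller.
        return record["file:content"]
    # T04: fall back to the conventional HDFS read path; None if absent there too.
    return hdfs.get(key)
```

Checking HBase first means a small file is served with a single keyed lookup, and only files without an HBase record pay the cost of the HDFS path.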

The invention has the following advantages:

When a file is written, small files are stored directly in the HBase table structure, and when a file is read, HBase's own indexing mechanism is used, which gives high read efficiency. Compared with HDFS, the metadata management required for large numbers of small files is reduced, which gives high storage and read efficiency.

Of course, a product implementing the present invention does not necessarily need to achieve all of the above advantages at the same time.

Brief description of the drawings

To describe the technical solutions of the embodiments of the present invention more clearly, the drawings required for describing the embodiments are briefly introduced below. Evidently, the drawings described below show only some embodiments of the invention, and a person of ordinary skill in the art may derive other drawings from them without creative effort.

Fig. 1 is a structural schematic diagram of the small-file storage architecture of the invention;

Fig. 2 is a schematic diagram of the steps of the working method of the CGI Proxy Server of the invention;

Fig. 3 is a schematic diagram of the steps of the working method of the Up/Down Server of the invention;

Fig. 4 is a schematic diagram of the steps of the file-reading method of the Up/Down Server of the invention;

Fig. 5 is a bar chart of the experimental results for writing small files with the invention;

Fig. 6 is a bar chart of the experimental results for reading small files with the invention.

Detailed description of the embodiments

The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings. Evidently, the described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments without creative effort fall within the protection scope of the invention.

Referring to Fig. 1, the small-file storage architecture of the invention comprises, from top to bottom, a client layer, an access layer, and a storage layer; the storage layer includes HBase storage and HDFS storage;

The client layer interacts with the access layer using the HTTP or HTTPS protocol;

The access layer includes two classes of servers, the Up/Down Server and the CGI Proxy Server, as well as an HBase file interface. Through the HBase file interface, the access layer calls an externally exposed, VFS-like encapsulated interface and interacts with the HBase storage. The access layer is responsible for receiving users' file operation requests, forwarding each request to the corresponding server, and interacting with the storage layer to read and write files;

The HBase storage is used to store file information and file content.

The Up/Down Server is mainly responsible for handling first-class file operations from the user side, including upload and download. It exchanges files with the storage layer through the HBase file interface and also reads and writes data with the HDFS storage of the storage layer.

The CGI Proxy Server is responsible for handling second-class file operations from the user side, including traversing file directories, creating directories, deleting directories, copying files, deleting files, and renaming files. It interacts with the HBase file interface, which in turn reads from and writes to the storage layer.

As shown in Fig. 2, the CGI Proxy Server receives file operation requests from the client side over HTTP or HTTPS and then internally calls the HBase file interface API to carry out these operations;

As shown in Fig. 3, for each upload request it receives, the Up/Down Server first checks whether the file size exceeds the set threshold. If the file is judged to be a large file, it calls the HDFS file-write API to upload the file to HDFS; if it is a small file, it internally calls the HBase file interface to upload it;

The HBase file interface wraps HBase in VFS-like file and directory operation functions such as mkdir and rm, so that a change in the underlying file-operation system does not require any change to the interfaces called by the upper layer. The interface receives file operation requests from the Up/Down Server and the CGI Proxy Server and performs the necessary preprocessing, including collecting file information; it then calls the corresponding HBase file read/write API to write the file information and file content into HBase, or to read the file content and return it to the upper-layer service;
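The VFS-like HBase file interface might look roughly like the class below. It is a hypothetical, in-memory sketch: a sorted list of row keys imitates the lexicographic ordering of HBase rows, so that directory listing and recursive deletion become prefix scans. The `mkdir`/`rm` method names follow the text, while the `user_path` row-key layout, the trailing-slash directory marker, and the column names are assumptions.

```python
import bisect


class HBaseFileInterface:
    """Hypothetical VFS-like facade over a row store with sorted row keys."""

    def __init__(self) -> None:
        self._keys: list[str] = []        # sorted row keys, like an HBase table
        self._rows: dict[str, dict] = {}  # row key -> columns

    def _put(self, key: str, columns: dict) -> None:
        if key not in self._rows:
            bisect.insort(self._keys, key)
        self._rows[key] = columns

    def _delete(self, key: str) -> None:
        if key in self._rows:
            del self._rows[key]
            self._keys.pop(bisect.bisect_left(self._keys, key))

    def _scan_prefix(self, prefix: str) -> list[str]:
        # Prefix scan over sorted keys, analogous to an HBase range scan.
        start = bisect.bisect_left(self._keys, prefix)
        out = []
        for key in self._keys[start:]:
            if not key.startswith(prefix):
                break
            out.append(key)
        return out

    def mkdir(self, user: str, path: str) -> None:
        # A directory is a marker row whose key ends with "/".
        self._put(f"{user}_{path.rstrip('/')}/", {"file:type": "dir"})

    def write(self, user: str, path: str, data: bytes) -> None:
        self._put(f"{user}_{path}", {"file:type": "file", "file:content": data})

    def read(self, user: str, path: str) -> bytes:
        return self._rows[f"{user}_{path}"]["file:content"]

    def ls(self, user: str, directory: str) -> list[str]:
        # Immediate children of a directory: first path component after the prefix.
        prefix = f"{user}_{directory.rstrip('/')}/"
        names = set()
        for key in self._scan_prefix(prefix):
            rest = key[len(prefix):]
            if rest:
                names.add(rest.split("/", 1)[0])
        return sorted(names)

    def rm(self, user: str, path: str) -> None:
        # Remove a file row, or a directory marker plus every row beneath it.
        self._delete(f"{user}_{path}")
        for key in self._scan_prefix(f"{user}_{path.rstrip('/')}/"):
            self._delete(key)
```

Because callers only see `mkdir`, `write`, `read`, `ls`, and `rm`, the backing store could be swapped without touching the Up/Down Server or the CGI Proxy Server, which is the decoupling the paragraph above describes.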

S01: the user uploads a file to be stored;

S02: the Up/Down Server of the access layer checks the file size; if the file size is greater than the set threshold, go to step S04; otherwise, go to step S03;

S03: the Up/Down Server further parses the file request and obtains information such as the user identifier and the file upload path. Using "user identifier_file" as the row key, it searches the record table in the HBase storage of the storage layer. If a record exists, a file with the same name has already been uploaded to the target upload path; in this case, a prompt is returned to the client layer notifying the user side that the file has already been uploaded. Otherwise, the Up/Down Server sends the HBase storage a request to create the corresponding file record and write the file, and goes to step S05;

S04: the Up/Down Server directly calls the HDFS file-write interface to write the file;

S05: the file information is written to the corresponding table in the HBase storage.

As shown in Fig. 4, a reading method for the small-file storage architecture comprises the following steps:

T01: the Up/Down Server receives a file read request and obtains the "user identifier_file" path information;

T02: the Up/Down Server searches the HBase data table using the obtained information; if it locates the record of the file the user requested, go to step T03; otherwise, go to step T04;

T03: the Up/Down Server reads the column family in the HBase file record and returns the data of the "file content" column directly to the user side through the access layer;

T04: the conventional HDFS file-read interface is called to read the file.

To verify the efficiency of the above small-file storage scheme, this embodiment performs a comparative read/write test of conventional HDFS and the HBase-based efficient small-file storage method.

This embodiment uses a large number of small files as the experimental test data set: 5,000,000 files each of 1 KB, 2 KB, 4 KB, 8 KB, and 16 KB;
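For scale, the data set described above can be sized with a few lines of arithmetic. The sketch assumes that 1 kByte means 1024 bytes and uses the 64 MB default block size quoted in the background section.

```python
# Size of the test data set: 5,000,000 files at each of five sizes.
FILES_PER_SIZE = 5_000_000
SIZES_KIB = [1, 2, 4, 8, 16]
BLOCK_SIZE_BYTES = 64 * 1024 * 1024  # 64 MB default block size (background section)

total_files = FILES_PER_SIZE * len(SIZES_KIB)                   # 25,000,000 files
total_bytes = FILES_PER_SIZE * sum(SIZES_KIB) * 1024             # 158,720,000,000 bytes
mean_file_kib = sum(SIZES_KIB) / len(SIZES_KIB)                  # 6.2 KiB
largest_vs_block = BLOCK_SIZE_BYTES // (max(SIZES_KIB) * 1024)   # 4096

print(f"{total_files:,} files, {total_bytes / 2**30:.1f} GiB in total")
print(f"mean file size {mean_file_kib} KiB; "
      f"one block could hold {largest_vs_block} of the largest files")
```

Even the largest test file fills only 1/4096 of a default block, so every file in the set is "small" in the sense used by the background section.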

The experimental environment of this embodiment is a 4-node Hadoop cluster. Each node is a server with a 16-core Intel Xeon CPU at 2.4 GHz and 32 GB of memory, and the nodes are connected by a gigabit optical-fiber network. One machine serves as the NameNode, and the remaining 3 machines each serve as both a DataNode and a RegionServer.

The HBase Master and the HDFS NameNode run on the same node, and the ZooKeeper cluster runs on the 3 machines other than the NameNode. Every node runs CentOS 5.4; the Hadoop version is 2.0, the HBase version is HBase-0.96, and the JDK version is 1.6.0_35.

Small-file storage with HDFS and with the HBase-based method was analyzed experimentally; the results are shown in Fig. 5 and Fig. 6;

The experimental results in Fig. 5 show that the write speed with HDFS is 20-30 MB/s, whereas the write speed of the HBase-based efficient storage method is 60-80 MB/s. This is because, when small files are written, the HBase-based method writes directly into the HBase data table, whereas HDFS must also manage metadata, data blocks, and similar information. The method proposed by the invention therefore improves write speed markedly;

The experimental results in Fig. 6 show that, for small-file reads, the read speed with HDFS is 50-60 MB/s, whereas the read speed of the HBase-based efficient storage method is 111-120 MB/s. HDFS must first access the metadata and then interact with the DataNode to read the data, whereas HBase uses its own indexing mechanism to read data from the table quickly. The experimental results show that, for small-file reads and writes, the HBase-based efficient small-file storage method proposed by the invention substantially increases the average read speed compared with HDFS, and the improvement in write speed is also evident.
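The speedups implied by the reported ranges can be bounded directly from the numbers above. The sketch only reuses the ranges stated for Fig. 5 and Fig. 6: the pessimistic bound pairs the slowest HBase figure with the fastest HDFS figure, and the optimistic bound does the reverse.

```python
# Throughput ranges reported for Fig. 5 (write) and Fig. 6 (read), in MB/s.
RESULTS_MB_S = {
    "write": {"hdfs": (20, 30), "hbase": (60, 80)},
    "read": {"hdfs": (50, 60), "hbase": (111, 120)},
}


def speedup_bounds(op: str) -> tuple[float, float]:
    """Return (pessimistic, optimistic) HBase-over-HDFS speedup for op."""
    hdfs_lo, hdfs_hi = RESULTS_MB_S[op]["hdfs"]
    hbase_lo, hbase_hi = RESULTS_MB_S[op]["hbase"]
    return hbase_lo / hdfs_hi, hbase_hi / hdfs_lo


def midpoint_speedup(op: str) -> float:
    """Ratio of the range midpoints, a single summary figure for op."""
    hdfs, hbase = RESULTS_MB_S[op]["hdfs"], RESULTS_MB_S[op]["hbase"]
    return (sum(hbase) / 2) / (sum(hdfs) / 2)


for op in ("write", "read"):
    lo, hi = speedup_bounds(op)
    print(f"{op}: {lo:.2f}x to {hi:.2f}x, midpoint {midpoint_speedup(op):.2f}x")
```

On these figures the HBase-based method writes roughly 2 to 4 times faster and reads roughly 1.85 to 2.4 times faster than HDFS; the midpoint ratios are 2.8 for writing and 2.1 for reading.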

The HBase-based efficient small-file storage method proposed by the invention stores small files directly in the HBase table structure when they are written and achieves efficient reads using HBase's own indexing mechanism. Compared with HDFS, the method reduces the metadata management that HDFS needs for large numbers of small files. In summary, the method is efficient and feasible for reading and writing massive numbers of small files. Future work will study a storage scheme that is equally efficient for both large and small files in massive data.

In the description of this specification, reference to the terms "one embodiment", "example", "specific example", and the like means that the particular features, structures, materials, or characteristics described in connection with that embodiment or example are included in at least one embodiment or example of the invention. In this specification, schematic uses of these terms do not necessarily refer to the same embodiment or example. Moreover, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.

The preferred embodiments of the invention disclosed above are intended only to help explain the invention. They do not describe every detail exhaustively, nor do they limit the invention to the specific embodiments described. Evidently, many modifications and variations are possible in light of this specification. These embodiments were selected and described in detail to better explain the principles and practical applications of the invention, so that those skilled in the art can better understand and use it. The invention is limited only by the claims and their full scope and equivalents.

Claims

1. A small-file storage architecture, characterized in that the storage architecture comprises, from top to bottom, a client layer, an access layer, and a storage layer, the storage layer including HBase storage and HDFS storage;

the access layer includes two classes of servers, the Up/Down Server and the CGI Proxy Server, and an HBase file interface; through the HBase file interface, the access layer calls an externally exposed, VFS-like encapsulated interface and interacts with the HBase storage; the access layer is responsible for receiving users' file operation requests, forwarding each request to the corresponding server, and interacting with the storage layer to read and write files;

2. The small-file storage architecture according to claim 1, characterized in that the Up/Down Server is mainly responsible for handling first-class file operations from the user side, including upload and download, exchanges files with the storage layer through the HBase file interface, and reads and writes data with the HDFS storage of the storage layer.

3. The small-file storage architecture according to claim 1, characterized in that the CGI Proxy Server is responsible for handling second-class file operations from the user side, including traversing file directories, creating directories, deleting directories, copying files, deleting files, and renaming files, and interacts with the HBase file interface, the HBase file interface being used to read from and write to the storage layer.

4. A storage method for the small-file storage architecture according to any one of claims 1-3, characterized by comprising the following steps:

S01: the user uploads a file to be stored;

S02: the Up/Down Server of the access layer checks the file size; if the file size is greater than the set threshold, go to step S04; otherwise, go to step S03;

S03: the Up/Down Server further parses the file request and obtains information such as the user identifier and the file upload path; using "user identifier_file" as the row key, it searches the record table in the HBase storage of the storage layer; if a record exists, a file with the same name has already been uploaded to the target upload path, and a prompt is returned to the client layer notifying the user side that the file has already been uploaded; otherwise, the Up/Down Server sends the HBase storage a request to create the corresponding file record and write the file, and goes to step S05;

S05: the file information is written to the corresponding table in the HBase storage.

5. A reading method for the small-file storage architecture according to any one of claims 1-3, characterized by comprising the following steps:

T01: the Up/Down Server receives a file read request and obtains the "user identifier_file" path information;

T02: the Up/Down Server searches the HBase data table using the obtained information; if it locates the record of the file the user requested, go to step T03; otherwise, go to step T04;

T03: the Up/Down Server reads the column family in the HBase file record and returns the data of the "file content" column directly to the user side through the access layer;

T04: the conventional HDFS file-read interface is called to read the file.