CN109614373A - Storage and reading method for a small-file storage architecture - Google Patents

Storage and reading method for a small-file storage architecture

Info

Publication number
CN109614373A
Authority
CN
China
Prior art keywords
file
hbase
storage
server
small files
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201811392837.XA
Other languages
Chinese (zh)
Inventor
胡翔
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Anhui Cloud Finance Information Technology Co Ltd
Original Assignee
Anhui Cloud Finance Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Anhui Cloud Finance Information Technology Co Ltd
Priority to CN201811392837.XA
Publication of CN109614373A
Legal status: Pending

Abstract

The invention discloses a storage and reading method for a small-file storage architecture, and relates to the field of big data technology. The storage architecture comprises a client layer, an access layer and a storage layer. The storage method comprises: S01: the user uploads the file to be stored; S02: the file size is checked; if it exceeds a set threshold, go to step S04, otherwise go to step S03; S03: the file request is parsed further; S04: the HDFS file-write interface is called; S05: the file information is written to the corresponding table in HBase storage. Because small files are stored directly in the HBase table structure at write time and are read through HBase's own indexing mechanism, reads are efficient; and, compared with HDFS, the metadata management work for large numbers of small files is reduced, so storage and read efficiency are high.

Description

Storage and reading method for a small-file storage architecture
Technical field
The invention belongs to the field of big data technology, and in particular relates to a storage and reading method for a small-file storage architecture.
Background
With the continuous development of the Internet, digital information is growing explosively, and how to process and store massive data efficiently has become an urgent problem. Thanks to its open-source and easy-to-use nature, the Hadoop distributed platform has become a preferred option for managing and processing massive data. It uses the Hadoop Distributed File System (HDFS) as its underlying storage and provides various services through the MapReduce computing framework, allowing users to build Hadoop clusters on inexpensive hardware and to run applications such as HBase on them to process and analyze massive data, thereby providing a low-cost distributed storage solution for processing and using very large data sets.
HDFS is designed primarily for large files and is ill-suited to storing massive numbers of small files. Small files are files much smaller than the HDFS block size (64 MB by default). HDFS has the following specific problems when storing small files: 1. Massive numbers of small files inevitably produce a huge volume of metadata, which consumes the memory of the metadata server. 2. Small files are stored and read inefficiently, because HDFS was designed to store very large files with high data throughput, at the cost of some latency. In practice, however, small files are everywhere, for example the log files generated by individual applications and the small files generated by web applications. When massive numbers of small files are stored in HDFS, their metadata occupies a large amount of memory on the NameNode and places an extreme load on it. Moreover, every access to a small file must go through the NameNode to obtain information such as its path, so concurrent access to large numbers of small files inevitably makes the NameNode a bottleneck and seriously degrades the performance of the whole HDFS file system. It is therefore of great significance to provide a storage and reading method for a small-file storage architecture that addresses these problems.
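The metadata problem described above can be made concrete with a back-of-the-envelope estimate. The sketch below is illustrative only: the figure of roughly 150 bytes of NameNode heap per namespace object is a commonly quoted rule of thumb and is an assumption, not a number from this document, and the two workloads are hypothetical.

```python
# Rough NameNode heap estimate (illustrative only).
# ASSUMPTION: about 150 bytes of heap per namespace object, counting one
# object per file plus one per block. This is a rule of thumb, not a value
# taken from this document.
BYTES_PER_OBJECT = 150
BLOCK_SIZE = 64 * 1024 * 1024  # default block size stated in the text


def namenode_heap_bytes(num_files: int, file_size: int) -> int:
    """Estimate heap used by the metadata of num_files files of file_size bytes."""
    blocks_per_file = max(1, -(-file_size // BLOCK_SIZE))  # ceiling division
    return num_files * (1 + blocks_per_file) * BYTES_PER_OBJECT


# Roughly the same amount of data stored two ways (hypothetical workloads):
small = namenode_heap_bytes(10_000_000, 1024)        # ten million 1 KB files
large = namenode_heap_bytes(10, 1024 * 1024 * 1024)  # ten 1 GiB files

print(f"small files: {small / 2**20:.0f} MiB of NameNode heap")
print(f"large files: {large / 2**10:.1f} KiB of NameNode heap")
```

Under these assumptions the metadata cost is driven by the number of files rather than by the volume of data, which is why many small files strain the NameNode.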
Summary of the invention
The purpose of the present invention is to provide a storage and reading method for a small-file storage architecture. Small files are stored directly in the HBase table structure at write time and are read through HBase's own indexing mechanism, which solves the problem of low read efficiency; and, compared with HDFS, the method reduces the metadata management required for large numbers of small files, which solves the problem of low storage and read efficiency.
To solve the above technical problems, the present invention is realized through the following technical solutions:
A small-file storage architecture according to the invention comprises, from top to bottom, a client layer, an access layer and a storage layer, the storage layer comprising HBase storage and HDFS storage;
The client layer interacts with the access layer over HTTP or HTTPS;
The access layer comprises two types of server, the UP/Down Server and the CGI Proxy Server, and an HBase file interface. Through the HBase file interface, the access layer calls an externally encapsulated VFS-like interface and interacts with the HBase storage. The access layer is responsible for receiving users' file operation requests, forwarding each request to the corresponding server, and interacting with the storage layer to read and write files;
The HBase storage stores both file information and file content.
Further, the UP/Down Server is mainly responsible for handling the first class of client file operations, namely upload and download. It exchanges files with the storage layer through the HBase file interface, and also reads and writes data against the HDFS storage of the storage layer.
Further, the CGI Proxy Server is responsible for handling the second class of client file operations, including traversing directories, creating directories, deleting directories, copying files, deleting files and renaming files. It interacts with the HBase file interface, which performs the reads and writes against the storage layer.
A storage method for the small-file storage architecture comprises the following steps:
S01: the user uploads the file to be stored;
S02: the UP/Down Server of the access layer checks the file size; if the file size exceeds the set threshold, go to step S04; otherwise go to step S03;
S03: the UP/Down Server parses the file request further, obtaining information such as the user identifier and the upload path from the request, and looks up the HBase record table of the storage layer using the user's "identifier_file" as the Row Key. If a record exists, a file with the same name has already been uploaded to the target upload path; in that case a prompt is returned to the client layer, prompting the client to upload the file again. Otherwise, the UP/Down Server sends the HBase storage a request to create the corresponding file record and write the file, and goes to step S05;
S04: the UP/Down Server directly calls the HDFS file-write interface;
S05: the file information is written to the corresponding table in HBase storage.
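Steps S01 to S05 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: plain dictionaries stand in for HBase and HDFS, the 1 MiB threshold is hypothetical (the source only says "a set threshold"), and keeping file information in a table separate from file content is an assumption made here so that large files still get an information record in S05.

```python
from dataclasses import dataclass, field

# Hypothetical cut-off; the source does not give a value.
SIZE_THRESHOLD = 1024 * 1024


@dataclass
class StorageLayer:
    """In-memory stand-ins for the storage layer (not real HBase/HDFS clients)."""
    hbase_content: dict = field(default_factory=dict)  # row key -> small-file bytes
    hbase_info: dict = field(default_factory=dict)     # row key -> file information
    hdfs: dict = field(default_factory=dict)           # row key -> large-file bytes


def make_row_key(user_id: str, path: str) -> str:
    """Build the "identifier_file" Row Key described in step S03."""
    return f"{user_id}_{path}"


def upload(store: StorageLayer, user_id: str, path: str, data: bytes,
           threshold: int = SIZE_THRESHOLD) -> str:
    """Route an uploaded file (S01) and return where it was stored."""
    key = make_row_key(user_id, path)
    if len(data) > threshold:
        # S02 -> S04: large file, written through the HDFS write path.
        store.hdfs[key] = data
        stored_in = "hdfs"
    else:
        # S02 -> S03: small file, check for an existing record first.
        if key in store.hbase_content:
            return "exists"  # caller should prompt the client
        store.hbase_content[key] = data
        stored_in = "hbase"
    # S05: record the file information in the corresponding table.
    store.hbase_info[key] = {"size": len(data), "stored_in": stored_in}
    return stored_in


store = StorageLayer()
first = upload(store, "u42", "docs/a.txt", b"hello")
second = upload(store, "u42", "docs/a.txt", b"hello again")
big = upload(store, "u42", "video.bin", b"x" * 16, threshold=8)
```

With a real cluster the dictionaries would be replaced by HBase and HDFS client calls; the routing decision itself stays the same.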
A reading method for the small-file storage architecture comprises the following steps:
T01: the UP/Down Server receives a file read request and obtains the user's "identifier_file" path information;
T02: the UP/Down Server uses the obtained information to search the HBase data table; if the record of the file the user requested is found, go to step T03; otherwise go to step T04;
T03: the UP/Down Server reads the column family in the HBase file record, and the access layer returns the data of the "file content" column directly to the client;
T04: the conventional HDFS file-read interface is called.
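The read path T01 to T04 amounts to a lookup with a fallback. The sketch below is an illustration under assumptions: a dictionary stands in for the HBase data table, `hdfs_read` is a hypothetical callable standing in for the HDFS read interface, and the column name `file_content` is an illustrative rendering of the "file content" column.

```python
from typing import Callable


def read_file(hbase_table: dict, hdfs_read: Callable[[str], bytes],
              user_id: str, path: str) -> bytes:
    """Return file bytes from the table if present, else fall back to HDFS."""
    # T01: derive the "identifier_file" key from the request.
    row_key = f"{user_id}_{path}"
    # T02: look the record up in the (stand-in) HBase data table.
    record = hbase_table.get(row_key)
    if record is not None:
        # T03: return the "file content" column directly.
        return record["file_content"]
    # T04: no record found, use the conventional HDFS read path.
    return hdfs_read(row_key)


# Hypothetical stand-in data for a quick demonstration.
table = {"u42_docs/a.txt": {"file_content": b"hello"}}
hdfs_files = {"u42_video.bin": b"large payload"}

small_result = read_file(table, hdfs_files.__getitem__, "u42", "docs/a.txt")
large_result = read_file(table, hdfs_files.__getitem__, "u42", "video.bin")
```

A small file is served by one keyed lookup, while anything not found in the table goes through the HDFS fallback.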
The invention has the following beneficial effects:
Small files are stored directly in the HBase table structure at write time and are read through HBase's own indexing mechanism, which gives high read efficiency; and, compared with HDFS, the metadata management work for large numbers of small files is reduced, which gives high storage and read efficiency.
Of course, a product implementing the invention need not achieve all of the above advantages at the same time.
Brief description of the drawings
To illustrate the technical solutions of the embodiments of the invention more clearly, the drawings needed to describe the embodiments are briefly introduced below. Evidently, the drawings described below are only some embodiments of the invention; those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a structural schematic diagram of the small-file storage architecture of the invention;
Fig. 2 is a schematic diagram of the working steps of the CGI Proxy Server of the invention;
Fig. 3 is a schematic diagram of the working steps of the UP/Down Server of the invention;
Fig. 4 is a schematic diagram of the file-reading steps of the UP/Down Server of the invention;
Fig. 5 is a bar chart of the experimental results for small-file writes;
Fig. 6 is a bar chart of the experimental results for small-file reads.
Detailed description of the embodiments
The technical solutions in the embodiments of the invention are described clearly and completely below with reference to the drawings. Evidently, the described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by those of ordinary skill in the art from these embodiments without creative effort fall within the protection scope of the invention.
Referring to Fig. 1, a small-file storage architecture according to the invention comprises, from top to bottom, a client layer, an access layer and a storage layer, the storage layer comprising HBase storage and HDFS storage;
The client layer interacts with the access layer over HTTP or HTTPS;
The access layer comprises two types of server, the UP/Down Server and the CGI Proxy Server, and an HBase file interface. Through the HBase file interface, the access layer calls an externally encapsulated VFS-like interface and interacts with the HBase storage. The access layer is responsible for receiving users' file operation requests, forwarding each request to the corresponding server, and interacting with the storage layer to read and write files;
The HBase storage stores both file information and file content.
The UP/Down Server is mainly responsible for handling the first class of client file operations, namely upload and download. It exchanges files with the storage layer through the HBase file interface, and also reads and writes data against the HDFS storage of the storage layer.
The CGI Proxy Server is responsible for handling the second class of client file operations, including traversing directories, creating directories, deleting directories, copying files, deleting files and renaming files. It interacts with the HBase file interface, which performs the reads and writes against the storage layer.
As shown in Fig. 2, the CGI Proxy Server receives file operation requests from the client over HTTP or HTTPS and then internally calls the HBase file interface API to carry out these operations;
As shown in Fig. 3, on receiving an upload request the UP/Down Server first checks whether the file size exceeds the set threshold. If the file is judged to be a large file, the HDFS file-write API is called to upload it to HDFS; if it is a small file, the HBase file interface is called internally to perform the upload;
The HBase file interface wraps HBase's file and directory operations into VFS-like interface functions such as mkdir and rm, so that a change in the underlying file storage system does not force a change in the upper-layer calling interface. The interface receives file operation requests from the UP/Down Server and the CGI Proxy Server and performs the necessary preprocessing, including collecting file information; it then calls the corresponding HBase file read/write API to write the file information and file content into HBase, or to read the file content and return it to the upper-layer service;
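The idea of a VFS-like wrapper over a table store can be sketched as below. This is a hypothetical illustration: the class name `TableBackedVfs`, its method set beyond mkdir and rm, and the use of a dictionary in place of an HBase table are all assumptions made for the example, not details given in the source.

```python
class TableBackedVfs:
    """VFS-style facade over a key-value table (a dict stands in for HBase)."""

    def __init__(self):
        # row key -> {"kind": "dir" | "file", "content": bytes}
        self._rows = {}

    @staticmethod
    def _key(user_id: str, path: str) -> str:
        return f"{user_id}_{path}"

    def mkdir(self, user_id: str, path: str) -> None:
        key = self._key(user_id, path)
        if key in self._rows:
            raise FileExistsError(path)
        self._rows[key] = {"kind": "dir", "content": b""}

    def write(self, user_id: str, path: str, data: bytes) -> None:
        self._rows[self._key(user_id, path)] = {"kind": "file", "content": data}

    def read(self, user_id: str, path: str) -> bytes:
        row = self._rows.get(self._key(user_id, path))
        if row is None or row["kind"] != "file":
            raise FileNotFoundError(path)
        return row["content"]

    def rename(self, user_id: str, old_path: str, new_path: str) -> None:
        old_key = self._key(user_id, old_path)
        if old_key not in self._rows:
            raise FileNotFoundError(old_path)
        self._rows[self._key(user_id, new_path)] = self._rows.pop(old_key)

    def rm(self, user_id: str, path: str) -> None:
        if self._rows.pop(self._key(user_id, path), None) is None:
            raise FileNotFoundError(path)


vfs = TableBackedVfs()
vfs.mkdir("u42", "docs")
vfs.write("u42", "docs/a.txt", b"hello")
vfs.rename("u42", "docs/a.txt", "docs/b.txt")
content = vfs.read("u42", "docs/b.txt")
vfs.rm("u42", "docs/b.txt")
```

Because callers only see mkdir, rm and similar functions, the dictionary could be swapped for another backend without changing the calling code, which is the decoupling the paragraph above describes.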
A storage method for the small-file storage architecture comprises the following steps:
S01: the user uploads the file to be stored;
S02: the UP/Down Server of the access layer checks the file size; if the file size exceeds the set threshold, go to step S04; otherwise go to step S03;
S03: the UP/Down Server parses the file request further, obtaining information such as the user identifier and the upload path from the request, and looks up the HBase record table of the storage layer using the user's "identifier_file" as the Row Key. If a record exists, a file with the same name has already been uploaded to the target upload path; in that case a prompt is returned to the client layer, prompting the client to upload the file again. Otherwise, the UP/Down Server sends the HBase storage a request to create the corresponding file record and write the file, and goes to step S05;
S04: the UP/Down Server directly calls the HDFS file-write interface;
S05: the file information is written to the corresponding table in HBase storage.
As shown in Fig. 4, a reading method for the small-file storage architecture comprises the following steps:
T01: the UP/Down Server receives a file read request and obtains the user's "identifier_file" path information;
T02: the UP/Down Server uses the obtained information to search the HBase data table; if the record of the file the user requested is found, go to step T03; otherwise go to step T04;
T03: the UP/Down Server reads the column family in the HBase file record, and the access layer returns the data of the "file content" column directly to the client;
T04: the conventional HDFS file-read interface is called.
To verify the efficiency of the above small-file storage scheme, this embodiment runs comparative read and write tests on conventional HDFS and on the HBase-based efficient small-file storage method.
This embodiment uses a large number of small files as the test data set: 5,000,000 files each of 1 KB, 2 KB, 4 KB, 8 KB and 16 KB;
The test environment of this embodiment is a four-node Hadoop cluster. Each node is a server with a 16-core Intel Xeon CPU at 2.4 GHz and 32 GB of memory, and the network is gigabit fiber. One machine serves as the NameNode, and the other three serve as both DataNodes and RegionServers.
The HBase Master and the HDFS NameNode run on the same node, and the ZooKeeper cluster runs on the three machines other than the NameNode. Every node runs CentOS 5.4; the Hadoop version is 2.0, the HBase version is HBase-0.96, and the JDK version is 1.6.0_35.
The HDFS and HBase storage methods were compared experimentally for small-file storage; the results are shown in Fig. 5 and Fig. 6;
The results in Fig. 5 show that the write speed with HDFS is 20-30 MB/s, whereas the write speed of the HBase-based efficient storage method is 60-80 MB/s. This is because the HBase-based method writes small files directly into the HBase data table, whereas HDFS must additionally manage information such as metadata and data blocks. The proposed method therefore gives a marked improvement in write speed;
The results in Fig. 6 show that, for small-file reads, the read speed with HDFS is 50-60 MB/s, whereas the read speed of the HBase-based efficient storage method is 111-120 MB/s. HDFS must first access the metadata and then interact with a DataNode to read the data, whereas HBase uses its own indexing mechanism to read the information in the data table quickly. The experimental results show that, for small-file reads and writes, the HBase-based efficient storage method proposed by the invention gives a substantially higher average read speed than HDFS, and a clearly higher write speed as well.
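The reported throughput ranges can be turned into speedup bounds with simple arithmetic. The sketch below only restates the figures quoted above (20-30 vs 60-80 MB/s for writes, 50-60 vs 111-120 MB/s for reads); the helper name `speedup_range` is illustrative, and pairing the extremes of each range is a simplifying assumption, since the source does not say which measurements correspond.

```python
def speedup_range(baseline, proposed):
    """Return (min, max) speedup given (low, high) throughput ranges in MB/s."""
    base_lo, base_hi = baseline
    prop_lo, prop_hi = proposed
    # Worst case: slowest proposed run against fastest baseline run.
    # Best case: fastest proposed run against slowest baseline run.
    return prop_lo / base_hi, prop_hi / base_lo


write_speedup = speedup_range((20, 30), (60, 80))
read_speedup = speedup_range((50, 60), (111, 120))

print("write speedup: %.2fx to %.2fx" % write_speedup)
print("read speedup:  %.2fx to %.2fx" % read_speedup)
```

Under this pairing the write path improves by a factor of 2 to 4 and the read path by a factor of about 1.85 to 2.4.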
The HBase-based efficient small-file storage method proposed by the invention stores small files directly in the HBase table structure at write time and reads them efficiently through HBase's own indexing mechanism; compared with HDFS, it reduces the metadata management work that HDFS performs for large numbers of small files. In conclusion, the method is efficient and feasible for reading and writing massive numbers of small files. Future work will continue to study a storage scheme that is equally efficient for both large and small files in massive data.
In this specification, a description referring to the terms "one embodiment", "example", "specific example" and the like means that a particular feature, structure, material or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of the invention. Schematic uses of these terms do not necessarily refer to the same embodiment or example. Moreover, the particular features, structures, materials or characteristics described may be combined in a suitable manner in any one or more embodiments or examples.
The preferred embodiments of the invention disclosed above are intended only to help explain the invention. The preferred embodiments do not describe every detail exhaustively, nor do they limit the invention to the specific embodiments described. Obviously, many modifications and variations can be made in light of this specification. These embodiments were chosen and described in detail in order to better explain the principles and practical application of the invention, so that those skilled in the art can understand and use it well. The invention is limited only by the claims and their full scope and equivalents.

Claims (5)

1. A small-file storage architecture, characterized in that the storage architecture comprises, from top to bottom, a client layer, an access layer and a storage layer, the storage layer comprising HBase storage and HDFS storage;
the client layer interacts with the access layer over HTTP or HTTPS;
the access layer comprises two types of server, the UP/Down Server and the CGI Proxy Server, and an HBase file interface; through the HBase file interface, the access layer calls an externally encapsulated VFS-like interface and interacts with the HBase storage; the access layer is responsible for receiving users' file operation requests, forwarding each request to the corresponding server, and interacting with the storage layer to read and write files;
the HBase storage stores both file information and file content.
2. The small-file storage architecture according to claim 1, characterized in that the UP/Down Server is mainly responsible for handling the first class of client file operations, namely upload and download, exchanges files with the storage layer through the HBase file interface, and also reads and writes data against the HDFS storage of the storage layer.
3. The small-file storage architecture according to claim 1, characterized in that the CGI Proxy Server is responsible for handling the second class of client file operations, including traversing directories, creating directories, deleting directories, copying files, deleting files and renaming files, and interacts with the HBase file interface, the HBase file interface performing the reads and writes against the storage layer.
4. A storage method for the small-file storage architecture according to any one of claims 1 to 3, characterized by comprising the following steps:
S01: the user uploads the file to be stored;
S02: the UP/Down Server of the access layer checks the file size; if the file size exceeds the set threshold, go to step S04; otherwise go to step S03;
S03: the UP/Down Server parses the file request further, obtaining information such as the user identifier and the upload path from the request, and looks up the HBase record table of the storage layer using the user's "identifier_file" as the Row Key; if a record exists, a file with the same name has already been uploaded to the target upload path, in which case a prompt is returned to the client layer, prompting the client to upload the file again; otherwise, the UP/Down Server sends the HBase storage a request to create the corresponding file record and write the file, and goes to step S05;
S04: the UP/Down Server directly calls the HDFS file-write interface;
S05: the file information is written to the corresponding table in HBase storage.
5. A reading method for the small-file storage architecture according to any one of claims 1 to 3, characterized by comprising the following steps:
T01: the UP/Down Server receives a file read request and obtains the user's "identifier_file" path information;
T02: the UP/Down Server uses the obtained information to search the HBase data table; if the record of the file the user requested is found, go to step T03; otherwise go to step T04;
T03: the UP/Down Server reads the column family in the HBase file record, and the access layer returns the data of the "file content" column directly to the client;
T04: the conventional HDFS file-read interface is called.
CN201811392837.XA 2018-11-21 2018-11-21 Storage and reading method for a small-file storage architecture Pending CN109614373A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811392837.XA CN109614373A (en) 2018-11-21 2018-11-21 Storage and reading method for a small-file storage architecture

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811392837.XA CN109614373A (en) 2018-11-21 2018-11-21 Storage and reading method for a small-file storage architecture

Publications (1)

Publication Number Publication Date
CN109614373A (en) 2019-04-12

Family

ID=66004783

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811392837.XA Pending CN109614373A (en) 2018-11-21 2018-11-21 Storage and reading method for a small-file storage architecture

Country Status (1)

Country Link
CN (1) CN109614373A (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110276538A1 (en) * 2010-05-06 2011-11-10 The Go Daddy Group, Inc. Cloud storage solution for reading and writing files
CN106484821A (en) * 2016-09-27 2017-03-08 浪潮软件集团有限公司 Hybrid cloud storage method under cloud computing architecture

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111831208A (en) * 2019-04-16 2020-10-27 中移(苏州)软件技术有限公司 Information processing method and device, terminal equipment and storage medium
CN111367857A (en) * 2020-03-03 2020-07-03 中国联合网络通信集团有限公司 Data storage method and device, FTP server and storage medium
CN112684985A (en) * 2021-01-04 2021-04-20 北京金山云网络技术有限公司 Data writing method and device
CN112684985B (en) * 2021-01-04 2024-04-05 北京金山云网络技术有限公司 Data writing method and device
CN112835864A (en) * 2021-02-03 2021-05-25 北京联创信安科技股份有限公司 File storage method, device, equipment and storage medium
CN112835864B (en) * 2021-02-03 2024-02-20 北京联创信安科技股份有限公司 File storage method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (Application publication date: 20190412)