CN115544149A - Small file storage method and system based on HBase multi-terminal fusion - Google Patents


Info

Publication number
CN115544149A
Authority
CN
China
Prior art keywords
file
hbase
small
small file
calling
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211286417.XA
Other languages
Chinese (zh)
Inventor
佘平
罗琳
李静茹
徐鑫朋
袁铭
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
CETC 32 Research Institute
Original Assignee
CETC 32 Research Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2022-10-20
Filing date
2022-10-20
Publication date
2022-12-30
Application filed by CETC 32 Research Institute
Priority to CN202211286417.XA
Publication of CN115544149A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/16 File or folder operations, e.g. details of user interfaces specifically adapted to file systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/221 Column-oriented storage; Management thereof
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/448 Execution paradigms, e.g. implementations of programming paradigms
    • G06F9/4488 Object-oriented
    • G06F9/449 Object-oriented method invocation or resolution

Abstract

The invention provides a small file storage method and system based on HBase multi-terminal fusion, in which a table is created in the HBase database, file-name-related information is used as the row key, and the content, type, size and creation time of a small file are stored in a single column family. The file operation process comprises the following steps: step 1: inputting a file path and a file name; step 2: forming the basic HBase data entry metadata from the file name; step 3: determining the type of file operation, calling the client to connect to the HBase database, and performing small file query, deletion, download and addition operations according to the metadata. The invention provides C++/C#/Java data input interfaces, realizing uniform access to multi-source data; and, given the large number of small files, it provides multiple HBase Thrift services and achieves efficient storage of massive numbers of small files through flexible load balancing.

Description

Small file storage method and system based on HBase multi-terminal fusion
Technical Field
The invention relates to the technical field of data storage and processing, in particular to a small file storage method and system based on HBase multi-terminal fusion.
Background
In the field of mass data storage, data is generally stored in a distributed file system, which has a data redundancy mechanism and supports horizontal scaling of the storage system. A distributed file system is generally composed of multiple data nodes, with a metadata service providing file attribute information; accessing a file requires first accessing the file's metadata and then its actual data. The data is stored in data blocks as the basic storage unit, and the block size is generally larger than that of a single-machine file system; for example, in the distributed file system HDFS, the size of one data block is 128 MB.
Patent document CN114595255A (application number: CN 202210238856.7) discloses multi-source heterogeneous data fusion storage and relates to the technical field of data storage. The multi-source heterogeneous data fusion storage comprises a HaiNaTable database management system and a storage hard disk, wherein the HaiNaTable database management system has the functions of starting fusion, adding, modifying, searching by primary key and finishing fusion; the HaiNaTable database management system stores data files with Tdb as the data and TIndex as the index, and writes them to the storage hard disk in real time, and the index file stores the characteristic information of each data file as an Int128 value derived from a character string generated by MD5.
However, in a distributed file system, a large number of small files lowers the overall performance of data access, because every small file access incurs both metadata and data block overhead. The present invention, based on multiple C++/C#/Java input modes, stores file metadata as the Rowkey in the distributed column-oriented database system HBase and stores the small file data in a single-column data cell of HBase, so that small file data can be stored and accessed quickly and reliably.
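The overhead described above can be illustrated with a short calculation. The following is only a sketch: the file count and file size are hypothetical figures chosen for illustration, and the model simply assumes that every non-empty file needs at least one block entry of its own in the metadata service, compared against the number of 128 MB blocks the same bytes would fill if packed together.

```python
import math

BLOCK_SIZE = 128 * 1024 * 1024  # 128 MB data block, as in the HDFS example


def block_entries_separate(file_count: int, file_size: int) -> int:
    """Block entries when each file is tracked on its own (at least one per file)."""
    blocks_per_file = max(1, math.ceil(file_size / BLOCK_SIZE))
    return file_count * blocks_per_file


def block_entries_packed(file_count: int, file_size: int) -> int:
    """Lower bound if the same bytes were packed into full blocks."""
    return math.ceil(file_count * file_size / BLOCK_SIZE)


# Hypothetical workload: one million files of 10 KB each.
files, size = 1_000_000, 10 * 1024
separate = block_entries_separate(files, size)
packed = block_entries_packed(files, size)
print(f"separate files: {separate} block entries, packed: {packed} blocks")
```

Under these assumed figures, the metadata service has to track a million block entries for data that would fit in fewer than a hundred full blocks, which is the imbalance the invention addresses by moving small files into HBase.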
Disclosure of Invention
Aiming at the defects in the prior art, the invention aims to provide a small file storage method and system based on HBase multi-terminal fusion.
According to the small file storage method based on HBase multi-terminal fusion provided by the invention, a table is created in the HBase database, file-name-related information is used as the row key, and the content, type, size and creation time of a small file are stored in a single column family, wherein the file operation process comprises the following steps:
step 1: inputting a file path and a file name;
step 2: forming the basic HBase data entry metadata from the file name;
step 3: determining the type of file operation, calling the client to connect to the HBase database, and performing small file query, deletion, download and addition operations according to the metadata.
Preferably, the small file query process is as follows: inputting a file name, calling the HBase Thrift C++/C#/Java interface to query, and determining whether the file exists; if so, constructing and outputting an encapsulated small file object; if not, outputting null.
Preferably, the small file deletion process is as follows: inputting a file name, and calling the HBase Thrift C++/C#/Java interface to delete;
the small file addition process is as follows: inputting a small file object, reading the file content, and calling the HBase Thrift C++/C#/Java interface to add.
Preferably, the small file download process is as follows: inputting a file name and a download address, calling the HBase Thrift C++/C#/Java query interface, and determining whether null is returned; if so, ending the flow directly; if not, writing the file content field of the small file to the specified file.
Preferably, when a small file is stored, the reverse timestamp, the file path and the file name are concatenated into the row key of the HBase table, and the file size, file time, file type and file content are stored in a column family of the HBase table.
According to the small file storage system based on HBase multi-terminal fusion provided by the invention, a table is created in the HBase database, file-name-related information is used as the row key, and the content, type, size and creation time of a small file are stored in a single column family, wherein the file operation flow comprises the following modules:
module M1: inputting a file path and a file name;
module M2: forming the basic HBase data entry metadata from the file name;
module M3: determining the type of file operation, calling the client to connect to the HBase database, and performing small file query, deletion, download and addition operations according to the metadata.
Preferably, the small file query process is as follows: inputting a file name, calling the HBase Thrift C++/C#/Java interface to query, and determining whether the file exists; if so, constructing and outputting an encapsulated small file object; if not, outputting null.
Preferably, the small file deletion process is as follows: inputting a file name, and calling the HBase Thrift C++/C#/Java interface to delete;
the small file addition process is as follows: inputting a small file object, reading the file content, and calling the HBase Thrift C++/C#/Java interface to add.
Preferably, the small file download process is as follows: inputting a file name and a download address, calling the HBase Thrift C++/C#/Java query interface, and determining whether null is returned; if so, ending the flow directly; if not, writing the file content field of the small file to the specified file.
Preferably, when a small file is stored, the reverse timestamp, the file path and the file name are concatenated into the row key of the HBase table, and the file size, file time, file type and file content are stored in a column family of the HBase table.
Compared with the prior art, the invention has the following beneficial effects:
1) The invention effectively improves the storage performance of small files by exploiting the characteristics of columnar data storage;
2) The invention provides C++/C#/Java data input interfaces, realizing uniform access to multi-source data;
3) The invention supports file operations, implementing small file creation, deletion, reading, writing and similar functions on top of the HBase database interface;
4) Given the large number of small files, the invention provides multiple HBase Thrift services and achieves efficient storage of massive numbers of small files through flexible load balancing.
Drawings
Other features, objects and advantages of the invention will become more apparent upon reading of the detailed description of non-limiting embodiments with reference to the following drawings:
FIG. 1 is a diagram of the system configuration;
FIG. 2 is a schematic diagram of multi-terminal fusion;
FIG. 3 is a flow chart of the C++ small file query implementation;
FIG. 4 is a flow chart of the C++ small file deletion implementation;
FIG. 5 is a flow chart of the C++ small file download implementation;
FIG. 6 is a flow chart of the C++ small file addition implementation;
FIG. 7 is a flow chart of the C# small file query implementation;
FIG. 8 is a flow chart of the C# small file deletion implementation;
FIG. 9 is a flow chart of the C# small file download implementation;
FIG. 10 is a flow chart of the C# small file addition implementation;
FIG. 11 is a flow chart of the Java small file query implementation;
FIG. 12 is a flow chart of the Java small file deletion implementation;
FIG. 13 is a flow chart of the Java small file download implementation;
FIG. 14 is a flow chart of the Java small file addition implementation;
FIG. 15 is a diagram of the small file metadata implementation;
FIG. 16 is a flow chart of client-side small file data storage.
Detailed Description
The present invention is described in detail below with reference to specific examples. The following examples will help those skilled in the art to further understand the invention, but do not limit it in any way. It should be noted that those skilled in the art can make variations and modifications without departing from the concept of the invention, all of which fall within the scope of protection of the present invention.
Example 1:
The invention provides a small file storage method based on HBase multi-terminal fusion, in which a table is created in the HBase database, file-name-related information is used as the row key, and the content, type, size and creation time of a small file are stored in a single column family. The file operation process comprises the following steps: step 1: inputting a file path and a file name; step 2: forming the basic HBase data entry metadata from the file name; step 3: determining the type of file operation, calling the client to connect to the HBase database, and performing small file query, deletion, download and addition operations according to the metadata.
The small file query process is as follows: inputting a file name, calling the HBase Thrift C++/C#/Java interface to query, and determining whether the file exists; if so, constructing and outputting an encapsulated small file object; if not, outputting null.
The small file deletion process is as follows: inputting a file name, and calling the HBase Thrift C++/C#/Java interface to delete.
The small file addition process is as follows: inputting a small file object, reading the file content, and calling the HBase Thrift C++/C#/Java interface to add.
The small file download process is as follows: inputting a file name and a download address, calling the HBase Thrift C++/C#/Java query interface, and determining whether null is returned; if so, ending the flow directly; if not, writing the file content field of the small file to the specified file.
When a small file is stored, the reverse timestamp, the file path and the file name are concatenated into the row key of the HBase table, and the file size, file time, file type and file content are stored in a column family of the HBase table.
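The four operations of this example can be sketched as follows. This is a minimal illustration and not the patented implementation: an in-memory dictionary stands in for the HBase table reached through Thrift, the row key is simplified to the bare file name, and the names `SmallFile` and `InMemorySmallFileStore` are hypothetical.

```python
import os
import tempfile
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class SmallFile:
    """Encapsulated small file object: content plus its attribute columns."""
    name: str
    content: bytes
    file_type: str
    size: int
    created: float


class InMemorySmallFileStore:
    """Stand-in for the HBase table: row key -> columns of one column family."""

    def __init__(self) -> None:
        self._rows: Dict[str, dict] = {}

    def add(self, f: SmallFile) -> None:
        # Content and attributes are stored together in the same row.
        self._rows[f.name] = {"content": f.content, "type": f.file_type,
                              "size": f.size, "time": f.created}

    def query(self, name: str) -> Optional[SmallFile]:
        row = self._rows.get(name)
        if row is None:
            return None  # file does not exist: output null
        return SmallFile(name, row["content"], row["type"], row["size"], row["time"])

    def delete(self, name: str) -> None:
        self._rows.pop(name, None)

    def download(self, name: str, target_path: str) -> bool:
        found = self.query(name)
        if found is None:
            return False  # query returned null: end the flow directly
        with open(target_path, "wb") as out:
            out.write(found.content)  # write the content field to the target file
        return True


store = InMemorySmallFileStore()
store.add(SmallFile("a.txt", b"hello", "txt", 5, 0.0))
target = os.path.join(tempfile.mkdtemp(), "out.txt")
print(store.download("a.txt", target), store.download("missing.txt", target))
```

Download is deliberately built on top of query, mirroring the flow in the text where the download path first calls the query interface and only writes the file when a non-null object comes back.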
Example 2:
The invention also provides a small file storage system based on HBase multi-terminal fusion, which can be implemented by executing the steps of the small file storage method based on HBase multi-terminal fusion; that is, a person skilled in the art can understand the method as a preferred implementation of the system. On the one hand, the system provides data access from multiple programming languages; on the other hand, it has the high-performance access capability of HBase, so it can realize rapid, unified storage of multi-source small file data. The system is built on a distributed file system and therefore offers high data reliability and dynamic capacity expansion. The system implements upload and download interfaces for small files: the upload interface stores small file metadata and data keyed by Rowkey, placing the actual data in HBase with support for data redundancy; the download interface supports querying small file data by metadata and downloading the data content. The system composition is shown in FIG. 1.
The HBase multi-terminal fusion small file storage system is implemented by creating one large table in HBase, using file-name-related information as the row key, and using a single column family to store attribute information such as the content, type, size and creation time of a small file. The general flow of a file operation is as follows:
1) Inputting a file path and a file name;
2) Forming the basic HBase data entry metadata from the file name information;
3) If the operation is an upload, acquiring the file content;
4) Calling the client to connect to the HBase database, and performing upload, query, download and other operations according to the metadata.
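The general flow above can be sketched as a single dispatch function. This is an illustrative sketch under stated assumptions: a plain dictionary stands in for the HBase table, the key is simplified to path plus name, and `build_metadata` and `handle` are hypothetical names that do not come from the source.

```python
import os
from typing import Dict, Optional

Table = Dict[str, Dict[str, object]]


def build_metadata(file_path: str, file_name: str) -> Dict[str, str]:
    """Step 2: derive the entry metadata from the file name (simplified)."""
    extension = os.path.splitext(file_name)[1].lstrip(".")
    return {"key": f"{file_path.rstrip('/')}/{file_name}", "type": extension}


def handle(table: Table, op: str, file_path: str, file_name: str,
           content: Optional[bytes] = None):
    """Steps 1 to 4: take path and name, build metadata, then dispatch on the operation."""
    meta = build_metadata(file_path, file_name)
    key = meta["key"]
    if op == "upload":
        if content is None:
            # Step 3: only an upload needs the file content to be read.
            with open(key, "rb") as source:
                content = source.read()
        table[key] = {"content": content, "type": meta["type"], "size": len(content)}
        return key
    if op == "query":
        return table.get(key)
    if op == "download":
        row = table.get(key)
        return None if row is None else row["content"]
    if op == "delete":
        return table.pop(key, None) is not None
    raise ValueError(f"unsupported operation: {op}")


table: Table = {}
print(handle(table, "upload", "/data/img", "a.png", b"\x89PNG"))
```

Passing `content` explicitly keeps the example independent of the local disk; when it is omitted, the sketch reads the file at the given path, which corresponds to step 3.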
For the read and write paths of the file system, the HBase multi-terminal fusion small file storage system provides clients with functionally equivalent interface implementations in multiple programming languages, supporting multi-terminal writing and reading of small files. Multi-terminal fusion is shown in FIG. 2.
The system specifically supports the following data inputs:
1. C++ small file interface
The C++ small file interface implements query, deletion, download and addition of small files.
Small file query: receives the input file name, calls the HBase Thrift C++ interface to query, and returns a small file object if the file exists; see FIG. 3.
Small file deletion: receives the input file name and calls the HBase Thrift C++ interface to delete; see FIG. 4.
Small file download: receives the input file name and download path, calls the C++ small file interface to query, and downloads the file if it exists; see FIG. 5.
Small file addition: receives the constructed small file object and calls the HBase Thrift C++ interface to add it; see FIG. 6.
2. C# small file interface
The C# small file interface implements query, deletion, download and addition of small files.
Small file query: receives the input file name, calls the HBase Thrift C# interface to query, and returns a small file object if the file exists; see FIG. 7.
Small file deletion: receives the input file name and calls the HBase Thrift C# interface to delete; see FIG. 8.
Small file download: receives the input file name and download path, calls the C# small file interface to query, and downloads the file if it exists; see FIG. 9.
Small file addition: receives the constructed small file object and calls the HBase Thrift C# interface to add it; see FIG. 10.
3. Java small file interface
The Java small file interface implements query, deletion, download and addition of small files.
Small file query: receives the input file name, calls the HBase Thrift Java interface to query, and returns a small file object if the file exists; see FIG. 11.
Small file deletion: receives the input file name and calls the HBase Thrift Java interface to delete; see FIG. 12.
Small file download: receives the input file name and download path, calls the Java small file interface to query, and downloads the file if it exists; see FIG. 13.
Small file addition: receives the constructed small file object and calls the HBase Thrift Java interface to add it; see FIG. 14.
4. Small file metadata storage
When a small file is stored, the reverse timestamp, the file path and the file name are concatenated into the row key (Rowkey) of the HBase table, and file metadata such as the file size, file time and file type are stored together with the file content in a column family of the HBase table; see FIG. 15.
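The row key composition can be sketched as follows. The source does not specify how the reverse timestamp is computed or which separator joins the parts, so this sketch assumes one common HBase convention (the maximum 64-bit signed value minus the millisecond timestamp, zero-padded to a fixed width) and a hypothetical `|` separator.

```python
JAVA_LONG_MAX = 2**63 - 1  # assumption: reverse against the largest 64-bit signed value


def reverse_timestamp(ts_ms: int) -> int:
    """Newer timestamps yield smaller values, so they sort first."""
    return JAVA_LONG_MAX - ts_ms


def make_row_key(ts_ms: int, file_path: str, file_name: str, sep: str = "|") -> str:
    """Concatenate reverse timestamp, file path and file name into one row key."""
    # Zero-padding to 19 digits keeps lexicographic order equal to numeric order.
    return f"{reverse_timestamp(ts_ms):019d}{sep}{file_path}{sep}{file_name}"


older = make_row_key(1_000, "/data/img", "a.png")
newer = make_row_key(2_000, "/data/img", "a.png")
print(newer)
print(newer < older)
```

Because HBase sorts rows lexicographically by row key, a key built this way places the most recently stored files first in a scan; the fixed-width padding is what makes the string order agree with the numeric order.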
5. Client small file data storage process
Small file data storage places the file content in the column family of the row identified by the small file's Rowkey in the HBase table. Because the number of small files can be large, multiple HBase Thrift services can be provided for client data storage, and the client selects one of them for storing data according to a configured load balancing policy. The design of the client data storage flow is shown in FIG. 16.
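Selecting among several HBase Thrift services can be sketched as below. The source does not name the load balancing policy, so round-robin is used here purely as one possible choice; the host names and port are hypothetical placeholders, and no real Thrift connection is opened.

```python
import itertools
from typing import List, Tuple

Endpoint = Tuple[str, int]


class RoundRobinBalancer:
    """Hands out Thrift service endpoints in rotation (one possible policy)."""

    def __init__(self, endpoints: List[Endpoint]) -> None:
        if not endpoints:
            raise ValueError("at least one Thrift endpoint is required")
        self._cycle = itertools.cycle(list(endpoints))

    def pick(self) -> Endpoint:
        """Return the endpoint the client should use for the next store operation."""
        return next(self._cycle)


# Hypothetical endpoints; a real client would read these from configuration.
balancer = RoundRobinBalancer([
    ("thrift-1.example", 9090),
    ("thrift-2.example", 9090),
    ("thrift-3.example", 9090),
])
picks = [balancer.pick() for _ in range(4)]
print(picks)
```

A client would call `pick()` before each write and connect to the returned endpoint; swapping in a random or least-loaded policy only requires replacing the `pick` method.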
It is known to those skilled in the art that, in addition to implementing the system, apparatus and its various modules provided by the present invention in pure computer readable program code, the system, apparatus and its various modules provided by the present invention can be implemented in the form of logic gates, switches, application specific integrated circuits, programmable logic controllers, embedded microcontrollers and the like by completely programming the method steps. Therefore, the system, the apparatus, and the modules thereof provided by the present invention may be considered as a hardware component, and the modules included in the system, the apparatus, and the modules for implementing various programs may also be considered as structures in the hardware component; modules for performing various functions may also be considered to be both software programs for performing the methods and structures within hardware components.
The foregoing description of specific embodiments of the present invention has been presented. It is to be understood that the present invention is not limited to the specific embodiments described above, and that various changes or modifications may be made by one skilled in the art within the scope of the appended claims without departing from the spirit of the invention. The embodiments and features of the embodiments of the present application may be combined with each other arbitrarily without conflict.

Claims (10)

1. A small file storage method based on HBase multi-terminal fusion, characterized in that a table is created in the HBase database, file-name-related information is used as the row key, and the content, type, size and creation time of a small file are stored in a single column family, wherein the file operation process comprises the following steps:
step 1: inputting a file path and a file name;
step 2: forming the basic HBase data entry metadata from the file name;
step 3: determining the type of file operation, calling the client to connect to the HBase database, and performing small file query, deletion, download and addition operations according to the metadata.
2. The HBase multi-terminal fusion-based small file storage method according to claim 1, wherein the small file query process is as follows: inputting a file name, calling the HBase Thrift C++/C#/Java interface to query, and determining whether the file exists; if so, constructing and outputting an encapsulated small file object; if not, outputting null.
3. The HBase multi-terminal fusion-based small file storage method according to claim 1, wherein the small file deletion process is as follows: inputting a file name, and calling the HBase Thrift C++/C#/Java interface to delete;
the small file addition process is as follows: inputting a small file object, reading the file content, and calling the HBase Thrift C++/C#/Java interface to add.
4. The HBase multi-terminal fusion-based small file storage method according to claim 1, wherein the small file download process is as follows: inputting a file name and a download address, calling the HBase Thrift C++/C#/Java query interface, and determining whether null is returned; if so, ending the flow directly; if not, writing the file content field of the small file to the specified file.
5. The HBase multi-terminal fusion-based small file storage method according to claim 1, wherein when a small file is stored, the reverse timestamp, the file path and the file name are concatenated into the row key of the HBase table, and the file size, file time, file type and file content are stored in a column family of the HBase table.
6. A small file storage system based on HBase multi-terminal fusion, characterized in that a table is created in the HBase database, file-name-related information is used as the row key, and the content, type, size and creation time of a small file are stored in a single column family, wherein the file operation flow comprises the following modules:
module M1: inputting a file path and a file name;
module M2: forming the basic HBase data entry metadata from the file name;
module M3: determining the type of file operation, calling the client to connect to the HBase database, and performing small file query, deletion, download and addition operations according to the metadata.
7. The HBase multi-terminal fusion-based small file storage system according to claim 6, wherein the small file query process is as follows: inputting a file name, calling the HBase Thrift C++/C#/Java interface to query, and determining whether the file exists; if so, constructing and outputting an encapsulated small file object; if not, outputting null.
8. The HBase multi-terminal fusion-based small file storage system according to claim 6, wherein the small file deletion process is as follows: inputting a file name, and calling the HBase Thrift C++/C#/Java interface to delete;
the small file addition process is as follows: inputting a small file object, reading the file content, and calling the HBase Thrift C++/C#/Java interface to add.
9. The HBase multi-terminal fusion-based small file storage system according to claim 6, wherein the small file download process is as follows: inputting a file name and a download address, calling the HBase Thrift C++/C#/Java query interface, and determining whether null is returned; if so, ending the flow directly; if not, writing the file content field of the small file to the specified file.
10. The HBase multi-terminal fusion-based small file storage system according to claim 6, wherein when a small file is stored, the reverse timestamp, the file path and the file name are concatenated into the row key of the HBase table, and the file size, file time, file type and file content are stored in a column family of the HBase table.
CN202211286417.XA 2022-10-20 2022-10-20 Small file storage method and system based on HBase multi-terminal fusion Pending CN115544149A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211286417.XA CN115544149A (en) 2022-10-20 2022-10-20 Small file storage method and system based on HBase multi-terminal fusion

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211286417.XA CN115544149A (en) 2022-10-20 2022-10-20 Small file storage method and system based on HBase multi-terminal fusion

Publications (1)

Publication Number Publication Date
CN115544149A 2022-12-30

Family

ID=84735076

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211286417.XA Pending CN115544149A (en) 2022-10-20 2022-10-20 Small file storage method and system based on HBase multi-terminal fusion

Country Status (1)

Country Link
CN (1) CN115544149A (en)

Similar Documents

Publication Publication Date Title
CN109933570B (en) Metadata management method, system and medium
US9672235B2 (en) Method and system for dynamically partitioning very large database indices on write-once tables
US10248676B2 (en) Efficient B-Tree data serialization
Vora Hadoop-HBase for large-scale data
US20140136510A1 (en) Hybrid table implementation by using buffer pool as permanent in-memory storage for memory-resident data
CN106570113B (en) Mass vector slice data cloud storage method and system
KR20200122994A (en) Key Value Append
CN109213432B (en) Storage device for writing data using log structured merge tree and method thereof
CN104881466A (en) Method and device for processing data fragments and deleting garbage files
JPH1131096A (en) Data storage/retrieval system
CN109460406B (en) Data processing method and device
KR20200056357A (en) Technique for implementing change data capture in database management system
CN113377868A (en) Offline storage system based on distributed KV database
CN113821171A (en) Key value storage method based on hash table and LSM tree
JP2015528957A (en) Distributed file system, file access method, and client device
KR20200056526A (en) Technique for implementing change data capture in database management system
US20180011897A1 (en) Data processing method having structure of cache index specified to transaction in mobile environment dbms
CN111930684A (en) Small file processing method, device and equipment based on HDFS (Hadoop distributed File System) and storage medium
JP6006740B2 (en) Index management device
CN112000666B (en) Database management system of facing array
CN115544149A (en) Small file storage method and system based on HBase multi-terminal fusion
WO2022121274A1 (en) Metadata management method and apparatus in storage system, and storage system
CN113204520A (en) Remote sensing data rapid concurrent read-write method based on distributed file system
CN112084141A (en) Full-text retrieval system capacity expansion method, device, equipment and medium
CN113515518A (en) Data storage method and device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination