CN111176574B - Small file storage method, device, equipment and medium

Publication number
CN111176574B
Authority
CN
China
Prior art keywords
file
small
name
aggregation information
metadata table
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911382298.6A
Other languages
Chinese (zh)
Other versions
CN111176574A (en)
Inventor
梁珂铭
胡永刚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Inspur Electronic Information Industry Co Ltd
Original Assignee
Inspur Electronic Information Industry Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Inspur Electronic Information Industry Co Ltd
Priority to CN201911382298.6A
Publication of CN111176574A
Application granted
Publication of CN111176574B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0602 Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/061 Improving I/O performance
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0638 Organizing or formatting or addressing of data
    • G06F3/0643 Management of files
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0668 Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/067 Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0668 Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/0671 In-line storage system
    • G06F3/0673 Single storage device
    • G06F3/0679 Non-volatile semiconductor memory device, e.g. flash memory, one time programmable memory [OTP]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

The application discloses a small file storage method, device, equipment and medium. The method comprises: generating a corresponding file identifier by using the bucket name and the file name corresponding to a small file; adding the aggregation information corresponding to different small files to different metadata tables by using the file identifier, wherein the aggregation information comprises the file identifier, the name of the corresponding large file and the offset position of the small file in the corresponding large file, and the large file is the aggregation file corresponding to the small file; and, when a read request for the small file sent by a user terminal is acquired, searching for the corresponding metadata table, querying the corresponding aggregation information from the metadata table found, and then reading the data of the small file from the corresponding large file by using the aggregation information. In this way, the rate at which the aggregation information is queried from the metadata tables can be increased, which in turn increases the read rate of small files.

Description

Small file storage method, device, equipment and medium
Technical Field
The present application relates to the field of storage technologies, and in particular, to a method, an apparatus, a device, and a medium for storing a small file.
Background
Mechanical hard disks are widely used in storage systems, and every read or write on them incurs seek time. In user scenarios with heavy small-file storage requirements, the read and write efficiency for small files is therefore very low, with throughput of only a few MB/s, and becomes the bottleneck of small-file storage. To address such scenarios, the industry has proposed small-file aggregation schemes: small files are first placed on a high-speed medium such as memory or a solid state disk (SSD), the small files are then aggregated into a large file, and the large file is flushed to the mechanical hard disk. This reduces the I/O bottleneck that small files impose on mechanical hard disks and greatly increases the speed at which the system processes small files.
Currently, in distributed object storage, a file is commonly read by combining the bucket name and the file name in the read request into an object name and sending the object name to the underlying RADOS storage cluster to acquire the data. When small files are aggregated into large files, an additional intermediate step of locating the large file is usually required: the aggregated large file is looked up through the bucket name, the file name and a metadata directory, and the data of the small file is then read from the large file found and returned. The problem is that, in distributed mass storage, a single bucket may hold tens of millions or even hundreds of millions of objects, so the search for the large file can take a very long time and become the bottleneck of small-file read performance.
Disclosure of Invention
In view of the above, an object of the present application is to provide a small file storage method, device, equipment and medium that can improve the read rate of small files. The specific scheme is as follows:
In a first aspect, the present application discloses a small file storage method, comprising:
generating a corresponding file identifier by using the bucket name and the file name corresponding to the small file;
adding the aggregation information corresponding to different small files to different metadata tables by using the file identification; the aggregation information comprises the file identification, the name of a corresponding large file and the offset position of the small file in the corresponding large file, and the large file is the aggregation file corresponding to the small file;
when a read request for the small file sent by a user terminal is acquired, searching the corresponding metadata table, querying the corresponding aggregation information from the searched metadata table, and then reading the data of the small file in the corresponding large file by using the aggregation information.
Optionally, the adding, by using the file identifier, of the aggregation information corresponding to different small files to different metadata tables includes:
for the aggregation information corresponding to any small file, searching whether a metadata table whose data table name corresponds to the file identifier exists;
if the metadata table exists, adding the aggregation information corresponding to the current small file to that metadata table;
and if the metadata table does not exist, adding the aggregation information corresponding to the current small file to an empty metadata table, and generating the data table name of that metadata table by using the file identifier.
Optionally, the adding the aggregation information corresponding to the current small file to an empty metadata table includes:
adding the aggregation information corresponding to the current small file to a pre-created empty metadata table;
or creating an empty metadata table in real time, and then adding the aggregation information corresponding to the current small file to that metadata table.
Optionally, the generating a corresponding file identifier by using the bucket name and the file name corresponding to the small file includes:
generating a corresponding character string by using the bucket name and the file name corresponding to the small file;
and performing a hash operation on the character string to obtain the corresponding file identifier.
Optionally, the generating a data table name of the metadata table by using the file identifier includes:
reading the first character of the file identifier;
and generating the data table name by using the first character and the corresponding bucket name.
Optionally, the searching for the corresponding metadata table and querying the corresponding aggregation information from the metadata table found, when the read request for the small file sent by the user terminal is acquired, includes:
generating a corresponding target character string by using the bucket name and the file name in the read request;
performing a hash operation on the target character string to obtain a corresponding target file identifier;
extracting the target first character of the target file identifier;
searching for the corresponding metadata table by using the target first character and the bucket name in the read request;
and querying the corresponding aggregation information from the metadata table found.
Optionally, before generating the corresponding file identifier by using the bucket name and the file name corresponding to the small file, the method further includes:
acquiring a file aggregation instruction, so as to trigger the step of generating the corresponding file identifier by using the bucket name and the file name corresponding to the small file.
In a second aspect, the present application discloses a small file storage device, comprising:
the file identifier generating module is used for generating a corresponding file identifier by using the bucket name and the file name corresponding to the small file;
the aggregation information adding module is used for adding the aggregation information corresponding to different small files to different metadata tables by using the file identifiers; the aggregation information comprises the file identification, the name of a corresponding large file and the offset position of the small file in the corresponding large file, and the large file is the aggregation file corresponding to the small file;
and the file data reading module is used for searching the corresponding metadata table when a reading request aiming at the small file sent by a user terminal is obtained, inquiring the corresponding aggregation information from the searched metadata table, and then reading the data of the small file in the corresponding large file by using the aggregation information.
In a third aspect, the present application discloses small file storage equipment comprising a processor and a memory, wherein:
the memory is used for storing a computer program;
the processor is used for executing the computer program to implement the aforementioned small file storage method.
In a fourth aspect, the present application discloses a computer readable storage medium for storing a computer program, wherein the computer program, when executed by a processor, implements the aforementioned small file storage method.
Therefore, in the present application, a corresponding file identifier is generated by using the bucket name and the file name corresponding to a small file, and the aggregation information corresponding to different small files is then added to different metadata tables by using the file identifier. The aggregation information comprises the file identifier, the name of the corresponding large file and the offset position of the small file in the corresponding large file, and the large file is the aggregation file corresponding to the small file. When a read request for the small file sent by a user terminal is acquired, the corresponding metadata table is searched for, the corresponding aggregation information is queried from the metadata table found, and the data of the small file is then read from the corresponding large file by using the aggregation information. That is, because the aggregation information corresponding to different small files is added to different metadata tables by using the file identifiers, the rate at which the aggregation information is queried from the metadata tables can be increased, and the read rate of small files is increased accordingly.
Drawings
In order to illustrate the embodiments of the present application or the technical solutions in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. The drawings in the following description are only embodiments of the present application, and those skilled in the art can obtain other drawings from the provided drawings without creative effort.
FIG. 1 is a flow chart of a small file storage method disclosed in the present application;
FIG. 2 is a flow chart of a specific small file storage method disclosed in the present application;
FIG. 3 is a schematic structural diagram of a small file storage device disclosed in the present application;
FIG. 4 is a structural diagram of small file storage equipment disclosed in the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
Currently, in distributed object storage, a file is commonly read by combining the bucket name and the file name in the read request into an object name and sending the object name to the underlying RADOS storage cluster to acquire the data. When small files are aggregated into large files, an additional intermediate step of locating the large file is usually required: the aggregated large file is looked up through the bucket name, the file name and a metadata directory, and the data of the small file is then read from the large file found and returned. In distributed mass storage, a single bucket may hold tens of millions or even hundreds of millions of objects, so the search for the large file can take a very long time and become the bottleneck of small-file read performance. The present application therefore provides a small file storage scheme that can improve the read rate of small files.
Referring to fig. 1, an embodiment of the present application discloses a method for storing a small file, including:
Step S11: generating a corresponding file identifier by using the bucket name and the file name corresponding to the small file.
In a specific implementation manner, this embodiment may obtain a file aggregation instruction to trigger the step of generating the corresponding file identifier by using the bucket name and the file name corresponding to the small file. That is, when the file aggregation instruction is obtained, the corresponding file identifier is generated by using the bucket name and the file name corresponding to the small file.
Of course, in some embodiments, the corresponding file identifier may also be generated in advance by using the bucket name and the file name corresponding to the small file, so that when the file aggregation instruction is obtained, the file identifier generated in advance may be read, and corresponding subsequent steps may be performed.
The small files are files with file sizes smaller than a first preset threshold value.
Step S12: adding the aggregation information corresponding to different small files to different metadata tables by using the file identification; the aggregation information includes the file identifier, a name of a corresponding large file, and an offset position of the small file in the corresponding large file, and the large file is an aggregation file corresponding to the small file.
In a specific implementation manner, for the aggregation information corresponding to any one of the small files, this embodiment may search whether a metadata table whose data table name corresponds to the file identifier exists; if the metadata table exists, the aggregation information corresponding to the current small file is added to that metadata table; if the metadata table does not exist, the aggregation information corresponding to the current small file is added to an empty metadata table, and the data table name of that metadata table is generated by using the file identifier.
In some embodiments, the aggregation information corresponding to the current small file may be added to an empty metadata table that was created in advance. In other embodiments, the empty metadata table may be created in real time, and the aggregation information corresponding to the current small file may then be added to it.
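The find-or-create behaviour described above can be sketched as follows. This is a minimal in-memory illustration, not the patented implementation: the dictionary-based tables, the tuple layout of a record, and the names `add_aggregation_info` and `spare_table` are all assumptions made for the example.

```python
from typing import Dict, Optional, Tuple

# Hypothetical layout of one aggregation record: (large_file_name, start, end).
AggInfo = Tuple[str, int, int]
# Table name -> {file identifier -> aggregation record}.
MetadataTables = Dict[str, Dict[str, AggInfo]]


def add_aggregation_info(
    tables: MetadataTables,
    table_name: str,
    file_id: str,
    info: AggInfo,
    spare_table: Optional[Dict[str, AggInfo]] = None,
) -> None:
    """Add one record to the table named `table_name`, creating it if needed."""
    table = tables.get(table_name)
    if table is None:
        # No table with this name yet: take a pre-created empty table if one
        # was supplied, otherwise create an empty table on demand.
        table = spare_table if spare_table is not None else {}
        # Registering the table under `table_name` plays the role of
        # "generating the data table name" for the previously unnamed table.
        tables[table_name] = table
    table[file_id] = info
```

Passing `spare_table` corresponds to the pre-created variant, and omitting it corresponds to creating the table in real time.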
That is, in this embodiment, the aggregation information corresponding to all the small files may be stored in shards: the file identifiers of the small files are used to store the aggregation information corresponding to different small files in different metadata tables, so that each metadata table is relatively short and the aggregation information can be found quickly. The offset position, recorded in the aggregation information, of the small file in the corresponding large file may include the start position and the end position of the small file in the corresponding large file, where the end position is determined from the start position and the length of the small file.
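As an illustration of how start and end positions could be recorded while small files are packed into a large file, consider the sketch below. It assumes simple back-to-back concatenation and an exclusive end position (end = start + length); the patent does not specify either, and the function name `aggregate` is hypothetical.

```python
from typing import Dict, List, Tuple


def aggregate(small_files: List[Tuple[str, bytes]]) -> Tuple[bytes, Dict[str, Tuple[int, int]]]:
    """Concatenate small files and record each one's (start, end) offsets."""
    large = bytearray()
    offsets: Dict[str, Tuple[int, int]] = {}
    for name, data in small_files:
        start = len(large)        # the small file begins where the large file currently ends
        end = start + len(data)   # end position derived from start position and length
        large.extend(data)
        offsets[name] = (start, end)
    return bytes(large), offsets
```

With an exclusive end position, the data of a small file is exactly the slice `large[start:end]` of the aggregated bytes.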
The large file is a file with the file size larger than a second preset threshold value.
Step S13: when a read request for the small file sent by a user terminal is acquired, searching the corresponding metadata table, querying the corresponding aggregation information from the searched metadata table, and then reading the data of the small file in the corresponding large file by using the aggregation information.
In a specific implementation process, when a read request for the small file sent by a user terminal is acquired, the corresponding target file identifier is generated by using the bucket name and the file name in the read request. Because the data table name of each metadata table is created from the file identifier, the corresponding metadata table can be located according to the target file identifier, and the corresponding aggregation information can then be queried from that metadata table according to the target file identifier. The name of the corresponding large file and the offset position of the small file in the corresponding large file are obtained from the aggregation information. The corresponding large file is then fetched from the underlying layer of the storage system by using its name, the data of the small file is read from the large file according to the start position and the end position included in the offset position, and the data read is returned to the corresponding user terminal. It can be understood that, because the aggregation information includes the file identifier of the small file, the corresponding aggregation information can be queried according to the file identifier. In addition, this embodiment can use the file identifier as an index value of the aggregation information to improve the query speed.
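One way to picture the use of the file identifier as an index value is a relational table keyed on it. The sketch below uses Python's built-in `sqlite3` module purely for illustration; the patent does not name a database, and the column names, the helper names and the `quote_ident` function are assumptions. In SQLite, declaring `file_id` as the primary key gives the column an index automatically.

```python
import sqlite3
from typing import Optional, Tuple


def quote_ident(name: str) -> str:
    """Quote a table name so it can be embedded safely in SQL text."""
    return '"' + name.replace('"', '""') + '"'


def create_metadata_table(conn: sqlite3.Connection, table_name: str) -> None:
    # file_id is the primary key, so lookups by file identifier use an index.
    conn.execute(
        f"CREATE TABLE IF NOT EXISTS {quote_ident(table_name)} ("
        "file_id TEXT PRIMARY KEY, "
        "large_file TEXT NOT NULL, "
        "start_pos INTEGER NOT NULL, "
        "end_pos INTEGER NOT NULL)"
    )


def put_info(conn: sqlite3.Connection, table_name: str, file_id: str,
             large_file: str, start_pos: int, end_pos: int) -> None:
    conn.execute(
        f"INSERT OR REPLACE INTO {quote_ident(table_name)} VALUES (?, ?, ?, ?)",
        (file_id, large_file, start_pos, end_pos),
    )


def query_info(conn: sqlite3.Connection, table_name: str,
               file_id: str) -> Optional[Tuple[str, int, int]]:
    """Return (large_file, start_pos, end_pos), or None if there is no record."""
    return conn.execute(
        f"SELECT large_file, start_pos, end_pos FROM {quote_ident(table_name)} "
        "WHERE file_id = ?",
        (file_id,),
    ).fetchone()
```

A caller would locate the table by its name first and then call `query_info` with the target file identifier to obtain the large file name and offsets.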
In some embodiments, the aggregation information stored in the metadata table may include the file identifier, the name of the corresponding large file, the start position of the small file in the corresponding large file, and the length of the small file. In that case, when the aggregation information is used to read the data of the small file from the corresponding large file, the data can be read by using the start position and the length of the small file found in the aggregation information.
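The start-plus-length variant maps directly onto a seek followed by a bounded read. The sketch below is an illustration only: it assumes the large file is available as a seekable binary stream, and `read_small_file` is a hypothetical name. An in-memory `io.BytesIO` object stands in for the large file.

```python
import io
from typing import BinaryIO


def read_small_file(large_file: BinaryIO, start: int, length: int) -> bytes:
    """Read `length` bytes of a small file beginning at offset `start`."""
    large_file.seek(start)
    data = large_file.read(length)
    if len(data) != length:
        # The recorded start + length points past the end of the large file.
        raise ValueError("large file is shorter than the recorded start + length")
    return data


# Usage with an in-memory stand-in for the large file.
large = io.BytesIO(b"hellohi")
second = read_small_file(large, start=5, length=2)
```

Storing the length instead of the end position carries the same information, since the end position equals the start position plus the length.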
As can be seen, in the embodiment of the present application, a corresponding file identifier is generated by using the bucket name and the file name corresponding to a small file, and the aggregation information corresponding to different small files is then added to different metadata tables by using the file identifier. The aggregation information comprises the file identifier, the name of the corresponding large file and the offset position of the small file in the corresponding large file, and the large file is the aggregation file corresponding to the small file. When a read request for the small file sent by a user terminal is acquired, the corresponding metadata table is searched for, the corresponding aggregation information is queried from the metadata table found, and the data of the small file is then read from the corresponding large file by using the aggregation information. That is, because the aggregation information corresponding to different small files is added to different metadata tables by using the file identifiers, the rate at which the aggregation information is queried from the metadata tables can be increased, and the read rate of small files is increased accordingly.
Referring to fig. 2, an embodiment of the present application discloses a specific small file storage method, including:
Step S21: generating a corresponding character string by using the bucket name and the file name corresponding to the small file.
Step S22: performing a hash operation on the character string to obtain the corresponding file identifier.
That is, this embodiment may generate a hash value from the bucket name and the file name corresponding to the small file and use the hash value as the corresponding file identifier. The file identifier obtained in this way has a fixed length, which makes it convenient to use for information lookup.
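Steps S21 and S22 can be sketched as below. The patent does not say how the two names are combined or which hash function is used, so the `/` separator, the choice of SHA-256 and the function name `make_file_id` are assumptions for illustration.

```python
import hashlib


def make_file_id(bucket_name: str, file_name: str) -> str:
    """Derive a fixed-length file identifier from a bucket name and a file name."""
    # Assumption: join the two names with "/" to form the character string.
    key = f"{bucket_name}/{file_name}"
    # Assumption: SHA-256; any hash with fixed-length output fits the description.
    return hashlib.sha256(key.encode("utf-8")).hexdigest()
```

The identifier is always 64 hexadecimal characters regardless of how long the names are. Some separator is worth having in the string: plain concatenation would map bucket `ab` with file `c` and bucket `a` with file `bc` to the same string.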
Step S23: adding the aggregation information corresponding to different small files to different metadata tables by using the file identification; the aggregation information includes the file identifier, a name of a corresponding large file, and an offset position of the small file in the corresponding large file, and the large file is an aggregation file corresponding to the small file.
In a specific implementation manner, for the aggregation information corresponding to any one of the small files, this embodiment may search whether a metadata table whose data table name corresponds to the file identifier exists; if the metadata table exists, the aggregation information corresponding to the current small file is added to that metadata table; if the metadata table does not exist, the aggregation information corresponding to the current small file is added to an empty metadata table, and the data table name of that metadata table is generated by using the file identifier. In this embodiment, the data table name may be generated from the first character of the file identifier and the bucket name corresponding to the small file, for example as bucket name + first character. Thus, for the aggregation information corresponding to any small file, the corresponding metadata table can be searched for by using the first character of the file identifier and the bucket name, and the aggregation information is then added to that metadata table. That is, in this embodiment, the aggregation information of small files that share the same bucket name and the same first character of the file identifier is added to the same metadata table, whose data table name consists of that bucket name and that first character.
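The "bucket name + first character" naming rule can be sketched as follows. The underscore separator and the function names `table_name_for` and `group_by_table` are assumptions; the patent only states that the name is built from the bucket name and the first character.

```python
from typing import Dict, Iterable, List


def table_name_for(bucket_name: str, file_id: str) -> str:
    """Build the metadata table name from the bucket name and the identifier's first character."""
    if not file_id:
        raise ValueError("file identifier must not be empty")
    # Assumption: "_" as the separator between bucket name and first character.
    return f"{bucket_name}_{file_id[0]}"


def group_by_table(bucket_name: str, file_ids: Iterable[str]) -> Dict[str, List[str]]:
    """Show which identifiers would land in which metadata table."""
    groups: Dict[str, List[str]] = {}
    for fid in file_ids:
        groups.setdefault(table_name_for(bucket_name, fid), []).append(fid)
    return groups
```

If the file identifier is a hexadecimal digest, the first character takes one of 16 values, so a bucket's aggregation information would be spread over at most 16 tables; with a well-distributed hash, each table would hold roughly one sixteenth of the bucket's records.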
Step S24: when a read request for the small file sent by a user terminal is acquired, searching the corresponding metadata table, querying the corresponding aggregation information from the searched metadata table, and then reading the data of the small file in the corresponding large file by using the aggregation information.
In a specific implementation process, when a read request for the small file sent by a user terminal is acquired, a corresponding target character string is generated from the bucket name and the file name in the read request, and a hash operation is performed on the target character string to obtain the corresponding target file identifier. The target first character of the target file identifier is then extracted, the corresponding metadata table is searched for by using the target first character and the bucket name in the read request, the corresponding aggregation information is queried from the metadata table found, and the data of the small file is then read from the corresponding large file by using the aggregation information.
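The read path of step S24 can be traced end to end with a small in-memory model. Everything concrete here is an assumption made for illustration: SHA-256 over `bucket/name` as the hash, `bucket_firstchar` as the table name, dictionaries standing in for the metadata tables and for the underlying object store, and the helper names `file_id_of`, `record` and `read_request`.

```python
import hashlib
from typing import Dict, Tuple

# Table name -> {file identifier -> (large_file_name, start, end)}.
Tables = Dict[str, Dict[str, Tuple[str, int, int]]]


def file_id_of(bucket: str, name: str) -> str:
    # Assumed hash and separator; the patent only requires "a hash operation".
    return hashlib.sha256(f"{bucket}/{name}".encode("utf-8")).hexdigest()


def record(tables: Tables, bucket: str, name: str,
           large_name: str, start: int, end: int) -> None:
    """Write side: store the aggregation information in the matching table."""
    fid = file_id_of(bucket, name)
    tables.setdefault(f"{bucket}_{fid[0]}", {})[fid] = (large_name, start, end)


def read_request(tables: Tables, large_files: Dict[str, bytes],
                 bucket: str, name: str) -> bytes:
    """Read side: bucket + file name -> identifier -> table -> record -> data."""
    fid = file_id_of(bucket, name)         # target file identifier
    table = tables[f"{bucket}_{fid[0]}"]   # table chosen by bucket name + first character
    large_name, start, end = table[fid]    # aggregation information
    return large_files[large_name][start:end]
```

Only the one table selected by the bucket name and the first character is consulted for a read, which is the point of spreading the aggregation information over several tables.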
Referring to fig. 3, the present application discloses a small file storage device, including:
A file identifier generating module 11, configured to generate a corresponding file identifier by using the bucket name and the file name corresponding to the small file.
An aggregation information adding module 12, configured to add the aggregation information corresponding to different small files to different metadata tables by using the file identifier; the aggregation information includes the file identifier, the name of the corresponding large file, and the offset position of the small file in the corresponding large file, and the large file is the aggregation file corresponding to the small file.
A file data reading module 13, configured to, when a read request for the small file sent by a user terminal is acquired, search for the corresponding metadata table, query the corresponding aggregation information from the metadata table found, and then read the data of the small file from the corresponding large file by using the aggregation information.
As can be seen, the device of this embodiment likewise generates a file identifier from the bucket name and the file name corresponding to a small file, adds the aggregation information corresponding to different small files to different metadata tables by using the file identifier, and, on a read request from a user terminal, searches for the corresponding metadata table, queries the aggregation information from it, and reads the data of the small file from the corresponding large file. Spreading the aggregation information over different metadata tables increases the rate at which it can be queried, and the read rate of small files is increased accordingly.
The aggregation information adding module 12 is specifically configured to search, for the aggregation information corresponding to any one of the small files, whether a metadata table whose data table name corresponds to the file identifier exists; if the metadata table exists, to add the aggregation information corresponding to the current small file to that metadata table; and if the metadata table does not exist, to add the aggregation information corresponding to the current small file to an empty metadata table and generate the data table name of that metadata table by using the file identifier.
In a specific embodiment, the aggregation information adding module 12 is configured to add the aggregation information corresponding to the current small file to an empty metadata table created in advance.
In a specific embodiment, the aggregation information adding module 12 is configured to create an empty metadata table in real time, and then add the aggregation information corresponding to the current small file to that metadata table.
The file identifier generating module 11 is specifically configured to generate a corresponding character string from the bucket name and the file name corresponding to the small file, and to perform a hash operation on the character string to obtain the corresponding file identifier. Correspondingly, the aggregation information adding module 12 is configured to read the first character of the file identifier and to generate the data table name by using the first character and the corresponding bucket name.
Correspondingly, the file data reading module 13 is specifically configured to generate a corresponding target character string from the bucket name and the file name in the read request; perform a hash operation on the target character string to obtain the corresponding target file identifier; extract the target first character of the target file identifier; search for the corresponding metadata table by using the target first character and the bucket name in the read request; and query the corresponding aggregation information from the metadata table found.
Further, the small file storage device further comprises an aggregation instruction acquisition module, configured to acquire a file aggregation instruction, so as to trigger the step of generating the corresponding file identifier by using the bucket name and the file name corresponding to the small file.
Referring to fig. 4, an embodiment of the present application discloses a small file storage device, which includes a processor 21 and a memory 22; wherein, the memory 22 is used for saving computer programs; the processor 21 is configured to execute the computer program to implement the following steps:
generating a corresponding file identifier by using the bucket name and the file name corresponding to the small file; adding the aggregation information corresponding to different small files to different metadata tables by using the file identification; the aggregation information comprises the file identification, the name of a corresponding large file and the offset position of the small file in the corresponding large file, and the large file is the aggregation file corresponding to the small file; when a read request for the small file sent by a user terminal is acquired, searching the corresponding metadata table, querying the corresponding aggregation information from the searched metadata table, and then reading the data of the small file in the corresponding large file by using the aggregation information.
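The final step, reading the data of a small file out of its large file, can be illustrated as follows. This is a sketch under assumptions: the passage lists only the file identifier, the large file name, and the offset as aggregation information, so the length used here to delimit the small file is an added assumption, and the functions aggregate, read_small_file, and demo are hypothetical names.

```python
import os
import tempfile


def aggregate(small_files: dict, large_path: str) -> dict:
    """Append small files to one large file; return {name: (offset, length)}."""
    index = {}
    with open(large_path, "wb") as large:
        for name, data in small_files.items():
            # The current write position is the offset of this small file.
            index[name] = (large.tell(), len(data))
            large.write(data)
    return index


def read_small_file(large_path: str, offset: int, length: int) -> bytes:
    """Read one small file from the large file by seeking to its offset."""
    with open(large_path, "rb") as large:
        large.seek(offset)
        return large.read(length)


def demo() -> bytes:
    """Aggregate two small files, then read the second one back."""
    with tempfile.TemporaryDirectory() as tmp:
        large_path = os.path.join(tmp, "large_0001")
        index = aggregate({"a.txt": b"alpha", "b.txt": b"hello"}, large_path)
        offset, length = index["b.txt"]
        return read_small_file(large_path, offset, length)
```

The read touches only the byte range of the requested small file, which is why the offset position must be part of the stored aggregation information.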
As can be seen, in the embodiment of the present application, a corresponding file identifier is generated by using the bucket name and the file name corresponding to a small file, and the aggregation information corresponding to different small files is then added to different metadata tables by using the file identifier. The aggregation information comprises the file identifier, the name of the corresponding large file, and the offset position of the small file in the corresponding large file, where the large file is the aggregated file corresponding to the small file. When a read request for the small file sent by a user terminal is acquired, the corresponding metadata table is searched, the corresponding aggregation information is queried from the found metadata table, and the data of the small file is then read from the corresponding large file by using the aggregation information. That is, because the aggregation information corresponding to different small files is distributed across different metadata tables by using the file identifiers, each query only has to search one smaller table, which can increase the rate at which the aggregation information is queried and thereby the read rate of small files.
In this embodiment, when the processor 21 executes the computer subprogram stored in the memory 22, the following steps may be specifically implemented: searching, for the aggregation information corresponding to any small file, whether a metadata table whose data table name corresponds to the file identifier exists; if such a metadata table exists, adding the aggregation information corresponding to the current small file to the corresponding metadata table; and if no such metadata table exists, adding the aggregation information corresponding to the current small file to an empty metadata table, and generating the data table name of the metadata table by using the file identifier.
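To make the effect of splitting the aggregation information across tables more concrete, the sketch below counts how many records land in each table for a batch of synthetic file names. It assumes MD5 as the hash operation, "/" as the separator, and table names of the form "bucket_character"; these choices and the function names are illustrative assumptions, not details given in the passage.

```python
import hashlib
from collections import Counter


def table_for(bucket: str, file_name: str) -> str:
    """Name of the table that would hold this file's aggregation record."""
    digest = hashlib.md5(f"{bucket}/{file_name}".encode("utf-8")).hexdigest()
    return f"{bucket}_{digest[0]}"


def table_sizes(bucket: str, file_names) -> Counter:
    """Count how many records each metadata table would receive."""
    return Counter(table_for(bucket, name) for name in file_names)


sizes = table_sizes("photos", [f"img_{i}.jpg" for i in range(1600)])
for table, count in sorted(sizes.items()):
    print(table, count)
```

Under these assumptions a bucket is split over at most 16 tables, so a lookup searches a table holding only a fraction of the bucket's records instead of one table holding all of them.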
In this embodiment, when the processor 21 executes the computer subprogram stored in the memory 22, the following steps may be specifically implemented: adding the aggregation information corresponding to the current small file to the pre-created empty metadata table; or, creating an empty metadata table in real time, and then adding the aggregation information corresponding to the current small file to the metadata table.
In this embodiment, when the processor 21 executes the computer subprogram stored in the memory 22, the following steps may be specifically implemented: generating a corresponding character string by using the bucket name and the file name corresponding to the small file; and performing a hash operation on the character string to obtain the corresponding file identifier.
In this embodiment, when the processor 21 executes the computer subprogram stored in the memory 22, the following steps may be specifically implemented: reading the first character of the file identifier; and generating the data table name by using the first character and the corresponding bucket name.
In this embodiment, when the processor 21 executes the computer subprogram stored in the memory 22, the following steps may be specifically implemented: generating a corresponding target character string by using the bucket name and the file name in the read request; performing a hash operation on the target character string to obtain a corresponding target file identifier; extracting the target first character of the target file identifier; searching for the corresponding metadata table by using the target first character and the bucket name in the read request; and querying the corresponding aggregation information from the found metadata table.
In this embodiment, when the processor 21 executes the computer subprogram stored in the memory 22, the following steps may be specifically implemented: and acquiring a file aggregation instruction to trigger the step of generating the corresponding file identifier by using the bucket name and the file name corresponding to the small file.
Further, an embodiment of the present application also discloses a computer readable storage medium for storing a computer program, wherein the computer program, when executed by a processor, implements the following steps:
generating a corresponding file identifier by using the bucket name and the file name corresponding to the small file; adding the aggregation information corresponding to different small files to different metadata tables by using the file identification; the aggregation information comprises the file identification, the name of a corresponding large file and the offset position of the small file in the corresponding large file, and the large file is the aggregation file corresponding to the small file; when a read request for the small file sent by a user terminal is acquired, searching the corresponding metadata table, querying the corresponding aggregation information from the searched metadata table, and then reading the data of the small file in the corresponding large file by using the aggregation information.
As can be seen, in the embodiment of the present application, a corresponding file identifier is generated by using the bucket name and the file name corresponding to a small file, and the aggregation information corresponding to different small files is then added to different metadata tables by using the file identifier. The aggregation information comprises the file identifier, the name of the corresponding large file, and the offset position of the small file in the corresponding large file, where the large file is the aggregated file corresponding to the small file. When a read request for the small file sent by a user terminal is acquired, the corresponding metadata table is searched, the corresponding aggregation information is queried from the found metadata table, and the data of the small file is then read from the corresponding large file by using the aggregation information. That is, because the aggregation information corresponding to different small files is distributed across different metadata tables by using the file identifiers, each query only has to search one smaller table, which can increase the rate at which the aggregation information is queried and thereby the read rate of small files.
In this embodiment, when the computer subprogram stored in the computer-readable storage medium is executed by the processor, the following steps may be specifically implemented: searching, for the aggregation information corresponding to any small file, whether a metadata table whose data table name corresponds to the file identifier exists; if such a metadata table exists, adding the aggregation information corresponding to the current small file to the corresponding metadata table; and if no such metadata table exists, adding the aggregation information corresponding to the current small file to an empty metadata table, and generating the data table name of the metadata table by using the file identifier.
In this embodiment, when the computer subprogram stored in the computer-readable storage medium is executed by the processor, the following steps may be specifically implemented: adding the aggregation information corresponding to the current small file to the pre-created empty metadata table; or, creating an empty metadata table in real time, and then adding the aggregation information corresponding to the current small file to the metadata table.
In this embodiment, when the computer subprogram stored in the computer-readable storage medium is executed by the processor, the following steps may be specifically implemented: generating a corresponding character string by using the bucket name and the file name corresponding to the small file; and performing a hash operation on the character string to obtain the corresponding file identifier.
In this embodiment, when the computer subprogram stored in the computer-readable storage medium is executed by the processor, the following steps may be specifically implemented: reading the first character of the file identifier; and generating the data table name by using the first character and the corresponding bucket name.
In this embodiment, when the computer subprogram stored in the computer-readable storage medium is executed by the processor, the following steps may be specifically implemented: generating a corresponding target character string by using the bucket name and the file name in the read request; performing a hash operation on the target character string to obtain a corresponding target file identifier; extracting the target first character of the target file identifier; searching for the corresponding metadata table by using the target first character and the bucket name in the read request; and querying the corresponding aggregation information from the found metadata table.
In this embodiment, when the computer subprogram stored in the computer-readable storage medium is executed by the processor, the following steps may be specifically implemented: and acquiring a file aggregation instruction to trigger the step of generating the corresponding file identifier by using the bucket name and the file name corresponding to the small file.
The embodiments in this specification are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and the same or similar parts of the embodiments may be referred to one another. Because the device disclosed in an embodiment corresponds to the method disclosed in that embodiment, its description is relatively brief, and the relevant details can be found in the description of the method.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in Random Access Memory (RAM), memory, Read Only Memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
The method, the device, the equipment and the medium for storing the small files provided by the application are described in detail, specific examples are applied in the description to explain the principle and the implementation of the application, and the description of the above embodiments is only used for helping to understand the method and the core idea of the application; meanwhile, for a person skilled in the art, according to the idea of the present application, there may be variations in the specific embodiments and the application scope, and in summary, the content of the present specification should not be construed as a limitation to the present application.

Claims (8)

1. A small file storage method is characterized by comprising the following steps:
generating a corresponding file identifier by using the bucket name and the file name corresponding to the small file;
adding the aggregation information corresponding to different small files to different metadata tables by using the file identification; the aggregation information comprises the file identification, the name of a corresponding large file and the offset position of the small file in the corresponding large file, and the large file is the aggregation file corresponding to the small file;
when a read request aiming at the small file sent by a user terminal is obtained, searching the corresponding metadata table, inquiring the corresponding aggregation information from the searched metadata table, and then reading the data of the small file in the corresponding large file by using the aggregation information;
wherein adding the aggregation information corresponding to the different small files to different metadata tables by using the file identifiers comprises:
searching, for the aggregation information corresponding to any small file, whether a metadata table whose data table name corresponds to the file identifier exists;
if such a metadata table exists, adding the aggregation information corresponding to the current small file to the corresponding metadata table;
if no such metadata table exists, adding the aggregation information corresponding to the current small file to an empty metadata table, and generating a data table name of the metadata table by using the file identifier;
wherein searching the corresponding metadata table and querying the corresponding aggregation information from the searched metadata table, when the read request for the small file sent by the user terminal is obtained, comprises:
generating a corresponding target character string by using the bucket name and the file name in the reading request;
performing hash operation on the target character string to obtain a corresponding target file identifier;
extracting a target first character of the target file identifier;
searching the corresponding metadata table by using the target first character and the bucket name in the reading request;
and querying the corresponding aggregation information from the searched metadata table.
2. The small file storage method according to claim 1, wherein the adding of the aggregation information corresponding to the current small file to the empty metadata table includes:
adding the aggregation information corresponding to the current small file to the pre-created empty metadata table;
or, creating an empty metadata table in real time, and then adding the aggregation information corresponding to the current small file to the metadata table.
3. The method for storing the small files according to claim 1, wherein the generating the corresponding file identifiers by using the bucket names and the file names corresponding to the small files comprises:
generating a corresponding character string by using the bucket name and the file name corresponding to the small file;
and performing a hash operation on the character string to obtain the corresponding file identifier.
4. The small file storage method according to claim 3, wherein the generating of the data table name of the metadata table by using the file identifier includes:
reading the first character of the file identifier;
and generating the data table name by using the first character and the corresponding bucket name.
5. The method for storing the small files according to any one of claims 1 to 4, wherein before generating the corresponding file identifier by using the bucket name and the file name corresponding to the small file, the method further comprises:
and acquiring a file aggregation instruction to trigger the step of generating the corresponding file identifier by using the bucket name and the file name corresponding to the small file.
6. A small file storage device, comprising:
the file identifier generating module is used for generating a corresponding file identifier by using the bucket name and the file name corresponding to the small file;
the aggregation information adding module is used for adding the aggregation information corresponding to different small files to different metadata tables by using the file identifiers; the aggregation information comprises the file identification, the name of a corresponding large file and the offset position of the small file in the corresponding large file, and the large file is the aggregation file corresponding to the small file;
the file data reading module is used for searching the corresponding metadata table when a reading request aiming at the small file sent by a user terminal is obtained, inquiring the corresponding aggregation information from the searched metadata table, and then reading the data of the small file in the corresponding large file by using the aggregation information;
the aggregation information adding module is specifically configured to search, for the aggregation information corresponding to any one of the small files, whether a metadata table whose data table name corresponds to the file identifier exists; if such a metadata table exists, to add the aggregation information corresponding to the current small file to the corresponding metadata table; if no such metadata table exists, to add the aggregation information corresponding to the current small file to an empty metadata table, and to generate a data table name of the metadata table by using the file identifier;
the file data reading module is specifically configured to generate a corresponding target character string from the bucket name and the file name in the reading request; perform a hash operation on the target character string to obtain a corresponding target file identifier; extract a target first character of the target file identifier; search the corresponding metadata table by using the target first character and the bucket name in the reading request; and query the corresponding aggregation information from the searched metadata table.
7. A small file storage device comprising a processor and a memory; wherein,
the memory is used for storing a computer program;
the processor is configured to execute the computer program to implement the small file storage method according to any one of claims 1 to 5.
8. A computer-readable storage medium for storing a computer program, wherein the computer program, when executed by a processor, implements the small file storage method according to any one of claims 1 to 5.
CN201911382298.6A 2019-12-27 2019-12-27 Small file storage method, device, equipment and medium Active CN111176574B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911382298.6A CN111176574B (en) 2019-12-27 2019-12-27 Small file storage method, device, equipment and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911382298.6A CN111176574B (en) 2019-12-27 2019-12-27 Small file storage method, device, equipment and medium

Publications (2)

Publication Number Publication Date
CN111176574A CN111176574A (en) 2020-05-19
CN111176574B true CN111176574B (en) 2022-03-22

Family

ID=70647446

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911382298.6A Active CN111176574B (en) 2019-12-27 2019-12-27 Small file storage method, device, equipment and medium

Country Status (1)

Country Link
CN (1) CN111176574B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113064859B (en) * 2021-03-26 2022-11-04 山东英信计算机技术有限公司 Metadata processing method and device, electronic equipment and storage medium
CN113111194B (en) * 2021-04-07 2022-11-18 山东英信计算机技术有限公司 Object metadata aggregation method, object metadata reading device, object metadata equipment and storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102855239A (en) * 2011-06-28 2013-01-02 清华大学 Distributed geographical file system
CN105069048A (en) * 2015-07-23 2015-11-18 东方网力科技股份有限公司 Small file storage method, query method and device
CN106326292A (en) * 2015-06-29 2017-01-11 杭州海康威视数字技术股份有限公司 Data structure and file aggregation and reading methods and apparatuses
CN106776967A (en) * 2016-12-05 2017-05-31 哈尔滨工业大学(威海) Mass small documents real-time storage method and device based on sequential aggregating algorithm
CN107704203A (en) * 2017-09-27 2018-02-16 郑州云海信息技术有限公司 Method, device, equipment and computer storage medium for deleting aggregated large files
CN107729432A (en) * 2017-09-29 2018-02-23 浪潮软件股份有限公司 A kind of storage of distributed small documents, read method, device and access system

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8311038B2 (en) * 2009-03-30 2012-11-13 Martin Feuerhahn Instant internet browser based VoIP system

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
"tpNFS: Efficient Support of Small Files Processing over pNFS";Bo Wang; Jinlei Jiang; Guangwen Yang;《2013 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum》;20131031;Abstract *
"An optimization scheme for object storage of massive small files";屠雪真; 黄震江;《计算机技术与发展》;20190327;Vol. 29 (No. 8);pp. 31-36 *

Also Published As

Publication number Publication date
CN111176574A (en) 2020-05-19

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant