CN111176574B - Small file storage method, device, equipment and medium - Google Patents
- Publication number
- CN111176574B (application CN201911382298.6A)
- Authority
- CN
- China
- Prior art keywords
- file
- small
- name
- aggregation information
- metadata table
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0602—Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
- G06F3/061—Improving I/O performance
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0628—Interfaces specially adapted for storage systems making use of a particular technique
- G06F3/0638—Organizing or formatting or addressing of data
- G06F3/0643—Management of files
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0668—Interfaces specially adapted for storage systems adopting a particular infrastructure
- G06F3/067—Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0668—Interfaces specially adapted for storage systems adopting a particular infrastructure
- G06F3/0671—In-line storage system
- G06F3/0673—Single storage device
- G06F3/0679—Non-volatile semiconductor memory device, e.g. flash memory, one time programmable memory [OTP]
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Information Transfer Between Computers (AREA)
Abstract
The application discloses a method, a device, equipment and a medium for storing small files, which comprise the following steps: generating a corresponding file identifier by using the bucket name and the file name corresponding to the small file; adding the aggregation information corresponding to different small files to different metadata tables by using the file identification; the aggregation information comprises the file identification, the name of a corresponding large file and the offset position of the small file in the corresponding large file, and the large file is the aggregation file corresponding to the small file; when a read request for the small file sent by a user terminal is acquired, searching the corresponding metadata table, querying the corresponding aggregation information from the searched metadata table, and then reading the data of the small file in the corresponding large file by using the aggregation information. Therefore, the rate of inquiring the corresponding aggregation information from the metadata table can be increased, and the reading rate of the small file is increased.
Description
Technical Field
The present application relates to the field of storage technologies, and in particular, to a method, an apparatus, a device, and a medium for storing a small file.
Background
Mechanical hard disks are widely used in storage systems, and every read or write incurs seek time. In user scenarios that store many small files, small-file read and write efficiency is therefore very low, with throughput of only a few MB/s, which becomes the bottleneck of small-file storage. To address this scenario, the industry has proposed small file aggregation: small files are first placed on a high-speed medium such as memory or a solid state disk (SSD), aggregated into a large file, and the large file is then flushed to the mechanical hard disk. This relieves the IO bottleneck of the mechanical hard disk for small files and greatly increases the speed at which the system processes them.
Currently, in distributed object storage, a common file reading process combines the bucket name and the file name in a read request into an object name and sends the object name to the underlying RADOS storage cluster to acquire the data. When small files are aggregated into a large file, an intermediate step that looks up the large file usually has to be added: the aggregated large file is located through the bucket name, the file name and a metadata directory, and the small file data is then read from that large file and returned. The problem is that, in distributed mass storage, a bucket may hold tens of millions or even hundreds of millions of objects, so the lookup of the large file can take a very long time and become the bottleneck of small-file read performance.
Disclosure of Invention
In view of the above, an object of the present application is to provide a method, an apparatus, a device and a medium for storing a small file, which can improve the reading rate of the small file. The specific scheme is as follows:
In a first aspect, the present application discloses a method for storing a small file, comprising:
generating a corresponding file identifier by using the bucket name and the file name corresponding to the small file;
adding the aggregation information corresponding to different small files to different metadata tables by using the file identification; the aggregation information comprises the file identification, the name of a corresponding large file and the offset position of the small file in the corresponding large file, and the large file is the aggregation file corresponding to the small file;
when a read request for the small file sent by a user terminal is acquired, searching the corresponding metadata table, querying the corresponding aggregation information from the searched metadata table, and then reading the data of the small file in the corresponding large file by using the aggregation information.
Optionally, adding, by using the file identifier, aggregation information corresponding to different small files to different metadata tables includes:
for the aggregation information corresponding to any small file, searching whether a metadata table having a data table name corresponding to the file identifier exists;
if the metadata table exists, adding the aggregation information corresponding to the current small file to that metadata table;
and if the metadata table does not exist, adding the aggregation information corresponding to the current small file to an empty metadata table, and generating the data table name of the metadata table by using the file identifier.
Optionally, the adding the aggregation information corresponding to the current small file to the empty metadata table includes:
adding the aggregation information corresponding to the current small file to the pre-created empty metadata table;
or, creating an empty metadata table in real time, and then adding the aggregation information corresponding to the current small file to the metadata table.
Optionally, the generating a corresponding file identifier by using the bucket name and the file name corresponding to the small file includes:
generating a corresponding character string by using the bucket name and the file name corresponding to the small file;
and carrying out Hash operation on the character strings to obtain the corresponding file identification.
Optionally, the generating a data table name of the metadata table by using the file identifier includes:
reading the first character of the file identifier;
and generating the data table name by using the first character and the corresponding bucket name.
Optionally, searching the corresponding metadata table when the read request for the small file sent by the user terminal is obtained, and querying the corresponding aggregation information from the searched metadata table, includes:
generating a corresponding target character string by using the bucket name and the file name in the reading request;
performing hash operation on the target character string to obtain a corresponding target file identifier;
extracting a target first character of the target file identifier;
searching the corresponding metadata table by using the target first character and the bucket name in the read request;
and querying the corresponding aggregation information from the searched metadata table.
Optionally, before generating the corresponding file identifier by using the bucket name and the file name corresponding to the small file, the method further includes:
and acquiring a file aggregation instruction to trigger the step of generating the corresponding file identifier by using the bucket name and the file name corresponding to the small file.
In a second aspect, the present application discloses a small file storage device, comprising:
the file identifier generating module is used for generating a corresponding file identifier by using the bucket name and the file name corresponding to the small file;
the aggregation information adding module is used for adding the aggregation information corresponding to different small files to different metadata tables by using the file identifiers; the aggregation information comprises the file identification, the name of a corresponding large file and the offset position of the small file in the corresponding large file, and the large file is the aggregation file corresponding to the small file;
and the file data reading module is used for searching the corresponding metadata table when a reading request aiming at the small file sent by a user terminal is obtained, inquiring the corresponding aggregation information from the searched metadata table, and then reading the data of the small file in the corresponding large file by using the aggregation information.
In a third aspect, the present application discloses a small file storage device comprising a processor and a memory; wherein
the memory is used for storing a computer program;
the processor is used for executing the computer program to realize the small file storage method.
In a fourth aspect, the present application discloses a computer readable storage medium for storing a computer program, wherein the computer program, when executed by a processor, implements the aforementioned small file storage method.
Therefore, according to the method and the device, the corresponding file identifier is generated by using the bucket name and the file name corresponding to the small file, and then the aggregate information corresponding to different small files is added to different metadata tables by using the file identifier; the aggregation information comprises the file identification, the name of a corresponding large file and the offset position of the small file in the corresponding large file, the large file is the aggregation file corresponding to the small file, when a read request for the small file sent by a user terminal is obtained, the corresponding metadata table is searched, the corresponding aggregation information is inquired from the searched metadata table, and then the data of the small file in the corresponding large file is read by using the aggregation information. That is, according to the method and the device for processing the small files, aggregation information corresponding to different small files is added to different metadata tables by using file identifiers, so that the rate of querying the corresponding aggregation information from the metadata tables can be increased, and the reading rate of the small files is increased.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed to be used in the description of the embodiments or the prior art will be briefly introduced below, it is obvious that the drawings in the following description are only embodiments of the present application, and for those skilled in the art, other drawings can be obtained according to the provided drawings without creative efforts.
FIG. 1 is a flow chart of a method for storing a small file disclosed in the present application;
FIG. 2 is a flow chart of a specific method for storing a small file disclosed in the present application;
FIG. 3 is a schematic structural diagram of a small file storage device disclosed in the present application;
FIG. 4 is a structural diagram of a small file storage device disclosed in the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
Currently, in distributed object storage, a common file reading process combines the bucket name and the file name in a read request into an object name and sends the object name to the underlying RADOS storage cluster to acquire the data. When small files are aggregated into a large file, an intermediate step that looks up the large file usually has to be added: the aggregated large file is located through the bucket name, the file name and a metadata directory, and the small file data is then read from that large file and returned. The problem is that, in distributed mass storage, a bucket may hold tens of millions or even hundreds of millions of objects, so the lookup of the large file can take a very long time and become the bottleneck of small-file read performance. The present application therefore provides a small file storage scheme that can improve the reading speed of small files.
Referring to fig. 1, an embodiment of the present application discloses a method for storing a small file, including:
Step S11: generating a corresponding file identifier by using the bucket name and the file name corresponding to the small file.
In a specific implementation manner, this embodiment may obtain a file aggregation instruction to trigger the step of generating the corresponding file identifier by using the bucket name and the file name corresponding to the small file. That is, when the file aggregation instruction is obtained, the corresponding file identifier is generated by using the bucket name and the file name corresponding to the small file.
Of course, in some embodiments, the corresponding file identifier may also be generated in advance by using the bucket name and the file name corresponding to the small file, so that when the file aggregation instruction is obtained, the file identifier generated in advance may be read, and corresponding subsequent steps may be performed.
The small files are files with file sizes smaller than a first preset threshold value.
Step S12: adding the aggregation information corresponding to different small files to different metadata tables by using the file identification; the aggregation information includes the file identifier, a name of a corresponding large file, and an offset position of the small file in the corresponding large file, and the large file is an aggregation file corresponding to the small file.
In a specific implementation manner, for the aggregation information corresponding to any small file, this embodiment may search whether a metadata table having a data table name corresponding to the file identifier exists. If the metadata table exists, the aggregation information corresponding to the current small file is added to that metadata table. If the metadata table does not exist, the aggregation information corresponding to the current small file is added to an empty metadata table, and the data table name of that metadata table is generated by using the file identifier.
In some embodiments, the aggregation information corresponding to the current small file may be added to an empty metadata table that was created in advance. In other embodiments, the empty metadata table may be created in real time, and the aggregation information corresponding to the current small file is then added to it.
That is, in this embodiment the aggregation information corresponding to all the small files may be stored in shards: the file identifiers of the small files are used to store the aggregation information of different small files in different metadata tables, so that each metadata table stays relatively short and the aggregation information can be found quickly. The offset position of the small file in the corresponding large file may include a start position and an end position of the small file in the corresponding large file, where the end position is determined from the start position and the length of the small file.
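The offset bookkeeping described above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the patent's implementation: the names `AggregationInfo` and `aggregate` are hypothetical, and the end position is treated as exclusive (start plus length), a convention the text does not specify.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class AggregationInfo:
    """One metadata record: where a small file lives inside its large file."""
    file_id: str
    large_file_name: str
    start: int   # byte offset of the small file inside the large file
    length: int  # size of the small file in bytes

    @property
    def end(self) -> int:
        # Assumption: exclusive end offset, derived from start and length.
        return self.start + self.length


def aggregate(large_file_name: str,
              small_files: Iterable[Tuple[str, bytes]]) -> Tuple[bytes, List[AggregationInfo]]:
    """Concatenate small files into one blob and record each file's offset."""
    blob = bytearray()
    records: List[AggregationInfo] = []
    for file_id, data in small_files:
        # The current blob length is the start offset of the next small file.
        records.append(AggregationInfo(file_id, large_file_name, len(blob), len(data)))
        blob.extend(data)
    return bytes(blob), records
```

Storing the start position and the length is enough to recover the end position, so either pair (start, end) or (start, length) can serve as the offset position.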
The large file is a file with the file size larger than a second preset threshold value.
Step S13: when a read request for the small file sent by a user terminal is acquired, searching the corresponding metadata table, querying the corresponding aggregation information from the searched metadata table, and then reading the data of the small file in the corresponding large file by using the aggregation information.
In a specific implementation process, when a read request for the small file sent by a user terminal is acquired, the corresponding target file identifier is generated by using the bucket name and the file name in the read request. Since the data table name of each metadata table is created from the file identifier, the corresponding metadata table can be located according to the target file identifier, and the corresponding aggregation information can then be queried from that metadata table according to the target file identifier. The aggregation information gives the name of the corresponding large file and the offset position of the small file in that large file. The large file is then looked up in the bottom layer of the storage system by its name, the data of the small file is read from it according to the start position and the end position included in the offset position, and the read data is returned to the corresponding user terminal. It can be understood that the aggregation information includes the file identifier of the small file, so the corresponding aggregation information can be queried according to the file identifier. In addition, this embodiment can use the file identifier as an index value of the aggregation information to improve the query speed.
In some embodiments, the aggregation information stored in the metadata table may include the file identifier and a name of the corresponding large file, a start position of the small file in the corresponding large file, and a length of the small file, and then, when the aggregation information is used to read the data of the small file in the corresponding large file, the data of the small file in the large file may be read by using the searched start position and length of the small file.
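Reading a small file by start position and length can be sketched as below. This is an illustrative Python sketch only: `read_small_file` is a hypothetical helper, and an in-memory `io.BytesIO` stands in for the large file that a real system would fetch from the underlying storage cluster.

```python
import io
from typing import BinaryIO


def read_small_file(large_file: BinaryIO, start: int, length: int) -> bytes:
    """Read `length` bytes beginning at byte offset `start` of the large file."""
    large_file.seek(start)
    data = large_file.read(length)
    if len(data) != length:
        # The recorded offset points past the end of the large file.
        raise EOFError("aggregation record does not fit inside the large file")
    return data


# Stand-in for an aggregated large file holding two small files back to back.
demo_large_file = io.BytesIO(b"abchello")
small = read_small_file(demo_large_file, start=3, length=5)
```

Only the bytes belonging to the requested small file are read, so the cost of the read does not depend on the size of the large file.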
As can be seen, in the embodiment of the present application, a corresponding file identifier is generated by using a bucket name and a file name corresponding to a small file, and then aggregation information corresponding to different small files is added to different metadata tables by using the file identifier; the aggregation information comprises the file identification, the name of a corresponding large file and the offset position of the small file in the corresponding large file, the large file is the aggregation file corresponding to the small file, when a read request for the small file sent by a user terminal is obtained, the corresponding metadata table is searched, the corresponding aggregation information is inquired from the searched metadata table, and then the data of the small file in the corresponding large file is read by using the aggregation information. That is, according to the method and the device for processing the small files, aggregation information corresponding to different small files is added to different metadata tables by using file identifiers, so that the rate of querying the corresponding aggregation information from the metadata tables can be increased, and the reading rate of the small files is increased.
Referring to fig. 2, an embodiment of the present application discloses a specific small file storage method, including:
Step S21: generating a corresponding character string by using the bucket name and the file name corresponding to the small file.
Step S22: and carrying out Hash operation on the character strings to obtain the corresponding file identification.
That is, the implementation may generate a hash corresponding to the bucket name and the file name corresponding to the small file, and use the hash as the corresponding file identifier, so as to obtain the file identifier with a fixed length, so as to perform information search by using the file identifier.
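Steps S21 and S22 can be sketched as follows. The patent names neither the separator used to build the character string nor the hash function, so the `/` separator, SHA-256, and the function name `make_file_id` are assumptions made for illustration.

```python
import hashlib


def make_file_id(bucket_name: str, file_name: str) -> str:
    """Build a fixed-length file identifier from a bucket name and a file name."""
    # Step S21: combine the two names into one character string (separator assumed).
    key = f"{bucket_name}/{file_name}"
    # Step S22: hash the string; the hex digest has the same length for any input.
    return hashlib.sha256(key.encode("utf-8")).hexdigest()
```

Because the identifier is a hex digest, its first character is one of 16 values, which is what makes it usable later for choosing a metadata table.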
Step S23: adding the aggregation information corresponding to different small files to different metadata tables by using the file identification; the aggregation information includes the file identifier, a name of a corresponding large file, and an offset position of the small file in the corresponding large file, and the large file is an aggregation file corresponding to the small file.
In a specific implementation manner, for the aggregation information corresponding to any small file, this embodiment may search whether a metadata table having a data table name corresponding to the file identifier exists. If the metadata table exists, the aggregation information corresponding to the current small file is added to it. If the metadata table does not exist, the aggregation information corresponding to the current small file is added to an empty metadata table, and the data table name of that metadata table is generated by using the file identifier. In this embodiment, the data table name may be generated from the first character of the file identifier and the bucket name of the small file, for example, bucket name + first character. Thus, for the aggregation information corresponding to any small file, the corresponding metadata table can be located by using the first character of the file identifier and the bucket name, and the aggregation information is then added to that metadata table. That is, the aggregation information of small files that share the same bucket name and the same first character of the file identifier is added to the metadata table whose data table name consists of that bucket name and that first character.
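The write path of step S23 can be sketched as below. This is a hedged illustration, not the patented implementation: plain dictionaries stand in for metadata tables, SHA-256 with a `/` separator is an assumed identifier scheme, the `_` joining bucket name and first character is an assumed naming format, and `MetadataStore`, `file_id` and `table_name` are hypothetical names.

```python
import hashlib
from typing import Dict, Tuple

Record = Tuple[str, int, int]  # (large file name, start position, length)


def file_id(bucket: str, name: str) -> str:
    # Assumed identifier scheme: SHA-256 over "bucket/name".
    return hashlib.sha256(f"{bucket}/{name}".encode("utf-8")).hexdigest()


def table_name(bucket: str, fid: str) -> str:
    # Data table name = bucket name + first character of the file identifier.
    return f"{bucket}_{fid[0]}"


class MetadataStore:
    def __init__(self) -> None:
        self.tables: Dict[str, Dict[str, Record]] = {}

    def add(self, bucket: str, name: str, large_file: str, start: int, length: int) -> str:
        fid = file_id(bucket, name)
        tname = table_name(bucket, fid)
        table = self.tables.get(tname)
        if table is None:
            # No table with this name yet: create an empty one in real time.
            table = {}
            self.tables[tname] = table
        # The file identifier doubles as the index key of the record.
        table[fid] = (large_file, start, length)
        return tname
```

With a hexadecimal identifier, each bucket is spread over at most 16 tables, so each table holds roughly one sixteenth of the bucket's records.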
Step S24: when a read request for the small file sent by a user terminal is acquired, searching the corresponding metadata table, querying the corresponding aggregation information from the searched metadata table, and then reading the data of the small file in the corresponding large file by using the aggregation information.
In a specific implementation process, when a read request for the small file sent by a user terminal is acquired, a corresponding target character string is generated from the bucket name and the file name in the read request, a hash operation is performed on the target character string to obtain a corresponding target file identifier, and the target first character of the target file identifier is extracted. The corresponding metadata table is then searched by using the target first character and the bucket name in the read request, the corresponding aggregation information is queried from the searched metadata table, and the data of the small file in the corresponding large file is read by using the aggregation information.
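The read path of step S24 can be sketched end to end as follows. As before, this is only an illustration under assumptions: dictionaries stand in for the metadata tables and for the large files in the storage cluster, the identifier scheme (SHA-256 over `bucket/name`) and the `bucket_firstchar` table name format are not taken from the patent, and `handle_read` is a hypothetical name.

```python
import hashlib
from typing import Dict, Tuple

Record = Tuple[str, int, int]  # (large file name, start position, length)


def file_id(bucket: str, name: str) -> str:
    # Assumed identifier scheme: SHA-256 over "bucket/name".
    return hashlib.sha256(f"{bucket}/{name}".encode("utf-8")).hexdigest()


def handle_read(tables: Dict[str, Dict[str, Record]],
                large_files: Dict[str, bytes],
                bucket: str, name: str) -> bytes:
    # 1. Rebuild the target file identifier from the read request.
    fid = file_id(bucket, name)
    # 2. Pick the metadata table from the bucket name and the first character.
    table = tables.get(f"{bucket}_{fid[0]}")
    if table is None or fid not in table:
        raise KeyError(f"no aggregation information for {bucket}/{name}")
    # 3. Query the aggregation information, using the identifier as the index.
    large_name, start, length = table[fid]
    # 4. Read only the small file's bytes out of the large file.
    return large_files[large_name][start:start + length]
```

The lookup touches a single small table instead of one directory covering the whole bucket, which is the source of the speed-up the description claims.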
Referring to fig. 3, the present application discloses a small file storage device, including:
The file identifier generating module 11 is configured to generate a corresponding file identifier by using the bucket name and the file name corresponding to the small file.
The aggregate information adding module 12 is configured to add aggregate information corresponding to different small files to different metadata tables by using the file identifier; the aggregation information includes the file identifier, a name of a corresponding large file, and an offset position of the small file in the corresponding large file, and the large file is an aggregation file corresponding to the small file.
The file data reading module 13 is configured to, when a read request for the small file sent by the user terminal is obtained, search the corresponding metadata table, query the corresponding aggregation information from the searched metadata table, and then read the data of the small file in the corresponding large file by using the aggregation information.
As can be seen, in the embodiment of the present application, a corresponding file identifier is generated by using a bucket name and a file name corresponding to a small file, and then aggregation information corresponding to different small files is added to different metadata tables by using the file identifier; the aggregation information comprises the file identification, the name of a corresponding large file and the offset position of the small file in the corresponding large file, the large file is the aggregation file corresponding to the small file, when a read request for the small file sent by a user terminal is obtained, the corresponding metadata table is searched, the corresponding aggregation information is inquired from the searched metadata table, and then the data of the small file in the corresponding large file is read by using the aggregation information. That is, according to the method and the device for processing the small files, aggregation information corresponding to different small files is added to different metadata tables by using file identifiers, so that the rate of querying the corresponding aggregation information from the metadata tables can be increased, and the reading rate of the small files is increased.
The aggregate information adding module 12 is specifically configured to search, for the aggregation information corresponding to any one of the small files, whether a metadata table having the data table name corresponding to the file identifier exists; if the metadata table exists, to add the aggregation information corresponding to the current small file to that metadata table; and if the metadata table does not exist, to add the aggregation information corresponding to the current small file to an empty metadata table and generate the data table name of the metadata table by using the file identifier.
In a specific embodiment, the aggregation information adding module 12 is configured to add the aggregation information corresponding to the current small file to an empty metadata table created in advance.
In a specific embodiment, the aggregation information adding module 12 is configured to create an empty metadata table in real time, and then add aggregation information corresponding to the current small file to the metadata table.
The file identifier generating module 11 is specifically configured to generate a corresponding character string from the bucket name and the file name corresponding to a small file, and to perform a hash operation on the character string to obtain the corresponding file identifier. Correspondingly, the aggregate information adding module 12 is configured to read the first character of the file identifier and to generate the data table name by using the first character and the corresponding bucket name.
Correspondingly, the file data reading module 13 is specifically configured to generate a corresponding target character string from the bucket name and the file name in the read request; perform a hash operation on the target character string to obtain a corresponding target file identifier; extract the target first character of the target file identifier; search the corresponding metadata table by using the target first character and the bucket name in the read request; and query the corresponding aggregation information from the searched metadata table.
Further, the small file storage device further comprises an aggregation instruction acquisition module, configured to acquire a file aggregation instruction, so as to trigger the step of generating the corresponding file identifier by using the bucket name and the file name corresponding to the small file.
Referring to fig. 4, an embodiment of the present application discloses a small file storage device, which includes a processor 21 and a memory 22; wherein, the memory 22 is used for saving computer programs; the processor 21 is configured to execute the computer program to implement the following steps:
generating a corresponding file identifier by using the bucket name and the file name corresponding to the small file; adding the aggregation information corresponding to different small files to different metadata tables by using the file identification; the aggregation information comprises the file identification, the name of a corresponding large file and the offset position of the small file in the corresponding large file, and the large file is the aggregation file corresponding to the small file; when a read request for the small file sent by a user terminal is acquired, searching the corresponding metadata table, querying the corresponding aggregation information from the searched metadata table, and then reading the data of the small file in the corresponding large file by using the aggregation information.
As can be seen, in this embodiment of the present application, a corresponding file identifier is generated by using the bucket name and the file name corresponding to a small file, and the aggregation information corresponding to different small files is then added to different metadata tables by using the file identifier. The aggregation information comprises the file identifier, the name of the corresponding large file, and the offset position of the small file in the corresponding large file, the large file being the aggregation file corresponding to the small file. When a read request for the small file sent by a user terminal is obtained, the corresponding metadata table is searched for, the corresponding aggregation information is queried from the metadata table that is found, and the data of the small file is then read from the corresponding large file by using the aggregation information. That is, because the present application uses file identifiers to add the aggregation information corresponding to different small files to different metadata tables, the rate of querying the corresponding aggregation information from the metadata tables can be increased, which in turn increases the read rate of the small files.
In this embodiment, when the processor 21 executes the computer subprogram stored in the memory 22, the following steps may be specifically implemented: for the aggregation information corresponding to any small file, checking whether a metadata table whose data table name corresponds to the file identifier exists; if such a metadata table exists, adding the aggregation information corresponding to the current small file to that metadata table; and if no such metadata table exists, adding the aggregation information corresponding to the current small file to an empty metadata table, and generating the data table name of that metadata table by using the file identifier.
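The final step, reading the data of a small file out of its large file by using the aggregation information, might look like the sketch below. The application lists only the file identifier, the large file name and the offset position as aggregation information; the stored length used here is an added assumption so that the reader knows where the small file ends. All function names are hypothetical.

```python
import os
import tempfile


def aggregate(large_path: str, small_files: dict) -> dict:
    """Append small files to one large file; return name -> (offset, length)."""
    index = {}
    with open(large_path, "wb") as big:
        for name, data in small_files.items():
            index[name] = (big.tell(), len(data))  # offset before writing
            big.write(data)
    return index


def read_small(large_path: str, offset: int, length: int) -> bytes:
    """Read one small file back from the large file."""
    with open(large_path, "rb") as big:
        big.seek(offset)
        return big.read(length)  # length is an assumption, see lead-in


def demo() -> bytes:
    # Aggregate two small files in a temporary directory and read one back.
    with tempfile.TemporaryDirectory() as directory:
        path = os.path.join(directory, "big_0")
        index = aggregate(path, {"a.txt": b"hello", "b.txt": b"world!"})
        return read_small(path, *index["b.txt"])
```

Calling `demo()` aggregates two small byte strings into one file and reads the second one back from its recorded offset.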
In this embodiment, when the processor 21 executes the computer subprogram stored in the memory 22, the following steps may be specifically implemented: adding the aggregation information corresponding to the current small file to the pre-created empty metadata table; or, creating an empty metadata table in real time, and then adding the aggregation information corresponding to the current small file to the metadata table.
In this embodiment, when the processor 21 executes the computer subprogram stored in the memory 22, the following steps may be specifically implemented: generating a corresponding character string by using the bucket name and the file name corresponding to the small file; and performing a hash operation on the character string to obtain the corresponding file identifier.
In this embodiment, when the processor 21 executes the computer subprogram stored in the memory 22, the following steps may be specifically implemented: reading the first character of the file identifier; and generating the data table name by using the first character and the corresponding bucket name.
In this embodiment, when the processor 21 executes the computer subprogram stored in the memory 22, the following steps may be specifically implemented: generating a corresponding target character string by using the bucket name and the file name in the read request; performing a hash operation on the target character string to obtain a corresponding target file identifier; extracting the target first character of the target file identifier; searching for the corresponding metadata table by using the target first character and the bucket name in the read request; and querying the corresponding aggregation information from the metadata table that is found.
In this embodiment, when the processor 21 executes the computer subprogram stored in the memory 22, the following steps may be specifically implemented: and acquiring a file aggregation instruction to trigger the step of generating the corresponding file identifier by using the bucket name and the file name corresponding to the small file.
Further, an embodiment of the present application also discloses a computer readable storage medium for storing a computer program, wherein the computer program, when executed by a processor, implements the following steps:
generating a corresponding file identifier by using the bucket name and the file name corresponding to the small file; adding the aggregation information corresponding to different small files to different metadata tables by using the file identification; the aggregation information comprises the file identification, the name of a corresponding large file and the offset position of the small file in the corresponding large file, and the large file is the aggregation file corresponding to the small file; when a read request for the small file sent by a user terminal is acquired, searching the corresponding metadata table, querying the corresponding aggregation information from the searched metadata table, and then reading the data of the small file in the corresponding large file by using the aggregation information.
As can be seen, in this embodiment of the present application, a corresponding file identifier is generated by using the bucket name and the file name corresponding to a small file, and the aggregation information corresponding to different small files is then added to different metadata tables by using the file identifier. The aggregation information comprises the file identifier, the name of the corresponding large file, and the offset position of the small file in the corresponding large file, the large file being the aggregation file corresponding to the small file. When a read request for the small file sent by a user terminal is obtained, the corresponding metadata table is searched for, the corresponding aggregation information is queried from the metadata table that is found, and the data of the small file is then read from the corresponding large file by using the aggregation information. That is, because the present application uses file identifiers to add the aggregation information corresponding to different small files to different metadata tables, the rate of querying the corresponding aggregation information from the metadata tables can be increased, which in turn increases the read rate of the small files.
In this embodiment, when the computer subprogram stored in the computer-readable storage medium is executed by the processor, the following steps may be specifically implemented: for the aggregation information corresponding to any small file, checking whether a metadata table whose data table name corresponds to the file identifier exists; if such a metadata table exists, adding the aggregation information corresponding to the current small file to that metadata table; and if no such metadata table exists, adding the aggregation information corresponding to the current small file to an empty metadata table, and generating the data table name of that metadata table by using the file identifier.
In this embodiment, when the computer subprogram stored in the computer-readable storage medium is executed by the processor, the following steps may be specifically implemented: adding the aggregation information corresponding to the current small file to the pre-created empty metadata table; or, creating an empty metadata table in real time, and then adding the aggregation information corresponding to the current small file to the metadata table.
In this embodiment, when the computer subprogram stored in the computer-readable storage medium is executed by the processor, the following steps may be specifically implemented: generating a corresponding character string by using the bucket name and the file name corresponding to the small file; and performing a hash operation on the character string to obtain the corresponding file identifier.
In this embodiment, when the computer subprogram stored in the computer-readable storage medium is executed by the processor, the following steps may be specifically implemented: reading the first character of the file identifier; and generating the data table name by using the first character and the corresponding bucket name.
In this embodiment, when the computer subprogram stored in the computer-readable storage medium is executed by the processor, the following steps may be specifically implemented: generating a corresponding target character string by using the bucket name and the file name in the read request; performing a hash operation on the target character string to obtain a corresponding target file identifier; extracting the target first character of the target file identifier; searching for the corresponding metadata table by using the target first character and the bucket name in the read request; and querying the corresponding aggregation information from the metadata table that is found.
In this embodiment, when the computer subprogram stored in the computer-readable storage medium is executed by the processor, the following steps may be specifically implemented: and acquiring a file aggregation instruction to trigger the step of generating the corresponding file identifier by using the bucket name and the file name corresponding to the small file.
The embodiments in this specification are described in a progressive manner: each embodiment focuses on its differences from the other embodiments, and the same or similar parts of the embodiments may be understood by reference to one another. Because the device disclosed in the embodiments corresponds to the method disclosed in the embodiments, its description is relatively brief, and the relevant details can be found in the description of the method.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in Random Access Memory (RAM), memory, Read Only Memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
The small file storage method, device, equipment and medium provided by the present application have been described in detail above. Specific examples are used herein to explain the principle and implementation of the present application, and the description of the above embodiments is intended only to help in understanding the method and core idea of the present application. Meanwhile, a person skilled in the art may, following the idea of the present application, make changes to the specific embodiments and the scope of application. In summary, the content of this specification should not be construed as a limitation on the present application.
Claims (8)
1. A small file storage method is characterized by comprising the following steps:
generating a corresponding file identifier by using the bucket name and the file name corresponding to the small file;
adding the aggregation information corresponding to different small files to different metadata tables by using the file identification; the aggregation information comprises the file identification, the name of a corresponding large file and the offset position of the small file in the corresponding large file, and the large file is the aggregation file corresponding to the small file;
when a read request aiming at the small file sent by a user terminal is obtained, searching the corresponding metadata table, inquiring the corresponding aggregation information from the searched metadata table, and then reading the data of the small file in the corresponding large file by using the aggregation information;
wherein the adding of the aggregation information corresponding to the different small files to different metadata tables by using the file identifier comprises:
for the aggregation information corresponding to any small file, searching whether a metadata table whose data table name corresponds to the file identifier exists;
if the metadata table exists, adding the aggregation information corresponding to the current small file to the corresponding metadata table;
if the metadata table does not exist, adding the aggregation information corresponding to the current small file to an empty metadata table, and generating a data table name of the metadata table by using the file identifier;
wherein the searching for the corresponding metadata table and the querying of the corresponding aggregation information from the found metadata table, when the read request for the small file sent by the user terminal is obtained, comprise:
generating a corresponding target character string by using the bucket name and the file name in the reading request;
performing hash operation on the target character string to obtain a corresponding target file identifier;
extracting a target first character of the target file identifier;
searching for the corresponding metadata table by using the target first character and the bucket name in the read request;
and querying the corresponding aggregation information from the searched metadata table.
2. The small file storage method according to claim 1, wherein the adding of the aggregation information corresponding to the current small file to the empty metadata table comprises:
adding the aggregation information corresponding to the current small file to the pre-created empty metadata table;
or, creating an empty metadata table in real time, and then adding the aggregation information corresponding to the current small file to the metadata table.
3. The method for storing the small files according to claim 1, wherein the generating the corresponding file identifiers by using the bucket names and the file names corresponding to the small files comprises:
generating a corresponding character string by using the bucket name and the file name corresponding to the small file;
and performing a hash operation on the character string to obtain the corresponding file identifier.
4. The small file storage method according to claim 3, wherein the generating of the data table name of the metadata table by using the file identifier comprises:
reading the first character of the file identifier;
and generating the data table name by using the first character and the corresponding bucket name.
5. The method for storing the small files according to any one of claims 1 to 4, wherein before generating the corresponding file identifier by using the bucket name and the file name corresponding to the small file, the method further comprises:
and acquiring a file aggregation instruction to trigger the step of generating the corresponding file identifier by using the bucket name and the file name corresponding to the small file.
6. A small file storage device, comprising:
the file identifier generating module is used for generating a corresponding file identifier by using the bucket name and the file name corresponding to the small file;
the aggregation information adding module is used for adding the aggregation information corresponding to different small files to different metadata tables by using the file identifiers; the aggregation information comprises the file identification, the name of a corresponding large file and the offset position of the small file in the corresponding large file, and the large file is the aggregation file corresponding to the small file;
the file data reading module is used for searching the corresponding metadata table when a reading request aiming at the small file sent by a user terminal is obtained, inquiring the corresponding aggregation information from the searched metadata table, and then reading the data of the small file in the corresponding large file by using the aggregation information;
the aggregation information adding module is specifically configured to search, for the aggregation information corresponding to any one of the small files, whether a metadata table whose data table name corresponds to the file identifier exists; if the metadata table exists, to add the aggregation information corresponding to the current small file to the corresponding metadata table; and if the metadata table does not exist, to add the aggregation information corresponding to the current small file to an empty metadata table and to generate a data table name of the metadata table by using the file identifier;
the file data reading module is specifically configured to generate a corresponding target character string from the bucket name and the file name in the read request; perform a hash operation on the target character string to obtain a corresponding target file identifier; extract a target first character of the target file identifier; search for the corresponding metadata table by using the target first character and the bucket name in the read request; and query the corresponding aggregation information from the found metadata table.
7. A small file storage device, comprising a processor and a memory, wherein:
the memory is used for storing a computer program;
the processor is configured to execute the computer program to implement the small file storage method according to any one of claims 1 to 5.
8. A computer-readable storage medium for storing a computer program, wherein the computer program, when executed by a processor, implements the small file storage method according to any one of claims 1 to 5.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911382298.6A CN111176574B (en) | 2019-12-27 | 2019-12-27 | Small file storage method, device, equipment and medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911382298.6A CN111176574B (en) | 2019-12-27 | 2019-12-27 | Small file storage method, device, equipment and medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111176574A (en) | 2020-05-19 |
CN111176574B (en) | 2022-03-22 |
Family
ID=70647446
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201911382298.6A Active CN111176574B (en) | 2019-12-27 | 2019-12-27 | Small file storage method, device, equipment and medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111176574B (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113064859B (en) * | 2021-03-26 | 2022-11-04 | 山东英信计算机技术有限公司 | Metadata processing method and device, electronic equipment and storage medium |
CN113111194B (en) * | 2021-04-07 | 2022-11-18 | 山东英信计算机技术有限公司 | Object metadata aggregation method, object metadata reading device, object metadata equipment and storage medium |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102855239A (en) * | 2011-06-28 | 2013-01-02 | 清华大学 | Distributed geographical file system |
CN105069048A (en) * | 2015-07-23 | 2015-11-18 | 东方网力科技股份有限公司 | Small file storage method, query method and device |
CN106326292A (en) * | 2015-06-29 | 2017-01-11 | 杭州海康威视数字技术股份有限公司 | Data structure and file aggregation and reading methods and apparatuses |
CN106776967A (en) * | 2016-12-05 | 2017-05-31 | 哈尔滨工业大学(威海) | Mass small documents real-time storage method and device based on sequential aggregating algorithm |
CN107704203A (en) * | 2017-09-27 | 2018-02-16 | 郑州云海信息技术有限公司 | It polymerize delet method, device, equipment and the computer-readable storage medium of big file |
CN107729432A (en) * | 2017-09-29 | 2018-02-23 | 浪潮软件股份有限公司 | A kind of storage of distributed small documents, read method, device and access system |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8311038B2 (en) * | 2009-03-30 | 2012-11-13 | Martin Feuerhahn | Instant internet browser based VoIP system |
Non-Patent Citations (2)
Title |
---|
"tpNFS: Efficient Support of Small Files Processing over pNFS"; Bo Wang; Jinlei Jiang; Guangwen Yang; 2013 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum; 20131031; Abstract * |
"An Optimization Scheme for Object Storage of Massive Small Files" (一种海量小文件对象存储优化方案); 屠雪真; 黄震江; Computer Technology and Development (《计算机技术与发展》); 20190327; Vol. 29 (No. 8); pp. 31-36 * |
Also Published As
Publication number | Publication date |
---|---|
CN111176574A (en) | 2020-05-19 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11461027B2 (en) | Deduplication-aware load balancing in distributed storage systems | |
US10013317B1 (en) | Restoring a volume in a storage system | |
US20200150890A1 (en) | Data Deduplication Method and Apparatus | |
CN106933854B (en) | Short link processing method and device and server | |
JP5996088B2 (en) | Cryptographic hash database | |
US8719237B2 (en) | Method and apparatus for deleting duplicate data | |
CN107491523B (en) | Method and device for storing data object | |
US11157445B2 (en) | Indexing implementing method and system in file storage | |
CN107368527B (en) | Multi-attribute index method based on data stream | |
CN105468642A (en) | Data storage method and apparatus | |
JP2010157204A (en) | Content addressable storage system and method employing searchable block | |
CN106951179B (en) | Data migration method and device | |
WO2010099715A1 (en) | Method, system, client and data server for data operation | |
CN111176574B (en) | Small file storage method, device, equipment and medium | |
CN109766318B (en) | File reading method and device | |
US20150066877A1 (en) | Segment combining for deduplication | |
US20190004957A1 (en) | Low-overhead index for a flash cache | |
CN112416880A (en) | Method and device for optimizing storage performance of mass small files based on real-time merging | |
CN104035822A (en) | Low-cost efficient internal storage redundancy removing method and system | |
US10241927B2 (en) | Linked-list-based method and device for application caching management | |
TWI420333B (en) | A distributed de-duplication system and the method therefore | |
CN113961730A (en) | Graph data query method, system, computer device and readable storage medium | |
CN110018990B (en) | Method and device for caching snapshot and method and device for reading snapshot | |
EP2164005B1 (en) | Content addressable storage systems and methods employing searchable blocks | |
CN111723266A (en) | Mass data processing method and device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||