WO2022228458A1

WO2022228458A1 - Access optimization method, apparatus and device for large quantity of small files, and storage medium

Info

Publication number: WO2022228458A1
Application number: PCT/CN2022/089529
Authority: WO
Inventors: 郑平
Original assignee: 康键信息技术（深圳）有限公司
Priority date: 2021-04-30
Filing date: 2022-04-27
Publication date: 2022-11-03
Also published as: CN113176857A

Abstract

The present application relates to the field of base frame operation and maintenance, and discloses an access optimization method, apparatus and device for a large quantity of small files, and a storage medium. The access optimization method for a large quantity of small files comprises: by means of a preset dynamic division rule, dividing files to be stored into a local disk storage pool and a high-speed disk storage pool; then, setting a disk array structure by means of the high-speed disk storage pool, thereby improving the performance of the high-speed disk storage pool; afterwards, reading files in the local disk storage pool and generating a corresponding file reading record; and finally, dynamically monitoring the file reading record, and migrating files the number of reading instances of which is greater than a preset threshold to the high-speed disk storage pool, and generating, in a memory, a storage position record of corresponding files; when a read request for the migrated files is received, then according to the storage position record in the memory, directly redirecting to said files for reading. Thus, the I/O number accessed by a local disk is reduced, and the file reading speed is faster, thereby improving local file access performance.

Description

Access optimization method, apparatus, device and storage medium for massive small files

This application claims priority to Chinese patent application No. 202110484057.3, filed on April 30, 2021 and titled "Method, Apparatus, Equipment and Storage Medium for Accessing Massive Small Files", the entire contents of which are incorporated herein by reference.

Technical field

The present application relates to the field of infrastructure (base framework) operation and maintenance, and in particular to a method, apparatus, device and storage medium for optimizing access to massive small files.

Background

Internet applications today handle a large number of small files, such as video files divided into small segments and pictures on shopping and news websites. A large website may store more than 10 billion pictures, so file access efficiency becomes a key factor in service performance. In real scenarios the disk is read frequently, which keeps it under high load and degrades its performance.

In response to the phenomenon that "frequent disk reading affects disk performance", the inventor realized that the existing performance optimization solution caches frequently read files in memory. However, this approach still reads files from local mechanical disks, so local file access performance remains low.

Summary of the invention

The present application provides a method, apparatus, device and storage medium for optimizing access to massive small files, which solve the problem of low local file access performance in current methods for optimizing access to massive small files.

To achieve the above object, a first aspect of the present application provides a method for optimizing access to massive small files, including: dividing the files to be stored according to a preset dynamic file division rule and storing them in the corresponding storage pool, where the storage pool includes a local disk storage pool and a high-speed disk storage pool; setting a redundant array of independent disks (RAID) structure for the high-speed disk storage pool based on disk array technology; obtaining a read request for a file, performing the corresponding file read operation in the storage pool according to the read request, and generating a file read operation record; and, based on the file read operation record, filtering the files in the local disk storage pool, transferring the filtered files to the high-speed disk storage pool, and, when a read request for a filtered file is received, redirecting to the filtered file for reading.

A second aspect of the present application provides an apparatus for optimizing access to massive small files, including: a file storage module, configured to divide the files to be stored according to a preset dynamic file division rule and store them in the corresponding storage pool, where the storage pool includes a local disk storage pool and a high-speed disk storage pool; a disk structure optimization module, configured to set a redundant array of independent disks (RAID) structure for the high-speed disk storage pool based on disk array technology; a file reading module, configured to obtain a read request for a file, perform the corresponding file read operation in the storage pool according to the read request, and generate a file read operation record; and a file access optimization module, configured to filter the files in the local disk storage pool based on the file read operation record, transfer the filtered files to the high-speed disk storage pool, and, when a read request for a filtered file is received, redirect to the filtered file for reading.

A third aspect of the present application provides a device for optimizing access to massive small files, including a memory and at least one processor, where instructions are stored in the memory. The at least one processor invokes the instructions in the memory so that the device executes the following steps of the method for optimizing access to massive small files: dividing the files to be stored according to a preset dynamic file division rule and storing them in the corresponding storage pool, where the storage pool includes a local disk storage pool and a high-speed disk storage pool; setting a redundant array of independent disks (RAID) structure for the high-speed disk storage pool based on disk array technology; obtaining a read request for a file, performing the corresponding file read operation in the storage pool according to the read request, and generating a file read operation record; and, based on the file read operation record, filtering the files in the local disk storage pool, transferring the filtered files to the high-speed disk storage pool, and, when a read request for a filtered file is received, redirecting to the filtered file for reading.

A fourth aspect of the present application provides a computer-readable storage medium in which instructions are stored. When the instructions are run on a computer, the computer executes the following steps of the method for optimizing access to massive small files: dividing the files to be stored according to a preset dynamic file division rule and storing them in the corresponding storage pool, where the storage pool includes a local disk storage pool and a high-speed disk storage pool; setting a redundant array of independent disks (RAID) structure for the high-speed disk storage pool based on disk array technology; obtaining a read request for a file, performing the corresponding file read operation in the storage pool according to the read request, and generating a file read operation record; and, based on the file read operation record, filtering the files in the local disk storage pool, transferring the filtered files to the high-speed disk storage pool, and, when a read request for a filtered file is received, redirecting to the filtered file for reading.

In the technical solution provided by the present application, the files to be stored are divided between the local disk storage pool and the high-speed disk storage pool according to a preset dynamic division rule, and a disk array structure is set for the high-speed disk storage pool to improve its performance. Files in the local disk storage pool are then read and the corresponding file read records are generated. Finally, the file read records are monitored dynamically, files whose number of reads exceeds a preset threshold are migrated to the high-speed disk storage pool, and a storage location record for each such file is generated in memory. When a read request for a migrated file is received, the request is redirected directly to the file according to the storage location record in memory. The high-speed disk storage pool thus assists the local disk storage pool by storing and serving frequently accessed files, which reduces the number of I/O operations on the local disk; and because the high-speed disk performs better and reads files faster, local file access performance is improved.

Description of drawings

FIG. 1 is a schematic diagram of a first embodiment of a method for optimizing access to massive small files in an embodiment of the present application;

FIG. 2 is a schematic diagram of a second embodiment of a method for optimizing access to massive small files in an embodiment of the present application;

FIG. 3 is a schematic diagram of a third embodiment of a method for optimizing access to massive small files in an embodiment of the present application;

FIG. 4 is a schematic diagram of an embodiment of an apparatus for optimizing access to massive small files in an embodiment of the present application;

FIG. 5 is a schematic diagram of another embodiment of the apparatus for optimizing access to massive small files in an embodiment of the present application;

FIG. 6 is a schematic diagram of an embodiment of a device for optimizing access to massive small files in an embodiment of the present application.

Detailed description

In order to make those skilled in the art better understand the solutions of the present application, the embodiments of the present application will be described below with reference to the accompanying drawings in the embodiments of the present application.

The terms "first", "second", "third", "fourth", etc. (if any) in the description and claims of this application and the above-mentioned drawings are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence. It is to be understood that data so used may be interchanged under appropriate circumstances so that the embodiments described herein can be practiced in sequences other than those illustrated or described herein. Furthermore, the terms "comprising" or "having" and any variations thereof are intended to cover non-exclusive inclusion, for example, a process, method, system, product or device comprising a series of steps or units is not necessarily limited to those expressly listed steps or units, but may include other steps or units not expressly listed or inherent to these processes, methods, products or devices.

Referring to FIG. 1, a flowchart of a method for optimizing access to massive small files provided by an embodiment of the present application specifically includes:

101. According to a preset dynamic file division rule, divide the files to be stored and store them in a corresponding storage pool, where the storage pool includes a local disk storage pool and a high-speed disk storage pool;

The local disk storage pool consists of ordinary mechanical hard disks, and the high-speed disk storage pool can consist of SSDs (solid state drives) or other flash disks. The "dynamic file division rule" is a rule that changes as needed. Its classification criterion is the expected access frequency of a file: the first category contains files whose expected access frequency is high, and the second category contains files whose expected access frequency is low. The expected access frequency is calculated from the actual number of accesses to the file in a recent period and is recorded and backed up as text. For example, the access frequency of the file over the last 5 minutes may be specified as the expected access frequency parameter. This frequency is compared with a set threshold to determine whether the expected access frequency is high or low.

Files of the first category (expected access frequency is high) are stored in the high-speed disk storage pool, because their high access frequency puts greater pressure on the disk, and the high-speed disk storage pool (SSD), which represents binary bits by charging and discharging storage cells, performs better than the local disk storage pool (mechanical hard disks). Files of the second category (expected access frequency is low) are stored in the local disk storage pool, where the performance demanded of the disk is correspondingly lower.
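The division rule described above can be sketched as a sliding-window counter. This is a minimal illustration, not the patent's implementation: the names `AccessTracker` and `choose_pool` and the threshold of 8 accesses are assumptions, and only the 5-minute window comes from the text.

```python
import time
from collections import defaultdict, deque

WINDOW_SECONDS = 5 * 60      # "last 5 minutes" window from the text
HIGH_FREQ_THRESHOLD = 8      # hypothetical threshold (accesses per window)


class AccessTracker:
    """Counts recent accesses per file over a sliding time window."""

    def __init__(self, window=WINDOW_SECONDS):
        self.window = window
        self._hits = defaultdict(deque)   # file name -> access timestamps

    def record(self, name, now=None):
        self._hits[name].append(time.time() if now is None else now)

    def frequency(self, name, now=None):
        now = time.time() if now is None else now
        hits = self._hits[name]
        while hits and hits[0] <= now - self.window:
            hits.popleft()                # drop accesses older than the window
        return len(hits)


def choose_pool(tracker, name, now=None, threshold=HIGH_FREQ_THRESHOLD):
    """Return the pool a file should be stored in under the assumed rule."""
    if tracker.frequency(name, now) > threshold:
        return "high_speed"
    return "local"
```

Because the threshold and window are plain parameters, the rule can be changed at run time, which is one way to read "dynamic" in the text.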

It should be noted that in actual operation and maintenance, each mechanical disk is not paired with a dedicated high-speed disk. To avoid waste, the program allocates high-speed capacity on demand by dividing the high-speed disk into logical volumes. For example, a disk can initially be given 30 GB of solid-state disk space, which is then expanded dynamically according to actual needs.

102. Set up an independent redundant disk array structure for the high-speed disk storage pool based on the disk array technology;

RAID stands for Redundant Array of Independent Disks (originally Redundant Array of Inexpensive Disks). The principle is to combine multiple disks into an array that acts as one disk group and to distribute data across the disks in order to improve data safety. A disk array combines multiple disks into one large-capacity disk group and uses the combined throughput of the individual disks to improve the performance of the whole disk system. With this technique, data is cut into segments that are stored on the individual hard disks. Disk arrays can also use parity: when any one hard disk in the array fails, the data can still be read, and during reconstruction the data is recomputed and written to the new hard disk.
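The parity idea can be illustrated with byte-wise XOR, the scheme used by parity-based RAID levels such as RAID 5. The sketch below is an assumption-laden toy: the three four-byte "disks" and the helper `xor_blocks` are made up for illustration and do not come from the patent.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


data_disks = [b"AAAA", b"BBBB", b"CCCC"]   # one stripe, one block per disk
parity = xor_blocks(data_disks)            # stored on a separate disk

# Simulate losing the second disk and rebuilding it from the survivors
# plus the parity block.
survivors = [data_disks[0], data_disks[2], parity]
rebuilt = xor_blocks(survivors)
```

XOR-ing the surviving blocks with the parity block cancels every block except the missing one, which is why a single failed disk can be recomputed onto a replacement.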

For example, in actual operation and maintenance, when a machine shares a single high-speed disk as the cache for all of its disks, a failure of that high-speed disk invalidates the cache of every disk, which slows down the entire business system and affects online user requests. This single point of failure can be avoided by configuring RAID 1 for the high-speed disks: if one high-speed disk fails, the other continues to provide service, an alarm is raised, and the failed disk is replaced in time. The most widely used way to set up RAID at present is a hardware RAID card: the hard disks are connected to the RAID controller, the RAID controller is connected to the system PCIe bus, and the hard disk mode is then changed in the system settings.
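The RAID 1 behaviour described above (mirror every write, keep serving reads from the surviving member, raise an alarm) can be modelled in a few lines. This is a toy in-memory model under stated assumptions: `Raid1Mirror`, its dictionaries and the alarm message are hypothetical, and a real deployment would rely on a RAID controller rather than application code.

```python
class Raid1Mirror:
    """Toy RAID 1: writes go to both members; reads survive one failure."""

    def __init__(self):
        self.members = [{}, {}]          # two in-memory stand-ins for disks
        self.failed = [False, False]
        self.alarms = []

    def write(self, key, data):
        for i, disk in enumerate(self.members):
            if not self.failed[i]:
                disk[key] = data

    def read(self, key):
        for i, disk in enumerate(self.members):
            if self.failed[i]:
                # Record an alarm so the failed member can be replaced.
                self.alarms.append(f"member {i} failed, replace it")
                continue
            return disk[key]
        raise IOError("both mirror members have failed")
```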

103. Obtain the read request of the file, and perform the corresponding file read operation in the storage pool according to the read request, and generate a file read operation record;

This embodiment does not limit how files are read from the disk; the method depends on how the files are stored. For example, the offset and the length of a file may be recorded, and the file is read according to these two values. Files may also be stored on a file system such as XFS, in which case a file can be read directly by using its file ID as the file name, with the details handled by the file system. The specific disk is determined by upper-layer routing. The program (osd) corresponding to the storage pool records each file read operation, including the file name, the read time, and so on.

In this embodiment, the read request includes the virtual disk partition where the file to be read is located and the index of the file to be read. The corresponding physical disk partition is first determined from the virtual disk partition; the file is then located in that physical disk partition according to its index, and the read operation is performed.

For example, suppose the read request is "A disk/11501" and the physical disk corresponding to A disk is logical volume G on the SSD. The corresponding file is then looked up in logical volume G according to the index "11501". If the rule defining the index is "file group identification number + file serial number", the file group identification number of the file to be read is "1" and its file serial number is "1501".
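The example request can be resolved as sketched below. The mapping table, the function name `parse_read_request` and the fixed one-digit width of the file group identifier are assumptions made for illustration; the text does not say how the boundary between the two parts of the index is determined.

```python
# Hypothetical mapping from virtual disk names to physical locations.
VIRTUAL_TO_PHYSICAL = {"A disk": "SSD/logical volume G"}


def parse_read_request(request, group_id_digits=1):
    """Split 'virtual disk/index' and decode the index.

    Assumes the index rule 'file group identification number + file serial
    number' with a fixed-width group identifier (one digit in the example).
    """
    virtual_disk, index = request.rsplit("/", 1)
    return {
        "physical": VIRTUAL_TO_PHYSICAL[virtual_disk],
        "group_id": int(index[:group_id_digits]),
        "serial": int(index[group_id_digits:]),
    }
```

For the request in the example, `parse_read_request("A disk/11501")` yields logical volume G, file group 1 and serial number 1501.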

In this embodiment, step 103 specifically further includes the following steps:

Obtain a file read request, wherein the file read request includes the target virtual disk partition where the file to be read is located and the virtual logical address of the file to be read;

Determine the target physical disk partition corresponding to the target virtual disk partition according to the mapping relationship between the virtual disk partition and the physical disk partition;

According to the virtual logical address of the file to be read and the file read request, a corresponding file read operation is performed on the target physical disk partition, and a file read operation record is generated.

When a file on a virtual disk partition needs to be read, a file read request can be initiated to the terminal. The request includes the target virtual disk partition where the file to be read is located, that is, one particular virtual disk partition. For example, if the virtual disk partitions displayed on terminal A are partition 1, partition 2 and partition 3, and a file read operation is performed on partition 3, then partition 3 is the target virtual disk partition.

The file read request can also include the virtual logical address of the file to be read, that is, its logical address on the target virtual disk partition. The virtual logical address can specify the index, start position, read length and so on of the file to be read, so that the request can be mapped to the physical disk partition and the corresponding read operation performed there.

The mapping relationship between virtual disk partitions and physical disk partitions is pre-stored in the system. For example, the physical disk partitions include partition a, partition b and partition c, and the virtual disk partitions include partition 1, partition 2 and partition 3, where partition a corresponds to partition 2, partition b corresponds to partition 1, and partition c corresponds to partition 3. In a mobile terminal system, a database may also be created for this mapping relationship so that it is stored in database form. After the target virtual disk partition where the file to be read is located has been obtained, the corresponding target physical disk partition can be determined from the mapping relationship.

The correspondence between virtual logical addresses and physical logical addresses is also stored in the system. After the virtual logical address of the file to be read (that is, its address on the target virtual disk partition) has been obtained from the file read request, the matching physical logical address on the target physical disk partition can be looked up. This physical logical address gives the location of the target file on the target physical disk partition, that is, the starting position of the target file. A file read request can specify an operation on the target file, such as reading data of a specified length from the target file, or writing data of a specified length starting from a certain byte of the target file.
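The three sub-steps of step 103 (resolve the partition, resolve the address, read and log) can be sketched as follows. Only the partition correspondences come from the text; `ReadRequest`, `ADDRESS_MAP`, the in-memory `PHYSICAL` partitions and the record fields are hypothetical stand-ins.

```python
import io
import time
from dataclasses import dataclass


@dataclass
class ReadRequest:
    virtual_partition: str   # target virtual disk partition
    file_index: str          # index of the file to be read
    offset: int              # start position within the file
    length: int              # number of bytes to read


# Mapping from the text: a <-> 2, b <-> 1, c <-> 3.
PARTITION_MAP = {"partition 1": "partition b",
                 "partition 2": "partition a",
                 "partition 3": "partition c"}

# Hypothetical address table: (virtual partition, index) -> physical start.
ADDRESS_MAP = {("partition 3", "11501"): 4}

# In-memory stand-ins for physical partitions.
PHYSICAL = {"partition c": io.BytesIO(b"....hello small file")}

read_log = []   # file read operation records


def handle_read(req, now=None):
    physical = PARTITION_MAP[req.virtual_partition]
    start = ADDRESS_MAP[(req.virtual_partition, req.file_index)]
    device = PHYSICAL[physical]
    device.seek(start + req.offset)
    data = device.read(req.length)
    read_log.append({"file": req.file_index, "partition": physical,
                     "time": time.time() if now is None else now})
    return data
```

Keeping the read log as a plain list of records makes it easy for a later step to count reads per file.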

Optionally, after step 103, it further includes:

Get the access frequency of each file in the high-speed disk storage pool;

Determine whether the access frequency of each file is less than a preset threshold;

If the access frequency of a file is less than the preset threshold, delete that file.

Because the space on high-speed disks is limited, it needs to be cleaned up regularly. In this embodiment, the access frequency is recorded in the memory of each corresponding program (osd); for example, the number of accesses to each file per minute is counted. If the frequency exceeds a certain frequency threshold, the file is migrated to the high-speed disk; if it falls below a certain frequency threshold, the content on the high-speed disk is marked as invalid and the corresponding deletion operation is triggered. One advantage is that this works without the cost of scanning logs. Even if the corresponding program is restarted, the corresponding caches on the high-speed disk can all be marked as invalid, which triggers the reclamation process. The frequency threshold can be adjusted adaptively according to requirements.

For example, suppose the access frequencies of file a and file b in the high-speed disk osd are 10 times per 5 minutes and 6 times per 5 minutes respectively, and the frequency threshold set in the system is 8 times per 5 minutes. The access frequency of file a exceeds the threshold, which triggers the migration mechanism for the disk content. The access frequency of file b is below the threshold, so file b is marked as invalid and the deletion operation is triggered.
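The clean-up pass can be sketched as a sweep over in-memory counters. The function names and the dictionary standing in for the high-speed disk are assumptions; the numbers in the usage note are those of the example (10 and 6 accesses per 5 minutes against a threshold of 8).

```python
def sweep_high_speed_pool(access_counts, threshold):
    """Split cached files into those kept and those marked invalid.

    access_counts: file name -> accesses in the current window, as kept in
    memory by the pool's program; threshold: accesses per window.
    """
    keep, invalid = [], []
    for name, count in access_counts.items():
        (invalid if count < threshold else keep).append(name)
    return keep, invalid


def reclaim(pool, invalid):
    """Delete invalid entries from the pool (a dict standing in for the SSD)."""
    for name in invalid:
        pool.pop(name, None)
```

With `{"a": 10, "b": 6}` and a threshold of 8, file a stays in the high-speed pool and file b is marked invalid and reclaimed. No log scan is needed because the counters already live in memory.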

104. Based on the file read operation record, filter the files in the local disk storage pool and transfer the filtered files to the high-speed disk storage pool; when a read request for a filtered file is received, redirect to the filtered file for reading;

To reduce the access pressure on the local mechanical disk, frequently accessed files are transferred, as far as possible, to the high-performance high-speed disk for reading. This embodiment detects whether a file is frequently accessed by comparing its number of accesses with a threshold. The frequently accessed files in the local disk storage pool are filtered out in this way and transferred to the high-speed disk storage pool, and the location of each transferred file in the high-speed disk storage pool is recorded in memory. When such a file is later read, the location record saved in memory is consulted directly, and the file is found at the recorded address and read.
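Step 104 can be sketched with two dictionaries standing in for the pools and a third for the in-memory location records. The class `TieredStore` and its attribute names are hypothetical; the sketch copies rather than moves a hot file, matching the later remark that a copy is transferred to the high-speed disk.

```python
from collections import Counter


class TieredStore:
    def __init__(self, threshold):
        self.local = {}          # local disk storage pool
        self.fast = {}           # high-speed disk storage pool
        self.location = {}       # in-memory storage location records
        self.reads = Counter()   # file read operation records (counts only)
        self.threshold = threshold

    def read(self, name):
        if name in self.location:                 # redirect to migrated copy
            return self.fast[self.location[name]]
        self.reads[name] += 1                     # only local reads are counted
        return self.local[name]

    def migrate_hot_files(self):
        """Copy files read more than `threshold` times to the fast pool."""
        for name, count in self.reads.items():
            if count > self.threshold and name not in self.location:
                self.fast[name] = self.local[name]   # local copy is kept
                self.location[name] = name
```

Once a location record exists, reads never touch the local pool, which is how the number of I/O operations on the mechanical disk is reduced.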

Optionally, after step 104, the method further includes:

Receive file overwrite request;

According to the overwrite request, determine the file to be overwritten and the storage location record of the file to be overwritten in the memory;

Delete the storage location record of the file to be overwritten in the memory;

Delete the file to be overwritten from the high-speed disk storage pool and write a new file.

In the above steps, if a file on the local disk is frequently accessed, a copy of it is transferred to the high-speed disk, so a file may exist on the local disk, on the high-speed disk, or on both. When both copies exist, file consistency must be considered: if an overwrite request is received, the file in the local disk storage pool and the backup copy in the high-speed disk storage pool need to be modified together, otherwise the user may read wrong data. When a file needs to be overwritten, this embodiment deletes the address cache record of the file from memory and at the same time asynchronously sends a deletion instruction to the high-speed disk (SSD) to delete the backup copy stored there.
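The overwrite path can be sketched as below. Several points are assumptions rather than statements from the text: the pools are dictionaries, the asynchronous deletion is modelled with a single worker thread, and the new content is written to the local pool only, leaving the file to be migrated again later if it becomes hot.

```python
from concurrent.futures import ThreadPoolExecutor

local_pool = {"a.jpg": b"old"}
fast_pool = {"a.jpg": b"old"}            # backup copy on the high-speed disk
location_records = {"a.jpg": "a.jpg"}    # in-memory storage location records
_executor = ThreadPoolExecutor(max_workers=1)


def overwrite(name, new_data):
    # 1. Drop the location record first so later reads stop redirecting.
    location_records.pop(name, None)
    # 2. Ask the high-speed disk to delete its copy asynchronously.
    pending = _executor.submit(fast_pool.pop, name, None)
    # 3. Write the new content to the local disk storage pool (assumption).
    local_pool[name] = new_data
    return pending
```

Removing the location record before anything else means a concurrent reader falls back to the local pool instead of being redirected to a copy that is about to be deleted.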

In the embodiment of the present application, a high-speed disk is added as an auxiliary disk alongside the traditional local mechanical disk. The high-speed disk takes over the storage and reading of frequently accessed files, which reduces the number of I/O operations on the local disk; and because the high-speed disk performs better, files are read faster, which improves local file access performance.

Referring to FIG. 2, another flowchart of the method for optimizing access to massive small files provided by the embodiment of the present application specifically includes:

201. Acquire preset expected access frequency parameters of all files to be stored, and divide the files to be stored into high-frequency access file classes and low-frequency access file classes according to the expected access frequency parameters;

Each file to be stored has a corresponding expected access frequency parameter, which is calculated from the actual number of accesses to the file in a recent period and is recorded and backed up as text; the access frequency of the file over that period serves as the expected access frequency parameter. When this parameter is needed, the corresponding data field is simply read from the record file. The obtained value is then compared with the preset threshold, and an identifier is added to the file according to the comparison result (for example a file name identifier or a data identifier, among other methods), which finally determines the category to which the file belongs.

For example, suppose the record file gives an expected access frequency of 0.6 for file a and 0.3 for file b, and the preset threshold is 0.5. Comparing each value with the threshold shows that the expected access frequency of file a is greater than 0.5 and that of file b is less than 0.5. Accordingly, the file name prefix A- is added to file a and the file name prefix B- is added to file b, which yields two classes of files: class A containing file A-a, and class B containing file B-b.
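The labelling in this example can be reproduced with a short function. The name `classify` and the dictionary input format are assumptions; the prefixes and the threshold of 0.5 are taken from the example.

```python
def classify(expected_freq, threshold=0.5):
    """Group files by class and prefix each name: 'A-' high, 'B-' low."""
    classes = {"A": [], "B": []}
    for name, freq in expected_freq.items():
        label = "A" if freq > threshold else "B"
        classes[label].append(f"{label}-{name}")
    return classes
```

Calling `classify({"a": 0.6, "b": 0.3})` places `A-a` in class A and `B-b` in class B.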

202. Write the files in the high-frequency access file class to the storage primitives in the high-speed disk storage pool in sequence, and write the files in the low-frequency access file class to the storage primitives in the local disk storage pool in sequence;

A storage primitive is a basic unit of storage used to store data. In this embodiment, storage primitives refer to storage blocks on storage disks. After a file to be stored is received, a storage primitive can be selected at random for the write operation, or a suitable storage primitive can be selected according to a selection rule. The selection rule can take various forms. For example, based on the space occupancy of the storage primitives, a primitive with more free space can be selected; or, based on load information such as how busy a primitive is with accesses and its capacity to receive and process data, a primitive with a lighter load can be selected. Such selection rules achieve load balancing for writes. When load information is the deciding factor, a separate cache can be maintained to make selection more efficient. The cache collects the load information of each storage primitive in real time: either a storage primitive actively reports its load status whenever its load changes, or the cache periodically issues a load query and each storage primitive returns its load information. When a file needs to be written, the cache is queried first and, based on the load information it holds, a storage primitive with a lighter load is selected as the target of the write. In practice, the write operation can be performed by the DataService (data management service) program.

It is worth noting that the embodiment of the present application writes received files into the storage primitive sequentially, so that subsequent operations can accurately obtain the serial number of each file within its file group.
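A load-aware, sequential write path might look like the sketch below. The classes `StoragePrimitive` and `LoadCache` are hypothetical, and the load metric (bytes already stored) is chosen only for illustration; the text also mentions access busyness and processing capacity as possible load signals.

```python
class StoragePrimitive:
    def __init__(self, name):
        self.name = name
        self.data = bytearray()

    def append(self, payload):
        """Sequential write; returns (start address, size)."""
        start = len(self.data)
        self.data += payload
        return start, len(payload)


class LoadCache:
    """Holds the most recently reported load of each storage primitive."""

    def __init__(self):
        self.load = {}   # primitive name -> reported load (lower is lighter)

    def report(self, name, load):
        self.load[name] = load

    def lightest(self):
        return min(self.load, key=self.load.get)


def write_file(primitives, cache, payload):
    target = primitives[cache.lightest()]
    start, size = target.append(payload)
    cache.report(target.name, len(target.data))   # here load = bytes stored
    return target.name, start, size
```

The returned start address and size are exactly the two values that step 203 uses to work out the file group and serial number.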

It is understandable that files in the high-frequency access file class are accessed frequently, and frequent access puts read and write pressure on the disk, so all files in this class are stored in the high-performance high-speed disk storage pool. Files in the low-frequency access file class are not accessed very often, and the local disk storage pool (mechanical disks) is sufficient to handle that pressure without disk-induced system abnormalities such as hard disk overheating or system freezes.

203. Determine the file group to which the file belongs and the serial number of the file in the file group according to the starting address and capacity size of the file in the storage primitive, and the file group includes at least two sequentially stored files;

Writing a file into a storage primitive does not complete the storage process. The purpose of storage is access, so an access path must also be established. After a file has been stored in a storage primitive, the starting address of the file on the storage primitive and the size of the file are returned. The size can be obtained as the difference between the starting address and the ending address of the file, or by directly parsing the file. Once the starting address and size of the file are known, they can be compared with the preset starting address and size of each file group to determine the identification number of the file group in which the file is stored and the serial number of the file within that group. A "file group" here is a general term for multiple sequentially stored files. It corresponds to a virtual storage space: its preset starting address is the starting address of the first file in the group, and its ending address is the ending address of the last file in the group.

For example, suppose file 1 occupies addresses 1000 to 1500 on storage primitive 1 (for ease of explanation, addresses are given in decimal), file 2 occupies addresses 1501 to 1800, and file 3 occupies addresses 1801 to 2000. If the preset size of file group 1 is 1000, file group 1 includes these three files: its starting address is the starting address of the first file (file 1), namely 1000, and its ending address is the ending address of the third file (file 3), namely 2000. Assume further that file group 2 and file group 3 exist, with preset storage spaces 2001 to 2600 and 2601 to 3000 respectively. When file 2 is written into storage primitive 1, comparing its starting address and size with the starting address and size of each file group determines the identification number of the file group containing it; that is, file 2 belongs to file group 1.

Similarly, the serial number of the file in the file group can be obtained, and the serial number can be directly expressed as the offset of the starting address of the file relative to the starting address of the file group. For example, the serial number of file 2 in this example is 1501. The storage primitives are written sequentially, so the serial number increases monotonically and never goes out of order. The serial number of the file can also be assigned as a continuously increasing sequence of natural numbers, where each natural number corresponds to the offset of a file.
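The comparison described above can be sketched as follows. This is a minimal illustration, not the implementation of the application: the names `FileGroup`, `GROUPS` and `locate` are hypothetical, address ranges are assumed to be inclusive, and the sketch returns the offset relative to the group start (501 for file 2), whereas the example in the text quotes the starting address 1501 as the serial number. Either convention works as long as it is applied consistently.

```python
from bisect import bisect_right
from typing import NamedTuple, Tuple


class FileGroup(NamedTuple):
    group_id: int
    start: int  # preset start address (inclusive, assumption)
    end: int    # preset end address (inclusive, assumption)


# Preset file group ranges taken from the example in the text.
GROUPS = [
    FileGroup(1, 1000, 2000),
    FileGroup(2, 2001, 2600),
    FileGroup(3, 2601, 3000),
]
_STARTS = [g.start for g in GROUPS]


def locate(start_addr: int, size: int) -> Tuple[int, int]:
    """Return (group_id, offset_in_group) for a file at start_addr."""
    # Find the last group whose start address is <= start_addr.
    i = bisect_right(_STARTS, start_addr) - 1
    if i < 0:
        raise ValueError("address precedes every file group")
    group = GROUPS[i]
    end_addr = start_addr + size - 1
    if end_addr > group.end:
        raise ValueError("file does not fit inside a single file group")
    return group.group_id, start_addr - group.start


# File 2 occupies addresses 1501 to 1800, i.e. 300 addresses.
group_id, offset = locate(1501, 300)
```

Binary search over the sorted group start addresses keeps the lookup logarithmic in the number of file groups, which matters when the number of groups is large.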

204. Using the identification number of the file group and the serial number of the file as an index, establish a corresponding relationship between the index and the file name of the file.

After the identification number of the file group to which the file belongs and the serial number of the file in the group are obtained through the above steps, "file group identification number and file serial number" can be used as an index, and a corresponding relationship can be established between this index and the file name of the file. The index can be expressed as "file group identification number + file serial number", "file serial number + file group identification number", and so on. For example, if the identification number of the file group to which file 2 belongs is 1, and the serial number of file 2 in the file group is 1501, an index table entry between "11501" and file 2 can be established. To access file 2, the index table is queried first to get the index "11501" corresponding to file 2; the first number "1" is resolved as the file group number, and the following string of numbers "1501" is resolved as the serial number of file 2 within the file group. With these two parameters, file 2 can be read from the storage primitive, which completes the access process. Once the index between "the identification number of the file group and the serial number of the file" and the file name of the file has been constructed, the storage process of the file ends.
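A minimal sketch of such an index table is shown below. The class `NameIndex`, the helpers `encode` and `decode`, and the fixed serial width `SERIAL_DIGITS` are illustrative assumptions, not names from the application. The fixed width is introduced because a plain concatenation such as "11501" cannot otherwise be split unambiguously into group number and serial number.

```python
from typing import Dict, Tuple

SERIAL_DIGITS = 4  # assumption: serial numbers are padded to a fixed width


def encode(group_id: int, serial: int) -> str:
    """Build the 'group id + serial number' index string."""
    if not 0 <= serial < 10 ** SERIAL_DIGITS:
        raise ValueError("serial number does not fit the fixed width")
    return f"{group_id}{serial:0{SERIAL_DIGITS}d}"


def decode(index: str) -> Tuple[int, int]:
    """Split an index string back into (group_id, serial)."""
    return int(index[:-SERIAL_DIGITS]), int(index[-SERIAL_DIGITS:])


class NameIndex:
    """Maps a file name to its 'group id + serial number' index."""

    def __init__(self) -> None:
        self._by_name: Dict[str, str] = {}

    def add(self, file_name: str, group_id: int, serial: int) -> None:
        self._by_name[file_name] = encode(group_id, serial)

    def lookup(self, file_name: str) -> Tuple[int, int]:
        # Raises KeyError when the file name has never been indexed.
        return decode(self._by_name[file_name])


index = NameIndex()
index.add("file 2", group_id=1, serial=1501)
group_id, serial = index.lookup("file 2")
```

Storing the pair as a tuple instead of a string would avoid the width assumption entirely; the string form is kept here only to mirror the "11501" example.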

205. Set up an independent redundant disk array structure for the high-speed disk storage pool based on the disk array technology;

206. Obtain the read request of the file, and according to the read request, perform a corresponding file read operation in the storage pool, and generate a read operation record of the file;

207. Based on the read operation record of the file, perform file filtering on the files in the local disk storage pool, transfer the filtered files to the high-speed disk storage pool, and, when a read request for the filtered files is received, redirect the read to the filtered files.

In the embodiments of the present application, the division of the files to be stored and the storage method are described in detail. By dividing the files to be stored into high-frequency and low-frequency access files according to the expected access frequency parameters and storing them on different disks, classified storage of files is realized. This not only facilitates the reading of files and thereby improves the file reading speed, but also reduces the number of I/O reads from the local disk, which improves local disk performance.
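The division by expected access frequency can be sketched as follows. The function name `divide_files` and the single `cutoff` parameter are assumptions made for illustration; the text does not specify in this passage how the boundary between the two classes is chosen.

```python
from typing import Dict, List, Tuple


def divide_files(
    expected_freq: Dict[str, float], cutoff: float
) -> Tuple[List[str], List[str]]:
    """Split file names into (high-frequency, low-frequency) classes.

    expected_freq maps a file name to its preset expected access
    frequency; cutoff is a hypothetical boundary between the classes.
    """
    high = [name for name, f in expected_freq.items() if f >= cutoff]
    low = [name for name, f in expected_freq.items() if f < cutoff]
    return high, low


# Files in `high` would be written sequentially to the high-speed disk
# storage pool, files in `low` to the local disk storage pool.
high, low = divide_files({"a.jpg": 120.0, "b.jpg": 3.0, "c.jpg": 50.0}, cutoff=50.0)
```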

Referring to FIG. 3, the third flowchart of the method for optimizing access to massive small files provided by the embodiment of the present application specifically includes:

301. According to a preset dynamic file division rule, divide the files to be stored and store them in a corresponding storage pool, where the storage pool includes a local disk storage pool and a high-speed disk storage pool;

302. Set an independent redundant disk array structure for the high-speed disk storage pool based on the disk array technology;

303. Acquire a read request of the file, and perform a corresponding file read operation in the storage pool according to the read request, and generate a read operation record of the file;

304. Obtain the read times of all files in the local disk storage pool within a preset time period from the read operation record of the file;

In this example, the high-speed disk (SSD) is used to cache data, access statistics for all files in the local disk storage pool are maintained in memory, and every file access record is cached.

305. If the number of reads is greater than the preset first threshold, transfer the file whose number of reads is greater than the first threshold to the high-speed disk storage pool, and generate in memory a record of the storage location of that file in the high-speed disk storage pool;

When it is detected that the number of reads of a file within a period of time is greater than the preset threshold, the file is determined to be a frequently accessed file and is transferred to the high-speed disk (SSD). At the same time, the storage address of the file on the high-speed disk (SSD) (logical volume + file group identification number + file serial number) is cached in memory.
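The counting and migration steps can be sketched as below. This is an illustration under stated assumptions: the class `HotFileTracker`, the sliding-window bookkeeping, and the `migrate` callback (which stands in for the actual copy to the high-speed disk and returns the new storage location) are all hypothetical and not taken from the application.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict, Tuple

# (logical volume, file group identification number, file serial number)
Location = Tuple[str, int, int]


class HotFileTracker:
    def __init__(
        self,
        threshold: int,
        window_s: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.threshold = threshold  # the preset first threshold
        self.window_s = window_s    # the preset time period, in seconds
        self._clock = clock
        self._reads: Dict[str, Deque[float]] = defaultdict(deque)
        self.locations: Dict[str, Location] = {}  # in-memory location records

    def _prune(self, name: str) -> None:
        # Drop read timestamps that fall outside the time window.
        now = self._clock()
        q = self._reads[name]
        while q and now - q[0] > self.window_s:
            q.popleft()

    def record_read(self, name: str) -> int:
        """Record one read and return the read count within the window."""
        self._reads[name].append(self._clock())
        self._prune(name)
        return len(self._reads[name])

    def maybe_promote(self, name: str, migrate: Callable[[str], Location]) -> bool:
        """Migrate the file if its read count exceeds the threshold."""
        if name in self.locations:
            return False  # already migrated
        self._prune(name)
        if len(self._reads[name]) > self.threshold:
            self.locations[name] = migrate(name)
            return True
        return False
```

A caller would invoke `record_read` on every read of a file in the local disk storage pool and `maybe_promote` either on the same path or from a periodic monitoring task.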

306. When a file whose number of reads is greater than the first threshold is read again, directly redirect the read to the corresponding file according to the storage location record of that file in the high-speed disk storage pool.

When accessing the file again, first check whether there is an address cache record of the file in the memory. If there is an address cache record, directly access the file address in the address cache record and read the backup file of the file in the high-speed disk.
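The redirected read path can be sketched as follows. The function `read_file` and the two reader callbacks are hypothetical placeholders for the actual high-speed disk and local disk read routines.

```python
from typing import Callable, Dict, Tuple

# (logical volume, file group identification number, file serial number)
Location = Tuple[str, int, int]


def read_file(
    name: str,
    locations: Dict[str, Location],
    read_fast: Callable[[Location], bytes],
    read_local: Callable[[str], bytes],
) -> bytes:
    """Serve a read from the high-speed disk when an address record exists."""
    loc = locations.get(name)
    if loc is not None:
        # Address cache hit: read the copy on the high-speed disk directly.
        return read_fast(loc)
    # No record in memory: fall back to the local disk storage pool.
    return read_local(name)


# Illustrative wiring with in-memory stand-ins for the two pools.
fast_pool = {("ssd0", 1, 1501): b"hot data"}
local_pool = {"file 1": b"cold data", "file 2": b"hot data"}
records = {"file 2": ("ssd0", 1, 1501)}

hot = read_file("file 2", records, fast_pool.__getitem__, local_pool.__getitem__)
cold = read_file("file 1", records, fast_pool.__getitem__, local_pool.__getitem__)
```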

In the embodiment of the present application, the redirected reading process after the high-frequency file is migrated to the high-speed disk is described in detail. Redirected reading is based on high-speed disk reading, and the file reading speed is faster without affecting the performance of the local disk.

The method for optimizing access to massive small files in the embodiment of the present application has been described above; the apparatus for optimizing access to massive small files in the embodiment of the present application is described below. Referring to FIG. 4, an embodiment of the apparatus for optimizing access to massive small files in the embodiment of the present application includes:

The file storage module 401 is configured to divide the files to be stored and store them in a corresponding storage pool according to a preset dynamic file division rule, where the storage pool includes a local disk storage pool and a high-speed disk storage pool;

A disk structure optimization module 402, configured to set an independent redundant disk array structure for the high-speed disk storage pool based on the disk array technology;

The file reading module 403 is used to obtain the reading request of the file, and according to the reading request, execute the corresponding file reading operation in the storage pool, and generate the reading operation record of the file;

The file access optimization module 404 is configured to perform file filtering on the files in the local disk storage pool based on the read operation records of the files, transfer the filtered files to the high-speed disk storage pool, and, when a read request for the filtered files is received, redirect the read to the filtered files.

In the embodiment of the present application, a high-speed disk is added as an auxiliary disk alongside the traditional local mechanical disk. Because the high-speed disk has better performance, files are read faster, which improves local file access performance.

Referring to FIG. 5 , another embodiment of the apparatus for optimizing access to massive small files in the embodiment of the present application includes:

Optionally, the file storage module 401 includes:

The file division unit 4011 is used to obtain preset expected access frequency parameters of all files to be stored, and according to the expected access frequency parameters, the to-be-stored files are divided into high-frequency access file classes and low-frequency access file classes;

A file writing unit 4012, configured to sequentially write the files in the high-frequency access file class into the storage primitives in the high-speed disk storage pool, and sequentially write the files in the low-frequency access file class into the storage primitives in the local disk storage pool;

The index obtaining unit 4013 is used to determine the file group to which the file belongs and the serial number of the file in the file group according to the starting address and capacity size of the file in the storage primitive, and the file group includes at least two sequentially stored files;

The index association unit 4014 is configured to use the identification number of the file group and the serial number of the file as an index to establish a corresponding relationship between the index and the file name of the file.

Optionally, the file reading module 403 includes:

The request obtaining unit 4031 is used to obtain a file read request, wherein the file read request includes the target virtual disk partition where the file to be read is located and the virtual logical address of the file to be read;

A partition obtaining unit 4032, configured to determine the target physical disk partition corresponding to the target virtual disk partition according to the mapping relationship between the virtual disk partition and the physical disk partition;

The file reading unit 4033 is configured to perform a corresponding file reading operation on the target physical disk partition according to the virtual logical address of the file to be read and the file reading request, and generate a file reading operation record.
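The cooperation of the three units above can be sketched as follows. The names `ReadRequest`, `resolve_partition` and `handle_read`, the partition labels, and the shape of the read operation record are assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class ReadRequest:
    virtual_partition: str  # target virtual disk partition
    virtual_address: int    # virtual logical address of the file to be read


def resolve_partition(request: ReadRequest, partition_map: Dict[str, str]) -> str:
    """Map the target virtual disk partition to its physical disk partition."""
    try:
        return partition_map[request.virtual_partition]
    except KeyError:
        raise LookupError(
            f"no physical partition mapped for {request.virtual_partition!r}"
        ) from None


def handle_read(
    request: ReadRequest,
    partition_map: Dict[str, str],
    read_at: Callable[[str, int], bytes],
    read_log: List[Tuple[str, int]],
) -> bytes:
    """Read from the resolved physical partition and append a read record."""
    physical = resolve_partition(request, partition_map)
    data = read_at(physical, request.virtual_address)
    read_log.append((physical, request.virtual_address))
    return data


# Illustrative wiring with a hypothetical mapping and an in-memory reader.
mapping = {"vdisk-a": "sdb1"}
log: List[Tuple[str, int]] = []
payload = handle_read(
    ReadRequest("vdisk-a", 1501), mapping, lambda part, addr: b"data", log
)
```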

Optionally, the file access optimization module 404 includes:

The reading times obtaining unit 4041 is used to obtain the reading times of all files in the local disk storage pool within a preset time period from the reading operation records of the files;

The data transmission unit 4042 is configured to, if the read times are greater than the preset first threshold, transmit the files whose read times are greater than the first threshold to the high-speed disk storage pool, and generate in memory the storage location records of those files in the high-speed disk storage pool;

The redirecting reading unit 4043 is configured to, when a file whose read times are greater than the first threshold is read again, directly redirect the read to the corresponding file according to the storage location record of that file in the high-speed disk storage pool.

In the embodiment of the present application, the files to be stored are divided into high-frequency and low-frequency access files according to the expected access frequency parameters and are stored on different disks according to the division result, which facilitates file reading. At the same time, redirected reading is based on high-speed disk reading, so the file reading speed is faster without affecting the performance of the local disk.

FIG. 4 and FIG. 5 above describe in detail the apparatus for optimizing access to a large number of small files in the embodiment of the present application from the perspective of modular functional entities. The following describes in detail the device for optimizing access to a large number of small files in the embodiment of the present application from the perspective of hardware processing.

FIG. 6 is a schematic structural diagram of a device for optimizing access to a large number of small files provided by an embodiment of the present application. The device 600 for optimizing access to a large number of small files may vary greatly due to different configurations or performance, and may include one or more central processing units (CPU) 610 (e.g., one or more processors), memory 620, and one or more storage media 630 (e.g., one or more mass storage devices) that store application programs 633 or data 632. The memory 620 and the storage medium 630 may be short-term storage or persistent storage. The program stored in the storage medium 630 may include one or more modules (not shown in the figure), and each module may include a series of instruction operations for the device 600 for optimizing access to massive small files. Furthermore, the processor 610 may be configured to communicate with the storage medium 630 and to execute, on the device 600 for optimizing access to massive small files, the series of instruction operations in the storage medium 630.

The device 600 for optimizing access to massive small files may also include one or more power supplies 640, one or more wired or wireless network interfaces 650, one or more input and output interfaces 660, and/or one or more operating systems 631, for example Windows Server, Mac OS X, Unix, Linux, FreeBSD, etc. Those skilled in the art can understand that the structure of the device shown in FIG. 6 does not constitute a limitation on the device for optimizing access to massive small files, which may include more or fewer components than those shown in the figure, a combination of certain components, or a different arrangement of components.

The present application also provides a device for optimizing access to massive small files, including a memory and at least one processor, where instructions are stored in the memory, and the memory and the at least one processor are interconnected through a line. The at least one processor invokes the instructions in the memory, so that the device for optimizing access to a large number of small files executes the steps in the above method for optimizing access to a large number of small files.

The present application also provides a computer-readable storage medium, and the computer-readable storage medium may be a non-volatile computer-readable storage medium or a volatile computer-readable storage medium. The computer-readable storage medium stores computer instructions, and when the computer instructions are executed on the computer, the computer performs the following steps:

According to the preset file dynamic division rule, the files to be stored are divided and stored in the corresponding storage pool, the storage pool includes a local disk storage pool and a high-speed disk storage pool;

Set up an independent redundant disk array structure for the high-speed disk storage pool based on the disk array technology;

Obtain a read request of a file, and perform a corresponding file read operation in the storage pool according to the read request, and generate a file read operation record;

Based on the read operation records of the files, file filtering is performed on the files in the local disk storage pool, and the filtered files are transferred to the high-speed disk storage pool. When a read request for the filtered files is received, the read is redirected to the filtered files.

Those skilled in the art can clearly understand that, for the convenience and brevity of description, the specific working process of the system, device and unit described above may refer to the corresponding process in the foregoing method embodiments, which will not be repeated here.

The integrated unit, if implemented in the form of a software functional unit and sold or used as an independent product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solutions of the present application in essence, or the parts that contribute to the prior art, or all or part of the technical solutions, can be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of the present application. The aforementioned storage medium includes a USB flash drive, a mobile hard disk, read-only memory (ROM), random access memory (RAM), a magnetic disk, an optical disk, and other media that can store program code.

As mentioned above, the above embodiments are only used to illustrate the technical solutions of the present application, not to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions recorded in the foregoing embodiments can still be modified, or some of their technical features can be equivalently replaced, and that these modifications or replacements do not make the essence of the corresponding technical solutions deviate from the spirit and scope of the technical solutions in the embodiments of the present application.

Claims

A method for optimizing access to massive small files, comprising:

According to the preset file dynamic division rule, the files to be stored are divided and stored in the corresponding storage pool, the storage pool includes a local disk storage pool and a high-speed disk storage pool;

Set up an independent redundant disk array structure for the high-speed disk storage pool based on the disk array technology;

Obtain a read request of a file, and perform a corresponding file read operation in the storage pool according to the read request, and generate a file read operation record;

Based on the read operation records of the files, file filtering is performed on the files in the local disk storage pool, and the filtered files are transferred to the high-speed disk storage pool. When a read request for the filtered files is received, the read is redirected to the filtered files.
The method for optimizing access to a large number of small files according to claim 1, wherein, according to a preset dynamic file division rule, dividing the to-be-stored file and storing it in a corresponding storage pool comprises:

Acquiring preset expected access frequency parameters of all files to be stored, and dividing the to-be-stored files into high-frequency access file classes and low-frequency access file classes according to the expected access frequency parameters;

Write files in the high-frequency access file class to storage primitives in the high-speed disk storage pool sequentially, and sequentially write files in the low-frequency access file class to storage primitives in the local disk storage pool;

Determine the file group to which the file belongs and the serial number of the file in the file group according to the starting address and capacity size of the file in the storage primitive, and the file group contains at least two sequentially stored files;

Using the identification number of the file group and the serial number of the file as an index, a corresponding relationship between the index and the file name of the file is established.
The method for optimizing access to a large number of small files according to claim 1, wherein obtaining a read request of a file, performing a corresponding file read operation in the storage pool according to the read request, and generating a read operation record of the file comprises:

Obtaining a file read request, wherein the file read request includes the target virtual disk partition where the file to be read is located and the virtual logical address of the file to be read;

According to the mapping relationship between the virtual disk partition and the physical disk partition, determine the target physical disk partition corresponding to the target virtual disk partition;

According to the virtual logical address of the file to be read and the file read request, a corresponding file read operation is performed on the target physical disk partition, and a file read operation record is generated.
The method for optimizing access to a large number of small files according to claim 3, wherein, after performing the corresponding file read operation on the target physical disk partition according to the virtual logical address of the file to be read and the file read request and generating the read operation record of the file, the method further comprises:

Obtain the access frequency of each file in the high-speed disk storage pool;

Determine whether the access frequency of each file is less than the threshold;

If the access frequency is less than the preset threshold, the files corresponding to the access frequency less than the threshold are deleted.
The method for optimizing access to massive small files according to any one of claims 1-4, wherein performing file filtering on the files in the local disk storage pool based on the read operation records of the files, transferring the filtered files to the high-speed disk storage pool, and, when a read request for the filtered files is received, redirecting the read to the filtered files comprises:

From the read operation record of the file, obtain the read times of all files in the local disk storage pool within a preset time period;

If the number of reads is greater than the preset first threshold, the files whose number of reads is greater than the first threshold are transferred to the high-speed disk storage pool, and storage location records of those files in the high-speed disk storage pool are generated in memory;

When a file whose number of reads is greater than the first threshold is read again, the read is directly redirected to the corresponding file according to the storage location record of that file in the high-speed disk storage pool.
The method for optimizing access to a large number of small files according to claim 5, wherein, after the file whose number of reads is greater than the first threshold is read again by directly redirecting to the corresponding file according to the storage location record of that file in the high-speed disk storage pool, the method further comprises:

Receive file overwrite request;

According to the overwriting request, determine the file to be overwritten and the storage location record of the file to be overwritten in the memory;

Delete the storage location record of the file to be overwritten in the memory;

The file to be overwritten is deleted from the high-speed disk storage pool and a new file is written.
A device for optimizing access to massive small files, comprising a memory, a processor, and computer-readable instructions stored on the memory and executable on the processor, wherein the processor implements the following steps when executing the computer-readable instructions:

According to the preset file dynamic division rule, the files to be stored are divided and stored in the corresponding storage pool, the storage pool includes a local disk storage pool and a high-speed disk storage pool;

Set up an independent redundant disk array structure for the high-speed disk storage pool based on the disk array technology;

Obtain a read request of a file, and perform a corresponding file read operation in the storage pool according to the read request, and generate a file read operation record;

Based on the read operation records of the files, file filtering is performed on the files in the local disk storage pool, and the filtered files are transferred to the high-speed disk storage pool. When a read request for the filtered files is received, the read is redirected to the filtered files.
The device for optimizing access to a large number of small files according to claim 7, wherein the dividing the files to be stored and storing them in a corresponding storage pool according to a preset dynamic file division rule comprises:

Acquiring preset expected access frequency parameters of all files to be stored, and dividing the to-be-stored files into high-frequency access file classes and low-frequency access file classes according to the expected access frequency parameters;

Write files in the high-frequency access file class to storage primitives in the high-speed disk storage pool sequentially, and sequentially write files in the low-frequency access file class to storage primitives in the local disk storage pool;

Determine the file group to which the file belongs and the serial number of the file in the file group according to the starting address and capacity size of the file in the storage primitive, and the file group contains at least two sequentially stored files;

Using the identification number of the file group and the serial number of the file as an index, a corresponding relationship between the index and the file name of the file is established.
The device for optimizing access to massive small files according to claim 7, wherein obtaining a read request of a file, performing a corresponding file read operation in the storage pool according to the read request, and generating a read operation record of the file comprises:

Obtaining a file read request, wherein the file read request includes the target virtual disk partition where the file to be read is located and the virtual logical address of the file to be read;

According to the mapping relationship between the virtual disk partition and the physical disk partition, determine the target physical disk partition corresponding to the target virtual disk partition;

According to the virtual logical address of the file to be read and the file read request, a corresponding file read operation is performed on the target physical disk partition, and a file read operation record is generated.
The device for optimizing access to a large number of small files according to claim 9, wherein, after performing the corresponding file read operation on the target physical disk partition according to the virtual logical address of the file to be read and the file read request and generating the read operation record of the file, the steps further comprise:

Obtain the access frequency of each file in the high-speed disk storage pool;

Determine whether the access frequency of each file is less than the threshold;

If the access frequency is less than the preset threshold, the files corresponding to the access frequency less than the threshold are deleted.
The device for optimizing access to massive small files according to any one of claims 7-10, wherein performing file filtering on the files in the local disk storage pool based on the read operation records of the files, transferring the filtered files to the high-speed disk storage pool, and, when a read request for the filtered files is received, redirecting the read to the filtered files comprises:

From the read operation record of the file, obtain the read times of all files in the local disk storage pool within a preset time period;

If the number of reads is greater than the preset first threshold, the files whose number of reads is greater than the first threshold are transferred to the high-speed disk storage pool, and storage location records of those files in the high-speed disk storage pool are generated in memory;

When a file whose number of reads is greater than the first threshold is read again, the read is directly redirected to the corresponding file according to the storage location record of that file in the high-speed disk storage pool.
The device for optimizing access to a large number of small files according to claim 11, wherein, after the file whose number of reads is greater than the first threshold is read again by directly redirecting to the corresponding file according to the storage location record of that file in the high-speed disk storage pool, the steps further comprise:

Receive file overwrite request;

According to the overwriting request, determine the file to be overwritten and the storage location record of the file to be overwritten in the memory;

Delete the storage location record of the file to be overwritten in the memory;

The file to be overwritten is deleted from the high-speed disk storage pool and a new file is written.
A computer-readable storage medium storing computer instructions, wherein, when the computer instructions are executed on a computer, the computer is caused to perform the following steps:

According to the preset file dynamic division rule, the files to be stored are divided and stored in the corresponding storage pool, the storage pool includes a local disk storage pool and a high-speed disk storage pool;

Set up an independent redundant disk array structure for the high-speed disk storage pool based on the disk array technology;

Obtain a read request of a file, and perform a corresponding file read operation in the storage pool according to the read request, and generate a file read operation record;

Based on the read operation records of the files, file filtering is performed on the files in the local disk storage pool, and the filtered files are transferred to the high-speed disk storage pool. When a read request for the filtered files is received, the read is redirected to the filtered files.
The computer-readable storage medium according to claim 13, wherein, according to a preset dynamic file division rule, dividing the to-be-stored file and storing it in a corresponding storage pool comprises:

Acquiring preset expected access frequency parameters of all files to be stored, and dividing the to-be-stored files into high-frequency access file classes and low-frequency access file classes according to the expected access frequency parameters;

Write files in the high-frequency access file class to storage primitives in the high-speed disk storage pool sequentially, and sequentially write files in the low-frequency access file class to storage primitives in the local disk storage pool;

Determine the file group to which the file belongs and the serial number of the file in the file group according to the starting address and capacity size of the file in the storage primitive, and the file group contains at least two sequentially stored files;

Using the identification number of the file group and the serial number of the file as an index, a corresponding relationship between the index and the file name of the file is established.
The computer-readable storage medium according to claim 13, wherein obtaining a read request of a file, performing a corresponding file read operation in the storage pool according to the read request, and generating a read operation record of the file comprises:

Obtaining a file read request, wherein the file read request includes the target virtual disk partition where the file to be read is located and the virtual logical address of the file to be read;

According to the mapping relationship between the virtual disk partition and the physical disk partition, determine the target physical disk partition corresponding to the target virtual disk partition;

According to the virtual logical address of the file to be read and the file read request, a corresponding file read operation is performed on the target physical disk partition, and a file read operation record is generated.
The computer-readable storage medium according to claim 15, wherein, after performing the corresponding file read operation on the target physical disk partition according to the virtual logical address of the file to be read and the file read request and generating the read operation record of the file, the steps further comprise:

Obtain the access frequency of each file in the high-speed disk storage pool;

Determine whether the access frequency of each file is less than the threshold;

If the access frequency is less than the preset threshold, the files corresponding to the access frequency less than the threshold are deleted.
The computer-readable storage medium according to any one of claims 13-16, wherein performing file filtering on the files in the local disk storage pool based on the read operation records of the files, transferring the filtered files to the high-speed disk storage pool, and, when a read request for the filtered files is received, redirecting the read to the filtered files comprises:

From the read operation record of the file, obtain the read times of all files in the local disk storage pool within a preset time period;

If the number of reads of a file is greater than a preset first threshold, the file is transferred to the high-speed disk storage pool, and a storage location record of the file in the high-speed disk storage pool is generated in memory;

When a file whose number of reads is greater than the first threshold is read again, the read is redirected directly to the corresponding file according to its storage location record in the high-speed disk storage pool.
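
The threshold-based transfer and redirection described above can be sketched as follows. The class `TieredStore` and its attribute names are hypothetical, and both pools are modelled as dictionaries. The sketch copies a hot file into the high-speed pool and leaves the local copy in place; whether the original is kept or removed is an assumption here, since the claim only says the file is transferred.

```python
from collections import Counter


class TieredStore:
    def __init__(self, first_threshold: int):
        self.first_threshold = first_threshold
        self.local_pool = {}         # file name -> content (local disk pool)
        self.high_speed_pool = {}    # key -> content (high-speed disk pool)
        self.location_records = {}   # in memory: file name -> key in high-speed pool
        self.read_counts = Counter() # reads in the current time period

    def read(self, name: str) -> bytes:
        # Redirect directly when a storage location record exists in memory.
        if name in self.location_records:
            return self.high_speed_pool[self.location_records[name]]
        # Otherwise read from the local pool and count the read.
        self.read_counts[name] += 1
        return self.local_pool[name]

    def migrate_hot_files(self) -> list:
        """Transfer files read more than first_threshold times this period."""
        moved = []
        for name, count in self.read_counts.items():
            if count > self.first_threshold and name in self.local_pool:
                # Assumption: copy rather than move; the local copy stays.
                self.high_speed_pool[name] = self.local_pool[name]
                self.location_records[name] = name
                moved.append(name)
        self.read_counts.clear()  # begin a new preset time period
        return moved
```

In use, `migrate_hot_files()` would be called once per preset time period; subsequent `read()` calls for a transferred file are served through the in-memory storage location record without touching the local pool.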
The computer-readable storage medium according to claim 17, wherein, after the file whose number of reads is greater than the first threshold is read again by redirecting directly to the corresponding file according to its storage location record in the high-speed disk storage pool, the method further comprises:

Receive a file overwrite request;

According to the overwriting request, determine the file to be overwritten and the storage location record of the file to be overwritten in the memory;

Delete the storage location record of the file to be overwritten in the memory;

The file to be overwritten is deleted from the high-speed disk storage pool and a new file is written.
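
The overwrite handling described above can be sketched as follows. The function name `overwrite_file` and the dictionary-based pools are hypothetical. The claim does not say where the new file is written, so writing it to the local disk storage pool is an assumption of this sketch.

```python
def overwrite_file(name: str, new_data: bytes, location_records: dict,
                   high_speed_pool: dict, local_pool: dict) -> None:
    # Find and delete the in-memory storage location record, if any.
    key = location_records.pop(name, None)
    # Delete the stale copy from the high-speed disk storage pool.
    if key is not None:
        high_speed_pool.pop(key, None)
    # Write the new file. Assumption: the new content goes to the local pool.
    local_pool[name] = new_data
```

Deleting the location record before removing the stale copy means that no read issued in between can be redirected to content that is about to disappear.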
An apparatus for optimizing access to a large number of small files, wherein the apparatus for optimizing access to a large number of small files includes:

a file storage module, configured to divide the files to be stored and store them in a corresponding storage pool according to a preset dynamic file division rule, where the storage pool includes a local disk storage pool and a high-speed disk storage pool;

a disk structure optimization module, configured to set a redundant array of independent disks (RAID) structure for the high-speed disk storage pool based on disk array technology;

a file reading module, configured to obtain a reading request of a file, and perform a corresponding file reading operation in the storage pool according to the reading request, and generate a reading operation record of the file;

a file access optimization module, configured to filter the files in the local disk storage pool based on the read operation records of the files, transfer the filtered files to the high-speed disk storage pool, and, when a read request for a filtered file is received, redirect to the filtered file for reading.
The apparatus for optimizing access to massive small files according to claim 19, wherein the file storage module specifically comprises:

a file dividing unit, configured to obtain preset expected access frequency parameters of all files to be stored, and divide the to-be-stored files into high-frequency access file classes and low-frequency access file classes according to the expected access frequency parameters;

a file writing unit, configured to sequentially write the files in the high-frequency access file class to the storage primitives in the high-speed disk storage pool, and sequentially write the files in the low-frequency access file class to the storage primitives in the local disk storage pool;

The index obtaining unit is used to determine the file group to which the file belongs and the serial number of the file in the file group according to the starting address and capacity size of the file in the storage primitive, and the file group includes at least two files stored sequentially;

The index association unit is configured to use the identification number of the file group and the serial number of the file as an index to establish a corresponding relationship between the index and the file name of the file.
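
The four units of the file storage module can be illustrated with the following sketch. The function names `divide_files` and `write_sequentially`, the `cutoff` parameter, and the `group_capacity` parameter are hypothetical. The claims do not give a formula for deriving the file group from the starting address and capacity, so this sketch assumes that a file group is the set of files whose starting address falls within the same fixed-size window of the storage primitive. It does not enforce the requirement that each group contain at least two files.

```python
def divide_files(expected_freq: dict, cutoff: float):
    """Split file names into high- and low-frequency classes."""
    high = [n for n, f in expected_freq.items() if f >= cutoff]
    low = [n for n, f in expected_freq.items() if f < cutoff]
    return high, low


def write_sequentially(files: dict, names: list, group_capacity: int):
    """Append files back to back into one storage primitive.

    Returns the primitive and an index mapping
    (file group id, serial number in group) -> file name.
    """
    primitive = bytearray()
    index = {}
    next_serial = {}  # file group id -> next free serial number
    for name in names:
        start = len(primitive)          # starting address of this file
        primitive.extend(files[name])   # sequential write
        # Assumption: group id is derived from the starting address window.
        group_id = start // group_capacity
        serial = next_serial.get(group_id, 0)
        next_serial[group_id] = serial + 1
        index[(group_id, serial)] = name
    return bytes(primitive), index
```

With this layout, a lookup by file group and serial number resolves to a file name through a single dictionary access, and files in the same group sit next to each other in the storage primitive.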