WO2022228458A1 - Method, apparatus and device for optimizing access to a large quantity of small files, and storage medium - Google Patents

Method, apparatus and device for optimizing access to a large quantity of small files, and storage medium

Info

Publication number
WO2022228458A1
Authority
WO
WIPO (PCT)
Prior art keywords
file
files
storage pool
read
access
Prior art date
Application number
PCT/CN2022/089529
Other languages
English (en)
Chinese (zh)
Inventor
郑平
Original Assignee
康键信息技术(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 康键信息技术(深圳)有限公司
Publication of WO2022228458A1

Links

Images

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F 3/0601 Interfaces specially adapted for storage systems
    • G06F 3/0602 Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F 3/061 Improving I/O performance
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F 3/0601 Interfaces specially adapted for storage systems
    • G06F 3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F 3/0629 Configuration or reconfiguration of storage systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F 3/0601 Interfaces specially adapted for storage systems
    • G06F 3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F 3/0638 Organizing or formatting or addressing of data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F 3/0601 Interfaces specially adapted for storage systems
    • G06F 3/0668 Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F 3/0671 In-line storage system
    • G06F 3/0673 Single storage device
    • G06F 3/0674 Disk device
    • G06F 3/0676 Magnetic disk device

Definitions

  • The present application relates to the field of basic (infrastructure) operation and maintenance, and in particular to a method, apparatus, device and storage medium for optimizing access to massive small files.
  • The present application provides a method, apparatus, device, and storage medium for optimizing access to massive small files, which solve the problem of low local file access performance in current methods for optimizing access to massive small files.
  • A first aspect of the present application provides a method for optimizing access to massive small files, including: according to a preset dynamic file division rule, dividing the files to be stored and storing them in a corresponding storage pool, where the storage pool includes a local disk storage pool and a high-speed disk storage pool; setting an independent redundant disk array structure for the high-speed disk storage pool based on disk array technology; obtaining the read request of a file, executing the corresponding file read operation in the storage pool according to the read request, and generating a file read operation record; and, based on the file read operation record, performing file filtering on the files in the local disk storage pool, transferring the filtered files to the high-speed disk storage pool, and, when a read request for a filtered file is received, redirecting to the filtered file for reading.
  • A second aspect of the present application provides an apparatus for optimizing access to massive small files, including: a file storage module, configured to divide the files to be stored and store them in a corresponding storage pool according to a preset dynamic file division rule, where the storage pool includes a local disk storage pool and a high-speed disk storage pool; a disk structure optimization module, configured to set an independent redundant disk array structure for the high-speed disk storage pool based on disk array technology; a file reading module, configured to obtain the read request of a file, execute the corresponding file read operation in the storage pool according to the read request, and generate a file read operation record; and a file access optimization module, configured to, based on the file read operation record, perform file filtering on the files in the local disk storage pool, transfer the filtered files to the high-speed disk storage pool, and, when a read request for a filtered file is received, redirect to the filtered file for reading.
  • A third aspect of the present application provides a device for optimizing access to massive small files, including a memory and at least one processor, where instructions are stored in the memory; the at least one processor invokes the instructions in the memory, so that the device for optimizing access to massive small files executes the steps of the method for optimizing access to massive small files described below: according to the preset dynamic file division rules, dividing the files to be stored and storing them in the corresponding storage pool, where the storage pool includes a local disk storage pool and a high-speed disk storage pool; setting an independent redundant disk array structure for the high-speed disk storage pool based on disk array technology; obtaining the read request of a file, executing the corresponding file read operation in the storage pool according to the read request, and generating a file read operation record; and, based on the file read operation record, performing file filtering on the files in the local disk storage pool, transferring the filtered files to the high-speed disk storage pool, and, when a read request for a filtered file is received, redirecting to the filtered file for reading.
  • A fourth aspect of the present application provides a computer-readable storage medium in which instructions are stored; when the instructions are run on a computer, the computer executes the following steps of the method for optimizing access to massive small files: according to the preset dynamic file division rules, dividing the files to be stored and storing them in the corresponding storage pool, where the storage pool includes a local disk storage pool and a high-speed disk storage pool; setting an independent redundant disk array structure for the high-speed disk storage pool based on disk array technology; obtaining the read request of a file, executing the corresponding file read operation in the storage pool according to the read request, and generating a file read operation record; and, based on the file read operation record, performing file filtering on the files in the local disk storage pool, transferring the filtered files to the high-speed disk storage pool, and, when a read request for a filtered file is received, redirecting to the filtered file for reading.
  • In the technical solution provided by the present application, the files to be stored are divided into the local disk storage pool and the high-speed disk storage pool according to the preset dynamic division rules; a disk array structure is then set for the high-speed disk storage pool, improving its performance; the files in the local disk storage pool are read and corresponding file read records are generated; finally, the file read records are dynamically monitored, files whose read counts exceed the preset threshold are migrated to the high-speed disk storage pool, and storage location records of the corresponding files are generated in memory. When a read request for one of these migrated files is received, the request is redirected directly to the file for reading according to the storage location record in memory. The high-speed disk storage pool thus assists the local disk storage pool in storing and reading frequently accessed files, which reduces the number of I/Os on the local disk; and because the high-speed disk has better performance and faster file reading, local file access performance is improved.
  • FIG. 1 is a schematic diagram of a first embodiment of a method for optimizing access to massive small files in an embodiment of the present application
  • FIG. 2 is a schematic diagram of a second embodiment of a method for optimizing access to massive small files in an embodiment of the present application
  • FIG. 3 is a schematic diagram of a third embodiment of a method for optimizing access to massive small files in an embodiment of the present application
  • FIG. 4 is a schematic diagram of an embodiment of an apparatus for optimizing access to a large number of small files in an embodiment of the present application
  • FIG. 5 is a schematic diagram of another embodiment of the apparatus for optimizing access to massive small files in an embodiment of the present application
  • FIG. 6 is a schematic diagram of an embodiment of a device for optimizing access to massive small files in an embodiment of the present application.
  • The present application provides a method, apparatus, device, and storage medium for optimizing access to massive small files, which solve the problem of low local file access performance in current methods for optimizing access to massive small files.
  • Referring to FIG. 1, a flowchart of a method for optimizing access to massive small files provided by an embodiment of the present application specifically includes:
  • According to a preset dynamic file division rule, divide the files to be stored and store them in a corresponding storage pool, where the storage pool includes a local disk storage pool and a high-speed disk storage pool;
  • the local disk storage pool is a common mechanical hard disk, and the high-speed disk storage pool can be an SSD (Solid State Drive) or other flash disks.
  • The "file dynamic division rule" is a rule that can be adjusted as needed, and its classification criterion is the expected access frequency of the file. Files are divided into two categories: the first category contains files whose expected access frequency is high, and the second category contains files whose expected access frequency is low. The expected access frequency is calculated from the actual number of accesses to the file in the recent period and is backed up as a text record. For example, it can be specified that the access frequency of the file in the last 5 minutes is used to determine the expected access frequency parameter; this access frequency is compared with a set threshold to determine whether the expected access frequency is high or low.
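As a concrete illustration of such a rule, the following Python sketch counts recent accesses per file and labels the file as high- or low-frequency. It is a minimal sketch, not the patented implementation: the helper names, the 5-minute window and the threshold value are all assumptions for illustration.

```python
import time
from collections import defaultdict, deque
from typing import Optional

WINDOW_SECONDS = 5 * 60   # look-back window, e.g. the last 5 minutes
FREQ_THRESHOLD = 8        # accesses per window separating "high" from "low"

# Per-file timestamps of recent accesses (all names here are illustrative).
_access_log = defaultdict(deque)

def record_access(file_id: str, now: Optional[float] = None) -> None:
    """Record one access to file_id."""
    _access_log[file_id].append(now if now is not None else time.time())

def expected_access_frequency(file_id: str, now: Optional[float] = None) -> int:
    """Count accesses to file_id within the most recent window."""
    now = now if now is not None else time.time()
    log = _access_log[file_id]
    while log and now - log[0] > WINDOW_SECONDS:  # drop entries outside the window
        log.popleft()
    return len(log)

def classify(file_id: str) -> str:
    """Label the file 'high' or 'low' expected access frequency."""
    return "high" if expected_access_frequency(file_id) >= FREQ_THRESHOLD else "low"
```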
  • It should be noted that a mechanical disk is not paired with a high-speed disk of the same capacity at the same time. Adhering to the principle of avoiding waste, the solution is configured on demand and the high-speed disks are divided into logical volumes; for example, a disk can be backed by a 30 GB solid-state drive in the early stage and dynamically expanded according to actual needs.
  • RAID stands for Redundant Array of Independent (originally Inexpensive) Disks.
  • The principle is to combine disks into an array (disk group) and, together with a design that distributes data across the disks, improve data security. A disk array combines multiple disks into one large-capacity disk group and uses the additive effect of the individual disks providing data to improve the performance of the entire disk system. Disk arrays can also use the concept of parity checking: when any hard disk in the array fails, the data can still be read, and during data reconstruction the data is recalculated and written to the new hard disk.
  • If a high-speed disk fails, the caches of all disks on that machine become invalid, because a single high-speed disk on the machine is shared as the cache; this slows down the entire business system and affects online users' requests. For this kind of single point of failure, RAID 1 can be configured for the high-speed disks: if one high-speed disk fails, the other can continue to provide service while an alarm is raised and the failed high-speed disk is replaced in time.
  • At present, the most widely used way to configure RAID is a hardware RAID card: the hard disks are first connected to the RAID controller, the RAID controller is connected to the system's PCIe bus, and the disk mode is then changed in the system settings.
  • This embodiment does not limit the method of reading files from the disk.
  • How a file is read from the disk depends on how the file is stored. If what is recorded for the file is its offset and length, the file can be read according to these two values; the file can also be stored on top of a file system such as XFS, in which case it can be read directly using the file ID as the file name, with the details handled by the file system.
  • the specific disk is determined by the upper-layer routing.
  • the program (osd) corresponding to the storage pool records each file read operation, including the file name, read time, etc.
  • The read request includes the virtual disk partition where the file to be read is located and the index of the file to be read; the corresponding physical disk partition is first determined according to the virtual disk partition, and then the corresponding file is found in that physical disk partition according to the index of the file to be read and the read operation is performed.
  • For example, if the read request is "A disk/11501" and the real physical disk corresponding to disk A is logical volume G on the SSD, the corresponding file is found in logical volume G according to the index "11501", following the rule defined for index numbers, namely "file group identification number + file serial number".
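The routing implied by this example can be sketched as follows. The mapping table, the "A disk/11501" request format and the one-digit group identifier are illustrative assumptions, not the exact layout used by the embodiment.

```python
# Assumed mapping from a virtual disk name to the logical volume backing it.
VIRTUAL_TO_PHYSICAL = {"A disk": "/dev/ssd/lv_G"}

def parse_read_request(request: str):
    """Split a request such as 'A disk/11501' into (virtual disk, index)."""
    virtual_disk, index = request.split("/", 1)
    return virtual_disk, index

def split_index(index: str, group_digits: int = 1):
    """Interpret the index as 'file group identification number + file serial number'.

    The number of digits reserved for the group id is an assumption; in the
    example '11501', the group id is '1' and the serial number is '1501'.
    """
    return index[:group_digits], index[group_digits:]

virtual_disk, index = parse_read_request("A disk/11501")
logical_volume = VIRTUAL_TO_PHYSICAL[virtual_disk]  # -> '/dev/ssd/lv_G'
group_id, serial = split_index(index)               # -> ('1', '1501')
print(logical_volume, group_id, serial)
```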
  • In one embodiment, step 103 specifically further includes the following steps:
  • obtaining a file read request, where the file read request includes the target virtual disk partition where the file to be read is located and the virtual logical address of the file to be read;
  • determining, according to the mapping relationship between virtual disk partitions and physical disk partitions, the target physical disk partition corresponding to the target virtual disk partition;
  • performing, according to the virtual logical address of the file to be read and the file read request, the corresponding file read operation on the target physical disk partition, and generating a file read operation record.
  • In practical applications, a file read request can be initiated through the terminal, and the file read request includes the target virtual disk partition where the file to be read is located, that is, one of the virtual disk partitions.
  • For example, the virtual disk partitions displayed on terminal A include partition 1, partition 2, and partition 3; if the file to be read is located in partition 3, then partition 3 is the target virtual disk partition.
  • The file read request can also include the virtual logical address of the file to be read, that is, the logical address on the target virtual disk partition; the index, start position, read length, and so on of the file to be read can be specified through the virtual logical address, so that the request can be mapped to the physical disk partition and the corresponding read operation performed there.
  • The mapping relationship between virtual disk partitions and physical disk partitions is pre-stored in the system. For example, the physical disk partitions include partition a, partition b, and partition c, and the virtual disk partitions include partition 1, partition 2, and partition 3; partition a corresponds to partition 2, partition b corresponds to partition 1, and partition c corresponds to partition 3.
  • a database may also be created for the above-mentioned mapping relationship, and the above-mentioned mapping relationship is stored in the form of a database.
  • the corresponding relationship between the virtual logical address and the physical logical address is also stored in the system.
  • The physical logical address, that is, the logical address on the target physical disk partition, can be obtained by matching the virtual logical address; from it, the physical logical address of the target file corresponding to the file to be read on the target physical disk partition, that is, the starting position of the target file, can be determined.
  • A file read request can include an operation on the target file, such as reading data of a specified length in the target file, or writing data of a specified length starting from a certain byte of the target file, and so on.
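A minimal sketch of this two-level translation is shown below. The partition map, the address map keyed by (virtual partition, virtual offset), and the request fields are assumptions used only for illustration; a real system would persist the mappings, for example in a database as noted above.

```python
from dataclasses import dataclass

# Pre-stored mapping between virtual and physical disk partitions (example values).
PARTITION_MAP = {
    "partition 1": "partition b",
    "partition 2": "partition a",
    "partition 3": "partition c",
}

@dataclass
class ReadRequest:
    virtual_partition: str  # target virtual disk partition
    virtual_offset: int     # virtual logical address (start position of the file)
    length: int             # number of bytes to read

def translate(request: ReadRequest, address_map: dict):
    """Return (target physical partition, physical starting address) for the request."""
    physical_partition = PARTITION_MAP[request.virtual_partition]
    physical_offset = address_map[(request.virtual_partition, request.virtual_offset)]
    return physical_partition, physical_offset

# Toy address map: (virtual partition, virtual offset) -> physical offset.
address_map = {("partition 3", 4096): 1_048_576}
req = ReadRequest("partition 3", 4096, 512)
print(translate(req, address_map))  # -> ('partition c', 1048576)
```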
  • In one embodiment, after step 103, the method further includes:
  • deleting the files whose access frequency is less than the threshold.
  • The access frequency is recorded in the memory of each corresponding program (OSD); for example, the access frequency of each file per minute is counted. If the frequency exceeds a certain frequency threshold, the file is migrated to the high-speed disk; if the frequency of a file already on the high-speed disk falls below the threshold, its disk contents can be marked as invalid and the corresponding disk content deletion operation is triggered.
  • For example, if the access frequencies of file a and file b in the high-speed disk OSD are 10 times/5 minutes and 6 times/5 minutes respectively, and the frequency threshold set in the system is 8 times/5 minutes, then the access frequency of file a exceeds the set frequency threshold and triggers the disk content migration mechanism, while the access frequency of file b is lower than the set frequency threshold, so file b is marked as invalid and the deletion operation is triggered.
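The monitoring policy described in this example can be summarised by the short sketch below; reconcile, migrate and invalidate are hypothetical names, and the callbacks stand in for the real OSD operations.

```python
FREQ_THRESHOLD = 8  # accesses per 5 minutes, as in the example above

def reconcile(access_counts: dict, on_ssd: set, migrate, invalidate) -> None:
    """Apply the migration / invalidation policy sketched above.

    access_counts maps file name -> accesses in the last window; on_ssd is the
    set of files currently backed up on the high-speed disk; migrate(f) copies
    f to the SSD and invalidate(f) marks the SSD copy invalid and deletes it.
    """
    for name, count in access_counts.items():
        if count >= FREQ_THRESHOLD and name not in on_ssd:
            migrate(name)      # hot file: copy it to the high-speed disk
        elif count < FREQ_THRESHOLD and name in on_ssd:
            invalidate(name)   # cooled-down file: mark invalid and delete

# Matching the example: a = 10 reads/5 min, b = 6 reads/5 min, threshold = 8/5 min.
reconcile({"a": 10, "b": 6}, on_ssd={"a", "b"},
          migrate=lambda f: print("migrate", f),
          invalidate=lambda f: print("invalidate", f))  # prints: invalidate b
```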
  • In one embodiment, after file filtering is performed on the files in the local disk storage pool based on the file read operation records, the filtered files are transferred to the high-speed disk storage pool, and read requests for the filtered files are redirected to those files for reading, the method further includes:
  • receiving an overwrite request for a file, and determining, according to the overwrite request, the file to be overwritten and the storage location record of the file to be overwritten in the memory.
  • A file in the local disk that is a frequently accessed file has a copy transferred to the high-speed disk, so a file may exist in the local disk, in the high-speed disk, or in both at the same time. If both copies exist, attention must be paid to file consistency: when an overwrite request is received, the file in the local disk storage pool and the backup file in the high-speed disk storage pool need to be modified at the same time, otherwise the user may read incorrect data.
  • In this embodiment, the address cache record of the file is deleted from the memory, and at the same time a deletion instruction is sent asynchronously to the high-speed disk (SSD) to delete the backup file stored on the high-speed disk (SSD).
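One way to keep the two copies consistent on overwrite, following the description above, is sketched below. The pool objects, the in-memory index and the asynchronous delete hook are assumed interfaces, not the embodiment's actual API.

```python
def handle_overwrite(file_name: str, new_data: bytes,
                     local_pool, ssd_pool, memory_index: dict, async_delete) -> None:
    """Overwrite handling: update the primary copy, drop the cache entry,
    and asynchronously remove the stale backup on the high-speed disk."""
    local_pool.write(file_name, new_data)         # rewrite the authoritative local copy
    location = memory_index.pop(file_name, None)  # delete the address cache record
    if location is not None:
        async_delete(ssd_pool, location)          # remove the stale SSD backup off the hot path
```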
  • In the embodiment of the present application, a high-speed disk is added as an auxiliary disk on the basis of the traditional local mechanical disk; because the high-speed disk has better performance, files are read faster, thereby improving local file access performance.
  • Referring to FIG. 2, another flowchart of the method for optimizing access to massive small files provided by the embodiment of the present application specifically includes:
  • For each file to be stored, there is a corresponding expected access frequency parameter, which is calculated according to the actual number of accesses to the file in the recent period and is recorded and backed up as text.
  • For example, the access frequency of the file in a recent period (such as the last 5 minutes) is used as the way to determine the expected access frequency parameter.
  • When this parameter needs to be obtained, only the corresponding data field needs to be read from the record file.
  • The value of the obtained expected access frequency parameter is compared with the preset threshold, and a corresponding identifier is added to the file according to the comparison result (not limited to file name identifiers, data identifiers, or other methods), finally determining the category to which the file belongs.
  • For example, suppose the expected access frequency of file a evaluated from the record file is 0.6, the expected access frequency of file b is 0.3, and the preset threshold is 0.5. Comparing the values numerically with the threshold 0.5, the expected access frequency of file a is greater than the threshold 0.5 and that of file b is less than the threshold 0.5. A file name identifier is then added to file a, such as the file name prefix A-, and the file name identifier B- is added to file b, finally yielding two categories of files: Type A: file A-a, and Type B: file B-b.
  • a storage primitive is a basic unit of storage used to store data.
  • storage primitives refer to storage blocks on storage disks.
  • a storage primitive can be randomly selected for the writing operation, or an "appropriate" storage primitive can be selected for the writing operation according to a certain "selection rule".
  • The "selection rule" here can take various specific forms. For example, according to the storage space occupancy of the storage primitives, a storage primitive with more free space is selected for writing the file; or, according to load information such as the access busyness and the data receiving and processing capacity of the storage primitives, a storage primitive with a lighter load is selected for writing the file. Load balancing of writes is realized through these "selection rules".
  • A cache can be maintained separately to collect the load information of each storage primitive in real time: after a storage primitive's load changes, it actively reports its own load status, or the cache periodically initiates a load query request and the storage primitive returns its own load information.
  • When a file is written, the cache is queried first; according to the load conditions of the storage primitives recorded in the cache, a storage primitive with a lighter load is selected from the query result as the storage primitive for writing the file.
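A sketch of such a load-aware selection over a cached load table is shown below; the single-number load score and its example values are assumptions, since the embodiment does not fix a particular weighting of busyness, pending data and free space.

```python
def pick_primitive(load_cache: dict) -> str:
    """Return the id of the storage primitive with the lightest load.

    load_cache maps primitive id -> a load score combining access busyness,
    pending data volume, free space and so on; lower means lighter.
    """
    return min(load_cache, key=load_cache.get)

# The cache is refreshed by primitives pushing load changes or by periodic
# polling; here it is just a static example snapshot.
load_cache = {"osd-1": 0.72, "osd-2": 0.31, "osd-3": 0.55}
print(pick_primitive(load_cache))  # -> 'osd-2'
```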
  • The write operation can be performed through the DataService (data management service) program. It is worth noting that, in the embodiment of the present application, received files are written into the storage primitive sequentially, so that subsequent operations can accurately obtain the serial number of each file within its file group.
  • At this point the storage process of the file is not yet complete; the purpose of storage is access, so a path for accessing the file still needs to be established.
  • After the file is stored in the storage primitive, the starting address of the file on the storage primitive and the size of the file are returned.
  • The size of the file can be obtained by taking the difference between the starting address and the ending address of the file, or by analyzing the file directly. After the starting address and capacity of the file are obtained, the starting address can be compared with the preset starting addresses and capacities of the file groups to determine the identification number of the file group in which the file is stored and the serial number of the file within that file group.
  • A file group here is a general term for multiple sequentially stored files; it corresponds to a virtual storage space, its preset starting address is the starting address of the first file in the file group, and its ending address is the ending address of the last file in the file group.
  • For example, the addresses occupied by file 1 on storage primitive 1 are 1000-1500 (for convenience of explanation, the address space is represented in decimal here), the addresses occupied by file 2 on storage primitive 1 are 1501-1800, and the addresses occupied by file 3 on storage primitive 1 are 1801-2000. File group 1 includes these three files; the starting address of the file group is the starting address of the first file (file 1), namely 1000, and its ending address is the ending address of the third file (file 3), namely 2000.
  • Assume further that there are file group 2 and file group 3, whose preset storage spaces are 2001-2600 and 2601-3000 respectively. If file 2 is written into storage primitive 1, then by comparing its starting address and capacity with the starting addresses and capacities of the file groups, the identification number of the file group in which file 2 is located can be determined, that is, file 2 belongs to file group 1.
  • Meanwhile, the serial number of the file within the file group can be obtained; the serial number can be expressed directly in terms of the offset of the file's starting address relative to the starting address of the file group. In this example, the serial number of file 2 is 1501. Because the storage primitives are written sequentially, the serial numbers increase monotonically without confusion.
  • The serial number of the file can also be encoded as a continuously increasing natural number sequence, where the natural sequence has a corresponding relationship with the offset of the file.
  • the "file group identification number and file serial number” can be used as an index to establish a corresponding relationship with the file name of the file.
  • The specific index can be expressed as "file group identification number + file serial number", "file serial number + file group identification number", and so on. For example, if the identification number of the file group to which file 2 belongs is 1 and the serial number of file 2 within the file group is 1501, an index table entry between "11501" and file 2 can be established.
  • By looking up this index, file 2 can be read from the storage primitive, thereby realizing the access process. After the index between "the identification number of the file group and the serial number of the file" and the file name of the file is constructed, the storage process of the file ends.
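The address arithmetic of this example can be reproduced with the short sketch below. The preset group ranges and the rule that the serial number is taken from the file's starting address follow the example, while the function names are illustrative only.

```python
# Preset file-group address ranges from the example above (decimal addresses).
FILE_GROUPS = {1: (1000, 2000), 2: (2001, 2600), 3: (2601, 3000)}

def locate(start_address: int, size: int):
    """Return (file group id, serial number) for a newly written file.

    As in the example, the serial number is taken from the file's starting
    address within the group (file 2 at address 1501 -> serial 1501).
    """
    for group_id, (lo, hi) in FILE_GROUPS.items():
        if lo <= start_address and start_address + size - 1 <= hi:
            return group_id, start_address
    raise ValueError("start address outside all preset file groups")

def build_index(group_id: int, serial: int) -> str:
    """Concatenate 'file group identification number + file serial number'."""
    return f"{group_id}{serial}"

index_table = {}
group_id, serial = locate(1501, 300)        # file 2 occupies addresses 1501-1800
index_table[build_index(group_id, serial)] = "file 2"
print(index_table)                           # -> {'11501': 'file 2'}
```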
  • In this embodiment, the division of the files to be stored and the storage method are described in detail. The classified storage of files is thereby realized, which not only facilitates file reading and improves file reading speed, but also reduces the number of read I/Os on the local disk and improves local disk performance.
  • Referring to FIG. 3, the third flowchart of the method for optimizing access to massive small files specifically includes:
  • According to a preset dynamic file division rule, divide the files to be stored and store them in a corresponding storage pool, where the storage pool includes a local disk storage pool and a high-speed disk storage pool;
  • When it is detected that the number of reads of a file within a period of time is greater than the preset threshold, the file is determined to be a frequently accessed file and is transferred to the high-speed disk (SSD); at the same time, the storage address of the file on the high-speed disk (SSD) (logical volume + file group identification number + file serial number) is cached in memory.
  • Redirected reading is based on high-speed disk reading, and the file reading speed is faster without affecting the performance of the local disk.
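Putting the pieces together, a redirected read can be sketched as follows; the pool objects and the in-memory index are assumed interfaces standing in for the embodiment's components.

```python
def read_file(name: str, memory_index: dict, local_pool, ssd_pool) -> bytes:
    """Serve a read, redirecting to the high-speed disk when the in-memory
    record shows the file has been migrated there."""
    location = memory_index.get(name)
    if location is not None:
        # location is e.g. (logical volume, file group id, file serial number)
        return ssd_pool.read(*location)
    return local_pool.read(name)  # fall back to the local mechanical disk
```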
  • the file storage module 401 is configured to divide the files to be stored and store them in a corresponding storage pool according to a preset dynamic file division rule, where the storage pool includes a local disk storage pool and a high-speed disk storage pool;
  • a disk structure optimization module 402 configured to set an independent redundant disk array structure for the high-speed disk storage pool based on the disk array technology
  • the file reading module 403 is used to obtain the reading request of the file, and according to the reading request, execute the corresponding file reading operation in the storage pool, and generate the reading operation record of the file;
  • the file access optimization module 404 is configured to perform file filtering on the files in the local disk storage pool based on the read operation records of the files, transmit the filtered files to the high-speed disk storage pool, and, when a read request for a filtered file is received, redirect to the filtered file for reading.
  • In the embodiment of the present application, a high-speed disk is added as an auxiliary disk on the basis of the traditional local mechanical disk; because the high-speed disk has better performance, files are read faster, thereby improving local file access performance.
  • another embodiment of the apparatus for optimizing access to massive small files in the embodiment of the present application includes:
  • the file storage module 401 is configured to divide the files to be stored and store them in a corresponding storage pool according to a preset dynamic file division rule, where the storage pool includes a local disk storage pool and a high-speed disk storage pool;
  • a disk structure optimization module 402 configured to set an independent redundant disk array structure for the high-speed disk storage pool based on the disk array technology
  • the file reading module 403 is used to obtain the reading request of the file, and according to the reading request, execute the corresponding file reading operation in the storage pool, and generate the reading operation record of the file;
  • the file access optimization module 404 is configured to perform file filtering on the files in the local disk storage pool based on the read operation records of the files, transmit the filtered files to the high-speed disk storage pool, and, when a read request for a filtered file is received, redirect to the filtered file for reading.
  • the file storage module 401 includes:
  • the file division unit 4011 is used to obtain preset expected access frequency parameters of all files to be stored, and according to the expected access frequency parameters, the to-be-stored files are divided into high-frequency access file classes and low-frequency access file classes;
  • a file writing unit 4012, configured to sequentially write the files in the high-frequency access file class into the storage primitives in the high-speed disk storage pool, and sequentially write the files in the low-frequency access file class into the storage primitives in the local disk storage pool;
  • the index obtaining unit 4013 is used to determine the file group to which the file belongs and the serial number of the file in the file group according to the starting address and capacity size of the file in the storage primitive, and the file group includes at least two sequentially stored files;
  • the index association unit 4014 is configured to use the identification number of the file group and the serial number of the file as an index to establish a corresponding relationship between the index and the file name of the file.
  • the file reading module 403 includes:
  • the request obtaining unit 4031 is used to obtain a file read request, wherein the file read request includes the target virtual disk partition where the file to be read is located and the virtual logical address of the file to be read;
  • a partition obtaining unit 4032 configured to determine the target physical disk partition corresponding to the target virtual disk partition according to the mapping relationship between the virtual disk partition and the physical disk partition;
  • the file reading unit 4033 is configured to perform a corresponding file reading operation on the target physical disk partition according to the virtual logical address of the file to be read and the file reading request, and generate a file reading operation record.
  • the file access optimization module 404 includes:
  • the reading times obtaining unit 4041 is used to obtain the reading times of all files in the local disk storage pool within a preset time period from the reading operation records of the files;
  • the data transmission unit 4042 is configured to, if the number of reads is greater than a preset first threshold, transmit the files whose number of reads is greater than the first threshold to the high-speed disk storage pool, and generate in memory the storage location records of these files in the high-speed disk storage pool;
  • the redirected reading unit 4043 is configured to, when reading a file whose number of reads is greater than the first threshold, redirect directly to the corresponding file for reading according to the storage location record of that file in the high-speed disk storage pool.
  • In the embodiment of the present application, the files to be stored are divided into high-frequency and low-frequency access files according to the expected access frequency parameters, and are then stored on different disks according to the division results to facilitate file reading.
  • redirected reading is based on high-speed disk reading, and the file reading speed is faster without affecting the performance of the local disk.
  • The device 600 for optimizing access to massive small files may vary greatly with configuration or performance, and may include one or more central processing units (CPUs) 610 (for example, one or more processors), a memory 620, and one or more storage media 630 (for example, one or more mass storage devices) storing application programs 633 or data 632.
  • the memory 620 and the storage medium 630 may be short-term storage or persistent storage.
  • the program stored in the storage medium 630 may include one or more modules (not shown in the figure), and each module may include a series of instruction operations in the apparatus 600 for optimizing access to massive small files.
  • The processor 610 may be configured to communicate with the storage medium 630 and execute a series of instruction operations in the storage medium 630 on the massive small file access optimization device 600.
  • The massive small file access optimization device 600 may also include one or more power supplies 640, one or more wired or wireless network interfaces 650, one or more input/output interfaces 660, and/or one or more operating systems 631, for example Windows Server, Mac OS X, Unix, Linux, FreeBSD, and so on.
  • The structure shown in FIG. 6 does not constitute a limitation on the massive small file access optimization device, which may include more or fewer components than shown in the figure, a combination of certain components, or a different arrangement of components.
  • The present application also provides a device for optimizing access to massive small files, including a memory and at least one processor, where instructions are stored in the memory and the memory and the at least one processor are interconnected through a line; the at least one processor invokes the instructions in the memory, so that the device for optimizing access to massive small files executes the steps of the above method for optimizing access to massive small files.
  • the present application also provides a computer-readable storage medium, and the computer-readable storage medium may be a non-volatile computer-readable storage medium or a volatile computer-readable storage medium.
  • the computer-readable storage medium stores computer instructions, and when the computer instructions are executed on the computer, the computer performs the following steps:
  • According to the preset dynamic file division rules, the files to be stored are divided and stored in the corresponding storage pool, where the storage pool includes a local disk storage pool and a high-speed disk storage pool; an independent redundant disk array structure is set for the high-speed disk storage pool based on disk array technology; the read request of a file is obtained, the corresponding file read operation is executed in the storage pool according to the read request, and a file read operation record is generated; and, based on the file read operation record, file filtering is performed on the files in the local disk storage pool, the filtered files are transferred to the high-speed disk storage pool, and when a read request for a filtered file is received, the request is redirected to the filtered file for reading.
  • The integrated unit, if implemented in the form of a software functional unit and sold or used as an independent product, may be stored in a computer-readable storage medium.
  • In essence, the technical solutions of the present application, or the parts thereof that contribute to the prior art, or all or part of the technical solutions, can be embodied in the form of a software product; the computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of the present application.
  • The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or other media that can store program code.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention relates to the field of basic operation and maintenance, and relates to a method, apparatus and device for optimizing access to a large quantity of small files, and a storage medium. The method for optimizing access to a large quantity of small files comprises the steps of: by means of a preset dynamic division rule, dividing the files to be stored between a local disk storage pool and a high-speed disk storage pool; then configuring a disk array structure for the high-speed disk storage pool, so as to improve the performance of the high-speed disk storage pool; then reading the files in the local disk storage pool and generating corresponding file read records; and finally, dynamically monitoring the file read records, migrating the files whose number of reads is greater than a preset threshold to the high-speed disk storage pool, and generating, in a memory, storage location records of the corresponding files; when a read request for one of the migrated files is received, redirecting directly to that file for reading according to the storage location record generated in the memory. The method therefore reduces the number of I/Os on the local disk, speeds up file reading, and improves local file access performance.
PCT/CN2022/089529 2021-04-30 2022-04-27 Method, apparatus and device for optimizing access to a large quantity of small files, and storage medium WO2022228458A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110484057.3 2021-04-30
CN202110484057.3A CN113176857A (zh) 2021-04-30 2021-04-30 海量小文件存取优化方法、装置、设备及存储介质

Publications (1)

Publication Number Publication Date
WO2022228458A1 true WO2022228458A1 (fr) 2022-11-03

Family

ID=76925818

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/089529 WO2022228458A1 (fr) Method, apparatus and device for optimizing access to a large quantity of small files, and storage medium

Country Status (2)

Country Link
CN (1) CN113176857A (fr)
WO (1) WO2022228458A1 (fr)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116069263A (zh) * 2023-03-07 2023-05-05 苏州浪潮智能科技有限公司 文件系统的优化方法、装置、服务器、设备及存储介质
CN117640626A (zh) * 2024-01-25 2024-03-01 合肥中科类脑智能技术有限公司 文件传输方法、装置及系统

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113176857A (zh) * 2021-04-30 2021-07-27 康键信息技术(深圳)有限公司 海量小文件存取优化方法、装置、设备及存储介质
CN114033224B (zh) * 2021-09-26 2023-04-07 烟台杰瑞石油服务集团股份有限公司 资源存取方法和装置
CN114020216B (zh) * 2021-11-03 2024-03-08 南京中孚信息技术有限公司 一种提升小容量文件落盘速度的方法
CN115904263B (zh) * 2023-03-10 2023-05-23 浪潮电子信息产业股份有限公司 一种数据迁移方法、系统、设备及计算机可读存储介质
CN117076387B (zh) * 2023-08-22 2024-03-01 北京天华星航科技有限公司 基于磁带的海量小文件的快速归档恢复系统
CN117376289B (zh) * 2023-10-11 2024-04-12 哈尔滨工业大学 面向道路监测数据使用申请的网络磁盘数据调度方法
CN117170590B (zh) * 2023-11-03 2024-01-26 沈阳卓志创芯科技有限公司 一种基于云计算的计算机数据存储方法及系统
CN117539634B (zh) * 2023-11-28 2024-05-24 中国大唐集团科学技术研究总院有限公司 一种面向全闪分布式存储的负载均衡方法及系统

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103076993A (zh) * 2012-12-28 2013-05-01 北京思特奇信息技术股份有限公司 一种密集型系统中的存储系统及方法
CN105138290A (zh) * 2015-08-20 2015-12-09 浪潮(北京)电子信息产业有限公司 一种高性能存储池组织方法及装置
US20160179379A1 (en) * 2014-12-23 2016-06-23 Teradata Us, Inc. System and method for data management across volatile and non-volatile storage technologies
CN106406759A (zh) * 2016-09-13 2017-02-15 郑州云海信息技术有限公司 一种数据存储方法及装置
CN106777342A (zh) * 2017-01-16 2017-05-31 湖南大学 一种基于可靠性的高性能文件系统混合节能存储系统及方法
CN113176857A (zh) * 2021-04-30 2021-07-27 康键信息技术(深圳)有限公司 海量小文件存取优化方法、装置、设备及存储介质

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102662992B (zh) * 2012-03-14 2014-10-08 北京搜狐新媒体信息技术有限公司 一种海量小文件的存储、访问方法及装置
US9043530B1 (en) * 2012-04-09 2015-05-26 Netapp, Inc. Data storage within hybrid storage aggregate

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103076993A (zh) * 2012-12-28 2013-05-01 北京思特奇信息技术股份有限公司 一种密集型系统中的存储系统及方法
US20160179379A1 (en) * 2014-12-23 2016-06-23 Teradata Us, Inc. System and method for data management across volatile and non-volatile storage technologies
CN105138290A (zh) * 2015-08-20 2015-12-09 浪潮(北京)电子信息产业有限公司 一种高性能存储池组织方法及装置
CN106406759A (zh) * 2016-09-13 2017-02-15 郑州云海信息技术有限公司 一种数据存储方法及装置
CN106777342A (zh) * 2017-01-16 2017-05-31 湖南大学 一种基于可靠性的高性能文件系统混合节能存储系统及方法
CN113176857A (zh) * 2021-04-30 2021-07-27 康键信息技术(深圳)有限公司 海量小文件存取优化方法、装置、设备及存储介质

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116069263A (zh) * 2023-03-07 2023-05-05 苏州浪潮智能科技有限公司 文件系统的优化方法、装置、服务器、设备及存储介质
CN117640626A (zh) * 2024-01-25 2024-03-01 合肥中科类脑智能技术有限公司 文件传输方法、装置及系统
CN117640626B (zh) * 2024-01-25 2024-04-26 合肥中科类脑智能技术有限公司 文件传输方法、装置及系统

Also Published As

Publication number Publication date
CN113176857A (zh) 2021-07-27

Similar Documents

Publication Publication Date Title
WO2022228458A1 (fr) Procédé, appareil et dispositif d'optimisation d'accès pour une grande quantité de petits fichiers, et support de stockage
Dong et al. Optimizing Space Amplification in RocksDB.
US10761758B2 (en) Data aware deduplication object storage (DADOS)
US10169365B2 (en) Multiple deduplication domains in network storage system
JP5087467B2 (ja) コンピュータストレージシステムにおいてデータ圧縮並びに整合性を管理する方法および装置
US8290972B1 (en) System and method for storing and accessing data using a plurality of probabilistic data structures
US9798728B2 (en) System performing data deduplication using a dense tree data structure
US7389382B2 (en) ISCSI block cache and synchronization technique for WAN edge device
US8171253B2 (en) Virtual disk mapping
JP5302886B2 (ja) ブロック指紋を読み出し、ブロック指紋を使用してデータ重複を解消するシステム、及び方法
US8712963B1 (en) Method and apparatus for content-aware resizing of data chunks for replication
US7743038B1 (en) Inode based policy identifiers in a filing system
US10210188B2 (en) Multi-tiered data storage in a deduplication system
US8131688B2 (en) Storage system data compression enhancement
US11580162B2 (en) Key value append
US10719450B2 (en) Storage of run-length encoded database column data in non-volatile memory
US8924642B2 (en) Monitoring record management method and device
US12001703B2 (en) Data processing method and storage device
WO2022267508A1 (fr) Procédé et appareil de compression de métadonnées
EP4394575A1 (fr) Procédé de traitement de données et système de stockage
US11144533B1 (en) Inline deduplication using log based storage
US10922027B2 (en) Managing data storage in storage systems
WO2024022330A1 (fr) Procédé de gestion de métadonnées basé sur un système de fichiers, et dispositif associé
US20240143449A1 (en) Data Processing Method and Apparatus
US20130007363A1 (en) Control device and control method

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22794928

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the addressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 14.02.2024)