WO2022228458A1 - Access optimization method, apparatus and device for large quantity of small files, and storage medium - Google Patents

Info

Publication number
WO2022228458A1
Authority
WO
WIPO (PCT)
Prior art keywords
file
files
storage pool
read
access
Prior art date
Application number
PCT/CN2022/089529
Other languages
French (fr)
Chinese (zh)
Inventor
郑平
Original Assignee
康键信息技术(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 康键信息技术(深圳)有限公司 filed Critical 康键信息技术(深圳)有限公司
Publication of WO2022228458A1 publication Critical patent/WO2022228458A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0602Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/061Improving I/O performance
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0628Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0629Configuration or reconfiguration of storage systems
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0628Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0638Organizing or formatting or addressing of data
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0668Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/0671In-line storage system
    • G06F3/0673Single storage device
    • G06F3/0674Disk device
    • G06F3/0676Magnetic disk device

Definitions

  • the present application relates to the field of scaffolding operation and maintenance, and in particular, to a method, apparatus, device and storage medium for optimizing access to massive small files.
  • the present application provides a method, apparatus, device, and storage medium for optimizing access to massive small files, which solve the problem of low local file access performance in existing methods for optimizing access to massive small files.
  • a first aspect of the present application provides a method for optimizing access to a large number of small files, including: dividing the files to be stored according to a preset dynamic file division rule and storing them in the corresponding storage pool, where the storage pool includes a local disk storage pool and a high-speed disk storage pool; setting an independent redundant disk array (RAID) structure for the high-speed disk storage pool based on disk array technology; obtaining a read request for a file, executing the corresponding file read operation in the storage pool according to the read request, and generating a read operation record of the file; and, based on the read operation record of the file, performing file filtering on the files in the local disk storage pool, transferring the filtered files to the high-speed disk storage pool, and, when a read request for a filtered file is received, redirecting to the filtered file for reading.
  • a second aspect of the present application provides an apparatus for optimizing access to a large number of small files, including: a file storage module, configured to divide the files to be stored according to a preset dynamic file division rule and store them in the corresponding storage pool, where the storage pool includes a local disk storage pool and a high-speed disk storage pool; a disk structure optimization module, configured to set an independent redundant disk array (RAID) structure for the high-speed disk storage pool based on disk array technology; a file reading module, configured to obtain a read request for a file, execute the corresponding file read operation in the storage pool according to the read request, and generate a read operation record of the file; and a file access optimization module, configured to perform file filtering on the files in the local disk storage pool based on the read operation record of the file, transfer the filtered files to the high-speed disk storage pool, and, when a read request for a filtered file is received, redirect to the filtered file for reading.
  • a third aspect of the present application provides a device for optimizing access to massive small files, including a memory and at least one processor, where instructions are stored in the memory. The at least one processor invokes the instructions in the memory so that the device executes the following steps of the method for optimizing access to massive small files: dividing the files to be stored according to a preset dynamic file division rule and storing them in the corresponding storage pool, where the storage pool includes a local disk storage pool and a high-speed disk storage pool; setting an independent redundant disk array (RAID) structure for the high-speed disk storage pool based on disk array technology; obtaining a read request for a file, executing the corresponding file read operation in the storage pool according to the read request, and generating a read operation record of the file; and, based on the read operation record of the file, performing file filtering on the files in the local disk storage pool, transferring the filtered files to the high-speed disk storage pool, and, when a read request for a filtered file is received, redirecting to the filtered file for reading.
  • a fourth aspect of the present application provides a computer-readable storage medium in which instructions are stored. When the instructions are run on a computer, the computer executes the following steps of the method for optimizing access to a large number of small files: dividing the files to be stored according to a preset dynamic file division rule and storing them in the corresponding storage pool, where the storage pool includes a local disk storage pool and a high-speed disk storage pool; setting an independent redundant disk array (RAID) structure for the high-speed disk storage pool based on disk array technology; obtaining a read request for a file, executing the corresponding file read operation in the storage pool according to the read request, and generating a read operation record of the file; and, based on the read operation record of the file, performing file filtering on the files in the local disk storage pool, transferring the filtered files to the high-speed disk storage pool, and, when a read request for a filtered file is received, redirecting to the filtered file for reading.
  • In the technical solution of the present application, the files to be stored are divided between the local disk storage pool and the high-speed disk storage pool according to the preset dynamic division rule, and a disk array structure is set for the high-speed disk storage pool to improve its performance. Files in the local disk storage pool are then read and the corresponding file read records are generated. Finally, the file read records are monitored dynamically: files whose read count exceeds the preset threshold are migrated to the high-speed disk storage pool, and a storage location record for each migrated file is generated in memory. When a read request for a migrated file is received, it is redirected directly to the file according to the storage location record in memory. The high-speed disk storage pool thus assists the local disk storage pool by storing and serving frequently accessed files, which reduces the number of I/O operations on the local disk; because the high-speed disk has better performance and faster file reads, local file access performance improves.
  • FIG. 1 is a schematic diagram of a first embodiment of a method for optimizing access to massive small files in an embodiment of the present application
  • FIG. 2 is a schematic diagram of a second embodiment of a method for optimizing access to massive small files in an embodiment of the present application
  • FIG. 3 is a schematic diagram of a third embodiment of a method for optimizing access to massive small files in an embodiment of the present application
  • FIG. 4 is a schematic diagram of an embodiment of an apparatus for optimizing access to a large number of small files in an embodiment of the present application
  • FIG. 5 is a schematic diagram of another embodiment of the apparatus for optimizing access to massive small files in an embodiment of the present application
  • FIG. 6 is a schematic diagram of an embodiment of a device for optimizing access to massive small files in an embodiment of the present application.
  • the present application provides a method, apparatus, device, and storage medium for optimizing access to massive small files, which solve the problem of low local file access performance in existing methods for optimizing access to massive small files.
  • Referring to FIG. 1, a flowchart of a method for optimizing access to massive small files provided by an embodiment of the present application specifically includes:
  • according to a preset dynamic file division rule, divide the files to be stored and store them in a corresponding storage pool, where the storage pool includes a local disk storage pool and a high-speed disk storage pool;
  • the local disk storage pool is a common mechanical hard disk, and the high-speed disk storage pool can be an SSD (Solid State Drive) or other flash disks.
  • The "file dynamic division rule" is a rule that changes according to need. The classification criterion is the expected access frequency of the file: the first category contains files whose expected access frequency is high, and the second category contains files whose expected access frequency is low. The expected access frequency is calculated from the actual number of file accesses in a recent period and is backed up as text. For example, the access frequency of the file in the last 5 minutes may be specified as the expected access frequency parameter. This access frequency is compared with a set threshold to determine whether the expected access frequency is high or low.
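  • As a minimal sketch of the rule just described, the following Python counts accesses inside the most recent 5-minute window and compares the count with a threshold. The function names, the timestamp representation, and the threshold value of 8 are illustrative assumptions, not part of the application.

```python
from typing import Iterable

WINDOW_SECONDS = 5 * 60   # "the last 5 minutes" from the text
THRESHOLD = 8             # assumed value: accesses per window


def expected_access_frequency(access_times: Iterable[float], now: float,
                              window: float = WINDOW_SECONDS) -> int:
    """Count the accesses that fall inside the most recent window."""
    return sum(1 for t in access_times if now - window < t <= now)


def classify(access_times: Iterable[float], now: float,
             threshold: int = THRESHOLD) -> str:
    """Return 'high' or 'low' by comparing the windowed count with the threshold."""
    count = expected_access_frequency(access_times, now)
    return "high" if count > threshold else "low"


now = 1000.0
hot = [now - i * 10 for i in range(10)]    # 10 accesses in the last 100 seconds
cold = [now - 400, now - 350, now - 20]     # only one access inside the window
print(classify(hot, now), classify(cold, now))
```

  • Because the rule is described as dynamic, the window length and threshold are kept as parameters here so that they can be changed without touching the classification logic.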
  • A mechanical disk is not given a dedicated high-speed disk of its own. Adhering to the principle of avoiding waste, the scheme configures capacity on demand and divides the high-speed disk into logical volumes. For example, a disk can initially use 30 GB of solid-state storage and expand dynamically according to actual need.
  • RAID stands for Redundant Array of Inexpensive Disks (now commonly expanded as Redundant Array of Independent Disks).
  • the principle is to combine disks into a disk group by means of an array and, together with a scattered arrangement of data across the disks, to improve data security.
  • a disk array combines multiple disks into one large-capacity disk group and uses the combined effect of the individual disks supplying data to improve the performance of the entire disk system.
  • Disk arrays can also use parity checking. When any one hard disk in the array fails, the data can still be read; during reconstruction, the data is recalculated and written to the new hard disk.
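  • The parity idea can be illustrated with single-parity XOR, the scheme used by parity-based RAID levels such as RAID 5. This is a sketch under assumptions: the application does not state which parity scheme is meant, and the block contents and the helper name `xor_blocks` are hypothetical.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


data = [b"file", b"pool", b"disk"]   # three data disks (hypothetical contents)
parity = xor_blocks(data)             # contents of the parity disk

# Disk 1 fails: rebuild its block from the surviving disks plus the parity block.
rebuilt = xor_blocks([data[0], data[2], parity])
print(rebuilt == data[1])
```

  • XOR is its own inverse, so XOR-ing the surviving blocks with the parity block cancels everything except the missing block.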
  • If the shared high-speed disk fails, the cache of all disks becomes invalid, because one high-speed disk is shared as a cache by the whole machine; this slows the entire business system and affects the requests of online users. This single point of failure can be avoided by configuring RAID 1 for the high-speed disk: if one of the high-speed disks fails, the other continues to provide service, an alarm is raised, and the failed disk is replaced in time.
  • At present the most widely used way to build RAID is a hardware RAID card: the hard disks are connected to the RAID controller, the RAID controller is connected to the system's PCIe bus, and the hard disk mode is then changed in the system settings.
  • This embodiment does not limit the method of reading files from the disk.
  • the method of reading files from the disk depends on how the files are stored. A file may be located by its offset and length, in which case it is read according to these two values. Alternatively, files may be stored on a file system such as XFS, in which case a file can be read directly by using its file ID as the file name, with the details handled by the file system.
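  • The offset-and-length approach can be sketched as follows. Several small files are packed back to back into one container file and each is later read by its recorded extent. The container layout, the `extents` table, and the function name are assumptions made for illustration only.

```python
import os
import tempfile


def read_by_extent(path: str, offset: int, length: int) -> bytes:
    """Read `length` bytes starting at `offset` from a packed container file."""
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(length)


# Pack three small files back to back and remember (offset, length) for each.
payloads = {"a": b"alpha", "b": b"bravo!", "c": b"c"}
extents = {}
with tempfile.NamedTemporaryFile(delete=False) as container:
    for name, blob in payloads.items():
        extents[name] = (container.tell(), len(blob))
        container.write(blob)
    container_path = container.name

result = read_by_extent(container_path, *extents["b"])
os.remove(container_path)
print(result)
```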
  • the specific disk is determined by the upper-layer routing.
  • the program (osd) corresponding to the storage pool records each file read operation, including the file name, read time, etc.
  • the read request includes the virtual disk partition where the file to be read is located and the index of the file to be read. First, the corresponding physical disk partition is determined according to the virtual disk partition; then, according to the index of the file to be read, the corresponding file is found in that physical disk partition and the read operation is performed.
  • For example, if the read request is "A disk/11501" and the real physical disk corresponding to disk A is logical volume G on the SSD, the corresponding file is found in logical volume G according to the index "11501", interpreted under the rule that defines the index number, such as "file group identification number + file serial number".
  • step 103 specifically further includes the following steps:
  • the file read request includes the target virtual disk partition where the file to be read is located and the virtual logical address of the file to be read;
  • a corresponding file read operation is performed on the target physical disk partition, and a file read operation record is generated.
  • a file read request can be initiated to the terminal, and the file read request includes the target virtual disk partition where the file to be read is located, that is, a certain part of the virtual disk partition.
  • the virtual disk partition displayed on terminal A includes partition 1, partition 2, and partition 3.
  • partition 3 is the target disk partition.
  • the file read request can also include the virtual logical address of the file to be read, that is, the logical address on the target virtual disk partition. The virtual logical address can specify the index, start position, read length, and so on of the file to be read, so that it can be mapped to the physical disk partition and the corresponding read operation can be performed there.
  • A mapping relationship between virtual disk partitions and physical disk partitions is pre-stored in the system. For example, the physical disk partitions include partition a, partition b, and partition c, and the virtual disk partitions include partition 1, partition 2, and partition 3, where partition a corresponds to partition 2, partition b corresponds to partition 1, and partition c corresponds to partition 3.
  • a database may also be created for the above-mentioned mapping relationship, and the above-mentioned mapping relationship is stored in the form of a database.
  • the corresponding relationship between the virtual logical address and the physical logical address is also stored in the system.
  • the physical logical address, that is, the logical address on the target physical disk partition, can be obtained by matching the virtual logical address. From it, the physical address on the target physical disk partition of the target file corresponding to the file to be read, that is, the starting position of the target file, can be determined.
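  • A minimal sketch of the two lookups described above: the partition mapping is taken from the example in the text, while the address translation (a per-partition base added to the virtual address) and the base values are assumptions, since the application only states that a correspondence between virtual and physical logical addresses is stored.

```python
# Mapping from the example in the text: virtual partition -> physical partition.
PARTITION_MAP = {
    "partition 1": "partition b",
    "partition 2": "partition a",
    "partition 3": "partition c",
}

# Assumed translation rule: physical address = partition base + virtual address.
PHYSICAL_BASE = {"partition a": 0, "partition b": 4096, "partition c": 8192}


def resolve(virtual_partition: str, virtual_address: int):
    """Return (physical partition, physical logical address) for a read request."""
    physical_partition = PARTITION_MAP[virtual_partition]
    return physical_partition, PHYSICAL_BASE[physical_partition] + virtual_address


target = resolve("partition 3", 100)
print(target)
```

  • The text notes that the mapping may also be kept in a database; a dictionary stands in for that store here.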
  • a file read request can include a read operation to the target file, such as reading data of a specified length in the target file, or writing data of a specified length from a certain byte of the target file, and so on.
  • after step 103, the method further includes:
  • files whose access frequency is less than the threshold are deleted.
  • the access frequency is recorded in the memory of each corresponding program (osd); for example, the access frequency of each file per minute is counted. If the frequency exceeds a certain frequency threshold, migration to the high-speed disk occurs. If the frequency falls below the frequency threshold, the disk contents are marked as invalid and the corresponding disk content deletion operation is triggered.
  • For example, suppose the access frequencies of file a and file b in the high-speed disk osd are 10 times/5 minutes and 6 times/5 minutes respectively, and the frequency threshold set in the system is 8 times/5 minutes. The access frequency of file a exceeds the set frequency threshold, which triggers the migration mechanism for the disk contents. The access frequency of file b is lower than the set frequency threshold, so file b is marked as invalid and the delete operation is triggered.
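  • The example above can be sketched as a small decision function. The action labels and the handling of a frequency exactly equal to the threshold (left unchanged) are assumptions; the text only covers the cases above and below the threshold.

```python
FREQUENCY_THRESHOLD = 8  # times per 5 minutes, from the example in the text


def plan_actions(frequencies, threshold=FREQUENCY_THRESHOLD):
    """Map each file to the action its access frequency calls for."""
    actions = {}
    for name, frequency in frequencies.items():
        if frequency > threshold:
            actions[name] = "migrate"                 # move to the high-speed disk
        elif frequency < threshold:
            actions[name] = "invalidate_and_delete"   # mark invalid, then delete
        else:
            actions[name] = "keep"                    # assumed: no action at equality
    return actions


actions = plan_actions({"a": 10, "b": 6})
print(actions)
```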
  • file filtering is performed on the files in the local disk storage pool, and the filtered files are transferred to the high-speed disk storage pool; when a read request for a filtered file is received, the request is redirected to the filtered file for reading;
  • the method further includes:
  • according to the overwrite request, determine the file to be overwritten and the storage location record of the file to be overwritten in the memory
  • When a file in the local disk is a frequently accessed file, a copy of the file is transferred to the high-speed disk, so a file may exist in the local disk, in the high-speed disk, or in both. If it exists in both at the same time, file consistency must be considered: when an overwrite request is received, the file in the local disk storage pool and the backup file in the high-speed disk storage pool both need to be handled at the same time, otherwise the user may read wrong data.
  • the address cache record of the file is deleted from the memory; at the same time, a deletion instruction is sent asynchronously to the high-speed disk (SSD), and the backup file stored in the high-speed disk (SSD) is deleted.
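  • A minimal sketch of this overwrite handling, with dictionaries standing in for the two storage pools and a queue standing in for the asynchronous deletion instruction. All names and the "G/11501" location format are hypothetical.

```python
from collections import deque

local_pool = {"f1": b"old"}          # local disk storage pool
ssd_pool = {"f1": b"old"}            # backup copy in the high-speed disk pool
location_cache = {"f1": "G/11501"}   # in-memory storage location records
pending_ssd_deletes = deque()        # stands in for asynchronous delete instructions


def overwrite(name, data):
    """Overwrite locally, drop the location record, and queue the SSD delete."""
    local_pool[name] = data
    if location_cache.pop(name, None) is not None:
        pending_ssd_deletes.append(name)


def drain_ssd_deletes():
    """Apply the queued delete instructions to the high-speed disk pool."""
    while pending_ssd_deletes:
        ssd_pool.pop(pending_ssd_deletes.popleft(), None)


def read(name):
    """Serve from the SSD only while a location record exists."""
    return ssd_pool[name] if name in location_cache else local_pool[name]


overwrite("f1", b"new")
first_read = read("f1")   # the stale SSD copy is already unreachable
drain_ssd_deletes()
print(first_read, "f1" in ssd_pool)
```

  • Removing the location record before the delete completes means reads fall back to the local disk immediately, so the stale backup is never served even though its removal is deferred.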
  • In this embodiment, a high-speed disk is added as an auxiliary disk alongside the traditional local mechanical disk.
  • Because the high-speed disk has better performance, files are read faster, which improves local file access performance.
  • Referring to FIG. 2, another flowchart of the method for optimizing access to massive small files provided by the embodiment of the present application specifically includes:
  • For each file to be stored, there is a corresponding expected access frequency parameter, which is calculated from the actual number of file accesses in a recent period and is recorded and backed up as text. For example, the access frequency of the file over a recent period is used to determine the expected access frequency parameter. When this parameter is needed, only the corresponding data field has to be read from the record file.
  • compare the value of the obtained expected access frequency parameter with the preset threshold, add a corresponding identifier to the file according to the comparison result (for example a file name identifier or a data identifier, among other methods), and finally determine the category to which the file belongs.
  • For example, suppose the record file gives an expected access frequency of 0.6 for file a and 0.3 for file b, and the preset threshold is 0.5. Comparing each value with the threshold of 0.5 shows that the expected access frequency of file a is greater than the threshold and that of file b is less than the threshold. A file name identifier is then added to each file, such as the prefix A- for file a and the prefix B- for file b, which finally yields two types of files: type A, containing file A-a, and type B, containing file B-b.
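  • The tagging example can be sketched as follows. The record-file layout (one name and one value per line) and the function names are assumptions; the values, threshold, and prefixes come from the example in the text. A value exactly equal to the threshold is placed in type B here, which the text does not specify.

```python
PRESET_THRESHOLD = 0.5  # threshold from the example in the text

# Hypothetical record-file layout: one "<file name> <expected frequency>" per line.
RECORD_TEXT = """a 0.6
b 0.3
"""


def parse_record(text):
    """Read the expected access frequency field for each file."""
    record = {}
    for line in text.splitlines():
        if line.strip():
            name, value = line.split()
            record[name] = float(value)
    return record


def tag(name, expected_frequency, threshold=PRESET_THRESHOLD):
    """Prefix the file name with A- (high frequency) or B- (low frequency)."""
    prefix = "A-" if expected_frequency > threshold else "B-"
    return prefix + name


tagged = [tag(name, freq) for name, freq in parse_record(RECORD_TEXT).items()]
print(tagged)
```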
  • a storage primitive is a basic unit of storage used to store data.
  • storage primitives refer to storage blocks on storage disks.
  • a storage primitive can be selected at random for the write operation, or an "appropriate" storage primitive can be selected according to a certain "selection rule".
  • the "selection rule" can take various specific forms. For example, according to the storage space occupancy of the storage primitives, a storage primitive with more free space is selected for writing the file; or, according to load information such as access busyness and data receiving and processing capacity, a storage primitive with a lighter load is selected. These selection rules achieve load balancing of writes.
  • a cache can be maintained separately to collect the load information of each storage primitive in real time. Either a storage primitive actively reports its load status whenever its load changes, or the cache periodically issues a load query request and the storage primitive returns its load information.
  • When a file is to be written, the cache is queried first. Based on the load conditions of the storage primitives held in the cache, a storage primitive with a lighter load is selected as the storage primitive for writing the file.
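  • One possible shape for such a load cache is sketched below. The `busy` metric, the tie-breaking on free space, and all class and method names are assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class PrimitiveLoad:
    free_bytes: int
    busy: float  # assumed metric: 0.0 idle .. 1.0 saturated


class LoadCache:
    """Holds the most recently reported load of each storage primitive."""

    def __init__(self):
        self._loads = {}

    def report(self, primitive_id, load):
        """Called when a storage primitive reports or answers a load query."""
        self._loads[primitive_id] = load

    def pick_for_write(self, size):
        """Pick the least busy primitive that can hold `size` bytes."""
        candidates = {pid: load for pid, load in self._loads.items()
                      if load.free_bytes >= size}
        if not candidates:
            raise RuntimeError("no storage primitive has enough free space")
        return min(candidates,
                   key=lambda pid: (candidates[pid].busy, -candidates[pid].free_bytes))


cache = LoadCache()
cache.report("p1", PrimitiveLoad(free_bytes=1000, busy=0.9))
cache.report("p2", PrimitiveLoad(free_bytes=500, busy=0.2))
cache.report("p3", PrimitiveLoad(free_bytes=50, busy=0.0))
chosen = cache.pick_for_write(100)
print(chosen)
```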
  • the write operation can be performed through the DataService (data management service) program. Note that in the embodiment of the present application, received files are written into the storage primitive sequentially, so that subsequent operations can accurately obtain the serial number of each file in its file group.
  • At this point the storage process is not yet complete: the purpose of storage is access, so an access path must be established.
  • After the file is stored in the storage primitive, the starting address of the file on the storage primitive and the size of the file are returned.
  • the size of the file can be obtained as the difference between the starting address and the ending address of the file, or by directly parsing the file. After the starting address and capacity of the file are obtained, they can be compared with the preset starting address and capacity of each file group to determine the identification number of the file group in which the file is stored and the sequence number of the file within that group.
  • The "file group" here is a general term for multiple sequentially stored files. It corresponds to a virtual storage space: its preset start address is the start address of the first file in the file group, and its end address is the end address of the last file in the file group.
  • For example, the addresses occupied by file 1 on storage primitive 1 are 1000-1500 (for convenience of explanation, the address space is expressed in decimal here), the addresses occupied by file 2 on storage primitive 1 are 1501-1800, and the addresses occupied by file 3 on storage primitive 1 are 1801-2000. File group 1 includes these three files: its start address is the start address of the first file (file 1), which is 1000, and its end address is the end address of the third file (file 3), which is 2000.
  • Assume that there are also file group 2 and file group 3, whose preset storage spaces are 2001-2600 and 2601-3000 respectively. When file 2 is written into storage primitive 1, comparing its starting address and capacity with the starting address and capacity of each file group determines the identification number of the file group where file 2 is located; that is, file 2 belongs to file group 1.
  • the serial number of the file in the file group can be obtained, and the serial number can be directly expressed as the offset of the starting address of the file relative to the starting address of the file group.
  • the serial number of file 2 in this example is 1501.
  • Because the storage primitives are written sequentially, the serial number increases monotonically and does not fall out of order.
  • the serial number of the file can also be programmed as a continuously increasing natural sequence, and the natural sequence has a corresponding relationship with the offset of the file.
  • the "file group identification number and file serial number” can be used as an index to establish a corresponding relationship with the file name of the file.
  • the specific index can be expressed as "file group identification number + file serial number", "file serial number + file group identification number", and so on. For example, if the identification number of the file group to which file 2 belongs is 1, and the serial number of file 2 in the file group is 1501, an index entry between "11501" and file 2 can be established.
  • Through this index, file 2 can be read from the storage primitive, which realizes the access process. After the index between "the identification number of the file group and the serial number of the file" and the file name of the file is constructed, the storage process of the file ends.
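  • The file group lookup and index construction can be sketched as follows, using the address ranges from the example. Two points are assumptions: the serial number is taken to be 1501, as in the worked example (which equals file 2's starting address), and the index is formed by plain string concatenation, as "11501" suggests. The function names are hypothetical.

```python
from bisect import bisect_right

# (group id, start address, end address), from the example in the text.
FILE_GROUPS = [(1, 1000, 2000), (2, 2001, 2600), (3, 2601, 3000)]
GROUP_STARTS = [start for _, start, _ in FILE_GROUPS]


def locate_group(start_address, size):
    """Return the id of the file group whose address range contains the file."""
    i = bisect_right(GROUP_STARTS, start_address) - 1
    if i < 0:
        raise ValueError("address lies before the first file group")
    group_id, _, group_end = FILE_GROUPS[i]
    if start_address + size - 1 > group_end:
        raise ValueError("file does not fit inside a single file group")
    return group_id


def build_index(group_id, serial_number):
    """'file group identification number + file serial number' as one string."""
    return f"{group_id}{serial_number}"


index_table = {}
group = locate_group(1501, 300)   # file 2 occupies addresses 1501-1800
index_table[build_index(group, 1501)] = "file 2"
print(index_table)
```

  • Plain concatenation is only unambiguous if the width of the group identifier or the serial number is fixed; a real index would likely pad one of the two fields or store them as a pair.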
  • In this embodiment, the division of the files to be stored and the storage method are described in detail.
  • Classified storage of files is realized, which makes files easier to read and therefore faster to read, reduces the number of I/O reads from the local disk, and improves local disk performance.
  • Referring to FIG. 3, the third flowchart of the method for optimizing access to massive small files specifically includes:
  • according to a preset dynamic file division rule, divide the files to be stored and store them in a corresponding storage pool, where the storage pool includes a local disk storage pool and a high-speed disk storage pool;
  • When it is detected that the number of reads of a file within a period of time is greater than the preset threshold, the file is determined to be a frequently accessed file and is transferred to a high-speed disk (SSD); at the same time, the storage address of the file in the high-speed disk (SSD) (logical volume + file group identification number + file serial number) is cached in memory.
  • Redirected reading is based on high-speed disk reading, and the file reading speed is faster without affecting the performance of the local disk.
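  • The read path described in this embodiment can be sketched as a small class: reads are counted within a time window, a file whose count exceeds the threshold is copied to the high-speed pool, and later reads are redirected using the in-memory location record. Dictionaries stand in for the two pools, and the class name, the address format, and the default threshold and window are hypothetical.

```python
import time
from collections import defaultdict, deque


class TieredReader:
    def __init__(self, local, threshold=3, window=300.0, clock=time.monotonic):
        self.local = local                 # local disk storage pool
        self.ssd = {}                      # high-speed disk storage pool
        self.location = {}                 # file name -> SSD address, kept in memory
        self.reads = defaultdict(deque)    # file name -> recent read timestamps
        self.threshold = threshold
        self.window = window
        self.clock = clock

    def read(self, name):
        """Return (data, source), redirecting to the SSD once a file is migrated."""
        if name in self.location:
            return self.ssd[self.location[name]], "ssd"
        now = self.clock()
        history = self.reads[name]
        history.append(now)
        while history and now - history[0] > self.window:
            history.popleft()              # forget reads outside the window
        data = self.local[name]
        if len(history) > self.threshold:
            address = f"G/{name}"          # assumed address format
            self.ssd[address] = data
            self.location[name] = address
        return data, "local"


reader = TieredReader({"x": b"payload"}, threshold=3, clock=lambda: 0.0)
sources = [reader.read("x")[1] for _ in range(6)]
print(sources)
```

  • In this sketch the migration is done inline on the read that crosses the threshold; the text describes dynamic monitoring of the read records, which could equally be done by a separate background task.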
  • the file storage module 401 is configured to divide the files to be stored and store them in a corresponding storage pool according to a preset dynamic file division rule, where the storage pool includes a local disk storage pool and a high-speed disk storage pool;
  • a disk structure optimization module 402 configured to set an independent redundant disk array structure for the high-speed disk storage pool based on the disk array technology
  • the file reading module 403 is used to obtain the reading request of the file, and according to the reading request, execute the corresponding file reading operation in the storage pool, and generate the reading operation record of the file;
  • the file access optimization module 404 is configured to perform file filtering on the files in the local disk storage pool based on the read operation records of the files, transmit the filtered files to the high-speed disk storage pool, and, when a read request for a filtered file is received, redirect to the filtered file for reading.
  • In this embodiment, a high-speed disk is added as an auxiliary disk alongside the traditional local mechanical disk.
  • Because the high-speed disk has better performance, files are read faster, which improves local file access performance.
  • Referring to FIG. 5, another embodiment of the apparatus for optimizing access to massive small files in the embodiment of the present application includes:
  • the file storage module 401 is configured to divide the files to be stored and store them in a corresponding storage pool according to a preset dynamic file division rule, where the storage pool includes a local disk storage pool and a high-speed disk storage pool;
  • a disk structure optimization module 402 configured to set an independent redundant disk array structure for the high-speed disk storage pool based on the disk array technology
  • the file reading module 403 is used to obtain the reading request of the file, and according to the reading request, execute the corresponding file reading operation in the storage pool, and generate the reading operation record of the file;
  • the file access optimization module 404 is configured to perform file filtering on the files in the local disk storage pool based on the read operation records of the files, transmit the filtered files to the high-speed disk storage pool, and, when a read request for a filtered file is received, redirect to the filtered file for reading.
  • the file storage module 401 includes:
  • the file division unit 4011 is used to obtain preset expected access frequency parameters of all files to be stored, and according to the expected access frequency parameters, the to-be-stored files are divided into high-frequency access file classes and low-frequency access file classes;
  • a file writing unit 4012 configured to sequentially write the files in the high-frequency access file class into the storage primitives in the high-speed disk storage pool, and sequentially write the files in the low-frequency access file class into the storage primitives in the local disk storage pool;
  • the index obtaining unit 4013 is used to determine the file group to which the file belongs and the serial number of the file in the file group according to the starting address and capacity size of the file in the storage primitive, and the file group includes at least two sequentially stored files;
  • the index association unit 4014 is configured to use the identification number of the file group and the serial number of the file as an index to establish a corresponding relationship between the index and the file name of the file.
  • the file reading module 403 includes:
  • the request obtaining unit 4031 is used to obtain a file read request, wherein the file read request includes the target virtual disk partition where the file to be read is located and the virtual logical address of the file to be read;
  • a partition obtaining unit 4032 configured to determine the target physical disk partition corresponding to the target virtual disk partition according to the mapping relationship between the virtual disk partition and the physical disk partition;
  • the file reading unit 4033 is configured to perform a corresponding file reading operation on the target physical disk partition according to the virtual logical address of the file to be read and the file reading request, and generate a file reading operation record.
  • the file access optimization module 404 includes:
  • the reading times obtaining unit 4041 is used to obtain the reading times of all files in the local disk storage pool within a preset time period from the reading operation records of the files;
  • the data transmission unit 4042 is configured to, if the read count of a file is greater than a preset first threshold, transfer that file to the high-speed disk storage pool and generate in memory a record of the file's storage location in the high-speed disk storage pool;
  • the redirecting reading unit 4043 is configured to, when a file whose read count is greater than the first threshold is read, redirect directly to the corresponding file for reading according to the record of its storage location in the high-speed disk storage pool.
  • the files to be stored are divided into high-frequency and low-frequency access files according to the expected access frequency parameters and stored on different disks according to the division result, which facilitates file reading.
  • redirected reading is based on high-speed disk reading, and the file reading speed is faster without affecting the performance of the local disk.
  • the device 600 for optimizing access to massive small files may vary considerably with configuration or performance, and may include one or more central processing units (CPUs) 610 (e.g., one or more processors), a memory 620, and one or more storage media 630 (e.g., one or more mass storage devices) that store application programs 633 or data 632.
  • the memory 620 and the storage medium 630 may be short-term storage or persistent storage.
  • the program stored in the storage medium 630 may include one or more modules (not shown in the figure), and each module may include a series of instruction operations in the apparatus 600 for optimizing access to massive small files.
  • the processor 610 may be configured to communicate with the storage medium 630 and to execute, on the massive small file access optimization device 600, the series of instruction operations in the storage medium 630.
  • the massive small file access optimization device 600 may also include one or more power supplies 640, one or more wired or wireless network interfaces 650, one or more input/output interfaces 660, and/or one or more operating systems 631, for example Windows Server, Mac OS X, Unix, Linux, or FreeBSD.
  • the device structure shown in FIG. 6 does not constitute a limitation on the massive small file access optimization device, which may include more or fewer components than those shown in the figure, a combination of certain components, or a different arrangement of components.
  • the present application also provides a device for optimizing access to massive small files, including: a memory and at least one processor, where instructions are stored in the memory, and the memory and the at least one processor are interconnected through a line; the at least one processor invokes the instructions in the memory, so that the device executes the steps of the above method for optimizing access to massive small files.
  • the present application also provides a computer-readable storage medium, and the computer-readable storage medium may be a non-volatile computer-readable storage medium or a volatile computer-readable storage medium.
  • the computer-readable storage medium stores computer instructions, and when the computer instructions are executed on the computer, the computer performs the following steps:
  • the files to be stored are divided and stored in the corresponding storage pool, the storage pool includes a local disk storage pool and a high-speed disk storage pool;
  • file filtering is performed on the files in the local disk storage pool, and the filtered files are transferred to the high-speed disk storage pool.
  • the integrated unit if implemented in the form of a software functional unit and sold or used as an independent product, may be stored in a computer-readable storage medium.
  • the technical solution of the present application, in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product; the computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to execute all or part of the steps of the methods described in the embodiments of the present application.
  • the aforementioned storage medium includes: a USB flash drive, a removable hard disk, read-only memory (ROM), random access memory (RAM), a magnetic disk, an optical disk, and other media that can store program code.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present application relates to the field of infrastructure operation and maintenance, and discloses an access optimization method, apparatus and device for a large quantity of small files, and a storage medium. The access optimization method comprises: dividing files to be stored between a local disk storage pool and a high-speed disk storage pool by means of a preset dynamic division rule; then setting a disk array structure for the high-speed disk storage pool, thereby improving its performance; afterwards, reading files in the local disk storage pool and generating corresponding file reading records; and finally, dynamically monitoring the file reading records, migrating files whose number of reads is greater than a preset threshold to the high-speed disk storage pool, and generating, in memory, a storage position record for those files. When a read request for a migrated file is received, the request is redirected directly to that file according to the storage position record in memory. The number of I/O operations on the local disk is thus reduced and files are read faster, thereby improving local file access performance.

Description

Method, apparatus, device and storage medium for optimizing access to massive small files
This application claims priority to Chinese patent application No. 202110484057.3, filed with the China Patent Office on April 30, 2021 and entitled "Method, Apparatus, Device and Storage Medium for Optimizing Access to Massive Small Files", the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the field of infrastructure operation and maintenance, and in particular to a method, apparatus, device and storage medium for optimizing access to massive small files.
Background
Current Internet applications contain large numbers of small files, such as video files split into short segments, pictures on shopping pages, and pictures on news websites. A large website may store more than ten billion pictures, so the efficiency of storing and reading this data becomes a key factor in service performance. In real deployments disks are also read frequently, which keeps them under heavy load and degrades their performance.
Regarding the phenomenon that frequent disk reads degrade disk performance, the inventor realized that existing performance optimization solutions cache frequently read files in memory. That approach still reads files from local mechanical disks, so local file access performance remains low.
Summary of the Invention
The present application provides a method, apparatus, device and storage medium for optimizing access to massive small files, which solve the problem that current methods for optimizing access to massive small files deliver low local file access performance.
To achieve the above object, a first aspect of the present application provides a method for optimizing access to massive small files, including: dividing the files to be stored according to a preset dynamic file division rule and storing them in the corresponding storage pool, the storage pools including a local disk storage pool and a high-speed disk storage pool; setting a redundant array of independent disks (RAID) structure for the high-speed disk storage pool based on disk array technology; obtaining a file read request, performing the corresponding file read operation in the storage pool according to the read request, and generating a read operation record for the file; and, based on the read operation records, filtering the files in the local disk storage pool, transferring the filtered files to the high-speed disk storage pool, and, when a read request for a filtered file is received, redirecting to the filtered file for reading.
A second aspect of the present application provides an apparatus for optimizing access to massive small files, including: a file storage module, configured to divide the files to be stored according to a preset dynamic file division rule and store them in the corresponding storage pool, the storage pools including a local disk storage pool and a high-speed disk storage pool; a disk structure optimization module, configured to set a redundant array of independent disks (RAID) structure for the high-speed disk storage pool based on disk array technology; a file reading module, configured to obtain a file read request, perform the corresponding file read operation in the storage pool according to the read request, and generate a read operation record for the file; and a file access optimization module, configured to filter the files in the local disk storage pool based on the read operation records, transfer the filtered files to the high-speed disk storage pool, and, when a read request for a filtered file is received, redirect to the filtered file for reading.
A third aspect of the present application provides a device for optimizing access to massive small files, including a memory and at least one processor, the memory storing instructions. The at least one processor invokes the instructions in the memory so that the device performs the following steps of the method for optimizing access to massive small files: dividing the files to be stored according to a preset dynamic file division rule and storing them in the corresponding storage pool, the storage pools including a local disk storage pool and a high-speed disk storage pool; setting a redundant array of independent disks (RAID) structure for the high-speed disk storage pool based on disk array technology; obtaining a file read request, performing the corresponding file read operation in the storage pool according to the read request, and generating a read operation record for the file; and, based on the read operation records, filtering the files in the local disk storage pool, transferring the filtered files to the high-speed disk storage pool, and, when a read request for a filtered file is received, redirecting to the filtered file for reading.
A fourth aspect of the present application provides a computer-readable storage medium storing instructions which, when run on a computer, cause the computer to perform the following steps of the method for optimizing access to massive small files: dividing the files to be stored according to a preset dynamic file division rule and storing them in the corresponding storage pool, the storage pools including a local disk storage pool and a high-speed disk storage pool; setting a redundant array of independent disks (RAID) structure for the high-speed disk storage pool based on disk array technology; obtaining a file read request, performing the corresponding file read operation in the storage pool according to the read request, and generating a read operation record for the file; and, based on the read operation records, filtering the files in the local disk storage pool, transferring the filtered files to the high-speed disk storage pool, and, when a read request for a filtered file is received, redirecting to the filtered file for reading.
In the technical solution provided by the present application, the files to be stored are first divided between the local disk storage pool and the high-speed disk storage pool according to a preset dynamic division rule. A disk array structure is then set for the high-speed disk storage pool to improve its performance. Next, files in the local disk storage pool are read and corresponding file read records are generated. Finally, the file read records are monitored dynamically: files whose read count is greater than a preset threshold are migrated to the high-speed disk storage pool, and a storage location record for each migrated file is generated in memory. When a read request for a migrated file is received, the request is redirected directly to that file according to the storage location record in memory. The high-speed disk storage pool thus assists the local disk storage pool in storing and reading frequently accessed files, which reduces the number of I/O operations on the local disk. Because the high-speed disk performs better, files are also read faster, which improves local file access performance.
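The migrate-then-redirect flow summarized above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two pools are modeled as temporary directories, and the class name `TieredReader`, the threshold value and the file name are hypothetical, not taken from the application.

```python
import shutil
import tempfile
from collections import Counter
from pathlib import Path

FIRST_THRESHOLD = 3  # assumption: promote once a file has been read more than 3 times


class TieredReader:
    """Hypothetical reader over a local pool and a high-speed pool."""

    def __init__(self, local_pool: Path, fast_pool: Path) -> None:
        self.local_pool = local_pool
        self.fast_pool = fast_pool
        self.read_counts = Counter()   # stands in for the read operation record
        self.locations = {}            # in-memory storage location record

    def read(self, name: str) -> bytes:
        # Redirect straight to the high-speed copy when a location record exists.
        if name in self.locations:
            return self.locations[name].read_bytes()
        data = (self.local_pool / name).read_bytes()
        self.read_counts[name] += 1
        if self.read_counts[name] > FIRST_THRESHOLD:
            # Migrate the hot file and remember where it now lives.
            target = self.fast_pool / name
            shutil.copyfile(self.local_pool / name, target)
            self.locations[name] = target
        return data


with tempfile.TemporaryDirectory() as tmp:
    local, fast = Path(tmp, "local"), Path(tmp, "fast")
    local.mkdir()
    fast.mkdir()
    (local / "hot.bin").write_bytes(b"payload")
    reader = TieredReader(local, fast)
    results = [reader.read("hot.bin") for _ in range(6)]
    promoted = "hot.bin" in reader.locations
    local_reads = reader.read_counts["hot.bin"]

print(promoted, local_reads)
```

In this sketch the fourth read crosses the threshold and triggers the copy, so the remaining reads never touch the local pool. A real system would also need to handle updates to the original file and the loss of the in-memory record on restart.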
Description of the Drawings
FIG. 1 is a schematic diagram of a first embodiment of the method for optimizing access to massive small files in the embodiments of the present application;
FIG. 2 is a schematic diagram of a second embodiment of the method for optimizing access to massive small files in the embodiments of the present application;
FIG. 3 is a schematic diagram of a third embodiment of the method for optimizing access to massive small files in the embodiments of the present application;
FIG. 4 is a schematic diagram of one embodiment of the apparatus for optimizing access to massive small files in the embodiments of the present application;
FIG. 5 is a schematic diagram of another embodiment of the apparatus for optimizing access to massive small files in the embodiments of the present application;
FIG. 6 is a schematic diagram of one embodiment of the device for optimizing access to massive small files in the embodiments of the present application.
Detailed Description
The present application provides a method, apparatus, device and storage medium for optimizing access to massive small files, which solve the problem that current methods for optimizing access to massive small files deliver low local file access performance.
To help those skilled in the art better understand the solution of the present application, the embodiments of the present application are described below with reference to the accompanying drawings.
The terms "first", "second", "third", "fourth" and the like (if any) in the description, the claims and the above drawings of this application are used to distinguish similar objects and do not necessarily describe a specific order or sequence. It should be understood that the data so used are interchangeable where appropriate, so that the embodiments described herein can be practiced in an order other than that illustrated or described herein. Furthermore, the terms "comprising" and "having" and any variations thereof are intended to cover non-exclusive inclusion; for example, a process, method, system, product or device that comprises a series of steps or units is not necessarily limited to the steps or units expressly listed, but may include other steps or units that are not expressly listed or that are inherent to such a process, method, product or device.
Referring to FIG. 1, the flowchart of the method for optimizing access to massive small files provided by an embodiment of the present application specifically includes:
101. Divide the files to be stored according to a preset dynamic file division rule and store them in the corresponding storage pool, the storage pools including a local disk storage pool and a high-speed disk storage pool;
The local disk storage pool consists of ordinary mechanical hard disks, and the high-speed disk storage pool may consist of SSDs (solid-state drives) or other flash-based disks. The "dynamic file division rule" is a rule that changes with requirements; its classification criterion is the expected access frequency of a file. Under this rule, all files to be stored are divided into two classes: files with a high expected access frequency and files with a low expected access frequency. The expected access frequency is calculated from the number of times a file was actually accessed during a recent period, and is recorded and backed up as text. For example, the number of accesses to a file within the last 5 minutes may be specified as the expected access frequency parameter; comparing it with a set threshold determines whether the expected access frequency is high or low.
Files of the first class, with a high expected access frequency, are stored in the high-speed disk storage pool. Their high access frequency puts considerable pressure on the disk, and the high-speed disk storage pool (SSD) performs better than the local disk storage pool (mechanical hard disks) because it identifies binary bits by charging and discharging. Files of the second class, with a low expected access frequency, are stored in the local disk storage pool; a low access frequency also lowers the performance required of the disk.
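As a rough illustration of the division rule described above, the following Python sketch counts accesses inside the most recent 5-minute window and compares the count with a threshold. The threshold value, the pool labels and the file names are assumptions made for the example; the application does not specify them.

```python
WINDOW_SECONDS = 5 * 60      # the text's example window: the last 5 minutes
HIGH_FREQ_THRESHOLD = 8      # assumption: accesses per window that count as "high"


def expected_frequency(access_times: list[float], now: float) -> int:
    """Count accesses that fall inside the most recent window."""
    return sum(1 for t in access_times if now - WINDOW_SECONDS <= t <= now)


def choose_pool(access_times: list[float], now: float) -> str:
    """Return the pool a file would be stored in under the threshold rule."""
    if expected_frequency(access_times, now) > HIGH_FREQ_THRESHOLD:
        return "high_speed"
    return "local"


now = 1_000.0
log = {
    "banner.jpg": [now - 10 * i for i in range(12)],   # 12 accesses, all recent
    "archive.jpg": [now - 400, now - 30],              # only one inside the window
}
placement = {name: choose_pool(times, now) for name, times in log.items()}
print(placement)
```

Because the rule is "dynamic", the window length and threshold would in practice be configuration values that operators adjust rather than constants.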
Note that in actual operation and maintenance, each mechanical disk is not paired with its own high-speed disk: the cost does not allow it, and in practice there are not that many files that need a high-speed disk as a cache. To avoid waste, the program allocates capacity on demand by dividing the high-speed disk into logical volumes. For example, a disk may initially be given 30 GB of solid-state storage, which is then expanded dynamically according to actual needs.
102. Set a redundant array of independent disks (RAID) structure for the high-speed disk storage pool based on disk array technology;
RAID stands for Redundant Array of Independent Disks (originally Redundant Arrays of Inexpensive Disks). Its principle is to combine disks into a group as an array and, together with a design that distributes data across them, improve data security. A disk array combines multiple disks into one large-capacity disk group, and the additive effect of the individual disks supplying data improves the performance of the whole disk system. With this technology, data is cut into many segments that are stored on separate disks. A disk array can also use parity checking: when any one disk in the array fails, the data can still be read, and during reconstruction the data is recomputed and written to a new disk.
For example, in actual operation and maintenance it was found that when the high-speed disk fails, the cache for all disks becomes invalid, because one machine shares a single high-speed disk as its cache. The entire business system then slows down and online user requests are affected. This single point of failure is avoided by configuring the high-speed disks as RAID 1: if one disk fails, the other can still provide service, and with proper alerting the failed disk can be replaced promptly. The most widely used way to set up RAID at present is a hardware RAID card: the disks are connected to the RAID controller, the RAID controller is connected to the system's PCIe bus, and finally the disk mode is changed in the system settings.
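The parity idea mentioned in the RAID description above can be illustrated with XOR, which is the usual parity operation in parity-based RAID levels. The sketch below is not the RAID 1 setup the text recommends (RAID 1 simply keeps two identical copies); it only shows why a single lost block can be recomputed. The block contents are made up for the example.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """XOR equally sized blocks byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


data_disks = [b"small", b"files", b"cache"]   # one stripe, one block per disk
parity = xor_blocks(data_disks)               # stored on a separate parity disk

# Simulate losing the second disk and rebuilding its block from the
# surviving data blocks plus the parity block.
survivors = [data_disks[0], data_disks[2], parity]
rebuilt = xor_blocks(survivors)
print(rebuilt)
```

XOR-ing the parity with every surviving block cancels their contribution and leaves exactly the missing block, which is what reconstruction onto a replacement disk relies on.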
103. Obtain a file read request, perform the corresponding file read operation in the storage pool according to the read request, and generate a read operation record for the file;
This embodiment does not limit how files are read from disk; the read method depends on how the files are stored. For example, with raw-disk writes, the corresponding index can be found from the file identifier; the index may store an offset and a length, and the file can be read from these two values. Files may also be stored on a file system such as XFS, in which case a file can be read directly by using its identifier as the file name, with the internal details handled by the file system. Which disk a file resides on is decided by the upper routing layer. The program corresponding to the storage pool (osd) records every file read operation, including the file name and the read time.
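The raw-disk case described above, where an index entry holds an offset and a length, might look like the following Python sketch. An ordinary temporary file stands in for the raw volume, and the index contents and identifiers such as `img-001` are hypothetical.

```python
import os
import tempfile

# Hypothetical index: file identifier -> (offset, length) inside the raw volume.
INDEX = {"img-001": (0, 5), "img-002": (5, 7)}


def read_by_index(volume_path: str, file_id: str) -> bytes:
    """Seek to the recorded offset and read exactly `length` bytes."""
    offset, length = INDEX[file_id]
    with open(volume_path, "rb") as volume:
        volume.seek(offset)
        return volume.read(length)


# Two small files stored back to back in one volume.
with tempfile.NamedTemporaryFile(delete=False) as volume:
    volume.write(b"helloworld!!")
    volume_path = volume.name

first = read_by_index(volume_path, "img-001")
second = read_by_index(volume_path, "img-002")
os.remove(volume_path)   # clean up the stand-in volume
print(first, second)
```

Packing many small files into one volume this way avoids a file-system lookup per file, at the cost of maintaining the index separately.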
In this embodiment, the read request includes the virtual disk partition where the file to be read is located and the index of the file to be read. The corresponding physical disk partition is first determined from the virtual disk partition; the file is then located in that partition of the physical disk by its index, and the read operation is performed.
For example, suppose the read request is "Disk A/11501" and the real physical disk corresponding to Disk A is logical volume G on the SSD. The file is then looked up in logical volume G by the index "11501". If the index number is defined as "file group identification number + file serial number", it follows that the file group identification number of the file to be read is "1" and its file serial number is "1501".
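The worked example above can be expressed as a small parser. The sketch assumes that the group identification number occupies exactly the first digit of the index, which is consistent with the example but is not stated as a general rule in the text; the mapping table is likewise hypothetical.

```python
GROUP_ID_DIGITS = 1  # assumption: the group id occupies the first digit of the index

# Hypothetical mapping from virtual disk name to the backing logical volume.
VIRTUAL_TO_PHYSICAL = {"Disk A": "SSD logical volume G"}


def parse_read_request(request: str) -> dict[str, str]:
    """Split '<virtual disk>/<index>' and decode the index fields."""
    virtual_disk, index = request.split("/", 1)
    return {
        "physical_volume": VIRTUAL_TO_PHYSICAL[virtual_disk],
        "group_id": index[:GROUP_ID_DIGITS],
        "serial": index[GROUP_ID_DIGITS:],
    }


parsed = parse_read_request("Disk A/11501")
print(parsed)
```

A production format would need an unambiguous boundary between the two fields, for example a fixed width or a separator, since a bare concatenation cannot be split reliably once group ids grow past one digit.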
In this embodiment, step 103 specifically further includes the following steps:
obtaining a file read request, where the file read request includes the target virtual disk partition where the file to be read is located and the virtual logical address of the file to be read;
determining the target physical disk partition corresponding to the target virtual disk partition according to the mapping relationship between virtual disk partitions and physical disk partitions;
performing the corresponding file read operation on the target physical disk partition according to the virtual logical address of the file to be read and the file read request, and generating a read operation record for the file.
When a file on a virtual disk partition needs to be read, a file read request can be sent to the terminal. The request contains the target virtual disk partition where the file to be read is located, that is, one of the virtual disk partitions. For example, if the virtual disk partitions presented on terminal A are partition 1, partition 2 and partition 3, and a read operation is performed on a file on partition 3, then partition 3 is the target virtual disk partition.
The file read request may also include the virtual logical address of the file to be read, that is, its logical address on the target virtual disk partition. The virtual logical address identifies the index, the starting position, the read length and so on of the file to be read, so that it can be matched to the physical disk partition and the corresponding read operation can be performed there.
The mapping relationship between virtual disk partitions and physical disk partitions is stored in the system in advance. For example, the physical disk partitions include partition a, partition b and partition c, and the virtual disk partitions include partition 1, partition 2 and partition 3; partition a corresponds to partition 2, partition b corresponds to partition 1, and partition c corresponds to partition 3. A database may also be created in the mobile terminal system to store this mapping relationship. After the target virtual disk partition where the file to be read is located has been obtained, the corresponding target physical disk partition can be determined from the mapping relationship.
The system also stores the correspondence between virtual logical addresses and physical logical addresses. After the virtual logical address of the file to be read (its address on the target virtual disk partition) has been obtained from the file read request, the physical logical address on the target physical disk partition can be obtained by matching against the virtual logical address. From it, the physical logical address of the target file on the target physical disk partition, that is, the starting position of the target file, can be determined. The file read request may contain read operations on the target file, such as reading data of a specified length from the target file, or writing data of a specified length starting from a given byte of the target file.
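The two lookups described above, partition mapping followed by address translation, can be sketched as follows. The partition mapping uses the example from the text (a to 2, b to 1, c to 3). The address translation is an assumption made for illustration: each physical partition is given a fixed base address and the virtual logical address is added to it, whereas the application only says that a correspondence is stored.

```python
# Mapping taken from the example in the text: a <-> 2, b <-> 1, c <-> 3.
PARTITION_MAP = {
    "partition 1": "partition b",
    "partition 2": "partition a",
    "partition 3": "partition c",
}

# Assumption: each physical partition starts at a fixed base address, so a
# virtual logical address translates by adding that base.
PHYSICAL_BASE = {"partition a": 0x0000, "partition b": 0x4000, "partition c": 0x8000}


def resolve(virtual_partition: str, virtual_address: int) -> tuple[str, int]:
    """Return the target physical partition and the physical logical address."""
    physical = PARTITION_MAP[virtual_partition]
    return physical, PHYSICAL_BASE[physical] + virtual_address


target = resolve("partition 3", 0x120)
print(target[0], hex(target[1]))
```

If the mapping is kept in a database, as the text suggests is possible, the two dictionaries would become tables keyed by the virtual partition and the physical partition respectively.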
Optionally, after step 103, the method further includes:
obtaining the access frequency of each file in the high-speed disk storage pool;
determining whether the access frequency of each file is less than a preset threshold;
if the access frequency is less than the preset threshold, deleting the files whose access frequency is less than the threshold.
考虑到高速磁盘的空间有限性,需要定时对磁盘空间进行清理。本实施例中通过将访问频次记录在每个存储对应的程序(osd)的内存里面,比如统计每分钟的文件的访问频次,如果超过一定频次阈值,就发生到高速磁盘的迁移,如果低于某个频次阈值就可以把磁盘里面的内容标记为失效,触发对应的磁盘内容删除操作,这样的一个好处是无需花费扫描日志的代价就可以达到应用的效果,即使对应程序重启,可以把高速磁盘里面对应的缓存全部标记为失效,触发回收的过程,其中频次阈值可以根据需求进行做适应性的改变。Considering the limited space of high-speed disks, the disk space needs to be cleaned up regularly. In this embodiment, the access frequency is recorded in the memory of each corresponding program (osd), for example, the access frequency of the file per minute is counted. If the frequency exceeds a certain frequency threshold, migration to the high-speed disk occurs. A certain frequency threshold can mark the contents of the disk as invalid and trigger the corresponding disk content deletion operation. One advantage of this is that the application effect can be achieved without the cost of scanning the log. Even if the corresponding program is restarted, the high-speed disk can be deleted. The corresponding caches are all marked as invalid, triggering the process of recycling, in which the frequency threshold can be adaptively changed according to requirements.
例如高速磁盘osd中文件a和文件b的访问频次分别为10次/5分钟,6次/5分钟,系统下设定的频次阈值为8次/5分钟,那么文件a的访问频次超过了设定的频次阈值,触发磁盘内容的迁移机制,对于文件b而言,其访问频次低于设定的频次阈值,文件b则被标记为失效,当检测到磁盘内存在失效内容时,触发磁盘的删除操作。For example, the access frequency of file a and file b in the high-speed disk osd are 10 times/5 minutes and 6 times/5 minutes respectively, and the frequency threshold set under the system is 8 times/5 minutes, then the access frequency of file a exceeds the set frequency. The set frequency threshold triggers the migration mechanism of the disk content. For file b, its access frequency is lower than the set frequency threshold, and the file b is marked as invalid. delete operation.
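The threshold comparison in this example can be sketched in a few lines. The function name `classify_by_frequency` and the choice to leave a file untouched when its count equals the threshold are assumptions for illustration; the counts and the threshold are the ones given in the example.

```python
def classify_by_frequency(access_counts, threshold):
    """Split files into those to migrate and those to mark as invalid.

    access_counts maps a file name to its access count in the current window.
    Files whose count equals the threshold are left alone (an assumption).
    """
    to_migrate = [name for name, count in access_counts.items() if count > threshold]
    to_invalidate = [name for name, count in access_counts.items() if count < threshold]
    return to_migrate, to_invalidate


# Counts per 5-minute window, as in the example: file a = 10, file b = 6.
migrate, invalidate = classify_by_frequency({"a": 10, "b": 6}, threshold=8)
```

With the values from the example, file a lands in the migration list and file b in the list of content to be marked invalid and deleted.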
104. Based on the read operation records of the files, filter the files in the local disk storage pool, transfer the filtered files to the high-speed disk storage pool, and, when a read request for a filtered file is received, redirect the request to the filtered file for reading;
To reduce the access load on the local mechanical disk, frequently accessed files are moved to the high-performance high-speed disk for reading wherever possible. In this embodiment, acceleration detection determines whether a file is frequently accessed, by comparing its counted number of accesses with a threshold. Frequently accessed files in the local disk storage pool are filtered out in this way and transferred to the high-speed disk storage pool, and the locations to which they are transferred are recorded in memory. When one of these files is subsequently read, the location record in memory is read directly, and the file is found at the address in that record and read.
Optionally, after step 104, the method further includes:
receiving a file overwrite request;
determining, according to the overwrite request, the file to be overwritten and the storage location record of that file in memory;
deleting the storage location record of the file to be overwritten from memory;
deleting the file to be overwritten from the high-speed disk storage pool and writing the new file.
In the above steps, if a file on the local disk is frequently accessed, a copy of it is transferred to the high-speed disk. A file may therefore exist on the local disk and may also exist on the high-speed disk; when it exists on both, file consistency must be considered. When an overwrite request is received, the file in the local disk storage pool and the backup copy in the high-speed disk storage pool must be modified together; otherwise a user may read wrong data. To handle an overwrite of a file, this embodiment deletes the file's address cache record from memory and, at the same time, asynchronously issues a delete instruction to the high-speed disk (SSD) to delete the backup copy of the file stored there.
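The overwrite handling described above can be illustrated with a small in-memory model. Everything here is a sketch under stated assumptions: the class `TieredStore`, its method names, the `ssd/` address format, and the use of a plain list to stand in for the asynchronous delete instruction are all hypothetical, and writing the new data to the local pool is one possible reading of the step.

```python
class TieredStore:
    """Toy model of a local pool plus a high-speed pool (names are hypothetical)."""

    def __init__(self):
        self.local = {}            # local disk pool: file name -> bytes
        self.fast = {}             # high-speed pool: address -> bytes
        self.location = {}         # in-memory location records: name -> address
        self.pending_deletes = []  # stands in for asynchronous delete instructions

    def promote(self, name):
        # Copy a frequently accessed file to the high-speed pool and record where.
        address = f"ssd/{name}"
        self.fast[address] = self.local[name]
        self.location[name] = address

    def overwrite(self, name, data):
        # Drop the location record first so later reads fall back to the local pool.
        address = self.location.pop(name, None)
        if address is not None:
            self.pending_deletes.append(address)  # delete the stale copy later
        self.local[name] = data                   # assumption: new data goes to the local pool

    def flush_deletes(self):
        # Apply the queued delete instructions to the high-speed pool.
        while self.pending_deletes:
            self.fast.pop(self.pending_deletes.pop(), None)

    def read(self, name):
        address = self.location.get(name)
        if address is not None:
            return self.fast[address]
        return self.local[name]
```

Removing the location record before the stale copy is deleted means a read that arrives between the two steps is served from the local pool and cannot return the old data.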
In this embodiment of the present application, a high-speed disk is added as an auxiliary disk alongside the conventional local mechanical disk. The high-speed disk assists the local disk in storing and reading frequently accessed files, which reduces the number of I/O operations on the local disk. Because the high-speed disk also offers better performance, files are read faster, which improves local file access performance.
Referring to FIG. 2, another flowchart of the method for optimizing access to massive small files provided by an embodiment of the present application specifically includes:
201. Obtain the preset expected access frequency parameter of every file to be stored, and, according to the expected access frequency parameters, divide the files to be stored into a high-frequency access file class and a low-frequency access file class;
Each file to be stored has a corresponding expected access frequency parameter. The parameter is calculated from the number of times the file was actually accessed during a recent period and is recorded and backed up as text. For example, the access frequency of the file over the last 5 minutes may be specified as the way to determine the expected access frequency parameter; when the parameter is needed, the corresponding data field is simply read from the record file. The obtained parameter is then compared numerically with a preset threshold, and a corresponding identifier is added to the file according to the result, for example (without limitation) a file name identifier or a data identifier, which finally determines the class to which the file belongs.
For example, suppose the record file gives an expected access frequency of 0.6 for file a and 0.3 for file b, and the preset threshold is 0.5. Comparing both values with the threshold shows that the expected access frequency of file a is greater than 0.5 and that of file b is less than 0.5. Accordingly, a file name identifier is added to file a, for example the prefix A-, and the file name identifier B- is added to file b, which yields two classes of files: class A: file A-a; class B: file B-b.
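The labelling in this example can be sketched as below. The function name `label_files` is an assumption, as is the decision to place a file whose parameter equals the threshold in class B; the prefixes, frequencies, and threshold are those of the example.

```python
def label_files(expected_frequency, threshold=0.5):
    """Prefix each file name with A- (high frequency) or B- (low frequency).

    expected_frequency maps a file name to its expected access frequency
    parameter. A value equal to the threshold is treated as low frequency,
    which is an assumption made for this sketch.
    """
    labelled = {}
    for name, frequency in expected_frequency.items():
        prefix = "A-" if frequency > threshold else "B-"
        labelled[name] = prefix + name
    return labelled


labels = label_files({"a": 0.6, "b": 0.3})
```

Here `labels` maps file a to the name A-a and file b to the name B-b, matching the two classes in the example.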
202. Sequentially write the files in the high-frequency access file class into storage primitives in the high-speed disk storage pool, and sequentially write the files in the low-frequency access file class into storage primitives in the local disk storage pool;
A storage primitive is a basic unit of storage for data. In this embodiment, a storage primitive refers to a storage block on a storage disk. After a file to be stored is received, a storage primitive may be selected at random for the write, or an "appropriate" storage primitive may be selected according to a "selection rule". The selection rule can take several forms: for example, selecting the storage primitive with more free space based on storage space occupancy, or selecting a lightly loaded storage primitive based on load information such as how busy the primitive is with accesses and its capacity to receive and process data. Such selection rules balance the write load. When load information is the deciding factor, a cache may be maintained separately to make selection more efficient. The cache collects the load information of each storage primitive in real time: either each storage primitive actively reports its load whenever the load changes, or the cache periodically issues load query requests and the storage primitives return their load information. When a file needs to be written, the load information held in the cache is queried first, and a lightly loaded storage primitive is selected according to the result as the primitive to which the file is written. In practice, the write operation can be performed by a DataService (data management service) program. Note that in this embodiment the received files are written into the storage primitive sequentially, so that subsequent operations can accurately obtain the sequence number of each file within its file group.
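One possible form of the load-based selection rule is sketched below. The function name `pick_primitive`, the layout of the load cache (a `free` space figure and a `load` figure per primitive), and the tie-break on free space are assumptions for illustration, not details given in the application.

```python
def pick_primitive(load_cache, file_size):
    """Choose the least-loaded storage primitive that can hold the file.

    load_cache maps a primitive id to {"free": free space, "load": load figure},
    as it might be reported to or polled by the cache. Ties on load are broken
    in favour of the primitive with more free space (an assumption).
    """
    candidates = {
        pid: info for pid, info in load_cache.items() if info["free"] >= file_size
    }
    if not candidates:
        raise ValueError("no storage primitive has enough free space")
    return min(
        candidates,
        key=lambda pid: (candidates[pid]["load"], -candidates[pid]["free"]),
    )


cache = {
    "p1": {"free": 100, "load": 0.7},
    "p2": {"free": 500, "load": 0.2},
    "p3": {"free": 10, "load": 0.0},
}
chosen = pick_primitive(cache, file_size=50)
```

In this sample, p3 is idle but too small for the file, so the lightly loaded p2 is chosen over the busier p1.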
It can be understood that files in the high-frequency file class are accessed often, and frequent access puts read/write pressure on the disk, so all files in that class are stored in the high-performance high-speed disk storage pool. Files in the low-frequency file class are not accessed very often, and the local disk storage pool (mechanical disks) is sufficient to cope with that pressure without causing disk-related system abnormalities such as overheating of the hard disk or system stalls.
203. According to the start address and size of the file on the storage primitive, determine the file group to which the file belongs and the sequence number of the file within that file group, where a file group contains at least two sequentially stored files;
Writing the file into the storage primitive does not complete the storage process: the purpose of storage is access, so an access path must be established. After the file is stored in the storage primitive, the start address of the file on the storage primitive and the size of the file are returned. The size can be obtained as the difference between the file's start address and end address, or by parsing the file directly. The start address and size are then compared with the preset start address and size of each file group to determine the identification number of the file group in which the file is stored and the sequence number of the file within that group. A "file group" here is a collective term for multiple sequentially stored files and corresponds to a virtual storage space. Its preset start address is the start address of the first file in the group, and its end address is the end address of the last file in the group.
For example, suppose file 1 occupies addresses 1000 to 1500 on storage primitive 1 (addresses are given in decimal for ease of explanation), file 2 occupies addresses 1501 to 1800, and file 3 occupies addresses 1801 to 2000. If the preset size of file group 1 is 1000, file group 1 contains three files; its start address is the start address of the first file (file 1), namely 1000, and its end address is the end address of the third file (file 3), namely 2000. Suppose there are also file group 2 and file group 3, whose preset storage spaces are 2001 to 2600 and 2601 to 3000 respectively. After file 2 is written into storage primitive 1, comparing its start address and size with the start addresses and sizes of the file groups gives the identification number of its file group: file 2 belongs to file group 1.
Similarly, the sequence number of the file within the file group can be obtained. The sequence number can be expressed directly as the offset of the file's start address relative to the start address of the file group; in this example the sequence number of file 2 is 1501. Because files are written into the storage primitive sequentially, the sequence number increases monotonically and no confusion arises. Alternatively, the sequence numbers can be assigned as a continuously increasing natural sequence that corresponds to the file offsets.
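The group lookup in this example can be sketched as a range comparison. The names `FILE_GROUPS` and `locate` are assumptions, and the size is taken as end address minus start address, as the text describes. The example gives 1501 as the sequence number of file 2, which is its start address on the storage primitive, so the sketch returns both that address and the offset from the group's start, leaving either convention available.

```python
# Preset address ranges of the file groups from the example: id -> (start, end).
FILE_GROUPS = {1: (1000, 2000), 2: (2001, 2600), 3: (2601, 3000)}


def locate(start, size):
    """Return (group id, start address, offset within group) for a stored file.

    `size` is the difference between the file's end and start addresses.
    """
    end = start + size
    for group_id, (group_start, group_end) in FILE_GROUPS.items():
        if group_start <= start and end <= group_end:
            return group_id, start, start - group_start
    raise ValueError("file does not fall inside any preset file group")


# File 2 occupies addresses 1501 to 1800 on storage primitive 1.
group_id, start_address, offset = locate(1501, 1800 - 1501)
```

For file 2 this yields file group 1, with start address 1501 and an offset of 501 from the group's start address of 1000.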
204. Using the identification number of the file group and the sequence number of the file as an index, establish a correspondence between the index and the file name of the file.
Once the identification number of the file group and the sequence number of the file within that group have been obtained through the above steps, the pair "file group identification number and file sequence number" can be used as an index, and a correspondence between the index and the file name is established. The index can take forms such as "file group identification number + file sequence number" or "file sequence number + file group identification number". For example, if the file group identification number of file 2 is 1 and the sequence number of file 2 within that group is 1501, an index table entry associating "11501" with file 2 can be created. To access file 2, the index table is queried first to obtain the index "11501"; the first digit "1" is parsed as the file group number and the following digits "1501" as the sequence number of file 2 within the group. With these two parameters, file 2 can be read from the storage primitive, which completes the access. Once the index between "file group identification number and file sequence number" and the file name has been built, the storage process for the file ends.
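Building and parsing such an index can be sketched as follows. The helper names and the fixed one-digit width for the file group identification number are assumptions: the text concatenates the two numbers without a separator, so some convention of this kind is needed to split the string unambiguously.

```python
GROUP_ID_WIDTH = 1  # assumption: group ids occupy a fixed number of digits


def make_index(group_id, sequence):
    """Concatenate group id and sequence number, e.g. (1, 1501) -> "11501"."""
    return f"{group_id:0{GROUP_ID_WIDTH}d}{sequence}"


def parse_index(index):
    """Split an index string back into (group id, sequence number)."""
    return int(index[:GROUP_ID_WIDTH]), int(index[GROUP_ID_WIDTH:])


# Index table keyed by file name; "file2" is a hypothetical name for file 2.
index_table = {"file2": make_index(1, 1501)}
group_id, sequence = parse_index(index_table["file2"])
```

Looking up file 2 returns the index "11501", which parses back into file group 1 and sequence number 1501.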
205. Set up a redundant array of independent disks (RAID) structure for the high-speed disk storage pool based on disk array technology;
206. Obtain a file read request, perform the corresponding file read operation in the storage pool according to the read request, and generate a read operation record for the file;
207. Based on the read operation records of the files, filter the files in the local disk storage pool, transfer the filtered files to the high-speed disk storage pool, and, when a read request for a filtered file is received, redirect the request to the filtered file for reading.
This embodiment of the present application describes in detail how the files to be stored are divided and stored. The files to be stored are divided into high-frequency and low-frequency access files according to their expected access frequency parameters and are then stored on different disks. This classified storage makes files easier to read, which increases read speed, and it reduces the number of I/O reads on the local disk, which improves local disk performance.
Referring to FIG. 3, a third flowchart of the method for optimizing access to massive small files provided by an embodiment of the present application specifically includes:
301. According to a preset dynamic file division rule, divide the files to be stored and store them in the corresponding storage pools, where the storage pools include a local disk storage pool and a high-speed disk storage pool;
302. Set up a redundant array of independent disks (RAID) structure for the high-speed disk storage pool based on disk array technology;
303. Obtain a file read request, perform the corresponding file read operation in the storage pool according to the read request, and generate a read operation record for the file;
304. From the read operation records of the files, obtain the number of reads of every file in the local disk storage pool within a preset time period;
In this example, the high-speed disk (SSD) is used like memory to cache data, statistics for all files in the local disk storage pool are maintained in memory, and every file access record is cached.
305. If the number of reads is greater than a preset first threshold, transfer the files whose number of reads is greater than the first threshold to the high-speed disk storage pool, and generate in memory a record of the storage location of each such file in the high-speed disk storage pool;
When the number of reads of a file over a period of time is found to exceed the preset threshold, the file is determined to be frequently accessed. The file is transferred to the high-speed disk (SSD), and its storage address on the high-speed disk (logical volume + file group identification number + file sequence number) is cached in memory.
306. When a file whose number of reads is greater than the first threshold is read again, redirect the read directly to the corresponding file according to the record of that file's storage location in the high-speed disk storage pool.
When the file is accessed again, memory is first checked for an address cache record of the file. If one exists, the file address in the record is accessed directly and the backup copy of the file on the high-speed disk is read.
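Steps 304 to 306 can be illustrated with a small in-memory model. The class `RedirectingReader`, the tuple used as a storage address, and the omission of the preset time window (all reads are assumed to fall within a single window) are simplifications made for this sketch and are not details of the application.

```python
from collections import Counter


class RedirectingReader:
    """Toy model of read counting, promotion, and redirected reads."""

    def __init__(self, local_pool, first_threshold):
        self.local_pool = local_pool      # file name -> bytes on the local disk pool
        self.fast_pool = {}               # address -> bytes on the high-speed pool
        self.location_records = {}        # in-memory records: file name -> address
        self.read_counts = Counter()      # reads per file in the current window
        self.first_threshold = first_threshold

    def read(self, name):
        """Return (data, source), where source is "fast" or "local"."""
        address = self.location_records.get(name)
        if address is not None:
            # A location record exists: redirect to the high-speed pool.
            return self.fast_pool[address], "fast"
        data = self.local_pool[name]
        self.read_counts[name] += 1
        if self.read_counts[name] > self.first_threshold:
            # Hypothetical address: (logical volume, slot in the fast pool).
            address = ("volume0", len(self.fast_pool))
            self.fast_pool[address] = data
            self.location_records[name] = address
        return data, "local"


reader = RedirectingReader({"f": b"payload"}, first_threshold=2)
sources = [reader.read("f")[1] for _ in range(4)]
```

With a first threshold of 2, the third read pushes the count past the threshold and copies the file to the high-speed pool, so the fourth read is served from there.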
This embodiment of the present application describes in detail the redirected read process after a high-frequency file has been migrated to the high-speed disk. Redirected reads are served from the high-speed disk, so files are read faster and the performance of the local disk is not affected.
The method for optimizing access to massive small files in the embodiments of the present application has been described above. The apparatus for optimizing access to massive small files is described below. Referring to FIG. 4, one embodiment of the apparatus for optimizing access to massive small files in the embodiments of the present application includes:
a file storage module 401, configured to divide the files to be stored according to a preset dynamic file division rule and store them in the corresponding storage pools, where the storage pools include a local disk storage pool and a high-speed disk storage pool;
a disk structure optimization module 402, configured to set up a redundant array of independent disks (RAID) structure for the high-speed disk storage pool based on disk array technology;
a file reading module 403, configured to obtain a file read request, perform the corresponding file read operation in the storage pool according to the read request, and generate a read operation record for the file;
a file access optimization module 404, configured to filter the files in the local disk storage pool based on the read operation records of the files, transfer the filtered files to the high-speed disk storage pool, and, when a read request for a filtered file is received, redirect the request to the filtered file for reading.
In this embodiment of the present application, a high-speed disk is added as an auxiliary disk alongside the conventional local mechanical disk. The high-speed disk assists the local disk in storing and reading frequently accessed files, which reduces the number of I/O operations on the local disk. Because the high-speed disk also offers better performance, files are read faster, which improves local file access performance.
Referring to FIG. 5, another embodiment of the apparatus for optimizing access to massive small files in the embodiments of the present application includes:
a file storage module 401, configured to divide the files to be stored according to a preset dynamic file division rule and store them in the corresponding storage pools, where the storage pools include a local disk storage pool and a high-speed disk storage pool;
a disk structure optimization module 402, configured to set up a redundant array of independent disks (RAID) structure for the high-speed disk storage pool based on disk array technology;
a file reading module 403, configured to obtain a file read request, perform the corresponding file read operation in the storage pool according to the read request, and generate a read operation record for the file;
a file access optimization module 404, configured to filter the files in the local disk storage pool based on the read operation records of the files, transfer the filtered files to the high-speed disk storage pool, and, when a read request for a filtered file is received, redirect the request to the filtered file for reading.
Optionally, the file storage module 401 includes:
a file division unit 4011, configured to obtain the preset expected access frequency parameter of every file to be stored and, according to the expected access frequency parameters, divide the files to be stored into a high-frequency access file class and a low-frequency access file class;
a file writing unit 4012, configured to sequentially write the files in the high-frequency access file class into storage primitives in the high-speed disk storage pool, and to sequentially write the files in the low-frequency access file class into storage primitives in the local disk storage pool;
an index obtaining unit 4013, configured to determine, according to the start address and size of the file on the storage primitive, the file group to which the file belongs and the sequence number of the file within that file group, where a file group contains at least two sequentially stored files;
an index association unit 4014, configured to use the identification number of the file group and the sequence number of the file as an index and to establish a correspondence between the index and the file name of the file.
Optionally, the file reading module 403 includes:
a request obtaining unit 4031, configured to obtain a file read request, where the file read request includes the target virtual disk partition where the file to be read is located and the virtual logical address of the file to be read;
a partition obtaining unit 4032, configured to determine the target physical disk partition corresponding to the target virtual disk partition according to the mapping between virtual disk partitions and physical disk partitions;
a file reading unit 4033, configured to perform the corresponding file read operation on the target physical disk partition according to the virtual logical address of the file to be read and the file read request, and to generate a read operation record for the file.
Optionally, the file access optimization module 404 includes:
a read count obtaining unit 4041, configured to obtain, from the read operation records of the files, the number of reads of every file in the local disk storage pool within a preset time period;
a data transmission unit 4042, configured to, if the number of reads is greater than a preset first threshold, transfer the files whose number of reads is greater than the first threshold to the high-speed disk storage pool, and to generate in memory a record of the storage location of each such file in the high-speed disk storage pool;
a redirected reading unit 4043, configured to, when a file whose number of reads is greater than the first threshold is read again, redirect the read directly to the corresponding file according to the record of that file's storage location in the high-speed disk storage pool.
In this embodiment of the present application, the files to be stored are divided into high-frequency and low-frequency access files according to their expected access frequency parameters and are stored on different disks according to the division result, which makes the files easier to read. In addition, redirected reads are served from the high-speed disk, so files are read faster and the performance of the local disk is not affected.
FIG. 4 and FIG. 5 above describe in detail the apparatus for optimizing access to massive small files in the embodiments of the present application from the perspective of modular functional entities. The device for optimizing access to massive small files in the embodiments of the present application is described in detail below from the perspective of hardware processing.
FIG. 6 is a schematic structural diagram of a device for optimizing access to massive small files provided by an embodiment of the present application. The device 600 for optimizing access to massive small files may vary considerably depending on configuration or performance, and may include one or more central processing units (CPUs) 610 (for example, one or more processors), a memory 620, and one or more storage media 630 (for example, one or more mass storage devices) storing application programs 633 or data 632. The memory 620 and the storage medium 630 may provide transient or persistent storage. The program stored in the storage medium 630 may include one or more modules (not shown in the figure), and each module may include a series of instruction operations for the device 600. Furthermore, the processor 610 may be configured to communicate with the storage medium 630 and to execute, on the device 600 for optimizing access to massive small files, the series of instruction operations in the storage medium 630.
The device 600 for optimizing access to massive small files may also include one or more power supplies 640, one or more wired or wireless network interfaces 650, one or more input/output interfaces 660, and/or one or more operating systems 631, such as Windows Server, Mac OS X, Unix, Linux, or FreeBSD. Those skilled in the art will understand that the device structure shown in FIG. 6 does not limit the device for optimizing access to massive small files, which may include more or fewer components than shown, combine certain components, or arrange the components differently.
The present application further provides a massive small file access optimization device, including a memory and at least one processor, where instructions are stored in the memory and the memory and the at least one processor are interconnected by a line; the at least one processor invokes the instructions in the memory so that the device executes the steps of the massive small file access optimization method described above.
The present application further provides a computer-readable storage medium, which may be non-volatile or volatile. The computer-readable storage medium stores computer instructions that, when run on a computer, cause the computer to perform the following steps:
dividing files to be stored according to a preset dynamic file division rule and storing them in the corresponding storage pool, the storage pools including a local disk storage pool and a high-speed disk storage pool;
setting up a redundant array of independent disks (RAID) structure for the high-speed disk storage pool based on disk array technology;
obtaining a read request for a file, performing the corresponding file read operation in the storage pool according to the read request, and generating a read operation record for the file;
filtering the files in the local disk storage pool based on the read operation records of the files, transferring the filtered files to the high-speed disk storage pool, and, when a read request for a filtered file is received, redirecting to the filtered file for reading.
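The read-record, filtering and redirection steps above can be illustrated with a minimal Python sketch. It is not the implementation of the application: the class `TieredStore`, its attribute names and the `promote_threshold` parameter are hypothetical, the two pools are modelled as in-memory dictionaries, the RAID configuration and the preset time window are omitted, and keeping a copy in the local pool after promotion is an assumption.

```python
from collections import Counter


class TieredStore:
    """Toy two-tier store: a slow local pool and a fast high-speed pool."""

    def __init__(self, promote_threshold=3):
        self.local_pool = {}       # file name -> bytes (local disk pool stand-in)
        self.fast_pool = {}        # file name -> bytes (high-speed pool stand-in)
        self.read_log = Counter()  # read operation record: name -> local reads
        self.fast_location = {}    # in-memory record: name -> key in fast_pool
        self.promote_threshold = promote_threshold

    def store(self, name, data, expected_hot=False):
        # Division rule (assumed): a caller-supplied flag marks expected hot files.
        if expected_hot:
            self.fast_pool[name] = data
            self.fast_location[name] = name
        else:
            self.local_pool[name] = data

    def read(self, name):
        # Redirect to the high-speed pool when a location record exists.
        key = self.fast_location.get(name)
        if key is not None:
            return self.fast_pool[key], "fast"
        data = self.local_pool[name]
        self.read_log[name] += 1  # generate the read operation record
        return data, "local"

    def promote_hot_files(self):
        # Filter local files whose read count exceeds the threshold and copy
        # them to the high-speed pool (the local copy is kept; assumption).
        promoted = []
        for name, count in self.read_log.items():
            if count > self.promote_threshold and name in self.local_pool:
                self.fast_pool[name] = self.local_pool[name]
                self.fast_location[name] = name
                promoted.append(name)
        for name in promoted:
            del self.read_log[name]
        return promoted
```

In this sketch a later `read` of a promoted file is served from `fast_pool` through the `fast_location` record, which mirrors the redirection described in the final step.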
Those skilled in the art will clearly understand that, for convenience and brevity of description, the specific working processes of the systems, apparatuses and units described above may be found in the corresponding processes of the foregoing method embodiments and are not repeated here.
If the integrated unit is implemented as a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. On this understanding, the technical solution of the present application in essence, or the part that contributes over the prior art, or all or part of the technical solution, may be embodied as a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or some of the steps of the methods described in the embodiments of the present application. The storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
The above embodiments are intended only to illustrate the technical solutions of the present application, not to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in those embodiments may still be modified, or some of their technical features may be replaced with equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present application.

Claims (20)

  1. A method for optimizing access to massive small files, comprising:
    dividing files to be stored according to a preset dynamic file division rule and storing them in the corresponding storage pool, the storage pools including a local disk storage pool and a high-speed disk storage pool;
    setting up a redundant array of independent disks (RAID) structure for the high-speed disk storage pool based on disk array technology;
    obtaining a read request for a file, performing the corresponding file read operation in the storage pool according to the read request, and generating a read operation record for the file;
    filtering the files in the local disk storage pool based on the read operation records of the files, transferring the filtered files to the high-speed disk storage pool, and, when a read request for a filtered file is received, redirecting to the filtered file for reading.
  2. The method for optimizing access to massive small files according to claim 1, wherein dividing the files to be stored according to the preset dynamic file division rule and storing them in the corresponding storage pool comprises:
    obtaining a preset expected access frequency parameter for every file to be stored, and dividing the files to be stored into a high-frequency access file class and a low-frequency access file class according to the expected access frequency parameter;
    sequentially writing the files in the high-frequency access file class to storage primitives in the high-speed disk storage pool, and sequentially writing the files in the low-frequency access file class to storage primitives in the local disk storage pool;
    determining, according to the starting address and size of a file in the storage primitive, the file group to which the file belongs and the sequence number of the file within that file group, the file group containing at least two sequentially stored files;
    using the identification number of the file group and the sequence number of the file as an index, and establishing a correspondence between the index and the file name of the file.
  3. The method for optimizing access to massive small files according to claim 1, wherein obtaining the read request for a file, performing the corresponding file read operation in the storage pool according to the read request, and generating the read operation record for the file comprises:
    obtaining a file read request, the file read request including the target virtual disk partition in which the file to be read is located and the virtual logical address of the file to be read;
    determining the target physical disk partition corresponding to the target virtual disk partition according to a mapping between virtual disk partitions and physical disk partitions;
    performing the corresponding file read operation on the target physical disk partition according to the virtual logical address of the file to be read and the file read request, and generating a read operation record for the file.
  4. The method for optimizing access to massive small files according to claim 3, wherein, after performing the corresponding file read operation on the target physical disk partition according to the virtual logical address of the file to be read and the file read request and generating the read operation record for the file, the method further comprises:
    obtaining the access frequency of each file in the high-speed disk storage pool;
    determining whether the access frequency of each file is less than a preset threshold;
    if the access frequency is less than the preset threshold, deleting the file whose access frequency is less than the preset threshold.
  5. The method for optimizing access to massive small files according to any one of claims 1-4, wherein filtering the files in the local disk storage pool based on the read operation records of the files, transferring the filtered files to the high-speed disk storage pool, and, when a read request for a filtered file is received, redirecting to the filtered file for reading comprises:
    obtaining, from the read operation records of the files, the number of reads of every file in the local disk storage pool within a preset time period;
    if the number of reads is greater than a preset first threshold, transferring the file whose number of reads is greater than the first threshold to the high-speed disk storage pool, and generating in memory a record of the storage location of that file in the high-speed disk storage pool;
    when the file whose number of reads is greater than the first threshold is read again, redirecting directly to the corresponding file for reading according to the record of its storage location in the high-speed disk storage pool.
  6. The method for optimizing access to massive small files according to claim 5, wherein, after redirecting directly to the corresponding file for reading according to the record of its storage location in the high-speed disk storage pool when the file whose number of reads is greater than the first threshold is read again, the method further comprises:
    receiving a file overwrite request;
    determining, according to the overwrite request, the file to be overwritten and the record of the storage location of the file to be overwritten in memory;
    deleting the record of the storage location of the file to be overwritten from memory;
    deleting the file to be overwritten from the high-speed disk storage pool and writing the new file.
  7. A device for optimizing access to massive small files, comprising a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor, wherein the processor implements the following steps when executing the computer-readable instructions:
    dividing files to be stored according to a preset dynamic file division rule and storing them in the corresponding storage pool, the storage pools including a local disk storage pool and a high-speed disk storage pool;
    setting up a redundant array of independent disks (RAID) structure for the high-speed disk storage pool based on disk array technology;
    obtaining a read request for a file, performing the corresponding file read operation in the storage pool according to the read request, and generating a read operation record for the file;
    filtering the files in the local disk storage pool based on the read operation records of the files, transferring the filtered files to the high-speed disk storage pool, and, when a read request for a filtered file is received, redirecting to the filtered file for reading.
  8. The device for optimizing access to massive small files according to claim 7, wherein dividing the files to be stored according to the preset dynamic file division rule and storing them in the corresponding storage pool comprises:
    obtaining a preset expected access frequency parameter for every file to be stored, and dividing the files to be stored into a high-frequency access file class and a low-frequency access file class according to the expected access frequency parameter;
    sequentially writing the files in the high-frequency access file class to storage primitives in the high-speed disk storage pool, and sequentially writing the files in the low-frequency access file class to storage primitives in the local disk storage pool;
    determining, according to the starting address and size of a file in the storage primitive, the file group to which the file belongs and the sequence number of the file within that file group, the file group containing at least two sequentially stored files;
    using the identification number of the file group and the sequence number of the file as an index, and establishing a correspondence between the index and the file name of the file.
  9. The device for optimizing access to massive small files according to claim 7, wherein obtaining the read request for a file, performing the corresponding file read operation in the storage pool according to the read request, and generating the read operation record for the file comprises:
    obtaining a file read request, the file read request including the target virtual disk partition in which the file to be read is located and the virtual logical address of the file to be read;
    determining the target physical disk partition corresponding to the target virtual disk partition according to a mapping between virtual disk partitions and physical disk partitions;
    performing the corresponding file read operation on the target physical disk partition according to the virtual logical address of the file to be read and the file read request, and generating a read operation record for the file.
  10. The device for optimizing access to massive small files according to claim 9, wherein, after performing the corresponding file read operation on the target physical disk partition according to the virtual logical address of the file to be read and the file read request and generating the read operation record for the file, the steps further comprise:
    obtaining the access frequency of each file in the high-speed disk storage pool;
    determining whether the access frequency of each file is less than a preset threshold;
    if the access frequency is less than the preset threshold, deleting the file whose access frequency is less than the preset threshold.
  11. The device for optimizing access to massive small files according to any one of claims 7-10, wherein filtering the files in the local disk storage pool based on the read operation records of the files, transferring the filtered files to the high-speed disk storage pool, and, when a read request for a filtered file is received, redirecting to the filtered file for reading comprises:
    obtaining, from the read operation records of the files, the number of reads of every file in the local disk storage pool within a preset time period;
    if the number of reads is greater than a preset first threshold, transferring the file whose number of reads is greater than the first threshold to the high-speed disk storage pool, and generating in memory a record of the storage location of that file in the high-speed disk storage pool;
    when the file whose number of reads is greater than the first threshold is read again, redirecting directly to the corresponding file for reading according to the record of its storage location in the high-speed disk storage pool.
  12. The device for optimizing access to massive small files according to claim 11, wherein, after redirecting directly to the corresponding file for reading according to the record of its storage location in the high-speed disk storage pool when the file whose number of reads is greater than the first threshold is read again, the steps further comprise:
    receiving a file overwrite request;
    determining, according to the overwrite request, the file to be overwritten and the record of the storage location of the file to be overwritten in memory;
    deleting the record of the storage location of the file to be overwritten from memory;
    deleting the file to be overwritten from the high-speed disk storage pool and writing the new file.
  13. A computer-readable storage medium storing computer instructions that, when run on a computer, cause the computer to perform the following steps:
    dividing files to be stored according to a preset dynamic file division rule and storing them in the corresponding storage pool, the storage pools including a local disk storage pool and a high-speed disk storage pool;
    setting up a redundant array of independent disks (RAID) structure for the high-speed disk storage pool based on disk array technology;
    obtaining a read request for a file, performing the corresponding file read operation in the storage pool according to the read request, and generating a read operation record for the file;
    filtering the files in the local disk storage pool based on the read operation records of the files, transferring the filtered files to the high-speed disk storage pool, and, when a read request for a filtered file is received, redirecting to the filtered file for reading.
  14. The computer-readable storage medium according to claim 13, wherein dividing the files to be stored according to the preset dynamic file division rule and storing them in the corresponding storage pool comprises:
    obtaining a preset expected access frequency parameter for every file to be stored, and dividing the files to be stored into a high-frequency access file class and a low-frequency access file class according to the expected access frequency parameter;
    sequentially writing the files in the high-frequency access file class to storage primitives in the high-speed disk storage pool, and sequentially writing the files in the low-frequency access file class to storage primitives in the local disk storage pool;
    determining, according to the starting address and size of a file in the storage primitive, the file group to which the file belongs and the sequence number of the file within that file group, the file group containing at least two sequentially stored files;
    using the identification number of the file group and the sequence number of the file as an index, and establishing a correspondence between the index and the file name of the file.
  15. The computer-readable storage medium according to claim 13, wherein obtaining the read request for a file, performing the corresponding file read operation in the storage pool according to the read request, and generating the read operation record for the file comprises:
    obtaining a file read request, the file read request including the target virtual disk partition in which the file to be read is located and the virtual logical address of the file to be read;
    determining the target physical disk partition corresponding to the target virtual disk partition according to a mapping between virtual disk partitions and physical disk partitions;
    performing the corresponding file read operation on the target physical disk partition according to the virtual logical address of the file to be read and the file read request, and generating a read operation record for the file.
  16. The computer-readable storage medium according to claim 15, wherein, after performing the corresponding file read operation on the target physical disk partition according to the virtual logical address of the file to be read and the file read request and generating the read operation record for the file, the steps further comprise:
    obtaining the access frequency of each file in the high-speed disk storage pool;
    determining whether the access frequency of each file is less than a preset threshold;
    if the access frequency is less than the preset threshold, deleting the file whose access frequency is less than the preset threshold.
  17. The computer-readable storage medium according to any one of claims 13-16, wherein filtering the files in the local disk storage pool based on the read operation records of the files, transferring the filtered files to the high-speed disk storage pool, and, when a read request for a filtered file is received, redirecting to the filtered file for reading comprises:
    obtaining, from the read operation records of the files, the number of reads of every file in the local disk storage pool within a preset time period;
    if the number of reads is greater than a preset first threshold, transferring the file whose number of reads is greater than the first threshold to the high-speed disk storage pool, and generating in memory a record of the storage location of that file in the high-speed disk storage pool;
    when the file whose number of reads is greater than the first threshold is read again, redirecting directly to the corresponding file for reading according to the record of its storage location in the high-speed disk storage pool.
  18. The computer-readable storage medium according to claim 17, wherein, after redirecting directly to the corresponding file for reading according to the record of its storage location in the high-speed disk storage pool when the file whose number of reads is greater than the first threshold is read again, the steps further comprise:
    receiving a file overwrite request;
    determining, according to the overwrite request, the file to be overwritten and the record of the storage location of the file to be overwritten in memory;
    deleting the record of the storage location of the file to be overwritten from memory;
    deleting the file to be overwritten from the high-speed disk storage pool and writing the new file.
  19. An apparatus for optimizing access to massive small files, comprising:
    a file storage module, configured to divide files to be stored according to a preset dynamic file division rule and store them in the corresponding storage pool, the storage pools including a local disk storage pool and a high-speed disk storage pool;
    a disk structure optimization module, configured to set up a redundant array of independent disks (RAID) structure for the high-speed disk storage pool based on disk array technology;
    a file reading module, configured to obtain a read request for a file, perform the corresponding file read operation in the storage pool according to the read request, and generate a read operation record for the file;
    a file access optimization module, configured to filter the files in the local disk storage pool based on the read operation records of the files, transfer the filtered files to the high-speed disk storage pool, and, when a read request for a filtered file is received, redirect to the filtered file for reading.
  20. The apparatus for optimizing access to massive small files according to claim 19, wherein the file storage module specifically comprises:
    a file dividing unit, configured to obtain a preset expected access frequency parameter for every file to be stored, and to divide the files to be stored into a high-frequency access file class and a low-frequency access file class according to the expected access frequency parameter;
    a file writing unit, configured to sequentially write the files in the high-frequency access file class to storage primitives in the high-speed disk storage pool, and to sequentially write the files in the low-frequency access file class to storage primitives in the local disk storage pool;
    an index obtaining unit, configured to determine, according to the starting address and size of a file in the storage primitive, the file group to which the file belongs and the sequence number of the file within that file group, the file group containing at least two sequentially stored files;
    an index association unit, configured to use the identification number of the file group and the sequence number of the file as an index and to establish a correspondence between the index and the file name of the file.
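
The file-group indexing recited in claims 2, 8, 14 and 20 can be illustrated with a short Python sketch. The claims do not state how the starting address and size determine the file group, so the rule used here (a file belongs to the fixed-size address span in which its starting address falls) is an assumption, as are the class name `PackedPrimitive` and the parameter `group_span`. The sketch also does not enforce the requirement that a group contain at least two files.

```python
class PackedPrimitive:
    """Toy storage primitive: small files appended back to back in one buffer."""

    def __init__(self, group_span=1024):
        self.buffer = bytearray()
        self.group_span = group_span  # hypothetical: bytes of address space per group
        self.groups = {}              # group id -> list of (start address, size)
        self.index_to_name = {}       # (group id, sequence number) -> file name
        self.name_to_index = {}       # file name -> (group id, sequence number)

    def append(self, name, data):
        # Sequential write: the file starts where the previous one ended.
        start = len(self.buffer)
        self.buffer.extend(data)
        # Assumed grouping rule: the group is chosen by the starting address.
        group_id = start // self.group_span
        members = self.groups.setdefault(group_id, [])
        seq = len(members)            # sequence number within the group
        members.append((start, len(data)))
        index = (group_id, seq)
        self.index_to_name[index] = name
        self.name_to_index[name] = index
        return index

    def read(self, name):
        # Resolve name -> index -> (start, size), then slice the buffer.
        group_id, seq = self.name_to_index[name]
        start, size = self.groups[group_id][seq]
        return bytes(self.buffer[start:start + size])
```

Looking a file up by name then costs two dictionary lookups and one slice of the packed buffer, rather than one file-system lookup per small file.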
PCT/CN2022/089529 2021-04-30 2022-04-27 Access optimization method, apparatus and device for large quantity of small files, and storage medium WO2022228458A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110484057.3A CN113176857A (en) 2021-04-30 2021-04-30 Massive small file access optimization method, device, equipment and storage medium
CN202110484057.3 2021-04-30

Publications (1)

Publication Number Publication Date
WO2022228458A1 true WO2022228458A1 (en) 2022-11-03

Family

ID=76925818

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/089529 WO2022228458A1 (en) 2021-04-30 2022-04-27 Access optimization method, apparatus and device for large quantity of small files, and storage medium

Country Status (2)

Country Link
CN (1) CN113176857A (en)
WO (1) WO2022228458A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116069263A (en) * 2023-03-07 2023-05-05 苏州浪潮智能科技有限公司 File system optimization method, device, server, equipment and storage medium
CN117640626A (en) * 2024-01-25 2024-03-01 合肥中科类脑智能技术有限公司 File transmission method, device and system

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113176857A (en) * 2021-04-30 2021-07-27 康键信息技术(深圳)有限公司 Massive small file access optimization method, device, equipment and storage medium
CN114033224B (en) * 2021-09-26 2023-04-07 烟台杰瑞石油服务集团股份有限公司 Resource access method and device
CN114020216B (en) * 2021-11-03 2024-03-08 南京中孚信息技术有限公司 Method for improving small-capacity file tray-drop speed
CN114489475A (en) * 2021-12-01 2022-05-13 阿里巴巴(中国)有限公司 Distributed storage system and data storage method thereof
CN115904263B (en) * 2023-03-10 2023-05-23 浪潮电子信息产业股份有限公司 Data migration method, system, equipment and computer readable storage medium
CN117076387B (en) * 2023-08-22 2024-03-01 北京天华星航科技有限公司 Quick gear restoration system for mass small files based on magnetic tape
CN117376289B (en) * 2023-10-11 2024-04-12 哈尔滨工业大学 Network disk data scheduling method for road monitoring data use application
CN117170590B (en) * 2023-11-03 2024-01-26 沈阳卓志创芯科技有限公司 Computer data storage method and system based on cloud computing

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103076993A (en) * 2012-12-28 2013-05-01 北京思特奇信息技术股份有限公司 Storage system and method for concentration type system
CN105138290A (en) * 2015-08-20 2015-12-09 浪潮(北京)电子信息产业有限公司 High-performance storage pool organization method and device
US20160179379A1 (en) * 2014-12-23 2016-06-23 Teradata Us, Inc. System and method for data management across volatile and non-volatile storage technologies
CN106406759A (en) * 2016-09-13 2017-02-15 郑州云海信息技术有限公司 Data storage method and device
CN106777342A (en) * 2017-01-16 2017-05-31 湖南大学 A kind of HPFS mixing energy-conservation storage system and method based on reliability
CN113176857A (en) * 2021-04-30 2021-07-27 康键信息技术(深圳)有限公司 Massive small file access optimization method, device, equipment and storage medium

Family Cites Families (2)

Publication number Priority date Publication date Assignee Title
CN102662992B (en) * 2012-03-14 2014-10-08 北京搜狐新媒体信息技术有限公司 Method and device for storing and accessing massive small files
US9043530B1 (en) * 2012-04-09 2015-05-26 Netapp, Inc. Data storage within hybrid storage aggregate

Patent Citations (6)

Publication number Priority date Publication date Assignee Title
CN103076993A (en) * 2012-12-28 2013-05-01 北京思特奇信息技术股份有限公司 Storage system and method for concentration type system
US20160179379A1 (en) * 2014-12-23 2016-06-23 Teradata Us, Inc. System and method for data management across volatile and non-volatile storage technologies
CN105138290A (en) * 2015-08-20 2015-12-09 浪潮(北京)电子信息产业有限公司 High-performance storage pool organization method and device
CN106406759A (en) * 2016-09-13 2017-02-15 郑州云海信息技术有限公司 Data storage method and device
CN106777342A (en) * 2017-01-16 2017-05-31 湖南大学 A kind of HPFS mixing energy-conservation storage system and method based on reliability
CN113176857A (en) * 2021-04-30 2021-07-27 康键信息技术(深圳)有限公司 Massive small file access optimization method, device, equipment and storage medium

Cited By (3)

Publication number Priority date Publication date Assignee Title
CN116069263A (en) * 2023-03-07 2023-05-05 苏州浪潮智能科技有限公司 File system optimization method, device, server, equipment and storage medium
CN117640626A (en) * 2024-01-25 2024-03-01 合肥中科类脑智能技术有限公司 File transmission method, device and system
CN117640626B (en) * 2024-01-25 2024-04-26 合肥中科类脑智能技术有限公司 File transmission method, device and system

Also Published As

Publication number Publication date
CN113176857A (en) 2021-07-27

Similar Documents

Publication Publication Date Title
WO2022228458A1 (en) Access optimization method, apparatus and device for large quantity of small files, and storage medium
Dong et al. Optimizing Space Amplification in RocksDB.
US10761758B2 (en) Data aware deduplication object storage (DADOS)
JP5087467B2 (en) Method and apparatus for managing data compression and integrity in a computer storage system
US10169365B2 (en) Multiple deduplication domains in network storage system
US8290972B1 (en) System and method for storing and accessing data using a plurality of probabilistic data structures
US9798728B2 (en) System performing data deduplication using a dense tree data structure
US7389382B2 (en) ISCSI block cache and synchronization technique for WAN edge device
US8171253B2 (en) Virtual disk mapping
US8712963B1 (en) Method and apparatus for content-aware resizing of data chunks for replication
US8412682B2 (en) System and method for retrieving and using block fingerprints for data deduplication
US7984259B1 (en) Reducing load imbalance in a storage system
US10210188B2 (en) Multi-tiered data storage in a deduplication system
US11580162B2 (en) Key value append
US8131688B2 (en) Storage system data compression enhancement
KR20170054299A (en) Reference block aggregating into a reference set for deduplication in memory management
US10719450B2 (en) Storage of run-length encoded database column data in non-volatile memory
US8140886B2 (en) Apparatus, system, and method for virtual storage access method volume data set recovery
US8924642B2 (en) Monitoring record management method and device
US11593312B2 (en) File layer to block layer communication for selective data reduction
US20220269431A1 (en) Data processing method and storage device
US11144533B1 (en) Inline deduplication using log based storage
US10922027B2 (en) Managing data storage in storage systems
US20240143449A1 (en) Data Processing Method and Apparatus
US20130007363A1 (en) Control device and control method

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 22794928

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE