CN114442937B - File caching method and device, computer equipment and storage medium - Google Patents

File caching method and device, computer equipment and storage medium Download PDF

Info

Publication number
CN114442937B
CN114442937B (application CN202111663107.0A)
Authority
CN
China
Prior art keywords
file
directory
hash value
read
cached
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111663107.0A
Other languages
Chinese (zh)
Other versions
CN114442937A (en)
Inventor
高华龙
冯玉朋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Yunkuanzhiye Network Technology Co ltd
Original Assignee
Beijing Yunkuanzhiye Network Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Yunkuanzhiye Network Technology Co ltd filed Critical Beijing Yunkuanzhiye Network Technology Co ltd
Priority to CN202111663107.0A priority Critical patent/CN114442937B/en
Publication of CN114442937A publication Critical patent/CN114442937A/en
Application granted granted Critical
Publication of CN114442937B publication Critical patent/CN114442937B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0668Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/0671In-line storage system
    • G06F3/0683Plurality of storage devices
    • G06F3/0689Disk arrays, e.g. RAID, JBOD
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0602Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/061Improving I/O performance
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0628Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0638Organizing or formatting or addressing of data
    • G06F3/0643Management of files
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0628Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0655Vertical data movement, i.e. input-output transfer; data movement between one or more hosts and one or more storage devices
    • G06F3/0656Data buffering arrangements
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a file caching method and apparatus, a computer device, and a storage medium. The method comprises the following steps: in response to a read-write instruction for a target file, calculating a hash value corresponding to the file name of the target file; performing a modulo operation on the hash value with the number of directories mounted under the cache directory to obtain the remainder of the hash value; and determining a unique path for the target file under the cache directory based on the remainder of the hash value. By applying the scheme of the invention, the read-write performance of cached small files can be improved, server resources can be fully utilized, and dependence on the file system can be reduced.

Description

File caching method and device, computer equipment and storage medium
Technical Field
The present invention relates to the field of disk data storage technologies, and in particular, to a file caching method and apparatus, a computer device, and a storage medium.
Background
A file cache is generally used to store a large number of temporary, fragmented files quickly. To improve cache performance, the problem is usually approached from the disk-media side: besides using higher-performance devices, RAID technology can be used to reorganize multiple disks, gaining stability and performance by increasing the number of disk media. RAID (Redundant Array of Independent Disks) denotes "an array with redundancy capability made up of independent disks": multiple independent disks are combined in different ways into a disk group of large capacity, providing higher storage performance than a single disk together with a data-backup capability, and improving the efficiency of the whole disk system through the additive effect of the individual disks supplying data in parallel. With this technique, data is divided into multiple segments, each stored on a separate hard disk.
In the prior art, the common RAID schemes are RAID0, RAID1, RAID5, RAID6, and RAID10. RAID0 only improves read-write performance and provides no data redundancy: several hard disks are combined into one large logical disk, and on reads and writes contiguous data is cut into blocks according to a rule and striped across the disks so that they operate simultaneously, increasing the read-write bandwidth roughly N-fold through parallel operation of the disks. However, because it has no redundancy, damage to any one disk affects almost all of the data. RAID1 is the opposite of RAID0: it provides only data redundancy, its read-write performance is bounded by the disk with the smallest bandwidth, and its merit is data safety; the data survives as long as one disk in the group does. RAID5 is currently the most widely used scheme; it is equivalent to adding one disk's worth of parity on top of RAID0, so damage to one disk does not cause data loss, and to spread the parity load RAID5 writes the parity data to all disks in turn (which is also what distinguishes RAID5 from RAID3). RAID6 mainly adds a second redundant block compared with RAID5. Still, for continuous small IO (IO size smaller than the cache block size), RAID5 does not improve sequential read performance over a single disk, and its write performance is actually reduced by the overwrite problem.
As the above shows, conventional RAID generally requires a group of disks of the same specification, whereas a server not only commonly mounts several local disks but may also attach further disks by other means (such as a disk expansion cabinet). Faced with such heterogeneous devices, RAID clearly has limitations if disks from multiple sources are to form one cache disk group. Moreover, RAID exposes a block device and cannot directly provide a file-level read-write interface, so the block device must be managed through an intermediate layer, namely a file system. Existing file systems are numerous and, beyond the basic create, read, write, and delete functions, have varied features: some support larger storage spaces, some support more files, some support very long file names, and some support very large single files; users often choose different file systems for specific business requirements. It is therefore necessary to provide a multi-hard-disk file caching scheme that resolves these technical drawbacks.
Disclosure of Invention
The technical problems to be solved by the invention are as follows: in the prior art, simply applying RAID yields only a limited improvement in small-file read-write performance and can even reduce it; RAID supports disks of differing specifications poorly, which prevents full use of server resources; and, in typical caching business scenarios, most file systems suffer from limits such as bounded capacity, a bounded number of files per directory, or slow read-write speed once the number of files in a directory grows. Under rich and demanding business loads, existing RAID technology has difficulty sustaining performance in large-capacity, high-file-count scenarios.
In order to solve the technical problem, the invention provides a file caching method, which comprises the following steps:
in response to a read-write instruction for a target file, calculating a hash value corresponding to the file name of the target file;
performing a modulo operation on the hash value with the number of directories mounted under the cache directory to obtain the remainder of the hash value;
and determining a unique path for the target file under the cache directory based on the remainder of the hash value.
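A minimal sketch of the three claimed steps follows. The claims fix neither a hash algorithm nor a directory layout, so the use of MD5 and the flat `<cache>/<remainder>/<name>` layout here are purely illustrative assumptions:

```python
import hashlib
import os


def cache_path(filename: str, cache_dir: str, mount_count: int) -> str:
    """Map a file name to a unique path under the cache directory.

    Mirrors the claimed method: hash the file name, take the remainder
    modulo the number of mounted directories, and let that remainder
    select the mounted sub-directory.
    """
    # Step 1: hash the file name (MD5 is an assumption; any stable hash works).
    digest = hashlib.md5(filename.encode("utf-8")).hexdigest()
    hash_value = int(digest, 16)
    # Step 2: remainder modulo the number of directories mounted under the cache.
    remainder = hash_value % mount_count
    # Step 3: the remainder picks the mounted directory; the file keeps its name.
    return os.path.join(cache_dir, str(remainder), filename)


p = cache_path("example.bin", "/cache", 3)
```

Because the path depends only on the file name, every read or write of the same file resolves to the same mounted directory without any lookup table.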
Optionally, the calculating a hash value corresponding to the filename of the target file includes:
and carrying out hash calculation on the file name of the target file based on a hash algorithm to obtain a hash value corresponding to the file name of the target file.
Optionally, the determining, based on the remainder of the hash value, a unique path of the target file under the cache directory includes:
obtaining the total number of cached files by dividing the total cache capacity by the average size of a cached file;
determining the number of leaf directories required for caching by dividing the total number of cached files by the maximum number of files per directory;
and determining the unique path of the target file under the cache directory according to the number of leaf directories and the maximum directory depth of the file system.
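The capacity arithmetic in the optional steps above can be sketched as follows. The example figures (a 1 TiB cache of roughly 64 KiB files, at most 10,000 files per directory) are assumptions for illustration, not values from the patent:

```python
def leaf_directory_count(total_capacity: int, avg_file_size: int,
                         max_files_per_dir: int) -> int:
    """Estimate how many leaf directories the cache needs."""
    # Total number of cached files = total cache capacity / average file size.
    total_files = total_capacity // avg_file_size
    # Leaf directories = total files / max files per directory, rounded up
    # so no leaf directory exceeds its file-count limit.
    return -(-total_files // max_files_per_dir)  # ceiling division


# 1 TiB of ~64 KiB files, capped at 10,000 files per directory.
n = leaf_directory_count(1 << 40, 64 << 10, 10_000)
```

With these figures the cache holds about 16.8 million files and needs 1,678 leaf directories; the directory depth is then chosen so the file system's maximum depth is not exceeded.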
Optionally, the method further comprises:
in response to a read-write instruction for a high-speed, low-reliability service, performing the disk data read-write operation using RAID0 combined with SSD media;
and in response to a read-write instruction for a low-speed, high-reliability service, performing the disk data read-write operation using RAID1 combined with HDD media.
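The two service-to-backend pairings above amount to a small routing policy. The profile names below are hypothetical labels, not terms defined in the patent:

```python
# Hypothetical service profiles mapped to the two claimed RAID/media pairings.
POLICIES = {
    "high_speed_low_reliability": ("RAID0", "SSD"),
    "low_speed_high_reliability": ("RAID1", "HDD"),
}


def select_backend(service_profile: str) -> tuple[str, str]:
    """Pick the RAID level and disk medium for a service's read-write instruction."""
    return POLICIES[service_profile]
```

A real implementation would dispatch the read-write instruction to the cache directory mounted on the chosen RAID group; this sketch only shows the selection step.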
In order to solve the above technical problem, the present invention provides a file caching apparatus, including:
the hash calculation module is used for responding to a read-write instruction of a target file and calculating a hash value corresponding to the file name of the target file;
the hash value processing module is used for performing a modulo operation on the hash value with the number of directories mounted under the cache directory to obtain the remainder of the hash value;
and the caching module is used for determining a unique path of the target file under the caching directory based on the remainder of the hash value.
Optionally, the hash calculation module is configured to:
and carrying out hash calculation on the file name of the target file based on a hash algorithm to obtain a hash value corresponding to the file name of the target file.
Optionally, the cache module is configured to:
obtaining the total number of cached files by dividing the total cache capacity by the average size of a cached file;
determining the number of leaf directories required for caching by dividing the total number of cached files by the maximum number of files per directory;
and determining the unique path of the target file under the cache directory according to the number of the leaf directories and the maximum directory level of a file system.
Optionally, the cache module is further configured to:
in response to a read-write instruction for a high-speed, low-reliability service, performing the disk data read-write operation using RAID0 combined with SSD media;
and in response to a read-write instruction for a low-speed, high-reliability service, performing the disk data read-write operation using RAID1 combined with HDD media.
In order to solve the above technical problem, the present invention provides a computer device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the above method when executing the computer program.
To solve the above technical problem, the present invention provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the above method.
Compared with the prior art, one or more embodiments in the above scheme can have the following advantages or beneficial effects:
by applying the multi-hard-disk file caching method, the device, the computer equipment and the storage medium based on the hash ring, the hash value corresponding to the file name of the target file is calculated in response to the read-write instruction of the target file; utilizing the number of the mounted catalogues under the cache catalogues to carry out remainder operation on the hash value to obtain the remainder of the hash value; and determining a unique path of the target file under the cache directory based on the remainder of the hash value. Therefore, the read-write performance of the cached small files can be improved, the server resources can be fully utilized, and the dependence on a file system can be reduced.
Drawings
In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings required by the embodiments or the prior-art descriptions are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention; those skilled in the art can derive other drawings from them without creative effort.
Fig. 1 is a flowchart of a file caching method according to an embodiment of the present invention;
FIG. 2 is a block diagram of a general cache organization provided by an embodiment of the present invention;
fig. 3 is a structural diagram of a file caching apparatus according to an embodiment of the present invention;
fig. 4 is a block diagram of a computer device provided by the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be obtained by a person skilled in the art without making any creative effort based on the embodiments in the present invention, belong to the protection scope of the present invention.
In view of this, to address the problems that in the prior art simply using RAID improves small-file read-write performance only to a limited extent and can even reduce it; that disks of differing specifications are poorly supported, preventing full use of server resources; that most file systems limit capacity or the number of files in a single directory, or slow down once the file count per directory grows, in typical caching business scenarios; and that existing RAID technology struggles to sustain performance in large-capacity, high-file-count business scenarios, the invention provides a file caching method and apparatus, a computer device, and a storage medium.
The following describes a file caching method provided by the present invention.
As shown in fig. 1, a flowchart of a file caching method provided by the present invention may include the following steps:
step S101: and responding to the read-write instruction of the target file, and calculating the hash value corresponding to the file name of the target file.
In one case, the file name of the target file is hashed based on a hash algorithm, so as to obtain a hash value corresponding to the file name of the target file.
Step S102: performing a modulo operation on the hash value with the number of directories mounted under the cache directory to obtain the remainder of the hash value.
Step S103: and determining a unique path of the target file under the cache directory based on the remainder of the hash value. In one case, the remainder of the hash value is the directory location of the target file in the corresponding directory hierarchy.
In one case, referring to fig. 2, the unique path of the target file under the cache directory may be determined as follows: dividing the total cache capacity by the average size of a cached file to obtain the total number of cached files; dividing the total number of cached files by the maximum number of files per directory to determine the number of leaf directories required for caching; and determining the unique path of the target file under the cache directory according to the number of leaf directories and the maximum directory depth of the file system.
Furthermore, the file caching scheme is applied at the block-device layer of the disk system, with RAID used to organize and manage the disks so as to meet the read-write performance and data-redundancy requirements of different services: in response to a read-write instruction for a high-speed, low-reliability service, the disk data read-write operation is performed using RAID0 combined with SSD media; and in response to a read-write instruction for a low-speed, high-reliability service, the disk data read-write operation is performed using RAID1 combined with HDD media.
On this basis, a user can select different file systems according to the service characteristics, establish a file system on each RAID, and mount it under the cache directory, thereby avoiding the capacity limits of some file systems. Each directory under the cache directory corresponds to an independent file system; the directory names increase sequentially from 0, and the number of directories should preferably be a prime number so as to balance the total number of files across the directories.
In each independent file system, a user can configure the directory hierarchy according to the service volume and the characteristics of the file system, so as to avoid the performance degradation some file systems suffer from limits on, or excessive growth of, the number of files in a single directory.
It should be noted that the numbers of subdirectories at each level of the cache directory need to be prime numbers, and the values at different levels should differ, so as to avoid unbalanced file distribution. When sizing each level, the number of cache leaf directories can therefore be approximated by multiplying different primes taken from a prime table. Typically a few primes below 1000 are sufficient; for example, the prime combination [3, 919, 929, 937, 941] provides 3 × 919 × 929 × 937 × 941 = 2,258,300,311,401 leaf directories; that is, there are 3 directories under the cache directory, 919 directories under each of those 3, and so on.
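The leaf-directory count in the example can be checked directly:

```python
from math import prod

# Prime fan-outs per directory level, from the description's example:
# 3 directories under the cache root, 919 under each of those, and so on.
primes = [3, 919, 929, 937, 941]

# The number of leaf directories is the product of the per-level fan-outs.
leaves = prod(primes)
```

Using distinct primes at each level keeps the tree's fan-outs coprime, which avoids the file-distribution imbalances the description warns about.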
By applying the scheme of the invention, a hash value corresponding to the file name of the target file is calculated in response to a read-write instruction for the target file; a modulo operation is performed on the hash value with the number of directories mounted under the cache directory to obtain the remainder of the hash value; and a unique path for the target file under the cache directory is determined based on that remainder. In this way, the read-write performance of cached small files can be improved, server resources can be fully utilized, and dependence on the file system can be reduced.
The following describes a file caching apparatus according to the present invention.
As shown in fig. 3, a structure diagram of a file caching apparatus provided in an embodiment of the present invention includes:
the hash calculation module 210 is configured to respond to a read-write instruction of a target file, and calculate a hash value corresponding to a file name of the target file;
a hash value processing module 220, configured to perform a modulo operation on the hash value with the number of directories mounted under the cache directory to obtain the remainder of the hash value;
and the caching module 230 is configured to determine a unique path of the target file under the caching directory based on a remainder of the hash value.
In one case, the hash calculation module 210 is configured to perform a hash calculation on the file name of the target file based on a hash algorithm, so as to obtain a hash value corresponding to the file name of the target file.
In another case, the caching module 230 is configured to obtain the total number of cached files by dividing the total cache capacity by the average size of a cached file; determine the number of leaf directories required for caching by dividing the total number of cached files by the maximum number of files per directory; and determine the unique path of the target file under the cache directory according to the number of leaf directories and the maximum directory depth of the file system.
In another case, the caching module 230 is further configured to perform, in response to a read-write instruction for a high-speed, low-reliability service, the disk data read-write operation using RAID0 combined with SSD media; and, in response to a read-write instruction for a low-speed, high-reliability service, perform the disk data read-write operation using RAID1 combined with HDD media.
By applying the scheme of the invention, a hash value corresponding to the file name of the target file is calculated in response to a read-write instruction for the target file; the remainder of the hash value is obtained using the number of directories mounted under the cache directory; and a unique path for the target file under the cache directory is determined based on that remainder. In this way, the read-write performance of cached small files can be improved, server resources can be fully utilized, and dependence on the file system can be reduced.
The following describes a computer device provided in an embodiment of the present invention.
To solve the above technical problem, the present invention provides a computer device, as shown in fig. 4, including a memory 310, a processor 320 and a computer program stored in the memory and running on the processor, wherein the processor executes the computer program to implement the method as described above.
The computer device can be a desktop computer, a notebook, a palmtop computer, a cloud server, or another computing device. The computer device may include, but is not limited to, a processor 320 and a memory 310. Those skilled in the art will appreciate that fig. 4 is merely an example of a computing device and is not intended to be limiting; the device may include more or fewer components than those shown, may combine some components, or may use different components. For example, the computing device may also include input-output devices, network access devices, buses, and the like.
The Processor 320 may be a Central Processing Unit (CPU), other general purpose Processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other Programmable logic device, discrete Gate or transistor logic, discrete hardware components, etc. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
The memory 310 may be an internal storage unit of the computer device, such as a hard disk or memory of the computer device. The memory 310 may also be an external storage device of the computer device, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card (Flash Card) fitted to the computer device. Further, the memory 310 may include both an internal storage unit and an external storage device of the computer device. The memory 310 is used to store the computer program and other programs and data required by the computer device, and may also be used to temporarily store data that has been output or is to be output.
The embodiment of the present application further provides a computer-readable storage medium, which may be a computer-readable storage medium contained in the memory in the foregoing embodiment; or it may be a computer-readable storage medium that exists separately and is not incorporated into a computer device. The computer-readable storage medium stores one or more computer programs which, when executed by a processor, implement the methods described above.
The integrated modules/units, if implemented in the form of software functional units and sold or used as separate products, may be stored in a computer-readable storage medium. Based on this understanding, all or part of the flow of the method embodiments described above can be realized by a computer program, which can be stored in a computer-readable storage medium and, when executed by a processor, realizes the steps of those method embodiments. The computer program comprises computer program code, which may be in source-code form, object-code form, an executable file, or some intermediate form. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB disk, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and the like. It should be noted that the content contained in the computer-readable medium may be adjusted as required by legislation and patent practice in a given jurisdiction; for example, in some jurisdictions, computer-readable media exclude electrical carrier signals and telecommunications signals in accordance with legislation and patent practice.
For system or apparatus embodiments, since they are substantially similar to the method embodiments, they are described relatively briefly; for related details, reference may be made to the description of the method embodiments.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above division of functional units and modules is illustrated. In practical applications, the above functions may be distributed among different functional units and modules as needed; that is, the internal structure of the apparatus may be divided into different functional units or modules to perform all or part of the functions described above. The functional units and modules in the embodiments may be integrated into one processing unit, each unit may exist alone physically, or two or more units may be integrated into one unit; an integrated unit may be implemented in the form of hardware or in the form of a software functional unit. In addition, the specific names of the functional units and modules are only for convenience of distinguishing them from one another and are not used to limit the protection scope of the present application. For the specific working processes of the units and modules in the system, reference may be made to the corresponding processes in the foregoing method embodiments, which are not described again here.
It should be noted that, in this document, relational terms such as first and second are used solely to distinguish one entity or action from another, and do not necessarily require or imply any actual relationship or order between such entities or actions. The terms "comprises," "comprising," and any other variations thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such a process, method, article, or apparatus. Without further limitation, an element preceded by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or apparatus that comprises the element.
It is to be understood that the terminology used in the description of the present application is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in this specification and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should also be understood that the term "and/or" as used in this specification and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items.
As used in this specification and the appended claims, the term "if" may be interpreted contextually as "when", "upon" or "in response to a determination" or "in response to a detection". Similarly, the phrase "if it is determined" or "if the described condition or event is detected" may be interpreted to mean "upon determining" or "in response to determining" or "upon detecting the described condition or event" or "in response to detecting the described condition or event", depending on the context.
The above description is only for the preferred embodiment of the present invention, and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.

Claims (8)

1. A file caching method, characterized by comprising the following steps:
calculating, in response to a read-write instruction for a target file, a hash value corresponding to the file name of the target file;
taking the hash value modulo the number of directories mounted under the cache directory to obtain the remainder of the hash value; wherein each directory under the cache directory corresponds to an independent file system, and the directory hierarchy within each independent file system is configured according to the traffic volume and characteristics of that file system; the number of subdirectories at each directory level is a prime number, and these values differ between levels;
obtaining the total number of cached files by dividing the total cache capacity by the average size of the cached files;
determining the number of leaf directories to be cached by dividing the total number of cached files by the maximum number of files allowed in each directory;
calculating, according to the number of leaf directories, the number of directories at each level by multiplying distinct primes from a prime table such that the product approximates the number of leaf directories; and
determining the unique path of the target file under the cache directory according to the number of leaf directories and the maximum directory depth of the file system.
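The steps of claim 1 can be sketched as follows. This is a minimal illustration only: the patent publishes no reference code, so the hash algorithm (MD5 here), the prime table, and all function names are assumptions made for the sketch.

```python
import hashlib

# Hypothetical prime table; the patent only requires distinct primes per level.
PRIMES = [2, 3, 5, 7, 11, 13, 17, 19, 23]

def name_hash(filename: str) -> int:
    """Hash the target file's name to an integer (algorithm is an assumption)."""
    return int(hashlib.md5(filename.encode()).hexdigest(), 16)

def pick_mount(filename: str, num_mounts: int) -> int:
    """Choose a mounted directory: hash value modulo the mount count."""
    return name_hash(filename) % num_mounts

def leaf_directory_count(total_capacity: int, avg_file_size: int,
                         max_files_per_dir: int) -> int:
    """Total cached files = capacity / average size; leaves = files / per-dir max."""
    total_files = total_capacity // avg_file_size
    return max(1, total_files // max_files_per_dir)

def per_level_counts(target_leaves: int, max_levels: int) -> list[int]:
    """Multiply distinct primes until their product approximates the leaf count."""
    counts, product = [], 1
    for p in PRIMES[:max_levels]:
        if product >= target_leaves:
            break
        counts.append(p)
        product *= p
    return counts or [1]

def cache_path(filename: str, level_counts: list[int]) -> str:
    """Derive a unique sub-path by reducing the hash level by level."""
    h = name_hash(filename)
    parts = []
    for n in level_counts:
        parts.append(str(h % n))
        h //= n
    return "/".join(parts)
```

For example, a 100 GB cache of files averaging 1 MB, with at most 1000 files per leaf directory, yields about 100 leaf directories, which the prime-product step would approximate with small distinct primes (e.g. 2 × 3 × 5 × 7 = 210 covers 100).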
2. The file caching method according to claim 1, wherein calculating the hash value corresponding to the file name of the target file comprises:
performing a hash calculation on the file name of the target file based on a hash algorithm to obtain the hash value corresponding to the file name of the target file.
3. The file caching method according to claim 1, further comprising:
performing, in response to a read-write instruction for a high-speed, low-reliability service, disk read-write operations using RAID0 combined with SSD media; and
performing, in response to a read-write instruction for a low-speed, high-reliability service, disk read-write operations using RAID1 combined with HDD media.
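The media-selection policy of claim 3 amounts to a simple mapping from service profile to RAID level and media type. A hypothetical sketch (the service labels are invented for illustration and are not defined in the patent):

```python
def select_storage(service: str) -> tuple[str, str]:
    """Map a service profile to a (RAID level, media type) pair, per claim 3."""
    if service == "high-speed-low-reliability":
        return ("RAID0", "SSD")  # striping on SSD favors throughput
    if service == "low-speed-high-reliability":
        return ("RAID1", "HDD")  # mirroring on HDD favors durability
    raise ValueError(f"unknown service profile: {service}")
```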
4. A file caching apparatus, characterized by comprising:
a hash calculation module, configured to calculate, in response to a read-write instruction for a target file, a hash value corresponding to the file name of the target file;
a hash value processing module, configured to take the hash value modulo the number of directories mounted under the cache directory to obtain the remainder of the hash value; wherein each directory under the cache directory corresponds to an independent file system, and the directory hierarchy within each independent file system is configured according to the traffic volume and characteristics of that file system; the number of subdirectories at each directory level is a prime number, and these values differ between levels; and
a caching module, configured to:
obtain the total number of cached files by dividing the total cache capacity by the average size of the cached files;
determine the number of leaf directories to be cached by dividing the total number of cached files by the maximum number of files allowed in each directory;
calculate, according to the number of leaf directories, the number of directories at each level by multiplying distinct primes from a prime table such that the product approximates the number of leaf directories; and
determine the unique path of the target file under the cache directory according to the number of leaf directories and the maximum directory depth of the file system.
5. The file caching apparatus according to claim 4, wherein the hash calculation module is configured to:
perform a hash calculation on the file name of the target file based on a hash algorithm to obtain the hash value corresponding to the file name of the target file.
6. The file caching apparatus according to claim 4, wherein the caching module is further configured to:
perform, in response to a read-write instruction for a high-speed, low-reliability service, disk read-write operations using RAID0 combined with SSD media; and
perform, in response to a read-write instruction for a low-speed, high-reliability service, disk read-write operations using RAID1 combined with HDD media.
7. A computer device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor, when executing the computer program, implements the method according to any one of claims 1 to 3.
8. A computer-readable storage medium on which a computer program is stored, characterized in that the computer program, when executed by a processor, implements the method according to any one of claims 1 to 3.
CN202111663107.0A 2021-12-31 2021-12-31 File caching method and device, computer equipment and storage medium Active CN114442937B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111663107.0A CN114442937B (en) 2021-12-31 2021-12-31 File caching method and device, computer equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111663107.0A CN114442937B (en) 2021-12-31 2021-12-31 File caching method and device, computer equipment and storage medium

Publications (2)

Publication Number Publication Date
CN114442937A CN114442937A (en) 2022-05-06
CN114442937B true CN114442937B (en) 2023-03-28

Family

ID=81366196

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111663107.0A Active CN114442937B (en) 2021-12-31 2021-12-31 File caching method and device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114442937B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116185965B (en) * 2023-05-04 2023-08-04 联想凌拓科技有限公司 Method, apparatus, device and medium for quality of service control
CN117061615B (en) * 2023-10-09 2024-01-16 杭州优云科技有限公司 Cache path acquisition method, device, computer equipment and storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8266136B1 (en) * 2009-04-13 2012-09-11 Netapp, Inc. Mechanism for performing fast directory lookup in a server system
CN109271361A (en) * 2018-08-13 2019-01-25 华东计算技术研究所(中国电子科技集团公司第三十二研究所) Distributed storage method and system for massive small files
CN109446160A (en) * 2018-11-06 2019-03-08 郑州云海信息技术有限公司 A kind of file reading, system, device and computer readable storage medium

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130226888A1 (en) * 2012-02-28 2013-08-29 Netapp, Inc. Systems and methods for caching data files
US20180032540A1 (en) * 2016-07-28 2018-02-01 Dell Products L.P. Method and system for implementing reverse directory lookup using hashed file metadata
CN106775453B (en) * 2016-11-22 2019-07-05 华中科技大学 A kind of construction method mixing storage array
CN107562786A (en) * 2017-07-27 2018-01-09 平安科技(深圳)有限公司 File memory method, terminal and computer-readable recording medium
CN111367861A (en) * 2020-02-29 2020-07-03 苏州浪潮智能科技有限公司 File caching method, system, device and medium
JP2021144381A (en) * 2020-03-11 2021-09-24 Necソリューションイノベータ株式会社 Protocol converter, block storage device, protocol conversion method, program, and recording medium


Also Published As

Publication number Publication date
CN114442937A (en) 2022-05-06

Similar Documents

Publication Publication Date Title
US9529545B1 (en) Managing data deduplication in storage systems based on storage space characteristics
US9384206B1 (en) Managing data deduplication in storage systems
US9460102B1 (en) Managing data deduplication in storage systems based on I/O activities
US10031675B1 (en) Method and system for tiering data
US10169365B2 (en) Multiple deduplication domains in network storage system
US11010300B2 (en) Optimized record lookups
US11586366B2 (en) Managing deduplication characteristics in a storage system
CN104408091B (en) The date storage method and system of distributed file system
EP2502148B1 (en) Selective file system caching based upon a configurable cache map
US9449011B1 (en) Managing data deduplication in storage systems
CN114442937B (en) File caching method and device, computer equipment and storage medium
US8090924B2 (en) Method for the allocation of data on physical media by a file system which optimizes power consumption
US9612758B1 (en) Performing a pre-warm-up procedure via intelligently forecasting as to when a host computer will access certain host data
US10579593B2 (en) Techniques for selectively deactivating storage deduplication
US8793226B1 (en) System and method for estimating duplicate data
US10481820B1 (en) Managing data in storage systems
US8538933B1 (en) Deduplicating range of data blocks
US9383936B1 (en) Percent quotas for deduplication storage appliance
US11199990B2 (en) Data reduction reporting in storage systems
US10387369B1 (en) Managing file deletions of files and versions of files in storage systems
CN112684975B (en) Data storage method and device
US11232043B2 (en) Mapping virtual block addresses to portions of a logical address space that point to the virtual block addresses
US10725944B2 (en) Managing storage system performance
US20190339911A1 (en) Reporting of space savings due to compression in storage systems
US11809379B2 (en) Storage tiering for deduplicated storage environments

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant