WO2023169188A1 - Popularity identification method and apparatus in file system, and computer device - Google Patents

Popularity identification method and apparatus in file system, and computer device

Info

Publication number
WO2023169188A1
Authority
WO
WIPO (PCT)
Prior art keywords
access
popularity
access object
heat
file
Prior art date
Application number
PCT/CN2023/077025
Other languages
French (fr)
Chinese (zh)
Inventor
杨伦
付克博
沈建强
李亚飞
魏展
Original Assignee
华为技术有限公司 (Huawei Technologies Co., Ltd.)
Priority date
Filing date
Publication date
Application filed by 华为技术有限公司 (Huawei Technologies Co., Ltd.)
Publication of WO2023169188A1


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/11 File system administration, e.g. details of archiving or snapshots
    • G06F16/17 Details of further file system functions
    • G06F16/172 Caching, prefetching or hoarding of files
    • G06F16/18 File system types
    • G06F16/185 Hierarchical storage management [HSM] systems, e.g. file migration or policies thereof

Definitions

  • The present application relates to the field of computer technology, and in particular to a popularity identification method and apparatus in a file system, and a computer device.
  • Data that is being accessed is likely to be accessed again in the near future, forming a data hotspot.
  • Identifying hot and cold data is very important for optimizing a storage system. It is the basis for hot/cold data classification, data placement, and data migration, and is an indispensable factor in making a storage system cost-effective.
  • In traditional storage systems, hot and cold data are identified mainly at the granularity of data blocks. However, as the scale of unstructured data grows, the number and size of files that a storage system must identify have reached the order of billions and gigabytes, respectively. In addition to the popularity of files, it is sometimes also necessary to record the popularity of directories. Building directory popularity helps users perceive the hotness and coldness of data more comprehensively, conduct data analysis and mining, and perform directory-level data flow. The popularity of a directory is the sum of the popularity of the files and subdirectories under it. In the traditional solution, when a user requests the popularity of a directory, all files and subdirectories in the directory are recursively traversed and summed to obtain the directory popularity.
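As a minimal sketch of the traditional approach described above (the `Node` class and `directory_popularity` function are hypothetical, not taken from the application), the Python below computes a directory's popularity only when it is requested, by recursively summing everything beneath it. Each query therefore costs time proportional to the size of the whole subtree.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A file or directory; in this model only files carry their own access count."""
    name: str
    popularity: int = 0
    children: list["Node"] = field(default_factory=list)


def directory_popularity(node: Node) -> int:
    """On-demand recursive sum over the entire subtree rooted at `node`."""
    if not node.children:
        return node.popularity  # a file, or an empty directory (0)
    return sum(directory_popularity(child) for child in node.children)


root = Node("D0", children=[
    Node("D1", children=[Node("F1", 3), Node("F2", 4)]),
    Node("F3", 5),
])
print(directory_popularity(root))  # 12
```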
  • This application provides a popularity identification method and apparatus in a file system, and a computer device, which update the popularity of files and directories synchronously and avoid lagging updates of upper-level directories.
  • According to a first aspect, this application provides a method for identifying popularity in a file system.
  • The method includes: obtaining an access request from an application and determining the access object of the access request; counting the access frequency of each access object; and, according to the storage path of each access object and the access frequency of each access object, synchronously updating the popularity of the access object and the popularity of each node along the storage path from the access object toward its parent nodes.
  • The popularity of a parent node is the sum of the popularity of all child nodes under the parent node.
  • The method further includes: determining, according to the access request, the access type for the access object; and counting the access frequency of each access object includes: counting, for each access object, the access frequency of each access type.
  • For the same access object, the popularity of different access types can thus be distinguished. For example, by separately counting the popularity of read operations and write operations on the same access object, the user's needs for that object can be understood in more detail, allowing more reasonable storage optimization.
  • The access object includes a file directory, a file, or a data block in a file.
  • Determining the access object of the access request includes: determining the file requested to be accessed based on the object identifier in the access request, and determining, based on the offset and length in the access request, the one or more blocks of the file in which the access object is located.
  • Synchronously updating, according to the storage path and access frequency of each access object, the popularity of the access object and the popularity of each node toward the parent node in its storage path then includes: synchronously updating, according to the storage path and access frequency of the one or more blocks, the popularity of the one or more blocks, the popularity of the file, and the popularity of each node toward the parent node in the storage path of the file.
  • Larger files can be stored in blocks. Popularity is then counted separately for each block, so the storage of different blocks can be optimized according to their popularity without caching the entire file.
  • The method further includes: periodically attenuating the popularity of each access object; and, if the popularity of a first access object decays to a value less than or equal to a preset threshold, deleting the popularity of the first access object.
  • Because no separate metadata is set up and there is no independent metadata storage space for popularity information, the size of the popularity information relative to the available storage space must be considered. Over time, the number of access objects grows and the popularity information grows with it, which may exhaust the storage space available for popularity information. Deleting the popularity information of less popular access objects keeps the storage space occupied by popularity information under control.
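The periodic attenuation and deletion described above can be sketched as follows. This assumes popularity records keyed by storage path in a plain dictionary and a hypothetical attenuation coefficient `alpha`; rounding of non-integer results is deliberately left out of this sketch.

```python
def decay_and_prune(popularity: dict[str, float], alpha: float,
                    threshold: float) -> dict[str, float]:
    """Attenuate every record by alpha, then drop records at or below the threshold."""
    decayed = {path: value * alpha for path, value in popularity.items()}
    # Deleting cold records bounds the space taken by popularity information.
    return {path: value for path, value in decayed.items() if value > threshold}


records = {"/D0": 12, "/D0/D1": 2, "/D0/D1/F1": 2, "/D0/F3": 10}
print(decay_and_prune(records, alpha=0.5, threshold=1))  # {'/D0': 6.0, '/D0/F3': 5.0}
```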
  • Attenuating the popularity of each access object includes: multiplying the popularity of each access object by an attenuation coefficient; if the product is not an integer, rounding it down with probability 1 minus the attenuation coefficient and rounding it up with probability equal to the attenuation coefficient. If the product were always rounded up or always rounded down, the sum of the popularity of the child nodes could differ from the popularity of the parent node, which hinders subsequent popularity-based storage optimization of access objects; the probabilistic rounding above keeps the sum of the children's popularity equal or approximately equal to the popularity of the parent node.
  • Counting the access frequency of each access object includes counting the access frequency of each access object within a preset interval, where the preset interval is any of the following: a preset time interval, a preset traffic volume, or a preset number of access requests. If a popularity update were performed for every access request, frequent update operations would consume too much bandwidth when the access volume is large. Counting the accesses within a preset interval before updating the popularity helps save bandwidth.
  • The method further includes: determining, according to the popularity of the access object and a first popularity threshold, whether the access object is hot data; and/or determining, according to the popularity of the access object and a second popularity threshold, whether the access object is cold data. Classifying hot and cold data facilitates subsequent optimization of data storage.
  • The method further includes: sorting the popularity of all stored access objects in descending order and using the popularity of the N-th access object as the first popularity threshold, where N satisfies one of the following conditions: N divided by the total number of access objects satisfies a preset proportion condition; or the sum of the popularity of the first N access objects divided by the sum of the popularity of all access objects satisfies the preset proportion condition.
  • The method further includes: receiving a popularity query request for querying the popularity of a target access object; and outputting the popularity of the target access object.
  • This application further provides a popularity identification apparatus in a file system.
  • The apparatus includes modules/units that perform the method of the first aspect or any possible implementation of the first aspect; these modules/units may be implemented by hardware, or by hardware executing corresponding software.
  • The apparatus includes: a collection module, configured to obtain an access request from an application, determine the access object of the access request, and count the access frequency of each access object; and a popularity update module, configured to synchronously update, according to the storage path of each access object and the access frequency of each access object, the popularity of the access object and the popularity of each node toward the parent node in the storage path of the access object.
  • The popularity of a parent node is the sum of the popularity of all child nodes under the parent node.
  • This application further provides a computer device including a memory and a processor; the memory stores a computer program, and the processor is configured to invoke the computer program stored in the memory to perform the method of the first aspect or any implementation of the first aspect.
  • This application further provides a computer-readable storage medium storing instructions that, when run on a computer, cause the computer to perform the method of the first aspect or any implementation of the first aspect.
  • This application further provides a computer program product containing instructions that, when run on a computer, cause the computer to perform the method of the first aspect or any implementation of the first aspect.
  • FIG. 1 is a schematic diagram of Qumulo file directory metadata according to an embodiment of this application.
  • FIG. 2 is a schematic diagram of the Qumulo file directory metadata update process according to an embodiment of this application.
  • FIG. 3 is a schematic flowchart of the popularity identification method in a file system according to an embodiment of this application.
  • FIG. 4 is a schematic diagram of the growth of popularity information during the popularity update process according to an embodiment of this application.
  • FIG. 5 is a schematic diagram of hot and cold data classification according to an embodiment of this application.
  • FIG. 6 is a schematic diagram of popularity information pruning during the popularity update process according to an embodiment of this application.
  • FIG. 7 is a schematic structural diagram of a popularity identification apparatus according to an embodiment of this application.
  • FIG. 8 is a schematic structural diagram of a computer device according to an embodiment of this application.
  • For example, the General Parallel File System (GPFS) is configured with a file popularity identification method, which includes a file popularity calculation method and a popularity update method.
  • In this method, the popularity value of a file that has not been accessed within a period decays by a percentage.
  • The valid range of the attenuation percentage is 0 to 100%, with a default value of 10%; it can also be customized by the user.
  • When a file is accessed, its access time (atime) is automatically updated and its access popularity increases. If atime updates are suppressed, the popularity calculation for file accesses may be adversely affected; that is, there is the following dependency: file is accessed -> atime is updated -> file popularity increases.
  • The above file popularity identification method only calculates the popularity of files and does not count the popularity of file directories. Because building directory popularity helps users perceive the hotness and coldness of data more comprehensively, conduct data analysis and mining, and perform directory-level data flow, the distributed file system developed by Qumulo (Qumulo File Fabric, QF2) provides built-in real-time aggregation and real-time analysis of file metadata.
  • Qumulo file directory metadata is shown in Figure 1, where rank indicates the level of a directory or file, size indicates the size of the directory or file itself, and the two values in square brackets are the unreconciled value and the reconciled value, respectively.
  • The unreconciled value and the reconciled value of a directory are equal, respectively, to the sum of the unreconciled values and the sum of the reconciled values of the directory itself and all files and subdirectories under it.
  • When the size of file F1 is updated, its unreconciled value is modified first, as shown in step 1 in Figure 2. Its storage path is then added to a dirty list, as shown in step 2 in Figure 2. Next, a background task asynchronously propagates the update upward along the storage path of F1 and modifies the reconciled value of F1, as shown in step 3 in Figure 2.
  • The dirty list updates objects with larger rank values first. If users frequently access objects with larger rank values, popularity updates of upper-level directories in the directory tree lag behind, and file and directory popularity fall out of sync. For large file systems, this approach cannot meet real-time requirements.
  • Embodiments of this application provide a popularity identification method in a file system, which updates files and directories synchronously and avoids lagging updates of upper-level directories.
  • The popularity identification apparatus may be deployed on an independent server, or on the same server as other systems.
  • The popularity identification apparatus may be deployed together with a storage system.
  • The popularity identification apparatus may include a collection module and a popularity update module, where the collection module can be deployed on the client to collect data there and then send the collected data or processed information to the server.
  • The popularity update module is used to update the popularity of access objects. Alternatively, no collection module needs to be deployed on the client, and the entire popularity identification apparatus is deployed on the server: after receiving a user's access request, the client sends it to the popularity identification apparatus on the server, which performs popularity statistics and updates.
  • FIG. 3 is a schematic flowchart of a popularity identification method in a file system according to an embodiment of this application. As shown in the figure, the method may include the following steps:
  • Step 301: Obtain an access request from an application and determine the access object of the access request.
  • Step 301 may be performed by the collection module of the popularity identification apparatus deployed on the client.
  • The collection module may use an application programming interface (API).
  • A cache space may be configured for the collection module, and the collection module stores the collected access requests in the cache space.
  • Alternatively, the client need not include a collection module; instead, the client sends the access request to the popularity identification apparatus, which performs popularity statistics and updates based on the access request.
  • In that case, step 301 consists of the collection module in the popularity identification apparatus receiving the access request and determining the access object based on it.
  • The access request may include information such as a data object identifier, an operation word, an offset, and a length.
  • The data object identifier indicates the file, file directory, or other object requested to be accessed.
  • The operation word indicates the type of operation the user performs on the access object, such as a read operation or a write operation.
  • The offset indicates the starting position of the data the user requests to access.
  • The length indicates the size of the data the user requests to access. For example, if a file is 1 GB in size and the user needs the data from 0.5 GB to 0.6 GB within the file, the offset in the access request is 0.5 GB and the length is 0.1 GB.
  • The collection module can determine the access object based on the data object identifier in the access request, or based on the data object identifier, offset, and length in the access request.
  • The access objects may include file directories, files, or data blocks in files.
  • When the data of a file is stored, it may or may not be divided into blocks. For example, files whose size reaches a preset threshold are divided into blocks, and files below the preset threshold are not.
  • When the access object is determined based on the access request, the accessed file or file directory can be determined from the data object identifier alone.
  • For a file stored in blocks, the blocks requested to be accessed can also be determined from the access request. For example, file A is 1 GB in size and is divided into 10 blocks for storage, covering 0 to 0.1 GB, 0.1 GB to 0.2 GB, 0.2 GB to 0.3 GB, ..., 0.9 GB to 1 GB, respectively. If the offset in the access request is 0.55 GB and the length is 0.1 GB, the user requests the data between 0.55 GB and 0.65 GB. This data lies in the 6th and 7th blocks, so the access objects are determined to be the 6th and 7th blocks of file A.
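The block lookup in the file A example reduces to integer division on byte offsets. This is a sketch assuming fixed-size blocks and decimal units (1 GB = 1000 MB), matching the example; `blocks_for_request` is a hypothetical helper, and real systems may use variable or unaligned block layouts.

```python
def blocks_for_request(offset: int, length: int, block_size: int) -> list[int]:
    """Return 1-based indices of the fixed-size blocks touched by [offset, offset + length)."""
    if length <= 0:
        return []
    first = offset // block_size                 # block holding the first requested byte
    last = (offset + length - 1) // block_size   # block holding the last requested byte
    return [index + 1 for index in range(first, last + 1)]


MB = 1_000_000
# File A: 1 GB in 10 blocks of 100 MB; request offset 0.55 GB, length 0.1 GB.
print(blocks_for_request(550 * MB, 100 * MB, 100 * MB))  # [6, 7]
```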
  • The time of the access request can also be obtained, so that subsequent popularity statistics can be analyzed more accurately over time.
  • The access request may carry time information, in which case the time can be read directly from the access request; or the access request may carry no time information, in which case the time at which the access request is obtained can be recorded.
  • Step 302: Count the access frequency of each access object.
  • For example, access request 1 requests to read the 6th and 7th blocks of file A, so the number of read operations on the 6th block of file A is increased by 1 and the number of read operations on the 7th block of file A is increased by 1.
  • Access request 2 requests to write the 6th block of file A, so the number of write operations on the 6th block of file A is increased by 1.
  • Access request 3 requests to read file B, so the number of read operations on file B is increased by 1.
  • Access request 4 requests to read directory C, so the number of read operations on directory C is increased by 1.
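The counting in the four example requests above amounts to a multiset keyed by (access object, access type). A minimal sketch with Python's `collections.Counter`, where object names such as `"A/block6"` are hypothetical labels:

```python
from collections import Counter

# Each inner list holds the (access object, access type) keys one request touches.
requests = [
    [("A/block6", "read"), ("A/block7", "read")],  # access request 1
    [("A/block6", "write")],                        # access request 2
    [("B", "read")],                                # access request 3
    [("C", "read")],                                # access request 4
]

counts = Counter()
for touched in requests:
    counts.update(touched)  # +1 for every key the request touched

print(counts[("A/block6", "read")], counts[("A/block6", "write")])  # 1 1
```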
  • Step 302 may be performed by the collection module, either the one deployed on the client or the one deployed on the server.
  • When performing step 302, the collection module may aggregate the access requests within a preset interval and count the access frequency of each access object, where the preset interval may be a preset time period, a preset traffic volume, a preset number of access requests, or the like.
  • After counting the access frequency of each access object within the preset interval, the collection module sends the counted frequencies to the popularity update module, which updates the popularity of access objects, then clears the counted frequency data and counts the access frequency within the next interval afresh.
  • For example, if the preset time period is 10 minutes, the number of accesses to each access object is counted over all access requests within 10 minutes; when the 10 minutes elapse, the counts are sent to the popularity update module in the server that updates access-object popularity, and the counts are then cleared to zero and re-counted for the next 10 minutes.
  • As another example, if the preset traffic volume is 1 MB, the number of accesses to each access object is counted over multiple access requests whose total size does not exceed 1 MB; the counts are then sent to the popularity update module and cleared, and the access requests within the next 1 MB of traffic are counted.
  • As a further example, if the preset number of access requests is 50, the number of accesses to the objects accessed by 50 access requests is counted and sent to the popularity update module; the counts are then cleared to zero, and the accesses of the next 50 access requests are counted.
  • Aggregating the access requests within a preset interval before sending them to the popularity update module reduces the number of transmissions and thus the bandwidth consumed by popularity updates. Unlike GPFS, there is no need to perform popularity statistics and send an update every time an access request is obtained, which would make updates too frequent and consume excessive bandwidth.
  • If the preset interval is a preset number of access requests and that number is 1, the collection module reports to the popularity update module once for every access request.
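Interval-based reporting can be sketched as a small collector that accumulates counts and flushes them after a preset number of access requests. `BatchingCollector` and its `report` callback are hypothetical stand-ins for the collection module and its link to the popularity update module; a time-based or traffic-based interval would change only the flush condition.

```python
from collections import Counter
from collections.abc import Callable, Hashable


class BatchingCollector:
    """Hypothetical collection module that reports counts every `batch_size` requests.

    A batch_size of 1 degenerates to reporting once per access request.
    """

    def __init__(self, batch_size: int, report: Callable[[Counter], None]) -> None:
        self.batch_size = batch_size
        self.report = report      # stands in for sending counts to the update module
        self.counts: Counter = Counter()
        self.pending = 0          # access requests seen in the current interval

    def record(self, keys: list[Hashable]) -> None:
        """Count one access request that touched the given (object, type) keys."""
        self.counts.update(keys)
        self.pending += 1
        if self.pending >= self.batch_size:
            self.flush()

    def flush(self) -> None:
        """Report the current interval's counts, then start a fresh interval."""
        if self.pending:
            self.report(self.counts)
            self.counts = Counter()
            self.pending = 0


reports: list[Counter] = []
collector = BatchingCollector(batch_size=50, report=reports.append)
for _ in range(120):
    collector.record([("F1", "read")])
print(len(reports), reports[0][("F1", "read")], collector.pending)  # 2 50 20
```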
  • Step 303: According to the storage path of each access object and the access frequency of each access object, synchronously update the popularity of the access object and the popularity of each node toward the parent node in the storage path where the access object is located.
  • The popularity update module updates the stored popularity of each access object according to the received access frequency of each access object. For example, the number of accesses to an access object can be used as its popularity value; after receiving the number of accesses to each access object, the popularity update module adds the newly received number to the stored number of accesses to obtain the updated number of accesses, that is, the updated popularity value.
  • If the popularity update module receives access frequency information for a new access object, that is, one whose access frequency it has not previously stored, it generates popularity information for that access object. Furthermore, if it does not store popularity information for the parent node of the access object, it also generates popularity information for the parent node; if it does not store popularity information for the parent node's parent node, it generates that as well, and so on up to the root directory where the access object is located. As shown in Figure 4, before the update, the popularity information stored by the popularity update module is as shown in (a) of Figure 4.
  • That is, the popularity information of read operations on directory 00 is stored, as is the popularity information of read operations on subdirectory 11 and subdirectory 12 under directory 00, on file 21 under subdirectory 11, and on file 22 and file 23 under subdirectory 12.
  • After the popularity update module receives the access frequency of each access object sent by the collection module, it determines that popularity information of read operations on file 24 under subdirectory 21 needs to be added, as shown in (b) of Figure 4.
  • Not only is the popularity of each access object recorded, but the popularity of its directories is recorded at the same time, which facilitates subsequent directory-level storage analysis and optimization.
  • For example, if the access object is file F1 shown in Figure 2, its storage path /D0/D1/F1 is obtained.
  • Folder D1 is the parent node of file F1, and folder D0 is the parent node of folder D1, so when the popularity of file F1 is updated, the popularity values of folder D1 and folder D0 also need to be updated.
  • The popularity of folder D1 equals the sum of the popularity of all files under folder D1.
  • The popularity of folder D0 equals the sum of the popularity of all files under folder D0.
  • If the access object is a block, its parent node is the file, and the popularity value of the file is the sum of the popularity of all blocks of the file.
  • Because the popularity of the access object and of its parent nodes is updated synchronously, the problem of lagging updates of upper-level directory popularity is solved, and whenever directory popularity needs to be output, the latest popularity information can be output promptly.
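The synchronous update can be sketched as follows: for every counted (object, access type) pair, add the count to the object and to every ancestor on its storage path in the same pass, creating missing entries on first use (the growth shown in Figure 4). `PopularityStore`, `ancestors_and_self`, and the path-string keys are assumptions for illustration; a block could be modeled as one more path component (for example `/D0/D1/F1/block6`, a hypothetical naming) so that the file becomes its parent.

```python
from collections import Counter, defaultdict


def ancestors_and_self(path: str) -> list[str]:
    """'/D0/D1/F1' -> ['/D0', '/D0/D1', '/D0/D1/F1'] (root-first prefixes)."""
    parts = [p for p in path.split("/") if p]
    return ["/" + "/".join(parts[: i + 1]) for i in range(len(parts))]


class PopularityStore:
    """Hypothetical popularity update module with per-node, per-access-type counts."""

    def __init__(self) -> None:
        # defaultdict creates entries for new objects and new parents on first update.
        self.popularity: defaultdict[tuple[str, str], int] = defaultdict(int)

    def apply(self, batch: Counter) -> None:
        """Add each object's count to itself and to every ancestor in one pass."""
        for (path, op), count in batch.items():
            for node in ancestors_and_self(path):
                self.popularity[(node, op)] += count


store = PopularityStore()
store.apply(Counter({("/D0/D1/F1", "read"): 3, ("/D0/D1/F2", "read"): 2}))
store.apply(Counter({("/D0/F3", "read"): 4}))
print(store.popularity[("/D0/D1", "read")], store.popularity[("/D0", "read")])  # 5 9
```

Because every ancestor receives the same increment as the object, the rule "parent popularity equals the sum of its children" holds after every batch, with no background reconciliation step.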
  • In addition to determining the access object from the access request, the collection module can determine the access type from the operation word in the access request, such as a read operation or a write operation, and count the access frequency separately for each access type of each access object.
  • The popularity update module then updates the popularity separately for each access type of each access object. Popularity updates that distinguish access types give a more detailed understanding of users' needs for access objects, allowing more reasonable storage optimization.
  • The popularity identification apparatus can also output popularity information.
  • The popularity identification apparatus can periodically send the latest popularity of each access object to the storage server, so that the storage server can perform storage optimization based on the popularity of each access object.
  • The popularity identification apparatus can also receive a popularity query request for querying the popularity of a target access object, determine the target access object according to the request, and output its current popularity information.
  • Hot and cold data can further be distinguished according to the popularity value, providing clearer reference information and an optimization basis for storage, push, and other services, and simplifying the operation of storage servers, push servers, and the like.
  • The popularity value of each access object can be compared with the first popularity threshold; if it is greater than or equal to the first popularity threshold, the access object is regarded as hot data.
  • The popularity value of each access object can also be compared with the second popularity threshold; if it is less than or equal to the second popularity threshold, the access object is regarded as cold data.
  • The first popularity threshold and the second popularity threshold may or may not be equal; if they are not equal, the first popularity threshold is greater than the second popularity threshold.
  • The first and second popularity thresholds may be preset, learned by the popularity identification apparatus through machine learning, or obtained according to a preset strategy.
  • The first popularity threshold can be determined as follows: first, based on the total number of access objects and a preset proportion, determine the number of access objects that makes up the preset proportion of the total, denoted N. Then sort the popularity values of the access objects in descending order, and use the N-th popularity value as the first popularity threshold.
  • The first popularity threshold can also be determined as follows. First, sort the popularity values of the access objects in descending order. Then compute the sum of all popularity values, denoted $K$. Next, accumulate the sorted popularity values from front to back: accumulating the first value gives $L_1$, accumulating the first and second values gives $L_2$, and accumulating up to the $i$-th value gives $L_i$.
  • If $L_{N-1}$ does not satisfy the preset proportion condition but $L_N$ does, the $N$-th popularity value is used as the first popularity threshold. Satisfying the preset proportion condition may mean, for example, that $L_N / K$ is greater than or equal to the preset proportion.
  • The first popularity threshold determined by either of the two methods above can also serve as the second popularity threshold. Alternatively, when the two thresholds are not equal, the second popularity threshold can be determined by either method using a different preset proportion. Access objects can then be classified as hot or cold according to the first popularity threshold and/or the second popularity threshold.
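Both threshold rules and the resulting classification can be sketched in a few lines. The function names and sample values are hypothetical, and rounding N up with `math.ceil` in the count-based rule is an assumption, since the text does not say how a fractional N is handled.

```python
import math


def threshold_by_count(values: list[int], ratio: float) -> int:
    """Method 1: N = ceil(ratio * number of objects); return the N-th largest value."""
    ranked = sorted(values, reverse=True)
    n = min(len(ranked), max(1, math.ceil(ratio * len(ranked))))
    return ranked[n - 1]


def threshold_by_share(values: list[int], ratio: float) -> int:
    """Method 2: return the N-th value, where N is the first index with L_N / K >= ratio."""
    ranked = sorted(values, reverse=True)
    k = sum(ranked)   # K: total popularity
    running = 0       # L_i: cumulative popularity of the top i values
    for value in ranked:
        running += value
        if running >= ratio * k:
            return value
    return ranked[-1]


def classify(value: int, hot_threshold: int, cold_threshold: int) -> str:
    """Hot if >= first threshold, cold if <= second threshold, otherwise neither."""
    if value >= hot_threshold:
        return "hot"
    if value <= cold_threshold:
        return "cold"
    return "unclassified"


values = [40, 25, 10, 8, 7, 5, 3, 2]
hot_t = threshold_by_count(values, 0.25)   # N = ceil(0.25 * 8) = 2 -> 25
cold_t = threshold_by_share(values, 0.95)  # top six values first reach 95% of K -> 5
print([classify(v, hot_t, cold_t) for v in values])
# ['hot', 'hot', 'unclassified', 'unclassified', 'unclassified', 'cold', 'cold', 'cold']
```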
  • Because the popularity value of a parent node is greater than or equal to that of each of its child nodes, if a leaf node is hot data, then its parent node, its parent's parent node, and so on up to the root directory are all hot data.
  • directory 00 contains subdirectory 10, subdirectory 11, and subdirectory 12; subdirectory 10 contains subdirectory 20, and subdirectory 20 contains file 30, file 31 and file 32; subdirectory 11 contains file 21 and subdirectory 22, and subdirectory 22 contains file 33 and file 34.
  • file 33 can be regarded as a data block 40, and the data block 40 is split into sub-directories.
  • Block 50 and sub-block 51, and sub-block 50 is divided into sub-block 60 and sub-block 61, sub-block 51 is divided into sub-block 62 and sub-block 63; sub-directory 12 contains sub-directory 23 and sub-directory 24, sub-directory 24 Contains file 35 and file 36.
  • file 30, file 31, file 32, sub-block 60, sub-block 61, sub-block 62, sub-block 63, file 34, file 35, and file 36 are leaf nodes.
  • if sub-block 60 is hot data, its storage path is directory 00 - subdirectory 11 - subdirectory 22 - file 33 - block 40 - sub-block 50 - sub-block 60. Directory 00, subdirectory 11, subdirectory 22, file 33, block 40, and sub-block 50 on this path are therefore also hot data.
  • similarly, if file 35 is hot data, its storage path is directory 00 - subdirectory 12 - subdirectory 24 - file 35. Directory 00, subdirectory 12, and subdirectory 24 on this path are therefore also hot data.
  • if the popularity value of an access object only ever accumulates, then even infrequently accessed data will see its access count grow over time. Its popularity value would keep increasing, which hinders distinguishing hot data from cold data. Therefore, the popularity value of each access object can be periodically attenuated so that the popularity value of cold data does not grow indefinitely.
  • for example, the popularity value of each access type of each access object can be periodically multiplied by an attenuation coefficient α, where 0 < α < 1, to reduce its popularity value.
  • for example, the popularity value of each access type of each access object is multiplied by the attenuation coefficient every 30 minutes to perform popularity attenuation. Suppose the read-operation popularity value of folder 1 is 30, and folder 1 contains file A and file B. The read-operation popularity value of file A is 20, of which block 1 of file A accounts for 15 and block 2 of file A accounts for 5. The read-operation popularity value of file B is 10.
  • with an attenuation coefficient of 0.5, the popularity value of block 1 of file A becomes 7.5 and that of block 2 of file A becomes 2.5. To simplify calculation, in a possible implementation, these values can be rounded. However, if they are always rounded up or always rounded down, the sum of the popularity values of block 1 and block 2 of file A may no longer equal the popularity value of file A. For example, rounding both up gives 8 + 3 = 11, whereas file A decays to 10. To keep the popularity value of a parent node equal or approximately equal to the sum of the popularity values of all its child nodes, a non-integer result can be rounded up with probability α and rounded down with probability 1 - α.
  • in some approaches, data popularity information is recorded in the module that stores metadata. Because that metadata has its own storage space with sufficient storage resources, the storage occupied by popularity information need not be considered there.
  • in the embodiments of the present application, however, the size of the popularity information must be considered. As time passes, the number of access objects grows, and so does the popularity information, which may lead to insufficient storage space for the popularity information.
  • the embodiments of the present application therefore provide a pruning solution. It controls the storage space occupied by the popularity information and avoids running out of storage space because popularity information only grows and never shrinks.
  • Figure 6 exemplarily provides a schematic diagram of pruning.
  • for example, the preset threshold is set to 0; that is, when a popularity value decays to 0, its popularity information is deleted.
  • the popularity information stored by the popularity update module is shown in (a) of Figure 6:
    • the read-operation popularity value of directory 00 is 6.
    • the read-operation popularity values of subdirectory 11 and subdirectory 12 under directory 00 are 2 and 4 respectively.
    • the read-operation popularity value of file 21 under subdirectory 11 is 2.
    • the read-operation popularity values of files 22, 23, 24, and 25 under subdirectory 12 are all 1.
  • assume the attenuation coefficient α is 0.5.
  • after one attenuation, the read-operation popularity value of directory 00 is 3. Those of subdirectory 11 and subdirectory 12 are 1 and 2 respectively, and that of file 21 under subdirectory 11 is 1.
  • for files 22, 23, 24, and 25 under subdirectory 12, each attenuated read-operation popularity value of 0.5 becomes 1 with probability α (i.e., 50%) and 0 with probability 1 - α. Suppose the outcome is that the read-operation popularity values of files 22, 23, 24, and 25 are 1, 0, 1, and 0 respectively. Since the read-operation popularity values of files 23 and 25 have decayed to 0, their popularity information is deleted; that is, the read-operation popularity information of files 23 and 25 is pruned, as shown in (b) of Figure 6.
  • in this way, the growth of popularity information is suppressed, which helps avoid insufficient storage space caused by excessive popularity information.
  • popularity information whose popularity value is greater than or equal to the preset threshold can also be deleted. For example, when the storage space occupied by popularity information reaches the maximum allowed storage space, pruning can be performed immediately in one of two ways:
    • delete the one or more pieces of popularity information with the lowest popularity values; or
    • delete all popularity information whose popularity value differs from the preset threshold by no more than a preset range.
  • for example, popularity information with a popularity value of 0 is deleted during normal pruning. When the storage space occupied by popularity information reaches the maximum allowed storage space, all popularity information with a popularity value less than or equal to 1 is deleted.
  • an embodiment of the present application further provides a popularity identification device for implementing the above method embodiments.
  • the device may include modules/units that execute any of the possible implementation methods in the above method embodiments; these modules/units may be implemented by hardware, or may be implemented by hardware executing corresponding software.
  • the device may include: a collection module 701 and a popularity update module 702.
  • the collection module 701 is used to obtain access requests from applications, determine the access objects of the access requests, and count the access frequency of each access object.
  • the popularity update module 702 is used to synchronously update, according to the storage path and the access frequency of each access object, the popularity of the access object and the popularity of each node in the parent-node direction of the access object in the storage path.
  • the popularity of the parent node is the sum of the popularity of each child node under the parent node.
  • the collection module 701 is further configured to determine the access type for the access object according to the access request. When counting the access frequency of each access object, the collection module 701 is specifically configured to count the access frequency of the same access type for each access object.
  • the access object includes a file directory, a file, or a data block in a file.
  • when determining the access object of the access request, the collection module 701 is specifically configured to determine the requested file according to the object identifier in the access request. It then determines, according to the offset and length in the access request, the one or more blocks of the file in which the access object is located. The popularity update module 702 is specifically configured to synchronously update the popularity of the one or more blocks, the popularity of the file, and the popularity of each node in the parent-node direction in the storage path of the file.
  • the device may further include a popularity attenuation module 703, configured to periodically attenuate the popularity of each access object. If the popularity of a first access object decays to less than or equal to a preset threshold, the module deletes the popularity of the first access object.
  • when attenuating the popularity of each access object, the popularity attenuation module 703 is specifically configured to multiply the popularity of each access object by an attenuation coefficient. If the product is not an integer, the module rounds it down with probability 1 minus the attenuation coefficient and rounds it up with probability equal to the attenuation coefficient.
  • when counting the access frequency of each access object, the collection module 701 is specifically configured to count the access frequency of each access object within a preset interval. The preset interval is any one of the following: a preset time interval, a preset amount of traffic, or a preset number of access requests.
  • the device may further include a classification module 704. This module is configured to determine whether an access object is hot data based on its popularity and a first popularity threshold, and/or to determine whether it is cold data based on its popularity and a second popularity threshold.
  • the classification module 704 is further configured to sort the popularity of all stored access objects in descending order and use the popularity of the N-th access object as the first popularity threshold. N satisfies one of the following conditions:
    • N divided by the number of all access objects meets a preset proportion condition; or
    • the sum of the popularity of the first N access objects, divided by the sum of the popularity of all access objects, meets the preset proportion condition.
  • the device may further include a transceiver module (not shown in the figure). It is configured to receive a popularity query request, which is used to query the popularity of a target access object, and to output the popularity of the target access object.
  • an embodiment of the present application further provides a computer device.
  • the computer device includes a processor 801 as shown in Figure 8, and a communication interface 802 connected to the processor 801.
  • the processor 801 may be a general processor, a microprocessor, an application specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic device, or one or more integrated circuits used to control the execution of the program of this application, etc.
  • a general-purpose processor may be a microprocessor or any conventional processor, etc.
  • Communication interface 802 is used to communicate with other devices. It may be, for example, a PCI bus interface, Ethernet, a radio access network (RAN), or a wireless local area network (WLAN).
  • the processor 801 is configured to call the communication interface 802 to perform receiving and/or sending functions, and to perform the method as described in any of the previous possible implementations.
  • the computer device may also include a memory 803 and a communication bus 804.
  • the memory 803 is used to store program instructions and/or data, so that the processor 801 calls the instructions and/or data stored in the memory 803 to implement the above functions of the processor 801.
  • Memory 803 may be any of the following, without limitation:
    • a read-only memory (ROM) or another type of static storage device capable of storing static information and instructions;
    • a random access memory (RAM) or another type of dynamic storage device capable of storing information and instructions;
    • an electrically erasable programmable read-only memory (EEPROM); or
    • any other medium that can carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer.
  • the memory 803 may exist independently, for example as an off-chip memory, and be connected to the processor 801 through the communication bus 804. The memory 803 may also be integrated with the processor 801.
  • Communication bus 804 may include a path for communicating information between the above-described components.
  • the computer device may communicate with the storage structure through a network, or the computer device may also include a storage structure (not shown in the figure).
  • the storage structure includes one or more memories used to store the objects accessed by access requests. A memory in the storage structure can be a disk, a solid state disk or solid state drive (SSD), a storage-class memory (SCM), or the like.
  • the processor 801 can perform the following steps through the communication interface 802:
    • obtaining an access request from an application program and determining the access object of the access request;
    • counting the access frequency of each access object; and
    • according to the storage path and the access frequency of each access object, synchronously updating the popularity of the access object and the popularity of each node in the parent-node direction of the access object in the storage path.
  • the popularity of a parent node is the sum of the popularity of all child nodes under the parent node.
  • the processor 801 is further configured to determine the access type for the access object according to the access request. When counting the access frequency of each access object, the processor 801 is specifically configured to count the access frequency of the same access type for each access object.
  • the access object includes a file directory, a file, or a data block in a file.
  • when determining the access object of the access request, the processor 801 is specifically configured to determine the requested file according to the object identifier in the access request. It then determines, according to the offset and length in the access request, the one or more blocks of the file in which the access object is located.
  • when synchronously updating the popularity of the access object and of each node in its parent-node direction (according to the storage path and access frequency of each access object), the processor 801 is specifically configured as follows. According to the storage path and access frequency of the one or more blocks, it synchronously updates the popularity of the one or more blocks, the popularity of the file, and the popularity of each node in the parent-node direction in the storage path of the file.
  • the processor 801 may also be configured to periodically attenuate the popularity of each access object. If the popularity of a first access object decays to less than or equal to a preset threshold, the processor deletes the popularity of the first access object.
  • when attenuating the popularity of each access object, the processor 801 is specifically configured to multiply the popularity of each access object by the attenuation coefficient. If the product is not an integer, the processor rounds it down with probability 1 minus the attenuation coefficient and rounds it up with probability equal to the attenuation coefficient.
  • when counting the access frequency of each access object, the processor 801 is specifically configured to count the access frequency of each access object within a preset interval. The preset interval is any one of the following: a preset time interval, a preset amount of traffic, or a preset number of access requests.
  • the processor 801 may also be configured to determine whether an access object is hot data based on its popularity and a first popularity threshold, and/or to determine whether it is cold data based on its popularity and a second popularity threshold.
  • the processor 801 may also be configured to sort the stored popularity of all access objects in descending order and use the popularity of the N-th access object as the first popularity threshold. N satisfies one of the following conditions:
    • N divided by the number of all access objects meets a preset proportion condition; or
    • the sum of the popularity of the first N access objects, divided by the sum of the popularity of all access objects, meets the preset proportion condition.
  • the processor 801 can also execute through the communication interface 802: receive a popularity query request, where the request is used to request to query the popularity of the target access object; and output the popularity of the target access object.
  • embodiments of the present application also provide a computer-readable storage medium.
  • Computer-readable instructions are stored in the computer-readable storage medium.
  • when the computer-readable instructions are run on a computer, the steps in the above method are executed.
  • embodiments of the present application provide a computer program product containing instructions, which when run on a computer causes the steps in the above method to be executed.
  • embodiments of the present application may be provided as methods, systems, or computer program products. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment that combines software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
  • These computer program instructions may also be stored in a computer-readable memory that can direct a computer or another programmable data processing device to operate in a particular manner. The instructions stored in the computer-readable memory then produce an article of manufacture that includes an instruction apparatus. The instruction apparatus implements the functions specified in one or more processes of the flowchart and/or one or more blocks of the block diagram.
  • These computer program instructions may also be loaded onto a computer or another programmable data processing device, so that a series of operating steps is performed on it to produce computer-implemented processing. The instructions executed on the computer or other programmable device thus provide steps for implementing the functions specified in one or more processes of the flowchart and/or one or more blocks of the block diagram.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Disclosed in the present application are a popularity identification method and apparatus in a file system, and a computer device, which are applied to the technical field of computers. The method comprises: acquiring an access request from an application program, and determining access objects and access types for the access request; counting the access frequency for the same access type of each access object; and according to a storage path of each access object and the access frequency for the same access type of each access object, synchronously updating the popularity of the access object and the popularity of each node in the direction of a parent node in the storage path where the access object is located, wherein the popularity of the parent node is the sum of the popularity of each child node under the parent node. In the embodiments of the present application, not only the popularity of an access object is counted and updated, but the popularity of a directory is also counted and updated, thereby facilitating the realization of storage analysis and optimization at a directory level; and the popularity of the access object and the popularity of a parent node of the access object are also synchronously updated, thereby solving the problem of update lag of the popularity of an upper-layer directory.

Description

Popularity identification method and apparatus in file system, and computer device
Technical Field
The present application relates to the field of computer technology, and in particular to a popularity identification method and apparatus in a file system, and a computer device.
Background
According to the principle of temporal locality, data that is being accessed is likely to be accessed again in the near future, which forms data hotspots. Identifying hot and cold data is very important for optimizing a storage system. It is the basis for hot/cold data tiering, data placement, and data migration, and it is indispensable for a storage system to achieve high cost-effectiveness.
Hot/cold data identification in traditional storage systems is mainly based on data blocks. However, as unstructured data grows, the number of files to be identified in a storage system reaches the hundreds of millions, and file sizes reach the gigabyte level. In addition to file popularity, directory popularity sometimes needs to be recorded. Directory popularity helps users perceive how hot or cold their data is more comprehensively, supports data analysis and mining, and enables directory-level data flow. The popularity of a directory is the sum of the popularity of the files and subdirectories under it. In the traditional solution, when a user requests the popularity of a directory, all files and subdirectories under that directory are recursively traversed and summed to obtain it.
However, when the number of files and directories is large, recursive traversal becomes very time-consuming. Moreover, file and directory popularity is updated by traversing layer by layer from the bottom up. As a result, updates to upper-level directories lag behind and cannot meet real-time requirements.
Summary
This application provides a popularity identification method and apparatus in a file system, and a computer device. They synchronously update the popularity of files and directories and avoid lagging updates of upper-level directories.
In a first aspect, this application provides a popularity identification method in a file system. The method includes:
- obtaining an access request from an application and determining the access object of the access request;
- counting the access frequency of each access object; and
- according to the storage path and the access frequency of each access object, synchronously updating the popularity of the access object and the popularity of each node in the parent-node direction of the access object in the storage path, where the popularity of a parent node is the sum of the popularity of all child nodes under the parent node.

In the embodiments of this application, the popularity of both access objects and directories is counted and updated, which facilitates directory-level storage analysis and optimization. In addition, the popularity of an access object and of every node in its parent-node direction is updated synchronously. When the popularity of any such node is later queried, the updated popularity data can be returned to the user directly, without collecting popularity information on demand.

This solves the problem of lagging popularity updates for upper-level directories: upper-level directories in the directory tree do not lag behind, and file and directory popularity stay synchronized in time. When directory popularity needs to be output, stale popularity information is never output because of lagging updates, and directory popularity does not have to be updated on demand at output time. For large file systems, this meets real-time requirements.
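A minimal Python sketch may help make the synchronous update concrete. For illustration only, it assumes that popularity is held in an in-memory dictionary keyed by slash-separated storage paths. `PopularityIndex`, `record`, `query`, and `self_and_ancestors` are hypothetical names, not an interface defined by this application. Each recorded access frequency is added, in one pass, to the access object and to every node in its parent-node direction. A later query for a directory is then a single lookup instead of a recursive traversal.

```python
from collections import defaultdict
from typing import Dict, List


def self_and_ancestors(path: str) -> List[str]:
    """Return the root, every intermediate directory, and the object itself,
    ordered from the root down (e.g. "/a/b" -> ["/", "/a", "/a/b"])."""
    parts = [p for p in path.split("/") if p]
    return ["/"] + ["/" + "/".join(parts[:i]) for i in range(1, len(parts) + 1)]


class PopularityIndex:
    """Hypothetical in-memory index mapping a node path to its popularity."""

    def __init__(self) -> None:
        self._popularity: Dict[str, int] = defaultdict(int)

    def record(self, path: str, frequency: int) -> None:
        # Update the access object and every node in its parent-node
        # direction together, so directory popularity never lags behind.
        for node in self_and_ancestors(path):
            self._popularity[node] += frequency

    def query(self, node: str) -> int:
        # Direct lookup: no recursive traversal of the directory at query time.
        return self._popularity.get(node, 0)
```

For example, recording 3 accesses to `/dir00/sub11/file21` and 2 accesses to `/dir00/sub12/sub24/file35` leaves `/dir00` at 5 and `/dir00/sub11` at 3. A production implementation would also need the per-path update to be atomic under concurrent requests, which this sketch does not address.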
In a possible implementation, the method further includes determining, according to the access request, the access type for the access object. Counting the access frequency of each access object then includes counting the access frequency of the same access type for each access object. For the same access object, popularity can be further distinguished by access type, for example by counting the popularity of read operations and of write operations separately. This gives a more detailed picture of user demand for the access object and enables more reasonable storage optimization.
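Per-type counting can be sketched as below. The sketch assumes each request has already been reduced to an access object and an access type string such as "read" or "write". The `AccessRequest` record and the `count_by_type` helper are illustrative assumptions, not part of the described method.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class AccessRequest:
    obj: str          # access object, e.g. a file path (assumed field)
    access_type: str  # e.g. "read" or "write" (assumed field)


def count_by_type(requests: Iterable[AccessRequest]) -> Dict[str, Dict[str, int]]:
    """Access frequency per access object, kept separately per access type."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for req in requests:
        counts[req.obj][req.access_type] += 1
    return {obj: dict(per_type) for obj, per_type in counts.items()}
```

Keeping the types separate means a later update can feed read and write frequencies into separate popularity values for the same node.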
In a possible implementation, the access object includes a file directory, a file, or a data block in a file.
In a possible implementation, determining the access object of the access request includes:
- determining the requested file according to the object identifier in the access request; and
- determining, according to the offset and length in the access request, the one or more blocks of the file in which the access object is located.

Synchronously updating the popularity of each access object and of each node in its parent-node direction (according to the storage path and access frequency of each access object) then includes: synchronously updating, according to the storage path and access frequency of the one or more blocks, the popularity of the one or more blocks, the popularity of the file, and the popularity of each node in the parent-node direction in the storage path of the file.

In this implementation, a large file can be stored in blocks. Popularity is calculated separately for each block, and the storage of different blocks is optimized according to their popularity, without having to cache the entire file.
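Mapping an offset and length to blocks is simple arithmetic if the file is split into fixed-size blocks, as this sketch assumes. The text above does not specify a block size or a naming scheme, so `block_size` and the `#block` suffix are hypothetical. Naming each block as a child path of its file means that a path-based update touches the block, the file, and every ancestor directory.

```python
from typing import List


def blocks_for_range(offset: int, length: int, block_size: int) -> List[int]:
    """Indices of the fixed-size blocks overlapped by [offset, offset + length)."""
    if offset < 0 or length < 0 or block_size <= 0:
        raise ValueError("offset and length must be >= 0 and block_size > 0")
    if length == 0:
        return []
    first = offset // block_size
    last = (offset + length - 1) // block_size
    return list(range(first, last + 1))


def block_nodes(file_path: str, offset: int, length: int,
                block_size: int) -> List[str]:
    """Name each touched block as a child node of its file (hypothetical naming)."""
    return [f"{file_path}/#block{i}"
            for i in blocks_for_range(offset, length, block_size)]
```

For example, a 200-byte read at offset 4000 with 4096-byte blocks touches blocks 0 and 1, so both block nodes (and, through them, the file and its directories) gain one access.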
In a possible implementation, the method further includes periodically attenuating the popularity of each access object. If the popularity of a first access object decays to less than or equal to a preset threshold, the popularity of the first access object is deleted.

In the embodiments of this application, no separate metadata needs to be set up, so there is no independent metadata storage space. The size of the popularity information and of its storage space must therefore be considered. As time passes, access objects and their popularity information gradually increase, which may exhaust the storage space for popularity information. Deleting the popularity information of less popular access objects keeps the storage space occupied by popularity information under control.
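The attenuate-then-prune step can be sketched in Python as follows. The sketch replays the read-operation example of Figure 6 with attenuation coefficient 0.5 and preset threshold 0. The coin-flip callable, the entry-count model of storage space, and the function names are assumptions for illustration. `prune_for_capacity` models the storage-pressure case described earlier, where all popularity information with a value less than or equal to 1 is deleted once the allowed space is reached.

```python
import math
import random
from typing import Callable, Dict


def decay_and_prune(popularity: Dict[str, int], alpha: float, threshold: int,
                    round_up: Callable[[], bool]) -> Dict[str, int]:
    """Attenuate every popularity value and drop those <= threshold.

    round_up() decides whether a non-integer product is rounded up; the text
    above uses probability alpha, e.g. lambda: rng.random() < alpha.
    """
    kept: Dict[str, int] = {}
    for node, value in popularity.items():
        scaled = value * alpha
        decayed = math.floor(scaled)
        if scaled != decayed and round_up():
            decayed += 1
        if decayed > threshold:  # entries that decay to <= threshold are pruned
            kept[node] = decayed
    return kept


def prune_for_capacity(popularity: Dict[str, int], threshold: int,
                       max_entries: int, preset_range: int) -> Dict[str, int]:
    """If the entries fill the allowed space (modelled as an entry count),
    also delete entries within preset_range of the threshold."""
    if len(popularity) < max_entries:
        return dict(popularity)
    return {node: value for node, value in popularity.items()
            if value - threshold > preset_range}


# Read-operation popularity from (a) of Figure 6.
FIGURE_6A = {
    "directory 00": 6, "subdirectory 11": 2, "subdirectory 12": 4,
    "file 21": 2, "file 22": 1, "file 23": 1, "file 24": 1, "file 25": 1,
}

if __name__ == "__main__":
    rng = random.Random()
    print(decay_and_prune(FIGURE_6A, 0.5, 0, lambda: rng.random() < 0.5))
```

Passing a coin that yields up, down, up, down for files 22 to 25 reproduces (b) of Figure 6: files 23 and 25 decay to 0 and are removed. With a threshold of 0 and a preset range of 1, `prune_for_capacity` deletes every entry whose value is at most 1.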
In a possible implementation, attenuating the popularity of each access object includes multiplying the popularity of each access object by an attenuation coefficient. If the product is not an integer, it is rounded down with probability 1 minus the attenuation coefficient and rounded up with probability equal to the attenuation coefficient. If all values were rounded up, or all rounded down, the sum of the popularity of the child nodes would diverge from the popularity of the parent node, which hinders subsequent popularity-based storage optimization of access objects. The above method keeps the sum of the popularity of the child nodes equal or approximately equal to the popularity of the parent node.
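The rounding rule can be written directly, as in the sketch below; `attenuate` and `mean_attenuated` are hypothetical helpers. Averaging many runs shows that for α = 0.5 the expected result equals the exact product. For example, 15 decays to 7.5 on average, which is what keeps child sums close to the parent.

```python
import math
import random


def attenuate(value: int, alpha: float, rng: random.Random) -> int:
    """Multiply by alpha; round a non-integer product up with probability
    alpha and down with probability 1 - alpha, as described above."""
    scaled = value * alpha
    lower = math.floor(scaled)
    if scaled == lower:
        return lower
    return lower + 1 if rng.random() < alpha else lower


def mean_attenuated(value: int, alpha: float, trials: int, seed: int = 0) -> float:
    """Average of many attenuations, for comparison with value * alpha."""
    rng = random.Random(seed)
    return sum(attenuate(value, alpha, rng) for _ in range(trials)) / trials
```

Rounding up with probability α is unbiased only when the fractional part of the product equals α, which always holds for α = 0.5. With α = 0.3, for example, 5 × 0.3 = 1.5 has an expected rounded value of 1.3 rather than 1.5. Rounding up with probability equal to the fractional part (stochastic rounding) is the usual unbiased alternative.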
In a possible implementation, counting the access frequency of each access object includes counting the access frequency of each access object within a preset interval. The preset interval is any one of the following: a preset time interval, a preset amount of traffic, or a preset number of access requests. If popularity were updated once per access request, frequent update operations would consume excessive bandwidth when the access volume is large. Counting accesses over a preset interval and then updating popularity saves bandwidth resources.
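Interval-based counting amounts to buffering counts and flushing them in one batch. The sketch below assumes a `flush` callback that would apply the batch to the popularity store. `IntervalCounter` and its parameters are hypothetical, and for convenience it lets any of the three interval kinds be configured.

```python
import time
from collections import Counter
from typing import Callable, Optional


class IntervalCounter:
    """Accumulate access frequencies and hand them to `flush` once the preset
    interval (number of requests, bytes of traffic, or seconds) is reached."""

    def __init__(self, flush: Callable[[Counter], None],
                 max_requests: Optional[int] = None,
                 max_bytes: Optional[int] = None,
                 max_seconds: Optional[float] = None,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._flush = flush
        self._max_requests = max_requests
        self._max_bytes = max_bytes
        self._max_seconds = max_seconds
        self._clock = clock
        self._reset()

    def _reset(self) -> None:
        self.counts: Counter = Counter()  # (access object, access type) -> frequency
        self.requests = 0
        self.bytes = 0
        self._start = self._clock()

    def _interval_reached(self) -> bool:
        if self._max_requests is not None and self.requests >= self._max_requests:
            return True
        if self._max_bytes is not None and self.bytes >= self._max_bytes:
            return True
        return (self._max_seconds is not None
                and self._clock() - self._start >= self._max_seconds)

    def record(self, obj: str, access_type: str, nbytes: int) -> None:
        self.counts[(obj, access_type)] += 1
        self.requests += 1
        self.bytes += nbytes
        # The time limit is only checked when a request arrives; a real
        # implementation might also flush from a timer.
        if self._interval_reached():
            self.flush_now()

    def flush_now(self) -> None:
        if self.counts:
            self._flush(self.counts)
        self._reset()
```

With `max_requests=3`, the third recorded request triggers one flush containing all three counts, instead of three separate popularity updates.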
In a possible implementation, the method further includes: determining, according to the heat of an accessed object and a first heat threshold, whether the accessed object is hot data; and/or determining, according to the heat of the accessed object and a second heat threshold, whether the accessed object is cold data. Grading data as hot or cold facilitates subsequent storage optimization.
In a possible implementation, the method further includes sorting the stored heat of all accessed objects in descending order and using the heat of the N-th accessed object as the first heat threshold. N satisfies either of the following:

- N divided by the number of all accessed objects satisfies a preset proportion condition; or
- the sum of the heat of the first N accessed objects, divided by the sum of the heat of all accessed objects, satisfies a preset proportion condition.
In a possible implementation, the method further includes: receiving a heat query request, which requests the heat of a target accessed object; and outputting the heat of the target accessed object.
According to a second aspect, this application provides a heat identification apparatus in a file system. The apparatus includes modules/units that perform the method in the first aspect or any possible implementation of the first aspect. These modules/units may be implemented by hardware, or by hardware executing corresponding software.
For example, the apparatus includes:

- a collection module, configured to obtain access requests from an application, determine the accessed object of each access request, and count the access frequency of each accessed object; and
- a heat update module, configured to synchronously update, according to the storage path and the access frequency of each accessed object, the heat of the accessed object and the heat of every node on its storage path in the direction of the parent node. The heat of a parent node is the sum of the heat of all child nodes under it.
According to a third aspect, this application provides a computer device including a memory and a processor. The memory stores a computer program, and the processor is configured to invoke that program to perform the method according to the first aspect or any implementation of the first aspect.
According to a fourth aspect, this application provides a computer-readable storage medium storing instructions. When the instructions are run on a computer, they cause the computer to perform the method according to the first aspect or any implementation of the first aspect.
According to a fifth aspect, this application provides a computer program product containing instructions. When run on a computer, the product causes the method according to the first aspect or any implementation of the first aspect to be performed.
For the technical effects achievable by any possible implementation of the second to fifth aspects, refer to the technical effects of the corresponding implementation of the first aspect; details are not repeated here.
Brief Description of Drawings
Figure 1 is a schematic diagram of Qumulo file directory metadata according to an embodiment of this application;
Figure 2 is a schematic diagram of the Qumulo file directory metadata update process according to an embodiment of this application;
Figure 3 is a schematic flowchart of the heat identification method in a file system according to an embodiment of this application;
Figure 4 is a schematic diagram of heat information growth during a heat update according to an embodiment of this application;
Figure 5 is a schematic diagram of cold and hot data classification according to an embodiment of this application;
Figure 6 is a schematic diagram of heat information pruning during a heat update according to an embodiment of this application;
Figure 7 is a schematic structural diagram of a heat identification apparatus according to an embodiment of this application;
Figure 8 is a schematic structural diagram of a computer device according to an embodiment of this application.
Detailed Description of Embodiments
The general parallel file system (GPFS) provides file heat identification, comprising a file heat calculation method and a heat update method. Heat is calculated as an exponential moving average of the number of file accesses. The heat of a file that is not accessed within a period decays by a percentage. The valid range of the decay percentage is 0 to 100%; the default is 10%, and users can customize the value. When a user accesses a file, the file's access time (atime) is automatically updated and the file's heat increases accordingly. If atime updates are suppressed, the file heat calculation may be adversely affected. The dependency is therefore: file accessed -> atime updated -> file heat increased.
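To make the percentage-based decay concrete, the following is an illustrative model of an exponential moving average over per-period access counts. It is not GPFS code and does not use any GPFS interface. The function and parameter names are hypothetical, and the 10% default simply mirrors the default stated above.

```python
def ema_heat(prev_heat: float, accesses_in_period: int,
             loss_percent: float = 10.0) -> float:
    """Exponential moving average of per-period access counts.

    With no accesses in a period, the heat simply loses `loss_percent`
    percent of its value."""
    if not 0.0 <= loss_percent <= 100.0:
        raise ValueError("loss_percent must be between 0 and 100")
    alpha = loss_percent / 100.0
    return (1.0 - alpha) * prev_heat + alpha * accesses_in_period


idle = ema_heat(100.0, 0)      # an unaccessed file decays by 10%
busy = ema_heat(100.0, 300)    # accesses pull the average upward
```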
However, the above method identifies only the heat of files and does not track the heat of file directories. Directory heat gives users a fuller view of how hot or cold their data is. It also supports data analysis and mining and enables directory-level data movement. For these reasons, the distributed file system developed by Qumulo (Qumulo File Fabric, QF2) has built-in real-time aggregation and real-time analytics of file metadata.
Qumulo file directory metadata may be as shown in Figure 1. Rank indicates the level at which a directory or file is located, and size indicates the size of the directory or file itself. The two values in square brackets are the unreconciled value and the reconciled value. A directory's unreconciled value equals the sum of the unreconciled values of the directory itself and all files and subdirectories under it; its reconciled value is the corresponding sum of reconciled values. The update of file F1's size proceeds as follows:

1. The unreconciled value of F1 is modified (step 1 in Figure 2).
2. F1 and its storage path are added to the dirty list (step 2 in Figure 2).
3. A background task asynchronously propagates the update upward along F1's storage path, first modifying F1's reconciled value (step 3 in Figure 2).
4. The task then modifies the unreconciled value of directory D1, which contains F1 and can also be called the parent node of F1 (step 4 in Figure 2).
5. The task removes F1's storage path from the dirty list and adds the storage path of its parent node D1 (step 5 in Figure 2).

Objects in the dirty list are sorted by rank, and objects with larger rank values are processed first. Their unreconciled and reconciled values are updated before objects with smaller rank values are processed.
QF2 achieves high-performance real-time analytics for two reasons: (1) the analysis module is built into the QF2 file system and fully integrated with it; and (2) the QF2 file system is implemented on a B-tree (multi-way search tree). Real-time aggregation of metadata is achieved through two techniques: (1) aggregated metadata is updated promptly, rather than computed by traversal when a request arrives; and (2) updates propagate bottom-up while traversals proceed top-down.
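The rank-ordered reconciliation in steps 1 to 5 can be modelled with a priority queue. The sketch below is a simplified, assumption-based model, not Qumulo's data structures. Each node keeps a single unreconciled delta and a reconciled aggregate, node names are assumed unique (for example, full paths), and all names are hypothetical. It shows why an upper-level directory lags: after one reconciliation step, F1 is reconciled but D0 still holds its old value.

```python
import heapq
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Node:
    name: str                          # assumed unique, e.g. a full path
    rank: int                          # depth in the tree; root = 0
    parent: Optional["Node"] = None
    unreconciled: int = 0              # change not yet folded upward
    reconciled: int = 0                # aggregate as of the last reconcile


class DirtyList:
    """Simplified model: dirty nodes are reconciled asynchronously,
    largest rank (deepest) first, one level per step."""

    def __init__(self) -> None:
        self._heap: List[Tuple[int, str]] = []
        self._queued: Dict[str, Node] = {}

    def mark_dirty(self, node: Node) -> None:
        if node.name not in self._queued:
            self._queued[node.name] = node
            heapq.heappush(self._heap, (-node.rank, node.name))  # max rank first

    def reconcile_one(self) -> bool:
        """Process one dirty node; return False once the list is empty."""
        if not self._heap:
            return False
        _, name = heapq.heappop(self._heap)
        node = self._queued.pop(name)
        delta, node.unreconciled = node.unreconciled, 0
        node.reconciled += delta                       # step 3
        if node.parent is not None and delta:
            node.parent.unreconciled += delta          # step 4
            self.mark_dirty(node.parent)               # step 5: parent replaces child
        return True


def update_size(dirty: DirtyList, node: Node, delta: int) -> None:
    node.unreconciled += delta                         # step 1
    dirty.mark_dirty(node)                             # step 2


d0 = Node("D0", 0)
d1 = Node("D1", 1, d0)
f1 = Node("F1", 2, d1)
dirty = DirtyList()
update_size(dirty, f1, 5)
dirty.reconcile_one()
lagging = (f1.reconciled, d1.unreconciled, d0.reconciled)  # D0 not yet updated
while dirty.reconcile_one():
    pass
```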
However, the dirty list processes objects with larger rank values first. If users frequently access objects with large rank values, heat updates for upper-level directories in the directory tree lag behind, and file heat and directory heat fall out of sync. For large file systems, this cannot meet real-time requirements.
In view of this, embodiments of this application provide a heat identification method in a file system. The method updates files and directories synchronously, avoiding lagging updates of upper-level directories.
The method may be applied to a heat identification apparatus, which then performs the heat identification method. The apparatus may be deployed on a standalone server or on the same server as other systems, for example together with a storage system. The apparatus may include a collection module and a heat update module. Two deployments are possible:

- The collection module is deployed on the client, collects data there, and sends the collected data, or information derived from it, to the heat update module on the server. The heat update module updates the heat of accessed objects.
- No collection module is deployed on the client, and the entire apparatus resides on the server. After receiving a user's access request, the client forwards it to the heat identification apparatus on the server, which then computes and updates the heat.
Refer to Figure 3, which is a schematic flowchart of a heat identification method in a file system according to an embodiment of this application. As shown in the figure, the method may include the following steps:
Step 301: Obtain an access request from an application, and determine the accessed object of the access request.
In a possible implementation, step 301 may be performed by the collection module of the heat identification apparatus deployed on the client. The collection module may capture access requests through an application programming interface (API). A cache may also be configured for the collection module, which stores the captured access requests there.
In another possible implementation, no collection module is deployed on the client. Instead, the client sends access requests to the heat identification apparatus, which computes and updates heat based on them. In other words, in step 301 the collection module in the heat identification apparatus receives the access request and determines the accessed object from it.
An access request may include a data object identifier, an operation word, an offset, a length, and other information:

- The data object identifier indicates the file, file directory, or other object requested.
- The operation word indicates the type of operation the user performs on the accessed object, such as a read or a write.
- The offset and length matter because a user may access only part of a file's data. The offset indicates the start of the requested data, and the length indicates its size.

For example, if a file is 1 GB and the user needs the data from 0.5 GB to 0.6 GB, the offset in the access request is 0.5 GB and the length is 0.1 GB.
The collection module may determine the accessed object from the data object identifier alone, or from the data object identifier, offset, and length together. Accessed objects may include file directories, files, or data blocks within files.
In the embodiments of this application, the data of each file may or may not be stored in blocks. For example, files whose size reaches a preset threshold are divided into blocks, while smaller files are not. For a file that is not stored in blocks, the accessed file or file directory can be determined from the data object identifier alone. For a file stored in blocks, this step can determine the requested blocks from the access request, so that the heat of each block can be counted accurately later. For example, file A is 1 GB and is stored as 10 blocks covering 0 to 0.1 GB, 0.1 GB to 0.2 GB, 0.2 GB to 0.3 GB, ..., 0.9 GB to 1 GB. Suppose the offset in the access request is 0.55 GB and the length is 0.1 GB. The user is then requesting the data from 0.55 GB to 0.65 GB, which lies in the 6th and 7th blocks. The accessed objects are therefore the 6th and 7th blocks of file A.
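A sketch of how step 301 might map a request's offset and length to block-level accessed objects follows. The `AccessRequest` fields mirror the fields listed above. The class, block size, chunking threshold, and `#blockN` naming are all hypothetical. Sizes are in abstract units, so the 1 GB file in the example becomes 1000 units.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class AccessRequest:
    object_id: str      # data object identifier (file or directory)
    op: str             # operation word, e.g. "read" or "write"
    offset: int = 0     # start of the requested range
    length: int = 0     # size of the requested range


def access_objects(req: AccessRequest, file_size: Optional[int] = None,
                   block_size: int = 100, chunk_threshold: int = 500) -> List[str]:
    """Map a request to its accessed objects.

    Directories (file_size is None) and files smaller than chunk_threshold
    are not block-stored, so the object itself is the accessed object.
    Otherwise the blocks overlapping [offset, offset + length) are returned,
    numbered from 1 as in the text."""
    if file_size is None or file_size < chunk_threshold or req.length <= 0:
        return [req.object_id]
    end = min(req.offset + req.length, file_size)      # clamp to end of file
    first = req.offset // block_size
    last = (end - 1) // block_size
    return [f"{req.object_id}#block{i + 1}" for i in range(first, last + 1)]


# File A from the example: 1000 units standing in for 1 GB, blocks of 100 units.
blocks = access_objects(AccessRequest("A", "read", offset=550, length=100), file_size=1000)
whole_dir = access_objects(AccessRequest("C", "read"))
```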
Further, the time of each access request can be recorded, enabling more precise time-based analysis in later heat statistics. If the access request carries time information, the time is taken directly from the request. Otherwise, the time at which the request is obtained is recorded.
Step 302: Count the access frequency of each accessed object.
For example:

- If access request 1 reads the 6th and 7th blocks of file A, the read counts of the 6th block and of the 7th block of file A are each incremented by 1.
- If access request 2 writes the 6th block of file A, the write count of the 6th block of file A is incremented by 1.
- If access request 3 reads file B, the read count of file B is incremented by 1.
- If access request 4 reads directory C, the read count of directory C is incremented by 1.
Step 302 may be performed by the collection module, whether it is deployed on the client or on the server.
Optionally, when performing step 302, the collection module may count the access requests within a preset interval to obtain the access frequency of each accessed object. The preset interval may be a preset time period, a preset amount of traffic, or a preset number of access requests. At the end of each interval, the collection module sends the counts to the heat update module, which updates the heat of the accessed objects. The collection module then clears its counts and starts counting the next interval.
Each type of preset interval works as follows:

- Time period of 10 minutes: accesses to each accessed object are counted over all access requests within 10 minutes. When the 10 minutes elapse, the counts are sent to the heat update module and reset to zero, and counting restarts for the next 10 minutes.
- Traffic of 1 MB: accesses are counted over access requests whose total size does not exceed 1 MB. When a new request would push the cumulative traffic above 1 MB, the counts so far are sent to the heat update module and reset. The new request is then counted toward the next 1 MB.
- 50 access requests: accesses to the objects targeted by 50 access requests are counted and sent to the heat update module. The counts are then reset, and counting restarts for the next 50 access requests.
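Step 302 with a preset interval can be sketched as a small collector that counts accesses per (object, operation) pair and reports when the interval closes. This is a hypothetical API written for illustration. `report` stands in for sending counts to the heat update module, and `now` is a caller-supplied timestamp in seconds. Setting `max_requests=1` gives the extreme per-request case. The usage lines replay access requests 1 to 4 from the step 302 example, then show the traffic-based interval.

```python
from collections import Counter
from typing import Callable, Iterable, Optional


class IntervalCollector:
    """Counts accesses per (object, operation) and reports the counts once
    the preset interval closes, then starts a fresh interval."""

    def __init__(self, report: Callable[[Counter], None], *,
                 max_requests: Optional[int] = None,
                 max_bytes: Optional[int] = None,
                 max_seconds: Optional[float] = None) -> None:
        self.report = report
        self.max_requests = max_requests
        self.max_bytes = max_bytes
        self.max_seconds = max_seconds
        self._reset(start=0.0)

    def _reset(self, start: float) -> None:
        self.counts: Counter = Counter()
        self.requests = 0
        self.bytes = 0
        self.start = start

    def flush(self, now: float = 0.0) -> None:
        if self.counts:
            self.report(self.counts)   # hand this interval's counts onward
        self._reset(start=now)

    def record(self, objects: Iterable[str], op: str,
               nbytes: int = 0, now: float = 0.0) -> None:
        # Time interval: close it before counting a request that falls outside.
        if self.max_seconds is not None and now - self.start >= self.max_seconds:
            self.flush(now)
        # Traffic interval: the request that would overflow opens the next one.
        if (self.max_bytes is not None and self.requests
                and self.bytes + nbytes > self.max_bytes):
            self.flush(now)
        for obj in objects:
            self.counts[(obj, op)] += 1
        self.requests += 1
        self.bytes += nbytes
        # Request-count interval: report as soon as the preset number is reached.
        if self.max_requests is not None and self.requests >= self.max_requests:
            self.flush(now)


reports = []
collector = IntervalCollector(reports.append, max_requests=4)
collector.record(["A#block6", "A#block7"], "read")   # access request 1
collector.record(["A#block6"], "write")              # access request 2
collector.record(["B"], "read")                      # access request 3
collector.record(["C"], "read")                      # access request 4 -> report

traffic = []
by_bytes = IntervalCollector(traffic.append, max_bytes=1000)
by_bytes.record(["X"], "read", nbytes=600)
by_bytes.record(["Y"], "read", nbytes=600)   # would exceed 1000, so X is reported first
```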
The collection module counts the access requests within a preset interval before sending the result to the heat update module. This reduces the number of transmissions and the bandwidth consumed by heat updates. GPFS, by contrast, sends an update for every access request it obtains, which updates too frequently and consumes too much bandwidth.
In an extreme embodiment, however, the preset interval is a preset number of access requests and that number is 1. The collection module then reports to the heat update module once for every access request.
Step 303: Based on the storage path and the access frequency of each accessed object, synchronously update the heat of the accessed object and the heat of every node on its storage path in the direction of the parent node.
This step may be performed by the heat update module deployed on the server. The heat update module updates the stored heat of each accessed object according to the access frequency it receives for that object. For example, suppose the number of accesses to an object is used as its heat value. On receiving the access counts, the heat update module adds each new count to the stored count of the corresponding object. The resulting access count is the updated heat value.
The heat update module may receive access frequency information for a new accessed object, that is, one whose access count it has not stored before. In that case it generates heat information for the object. If it has no heat information for the object's parent node either, it generates heat information for the parent as well. The same applies to the parent's parent, and so on up to the root directory containing the object. Figure 4 illustrates this:

- Before the update, the stored heat information is as shown in (a) of Figure 4. It holds read-operation heat information for directory 00, for subdirectories 11 and 12 under directory 00, for file 21 under subdirectory 11, and for files 22 and 23 under subdirectory 12.
- After receiving the heat information of each accessed object from the collection module, the heat update module determines that read-operation heat information must be added for file 24 under subdirectory 21, as shown in (b) of Figure 4.
In the embodiments of this application, the heat of directories is recorded along with the heat of each accessed object. This facilitates later directory-level storage analysis and optimization. When the heat of an accessed object is updated, its storage path is also obtained, and the heat of every node on that path in the direction of the parent node is updated. The heat of a parent node is the sum of the heat values of its child nodes.

For example, take file F1 shown in Figure 2 as the accessed object. Its storage path /D0/D1/F1 is obtained: folder D1 is the parent node of file F1, and folder D0 is the parent node of folder D1. When the heat of file F1 is updated, the heat values of folders D1 and D0 are updated as well. The heat of folder D1 equals the sum of the heat of all files under D1, and the heat of folder D0 equals the sum of the heat of all files under D0. When a file is stored in blocks and the accessed object is a block, the parent node of the accessed object is the file. The file's heat is then the sum of the heat of all its blocks.
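The synchronous update in step 303 can be sketched as a tree of heat records in which one call updates an object and all of its ancestors. This is a minimal in-memory sketch. The class names are hypothetical, and modelling a block as a path component under its file is an assumption made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class HeatNode:
    heat: int = 0
    children: Dict[str, "HeatNode"] = field(default_factory=dict)


class HeatTree:
    """Heat records organized along storage paths.

    add() walks the object's path once and adds the same count to every
    node on it, so ancestors are updated in the same call as the object.
    When counts are added at leaves (files or blocks), each parent's heat
    equals the sum of its children's heat."""

    def __init__(self) -> None:
        self.root = HeatNode()        # stands for "/"

    @staticmethod
    def _parts(path: str) -> List[str]:
        return [p for p in path.split("/") if p]

    def add(self, path: str, count: int) -> None:
        node = self.root
        node.heat += count
        for part in self._parts(path):
            # Create heat records for objects/ancestors seen for the first time.
            node = node.children.setdefault(part, HeatNode())
            node.heat += count

    def heat(self, path: str) -> int:
        node = self.root
        for part in self._parts(path):
            if part not in node.children:
                return 0
            node = node.children[part]
        return node.heat


tree = HeatTree()
tree.add("/D0/D1/F1", 3)          # counts from one reporting interval
tree.add("/D0/D1/F2", 2)
tree.add("/D0/D2/F3/blk1", 4)     # a block modelled as a child of its file
```

Every update touches the whole path at once, so there is no background queue to drain. Keeping one such tree per access type (read, write) would give per-type heat.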
In the embodiments of this application, the heat of directories is counted and updated along with the heat of accessed objects. Compared with the method used by GPFS, this enables directory-level storage analysis and optimization.

QF2 can also record directory heat, but it always updates objects with larger rank values first. If users frequently access objects with large rank values, heat updates of upper-level directories lag behind, and file and directory heat fall out of sync. When directory heat must then be output, either stale heat information is returned or the directory heat must be updated on the spot. For large file systems, this cannot meet real-time requirements.

In the embodiments of this application, the heat of an accessed object and of its parent nodes is updated synchronously. This eliminates the lag in upper-level directory heat, so the latest heat information can be output promptly whenever directory heat is needed.
To analyze users' access needs in more detail, heat can also be counted separately for each access type on the same accessed object. The collection module determines both the accessed object and the access type, such as read or write, from the operation word in the request. It counts the access frequency separately for each access type of each accessed object. The heat update module then updates the heat separately for each access type of each accessed object. Distinguishing heat by access type gives a finer-grained picture of how users use an object, enabling more appropriate storage optimization.
The heat identification apparatus can also output heat information, providing reference information and a basis for optimization to services such as storage and push. For example, it can periodically send the latest heat of each accessed object to a storage server, which optimizes storage according to that heat. The apparatus can also receive a heat query request for a target accessed object. It then determines the target accessed object from the request and outputs that object's current heat information.
Further, once the heat of the accessed objects and their parent nodes is known, data can be classified as hot or cold according to heat values. This gives services such as storage and push clearer reference information and optimization guidance, and simplifies the operation of storage and push servers. The classification works as follows:

- The heat value of each accessed object is compared with a first heat threshold. If it is greater than or equal to the first heat threshold, the object is treated as hot data.
- The heat value of each accessed object is compared with a second heat threshold. If it is less than or equal to the second heat threshold, the object is treated as cold data.

The first and second heat thresholds may be equal or different; if different, the first heat threshold is greater than the second.
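The threshold comparison can be written as a small function. The label "warm" for objects between the two thresholds is a hypothetical addition, since the text defines only hot and cold data. When the thresholds are equal, this sketch resolves a tie in favour of "hot"; that choice is also an assumption.

```python
def classify(heat: int, first_threshold: int, second_threshold: int) -> str:
    """Grade an object using the first (hot) and second (cold) heat thresholds.

    Requires first_threshold >= second_threshold. Objects strictly between
    the two thresholds receive the hypothetical label "warm"."""
    if first_threshold < second_threshold:
        raise ValueError("first threshold must not be below the second")
    if heat >= first_threshold:
        return "hot"
    if heat <= second_threshold:
        return "cold"
    return "warm"
```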
Optionally, the first and second heat thresholds may be preset, learned by the heat identification apparatus through machine learning, or derived according to a preset policy.
In a possible implementation, the first heat threshold may be determined as follows. First, determine how many accessed objects make up a preset proportion of the total number of accessed objects; denote this number N. Then sort the heat values of the accessed objects in descending order and use the N-th heat value as the first heat threshold.
In another possible implementation, the first popularity threshold is determined as follows. First, sort the popularity values of the access objects in descending order and compute their total, denoted K. Then accumulate the sorted values from the front: the first value alone gives L_1, the first two values give L_2, and the first i values give L_i. If L_(N-1) does not satisfy the preset proportion condition but L_N does, the N-th popularity value is used as the first popularity threshold. Satisfying the preset proportion condition may mean, for example, that L_N / K is greater than or equal to a preset ratio.
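The cumulative-share rule can be sketched as below. It assumes the condition is L_N / K >= ratio, as in the paragraph above. The function name and the input validation are illustrative.

```python
from typing import Sequence


def threshold_by_cumulative_share(values: Sequence[float], ratio: float) -> float:
    """Return the N-th largest value, where N is the first index with L_N / K >= ratio."""
    if not 0 < ratio <= 1:
        raise ValueError("ratio must be in (0, 1]")
    ranked = sorted(values, reverse=True)   # descending popularity
    total = sum(ranked)                     # K
    if total <= 0:
        raise ValueError("total popularity must be positive")
    running = 0.0                           # L_i
    for value in ranked:
        running += value
        if running / total >= ratio:        # L_(N-1) failed, L_N succeeds
            return value
    return ranked[-1]                       # safety net for floating-point rounding


print(threshold_by_cumulative_share([5, 50, 10, 30, 5], 0.8))  # 30
```

With K = 100, L_1 = 50 gives 0.5 and L_2 = 80 gives 0.8. The second value, 30, therefore becomes the threshold.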
The first popularity threshold obtained by either method can also serve as the second popularity threshold. Alternatively, when the two thresholds are meant to differ, the second popularity threshold can be determined with either method by using a different preset proportion or preset ratio. The first and/or second popularity threshold can then be used to classify access objects as hot or cold.
Because the popularity of a parent node is the sum of the popularity of all its child nodes, a parent's popularity value is greater than or equal to that of each of its children. Therefore, if a leaf node is hot data, its parent node, that parent's parent, and so on up to the root directory are all hot data as well. Optionally, to simplify identification of hot data, only the popularity values of leaf nodes are sorted to decide whether each leaf node is hot data. Each non-leaf node is then treated as hot data if any of its child nodes is hot data.
For example, in the embodiment shown in Figure 5, directory 00 contains subdirectories 10, 11, and 12. Subdirectory 10 contains subdirectory 20, which contains files 30, 31, and 32. Subdirectory 11 contains file 21 and subdirectory 22, and subdirectory 22 contains files 33 and 34. File 33 can be regarded as data block 40, which is split into sub-blocks 50 and 51. Sub-block 50 is further divided into sub-blocks 60 and 61, and sub-block 51 into sub-blocks 62 and 63. Subdirectory 12 contains subdirectories 23 and 24, and subdirectory 24 contains files 35 and 36. In the directory tree shown in Figure 5, files 30, 31, and 32, sub-blocks 60, 61, 62, and 63, and files 34, 35, and 36 are leaf nodes. For these 10 leaf nodes, the first popularity threshold can be determined as described above and each leaf node checked for hot data. The remaining non-leaf nodes are checked afterwards.
Specifically, if sub-block 60 and file 35 are the hot leaf nodes among these 10, every node on their storage paths is hot data. The storage path of sub-block 60 is directory 00 - subdirectory 11 - subdirectory 22 - file 33 - block 40 - sub-block 50 - sub-block 60. Directory 00, subdirectory 11, subdirectory 22, file 33, block 40, and sub-block 50 on that path are therefore also hot data. The storage path of file 35 is directory 00 - subdirectory 12 - subdirectory 24 - file 35, so directory 00, subdirectory 12, and subdirectory 24 are also hot data.
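The leaf-first propagation can be sketched by encoding the Figure 5 tree as a child-to-parent map and walking each hot leaf up to the root. This is a minimal sketch: the node names (`dir00`, `subblock60`, and so on) are hypothetical labels for the numbered nodes, and the hot leaves are taken as given rather than ranked.

```python
from typing import Dict, Iterable, Optional, Set

# Figure 5 tree as a child -> parent map (labels are illustrative).
PARENT: Dict[str, Optional[str]] = {
    "dir00": None,
    "subdir10": "dir00", "subdir11": "dir00", "subdir12": "dir00",
    "subdir20": "subdir10",
    "file30": "subdir20", "file31": "subdir20", "file32": "subdir20",
    "file21": "subdir11", "subdir22": "subdir11",
    "file33": "subdir22", "file34": "subdir22",
    "block40": "file33",
    "subblock50": "block40", "subblock51": "block40",
    "subblock60": "subblock50", "subblock61": "subblock50",
    "subblock62": "subblock51", "subblock63": "subblock51",
    "subdir23": "subdir12", "subdir24": "subdir12",
    "file35": "subdir24", "file36": "subdir24",
}


def mark_hot(parent: Dict[str, Optional[str]], hot_leaves: Iterable[str]) -> Set[str]:
    """Mark each hot leaf and every ancestor on its storage path as hot."""
    hot: Set[str] = set()
    for leaf in hot_leaves:
        node: Optional[str] = leaf
        # Stop early at an already-marked node: its ancestors are marked too.
        while node is not None and node not in hot:
            hot.add(node)
            node = parent[node]
    return hot


print(sorted(mark_hot(PARENT, ["subblock60", "file35"])))
```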
If popularity values only ever accumulate, even rarely accessed data gains accesses over time and its popularity value keeps increasing. This makes hot and cold data harder to tell apart. The popularity value of each access object can therefore be decayed periodically, so that the popularity of cold data does not grow indefinitely. For example, the popularity value of each access type of each access object can be periodically multiplied by a decay coefficient α, where 0 < α < 1. Suppose α = 0.5 and decay is applied every 30 minutes. Folder 1 has a read popularity of 30 and contains files A and B. File A has a read popularity of 20: block 1 of file A accounts for 15 and block 2 for 5. File B has a read popularity of 10. At the decay time, the read popularity of folder 1 becomes 30 × 0.5 = 15, file A becomes 20 × 0.5 = 10, block 1 of file A becomes 15 × 0.5 = 7.5, block 2 of file A becomes 5 × 0.5 = 2.5, and file B becomes 10 × 0.5 = 5.
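The periodic decay step can be sketched as a scaling of every (access object, access type) entry. This is a minimal sketch assuming popularity is kept in a flat dictionary. The path-like keys such as `"folder1/fileA#block1"` are hypothetical names for the objects in the example above.

```python
from typing import Dict, Tuple

Key = Tuple[str, str]  # (access object, access type)


def decay(popularity: Dict[Key, float], alpha: float) -> Dict[Key, float]:
    """Multiply every per-access-type popularity value by alpha (0 < alpha < 1)."""
    if not 0 < alpha < 1:
        raise ValueError("alpha must satisfy 0 < alpha < 1")
    return {key: value * alpha for key, value in popularity.items()}


before = {
    ("folder1", "read"): 30,
    ("folder1/fileA", "read"): 20,
    ("folder1/fileA#block1", "read"): 15,
    ("folder1/fileA#block2", "read"): 5,
    ("folder1/fileB", "read"): 10,
}
after = decay(before, 0.5)
print(after)
```

Scaling distributes over addition, so the parent-equals-sum relation survives decay before any rounding (15 = 10 + 5 and 10 = 7.5 + 2.5).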
After multiplication by the decay coefficient, the popularity of block 1 of file A becomes 7.5 and that of block 2 becomes 2.5. For ease of computation, one possible implementation rounds these values to integers. However, if values are always rounded up or always rounded down, the sum of the two blocks' popularity may no longer equal the popularity of file A (8 + 3 = 11 or 7 + 2 = 9 instead of 10). To keep a parent's popularity equal or approximately equal to the sum of its children's popularity, a non-integer value can be rounded up with probability α and down with probability 1-α.
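The probabilistic rounding rule can be sketched as below, using a seeded `random.Random` so results are reproducible. This is a minimal sketch of the rule as stated: round up with probability α. With α = 0.5 and integer inputs, the fractional part is always 0 or 0.5, so the expected rounded value equals the exact decayed value. For other α, this particular rule is generally not unbiased.

```python
import math
import random


def decay_and_round(value: int, alpha: float, rng: random.Random) -> int:
    """Decay an integer popularity value and round it back to an integer.

    A non-integer result is rounded up with probability alpha and down
    with probability 1 - alpha.
    """
    scaled = value * alpha
    lower = math.floor(scaled)
    if scaled == lower:              # already an integer: nothing to round
        return lower
    return lower + 1 if rng.random() < alpha else lower


rng = random.Random(42)
print(decay_and_round(30, 0.5, rng))  # 15 (exact, no rounding needed)
print(decay_and_round(15, 0.5, rng))  # 7 or 8
```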
In the QF2 approach to popularity identification, popularity information is recorded in the module that stores metadata. Metadata there has its own storage space with ample resources, so the storage consumed by popularity information is not a concern. In the embodiments of this application, by contrast, no separate metadata is required and there is no dedicated metadata storage space. The size of the popularity information must therefore be taken into account. As time passes and more objects are accessed, the popularity information keeps growing, which may exhaust the storage space available for it.
One possible implementation addresses the limited storage available for popularity information as follows. When the popularity value of an access object decays to or below a preset threshold, that object's popularity value is deleted. In other words, the embodiments of this application provide a pruning scheme that bounds the storage occupied by popularity information. This prevents the popularity information from only ever growing and eventually exhausting storage space.
Figure 6 shows an example of pruning. In the embodiment shown in Figure 6, the preset threshold is 0, so a popularity entry is deleted when its value decays to 0. Figure 6(a) shows the popularity information stored by the popularity update module before decay. Directory 00 has a read popularity of 6, and its subdirectories 11 and 12 have read popularities of 2 and 4. File 21 under subdirectory 11 has a read popularity of 2, and files 22, 23, 24, and 25 under subdirectory 12 each have a read popularity of 1. With a decay coefficient α = 0.5, after decay directory 00 has a read popularity of 3, subdirectories 11 and 12 have 1 and 2, and file 21 has 1. Files 22 to 25 each have a decayed value of 0.5, which is rounded up to 1 with probability α (50%) and down to 0 with probability 1-α. In this example, files 22 and 24 round to 1 and files 23 and 25 round to 0. Because the read popularity of files 23 and 25 has decayed to 0, their read-popularity entries are deleted (pruned), as shown in Figure 6(b).
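Threshold-based pruning can be sketched as follows, applied to the post-decay values of the Figure 6 example. The rounding outcomes are taken from the text rather than drawn at random. Entries at or below the threshold are deleted, consistent with the "less than or equal to" wording used for the popularity decay module. The path-style keys are hypothetical.

```python
from typing import Dict, List, Tuple

Key = Tuple[str, str]  # (access object, access type)


def prune(popularity: Dict[Key, int], threshold: int) -> List[Key]:
    """Delete entries whose popularity is <= threshold; return the deleted keys."""
    doomed = [key for key, value in popularity.items() if value <= threshold]
    for key in doomed:
        del popularity[key]
    return doomed


# Figure 6 after decay (alpha = 0.5), using the rounding outcomes given in the text.
table = {
    ("dir00", "read"): 3,
    ("dir00/subdir11", "read"): 1,
    ("dir00/subdir12", "read"): 2,
    ("dir00/subdir11/file21", "read"): 1,
    ("dir00/subdir12/file22", "read"): 1,
    ("dir00/subdir12/file23", "read"): 0,
    ("dir00/subdir12/file24", "read"): 1,
    ("dir00/subdir12/file25", "read"): 0,
}
print(prune(table, 0))
# [('dir00/subdir12/file23', 'read'), ('dir00/subdir12/file25', 'read')]
```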
This pruning process curbs the growth of popularity information and helps keep it from exhausting storage space. However, once the popularity information has reached its maximum allowed storage space, entries whose popularity value is greater than or equal to the preset threshold may also be deleted. In one possible design, pruning runs immediately when the popularity information reaches the maximum allowed storage space. Either the one or more entries with the lowest popularity values are deleted, or all entries whose popularity value lies within a preset range of the preset threshold are deleted. For example, while the popularity information is below the maximum allowed storage space, only entries with a popularity value of 0 are deleted. Once it reaches the maximum, all entries with a popularity value less than or equal to 1 are deleted. In another possible design, pruning runs only when the popularity information has reached the maximum allowed storage space and a new entry must be added. Either the entry with the lowest popularity value is deleted, or all entries whose popularity value lies within a preset range of the preset threshold are deleted.
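The second design (prune only when storage is full and a new entry arrives) can be sketched as below. This is a minimal sketch: storage is measured as a number of entries, which is an assumption since the source speaks of storage space. The eviction picks the single lowest-popularity entry, with ties broken by dictionary order.

```python
from typing import Dict, Hashable, List


def add_with_capacity(table: Dict[Hashable, int], key: Hashable, count: int,
                      capacity: int) -> List[Hashable]:
    """Add `count` to `key`; if the table is full and `key` is new, evict the lowest entry."""
    if capacity < 1:
        raise ValueError("capacity must be at least 1")
    evicted: List[Hashable] = []
    if key not in table and len(table) >= capacity:
        victim = min(table, key=table.__getitem__)   # lowest popularity value
        del table[victim]
        evicted.append(victim)
    table[key] = table.get(key, 0) + count
    return evicted


t = {"a": 5, "b": 1, "c": 3}
print(add_with_capacity(t, "d", 2, capacity=3))  # ['b']
```

Updating an existing key never triggers eviction, because the number of entries does not change.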
Based on the same technical concept, the embodiments of this application further provide a popularity identification apparatus for implementing the foregoing method embodiments. The apparatus may include modules/units that perform any possible implementation of the foregoing method embodiments. These modules/units may be implemented by hardware, or by hardware executing corresponding software.
For example, as shown in Figure 7, the apparatus may include a collection module 701 and a popularity update module 702.
The collection module 701 is configured to obtain access requests from applications, determine the access object of each access request, and count the access frequency of each access object.
The popularity update module 702 is configured to synchronously update, based on the storage path and access frequency of each access object, the popularity of the access object and of each node along its storage path in the direction of its parent nodes. The popularity of a parent node is the sum of the popularity of all child nodes under it.
In one possible implementation, the collection module 701 is further configured to determine, from the access request, the access type for the access object. When counting the access frequency of each access object, the collection module 701 is specifically configured to count the access frequency per access type for each access object.
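The collection and update steps can be sketched together. Requests are counted per (storage path, access type), and each count is then added to the access object and to every ancestor on its path. This is a minimal sketch. It assumes only leaf objects (files or blocks) are accessed directly, so that a parent's popularity stays equal to the sum of its children's. Paths are tuples of hypothetical component names. The sample requests reproduce the read counts of Figure 6(a).

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

Path = Tuple[str, ...]
Request = Tuple[Path, str]  # (storage path of the access object, access type)


def collect(requests: Iterable[Request]) -> Counter:
    """Collection step: count access frequency per (access object, access type)."""
    return Counter(requests)


def update_popularity(popularity: Dict[Tuple[Path, str], int], counts: Counter) -> None:
    """Update step: add each count to the object and to every ancestor on its path."""
    for (path, access_type), count in counts.items():
        for depth in range(1, len(path) + 1):   # root, ..., the object itself
            key = (path[:depth], access_type)
            popularity[key] = popularity.get(key, 0) + count


requests = [
    (("dir00", "subdir11", "file21"), "read"),
    (("dir00", "subdir11", "file21"), "read"),
    (("dir00", "subdir12", "file22"), "read"),
    (("dir00", "subdir12", "file23"), "read"),
    (("dir00", "subdir12", "file24"), "read"),
    (("dir00", "subdir12", "file25"), "read"),
    (("dir00", "subdir12", "file25"), "write"),
]
popularity: Dict[Tuple[Path, str], int] = {}
update_popularity(popularity, collect(requests))
print(popularity[(("dir00",), "read")])  # 6
```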
In one possible implementation, an access object is a file directory, a file, or a data block in a file.
In one possible implementation, when determining the access object of the access request, the collection module 701 is specifically configured to do two things. It determines the requested file from the object identifier in the access request. It then determines, from the offset and length in the access request, the one or more blocks of the file in which the access object lies. The popularity update module 702 is specifically configured to synchronously update, based on the storage paths and access frequencies of those blocks, the popularity of the one or more blocks, the popularity of the file, and the popularity of each node along the file's storage path in the direction of its parent nodes.
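Mapping a request's offset and length to the blocks it touches can be sketched as follows. This assumes fixed-size, non-overlapping blocks, a simplification: the Figure 5 example shows a multi-level sub-block hierarchy, which this flat sketch does not model. The `block{i}` path component is a hypothetical naming scheme. Each returned path can be fed to the popularity update, whose prefixes cover the file and its ancestors.

```python
from typing import List, Tuple


def blocks_for_request(file_path: Tuple[str, ...], offset: int, length: int,
                       block_size: int) -> List[Tuple[str, ...]]:
    """Return storage paths of the fixed-size blocks covered by [offset, offset + length)."""
    if block_size <= 0 or offset < 0:
        raise ValueError("invalid block size or offset")
    if length <= 0:
        return []
    first = offset // block_size
    last = (offset + length - 1) // block_size   # block holding the last byte read
    return [file_path + (f"block{i}",) for i in range(first, last + 1)]


print(blocks_for_request(("dir00", "file33"), 4000, 200, 4096))
# [('dir00', 'file33', 'block0'), ('dir00', 'file33', 'block1')]
```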
In one possible implementation, the apparatus may further include a popularity decay module 703, configured to periodically decay the popularity of each access object. If the popularity of a first access object decays to a value less than or equal to a preset threshold, the module deletes the popularity of that first access object.
In one possible implementation, when decaying the popularity of each access object, the popularity decay module 703 is specifically configured to multiply the popularity of each access object by a decay coefficient. If the result is not an integer, it is rounded down with probability 1 minus the decay coefficient and rounded up with probability equal to the decay coefficient.
In one possible implementation, when counting the access frequency of each access object, the collection module 701 is specifically configured to count the access frequency of each access object within a preset interval. The preset interval is any one of the following: a preset time interval, a preset amount of traffic, or a preset number of access requests.
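Counting within a preset interval can be sketched as a collector that returns a finished batch once any configured limit is reached: request count, bytes of traffic, or elapsed time. This is a minimal sketch. The class name, parameter names, and injectable clock are hypothetical. The time limit is checked only when a request arrives, so an idle interval is closed lazily.

```python
import time
from collections import Counter
from typing import Callable, Hashable, Optional


class IntervalCounter:
    """Count accesses per object until a preset interval closes, then hand over the batch."""

    def __init__(self, max_requests: Optional[int] = None, max_bytes: Optional[int] = None,
                 max_seconds: Optional[float] = None,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_requests = max_requests
        self.max_bytes = max_bytes
        self.max_seconds = max_seconds
        self.clock = clock
        self._reset()

    def _reset(self) -> None:
        self.counts: Counter = Counter()
        self.requests = 0
        self.bytes = 0
        self.started = self.clock()

    def _interval_closed(self) -> bool:
        if self.max_requests is not None and self.requests >= self.max_requests:
            return True
        if self.max_bytes is not None and self.bytes >= self.max_bytes:
            return True
        if self.max_seconds is not None and self.clock() - self.started >= self.max_seconds:
            return True
        return False

    def record(self, obj: Hashable, nbytes: int) -> Optional[Counter]:
        """Record one access; return the batch of counts if this closes the interval."""
        self.counts[obj] += 1
        self.requests += 1
        self.bytes += nbytes
        if self._interval_closed():
            batch = self.counts
            self._reset()
            return batch
        return None
```

A returned batch would then be passed to the popularity update step, and counting restarts for the next interval.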
In one possible implementation, the apparatus may further include a classification module 704. It determines whether an access object is hot data from its popularity and the first popularity threshold, and/or whether it is cold data from its popularity and the second popularity threshold.
In one possible implementation, the classification module 704 is further configured to sort the stored popularity values of all access objects in descending order and use the popularity of the N-th access object as the first popularity threshold. N satisfies either of the following conditions: N divided by the total number of access objects satisfies a preset proportion condition; or the sum of the popularity of the first N access objects divided by the sum of the popularity of all access objects satisfies a preset proportion condition.
In one possible implementation, the apparatus may further include a transceiver module (not shown in the figure). It receives a popularity query request, which asks for the popularity of a target access object, and outputs the popularity of that target access object.
Based on the same technical concept, the embodiments of this application further provide a computer device. As shown in Figure 8, the computer device includes a processor 801 and a communication interface 802 connected to the processor 801.
The processor 801 may be a general-purpose processor, a microprocessor, an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or one or more integrated circuits for controlling execution of the programs of the solutions of this application. A general-purpose processor may be a microprocessor or any conventional processor. The steps of the methods disclosed in the embodiments of this application may be performed directly by a hardware processor, or by a combination of hardware and software modules in the processor.
The communication interface 802 is used to communicate with other devices, for example over a PCI bus interface, Ethernet, a radio access network (RAN), or a wireless local area network (WLAN).
In the embodiments of this application, the processor 801 is configured to invoke the communication interface 802 to perform receiving and/or sending functions. It also performs the method described in any of the foregoing possible implementations.
Further, the computer device may also include a memory 803 and a communication bus 804.
The memory 803 stores program instructions and/or data, which the processor 801 invokes to implement the functions of the processor 801 described above. The memory 803 may be a read-only memory (ROM) or another type of static storage device that can store static information and instructions. It may also be a random access memory (RAM) or another type of dynamic storage device that can store information and instructions, or an electrically erasable programmable read-only memory (EEPROM). More generally, it may be any other medium that can carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, without being limited thereto. The memory 803 may exist independently, for example as off-chip memory connected to the processor 801 through the communication bus 804, or it may be integrated with the processor 801.
The communication bus 804 may include a path for transferring information between the foregoing components.
The computer device may communicate with a storage structure over a network, or it may itself include a storage structure (not shown in the figure). The storage structure includes one or more storage devices, such as magnetic disks, solid-state drives (SSDs), or storage-class memory (SCM). These devices store the objects accessed by access requests.
For example, the processor 801 may perform the following steps through the communication interface 802:
- obtain access requests from applications and determine the access object of each access request;
- count the access frequency of each access object;
- based on the storage path and access frequency of each access object, synchronously update the popularity of the access object and of each node along its storage path in the direction of its parent nodes, where the popularity of a parent node is the sum of the popularity of all child nodes under it.
In one possible implementation, the processor 801 is further configured to determine, from the access request, the access type for the access object. When counting the access frequency of each access object, the processor 801 is specifically configured to count the access frequency per access type for each access object.
In one possible implementation, an access object is a file directory, a file, or a data block in a file.
In one possible implementation, when determining the access object of the access request, the processor 801 is specifically configured to do two things. It determines the requested file from the object identifier in the access request. It then determines, from the offset and length in the access request, the one or more blocks of the file in which the access object lies. The processor 801 also synchronously updates, based on the storage path and access frequency of each access object, the popularity of the access object and of each node along its storage path in the direction of its parent nodes. In doing so, it is specifically configured to update, based on the storage paths and access frequencies of the one or more blocks, the popularity of those blocks, the popularity of the file, and the popularity of each node along the file's storage path in the direction of its parent nodes.
In one possible implementation, the processor 801 may be further configured to periodically decay the popularity of each access object. If the popularity of a first access object decays to a value less than or equal to a preset threshold, the processor deletes the popularity of that first access object.
In one possible implementation, when decaying the popularity of each access object, the processor 801 is specifically configured to multiply the popularity of each access object by a decay coefficient. If the result is not an integer, it is rounded down with probability 1 minus the decay coefficient and rounded up with probability equal to the decay coefficient.
In one possible implementation, when counting the access frequency of each access object, the processor 801 is specifically configured to count the access frequency of each access object within a preset interval. The preset interval is any one of the following: a preset time interval, a preset amount of traffic, or a preset number of access requests.
In one possible implementation, the processor 801 may be further configured to determine whether an access object is hot data from its popularity and the first popularity threshold, and/or whether it is cold data from its popularity and the second popularity threshold.
In one possible implementation, the processor 801 may be further configured to sort the stored popularity values of all access objects in descending order and use the popularity of the N-th access object as the first popularity threshold. N satisfies either of the following conditions: N divided by the total number of access objects satisfies a preset proportion condition; or the sum of the popularity of the first N access objects divided by the sum of the popularity of all access objects satisfies a preset proportion condition.
In one possible implementation, the processor 801 may further use the communication interface 802 to receive a popularity query request, which asks for the popularity of a target access object, and to output the popularity of that target access object.
Based on the same technical concept, the embodiments of this application further provide a computer-readable storage medium storing computer-readable instructions. When run on a computer, these instructions cause the steps of the foregoing methods to be performed.
Based on the same technical concept, the embodiments of this application further provide a computer program product containing instructions. When run on a computer, the product causes the steps of the foregoing methods to be performed.
It should be understood that, in the description of this application, terms such as "first" and "second" are used only to distinguish descriptions. They shall not be construed as indicating or implying relative importance or order. References in this specification to "one embodiment", "some embodiments", and the like mean that a particular feature, structure, or characteristic described in connection with that embodiment is included in one or more embodiments of this application. Therefore, the phrases "in one embodiment", "in some embodiments", "in some other embodiments", "in still other embodiments", and the like, appearing in different places in this specification, do not necessarily all refer to the same embodiment. Unless otherwise specifically emphasized, they mean "one or more but not all embodiments". Unless otherwise specifically emphasized, the terms "include", "comprise", "have", and their variants all mean "including but not limited to".
本领域内的技术人员应明白,本申请的实施例可提供为方法、系统、或计算机程序产品。因此,本申请可采用完全硬件实施例、完全软件实施例、或结合软件和硬件方面的实施例的形式。而且,本申请可采用在一个或多个其中包含有计算机可用程序代码的计算机可用存储介质(包括但不限于磁盘存储器、CD-ROM、光学存储器等)上实施的计算机程序产品的形式。Those skilled in the art will understand that embodiments of the present application may be provided as methods, systems, or computer program products. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment that combines software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
本申请是参照根据本申请实施例的方法、设备(系统)、和计算机程序产品的流程图和/或方框图来描述的。应理解可由计算机程序指令实现流程图和/或方框图中的每一流程和/或方框、以及流程图和/或方框图中的流程和/或方框的结合。可提供这些计算机程序指令到通用计算机、专用计算机、嵌入式处理机或其他可编程数据处理设备的处理器以产生一个机器,使得通过计算机或其他可编程数据处理设备的处理器执行的指令产生用于实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能的装置。The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each process and/or block in the flowchart illustrations and/or block diagrams, and combinations of processes and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce a use A device for realizing the functions specified in one process or multiple processes of the flowchart and/or one block or multiple blocks of the block diagram.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or another programmable data processing device to operate in a particular manner. The instructions stored in the computer-readable memory then produce an article of manufacture that includes an instruction apparatus, and the instruction apparatus implements the functions specified in one or more processes of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or another programmable data processing device, so that a series of operation steps is performed on the computer or other programmable device to produce computer-implemented processing. The instructions executed on the computer or other programmable device thus provide steps for implementing the functions specified in one or more processes of the flowcharts and/or one or more blocks of the block diagrams.
Although embodiments of this application have been described, additional changes and modifications may be made to these embodiments. Therefore, the appended claims are intended to be construed as covering the foregoing embodiments and all changes and modifications that fall within the scope of this application.
Obviously, a person skilled in the art can make various changes and variations to the embodiments of this application without departing from the spirit and scope of the embodiments of this application. This application is intended to cover these changes and variations provided that they fall within the scope of the claims of this application and their equivalent technologies.

Claims (16)

  1. A popularity identification method in a file system, comprising:
    obtaining an access request from an application, and determining an access object of the access request;
    counting an access frequency of each access object; and
    synchronously updating, according to a storage path of each access object and the access frequency of each access object, the popularity of the access object and the popularity of each node in the parent-node direction of the access object along the storage path, where the popularity of a parent node is the sum of the popularity of each child node under the parent node.
  2. The method according to claim 1, further comprising: determining, according to the access request, an access type for the access object;
    wherein the counting an access frequency of each access object comprises:
    counting, for each access object, the access frequency of accesses of the same access type.
  3. The method according to claim 1 or 2, wherein the access object comprises a file directory, a file, or a data block in a file.
  4. The method according to any one of claims 1 to 3, further comprising:
    periodically decaying the popularity of each access object; and
    deleting the popularity of a first access object if the popularity of the first access object decays to a value less than or equal to a preset threshold.
  5. The method according to any one of claims 1 to 4, wherein the counting an access frequency of each access object comprises:
    counting the access frequency of each access object within a preset interval, where the preset interval is any one of the following: a preset time interval, a preset traffic volume, or a preset number of access requests.
  6. The method according to any one of claims 1 to 5, further comprising:
    determining, according to the popularity of the access object and a first popularity threshold, whether the access object is hot data; and/or
    determining, according to the popularity of the access object and a second popularity threshold, whether the access object is cold data.
  7. The method according to any one of claims 1 to 6, further comprising:
    receiving a popularity query request, where the request is used to query the popularity of a target access object; and
    outputting the popularity of the target access object.
  8. A popularity identification apparatus in a file system, comprising:
    a collection module, configured to obtain an access request from an application, determine an access object of the access request, and count an access frequency of each access object; and
    a popularity update module, configured to synchronously update, according to a storage path of each access object and the access frequency of each access object, the popularity of the access object and the popularity of each node in the parent-node direction of the access object along the storage path, where the popularity of a parent node is the sum of the popularity of each child node under the parent node.
  9. The apparatus according to claim 8, wherein the collection module is further configured to determine, according to the access request, an access type for the access object;
    and when counting the access frequency of each access object, the collection module is specifically configured to count, for each access object, the access frequency of accesses of the same access type.
  10. The apparatus according to claim 8 or 9, wherein the access object comprises a file directory, a file, or a data block in a file.
  11. The apparatus according to any one of claims 8 to 10, further comprising:
    a popularity decay module, configured to periodically decay the popularity of each access object, and to delete the popularity of a first access object if the popularity of the first access object decays to a value less than or equal to a preset threshold.
  12. The apparatus according to any one of claims 8 to 11, wherein when counting the access frequency of each access object, the collection module is specifically configured to:
    count the access frequency of each access object within a preset interval, where the preset interval is any one of the following: a preset time interval, a preset traffic volume, or a preset number of access requests.
  13. The apparatus according to any one of claims 8 to 12, further comprising:
    a classification module, configured to determine, according to the popularity of the access object and a first popularity threshold, whether the access object is hot data; and/or to determine, according to the popularity of the access object and a second popularity threshold, whether the access object is cold data.
  14. The apparatus according to any one of claims 8 to 13, further comprising:
    a transceiver module, configured to receive a popularity query request, where the request is used to query the popularity of a target access object, and to output the popularity of the target access object.
  15. A computer device, wherein the computer device comprises a memory and a processor;
    the memory stores a computer program; and
    the processor is configured to invoke the computer program stored in the memory to perform the method according to any one of claims 1 to 7.
  16. A computer-readable storage medium, wherein the computer-readable storage medium stores instructions, and when the instructions are run on a computer, the computer is caused to perform the method according to any one of claims 1 to 7.
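The method in claims 1 to 7 can be illustrated with a minimal Python sketch under stated assumptions. It covers per-interval counting by access type, adding each count to the object and every ancestor on its storage path, periodic decay with deletion, threshold-based hot/cold grading, and a popularity query. This is not the applicant's implementation. The following are all hypothetical:

- the class name `HeatTree`;
- the parameters `requests_per_interval`, `decay_factor`, and `delete_threshold`;
- the use of a request count as the preset interval;
- POSIX-style paths as storage paths;
- the threshold comparison directions.

In this sketch only leaf objects record accesses. That is what makes each ancestor's popularity equal the sum of its children's popularity.

```python
from collections import defaultdict
from pathlib import PurePosixPath


class HeatTree:
    """Hypothetical popularity tracker illustrating the claimed steps (a sketch only)."""

    def __init__(self, requests_per_interval=1000, decay_factor=0.5, delete_threshold=0.01):
        self.requests_per_interval = requests_per_interval  # assumed "preset number of access requests"
        self.decay_factor = decay_factor                    # assumed multiplier applied each decay period
        self.delete_threshold = delete_threshold            # assumed "preset threshold" for deletion
        self.heat = defaultdict(float)   # (path, access_type) -> popularity
        self.pending = defaultdict(int)  # (path, access_type) -> count in the current interval
        self._requests_in_interval = 0

    def record(self, path, access_type):
        """Collection step: count one access, keyed by object and access type."""
        self.pending[(path, access_type)] += 1
        self._requests_in_interval += 1
        if self._requests_in_interval >= self.requests_per_interval:
            self.flush()

    def flush(self):
        """Update step: add each interval count to the object and every ancestor node."""
        for (path, access_type), count in self.pending.items():
            node = PurePosixPath(path)
            for n in (node, *node.parents):
                self.heat[(str(n), access_type)] += count
        self.pending.clear()
        self._requests_in_interval = 0

    def decay(self):
        """Periodic decay; entries that fall to or below the threshold are deleted."""
        for key in list(self.heat):
            self.heat[key] *= self.decay_factor
            if self.heat[key] <= self.delete_threshold:
                del self.heat[key]

    def query(self, path, access_type):
        """Popularity query; objects with no recorded popularity report 0.0."""
        return self.heat.get((path, access_type), 0.0)

    def classify(self, path, access_type, hot_threshold, cold_threshold):
        """Hot/cold grading; "warm" (neither threshold met) is a hypothetical label."""
        h = self.query(path, access_type)
        if h >= hot_threshold:
            return "hot"
        if h <= cold_threshold:
            return "cold"
        return "warm"


if __name__ == "__main__":
    tree = HeatTree(requests_per_interval=3)
    tree.record("/data/a.log", "read")
    tree.record("/data/a.log", "read")
    tree.record("/data/b.log", "read")  # third request closes the interval and flushes
    print(tree.query("/data/a.log", "read"))  # 2.0
    print(tree.query("/data", "read"))        # 3.0 (2.0 from a.log + 1.0 from b.log)
    tree.decay()
    print(tree.query("/data", "read"))        # 1.5
    print(tree.classify("/data/b.log", "read", hot_threshold=1.0, cold_threshold=0.1))  # warm
```

Decay multiplies every node by the same factor, so it preserves the parent-equals-sum relation. Deleting entries at or below the threshold trades that exactness for memory, because a retained parent may still include small contributions from deleted children. A data block (claim 3) could, hypothetically, be modeled as a child path component of its file, such as `/data/a.log/3`.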
PCT/CN2023/077025 2022-03-09 2023-02-18 Popularity identification method and apparatus in file system, and computer device WO2023169188A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210224941.8 2022-03-09
CN202210224941.8A CN116775580A (en) 2022-03-09 2022-03-09 Method and device for identifying heat in file system and computer equipment

Publications (1)

Publication Number Publication Date
WO2023169188A1 true WO2023169188A1 (en) 2023-09-14

Family

ID=87937157

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/077025 WO2023169188A1 (en) 2022-03-09 2023-02-18 Popularity identification method and apparatus in file system, and computer device

Country Status (2)

Country Link
CN (1) CN116775580A (en)
WO (1) WO2023169188A1 (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107679193A (en) * 2017-10-09 2018-02-09 郑州云海信息技术有限公司 A kind of hot statistics method and system for distributed file system
US20180157654A1 (en) * 2016-12-02 2018-06-07 International Business Machines Corporation Data migration using a migration data placement tool between storage systems based on data access
CN113420005A (en) * 2021-02-10 2021-09-21 阿里巴巴集团控股有限公司 Data storage method, system, electronic device and computer storage medium


Also Published As

Publication number Publication date
CN116775580A (en) 2023-09-19

Similar Documents

Publication Publication Date Title
CN111226205B (en) KVS tree database
US11461286B2 (en) Fair sampling in a hierarchical filesystem
US11132336B2 (en) Filesystem hierarchical capacity quantity and aggregate metrics
US10037341B1 (en) Nesting tree quotas within a filesystem
US20180165300A1 (en) Filesystem capacity and performance metrics and visualizations
CN102332029B (en) Hadoop-based mass classifiable small file association storage method
US10642831B2 (en) Static data caching for queries with a clause that requires multiple iterations to execute
CN110291518A (en) Merge tree garbage index
CN110268394A (en) KVS tree
CN109947668B (en) Method and device for storing data
US6834290B1 (en) System and method for developing a cost-effective reorganization plan for data reorganization
US9218142B2 (en) Log data store that stores data across a plurality of storage devices using non-disjoint layers
CN110268399A (en) Merging tree for attended operation is modified
CN110383261A (en) Stream for multithread storage device selects
US10936551B1 (en) Aggregating alternate data stream metrics for file systems
CN108140040A (en) The selective data compression of database in memory
US10936538B1 (en) Fair sampling of alternate data stream metrics for file systems
CN111858520B (en) Method and device for separately storing block chain node data
CN103176754A (en) Reading and storing method for massive amounts of small files
WO2018205151A1 (en) Data updating method and storage device
WO2021043026A1 (en) Storage space management method and device
CN107704507B (en) Database processing method and device
CN104391961A (en) Tens of millions of small file data read and write solution strategy
US9275091B2 (en) Database management device and database management method
WO2023169188A1 (en) Popularity identification method and apparatus in file system, and computer device

Legal Events

Date Code Title Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application.

Ref document number: 23765766

Country of ref document: EP

Kind code of ref document: A1