WO2021213278A1 - File prefetching method, storage device, and prefetching apparatus - Google Patents

Info

Publication number
WO2021213278A1
Authority
WO
WIPO (PCT)
Prior art keywords
keyword
target
access
file
rule template
Application number
PCT/CN2021/087840
Other languages
French (fr)
Chinese (zh)
Inventor
江舟 (Jiang Zhou)
向贵东 (Xiang Guidong)
刘金虎 (Liu Jinhu)
Original Assignee
华为技术有限公司 (Huawei Technologies Co., Ltd.)
Application filed by 华为技术有限公司 (Huawei Technologies Co., Ltd.)
Publication of WO2021213278A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/17 Details of further file system functions
    • G06F16/172 Caching, prefetching or hoarding of files
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02 Addressing or allocation; Relocation
    • G06F12/08 Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802 Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0862 Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches with prefetch

Definitions

  • The embodiments of the present application relate to the field of computer storage, and in particular, to a file prefetching method, a storage device, and a prefetching device.
  • Cache data prefetching is a cache optimization technique used in computer operating systems.
  • Cache data prefetching loads data from main memory into the cache in advance, before the processor in the storage device accesses that data for computation. This increases the hit rate of read requests issued by the host and reduces the time the processor stalls waiting for data, thereby improving read performance.
  • In existing approaches, the storage device derives the access pattern of accessed files from historical access records and forms a historical access queue composed of multiple files.
  • When the host accesses a file in that queue, the storage device can prefetch the files that follow it in the historical access queue into the cache, which increases the hit rate of read requests issued by the host and improves system read performance.
  • However, the storage device can prefetch only files that have already been accessed, not files that have never been accessed. This approach therefore does not help the host increase the hit rate of read requests when it accesses new files in the storage device.
  • The embodiments of the present application provide a file prefetching method and a storage device that predict the file the host will access for the first time. This enables prefetching of files that have never been accessed and helps increase the hit rate of read requests when the host accesses new files.
  • An embodiment of the present application provides a file prefetching method, applied to a storage device, for prefetching a file from a low-speed storage medium to a high-speed storage medium.
  • The storage device first receives a read request issued by the host. The read request indicates the file to be accessed and carries file access information of that file.
  • The storage device generates a first keyword from the file access information.
  • The storage device generates a target keyword from the first keyword and a target access rule template.
  • The target keyword and the first keyword share the same characteristics, and the target keyword indicates a predicted file.
  • The storage device prefetches the predicted file indicated by the target keyword into the cache.
  • The file access information is information related to the file indicated by the read request, for example, the file name, file type, access time, file creator name, file visitor name, and access directory.
  • The low-speed storage medium may be a hard disk, and the high-speed storage medium may be a cache.
  • The target keyword is generated from the first keyword and the target access rule template, and the predicted file to be prefetched is then determined from the target keyword. Because the first keyword and the target keyword share the same characteristics, the predicted file indicated by the target keyword can be regarded as the file the user will need to read next. Therefore, even if the file access information from which the first keyword is generated belongs to a file accessed for the first time, the storage device can use the first keyword and the target access rule template to compute a predicted file that has never been accessed and prefetch it into the cache. This helps improve the hit rate when the host reads data from the cache of the storage device.
  • The storage device stores a plurality of access rule templates, and the target access rule template is the access rule template, among them, that corresponds to the first keyword.
  • Each access rule template is obtained by training a text-semantics-based model on multiple initial keywords.
  • The storage device uses the text-semantics-based training model to train on the multiple initial keywords and obtain the access rule templates.
  • In this way, the storage device can train multiple access rule templates.
  • The access rule template corresponding to the first keyword is the target access rule template.
  • The storage device needs to select the access rule template corresponding to the first keyword (that is, the target access rule template) so that it can generate the target keyword used to find the predicted file.
  • Selecting the access rule template according to the first keyword helps improve the accuracy of the predicted file and thus increases the probability that the predicted file is hit by a subsequent read request.
  • Prefetching the predicted file into the cache helps improve the hit rate when the host reads data from the cache of the storage device.
  • The target access rule template includes the first keyword and/or a first characteristic-associated word, where the first characteristic-associated word indicates a characteristic of the first keyword.
  • This composition of the target access rule template provides a basis for establishing the correspondence between the first keyword and the target access rule template.
  • The target access rule template further includes a mapping relationship, which indicates the relationship among the multiple initial keywords.
  • The multiple initial keywords include the first keyword.
  • Generating the target keyword from the first keyword and the target access rule template includes: using the first keyword as the input of the mapping relationship, applying the mapping relationship to the first keyword, and outputting the target keyword, where the target keyword has the characteristic of the first keyword indicated by the first characteristic-associated word.
  • The mapping relationship in the target access rule template is generated when the multiple initial keywords are trained.
  • The mapping relationship indicates how the multiple initial keywords are associated with one another: given one of the initial keywords and the mapping relationship, the storage device can compute the remaining initial keywords. Therefore, when the first keyword corresponds to the target access rule template, the mapping relationship in that template also applies to the first keyword. The storage device can then use the first keyword as the input of the mapping relationship and apply the mapping relationship to it to output the target keyword. This implementation specifies how the target keyword is determined, which improves the reliability of the solution.
  • When the storage device uses the first keyword to search the multiple access rule templates for the target access rule template, it performs the following steps for each access rule template:
  • If the access rule template contains the first keyword, the access rule template is determined to be the target access rule template.
  • If the access rule template does not contain the first keyword, the storage device further determines whether the first keyword has the characteristic indicated by the characteristic-associated word in the access rule template.
  • If it does, the access rule template is determined to be the target access rule template.
  • The file access information carried in the read request is also used to generate a second keyword; both the second keyword and the first keyword match the target access rule template.
  • When the storage device uses the first keyword and the second keyword together to search the multiple access rule templates for the target access rule template, it performs the following steps:
  • If the access rule template contains both the first keyword and the second keyword, the access rule template is determined to be the target access rule template.
  • If the access rule template contains only one of the two keywords, or neither of them, the storage device further determines whether both the first keyword and the second keyword have the characteristic indicated by the characteristic-associated word in the access rule template.
  • If both do, the access rule template is determined to be the target access rule template.
  • When the file access information describes files in different access directories, the multiple predicted files are located in different access directories.
  • In other words, the files predicted from the first keyword by the foregoing method may also be located in different access directories.
  • This expands the storage range of the predicted files, so that prefetched predicted files can come from different access directories.
  • The storage device may also update the access rule template set.
  • The access rule template set consists of the multiple access rule templates stored in the storage device.
  • The access rule template set includes one target access rule template and at least one candidate access rule template.
  • The storage device determines the degree of association between the target access rule template and each of the at least one candidate access rule template, where the degree of association indicates how similar the mapping relationship in the target access rule template is to the mapping relationship in the candidate access rule template. The storage device then merges each candidate access rule template whose degree of association is higher than a preset value into the target access rule template, obtaining an updated access rule template set.
  • The access rule template set in the storage device can thus be updated based on the target access rule template corresponding to the first keyword; that is, after determining the predicted file based on the first keyword, the storage device updates the access rule template set.
  • Updating the access rule template set enables each updated access rule template to determine a more accurate predicted file from its input keywords, so that prefetching that predicted file into the cache improves the hit rate of read requests issued by the host.
  • An embodiment of the present application provides a storage device that includes a cache, a hard disk, and at least one processor.
  • The at least one processor is configured to: generate a first keyword from the file access information carried in the read request; generate a target keyword from the first keyword and the target access rule template, where the target keyword and the first keyword share the same characteristics and the target keyword indicates a predicted file; and prefetch the predicted file indicated by the target keyword from the hard disk into the cache.
  • The file access information is information related to the file indicated by the read request, for example, the file name, file type, access time, file creator name, file visitor name, and access directory.
  • As in the method, because the target keyword shares the characteristics of the first keyword, the storage device can compute and prefetch a predicted file that has never been accessed, even when the first keyword comes from a file accessed for the first time. This helps improve the hit rate when the host reads data from the cache of the storage device.
  • The storage device stores a plurality of access rule templates, and the target access rule template is the access rule template, among them, that corresponds to the first keyword.
  • Each access rule template is obtained by training a text-semantics-based model on multiple initial keywords.
  • The storage device needs to select the access rule template corresponding to the first keyword (that is, the target access rule template) so that it can generate the target keyword used to find the predicted file.
  • Selecting the access rule template according to the first keyword helps improve the accuracy of the predicted file and thus increases the probability that the predicted file is hit by a subsequent read request.
  • Prefetching the predicted file into the cache helps improve the hit rate when the host reads data from the cache of the storage device.
  • The target access rule template includes the first keyword and/or a first characteristic-associated word, where the first characteristic-associated word indicates a characteristic of the first keyword.
  • The target access rule template further includes a mapping relationship, which indicates the relationship among the multiple initial keywords.
  • The multiple initial keywords include the first keyword.
  • The at least one processor is specifically configured to use the first keyword as the input of the mapping relationship, apply the mapping relationship to the first keyword, and output the target keyword, where the target keyword has the characteristic of the first keyword indicated by the first characteristic-associated word.
  • The mapping relationship in the target access rule template is generated when the multiple initial keywords are trained.
  • The mapping relationship indicates how the multiple initial keywords are associated with one another: given one of the initial keywords and the mapping relationship, the storage device can compute the remaining initial keywords. Therefore, when the first keyword corresponds to the target access rule template, the mapping relationship in that template also applies to the first keyword. The storage device can then use the first keyword as the input of the mapping relationship and apply the mapping relationship to it to output the target keyword. This implementation specifies how the target keyword is determined, which improves the reliability of the solution.
  • The at least one processor is further configured to: determine whether an access rule template contains the first keyword; if it does, determine that the access rule template is the target access rule template; if it does not, further determine whether the first keyword has the characteristic indicated by the characteristic-associated word in the access rule template, and if so, determine that the access rule template is the target access rule template.
  • The file access information carried in the read request is also used to generate a second keyword, and the second keyword and the first keyword are both keywords that match the target access rule template.
  • The at least one processor is further configured to: determine whether an access rule template contains the first keyword and the second keyword; if it contains both, determine that the access rule template is the target access rule template; if it contains only one of them or neither, further determine whether both the first keyword and the second keyword have the characteristic indicated by the characteristic-associated word in the access rule template, and if both do, determine that the access rule template is the target access rule template.
  • When the file access information describes files in different access directories, the multiple predicted files are located in different access directories.
  • In other words, the files predicted from the first keyword by the foregoing method may also be located in different access directories.
  • This expands the storage range of the predicted files, so that prefetched predicted files can come from different access directories.
  • The at least one processor further includes a first processor.
  • The first processor is configured to use the text-semantics-based training model to train on the multiple initial keywords and obtain the access rule templates.
  • The first processor may also update the access rule template set, which consists of the multiple access rule templates stored in the storage device.
  • The access rule template set includes one target access rule template and at least one candidate access rule template.
  • The first processor is configured to determine the degree of association between the target access rule template and each of the at least one candidate access rule template, where the degree of association indicates how similar the mapping relationship in the target access rule template is to the mapping relationship in the candidate access rule template.
  • The first processor is further configured to merge each candidate access rule template whose degree of association is higher than a preset value into the target access rule template, obtaining an updated access rule template set.
  • The access rule template set in the storage device can thus be updated based on the target access rule template corresponding to the first keyword: after the at least one processor determines the predicted file based on the first keyword, the first processor updates the access rule template set in the storage device.
  • This enables each updated access rule template to determine a more accurate predicted file from its input keywords, so that prefetching that predicted file into the cache improves the hit rate when the host reads data from the cache of the storage device.
  • An embodiment of the present application provides a prefetching device located in a storage device, where the storage device further includes a cache and a hard disk.
  • The storage device stores computer programs or instructions, and the prefetching device invokes them to implement the following modules: a keyword generation module, configured to generate the first keyword from the file access information carried in the read request; a calculation module, configured to generate a target keyword from the first keyword and the target access rule template, where the target keyword and the first keyword share the same characteristics and the target keyword indicates a predicted file; and a data migration module, configured to prefetch the predicted file indicated by the target keyword from the hard disk into the cache.
  • The file access information is information related to the file indicated by the read request, for example, the file name, file type, access time, file creator name, file visitor name, and access directory.
  • As in the method, because the target keyword shares the characteristics of the first keyword, the prefetching device can compute and prefetch a predicted file that has never been accessed, even when the first keyword comes from a file accessed for the first time. This helps improve the hit rate when the host reads data from the cache of the storage device.
  • The prefetching device stores a plurality of access rule templates, and the target access rule template is the access rule template, among them, that corresponds to the first keyword. Each access rule template is obtained by training a text-semantics-based model on multiple initial keywords.
  • The target access rule template includes the first keyword and/or a first characteristic-associated word, where the first characteristic-associated word indicates a characteristic of the first keyword.
  • The target access rule template further includes a mapping relationship, which indicates the relationship among the multiple initial keywords.
  • The multiple initial keywords include the first keyword. The calculation module is specifically configured to use the first keyword as the input of the mapping relationship, apply the mapping relationship to the first keyword, and output the target keyword, where the target keyword has the characteristic of the first keyword indicated by the first characteristic-associated word.
  • The calculation module is further configured to: determine whether an access rule template contains the first keyword; if it does, determine that the access rule template is the target access rule template; if it does not, further determine whether the first keyword has the characteristic indicated by the characteristic-associated word in the access rule template, and if so, determine that the access rule template is the target access rule template.
  • The file access information carried in the read request is also used to generate a second keyword, and the second keyword and the first keyword are both keywords that match the target access rule template. The calculation module is further configured to: determine whether an access rule template contains the first keyword and the second keyword; if it contains both, determine that the access rule template is the target access rule template; if it contains only one of them or neither, further determine whether both keywords have the characteristic indicated by the characteristic-associated word in the access rule template, and if both do, determine that the access rule template is the target access rule template.
  • When the file access information describes files in different access directories, the multiple predicted files are located in different access directories.
  • The prefetching device further includes a template generation module, configured to use the text-semantics-based training model to train on the multiple initial keywords and obtain the access rule templates.
  • The template generation module is further configured to update the access rule template set.
  • The access rule template set consists of the multiple access rule templates stored in the storage device.
  • The access rule template set includes one target access rule template and at least one candidate access rule template.
  • The template generation module is specifically configured to determine the degree of association between the target access rule template and each of the at least one candidate access rule template, and to merge each candidate access rule template whose degree of association is higher than a preset value into the target access rule template, obtaining an updated access rule template set.
  • The degree of association indicates how similar the mapping relationship in the target access rule template is to the mapping relationship in the candidate access rule template.
  • An embodiment of the present application provides a smart chip, located in the storage device of the foregoing embodiments, that trains on input sample data to output a prediction model.
  • The smart chip may be located in the first processor in the seventh implementation of the third aspect of the embodiments of the present application; in that case, the smart chip may use the text-semantics-based training model to train on the multiple initial keywords and obtain the access rule templates.
  • The smart chip may instead be located in the template generation module in the seventh implementation of the fourth aspect of the embodiments of the present application; in that case, the smart chip may likewise use the text-semantics-based training model to train on the multiple initial keywords and obtain the access rule templates.
  • The smart chip can also update the access rule template set.
  • As in the foregoing embodiments, generating the target keyword from the first keyword and the target access rule template allows a predicted file that has never been accessed to be computed and prefetched into the cache, even when the first keyword comes from a file accessed for the first time. This helps improve the hit rate when the host reads data from the cache of the storage device.
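The template lookup (with one or two keywords), the use of the mapping relationship to turn the first keyword into a target keyword, and the final prefetch into the cache can be sketched in Python. This is a minimal illustration under stated assumptions, not the patented implementation: the source does not say how the text-semantics model represents a template, so the hypothetical `AccessRuleTemplate` class stands in for a trained template, a regular expression stands in for the characteristic-associated word, a plain function stands in for the mapping relationship, the file name is used directly as the keyword, and two dictionaries stand in for the hard disk and the cache. The test-paper series is an invented example.

```python
import re
from collections.abc import Callable
from dataclasses import dataclass
from typing import Optional


@dataclass
class AccessRuleTemplate:
    """Hypothetical stand-in for one trained access rule template."""
    keywords: set[str]                   # initial keywords the template was trained on
    characteristic: re.Pattern[str]      # plays the role of the characteristic-associated word
    mapping: Callable[[str], list[str]]  # mapping relationship: one keyword -> related keywords

    def matches(self, *query: str) -> bool:
        # If the template already contains every query keyword, it is the target template.
        if all(k in self.keywords for k in query):
            return True
        # Otherwise every query keyword must have the template's characteristic.
        return all(self.characteristic.fullmatch(k) is not None for k in query)


def find_target_template(templates: list[AccessRuleTemplate],
                         *query: str) -> Optional[AccessRuleTemplate]:
    """Return the first template matching all query keywords, or None."""
    for template in templates:
        if template.matches(*query):
            return template
    return None


def predict_target_keywords(first_keyword: str,
                            templates: list[AccessRuleTemplate]) -> list[str]:
    """Feed the first keyword into the target template's mapping relationship."""
    template = find_target_template(templates, first_keyword)
    if template is None:
        return []
    # Assumption: keep only outputs that share the first keyword's characteristic.
    return [k for k in template.mapping(first_keyword)
            if template.characteristic.fullmatch(k) is not None]


def prefetch(first_keyword: str, templates: list[AccessRuleTemplate],
             hard_disk: dict[str, bytes], cache: dict[str, bytes]) -> list[str]:
    """Copy predicted files from the slow hard disk into the fast cache."""
    prefetched = []
    for name in predict_target_keywords(first_keyword, templates):
        if name in hard_disk and name not in cache:
            cache[name] = hard_disk[name]
            prefetched.append(name)
    return prefetched


# Hypothetical template for a numbered series of test papers.
PAPER = re.compile(r"2020 Grade 5 Chinese Mock Test Paper (\d+)")


def next_paper(keyword: str) -> list[str]:
    """Hypothetical mapping relationship: paper N is followed by paper N + 1."""
    match = PAPER.fullmatch(keyword)
    if match is None:
        return []
    return [f"2020 Grade 5 Chinese Mock Test Paper {int(match.group(1)) + 1}"]


templates = [AccessRuleTemplate(
    keywords={"2020 Grade 5 Chinese Mock Test Paper 1",
              "2020 Grade 5 Chinese Mock Test Paper 2"},
    characteristic=PAPER,
    mapping=next_paper,
)]
hard_disk = {f"2020 Grade 5 Chinese Mock Test Paper {n}": b"..." for n in range(1, 5)}
cache: dict[str, bytes] = {}
print(prefetch("2020 Grade 5 Chinese Mock Test Paper 3", templates, hard_disk, cache))
```

In this sketch, "Paper 3" is not among the template's trained keywords, so the match falls through to the characteristic check; the mapping then yields "Paper 4", which has never been read but is on the hard disk and is therefore copied into the cache. Passing two keywords, as in `find_target_template(templates, k1, k2)`, exercises the two-keyword rule with the same code path.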
  • FIG. 1A is a diagram of a system architecture to which the file prefetching method in an embodiment of this application applies.
  • FIG. 1B is a flowchart of a file prefetching method in an embodiment of this application.
  • FIG. 2A is a schematic diagram of an embodiment of file access information in an embodiment of this application.
  • FIG. 2B is a schematic diagram of another embodiment of file access information in an embodiment of this application.
  • FIG. 3 is another flowchart of the file prefetching method in an embodiment of this application.
  • FIG. 4 is a schematic diagram of an embodiment of a storage device in an embodiment of this application.
  • FIG. 5 is a schematic diagram of an embodiment of a prefetching device in an embodiment of this application.
  • The storage system provided in this embodiment includes a host 01, a controller 00, and multiple hard disks 02.
  • The host 01 and the controller 00 communicate using the network file system (NFS)/common internet file system (CIFS) protocol or the Fibre Channel (FC) protocol.
  • The controller 00 includes a processor 001 and a cache 002.
  • The host 01 may send a data write request (write request for short) to the controller 00.
  • After receiving the write request, the controller 00 writes the data carried in it to the hard disks 02.
  • The host 01 may also send a data read request (read request for short) to the controller 00.
  • After receiving the read request, the controller 00 searches the cache 002 for the data to be read according to the address in the read request. If the data is found, the controller 00 sends it (that is, the predicted file described later) directly to the host 01; otherwise, it obtains the data from the hard disks 02 and sends it to the host 01.
  • The controller 00 and the multiple hard disks 02 may be integrated into the storage device proposed in the embodiments of the present application, or may be two mutually independent devices that together form that storage device.
  • A new function is configured for the processor 001, or one or more AI chips 003 are added to the controller 00, so that by executing the file prefetching method the processor 001 can predict the file that the host 01 will access for the first time and thereby prefetch files that have never been accessed. This helps the host 01 increase the hit rate of read requests when accessing new files.
  • the storage device will perform the following steps:
  • when a host (such as host 01 in FIG. 1A) needs to access a file in the storage medium, the host issues a read request.
  • the read request is used to indicate the file to be accessed, and the read request will carry file access information of the file.
  • the file access information refers to information related to the file that needs to be accessed.
  • the file access information may include file attribute information and access attribute information.
  • the file attribute information is related to the file and is not affected by whether the file is accessed, for example, the file name, file type, file creator name, and access directory information.
  • the access attribute information refers to the information involved in this access operation, and the access attribute information generated for different accesses to the same file is different.
  • for example, the access time and the name of the file visitor. For ease of understanding, the aforementioned file access information is introduced by taking FIG. 2A as an example. As shown in FIG. 2A, a folder contains four files; take the file in the lower left corner as an example.
  • the file name of the file is "2020 Fifth Grade Chinese Mock Test Paper 3";
  • the file type is Word (that is, the file is a Word document);
  • the file creator is "Teacher Zhang";
  • the access directory is "C:\Final Mock Test Papers\Chinese". The access directory can also be called the storage path or storage address, and indicates the folder in which the accessed file is stored.
  • information such as the access time and the name of the file visitor is not shown in FIG. 2A, and the information such as the access time and the name of the file visitor may be implicit in the aforementioned read request.
  • the file access information may also describe files from different access directories; that is, a read request may carry file access information for several files that are stored in different access directories.
  • for example, the read request instructs reading the file whose access directory is "C:\Final Mock Test Papers\Chinese" and whose file name is "2020 Fifth Grade Chinese Mock Test Paper 3" (as shown in FIG. 2A), and the file whose access directory is "C:\Final Mock Test Paper Answers\Chinese" and whose file name is "2020 Fifth Grade Chinese Mock Test Answers 3" (as shown in FIG. 2B).
  • the processor in the storage device will obtain the file access information carried in the read request, and generate the first keyword based on the file access information.
  • the first keyword is a word that the storage device forms by splitting or combining parts of the file access information.
  • the first keyword may be a word or a phrase, which is not specifically limited here.
  • the first keyword may be split from the file name.
  • for example, the first keywords of the file named "2020 Fifth Grade Chinese Mock Test Paper 3" can be "2020", "Fifth Grade", "Chinese Mock Test Paper", or "3".
  • the first keyword may be formed by splitting the file name and then recombining the parts, for example, "2020-5-3".
  • the first keyword may be a phrase composed of different items of file access information, for example, {"Fifth Grade Chinese", "Teacher Zhang", "Word"}.
  • the first keyword may also be a phrase composed from the access directory, for example, {"Final Mock Test Papers", "Chinese"}.
  • the aforementioned first keyword can be adjusted according to specific needs, and the specific embodiment of the present application does not limit the specific form of the first keyword.
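The keyword forms listed above can be illustrated with a short Python sketch. It assumes a hypothetical `FileAccessInfo` record and a simple splitting rule (runs of digits versus runs of words); the actual splitting and combining rules used by the storage device are not specified in this document.

```python
import re
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class FileAccessInfo:
    # File attribute information: fixed properties of the file.
    file_name: str
    file_type: str
    creator: str
    access_directory: str
    # Access attribute information: differs from one access to the next.
    access_time: str = ""
    visitor: str = ""


def split_file_name(name: str) -> List[str]:
    """Split a file name into runs of digits and runs of words (hypothetical rule)."""
    return re.findall(r"\d+|[^\d\s]+(?:\s+[^\d\s]+)*", name)


def first_keywords(info: FileAccessInfo) -> Dict[str, object]:
    tokens = split_file_name(info.file_name)
    words = tuple(t for t in tokens if not t.isdigit())
    # Drop the drive letter ("C:") and empty segments from the directory path.
    directory = tuple(p for p in info.access_directory.split("\\")[1:] if p)
    return {
        "split": tokens,                                          # split from the file name
        "combined": "-".join(t for t in tokens if t.isdigit()),   # split, then recombined
        "attribute_phrase": words + (info.creator, info.file_type),  # mixed attributes
        "directory_phrase": directory,                            # from the access directory
    }


info = FileAccessInfo(
    file_name="2020 Fifth Grade Chinese Mock Test Paper 3",
    file_type="Word",
    creator="Teacher Zhang",
    access_directory="C:\\Final Mock Test Papers\\Chinese",
)
kw = first_keywords(info)
print(kw["split"])             # ['2020', 'Fifth Grade Chinese Mock Test Paper', '3']
print(kw["directory_phrase"])  # ('Final Mock Test Papers', 'Chinese')
```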
  • the storage device may generate the target keyword based on the aforementioned first keyword and the target access rule template.
  • the target keyword and the first keyword conform to the same characteristic. That is, the object indicated by the target keyword shares a characteristic with the object indicated by the first keyword, so the target keyword can readily be associated from the first keyword and its characteristic. Equivalently, the target keyword has the same part of speech as the first keyword, and its meaning is similar or related to that of the first keyword.
  • for example, the first keyword and the target keyword are both quantitative words: the first keyword is "3" and the target keyword is "4"; or the first keyword is "Section 1" and the target keyword is "Section 2".
  • alternatively, the first keyword and the target keyword are synonyms or antonyms of each other.
  • for example, the first keyword is "Book 1" and the target keyword is "Book 2"; or the first keyword is "Exam Questions" and the target keyword is "Answers".
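To make the "same characteristic" relation concrete, the sketch below hard-codes the two kinds of relation from these examples: incrementing the number in a quantitative keyword, and looking up a paired term. In the described method such relations would come from a trained access rule template; the `PAIRED_TERMS` table and the helper names here are hypothetical.

```python
import re
from typing import Dict, Optional

# Hypothetical paired terms standing in for learned synonym/antonym relations.
PAIRED_TERMS: Dict[str, str] = {
    "Book 1": "Book 2",
    "Exam Questions": "Answers",
}


def next_quantity(keyword: str) -> Optional[str]:
    """Increment the last number in a quantitative keyword, e.g. "Section 1" -> "Section 2"."""
    match = re.search(r"(\d+)(?!.*\d)", keyword)  # last run of digits
    if match is None:
        return None
    bumped = str(int(match.group(1)) + 1)
    return keyword[:match.start(1)] + bumped + keyword[match.end(1):]


def related_keyword(keyword: str) -> Optional[str]:
    """Return a target keyword sharing a characteristic with `keyword`, if one is known."""
    if keyword in PAIRED_TERMS:
        return PAIRED_TERMS[keyword]
    return next_quantity(keyword)


print(related_keyword("3"))               # 4
print(related_keyword("Section 1"))       # Section 2
print(related_keyword("Exam Questions"))  # Answers
```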
  • the target keyword is used to indicate the prediction file. It can also be understood that the target keyword is included in the file attribute information of the prediction file.
  • the target keyword is the file name, file type, or file creator name of the prediction file, etc., which is not specifically limited here.
  • the target access rule template is an access rule template corresponding to the first keyword among the plurality of access rule templates.
  • the target access rule template takes the first keyword as input, processes it according to a rule, and outputs the target keyword.
  • the target access rule template can also be regarded as a comprehensive calculation model, and the target keyword can be output by inputting the first keyword.
  • the storage device can use the target keyword to find the prediction file on a hard disk (for example, hard disk 02 in FIG. 1A) and prefetch the prediction file into the cache (for example, cache 002 in FIG. 1A).
  • the prefetched file may be a certain file or a certain group of related files, and the specifics are not limited here.
  • with reference to FIG. 2A and FIG. 2B, assume that the file to be accessed by the read request is named "2020 Fifth Grade Chinese Mock Test Paper 3" and the first keyword is the phrase {"Final Mock Test Papers", "Chinese", "3"}.
  • the storage device can then determine that the file name of the prediction file is "2020 Fifth Grade Chinese Mock Test Answers 3" and read the prediction file from the hard disk into the cache.
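Steps 102 and 103 for this example can be sketched in Python as follows. The access rule template is modeled as a plain function over a keyword phrase, the hard disk and cache as dictionaries keyed by file name, and a file is treated as matching when its name contains every term of the target keyword as a whole word. These are simplifying assumptions, and `paper_to_answers` is a hypothetical template, not one defined by the document. For brevity the first keyword uses the term "Mock Test Paper" rather than the directory name.

```python
import re
from typing import Callable, Dict, List, Tuple

Keyword = Tuple[str, ...]  # a keyword phrase, e.g. ("Mock Test Paper", "Chinese", "3")
RuleTemplate = Callable[[Keyword], Keyword]


def paper_to_answers(first: Keyword) -> Keyword:
    """Hypothetical template: swap the test-paper term for the answers term."""
    return tuple("Mock Test Answers" if t == "Mock Test Paper" else t for t in first)


def matches(name: str, keyword: Keyword) -> bool:
    """True if every term of the keyword appears in the file name as a whole word."""
    return all(re.search(r"\b" + re.escape(t) + r"\b", name) for t in keyword)


def prefetch(first: Keyword, template: RuleTemplate,
             hard_disk: Dict[str, bytes], cache: Dict[str, bytes]) -> List[str]:
    target = template(first)                       # step 102: target keyword
    predicted = [n for n in hard_disk if matches(n, target)]
    for name in predicted:                         # step 103: copy into the cache
        cache[name] = hard_disk[name]
    return predicted


hard_disk = {
    "2020 Fifth Grade Chinese Mock Test Paper 3": b"questions",
    "2020 Fifth Grade Chinese Mock Test Answers 3": b"answers",
    "2020 Fifth Grade Chinese Mock Test Answers 4": b"answers 4",
}
cache: Dict[str, bytes] = {}
print(prefetch(("Mock Test Paper", "Chinese", "3"), paper_to_answers, hard_disk, cache))
# ['2020 Fifth Grade Chinese Mock Test Answers 3']
```

Whole-word matching keeps the term "3" from matching, say, "2023" or "13"; a real implementation would look files up by indexed file attributes rather than scanning every name.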
  • in this embodiment, the target keyword is generated from the first keyword and the target access rule template, and the prediction file to be prefetched is then determined based on the target keyword. Because the first keyword and the target keyword conform to the same characteristic, the prediction file indicated by the target keyword is likely to be the file the user reads next. Therefore, even if the file access information from which the first keyword is generated belongs to a file that is accessed for the first time, the storage device can use the first keyword and the target access rule template to predict a file that has not yet been accessed and prefetch it into the cache. This helps improve the hit rate with which the host obtains data from the cache in the storage device.
  • the foregoing file prefetching method is further introduced. Specifically, as shown in FIG. 3, the storage device will perform the following steps:
  • the storage device can train access rule templates in advance, so that when a read request arrives and the storage device generates the first keyword from the file access information carried in the read request, a trained template is already available.
  • the storage device may use a training model based on text semantics to train a plurality of initial keywords to obtain an access rule template.
  • the access rule template is generally stored in a cache in the storage device (for example, cache 002 in FIG. 1A).
  • the storage device may also back up the access rule template to a hard disk (for example, hard disk 02 in FIG. 1A) for subsequent recall at any time.
  • the training model based on text semantics means that the training model can recognize text semantics and perform association training based on text semantics.
  • one example of a training model based on text semantics is a clustering model based on text semantics, which can classify keywords according to their semantics.
  • the model may include the following functions: a semantic textual similarity function, which measures the degree of basic semantic equivalence between text segments (for example, the aforementioned first keyword); paraphrase identification (PI), which identifies whether two words (for example, the first keyword and the target keyword) express the same meaning; and natural language inference, which determines the semantic relationship between a hypothesis and a premise.
  • as another example, a naive Bayes model based on text semantics can be trained on the probabilities with which multiple keywords occur, and uses those probabilities during inference.
  • the initial keyword is similar to the first keyword in the foregoing embodiment.
  • the initial keyword can be generated in the same way as the first keyword.
  • that is, the initial keyword is a word generated from the information carried in one or more historical read requests.
  • the initial keyword can also be preset by the user according to actual needs.
  • the initial keyword may be a word or a phrase, and may be a noun, an adjective, a quantifier, etc.
  • the specific keyword is not limited here.
  • the storage device may execute the foregoing step 301a multiple times to generate multiple access rule templates, and the foregoing multiple access rule templates may form a set, which is referred to as an access rule template set in the embodiment of the present application.
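The training step can be loosely illustrated by grouping initial keywords into templates according to the characteristic they satisfy. In this sketch the text-semantics model is replaced by hand-written predicates in a hypothetical `FEATURES` table, and each resulting template records its characteristic associated word and the initial keywords it covers; a real implementation would learn these groupings rather than hard-code them.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set

# Hypothetical predicates standing in for a text-semantics model; each key
# doubles as the characteristic associated word of the resulting template.
FEATURES: Dict[str, Callable[[str], bool]] = {
    "positive integer": lambda w: w.isdigit() and int(w) > 0,
    "school subject": lambda w: w in {"Chinese", "Math", "English"},
}


@dataclass
class AccessRuleTemplate:
    feature_word: str                                # characteristic associated word
    keywords: Set[str] = field(default_factory=set)  # initial keywords it covers


def build_template_set(initial_keywords: List[str]) -> List[AccessRuleTemplate]:
    """Group each initial keyword under the first characteristic it satisfies.

    Keywords that satisfy no characteristic are not placed in any template.
    """
    templates: Dict[str, AccessRuleTemplate] = {}
    for word in initial_keywords:
        for feature, predicate in FEATURES.items():
            if predicate(word):
                if feature not in templates:
                    templates[feature] = AccessRuleTemplate(feature)
                templates[feature].keywords.add(word)
                break
    return list(templates.values())


for t in build_template_set(["3", "5", "Chinese", "Math", "Teacher Zhang"]):
    print(t.feature_word, sorted(t.keywords))
```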
  • the storage device can also update each access rule template in the access rule template set at any time.
  • the access rule template set may be directly stored in the storage device as internal data of the storage device.
  • the set of access rule templates contained in different storage devices may be different, and in addition, the same access rule template may also be different at different times.
  • the access rule template set can be integrated in a chip for the storage device to call.
  • the storage device may integrate the access rule template set into an AI chip (for example, AI chip 003 in FIG. 1A).
  • the AI chip may be located in a controller in the storage device (for example, in controller 00 in FIG. 1A), and the processor calls the access rule template set through its connection to the AI chip.
  • the file access information is information related to the file indicated by the read request.
  • step 101 has been described in detail above and is not repeated here.
  • the storage device may also generate a second keyword based on the file access information carried in the read request.
  • the second keyword is different from the first keyword, and the storage device will use the aforementioned first keyword and second keyword to perform subsequent prediction steps.
  • the second keyword may also be a keyword from another file.
  • step 301a and step 301b are independent of each other. That is, while the storage device generates the first keyword based on the file access information, it can also train access rule templates with the initial keywords to generate the access rule template set. Therefore, there is no strict ordering between step 301a and step 301b, and the storage device may execute them at the same time.
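Because steps 301a and 301b have no ordering dependency, they can run concurrently. The Python sketch below submits both to a `concurrent.futures.ThreadPoolExecutor`; `train_templates` and `generate_first_keyword` are hypothetical stand-ins with deliberately trivial logic.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Set, Tuple


def train_templates(initial_keywords: List[str]) -> Dict[str, Set[str]]:
    """Stand-in for step 301a: group initial keywords into numeric and word templates."""
    groups: Dict[str, Set[str]] = {"number": set(), "word": set()}
    for word in initial_keywords:
        groups["number" if word.isdigit() else "word"].add(word)
    return groups


def generate_first_keyword(file_name: str) -> List[str]:
    """Stand-in for step 301b: split the file name on whitespace."""
    return file_name.split()


def run_301a_and_301b(initial_keywords: List[str],
                      file_name: str) -> Tuple[Dict[str, Set[str]], List[str]]:
    # Neither step depends on the other's result, so both are submitted at once.
    with ThreadPoolExecutor(max_workers=2) as pool:
        templates = pool.submit(train_templates, initial_keywords)
        keywords = pool.submit(generate_first_keyword, file_name)
        return templates.result(), keywords.result()


print(run_301a_and_301b(["3", "5", "Chinese"], "Chinese Mock Test Paper 3"))
```

The lookup in step 302 needs both results, which is why the function waits on both futures before returning.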
  • after the storage device determines the first keyword and the access rule template set, it searches the access rule template set, based on the first keyword, for the target access rule template corresponding to the first keyword.
  • each access rule template contains multiple initial keywords, and the characteristic that these initial keywords conform to is described by a characteristic associated word.
  • the characteristic associated word may describe an attribute common to the multiple initial keywords. For example, if an access rule template includes the three initial keywords "3", "5", and "7", their characteristic associated words can be "odd number", "positive integer", "prime number", and so on.
  • the target access rule template in this embodiment includes the first keyword and/or a first characteristic associated word, where the first characteristic associated word indicates the characteristic that the first keyword conforms to. That is, the target access rule template may include only the first keyword, only the first characteristic associated word, or both. This is not limited here.
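Deriving characteristic associated words for the "3", "5", "7" example amounts to keeping the characteristics that every initial keyword satisfies. The catalogue of candidate characteristics below is a hypothetical assumption that covers only numeric keywords.

```python
from typing import Callable, Dict, Iterable, List


def is_prime(n: int) -> bool:
    if n < 2:
        return False
    return all(n % d for d in range(2, int(n ** 0.5) + 1))


# Hypothetical catalogue of characteristics that a numeric keyword may have.
CHARACTERISTICS: Dict[str, Callable[[int], bool]] = {
    "odd number": lambda n: n % 2 == 1,
    "even number": lambda n: n % 2 == 0,
    "positive integer": lambda n: n > 0,
    "prime number": is_prime,
}


def characteristic_associated_words(initial_keywords: Iterable[str]) -> List[str]:
    """Return the characteristics shared by every initial keyword (all assumed numeric)."""
    numbers = [int(k) for k in initial_keywords]
    return [name for name, test in CHARACTERISTICS.items()
            if all(test(n) for n in numbers)]


print(characteristic_associated_words(["3", "5", "7"]))
# ['odd number', 'positive integer', 'prime number']
```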
  • the storage device may search the access rule template set for the target access rule template based on the first keyword as follows:
  • the storage device traverses the access rule templates in the access rule template set. For each access rule template, the storage device determines whether the template contains the first keyword. If it does, the storage device determines that this access rule template is the target access rule template, skips the remaining checks, and proceeds directly to step 303. If it does not, the storage device further determines whether the first keyword conforms to the characteristics indicated by the characteristic associated words in the template. If so, the storage device determines that this access rule template is the target access rule template and then executes step 303.
  • when a second keyword is also generated, the storage device needs to search the access rule template set for a target access rule template that matches both the first keyword and the second keyword.
  • the storage device may search the access rule template set for the target access rule template based on the first keyword and the second keyword as follows:
  • the storage device traverses the access rule templates in the access rule template set. For each access rule template, the storage device determines whether the template contains both the first keyword and the second keyword. If it does, the storage device determines that this access rule template is the target access rule template, skips the remaining checks, and proceeds directly to step 303. If the template contains only one of the two keywords, or neither, the storage device further determines whether both the first keyword and the second keyword conform to the characteristics indicated by the characteristic associated words in the template. If they do, the storage device determines that this access rule template is the target access rule template and then executes step 303.
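Both search procedures above have the same shape, so one Python sketch covers them: pass a single keyword for the first procedure, or the first and second keywords for the second. Characteristics are modeled as predicate functions, and a keyword is assumed to conform to a template only when it satisfies every predicate the template carries; the data layout and names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Sequence, Set


@dataclass
class AccessRuleTemplate:
    name: str
    keywords: Set[str]  # initial keywords contained in the template
    # characteristic associated word -> test for that characteristic
    features: Dict[str, Callable[[str], bool]] = field(default_factory=dict)


def find_target_template(keywords: Sequence[str],
                         templates: List[AccessRuleTemplate]) -> Optional[AccessRuleTemplate]:
    """Traverse the set; pass [first] or [first, second] as `keywords`."""
    for template in templates:
        # Check 1: the template contains every keyword.
        if all(k in template.keywords for k in keywords):
            return template
        # Check 2: every keyword conforms to every characteristic of the template.
        if template.features and all(
                test(k) for k in keywords for test in template.features.values()):
            return template
    return None  # no target access rule template found


def is_odd(word: str) -> bool:
    return word.isdigit() and int(word) % 2 == 1


numbers = AccessRuleTemplate("numbers", {"3", "5", "7"}, {"odd number": is_odd})
subjects = AccessRuleTemplate("subjects", {"Chinese", "Math"})
templates = [subjects, numbers]
print(find_target_template(["5"], templates).name)        # numbers (contains "5")
print(find_target_template(["9", "11"], templates).name)  # numbers (both are odd)
print(find_target_template(["4"], templates))             # None
```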
  • the target access rule template also includes a mapping relationship, and the mapping relationship is used to indicate an association manner between the plurality of initial keywords.
  • the target keyword and the first keyword conform to the same characteristic; specifically, the target keyword conforms to the characteristic of the first keyword that is indicated by the first characteristic associated word.
  • the target keyword is used to indicate the prediction file. Specifically, the first keyword and the target keyword have been described in detail in step 102 above, and the details are not repeated here.
  • for example, the mapping relationship may be a mapping table between test papers and their answers: if "Chinese Test Paper Volume 1" is input, "Chinese Test Paper Answers Volume 1" is output. Many other mappings are possible in practice; they are not enumerated here.
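A mapping relationship of this kind can be sketched as a substitution table applied to the first keyword. The single entry in `MAPPING` below is a hypothetical example mirroring the test-paper case; a trained template would hold whatever associations it learned.

```python
from typing import Dict, Optional

# Hypothetical mapping relationship for a "test paper -> answers" template.
MAPPING: Dict[str, str] = {"Test Paper": "Test Paper Answers"}


def apply_mapping(first_keyword: str,
                  mapping: Dict[str, str] = MAPPING) -> Optional[str]:
    """Use the first keyword as the input of the mapping; return the target keyword."""
    for source, target in mapping.items():
        if source in first_keyword:
            return first_keyword.replace(source, target, 1)
    return None  # the mapping does not apply to this keyword


print(apply_mapping("Chinese Test Paper Volume 1"))
# Chinese Test Paper Answers Volume 1
```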
  • step 304 is similar to step 103 described above; refer to step 103 for details, which are not repeated here.
  • in this embodiment, the first keyword is used as the input of the mapping relationship in the target access rule template to output the target keyword, and the prediction file to be prefetched is then determined based on the target keyword. Because the first keyword and the target keyword conform to the same characteristic, the prediction file indicated by the target keyword is likely to be the file the user reads next. Therefore, even if the file access information from which the first keyword is generated belongs to a file that is accessed for the first time, the storage device can use the first keyword and the target access rule template to predict a file that has not yet been accessed and prefetch it into the cache. This helps improve the hit rate with which the host obtains data from the cache in the storage device.
  • the storage device is always in a state of updating and maintaining the access rule template set.
  • the storage device updates and maintains the access rule template set by combining the target access rule template with the other access rule templates in the set.
  • the access rule template set includes one target access rule template and at least one candidate access rule template.
  • the storage device determines the degree of association between the target access rule template and each of the at least one candidate access rule template, where the association degree indicates how similar the mapping relationship in the target access rule template is to the mapping relationship in the candidate access rule template.
  • the storage device merges each candidate access rule template whose association degree is higher than a preset value with the target access rule template, to obtain an updated access rule template set.
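The document does not define how the association degree is computed. The sketch below assumes, purely for illustration, that a mapping relationship is a set of (input keyword, output keyword) pairs and that the association degree is the Jaccard index of two such sets; the default threshold of 0.5 stands in for the unspecified preset value.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple

Pair = Tuple[str, str]  # (input keyword, output keyword)


@dataclass
class AccessRuleTemplate:
    name: str
    mapping: FrozenSet[Pair]  # mapping relationship as a set of pairs


def association_degree(a: AccessRuleTemplate, b: AccessRuleTemplate) -> float:
    """Assumed measure: Jaccard similarity of the two mapping relationships."""
    union = a.mapping | b.mapping
    return len(a.mapping & b.mapping) / len(union) if union else 0.0


def update_template_set(target: AccessRuleTemplate,
                        candidates: List[AccessRuleTemplate],
                        preset_value: float = 0.5) -> List[AccessRuleTemplate]:
    """Merge every candidate whose association degree exceeds the preset value into the target."""
    merged = target.mapping
    kept: List[AccessRuleTemplate] = []
    for candidate in candidates:
        # Degrees are measured against the original target, not the growing merge.
        if association_degree(target, candidate) > preset_value:
            merged = merged | candidate.mapping
        else:
            kept.append(candidate)
    return [AccessRuleTemplate(target.name, merged)] + kept


target = AccessRuleTemplate("paper->answers", frozenset({("P1", "A1"), ("P2", "A2")}))
similar = AccessRuleTemplate("c1", frozenset({("P1", "A1"), ("P2", "A2"), ("P3", "A3")}))
unrelated = AccessRuleTemplate("c2", frozenset({("3", "4")}))
updated = update_template_set(target, [similar, unrelated])
print([t.name for t in updated])  # ['paper->answers', 'c2']
```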
  • in this way, the access rule template set in the storage device can be updated based on the target access rule template corresponding to the first keyword; that is, the storage device refines how prediction files are determined based on the first keyword.
  • updating the access rule template set in the storage device enables each access rule template in the updated set to determine a more accurate prediction file from the input keyword, so that this prediction file can be prefetched into the cache, improving the hit rate of read requests sent by the host.
  • FIG. 4 is a schematic structural diagram of a storage device 40 provided in this embodiment.
  • the storage device in the method embodiment corresponding to FIG. 1B and FIG. 3 may be based on the structure shown in FIG. 4 in this embodiment.
  • the storage device 40 includes at least one processor 401 and at least one storage medium 402.
  • the processor 401 may be a general-purpose central processing unit (central processing unit, CPU) or a microprocessor (microprocessor).
  • the processor 401 may be a single-core processor (single-CPU) or a multi-core processor (multi-CPU).
  • the processor 401 may refer to one or more devices, circuits, and/or processing cores for processing data (for example, computer program instructions).
  • the processor 401 may be configured to receive a read request issued by the host, and generate keywords (such as the first keyword and the second keyword, etc.) based on the file access information carried in the read request.
  • the processor 401 is also configured to use the first keyword and the target access rule template to generate a target keyword, and prefetch a prediction file based on the target keyword. Specifically, the processor 401 may execute other steps in the aforementioned embodiment corresponding to FIG. 1B and FIG. 3.
  • when the storage device 40 executes step 301a in the embodiment corresponding to FIG. 3, any one of the following implementations can be adopted:
  • in one implementation, the processor 401 is further configured to perform training based on a plurality of initial keywords to generate an access rule template, and to update the access rule template set based on the aforementioned target access rule template.
  • in another implementation, the aforementioned at least one processor 401 includes one or more first processors 4011, and the first processor 4011 is configured to perform training based on a plurality of initial keywords to generate an access rule template, and to update the access rule template set based on the aforementioned target access rule template.
  • the first processor 4011 may be a functional module or an independent functional chip in the processor 401, and is used to maintain the aforementioned access rule template set.
  • the first processor 4011 may also be an AI chip with specific computing functions.
  • in yet another implementation, the storage device 40 further includes an AI chip 403.
  • the AI chip 403 is located outside the processor 401 and implements the function of the aforementioned first processor 4011; that is, the AI chip 403 performs training based on multiple initial keywords to generate an access rule template, and updates the access rule template set based on the aforementioned target access rule template.
  • the aforementioned processor 401 and the AI chip 403 also include a chip interface, and the processor 401 communicates with the AI chip 403 through the chip interface, so as to call the access rule template set generated in the AI chip 403.
  • the storage medium 402 includes a cache memory 4021 and a hard disk 4022.
  • the cache 4021 may also be referred to as a memory, which is a bridge for communication between external storage (that is, the hard disk 4022) and the processor 401.
  • the cache 4021 can be used to temporarily hold operational data of the processor 401 and data exchanged with an external memory such as the hard disk 4022.
  • the processor 401 can transfer the data to be calculated from the cache 4021 to the processor 401 for calculation.
  • the hard disk 4022 and the cache 4021 further include one or more interfaces, and the one or more interfaces are used to implement data transmission between the hard disk 4022 and the cache 4021.
  • the processor 401 is configured to prefetch the prediction file from the hard disk 4022 to the cache 4021 based on the target keyword.
  • the cache 4021 is also used to store the access rule template generated by the aforementioned processor 401/first processor 4011/AI chip 403.
  • the processor 401/first processor 4011/AI chip 403 can also back up the aforementioned access rule template to the hard disk 4022, so that the processor 401/first processor 4011/AI chip 403 can later retrieve the access rule template from the hard disk 4022.
  • in some implementations, the AI chip 403 also contains its own storage medium, so when the AI chip 403 generates an access rule template, the template can be stored directly in the AI chip 403.
  • the aforementioned processor 401, cache 4021, and AI chip 403 are generally located in the same controller (not shown). For details, refer to the system architecture diagram corresponding to FIG. 1A; they are not repeated here.
  • FIG. 5 is a schematic structural diagram of a prefetching device 50 provided by an embodiment of this application.
  • the prefetching device 50 is located in the aforementioned storage device 40.
  • the storage device 40 stores a computer program or instructions, and the prefetching device 50 invokes the computer program or instructions to implement the following modules: a keyword generation module 501, a calculation module 502, a data migration module 503, and a template generation module 504.
  • the keyword generation module 501, the calculation module 502, and the data migration module 503 are located in the processor 401 of the storage device 40 shown in FIG. 4; the template generation module 504 may be located in the processor 401 of the storage device 40 shown in FIG. 4, for example, in the first processor 4011 of the processor 401, or it may be located in the AI chip 403 of the storage device 40 shown in FIG. 4.
  • the keyword generating module 501 is configured to generate the first keyword according to the file access information carried in the read request. Specifically, please refer to the relevant introduction in the foregoing step 101 and the foregoing step 301b.
  • the calculation module 502 is configured to use the first keyword and the target access rule template to generate a target keyword.
  • the target keyword matches the same characteristics as the first keyword, and the target keyword is used to indicate a prediction file.
  • the data migration module 503 is configured to prefetch the prediction file indicated by the target keyword from the hard disk to the cache. Specifically, reference may be made to the related introduction of the foregoing step 103 and the foregoing step 304.
  • the prefetching device 50 stores multiple access rule templates, or the prefetching device 50 can call multiple access rule templates stored in the storage device.
  • the target access rule template is an access rule template corresponding to the first keyword among the plurality of access rule templates, and each of the access rule templates is obtained by training a plurality of initial keywords using a training model based on text semantics.
  • the target access rule template includes the first keyword and/or a first characteristic associated word, and the first characteristic associated word indicates the characteristic that the first keyword conforms to.
  • the target access rule template further includes a mapping relationship, and the mapping relationship is used to indicate an association manner between the plurality of initial keywords, and the plurality of initial keywords include the first keyword.
  • the calculation module 502 is specifically configured to use the first keyword as the input of the mapping relationship, calculate the first keyword using the mapping relationship, and output the target keyword.
  • the target keyword matches the first keyword.
  • the calculation module 502 is further configured to: determine whether an access rule template includes the first keyword; if it does, determine that the access rule template is the target access rule template; if it does not, further determine whether the first keyword conforms to the characteristics indicated by the characteristic associated words in the access rule template; and if the first keyword conforms to those characteristics, determine that the access rule template is the target access rule template.
  • for details, refer to the description of step 303 above.
  • the template generation module 504 is used to train the plurality of initial keywords using the text semantic training model to obtain the access rule template. Specifically, reference may be made to the related introduction of the foregoing step 301a.
  • One or more of the above modules can be implemented by software, hardware or a combination of both.
  • the software exists in the form of computer program instructions and is stored in the memory, and the processor can be used to execute the program instructions and implement the above method flow.
  • in this case, the processor may include, but is not limited to, at least one of the following: a central processing unit, a microprocessor, a digital signal processor (DSP), a microcontroller unit (MCU), or an artificial intelligence processor, among other computing devices that run software. Each computing device may include one or more cores for executing software instructions to perform operations or processing.
  • the processor may be built in a system on chip (SoC) or an application specific integrated circuit (ASIC), or it may be an independent semiconductor chip.
  • besides the cores that execute software instructions to perform operations or processing, the processor may further include necessary hardware accelerators, such as a field programmable gate array (FPGA), a programmable logic device (PLD), or a logic circuit that implements dedicated logic operations.
  • when the modules are implemented in hardware, the hardware can be a CPU, a microcontroller, a DSP, an MCU, an artificial intelligence processor, an ASIC, an SoC, an FPGA, a PLD, a dedicated digital circuit, a hardware accelerator, or a non-integrated discrete device, and it can either run the necessary software or perform the above method flow without relying on software.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A file prefetching method, a storage device, and a prefetching apparatus. In the file prefetching method, a storage device generates a first keyword according to file access information carried in a read request (101); generates a target keyword using the first keyword and a target access rule template (102); and prefetches a prediction file indicated by the target keyword into a cache (103). Because the target keyword and the first keyword conform to the same characteristic, the prediction file indicated by the target keyword may be a file that a user needs to read next. Therefore, the file prefetching method can predict a file that a processor accesses for the first time, so as to prefetch files that have not yet been accessed, which helps the processor increase the hit rate of read requests when accessing a new file.

Description

一种文件预取方法、存储设备以及预取装置File prefetching method, storage equipment and prefetching device 技术领域Technical field
本申请实施例涉及计算机存储领域,尤其涉及一种文件预取方法、存储设备以及预取装置。The embodiments of the present application relate to the field of computer storage, and in particular, to a file prefetching method, storage device, and prefetching device.
Background
Cache data prefetching is a cache optimization technique in computer operating systems. Before a processor in a storage device accesses data for computation, the data is loaded in advance from main memory into cache memory. This increases the hit rate of read requests issued by the host and reduces the time the processor stalls waiting for data, thereby improving read performance.
In the prior art, a storage device derives the access pattern of previously accessed files from historical access records, forming a historical access queue composed of multiple files. When a file in the historical access queue is accessed again, the storage device can prefetch the files that follow it in the queue into the cache, increasing the hit rate of read requests issued by the host and improving system read performance.
In such a solution, the historical access queue is generated from files that have already been accessed, so the storage device can prefetch only previously accessed files and cannot prefetch files that have never been accessed. This does not help the host increase the hit rate of read requests when it accesses new files in the storage device.
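To make the limitation concrete, the history-queue scheme described above can be modelled in a few lines of Python. This is an illustrative sketch, not the prior-art implementation; `prior_art_prefetch` and its `depth` parameter are hypothetical names. A file that never appeared in the queue yields no candidates, which is the gap this application targets.

```python
from typing import List


def prior_art_prefetch(history: List[str], accessed: str, depth: int = 1) -> List[str]:
    """Return the files that followed `accessed` in the historical access queue.

    `depth` (hypothetical) is how many successors to prefetch.
    """
    if accessed not in history:
        # A file that has never been accessed has no position in the queue,
        # so this scheme cannot predict what comes after it.
        return []
    i = history.index(accessed)
    return history[i + 1 : i + 1 + depth]


history = ["a.txt", "b.txt", "c.txt"]
print(prior_art_prefetch(history, "a.txt"))    # ['b.txt']
print(prior_art_prefetch(history, "new.txt"))  # []
```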
Summary of the invention
The embodiments of this application provide a file prefetching method and a storage device that predict a file the host accesses for the first time, so that files that have never been accessed can be prefetched. This helps the host increase the hit rate of read requests when accessing new files.
According to a first aspect, an embodiment of this application provides a file prefetching method, applied to a storage device, for prefetching a file from a low-speed storage medium to a high-speed storage medium. In the method, the storage device first receives a read request issued by the host. The read request indicates the file to be accessed and carries file access information of that file. The storage device generates a first keyword according to the file access information, and then generates a target keyword by using the first keyword and a target access rule template. The target keyword and the first keyword match the same characteristics, and the target keyword indicates a prediction file. Finally, the storage device prefetches the prediction file indicated by the target keyword into the cache.
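The three steps (101 to 103) can be sketched end to end as below. This is a minimal sketch under stated assumptions: the first keyword is simply the file name without its extension, an access rule template is modelled as an ordered list of learned keywords whose mapping relationship is "the next keyword in the list", and `keyword_to_path`, `disk`, and `cache` are plain dictionaries standing in for the file index, the low-speed medium, and the high-speed medium. None of these names come from the application.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class ReadRequest:
    # Hypothetical subset of the file access information in a read request.
    file_name: str
    directory: str


@dataclass
class AccessRuleTemplate:
    # Hypothetical template: an ordered list of keywords learned offline.
    # Its mapping relationship here is simply "the next keyword in order".
    ordered_keywords: List[str]

    def map_keyword(self, keyword: str) -> Optional[str]:
        i = self.ordered_keywords.index(keyword)
        if i + 1 < len(self.ordered_keywords):
            return self.ordered_keywords[i + 1]
        return None


def first_keyword(req: ReadRequest) -> str:
    # Step 101 (assumption): the file name without its extension.
    return req.file_name.rsplit(".", 1)[0]


def prefetch_for(req: ReadRequest,
                 templates: List[AccessRuleTemplate],
                 keyword_to_path: Dict[str, str],
                 disk: Dict[str, bytes],
                 cache: Dict[str, bytes]) -> Optional[str]:
    kw = first_keyword(req)
    for tpl in templates:
        if kw in tpl.ordered_keywords:          # target access rule template
            target = tpl.map_keyword(kw)        # step 102: target keyword
            path = keyword_to_path.get(target) if target else None
            if path is not None and path in disk:
                cache[path] = disk[path]        # step 103: prefetch into cache
                return path
    return None


templates = [AccessRuleTemplate(["ep01", "ep02", "ep03"])]
index = {"ep02": "/videos/ep02.mp4", "ep03": "/videos/ep03.mp4"}
disk = {"/videos/ep02.mp4": b"...", "/videos/ep03.mp4": b"..."}
cache: Dict[str, bytes] = {}
print(prefetch_for(ReadRequest("ep01.mp4", "/videos"), templates, index, disk, cache))
# /videos/ep02.mp4
```

Here `ep02` is prefetched even though no read request has named it yet; only the template, not an access history, has to contain it.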
The file access information is information related to the file indicated by the read request, for example, the file name, file type, access time, file creator name, file visitor name, and access directory. The low-speed storage medium may be a hard disk, and the high-speed storage medium may be a cache.
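The application lists the kinds of file access information but does not prescribe how keywords are derived from them. One plausible, purely hypothetical extraction rule is sketched below. It splits the file name into alphabetic and numeric runs and adds the directory components, file type, creator, visitor, and an hour-of-day bucket for the access time.

```python
import re
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass
class FileAccessInfo:
    file_name: str
    file_type: str
    access_time: datetime
    creator: str
    visitor: str
    directory: str


def keywords_from(info: FileAccessInfo) -> List[str]:
    """Hypothetical keyword extraction; the application does not fix a rule."""
    stem = info.file_name.rsplit(".", 1)[0].lower()
    # Split the name into alphabetic and numeric runs: "report_2021q1" -> report, 2021, q, 1
    words = re.findall(r"[a-z]+|[0-9]+", stem)
    words += [part.lower() for part in info.directory.split("/") if part]
    words += [info.file_type.lower(), info.creator.lower(), info.visitor.lower()]
    words.append(f"hour{info.access_time.hour:02d}")  # coarse access-time bucket
    return list(dict.fromkeys(words))  # de-duplicate, keep first-seen order


info = FileAccessInfo("Report_2021Q1.xlsx", "xlsx", datetime(2021, 4, 19, 9, 30),
                      "alice", "bob", "/finance/reports")
print(keywords_from(info))
# ['report', '2021', 'q', '1', 'finance', 'reports', 'xlsx', 'alice', 'bob', 'hour09']
```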
In this embodiment of this application, the target keyword is generated by using the first keyword and the target access rule template, and the prediction file to be prefetched is then determined based on the target keyword. Because the first keyword and the target keyword match the same characteristics, the prediction file indicated by the target keyword can be regarded as the file the user needs to read next. Therefore, even if the file access information from which the first keyword is determined comes from a file accessed for the first time, the storage device can still infer a not-yet-accessed prediction file from the first keyword and the target access rule template and prefetch it into the cache. This helps increase the hit rate with which the host obtains data from the cache of the storage device.
According to the first aspect, in a first implementation of the first aspect, the storage device stores a plurality of access rule templates, and the target access rule template is the one among them that corresponds to the first keyword. Each access rule template is obtained by training a plurality of initial keywords with a training model based on text semantics.
In other words, the storage device trains a plurality of initial keywords with a text-semantics-based training model to obtain an access rule template. In the same way, the storage device can train a plurality of access rule templates, of which the one corresponding to the first keyword is the target access rule template.
In this implementation, to determine the prediction file more accurately, the storage device selects the access rule template corresponding to the first keyword (that is, the target access rule template) and uses it to generate the target keyword used to look up the prediction file. Selecting the template according to the first keyword improves the accuracy of the prediction file, and thus the probability that a subsequently issued read request hits it. Prefetching the prediction file into the cache therefore helps increase the hit rate with which the host obtains data from the cache of the storage device.
According to the first implementation of the first aspect, in a second implementation of the first aspect, the target access rule template contains the first keyword and/or a first feature associated word, where the first feature associated word indicates a characteristic of the first keyword.
This implementation further specifies what the target access rule template contains, which provides the basis for matching the first keyword to the target access rule template.
According to the second implementation of the first aspect, in a third implementation of the first aspect, the target access rule template further contains a mapping relationship that indicates how the plurality of initial keywords are associated with one another, where the plurality of initial keywords include the first keyword. Generating the target keyword by using the first keyword and the target access rule template includes: using the first keyword as the input of the mapping relationship, computing on the first keyword with the mapping relationship, and outputting the target keyword, where the target keyword has the characteristic of the first keyword indicated by the first feature associated word.
In this implementation, the target access rule template further contains a mapping relationship, which is generated when the plurality of initial keywords are trained. The mapping relationship indicates how the initial keywords are associated with one another: once one of the initial keywords and the mapping relationship are known, the storage device can calculate the remaining initial keywords. Therefore, when the first keyword corresponds to the target access rule template, the mapping relationship in that template also applies to the first keyword. The storage device can then use the first keyword as the input of the mapping relationship, compute on it with the mapping relationship, and output the target keyword. This implementation specifies concretely how the target keyword is determined, which improves the reliability of the solution.
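The application states only that the mapping relationship is learned by a text-semantics-based model and, given one keyword, yields the others. One way to picture this is vector arithmetic followed by a nearest-neighbour search. That picture assumes the model produces keyword vectors (in the style of word embeddings) and that the mapping relationship is a fixed offset between consecutive keywords. The embeddings, the offset, and `apply_mapping` below are hypothetical.

```python
import math
from typing import Dict, Optional, Sequence


def apply_mapping(first_keyword: str,
                  offset: Sequence[float],
                  embeddings: Dict[str, Sequence[float]]) -> Optional[str]:
    """Hypothetical mapping relationship: shift the first keyword's vector by
    `offset`, then return the nearest other known keyword (Euclidean distance)."""
    if first_keyword not in embeddings:
        return None
    query = [x + d for x, d in zip(embeddings[first_keyword], offset)]
    candidates = [w for w in embeddings if w != first_keyword]
    if not candidates:
        return None
    return min(candidates, key=lambda w: math.dist(query, embeddings[w]))


# Toy 2-D embeddings in which "next month" is +1 on the second axis.
embeddings = {"jan": [1.0, 0.0], "feb": [1.0, 1.0], "mar": [1.0, 2.0], "apr": [1.0, 3.0]}
print(apply_mapping("feb", [0.0, 1.0], embeddings))  # mar
```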
According to the second or third implementation of the first aspect, in a fourth implementation of the first aspect, when the storage device uses the first keyword to search a plurality of access rule templates for the target access rule template, it performs the following steps:
Determine whether the access rule template contains the first keyword.
If the access rule template contains the first keyword, determine that it is the target access rule template.
If the access rule template does not contain the first keyword, further determine whether the first keyword has the characteristic indicated by a feature associated word in the access rule template.
If the first keyword has the characteristic indicated by a feature associated word in the access rule template, determine that the access rule template is the target access rule template.
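The four lookup steps above can be sketched as follows. The representation is an assumption: each feature associated word is paired with a hypothetical predicate that decides whether a keyword has that characteristic, and templates are checked in the order given.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, Optional, Set


@dataclass
class AccessRuleTemplate:
    name: str
    keywords: Set[str]
    # Hypothetical encoding of feature associated words: each word maps to a
    # predicate that says whether a keyword has that characteristic.
    features: Dict[str, Callable[[str], bool]] = field(default_factory=dict)


def find_target_template(first: str,
                         templates: Iterable[AccessRuleTemplate]) -> Optional[AccessRuleTemplate]:
    for tpl in templates:
        # Step 1: does the template contain the first keyword itself?
        if first in tpl.keywords:
            return tpl
        # Step 2: otherwise, does the keyword have a characteristic named by
        # one of the template's feature associated words?
        if any(has_feature(first) for has_feature in tpl.features.values()):
            return tpl
    return None


tv = AccessRuleTemplate("tv", {"ep01", "ep02"},
                        {"episode": lambda k: re.fullmatch(r"ep\d+", k) is not None})
reports = AccessRuleTemplate("reports", {"q1", "q2"},
                             {"quarter": lambda k: re.fullmatch(r"q[1-4]", k) is not None})
print(find_target_template("ep17", [reports, tv]).name)  # tv
```

Checking containment before features, template by template, mirrors the order of the steps; the application does not say how to choose when several templates match.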
According to the second or third implementation of the first aspect, in a fifth implementation of the first aspect, the file access information carried in the read request is also used to generate a second keyword, and both the second keyword and the first keyword match the target access rule template. When the storage device uses both the first keyword and the second keyword to search a plurality of access rule templates for the target access rule template, it performs the following steps:
Determine whether the access rule template contains both the first keyword and the second keyword.
If the access rule template contains both the first keyword and the second keyword, determine that it is the target access rule template.
If the access rule template contains only one of the two keywords, or neither of them, further determine whether both the first keyword and the second keyword have the characteristics indicated by the feature associated words in the access rule template.
If both the first keyword and the second keyword have the characteristics indicated by the feature associated words in the access rule template, determine that the access rule template is the target access rule template.
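With two keywords the same structure applies, except that both keywords must be contained in the template, or both must be described by its feature associated words. The sketch below reuses the hypothetical predicate-based representation of feature associated words; all names are illustrative.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, Optional, Set


@dataclass
class AccessRuleTemplate:
    name: str
    keywords: Set[str]
    features: Dict[str, Callable[[str], bool]] = field(default_factory=dict)

    def has_feature(self, keyword: str) -> bool:
        # True if any feature associated word describes this keyword.
        return any(pred(keyword) for pred in self.features.values())


def find_target_template_two(k1: str, k2: str,
                             templates: Iterable[AccessRuleTemplate]) -> Optional[AccessRuleTemplate]:
    for tpl in templates:
        # Step 1: the template contains both keywords.
        if k1 in tpl.keywords and k2 in tpl.keywords:
            return tpl
        # Step 2: it contains only one or neither, so both keywords must
        # have characteristics indicated by the template's feature words.
        if tpl.has_feature(k1) and tpl.has_feature(k2):
            return tpl
    return None


reports = AccessRuleTemplate(
    "reports", {"finance", "q1", "q2"},
    {"department": lambda k: k in {"finance", "sales"},
     "quarter": lambda k: re.fullmatch(r"q[1-4]", k) is not None})
print(find_target_template_two("sales", "q3", [reports]).name)  # reports
```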
According to the first aspect or any one of the first to fifth implementations of the first aspect, in a sixth implementation of the first aspect, the file access information is information about files from different access directories, and the plurality of prediction files are located in different access directories.
In this implementation, because the file access information from which the first keyword is determined comes from files in different access directories, the prediction files predicted from the first keyword by the foregoing method may also be located in different access directories. Compared with predicting only files in the same access directory, this expands the range of locations from which prediction files can be drawn, so that prefetched prediction files can come from different access directories.
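Cross-directory prediction mainly requires that the target keyword be resolved against the whole namespace rather than only the requesting file's directory. A hypothetical index keyed by lower-cased file stem illustrates this. The keying rule is an assumption, not part of the application.

```python
import posixpath
from collections import defaultdict
from typing import Dict, Iterable, List


def build_keyword_index(paths: Iterable[str]) -> Dict[str, List[str]]:
    """Map a keyword (here: lower-cased file stem) to every path that has it,
    regardless of which directory the file lives in."""
    index: Dict[str, List[str]] = defaultdict(list)
    for path in paths:
        stem = posixpath.splitext(posixpath.basename(path))[0].lower()
        index[stem].append(path)
    return dict(index)


paths = ["/home/alice/ep01.mp4", "/mnt/archive/ep02.mp4", "/backup/EP02.mkv"]
index = build_keyword_index(paths)
print(index["ep02"])  # ['/mnt/archive/ep02.mp4', '/backup/EP02.mkv']
```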
According to the first aspect or any one of the first to fifth implementations of the first aspect, in a seventh implementation of the first aspect, the storage device may further update an access rule template set, which refers to the plurality of access rule templates stored in the storage device.
In this implementation, the access rule template set includes the target access rule template and at least one candidate access rule template. Specifically, the storage device determines a degree of association between the target access rule template and each of the at least one candidate access rule template, where the degree of association indicates how similar the mapping relationship in the target access rule template is to the mapping relationship in the candidate access rule template. The storage device then merges each candidate access rule template whose degree of association is higher than a preset value with the target access rule template, to obtain an updated access rule template set.
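A sketch of the update step follows. It rests on three assumptions: each template's mapping relationship is an offset vector (as in a hypothetical embedding model), the degree of association is the cosine similarity of two offsets, and merging pools the keywords and averages the offsets. The preset value of 0.9 is arbitrary; the application does not give one.

```python
import math
from dataclasses import dataclass
from typing import List, Set


@dataclass
class AccessRuleTemplate:
    keywords: Set[str]
    offset: List[float]  # hypothetical mapping relationship (vector offset)


def association(a: AccessRuleTemplate, b: AccessRuleTemplate) -> float:
    """Degree of association, assumed here to be the cosine similarity of the
    two templates' mapping offsets."""
    dot = sum(x * y for x, y in zip(a.offset, b.offset))
    norm = math.hypot(*a.offset) * math.hypot(*b.offset)
    return dot / norm if norm else 0.0


def update_template_set(target: AccessRuleTemplate,
                        candidates: List[AccessRuleTemplate],
                        preset: float = 0.9) -> List[AccessRuleTemplate]:
    merged_keywords = set(target.keywords)
    offsets = [target.offset]
    kept = []
    for cand in candidates:
        if association(target, cand) > preset:
            # Merge: pool keywords and average the mapping offsets.
            merged_keywords |= cand.keywords
            offsets.append(cand.offset)
        else:
            kept.append(cand)
    mean = [sum(col) / len(offsets) for col in zip(*offsets)]
    return [AccessRuleTemplate(merged_keywords, mean)] + kept


target = AccessRuleTemplate({"jan", "feb"}, [0.0, 1.0])
similar = AccessRuleTemplate({"mar", "apr"}, [0.0, 2.0])
unrelated = AccessRuleTemplate({"ep01"}, [1.0, 0.0])
updated = update_template_set(target, [similar, unrelated])
print(sorted(updated[0].keywords), updated[0].offset, len(updated))
# ['apr', 'feb', 'jan', 'mar'] [0.0, 1.5] 2
```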
In this implementation, the access rule template set in the storage device can be updated based on the target access rule template corresponding to the first keyword; that is, after determining the prediction file based on the first keyword, the storage device updates the access rule template set. After repeated updates, each access rule template in the set can determine prediction files more accurately from the input keywords, so prefetching those prediction files into the cache further increases the hit rate of read requests issued by the host.
According to a second aspect, an embodiment of this application provides a storage device that includes a cache, a hard disk, and at least one processor. The at least one processor is configured to: generate a first keyword according to file access information carried in a read request; generate a target keyword by using the first keyword and a target access rule template, where the target keyword and the first keyword match the same characteristics and the target keyword indicates a prediction file; and prefetch the prediction file indicated by the target keyword from the hard disk into the cache.
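The data movement in the second aspect (hard disk to cache, ahead of the read) can be modelled with a toy device. The application does not specify a cache replacement policy; the fixed-capacity LRU below is an assumption, and `StorageDevice`, `prefetch`, and `read` are hypothetical names.

```python
from collections import OrderedDict
from typing import Dict


class StorageDevice:
    """Toy storage device: a dict stands in for the hard disk and a
    fixed-capacity LRU OrderedDict stands in for the cache (assumed policy)."""

    def __init__(self, disk: Dict[str, bytes], cache_capacity: int) -> None:
        if cache_capacity < 1:
            raise ValueError("cache_capacity must be at least 1")
        self.disk = disk
        self.cache: OrderedDict = OrderedDict()
        self.capacity = cache_capacity
        self.hits = 0
        self.misses = 0

    def _load(self, path: str) -> None:
        # Copy from the hard disk into the cache, evicting the LRU entry if full.
        self.cache[path] = self.disk[path]
        self.cache.move_to_end(path)
        while len(self.cache) > self.capacity:
            self.cache.popitem(last=False)

    def prefetch(self, path: str) -> None:
        # Move a predicted file into the cache before any read asks for it.
        if path in self.disk and path not in self.cache:
            self._load(path)

    def read(self, path: str) -> bytes:
        if path in self.cache:
            self.hits += 1
            self.cache.move_to_end(path)
            return self.cache[path]
        self.misses += 1
        self._load(path)
        return self.disk[path]


dev = StorageDevice({"/v/ep01": b"1", "/v/ep02": b"2"}, cache_capacity=2)
dev.read("/v/ep01")       # miss: first access
dev.prefetch("/v/ep02")   # predicted file, e.g. from the target keyword
dev.read("/v/ep02")       # hit, because it was prefetched
print(dev.hits, dev.misses)  # 1 1
```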
The file access information is information related to the file indicated by the read request, for example, the file name, file type, access time, file creator name, file visitor name, and access directory.
In this embodiment of this application, the target keyword is generated by using the first keyword and the target access rule template, and the prediction file to be prefetched is then determined based on the target keyword. Because the first keyword and the target keyword match the same characteristics, the prediction file indicated by the target keyword can be regarded as the file the user needs to read next. Therefore, even if the file access information from which the first keyword is determined comes from a file accessed for the first time, the storage device can still infer a not-yet-accessed prediction file from the first keyword and the target access rule template and prefetch it into the cache. This helps increase the hit rate with which the host obtains data from the cache of the storage device.
According to the second aspect, in a first implementation of the second aspect, the storage device stores a plurality of access rule templates, and the target access rule template is the one among them that corresponds to the first keyword. Each access rule template is obtained by training a plurality of initial keywords with a training model based on text semantics.
In this implementation, to determine the prediction file more accurately, the storage device selects the access rule template corresponding to the first keyword (that is, the target access rule template) and uses it to generate the target keyword used to look up the prediction file. Selecting the template according to the first keyword improves the accuracy of the prediction file, and thus the probability that a subsequently issued read request hits it. Prefetching the prediction file into the cache therefore helps increase the hit rate with which the host obtains data from the cache of the storage device.
According to the first implementation of the second aspect, in a second implementation of the second aspect, the target access rule template contains the first keyword and/or a first feature associated word, where the first feature associated word indicates a characteristic of the first keyword.
According to the second implementation of the second aspect, in a third implementation of the second aspect, the target access rule template further contains a mapping relationship that indicates how the plurality of initial keywords are associated with one another, where the plurality of initial keywords include the first keyword. The at least one processor is specifically configured to use the first keyword as the input of the mapping relationship, compute on the first keyword with the mapping relationship, and output the target keyword, where the target keyword has the characteristic of the first keyword indicated by the first feature associated word.
In this implementation, the target access rule template further contains a mapping relationship, which is generated when the plurality of initial keywords are trained. The mapping relationship indicates how the initial keywords are associated with one another: once one of the initial keywords and the mapping relationship are known, the storage device can calculate the remaining initial keywords. Therefore, when the first keyword corresponds to the target access rule template, the mapping relationship in that template also applies to the first keyword. The storage device can then use the first keyword as the input of the mapping relationship, compute on it with the mapping relationship, and output the target keyword. This implementation specifies concretely how the target keyword is determined, which improves the reliability of the solution.
According to the second or third implementation of the second aspect, in a fourth implementation of the second aspect, the at least one processor is further configured to: determine whether an access rule template contains the first keyword; if it does, determine that the access rule template is the target access rule template; if it does not, further determine whether the first keyword has the characteristic indicated by a feature associated word in the access rule template; and if so, determine that the access rule template is the target access rule template.
According to the second or third implementation of the second aspect, in a fifth implementation of the second aspect, the file access information carried in the read request is also used to generate a second keyword, and both the second keyword and the first keyword match the target access rule template. The at least one processor is further configured to: determine whether an access rule template contains both the first keyword and the second keyword; if it does, determine that the access rule template is the target access rule template; if it contains only one of the two keywords, or neither of them, further determine whether both keywords have the characteristics indicated by the feature associated words in the access rule template; and if so, determine that the access rule template is the target access rule template.
According to the second aspect or any one of the first to fifth implementations of the second aspect, in a sixth implementation of the second aspect, the file access information is information about files from different access directories, and the plurality of prediction files are located in different access directories.
In this implementation, because the file access information from which the first keyword is determined comes from files in different access directories, the prediction files predicted from the first keyword by the foregoing method may also be located in different access directories. Compared with predicting only files in the same access directory, this expands the range of locations from which prediction files can be drawn, so that prefetched prediction files can come from different access directories.
According to the second aspect or any one of the first to fifth implementations of the second aspect, in a seventh implementation of the second aspect, the at least one processor further includes a first processor, configured to train the plurality of initial keywords with the text-semantics-based training model to obtain the access rule template.
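The application does not disclose the training model itself. One simple possibility keeps the hypothetical embedding assumption used for the mapping relationship: take an ordered run of initial keywords and average the vector step between neighbours, as sketched below. This stands in for, and is much simpler than, a real text-semantics model; `train_template` and the embeddings are illustrative only.

```python
from typing import Dict, List, Sequence, Set, Tuple


def train_template(sequence: Sequence[str],
                   embeddings: Dict[str, Sequence[float]]) -> Tuple[Set[str], List[float]]:
    """Derive a hypothetical access rule template from an ordered run of
    initial keywords: the keyword set plus the mean step between consecutive
    keyword vectors (used as the mapping relationship)."""
    if len(sequence) < 2:
        raise ValueError("need at least two initial keywords to learn a mapping")
    steps = [[b - a for a, b in zip(embeddings[x], embeddings[y])]
             for x, y in zip(sequence, sequence[1:])]
    offset = [sum(column) / len(steps) for column in zip(*steps)]
    return set(sequence), offset


embeddings = {"jan": [1.0, 0.0], "feb": [1.0, 1.0], "mar": [1.0, 2.5]}
keywords, offset = train_template(["jan", "feb", "mar"], embeddings)
print(sorted(keywords), offset)  # ['feb', 'jan', 'mar'] [0.0, 1.25]
```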
According to the seventh implementation of the second aspect, in an eighth implementation of the second aspect, the first processor may further update the access rule template set, which refers to the plurality of access rule templates stored in the storage device.
In this implementation, the access rule template set includes the target access rule template and at least one candidate access rule template. Specifically, the first processor is configured to determine a degree of association between the target access rule template and each of the at least one candidate access rule template, where the degree of association indicates how similar the mapping relationship in the target access rule template is to the mapping relationship in the candidate access rule template. The first processor is further configured to merge each candidate access rule template whose degree of association is higher than a preset value with the target access rule template, to obtain an updated access rule template set.
In this implementation, the access rule template set in the storage device can be updated based on the target access rule template corresponding to the first keyword; that is, after the at least one processor in the storage device determines the prediction file based on the first keyword, the first processor in the storage device updates the access rule template set. After repeated updates, each access rule template in the set can determine prediction files more accurately from the input keywords, so prefetching those prediction files into the cache further increases the hit rate with which the host obtains data from the cache of the storage device.
According to a third aspect, an embodiment of this application provides a prefetching apparatus located in a storage device, where the storage device further includes a cache and a hard disk. The storage device stores computer programs or instructions, and the prefetching apparatus invokes them to implement the following modules: a keyword generation module, configured to generate a first keyword according to file access information carried in a read request; a computing module, configured to generate a target keyword by using the first keyword and a target access rule template, where the target keyword and the first keyword match the same characteristics and the target keyword indicates a prediction file; and a data migration module, configured to prefetch the prediction file indicated by the target keyword from the hard disk into the cache.
The file access information is information related to the file indicated by the read request, for example, the file name, file type, access time, file creator name, file visitor name, and access directory.
In this embodiment of this application, the target keyword is generated by using the first keyword and the target access rule template, and the prediction file to be prefetched is then determined based on the target keyword. Because the first keyword and the target keyword match the same characteristics, the prediction file indicated by the target keyword can be regarded as the file the user needs to read next. Therefore, even if the file access information from which the first keyword is determined comes from a file accessed for the first time, the storage device can still infer a not-yet-accessed prediction file from the first keyword and the target access rule template and prefetch it into the cache. This helps increase the hit rate with which the host obtains data from the cache of the storage device.
According to the third aspect, in a first implementation of the third aspect of the embodiments of this application, the prefetching apparatus stores a plurality of access rule templates, and the target access rule template is the one among them that corresponds to the first keyword. Each access rule template is obtained by training a plurality of initial keywords with a training model based on text semantics.
According to the first implementation of the third aspect, in a second implementation of the third aspect of the embodiments of this application, the target access rule template contains the first keyword and/or a first characteristic-associated word. The first characteristic-associated word indicates the characteristic that the first keyword conforms to.
According to the second implementation of the third aspect, in a third implementation of the third aspect of the embodiments of this application, the target access rule template further contains a mapping relationship. The mapping relationship indicates how the plurality of initial keywords are associated with one another, and the plurality of initial keywords include the first keyword. The calculation module is specifically configured to take the first keyword as the input of the mapping relationship, compute on the first keyword by using the mapping relationship, and output the target keyword. The target keyword conforms to the characteristic of the first keyword that is indicated by the first characteristic-associated word.
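Below is a minimal sketch of an access rule template as described in the second and third implementations. It assumes that the characteristic-associated word can be tested by a predicate and that the mapping relationship can be expressed as a function. The class `AccessRuleTemplate`, its fields, and the "next number" mapping are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet


@dataclass(frozen=True)
class AccessRuleTemplate:
    initial_keywords: FrozenSet[str]   # keywords the template was trained on
    characteristic: str                # characteristic-associated word
    conforms: Callable[[str], bool]    # does a keyword have that characteristic?
    mapping: Callable[[str], str]      # mapping relationship between keywords

    def predict(self, first_keyword: str) -> str:
        if not self.conforms(first_keyword):
            raise ValueError(f"{first_keyword!r} is not a {self.characteristic}")
        target = self.mapping(first_keyword)
        # The target keyword must conform to the same characteristic.
        if not self.conforms(target):
            raise ValueError(f"{target!r} is not a {self.characteristic}")
        return target


def _is_positive_integer(keyword: str) -> bool:
    return keyword.isdigit() and int(keyword) > 0


numbering = AccessRuleTemplate(
    initial_keywords=frozenset({"1", "2", "3"}),
    characteristic="positive integer",
    conforms=_is_positive_integer,
    mapping=lambda k: str(int(k) + 1),  # "next number" relation
)
```

With this template, `numbering.predict("3")` returns "4".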
According to the second or third implementation of the third aspect, in a fourth implementation of the third aspect of the embodiments of this application, the calculation module is further configured to do the following. It first determines whether an access rule template contains the first keyword. If the template does, the calculation module determines that the access rule template is the target access rule template. If it does not, the calculation module further determines whether the first keyword conforms to the characteristic indicated by the characteristic-associated word in that access rule template. If the first keyword conforms, the calculation module determines that the access rule template is the target access rule template.
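The selection logic of the fourth implementation might look like the following sketch. The `Template` type and its `conforms` predicate, which stands in for the characteristic-associated word, are hypothetical. Templates are checked in list order and the first match is taken; the text does not specify this order.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, List, Optional


@dataclass(frozen=True)
class Template:
    name: str
    keywords: FrozenSet[str]          # initial keywords contained in the template
    conforms: Callable[[str], bool]   # test for the characteristic-associated word


def is_target_template(first_keyword: str, template: Template) -> bool:
    # Step 1: does the template contain the first keyword?
    if first_keyword in template.keywords:
        return True
    # Step 2: otherwise, does the keyword conform to the template's characteristic?
    return template.conforms(first_keyword)


def select_target_template(first_keyword: str,
                           templates: List[Template]) -> Optional[Template]:
    for template in templates:
        if is_target_template(first_keyword, template):
            return template
    return None  # no template applies, so nothing is prefetched
```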
According to the second or third implementation of the third aspect, in a fifth implementation of the third aspect of the embodiments of this application, the file access information carried in the read request is further used to generate a second keyword. Both the second keyword and the first keyword match the target access rule template. The calculation module is further configured to do the following. It first determines whether an access rule template contains both the first keyword and the second keyword. If the template does, the calculation module determines that the access rule template is the target access rule template. If the template contains only one of the two keywords or neither of them, the calculation module further determines whether both keywords conform to the characteristic indicated by the characteristic-associated word in that access rule template. If both conform, the calculation module determines that the access rule template is the target access rule template.
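For the fifth implementation, which uses both a first and a second keyword, the per-template test could be sketched as follows. The `Template` type here is again hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet


@dataclass(frozen=True)
class Template:
    keywords: FrozenSet[str]          # initial keywords contained in the template
    conforms: Callable[[str], bool]   # test for the characteristic-associated word


def is_target_template_for_pair(first: str, second: str, template: Template) -> bool:
    # Both keywords are contained: the template is the target.
    if first in template.keywords and second in template.keywords:
        return True
    # Only one or neither is contained: both must conform to the characteristic.
    return template.conforms(first) and template.conforms(second)
```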
According to the third aspect or any one of the first to fifth implementations of the third aspect, in a sixth implementation of the third aspect of the embodiments of this application, the file access information is information about files from different access directories, and the plurality of predicted files are located in different access directories.
According to the third aspect or any one of the first to fifth implementations of the third aspect, in a seventh implementation of the third aspect of the embodiments of this application, the prefetching apparatus further includes a template generation module. The template generation module is configured to train the plurality of initial keywords with the training model based on text semantics to obtain the access rule templates.
According to the seventh implementation of the third aspect, in an eighth implementation of the third aspect of the embodiments of this application, the template generation module is further configured to update an access rule template set. The access rule template set refers to the plurality of access rule templates stored in the storage device, as described above. It includes one target access rule template and at least one candidate access rule template.
The template generation module is specifically configured to determine the degree of association between the target access rule template and each of the at least one candidate access rule template. It then merges every candidate access rule template whose degree of association is higher than a preset value into the target access rule template, to obtain an updated access rule template set.
The degree of association indicates how similar the mapping relationship in the target access rule template is to the mapping relationship in the candidate access rule template.
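The embodiments do not define how the degree of association is computed. The sketch below assumes that each mapping relationship can be represented as a set of (input keyword, output keyword) pairs, and it uses the Jaccard similarity of those sets as the degree of association. The names `RuleTemplate`, `association`, and `update_template_set` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple

Pair = Tuple[str, str]  # (input keyword, output keyword)


@dataclass
class RuleTemplate:
    keywords: Set[str]
    mapping: Set[Pair] = field(default_factory=set)


def association(a: RuleTemplate, b: RuleTemplate) -> float:
    """Jaccard similarity of the two mapping relationships (an assumed metric)."""
    union = a.mapping | b.mapping
    return len(a.mapping & b.mapping) / len(union) if union else 0.0


def update_template_set(target: RuleTemplate, candidates: List[RuleTemplate],
                        preset: float) -> List[RuleTemplate]:
    # Score every candidate against the target before any merge changes it.
    scores = [association(target, c) for c in candidates]
    kept = []
    for candidate, score in zip(candidates, scores):
        if score > preset:
            # Merge the candidate's keywords and mapping into the target.
            target.keywords |= candidate.keywords
            target.mapping |= candidate.mapping
        else:
            kept.append(candidate)
    return [target] + kept
```

Scoring all candidates up front keeps the result independent of the order in which candidates are merged.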
In a fourth aspect, an embodiment of this application provides a smart chip. The smart chip is located in the storage device in the foregoing implementations and is configured to train on input sample data to output a prediction model. For example, the smart chip may be located in the first processor in the seventh implementation of the third aspect of the embodiments of this application. In that case it may train the plurality of initial keywords with the training model based on text semantics to obtain the access rule templates. For another example, the smart chip may be located in the template generation module in the seventh implementation of the third aspect of the embodiments of this application. In that case it may likewise train the plurality of initial keywords with the training model based on text semantics to obtain the access rule templates. The smart chip may also update the access rule template set.
As can be seen from the foregoing technical solutions, the embodiments of this application have the following advantages:
In the embodiments of this application, the target keyword is generated from the first keyword and the target access rule template, and the predicted file to be prefetched is then determined from the target keyword. Because the first keyword and the target keyword conform to the same characteristic, the predicted file indicated by the target keyword can be regarded as the file the user will need to read next. Therefore, even if the file access information from which the first keyword is generated comes from a file accessed for the first time, the storage device can still use the first keyword and the target access rule template to infer a predicted file that has never been accessed, and prefetch that file into the cache. This helps improve the hit rate when the host reads data from the cache of the storage device.
Description of Drawings
To describe the technical solutions in the embodiments of this application more clearly, the following briefly introduces the accompanying drawings required for describing the embodiments. Apparently, the accompanying drawings in the following description show merely some embodiments of this application.
FIG. 1A is a diagram of a system architecture to which the file prefetching method in an embodiment of this application applies;
FIG. 1B is a flowchart of the file prefetching method in an embodiment of this application;
FIG. 2A is a schematic diagram of an example of file access information in an embodiment of this application;
FIG. 2B is a schematic diagram of another example of file access information in an embodiment of this application;
FIG. 3 is another flowchart of the file prefetching method in an embodiment of this application;
FIG. 4 is a schematic diagram of an embodiment of a storage device in an embodiment of this application;
FIG. 5 is a schematic diagram of an embodiment of a prefetching apparatus in an embodiment of this application.
Description of Embodiments
The following clearly and completely describes the technical solutions in the embodiments of this application with reference to the accompanying drawings in the embodiments of this application. Apparently, the described embodiments are merely some rather than all of the embodiments of this application.
In the specification, claims, and accompanying drawings of this application, the terms "first", "second", "third", "fourth", and so on (if any) are intended to distinguish between similar objects. They do not necessarily describe a specific order or sequence. It should be understood that data termed in this way are interchangeable in appropriate circumstances, so that the embodiments described herein can be implemented in orders other than those illustrated or described herein. In addition, the terms "include" and "have" and any variants thereof are intended to cover a non-exclusive inclusion. For example, a process, method, system, product, or device that includes a series of steps or units is not necessarily limited to the steps or units expressly listed. It may include other steps or units that are not expressly listed or that are inherent to such a process, method, product, or device.
The following first describes the system architecture and application scenarios to which the file prefetching method provided in the embodiments of this application applies.
FIG. 1A is a diagram of the system architecture provided in an embodiment of this application. The storage system in this embodiment includes a host 01, a controller 00, and a plurality of hard disks 02. The host 01 and the controller 00 communicate by using the Network File System (NFS) / Common Internet File System (CIFS) protocol or the Fibre Channel (FC) protocol. The controller 00 includes a processor 001 and a cache 002. Specifically, the host 01 may send a data write request (write request for short) to the controller 00. After receiving the write request, the controller 00 writes the data carried in it to the hard disks 02. The host 01 may also send a data read request (read request for short) to the controller 00. After receiving the read request, the controller 00 uses the address in the read request to check whether the data to be read is stored in its cache 002. If it is, the controller 00 sends the data (that is, the predicted file described later) directly to the host 01. If it is not, the controller 00 obtains the data from the hard disks 02 and sends it to the host 01. In practice, the controller 00 and the hard disks 02 may be integrated into the storage device provided in the embodiments of this application. Alternatively, they may be two mutually independent devices that together form that storage device.
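The read path described above reduces to a cache lookup with a fallback to the hard disk. In this sketch, dictionaries keyed by address stand in for cache 002 and hard disks 02; this representation is an assumption. The text does not say whether a miss also populates the cache, so the sketch leaves the cache unchanged on a miss.

```python
from typing import Dict, Tuple


def handle_read(address: str, cache: Dict[str, bytes],
                disk: Dict[str, bytes]) -> Tuple[bytes, bool]:
    """Return (data, hit) for a read request, following the read path of FIG. 1A."""
    if address in cache:
        return cache[address], True   # hit: send the data straight from the cache
    return disk[address], False       # miss: fetch the data from the hard disk
```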
In the embodiments of this application, the processor 001 is configured with a new function, or one or more AI chips 003 are added to the controller 00. This allows the processor 001 to make predictions for files that the host 01 accesses for the first time in the file prefetching method. Files that have never been accessed can therefore be prefetched, which helps the host 01 achieve a higher read-request hit rate when it accesses new files.
Based on the foregoing system architecture and application scenarios, the following describes the main procedure of the file prefetching method. As shown in FIG. 1B, the storage device performs the following steps:
101. Generate a first keyword based on the file access information carried in a read request.
In this embodiment, when a host (for example, host 01 in FIG. 1A) needs to access a file in the storage medium, the host issues a read request. The read request indicates the file to be accessed and carries the file access information of that file, that is, information related to the file to be accessed. Optionally, the file access information may include file attribute information and access attribute information. File attribute information relates to the file itself and is not affected by whether the file is accessed. Examples are the file name, file type, file creator name, and access directory. Access attribute information relates to the current access operation, so different accesses to the same file produce different access attribute information. Examples are the access time and the file visitor name. For ease of understanding, FIG. 2A is used as an example. FIG. 2A shows a folder containing four files. Take the file in the lower-left corner:
- its file name is "2020 Grade 5 Chinese Mock Exam 3";
- its file type is text (that is, the Word type, also called a Word document);
- its file creator is "Teacher Zhang";
- its access directory is "C:\Final Mock Exams\Chinese". The access directory may also be called the storage path or storage address, and it indicates the folder in which the accessed file is stored.

Information such as the access time and the file visitor name is not shown in FIG. 2A; it may be carried implicitly in the read request.
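The split between file attribute information and access attribute information can be modeled with two record types, as in the sketch below. The type and field names, the access times, and the visitor names are hypothetical. Only the file attributes come from the FIG. 2A example.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class FileAttributes:
    """Properties of the file itself; unaffected by whether it is accessed."""
    file_name: str
    file_type: str
    creator: str
    access_directory: str


@dataclass(frozen=True)
class AccessAttributes:
    """Properties of one access operation; differ between accesses to the same file."""
    access_time: datetime
    visitor: str


@dataclass(frozen=True)
class FileAccessInfo:
    file: FileAttributes
    access: AccessAttributes


exam = FileAttributes("2020 Grade 5 Chinese Mock Exam 3", "word",
                      "Teacher Zhang", r"C:\Final Mock Exams\Chinese")
first = FileAccessInfo(exam, AccessAttributes(datetime(2020, 6, 1, 9, 0), "student_a"))
second = FileAccessInfo(exam, AccessAttributes(datetime(2020, 6, 2, 9, 0), "student_b"))
```

Two accesses to the same file then share the same `FileAttributes` value but carry different `AccessAttributes`.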
Optionally, the file access information may be information about files from different access directories. That is, the read request may carry file access information for several files located in different access directories. Taking FIG. 2A and FIG. 2B as an example, the read request instructs the storage device to read two files. The first is the file named "2020 Grade 5 Chinese Mock Exam 3" in the access directory "C:\Final Mock Exams\Chinese" (as shown in FIG. 2A). The second is the file named "2020 Grade 5 Chinese Mock Exam Answers 3" in the access directory "C:\Final Mock Exam Answers\Chinese" (as shown in FIG. 2B).
The processor in the storage device (for example, processor 001 in FIG. 1A) then obtains the file access information carried in the read request and generates the first keyword from it. The first keyword is a word that the storage device forms by splitting or combining part of the file access information. Optionally, the first keyword may be a single word or a phrase; this is not limited here. The first keyword can take several forms:
- It may be obtained by splitting the file name. For example, the first keyword of the file named "2020 Grade 5 Chinese Mock Exam 3" may be "2020", "Grade 5", "Chinese mock exam", or "3".
- It may be obtained by splitting the file name and then recombining the parts, for example, "2020-5-3".
- It may be a phrase composed of different items of file access information, for example, {"Grade 5 Chinese", "Teacher Zhang", "word"}.
- It may be a phrase composed of the access directory, for example, {"final mock exam", "Chinese"}.

In practice, the first keyword can be adjusted to specific needs, and the embodiments of this application do not limit its specific form.
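Two of the keyword forms above can be sketched as follows: a phrase built from the access directory, and numbers split out of the file name and recombined. The splitting rules are assumptions for illustration, and they operate on a Windows-style path and on the English rendering of the example names. An implementation working on Chinese file names would likely need word segmentation instead.

```python
import re
from typing import Tuple


def keyword_from_path(path: str) -> Tuple[str, ...]:
    """Phrase keyword: directory names plus the number that ends the file name."""
    parts = [p for p in path.split("\\") if p and not p.endswith(":")]  # drop drive
    *directories, file_name = parts
    match = re.search(r"(\d+)\s*$", file_name)
    return tuple(directories + ([match.group(1)] if match else []))


def combined_keyword(file_name: str) -> str:
    """Split the file name and recombine its numeric parts, e.g. "2020-5-3"."""
    return "-".join(re.findall(r"\d+", file_name))
```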
102. Generate a target keyword by using the first keyword and a target access rule template.
In this embodiment, after determining the first keyword, the storage device may generate the target keyword based on the first keyword and the target access rule template. "The target keyword and the first keyword conform to the same characteristic" can be understood in two ways. First, the thing indicated by the target keyword has the same characteristic as the thing indicated by the first keyword, and the target keyword is readily associated with the first keyword and the characteristic it conforms to. Second, the target keyword has the same part of speech as the first keyword, and the two are similar or related in meaning. For example:
- The first keyword and the target keyword may both be numerals, such as "3" and "4", or "Section 1" and "Section 2".
- The first keyword and the target keyword may be synonyms or antonyms of each other, such as "Volume 1" and "Volume 2", or "questions" and "answers".

The target keyword indicates the predicted file; that is, the target keyword is contained in the file attribute information of the predicted file. For example, the target keyword may be the file name, file type, or file creator name of the predicted file; this is not limited here.
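The numeral and paired-term examples above could be captured by rules like those in this sketch. The rule table `RELATED_TERMS`, the regular expression, and the "increment by one" rule are hypothetical stand-ins for what a trained access rule template would encode.

```python
import re
from typing import Optional

# Hypothetical table of related-term pairs (synonym/antonym-style relations).
RELATED_TERMS = {"Volume 1": "Volume 2", "questions": "answers"}
# Numerals, optionally prefixed by "Section ", e.g. "3" or "Section 1".
NUMERAL = re.compile(r"^(Section )?(\d+)$")


def target_keyword(first_keyword: str) -> Optional[str]:
    if first_keyword in RELATED_TERMS:
        return RELATED_TERMS[first_keyword]
    match = NUMERAL.match(first_keyword)
    if match:
        prefix = match.group(1) or ""
        return f"{prefix}{int(match.group(2)) + 1}"
    return None  # no rule covers this keyword
```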
In addition, the target access rule template is the access rule template, among the plurality of access rule templates, that corresponds to the first keyword. The target access rule template takes the first keyword as input and computes on it according to a certain pattern to output the target keyword. The target access rule template can therefore also be regarded as an integrated computation model: given the first keyword as input, it outputs the target keyword.
103. Prefetch the predicted file indicated by the target keyword into the cache.
In this embodiment, after determining the target keyword, the storage device may use the target keyword to find the predicted file on the hard disk (for example, hard disk 02 in FIG. 1A). It then prefetches the predicted file into the cache (for example, cache 002 in FIG. 1A). Optionally, the prefetched file may be a single file or a group of related files; this is not limited here. For ease of understanding, FIG. 2A and FIG. 2B are again used as an example. Assume that the file to be accessed by the read request is named "2020 Grade 5 Chinese Mock Exam 3" and that the first keyword is the phrase {"final mock exam", "Chinese", "3"}. Suppose the target access rule template predicts the target keyword to be the phrase {"final mock exam answers", "Chinese", "3"}. The storage device can then determine that the predicted file is named "2020 Grade 5 Chinese Mock Exam Answers 3", and it reads that file from the hard disk into the cache.
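The FIG. 2A/2B example can be traced end to end with the sketch below. Three details are assumptions: the template mapping, the representation of the disk as a dictionary from file name to (keywords, data), and the exact-match lookup.

```python
from typing import Dict, Optional, Tuple

Phrase = Tuple[str, ...]

# Hypothetical mapping learned by the target access rule template.
TEMPLATE_MAPPING = {"final mock exam": "final mock exam answers"}


def apply_template(first: Phrase) -> Phrase:
    # Map each term; terms the template does not cover are carried over unchanged.
    return tuple(TEMPLATE_MAPPING.get(term, term) for term in first)


def prefetch(first: Phrase, disk: Dict[str, Tuple[Phrase, bytes]],
             cache: Dict[str, bytes]) -> Optional[str]:
    target = apply_template(first)
    for name, (keywords, data) in disk.items():
        if keywords == target:      # the file the target keyword indicates
            cache[name] = data      # prefetch it from disk into the cache
            return name
    return None
```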
In this embodiment, the target keyword is generated from the first keyword and the target access rule template, and the predicted file to be prefetched is then determined from the target keyword. Because the first keyword and the target keyword conform to the same characteristic, the predicted file indicated by the target keyword can be regarded as the file the user will need to read next. Therefore, even if the file access information from which the first keyword is generated comes from a file accessed for the first time, the storage device can still use the first keyword and the target access rule template to infer a predicted file that has never been accessed, and prefetch that file into the cache. This helps improve the hit rate when the host reads data from the cache of the storage device.
Building on the foregoing embodiment, the file prefetching method is further described below. As shown in FIG. 3, the storage device performs the following steps:
301a. Train a plurality of initial keywords with a training model based on text semantics to obtain access rule templates.
In this embodiment, the storage device may train the access rule templates in advance, so that when a read request arrives, the storage device can generate the first keyword from the file access information carried in the read request. Specifically, the storage device may train a plurality of initial keywords with a training model based on text semantics to obtain an access rule template. The access rule template is generally stored in the cache of the storage device (for example, cache 002 in FIG. 1A). Optionally, the storage device may also back up the access rule template to the hard disk (for example, hard disk 02 in FIG. 1A) so that it can be loaded again whenever needed.
A training model based on text semantics is a model that can recognize text semantics and perform association training based on them. In practice, many such models exist, for example:
- A clustering model based on text semantics, which classifies keywords by meaning.
- A text semantic training model based on a neural network. Such a model may provide semantic textual similarity, which measures the degree of equivalence in the underlying semantics of paired text segments (for example, the first keyword). It may also provide paraphrase identification (PI), which determines whether two words (for example, the first keyword and the target keyword) express the same meaning, and natural language inference, which analyzes the semantic similarity between a hypothesis and a premise.
- A naive Bayes model based on text semantics, which can be trained on the probabilities with which multiple keywords occur in the inference process.
The initial keywords are similar to the first keyword in the foregoing embodiment and can be generated in the same way as the first keyword. For example, they can be generated from the related information carried in one or more historical read requests; for details, see step 101. The initial keywords may also be preset by the user according to actual needs. An initial keyword may be a word or a phrase, and it may be a noun, an adjective, a numeral, or the like; this is not limited here.
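The embodiments name semantic clustering, neural semantic models, and naive Bayes as candidate training models. The sketch below uses a much simpler stand-in that is plainly not a semantic model. It learns a mapping from a history of initial keywords by counting which keyword most often follows each one, which amounts to a maximum-likelihood estimate of P(next | current). The function name and the history are hypothetical.

```python
from collections import Counter, defaultdict
from typing import DefaultDict, Dict, List


def learn_mapping(history: List[str]) -> Dict[str, str]:
    """Map each keyword to the keyword that most often followed it in history."""
    follows: DefaultDict[str, Counter] = defaultdict(Counter)
    for current, nxt in zip(history, history[1:]):
        follows[current][nxt] += 1
    # most_common(1) picks the highest estimate of P(next | current).
    return {kw: counts.most_common(1)[0][0] for kw, counts in follows.items()}
```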
It should be understood that the storage device may perform step 301a multiple times to generate a plurality of access rule templates. Together, these templates form a set that the embodiments of this application call the access rule template set. The storage device may also update the access rule templates in the set at any time. Optionally, the access rule template set may be stored directly in the storage device as internal data. In this case, different storage devices may hold different access rule template sets, and the same access rule template may differ at different times. Optionally, the access rule template set may be integrated into a chip that the storage device invokes; for example, the storage device may integrate the set into an AI chip. In this case, the AI chip (for example, AI chip 003 in FIG. 1A) may be located inside the controller of the storage device (for example, controller 00 in FIG. 1A) but outside the processor. The processor connects to and invokes the AI chip through a specific interface.
301b. Generate a first keyword based on the file access information carried in the read request.
The file access information is information related to the file indicated by the read request. It has been described in detail in step 101 and is not described again here.
Optionally, the storage device may also generate a second keyword based on the file access information carried in the read request. The second keyword differs from the first keyword, and the storage device uses both keywords in the subsequent prediction steps. The second keyword may also be a keyword from another file.
It should be understood that step 301a and step 301b are independent of each other. That is, while the storage device is generating the first keyword from the file access information, it is also training access rule templates with initial keywords to generate the access rule template set. Therefore, there is no fixed order between step 301a and step 301b; the storage device can be understood as performing them concurrently.
302. Use the first keyword to determine a target access rule template in the access rule template set.
In this embodiment, after determining the first keyword and the access rule template set, the storage device searches the access rule template set for the target access rule template corresponding to the first keyword.
In this embodiment, a feature that every one of the multiple initial keywords in an access rule template satisfies is described in words as a feature-associated word. A feature-associated word may be a common attribute of those initial keywords. For example, if an access rule template includes the three initial keywords "3", "5", and "7", the feature-associated words of these initial keywords may be "odd number", "positive integer", "prime number", and so on.
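As a rough illustration of what a feature-associated word captures, the following Python sketch checks a small, hand-written vocabulary of candidate features against the initial keywords and keeps those that every keyword satisfies. The vocabulary and the function names are hypothetical; the application obtains templates with a text-semantics-based training model, which this sketch does not attempt to reproduce.

```python
from typing import Callable, Dict, List


def is_prime(n: int) -> bool:
    """Trial-division primality test; adequate for small example keywords."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


# Hypothetical vocabulary of candidate feature words, each with a predicate.
CANDIDATE_FEATURES: Dict[str, Callable[[int], bool]] = {
    "odd number": lambda n: n % 2 == 1,
    "even number": lambda n: n % 2 == 0,
    "positive integer": lambda n: n > 0,
    "prime number": is_prime,
}


def feature_associated_words(initial_keywords: List[str]) -> List[str]:
    """Return the candidate features that every initial keyword satisfies."""
    values = [int(k) for k in initial_keywords]
    return [
        word for word, pred in CANDIDATE_FEATURES.items()
        if all(pred(v) for v in values)
    ]


print(feature_associated_words(["3", "5", "7"]))
# ['odd number', 'positive integer', 'prime number']
```

"Even number" is rejected because none of the keywords is even, which is the kind of filtering the text describes.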
Because the target access rule template is the access rule template corresponding to the first keyword, the first keyword should satisfy the features indicated by the feature-associated words in that template. Therefore, the target access rule template in this embodiment includes the first keyword and/or a first feature-associated word, where the first feature-associated word indicates a feature that the first keyword satisfies. That is, the target access rule template may include only the first keyword, only the first feature-associated word, or both. This is not limited here.
Specifically, the storage device may search the access rule template set for the target access rule template based on the first keyword as follows:
The storage device traverses each access rule template in the access rule template set. During the traversal, the storage device determines whether the access rule template contains the first keyword. If it does, the storage device determines that this access rule template is the target access rule template, skips the remaining checks, and proceeds directly to step 303. If it does not, the storage device further determines whether the first keyword satisfies the features indicated by the feature-associated words in the access rule template. If it does, the storage device determines that this access rule template is the target access rule template and then performs step 303.
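The single-keyword lookup described above can be sketched as follows. This is a minimal sketch, assuming a hypothetical `AccessRuleTemplate` structure in which each feature-associated word is backed by a predicate, and assuming that a keyword "satisfies the features" of a template only if it passes every predicate; the source does not specify either detail.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Set


@dataclass
class AccessRuleTemplate:
    """Hypothetical template: explicit initial keywords plus feature-associated
    words, each backed by a predicate that tests a keyword for that feature."""
    name: str
    keywords: Set[str]
    features: Dict[str, Callable[[str], bool]] = field(default_factory=dict)


def find_target_template(
    first_keyword: str, templates: List[AccessRuleTemplate]
) -> Optional[AccessRuleTemplate]:
    """Step 302: return the first template that matches the first keyword."""
    for template in templates:
        # Check 1: the template contains the first keyword itself.
        if first_keyword in template.keywords:
            return template  # skip the feature check and go to step 303
        # Check 2: the keyword satisfies every feature-associated word.
        if template.features and all(
            pred(first_keyword) for pred in template.features.values()
        ):
            return template
    return None  # no template matches, so nothing is prefetched


odd_template = AccessRuleTemplate(
    name="odd numbers",
    keywords={"3", "5", "7"},
    features={
        "odd number": lambda k: k.isdigit() and int(k) % 2 == 1,
        "positive integer": lambda k: k.isdigit() and int(k) > 0,
    },
)
paper_template = AccessRuleTemplate(name="exam papers", keywords={"paper-vol1", "paper-vol2"})
templates = [paper_template, odd_template]

print(find_target_template("7", templates).name)   # odd numbers (explicit keyword)
print(find_target_template("11", templates).name)  # odd numbers (features)
print(find_target_template("8", templates))        # None
```

The "11" case shows why the feature check matters: a keyword that never appeared among the initial keywords can still select a template.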
Optionally, the storage device may generate two groups of keywords, for example a first keyword and a second keyword, from the file access information carried in the read request. In this case, the storage device needs to search the access rule template set for a target access rule template that matches both the first keyword and the second keyword.
Specifically, the storage device may search the access rule template set for the target access rule template based on the first keyword and the second keyword as follows:
The storage device traverses each access rule template in the access rule template set. During the traversal, the storage device determines whether the access rule template contains both the first keyword and the second keyword. If it does, the storage device determines that this access rule template is the target access rule template, skips the remaining checks, and proceeds directly to step 303. If the access rule template contains only one of the two keywords or neither of them, the storage device further determines whether both the first keyword and the second keyword satisfy the features indicated by the feature-associated words in the access rule template. If both do, the storage device determines that this access rule template is the target access rule template and then performs step 303.
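The two-keyword variant differs only in that each condition must hold for every keyword. The sketch below, again assuming a hypothetical predicate-backed template structure, accepts any number of keywords, so the two-keyword case is simply a list of length two.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Sequence, Set


@dataclass
class AccessRuleTemplate:
    """Hypothetical template: initial keywords plus predicate-backed features."""
    name: str
    keywords: Set[str]
    features: Dict[str, Callable[[str], bool]] = field(default_factory=dict)


def satisfies_features(keyword: str, template: AccessRuleTemplate) -> bool:
    """True if the keyword passes every feature predicate of the template."""
    return bool(template.features) and all(
        pred(keyword) for pred in template.features.values()
    )


def find_target_template_multi(
    keywords: Sequence[str], templates: List[AccessRuleTemplate]
) -> Optional[AccessRuleTemplate]:
    """Return the first template that matches all keywords (e.g. first and second)."""
    for template in templates:
        # Check 1: the template contains every keyword.
        if all(k in template.keywords for k in keywords):
            return template
        # Check 2: it contains only some or none, so every keyword must satisfy the features.
        if all(satisfies_features(k, template) for k in keywords):
            return template
    return None


odd_template = AccessRuleTemplate(
    name="odd numbers",
    keywords={"3", "5", "7"},
    features={"odd number": lambda k: k.isdigit() and int(k) % 2 == 1},
)

print(find_target_template_multi(["3", "5"], [odd_template]).name)  # odd numbers
print(find_target_template_multi(["7", "9"], [odd_template]).name)  # odd numbers
print(find_target_template_multi(["7", "8"], [odd_template]))       # None
```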
In addition, the target access rule template further includes a mapping relationship, which indicates how the multiple initial keywords are associated with one another. When one of the initial keywords is used as the input to the mapping relationship, another keyword is output, and that output keyword is also one of the initial keywords. The association may take the form of a function or a mapping table. For example, if the initial keywords are "3", "5", and "7", the mapping relationship may be the function "y = x + 2". As another example, if the initial keywords are "Chinese exam paper, Volume 1", "Chinese exam paper, Volume 2", and "Chinese exam paper, Volume 3", the mapping relationship may be a mapping table that pairs each exam paper with its answer key.
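A mapping relationship can be modelled as a function from one keyword to another, whether it is defined by a formula or by a table. The sketch below, with hypothetical helper names, wraps both forms behind the same callable type and checks that, for an ordered list of initial keywords, each keyword maps to the next one, as in the "3", "5", "7" example with y = x + 2. Treating the initial keywords as an ordered chain is an assumption made for illustration.

```python
from typing import Callable, Dict, List, Optional

# A mapping relationship takes one keyword and returns the associated keyword,
# or None when the keyword is outside the mapping's domain.
Mapping = Callable[[str], Optional[str]]


def mapping_from_function(f: Callable[[int], int]) -> Mapping:
    """Wrap a numeric function such as y = x + 2 as a keyword mapping."""
    def apply(keyword: str) -> Optional[str]:
        try:
            value = int(keyword)
        except ValueError:
            return None  # not a number, so the function does not apply
        return str(f(value))
    return apply


def mapping_from_table(table: Dict[str, str]) -> Mapping:
    """Wrap a lookup table; keywords missing from the table map to None."""
    return table.get


def links_initial_keywords(mapping: Mapping, initial_keywords: List[str]) -> bool:
    """Check that each initial keyword (except the last) maps to the next one."""
    return all(
        mapping(current) == following
        for current, following in zip(initial_keywords, initial_keywords[1:])
    )


plus_two = mapping_from_function(lambda x: x + 2)
print(links_initial_keywords(plus_two, ["3", "5", "7"]))  # True
print(links_initial_keywords(plus_two, ["3", "6"]))       # False
```

Returning `None` for keywords outside the domain lets a caller treat "this template does not apply" uniformly for both forms.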
303. Use the first keyword as the input to the mapping relationship, compute on the first keyword using the mapping relationship, and output the target keyword.
The target keyword and the first keyword satisfy the same features; specifically, the target keyword satisfies the feature of the first keyword indicated by the first feature-associated word. The target keyword indicates the predicted file. The first keyword and the target keyword have been described in detail in step 102 and are not described again here.
In this embodiment, because the target access rule template corresponds to the first keyword, the multiple initial keywords may include the first keyword, or the first keyword may otherwise be a valid input to the mapping relationship. The storage device can therefore use the first keyword as the input to the mapping relationship, compute on it, and output the target keyword. For example, if the first keyword is "7" and the mapping relationship is the function "y = x + 2", the target keyword "9" is output. As another example, if the initial keywords are "Chinese exam paper, Volume 1", "Chinese exam paper, Volume 2", and "Chinese exam paper, Volume 3", and the mapping relationship is a table pairing each exam paper with its answer key, then inputting "Chinese exam paper, Volume 1" outputs "Chinese exam paper answer key, Volume 1". Many other examples exist in practical applications and are not enumerated here.
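Step 303 can be sketched as a single call through the mapping relationship. The source states that the target keyword satisfies the same features as the first keyword; this hypothetical sketch enforces that as a sanity check by discarding any output that fails the template's feature predicates. Both examples from the text are reproduced, with the feature predicates invented for illustration.

```python
from typing import Callable, Dict, Optional


def compute_target_keyword(
    first_keyword: str,
    mapping: Callable[[str], Optional[str]],
    features: Dict[str, Callable[[str], bool]],
) -> Optional[str]:
    """Step 303 sketch: pass the first keyword through the mapping relationship
    and keep the output only if it has the template's features."""
    target = mapping(first_keyword)
    if target is None:
        return None  # the first keyword is outside the mapping's domain
    if not all(pred(target) for pred in features.values()):
        return None  # output does not share the first keyword's features
    return target


# Function-style mapping: y = x + 2 over odd numbers.
print(compute_target_keyword(
    "7",
    lambda k: str(int(k) + 2),
    {"odd number": lambda k: int(k) % 2 == 1},
))  # 9

# Table-style mapping: exam paper -> answer key.
answer_table = {"Chinese exam paper, Volume 1": "Chinese exam paper answer key, Volume 1"}
print(compute_target_keyword(
    "Chinese exam paper, Volume 1",
    answer_table.get,
    {"Chinese exam material": lambda k: k.startswith("Chinese exam paper")},
))  # Chinese exam paper answer key, Volume 1
```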
304. Prefetch the predicted file indicated by the target keyword into the cache.
In this embodiment, step 304 is similar to step 103. For details, refer to step 103; they are not repeated here.
In this embodiment, the first keyword is used as the input to the mapping relationship in the target access rule template to output the target keyword, and the predicted file to be prefetched is then determined based on the target keyword. Because the first keyword and the target keyword satisfy the same features, the predicted file indicated by the target keyword can be regarded as the file the user will need to read next. Therefore, even if the file access information from which the first keyword is generated comes from a file accessed for the first time, the storage device can still use the first keyword and the target access rule template to infer a predicted file that has never been accessed and prefetch it into the cache. This helps improve the hit rate when the host reads data from the cache of the storage device.
Optionally, as described in step 301a, the storage device continuously updates and maintains the access rule template set. When the target access rule template corresponding to the first keyword is found, the storage device updates and maintains the access rule template set by combining the target access rule template with the other access rule templates in the set. For ease of description, the access rule template set is said to include the target access rule template and at least one candidate access rule template. Specifically, the storage device determines a degree of association between the target access rule template and each candidate access rule template, where the degree of association indicates how similar the mapping relationship in the target access rule template is to the mapping relationship in the candidate access rule template. The storage device then merges each candidate access rule template whose degree of association is higher than a preset value into the target access rule template, obtaining an updated access rule template set.
This implementation proposes that the access rule template set in the storage device may be updated based on the target access rule template corresponding to the first keyword; that is, after determining the predicted file based on the first keyword, the storage device then updates the access rule template set. After many such updates, each access rule template in the set can determine predicted files more accurately from the input keywords, so that prefetching these predicted files into the cache improves the hit rate of read requests issued by the host.
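The template-set update can be sketched as below. The application does not say how the degree of association between two mapping relationships is computed, so this sketch assumes a simple measure: the fraction of keywords from either template on which the two mappings produce the same output. It also assumes that merging means absorbing the candidate's initial keywords while keeping the target's mapping. The class and function names, the measure, and the threshold are all illustrative.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Set


@dataclass
class Template:
    """Hypothetical, stripped-down access rule template."""
    name: str
    keywords: Set[str]
    mapping: Callable[[str], Optional[str]]


def association_degree(a: Template, b: Template) -> float:
    """Assumed measure: share of the two templates' keywords on which
    both mapping relationships give the same output."""
    probes = sorted(a.keywords | b.keywords)
    if not probes:
        return 0.0
    same = sum(1 for k in probes if a.mapping(k) == b.mapping(k))
    return same / len(probes)


def update_template_set(
    target: Template, candidates: List[Template], threshold: float
) -> List[Template]:
    """Merge every candidate whose degree of association with the target is
    higher than the threshold into the target; keep the rest unchanged."""
    # Score all candidates against the original target so the result does
    # not depend on the order in which candidates are merged.
    degrees = [association_degree(target, c) for c in candidates]
    merged_keywords = set(target.keywords)
    kept = []
    for candidate, degree in zip(candidates, degrees):
        if degree > threshold:
            merged_keywords |= candidate.keywords
        else:
            kept.append(candidate)
    merged = Template(target.name, merged_keywords, target.mapping)
    return [merged] + kept


def plus_two(k: str) -> Optional[str]:
    return str(int(k) + 2)


def times_two(k: str) -> Optional[str]:
    return str(int(k) * 2)


target = Template("odd numbers", {"3", "5", "7"}, plus_two)
similar = Template("more odd numbers", {"11", "13"}, plus_two)
different = Template("doubling", {"2", "4"}, times_two)

updated = update_template_set(target, [similar, different], threshold=0.8)
print([t.name for t in updated])              # ['odd numbers', 'doubling']
print(sorted(updated[0].keywords, key=int))   # ['3', '5', '7', '11', '13']
```

Here the "doubling" template agrees with y = x + 2 only on the keyword "2" (both give "4"), so its degree of association is 1/5 and it is kept separate.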
The structure of the storage device in the foregoing embodiments is described below.
FIG. 4 is a schematic structural diagram of a storage device 40 provided in this embodiment. The storage device in the method embodiments corresponding to FIG. 1B and FIG. 3 may be based on the structure shown in FIG. 4.
The storage device 40 includes at least one processor 401 and at least one storage medium 402.
The processor 401 may be a general-purpose central processing unit (CPU) or a microprocessor, and may be a single-core processor (single-CPU) or a multi-core processor (multi-CPU). The processor 401 may also refer to one or more devices, circuits, and/or processing cores for processing data (for example, computer program instructions). In this embodiment, the processor 401 may be configured to receive a read request issued by the host and to generate keywords (for example, the first keyword and the second keyword) based on the file access information carried in the read request. The processor 401 is further configured to generate a target keyword using the first keyword and the target access rule template, and to prefetch the predicted file based on the target keyword. The processor 401 may also perform the other steps in the embodiments corresponding to FIG. 1B and FIG. 3.
In particular, when performing step 301a in the embodiment corresponding to FIG. 3, the storage device 40 may use any one of the following implementations:
Optionally, the processor 401 is further configured to perform training based on multiple initial keywords to generate access rule templates, and to update the access rule template set based on the target access rule template.
Optionally, the at least one processor 401 includes one or more first processors 4011, which are configured to perform training based on multiple initial keywords to generate access rule templates and to update the access rule template set based on the target access rule template. The first processor 4011 may be a functional module within the processor 401 or an independent functional chip, and is used to maintain the access rule template set. The first processor 4011 may also be an AI chip with specific computing functions.
Optionally, the storage device 40 further includes an AI chip 403, located outside the processor 401, which implements the functions of the first processor 4011; that is, the AI chip 403 performs training based on multiple initial keywords to generate access rule templates and updates the access rule template set based on the target access rule template. In this case, a chip interface is provided between the processor 401 and the AI chip 403, through which the processor 401 communicates with the AI chip 403 to call the access rule template set generated in the AI chip 403.
For the steps of generating access rule templates and updating and maintaining the access rule template set in the three optional implementations above, refer to the related description of step 301a in the embodiment corresponding to FIG. 3; details are not repeated here.
In addition, the storage medium 402 includes a cache memory 4021 and a hard disk 4022. The cache 4021 may also be called memory and serves as a bridge between external storage (that is, the hard disk 4022) and the processor 401. The cache 4021 may be used to temporarily store operation data of the processor 401 and data exchanged with external storage such as the hard disk 4022. While the computer is running, the processor 401 can load the data to be operated on from the cache 4021 into the processor 401 for computation. It should be understood that one or more interfaces are provided between the hard disk 4022 and the cache 4021 to transfer data between them. Specifically, the processor 401 is configured to prefetch the predicted file from the hard disk 4022 into the cache 4021 based on the target keyword.
In addition, the cache 4021 is further used to store the access rule templates generated by the processor 401, the first processor 4011, or the AI chip 403. The processor 401, the first processor 4011, or the AI chip 403 may also back up the access rule templates to the hard disk 4022, so that it can load them from the hard disk 4022 whenever needed. In particular, because the AI chip 403 also contains some storage media, access rule templates generated by the AI chip 403 can be stored directly in the AI chip 403.
Optionally, the processor 401, the cache 4021, and the AI chip 403 are generally located in the same controller (not shown). For details, refer to the system architecture diagram corresponding to FIG. 1A; details are not repeated here.
FIG. 5 is a schematic structural diagram of a prefetching apparatus 50 according to an embodiment of this application. The prefetching apparatus 50 is located in the storage device 40; for the specific structure of the storage device 40, refer to FIG. 4. The storage device 40 stores computer programs or instructions, and the prefetching apparatus 50 invokes them to run the following modules: a keyword generation module 501, a calculation module 502, a data migration module 503, and a template generation module 504. The keyword generation module 501, the calculation module 502, and the data migration module 503 are located in the processor 401 of the storage device 40 shown in FIG. 4. The template generation module 504 may be located in the processor 401 of the storage device 40 shown in FIG. 4 (for example, as the first processor 4011 in the processor 401), or in the AI chip 403 of the storage device 40 shown in FIG. 4.
The keyword generation module 501 is configured to generate the first keyword according to the file access information carried in the read request; for details, refer to the related descriptions in step 101 and step 301b. The calculation module 502 is configured to generate a target keyword using the first keyword and the target access rule template, where the target keyword satisfies the same features as the first keyword and indicates the predicted file; for details, refer to the related descriptions in steps 102, 302, and 303. The data migration module 503 is configured to prefetch the predicted file indicated by the target keyword from the hard disk into the cache; for details, refer to the related descriptions in step 103 and step 304.
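To show how modules 501 to 503 hand data to one another, the following sketch wires them together, with plain dictionaries standing in for the hard disk 4022 and the cache 4021. It assumes, purely for illustration, that the first keyword is the file-name stem of the requested path and that the target access rule template has already been found, so its mapping relationship is passed in directly; step 101, which defines keyword generation, is outside this section. The class, method names, and file paths are hypothetical.

```python
from pathlib import PurePosixPath
from typing import Callable, Dict, Optional


class PrefetchingApparatus:
    """Hypothetical wiring of modules 501 to 503; dicts stand in for the
    hard disk 4022 (path -> contents) and the cache 4021."""

    def __init__(self, hard_disk: Dict[str, bytes],
                 mapping: Callable[[str], Optional[str]]) -> None:
        self.hard_disk = hard_disk
        self.cache: Dict[str, bytes] = {}
        self.mapping = mapping  # mapping relationship of the target template

    def generate_keyword(self, file_access_info: Dict[str, str]) -> str:
        """Module 501 (assumed rule): the file-name stem is the first keyword."""
        return PurePosixPath(file_access_info["path"]).stem

    def compute_target_keyword(self, first_keyword: str) -> Optional[str]:
        """Module 502: apply the mapping relationship to the first keyword."""
        return self.mapping(first_keyword)

    def migrate(self, target_keyword: str) -> Optional[str]:
        """Module 503: copy the file named by the target keyword from disk to cache."""
        for path, data in self.hard_disk.items():
            if PurePosixPath(path).stem == target_keyword:
                self.cache[path] = data
                return path
        return None  # predicted file not found on disk

    def on_read(self, file_access_info: Dict[str, str]) -> Optional[str]:
        """Run the three modules for one read request; return the prefetched path."""
        first_keyword = self.generate_keyword(file_access_info)
        target_keyword = self.compute_target_keyword(first_keyword)
        if target_keyword is None:
            return None
        return self.migrate(target_keyword)


disk = {
    "/exams/chinese_paper_vol1.pdf": b"paper",
    "/exams/chinese_answers_vol1.pdf": b"answers",
}
apparatus = PrefetchingApparatus(disk, {"chinese_paper_vol1": "chinese_answers_vol1"}.get)
print(apparatus.on_read({"path": "/exams/chinese_paper_vol1.pdf"}))
# /exams/chinese_answers_vol1.pdf
print(list(apparatus.cache))  # ['/exams/chinese_answers_vol1.pdf']
```

A real storage device would look files up through its file system metadata rather than scanning every path; the linear scan only keeps the sketch short.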
The prefetching apparatus 50 stores multiple access rule templates, or can call multiple access rule templates stored in the storage device. The target access rule template is the access rule template, among the multiple access rule templates, that corresponds to the first keyword, and each access rule template is obtained by training on multiple initial keywords with a text-semantics-based training model. The target access rule template includes the first keyword and/or a first feature-associated word, where the first feature-associated word indicates a feature that the first keyword satisfies. The target access rule template further includes a mapping relationship that indicates how the multiple initial keywords are associated with one another, and the multiple initial keywords include the first keyword.
In addition, the calculation module 502 is specifically configured to use the first keyword as the input to the mapping relationship, compute on the first keyword using the mapping relationship, and output the target keyword, where the target keyword satisfies the feature of the first keyword indicated by the first feature-associated word. Optionally, the calculation module 502 is further configured to: determine whether an access rule template contains the first keyword; if it does, determine that the access rule template is the target access rule template; if it does not, further determine whether the first keyword satisfies the features indicated by the feature-associated words in the access rule template; and if so, determine that the access rule template is the target access rule template. For details, refer to the related descriptions in steps 302 and 303.
In addition, the template generation module 504 is configured to train on the multiple initial keywords using the text-semantics-based training model to obtain the access rule templates. For details, refer to the related description of step 301a.
One or more of the above modules may be implemented by software, hardware, or a combination of both. When any of the above modules is implemented by software, the software exists in the form of computer program instructions stored in a memory, and a processor may be used to execute the program instructions to implement the above method procedure. The processor may include, but is not limited to, at least one of the following computing devices that run software: a central processing unit, a microprocessor, a digital signal processor (DSP), a microcontroller unit (MCU), or an artificial intelligence processor. Each computing device may include one or more cores for executing software instructions to perform operations or processing. The processor may be built into a system on chip (SoC) or an application-specific integrated circuit (ASIC), or may be an independent semiconductor chip. In addition to the cores for executing software instructions, the processor may further include necessary hardware accelerators, such as a field programmable gate array (FPGA), a programmable logic device (PLD), or a logic circuit implementing dedicated logic operations. When the above modules or units are implemented in hardware, the hardware may be any one or any combination of a CPU, a microcontroller, a DSP, an MCU, an artificial intelligence processor, an ASIC, an SoC, an FPGA, a PLD, a dedicated digital circuit, a hardware accelerator, or a non-integrated discrete device, which may run the necessary software or perform the above method procedure without relying on software.
Those skilled in the art can clearly understand that, for convenience and brevity of description, for the specific working processes of the system, apparatus, and units described above, reference may be made to the corresponding processes in the foregoing method embodiments; details are not repeated here.
The foregoing embodiments are merely intended to describe the technical solutions of this application, not to limit them. Although this application is described in detail with reference to the foregoing embodiments, persons of ordinary skill in the art should understand that they may still modify the technical solutions described in the foregoing embodiments or make equivalent replacements to some of their technical features, and such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of this application.

Claims (20)

  1. A file prefetching method, applied to a storage device, characterized in that the method comprises:
    generating a first keyword according to file access information carried in a read request, wherein the file access information is information related to a file indicated by the read request;
    generating a target keyword using the first keyword and a target access rule template, wherein the target keyword satisfies the same features as the first keyword, and the target keyword indicates a predicted file; and
    prefetching the predicted file indicated by the target keyword into a cache.
  2. The method according to claim 1, characterized in that the storage device stores a plurality of access rule templates, the target access rule template is the access rule template, among the plurality of access rule templates, that corresponds to the first keyword, and each of the access rule templates is obtained by training on a plurality of initial keywords using a text-semantics-based training model.
  3. The method according to claim 2, characterized in that the target access rule template comprises the first keyword and/or a first feature-associated word, and the first feature-associated word indicates a feature that the first keyword satisfies.
  4. The method according to claim 3, characterized in that the target access rule template further comprises a mapping relationship, the mapping relationship indicates an association manner between the plurality of initial keywords, and the plurality of initial keywords comprise the first keyword;
    the generating a target keyword using the first keyword and a target access rule template comprises:
    using the first keyword as an input to the mapping relationship, computing on the first keyword using the mapping relationship, and outputting the target keyword, wherein the target keyword satisfies the feature of the first keyword indicated by the first feature-associated word.
  5. The method according to claim 3 or 4, characterized in that the method further comprises:
    determining whether an access rule template contains the first keyword;
    if the access rule template contains the first keyword, determining that the access rule template is the target access rule template;
    if the access rule template does not contain the first keyword, further determining whether the first keyword satisfies the features indicated by the feature-associated words in the access rule template; and
    if the first keyword satisfies the features indicated by the feature-associated words in the access rule template, determining that the access rule template is the target access rule template.
  6. The method according to any one of claims 1 to 4, wherein the file access information is information about files from different access directories, and the plurality of predicted files are located in different access directories.
  7. A storage device, comprising a cache, a hard disk, and at least one processor, wherein
    the at least one processor is configured to:
    generate a first keyword based on file access information carried in a read request, where the file access information is information related to the file indicated by the read request;
    generate a target keyword by using the first keyword and a target access rule template, where the target keyword has the same feature as the first keyword and indicates a predicted file; and
    prefetch the predicted file indicated by the target keyword from the hard disk into the cache.
  8. The storage device according to claim 7, wherein the storage device stores a plurality of access rule templates, the target access rule template is the one of the plurality of access rule templates that corresponds to the first keyword, and each access rule template is obtained by training a plurality of initial keywords with a training model based on text semantics.
  9. The storage device according to claim 8, wherein the target access rule template contains the first keyword and/or a first feature-associated word, and the first feature-associated word indicates a feature that the first keyword satisfies.
  10. The storage device according to claim 9, wherein the target access rule template further contains a mapping relationship, the mapping relationship indicates how the plurality of initial keywords are associated with one another, and the plurality of initial keywords include the first keyword;
    the at least one processor is specifically configured to use the first keyword as an input to the mapping relationship, compute on the first keyword by using the mapping relationship, and output the target keyword, where the target keyword has the feature of the first keyword indicated by the first feature-associated word.
  11. The storage device according to claim 9 or 10, wherein the at least one processor is further configured to:
    determine whether an access rule template contains the first keyword;
    if the access rule template contains the first keyword, determine that the access rule template is the target access rule template;
    if the access rule template does not contain the first keyword, further determine whether the first keyword has the feature indicated by the feature-associated word in the access rule template; and
    if the first keyword has the feature indicated by the feature-associated word in the access rule template, determine that the access rule template is the target access rule template.
  12. The storage device according to any one of claims 7 to 10, wherein the file access information is information about files from different access directories, and the plurality of predicted files are located in different access directories.
  13. The storage device according to any one of claims 7 to 10, wherein the at least one processor further comprises a first processor, and the first processor is configured to train the plurality of initial keywords with the training model based on text semantics to obtain the access rule templates.
  14. A prefetching apparatus, wherein the prefetching apparatus is located in a storage device, and the storage device further comprises a cache and a hard disk;
    the storage device stores a computer program or instructions, and the prefetching apparatus invokes the computer program or instructions to execute the following modules:
    a keyword generation module, configured to generate a first keyword based on file access information carried in a read request, where the file access information is information related to the file indicated by the read request;
    a calculation module, configured to generate a target keyword by using the first keyword and a target access rule template, where the target keyword has the same feature as the first keyword and indicates a predicted file; and
    a data migration module, configured to prefetch the predicted file indicated by the target keyword from the hard disk into the cache.
  15. The prefetching apparatus according to claim 14, wherein the prefetching apparatus stores a plurality of access rule templates, the target access rule template is the one of the plurality of access rule templates that corresponds to the first keyword, and each access rule template is obtained by training a plurality of initial keywords with a training model based on text semantics.
  16. The prefetching apparatus according to claim 15, wherein the target access rule template contains the first keyword and/or a first feature-associated word, and the first feature-associated word indicates a feature that the first keyword satisfies.
  17. The prefetching apparatus according to claim 16, wherein the target access rule template further contains a mapping relationship, the mapping relationship indicates how the plurality of initial keywords are associated with one another, and the plurality of initial keywords include the first keyword;
    the calculation module is specifically configured to use the first keyword as an input to the mapping relationship, compute on the first keyword by using the mapping relationship, and output the target keyword, where the target keyword has the feature of the first keyword indicated by the first feature-associated word.
  18. The prefetching apparatus according to claim 16 or 17, wherein the calculation module is further configured to:
    determine whether an access rule template contains the first keyword;
    if the access rule template contains the first keyword, determine that the access rule template is the target access rule template;
    if the access rule template does not contain the first keyword, further determine whether the first keyword has the feature indicated by the feature-associated word in the access rule template; and
    if the first keyword has the feature indicated by the feature-associated word in the access rule template, determine that the access rule template is the target access rule template.
  19. The prefetching apparatus according to any one of claims 14 to 17, wherein the file access information is information about files from different access directories, and the plurality of predicted files are located in different access directories.
  20. The prefetching apparatus according to any one of claims 14 to 17, wherein the prefetching apparatus further comprises a template generation module, and the template generation module is configured to train the plurality of initial keywords with the training model based on text semantics to obtain the access rule templates.
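Claims 1 and 6 (and their device and apparatus counterparts) derive a first keyword from the file access information in a read request and allow the predicted files to live in directories other than the one being read. The claims do not say how a keyword is extracted, so the following is only a minimal sketch under stated assumptions: the keyword is taken to be the file name without its directory or extension, and predicted files are located through an in-memory keyword index. `first_keyword_from_access_info`, `build_keyword_index`, and `resolve_predicted_files` are hypothetical names, and the target keywords in the usage lines are assumed rather than produced by a template.

```python
import posixpath
from typing import Dict, Iterable, List


def first_keyword_from_access_info(path: str) -> str:
    """Hypothetical keyword extraction: drop the directory and the extension.

    '/data/day1/frame_0007.jpg' -> 'frame_0007'
    """
    stem, _ext = posixpath.splitext(posixpath.basename(path))
    return stem


def build_keyword_index(paths: Iterable[str]) -> Dict[str, List[str]]:
    """Map each keyword to every stored file that carries it, in any directory."""
    index: Dict[str, List[str]] = {}
    for p in paths:
        index.setdefault(first_keyword_from_access_info(p), []).append(p)
    return index


def resolve_predicted_files(target_keywords: Iterable[str],
                            index: Dict[str, List[str]]) -> List[str]:
    """Turn target keywords into concrete file paths; unknown keywords are skipped."""
    return [p for k in target_keywords for p in index.get(k, [])]


stored = [
    "/data/day1/frame_0007.jpg",
    "/data/day1/frame_0008.jpg",
    "/data/day2/frame_0009.jpg",
    "/data/day2/notes.txt",
]
index = build_keyword_index(stored)
first = first_keyword_from_access_info("/data/day1/frame_0007.jpg")
# Assumed target keywords; in the claims they come from the access rule template.
predicted = resolve_predicted_files(["frame_0008", "frame_0009"], index)
print(first, predicted)
```

Because the index ignores directories, the two predicted files come back from `/data/day1` and `/data/day2` respectively, which is the cross-directory situation claim 6 describes.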
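Claims 5, 11, and 18 select the target access rule template in two steps: first check whether a template already contains the first keyword, and only if it does not, check whether the keyword has the feature indicated by the template's feature-associated word. The sketch below assumes, purely for illustration, that a feature-associated word can be modeled as a regular expression. The `AccessRuleTemplate` class and `select_target_template` function are hypothetical, and scanning all templates for exact membership before trying any feature match is a design choice made here, not something the claims specify.

```python
import re
from typing import Iterable, Optional


class AccessRuleTemplate:
    """Hypothetical template: known keywords plus one feature pattern."""

    def __init__(self, name: str, keywords: Iterable[str], feature_pattern: str):
        self.name = name
        self.keywords = set(keywords)
        # Assumption: the feature-associated word is modeled as a regex.
        self.feature = re.compile(feature_pattern)


def select_target_template(first_keyword: str,
                           templates: Iterable[AccessRuleTemplate]
                           ) -> Optional[AccessRuleTemplate]:
    candidates = list(templates)
    # Step 1: a template that already contains the first keyword.
    for t in candidates:
        if first_keyword in t.keywords:
            return t
    # Step 2: otherwise, a template whose feature the keyword satisfies.
    for t in candidates:
        if t.feature.fullmatch(first_keyword):
            return t
    return None  # no template applies, so nothing would be prefetched


templates = [
    AccessRuleTemplate("logs", ["app_log_01", "app_log_02"], r"app_log_\d{2}"),
    AccessRuleTemplate("frames", ["frame_0001", "frame_0002"], r"frame_\d{4}"),
]
for kw in ["app_log_02", "frame_0042", "readme"]:
    match = select_target_template(kw, templates)
    print(kw, "->", match.name if match else None)
```

Here `app_log_02` is matched by membership, `frame_0042` only through the feature pattern, and `readme` by neither.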
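Claims 4, 10, and 17 feed the first keyword into the template's mapping relationship and require the output target keywords to keep the feature indicated by the first feature-associated word. The claims leave the mapping itself undefined, so this sketch assumes a simple one: increment the trailing number of the keyword while preserving zero padding. The feature is again modeled as a regular expression, and every name below (`next_in_sequence`, `AccessRuleTemplate`, `generate_target_keywords`) is hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Set


def next_in_sequence(keyword: str, count: int = 2) -> List[str]:
    """Assumed mapping relationship: increment the trailing number.

    'frame_0007' -> ['frame_0008', 'frame_0009'] (zero padding is preserved).
    """
    m = re.fullmatch(r"(.*?)(\d+)", keyword)
    if m is None:
        return []
    prefix, digits = m.group(1), m.group(2)
    start = int(digits)
    return [f"{prefix}{start + i:0{len(digits)}d}" for i in range(1, count + 1)]


@dataclass
class AccessRuleTemplate:
    keywords: Set[str]                    # initial keywords
    feature: re.Pattern                   # feature-associated word, as a regex
    mapping: Callable[[str], List[str]]   # mapping relationship


def generate_target_keywords(first_keyword: str,
                             template: AccessRuleTemplate) -> List[str]:
    # Use the first keyword as the input to the mapping relationship ...
    candidates = template.mapping(first_keyword)
    # ... and keep only outputs that still have the indicated feature.
    return [k for k in candidates if template.feature.fullmatch(k)]


template = AccessRuleTemplate(
    keywords={"frame_0001", "frame_0002", "frame_0003"},
    feature=re.compile(r"frame_\d{4}"),
    mapping=next_in_sequence,
)
print(generate_target_keywords("frame_0007", template))
print(generate_target_keywords("frame_9999", template))
```

The feature filter is what enforces "the target keyword has the feature of the first keyword": for `frame_9999` the mapping proposes five-digit names that no longer match `frame_\d{4}`, so no target keyword is produced.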
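Claims 7 and 14 place the predicted files into the cache by moving them from the hard disk. The claims do not specify a cache policy, so the sketch below is only an illustration under stated assumptions: the hard disk is a plain dictionary of path to bytes, the cache is a small least-recently-used (LRU) cache built on `collections.OrderedDict`, and the predictor is a hypothetical stand-in for the keyword and template steps. `LRUCache`, `prefetch`, and `handle_read` are illustrative names only.

```python
from collections import OrderedDict
from typing import Callable, Dict, List, Optional, Tuple


class LRUCache:
    """Small least-recently-used cache standing in for the storage cache."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self._data: "OrderedDict[str, bytes]" = OrderedDict()

    def __contains__(self, key: str) -> bool:
        return key in self._data

    def get(self, key: str) -> Optional[bytes]:
        if key not in self._data:
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def put(self, key: str, value: bytes) -> None:
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # evict the least recently used entry


def prefetch(paths: List[str], hard_disk: Dict[str, bytes], cache: LRUCache) -> List[str]:
    """Copy predicted files from the 'hard disk' into the cache.

    Files already cached or absent from the disk are skipped.
    """
    loaded = []
    for path in paths:
        if path in cache or path not in hard_disk:
            continue
        cache.put(path, hard_disk[path])
        loaded.append(path)
    return loaded


def handle_read(path: str, hard_disk: Dict[str, bytes], cache: LRUCache,
                predict: Callable[[str], List[str]]) -> Tuple[bytes, bool]:
    """Serve a read (returning data and whether it was a cache hit), then prefetch."""
    data = cache.get(path)
    hit = data is not None
    if data is None:
        data = hard_disk[path]
        cache.put(path, data)
    prefetch(predict(path), hard_disk, cache)
    return data, hit


def predict(path: str) -> List[str]:
    # Hypothetical predictor; in the claims this is the keyword + template step.
    return ["/d/frame_0008", "/d/frame_0009"] if path.endswith("0007") else []


disk = {"/d/frame_0007": b"7", "/d/frame_0008": b"8", "/d/frame_0009": b"9"}
cache = LRUCache(capacity=4)
print(handle_read("/d/frame_0007", disk, cache, predict))
print(handle_read("/d/frame_0008", disk, cache, predict))
```

The first read misses and triggers a prefetch of the two following frames, so the second read is served from the cache. A real storage system would prefetch asynchronously and size the cache very differently; the synchronous loop here only shows the data movement.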
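Claims 13 and 20 obtain the access rule templates by training a set of initial keywords with a model based on text semantics, but the claims do not disclose that model. The sketch below deliberately swaps in a much simpler technique, pattern induction, and is not a semantic model: keywords are grouped by their "shape" (each digit run replaced by a fixed-width `\d{n}`), and for each group with enough support the most common gap between trailing numbers is learned as a step. The function names, the `min_support` parameter, and the dictionary layout of a template are all hypothetical.

```python
import re
from collections import Counter, defaultdict
from typing import Dict, Iterable, List


def shape_of(keyword: str) -> str:
    r"""Generalize a keyword into a regex: each digit run becomes \d{width}.

    'frame_0007' -> 'frame_\d{4}'
    """
    parts = re.split(r"(\d+)", keyword)
    return "".join(rf"\d{{{len(p)}}}" if p.isdigit() else re.escape(p) for p in parts)


def train_templates(initial_keywords: Iterable[str], min_support: int = 2) -> List[Dict]:
    """Group keywords by shape and learn a step for the trailing number."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for kw in initial_keywords:
        groups[shape_of(kw)].append(kw)

    templates = []
    for shape, members in groups.items():
        if len(members) < min_support or not re.search(r"\d", members[0]):
            continue  # too little evidence, or nothing numeric to step through
        numbers = sorted({int(re.findall(r"\d+", m)[-1]) for m in members})
        if len(numbers) < 2:
            continue
        diffs = [b - a for a, b in zip(numbers, numbers[1:])]
        step = Counter(diffs).most_common(1)[0][0]  # most frequent gap
        templates.append({"feature": shape, "keywords": set(members), "step": step})
    return templates


initial = ["frame_0001", "frame_0002", "frame_0003",
           "app_log_10", "app_log_12", "app_log_14", "readme"]
for t in train_templates(initial):
    print(t["feature"], t["step"])
```

Each learned shape plays the role of a feature-associated word and the step plays the role of a mapping relationship; `readme` is dropped because it has neither enough support nor a numeric part.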
PCT/CN2021/087840 2020-04-20 2021-04-16 File prefetching method, storage device, and prefetching apparatus WO2021213278A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010311787.9A CN113535658A (en) 2020-04-20 2020-04-20 File prefetching method, storage device and prefetching device
CN202010311787.9 2020-04-20

Publications (1)

Publication Number Publication Date
WO2021213278A1 true WO2021213278A1 (en) 2021-10-28

Family

ID=78093721

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/087840 WO2021213278A1 (en) 2020-04-20 2021-04-16 File prefetching method, storage device, and prefetching apparatus

Country Status (2)

Country Link
CN (1) CN113535658A (en)
WO (1) WO2021213278A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116055429A (en) * 2023-01-17 2023-05-02 杭州鸿钧微电子科技有限公司 PCIe-based communication data processing method, apparatus, device, and storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109471971A (en) * 2018-02-06 2019-03-15 华南师范大学 Semantic prefetching system and method for cloud storage of education-domain resources
WO2019100263A1 (en) * 2017-11-22 2019-05-31 Intel Corporation File pre-fetch scheduling for cache memory to reduce latency
CN110018970A (en) * 2018-01-08 2019-07-16 腾讯科技(深圳)有限公司 Cache prefetching method, apparatus, equipment and computer readable storage medium

Also Published As

Publication number Publication date
CN113535658A (en) 2021-10-22

Similar Documents

Publication Publication Date Title
US10586155B2 (en) Clarification of submitted questions in a question and answer system
US10963794B2 (en) Concept analysis operations utilizing accelerators
US9318027B2 (en) Caching natural language questions and results in a question and answer system
US11216504B2 (en) Document recommendation method and device based on semantic tag
Pound et al. Interpreting keyword queries over web knowledge bases
US10229188B2 (en) Automatic corpus expansion using question answering techniques
WO2019003069A1 (en) Adaptive evaluation of meta-relationships in semantic graphs
US9342561B2 (en) Creating and using titles in untitled documents to answer questions
US9720962B2 (en) Answering superlative questions with a question and answer system
JP2021507350A (en) Reinforcement evidence retrieval of complex answers
KR20220114495A (en) Interaction layer neural network for search, retrieval, and ranking
US9092512B2 (en) Corpus search improvements using term normalization
US10885281B2 (en) Natural language document summarization using hyperbolic embeddings
US20200065395A1 (en) Efficient leaf invalidation for query execution
TW202001621A (en) Corpus generating method and apparatus, and human-machine interaction processing method and apparatus
WO2022141872A1 (en) Document abstract generation method and apparatus, computer device, and storage medium
WO2021213278A1 (en) File prefetching method, storage device, and prefetching apparatus
US20190318220A1 (en) Dispersed template-based batch interaction with a question answering system
CN113505196B (en) Text retrieval method and device based on parts of speech, electronic equipment and storage medium
CN115186112A (en) Medicine data retrieval method and device based on syndrome differentiation mapping rule
US11475335B2 (en) Cognitive data preparation for deep learning model training
Qi et al. Salient context-based semantic matching for information retrieval
US20220036007A1 (en) Bootstrapping relation training data
US20210319066A1 (en) Sub-Question Result Merging in Question and Answer (QA) Systems
Lei et al. Semantic Similarity Measures to Disambiguate Terms in Medical Text

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 21793241

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 EP: PCT application non-entry in European phase

Ref document number: 21793241

Country of ref document: EP

Kind code of ref document: A1