WO2021190501A1 - Data prefetching method, apparatus, and storage device - Google Patents

Data prefetching method, apparatus, and storage device

Info

Publication number
WO2021190501A1
Authority
WO
WIPO (PCT)
Prior art keywords: data, read, level storage, model, prefetch
Application number: PCT/CN2021/082382
Other languages: English (en), French (fr)
Inventors: 鲁鹏 (Lu Peng), 刘金虎 (Liu Jinhu)
Original Assignee: Huawei Technologies Co., Ltd. (华为技术有限公司)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Application filed by Huawei Technologies Co., Ltd. (华为技术有限公司)
Priority: EP21774365.7A (published as EP4099235A4)
Publication of WO2021190501A1
Priority: US17/951,424 (published as US20230009375A1)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02 Addressing or allocation; Relocation
    • G06F12/08 Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802 Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0862 Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches with prefetch
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02 Addressing or allocation; Relocation
    • G06F12/08 Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802 Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0866 Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches for peripheral storage systems, e.g. disk cache
    • G06F12/0868 Data transfer between cache memory and other subsystems, e.g. storage devices or host systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00 Computing arrangements using knowledge-based models
    • G06N5/04 Inference or reasoning models
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2212/00 Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
    • G06F2212/10 Providing a specific technical effect
    • G06F2212/1016 Performance improvement
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2212/00 Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
    • G06F2212/26 Using a specific storage system architecture
    • G06F2212/261 Storage comprising a plurality of storage devices
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2212/00 Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
    • G06F2212/31 Providing disk cache in a specific location of a storage system
    • G06F2212/312 In storage controller
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2212/00 Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
    • G06F2212/50 Control mechanisms for virtual memory, cache or TLB
    • G06F2212/502 Control mechanisms for virtual memory, cache or TLB using adaptive policy
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2212/00 Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
    • G06F2212/60 Details of cache memory
    • G06F2212/602 Details relating to cache prefetching
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2212/00 Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
    • G06F2212/60 Details of cache memory
    • G06F2212/6024 History based prefetching
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/06 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Definitions

  • This application relates to the field of storage technology, and in particular to a data prefetching method, device and storage device.
  • The access speed of the cache is faster than that of the hard disk. Therefore, when a read data request is received, the data to be read by the next read data request can be predicted from the current request, and the predicted data can be read from the hard disk into the cache in advance. In this way, when the storage device receives the next read data request, the data hits in the cache, which greatly improves the processing speed of read data requests.
  • the present application provides a data prefetching method, device and storage device, which can save the computing power of the processor.
  • the technical solution is as follows:
  • In a first aspect, a data prefetching method is provided, applied to a storage device that includes a processor, an artificial intelligence (AI) chip, a first-level storage, and a second-level storage. The method includes: the AI chip generates a prefetch model; the AI chip sends the generated prefetch model to the processor; the processor predicts the data to be read according to the prefetch model, where the data to be read is stored in the second-level storage; and the processor reads the data to be read from the second-level storage into the first-level storage.
  • The first aspect can be applied to any application scenario that includes two levels of storage, where the first-level storage refers to a memory with a smaller capacity and a faster data processing speed, such as the cache 102, the first-level memory, the second-level memory, or the memory in the hard disk enclosure.
  • The second-level storage refers to a memory with a larger capacity and a slower data processing speed than the first-level storage, such as the second-level memory or the hard disk 22.
  • Example 1: the first-level storage refers to the cache 102 inside the controller 11, and the second-level storage refers to the hard disk 22 in the hard disk enclosure coupled with the controller 11. Data prefetching means that data in the hard disk 22 is read in advance into the cache 102 for the host to read.
  • Example 2: the first-level storage is the first-level memory, also known as the CPU cache (cache memory), which is the temporary memory closest to the processor 101, and the second-level storage refers to the second-level memory, which is usually simply called the memory.
  • Data prefetching refers to reading the data in the secondary memory into the primary memory in advance.
  • Example 3: the first-level storage refers to the second-level memory described above, and the second-level storage refers to the hard disk 22.
  • Data prefetching refers to reading the data in the hard disk 22 into the secondary memory in advance.
  • Example 4: the first-level storage refers to the memory in the hard disk enclosure, and the second-level storage refers to the hard disk 22 in the hard disk enclosure.
  • Data prefetching refers to reading data in the hard disk 22 into the memory of the hard disk enclosure in advance.
  • the AI chip trains the data samples to generate the prefetch model, and the processor only needs to perform the data prefetch operation, which greatly reduces the computational burden of the processor.
  • predicting the data to be read by the processor according to the prefetch model includes: performing inference according to the prefetch model and the received data read request to obtain the predicted address of the data to be read.
  • the AI chip periodically upgrades the prefetch model.
  • the prefetch model includes but is not limited to a serial sequence flow model, a parallel sequence flow model, an interval sequence flow model, or an associated prefetch model.
  • the second aspect provides a data prefetching method, which is applied to a storage device, and the storage device includes a processor, an AI chip, a first-level storage, and a second-level storage.
  • The method includes: the AI chip generates a prefetch model; the AI chip performs inference based on the prefetch model and the received read data request to obtain the address of the data to be read; the AI chip sends the address of the data to be read to the processor; and the processor reads the data to be read from the second-level storage into the first-level storage according to that address.
  • the method further includes: the AI chip periodically upgrades the prefetch model.
  • the prefetch model includes but is not limited to a serial sequence flow model, a parallel sequence flow model, an interval sequence flow model, or an associated prefetch model.
  • a third aspect provides a storage device that includes a processor, an artificial intelligence AI chip, a cache, and a hard disk, and the storage device is used to execute the method provided in the first aspect or the second aspect.
  • a fourth aspect provides a data prefetching device, the device includes a first processing module and a second processing module, and the data prefetching device is configured to execute the method provided in the first aspect or the second aspect.
  • a fifth aspect provides a terminal device.
  • The terminal device includes a first-level storage, a second-level storage, a first processing module, and a second processing module. The first processing module is used to train data samples to generate a prefetch model and to send the generated prefetch model to the second processing module; the second processing module is configured to predict the data to be read according to the prefetch model, where the data to be read is stored in the second-level storage, and to read the data to be read from the second-level storage into the first-level storage.
  • FIG. 1 is a schematic diagram of a storage device involved in a data prefetching method provided by an embodiment of the present application
  • FIG. 2 is a flowchart of a data prefetching method provided by an embodiment of the present application
  • FIG. 3 is a schematic structural diagram of a data prefetching device according to an embodiment of the present application.
  • FIG. 1 is a system architecture diagram provided by an embodiment of the present invention.
  • the storage system provided in this embodiment includes a host 20, a controller 11, and multiple hard disks 22.
  • The host 20 and the controller 11 communicate via the Network File System (NFS)/Common Internet File System (CIFS) protocol or the Fibre Channel (FC) protocol.
  • the host 20 may send a data write request to the controller 11, and the controller 11 writes the data carried in the data write request into the hard disk 22 after receiving the data write request.
  • the host 20 may also send a read data request to the controller 11.
  • After the controller 11 receives the read data request, it searches for the data to be read in its cache 102 according to the address in the read data request. If the data is found, it is sent directly to the host 20; if not, the data is obtained from the hard disk 22 and then sent to the host 20.
  • the controller 11 and the hard disk 22 may be integrated in one storage device, or may be located in two separate devices. The embodiment of the present invention does not limit the positional relationship between the controller 11 and the hard disk 22 in any way.
  • When a user initiates a large read data request on the host 20 or on a client connected to the host 20, the host 20 often splits the read data request into multiple requests and sends them to the controller 11 for processing.
  • The operation of splitting the read data request into multiple requests may be performed by a host bus adapter (HBA) in the host 20, or by the HBA in the controller 11.
  • the size of each request after splitting may be the same or different, and this embodiment does not limit the size of the request for reading data after splitting.
  • The logical addresses of the multiple read data requests after the split are continuous. Multiple read data requests with consecutive logical addresses are called a sequential stream.
  • the host 20 may serially send multiple read data requests in the sequence flow to the controller 11 for processing, or may send multiple read data requests in the sequence flow to the controller 11 in parallel for processing.
  • A sequential stream sent serially to the controller 11 for processing is called a serial sequential stream, and a sequential stream sent in parallel to the controller 11 for processing is called a parallel sequential stream.
  • In the serial case, the host 20 sequentially sends each of the consecutive multiple read data requests to the controller 11, sending the next read data request only after the previous one has been processed.
  • In the parallel case, the host 20 sends at least two of the consecutive multiple read data requests to the controller 11 at once, and the storage device can process those read data requests in parallel.
  • For example, suppose the host 20 sends 9 read data requests to the controller 11.
  • If the requests are processed serially, the host 20 first sends the first read data request, sends the second read data request only after receiving the response to the first, sends the third after receiving the response to the second, and so on, until all 9 read data requests are processed. If the 9 read data requests are processed in parallel, the host 20 can send the first, second, and third read data requests to the controller 11 at the same time, and the controller 11 processes these three requests in parallel; once the first read data request is processed, even if the second or third has not yet been processed, the host 20 can send the fourth read data request to the controller 11, and so on, until all 9 read data requests are processed. An interval sequential stream refers to several read data requests whose addresses follow a certain pattern at fixed intervals.
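The distinction between sequential and interval streams above can be sketched as a small classifier over request start addresses. This is a minimal illustration, not part of the patent; the function name and the single-block-stride convention are assumptions:

```python
def detect_stream_pattern(lbas):
    """Classify a run of read-request start addresses (in units of logical
    blocks, an assumed convention): a constant stride of +1 block is treated
    as a sequential stream, a larger constant stride as an interval stream,
    and anything else as random access."""
    if len(lbas) < 3:
        return "unknown"
    strides = [b - a for a, b in zip(lbas, lbas[1:])]
    if any(s != strides[0] for s in strides):
        return "random"
    return "sequential" if strides[0] == 1 else "interval"
```

For example, requests at blocks 10, 11, 12, 13 would be classified as sequential, while 10, 14, 18, 22 would be an interval stream with stride 4.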
  • the controller 11 provided in this embodiment at least includes a processor 101, an artificial intelligence (AI) chip 105, and a cache 102.
  • The processor 101 is a central processing unit (CPU). In the embodiment of the present invention, the processor 101 may be used to receive data from the host 20 and store the data in the cache 102 or the hard disk 22. In addition, the processor 101 is also configured to predict the data to be read according to the data prefetch model and store it in the cache 102 in advance.
  • The AI chip 105 is used to obtain data samples and perform model training on them to obtain a prefetch model. Then, the AI chip 105 sends the obtained prefetch model to the processor 101.
  • The AI chip 105 may take the form of a chip or another physical component. For example, it may be a training chip used to construct a neural network model, or an inference chip that uses a neural network model for inference.
  • The AI chip 105 and the processor 101 communicate through a high-speed interconnection network.
  • The high-speed interconnection network is used to provide the data communication function between the AI chip 105 and the processor 101.
  • The high-speed interconnection network can be PCIe, memory fabric, high-speed Ethernet, HCCS, InfiniBand (IB), or Fibre Channel (FC). The high-speed interconnection network may take the form of a bus, and may also be referred to as a high-speed interconnection switch or a high-speed interconnection bus.
  • The controller 11 may include a high-speed interconnection bus, and the AI chip 105 and the processor 101 may be connected to the high-speed interconnection bus to access the high-speed interconnection network.
  • The AI chip 105 may include a high-speed interconnection network interface, and the processor 101 may include a high-speed interconnection network interface. The AI chip 105 is connected to the high-speed interconnection bus through its high-speed interconnection network interface, and the processor 101 is connected to the high-speed interconnection bus through its high-speed interconnection network interface.
  • The high-speed interconnection network interface may be a serial bus interface, for example any one of a PCIe interface, an HCCS interface, an Ethernet interface, an IB interface, and an FC interface. With different types of high-speed interconnection network interfaces, the speed at which service data is transmitted between the AI chip 105 and the processor 101 may also differ.
  • The high-speed interconnection bus is only one example of the high-speed interconnection network; the high-speed interconnection network may instead be another bus with a memory pass-through function. This embodiment does not limit the specific type of the high-speed interconnection network.
  • the cache 102 is used to temporarily store data received from the host 20 or data read from the hard disk 22.
  • the controller 11 may temporarily store the data in the multiple write data requests in the cache 102.
  • the capacity of the cache 102 reaches a certain threshold, the data stored in the cache 102 is sent to the hard disk 22.
  • the hard disk 22 stores the data.
  • the cache 102 includes volatile memory, non-volatile memory, or a combination thereof.
  • The volatile memory is, for example, random-access memory (RAM).
  • The non-volatile memory is, for example, a floppy disk, hard disk, solid state disk (SSD), optical disc, or another machine-readable and -writable medium that can store program code.
  • the space of the cache 102 can be divided into multiple logical blocks (chunks), and each logical block has the same size.
  • In this embodiment, the size of a logical block is 128 KB as an example, and each logical block has a logical address (sometimes simply referred to as an address).
  • When the controller 11 receives a read data request, the request includes the logical address of the data to be accessed, and the logical block corresponding to the request can be determined from that logical address. If the data is stored in the determined logical block, the read data request is a hit and the data can be returned directly. If the data is not stored in the determined logical block, the read data request is a miss, and the controller 11 needs to read the data from the hard disk 22, write it into the logical block, and then return the data to the host 20.
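The hit/miss flow described above can be sketched as follows. This is an illustrative toy, not the patent's implementation; the class and callback names are assumptions, and the 128 KB block size follows the example in the text:

```python
CHUNK_SIZE = 128 * 1024  # 128 KB logical blocks, as in the example above

class Cache:
    """Toy cache keyed by logical-block index (illustrative only)."""

    def __init__(self):
        self.blocks = {}  # logical-block index -> data

    def read(self, logical_address, read_from_disk):
        """Return (data, hit): serve from the cache on a hit, otherwise
        fetch the block from the hard disk, fill the cache, and return it."""
        idx = logical_address // CHUNK_SIZE  # locate the logical block
        if idx in self.blocks:               # read hit
            return self.blocks[idx], True
        data = read_from_disk(idx)           # miss: read from the hard disk
        self.blocks[idx] = data              # write into the logical block
        return data, False
```

A prefetch simply amounts to calling the miss path for a predicted block before the host asks for it, so the later request becomes a hit.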
  • the speed at which the cache 102 reads data is higher than the speed at which the hard disk 22 reads data. Therefore, if the data to be accessed by the read data request hits in the cache 102, it is unnecessary to read the data from the hard disk 22, thereby improving the efficiency of reading data.
  • a common practice is to read a piece of data from the hard disk 22 in advance (for example, the data 104 shown in FIG. 1) and write it into the cache 102. Then, when the host 20 sends a read data request to the controller 11 to read the data 104, since the data 104 has been read into the cache 102 in advance, the processor 101 can directly send the data 104 to the host 20.
  • In Example 1, the cache 102 is located inside the controller 11, and the hard disk 22 is located in a hard disk enclosure coupled with the controller 11.
  • Data prefetching means that the data in the hard disk 22 is read in advance to the cache 102 for the host to read.
  • the controller 11 includes two levels of memory.
  • the first level memory is called a CPU cache (cache memory), which is a temporary memory closest to the processor 101, and the second level memory is usually directly called a memory. Secondary memory is slower and has a larger capacity than primary memory.
  • The data in the first-level memory is a small part of the data in the second-level memory, but this small part is data that the CPU is about to access within a short time.
  • data prefetching may mean that the data in the hard disk 22 is read into the secondary memory in advance, or it may mean that the data in the secondary memory is read into the primary memory in advance.
  • this embodiment can also be applied to a hard disk enclosure.
  • the hard disk enclosure includes a processing chip, an AI chip, a memory, and a hard disk 22.
  • the internal memory of the hard disk enclosure is a temporary memory that is closest to the processing chip. Its data processing speed is faster than that of the hard disk 22 and its capacity is smaller than that of the hard disk.
  • data prefetching may mean that the data in the hard disk 22 is read in advance into the internal memory of the hard disk enclosure.
  • This embodiment can be applied to any application scenario that includes two levels of storage, where the first-level storage refers to a memory with a smaller capacity and a faster data processing speed, such as the above-described cache 102, the first-level memory, the second-level memory, or the memory in the hard disk enclosure.
  • the second-level storage refers to a memory with a larger capacity and a slower data processing speed than the first-level storage, such as the second-level memory or hard disk 22 described above.
  • the following description is still based on prefetching data from the hard disk 22 to the cache 102. It can be understood that in other application scenarios or architectures, the data prefetching method described in this embodiment is still applicable.
  • The prefetch algorithm, also known as the prefetch model, is one of the key factors of the prefetch operation. If the prefetch model is not appropriate, too much or too little data may be prefetched: too much prefetched data wastes cache resources, while too little makes prefetching inefficient. Therefore, the current approach is to input a large number of data samples, train on these data samples to obtain a suitable prefetch model, and perform the data prefetch operation according to that model.
  • If training on the data samples depends entirely on the processor 101, the computing power of the processor 101 will inevitably be insufficient.
  • Therefore, in this embodiment, the AI chip 105 is used to train the data samples, and the resulting prefetch model is output for use by the processor 101.
  • FIG. 2 is a flowchart of a data prefetching method provided by an embodiment of the present application. The method can be applied to a storage device, which may be a device integrating the controller 11 and the hard disks 22 shown in FIG. 1. As shown in FIG. 2, the method includes the following steps:
  • Step 201: The processor 101 collects data samples.
  • the controller 11 receives a data write request sent by the host 20 or other external devices, and the data write request carries data and an address of the data.
  • The processor 101 temporarily stores the received write data requests in the cache 102. When the capacity of the cache 102 reaches a certain threshold, the data stored in the cache 102 is sent to the hard disk 22 for persistent storage. In this process, the processor 101 can extract the data and some attribute information of the data, such as a timestamp and size, from a large number of write data requests as data samples.
  • the AI chip 105 may also extract data and attribute information.
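The sample-collection step can be pictured with a short sketch. The field names and the shape of a write request here are assumptions for illustration; the patent only says that data plus attributes such as timestamp and size are extracted:

```python
from dataclasses import dataclass
import time

@dataclass
class DataSample:
    """Attributes extracted from one write request (field names assumed)."""
    address: int     # logical address carried by the write request
    size: int        # size of the written data
    timestamp: float # time the request was observed

def extract_samples(write_requests):
    """Turn a batch of (address, data) write requests into training samples,
    recording the address, data size, and a collection timestamp."""
    now = time.time()
    return [DataSample(addr, len(data), now) for addr, data in write_requests]
```

These samples would then be handed to the AI chip for model training in Step 202.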
  • Step 202: The AI chip 105 obtains data samples from the cache 102 or the hard disk 22, performs model training, and obtains a prefetch model.
  • These prefetch models include, but are not limited to, serial sequential stream models, parallel sequential stream models, or interval sequential stream models.
  • Step 203: The AI chip 105 sends the prefetch model to the processor 101, and the processor 101 performs inference according to the prefetch model to obtain the address of the data to be read.
  • the data to be read here refers to data that is predicted to be read. At this time, the storage device has not yet received a data read request for reading the data.
  • The controller 11 will receive one or more read data requests sent by the host 20, and each read data request carries the address of the data to be accessed, such as a logical block address (LBA) and a length.
  • the controller 11 usually predicts which address data will be read next time based on the address of the data to be accessed.
  • the processor 101 can predict the data to be read based on the addresses carried in these read data requests and the prefetch model obtained through training by the AI chip 105. Specifically, the processor 101 may take the address of the received data read request as input, and output the address of the data to be read according to the prefetch model.
  • Step 204: The processor 101 reads the data to be read from the hard disk into the cache 102.
  • the processor 101 reads the data to be read from the hard disk according to the address obtained in step 203 and stores it in the cache 102.
  • the data will be read by a subsequent read data request with a high probability.
  • Optionally, after the AI chip 105 obtains the prefetch model, it can perform the inference itself and send the inference result, that is, the address of the data to be read, to the processor 101; the processor 101 then prefetches the data to be read according to that address.
  • Specifically, the addresses carried in received read data requests are stored in the cache 102 or the hard disk 22. The AI chip 105 can obtain these addresses from the cache 102 or the hard disk 22 and, taking them as input, output the address of the data to be read according to the prefetch model.
  • the AI chip 105 sends the output address to the processor 101, and the processor 101 reads the data to be read from the hard disk to the cache 102 according to the address.
  • the AI chip 105 may also periodically upgrade the prefetch model, for example, reacquire data samples for training. Or, instead of reacquiring data samples, only modify or delete the existing prefetch model.
  • The prefetch model is exemplified below; the granularity of data prefetching is a logical block.
  • Suppose the received read data request is used to read logic block A. The AI chip 105 calculates, through conditional probability, the degree of association between each logic block and logic block A.
  • The AI chip 105 can calculate the degree of association between logic block B and the first logic block (referred to as logic block A in the following formula) through the formula P(B|A) = f1 / f2.
  • P(B|A) refers to the degree of association between logic block A and logic block B, that is, the probability of reading the data in logic block B after the data of logic block A is read.
  • f1 refers to the number of requests to read logic block B received within a preset time period t after a request to read logic block A.
  • f2 refers to the total number of requests received within the preset time period t after a request to read logic block A.
  • f 1 and f 2 may be obtained by statistics based on multiple historical read data requests received and processed by the AI chip 105.
  • the storage device may obtain historical information of multiple historical read data requests received within a period of time T before the current moment, and the historical information may include the logical block identifier of the logical block where the data read by the corresponding request is located. Information such as the time of receipt of each request.
  • the AI chip 105 may perform statistics on the historical information of multiple historical read data requests.
  • According to the historical information, the AI chip 105 can search the multiple requests for requests that read logic block B, count those among them whose immediately preceding request reads logic block A, and take the counted number as f1.
  • the AI chip 105 can count the number of historical read data requests for reading logic block A among multiple historical read data requests, and use this number as f 2 .
  • the storage device receives 20 historical data read requests within a period of time T before the current time.
  • the logic blocks corresponding to these 20 historical read data requests are: A ⁇ B ⁇ C ⁇ A ⁇ E ⁇ F ⁇ C ⁇ D ⁇ S ⁇ G ⁇ B ⁇ A ⁇ E ⁇ F ⁇ A ⁇ B ⁇ C ⁇ F ⁇ G ⁇ S.
  • The AI chip 105 can count, among the 20 requests, the requests that read logic block A and whose next request reads logic block B; there are two in total, so f1 = 2.
  • Logic block A is read 4 times in total, so f2 = 4.
  • From this it can be seen that P(B|A) = 2/4 = 0.5.
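The immediate-successor variant of the association degree worked through above can be computed directly from such a request trace. A minimal sketch (the function name is illustrative, not from the patent):

```python
def association_degree(trace, a, b):
    """P(b|a) in the immediate-successor sense used in the worked example:
    among all requests that read block `a`, the fraction whose very next
    request reads block `b`."""
    f2 = trace.count(a)  # number of requests that read block a
    f1 = sum(1 for x, y in zip(trace, trace[1:]) if x == a and y == b)
    return f1 / f2 if f2 else 0.0
```

Running it on the 20-request trace from the example (A, B, C, A, E, F, C, D, S, G, B, A, E, F, A, B, C, F, G, S) reproduces f1 = 2, f2 = 4, and P(B|A) = 0.5.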
  • Optionally, considering that the number of requests received after each request to read logic block A may not be equal, the AI chip 105 may also count, for each read data request for logic block A, the number of read data requests for logic block B received within the preset time period t after it, calculate the sum of these numbers, and take the sum as f1.
  • a total of 3 requests to read logic block A and 5 requests to read logic block B are received within a period of time T before the current time.
  • a total of 2 requests are received, of which there is a request to read logic block B.
  • a total of 1 data read request is received, and there is no request to read logic block B.
  • a total of 3 data read requests are received, of which there is a request to read logic block B.
  • the above is only one method for calculating the degree of association between each logical block and logical block A; the storage device may also calculate the degree of association between each logical block and the first logical block through other set algorithms or models.
  • such algorithms or models include, for example, any one of conditional probability, a Bayesian algorithm, a convolutional neural network algorithm, a long short-term memory network algorithm, a neural network algorithm, a recurrent neural network algorithm, and a probabilistic graphical model algorithm.
  • before any of these algorithms is used, the AI chip 105 can likewise train the algorithm on the historical information of the multiple historical read data requests received and processed within a period T, so that the degree of association between two logical blocks can be calculated according to the trained model.
  • the storage device may take the logical blocks whose degree of association exceeds a set association threshold as the logical blocks to be read.
  • the number of logical blocks whose degree of association exceeds the set association threshold may be one or more; that is, in this embodiment of the application, one or more to-be-read logical blocks associated with logical block A may be selected.
  • the set association threshold may be configured by the user during initialization; moreover, in this embodiment of the application, the AI chip 105 or the processor 101 may also adjust the set association threshold at regular intervals according to its own performance.
  • the storage device may count the hit rate of the data prefetched into the cache within a period t1, count the waste rate of the data prefetched into the cache within the same period t1, and calculate the ratio of the hit rate to the waste rate to obtain a first ratio.
  • a set ratio range can be stored in the storage device; the first ratio is compared with the ratio range, and if the first ratio is within the range, the hit rate and waste rate are considered to meet the requirements and the current association threshold can be kept unchanged. If the first ratio is below the lower limit of the range, the current hit rate is low and the waste rate is high, and the storage device can increase the current association threshold.
  • if the first ratio is above the upper limit of the range, the current hit rate is high, and the storage device can decrease the current association threshold. After adjusting the association threshold, the storage device can continue to count the hit rate and waste rate within a period t1 starting from the completion of the adjustment, and keep deciding by the above method whether to adjust the threshold again, so that the association threshold stays within the ratio range as far as possible, maintaining the balance of the system's returns.
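The adjustment policy described above can be sketched as follows (the ratio bounds, the step size, and the function name are illustrative assumptions of ours, not values from the patent):

```python
# Sketch of the threshold-adjustment policy: compare hit_rate/waste_rate
# against a set ratio range [lo, hi] and nudge the association threshold.

def adjust_threshold(threshold, hit_rate, waste_rate,
                     lo=1.0, hi=3.0, step=0.1):
    """Return the (possibly adjusted) association threshold."""
    ratio = hit_rate / waste_rate
    if ratio < lo:          # low hit rate, high waste: prefetch less
        return threshold + step
    if ratio > hi:          # high hit rate: prefetch more aggressively
        return threshold - step
    return threshold        # within range: keep unchanged

print(adjust_threshold(0.5, hit_rate=0.2, waste_rate=0.4))  # 0.6
```

In a real device this would be re-evaluated once per period t1, feeding each adjusted threshold back in as the next starting value.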
  • FIG. 3 is a schematic diagram of the structure of the data prefetching apparatus 300.
  • the apparatus is located in the aforementioned storage device; the storage device includes a cache and hard disks 22, and the apparatus includes a first processing module 301 and a second processing module 302.
  • the first processing module 301 is configured to train data samples to generate a prefetch model, and to send the generated prefetch model to the second processing module.
  • the second processing module 302 is configured to predict, according to the prefetch model, the data to be read, where the data to be read is stored in the hard disk, and to read the data to be read from the hard disk into the cache.
  • in other implementations, the data prefetching apparatus 300 may be located in a terminal device such as a mobile phone.
  • the terminal device includes a first-level storage and a second-level storage.
  • the first-level storage processes data faster than the second-level storage but has a smaller capacity.
  • the data prefetching apparatus 300 is provided with the first processing module 301 and the second processing module 302 shown in FIG. 3, so that the first processing module 301 generates the prefetch model and the second processing module 302 performs the data prefetching operation, thereby reducing the computational burden on the second processing module 302.
  • the first processing module 301 may be the AI chip 105 shown in FIG. 1, configured to perform the steps performed by the AI chip 105 in FIG. 2.
  • the second processing module 302 may be the processor 101 shown in FIG. 1, configured to perform the steps performed by the processor 101 in FIG. 2.
  • the first processing module 301 may also be another chip or apparatus with computing capability other than the central processing unit, such as an accelerator card, a coprocessor, a graphics processing unit (GPU), or a neural-network processing unit (NPU).
  • the computer program product includes one or more computer instructions; when the computer instructions are loaded and executed on a computer, the procedures or functions according to the embodiments are produced wholly or partly.
  • the computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus.
  • the computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another computer-readable storage medium.
  • for example, the computer instructions may be transmitted from a website, computer, server, or data center to another website, computer, server, or data center in a wired or wireless manner.
  • the computer-readable storage medium may be any usable medium accessible to a computer, or a data storage device such as a server or data center integrating one or more usable media.
  • the usable medium may be a magnetic medium (for example, a floppy disk, hard disk, or magnetic tape), an optical medium (for example, a digital versatile disc (DVD)), or a semiconductor medium (for example, a solid-state disk (SSD)).
  • all or some of the steps of the above embodiments may also be implemented by a program instructing related hardware, and the program can be stored in a computer-readable storage medium.
  • the storage medium mentioned may be a read-only memory, a magnetic disk, an optical disc, or the like.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Memory System Of A Hierarchy Structure (AREA)

Abstract

A data prefetching method, apparatus, and storage device. Data samples are collected (201); an AI chip trains the data samples to obtain a prefetch model (202); the AI chip sends the prefetch model to a processor (203); and the processor reads the data to be read into a cache (204). This solution reduces the computational burden on the processor.

Description

Data prefetching method, apparatus, and storage device

Technical Field

This application relates to the field of storage technologies, and in particular, to a data prefetching method, apparatus, and storage device.

Background

In a storage device, the cache can be accessed faster than the hard disk. Therefore, in practice, when a read data request is received, the data to be read by the next read data request can be predicted based on the current request, and the predicted data can be read from the hard disk into the cache in advance. In this way, when the storage device receives the next read data request, a read hit in the cache can be achieved, greatly improving the processing speed of read data requests.

However, the processing capability of the processor is limited, while prefetch algorithms tend to be rather complex, so the processor's computing power can fall noticeably short. In short, computing power has become a bottleneck in the development of prefetch algorithms.
Summary

This application provides a data prefetching method, apparatus, and storage device, which can save the computing power of the processor. The technical solutions are as follows:

According to a first aspect, a data prefetching method is provided, applied to a storage device, where the storage device includes a processor, an artificial intelligence (AI) chip, a first-level storage, and a second-level storage. The method includes: the AI chip generates a prefetch model; the AI chip sends the generated prefetch model to the processor; the processor predicts, according to the prefetch model, data to be read, where the data to be read is stored in the second-level storage; and the processor reads the data to be read from the second-level storage into the first-level storage.

The first aspect can be applied in any application scenario that includes two levels of storage. The first-level storage refers to a memory with a smaller capacity and a faster data processing speed, such as the cache 102, the level-1 memory, the level-2 memory, or the memory in a disk enclosure. The second-level storage refers to a memory with a larger capacity and a slower data processing speed relative to the first-level storage, such as the level-2 memory or the hard disk 22.

Example 1: the first-level storage refers to the cache 102 inside the controller 11, and the second-level storage refers to the hard disk 22 in the disk enclosure coupled to the controller 11; data prefetching means that data in the hard disk 22 is read into the cache 102 in advance for the host to read.

Example 2: the first-level storage is the level-1 memory, also called the CPU cache (cache memory), the temporary memory closest to the processor 101, and the second-level storage refers to the level-2 memory, usually simply called the memory; data prefetching means reading data in the level-2 memory into the level-1 memory in advance.

Example 3: the first-level storage refers to the level-2 memory described above, and the second-level storage refers to the hard disk 22; data prefetching means reading data in the hard disk 22 into the level-2 memory in advance.

Example 4: the first-level storage refers to the memory in the disk enclosure, and the second-level storage refers to the hard disk 22 in the disk enclosure; data prefetching means reading data in the hard disk 22 into the memory of the disk enclosure in advance.

In the embodiments of this application, the AI chip trains data samples to generate the prefetch model, and the processor only needs to perform the data prefetching operation, which greatly reduces the computational burden on the processor.
Optionally, the processor predicting the data to be read according to the prefetch model includes: performing inference according to the prefetch model and received read data requests to obtain the address of the predicted data to be read.

Optionally, the AI chip periodically upgrades the prefetch model.

Optionally, the prefetch model includes, but is not limited to, a serial sequential stream model, a parallel sequential stream model, an interval sequential stream model, or an associative prefetch model.

According to a second aspect, a data prefetching method is provided, applied to a storage device, where the storage device includes a processor, an AI chip, a first-level storage, and a second-level storage. The method includes: the AI chip generates a prefetch model; the AI chip performs inference according to the prefetch model and received read data requests to obtain the address of data to be read; the AI chip sends the address of the data to be read to the processor; and the processor reads, according to the address of the data to be read, the data to be read from the second-level storage into the first-level storage.

Optionally, the method further includes: the AI chip periodically upgrades the prefetch model.

Optionally, the prefetch model includes, but is not limited to, a serial sequential stream model, a parallel sequential stream model, an interval sequential stream model, or an associative prefetch model.

According to a third aspect, a storage device is provided. The storage device includes a processor, an artificial intelligence (AI) chip, a cache, and a hard disk, and is configured to perform the method provided in the first aspect or the second aspect.

According to a fourth aspect, a data prefetching apparatus is provided. The apparatus includes a first processing module and a second processing module, and is configured to perform the method provided in the first aspect or the second aspect.

According to a fifth aspect, a terminal device is provided. The terminal device includes a first-level storage, a second-level storage, a first processing module, and a second processing module. The first processing module is configured to train data samples to generate a prefetch model and send the generated prefetch model to the second processing module. The second processing module is configured to predict, according to the prefetch model, data to be read, where the data to be read is stored in the second-level storage, and to read the data to be read from the second-level storage into the first-level storage.
Brief Description of Drawings

FIG. 1 is a schematic diagram of a storage device involved in a data prefetching method according to an embodiment of this application;

FIG. 2 is a flowchart of a data prefetching method according to an embodiment of this application;

FIG. 3 is a schematic structural diagram of a data prefetching apparatus according to an embodiment of this application.
Description of Embodiments

To make the objectives, technical solutions, and advantages of this application clearer, the following describes the implementations of this application in further detail with reference to the accompanying drawings, and clearly and completely describes the technical solutions in the embodiments of the present invention.
FIG. 1 is a diagram of a system architecture according to an embodiment of the present invention. The storage system provided in this embodiment includes a host 20, a controller 11, and a plurality of hard disks 22. The host 20 and the controller 11 communicate by using the Network File System (NFS)/Common Internet File System (CIFS) protocol or the Fiber Channel (FC) protocol. Specifically, the host 20 may send a write data request to the controller 11, and after receiving the write data request, the controller 11 writes the data carried in the write data request into the hard disk 22. In addition, the host 20 may also send a read data request to the controller 11. After receiving the read data request, the controller 11 checks, according to the address in the read data request, whether the data to be read is stored in its cache 102; if so, the controller 11 sends the data to be read directly to the host 20; if not, the controller 11 obtains the data from the hard disk 22 and sends it to the host 20. In practical applications, the controller 11 and the hard disks 22 may be integrated into one storage device, or may be located in two mutually independent devices; this embodiment of the present invention does not impose any limitation on the positional relationship between the controller 11 and the hard disks 22.
Take read data requests as an example. When a user initiates a large read data request on the host 20 or on a client connected to the host 20, the host 20 often splits the request into multiple requests and sends them to the controller 11 for processing. The operation of splitting a read data request into multiple requests may be performed by the host bus adapter (HBA) in the host 20, or by the HBA in the controller 11. The sizes of the split requests may or may not be equal; this embodiment does not limit the size of the split read data requests. In addition, the logical addresses of the multiple split read data requests are consecutive; multiple read data requests with consecutive logical addresses are called a sequential stream. The host 20 may send the multiple read data requests of the sequential stream to the controller 11 serially, or send them in parallel. A sequential stream sent serially to the controller 11 is called a serial sequential stream, and a sequential stream sent in parallel is called a parallel sequential stream. For a serial sequential stream, the host 20 sends each of the consecutive read data requests to the controller 11 in turn, sending the next read data request only after the previous one has been processed. For a parallel sequential stream, the host 20 sends at least two of the consecutive read data requests to the controller 11, and the storage device can process the at least two read data requests in parallel. Suppose the host 20 sends 9 read data requests to the controller 11. If these 9 requests are processed serially, the host 20 first sends the first request, sends the second request after receiving the response to the first, sends the third after receiving the response to the second, and so on, until all 9 requests have been processed. If the 9 requests are processed in parallel, the host 20 can send the first, second, and third requests to the controller 11 at the same time, and the controller 11 processes these three requests in parallel; when the first request has been processed, even if the second or third has not, the host 20 can send the fourth request, and so on, until all 9 requests have been processed. An interval sequential stream means that read data requests at regular intervals exhibit a certain regularity.
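The stream notion above lends itself to a small sketch: a split read forms one sequential stream when each sub-request starts exactly where the previous one ends. The (lba, length) tuple layout below is an assumption for illustration, not the patent's request format:

```python
# Sketch: classify a batch of read requests as one sequential stream when
# their logical addresses are contiguous in arrival order.

def is_sequential_stream(requests):
    """requests: list of (lba, length) tuples in arrival order."""
    return all(lba == prev_lba + prev_len
               for (prev_lba, prev_len), (lba, _) in zip(requests, requests[1:]))

split = [(0, 128), (128, 128), (256, 128)]   # one large read split by the HBA
print(is_sequential_stream(split))            # True
print(is_sequential_stream([(0, 128), (512, 128)]))  # False
```

Whether such a stream arrives serially or in parallel does not change the address pattern, only the dispatch order, so this check applies to both serial and parallel sequential streams.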
As shown in FIG. 1, the controller 11 provided in this embodiment includes at least a processor 101, an artificial intelligence (AI) chip 105, and a cache 102.

The processor 101 is a central processing unit (CPU). In this embodiment of the present invention, the processor 101 may be configured to receive data from the host 20 and store the data in the cache 102 or the hard disk 22. In addition, the processor 101 is further configured to predict, according to a data prefetch model, the data to be read and store it in the cache 102 in advance.

The AI chip 105 is configured to obtain data samples and perform model training on the data samples to obtain a prefetch model. Then, the AI chip 105 sends the obtained prefetch model to the processor 101. The AI chip 105 may take the form of a chip or another physical component; for example, it may be a training chip used to build a neural network model, or an inference chip that performs inference using a neural network model.

The AI chip 105 and the processor 101 communicate through a high-speed interconnect network. The high-speed interconnect network provides the data communication function between the AI chip 105 and the processor 101, and may be any one of PCIe, memory fabric, high-speed Ethernet, HCCS, InfiniBand (IB), and Fibre Channel (FC). The high-speed interconnect network may take the form of a bus, in which case it may also be called a high-speed interconnect switch or a high-speed interconnect bus. For example, the controller 11 may include a high-speed interconnect bus, and the AI chip 105 and the processor 101 may be connected to the high-speed interconnect bus, thereby accessing the high-speed interconnect network. In some possible embodiments, the AI chip 105 may include a high-speed interconnect network interface, and the processor 101 may include a high-speed interconnect network interface; the AI chip 105 is connected to the high-speed interconnect bus through its high-speed interconnect network interface, and the processor 101 is connected to the high-speed interconnect bus through its own. The high-speed interconnect network interface may be a serial bus interface; specifically, it may be any one of a PCIe interface, an HCCS interface, an Ethernet interface, an IB interface, and an FC interface. With different types of high-speed interconnect network interfaces, the speed at which service data is transmitted between the AI chip 105 and the processor 101 may also differ. It should be understood that the high-speed interconnect bus is merely an example of the high-speed interconnect network; the high-speed interconnect network may also be another bus with a memory pass-through function rather than a high-speed interconnect bus, and this embodiment does not limit the specific type of the high-speed interconnect network.
The cache 102 is configured to temporarily store data received from the host 20 or read from the hard disk 22. When receiving multiple write data requests sent by the host, the controller 11 may temporarily store the data in the multiple write data requests in the cache 102. When the capacity of the cache 102 reaches a certain threshold, the data stored in the cache 102 is sent to the hard disk 22, and the hard disk 22 stores the data. The cache 102 includes a volatile memory, a non-volatile memory, or a combination thereof. The volatile memory is, for example, a random-access memory (RAM). The non-volatile memory is, for example, any machine-readable and writable medium capable of storing program code, such as a floppy disk, a hard disk, a solid-state disk (SSD), or an optical disc. The space of the cache 102 can be divided into multiple logical blocks (chunks) of equal size; this embodiment takes a logical block size of 128 KB as an example, and each logical block has a segment of logical addresses (in some places simply called addresses). When the controller 11 receives a read data request, the read data request includes the logical address of the data to be accessed, and the logical block corresponding to the read data request can be determined according to this logical address. If the determined logical block holds the data, the read data request hits; if it does not, the read data request misses, and the controller 11 needs to read the data from the hard disk 22, write it into the logical block, and then return the data to the host 20.
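The 128 KB chunking described above can be sketched minimally; the dict-based cache below is an illustrative stand-in of ours for the real cache structure:

```python
# Sketch: map a logical address (in bytes) to its 128 KB logical block
# (chunk) and check for a cache hit.

CHUNK_SIZE = 128 * 1024  # 128 KB per logical block

def chunk_id(lba_bytes):
    """Return the logical block that the byte address falls into."""
    return lba_bytes // CHUNK_SIZE

cache = {chunk_id(0): b"..."}        # chunk 0 already prefetched
print(chunk_id(130 * 1024))          # 1: address 130 KB falls in chunk 1
print(chunk_id(130 * 1024) in cache) # False: a miss, so read from disk first
```

On a miss, the controller would fetch the chunk from the hard disk 22 into the cache before answering the host, exactly the step prefetching tries to do ahead of time.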
It can be understood that the cache 102 reads data faster than the hard disk 22 does. Therefore, if the data to be accessed by a read data request hits in the cache 102, the data does not need to be read from the hard disk 22 again, which improves the efficiency of reading data. To improve the cache's data hit rate, a common practice is to read a segment of data (for example, the data 104 shown in FIG. 1) from the hard disk 22 in advance and write it into the cache 102. Then, when the host 20 sends a read data request to the controller 11 to read the data 104, since the data 104 has already been read into the cache 102 in advance, the processor 101 can send the data 104 to the host 20 directly.
It should be noted that the above description is only one implementation of this embodiment. In this implementation, the cache 102 is located inside the controller 11, the hard disk 22 is the persistent storage in the disk enclosure coupled to the controller 11, and data prefetching means that data in the hard disk 22 is read into the cache 102 in advance for the host to read. In other implementations, the controller 11 contains two levels of memory: the level-1 memory, called the CPU cache (cache memory), is the temporary memory closest to the processor 101, while the level-2 memory is usually simply called the memory. The level-2 memory is slower than the level-1 memory but has a larger capacity. The data in the level-1 memory is a small part of that in the level-2 memory, but this small part is what the CPU is about to access in the short term; when the CPU accesses a large amount of data, it can bypass the level-2 memory and fetch directly from the level-1 memory, thereby speeding up reads. In this implementation, data prefetching may mean that data in the hard disk 22 is read into the level-2 memory in advance, or that data in the level-2 memory is read into the level-1 memory in advance.

In addition, this embodiment can also be applied to a disk enclosure, which internally includes a processing chip, an AI chip, a memory, and hard disks 22. The memory inside the disk enclosure is the temporary memory closest to the processing chip; it processes data faster than the hard disk 22 but has a smaller capacity. In this implementation, data prefetching may mean that data in the hard disk 22 is read into the memory inside the disk enclosure in advance.

In summary, this embodiment can be applied in any application scenario that includes two levels of storage, where the first-level storage refers to a memory with a smaller capacity and a faster data processing speed, such as the cache 102, the level-1 memory, the level-2 memory, or the memory in a disk enclosure described above, and the second-level storage refers to a memory with a larger capacity and a slower data processing speed relative to the first-level storage, such as the level-2 memory or the hard disk 22 described above. For ease of description, the following still uses prefetching data from the hard disk 22 into the cache 102 as the example. It can be understood that the data prefetching method described in this embodiment remains applicable in other application scenarios or architectures.
The prefetch algorithm, also called the prefetch model, is one of the key factors of the prefetch operation. An inappropriate prefetch model may cause too much or too little data to be prefetched: prefetching too much data wastes cache resources, while prefetching too little makes prefetching inefficient. Therefore, the current practice is to feed in a large number of data samples, train on these samples to obtain a suitable prefetch model, and perform the data prefetching operation according to that model.

If the training of data samples relied entirely on the processor 101, the processor 101 would inevitably run short of computing power. In this embodiment, the AI chip 105 trains the data samples and outputs the prefetch model for the processor 101 to use.
The following describes the data prefetching method provided in the embodiments of this application.

FIG. 2 is a flowchart of a data prefetching method according to an embodiment of this application. The method may be applied to a storage device, and the storage device may be a device integrating the controller 11 and the hard disks 22 shown in FIG. 1. As shown in FIG. 2, the method includes the following steps:
Step 201: the processor 101 collects data samples. The controller 11 receives write data requests sent by the host 20 or other external devices, each carrying data and the address of that data. The processor 101 temporarily stores the received write data requests in the cache 102, and when the capacity of the cache 102 reaches a certain threshold, the data stored in the cache 102 is sent to the hard disk 22 for persistent storage. Therefore, the processor 101 can extract, from a large number of write data requests, the data and some of its attribute information, such as timestamps and sizes, as data samples. In other implementations, the AI chip 105 may also extract the data and the attribute information.

Step 202: the AI chip 105 obtains the data samples from the cache 102 or the hard disk 22 and performs model training to obtain prefetch models. These prefetch models include, but are not limited to, a serial sequential stream model, a parallel sequential stream model, or an interval sequential stream model.

Step 203: the AI chip 105 sends the prefetch model to the processor 101, and the processor 101 performs inference according to the prefetch model to obtain the address of the data to be read. The data to be read here refers to the data predicted to be read; at this point, the storage device has not yet received a read data request for reading the data.
According to the foregoing description, the controller 11 receives one or more read data requests sent by the host 20, each carrying the address of the data to be accessed, such as a logical block address (LBA) and a length. The controller 11 usually predicts, from the addresses of the data to be accessed, which addresses will be read next. In this embodiment, the processor 101 can predict the data that will be read according to the addresses carried in these read data requests and the prefetch model obtained by training on the AI chip 105. Specifically, the processor 101 can take the addresses of the received read data requests as input and, according to the prefetch model, output the addresses of the data to be read.
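The patent leaves the concrete model open. As a hedged illustration only, a toy model that has learned a fixed address stride could perform this inference step as follows (the class and method names are ours, not the patent's):

```python
# Illustrative only: a stride "prefetch model" standing in for the model
# trained by the AI chip 105. Inference takes the addresses of received
# read data requests as input and outputs the address expected next.

class StridePrefetchModel:
    def __init__(self, stride):
        self.stride = stride          # assumed to be learned during training

    def predict_next(self, received_addresses):
        """Predict the next address from the latest received address."""
        return received_addresses[-1] + self.stride

model = StridePrefetchModel(stride=128)
print(model.predict_next([0, 128, 256]))  # 384: the address to prefetch
```

The processor 101 would then issue a read for the predicted address against the hard disk and place the result in the cache 102, which is step 204 below.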
Step 204: the processor 101 reads the data to be read from the hard disk into the cache 102.

The processor 101 reads the data to be read from the hard disk according to the address obtained in step 203 and stores it in the cache 102; with high probability, this data will be read by a subsequent read data request.

In another implementation, after obtaining the prefetch model, the AI chip 105 can perform the inference itself and send the inference result, that is, the address of the data to be read, to the processor 101, and the processor 101 prefetches the data to be read according to the address. Specifically, the addresses carried in the received read data requests are stored in the cache 102 or the hard disk 22; the AI chip 105 can obtain these addresses from the cache 102 or the hard disk 22, take them as input, and output, according to the prefetch model, the addresses of the data to be read. Then, the AI chip 105 sends the output addresses to the processor 101, and the processor 101 reads the data to be read from the hard disk into the cache 102 according to those addresses.

In addition, the AI chip 105 can also periodically upgrade the prefetch model, for example, by obtaining new data samples for training, or, without obtaining new data samples, by only modifying or deleting the existing prefetch model.
The following describes the prefetch model by example; the granularity of data prefetching is the logical block. In this embodiment of the application, assume that the received read data requests read logical block A, and the AI chip 105 uses conditional probability to calculate the degree of association between each logical block and logical block A. Taking logical block B among the remaining logical blocks as an example, the AI chip 105 can calculate the degree of association between logical block B and the first logical block (called logical block A in the following formula) by the following formula:

P(B|A) = f1 / f2

where P(B|A) refers to the degree of association between logical block A and logical block B, that is, the probability of reading the data in logical block B after reading the data in logical block A; f1 refers to the number of requests to read logical block B received within the preset duration t after a request to read logical block A; and f2 refers to the total number of requests received within the preset duration t after a request to read logical block A.
It should be noted that f1 and f2 may be obtained through statistics on the multiple historical read data requests received and processed by the AI chip 105. For example, the storage device may obtain the historical information of the multiple historical read data requests received within a period T before the current moment; the historical information may include the identifier of the logical block where the data read by each request is located, the time at which each request was received, and other such information. After obtaining the historical information, the AI chip 105 can compute statistics over the historical information of the multiple historical read data requests. For example, when the preset duration t is short and contains only one request, the AI chip 105 can search the multiple requests for requests that read logical block B according to the historical information, count how many of the found requests are immediately preceded by a request that reads logical block A, and take that count as f1. Similarly, the AI chip 105 can count the number of historical read data requests that read logical block A among the multiple historical read data requests and take that number as f2.

For example, suppose the storage device receives 20 historical read data requests within a period T before the current moment, and the logical blocks read by these 20 requests are, in order: A→B→C→A→E→F→C→D→S→G→B→A→E→F→A→B→C→F→G→S. On this basis, the AI chip 105 can count the requests among the 20 that read logical block A and whose next request reads logical block B; there are 2 in total, so f1 = 2. Counting the requests among the 20 that read logical block A gives 4 in total, so f2 = 4. It follows that P(B|A) = 2/4.
Optionally, in this embodiment of the application, when the numbers of requests received within the preset duration after each request to read logical block A are not equal, the AI chip 105 may also determine the number of read data requests for logical block B received within the preset duration t after each read data request for logical block A, calculate the sum of these numbers, and take the sum as f1; and count the sum of the numbers of all requests received within the preset duration t after each request to read logical block A, taking that sum as f2.

For example, suppose that within a period T before the current moment, a total of 3 requests to read logical block A and 5 requests to read logical block B are received. Within the first preset duration starting from receiving the first request to read logical block A, a total of 2 requests are received, one of which reads logical block B. Within the second preset duration starting from receiving the second request to read logical block A, a total of 1 read data request is received, and it does not read logical block B. Within the third preset duration starting from receiving the third request to read logical block A, a total of 3 read data requests are received, one of which reads logical block B. Therefore, the total number of requests to read logical block B received within the three preset durations after the three requests to read logical block A is 2, that is, f1 = 2, while the total number of all requests received within these three preset durations is 6, that is, f2 = 6, so P(B|A) = 2/6.
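The windowed counting in this example can be sketched as follows (the non-A/B block names X, Y, Z, and W are placeholders of ours; the per-window request lists mirror the example above):

```python
# Sketch of the windowed variant: for each read of block A, count the
# requests (and the reads of B among them) inside the preset duration t
# after it, then sum across the windows.

windows_after_A = [["B", "X"], ["Y"], ["Z", "B", "W"]]  # requests within t of each read of A

f1 = sum(w.count("B") for w in windows_after_A)  # reads of B in the windows
f2 = sum(len(w) for w in windows_after_A)        # all requests in the windows
print(f1, f2)  # 2 6, so P(B|A) = 2/6
```

This reproduces f1 = 2 and f2 = 6 from the example, giving the same P(B|A) = 2/6.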
The above is only one method given in this embodiment of the application for calculating the degree of association between each logical block and logical block A. Optionally, the storage device may also calculate the degree of association between each logical block and the first logical block through other set algorithms or models, for example, any one of conditional probability, a Bayesian algorithm, a convolutional neural network algorithm, a long short-term memory network algorithm, a neural network algorithm, a recurrent neural network algorithm, and a probabilistic graphical model algorithm. It should be noted that before any of the above algorithms is used to calculate the degree of association between two logical blocks, the AI chip 105 can likewise train the algorithm according to the historical information of the multiple historical read data requests received and processed within a period T, so that the degree of association between two logical blocks can subsequently be calculated according to the trained model.

After calculating the degree of association between each logical block and logical block A, the storage device can take the logical blocks whose degree of association exceeds the set association threshold as the logical blocks to be read. It should be noted that the number of logical blocks whose degree of association exceeds the set association threshold may be one or more; that is, in this embodiment of the application, one or more to-be-read logical blocks associated with logical block A may be selected.

It should be noted that the set association threshold may be configured by the user during initialization. Moreover, in this embodiment of the application, the AI chip 105 or the processor 101 may also adjust the set association threshold at regular intervals according to its own performance.

For example, the storage device may count the hit rate of the data prefetched into the cache within a period t1, count the waste rate of the data prefetched into the cache within the same period t1, and calculate the ratio of the hit rate to the waste rate to obtain a first ratio. A set ratio range can be stored in the storage device. The first ratio is compared with the ratio range: if the first ratio falls within the range, the hit rate and waste rate are considered to meet the requirements, and the current association threshold can be kept unchanged. If the first ratio is below the lower limit of the range, the current hit rate is low and the waste rate is high; in this case, the storage device can increase the current association threshold. If the first ratio is above the upper limit of the range, the current hit rate is high; in this case, the storage device can decrease the current association threshold. After adjusting the association threshold, the storage device can continue to count the hit rate and waste rate within a period t1 starting from the completion of the adjustment, and continue to decide by the above method whether to adjust the adjusted threshold again, so that the association threshold stays within the ratio range as far as possible, thereby maintaining the balance of the system's returns.
The following introduces the data prefetching apparatus 300 provided in this embodiment of the application; FIG. 3 is a schematic diagram of its structure. The apparatus is located in the aforementioned storage device, which includes a cache and hard disks 22, and the apparatus includes a first processing module 301 and a second processing module 302. The first processing module 301 is configured to train data samples to generate a prefetch model and send the generated prefetch model to the second processing module. The second processing module 302 is configured to predict, according to the prefetch model, the data to be read, where the data to be read is stored in the hard disk, and to read the data to be read from the hard disk into the cache.

In addition, in other implementations, the data prefetching apparatus 300 may be located in a terminal device such as a mobile phone. The terminal device includes a first-level storage and a second-level storage; the first-level storage processes data faster than the second-level storage but has a smaller capacity. In the terminal device, data in the second-level storage likewise needs to be prefetched into the first-level storage to improve the cache hit rate. Therefore, the data prefetching apparatus 300 is provided with the first processing module 301 and the second processing module 302 shown in FIG. 3: the first processing module 301 generates the prefetch model, and the second processing module 302 performs the data prefetching operation, thereby reducing the computational burden on the second processing module 302.

In practical applications, the first processing module 301 may be the AI chip 105 shown in FIG. 1, configured to perform the steps performed by the AI chip 105 in FIG. 2, and the second processing module 302 may be the processor 101 shown in FIG. 1, configured to perform the steps performed by the processor 101 in FIG. 2.

In addition, the first processing module 301 may also be another chip or apparatus with computing capability other than the central processing unit, such as an accelerator card, a coprocessor, a graphics processing unit (GPU), or a neural-network processing unit (NPU).
All or some of the foregoing embodiments may be implemented by software, hardware, firmware, or any combination thereof. When software is used, they may be implemented wholly or partly in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the procedures or functions according to the embodiments of the present invention are produced wholly or partly. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired (for example, coaxial cable, optical fiber, or digital subscriber line (DSL)) or wireless (for example, infrared, radio, or microwave) manner. The computer-readable storage medium may be any usable medium accessible to a computer, or a data storage device such as a server or data center integrating one or more usable media. The usable medium may be a magnetic medium (for example, a floppy disk, hard disk, or magnetic tape), an optical medium (for example, a digital versatile disc (DVD)), or a semiconductor medium (for example, a solid-state disk (SSD)).

A person of ordinary skill in the art can understand that all or some of the steps of the foregoing embodiments may be implemented by hardware, or by a program instructing related hardware; the program may be stored in a computer-readable storage medium, and the storage medium mentioned above may be a read-only memory, a magnetic disk, an optical disc, or the like.

The foregoing are embodiments provided by this application and are not intended to limit this application. Any modification, equivalent replacement, improvement, and the like made within the spirit and principles of this application shall fall within the protection scope of this application.

Claims (19)

  1. A data prefetching method, applied to a storage device, wherein the storage device comprises a processor, an artificial intelligence (AI) chip, a first-level storage, and a second-level storage, and the method comprises:
    the AI chip generating a prefetch model;
    the AI chip sending the generated prefetch model to the processor;
    the processor predicting, according to the prefetch model, data to be read, wherein the data to be read is stored in the second-level storage; and
    the processor reading the data to be read from the second-level storage into the first-level storage, wherein the processing speed of the first-level storage is faster than that of the second-level storage, and the capacity of the first-level storage is smaller than that of the second-level storage.
  2. The method according to claim 1, wherein the processor predicting the data to be read according to the prefetch model comprises: performing inference according to the prefetch model and received read data requests to obtain the address of the predicted data to be read.
  3. The method according to claim 1, further comprising: the AI chip periodically upgrading the prefetch model.
  4. The method according to claim 1, wherein the prefetch model includes, but is not limited to, a serial sequential stream model, a parallel sequential stream model, an interval sequential stream model, or an associative prefetch model.
  5. A data prefetching method, applied to a storage device, wherein the storage device comprises a processor, an artificial intelligence (AI) chip, a first-level storage, and a second-level storage, and the method comprises:
    the AI chip generating a prefetch model;
    the AI chip performing inference according to the prefetch model and received read data requests to obtain the address of data to be read;
    the AI chip sending the address of the data to be read to the processor; and
    the processor reading, according to the address of the data to be read, the data to be read from the second-level storage into the first-level storage, wherein the processing speed of the first-level storage is faster than that of the second-level storage, and the capacity of the first-level storage is smaller than that of the second-level storage.
  6. The method according to claim 5, further comprising: the AI chip periodically upgrading the prefetch model.
  7. The method according to claim 5, wherein the prefetch model includes, but is not limited to, a serial sequential stream model, a parallel sequential stream model, an interval sequential stream model, or an associative prefetch model.
  8. A storage device, comprising a processor, an artificial intelligence (AI) chip, a first-level storage, and a second-level storage, wherein
    the AI chip is configured to generate a prefetch model and send the generated prefetch model to the processor; and
    the processor is configured to predict, according to the prefetch model, data to be read, wherein the data to be read is stored in the second-level storage, and to read the data to be read from the second-level storage into the first-level storage, wherein the processing speed of the first-level storage is faster than that of the second-level storage, and the capacity of the first-level storage is smaller than that of the second-level storage.
  9. The storage device according to claim 8, wherein the processor is specifically configured to perform inference according to the prefetch model and received read data requests to obtain the address of the predicted data to be read.
  10. The storage device according to claim 8, wherein the AI chip is further configured to periodically upgrade the prefetch model.
  11. The storage device according to claim 8, wherein the prefetch model includes, but is not limited to, a serial sequential stream model, a parallel sequential stream model, an interval sequential stream model, or an associative prefetch model.
  12. A storage device, comprising a processor, an artificial intelligence (AI) chip, a first-level storage, and a second-level storage, wherein
    the AI chip is configured to generate a prefetch model, perform inference according to the prefetch model and received read data requests to obtain the address of data to be read, and send the address of the data to be read to the processor; and
    the processor is configured to read, according to the address of the data to be read, the data to be read from the second-level storage into the first-level storage, wherein the processing speed of the first-level storage is faster than that of the second-level storage, and the capacity of the first-level storage is smaller than that of the second-level storage.
  13. The storage device according to claim 12, wherein the AI chip is further configured to periodically upgrade the prefetch model.
  14. The storage device according to claim 12, wherein the prefetch model includes, but is not limited to, a serial sequential stream model, a parallel sequential stream model, an interval sequential stream model, or an associative prefetch model.
  15. A data prefetching apparatus, comprising a first processing module and a second processing module, wherein
    the first processing module is configured to generate a prefetch model and send the generated prefetch model to the second processing module; and
    the second processing module is configured to predict, according to the prefetch model, data to be read, wherein the data to be read is stored in a second-level storage, and to read the data to be read from the second-level storage into a first-level storage, wherein the processing speed of the first-level storage is faster than that of the second-level storage, and the capacity of the first-level storage is smaller than that of the second-level storage.
  16. The apparatus according to claim 15, wherein the second processing module is specifically configured to perform inference according to the prefetch model and received read data requests to obtain the address of the predicted data to be read.
  17. The apparatus according to claim 15, wherein the first processing module is further configured to periodically upgrade the prefetch model.
  18. A data prefetching apparatus, comprising a first processing module and a second processing module, wherein
    the first processing module is configured to generate a prefetch model, perform inference according to the prefetch model and received read data requests to obtain the address of data to be read, and send the address of the data to be read to the second processing module; and
    the second processing module is configured to read, according to the address of the data to be read, the data to be read from a second-level storage into a first-level storage, wherein the processing speed of the first-level storage is faster than that of the second-level storage, and the capacity of the first-level storage is smaller than that of the second-level storage.
  19. The apparatus according to claim 18, wherein the first processing module is further configured to periodically upgrade the prefetch model.
PCT/CN2021/082382 2020-03-23 2021-03-23 Data prefetching method, apparatus, and storage device WO2021190501A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP21774365.7A EP4099235A4 (en) 2020-03-23 2021-03-23 METHOD AND DEVICE FOR PREFERRING DATA AND STORAGE DEVICE
US17/951,424 US20230009375A1 (en) 2020-03-23 2022-09-23 Data prefetching method and apparatus, and storage device

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010209712.X 2020-03-23
CN202010209712.XA CN113435601A (zh) 2020-03-23 2020-03-23 数据预取方法、装置以及存储设备

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/951,424 Continuation US20230009375A1 (en) 2020-03-23 2022-09-23 Data prefetching method and apparatus, and storage device

Publications (1)

Publication Number Publication Date
WO2021190501A1 true WO2021190501A1 (zh) 2021-09-30

Family

ID=77752699

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/082382 WO2021190501A1 (zh) 2020-03-23 2021-03-23 数据预取方法、装置以及存储设备

Country Status (4)

Country Link
US (1) US20230009375A1 (zh)
EP (1) EP4099235A4 (zh)
CN (1) CN113435601A (zh)
WO (1) WO2021190501A1 (zh)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114065947B (zh) * 2021-11-15 2022-07-22 深圳大学 一种数据访问推测方法、装置、存储介质及电子设备
CN116955223B (zh) * 2023-09-18 2024-01-23 浪潮电子信息产业股份有限公司 一种数据预取方法、系统、电子设备及计算机存储介质

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105955709A (zh) * 2016-04-16 2016-09-21 浙江大学 基于机器学习的预取能效优化自适应装置及方法
CN109564552A (zh) * 2016-08-29 2019-04-02 英特尔公司 增强基于每页当前特权等级的存储器访问许可
CN110321306A (zh) * 2018-03-28 2019-10-11 英特尔Ip公司 用于将数据预取到分级存储器布置的第一级存储器的技术
US20190332525A1 (en) * 2018-04-27 2019-10-31 International Business Machines Corporation Computerized methods for prefetching data based on machine learned sequences of memory addresses

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7353339B2 (en) * 2003-12-24 2008-04-01 Intel Corporation Adaptive caching
US9098418B2 (en) * 2012-03-20 2015-08-04 Apple Inc. Coordinated prefetching based on training in hierarchically cached processors
US10394706B2 (en) * 2017-11-02 2019-08-27 Western Digital Technologies, Inc. Non-volatile storage with adaptive command prediction
CN110018970B (zh) * 2018-01-08 2023-07-21 腾讯科技(深圳)有限公司 缓存预取方法、装置、设备及计算机可读存储介质
US10671460B2 (en) * 2018-02-05 2020-06-02 Micron Technology, Inc. Memory access communications through message passing interface implemented in memory systems
US10963394B2 (en) * 2018-04-16 2021-03-30 Samsung Electronics Co., Ltd. System and method for optimizing performance of a solid-state drive using a deep neural network
JP2019204335A (ja) * 2018-05-24 2019-11-28 株式会社日立製作所 データ処理装置およびプリフェッチ方法
CN110765034B (zh) * 2018-07-27 2022-06-14 华为技术有限公司 一种数据预取方法及终端设备


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
See also references of EP4099235A4 *
ZHANG, JINGYANG, MOORE ELITE: "Jeff Dean Talks About the Future Development Trend of AI Chips", 22 December 2019 (2019-12-22), XP055853282, Retrieved from the Internet <URL:https://zhuanlan.zhihu.com/p/98753249> [retrieved on 20211020] *

Also Published As

Publication number Publication date
EP4099235A1 (en) 2022-12-07
CN113435601A (zh) 2021-09-24
US20230009375A1 (en) 2023-01-12
EP4099235A4 (en) 2023-07-19

Similar Documents

Publication Publication Date Title
US20230009375A1 (en) Data prefetching method and apparatus, and storage device
CN110287010B (zh) 一种面向Spark时间窗口数据分析的缓存数据预取方法
EP3229142A1 (en) Read cache management method and device based on solid state drive
EP3822795B1 (en) Data storage and acquisition method and device
US9501419B2 (en) Apparatus, systems, and methods for providing a memory efficient cache
CN112667528A (zh) 一种数据预取的方法及相关设备
WO2014183514A1 (zh) 一种分级存储方法、装置和计算机存储介质
CN112256599A (zh) 一种数据预取方法、装置及存储设备
CN112199304B (zh) 数据预取方法及装置
CN115470157A (zh) 预取方法、电子设备、存储介质及程序产品
CN117235088B (zh) 一种存储系统的缓存更新方法、装置、设备、介质及平台
CN109947667B (zh) 数据访问预测方法和装置
CN112612728B (zh) 缓存管理方法及装置、设备
CN113157609A (zh) 存储系统、数据处理方法、装置、电子设备及存储介质
WO2023165543A1 (zh) 共享缓存的管理方法、装置及存储介质
WO2024032015A1 (zh) 数据缩减方法、装置及系统
KR20210126773A (ko) 파티셔닝 방법 및 그 장치
WO2022152086A1 (zh) 数据缓存方法、装置、设备及计算机可读存储介质
US11593014B2 (en) System and method for approximating replication completion time
CN114461590A (zh) 一种基于关联规则的数据库文件页预取方法及装置
CN115794366A (zh) 一种内存预取方法及装置
CN103631726B (zh) 一种串接流式计算节点的文件处理方法及装置
US20240103719A1 (en) Memory Control for Data Processing Pipeline Optimization
EP4261712A1 (en) Data elimination method and apparatus, cache node, and cache system
CN116107926B (zh) 缓存替换策略的管理方法、装置、设备、介质和程序产品

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21774365

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2021774365

Country of ref document: EP

Effective date: 20220829

NENP Non-entry into the national phase

Ref country code: DE