WO2023226505A1 - Prefetch scheduling method and prefetch scheduler - Google Patents
Prefetch scheduling method and prefetch scheduler
- Publication number: WO2023226505A1 (application PCT/CN2023/079293, CN2023079293W)
- Authority: WIPO (PCT)
- Prior art keywords: prefetch, data, requests, request, access
- Prior art date
Classifications
- G—PHYSICS; G06—COMPUTING; CALCULATING OR COUNTING; G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/48—Program initiating; Program switching, e.g. by interrupt
- G06F9/4806—Task transfer initiation or dispatching
- G06F9/4843—Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
- G06F9/4881—Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
- G06F9/54—Interprogram communication
- G06F9/547—Remote procedure calls [RPC]; Web services
- G06F9/548—Object oriented; Remote method invocation [RMI]
- G06F2209/00—Indexing scheme relating to G06F9/00
- G06F2209/54—Indexing scheme relating to G06F9/54
- G06F2209/544—Remote
- G06F2209/548—Queue
Definitions
- the present application relates to the field of computer technology, and in particular, to a prefetch scheduling method and a prefetch scheduler.
- the local data center and the remote data center are usually connected through a wide area network.
- the remote data center can be the cloud of the local data center.
- the cloud includes private cloud, public cloud, hybrid cloud, etc.
- A hybrid cloud is a combination of a private cloud and a public cloud.
- Hybrid clouds can have the advantages of public clouds and private clouds.
- Hybrid clouds can handle different types of workloads: for example, an easily scalable public cloud handles less sensitive workloads, while more sensitive and critical workloads are left to the private cloud.
- local data centers usually use prefetching technology to obtain data that is expected to be accessed by applications (data that needs to be prefetched) from the remote data center and store it in the local data center before the application actually accesses the data.
- In this way, when the application accesses data that needs to be prefetched, the local data center can respond to the application directly from local storage, thereby reducing network latency when the application accesses the data.
- However, existing prefetching technology often cannot guarantee that the data that needs to be prefetched is already available in the local data center before it is accessed by the application, and therefore cannot reduce the network delay of the application's data access.
- This application provides a prefetch scheduling method and a prefetch scheduler, which are used to schedule the prefetch order of multiple requests that require prefetching data, so that, as far as possible, all data that needs to be prefetched is available in the local data center before being accessed by the application, thereby reducing the network delay of applications accessing data.
- the present application provides a prefetch scheduling method, which can be executed by a prefetch scheduler.
- The method includes: the prefetch scheduler obtains multiple prefetch requests, where each prefetch request includes characteristics of prefetch data.
- The characteristics of the prefetch data include the size of the prefetch data and the expected access time of the prefetch data, where the prefetch data is data that needs to be prefetched and is expected to be accessed by the application. The prefetch scheduler determines the prefetch order of the multiple prefetch requests based on the characteristics of the prefetch data included in each prefetch request, and then processes the multiple prefetch requests in sequence according to the prefetch order.
- the prefetch sequence may also be referred to as the processing sequence for processing multiple prefetch requests.
- In this solution, the prefetch scheduler can optimize the prefetch order of the multiple prefetch requests (that is, optimize the order in which the multiple pieces of prefetch data are acquired) according to the size and the expected access time of the prefetch data in each prefetch request, so as to avoid the problems of the existing technology and ensure, as far as possible, that each piece of prefetch data is available in the local data center before it is accessed by the application. That is, this solution enables all data that needs to be prefetched to be obtained in the local data center before being accessed by the application, thereby reducing the network delay of application access.
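- As an illustration only (not the claimed implementation), the following Python sketch shows one way the prefetch request described above could be represented and how a scheduler could order and process such requests; all names (PrefetchRequest, PrefetchScheduler, fetch_from_remote, store_locally) are hypothetical, and the ordering rule shown is a simple placeholder rather than the optimization described later in this application.

```python
from dataclasses import dataclass

@dataclass
class PrefetchRequest:
    """Hypothetical representation of one prefetch request."""
    data_id: str                  # identifier of the prefetch data
    size_bytes: int               # size of the prefetch data
    expected_access_time: float   # expected access time (seconds since epoch)
    destination: str              # identifier of the target storage device
    priority: int = 0             # optional priority of the prefetch data (0 = lowest)

class PrefetchScheduler:
    """Minimal sketch: determine a prefetch order, then process requests in that order."""

    def __init__(self, fetch_from_remote, store_locally):
        self.fetch_from_remote = fetch_from_remote   # callable(data_id) -> bytes
        self.store_locally = store_locally           # callable(destination, data_id, data)

    def determine_order(self, requests):
        # Placeholder ordering: earliest expected access time first,
        # higher priority first among equal deadlines.
        return sorted(requests, key=lambda r: (r.expected_access_time, -r.priority))

    def process(self, requests):
        for req in self.determine_order(requests):
            data = self.fetch_from_remote(req.data_id)               # read from the second data center
            self.store_locally(req.destination, req.data_id, data)   # store in the first data center
```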
- the prefetch scheduler is located in a first data center (ie, a local data center).
- the first data center also includes multiple storage devices.
- The multiple prefetch requests are obtained by the prefetch scheduler based on the application's access requests to the multiple storage devices.
- the multiple storage devices may be storage devices of different types from the same manufacturer, or may be storage devices from different manufacturers.
- The prefetch scheduler can obtain multiple prefetch requests based on the access requests to the multiple storage devices in the first data center. That is, the scheduling scope of the prefetch scheduler of this application covers the prefetch requests corresponding to multiple storage devices in the first data center, rather than only the prefetch requests corresponding to a single storage device, so the prefetch order of the multiple prefetch requests in the first data center can be optimized at the scope of the entire data center.
- Each prefetch request also includes an identifier of the prefetch data. When processing the multiple prefetch requests in sequence according to the prefetch order, the prefetch scheduler may, in that order, obtain the prefetch data from the second data center (i.e., the remote data center) according to the identifier of the prefetch data included in each prefetch request.
- the second data center is one or more data centers that are different from the first data center.
- the second data center can be understood as the cloud of the first data center, such as a private cloud, a public cloud or a hybrid cloud.
- each prefetch request also includes a destination address.
- The destination address indicates where the prefetch scheduler should store the prefetch data after obtaining it.
- The destination address is the identifier of one of the multiple storage devices.
- the identification of the storage device is, for example, the Internet Protocol (IP) address of the storage device, or it may be the name of the storage device or the serial number of the storage device, etc.
- The plurality of prefetch requests include a first prefetch request, the identifier of the prefetch data included in the first prefetch request is the identifier of first data, and the destination address included in the first prefetch request is the identifier of a first storage device among the multiple storage devices in the first data center.
- The prefetch scheduler may also send the obtained first data to the first storage device for storage, so that when the first storage device receives the application's access request for the first data, it can obtain the first data locally and respond to the application.
- The prefetch request obtained by the prefetch scheduler also includes the destination address, so that when processing the prefetch request, the prefetch scheduler can determine which storage device in the first data center should store the prefetched data.
- The expected access time of the prefetch data included in the first prefetch request is a first time; sending the obtained first data to the first storage device for storage includes: sending the first data to the first storage device for storage before the first time.
- In this way, the data in the first storage device can meet the access requirements of the application, where meeting the access requirements of the application means that the prefetch data corresponding to the storage device's prefetch requests is obtained and stored on the storage device before it is accessed by the application.
- The multiple prefetch requests also include a second prefetch request, and the destination address included in the second prefetch request is the identifier of a second storage device among the multiple storage devices. The second storage device is different from the first storage device.
- the second storage device and the first storage device may be of the same type or different types.
- The prefetch scheduler can prefetch data not only for the first storage device but also for the second storage device. That is, the prefetch scheduler can prefetch data for multiple storage devices in the first data center and can sort the prefetch requests of the multiple storage devices, which helps to ensure that the data stored in each storage device in the first data center meets the access requirements of the application.
- The prefetch scheduler can also obtain multiple access requests made by the application to the multiple storage devices during a historical period, and then obtain the multiple prefetch requests based on those access requests.
- each access request among the multiple access requests carries the characteristics of the application's access to the access data on multiple storage devices in the historical period.
- The characteristics of the access data include the size of the access data and the access time of the access data; furthermore, the characteristics of the prefetch data in each prefetch request are obtained based on the characteristics of the access data in the multiple access requests.
- The prefetch scheduler predicts the characteristics of the prefetch data in the multiple prefetch requests based on the characteristics of the access data carried in the multiple access requests, obtains the multiple prefetch requests, and schedules them.
- This implementation stores the prefetch data in the storage devices of the first data center in advance, so the application can obtain the data relatively quickly.
- each access request carries an identifier of one of the multiple storage devices; the destination addresses in the multiple prefetch requests are obtained based on the identifiers of the storage devices included in the multiple access requests.
- the prefetch scheduler determines the data that the storage device needs to store in advance (that is, the prefetch data corresponding to the storage device) based on the application's access request to a storage device, and the prefetch scheduler generates a prefetch request corresponding to the storage device.
- The prefetch request corresponding to the storage device includes the identifier of the storage device (that is, the destination address).
- The characteristics of the prefetch data may also include the priority of the prefetch data; that is, when scheduling the prefetch order of the prefetch requests, the prefetch scheduler considers not only the size and expected access time of the prefetch data but also its priority, which can further improve the accuracy of scheduling and help the application obtain data from the storage devices within a time and/or bandwidth that meets the application's service level agreement requirements.
- the priority of the prefetched data in the multiple prefetch requests may be obtained based on the priority of the application that generates the multiple access requests.
- The prefetch scheduler includes multiple prediction engine instances, and the prefetch scheduler performs the following for each of the multiple access requests: according to the namespace of the file system corresponding to the access data requested by the access request, the access request is assigned to one of multiple access request queues; or, according to the directory, in the namespace of the file system, corresponding to the access data requested by the access request, the access request is assigned to one of the multiple access request queues. Further, each access request queue corresponds to its own prediction engine instance, and the prefetch scheduler inputs the multiple access requests in each access request queue into the corresponding prediction engine instance to predict the prefetch requests corresponding to the access requests in that queue. Optionally, the multiple access requests in each access request queue are carried in one bucket.
- The prefetch scheduler places access requests corresponding to different namespaces (or different directories within namespaces) into different access request queues (or buckets), and the access requests in each queue can be used to train that queue's own prediction engine instance. In this way, access requests are divided into different queues (or buckets) according to certain characteristics (for example, the namespace or directory mentioned above), and each queue is predicted by its corresponding prediction engine instance; that is, access requests are managed and predicted at a finer granularity, which helps to improve the accuracy of the predicted prefetch requests.
- When the prefetch scheduler determines the prefetch order of the multiple prefetch requests based on the characteristics of the prefetch data included in each prefetch request, a convex optimization algorithm may be used to determine the prefetch order. In this way, as far as possible, all data that needs to be prefetched can be obtained in the local data center before being accessed by the application, thereby reducing the network delay of application access and helping to minimize the overall waiting time of the multiple prefetch requests (or the average waiting time of the multiple prefetch requests).
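- The application describes using a convex optimization algorithm for this step; purely as a simpler, hedged stand-in, the following sketch orders requests with a greedy rule that serves tight deadlines first and shorter transfers first, which tends to reduce the average waiting time. The uniform WAN bandwidth and the helper names are assumptions, and the fields reuse the hypothetical PrefetchRequest sketch above.

```python
def estimate_transfer_time(size_bytes, bandwidth_bytes_per_s):
    """Estimated time needed to pull one piece of prefetch data over the WAN
    (assumes a single, uniform bandwidth; the real system is not specified)."""
    return size_bytes / bandwidth_bytes_per_s

def order_requests(requests, bandwidth_bytes_per_s, now):
    """Greedy stand-in for the ordering step: requests with the least slack
    (deadline minus estimated transfer time) come first; among similar slack,
    shorter transfers come first, which tends to lower the average waiting time."""
    def key(req):
        transfer = estimate_transfer_time(req.size_bytes, bandwidth_bytes_per_s)
        slack = req.expected_access_time - now - transfer
        return (slack, transfer)
    return sorted(requests, key=key)
```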
- The prefetch scheduler can also obtain a third prefetch request, where the third prefetch request is obtained while the prefetch scheduler is processing the multiple prefetch requests in sequence according to the prefetch order.
- The prefetch scheduler then determines the prefetch order of the third prefetch request and the unprocessed prefetch requests among the plurality of prefetch requests based on the characteristics of the prefetch data included in the third prefetch request.
- In other words, when the prefetch scheduler obtains a new prefetch request while processing the multiple prefetch requests, it can dynamically adjust the prefetch order of the third prefetch request and the prefetch requests that have not yet been processed, based on the newly obtained third prefetch request. That is to say, this solution can dynamically adjust the prefetch order of prefetch requests in real time, thereby better realizing the prefetch operation.
- Suppose the number of the above multiple prefetch requests is M. When the prefetch scheduler obtains the third prefetch request, the number of unprocessed prefetch requests among the M prefetch requests is N, where M and N are integers, M is greater than 1, and N is greater than 0 and less than M. The prefetch scheduler determines the prefetch order of the third prefetch request and the N unprocessed prefetch requests according to the characteristics of the prefetch data included in the third prefetch request.
- Specifically, the prefetch scheduler may predict a first prefetch time at which the prefetch data requested by a fourth prefetch request is prefetched to its corresponding storage device, where the fourth prefetch request is the last prefetch request among the N unprocessed prefetch requests. Based on the first prefetch time and the size of the prefetch data included in the third prefetch request, the prefetch scheduler determines a second prefetch time, which is the time at which the prefetch data requested by the third prefetch request would be prefetched to its corresponding storage device if the third prefetch request were queued after the fourth prefetch request.
- Case 1: When the second prefetch time and the expected access time of the prefetch data included in the third prefetch request meet the preset condition, the prefetch scheduler queues the third prefetch request after the fourth prefetch request for processing.
- Case 2: When the second prefetch time and the expected access time of the prefetch data included in the third prefetch request do not meet the preset condition, the prefetch scheduler re-determines the prefetch order of the third prefetch request and the N unprocessed prefetch requests based on the characteristics of the prefetch data included in the third prefetch request, so that the access times and prefetch times corresponding to the third prefetch request and the N unprocessed prefetch requests all meet the preset condition.
- The preset condition includes: the prefetch time corresponding to a prefetch request is before the expected access time of the prefetch data in that prefetch request; or, the prefetch time corresponding to a prefetch request is after the expected access time of the prefetch data in that prefetch request and the time difference between the two is less than a threshold. Here, the prefetch time corresponding to a prefetch request is the time at which the prefetch data in that prefetch request is prefetched to the corresponding storage device.
- the prefetch scheduler may also adjust the threshold according to a preset step size before redetermining the prefetch order of the third prefetch request and N unprocessed prefetch requests.
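- The preset condition and the two cases above can be sketched as follows; this is only an illustration under assumptions (a single estimated bandwidth, re-ordering by expected access time, and hypothetical names), not the algorithm claimed in this application.

```python
def meets_condition(prefetch_time, expected_access_time, threshold_s):
    """Preset condition from the text: the prefetch completes before the expected
    access time, or after it by less than the threshold."""
    return (prefetch_time <= expected_access_time
            or prefetch_time - expected_access_time < threshold_s)

def insert_new_request(pending, new_req, now, bandwidth_bytes_per_s, threshold_s, step_s):
    """pending: the N unprocessed prefetch requests in their current order."""
    # First prefetch time: when the last pending (fourth) request is estimated to finish.
    first_prefetch_time = now
    for req in pending:
        first_prefetch_time += req.size_bytes / bandwidth_bytes_per_s
    # Second prefetch time: when new_req would finish if queued after the fourth request.
    second_prefetch_time = first_prefetch_time + new_req.size_bytes / bandwidth_bytes_per_s

    if meets_condition(second_prefetch_time, new_req.expected_access_time, threshold_s):
        # Case 1: queue the new (third) request after the fourth request.
        return pending + [new_req], threshold_s

    # Case 2: adjust the threshold by the preset step, then re-determine the order
    # (here simply by expected access time) so the condition can be met again.
    threshold_s += step_s
    reordered = sorted(pending + [new_req], key=lambda r: r.expected_access_time)
    return reordered, threshold_s
```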
- embodiments of the present application provide a prefetch scheduler that has the function of implementing the method in the above first aspect or any possible implementation of the first aspect.
- the prefetch scheduler may also be called a prefetch scheduling device.
- the function of the above-mentioned prefetch scheduler can be implemented by hardware, or by hardware executing corresponding software.
- the hardware or software includes one or more modules or units or means corresponding to the above-mentioned functions.
- The prefetch scheduler includes a prediction engine manager, a prefetch optimizer, and a prefetch executor. The prediction engine manager is used to obtain multiple prefetch requests, where each prefetch request includes characteristics of prefetch data, and the characteristics of the prefetch data include the size of the prefetch data and the expected access time of the prefetch data; the prefetch optimizer is used to determine the prefetch order of the multiple prefetch requests based on the characteristics of the prefetch data included in each prefetch request; the prefetch executor is used to process the multiple prefetch requests in sequence according to the prefetch order.
- The prefetch scheduler includes a processing module and a transceiver module, wherein the processing module is configured to support the prefetch scheduler in executing the method in the above-mentioned first aspect or any implementation of the first aspect.
- the transceiver module is used to support communication between the prefetch scheduler and other devices, for example, to obtain prefetch data from a second data center.
- the prefetch scheduler may also include a storage module, which is coupled to the processing module and stores necessary program instructions and data for the prefetch scheduler.
- the processing module can be a processor
- the transceiver module can be a transceiver
- the storage module can be a memory.
- the memory can be integrated with the processor, or can be provided separately from the processor.
- the prefetch scheduler includes a processor and a memory.
- the processor is coupled to the memory and may be used to execute computer program instructions stored in the memory, so that the prefetch scheduler executes the method in the above-mentioned first aspect or any possible implementation of the first aspect.
- the prefetch scheduler further includes a communication interface, the processor is coupled to the communication interface, and the communication interface is used to support communication between the prefetch scheduler and other devices.
- embodiments of the present application provide a chip system, including: a processor, the processor is coupled to a memory, and the memory is used to store programs or instructions. When the programs or instructions are executed by the processor, the chip system implements the above-mentioned third aspect.
- the chip system further includes an interface circuit for communicating code instructions to the processor.
- There can be one or more processors in the chip system, and the processors can be implemented by hardware or by software.
- the processor may be a logic circuit, an integrated circuit, or the like.
- the processor may be a general-purpose processor implemented by reading software code stored in memory.
- the memory can be integrated with the processor or can be provided separately from the processor.
- The memory may be a non-transitory memory, such as a read-only memory (ROM), which may be integrated with the processor on the same chip or provided separately on different chips.
- The present application provides a computer-readable storage medium in which computer programs or instructions are stored; when the computer programs or instructions are run on a device, the device implements the method in the first aspect or any possible implementation of the first aspect.
- embodiments of the present application provide a computer program product, which when a computer reads and executes the computer program product, causes the computer to execute the method in the above-mentioned first aspect or any possible implementation of the first aspect.
- Figure 1 is a schematic diagram of an architecture suitable for an application to access cloud data
- Figure 2 is a schematic diagram of an architecture suitable for an application to access cloud data provided in this application;
- Figure 3 is a schematic structural diagram of a prefetch scheduler provided by this application.
- Figure 4 is a schematic flow chart of a prefetch scheduling method provided by this application.
- Figure 5 is a schematic structural diagram of another prefetch scheduler provided by this application.
- Figure 6 is a schematic structural diagram of yet another prefetch scheduling device provided by this application.
- Metadata: data that describes data (data about data), mainly information describing data attributes (properties), used to indicate the storage location of the data, record the history of the data, search for resources, record files, and so on.
- Mounting: associating a device with a specific location in the directory tree so that the operating system can find the newly added device starting from the root directory and access the file data on the device. The device here can be a real device, such as a universal serial bus (USB) device, or a directory in the operating system.
- Namespace: a namespace can be used as additional information to distinguish functions, classes, variables, etc. with the same name in different libraries. Using a namespace defines a context.
- Service level agreement (SLA): a service commitment made by the system service provider to the customer. An SLA can be used to measure whether the service meets user expectations.
- Convex optimization: a subfield of mathematical optimization that studies the problem of minimizing convex functions over convex sets. Convex optimization is used in many fields, such as automatic control systems, signal processing, communications and networks, electronic circuit design, data analysis and modeling, statistics (optimal design), and finance.
- Figure 1 is a schematic diagram of an architecture suitable for an application to access cloud data.
- the architecture includes a local data center 10 and a remote data center 20.
- the local data center 10 and the remote data center 20 can usually be connected through a wide area network.
- There may be one or more remote data centers 20; for example, three remote data centers 20 are illustrated in FIG. 1.
- The remote data center 20 can also be considered a cloud.
- The cloud includes one or more public clouds, or one or more private clouds, or one or more public clouds and one or more private clouds, where a cloud that includes both public and private clouds can be considered a hybrid cloud.
- Each public cloud can include one or more file systems, and each private cloud can also include one or more file systems.
- Each file system stores its own data.
- the local data center 10 includes a plurality of storage devices 11.
- For example, three storage devices 11 are exemplarily shown in FIG. 1, and each storage device 11 stores data.
- the local data center 10 runs multiple applications (applications, APPs), such as the three applications (denoted as application a, application b, and application c) exemplarily shown in FIG. 1 .
- the application can access the storage device 11 of the local data center 10 to obtain the data required by the application. Since the data stored in the storage device 11 of the local data center 10 is limited, the data required by the application may not be stored in the storage device 11 of the local data center 10 . Specifically, when the storage device 11 of the local data center 10 does not store the data that the application needs to access, the storage device 11 of the local data center 10 can obtain the corresponding data from the remote data center 20 and respond to the application.
- the application can run on a host (not shown in Figure 1) in the local data center 10.
- the user operates the application through the host.
- The host generates an access request for the application, and the access request can be used by the host to obtain data from the storage device 11.
- When necessary, the storage device 11 of the local data center 10 obtains the data corresponding to the access request from the remote data center 20 through the wide area network at the request of the host, and feeds the data corresponding to the access request back to the host.
- The application has a many-to-many relationship with the storage devices 11 of the local data center 10; that is, the same application can access multiple storage devices 11 of the local data center 10, i.e., the data of one application can be stored on multiple storage devices 11.
- One storage device 11 in the local data center 10 can provide access services for multiple applications, that is, one storage device 11 can store data of multiple applications.
- For example, the three storage devices 11 in Figure 1 are denoted, from top to bottom, as storage device a, storage device b, and storage device c, where the data of application a can be stored on storage device a and storage device b, the data of application b can be stored on storage device a and storage device c, the data of application c can be stored on storage device a, and so on.
- storage device a can provide access services to application a, application b, application c, etc., which will not be described again.
- Applications may also be called application programs, application threads, user applications, and so on.
- the storage device 11 of the local data center 10 usually needs to access the data stored in the remote data center 20 through the wide area network, and there will be a problem of network delay.
- A prefetch module 111 is usually provided on each storage device 11 of the local data center 10. For each storage device 11, the prefetch module 111 on that storage device 11 predicts, according to the data the application accesses on the storage device 11 (or based on multiple access requests initiated by the application to the storage device 11), the data that the application may access in the future, and generates multiple prefetch requests for the storage device 11 based on that data. The prefetch module 111 then performs a prefetch operation based on the multiple prefetch requests, that is, it prefetches the data that may be accessed by the application from the remote data center 20 and stores it on the storage device 11.
- The prefetch module 111 performs the prefetch operation according to the first come first served (FCFS) principle. Specifically, the prefetch module 111 of each storage device 11 generates the corresponding prefetch requests according to the access requests on the storage device 11 to which it belongs, and uses a first in first out (FIFO) queue to manage or maintain those prefetch requests.
- Each prefetch module 111 appends each new prefetch request to the tail of its own FIFO queue, fetches prefetch requests from the head of the FIFO queue in turn, and performs the prefetch operation based on each fetched prefetch request. That is to say, there are multiple prefetch modules 111 in the local data center 10; each prefetch module 111 usually performs prefetching for one storage device 11, and each prefetch module 111 processes its multiple prefetch requests according to the FCFS principle.
- However, this implementation does not consider the business factors of each prefetch request, such as the required time of the prefetch data in each prefetch request (the time at which the prefetch data will be accessed by the application) and the time needed to acquire the prefetch data of each prefetch request. Unreasonable scheduling may therefore occur, with the result that the prefetch data of some prefetch requests is obtained only after its required time, so the application's access delay cannot be reduced.
- For example, the prefetch module 111 performs the prefetch operation of prefetch request B only after completing the prefetch operation of prefetch request A. If the prefetch operation of prefetch request A takes a long time, prefetch request B must wait a long time before being processed. It may then happen that, when the application accesses the corresponding data in the local data center 10, the prefetch operation of prefetch request B has not yet been completed, that is, the data corresponding to prefetch request B has not yet been stored in the storage device 11 of the local data center 10, which results in long delays for the application to access data.
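- For contrast with the scheduler of this application, a minimal sketch of the per-device FCFS prefetch module just described might look as follows (names are hypothetical, and the fields reuse the earlier PrefetchRequest sketch); note that a large request at the head of the FIFO queue delays every request behind it regardless of their deadlines.

```python
import collections

class FifoPrefetchModule:
    """Sketch of the prior prefetch module 111: new prefetch requests join the
    tail of a FIFO queue and are executed strictly first come, first served,
    without looking at data size or expected access time."""

    def __init__(self, fetch_from_remote, store_locally):
        self.queue = collections.deque()
        self.fetch_from_remote = fetch_from_remote
        self.store_locally = store_locally

    def submit(self, request):
        self.queue.append(request)           # tail of the FIFO queue

    def run_once(self):
        if self.queue:
            req = self.queue.popleft()       # head of the FIFO queue
            data = self.fetch_from_remote(req.data_id)
            self.store_locally(req.destination, req.data_id, data)
```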
- This application provides an architecture used by applications to access cloud data.
- In this architecture, the local data center 10 also includes a prefetch scheduler 12, which can be used to schedule the prefetch requests of the multiple storage devices 11.
- the architecture is shown in Figure 2.
- the architecture provided by this application can specifically be an object storage (object-based storage devices, OSD) network, a storage area network (SAN), or a network attached storage (network attached storage, NAS) architecture.
- FIG. 3 is a schematic structural diagram of a prefetch scheduler 12 provided by the present application.
- The prefetch scheduler 12 may include a prediction engine manager (prefetch engine manager) 121, a prefetch optimizer (prefetch optimizer) 122, and a prefetch executor (prefetch executor) 123. Each module/unit is explained as follows:
- the prediction engine manager 121 is used to obtain an application's access request to the storage device 11 and generate a prefetch request based on the application's access request to the storage device 11 .
- In one implementation, there is a single prediction engine manager 121, which manages the access requests of all storage devices 11 of the local data center 10 and generates the corresponding prefetch requests according to those access requests.
- In another implementation, there are multiple prediction engine managers 121, and each prediction engine manager 121 manages the access requests of one or more storage devices 11 in the local data center 10 and generates the prefetch requests corresponding to those one or more storage devices 11 according to the managed access requests.
- the prefetch optimizer 122 is used to obtain multiple prefetch requests generated by the prediction engine manager 121, and then sort the prefetch orders of the multiple prefetch requests to obtain multiple sorted prefetch requests.
- There may be a single prefetch optimizer 122: it obtains the prefetch requests generated by all prediction engine managers 121 of the local data center 10 and sorts the prefetch order of these prefetch requests (that is, the prefetch optimizer 122 can sort the prefetch order of the prefetch requests of the entire local data center 10), and it instructs the prefetch executor 123 to obtain the data requested by each prefetch request from the remote data center 20 according to the sorted prefetch order.
- The prefetch executor 123 is used to obtain, according to the instructions of the prefetch optimizer 122 and in the sorted prefetch order, the data requested by each prefetch request from the remote data center 20, and to send the obtained data to the corresponding storage device 11 in the local data center 10 for storage. For example, there may be one prefetch executor 123.
- the prefetch scheduler 12 also includes a collector 124, which may also be called an IO collector (IO collector).
- the collector 124 can be deployed on each storage device 11 or on the host where the application runs.
- the collector 124 is used to collect access requests by applications to access data on the storage device 11 , and send the access requests to the prediction engine manager 121 .
- The prediction engine manager 121 receives the multiple access requests from the collector 124 and generates prefetch requests based on these access requests.
- There may be one or more collectors 124.
- In one implementation, there is one collector 124, which collects the access requests occurring on all storage devices 11 of the local data center 10.
- In another implementation, there are multiple collectors 124, and each collector 124 collects the access requests occurring on one or more storage devices 11.
- the collector 124 can also be located outside the prefetch scheduler 12 , that is, the collector 124 is connected to the prefetch scheduler 12 , and the collector 124 can collect application access requests to the storage device 11 . Similarly, in this implementation, the collector 124 can be deployed on each storage device 11 or on the host where the application runs. For convenience of description, the following description takes the collector 124 located within the prefetch scheduler 12 as an example.
- the present application provides a prefetch scheduling method, which can be applied to the architectural diagram shown in Figure 2.
- the prefetch scheduling method can be executed by the prefetch scheduler 12 shown in Figure 2 or Figure 3.
- The local data center 10 can also be called the first data center, and "local data center 10" in this application can be replaced by "first data center"; the remote data center 20 can also be called the second data center, and "remote data center 20" in this application can be replaced by "second data center".
- Step 400 The prefetch scheduler 12 obtains multiple access requests to multiple storage devices 11 in the local data center 10 by the application in a historical period (recorded as the first historical period).
- the first historical period can be considered as a period with a preset duration before the current time.
- the prefetch scheduler 12 may obtain multiple access requests generated by the application in the first historical period.
- Each access request obtained by the prefetch scheduler 12 includes the identification of the access data that the application accessed on the storage device 11 in the first historical period and the characteristics of the access data. It should be understood that the identification of the access data is used to indicate what data the application wants to access, and the characteristics of the access data may include the size of the data accessed by the application and the time during which the application accesses the data. Optionally, the characteristics of the accessed data may also include the priority of the data accessed by the application.
- each of the above access requests obtained by the prefetch scheduler 12 may also explicitly or implicitly carry the identification of the storage device 11 , and the identification of the storage device 11 may be used to indicate where the access data corresponding to the access request is stored. on the storage device 11, or may indicate the access data on which storage device 11 the application accesses. It can be understood that each access request obtained by the prefetch scheduler 12 can indicate the following information: at what time which data on which storage device 11 the application accessed, and what is the priority of the accessed data.
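- As a hedged illustration of the information that each collected access request carries according to the description above, an access request might be recorded as follows (field names are hypothetical):

```python
from dataclasses import dataclass

@dataclass
class AccessRequest:
    """Hypothetical record of one access request collected in step 400."""
    data_id: str              # which data the application accessed
    size_bytes: int           # size of the accessed data
    access_time: float        # when the application accessed the data
    storage_device_id: str    # which storage device 11 held the data
    app_priority: int = 0     # optional priority of the accessed data / application
```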
- the prefetch scheduler 12 collects multiple access requests sent by the application to each storage device 11 .
- The prefetch scheduler 12 can also further process an access request after collecting it. For example, after collecting the application's access request to a storage device 11, the prefetch scheduler 12 may explicitly add the application's access time to the access request, or add the identifier of the storage device 11 corresponding to the access request, and so on.
- step 400 is an optional step.
- the prefetch scheduler 12 may execute step 401 after executing step 400, or the prefetch scheduler 12 may directly execute step 401.
- Step 401 The prefetch scheduler 12 obtains multiple prefetch requests based on multiple access requests.
- Each of the plurality of prefetch requests includes characteristics of prefetched data, where the prefetched data is data that is expected to be accessed by the application and needs to be prefetched.
- the characteristics of the prefetched data include the size of the prefetched data (that is, the data size) and the expected access time of the prefetched data.
- The expected access time of the prefetch data is the point in time at which the application is expected to access the prefetch data. For example, if the current time is 11:30 and the expected access time of the prefetch data included in a prefetch request is 12:00, the application is expected to access the data at 12:00, and the data should usually be obtained from the remote data center 20 before 12:00.
- the characteristics of the prefetched data also include the priority of the prefetched data
- the prefetch scheduler 12 may preferentially obtain the prefetched data with a higher priority.
- the priority of the prefetched data can be obtained in the following way: first confirm which access requests the prefetch requests corresponding to the prefetched data are based on, and then obtain the priority of the prefetched data based on the priorities of the applications that generate these access requests.
- For example, the priority of the prefetch data can simply equal the priority of the application: if the priority of the application is priority 1, the priority of the prefetch data is also priority 1. As another example, the priority of the prefetch data can be mapped from the priority of the application: when the priority of the application is low, the priority of the prefetch data is priority 1, and when the priority of the application is medium, the priority of the prefetch data is priority 2.
- In one case, the characteristics of the prefetch data of every prefetch request include the priority of the prefetch data. For example, the multiple prefetch requests are prefetch request a, prefetch request b, and prefetch request c; the priority of the prefetch data contained in prefetch request a is priority 1, the priority of the prefetch data contained in prefetch request b is priority 2, and the priority of the prefetch data contained in prefetch request c is priority 3, where priority 1 is lower than priority 2 and priority 2 is lower than priority 3.
- In another case, the characteristics of the prefetch data of some prefetch requests include the priority of the prefetch data, while the characteristics of the prefetch data of the other prefetch requests do not. The priority of prefetch data whose characteristics do not include a priority can be set to the lowest priority. For example, the multiple prefetch requests are prefetch request a, prefetch request b, and prefetch request c; the priority of the prefetch data contained in prefetch request a is priority 1, and prefetch request b and prefetch request c do not include the priority of the prefetch data, so the priority of the prefetch data included in prefetch request b and prefetch request c can be considered priority 0, where priority 0 is the lowest priority.
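- The two priority rules just described (equal to the application priority, or mapped from it, with the lowest priority used when none is carried) could be sketched as follows; the mapping table and function names are assumptions for illustration only.

```python
# Hypothetical mapping from application priority to prefetch-data priority.
APP_TO_PREFETCH_PRIORITY = {"low": 1, "medium": 2, "high": 3}

def prefetch_priority(app_priority=None):
    """Return the priority of the prefetch data derived from the application priority.
    Priority 0 (the lowest) is used when the request carries no priority at all."""
    if app_priority is None:
        return 0
    if isinstance(app_priority, int):        # e.g. application priority 1 -> prefetch priority 1
        return app_priority
    return APP_TO_PREFETCH_PRIORITY.get(app_priority, 0)
```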
- each prefetch request also includes an identifier of the prefetch data
- the identifier of the prefetch data is used by the prefetch scheduler 12 to obtain the corresponding prefetch data from the remote data center 20 .
- the identifier of the prefetched data specifically includes the identifier of the file system of the file to which the prefetched data belongs, the file identifier of the file to which the prefetched data belongs, and the file offset of the prefetched data in the file to which it belongs.
- the identifier of the file system of the file to which the prefetched data belongs is used to indicate which file system of the remote data center 20 the prefetched data is in.
- the file identification of the file to which the prefetched data belongs may specifically include the file name of the file to which the prefetched data belongs and the file directory of the file in the file system of the remote data center 20 .
- the file identifier of file a includes file name a1 and file directory a2, that is, there is a file named a1 under file directory a2 in the file system of the remote data center 20, and the file is file a.
- The file system identifier of the file to which the prefetch data belongs, the file identifier of that file, the file offset of the prefetch data within the file, and the size of the prefetch data can jointly indicate the prefetch data.
- For example, the file system identifier of the file to which the prefetch data belongs is file system A, the file to which the prefetch data belongs is file a, and the file identifier of file a includes the file name a1 and the file directory a2; the file offset of the prefetch data in file a is 5M, and the size of the prefetch data is 50M.
- File system A, file name a1, file directory a2, the prefetch data size of 50M, and the file offset of 5M jointly indicate that the prefetch data is the data from the 5Mth to the 55Mth position of file a located under file directory a2 in file system A.
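- The identifier fields described above, together with the size, pin down a byte range in one file of the remote file system; a hypothetical sketch of that identifier and of the worked example (offset 5M, size 50M, so the 5M to 55M range of file a1 under directory a2 in file system A) follows. Treating "M" as 2**20 bytes is an assumption.

```python
from dataclasses import dataclass

M = 2 ** 20  # assumed meaning of "M" in the example above

@dataclass
class PrefetchDataId:
    """Hypothetical identifier of prefetch data, matching the fields described above."""
    file_system: str    # e.g. "file system A" in the remote data center
    directory: str      # file directory of the file, e.g. "a2"
    file_name: str      # file name, e.g. "a1"
    offset_bytes: int   # file offset of the prefetch data within the file
    size_bytes: int     # size of the prefetch data

    def byte_range(self):
        return self.offset_bytes, self.offset_bytes + self.size_bytes

# The worked example: bytes [5M, 55M) of file a1 under directory a2 in file system A.
ident = PrefetchDataId("file system A", "a2", "a1", 5 * M, 50 * M)
start, end = ident.byte_range()
```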
- each prefetch request can also carry a destination address.
- the destination address is used to indicate the storage address of the prefetch scheduler 12 after acquiring the prefetch data.
- the destination address may be one of the identities of multiple storage devices 11 in the local data center 10 .
- the identification of the storage device 11 may be one or a combination of the name, number, device serial number, and IP address of the storage device 11 .
- For example, when the destination address carried in a prefetch request is the identifier of storage device a, the prefetch scheduler 12 can, after obtaining the prefetch data requested by that prefetch request, send the obtained prefetch data to storage device a for storage. In this way, when storage device a subsequently receives the application's access to the prefetch data, it can obtain the data locally and respond to the application directly, thereby reducing the application's access delay.
- Having explained the multiple prefetch requests, the following explains how the prefetch scheduler 12 obtains the multiple prefetch requests based on the multiple acquired access requests.
- The multiple access requests mentioned above carry the access patterns of the application in the first historical period (including which access requests the application generated in the first historical period, and at what time each access request accessed which data on which storage device 11, etc.); furthermore, the prefetch scheduler 12 can obtain one or more prefetch requests based on these access requests.
- The prefetch scheduler 12 can predict the likely future access patterns of the application based on the access patterns of the multiple access requests in the first historical period, that is, what data on which storage device 11 the application may access at what time in the future, and so on.
- the prefetch scheduler 12 may determine, based on multiple access requests, data related to which access requests may be accessed by the application in the future (ie, determine which data to prefetch). Then, the prefetch scheduler may determine the characteristics of the prefetch data in the corresponding prefetch request based on the determined characteristics of the access data in these access requests. Optionally, the prefetch scheduler 12 may also obtain the destination address of the corresponding prefetch request based on the identification of the storage device 11 carried in the determined access requests. Optionally, the prefetch scheduler 12 may also determine the priority of the prefetched data in the corresponding prefetch request based on the determined priorities of the data accessed by the applications corresponding to these access requests.
- the prefetch scheduler 12 can also manage a prediction engine instance.
- the multiple access requests mentioned above can be input into the prediction engine instance, and multiple prefetch requests can be predicted. Based on the number of prediction engine instances managed in the prefetch scheduler 12, two situations are described below.
- Situation 1: The prefetch scheduler 12 can manage one prediction engine instance, that is, the prefetch scheduler 12 can predict the prefetch requests through a single prediction engine instance.
- the prefetch scheduler 12 may input multiple access requests (for example, all access requests generated in the local data center 10 ) that the application accesses the plurality of storage devices 11 in the first historical period into the In the prediction engine instance, correspondingly, the prediction engine instance outputs multiple prefetch requests corresponding to the multiple storage devices 11 .
- a prediction engine instance can be considered a pre-trained prediction model.
- the prediction model is used to predict multiple prefetch requests based on multiple access requests.
- the prefetch scheduler 12 initializes a prediction model, and adjusts parameters in the prediction model through multiple rounds of iterations based on the training data to obtain a prediction model that meets expectations.
- the prefetch scheduler 12 may input training data into the prediction model, and adjust parameters of the prediction model in the current round based on the training data and the output results of the prediction model.
- the training data here may be access requests by the application to access multiple storage devices 11 in the local data center 10 in a certain historical period (recorded as the second historical period).
- the second historical period is before the first historical period.
- the second historical period is longer than the first historical period.
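- The application does not specify the prediction model itself; purely as an illustration of the "train on historical access requests, then predict prefetches" flow, a toy successor-frequency model could look like this (all names are hypothetical, reusing the AccessRequest sketch above):

```python
from collections import Counter

class ToySuccessorModel:
    """Toy stand-in for the unspecified prediction model: it learns, from the
    training access requests of the second historical period, which data id
    tends to follow which, and predicts the most frequent successor."""

    def __init__(self):
        self.successors = {}

    def update(self, ordered_access_requests):
        # ordered_access_requests: a list of AccessRequest objects in chronological order.
        for prev, nxt in zip(ordered_access_requests, ordered_access_requests[1:]):
            self.successors.setdefault(prev.data_id, Counter())[nxt.data_id] += 1

    def predict_next(self, data_id):
        counts = self.successors.get(data_id)
        return counts.most_common(1)[0][0] if counts else None

def train_prediction_model(training_requests, rounds=3):
    """Multiple rounds of adjustment over the training data, as described above."""
    model = ToySuccessorModel()
    for _ in range(rounds):
        model.update(training_requests)
    return model
```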
- Situation 2: The prefetch scheduler 12 can manage multiple prediction engine instances. For example, one prediction engine instance corresponds to one storage device 11. For a given storage device 11 (denoted as the first storage device 11), the prefetch scheduler 12 inputs the application's one or more access requests to the first storage device 11 (denoted as first access requests) into the prediction engine instance corresponding to the first storage device 11 (denoted as the first prediction engine instance). Correspondingly, the first prediction engine instance can output one or more prefetch requests (denoted as first prefetch requests).
- the prefetch scheduler 12 may determine the identity of the first storage device 11 (such as the IP address of the first storage device 11) as the destination address included in the first prefetch request.
- the first prediction engine instance may correspond to a pre-trained prediction model (denoted as the first prediction model).
- the first prediction model is used to predict one or more first prefetch requests based on one or more first access requests.
- For the way in which the prefetch scheduler 12 trains the first prediction model, or optimizes the parameters of the first prediction model, refer to Situation 1 above.
- The difference is that the training data here is training data of one storage device 11 for the first prediction model (denoted as the first training data).
- The first training data may include the first access requests made by the application to the first storage device 11 in a certain historical period (recorded as the third historical period), where the third historical period is before the first historical period; optionally, the duration of the third historical period is greater than the duration of the first historical period.
- In addition, the prefetch scheduler 12 can also manage multiple prediction engine instances for each storage device 11. Each prediction engine instance corresponds to its own access request queue, and each access request queue includes multiple access requests. Still taking the first storage device 11 as an example, when the prefetch scheduler 12 obtains multiple first access requests corresponding to the first storage device 11, it performs the following for each first access request: determine which access request queue of the first storage device 11 (denoted as the first access request queue) the first access request should be placed in, and then input the first access request queue into the first prediction engine instance corresponding to the first access request queue. Correspondingly, the first prediction engine instance obtains one or more first prefetch requests based on the multiple first access requests included in the first access request queue.
- Specifically, the prefetch scheduler 12 may allocate the first access request to one of the first access request queues based on the namespace of the file system corresponding to the access data requested by the first access request; or, the prefetch scheduler 12 may allocate the first access request to one of the first access request queues based on the directory, in the namespace of the file system, corresponding to the access data requested by the first access request.
- For example, the first storage device 11 includes file system A, and the first storage device 11 receives first access request a to first access request d, where first access request a and first access request b are used to access directory A1 in file system A, and first access request c and first access request d are used to access directory A2 in file system A.
- Then the prefetch scheduler 12 places first access request a and first access request b into the first access request queue corresponding to directory A1, and places first access request c and first access request d into the first access request queue corresponding to directory A2.
- Optionally, the prefetch scheduler 12 may carry the multiple access requests included in one access request queue in one bucket.
- A bucket is a container used to carry multiple access requests, and the multiple access requests in a bucket can be understood as an IO stream.
- An IO stream usually has certain common characteristics; that is, the multiple access requests in a bucket usually share some characteristics.
- For example, the IO stream in bucket 1 is used to request continuous data (such as video data) under directory A1 in file system A, while the IO stream in bucket 2 is used to request discontinuous data (such as web page data) under directory A2 in file system A.
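- The grouping of access requests into per-namespace or per-directory queues (buckets), each feeding its own prediction engine instance, could be sketched as follows; the .namespace and .directory attributes are assumed for illustration and are not fields defined elsewhere in this text.

```python
from collections import defaultdict

def assign_to_buckets(access_requests, by_directory=True):
    """Group access requests into buckets (access request queues) keyed by the
    file-system namespace, or by the directory within that namespace, so that
    each bucket can be fed to its own prediction engine instance."""
    buckets = defaultdict(list)
    for req in access_requests:
        key = (req.namespace, req.directory) if by_directory else req.namespace
        buckets[key].append(req)
    return buckets

# E.g. first access requests a, b (directory A1) and c, d (directory A2) end up in
# two buckets keyed ("file system A", "A1") and ("file system A", "A2").
```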
- each first prediction engine instance may correspond to a pre-trained first prediction model.
- The first prediction model is used to predict one or more first prefetch requests based on one or more first access requests in the first access request queue corresponding to the first prediction model.
- The way in which the prefetch scheduler 12 trains the first prediction model corresponding to each first prediction engine instance, or optimizes the parameters of that first prediction model, can also be understood with reference to Situation 1 above.
- The difference is that the training data here is the first training data for each first prediction engine instance of the first storage device 11. The first training data may include multiple first access requests issued by the application, in a certain historical period (recorded as the fourth historical period), to the namespace of a file system on the first storage device 11 or to a directory in that namespace, where the fourth historical period is before the first historical period; optionally, the duration of the fourth historical period is greater than the duration of the first historical period.
- the local data center 10 also includes a second storage device 11 , which is different from the first storage device 11 .
- the second storage device 11 and the first storage device 11 may be of the same type or different types.
- the second storage device 11 and the first storage device 11 may be produced by the same manufacturer or by different manufacturers.
- the prefetch scheduler 12 may also generate a second prefetch request.
- The destination address included in the second prefetch request is the identification of the second storage device 11, indicating that after the prefetch scheduler 12 obtains the prefetch data requested by the second prefetch request, that prefetch data is to be stored in the second storage device 11.
- Specifically, for the second storage device 11, the prefetch scheduler 12 may input one or multiple access requests (denoted as second access requests) issued by the application to the second storage device 11 into the prediction engine instance (denoted as the second prediction engine instance) corresponding to the second storage device 11; correspondingly, the second prediction engine instance may output one or more second prefetch requests.
- For details of a second prefetch request, refer to the description in Situation 2 above, replacing "first storage device 11" with "second storage device 11" and "first prefetch request" with "second prefetch request"; the remaining understanding is not repeated here.
- Step 402 The prefetch scheduler 12 determines the prefetch order of multiple prefetch requests based on the characteristics of the prefetch data included in each prefetch request.
- the prefetch scheduler 12 determines the prefetch order of multiple prefetch requests, which can be understood as the prefetch scheduler 12 optimizing the original order of the multiple prefetch requests to obtain an optimized prefetch order.
- the original order is, for example, the order arranged according to the generation time (or acquisition time) of the prefetch request.
- For example, the original order obtained by the prefetch scheduler 12 according to the generation time of the prefetch requests is prefetch request a, prefetch request b, prefetch request c, while the optimized prefetch order is prefetch request c, prefetch request b, prefetch request a.
- The prefetch scheduler 12 may determine the prefetch order of the multiple prefetch requests based on the characteristics of the prefetch data included in each of the multiple prefetch requests. In one example, the prefetch scheduler 12 determines the prefetch order of the multiple prefetch requests based on the expected access time of the prefetch data and the size of the prefetch data in each of the multiple prefetch requests. In yet another example, the prefetch scheduler 12 determines the prefetch order of the multiple prefetch requests based on the expected access time of the prefetch data, the size of the prefetch data, and the priority of the prefetch data included in each prefetch request.
- In specific implementation, the prefetch scheduler 12 may use a convex optimization algorithm to optimize the order of the multiple prefetch requests. Taking the above-mentioned prefetch request a, prefetch request b, and prefetch request c as an example, the prefetch scheduler 12 uses the convex optimization algorithm to combine, for each of prefetch request a, prefetch request b, and prefetch request c, the expected access time of the prefetch data, the size of the prefetch data, and the priority of the prefetch data, and thereby obtains the optimized order of the three prefetch requests, namely prefetch request c, prefetch request b, prefetch request a.
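- This application mentions a convex optimization algorithm but does not set out a particular formulation here; as a simplified, non-authoritative illustration, the sketch below orders prefetch requests greedily by slack (expected access time minus an assumed download duration), breaking ties by priority and download duration. The PrefetchRequest fields, the bandwidth value, and the scoring rule are all assumptions.

```python
from dataclasses import dataclass

@dataclass
class PrefetchRequest:
    name: str
    size_mb: float            # size of the prefetch data
    expected_access_s: float  # expected access time, seconds from now
    priority: int = 0         # larger value = more important

def prefetch_order(requests, bandwidth_mb_s=10.0):
    # Greedy stand-in for the convex-optimization step: serve urgent,
    # high-priority, quick-to-download requests first.
    def key(r):
        download_s = r.size_mb / bandwidth_mb_s
        slack = r.expected_access_s - download_s  # room left before the deadline
        return (slack, -r.priority, download_s)
    return sorted(requests, key=key)

reqs = [
    PrefetchRequest("a", size_mb=100, expected_access_s=300, priority=0),
    PrefetchRequest("b", size_mb=200, expected_access_s=120, priority=1),
    PrefetchRequest("c", size_mb=50,  expected_access_s=30,  priority=2),
]
print([r.name for r in prefetch_order(reqs)])  # ['c', 'b', 'a']
```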
- Step 403 The prefetch scheduler 12 processes multiple prefetch requests in sequence according to the prefetch order.
- Specifically, the prefetch scheduler 12 obtains the corresponding prefetch data from the remote data center 20 according to the prefetch order and the identification of the prefetch data included in each prefetch request, and sends the obtained prefetch data, according to the destination address included in each prefetch request, to the corresponding storage device 11 for storage.
- For example, the identifier of the prefetch data included in the first prefetch request obtained by the prefetch scheduler 12 is the identifier of the first data, and the destination address included in the first prefetch request is the identification of the first storage device 11; that is, the prefetch data requested by the first prefetch request is the first data.
- The prefetch scheduler 12 can obtain the first data from the file system of the remote data center 20 according to the identification of the first data, and send the obtained first data to the first storage device 11 for storage.
- Further, if the expected access time of the prefetch data included in the first prefetch request is a first time, the prefetch scheduler 12 may store the first data on the first storage device 11 before the first time. In this way, when the application accesses the first data in the first storage device 11, the first storage device 11 can provide the locally stored first data to the application.
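- A minimal sketch of the processing loop in step 403 follows; fetch_from_remote and the dictionary-based stand-ins for the remote data center 20 and the storage devices 11 are placeholder assumptions, not the interfaces of this application.

```python
def fetch_from_remote(remote_dc, data_id):
    # Placeholder: retrieve the prefetch data identified by data_id from the
    # remote data center. In practice this would be a network call.
    return remote_dc.get(data_id, b"")

def process_in_order(ordered_requests, remote_dc, storage_devices):
    """Process prefetch requests one by one in the determined prefetch order."""
    for req in ordered_requests:
        data = fetch_from_remote(remote_dc, req["data_id"])
        # The destination address in the request identifies the storage device
        # that should hold the prefetched data locally.
        storage_devices[req["dest"]][req["data_id"]] = data

remote_dc = {"first_data": b"\x00" * 16}
storage_devices = {"storage_1": {}}
ordered = [{"data_id": "first_data", "dest": "storage_1"}]
process_in_order(ordered, remote_dc, storage_devices)
print("first_data" in storage_devices["storage_1"])  # True
```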
- By optimizing the prefetch order of the multiple prefetch requests based on the size of the prefetch data in each prefetch request and the expected access time, the prefetch scheduler 12 helps to meet the access requirements of the application: when the application accesses a certain storage device 11 in the local data center 10, that storage device 11 already stores the data that the application needs to access, thereby reducing the network delay of the application's access. It also allows the prefetch scheduler 12 to obtain the prefetch data corresponding to the multiple prefetch requests from the remote data center 20 into the local data center 10 with the smallest overall waiting time (or the smallest average waiting time of the multiple prefetch requests).
- Further, the prefetch scheduler 12 can also take into account the priorities of the data in the multiple prefetch requests, which makes the prefetch order determined by the prefetch scheduler 12 more accurate and better aligned with business requirements, and helps the application obtain data from the storage device 11 within a duration and/or bandwidth that meets the application's service level agreement requirements.
- In addition, when sorting multiple prefetch requests, the prefetch scheduler 12 in this application sorts together the prefetch requests generated in the entire local data center 10, so the prefetch order is determined more accurately, and the prefetch data of all prefetch requests can, as much as possible, be obtained before being accessed by the application.
- The prefetch scheduling method provided by this application can be a dynamic scheduling process. It can be understood that, in the above steps 400 to 403, the prefetch scheduler 12 sorts M prefetch requests and, in the process of sequentially processing the M prefetch requests, receives a new prefetch request (denoted as the third prefetch request). At this time, the prefetch scheduler 12 has not finished processing the M prefetch requests; for example, the number of unprocessed prefetch requests is N (that is, the prefetch scheduler 12 has currently processed the first M-N of the M prefetch requests according to the optimized prefetch order), where M and N are integers, M is greater than 1, and N is greater than 0 and less than M.
- the above scheduling method also includes:
- Step 404 The prefetch scheduler 12 obtains the third prefetch request, and determines the prefetch order of the third prefetch request and the N prefetch requests that have not yet been processed according to the characteristics of the prefetch data included in the third prefetch request.
- the characteristics of the prefetched data in the third prefetch request specifically include the size of the prefetched data and the expected access time of the prefetched data.
- The characteristics of the prefetch data in the third prefetch request may also include the priority of the prefetch data; for details, refer to the description of the characteristics of the prefetch data in step 401 above.
- Denote as the fourth prefetch request the last prefetch request, in the prefetch order, among the N prefetch requests that have not yet been processed. The prefetch scheduler 12 can predict the time at which the prefetch data requested by the fourth prefetch request will be prefetched to the corresponding storage device 11 (recorded as the first prefetch time).
- The prefetch scheduler 12 may then determine a second prefetch time according to the first prefetch time and the size of the prefetch data included in the third prefetch request, where the second prefetch time is the time at which the prefetch data requested by the third prefetch request would be prefetched to the corresponding storage device 11 if the third prefetch request were arranged after the fourth prefetch request.
- For example, if the size of the prefetch data requested by the third prefetch request is 600M and the download speed is 10M/s, the prefetch scheduler 12 can determine that the duration (or download duration) needed to prefetch the prefetch data requested by the third prefetch request to the corresponding storage device 11 is 60s, and accordingly determine that, when the third prefetch request is arranged after the fourth prefetch request, the time at which the prefetch data requested by the third prefetch request is prefetched to the corresponding storage device 11 (i.e., the second prefetch time) is the first prefetch time plus 60s.
- the download speed of prefetched data is related to the network bandwidth.
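- The arithmetic above can be written out as a short sketch; the 600M size and 10M/s download speed are taken from the example, while the helper names and the assumed first prefetch time of 100s are illustrative only.

```python
def download_duration_s(size_mb, speed_mb_s):
    # Time needed to move the prefetch data across the network.
    return size_mb / speed_mb_s

def second_prefetch_time(first_prefetch_time_s, size_mb, speed_mb_s):
    # If the third prefetch request is placed after the fourth one, its data is
    # expected to arrive one download duration after the first prefetch time.
    return first_prefetch_time_s + download_duration_s(size_mb, speed_mb_s)

print(download_duration_s(600, 10))        # 60.0 seconds
print(second_prefetch_time(100, 600, 10))  # 160.0, assuming a first prefetch time of 100 s
```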
- Case 1: When the second prefetch time and the expected access time of the prefetch data included in the third prefetch request meet the preset condition, the prefetch scheduler 12 may arrange the third prefetch request after the fourth prefetch request.
- Case 2: When the second prefetch time and the expected access time of the prefetch data included in the third prefetch request do not meet the preset condition, the prefetch scheduler 12 re-determines, according to the characteristics of the prefetch data included in each request, the prefetch order of the third prefetch request and the N unprocessed prefetch requests, so that the third prefetch request and the N unprocessed prefetch requests all meet the preset condition.
- The preset condition is specifically that the prefetch time corresponding to a prefetch request is before the expected access time of the prefetch data in that prefetch request; or, the preset condition is specifically that the prefetch time corresponding to a prefetch request is after the expected access time of the prefetch data in that prefetch request and the time difference between the two is less than a threshold.
- the prefetch time corresponding to the prefetch request is the time to prefetch the prefetch data in the prefetch request to the corresponding storage device 11 .
- If, in the process of re-determining the prefetch order, it is impossible for the third prefetch request and the N unprocessed prefetch requests to all meet the preset condition, the prefetch scheduler 12 can first adjust the threshold according to a preset step size and then re-determine the prefetch order.
- For example, the threshold in the preset condition is 1ms, and in the process of re-determining the prefetch order the prefetch scheduler 12 cannot make the third prefetch request and the N unprocessed prefetch requests all meet the preset condition; the prefetch scheduler 12 can then adjust the threshold from 1ms to 2ms according to a preset step size (such as 1ms) and re-determine the prefetch order, repeating this until the third prefetch request and the N unprocessed prefetch requests all satisfy the preset condition.
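- The preset condition and the step-wise threshold adjustment can be pictured with the following sketch, assuming (as a simplification that is not the method of this application) that re-determining the prefetch order amounts to sorting the pending requests by expected access time; all field and helper names are illustrative.

```python
def meets_condition(prefetch_time, expected_access_time, threshold):
    # Case A: data arrives before it is expected to be accessed.
    # Case B: data arrives late, but by less than the threshold.
    return (prefetch_time <= expected_access_time or
            prefetch_time - expected_access_time < threshold)

def schedule_with_relaxation(pending, bandwidth, threshold, step, now=0.0):
    """Reorder pending requests; widen the threshold by `step` until feasible."""
    while True:
        order = sorted(pending, key=lambda r: r["expected_access_time"])
        t, ok = now, True
        for r in order:
            t += r["size"] / bandwidth              # projected finish time of this request
            if not meets_condition(t, r["expected_access_time"], threshold):
                ok = False
                break
        if ok:
            return order, threshold
        threshold += step                           # e.g. 1 ms -> 2 ms

pending = [
    {"size": 300, "expected_access_time": 40.0},
    {"size": 100, "expected_access_time": 15.0},
]
order, used_threshold = schedule_with_relaxation(pending, bandwidth=10.0,
                                                 threshold=0.001, step=0.001)
print([r["expected_access_time"] for r in order], used_threshold)
```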
- In addition, the prefetch scheduler 12 may obtain multiple third prefetch requests sequentially within a preset period, and the prefetch scheduler 12 may execute, for each of the multiple third prefetch requests in order of acquisition time: determine the prefetch order of that third prefetch request and the prefetch requests that have not yet been processed according to the characteristics of the prefetch data included in that third prefetch request.
- For example, the prefetch scheduler 12 sequentially acquires three third prefetch requests within a preset period; in order of acquisition they are third prefetch request 1, third prefetch request 2, and third prefetch request 3.
- The prefetch scheduler 12 first determines the prefetch order of third prefetch request 1 and the N prefetch requests that have not yet been processed.
- The prefetch scheduler 12 then determines the prefetch order of third prefetch request 2 and the N+1 prefetch requests that have not yet been processed (including the N prefetch requests and third prefetch request 1).
- The prefetch scheduler 12 then determines the prefetch order of third prefetch request 3 and the N+2 prefetch requests that have not yet been processed (including the N prefetch requests, third prefetch request 1, and third prefetch request 2).
- Alternatively, the obtained multiple third prefetch requests can also be placed, in order of acquisition time, after the N prefetch requests that have not yet been processed, and it is then determined whether the prefetch time corresponding to each third prefetch request meets the preset condition.
- If the prefetch time corresponding to each third prefetch request satisfies the preset condition, the multiple third prefetch requests are kept after the N prefetch requests that have not yet been processed, in order of acquisition time; if the prefetch time corresponding to one or more of the third prefetch requests does not meet the preset condition, then, according to the characteristics of the prefetch data included in each of the multiple third prefetch requests and in each of the N prefetch requests that have not yet been processed, the multiple third prefetch requests and the N unprocessed prefetch requests are reordered to obtain the prefetch order of the multiple third prefetch requests and the N unprocessed prefetch requests.
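- As a rough sketch of this batch variant, the function below first tries simply appending the newly acquired requests after the pending ones in order of acquisition time, and falls back to a full re-sort only when some appended request would miss the preset condition; the helper name, the request fields, and the fallback sort key are illustrative assumptions.

```python
def append_or_resort(pending, new_batch, bandwidth, threshold, now=0.0):
    """Append new requests in acquisition order if each still meets the preset
    condition at its projected prefetch time; otherwise re-sort everything."""
    t = now
    for r in pending:                       # pending requests keep their order
        t += r["size"] / bandwidth
    feasible = True
    for r in new_batch:                     # project arrival times of the new requests
        t += r["size"] / bandwidth
        late_by = t - r["expected_access_time"]
        if late_by > 0 and late_by >= threshold:
            feasible = False
            break
    if feasible:
        return pending + new_batch
    # Fallback: reorder all pending and new requests together.
    return sorted(pending + new_batch, key=lambda r: r["expected_access_time"])

pending = [{"size": 100, "expected_access_time": 15.0}]
batch = [{"size": 50, "expected_access_time": 20.0}]
print(append_or_resort(pending, batch, bandwidth=10.0, threshold=0.001))
```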
- It should be noted that the above two processes are described as if the prefetch scheduler 12 has N prefetch requests that it has not yet finished processing; however, this application does not exclude that, during these two processes, the prefetch scheduler 12 completes processing of one or more of the earlier prefetch requests in accordance with the current prefetch order.
- In addition, step 401 can also be a dynamic adjustment process: while the prefetch scheduler 12 is processing prefetch requests in sequence, it receives a new prefetch request; the new prefetch request and the currently unprocessed prefetch requests form multiple prefetch requests (i.e., the multiple prefetch requests in step 401), and the prefetch scheduler 12 reorders these multiple prefetch requests.
- the collector 124 may be used to obtain multiple access requests to multiple storage devices 11 in the local data center 10 by applications in the first historical period, and send the multiple access requests to the prediction engine manager 121 .
- The collector 124 can also, after collecting the application's access requests to the storage devices 11, further process the access requests and then send them to the prediction engine manager 121.
- For the specific implementation of how the collector 124 obtains multiple access requests, refer to the description in step 400 above; the collector 124 can be considered to perform step 400 above.
- The prediction engine manager 121 can be used to predict multiple prefetch requests based on the multiple access requests sent by the collector 124; for details, see the description in step 401 above. The prediction engine manager 121 can be considered to perform step 401 above. The prediction engine manager 121 may also send the predicted multiple prefetch requests to the prefetch optimizer 122.
- The prefetch optimizer 122 may be configured to determine the prefetch order of the multiple prefetch requests based on the characteristics of the prefetch data included in each of the multiple prefetch requests; the specific way of determining the prefetch order can be seen in the description of step 402. The prefetch optimizer 122 can be considered to perform step 402 above.
- The prefetch optimizer 122 may also instruct the prefetch executor 123 to obtain the prefetch data corresponding to each prefetch request from the remote data center 20 in sequence according to the prefetch order. Specifically, when the prefetch optimizer 122 determines, according to the prefetch order, a prefetch request that needs to be processed, it may send the identification of the prefetch data in that prefetch request to the prefetch executor 123, to instruct the prefetch executor 123 to request the prefetch data from the remote data center 20 according to the identification of the prefetch data.
- the prefetch executor 123 may request prefetch data from the remote data center 20 according to the instructions of the prefetch optimizer 122.
- The prefetch executor 123 can be considered to perform step 403 above.
- the prefetch sequence determined by the prefetch optimizer 122 is prefetch request c, prefetch request b, and prefetch request a.
- The prefetch optimizer 122 may instruct the prefetch executor 123 to obtain, from the remote data center 20 and in the prefetch order of prefetch request c, prefetch request b, prefetch request a, the prefetch data required by each prefetch request.
- It can be understood that the function of the prefetch executor 123 is mainly to obtain, based on the instructions of the prefetch optimizer 122, the prefetch data required by a certain prefetch request from the remote data center 20, and to store the requested prefetch data in the corresponding storage device 11.
- In addition, after instructing the prefetch executor 123 to obtain the prefetch data required by a certain prefetch request from the remote data center 20, the prefetch optimizer 122 may consider that prefetch request to have been executed. In this way, the prefetch optimizer 122 can determine which prefetch requests have been executed and which have not yet been executed. Therefore, if the prefetch scheduler 12 receives a new prefetch request (i.e., the third prefetch request) while it is processing multiple prefetch requests, it can dynamically adjust the prefetch order of the new prefetch request and the unprocessed prefetch requests according to the execution status of the current prefetch requests.
- For example, after the prediction engine manager 121 predicts a new third prefetch request, it sends the third prefetch request to the prefetch optimizer 122, and the prefetch optimizer 122 re-sorts the third prefetch request together with the prefetch requests that have not yet been processed. For details, refer to the description in step 404.
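- The division of labour among the collector 124, the prediction engine manager 121, the prefetch optimizer 122 and the prefetch executor 123 can be summarized with the toy pipeline below; every class body is a placeholder assumption (in particular the "prefetch the next block" prediction rule), not the implementation of this application.

```python
class Collector:
    def collect(self, access_requests):
        # Step 400: gather the applications' access requests (possibly pre-processed).
        return list(access_requests)

class PredictionEngineManager:
    def predict(self, access_requests):
        # Step 401: turn access requests into prefetch requests (placeholder rule:
        # prefetch the "next" piece of data after each accessed one).
        return [{"data_id": r["data_id"] + "_next",
                 "dest": r["storage"],
                 "size": r["size"],
                 "expected_access_time": r["time"] + 10} for r in access_requests]

class PrefetchOptimizer:
    def order(self, prefetch_requests):
        # Step 402: decide the prefetch order from the prefetch-data characteristics.
        return sorted(prefetch_requests, key=lambda p: p["expected_access_time"])

class PrefetchExecutor:
    def execute(self, prefetch_request, remote_dc, storage):
        # Step 403: fetch the data from the remote data center and store it locally.
        dest = storage.setdefault(prefetch_request["dest"], {})
        dest[prefetch_request["data_id"]] = remote_dc.get(prefetch_request["data_id"], b"")

collector, manager = Collector(), PredictionEngineManager()
optimizer, executor = PrefetchOptimizer(), PrefetchExecutor()
accesses = collector.collect([{"data_id": "blk1", "storage": "s1", "size": 64, "time": 0}])
ordered = optimizer.order(manager.predict(accesses))
storage = {}
for req in ordered:
    executor.execute(req, remote_dc={"blk1_next": b"data"}, storage=storage)
print(storage)  # {'s1': {'blk1_next': b'data'}}
```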
- Multiple prediction engine managers 121 may correspond to the same prefetch optimizer 122 and the same prefetch executor 123. Specifically, refer to the structural diagram of a prefetch scheduler exemplarily shown in Figure 5: the prefetch scheduler 12 deployed in the local data center 10 includes multiple prediction engine managers 121, one prefetch optimizer 122, and one prefetch executor 123.
- For descriptions of the local data center 10, the remote data center 20, the storage devices 11 in the local data center 10, and the applications running in the local data center 10 shown in Figure 5, refer to the description in the embodiment of Figure 1; for descriptions of the prefetch scheduler 12 in the local data center 10, and of the prediction engine manager 121, the prefetch optimizer 122, the prefetch executor 123 and the collector 124 in the prefetch scheduler 12, refer to the descriptions in the relevant embodiments of Figures 2 to 4.
- The prefetch optimizer 122 may receive prefetch requests from n prediction engine managers 121, and each prediction engine manager 121 may send one or more prefetch requests to the prefetch optimizer 122; that is, the prefetch optimizer 122 can obtain a total of m prefetch requests from the n prediction engine managers 121, where m is greater than or equal to n, and m and n are both positive integers.
- the prefetch optimizer 122 may determine the order of the m prefetch requests based on the characteristics of the prefetch data contained in each of the m prefetch requests.
- the m prefetch requests may correspond to the same or different remote data centers 20 .
- Correspondingly, the prefetch optimizer 122 sequentially instructs the prefetch executor 123, according to the order of the m prefetch requests, to obtain the data from the remote data center 20 corresponding to each prefetch request.
- For example, the prefetch optimizer 122 obtains a total of 4 prefetch requests (respectively denoted as prefetch request 1 to prefetch request 4) from multiple prediction engine managers 121, and the prefetch optimizer 122 determines the prefetch order according to the characteristics of the prefetch data in each request, for example prefetch request 4, prefetch request 3, prefetch request 2, prefetch request 1.
- The remote data center 20 corresponding to prefetch request 4 and prefetch request 3 is denoted as remote data center 201, the remote data center 20 corresponding to prefetch request 2 is denoted as remote data center 202, and the remote data center 20 corresponding to prefetch request 1 is denoted as remote data center 203.
- The prefetch optimizer 122 first instructs the prefetch executor 123 to obtain the data requested by prefetch request 4 from the remote data center 201; the prefetch optimizer 122 then instructs the prefetch executor 123 to obtain the data requested by prefetch request 3 from the remote data center 201, then the data requested by prefetch request 2 from the remote data center 202, and finally the data requested by prefetch request 1 from the remote data center 203.
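- The dispatch just described can be sketched as follows; the request fields and the dictionary stand-ins for the remote data centers 201 to 203 are assumptions for illustration.

```python
def dispatch(ordered_requests, remote_centers):
    """Fetch each request's data from the remote data center it designates."""
    fetched = []
    for req in ordered_requests:
        dc = remote_centers[req["remote_dc"]]
        fetched.append((req["name"], dc.get(req["data_id"])))
    return fetched

remote_centers = {
    "dc201": {"d4": b"4", "d3": b"3"},
    "dc202": {"d2": b"2"},
    "dc203": {"d1": b"1"},
}
order = [
    {"name": "prefetch request 4", "data_id": "d4", "remote_dc": "dc201"},
    {"name": "prefetch request 3", "data_id": "d3", "remote_dc": "dc201"},
    {"name": "prefetch request 2", "data_id": "d2", "remote_dc": "dc202"},
    {"name": "prefetch request 1", "data_id": "d1", "remote_dc": "dc203"},
]
print(dispatch(order, remote_centers))
```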
- this application exemplarily provides an interface protocol involved in the prefetch scheduling method.
- The interaction between the remote data center 20 and the local data center 10 may be based on the cloud data management interface (CDMI) protocol; specifically, the remote data center 20 may communicate with the prefetch executor 123 of the local data center 10 through the CDMI protocol.
- the collector 124 and the prediction engine manager 121 may also communicate through the CDMI protocol.
- The interaction between the prediction engine manager 121 and the prefetch optimizer 122 in the prefetch scheduler 12 may be based on the hypertext transfer protocol (HTTP).
- the prediction engine manager 121 includes a prefetch request information encapsulation interface based on the HTTP protocol
- the prefetch optimizer 122 includes a prefetch request information parsing interface based on the HTTP protocol.
- The prediction engine manager 121 uses the prefetch request information encapsulation interface to encapsulate information such as the identifier of the file system of the file to which the prefetch data belongs, the file identifier of the file to which the prefetch data belongs, the file offset of the prefetch data in the file to which it belongs, the size of the prefetch data, and the expected access time, so as to obtain the prefetch request.
- the prefetch optimizer 122 parses the prefetch request through the prefetch request information parsing interface, and then sorts the multiple prefetch requests.
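- The fields listed above can be pictured as a small payload exchanged between the prediction engine manager 121 and the prefetch optimizer 122 over HTTP; the use of JSON and the field names are assumptions for illustration, since this application only states that the interface is HTTP-based.

```python
import json

def encapsulate_prefetch_request(fs_id, file_id, offset, size, expected_access_time):
    # Prediction engine manager side: pack the prefetch-request information
    # into an HTTP request body (JSON is an assumed encoding).
    return json.dumps({
        "file_system_id": fs_id,
        "file_id": file_id,
        "offset": offset,
        "size": size,
        "expected_access_time": expected_access_time,
    })

def parse_prefetch_request(body):
    # Prefetch optimizer side: parse the body back into a prefetch request.
    return json.loads(body)

body = encapsulate_prefetch_request("fsA", "file42", offset=0, size=600 * 2**20,
                                    expected_access_time="2023-03-02T08:00:00Z")
print(parse_prefetch_request(body)["size"])
```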
- It should be noted that the interface protocols described above are only examples; the protocols for interaction between the local data center 10 and the remote data center 20, between the prefetch scheduler of the local data center 10 and other modules (such as storage devices), and between the sub-modules within the prefetch scheduler may also take other forms, which is not limited in this application.
- The prefetch scheduler 12 includes a processing module and a transceiver module. The processing module can be used to perform the processing functions of the prefetch scheduler 12 (such as one or more of the prediction engine manager 121, the prefetch optimizer 122, the prefetch executor 123, or the collector 124) in the above method embodiments; for example, the processing module may be used to obtain multiple prefetch requests and determine the prefetch order of the multiple prefetch requests based on the characteristics of the prefetch data included in each of the multiple prefetch requests.
- The transceiver module can be used to perform the transceiver functions of the prefetch scheduler 12 (such as one or more of the prediction engine manager 121, the prefetch optimizer 122, the prefetch executor 123, or the collector 124) in the above method embodiments; for example, the transceiver module can be used to obtain prefetch data from the remote data center 20 based on the multiple prefetch requests, and so on.
- Figure 6 schematically shows a prefetch scheduling device 600 provided by the embodiment of the present application.
- The device 600 shown in Figure 6 can be a hardware circuit implementation of the prefetch scheduler 12 in the above method embodiments. The device can be adapted to perform the functions of the prefetch scheduler 12 in the flow charts of the above method embodiments.
- For ease of explanation, only the main components of the device 600 are shown in Figure 6.
- the device 600 shown in Figure 6 includes a communication interface 610, a processor 620 and a memory 630, where the memory 630 is used to store program instructions and/or data.
- Processor 620 may cooperate with memory 630.
- Processor 620 may execute program instructions stored in memory 630.
- The processor 620 is used to perform the processing functions of the prefetch scheduler 12 (such as one or more of the prediction engine manager 121, the prefetch optimizer 122, the prefetch executor 123, or the collector 124) in the above method embodiments; for example, it is used to obtain multiple prefetch requests, determine the prefetch order of the multiple prefetch requests according to the characteristics of the prefetch data included in each prefetch request, and process the multiple prefetch requests in sequence according to the prefetch order.
- The communication interface 610 is used to perform the sending and receiving functions of the prefetch scheduler 12 (such as one or more of the prediction engine manager 121, the prefetch optimizer 122, the prefetch executor 123 or the collector 124) in the above method embodiments; for example, the obtained prefetch data is sent to the storage device corresponding to the prefetch data.
- Memory 630 and processor 620 are coupled.
- the coupling in the embodiment of this application is an indirect coupling or communication connection between devices, units or modules, which may be in electrical, mechanical or other forms, and is used for information interaction between devices, units or modules.
- At least one of the memories 630 may be included in the processor 620 .
- the communication interface may be a transceiver, a circuit, a bus, a module, or other types of communication interfaces.
- When the communication interface is a transceiver, the transceiver may include an independent receiver or an independent transmitter; it may also be a transceiver with integrated transceiver functions, or a communication interface.
- Apparatus 600 may also include communication lines 640.
- The communication interface 610, the processor 620 and the memory 630 can be connected to each other through the communication line 640; the communication line 640 can be a peripheral component interconnect (PCI) bus, an extended industry standard architecture (EISA) bus, or the like.
- the communication line 640 can be divided into an address bus, a data bus, a control bus, etc. For ease of presentation, only one thick line is used in Figure 6, but it does not mean that there is only one bus or one type of bus.
- this application provides a computer-readable storage medium.
- Computer programs or instructions are stored in the computer-readable storage medium; when the computer programs or instructions are executed by a device, the device is caused to perform the functions of the prefetch scheduler in the above method embodiments.
- the present application provides a computer program product.
- The computer program product includes a computer program or instructions; when the computer program or instructions are executed by a device, the device performs the functions of the prefetch scheduler in the above method embodiments.
Landscapes
- Engineering & Computer Science (AREA)
- Software Systems (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Memory System Of A Hierarchy Structure (AREA)
- Information Transfer Between Computers (AREA)
Abstract
A prefetch scheduling method and a prefetch scheduler are provided, which are used to reduce the network delay of an application in accessing data. The method may be executed by the prefetch scheduler and specifically comprises the following steps: the prefetch scheduler acquires a plurality of prefetch requests, each prefetch request comprising characteristics of prefetch data, the characteristics of the prefetch data comprising the size of the prefetch data and an expected access time of the prefetch data; the prefetch scheduler determines a prefetch order of the plurality of prefetch requests according to the characteristics of the prefetch data included in the prefetch requests; and the prefetch scheduler processes the plurality of prefetch requests in sequence according to the prefetch order.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210560367.3A CN117149449A (zh) | 2022-05-23 | 2022-05-23 | 一种预取调度方法及预取调度器 |
CN202210560367.3 | 2022-05-23 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2023226505A1 true WO2023226505A1 (fr) | 2023-11-30 |
Family
ID=88910595
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2023/079293 WO2023226505A1 (fr) | 2022-05-23 | 2023-03-02 | Procédé de planification de prélecture et planificateur de prélecture |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN117149449A (fr) |
WO (1) | WO2023226505A1 (fr) |
- 2022-05-23: priority application CN202210560367.3A filed in China, published as CN117149449A (status: pending)
- 2023-03-02: PCT application PCT/CN2023/079293 filed, published as WO2023226505A1 (status: unknown)
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20090199190A1 (en) * | 2008-02-01 | 2009-08-06 | Lei Chen | System and Method for Priority-Based Prefetch Requests Scheduling and Throttling |
CN102123318A (zh) * | 2010-12-17 | 2011-07-13 | 曙光信息产业(北京)有限公司 | 一种iptv应用的io加速方法 |
CN109446112A (zh) * | 2013-01-15 | 2019-03-08 | 美普思技术有限责任公司 | 用于预取流量的改进控制的方法和系统 |
US20170031823A1 (en) * | 2015-07-31 | 2017-02-02 | Oracle International Corporation | Systems and methods for prefetching data |
CN112256599A (zh) * | 2019-07-22 | 2021-01-22 | 华为技术有限公司 | 一种数据预取方法、装置及存储设备 |
Also Published As
Publication number | Publication date |
---|---|
CN117149449A (zh) | 2023-12-01 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
TWI620075B (zh) | 用於雲端巨量資料運算架構之伺服器及其雲端運算資源最佳化方法 | |
US20210342193A1 (en) | Multi-cluster container orchestration | |
US9998531B2 (en) | Computer-based, balanced provisioning and optimization of data transfer resources for products and services | |
US10389800B2 (en) | Minimizing execution time of a compute workload based on adaptive complexity estimation | |
US11119813B1 (en) | Mapreduce implementation using an on-demand network code execution system | |
US8918474B2 (en) | Determining priorities for cached objects to order the transfer of modifications of cached objects based on measured network bandwidth | |
US20160205039A1 (en) | Prediction-based provisioning planning for cloud environments | |
CN108762885B (zh) | 一种虚拟机创建方法、装置、管理设备及终端设备 | |
US10956214B2 (en) | Time frame bounded execution of computational algorithms | |
US11914894B2 (en) | Using scheduling tags in host compute commands to manage host compute task execution by a storage device in a storage system | |
WO2024016596A1 (fr) | Procédé et appareil de planification de grappe de conteneurs, dispositif et support d'enregistrement | |
US11144500B2 (en) | Assignment of data within file systems | |
US10824339B1 (en) | Snapshot-based garbage collection in an on-demand code execution system | |
US20170147400A1 (en) | Method, apparatus, and computer-readable medium for performing a data exchange | |
CN112486653A (zh) | 调度多类型计算资源的方法、装置和系统 | |
US20170371707A1 (en) | Data analysis in storage system | |
CN108228323B (zh) | 基于数据本地性的Hadoop任务调度方法及装置 | |
WO2023226505A1 (fr) | Procédé de planification de prélecture et planificateur de prélecture | |
US11388050B2 (en) | Accelerating machine learning and profiling over a network | |
WO2018089339A1 (fr) | Procédé et système d'équilibrage de charge avec affinité | |
KR102642396B1 (ko) | 제한된 gpu리소스를 사용한 딥러닝 추론 모델을 위한 배치 스케줄링 장치 | |
US20220171657A1 (en) | Dynamic workload tuning | |
US12026540B2 (en) | Working memory management | |
US20240177050A1 (en) | Neural network-based load balancing in distributed storage systems | |
CN115907031A (zh) | 一种业务处理方法、装置、设备及存储介质 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 23810581 Country of ref document: EP Kind code of ref document: A1 |