CN110795226B - Method for processing task using computer system, electronic device and storage medium


Info

Publication number
CN110795226B
CN110795226B (application CN202010003595.1A)
Authority
CN
China
Prior art keywords
storage
information
processed
task
computer system
Prior art date
Legal status
Active
Application number
CN202010003595.1A
Other languages
Chinese (zh)
Other versions
CN110795226A (en)
Inventor
Inventor not disclosed
Current Assignee
Cambricon Technologies Corp Ltd
Original Assignee
Cambricon Technologies Corp Ltd
Priority date
Filing date
Publication date
Application filed by Cambricon Technologies Corp Ltd
Priority to CN202010003595.1A
Publication of CN110795226A
Application granted
Publication of CN110795226B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/4881 Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
    • G06F3/0631 Configuration or reconfiguration of storage systems by allocating resources to storage systems
    • G06F3/064 Management of blocks
    • G06F3/0656 Data buffering arrangements
    • G06F9/5016 Allocation of resources to service a request, the resource being the memory
    • G06F9/5022 Mechanisms to release resources
    • G06F2209/5017 Task decomposition (indexing scheme relating to G06F9/50)

Abstract

A method, an electronic device, and a non-transitory computer-readable storage medium for processing a task using a computer system are provided. The electronic device comprises a processor and a memory; the memory stores a computer program which, when executed by the processor, causes the processor to perform the storage management method. Based on the idea of deep reinforcement learning, the method uses a depth estimation network to process tasks for a given system state; in the course of processing the tasks, storage resources are allocated to the data storage requests.

Description

Method for processing task using computer system, electronic device and storage medium
Technical Field
The present application relates to the field of computers, and more particularly, to a method, electronic device, and non-transitory computer-readable storage medium for processing tasks using a computer system.
Background
As computer technology has developed, more and more tasks can be handled by computer systems, such as large-scale computing, image analysis, and/or image processing. As a computer system takes on more and more tasks, it inevitably has to process computing tasks in parallel, and parallel processing inevitably raises the problem of data caching. Storage management is a common problem in computer systems: with the development of computer technology, a large amount of data needs to be stored or cached in the course of task execution and data processing.
How to store or cache data efficiently while a computer system processes tasks has therefore become an urgent problem to be solved.
Disclosure of Invention
Based on this, the present application provides a method for a computer system to process tasks with data cache management, the method comprising:
acquiring a task to be processed and configuration information of the task to be processed, wherein the configuration information comprises storage information and calculation information;
allocating, by using a pre-trained depth estimation network, storage resources to a storage request set of the task to be processed according to the storage information of the task to be processed and the currently available storage resource information of the computer system, wherein the currently available storage resource information is variable in size, and the pre-trained depth estimation network is a deep neural network used to represent an estimation function in a reinforcement learning algorithm; the deep neural network uses an experience replay method in which sampling and training are separated and off-policy training is used: the result of each sampling is put back into an experience pool, samples are drawn from the experience pool during training, and the samples can be reused;
scheduling the calculation data to the allocated storage resources according to the calculation information and performing the calculation,
wherein the configuration information further includes splitting information of the task, the splitting information being used to split the task to be processed into a plurality of subtasks, wherein the plurality of subtasks carry subtask configuration information, the subtask configuration information includes sub-storage information and sub-calculation information, and such a subtask is defined as a subtask to be processed,
when the computer system processes the plurality of to-be-processed subtasks, the following operations are carried out:
according to the sub-storage information of the subtasks to be processed and the currently available storage resource information of the computer system, allocating storage resources for the storage request set of the subtasks to be processed by utilizing a pre-trained depth estimation network;
and scheduling the calculation data to the allocated storage resources according to the sub-calculation information and calculating.
According to another aspect of the present application, there is provided an electronic device including:
a processor;
a memory storing a computer program which, when executed by the processor, causes the processor to perform the method as described above.
According to another aspect of the present application, there is provided a non-transitory computer readable storage medium having stored thereon computer readable instructions which, when executed by a processor, cause the processor to perform the method as described above.
The task processing method and the task processing device can, based on the idea of deep reinforcement learning, use the depth estimation network to process tasks for a given state of the system; in the course of processing the tasks, storage resources need to be allocated to the data storage requests. Since the available storage resources of the system change in real time and the data storage requests also change in real time, such a depth estimation network is well suited to handling the allocation and release of such variable-size storage blocks. Moreover, this approach can give a globally optimized storage allocation for known storage resources and storage requests, rather than a fixed-policy local optimization that considers only the request currently being processed and the current layout of the remaining memory space. This approach is particularly useful for data cache management at run time.
Drawings
In order to explain the technical solutions in the embodiments of the present application more clearly, the drawings needed for describing the embodiments are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present application; those skilled in the art can obtain other drawings from these drawings without creative effort.
FIG. 1 is a diagram illustrating a computer system architecture in one embodiment;
FIG. 2 is a schematic diagram of circuitry included in a processor of the computer system, according to one embodiment;
FIG. 3 is a schematic diagram of another embodiment of a circuit included in a processor of a computer system;
FIG. 4 illustrates a flow diagram of a method for processing tasks with a computer system according to one embodiment of the present application;
FIG. 5 illustrates a basic model diagram of reinforcement learning;
FIG. 6 illustrates a flow diagram for allocating storage resources for a set of storage requests for a pending task according to one embodiment of the present application;
FIG. 7 illustrates a schematic diagram of different hierarchical memory blocks in accordance with one embodiment of the present application;
FIG. 8 shows a schematic diagram of a task processing device according to an embodiment of the present application;
FIG. 9 shows a schematic diagram of an electronic device according to an embodiment of the application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some, but not all, embodiments of the present application. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
It should be understood that the terms "first", "second", etc. in the claims, description, and drawings of the present application are used for distinguishing between different objects and not for describing a particular order. The terms "comprises" and "comprising," when used in the specification and claims of this application, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It is also to be understood that the terminology used in the description of the present application herein is for the purpose of describing particular embodiments only, and is not intended to be limiting of the application. As used in the specification and claims of this application, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should be further understood that the term "and/or" as used in the specification and claims of this application refers to any and all possible combinations of one or more of the associated listed items and includes such combinations.
In the present application, an artificial intelligence processor, also referred to as a special-purpose processor, is a processor intended for a particular application or domain. For example, a Graphics Processing Unit (GPU), also called a display core, visual processor, or display chip, is a special-purpose processor dedicated to image computation on personal computers, workstations, game consoles, and some mobile devices (e.g., tablet computers, smart phones, etc.). As another example, a Neural Network Processor (NPU) is a special-purpose processor for matrix multiplication in the field of artificial intelligence; it adopts an architecture of "data-driven parallel computation" and is particularly good at processing massive multimedia data such as video and images.
The Deep Reinforcement Learning (DRL) described in the present application refers to an artificial intelligence method that combines the perception capability of Deep Learning with the decision-making capability of Reinforcement Learning, and is closer to the way humans think.
As shown in FIGS. 1-2, a block diagram of a computer system is presented for one embodiment, the computer system comprising: a task assigning device A100, at least one processor A200, and a memory A300. The task assigning device A100 is connected to the processor A200 and shares the memory A300. The task assigning device A100 is configured to receive a plurality of tasks in an application program run by the computer system and distribute the tasks to the processor A200 for task processing. Optionally, the memory A300 may be used to store application programs and various data, and to provide automatic, high-speed access to the programs or data during the operation of the computer system. Optionally, the memory A300 may contain caches for variable data storage and registers for scalar data storage.
Optionally, while running an application program, the computer system may temporarily store the tasks in the application program in the cache, and the task assigning device A100 may take the tasks out of the cache and distribute them to the processor A200 for processing, so as to obtain the processing results of the tasks. After the processing result of a task is obtained, it is generally written back into the cache. Optionally, the caches of the memory A300 store tasks using a queue structure. Optionally, the tasks may be stored in different task queues according to task type, with each type of task having at least one task queue.
Optionally, the processor may include a master processing circuit A101 and a plurality of slave processing circuits A102, as shown in FIG. 2. The plurality of slave processing circuits are arranged in an array, and each slave processing circuit is connected to the adjacent slave processing circuits. The master processing circuit is connected to k of the slave processing circuits; as shown in FIG. 2, the k slave processing circuits are the n slave processing circuits in the 1st row, the n slave processing circuits in the m-th row, and the m slave processing circuits in the 1st column, i.e., the k slave processing circuits are the slave processing circuits directly connected to the master processing circuit. The k slave processing circuits are used for forwarding data and instructions between the master processing circuit and the remaining slave processing circuits.
Alternatively, as shown in FIG. 3, the processor may also include a branch processing circuit A103. The specific connection structure is shown in FIG. 3: the master processing circuit A101 is connected to the branch processing circuit A103, and the branch processing circuit A103 is connected to the plurality of slave processing circuits A102. The branch processing circuit A103 is used for forwarding data or instructions between the master processing circuit A101 and the slave processing circuits A102.
The process of processing tasks with a computer system includes the allocation, scheduling, and computation of tasks, and the problem of storage management must be fully considered during task allocation. One conventional storage management approach appears in the register allocation scenario in a compiler. This approach assigns a large number of program variables to a limited set of registers so that memory is read and written as little as possible during program execution, thereby improving program efficiency. There are many register allocation strategies: local register allocation based on basic blocks within a function, global register allocation across a whole function, or even interprocedural register allocation based on the function call graph of the whole program. The goal of all these strategies, however, is the same: to reduce the swapping of data between registers and memory. There are also various algorithms for register allocation, such as the graph coloring algorithm and the linear-scan algorithm. A good register allocation strategy can greatly reduce the memory access frequency of a program and thereby improve its efficiency.
However, the storage unit handled by this storage management approach is fixed in size, i.e., management is at the granularity of a single register, so the approach is not suitable for the allocation and release of variable-size storage blocks. Moreover, graph coloring is a computationally hard problem, so graph-coloring register allocation has high time complexity and cannot be applied to storage management scenarios that are sensitive to response speed.
Another conventional storage management approach appears in the memory allocation management scenario of a computer operating system. For each process's successive dynamic memory allocation requests (malloc) and memory release requests (free), the operating system allocates and releases a correspondingly sized space in the computer's memory. The size of each block of space is not fixed; it depends on the size of the storage space required by the request. The storage management system needs to minimize memory fragmentation and waste during allocation to safeguard subsequent allocations. Common algorithms for memory management in a single-process environment include first fit, next fit (circular first fit), best fit, and worst fit. For example, the buddy system in Linux memory management uses a best-fit allocation algorithm in units of 2^i pages. For memory management in a multi-process environment, there are strategies such as fixed allocation with local replacement, variable allocation with global replacement, and variable allocation with local replacement.
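As a concrete illustration of the fixed-policy allocators named above, the following is a minimal first-fit sketch; it is an invented example for illustration only, not code from this application or from Linux.

```python
# Minimal first-fit allocator over a free list of (start, size) blocks.
# Hypothetical illustration of the conventional policies discussed above.
def first_fit(free_blocks, request_size):
    """Return (start, new_free_list), or (None, free_blocks) if nothing fits."""
    for i, (start, size) in enumerate(free_blocks):
        if size >= request_size:
            rest = free_blocks[:i] + free_blocks[i + 1:]
            if size > request_size:  # keep the remainder as a smaller free block
                rest.append((start + request_size, size - request_size))
            return start, rest
    return None, free_blocks  # fragmentation: no single block is large enough

free = [(0, 16), (64, 128), (256, 32)]
addr, free = first_fit(free, 100)  # allocates at 64; (164, 28) stays free
```

Best fit and worst fit differ only in which candidate block the loop selects; the fixed, purely local nature of the decision is exactly what the approach described below seeks to overcome.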
However, this storage management approach handles memory allocation and release requests as they arrive, and cannot optimize storage allocation for a known sequence of storage-block requests and releases. Moreover, its management of storage blocks is coarse-grained, so it cannot be applied to scenarios where on-chip storage resources are scarce and fine-grained allocation, release, and usage optimization are needed. In addition, this approach cannot handle multi-level storage management. Storage management issues thus affect the efficiency with which a computer processes tasks. FIG. 4 illustrates a flow diagram of a method for processing tasks with a computer system according to one embodiment of the present application. The method is described below, by way of example, as applied to the computer system of FIG. 1. As shown in FIG. 4, the method 100 may include steps S110, S120, and S130.
In step S110, the task to be processed and the configuration information of the task are acquired. The configuration information may include storage information and calculation information.
Specifically, in a computer system there are often multiple pending tasks (e.g., computing tasks) waiting to be executed in parallel; these pending tasks are usually tasks generated when an application runs locally, or tasks received from the outside through a network. While a pending task executes, it generates storage requests in order to cache intermediate data. The storage requests of the tasks to be executed within a certain time period form a storage request set. Each pending task may have configuration information: the storage information in the configuration information may represent task-related information such as the storage space that needs to be occupied during task execution, and the calculation information may represent information about the specific computational behavior of the task.
In step S120, according to the storage information of the to-be-processed task and the currently available storage resource information of the computer system, a preset depth estimation network is used to allocate a storage resource to the storage request set of the to-be-processed task.
At any time, the computer system has current state information s (state), which may include the currently available storage resource information S (source) and the storage information of the pending tasks (e.g., the storage request set D (demand)). For a computer system, the currently available storage resource information S represents the storage resources within the system that are currently unoccupied, i.e., the storage resources available for allocation to data storage requests. At a given time, S may be represented as a set S = {s1, s2, …, sn}, where s1, s2, …, sn denote different free resource blocks. Each resource block si comprises a contiguous section of physical memory space for storing data, e.g., a section from address XXX to address YYY on a certain memory. The storage information of the pending tasks, i.e., the storage request set D, may include the data storage requests of all tasks currently pending in the system and may be written as D = {d1, d2, …, dm}, where d1, d2, …, dm denote different data storage requests. S and D should satisfy sum(S) ≥ sum(D), i.e., the total of all available storage resources should be greater than or equal to the total of all data storage requests.
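For illustration, the state s = (S, D) described above can be represented with data structures along the following lines; all names here are assumptions made for this sketch, not identifiers from the application.

```python
from dataclasses import dataclass

@dataclass
class ResourceBlock:   # one free block si in S
    level: str         # storage hierarchy level, e.g. "core" or "offchip"
    start: int         # start address
    end: int           # end address (exclusive)

    @property
    def size(self) -> int:
        return self.end - self.start

@dataclass
class StorageRequest:  # one data storage request dj in D
    task_id: int
    size: int

def is_feasible(S, D) -> bool:
    """Necessary condition from the text: total free space >= total requested."""
    return sum(b.size for b in S) >= sum(r.size for r in D)
```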
Optionally, the depth estimation network is obtained through deep reinforcement learning training. A deep reinforcement learning algorithm combines deep learning from the artificial intelligence field with reinforcement learning.
FIG. 5 illustrates a basic model diagram of reinforcement learning. As shown in FIG. 5, Q-learning (QL) is an estimation-based reinforcement learning algorithm, where Q = Q(s, a), i.e., the expected benefit of taking action a in state s at a certain time; the environment feeds back a corresponding reward according to the action of the agent. The main idea of the algorithm is therefore to build a table (the Q-table) of Q values indexed by state and action, and then select the action that obtains the largest benefit according to the Q values. The Deep Q-Learning (DQN) technique is a variant of Q-learning and an algorithm for deep reinforcement learning. DQN differs from QL in that the QL algorithm learns a fixed two-dimensional table, each entry of which corresponds to the estimated reward value of an action in a certain state. The first core idea of DQN is to convert this two-dimensional table into an estimation function whose input is a state and an action and whose output is a value, i.e., the estimated reward value (if the value is negative, it can also be called a penalty value), equivalent to the value of an entry in QL. The second core idea is to use a Deep Neural Network (DNN) to represent this estimation function, i.e., the depth estimation network. The input to the network is a state and an action, and the output is an estimate of the reward value.
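For reference, the update that tabular Q-learning applies to each entry is the standard textbook rule (general background, not a formula specific to this application):

```latex
Q(s,a) \leftarrow Q(s,a) + \alpha \left[ r + \gamma \max_{a'} Q(s',a') - Q(s,a) \right]
```

where α is the learning rate, γ is the discount factor, r is the reward fed back by the environment, and s′ is the next state. DQN keeps the same target r + γ max_a′ Q(s′, a′) but represents Q by a deep network instead of a table.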
The advantage of DQN's deep estimation network is that deep learning has strong expressive power and a mature training methodology, so it can learn the estimation function well, far better than the Q-table of the QL algorithm. Another advantage is that, since the current state information s of the system is of variable length, the storage management method can be implemented with DQN rather than QL, i.e., a deep neural network (e.g., a recurrent neural network such as an LSTM or GRU network) can be used to learn the estimation function; training can use random initial resource sets and storage request sets, or initial resource sets and storage request sets captured from real software behavior. Using a recurrent neural network as the deep neural network also avoids the problem that the state and action spaces are so large that a Q-table cannot represent them. Traditional QL training requires repeated sampling and has high time complexity, whereas the DQN training method uses experience replay: sampling and training are separated, off-policy training is used, the result of each sampling is put back into an experience pool, samples are drawn from the experience pool during training, and samples can be reused, which greatly improves training efficiency. Moreover, randomly drawn samples reduce the correlation among training samples, making the updates of the neural network parameters more efficient. In addition, DQN training introduces, besides the estimation network (i.e., the DNN approximating the estimation function), a target network with the same structure but different parameters; the parameters of the target network are updated with a delay, and the target network independently computes the temporal-difference target, which breaks correlations during training.
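A minimal, self-contained sketch of such a training loop is shown below, using PyTorch for concreteness. The toy state/action sizes, network shape, and hyperparameters are all illustrative assumptions; the text above suggests a recurrent network for variable-length states, while this sketch uses a small feed-forward network for brevity.

```python
import random
from collections import deque
import torch
import torch.nn as nn

STATE_DIM, N_ACTIONS = 16, 4   # toy sizes, assumptions for this sketch

q_net = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(), nn.Linear(64, N_ACTIONS))
target_net = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(), nn.Linear(64, N_ACTIONS))
target_net.load_state_dict(q_net.state_dict())  # same structure, delayed parameters

optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)
replay_pool = deque(maxlen=100_000)             # the "experience pool"

def store(s, a, r, s_next, done):
    """Sampling is decoupled from training: every transition goes into the pool."""
    replay_pool.append((s, a, r, s_next, done))

def train_step(batch_size=32, gamma=0.99):
    batch = random.sample(replay_pool, batch_size)   # samples can be reused later
    s, a, r, s2, done = zip(*batch)
    s, s2 = torch.tensor(s), torch.tensor(s2)
    a, r = torch.tensor(a), torch.tensor(r)
    done = torch.tensor(done, dtype=torch.float32)
    q = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
    with torch.no_grad():                            # the target network supplies
        y = r + gamma * (1 - done) * target_net(s2).max(1).values  # the TD target
    loss = nn.functional.mse_loss(q, y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

def sync_target():
    """Delayed update of the target network's parameters."""
    target_net.load_state_dict(q_net.state_dict())
```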
Based on the above description, at a given time, when the system is in a certain current state (i.e., a certain set of available storage resources together with a certain set of data storage requests), a preset depth estimation network based on a deep reinforcement learning algorithm (the network is trained in advance) can be used to determine the action to take, so as to allocate storage resources for the storage request set D of the pending tasks.
In step S130, the calculation data is scheduled to the allocated storage resource according to the calculation information and calculation is performed. When a task to be processed needs to be executed, the calculation data needs to be scheduled to the allocated storage resources according to the calculation information, and corresponding calculation processing is performed to complete the task. The scheduling and calculation process in step S130 may be performed in a known manner.
In addition, after step S130 is completed, it can be determined whether all pending tasks in the computer system have been completed. In computer systems, pending tasks emerge continually, each with data storage requirements, and these constitute distinct storage request sets. If all pending tasks have finished executing, the task processing procedure ends. If not, the method returns to step S110 and repeats steps S110 to S130 until all pending tasks have finished executing.
When the method returns to step S110, the pending tasks and their configuration information in the system are obtained again; then, in step S120, the current state information is known to have become s′, based on the available storage resource information S′ left after the previous allocation and the storage information of the pending tasks (the next storage request set D′). When storage resources have been allocated to all data storage requests in the set D, the available storage resources of the system change, i.e., from S to S′, and in the next time slice all data storage requests will constitute the next storage request set D′ of the pending tasks. The current state information of the system can accordingly be updated to s′. To avoid confusion, note that in this application, s denotes the current state information of the computer system, S denotes the currently available storage resource information, and D denotes the storage request set of the current pending tasks, where s comprises S and D. Subsequently, in step S130, the calculation data is scheduled to the allocated storage resources according to the calculation information and the calculation is performed.
Therefore, based on the idea of deep reinforcement learning, a depth estimation network can be used to process tasks for a given state of the system, and in the course of processing the tasks, storage resources need to be allocated to the data storage requests. Since the available storage resources of the system change in real time and the data storage requests also change in real time, such a depth estimation network is well suited to handling the allocation and release of such variable-size storage blocks. Moreover, this approach can give a globally optimized storage allocation for known storage resources and storage requests, rather than a fixed-policy local optimization that considers only the request currently being processed and the current layout of the remaining memory space. This approach is particularly useful for data cache management at run time.
Moreover, the prior-art solutions allocate and release storage resources in real time, processing one request at a time, i.e., performing only a fixed-policy local optimization over the current request and the current layout of the remaining memory space, which is very limiting. If, however, the system knows the request and release sequence of all storage blocks in advance, global optimization can be achieved.
According to an embodiment of the present application, the configuration information of the pending task may include splitting information of the task, and the pending task may be split into a plurality of subtasks according to the splitting information. For example, a pending task may consist of multiple subtasks, which may be related serially or in parallel. In this embodiment, the pending task may be split into a plurality of subtasks for execution according to the splitting information in the configuration information. In addition, each subtask may carry subtask configuration information; such a subtask is defined as a pending subtask.
The computer system may perform processing with reference to operations similar to the above-described steps S110 to S130 when processing the to-be-processed subtasks. For example, the computer system may obtain the to-be-processed subtask and the sub-configuration information of the to-be-processed subtask (where the sub-configuration information includes sub-storage information and sub-computation information); according to the sub-storage information of the subtasks to be processed and the currently available storage resource information of the computer system, allocating storage resources for the storage request set of the subtasks to be processed by using a preset depth estimation network; and scheduling the calculation data to the allocated storage resources according to the sub-calculation information and performing calculation.
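The following sketch illustrates this recursive treatment of subtasks; the Task class and the helper calls are assumptions invented for the illustration and do not name anything in this application.

```python
from dataclasses import dataclass, field

@dataclass
class Task:
    storage_info: dict
    compute_info: dict
    split_info: list = field(default_factory=list)  # per-subtask configurations

def process(task, system):
    if task.split_info:                  # splitting information present:
        for sub in task.split_info:      # handle each pending subtask with the
            process(Task(sub["storage"], sub["compute"]), system)  # same flow
        return
    blocks = system.allocate(task.storage_info)   # as in step S120
    system.compute(task.compute_info, blocks)     # as in step S130
```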
FIG. 6 illustrates a flow diagram for allocating storage resources for the storage request set of a pending task according to one embodiment of the present application. As shown in FIG. 6, step S120 may include sub-steps S131, S132, and S133. In sub-step S131, an action that meets a preset condition with respect to the storage information of the pending tasks and the currently available storage resource information of the computer system is selected from a preset action set A through the depth estimation network (for example, the action meeting the preset condition is the action a with the highest score), and the selected action is executed. As described above, the input to the depth estimation network is a state and an action, and the output is an estimate of the reward value; thus, for the current state of the system, the corresponding action with the highest score can be found through the depth estimation network and executed. Sub-step S131 may therefore include: determining the scores of the actions in the preset action set A relative to the current state information, and selecting the action with the highest score for the current state information. The preset action set A is described in detail below.
Subsequently, in sub-step S132, it is confirmed whether allocation for the storage request set D of the pending tasks has been completed. If all data storage requests in the storage request set D have been allocated after the action performed in sub-step S131, step S120 ends. Otherwise, if allocation is not complete, the method proceeds to sub-step S133.
In sub-step S133, the storage information of the system's pending tasks and the currently available storage resource information are updated. If the storage request set D of the pending tasks has not been fully allocated, the system may have performed an action such as storage resource consolidation, which changes the currently available storage resources S; the current state information must therefore be updated. The method then returns to sub-step S131, and the process repeats until all data storage requests in the storage request set have been allocated.
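A self-contained toy version of this S131-S133 loop is sketched below. The scoring function passed in stands in for the trained depth estimation network, and the two-action set (allocate, consolidate) is a deliberately reduced assumption for illustration.

```python
def merge_adjacent(blocks):
    """Toy intra-hierarchy consolidation: coalesce touching free blocks."""
    merged = []
    for start, size in sorted(blocks):
        if merged and merged[-1][0] + merged[-1][1] == start:
            merged[-1] = (merged[-1][0], merged[-1][1] + size)
        else:
            merged.append((start, size))
    return merged

def allocate_loop(free_blocks, requests, score):
    """free_blocks: list of (start, size); requests: list of sizes;
    score: stand-in for the depth estimation network's Q(s, a)."""
    pending = list(requests)
    while pending:                                     # S132: fully allocated yet?
        actions = [("alloc", i, j)                     # feasible placements
                   for i, (_, size) in enumerate(free_blocks)
                   for j, need in enumerate(pending) if size >= need]
        actions.append(("consolidate",))               # always available
        best = max(actions, key=score)                 # S131: highest-scoring action
        if best[0] == "alloc":
            _, i, j = best
            start, size = free_blocks.pop(i)
            need = pending.pop(j)
            if size > need:                            # shrink the free block
                free_blocks.append((start + need, size - need))
        else:                                          # S133: state changed, loop
            free_blocks = merge_adjacent(free_blocks)
    return free_blocks

# A trivial score that always prefers allocation when one is feasible:
remaining = allocate_loop([(0, 30), (30, 30), (100, 50)], [40, 20],
                          score=lambda a: 1.0 if a[0] == "alloc" else 0.0)
```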
Therefore, for any current state of the system, the depth estimation network of the deep reinforcement learning algorithm can provide the action with the highest score from the preset action set; this action is the result of global rather than local optimization, which benefits the optimization of the system's storage management.
According to one embodiment of the present application, a storage resource of the system may include storage blocks at one or more hierarchy levels. For example, the hierarchy of storage blocks from high to low may include: an in-core memory block located on a core in a processor, a shared memory block located between multiple cores in a processor, a common memory block located in a processor, and an off-chip memory block located outside a processor. According to one embodiment, a processor described herein may be a multi-core processor, an artificial intelligence processor, or the like, which may have multiple cores and one or more of an in-core memory block, an inter-core shared memory block, a common memory block, and an off-chip memory block.
FIG. 7 shows a schematic diagram of storage blocks at different hierarchy levels according to one embodiment of the present application. As shown in FIG. 7, when a pending task is executed (i.e., during data operation processing), the in-core memory block 202 (e.g., RAM) on each core 201 of the processor 200 has the fastest access speed, and therefore the in-core memory block 202 has the highest hierarchy level. Next, a cluster 210 of the processor 200 is provided with a memory block 212 shared by multiple cores; the shared memory block 212 also has a high access speed, second only to the in-core memory block 202, and therefore occupies the level just below it. Further, the processor 200 as a whole may have a common memory 220 shared by multiple clusters 210; the memory blocks in the common memory 220 are accessed at a lower speed and sit at a lower level. Finally, outside the processor 200, an off-chip memory 300 (e.g., DDR memory) may also be provided; its memory blocks are at the lowest level because the access speed of the off-chip memory 300 is relatively low. From high levels to low levels, the storage capacity of the blocks gradually increases, the unit storage cost decreases, and the access speed gradually decreases. The present method and device can handle this multi-level storage management problem, i.e., unified storage management can be carried out over the multi-level storage resources.
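The hierarchy of FIG. 7 can be summarized as in the following sketch; the relative-speed and capacity entries are qualitative placeholders for illustration, not figures from this application.

```python
# Storage hierarchy of FIG. 7, highest level first; entries are qualitative.
STORAGE_LEVELS = [
    # (level name,                 relative speed, relative capacity)
    ("in-core memory block 202",   "fastest",      "smallest"),
    ("shared memory block 212",    "fast",         "small"),
    ("common memory 220",          "slower",       "larger"),
    ("off-chip memory 300 (DDR)",  "slowest",      "largest"),
]

for name, speed, capacity in STORAGE_LEVELS:
    print(f"{name}: speed={speed}, capacity={capacity}")
```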
According to one embodiment of the present application, the storage resource information S may include the location of a storage block and the start address and end address of the storage block. As described above, the storage resources in the system may be located at different hierarchy levels, and the storage resource information may include the location of each memory block (e.g., in in-core storage, inter-core shared storage, on-chip common storage, or off-chip storage) together with the start address and end address of each memory block.
According to an embodiment of the present application, the preset action set may include at least one of the following actions: storage resource allocation, storage resource consolidation within a hierarchy, and storage data migration across hierarchies. That is, one of the set of actions described above may be performed while the system is in a certain state.
(1) Storage resource allocation: the available storage resources S are directly allocated to the data storage requests in the storage request set D. The specific allocation process maps D = {d1, d2, …, dm} onto S = {s1, s2, …, sn}, from storage requests to storage resources, and the concrete allocation or mapping method may be any known suitable method. After one allocation, the available storage resources of the system become S′, and the free space of some storage resources is reduced or split. For example, consider a storage block with start address 0 and end address 100: if data is allocated at addresses 0-20, the block shrinks; if data is allocated at addresses 20-40, the block is split into two blocks, 0-20 and 40-100. The next time slice in the sequence brings a new storage request set D′, and the current state of the system can be updated based on S′ and D′.
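The shrink/split behaviour in this example can be written directly as a small function; this is a minimal sketch of the described effect, with invented names.

```python
def carve(block, alloc_start, alloc_end):
    """Allocate [alloc_start, alloc_end) out of a free block [start, end);
    return the remaining free block(s)."""
    start, end = block
    assert start <= alloc_start < alloc_end <= end
    remainder = []
    if start < alloc_start:
        remainder.append((start, alloc_start))
    if alloc_end < end:
        remainder.append((alloc_end, end))
    return remainder

carve((0, 100), 0, 20)    # -> [(20, 100)]           the block shrinks
carve((0, 100), 20, 40)   # -> [(0, 20), (40, 100)]  the block splits in two
```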
(2) Intra-hierarchy storage resource consolidation: in some system states, the storage resources within the same hierarchy level need to be consolidated, i.e., data stored in some storage blocks is moved to other storage resources in the same level so as to reduce storage fragmentation. After this action, the available storage resources of the system become S′, but the pending storage request set D remains unchanged because no storage resources were allocated.
(3) Cross-hierarchy storage data migration: in some system states, data stored on a higher-level storage resource may be swapped out to a lower-level storage resource, or data stored on a lower-level storage resource may be swapped in to a higher-level storage resource. When higher-level storage resources are insufficient, part of the data can be swapped out to the larger space of a lower level; when higher-level storage resources are idle, data in lower-level storage resources can be swapped in to the higher level. After this action, the available storage resources of the system become S′, but the pending storage request set D remains unchanged because no storage resources were allocated.
According to one embodiment of the present application, the scores of the actions in the preset action set relative to the current state information may include the following rewards and/or penalties (reward > 0, penalty < 0); a minimal code sketch follows this list:
(1) A reward, positively correlated with the size of the storage block, obtained each time a storage block is successfully allocated: successfully allocating a block earns a reward corresponding to its size, which encourages storing the requested data into free storage resources more quickly.
(2) A first preset reward obtained each time a storage request set is fully allocated: completing the allocation of one storage request set earns a preset reward, which encourages moving on to the storage request set of the next time slice and improves the efficiency of storage management.
(3) A second preset reward obtained when the storage request sets of all pending tasks have been allocated: completing all storage request sets earns a preset reward, which encourages finishing the entire sequence of storage request sets as soon as possible.
(4) A penalty, positively correlated with the size of the consolidated storage blocks, incurred each time intra-hierarchy storage resources are consolidated: the system does not encourage consolidation when it is unnecessary, so it is penalized; this action is chosen only when consolidation is unavoidable (e.g., storage within the level is too fragmented) or in order to obtain a higher reward in the next step or steps, thereby achieving global rather than local optimization.
(5) A penalty, positively correlated with the size of the migrated data, incurred each time cross-hierarchy storage data migration is performed: the system does not encourage migration when it is unnecessary, so it is penalized; this action is chosen only when migration is unavoidable (e.g., storage resources within a level are insufficient, or higher-level storage resources are too idle) or in order to obtain a higher reward in the next step or steps, thereby achieving global rather than local optimization.
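The sketch below renders this five-part scheme as a single reward function; every coefficient is an invented placeholder, since the application does not specify magnitudes.

```python
def reward(event: str, size: int = 0) -> float:
    """Reward shaping for the five cases above; coefficients are placeholders."""
    if event == "block_allocated":    # (1) grows with the allocated block size
        return 0.01 * size
    if event == "request_set_done":   # (2) first preset reward
        return 10.0
    if event == "all_sets_done":      # (3) second preset reward
        return 100.0
    if event == "consolidated":       # (4) penalty grows with bytes moved
        return -0.05 * size
    if event == "migrated":           # (5) penalty grows with bytes migrated
        return -0.02 * size
    raise ValueError(f"unknown event: {event}")
```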
According to one embodiment of the application, the storage request sets of the pending tasks, ordered in time, form a storage request set sequence, and the intersection of any two adjacent storage request sets in the sequence is not an empty set. For example, in the pending storage request set sequence, two adjacent storage request sets are the request sets Di and Di+1 on adjacent time slices. Compared with Di, the request set Di+1 adds some newly arising storage requests and releases some others, but the intersection of the two cannot be empty. If the intersection were empty, the two request sets Di and Di+1 would be completely unrelated, their storage management would also be completely unrelated, and they could not be considered request sets of the same sequence.
FIG. 8 shows a schematic diagram of a task processing device according to an embodiment of the present application. As shown in fig. 8, the apparatus 400 may include an obtaining unit 410, a resource allocating unit 420, and a scheduling calculating unit 430. The obtaining unit 410 obtains a task to be processed and configuration information of the task to be processed, where the configuration information includes storage information and calculation information. The resource allocation unit 420 allocates storage resources to the storage request set of the to-be-processed task by using a preset depth estimation network according to the storage information of the to-be-processed task and the currently available storage resource information of the computer system. The scheduling calculation unit 430 schedules the calculation data to the allocated storage resource according to the calculation information and performs calculation.
According to an embodiment of the present application, the resource allocation unit 420 is specifically operable to: selecting actions which correspond to preset conditions and correspond to the storage information of the tasks to be processed and the currently available storage resource information of the computer system from a preset action set through the depth estimation network, and executing the selected actions; after the selected action is executed, confirming whether the storage request set of the task to be processed is distributed and completed; and if the allocation is not completed, updating the storage information of the to-be-processed task and the currently available storage resource information of the computer system and repeatedly executing the steps until the allocation of the storage request set of the to-be-processed task is completed.
According to one embodiment of the present application, the storage resource may include at least one hierarchy of storage blocks. For example, a hierarchy of memory blocks from high to low may include: an in-core memory block located on a core in a processor; a shared memory block located between a plurality of cores in the processor; a common memory block located in the processor; and/or an off-chip memory block located outside of the processor.
According to an embodiment of the present application, the storage resource information may include a location of the storage block and a start address and an end address of the storage block.
According to an embodiment of the present application, the preset action set may include at least one of the following actions: storage resource allocation; intra-hierarchy storage resource consolidation; and cross-hierarchy storage data migration.
According to an embodiment of the present application, selecting, by the depth estimation network, an action meeting a preset condition corresponding to the storage information of the task to be processed and the currently available storage resource information of the computer system from a preset action set includes: determining scores of the actions in the preset action set relative to the storage information of the task to be processed and the currently available storage resource information of the computer system; and selecting the action with the highest score corresponding to the storage information of the task to be processed and the currently available storage resource information of the computer system.
According to an embodiment of the present application, the scores of the actions in the preset action set relative to the storage information of the task to be processed and the currently available storage resource information of the computer system may include: a reward, positively correlated with the size of the storage block, obtained each time a storage block is successfully allocated; a first preset reward obtained each time a storage request set is fully allocated; a second preset reward obtained when the storage request sets of all pending tasks have been allocated; a penalty, positively correlated with the size of the consolidated storage blocks, incurred each time intra-hierarchy storage resources are consolidated; and/or a penalty, positively correlated with the size of the migrated data, incurred each time cross-hierarchy storage data migration is performed.
According to one embodiment of the application, the storage request sets of all the tasks to be processed in time sequence form a storage request set sequence, and the intersection of any two adjacent storage request sets in the storage request set sequence is not an empty set.
According to one embodiment of the application, the depth estimation network is obtained according to deep reinforcement learning training.
According to one embodiment of the present application, the configuration information of the pending task may further include priority information of the task. The priority information may characterize the priority of the task by a value; for example, the priority may be set from 1 to 10, with a higher value indicating a higher priority, or alternatively with a smaller value indicating a higher priority. When the computer system processes the pending tasks, it can then process them in order of priority from high to low according to the priority information of each task.
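Under the first of the two conventions just mentioned (a larger value means higher priority), the dispatch order can be obtained with a simple sort; the task records and the dispatch stub below are invented for illustration.

```python
def dispatch(task):
    print("processing task", task["id"])   # stand-in for steps S110-S130

tasks = [{"id": 1, "priority": 3},
         {"id": 2, "priority": 9},
         {"id": 3, "priority": 5}]

for task in sorted(tasks, key=lambda t: t["priority"], reverse=True):
    dispatch(task)   # runs task 2, then task 3, then task 1
```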
FIG. 9 shows a schematic diagram of an electronic device according to an embodiment of the application. As shown in fig. 9, the electronic device 500 may include a processor 510 and a memory 530. The memory 530 stores a computer program. The computer program stored in the memory 530, when executed by the processor 510, can cause the processor 510 to perform a task processing method as described in any of the above embodiments.
According to another aspect of the present application, there is provided a non-transitory computer-readable storage medium having stored thereon computer-readable instructions, which, when executed by a processor, can cause the processor to execute a task processing method according to any one of the above embodiments.
It will be appreciated that the above described apparatus embodiments are merely illustrative and that the apparatus of the present application may be implemented in other ways. For example, the division of the units/modules in the above embodiments is only one logical function division, and there may be another division manner in actual implementation. For example, multiple units, modules, or components may be combined, or may be integrated into another system, or some features may be omitted, or not implemented.
The units or modules described as separate parts may or may not be physically separate. A component described as a unit or a module may or may not be a physical unit, and may be located in one apparatus or may be distributed over a plurality of apparatuses. The scheme of the embodiment in the application can be implemented by selecting some or all of the units according to actual needs.
In addition, unless otherwise specified, each functional unit/module in the embodiments of the present application may be integrated into one unit/module, or each unit/module may exist alone physically, or two or more units/modules may be integrated together. The integrated units/modules may be implemented in the form of hardware or software program modules.
If the integrated unit/module is implemented in hardware, the hardware may be digital circuits, analog circuits, and so on. Physical implementations of the hardware structures include, but are not limited to, transistors, memristors, and the like. Unless otherwise specified, the processor may be any suitable hardware processor, such as a CPU, GPU, FPGA, DSP, or ASIC. Unless otherwise specified, the memory unit may be any suitable magnetic or magneto-optical storage medium, such as resistive random access memory (RRAM), dynamic random access memory (DRAM), static random access memory (SRAM), enhanced dynamic random access memory (eDRAM), high-bandwidth memory (HBM), or hybrid memory cube (HMC).
The integrated units/modules, if implemented in the form of software program modules and sold or used as a stand-alone product, may be stored in a computer-readable memory. Based on this understanding, the technical solution of the present application, in essence or in the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a memory and including several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the methods described in the embodiments of the present application. The aforementioned memory includes various media capable of storing program code, such as a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, or an optical disk.
In the foregoing embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments. The technical features of the embodiments may be arbitrarily combined, and for the sake of brevity, all possible combinations of the technical features in the embodiments are not described, but should be considered as the scope of the present specification as long as there is no contradiction between the combinations of the technical features.
The foregoing detailed description of the embodiments of the present application illustrates the principles and implementations of the present application; the description of the embodiments is intended only to aid understanding of the methods of the present application and their core ideas. Meanwhile, those of ordinary skill in the art may, following the ideas of the present application, make changes to the specific implementations and application scope. In view of the above, the contents of this specification should not be construed as limiting the present application.

Claims (10)

1. A method for processing tasks for data cache management by a computer system, the method comprising:
acquiring a task to be processed and configuration information of the task to be processed, wherein the configuration information comprises storage information and calculation information;
allocating storage resources to a storage request set of the task to be processed by using a pre-trained depth estimation network, according to the storage information of the task to be processed and currently available storage resource information of the computer system, wherein the currently available storage resource information is variable in size, and the pre-trained depth estimation network is a deep neural network used to represent the estimation function in a reinforcement learning algorithm; the deep neural network uses an experience replay method in which sampling and training are separated and training is performed off-policy: the result of each sampling is put back into an experience pool, samples are drawn from the experience pool during training, and samples can be reused;
scheduling the calculation data onto the allocated storage resources according to the calculation information and performing the calculation,
wherein the configuration information further comprises splitting information of the task, the splitting information being used for splitting the task to be processed into a plurality of subtasks, each of the plurality of subtasks having subtask configuration information that comprises sub-storage information and sub-calculation information, each subtask being defined as a subtask to be processed,
wherein, when the computer system processes the plurality of subtasks to be processed, the following operations are performed:
allocating storage resources to the storage request set of each subtask to be processed by using the pre-trained depth estimation network, according to the sub-storage information of the subtask to be processed and the currently available storage resource information of the computer system; and
scheduling the calculation data onto the allocated storage resources according to the sub-calculation information and performing the calculation.
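The distinguishing feature of claim 1 is a value-estimation ("depth estimation") network trained off-policy with experience replay: every sampled transition goes back into an experience pool, and training batches are drawn from that pool so samples can be reused. The following Python sketch is purely illustrative; the class names, state encoding, and hyperparameters are assumptions, not taken from the patent.

```python
import random
from collections import deque

import torch
import torch.nn as nn

class ReplayBuffer:
    """Experience pool: transitions stay in the pool after sampling and may be reused."""
    def __init__(self, capacity=10000):
        self.pool = deque(maxlen=capacity)

    def add(self, state, action, reward, next_state, done):
        self.pool.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Sampling with replacement: drawn transitions are not removed.
        return random.choices(self.pool, k=batch_size)

class QNetwork(nn.Module):
    """Deep network estimating the value of each allocation action for a state."""
    def __init__(self, state_dim, n_actions):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 128), nn.ReLU(),
            nn.Linear(128, n_actions),
        )

    def forward(self, state):
        return self.net(state)

def train_step(q_net, target_net, buffer, optimizer, batch_size=32, gamma=0.99):
    """One off-policy update: training is decoupled from data collection."""
    batch = buffer.sample(batch_size)
    states, actions, rewards, next_states, dones = (
        torch.as_tensor(xs, dtype=torch.float32) for xs in zip(*batch))
    q = q_net(states).gather(1, actions.long().unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        target = rewards + gamma * target_net(next_states).max(1).values * (1 - dones)
    loss = nn.functional.mse_loss(q, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

Decoupling the sampling from the training in this way is what lets the allocator keep learning from past allocation episodes even though the available storage resource information varies in size between tasks.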
2. The method of claim 1, wherein allocating storage resources to the storage request set of the task to be processed by using a pre-trained depth estimation network, according to the storage information of the task to be processed and the currently available storage resource information of the computer system, comprises:
Step A: selecting, from a preset action set through the depth estimation network, an action that meets a preset condition with respect to the storage information of the task to be processed and the currently available storage resource information of the computer system, and executing the selected action;
Step B: after the selected action is executed, confirming whether allocation of the storage request set of the task to be processed is complete; and
if the allocation is not complete, updating the storage information of the task to be processed and the currently available storage resource information of the computer system, and repeating Step A and Step B until allocation of the storage request set of the task to be processed is complete.
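Claim 2's Step A / Step B alternation is a closed loop: choose an action for the current observation, execute it, check whether the request set is fully allocated, and if not refresh the state and go again. A minimal sketch, with `select_action` and `execute` as caller-supplied stand-ins (hypothetical) for the estimation network and the underlying allocator:

```python
def allocate_request_set(select_action, execute, request_set, storage_state):
    """Step A / Step B loop from claim 2.

    `select_action` and `execute` are hypothetical callbacks: one wraps the
    depth estimation network, the other applies the chosen action to the
    storage state and reports which requests it satisfied.
    """
    pending = set(request_set)
    while pending:
        # Step A: pick the action meeting the preset condition for the current
        # (storage info, available resources) observation, then execute it.
        action = select_action(pending, storage_state)
        satisfied = execute(action, pending, storage_state)
        # Step B: satisfied requests leave the pending set; if anything
        # remains, the updated state flows back into Step A.
        pending -= set(satisfied)
    return storage_state
```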
3. The method of claim 2, wherein the storage resource comprises at least one hierarchy of storage blocks, the hierarchy of storage blocks comprising at least one of:
an in-core memory block located on a core in a processor;
a shared memory block located between a plurality of cores in the processor;
a common memory block located in the processor;
an off-chip memory block located outside the processor.
4. The method of claim 2, wherein the preset set of actions comprises at least one action of:
allocating storage resources;
sorting storage resources within a hierarchy level;
migrating storage data across hierarchy levels.
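Claims 3 and 4 fix the two finite vocabularies the network works over: the storage hierarchy levels and the preset action set. One plausible encoding (the identifier names are invented for illustration):

```python
from enum import Enum, auto

class StorageLevel(Enum):
    IN_CORE = auto()    # memory block on a single core of the processor
    SHARED = auto()     # block shared among several cores
    COMMON = auto()     # processor-wide common block
    OFF_CHIP = auto()   # block located outside the processor

class Action(Enum):
    ALLOCATE = auto()        # allocate a storage resource to a request
    SORT_IN_LEVEL = auto()   # sort (compact) resources within one level
    MIGRATE_ACROSS = auto()  # migrate stored data to another level
```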
5. The method of claim 4, wherein selecting, from the preset action set through the depth estimation network, an action that meets the preset condition with respect to the storage information of the task to be processed and the currently available storage resource information of the computer system comprises:
determining scores of the actions in the preset action set relative to the storage information of the task to be processed and the currently available storage resource information of the computer system; and
selecting the action with the highest score with respect to the storage information of the task to be processed and the currently available storage resource information of the computer system.
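Claim 5 makes the "preset condition" concrete: score every candidate action against the current state with the estimation network and take the top scorer. A minimal greedy-selection sketch under that reading:

```python
import torch

def select_greedy_action(q_net, state_vector):
    """Score each action in the preset set for the current state and return
    the index of the highest-scoring one (claim 5's selection rule)."""
    with torch.no_grad():
        scores = q_net(torch.as_tensor(state_vector, dtype=torch.float32))
    return int(scores.argmax())
```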
6. The method of claim 5, wherein the scores of the actions in the preset action set, relative to the storage information of the task to be processed and the currently available storage resource information of the computer system, comprise at least one of:
a reward, positively correlated with the size of the allocated storage block, obtained each time a storage block is successfully allocated;
a first preset reward obtained each time allocation of a storage request set is completed;
a second preset reward obtained upon completing allocation of the storage request sets of all tasks to be processed;
a penalty, positively correlated with the size of the sorted storage blocks, incurred each time storage resources within a hierarchy level are sorted; and
a penalty, positively correlated with the size of the migrated data, incurred each time storage data is migrated across hierarchy levels.
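Claim 6 spells out the learning signal: size-proportional rewards for successful allocations, fixed bonuses for finishing one request set and for finishing all of them, and size-proportional penalties for sorting and for cross-level migration. A sketch of one such reward-shaping function; every coefficient below is an invented placeholder:

```python
def reward(event, size=0.0, alpha=1.0, beta=0.5,
           set_bonus=10.0, all_sets_bonus=100.0):
    """Reward/penalty structure per claim 6 (coefficients illustrative only)."""
    if event == "block_allocated":
        return alpha * size        # grows with the allocated block's size
    if event == "request_set_done":
        return set_bonus           # first preset reward
    if event == "all_sets_done":
        return all_sets_bonus      # second preset reward
    if event == "sorted_in_level":
        return -beta * size        # grows with the size of the sorted blocks
    if event == "migrated_across":
        return -beta * size        # grows with the size of the migrated data
    return 0.0
```

The asymmetry rewards the agent for placing requests while charging it for the bookkeeping work (sorting, migration) that a poor placement decision can trigger.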
7. The method according to any one of claims 1 to 6, wherein the storage request sets of all tasks to be processed, ordered chronologically, form a storage request set sequence, and the intersection of any two adjacent storage request sets in the sequence is not an empty set.
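The sequencing constraint of claim 7, that chronologically adjacent request sets share at least one request, is cheap to verify; a small sketch:

```python
def adjacent_sets_overlap(request_set_sequence):
    """True iff every pair of adjacent storage request sets intersects."""
    return all(
        set(a) & set(b)
        for a, b in zip(request_set_sequence, request_set_sequence[1:])
    )
```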
8. The method according to any one of claims 1 to 6, wherein the configuration information further comprises priority information of the task to be processed, and the computer system executes the tasks to be processed sequentially according to the priority information.
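Claim 8's priority-ordered execution maps naturally onto a heap. A minimal sketch, assuming a numeric `priority` field (smaller means more urgent) and a caller-supplied `process` handler, both hypothetical:

```python
import heapq

def run_by_priority(tasks, process):
    """Pop and process pending tasks in priority order (claim 8)."""
    heap = [(t["priority"], i, t) for i, t in enumerate(tasks)]  # i breaks ties
    heapq.heapify(heap)
    while heap:
        _, _, task = heapq.heappop(heap)
        process(task)  # e.g. allocate via the estimation network, then compute
```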
9. An electronic device, comprising:
a processor; and a memory in which a computer program is stored,
wherein the computer program, when executed by the processor, causes the processor to perform the method of any of claims 1-8.
10. A non-transitory computer readable storage medium having stored thereon computer readable instructions which, when executed by a processor, cause the processor to perform the method of any one of claims 1-8.
CN202010003595.1A 2020-01-03 2020-01-03 Method for processing task using computer system, electronic device and storage medium Active CN110795226B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010003595.1A CN110795226B (en) 2020-01-03 2020-01-03 Method for processing task using computer system, electronic device and storage medium

Publications (2)

Publication Number Publication Date
CN110795226A CN110795226A (en) 2020-02-14
CN110795226B (en) 2020-10-27

Family

ID=69448432

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010003595.1A Active CN110795226B (en) 2020-01-03 2020-01-03 Method for processing task using computer system, electronic device and storage medium

Country Status (1)

Country Link
CN (1) CN110795226B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111831415B (en) * 2020-07-10 2024-01-26 广东石油化工学院 Multi-queue multi-cluster task scheduling method and system
WO2023208027A1 (en) * 2022-04-29 2023-11-02 北京灵汐科技有限公司 Information processing method and information processing unit, and device, medium and product
CN115114028B (en) * 2022-07-05 2023-04-28 南方电网科学研究院有限责任公司 Task allocation method and device for secondary control of electric power simulation
CN116896483B (en) * 2023-09-08 2023-12-05 成都拓林思软件有限公司 Data protection system

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8209417B2 (en) * 2007-03-08 2012-06-26 Oracle International Corporation Dynamic resource profiles for clusterware-managed resources
CN110502330A (en) * 2018-05-16 2019-11-26 上海寒武纪信息科技有限公司 Processor and processing method
CN109992404B (en) * 2017-12-31 2022-06-10 中国移动通信集团湖北有限公司 Cluster computing resource scheduling method, device, equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant