CN110795226B - Method for processing task using computer system, electronic device and storage medium


Info

Publication number
CN110795226B
CN110795226B (application CN202010003595.1A)
Authority
CN
China
Prior art keywords
storage
information
processed
task
computer system
Prior art date
Legal status
Active
Application number
CN202010003595.1A
Other languages
Chinese (zh)
Other versions
CN110795226A (en)
Inventor
Inventor not disclosed
Current Assignee
Cambricon Technologies Corp Ltd
Original Assignee
Cambricon Technologies Corp Ltd
Priority date
Filing date
Publication date
Application filed by Cambricon Technologies Corp Ltd
Priority to CN202010003595.1A
Publication of CN110795226A
Application granted
Publication of CN110795226B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/4881 Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
    • G06F3/0631 Configuration or reconfiguration of storage systems by allocating resources to storage systems
    • G06F3/064 Management of blocks
    • G06F3/0656 Data buffering arrangements
    • G06F9/5016 Allocation of resources to service a request, the resource being the memory
    • G06F9/5022 Mechanisms to release resources
    • G06F2209/5017 Task decomposition (indexing scheme relating to G06F9/50)

Abstract

A method, an electronic device, and a non-transitory computer-readable storage medium for processing a task using a computer system are provided. The electronic device comprises a processor and a memory; the memory stores a computer program which, when executed by the processor, causes the processor to perform the storage management method. Based on the idea of deep reinforcement learning, the method uses a depth estimation network to process tasks for a given system state; in the course of processing the tasks, storage resources are allocated to the data storage requests.

Description

Method for processing task using computer system, electronic device and storage medium
Technical Field
The present application relates to the field of computers, and more particularly, to a method, electronic device, and non-transitory computer-readable storage medium for processing tasks using a computer system.
Background
As computer technology has developed, more and more tasks can be handled by computer systems, such as large-scale computing, image analysis, and/or image processing. As a computer system takes on more and more tasks, it inevitably has to process computing tasks in parallel, and parallel processing inevitably raises the problem of data caching. Storage management is a common problem in computer systems: with the development of computer technology, a large amount of data needs to be stored or cached in the course of task execution and data processing.
How to store or cache data efficiently while a computer system processes tasks has therefore become an urgent problem to be solved.
Disclosure of Invention
Based on this, the present application provides a method for a computer system to process tasks with data cache management, the method comprising:
acquiring a task to be processed and configuration information of the task to be processed, wherein the configuration information comprises storage information and calculation information;
allocating, by using a pre-trained depth estimation network, storage resources to a storage request set of the task to be processed according to the storage information of the task to be processed and the currently available storage resource information of the computer system, wherein the currently available storage resource information is variable in size, and the pre-trained depth estimation network is a deep neural network used to represent an estimation function in a reinforcement learning algorithm; the deep neural network uses an experience replay method in which sampling and training are separated and off-policy training is used: the result of each sampling is put back into an experience pool, samples are drawn from the experience pool during training, and the samples can be reused;
scheduling the calculation data to the allocated storage resources according to the calculation information and performing the calculation,
wherein the configuration information further includes splitting information of the task, the splitting information being used to split the task to be processed into a plurality of subtasks, wherein the plurality of subtasks carry subtask configuration information, the subtask configuration information includes sub-storage information and sub-calculation information, and such a subtask is defined as a subtask to be processed,
when the computer system processes the plurality of to-be-processed subtasks, the following operations are carried out:
according to the sub-storage information of the subtasks to be processed and the currently available storage resource information of the computer system, allocating storage resources for the storage request set of the subtasks to be processed by utilizing a pre-trained depth estimation network;
and scheduling the calculation data to the allocated storage resources according to the sub-calculation information and calculating.
According to another aspect of the present application, there is provided an electronic device including:
a processor;
a memory storing a computer program which, when executed by the processor, causes the processor to perform the method as described above.
According to another aspect of the present application, there is provided a non-transitory computer readable storage medium having stored thereon computer readable instructions which, when executed by a processor, cause the processor to perform the method as described above.
The task processing method and the task processing device can, based on the idea of deep reinforcement learning, use the depth estimation network to process tasks for a given state of the system; in the course of processing the tasks, storage resources need to be allocated to the data storage requests. Since the available storage resources of the system change in real time and the data storage requests also change in real time, such a depth estimation network is well suited to handling the allocation and release of such variable-size storage blocks. Moreover, this approach can give a globally optimized storage allocation for known storage resources and storage requests, rather than a fixed-policy local optimization that considers only the request currently being processed and the current layout of the remaining memory space. This approach is particularly useful for data cache management at run time.
Drawings
In order to explain the technical solutions in the embodiments of the present application more clearly, the drawings needed for describing the embodiments are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present application; those skilled in the art can obtain other drawings from these drawings without creative effort.
FIG. 1 is a diagram illustrating a computer system architecture in one embodiment;
FIG. 2 is a schematic diagram of circuitry included in a processor of the computer system, according to one embodiment;
FIG. 3 is a schematic diagram of another embodiment of a circuit included in a processor of a computer system;
FIG. 4 illustrates a flow diagram of a method for processing tasks with a computer system according to one embodiment of the present application;
FIG. 5 illustrates a basic model diagram of reinforcement learning;
FIG. 6 illustrates a flow diagram for allocating storage resources for a set of storage requests for a pending task according to one embodiment of the present application;
FIG. 7 illustrates a schematic diagram of different hierarchical memory blocks in accordance with one embodiment of the present application;
FIG. 8 shows a schematic diagram of a task processing device according to an embodiment of the present application;
FIG. 9 shows a schematic diagram of an electronic device according to an embodiment of the application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some, but not all, embodiments of the present application. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
It should be understood that the terms "first", "second", etc. in the claims, description, and drawings of the present application are used for distinguishing between different objects and not for describing a particular order. The terms "comprises" and "comprising," when used in the specification and claims of this application, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It is also to be understood that the terminology used in the description of the present application herein is for the purpose of describing particular embodiments only, and is not intended to be limiting of the application. As used in the specification and claims of this application, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should be further understood that the term "and/or" as used in the specification and claims of this application refers to any and all possible combinations of one or more of the associated listed items and includes such combinations.
In the present application, an artificial intelligence processor, also referred to as a special-purpose processor, is a processor intended for a particular application or domain. For example, a Graphics Processing Unit (GPU), also called a display core, visual processor, or display chip, is a special-purpose processor dedicated to image computation on personal computers, workstations, game consoles, and some mobile devices (e.g., tablet computers, smart phones, etc.). As another example, a Neural Network Processor (NPU) is a special-purpose processor for matrix multiplication in the field of artificial intelligence; it adopts an architecture of "data-driven parallel computation" and is particularly good at processing massive multimedia data such as video and images.
The Deep Reinforcement Learning (DRL) described in the present application refers to an artificial intelligence method that combines the perception capability of Deep Learning with the decision-making capability of Reinforcement Learning, and is closer to the way humans think.
As shown in FIGS. 1-2, a block diagram of a computer system is presented for one embodiment, the computer system comprising: a task assigning device A100, at least one processor A200, and a memory A300. The task assigning device A100 is connected to the processor A200 and shares the memory A300. The task assigning device A100 is configured to receive a plurality of tasks in an application program run by the computer system and distribute the tasks to the processor A200 for task processing. Optionally, the memory A300 may be used to store application programs and various data, and to provide automatic, high-speed access to the programs or data during the operation of the computer system. Optionally, the memory A300 may contain caches for variable data storage and registers for scalar data storage.
Optionally, while running an application program, the computer system may temporarily store the tasks in the application program in the cache, and the task assigning device A100 may take the tasks out of the cache and distribute them to the processor A200 for processing, so as to obtain the processing results of the tasks. After the processing result of a task is obtained, it is generally written back into the cache. Optionally, the caches of the memory A300 store tasks using a queue structure. Optionally, the tasks may be stored in different task queues according to task type, with each type of task having at least one task queue.
Optionally, the processor may include a master processing circuit A101 and a plurality of slave processing circuits A102, as shown in FIG. 2. The plurality of slave processing circuits are arranged in an array, and each slave processing circuit is connected to the adjacent slave processing circuits. The master processing circuit is connected to k of the slave processing circuits; as shown in FIG. 2, the k slave processing circuits are the n slave processing circuits in the 1st row, the n slave processing circuits in the m-th row, and the m slave processing circuits in the 1st column, i.e., the k slave processing circuits are the slave processing circuits directly connected to the master processing circuit. The k slave processing circuits are used for forwarding data and instructions between the master processing circuit and the remaining slave processing circuits.
Alternatively, as shown in FIG. 3, the processor may also include a branch processing circuit A103. The specific connection structure is shown in FIG. 3: the master processing circuit A101 is connected to the branch processing circuit A103, and the branch processing circuit A103 is connected to the plurality of slave processing circuits A102. The branch processing circuit A103 is used for forwarding data or instructions between the master processing circuit A101 and the slave processing circuits A102.
The process of processing tasks with a computer system includes the allocation, scheduling, and computation of tasks, and the problem of storage management must be fully considered during task allocation. One conventional storage management approach appears in the register allocation scenario in a compiler. This approach assigns a large number of program variables to a limited set of registers so that memory is read and written as little as possible during program execution, thereby improving program efficiency. There are many register allocation strategies: local register allocation based on basic blocks within a function, global register allocation across a whole function, or even interprocedural register allocation based on the function call graph of the whole program. The goal of all these strategies, however, is the same: to reduce the swapping of data between registers and memory. There are also various algorithms for register allocation, such as the graph coloring algorithm and the linear-scan algorithm. A good register allocation strategy can greatly reduce the memory access frequency of a program and thereby improve its efficiency.
However, the storage unit handled by this storage management approach is fixed in size, i.e., management is at the granularity of a single register, so the approach is not suitable for the allocation and release of variable-size storage blocks. Moreover, graph coloring is a computationally hard problem, so graph-coloring register allocation has high time complexity and cannot be applied to storage management scenarios that are sensitive to response speed.
Another conventional storage management approach appears in the memory allocation management scenario of a computer operating system. For each process's successive dynamic memory allocation requests (malloc) and memory release requests (free), the operating system allocates and releases a correspondingly sized space in the computer's memory. The size of each block of space is not fixed; it depends on the size of the storage space required by the request. The storage management system needs to minimize memory fragmentation and waste during allocation to safeguard subsequent allocations. Common algorithms for memory management in a single-process environment include first fit, next fit (circular first fit), best fit, and worst fit. For example, the buddy system in Linux memory management uses a best-fit allocation algorithm in units of 2^i pages. For memory management in a multi-process environment, there are strategies such as fixed allocation with local replacement, variable allocation with global replacement, and variable allocation with local replacement.
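As a concrete illustration of the fixed-policy allocators named above, the following is a minimal first-fit sketch; it is an invented example for illustration only, not code from this application or from Linux.

```python
# Minimal first-fit allocator over a free list of (start, size) blocks.
# Hypothetical illustration of the conventional policies discussed above.
def first_fit(free_blocks, request_size):
    """Return (start, new_free_list), or (None, free_blocks) if nothing fits."""
    for i, (start, size) in enumerate(free_blocks):
        if size >= request_size:
            rest = free_blocks[:i] + free_blocks[i + 1:]
            if size > request_size:  # keep the remainder as a smaller free block
                rest.append((start + request_size, size - request_size))
            return start, rest
    return None, free_blocks  # fragmentation: no single block is large enough

free = [(0, 16), (64, 128), (256, 32)]
addr, free = first_fit(free, 100)  # allocates at 64; (164, 28) stays free
```

Best fit and worst fit differ only in which candidate block the loop selects; the fixed, purely local nature of the decision is exactly what the approach described below seeks to overcome.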
However, this storage management approach handles memory allocation and release requests as they arrive, and cannot optimize storage allocation for a known sequence of storage-block requests and releases. Moreover, its management of storage blocks is coarse-grained, so it cannot be applied to scenarios where on-chip storage resources are scarce and fine-grained allocation, release, and usage optimization are needed. In addition, this approach cannot handle multi-level storage management. Storage management issues thus affect the efficiency with which a computer processes tasks. FIG. 4 illustrates a flow diagram of a method for processing tasks with a computer system according to one embodiment of the present application. The method is described below, by way of example, as applied to the computer system of FIG. 1. As shown in FIG. 4, the method 100 may include steps S110, S120, and S130.
In step S110, the task to be processed and the configuration information of the task are acquired. The configuration information may include storage information and calculation information.
Specifically, in a computer system there are often multiple pending tasks (e.g., computing tasks) waiting to be executed in parallel; these pending tasks are usually tasks generated when an application runs locally, or tasks received from the outside through a network. While a pending task executes, it generates storage requests in order to cache intermediate data. The storage requests of the tasks to be executed within a certain time period form a storage request set. Each pending task may have configuration information: the storage information in the configuration information may represent task-related information such as the storage space that needs to be occupied during task execution, and the calculation information may represent information about the specific computational behavior of the task.
In step S120, according to the storage information of the to-be-processed task and the currently available storage resource information of the computer system, a preset depth estimation network is used to allocate a storage resource to the storage request set of the to-be-processed task.
At any time, the computer system has current state information s (state), which may include the currently available storage resource information S (source) and the storage information of the pending tasks (e.g., the storage request set D (demand)). For a computer system, the currently available storage resource information S represents the storage resources within the system that are currently unoccupied, i.e., the storage resources available for allocation to data storage requests. At a given time, S may be represented as a set S = {s1, s2, …, sn}, where s1, s2, …, sn denote different free resource blocks. Each resource block si comprises a contiguous section of physical memory space for storing data, e.g., a section from address XXX to address YYY on a certain memory. The storage information of the pending tasks, i.e., the storage request set D, may include the data storage requests of all tasks currently pending in the system and may be written as D = {d1, d2, …, dm}, where d1, d2, …, dm denote different data storage requests. S and D should satisfy sum(S) ≥ sum(D), i.e., the total of all available storage resources should be greater than or equal to the total of all data storage requests.
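For illustration, the state s = (S, D) described above can be represented with data structures along the following lines; all names here are assumptions made for this sketch, not identifiers from the application.

```python
from dataclasses import dataclass

@dataclass
class ResourceBlock:   # one free block si in S
    level: str         # storage hierarchy level, e.g. "core" or "offchip"
    start: int         # start address
    end: int           # end address (exclusive)

    @property
    def size(self) -> int:
        return self.end - self.start

@dataclass
class StorageRequest:  # one data storage request dj in D
    task_id: int
    size: int

def is_feasible(S, D) -> bool:
    """Necessary condition from the text: total free space >= total requested."""
    return sum(b.size for b in S) >= sum(r.size for r in D)
```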
Optionally, the depth estimation network is obtained through deep reinforcement learning training. A deep reinforcement learning algorithm combines deep learning from the artificial intelligence field with reinforcement learning.
FIG. 5 illustrates a basic model diagram of reinforcement learning. As shown in FIG. 5, Q-learning (QL) is an estimation-based reinforcement learning algorithm, where Q = Q(s, a), i.e., the expected benefit of taking action a in state s at a certain time; the environment feeds back a corresponding reward according to the action of the agent. The main idea of the algorithm is therefore to build a table (the Q-table) of Q values indexed by state and action, and then select the action that obtains the largest benefit according to the Q values. The Deep Q-Learning (DQN) technique is a variant of Q-learning and an algorithm for deep reinforcement learning. DQN differs from QL in that the QL algorithm learns a fixed two-dimensional table, each entry of which corresponds to the estimated reward value of an action in a certain state. The first core idea of DQN is to convert this two-dimensional table into an estimation function whose input is a state and an action and whose output is a value, i.e., the estimated reward value (if the value is negative, it can also be called a penalty value), equivalent to the value of an entry in QL. The second core idea is to use a Deep Neural Network (DNN) to represent this estimation function, i.e., the depth estimation network. The input to the network is a state and an action, and the output is an estimate of the reward value.
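For reference, the update that tabular Q-learning applies to each entry is the standard textbook rule (general background, not a formula specific to this application):

```latex
Q(s,a) \leftarrow Q(s,a) + \alpha \left[ r + \gamma \max_{a'} Q(s',a') - Q(s,a) \right]
```

where α is the learning rate, γ is the discount factor, r is the reward fed back by the environment, and s′ is the next state. DQN keeps the same target r + γ max_a′ Q(s′, a′) but represents Q by a deep network instead of a table.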
The advantage of DQN's deep estimation network is that deep learning has strong expressive power and a mature training methodology, so it can learn the estimation function well, far better than the Q-table of the QL algorithm. Another advantage is that, since the current state information s of the system is of variable length, the storage management method can be implemented with DQN rather than QL, i.e., a deep neural network (e.g., a recurrent neural network such as an LSTM or GRU network) can be used to learn the estimation function; training can use random initial resource sets and storage request sets, or initial resource sets and storage request sets captured from real software behavior. Using a recurrent neural network as the deep neural network also avoids the problem that the state and action spaces are so large that a Q-table cannot represent them. Traditional QL training requires repeated sampling and has high time complexity, whereas the DQN training method uses experience replay: sampling and training are separated, off-policy training is used, the result of each sampling is put back into an experience pool, samples are drawn from the experience pool during training, and samples can be reused, which greatly improves training efficiency. Moreover, randomly drawn samples reduce the correlation among training samples, making the updates of the neural network parameters more efficient. In addition, DQN training introduces, besides the estimation network (i.e., the DNN approximating the estimation function), a target network with the same structure but different parameters; the parameters of the target network are updated with a delay, and the target network independently computes the temporal-difference target, which breaks correlations during training.
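A minimal, self-contained sketch of such a training loop is shown below, using PyTorch for concreteness. The toy state/action sizes, network shape, and hyperparameters are all illustrative assumptions; the text above suggests a recurrent network for variable-length states, while this sketch uses a small feed-forward network for brevity.

```python
import random
from collections import deque
import torch
import torch.nn as nn

STATE_DIM, N_ACTIONS = 16, 4   # toy sizes, assumptions for this sketch

q_net = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(), nn.Linear(64, N_ACTIONS))
target_net = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(), nn.Linear(64, N_ACTIONS))
target_net.load_state_dict(q_net.state_dict())  # same structure, delayed parameters

optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)
replay_pool = deque(maxlen=100_000)             # the "experience pool"

def store(s, a, r, s_next, done):
    """Sampling is decoupled from training: every transition goes into the pool."""
    replay_pool.append((s, a, r, s_next, done))

def train_step(batch_size=32, gamma=0.99):
    batch = random.sample(replay_pool, batch_size)   # samples can be reused later
    s, a, r, s2, done = zip(*batch)
    s, s2 = torch.tensor(s), torch.tensor(s2)
    a, r = torch.tensor(a), torch.tensor(r)
    done = torch.tensor(done, dtype=torch.float32)
    q = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
    with torch.no_grad():                            # the target network supplies
        y = r + gamma * (1 - done) * target_net(s2).max(1).values  # the TD target
    loss = nn.functional.mse_loss(q, y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

def sync_target():
    """Delayed update of the target network's parameters."""
    target_net.load_state_dict(q_net.state_dict())
```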
Based on the above description, at a given time, when the system is in a certain current state (i.e., a certain set of available storage resources together with a certain set of data storage requests), a preset depth estimation network based on a deep reinforcement learning algorithm (the network is trained in advance) can be used to determine the action to take, so as to allocate storage resources for the storage request set D of the pending tasks.
In step S130, the calculation data is scheduled to the allocated storage resource according to the calculation information and calculation is performed. When a task to be processed needs to be executed, the calculation data needs to be scheduled to the allocated storage resources according to the calculation information, and corresponding calculation processing is performed to complete the task. The scheduling and calculation process in step S130 may be performed in a known manner.
In addition, after step S130 is completed, it can be determined whether all pending tasks in the computer system have been completed. In computer systems, pending tasks emerge continually, each with data storage requirements, and these constitute distinct storage request sets. If all pending tasks have finished executing, the task processing procedure ends. If not, the method returns to step S110 and repeats steps S110 to S130 until all pending tasks have finished executing.
When the method returns to step S110, the pending tasks and their configuration information in the system are obtained again; then, in step S120, the current state information is known to have become s′, based on the available storage resource information S′ left after the previous allocation and the storage information of the pending tasks (the next storage request set D′). When storage resources have been allocated to all data storage requests in the set D, the available storage resources of the system change, i.e., from S to S′, and in the next time slice all data storage requests will constitute the next storage request set D′ of the pending tasks. The current state information of the system can accordingly be updated to s′. To avoid confusion, note that in this application, s denotes the current state information of the computer system, S denotes the currently available storage resource information, and D denotes the storage request set of the current pending tasks, where s comprises S and D. Subsequently, in step S130, the calculation data is scheduled to the allocated storage resources according to the calculation information and the calculation is performed.
Therefore, based on the idea of deep reinforcement learning, a depth estimation network can be used to process tasks for a given state of the system, and in the course of processing the tasks, storage resources need to be allocated to the data storage requests. Since the available storage resources of the system change in real time and the data storage requests also change in real time, such a depth estimation network is well suited to handling the allocation and release of such variable-size storage blocks. Moreover, this approach can give a globally optimized storage allocation for known storage resources and storage requests, rather than a fixed-policy local optimization that considers only the request currently being processed and the current layout of the remaining memory space. This approach is particularly useful for data cache management at run time.
Moreover, the prior-art solutions allocate and release storage resources in real time, processing one request at a time, i.e., performing only a fixed-policy local optimization over the current request and the current layout of the remaining memory space, which is very limiting. If, however, the system knows the request and release sequence of all storage blocks in advance, global optimization can be achieved.
According to an embodiment of the present application, the configuration information of the pending task may include splitting information of the task, and the pending task may be split into a plurality of subtasks according to the splitting information. For example, a pending task may consist of multiple subtasks, which may be related serially or in parallel. In this embodiment, the pending task may be split into a plurality of subtasks for execution according to the splitting information in the configuration information. In addition, each subtask may carry subtask configuration information; such a subtask is defined as a pending subtask.
The computer system may perform processing with reference to operations similar to the above-described steps S110 to S130 when processing the to-be-processed subtasks. For example, the computer system may obtain the to-be-processed subtask and the sub-configuration information of the to-be-processed subtask (where the sub-configuration information includes sub-storage information and sub-computation information); according to the sub-storage information of the subtasks to be processed and the currently available storage resource information of the computer system, allocating storage resources for the storage request set of the subtasks to be processed by using a preset depth estimation network; and scheduling the calculation data to the allocated storage resources according to the sub-calculation information and performing calculation.
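The following sketch illustrates this recursive treatment of subtasks; the Task class and the helper calls are assumptions invented for the illustration and do not name anything in this application.

```python
from dataclasses import dataclass, field

@dataclass
class Task:
    storage_info: dict
    compute_info: dict
    split_info: list = field(default_factory=list)  # per-subtask configurations

def process(task, system):
    if task.split_info:                  # splitting information present:
        for sub in task.split_info:      # handle each pending subtask with the
            process(Task(sub["storage"], sub["compute"]), system)  # same flow
        return
    blocks = system.allocate(task.storage_info)   # as in step S120
    system.compute(task.compute_info, blocks)     # as in step S130
```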
FIG. 6 illustrates a flow diagram for allocating storage resources for the storage request set of a pending task according to one embodiment of the present application. As shown in FIG. 6, step S120 may include sub-steps S131, S132, and S133. In sub-step S131, an action that meets a preset condition with respect to the storage information of the pending tasks and the currently available storage resource information of the computer system is selected from a preset action set A through the depth estimation network (for example, the action meeting the preset condition is the action a with the highest score), and the selected action is executed. As described above, the input to the depth estimation network is a state and an action, and the output is an estimate of the reward value; thus, for the current state of the system, the corresponding action with the highest score can be found through the depth estimation network and executed. Sub-step S131 may therefore include: determining the scores of the actions in the preset action set A relative to the current state information, and selecting the action with the highest score for the current state information. The preset action set A is described in detail below.
Subsequently, in sub-step S132, it is confirmed whether allocation for the storage request set D of the pending tasks has been completed. If all data storage requests in the storage request set D have been allocated after the action performed in sub-step S131, step S120 ends. Otherwise, if allocation is not complete, the method proceeds to sub-step S133.
In sub-step S133, the storage information of the system's pending tasks and the currently available storage resource information are updated. If the storage request set D of the pending tasks has not been fully allocated, the system may have performed an action such as storage resource consolidation, which changes the currently available storage resources S; the current state information must therefore be updated. The method then returns to sub-step S131, and the process repeats until all data storage requests in the storage request set have been allocated.
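A self-contained toy version of this S131-S133 loop is sketched below. The scoring function passed in stands in for the trained depth estimation network, and the two-action set (allocate, consolidate) is a deliberately reduced assumption for illustration.

```python
def merge_adjacent(blocks):
    """Toy intra-hierarchy consolidation: coalesce touching free blocks."""
    merged = []
    for start, size in sorted(blocks):
        if merged and merged[-1][0] + merged[-1][1] == start:
            merged[-1] = (merged[-1][0], merged[-1][1] + size)
        else:
            merged.append((start, size))
    return merged

def allocate_loop(free_blocks, requests, score):
    """free_blocks: list of (start, size); requests: list of sizes;
    score: stand-in for the depth estimation network's Q(s, a)."""
    pending = list(requests)
    while pending:                                     # S132: fully allocated yet?
        actions = [("alloc", i, j)                     # feasible placements
                   for i, (_, size) in enumerate(free_blocks)
                   for j, need in enumerate(pending) if size >= need]
        actions.append(("consolidate",))               # always available
        best = max(actions, key=score)                 # S131: highest-scoring action
        if best[0] == "alloc":
            _, i, j = best
            start, size = free_blocks.pop(i)
            need = pending.pop(j)
            if size > need:                            # shrink the free block
                free_blocks.append((start + need, size - need))
        else:                                          # S133: state changed, loop
            free_blocks = merge_adjacent(free_blocks)
    return free_blocks

# A trivial score that always prefers allocation when one is feasible:
remaining = allocate_loop([(0, 30), (30, 30), (100, 50)], [40, 20],
                          score=lambda a: 1.0 if a[0] == "alloc" else 0.0)
```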
Therefore, for any current state of the system, the depth estimation network of the deep reinforcement learning algorithm can provide the action with the highest score from the preset action set; this action is the result of global rather than local optimization, which benefits the optimization of the system's storage management.
According to one embodiment of the present application, a storage resource of the system may include storage blocks at one or more hierarchy levels. For example, the hierarchy of storage blocks from high to low may include: an in-core memory block located on a core in a processor, a shared memory block located between multiple cores in a processor, a common memory block located in a processor, and an off-chip memory block located outside a processor. According to one embodiment, a processor described herein may be a multi-core processor, an artificial intelligence processor, or the like, which may have multiple cores and one or more of an in-core memory block, an inter-core shared memory block, a common memory block, and an off-chip memory block.
FIG. 7 shows a schematic diagram of storage blocks at different hierarchy levels according to one embodiment of the present application. As shown in FIG. 7, when a pending task is executed (i.e., during data operation processing), the in-core memory block 202 (e.g., RAM) on each core 201 of the processor 200 has the fastest access speed, and therefore the in-core memory block 202 has the highest hierarchy level. Next, a cluster 210 of the processor 200 is provided with a memory block 212 shared by multiple cores; the shared memory block 212 also has a high access speed, second only to the in-core memory block 202, and therefore occupies the level just below it. Further, the processor 200 as a whole may have a common memory 220 shared by multiple clusters 210; the memory blocks in the common memory 220 are accessed at a lower speed and sit at a lower level. Finally, outside the processor 200, an off-chip memory 300 (e.g., DDR memory) may also be provided; its memory blocks are at the lowest level because the access speed of the off-chip memory 300 is relatively low. From high levels to low levels, the storage capacity of the blocks gradually increases, the unit storage cost decreases, and the access speed gradually decreases. The present method and device can handle this multi-level storage management problem, i.e., unified storage management can be carried out over the multi-level storage resources.
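The hierarchy of FIG. 7 can be summarized as in the following sketch; the relative-speed and capacity entries are qualitative placeholders for illustration, not figures from this application.

```python
# Storage hierarchy of FIG. 7, highest level first; entries are qualitative.
STORAGE_LEVELS = [
    # (level name,                 relative speed, relative capacity)
    ("in-core memory block 202",   "fastest",      "smallest"),
    ("shared memory block 212",    "fast",         "small"),
    ("common memory 220",          "slower",       "larger"),
    ("off-chip memory 300 (DDR)",  "slowest",      "largest"),
]

for name, speed, capacity in STORAGE_LEVELS:
    print(f"{name}: speed={speed}, capacity={capacity}")
```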
According to one embodiment of the present application, the storage resource information S may include the location of a storage block and the start address and end address of the storage block. As described above, the storage resources in the system may be located at different hierarchy levels, and the storage resource information may include the location of each memory block (e.g., in in-core storage, inter-core shared storage, on-chip common storage, or off-chip storage) together with the start address and end address of each memory block.
According to an embodiment of the present application, the preset action set may include at least one of the following actions: storage resource allocation, storage resource consolidation within a hierarchy, and storage data migration across hierarchies. That is, one of the set of actions described above may be performed while the system is in a certain state.
(1) Storage resource allocation: the available storage resources S are directly allocated to the data storage requests in the storage request set D. The specific allocation process maps D = {d1, d2, …, dm} onto S = {s1, s2, …, sn}, from storage requests to storage resources, and the concrete allocation or mapping method may be any known suitable method. After one allocation, the available storage resources of the system become S′, and the free space of some storage resources is reduced or split. For example, consider a storage block with start address 0 and end address 100: if data is allocated at addresses 0-20, the block shrinks; if data is allocated at addresses 20-40, the block is split into two blocks, 0-20 and 40-100. The next time slice in the sequence brings a new storage request set D′, and the current state of the system can be updated based on S′ and D′.
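The shrink/split behaviour in this example can be written directly as a small function; this is a minimal sketch of the described effect, with invented names.

```python
def carve(block, alloc_start, alloc_end):
    """Allocate [alloc_start, alloc_end) out of a free block [start, end);
    return the remaining free block(s)."""
    start, end = block
    assert start <= alloc_start < alloc_end <= end
    remainder = []
    if start < alloc_start:
        remainder.append((start, alloc_start))
    if alloc_end < end:
        remainder.append((alloc_end, end))
    return remainder

carve((0, 100), 0, 20)    # -> [(20, 100)]           the block shrinks
carve((0, 100), 20, 40)   # -> [(0, 20), (40, 100)]  the block splits in two
```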
(2) Intra-hierarchy storage resource consolidation: in some system states, the storage resources within the same hierarchy level need to be consolidated, i.e., data stored in some storage blocks is moved to other storage resources in the same level so as to reduce storage fragmentation. After this action, the available storage resources of the system become S′, but the pending storage request set D remains unchanged because no storage resources were allocated.
(3) Cross-hierarchy storage data migration: in some system states, data stored on a higher-level storage resource may be swapped out to a lower-level storage resource, or data stored on a lower-level storage resource may be swapped in to a higher-level storage resource. When higher-level storage resources are insufficient, part of the data can be swapped out to the larger space of a lower level; when higher-level storage resources are idle, data in lower-level storage resources can be swapped in to the higher level. After this action, the available storage resources of the system become S′, but the pending storage request set D remains unchanged because no storage resources were allocated.
According to one embodiment of the present application, the scores of the actions in the preset action set relative to the current state information may include the following rewards and/or penalties (reward > 0, penalty < 0); a minimal code sketch follows this list:
(1) A reward, positively correlated with the size of the storage block, obtained each time a storage block is successfully allocated: successfully allocating a block earns a reward corresponding to its size, which encourages storing the requested data into free storage resources more quickly.
(2) A first preset reward obtained each time a storage request set is fully allocated: completing the allocation of one storage request set earns a preset reward, which encourages moving on to the storage request set of the next time slice and improves the efficiency of storage management.
(3) A second preset reward obtained when the storage request sets of all pending tasks have been allocated: completing all storage request sets earns a preset reward, which encourages finishing the entire sequence of storage request sets as soon as possible.
(4) A penalty, positively correlated with the size of the consolidated storage blocks, incurred each time intra-hierarchy storage resources are consolidated: the system does not encourage consolidation when it is unnecessary, so it is penalized; this action is chosen only when consolidation is unavoidable (e.g., storage within the level is too fragmented) or in order to obtain a higher reward in the next step or steps, thereby achieving global rather than local optimization.
(5) A penalty, positively correlated with the size of the migrated data, incurred each time cross-hierarchy storage data migration is performed: the system does not encourage migration when it is unnecessary, so it is penalized; this action is chosen only when migration is unavoidable (e.g., storage resources within a level are insufficient, or higher-level storage resources are too idle) or in order to obtain a higher reward in the next step or steps, thereby achieving global rather than local optimization.
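The sketch below renders this five-part scheme as a single reward function; every coefficient is an invented placeholder, since the application does not specify magnitudes.

```python
def reward(event: str, size: int = 0) -> float:
    """Reward shaping for the five cases above; coefficients are placeholders."""
    if event == "block_allocated":    # (1) grows with the allocated block size
        return 0.01 * size
    if event == "request_set_done":   # (2) first preset reward
        return 10.0
    if event == "all_sets_done":      # (3) second preset reward
        return 100.0
    if event == "consolidated":       # (4) penalty grows with bytes moved
        return -0.05 * size
    if event == "migrated":           # (5) penalty grows with bytes migrated
        return -0.02 * size
    raise ValueError(f"unknown event: {event}")
```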
According to one embodiment of the application, the storage request sets of the pending tasks, ordered in time, form a storage request set sequence, and the intersection of any two adjacent storage request sets in the sequence is not an empty set. For example, in the pending storage request set sequence, two adjacent storage request sets are the request sets Di and Di+1 on adjacent time slices. Compared with Di, the request set Di+1 adds some newly arising storage requests and releases some others, but the intersection of the two cannot be empty. If the intersection were empty, the two request sets Di and Di+1 would be completely unrelated, their storage management would also be completely unrelated, and they could not be considered request sets of the same sequence.
FIG. 8 shows a schematic diagram of a task processing device according to an embodiment of the present application. As shown in fig. 8, the apparatus 400 may include an obtaining unit 410, a resource allocating unit 420, and a scheduling calculating unit 430. The obtaining unit 410 obtains a task to be processed and configuration information of the task to be processed, where the configuration information includes storage information and calculation information. The resource allocation unit 420 allocates storage resources to the storage request set of the to-be-processed task by using a preset depth estimation network according to the storage information of the to-be-processed task and the currently available storage resource information of the computer system. The scheduling calculation unit 430 schedules the calculation data to the allocated storage resource according to the calculation information and performs calculation.
According to an embodiment of the present application, the resource allocation unit 420 is specifically operable to: selecting actions which correspond to preset conditions and correspond to the storage information of the tasks to be processed and the currently available storage resource information of the computer system from a preset action set through the depth estimation network, and executing the selected actions; after the selected action is executed, confirming whether the storage request set of the task to be processed is distributed and completed; and if the allocation is not completed, updating the storage information of the to-be-processed task and the currently available storage resource information of the computer system and repeatedly executing the steps until the allocation of the storage request set of the to-be-processed task is completed.
According to one embodiment of the present application, the storage resource may include at least one hierarchy of storage blocks. For example, a hierarchy of memory blocks from high to low may include: an in-core memory block located on a core in a processor; a shared memory block located between a plurality of cores in the processor; a common memory block located in the processor; and/or an off-chip memory block located outside of the processor.
According to an embodiment of the present application, the storage resource information may include a location of the storage block and a start address and an end address of the storage block.
According to an embodiment of the present application, the preset action set may include at least one of the following actions: storage resource allocation; intra-hierarchy storage resource consolidation; and cross-hierarchy storage data migration.
According to an embodiment of the present application, selecting, by the depth estimation network, an action meeting a preset condition corresponding to the storage information of the task to be processed and the currently available storage resource information of the computer system from a preset action set includes: determining scores of the actions in the preset action set relative to the storage information of the task to be processed and the currently available storage resource information of the computer system; and selecting the action with the highest score corresponding to the storage information of the task to be processed and the currently available storage resource information of the computer system.
According to an embodiment of the present application, the scores of the actions in the preset action set relative to the storage information of the task to be processed and the currently available storage resource information of the computer system may include: a reward, positively correlated with the size of the storage block, obtained each time a storage block is successfully allocated; a first preset reward obtained each time a storage request set is fully allocated; a second preset reward obtained when the storage request sets of all pending tasks have been allocated; a penalty, positively correlated with the size of the consolidated storage blocks, incurred each time intra-hierarchy storage resources are consolidated; and/or a penalty, positively correlated with the size of the migrated data, incurred each time cross-hierarchy storage data migration is performed.
According to one embodiment of the application, the storage request sets of all the tasks to be processed in time sequence form a storage request set sequence, and the intersection of any two adjacent storage request sets in the storage request set sequence is not an empty set.
According to one embodiment of the application, the depth estimation network is obtained according to deep reinforcement learning training.
According to one embodiment of the present application, the configuration information of the pending task may further include priority information of the task. The priority information may characterize the priority of the task by a value; for example, the priority may be set from 1 to 10, with a higher value indicating a higher priority, or alternatively with a smaller value indicating a higher priority. When the computer system processes the pending tasks, it can then process them in order of priority from high to low according to the priority information of each task.
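Under the first of the two conventions just mentioned (a larger value means higher priority), the dispatch order can be obtained with a simple sort; the task records and the dispatch stub below are invented for illustration.

```python
def dispatch(task):
    print("processing task", task["id"])   # stand-in for steps S110-S130

tasks = [{"id": 1, "priority": 3},
         {"id": 2, "priority": 9},
         {"id": 3, "priority": 5}]

for task in sorted(tasks, key=lambda t: t["priority"], reverse=True):
    dispatch(task)   # runs task 2, then task 3, then task 1
```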
FIG. 9 shows a schematic diagram of an electronic device according to an embodiment of the application. As shown in fig. 9, the electronic device 500 may include a processor 510 and a memory 530. The memory 530 stores a computer program. The computer program stored in the memory 530, when executed by the processor 510, can cause the processor 510 to perform a task processing method as described in any of the above embodiments.
According to another aspect of the present application, there is provided a non-transitory computer-readable storage medium having stored thereon computer-readable instructions, which, when executed by a processor, can cause the processor to execute a task processing method according to any one of the above embodiments.
It will be appreciated that the above described apparatus embodiments are merely illustrative and that the apparatus of the present application may be implemented in other ways. For example, the division of the units/modules in the above embodiments is only one logical function division, and there may be another division manner in actual implementation. For example, multiple units, modules, or components may be combined, or may be integrated into another system, or some features may be omitted, or not implemented.
The units or modules described as separate parts may or may not be physically separate. A component described as a unit or a module may or may not be a physical unit, and may be located in one apparatus or may be distributed over a plurality of apparatuses. The scheme of the embodiment in the application can be implemented by selecting some or all of the units according to actual needs.
In addition, unless otherwise specified, each functional unit/module in the embodiments of the present application may be integrated into one unit/module, or each unit/module may exist alone physically, or two or more units/modules may be integrated together. The integrated units/modules may be implemented in the form of hardware or software program modules.
If the integrated unit/module is implemented in hardware, the hardware may be digital circuits, analog circuits, and so on. Physical implementations of the hardware structures include, but are not limited to, transistors, memristors, and the like. Unless otherwise specified, the processor may be any suitable hardware processor, such as a CPU, GPU, FPGA, DSP, or ASIC. Unless otherwise specified, the memory unit may be any suitable magnetic or magneto-optical storage medium, such as resistive random access memory (RRAM), dynamic random access memory (DRAM), static random access memory (SRAM), enhanced dynamic random access memory (eDRAM), high-bandwidth memory (HBM), or hybrid memory cube (HMC).
The integrated units/modules, if implemented in the form of software program modules and sold or used as a stand-alone product, may be stored in a computer-readable memory. Based on this understanding, the technical solution of the present application, in essence or in the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a memory and including several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the methods described in the embodiments of the present application. The aforementioned memory includes various media capable of storing program code, such as a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, or an optical disk.
In the foregoing embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments. The technical features of the embodiments may be arbitrarily combined, and for the sake of brevity, all possible combinations of the technical features in the embodiments are not described, but should be considered as the scope of the present specification as long as there is no contradiction between the combinations of the technical features.
The foregoing detailed description of the embodiments of the present application illustrates the principles and implementations of the present application; the description of the embodiments is intended only to aid understanding of the methods of the present application and their core ideas. Meanwhile, those of ordinary skill in the art may, following the ideas of the present application, make changes to the specific implementations and application scope. In view of the above, the contents of this specification should not be construed as limiting the present application.

Claims (10)

1. A method for processing tasks for data cache management by a computer system, the method comprising:
acquiring a task to be processed and configuration information of the task to be processed, wherein the configuration information comprises storage information and calculation information;
allocating storage resources to a storage request set of the task to be processed by using a pre-trained depth estimation network, according to the storage information of the task to be processed and currently available storage resource information of the computer system, wherein the currently available storage resource information is variable in size, and the pre-trained depth estimation network is a deep neural network used to represent the estimation function in a reinforcement learning algorithm; the deep neural network uses an experience replay method in which sampling and training are separated and training is performed off-policy: the result of each sampling is put back into an experience pool, samples are drawn from the experience pool during training, and samples can be reused;
scheduling the calculation data onto the allocated storage resources according to the calculation information and performing the calculation,
wherein the configuration information further comprises splitting information of the task, the splitting information being used for splitting the task to be processed into a plurality of subtasks, each of the plurality of subtasks having subtask configuration information that comprises sub-storage information and sub-calculation information, each subtask being defined as a subtask to be processed,
wherein, when the computer system processes the plurality of subtasks to be processed, the following operations are performed:
allocating storage resources to the storage request set of each subtask to be processed by using the pre-trained depth estimation network, according to the sub-storage information of the subtask to be processed and the currently available storage resource information of the computer system; and
scheduling the calculation data onto the allocated storage resources according to the sub-calculation information and performing the calculation.
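The distinguishing feature of claim 1 is a value-estimation ("depth estimation") network trained off-policy with experience replay: every sampled transition goes back into an experience pool, and training batches are drawn from that pool so samples can be reused. The following Python sketch is purely illustrative; the class names, state encoding, and hyperparameters are assumptions, not taken from the patent.

```python
import random
from collections import deque

import torch
import torch.nn as nn

class ReplayBuffer:
    """Experience pool: transitions stay in the pool after sampling and may be reused."""
    def __init__(self, capacity=10000):
        self.pool = deque(maxlen=capacity)

    def add(self, state, action, reward, next_state, done):
        self.pool.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Sampling with replacement: drawn transitions are not removed.
        return random.choices(self.pool, k=batch_size)

class QNetwork(nn.Module):
    """Deep network estimating the value of each allocation action for a state."""
    def __init__(self, state_dim, n_actions):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 128), nn.ReLU(),
            nn.Linear(128, n_actions),
        )

    def forward(self, state):
        return self.net(state)

def train_step(q_net, target_net, buffer, optimizer, batch_size=32, gamma=0.99):
    """One off-policy update: training is decoupled from data collection."""
    batch = buffer.sample(batch_size)
    states, actions, rewards, next_states, dones = (
        torch.as_tensor(xs, dtype=torch.float32) for xs in zip(*batch))
    q = q_net(states).gather(1, actions.long().unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        target = rewards + gamma * target_net(next_states).max(1).values * (1 - dones)
    loss = nn.functional.mse_loss(q, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

Decoupling the sampling from the training in this way is what lets the allocator keep learning from past allocation episodes even though the available storage resource information varies in size between tasks.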
2. The method of claim 1, wherein allocating storage resources to the storage request set of the task to be processed by using a pre-trained depth estimation network, according to the storage information of the task to be processed and the currently available storage resource information of the computer system, comprises:
Step A: selecting, from a preset action set through the depth estimation network, an action that meets a preset condition with respect to the storage information of the task to be processed and the currently available storage resource information of the computer system, and executing the selected action;
Step B: after the selected action is executed, confirming whether allocation of the storage request set of the task to be processed is complete; and
if the allocation is not complete, updating the storage information of the task to be processed and the currently available storage resource information of the computer system, and repeating Step A and Step B until allocation of the storage request set of the task to be processed is complete.
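Claim 2's Step A / Step B alternation is a closed loop: choose an action for the current observation, execute it, check whether the request set is fully allocated, and if not refresh the state and go again. A minimal sketch, with `select_action` and `execute` as caller-supplied stand-ins (hypothetical) for the estimation network and the underlying allocator:

```python
def allocate_request_set(select_action, execute, request_set, storage_state):
    """Step A / Step B loop from claim 2.

    `select_action` and `execute` are hypothetical callbacks: one wraps the
    depth estimation network, the other applies the chosen action to the
    storage state and reports which requests it satisfied.
    """
    pending = set(request_set)
    while pending:
        # Step A: pick the action meeting the preset condition for the current
        # (storage info, available resources) observation, then execute it.
        action = select_action(pending, storage_state)
        satisfied = execute(action, pending, storage_state)
        # Step B: satisfied requests leave the pending set; if anything
        # remains, the updated state flows back into Step A.
        pending -= set(satisfied)
    return storage_state
```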
3. The method of claim 2, wherein the storage resource comprises at least one hierarchy of storage blocks, the hierarchy of storage blocks comprising at least one of:
an in-core memory block located on a core in a processor;
a shared memory block located between a plurality of cores in the processor;
a common memory block located in the processor;
an off-chip memory block located outside the processor.
4. The method of claim 2, wherein the preset set of actions comprises at least one action of:
allocating storage resources;
sorting storage resources within a hierarchy level;
migrating storage data across hierarchy levels.
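Claims 3 and 4 fix the two finite vocabularies the network works over: the storage hierarchy levels and the preset action set. One plausible encoding (the identifier names are invented for illustration):

```python
from enum import Enum, auto

class StorageLevel(Enum):
    IN_CORE = auto()    # memory block on a single core of the processor
    SHARED = auto()     # block shared among several cores
    COMMON = auto()     # processor-wide common block
    OFF_CHIP = auto()   # block located outside the processor

class Action(Enum):
    ALLOCATE = auto()        # allocate a storage resource to a request
    SORT_IN_LEVEL = auto()   # sort (compact) resources within one level
    MIGRATE_ACROSS = auto()  # migrate stored data to another level
```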
5. The method of claim 4, wherein selecting, from the preset action set through the depth estimation network, an action that meets the preset condition with respect to the storage information of the task to be processed and the currently available storage resource information of the computer system comprises:
determining scores of the actions in the preset action set relative to the storage information of the task to be processed and the currently available storage resource information of the computer system; and
selecting the action with the highest score with respect to the storage information of the task to be processed and the currently available storage resource information of the computer system.
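Claim 5 makes the "preset condition" concrete: score every candidate action against the current state with the estimation network and take the top scorer. A minimal greedy-selection sketch under that reading:

```python
import torch

def select_greedy_action(q_net, state_vector):
    """Score each action in the preset set for the current state and return
    the index of the highest-scoring one (claim 5's selection rule)."""
    with torch.no_grad():
        scores = q_net(torch.as_tensor(state_vector, dtype=torch.float32))
    return int(scores.argmax())
```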
6. The method of claim 5, wherein the scores of the actions in the preset action set, relative to the storage information of the task to be processed and the currently available storage resource information of the computer system, comprise at least one of:
a reward, positively correlated with the size of the allocated storage block, obtained each time a storage block is successfully allocated;
a first preset reward obtained each time allocation of a storage request set is completed;
a second preset reward obtained upon completing allocation of the storage request sets of all tasks to be processed;
a penalty, positively correlated with the size of the sorted storage blocks, incurred each time storage resources within a hierarchy level are sorted; and
a penalty, positively correlated with the size of the migrated data, incurred each time storage data is migrated across hierarchy levels.
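Claim 6 spells out the learning signal: size-proportional rewards for successful allocations, fixed bonuses for finishing one request set and for finishing all of them, and size-proportional penalties for sorting and for cross-level migration. A sketch of one such reward-shaping function; every coefficient below is an invented placeholder:

```python
def reward(event, size=0.0, alpha=1.0, beta=0.5,
           set_bonus=10.0, all_sets_bonus=100.0):
    """Reward/penalty structure per claim 6 (coefficients illustrative only)."""
    if event == "block_allocated":
        return alpha * size        # grows with the allocated block's size
    if event == "request_set_done":
        return set_bonus           # first preset reward
    if event == "all_sets_done":
        return all_sets_bonus      # second preset reward
    if event == "sorted_in_level":
        return -beta * size        # grows with the size of the sorted blocks
    if event == "migrated_across":
        return -beta * size        # grows with the size of the migrated data
    return 0.0
```

The asymmetry rewards the agent for placing requests while charging it for the bookkeeping work (sorting, migration) that a poor placement decision can trigger.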
7. The method according to any one of claims 1 to 6, wherein the storage request sets of all tasks to be processed, ordered chronologically, form a storage request set sequence, and the intersection of any two adjacent storage request sets in the sequence is not an empty set.
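The sequencing constraint of claim 7, that chronologically adjacent request sets share at least one request, is cheap to verify; a small sketch:

```python
def adjacent_sets_overlap(request_set_sequence):
    """True iff every pair of adjacent storage request sets intersects."""
    return all(
        set(a) & set(b)
        for a, b in zip(request_set_sequence, request_set_sequence[1:])
    )
```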
8. The method according to any one of claims 1 to 6, wherein the configuration information further comprises priority information of the task to be processed, and the computer system executes the tasks to be processed sequentially according to the priority information.
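Claim 8's priority-ordered execution maps naturally onto a heap. A minimal sketch, assuming a numeric `priority` field (smaller means more urgent) and a caller-supplied `process` handler, both hypothetical:

```python
import heapq

def run_by_priority(tasks, process):
    """Pop and process pending tasks in priority order (claim 8)."""
    heap = [(t["priority"], i, t) for i, t in enumerate(tasks)]  # i breaks ties
    heapq.heapify(heap)
    while heap:
        _, _, task = heapq.heappop(heap)
        process(task)  # e.g. allocate via the estimation network, then compute
```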
9. An electronic device, comprising:
a processor; and a memory in which a computer program is stored,
wherein the computer program, when executed by the processor, causes the processor to perform the method of any of claims 1-8.
10. A non-transitory computer readable storage medium having stored thereon computer readable instructions which, when executed by a processor, cause the processor to perform the method of any one of claims 1-8.
CN202010003595.1A 2020-01-03 2020-01-03 Method for processing task using computer system, electronic device and storage medium Active CN110795226B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010003595.1A CN110795226B (en) 2020-01-03 2020-01-03 Method for processing task using computer system, electronic device and storage medium

Publications (2)

Publication Number Publication Date
CN110795226A CN110795226A (en) 2020-02-14
CN110795226B (en) 2020-10-27

Family

ID=69448432

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010003595.1A Active CN110795226B (en) 2020-01-03 2020-01-03 Method for processing task using computer system, electronic device and storage medium

Country Status (1)

Country Link
CN (1) CN110795226B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111831415B (en) * 2020-07-10 2024-01-26 广东石油化工学院 Multi-queue multi-cluster task scheduling method and system
WO2023208027A1 (en) * 2022-04-29 2023-11-02 北京灵汐科技有限公司 Information processing method and information processing unit, and device, medium and product
CN115114028B (en) * 2022-07-05 2023-04-28 南方电网科学研究院有限责任公司 Task allocation method and device for secondary control of electric power simulation
CN116896483B (en) * 2023-09-08 2023-12-05 成都拓林思软件有限公司 Data protection system

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8209417B2 (en) * 2007-03-08 2012-06-26 Oracle International Corporation Dynamic resource profiles for clusterware-managed resources
CN110502330A (en) * 2018-05-16 2019-11-26 上海寒武纪信息科技有限公司 Processor and processing method
CN109992404B (en) * 2017-12-31 2022-06-10 中国移动通信集团湖北有限公司 Cluster computing resource scheduling method, device, equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant