WO2023071643A1 - Method, apparatus, electronic device and medium for processing tasks - Google Patents

Method, apparatus, electronic device and medium for processing tasks

Info

Publication number: WO2023071643A1
Application number: PCT/CN2022/120604
Authority: WIPO (PCT)
Prior art keywords: task, real-time, processing unit, execute
Other languages: English (en), French (fr)
Inventor: 彭席汉
Original Assignee: 华为技术有限公司 (Huawei Technologies Co., Ltd.)
Application filed by 华为技术有限公司
Publication of WO2023071643A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00: Arrangements for program control, e.g. control units
    • G06F 9/06: Arrangements for program control using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46: Multiprogramming arrangements
    • G06F 9/48: Program initiating; program switching, e.g. by interrupt
    • G06F 9/4806: Task transfer initiation or dispatching
    • G06F 9/4843: Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F 9/4881: Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
    • G06F 9/50: Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F 9/5005: Allocation of resources to service a request
    • G06F 9/5027: Allocation of resources to service a request, the resource being a machine, e.g. CPUs, servers, terminals
    • G06F 9/5038: Allocation of resources to service a request, considering the execution order of a plurality of tasks, e.g. taking priority or time dependency constraints into consideration

Definitions

  • Embodiments of the present disclosure mainly relate to the field of computer technology, especially artificial intelligence (AI) technology. More specifically, embodiments of the present disclosure relate to a method, an apparatus, an electronic device, a computer-readable storage medium, and a computer program product for processing a task.
  • AI technology has proven effective in fields such as image analysis (e.g., face recognition, text recognition), natural language processing, and speech recognition.
  • In the traditional robotics field, researchers have also begun to actively explore combining AI technology with motion control technology, enabling robots to complete more open-ended tasks that are not limited to traditional automation tasks.
  • AI robots collect data about the surrounding environment through sensors and then use AI technology to identify objects in the environment. For example, an industrial robot uses a camera and AI technology to sort items on a conveyor belt, and a service robot recognizes whether there are obstacles in its surroundings and takes corresponding measures, such as stopping or avoiding the obstacle.
  • In a robot application system, therefore, not only traditional body control tasks (for example, controlling motor movement) but also AI perception or reasoning tasks (for example, recognizing or detecting objects in images) must be handled. The robot's limited computing resources, combined with the real-time requirements of its tasks, pose challenges to researchers.
  • Embodiments of the present disclosure provide solutions for processing real-time tasks in robotic systems.
  • According to a first aspect of the present disclosure, a method for processing a task is provided. The method includes determining, based on a configuration file of the task to be executed, a real-time requirement of the task and a computing resource for executing the task, and causing the computing resource to execute the task if the real-time requirement indicates that the task is a real-time task.
  • In some embodiments, the computing resource includes at least one processing unit having a first thread for executing non-real-time tasks and a second thread for executing real-time tasks.
  • In such embodiments, the at least one processing unit executes the task using the second thread.
  • If the at least one processing unit is executing a non-real-time task using the first thread, a signal to stop executing the non-real-time task is sent to the at least one processing unit.
  • In some embodiments, the at least one processing unit includes a plurality of processing units having corresponding third threads. Causing the computing resource to execute the task then comprises: generating a plurality of parallel subtasks from the task; causing the plurality of processing units to execute the plurality of parallel subtasks using the third threads; and determining a combined processing result based on the results of the plurality of processing units executing the plurality of parallel subtasks.
  • the at least one processing unit is a CPU core.
  • a task is a control task for controlling the motion of the robot.
  • In some embodiments, the computing resource includes a processing unit having a first task queue that contains at least one non-real-time subtask of a non-real-time task, and causing the computing resource to execute the task includes causing the processing unit to stop executing the at least one non-real-time subtask in the first task queue.
  • In some embodiments, the processing unit also has a second task queue, and causing the computing resource to execute the task includes: decomposing the task into a plurality of real-time subtasks; adding the plurality of real-time subtasks to the second task queue of the processing unit; and causing the processing unit to execute the plurality of real-time subtasks in the second task queue.
  • In some embodiments, causing the processing unit to stop executing the at least one non-real-time subtask in the first task queue includes determining the remaining time required for the processing unit to complete the non-real-time subtask being executed. If the remaining time exceeds a preset threshold, the processing unit is reset, and after the reset is complete the processing unit executes the task.
  • If the remaining time does not exceed the preset threshold, the processing unit executes the task after the non-real-time subtask being executed is completed.
  • In some embodiments, the method further includes storing position information of the stopped non-real-time subtask in the first task queue and, in response to completion of the task, causing the processing unit to resume executing the at least one non-real-time subtask in the first task queue based on the position information.
  • the processing unit may be a neural network processing unit or a graphics processing unit.
  • the task may be an artificial intelligence AI reasoning task.
  • the configuration file includes information about the real-time requirements of the task, information about the type of the task, and information about the computing resources used to execute the task.
  • According to a second aspect of the present disclosure, an apparatus for processing a task is provided. The apparatus includes a task configuration determination unit configured to determine, based on a configuration file of the task to be executed, the real-time requirements of the task and the computing resources used to execute the task.
  • the apparatus also includes a task control unit configured to cause the computing resource to execute the task if the real-time requirement indicates that the task is a real-time task.
  • an electronic device including a processing unit and a memory, and the processing unit executes instructions in the memory, so that the electronic device executes the method according to the first aspect of the present disclosure.
  • According to another aspect, a computer-readable storage medium is provided, on which one or more computer instructions are stored, wherein the one or more computer instructions, when executed by a processor, cause the processor to perform the method according to the first aspect of the present disclosure.
  • a computer program product comprising machine-executable instructions which, when executed by a device, cause the device to perform the method according to the first aspect of the present disclosure.
  • FIG. 1 shows a schematic diagram of an example environment in which various embodiments of the present disclosure can be implemented;
  • FIG. 2 shows a schematic block diagram of a system architecture according to an embodiment of the present disclosure;
  • FIG. 3 shows a schematic flowchart of a process for processing tasks according to an embodiment of the present disclosure;
  • FIG. 4 shows a schematic diagram of a solution for switching computing resources for a real-time CPU task according to an embodiment of the present disclosure;
  • FIG. 5 shows a schematic flowchart of a process for switching computing resources for a real-time CPU task according to an embodiment of the present disclosure;
  • FIG. 6 shows a schematic diagram of a solution for switching computing resources for a real-time AI reasoning task according to an embodiment of the present disclosure;
  • FIG. 7 shows a schematic flowchart of a process for switching computing resources for a real-time AI reasoning task according to an embodiment of the present disclosure;
  • FIG. 8 shows a schematic block diagram of a task processing apparatus according to an embodiment of the present disclosure; and
  • FIG. 9 shows a schematic block diagram of an example device that may be used to implement embodiments of the present disclosure.
  • The motion control tasks of the robot body have certain real-time requirements.
  • For example, an industrial robotic arm needs to be sent motion control commands at a fixed frequency.
  • Here, real-time means that the fluctuation of a task's execution time stays within a certain range; if it exceeds this range, the robot's actuators are affected, for example by jerky, non-smooth movement.
  • In some cases, the AI reasoning task is associated with a motion execution task; for example, the motion execution task must be determined according to the processing result of the AI reasoning task, which inevitably imposes certain real-time requirements on the AI reasoning task as well.
  • Conventionally, when implementing a real-time task, the user needs to call the real-time process setting interface provided by the operating system kernel in the task's implementation code, so that its execution environment becomes a real-time process context; scheduling and computing resource allocation for the real-time task are then performed by the operating system kernel.
  • If the task involves multi-core parallel computing or AI reasoning, the user also needs to add SDK interfaces that explicitly call a parallel computing library (such as the OpenMP library) and an AI model framework in the task's real-time code. This brings inconvenience to the user.
  • Moreover, task scheduling and resource allocation are implemented entirely by the operating system kernel, which may allocate real-time tasks to heavily loaded computing resources, resulting in uncertainty in scheduling time and affecting the execution time of real-time tasks.
  • In view of this, embodiments of the present disclosure use configuration files to provide information such as a task's real-time requirements and required computing resources, so that real-time tasks are provided with the computing resources they require in a more efficient and accurate manner.
  • In this way, tasks with real-time processing requirements, such as traditional robot computing tasks and AI reasoning tasks, can be processed more quickly and efficiently, thereby optimizing the scheduling of various tasks and improving system processing efficiency.
  • FIG. 1 shows a schematic diagram of an example environment 100 in which various embodiments of the present disclosure can be implemented.
  • the environment 100 is a typical AI robot system architecture.
  • The environment 100 includes an application program 110, a robot development framework 120, and an operating system 130 as the software parts of the robot system, and a robot system-on-chip (SoC) 140 and memory 150 as the hardware parts.
  • Application programs 110 are developed by users to implement various functions and tasks associated with specific application scenarios.
  • For an industrial robot, the application program 110 can implement motion control tasks of the robot such as rotation, grasping, and translation, and can also implement AI tasks such as recognizing objects in conveyor-belt images captured by the robot's camera. For a service robot, the application program 110 can implement robot motion control tasks such as moving forward, moving backward, braking, and steering, and can also implement AI tasks such as recognizing received voice information.
  • The application program 110 includes a non-real-time CPU task 112, a real-time CPU task 114, a non-real-time AI task 116, and a real-time AI task 118.
  • the motion control tasks of the robot can be executed by the central processing unit (CPU) cores 142 and 144 in the SoC chip 140 , and correspondingly, such tasks are also referred to as CPU tasks.
  • the reasoning tasks of the robot (for example, image recognition and detection) can be performed by the AI processing units 146 and 148 in the SoC chip, and correspondingly, such tasks are also called AI tasks.
  • Although FIG. 1 schematically shows tasks 112, 114, 116, and 118 with different real-time requirements and types, it should be understood that the number of tasks is not limited.
  • Application 110 may include more or fewer tasks, and any number of each task.
  • In some embodiments, the robot development framework 120 may be a Robot Operating System (ROS).
  • ROS™ is an open-source meta-operating system for robots. It provides the services an operating system should offer, including hardware abstraction, low-level device control, implementation of common functions, inter-process message passing, and package management. It also provides the tools and library functions needed to fetch, compile, write, and run code across computers.
  • Users can create robot task nodes (Nodes) through the Application Programming Interface (API) provided by ROS, with the message communication mechanism between nodes implemented by the ROS framework; users only need to call the API to implement the internal logic of specific tasks.
  • The robot development framework 120 may also be a development framework other than ROS™, which is not limited in the present disclosure.
  • The operating system 130 provides the interface between the application program 110 and the robot development framework 120 on the one hand and the hardware environment, namely the SoC 140 and the memory 150, on the other.
  • The operating system 130 may be, for example, the open-source Linux™ or any other commercially available operating system.
  • The SoC chip 140 integrates CPU cores 142 and 144 and neural network processing units (NPUs) 146 and 148.
  • The CPU cores 142 and 144 have, for example, cache memory, a control unit, and an arithmetic unit (not shown in the figure); they generally execute code or instructions sequentially and are suitable for relatively complex logic control.
  • Each CPU core 142 or 144 may have one or more threads, and the threads are executed in a time-multiplexed manner.
  • the AI processing units 146 and 148 adopt a parallel computing structure, which is more suitable for running AI models to process data such as video and images.
  • the AI processing unit may be, for example, a neural network processing unit (NPU).
  • the NPU may include multiple (eg, 16, 32, 64 or more) parallel multiply-add modules, activation function modules, and the like.
  • the multiply-add module is used to calculate matrix multiply-add, convolution, dot product, etc.
  • the activation function module implements the activation function in the neural network by means of, for example, parameter fitting.
  • The AI processing units 146 and 148 may also be graphics processing units (GPUs) or other devices with a parallel computing structure, such as field-programmable gate arrays (FPGAs) or application-specific integrated circuits (ASICs).
  • The tasks 112 and 114 in the robot's application program 110 can be executed by the CPU cores 142 and 144, while the AI tasks 116 and 118 can be executed by the AI processing units 146 and 148.
  • Although FIG. 1 shows two CPU cores 142, 144 and two AI processing units 146, 148 on the SoC chip 140, it should be understood that the present disclosure does not limit the number of CPU cores and AI processing units on the SoC.
  • the SoC chip 140 may include more or fewer CPU cores and AI processing units.
  • the memory 150 may also be called internal memory or main memory.
  • the memory 150 may be any type of memory that exists or will be developed in the future, such as DDR memory.
  • The memory 150 stores the executable code and data required to run the application program 110, the robot development framework 120, and the operating system 130, for example, the data of threads to be executed, image data acquired by sensors, and the AI models used for reasoning tasks, so that they can be accessed and executed by the CPU cores and NPUs.
  • the embodiments of the present disclosure may also be implemented in environments different from those shown in FIG. 1 .
  • the CPU core and AI processing unit are not necessarily integrated on the same SoC chip, but can also be implemented on different SoC chips or devices.
  • FIG. 2 shows a schematic block diagram of a system architecture 200 according to an embodiment of the present disclosure.
  • the system architecture 200 includes an application program 210 and a robot development framework 220 .
  • the application program 210 can be implemented as the application program 110 shown in FIG. 1
  • the robot development framework 220 can be implemented as the robot development framework 120 shown in FIG. 1 .
  • the application program 210 includes two parts: a task-oriented configuration file 212 and a task implementation logic code 214 . These contents are written and implemented by users.
  • The configuration file 212 may be defined in an editable, readable format, e.g., JSON, YAML, or XML. In some embodiments, the configuration file 212 defines the name of the task, the computing resources required for the task to run, and whether the task is a real-time task. Table 1 below gives an exemplary configuration file.
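  • Table 1 itself is not reproduced in this text. Based on the description that follows, a plausible JSON configuration might look like the sketch below; the "real" and "type" fields are named in the text, while the other field names are assumptions for illustration.

        {
            "name": "my task",
            "real": true,
            "type": 0,
            "cpus": [0, 1],
            "npus": [0]
        }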
  • This exemplary configuration file indicates that the task named "my task" is a real-time CPU task; it specifies that CPU core 0 and CPU core 1 are used for execution when it runs as a CPU task, and that AI processing unit 0 is used when it runs as an AI task.
  • Which of the specified computing resources is used depends on the field "type". That is, when the type is "0", the corresponding CPU resources are used to execute the task, and when the type is "1", the corresponding NPU resources are used. Note that if the required CPU resources are designated as multiple CPU cores, the task can be processed in parallel by those cores when the parallel execution conditions are met.
  • The configuration file 212 is customizable; for example, it may include more, fewer, or different fields, such as only one of the CPU resource field and the AI computing resource field. Thus, by writing the configuration file 212 of a task, the user can easily specify information such as the task's type, its real-time requirements, and the computing resources used to execute it.
  • the task implementation logic code 214 includes the specific task logic code implemented by the user.
  • the task implementation logic code 214 is implemented based on the abstract class programming interface 225 provided by the robot development framework 220 .
  • the task implementation logic code 214 may call or inherit the tools or library functions provided by the robot development framework 220 , and may use the development framework 220 to implement communication between tasks.
  • In this way, the user only needs to implement the task logic code, without manually creating the threads required to execute the task, and only needs to prepare the configuration file to allocate computing resources to the task, without additional coding.
  • the corresponding work is realized by the robot development framework 220 according to the embodiment of the present disclosure.
  • The robot development framework 220 includes a configuration parameter management module 221, a thread resource scheduling management module 222, a task scheduling management module 223, an NPU operator scheduling management module 224, and an abstract class programming interface 225.
  • the configuration parameter management module 221 defines a description file for describing resources required for task execution, that is, the configuration file 212 described above.
  • the configuration file 212 written by the user can be parsed by the configuration parameter management module 221 to obtain information related to task execution.
  • The configuration parameter management module 221 uses this information to facilitate the scheduling of computing resources for tasks. For example, a real-time task whose "real" field is TRUE must be served by the corresponding CPU core or AI processing unit resources as soon as possible, and must be able to monopolize those resources, so as to meet its real-time requirements.
  • The thread pool management module 222 is used to generate reserved threads and bind them to the CPU cores on the SoC chip. Reserved threads may include a first thread for non-real-time tasks (also called a non-real-time thread), a second thread for real-time tasks (also called a real-time thread), and a third thread for parallel subtasks (also called a parallel subtask thread). When the robot starts, these threads can be spawned and bound to the corresponding CPU cores; that is, each CPU core has three reserved threads for executing different kinds of tasks.
  • The thread pool management module 222 also allocates storage space in the memory 150 for the reserved threads, used to store the data of tasks to be executed. With threads reserved for each CPU core in the memory 150, once a CPU core receives a task it can quickly switch to the corresponding task thread, meeting real-time requirements and improving the efficiency of task execution. A sketch of this reservation-and-binding step follows.
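  • The following C++ sketch spawns three reserved worker threads per CPU core and pins them with pthread_setaffinity_np (Linux-specific). The worker loop and thread roles are placeholders rather than the framework's actual API, and error handling is elided.

        #include <pthread.h>
        #include <sched.h>
        #include <chrono>
        #include <thread>
        #include <vector>

        // Placeholder worker: a reserved thread idles until a Task object is
        // added to its execution queue (queueing logic omitted in this sketch).
        void worker_loop() {
            for (;;) std::this_thread::sleep_for(std::chrono::milliseconds(1));
        }

        // Pin a thread to a single CPU core (Linux-specific).
        void bind_to_core(std::thread& t, int core) {
            cpu_set_t set;
            CPU_ZERO(&set);
            CPU_SET(core, &set);
            pthread_setaffinity_np(t.native_handle(), sizeof(cpu_set_t), &set);
        }

        int main() {
            std::vector<std::thread> reserved;
            for (int core = 0; core < 2; ++core) {      // e.g., CPU cores 0 and 1
                for (int role = 0; role < 3; ++role) {  // non-real-time, real-time,
                                                        // and parallel-subtask threads
                    reserved.emplace_back(worker_loop);
                    bind_to_core(reserved.back(), core);
                }
            }
            for (auto& t : reserved) t.join();
        }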
  • the task scheduling management module 223 is used to complete the allocation of tasks and underlying thread resources when the tasks are scheduled.
  • the task scheduling management module 223 may write the data of the task to be executed into the storage space of the corresponding reserved thread, and then send a switching signal to the CPU core to switch the CPU core to the corresponding reserved thread.
  • The threads of a CPU core are time-division multiplexed, and the CPU core can execute only one of them at a time. For example, when the real-time CPU task 114 is triggered and assigned to the CPU core specified by the configuration file, the task scheduling management module 223 performs a context switch, switching the CPU core from, for example, the non-real-time thread executing the non-real-time CPU task 112 to the real-time thread.
  • The context includes register data in the CPU core, such as the program counter.
  • CPU cores are known to provide hardware-based context switching mechanisms. Using such a mechanism, the context of the interrupted thread is saved and used later to quickly restore the interrupted thread.
  • the scheduling mechanism for real-time CPU tasks will be described in detail below with reference to FIG. 4 and FIG. 5 .
  • the operator scheduling management unit 224 is used to schedule AI tasks to be executed by the NPU.
  • The AI models involved in AI reasoning tasks are, for example, neural network models. A typical neural network model, for example a convolutional neural network (CNN), can have dozens or even hundreds of layers.
  • The AI model can therefore be decomposed into several operators according to its layer structure.
  • An operator can consist of the parallel computation of one or more layers, and operators can be provided to AI processing units, which are suited to parallel computing, for execution.
  • the AI processing unit completes the AI task by sequentially executing the operators of the AI model. In some embodiments, the AI processing unit may alternately execute operators from multiple AI reasoning tasks.
  • AI tasks include non-real-time AI tasks 116 and real-time AI tasks 118 .
  • the NPU operator scheduling management unit 224 provides a mechanism for scheduling real-time AI tasks by resetting the NPU. This will be described in detail below with reference to FIGS. 6 and 7 .
  • the abstract class programming interface 225 provides the user with task templates for implementing the logic code 214 for the task.
  • the abstract class programming interface 225 adopts the inheritance mechanism provided by the object-oriented programming language to define the task implementation functions that users must overload.
  • The tasks in a robot system can be divided into three parts: preparation before task execution, the task executor itself, and processing after task execution. These three parts can therefore be abstractly defined as three functions, which are implemented by users.
  • Tasks are eventually encapsulated as Task objects, which can be loaded and executed by CPU cores or AI processing units. Table 2 below provides an implementation example in C++.
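  • The contents of Table 2 are not preserved in this text. The following C++ sketch illustrates the kind of abstract Task class described above, with the three user-overloadable functions; all identifiers are hypothetical.

        #include <iostream>

        // Hypothetical abstract Task base class with the three phases described
        // in the text; users override these functions in a subclass.
        class Task {
        public:
            virtual ~Task() = default;
            virtual void Prepare() = 0;  // preparation before task execution
            virtual void Execute() = 0;  // the task executor (core logic)
            virtual void Finish() = 0;   // processing after task execution

            // Invoked by a CPU core or AI processing unit that loads the object.
            void Run() {
                Prepare();
                Execute();
                Finish();
            }
        };

        class MyTask : public Task {
        public:
            void Prepare() override { std::cout << "allocate buffers\n"; }
            void Execute() override { std::cout << "run one control step\n"; }
            void Finish() override { std::cout << "release buffers\n"; }
        };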
  • part or all of some CPU tasks may be divided into multiple parallel subtasks.
  • a certain CPU task may include an image enhancement operation that scales the pixel value of each pixel in the image. Since each pixel value can be considered independent, this task can be divided into multiple subtasks that execute independently.
  • independently executed subtasks are also called small tasks (TaskLet). TaskLets are assigned to those CPU cores specified in configuration file 212 .
  • the embodiment of the present disclosure also provides an API interface for running TaskLet.
  • An example of its implementation is given below:
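  • The original example body is missing from this text; the following C++ sketch is a hedged reconstruction. Only the names TaskLetFunc and LaunchTaskLet come from the text; the signatures, the range-splitting behavior, and the single-threaded stand-in are assumptions.

        #include <cstddef>
        #include <functional>

        // User-supplied TaskLet body: processes the index range [begin, end).
        using TaskLetFunc = std::function<void(std::size_t begin, std::size_t end)>;

        // Trivial single-threaded stand-in so the sketch is self-contained; the
        // framework version would split [0, total) across the CPU cores named
        // in the configuration file.
        void LaunchTaskLet(std::size_t total, const TaskLetFunc& func) {
            func(0, total);
        }

        // Example: scale every pixel of an image; each pixel is independent,
        // so the range can be processed by parallel TaskLets.
        void ScaleImage(float* pixels, std::size_t n, float gain) {
            LaunchTaskLet(n, [&](std::size_t begin, std::size_t end) {
                for (std::size_t i = begin; i < end; ++i)
                    pixels[i] *= gain;
            });
        }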
  • TaskLetFunc is the TaskLet execution function, implemented by the user.
  • For example, a TaskLet may include a function that performs scaling on a per-pixel basis.
  • LaunchTaskLet can automatically generate TaskLets according to the number of CPU cores specified in the configuration file.
  • The generated parallel subtasks can be scheduled to the corresponding CPU cores specified in the configuration file 212 for execution. For real-time tasks, this reduces execution time so as to meet their real-time requirements.
  • the CPU core can utilize the reserved third thread to perform parallel subtasks. The execution time of real-time tasks can be further reduced by reserving dedicated threads to execute parallel subtasks.
  • An exemplary system architecture 200 according to an embodiment of the present disclosure has been described above. It should be understood that embodiments of the present disclosure may also adopt different architectures; for example, any module in the system architecture 200 may be divided into more modules, and two or more modules may be combined into a single module. The present disclosure does not limit this.
  • FIG. 3 shows a schematic flowchart of a process 300 for processing tasks according to an embodiment of the present disclosure.
  • Process 300 may be implemented in, for example, robot development framework 120 of FIG. 1 and robot development framework 220 of FIG. 2 .
  • First, based on the configuration file of the task to be executed, the real-time requirements of the task and the computing resources for executing the task are determined.
  • The application 110 or 210 generates a task to be executed in response to the task being triggered, e.g., by user input or detection of a specific event.
  • For example, in response to changes in gravitational acceleration collected by an acceleration sensor (the robot may be about to fall), a service robot can generate a motion task for keeping the robot balanced.
  • As another example, in response to an image of the surrounding environment captured by an image sensor, the service robot may generate a recognition task for recognizing objects in the image.
  • the task implementation logic code 214 has been implemented by the user through the abstract programming interface 225 .
  • the user also writes a configuration file 212 corresponding to the task to specify the corresponding computing resources.
  • the configuration file 212 may include real-time requirement information of the task, for example, whether the task is a real-time task or a non-real-time task.
  • the configuration file 212 may also include task type information for the task, eg, whether the task is a control task for controlling the motion of the robot or an AI inference task.
  • the control task of controlling the movement of the robot can be assigned to one or more CPU cores for execution, so it can also be called a CPU task.
  • AI reasoning tasks can be assigned to one or more NPUs or GPUs for execution, so they can also be called AI tasks.
  • the computing resources used to perform the task are specified in the configuration file 212, as described above.
  • Then, if the real-time requirement indicates that the task is a real-time task, the computing resources are caused to execute the task.
  • For example, if the task type information of the configuration file 212 indicates that the task to be executed is a CPU task and the computing resource information indicates CPU core 0 and CPU core 1, the CPU task is handed over to, for example, CPU core 142 and CPU core 144 in the SoC chip 140 for execution.
  • If the task type information of the configuration file 212 indicates that the task to be executed is an AI task and the computing resource information indicates NPU 0, the AI task is executed by, for example, the AI processing unit 146 in the SoC chip 140. A sketch of this dispatch step follows.
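  • As an illustration of this dispatch step, the following C++ sketch routes a parsed configuration to CPU cores or NPUs. The struct fields follow the text's description of the configuration file ("type" 0 for CPU tasks and 1 for AI tasks); the scheduling hooks are placeholders, since the framework's real entry points are not shown here.

        #include <iostream>
        #include <string>
        #include <vector>

        // Hypothetical parsed form of the configuration file.
        struct TaskConfig {
            std::string name;
            bool real = false;      // real-time requirement
            int type = 0;           // 0 = CPU task, 1 = AI task
            std::vector<int> cpus;  // CPU cores used when type == 0
            std::vector<int> npus;  // AI processing units used when type == 1
        };

        // Placeholder scheduling hooks.
        void schedule_on_cpu_cores(const std::vector<int>& cores) {
            for (int c : cores) std::cout << "dispatch to CPU core " << c << "\n";
        }
        void schedule_on_npus(const std::vector<int>& npus) {
            for (int n : npus) std::cout << "dispatch to NPU " << n << "\n";
        }

        void dispatch(const TaskConfig& cfg) {
            if (!cfg.real) return;  // non-real-time tasks take the ordinary path
            if (cfg.type == 0) schedule_on_cpu_cores(cfg.cpus);
            else schedule_on_npus(cfg.npus);
        }

        int main() {
            dispatch({"my task", true, 0, {0, 1}, {0}});
        }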
  • Embodiments of the present disclosure use configuration files to provide real-time requirements of tasks, required computing resources and other information, thereby providing required computing resources for real-time tasks in a more efficient and accurate manner.
  • tasks with real-time processing requirements such as traditional robot computing tasks and AI reasoning tasks, can be processed more quickly and efficiently, thereby optimizing the scheduling of various tasks and improving system processing efficiency.
  • embodiments of the present disclosure provide corresponding computing resource switching mechanisms to meet the requirements of these two types of tasks on execution time fluctuations.
  • FIG. 4 shows a schematic diagram of an exemplary scheme 400 for switching computing resources for real-time CPU tasks according to an embodiment of the present disclosure.
  • the scheme 400 may be implemented in, for example, the robot development framework 120 described in FIG. 1 and the robot development framework 220 shown in FIG. 2 .
  • For description, it is assumed that the configuration file 212 specifies CPU core 0 and CPU core 1 (corresponding, for example, to CPU cores 142 and 144 in FIG. 1) as the computing resources of the real-time CPU task. It should be understood that the scheme 400 also applies when the configuration file 212 specifies more or fewer CPU cores, or other CPU cores.
  • threads 401, 402, and 403 are generated for CPU core 0
  • threads 411, 412, and 413 are generated for CPU core 1
  • threads 421, 422, and 423 are generated for CPU core 2, and so on.
  • Threads in the CPU cores are created and kept in the memory of the robot system as reserved threads.
  • The execution units of these threads are Task objects abstractly encapsulated according to embodiments of the present disclosure, with different threads used for different Task objects. For example, in CPU core 0, thread 401 is used to execute non-real-time tasks, thread 402 is used to execute real-time tasks, and thread 403 is used to execute parallel subtasks. Threads 411, 412, and 413 in CPU core 1 are used in the same way as those of CPU core 0.
  • the thread of the CPU core can be in an idle state until a corresponding task object is added to its execution queue.
  • Assume that CPU core 0 is using the non-real-time task thread 401 to execute a non-real-time CPU task.
  • Similarly, CPU core 1 is using the non-real-time task thread 411 to execute a non-real-time CPU task.
  • In response to the real-time CPU task being triggered, CPU core 0 and CPU core 1 switch computing resources to execute the real-time CPU task.
  • FIG. 5 shows a schematic flowchart of a process 500 of switching computing resources for real-time CPU tasks according to an embodiment of the present disclosure.
  • Process 500 may be implemented in, for example, robot development framework 120 described in FIG. 1 and robot development framework 220 shown in FIG. 2 .
  • the process 500 is described in conjunction with FIG. 4 .
  • It is first determined whether CPU cores 0 and 1 (e.g., corresponding to CPU cores 142 and 144 of FIG. 1) associated with the real-time CPU task to be executed are executing non-real-time tasks.
  • Specifically, the statuses of the non-real-time task threads of CPU cores 0 and 1 can be checked to determine whether CPU cores 0 and 1 are executing non-real-time tasks. If either or both of CPU cores 0 and 1 are executing non-real-time tasks, then at block 520 a signal to suspend the non-real-time task (for example, a SIG_STOP signal on the Linux™ operating system) is sent to the corresponding CPU core.
  • SIG_STOP can invoke the hardware context switching mechanism of the CPU core, so as to save the information of the suspended non-real-time task (for example, the register data of the CPU core).
  • Next, the real-time CPU task is scheduled to the CPU core. For example, a pointer or address of the Task object of the real-time CPU task is added to the execution queue of the real-time task thread 402 of CPU core 0, so that CPU core 0 executes the task using thread 402, as shown by the execution progress bar 407 and corresponding arrows in FIG. 4.
  • If it is determined, via a function implemented (e.g., through inheritance) against the abstract programming interface LaunchTaskLet, that there are multiple subtasks to be executed in parallel, multiple parallel subtasks (TaskLets) can be generated from the real-time CPU task through this interface.
  • The Task objects of the parallel subtasks can then be added to the parallel subtask threads 403 and 413 of CPU cores 0 and 1, so that CPU cores 0 and 1 execute the parallel subtasks using threads 403 and 413, as shown by the execution progress bars 408 and 418 in FIG. 4.
  • After execution completes, the execution results obtained by the respective CPU cores are combined to obtain a combined result.
  • In this example, the real-time CPU task is specified by the configuration file 212 to be executed by two CPU cores. It should be understood that the designated computing resources of a real-time CPU task may include more or fewer CPU cores; that is, embodiments of the present disclosure do not limit the scale of parallel subtasks.
  • After the real-time CPU task completes, a signal to resume the non-real-time task (e.g., a SIG_CONT signal on the Linux™ operating system) is sent to the CPU cores involved in the task.
  • SIG_CONT invokes the context switching mechanism of the CPU core, so that the previously suspended non-real-time task is resumed using the stored execution information, as shown by the execution progress bars 409 and 419 in FIG. 4. A user-space sketch of this suspend/resume pair follows.
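  • The suspend/resume mechanism can be approximated in user space as follows. Note that the standard Linux SIGSTOP/SIGCONT pair acts on a whole process, so this C++ sketch delivers a pair of user signals to an individual thread instead; it is an approximation under that assumption, not the patent's exact mechanism, and error handling is elided.

        #include <pthread.h>
        #include <signal.h>

        static void resume_handler(int) {}  // only wakes sigsuspend below

        static void suspend_handler(int) {
            sigset_t mask;
            sigfillset(&mask);
            sigdelset(&mask, SIGUSR2);  // wait only for the resume signal
            sigsuspend(&mask);          // the thread sleeps here until resumed
        }

        void install_suspend_resume_handlers() {
            struct sigaction sa = {};
            sa.sa_handler = suspend_handler;
            sigaction(SIGUSR1, &sa, nullptr);
            sa.sa_handler = resume_handler;
            sigaction(SIGUSR2, &sa, nullptr);
        }

        // Usage: pthread_kill(tid, SIGUSR1) suspends the non-real-time thread;
        // pthread_kill(tid, SIGUSR2) resumes it where it left off.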
  • FIG. 6 shows a schematic diagram of a solution 600 for switching computing resources for real-time AI inference tasks according to an embodiment of the present disclosure.
  • the scheme 600 may be implemented in, for example, the robot development framework 120 described in FIG. 1 and the robot development framework 220 shown in FIG. 2 .
  • FIG. 6 shows a scheme of switching computing resources for models 610 and 620 of non-real-time tasks and model 630 of real-time tasks. It should be understood that computing resources on the AI processing unit can be switched for any number of models of non-real-time tasks and models of real-time tasks.
  • the computing resource is the AI processing unit 650, for example, a parallel processing unit such as NPU, GPU, FPGA or the like.
  • Any one of the models 610, 620, and 630 may be a trained neural network model, such as a convolutional neural network model, a recurrent neural network model, a graph neural network model, etc., which is not limited in the present disclosure.
  • the trained model can be used for AI reasoning tasks, such as image recognition, object detection, speech processing, etc.
  • the AI reasoning task can be decomposed into several sub-tasks.
  • subtasks obtained from AI reasoning tasks or corresponding models may also be referred to as operators.
  • An operator corresponds to the parallel computation of multiple nodes of one or more layers of an AI model. Operators can be executed serially to complete AI reasoning tasks.
  • model 610 is decomposed into operators 1-1 to 1-4 arranged sequentially in operator flow 611
  • model 620 is decomposed into operators 2-1 to 2-4 arranged in sequence in operator flow 621.
  • the model 630 is decomposed into operators 3-1 to 3-4 arranged in sequence in the operator flow 631 .
  • Operator scheduler 640 may provide operators to AI processing unit 650 for execution. Specifically, the operator scheduler 640 may add operators to the task queue of the AI processing unit 650 to be executed.
  • the to-be-executed task queue of the AI processing unit 650 includes a first task queue 651 and a second task queue 652 .
  • the first task queue 651 is used for non-real-time AI tasks, and includes operators decomposed from models 610 and 620 related to non-real-time tasks.
  • the second task queue 652 is used for real-time AI tasks, including operators decomposed from the model 630 related to real-time tasks.
  • the operator scheduler 640 may add the operators of the non-real-time task models 610 and 620 to the first task queue 651 in a round-robin manner.
  • If there are multiple real-time task models, their operators can also be added to the second task queue 652 in a round-robin manner; alternatively, the operators of one real-time task model are all added to the second task queue 652 before those of another real-time task model, so as to at least meet the real-time requirements of the earlier real-time task.
  • the first task queue 651 and the second task queue 652 may be stored in the memory 150 in the form of a circular queue.
  • the operators themselves are also stored in memory.
  • Each element in queues 651 and 652 may store a pointer or address to an operator.
  • the first task queue 651 and the second task queue 652 may have a preset depth, that is, a maximum number of operators that can be accommodated. The depth can be set according to the average number of operators of the model, such as 10, 20 or other appropriate values.
  • If a task queue is full, the operator scheduler 640 may stop obtaining operators of the models from the corresponding operator streams 611, 621, and 631 until there is an empty position.
  • the AI processing unit 650 acquires operators from the first task queue 651 or the second task queue 652 for execution.
  • For example, an indicator may be set for each queue, and the AI processing unit 650 acquires the corresponding operator according to the indicator; the indicator is then incremented to point to the next operator in the queue. A minimal sketch of such a queue follows.
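  • A minimal fixed-depth circular queue along these lines is sketched below in C++; the names and the pointer-based element type are illustrative, not the patent's data structures.

        #include <array>
        #include <cstddef>

        struct Operator;  // one or more model layers executed as a unit

        template <std::size_t Depth>
        class OperatorQueue {
        public:
            // Scheduler side: returns false when full, so the scheduler stops
            // fetching operators from the operator streams.
            bool push(Operator* op) {
                if (count_ == Depth) return false;
                ring_[tail_] = op;
                tail_ = (tail_ + 1) % Depth;
                ++count_;
                return true;
            }
            // Processing-unit side: the head index plays the role of the
            // "indicator" and advances to the next operator after each fetch.
            Operator* pop() {
                if (count_ == 0) return nullptr;
                Operator* op = ring_[head_];
                head_ = (head_ + 1) % Depth;
                --count_;
                return op;
            }
        private:
            std::array<Operator*, Depth> ring_{};
            std::size_t head_ = 0, tail_ = 0, count_ = 0;
        };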
  • the AI processing unit 650 in order to meet the real-time requirement of the real-time AI reasoning task, preferentially acquires the operator to be executed from the second task queue 652 .
  • Therefore, when a real-time AI reasoning task is triggered, the AI processing unit 650 needs to switch to the second task queue 652 instead of continuing to execute the non-real-time task operators in the first task queue 651.
  • To this end, the operator scheduler 640 may determine, based on a policy, whether to send a reset signal (Reset) to the AI processing unit 650. The reset NPU can then be switched to execute the real-time task operators in the second task queue 652.
  • The reset mechanism of the AI processing unit 650 includes a circuit for hardware reset and a circuit for resource initialization after reset. Resource initialization is embodied as chip microcode: when the hardware reset signal is triggered, the AI processing unit 650 automatically executes this code, and it executes very quickly.
  • In this way, the AI processing unit 650 can switch more quickly from executing the non-real-time operators in the first task queue 651 to executing the real-time operators in the second task queue 652, so as to meet the real-time requirements of real-time AI reasoning tasks.
  • FIG. 7 shows a schematic flowchart of a process 700 of switching computing resources for a real-time AI inference task according to an embodiment of the present disclosure.
  • Process 700 may be implemented in, for example, robot development framework 120 described in FIG. 1 and robot development framework 220 shown in FIG. 2 .
  • the process 700 is described in conjunction with FIG. 6 .
  • First, the remaining time of the non-real-time task operator being executed by the AI processing unit 650 is calculated.
  • The time required to execute each operator can be measured in advance and recorded in an operator information table.
  • When the AI processing unit 650 starts to execute an operator, the start time is recorded.
  • If a real-time AI reasoning task is triggered during execution of the operator, subtracting the execution start time from the trigger time of the real-time AI reasoning task gives the operator's elapsed execution time.
  • The remaining time is then obtained by subtracting the elapsed time from the operator's required execution time recorded in the operator information table.
  • The preset threshold is a parameter related to the hardware platform, generally determined by the time required for an NPU reset. For example, if the reset time measured on a given hardware platform is 1 ms, the threshold can be set to 1 ms. If the remaining time of the operator exceeds 1 ms, resetting the AI processing unit brings a net time benefit, and the real-time AI reasoning task can be executed earlier. A sketch of this decision follows.
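  • The reset decision reduces to a simple time comparison, sketched below in C++. The expected operator cost comes from the pre-measured operator information table and the reset cost is the platform-specific constant (about 1 ms in the example above); all names are illustrative.

        #include <chrono>

        using Clock = std::chrono::steady_clock;

        // Returns true when resetting the AI processing unit saves time, i.e.,
        // when the executing operator's remaining time exceeds the reset cost.
        bool should_reset(Clock::time_point op_start,
                          std::chrono::microseconds expected_cost,
                          std::chrono::microseconds reset_cost) {
            auto elapsed = std::chrono::duration_cast<std::chrono::microseconds>(
                Clock::now() - op_start);
            auto remaining = expected_cost - elapsed;
            return remaining > reset_cost;
        }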
  • If the remaining time exceeds the preset threshold, the AI processing unit is reset. If the remaining time is less than the preset threshold, then at block 710 the process waits for the executing non-real-time task operator to complete.
  • If the NPU is reset, then at block 708 the position of the interrupted non-real-time task operator in the first task queue is stored. It should be understood that, because the AI processing unit is reset, the non-real-time task operator originally being executed is interrupted, and that operator needs to be re-executed when the first task queue 651 is resumed.
  • Meanwhile, the operator scheduler 640 may stop taking operators from the models corresponding to non-real-time reasoning tasks, switch to taking operators from the model corresponding to the real-time AI reasoning task, and insert them into the real-time operator execution queue 652.
  • the real-time task operator in the second task queue is executed.
  • For example, according to the indicator of the second task queue 652, the AI processing unit 650 acquires and executes the real-time task operator pointed to by the indicator.
  • At block 716, it is determined whether all the real-time task operators in the second task queue have been executed. If not, the process returns to block 714 and continues to execute the real-time task operators in the second task queue 652. That is, the AI processing unit 650 keeps executing the operators in the second task queue 652 until no real-time task operator remains.
  • the operator scheduler 640 may resume taking the operator from the model corresponding to the non-real-time AI reasoning task and inserting it into the first task queue 651 .
  • It is then determined whether a non-real-time task operator was interrupted. In some embodiments, this can be determined by checking whether the action in block 708 was performed: if position information about the first task queue 651 was recorded, the execution of the corresponding operator was interrupted.
  • If so, the interrupted non-real-time task operator is re-executed; otherwise, at block 724, the next non-real-time task operator is executed. The AI processing unit 650 thus resumes executing non-real-time AI reasoning tasks. A sketch of this bookkeeping follows.
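  • The bookkeeping for resuming the first task queue can be as small as one saved position, as in the C++ sketch below (identifiers are hypothetical).

        #include <cstddef>
        #include <optional>

        struct ResumeState {
            // Set when a reset interrupts a non-real-time operator (block 708).
            std::optional<std::size_t> interrupted_pos;
        };

        // Called after the real-time queue drains: re-execute the interrupted
        // operator if one was recorded, otherwise continue from the head.
        std::size_t next_non_real_time_pos(ResumeState& s, std::size_t head) {
            if (s.interrupted_pos) {
                std::size_t pos = *s.interrupted_pos;
                s.interrupted_pos.reset();
                return pos;
            }
            return head;
        }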
  • FIG. 8 shows a schematic block diagram of an apparatus 800 for processing tasks according to an embodiment of the present disclosure.
  • the apparatus 800 may be implemented in, for example, the robot development framework 120 of FIG. 1 and the robot development framework 220 of FIG. 2 .
  • the apparatus 800 includes a task configuration determining unit 810 .
  • the task configuration determining unit 810 is configured to determine the real-time requirements of the task and the computing resources used to execute the task based on the configuration file of the task to be executed.
  • the apparatus 800 also includes a task control unit 820 .
  • The task control unit 820 is configured to cause the computing resource to execute the task if the real-time requirement indicates that the task is a real-time task.
  • the configuration file includes information about the real-time requirements of the task, information about the type of task, and information about the computing resources used to execute the task.
  • the computing resource includes at least one processing unit, such as a CPU core, and the task may be a control task for controlling the motion of the robot.
  • the at least one processing unit has a first thread for performing non-real-time tasks and a second thread for performing real-time tasks.
  • the task control unit may also be configured to cause the at least one processing unit to execute the task using the second thread.
  • the task control unit may be further configured to send a signal to the at least one processing unit to stop executing the non-real-time task if it is determined that the at least one processing unit is using the first thread to execute the non-real-time task.
  • At least one processing unit includes multiple processing units, such as multiple CPU cores, and the multiple processing units have corresponding third threads.
  • the task control unit may also be configured to generate a plurality of parallel subtasks from the task, and cause the plurality of processing units to execute the plurality of parallel subtasks using the third thread. Then, the task control unit may also determine a combined processing result based on the results of executing multiple parallel subtasks by multiple processing units.
  • the computing resource may include a processing unit, such as a neural network processing unit or a graphics processing unit, and the task may be an artificial intelligence AI reasoning task.
  • the processing unit may have a first task queue.
  • the first task queue includes at least one non-real-time subtask of the non-real-time task.
  • the task control unit may be configured to stop the processing unit from executing at least one non-real-time subtask in the first task queue.
  • the processing unit may also have a second task queue.
  • the task control unit may also be configured to decompose the task into a plurality of real-time subtasks, add the plurality of real-time subtasks to the second task queue of the processing unit, and cause the processing unit to execute the plurality of real-time subtasks in the second task queue .
  • Real-time subtasks can be operators of AI models.
  • the task control unit may also be configured to determine the remaining time required by the processing unit to complete the non-real-time subtask being executed. If the remaining time exceeds a preset threshold, the task control unit may cause the processing unit to be reset, and cause the processing unit to execute the real-time task after the reset is completed. In some embodiments, if the remaining time is less than a preset threshold, the task control unit causes the processing unit to execute the real-time task after the non-real-time sub-task is completed.
  • the task control unit may also store position information of the stopped non-real-time subtask in the first task queue.
  • In response to completion of the task, the task control unit may cause the processing unit to resume executing the at least one non-real-time subtask in the first task queue based on the position information.
  • FIG. 9 shows a schematic block diagram of an example device 900 that may be used to implement embodiments of the present disclosure.
  • the device 900 may be used to provide an example environment 100 as shown in FIG. 1 , eg, a robot application system.
  • The device 900 includes a central processing unit (CPU) 901, which can perform various appropriate actions and processes, for example control tasks for controlling the motion of the robot, according to computer program instructions stored in a read-only memory (ROM) 902 or loaded from a storage unit 908 into a random access memory (RAM) 903.
  • The RAM 903 can also store the various programs and data necessary for the operation of the device 900.
  • The device 900 also includes a graphics processing unit (GPU) and/or neural network processing unit (NPU) 911, which can perform parallel computing, such as AI reasoning tasks, according to computer program instructions stored in the ROM 902 or loaded from the storage unit 908 into the RAM 903.
  • the CPU 901, GPU/NPU 911, ROM 902, and RAM 903 are connected to each other through a bus 904.
  • An input/output (I/O) interface 905 is also connected to the bus 904 .
  • Multiple components of the device 900 are connected to the I/O interface 905, including: an input/output unit 906, such as a keyboard, mouse, motor, display, or speaker; a sensor 907, such as an acceleration sensor, gravity sensor, or camera; a storage unit 908, such as a magnetic disk or optical disc; and a communication unit 909, such as a network card, modem, or wireless communication transceiver.
  • the communication unit 909 allows the device 900 to exchange information/data with other devices over a computer network such as the Internet and/or various telecommunication networks.
  • The processes 300, 500, and 700 described above may be executed by the CPU 901 and/or the GPU/NPU 911.
  • processes 300 , 500 , 700 may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as storage unit 908 .
  • part or all of the computer program may be loaded and/or installed on the device 900 via the ROM 902 and/or the communication unit 909.
  • When a computer program is loaded into the RAM 903 and executed by the CPU 901 and/or the GPU/NPU 911, one or more actions of the processes 300, 500, and 700 described above may be performed.
  • the present disclosure may be a method, apparatus, system and/or computer program product.
  • a computer program product may include a computer-readable storage medium having computer-readable program instructions thereon for carrying out various aspects of the present disclosure.
  • a computer readable storage medium may be a tangible device that can retain and store instructions for use by an instruction execution device.
  • a computer readable storage medium may be, for example, but is not limited to, an electrical storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing.
  • A non-exhaustive list of computer-readable storage media includes: a portable computer diskette, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), static random access memory (SRAM), compact disc read-only memory (CD-ROM), digital versatile disc (DVD), memory stick, floppy disk, a mechanically encoded device such as a punch card or raised structures in a groove with instructions stored thereon, and any suitable combination of the foregoing.
  • computer-readable storage media are not to be construed as transient signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through waveguides or other transmission media (e.g., pulses of light through fiber optic cables), or transmitted electrical signals.
  • Computer-readable program instructions described herein may be downloaded from a computer-readable storage medium to a respective computing/processing device, or downloaded to an external computer or external storage device over a network, such as the Internet, a local area network, a wide area network, and/or a wireless network.
  • the network may include copper transmission cables, fiber optic transmission, wireless transmission, routers, firewalls, switches, gateway computers, and/or edge servers.
  • a network adapter card or a network interface in each computing/processing device receives computer-readable program instructions from the network and forwards the computer-readable program instructions for storage in a computer-readable storage medium in each computing/processing device .
  • Computer program instructions for performing the operations of the present disclosure may be assembly instructions, instruction set architecture (ISA) instructions, machine instructions, machine-dependent instructions, microcode, firmware instructions, state-setting data, or source or object code written in any combination of one or more programming languages, including object-oriented programming languages such as Smalltalk and C++, and conventional procedural programming languages such as the "C" language or similar programming languages.
  • Computer readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server implement.
  • the remote computer can be connected to the user computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or it can be connected to an external computer (such as via the Internet using an Internet service provider). connect).
  • An electronic circuit, such as a programmable logic circuit, a field-programmable gate array (FPGA), or a programmable logic array (PLA), may be personalized by utilizing state information of the computer-readable program instructions, and the electronic circuit may execute the computer-readable program instructions so as to implement various aspects of the present disclosure.
  • These computer-readable program instructions may be provided to a processing unit of a general-purpose computer, a special-purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, when executed by the processing unit of the computer or other programmable data processing apparatus, create an apparatus for implementing the functions/actions specified in one or more blocks of the flowcharts and/or block diagrams.
  • These computer-readable program instructions may also be stored in a computer-readable storage medium, and these instructions cause a computer, a programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer-readable medium having the instructions stored thereon comprises an article of manufacture including instructions which implement various aspects of the functions/actions specified in one or more blocks of the flowcharts and/or block diagrams.
  • Each block in a flowchart or block diagram may represent a module, a program segment, or a portion of instructions, which contains one or more executable instructions for implementing the specified logical functions.
  • The functions noted in the blocks may occur out of the order noted in the figures. For example, two successive blocks may, in fact, be executed substantially concurrently, or they may sometimes be executed in the reverse order, depending upon the functionality involved.
  • Each block of the block diagrams and/or flowchart illustrations, and combinations of blocks in the block diagrams and/or flowchart illustrations, can be implemented by a dedicated hardware-based system that performs the specified functions or actions, or by a combination of dedicated hardware and computer instructions.

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Numerical Control (AREA)

Abstract

The present disclosure provides a method, an apparatus, an electronic device, and a medium for processing a task. The method includes determining, based on a configuration file of a task to be executed, a real-time requirement of the task and a computing resource for executing the task. The method further includes, if the real-time requirement indicates that the task is a real-time task, causing the computing resource to execute the task. Through the embodiments of the present disclosure, tasks with real-time processing requirements, such as conventional robot computing tasks and AI inference tasks, can be processed more quickly and efficiently, thereby optimizing the scheduling of various tasks and improving system processing efficiency.

Description

Method, Apparatus, Electronic Device, and Medium for Processing Tasks
Technical Field
Embodiments of the present disclosure relate generally to the field of computer technology, and in particular to artificial intelligence (AI) technology. More specifically, embodiments of the present disclosure relate to a method, an apparatus, an electronic device, a computer-readable storage medium, and a computer program product for processing tasks.
Background
In recent years, with the development and maturation of AI technology, AI has achieved excellent results in fields such as image analysis (e.g., face recognition, text recognition), natural language processing, and speech recognition. In the traditional robotics field, researchers have also begun to actively explore combining AI technology with motion control technology, so that robots can accomplish more open-ended tasks rather than being limited to traditional automation tasks.
Such robots are also referred to as AI robots. An AI robot collects data about its surroundings through sensors and then uses AI technology to recognize objects in the environment. For example, an industrial robot may use a camera and AI technology to recognize and sort items on a conveyor belt, and a service robot may recognize whether obstacles exist in the surrounding environment and take corresponding measures, such as stopping or avoiding the obstacle. In this case, a robot application system contains not only traditional body control tasks (e.g., controlling motor motion) but also AI perception or inference tasks (e.g., recognizing or detecting objects in images). Since a robot's computing resources are limited and its tasks have certain real-time requirements, this poses challenges for researchers.
Summary
Embodiments of the present disclosure provide a solution for processing real-time tasks in a robot system.
According to a first aspect of the present disclosure, a method for processing a task is provided. The method includes determining, based on a configuration file of a task to be executed, a real-time requirement of the task and a computing resource for executing the task, and, if the real-time requirement indicates that the task is a real-time task, causing the computing resource to execute the task.
In some embodiments, the computing resource includes at least one processing unit having a first thread for executing non-real-time tasks and a second thread for executing real-time tasks. The at least one processing unit executes the task using the second thread. In some embodiments, if it is determined that the at least one processing unit is executing a non-real-time task using the first thread, a signal to stop executing the non-real-time task is sent to the at least one processing unit.
In some embodiments, the at least one processing unit includes a plurality of processing units having corresponding third threads, and causing the computing resource to execute the task includes: generating a plurality of parallel subtasks from the task; causing the plurality of processing units to execute the plurality of parallel subtasks using the third threads; and determining a merged processing result based on results of the plurality of processing units executing the plurality of parallel subtasks.
In some embodiments, the at least one processing unit is a CPU core, and the task is a control task for controlling motion of a robot.
In some embodiments, the computing resource includes a processing unit having a first task queue, the first task queue including at least one non-real-time subtask of a non-real-time task, and causing the computing resource to execute the task includes: causing the processing unit to stop executing the at least one non-real-time subtask in the first task queue.
In some embodiments, the processing unit further has a second task queue, and causing the computing resource to execute the task includes: decomposing the task into a plurality of real-time subtasks; adding the plurality of real-time subtasks to the second task queue of the processing unit; and causing the processing unit to execute the plurality of real-time subtasks in the second task queue.
In some embodiments, causing the processing unit to stop executing the at least one non-real-time subtask in the first task queue includes: determining a remaining time required for the processing unit to complete the non-real-time subtask being executed, and, if the remaining time exceeds a preset threshold, causing the processing unit to be reset. After the reset is complete, the processing unit executes the task.
In some embodiments, if the remaining time is less than the preset threshold, the processing unit is caused to execute the task after the non-real-time subtask is completed.
In some embodiments, the method further includes storing position information of the stopped non-real-time subtask in the first task queue, and, in response to completion of execution of the task, causing the processing unit to resume executing the at least one non-real-time subtask in the first task queue based on the position information.
In some embodiments, the processing unit may be a neural network processing unit or a graphics processing unit, and the task may be an artificial intelligence (AI) inference task.
In some embodiments, the configuration file includes real-time requirement information of the task, task type information, and information about the computing resource for executing the task.
According to a second aspect of the present disclosure, an apparatus for processing a task is provided. The apparatus includes a task configuration determination unit configured to determine, based on a configuration file of a task to be executed, a real-time requirement of the task and a computing resource for executing the task. The apparatus further includes a task control unit configured to, if the real-time requirement indicates that the task is a real-time task, cause the computing resource to execute the task.
According to a third aspect of the present disclosure, an electronic device is provided, including a processing unit and a memory, the processing unit executing instructions in the memory to cause the electronic device to perform the method according to the first aspect of the present disclosure.
According to a fourth aspect of the present disclosure, a computer-readable storage medium is provided, having one or more computer instructions stored thereon, the one or more computer instructions being executed by a processor to cause the processor to perform the method according to the first aspect of the present disclosure.
According to a fifth aspect of the present disclosure, a computer program product is provided, including machine-executable instructions that, when executed by a device, cause the device to perform the method according to the first aspect of the present disclosure.
Brief Description of the Drawings
The above and other features, advantages, and aspects of embodiments of the present disclosure will become more apparent from the following detailed description taken in conjunction with the accompanying drawings. In the drawings, identical or similar reference numerals denote identical or similar elements, in which:
FIG. 1 shows a schematic diagram of an example environment in which various embodiments of the present disclosure can be implemented;
FIG. 2 shows a schematic block diagram of a system architecture according to an embodiment of the present disclosure;
FIG. 3 shows a schematic flowchart of a process for processing a task according to an embodiment of the present disclosure;
FIG. 4 shows a schematic diagram of a scheme for switching computing resources for a real-time CPU task according to an embodiment of the present disclosure;
FIG. 5 shows a schematic flowchart of a process for switching computing resources for a real-time CPU task according to an embodiment of the present disclosure;
FIG. 6 shows a schematic diagram of a scheme for switching computing resources for a real-time AI inference task according to an embodiment of the present disclosure;
FIG. 7 shows a schematic flowchart of a process for switching computing resources for a real-time AI inference task according to an embodiment of the present disclosure;
FIG. 8 shows a schematic block diagram of a task processing apparatus according to an embodiment of the present disclosure;
FIG. 9 shows a schematic block diagram of an example device that can be used to implement embodiments of the present disclosure.
Detailed Description
Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. Although certain embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure can be implemented in various forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided for a more thorough and complete understanding of the present disclosure. It should be understood that the drawings and embodiments of the present disclosure are for illustrative purposes only and are not intended to limit the scope of protection of the present disclosure.
In the description of embodiments of the present disclosure, the term "include" and similar terms should be understood as open-ended inclusion, i.e., "including but not limited to". The term "based on" should be understood as "based at least in part on". The term "one embodiment" or "the embodiment" should be understood as "at least one embodiment". The terms "first", "second", and the like may refer to different or identical objects. Other explicit and implicit definitions may also be included below.
Motion control tasks of a robot body have certain real-time requirements; for example, an industrial robotic arm needs to send motion control instructions at a fixed frequency. Real-time here means that the execution time of a task satisfies certain jitter requirements; exceeding this range will affect the robot's actuators, causing, for example, unsmooth or jerky motion. When an AI inference task is associated with a motion execution task, for example, when the motion execution task needs to be determined based on the processing result of the AI inference task, the AI inference task inevitably also has certain real-time requirements.
Traditionally, when implementing a real-time task, a user needs to call, in the implementation code of the task, the real-time process setting interface provided by the operating system kernel so that its execution environment becomes the context of a real-time process; real-time task scheduling and computing resource allocation are then handled by the operating system kernel. If the task involves multi-core parallel computing and AI inference, the user also needs to add, in the task's real-time code, explicit calls to a parallel computing library (e.g., the OpenMP library) and the SDK interface of the AI model framework. This is inconvenient for the user. In addition, task scheduling and resource allocation are left entirely to the operating system kernel, which may assign a real-time task to a heavily loaded computing resource, making the scheduling time uncertain and affecting the execution time of the real-time task.
In view of this, embodiments of the present disclosure use a configuration file to provide information such as a task's real-time requirement and the computing resources it needs, thereby providing the required computing resources for real-time tasks in a more efficient and accurate manner. In this way, tasks with real-time processing requirements, such as conventional robot computing tasks and AI inference tasks, can be processed more quickly and efficiently, thereby optimizing the scheduling of various tasks and improving system processing efficiency.
Embodiments according to the present disclosure will be further described below with reference to the accompanying drawings.
Example Environment
FIG. 1 shows a schematic diagram of an example environment 100 in which various embodiments of the present disclosure can be implemented. As shown in FIG. 1, the environment 100 is a typical AI robot system architecture. The environment 100 includes an application 110, a robot development framework 120, and an operating system 130 as the software part of the robot system, and a robot system-on-chip (SoC) 140 and a memory 150 as the hardware part.
The application 110 is developed by a user and implements various functions and tasks associated with a specific application scenario. For example, for an industrial sorting robot, the application 110 may implement motion control tasks of the robot such as rotating, grasping, and translating, and may also implement AI tasks such as recognizing objects in conveyor-belt images captured by a camera on the robot. For a service robot, the application 110 may implement motion control tasks of the robot such as moving forward, moving backward, braking, and steering, and may also implement AI tasks such as recognizing received voice information.
Depending on the application scenario, tasks may have corresponding type information and real-time requirements. As shown, the application 110 includes a non-real-time CPU task 112, a real-time CPU task 114, a non-real-time AI task 116, and a real-time AI task 118. Herein, motion control tasks of the robot may be handed to central processing unit (CPU) cores 142 and 144 in the SoC chip 140 for execution; accordingly, such tasks are also referred to as CPU tasks. Inference tasks of the robot (e.g., image recognition and detection) may be handed to AI processing units 146 and 148 in the SoC chip for execution; accordingly, such tasks are also referred to as AI tasks. FIG. 1 schematically shows tasks 112, 114, 116, and 118 with different real-time requirements and types; it should be understood that the number of tasks is not limited here. The application 110 may include more or fewer tasks, and the number of tasks of each kind may be arbitrary.
The user can implement the various tasks 112, 114, 116, and 118 of the application 110 based on the robot development framework 120. For example, the robot development framework 120 may be the Robot Operating System (ROS). ROS™ is an open-source meta operating system for robots that provides the services expected of an operating system, including hardware abstraction, low-level device control, implementations of commonly used functions, inter-process message passing, and package management. It also provides the tools and library functions needed to obtain, compile, write, and run code across computers. The user can create robot task nodes (Nodes) through the application programming interface (API) provided by ROS, with the inter-node message communication mechanism implemented by the ROS framework. The user only needs to call the API to implement the internal logic of a specific task. It should be understood that the robot development framework 120 may also be a development framework other than ROS™; the present disclosure is not limited in this respect.
The operating system 130 provides an interface between the application 110 and the robot development framework 120 on the one hand and the hardware environment, i.e., the SoC 140 and the memory 150, on the other. The operating system 130 may be, for example, open-source Linux™ or any other commercial operating system.
The SoC chip 140 integrates several CPU cores 142 and 144 as well as neural network processing units (NPUs) 146 and 148. The CPU cores 142 and 144 have, for example, caches, control units, and arithmetic units (not shown in the figure), generally execute code or instructions in a sequential manner, and are suitable for executing relatively complex logic control. Each CPU core 142 or 144 may have one or more threads and executes threads in a time-division multiplexed manner.
The AI processing units 146 and 148 adopt a parallel computing structure and are more suitable for running AI models to process data such as video and images. An AI processing unit may be, for example, a neural network processing unit (NPU). In some embodiments, an NPU may include multiple (e.g., 16, 32, 64, or more) parallel multiply-accumulate modules, activation function modules, and the like. The multiply-accumulate modules compute matrix multiply-accumulate, convolution, dot product, and so on. The activation function modules implement the activation functions of neural networks by means of, for example, parameter fitting. The AI processing units 146 and 148 may also be graphics processing units (GPUs) or other devices with a parallel computing structure, such as field-programmable gate arrays (FPGAs) and application-specific integrated circuits (ASICs).
Considering the performance of the CPU cores 142 and 144 and the AI processing units 146 and 148, the tasks 112 and 114 in the robot application 110 may be executed by the CPU cores 142 and 144, while the AI tasks may be executed by the AI processing units 146 and 148. FIG. 1 shows two CPU cores 142 and 144 and two AI processing units 146 and 148 of the SoC chip 140; it should be understood that the present disclosure does not limit the number of CPU cores and AI processing units on the SoC. The SoC chip 140 may include more or fewer CPU cores and AI processing units.
In some embodiments according to the present disclosure, the memory 150 may also be referred to as internal memory or main memory. The memory 150 may be any existing or future-developed type of memory, such as DDR memory. The memory 150 stores the executable code and data needed to run the application 110, the robot development framework 120, and the operating system 130, for example, data of threads to be executed, image data acquired by sensors, and AI models for inference tasks, so that they can be accessed and executed by the CPU cores and the NPUs.
It should be understood that embodiments of the present disclosure may also be implemented in environments different from that shown in FIG. 1. For example, in a robot application system, the CPU cores and the AI processing units are not necessarily integrated on the same SoC chip; they may also be implemented on different SoC chips or devices.
System Architecture and Flow
FIG. 2 shows a schematic block diagram of a system architecture 200 according to an embodiment of the present disclosure. In general, the system architecture 200 includes an application 210 and a robot development framework 220. The application 210 may be implemented as the application 110 shown in FIG. 1, and the robot development framework 220 may be implemented as the robot development framework 120 shown in FIG. 1.
The application 210 includes two parts: a configuration file 212 for a task and task implementation logic code 214. These are written and implemented by the user. The configuration file 212 may be defined in an editable, readable format, for example, JSON, YAML, or XML. In some embodiments, the configuration file 212 defines the task name, the computing resources required when the task runs, and whether the task is a real-time task. Table 1 below gives an exemplary configuration file.
Table 1
[Table 1 is reproduced as an image in the original publication (Figure PCTCN2022120604-appb-000001); it shows the exemplary configuration file described below.]
This exemplary configuration file indicates that the task named "my task" is a real-time CPU task and specifies that, when treated as a CPU task, it will be executed using CPU core 0 and CPU core 1, and, when treated as an AI task, it will be executed using AI processing unit 0. The availability of the required computing resources depends on the field "type". That is, when type is "0", the corresponding CPU resources are used to execute the task, and when type is "1", the corresponding NPU resources are used to execute the task. It should be noted that if the required CPU resources are specified as multiple CPU cores, this indicates that the task can be processed in parallel by multiple CPU cores when the conditions for parallel execution are met. A hedged reconstruction of such a configuration file is sketched below.
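For illustration only, the configuration file described above might look like the following JSON sketch. Table 1 is only available as an image in the source, so the field names used here ("name", "real", "type", "cpu", "npu") are assumptions; only the semantics (the task name "my task", the real-time flag, the type field with 0 for CPU tasks and 1 for AI tasks, CPU cores 0 and 1, and AI processing unit 0) are taken from the surrounding text.

    {
      "name": "my task",
      "real": true,
      "type": 0,
      "cpu": [0, 1],
      "npu": [0]
    }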
It should be understood that the above configuration file is merely exemplary and not limiting. According to embodiments of the present disclosure, the configuration file 212 is customizable; for example, it may include more, fewer, or different fields. For instance, the configuration file 212 may include only one of the CPU resource field and the AI computing resource field. Thus, by writing the configuration file 212 of a task, the user can easily specify information such as the type of the task, its real-time requirement, and the computing resources for executing the task.
The task implementation logic code 214 includes the specific task logic code implemented by the user. The task implementation logic code 214 is implemented based on the abstract-class programming interface 225 provided by the robot development framework 220. For example, the task implementation logic code 214 may call or inherit tools or library functions provided by the robot development framework 220, and may rely on the development framework 220 to implement communication between tasks.
According to embodiments of the present disclosure, the user only needs to implement the task logic code, without manually creating the threads required to execute the task, and only needs to compose the configuration file to allocate computing resources to the task, without writing any code for it, thus saving development time. The corresponding work is carried out by the robot development framework 220 according to embodiments of the present disclosure.
The robot development framework 220 includes a configuration parameter management module 221, a thread pool management module 222, a task scheduling management module 223, an NPU operator scheduling management module 224, and an abstract-class programming interface 225.
The configuration parameter management module 221 defines a description file for describing the resources required for task execution, i.e., the configuration file 212 described above. The configuration file 212 written by the user can be parsed by the configuration parameter management module 221 to obtain information related to task execution. The configuration parameter management module 221 uses this information to facilitate the scheduling of computing resources for tasks. For example, for a real-time task whose "real" field is TRUE, the corresponding CPU core or AI processing unit resources need to serve it as soon as possible, and it must be ensured that the task can occupy these resources exclusively, so as to meet the real-time requirement.
The thread pool management module 222 generates and binds reserved threads for the CPU cores on the SoC chip. The reserved threads may include a first thread for non-real-time tasks (also referred to as a non-real-time thread), a second thread for real-time tasks (also referred to as a real-time thread), and a third thread for parallel subtasks (also referred to as a parallel subtask thread). When the robot starts up, these threads can be generated and bound to the corresponding CPU cores. That is, one CPU core has three reserved threads for executing different kinds of tasks respectively. The thread pool management module 222 allocates, in the memory 150, storage space for the reserved threads, used to store the data of future tasks to be executed. With threads reserved in the memory 150 for a CPU core, once the CPU core receives a task, it can quickly switch to the corresponding task thread, meeting real-time requirements and improving task execution efficiency. A sketch of such thread reservation and binding is given below.
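As a minimal sketch only (the disclosure gives no code for the thread pool management module 222, so the names ReserveThreadsForCore and WorkerLoop are hypothetical), generating three reserved threads for a CPU core and pinning them to that core on a Linux™ system might look as follows:

    #include <pthread.h>
    #include <sched.h>
    #include <unistd.h>
    #include <vector>

    // The disclosure names three reserved threads per core: non-real-time,
    // real-time, and parallel-subtask.
    enum class ThreadRole { NonRealTime, RealTime, ParallelSubtask };

    struct ReservedThread {
        pthread_t handle;
        ThreadRole role;
        int cpu_core;
    };

    // Placeholder worker: a reserved thread idles until a Task object is
    // placed in its execution queue (the queueing machinery is omitted here).
    static void* WorkerLoop(void*) {
        for (;;) pause();
        return nullptr;
    }

    // Create one reserved thread per role for the given core and pin it to
    // that core, so that a dispatched task never migrates to a loaded core.
    // Note: pthread_setaffinity_np is a GNU/Linux extension.
    std::vector<ReservedThread> ReserveThreadsForCore(int core) {
        std::vector<ReservedThread> threads;
        for (ThreadRole role : {ThreadRole::NonRealTime, ThreadRole::RealTime,
                                ThreadRole::ParallelSubtask}) {
            ReservedThread t;
            t.role = role;
            t.cpu_core = core;
            pthread_create(&t.handle, nullptr, WorkerLoop, nullptr);
            cpu_set_t mask;
            CPU_ZERO(&mask);
            CPU_SET(core, &mask);
            pthread_setaffinity_np(t.handle, sizeof(mask), &mask);
            threads.push_back(t);
        }
        return threads;
    }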
The task scheduling management module 223 completes the allocation of a task to the underlying thread resources when the task is scheduled. The task scheduling management module 223 may write the data of the task to be executed into the storage space of the corresponding reserved thread, and then send a switching signal to the CPU core to make the CPU core switch to the corresponding reserved thread. It should be understood that the threads of a CPU core are time-division multiplexed, and a CPU core can execute only one of the threads at a time. For example, when the real-time CPU task 114 is triggered and assigned to the CPU cores specified in the configuration file, the task scheduling management module 223 uses the context switch mechanism to make the CPU core switch from, for example, the non-real-time thread that is executing the non-real-time CPU task 112 to the real-time thread.
If a non-real-time task is switched out during execution to yield the CPU core to a real-time task, the context of the non-real-time task's execution needs to be saved. The context includes the register data in the CPU core, such as the program counter. It is known that CPU cores provide a hardware context switch mechanism. With the hardware context switch mechanism, the context of the interrupted thread is saved and used for subsequent fast recovery of the interrupted thread. The scheduling mechanism for real-time CPU tasks will be described in detail below with reference to FIG. 4 and FIG. 5.
The operator scheduling management module 224 schedules AI tasks for execution by the NPU. Generally, an AI model (e.g., a neural network model) associated with an AI inference task has a layered computing structure, and multiple nodes within each layer perform parallel computations, such as multiply-accumulate operations and activation function operations. A typical neural network model, for example, a convolutional neural network (CNN), may have dozens or even hundreds of layers. According to embodiments of the present disclosure, the AI model can be decomposed into several operators by reference to the layer structure. An operator may include the parallel computation of one or more layers. Operators can be provided to AI processing units suitable for parallel computing for execution. The AI processing unit completes the AI task by executing the operators of the AI model in sequence. In some embodiments, the AI processing unit may alternately execute operators from multiple AI inference tasks. A sketch of this decomposition is given below.
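As a hedged sketch of this decomposition (the disclosure does not specify the grouping policy or any data structure, so the Operator struct and the even layer split below are assumptions for illustration):

    #include <algorithm>
    #include <functional>
    #include <string>
    #include <vector>

    // Hypothetical representation of an operator: the parallel computation
    // of one or more consecutive model layers, executed as a unit on the NPU.
    struct Operator {
        std::string model_name;        // which AI model this operator belongs to
        int first_layer;               // index of the first layer it covers
        int last_layer;                // index of the last layer it covers
        std::function<void()> kernel;  // the fused layer computation itself
    };

    // Decompose a model with num_layers layers into operators covering at
    // most layers_per_op layers each; an even split is assumed here.
    std::vector<Operator> Decompose(const std::string& model, int num_layers,
                                    int layers_per_op) {
        std::vector<Operator> ops;
        for (int first = 0; first < num_layers; first += layers_per_op) {
            int last = std::min(first + layers_per_op, num_layers) - 1;
            ops.push_back({model, first, last, [] { /* run fused layers */ }});
        }
        return ops;
    }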
As described above, AI tasks include the non-real-time AI task 116 and the real-time AI task 118. To enable the real-time AI task 118 to be processed quickly enough to meet its real-time requirement, the NPU operator scheduling management module 224 provides a mechanism for scheduling real-time AI tasks by resetting the NPU. This will be described in detail below with reference to FIG. 6 and FIG. 7.
The abstract-class programming interface 225 provides the user with task templates for the task implementation logic code 214. For example, the abstract-class programming interface 225 uses the inheritance mechanism provided by object-oriented programming languages to define the task implementation functions that the user must override. Merely as an example, a task in the robot system can be divided into three parts: preparation before task execution, the task execution body, and processing after task execution. These three parts can therefore be abstracted into three functions to be implemented by the user. Tasks are ultimately encapsulated as Task objects. A Task object can be loaded and executed by a CPU core or an AI processing unit. Table 2 below gives an implementation example in the C++ language.
Table 2
[Table 2 is reproduced as images in the original publication (Figures PCTCN2022120604-appb-000002 and PCTCN2022120604-appb-000003); it shows the C++ implementation example described above.]
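Because Table 2 survives only as images, the following is a hedged reconstruction of what such an abstract task template could look like. The class and method names (Task, OnPrepare, OnExecute, OnPost) are assumptions; only the three-part structure (preparation before execution, the execution body, and post-processing) is taken from the text.

    // Hypothetical abstract task template; the user inherits from Task and
    // overrides the three phases described in the text.
    class Task {
    public:
        virtual ~Task() = default;

        // Called once by the framework when the task is scheduled.
        void Run() {
            OnPrepare();   // preparation before task execution
            OnExecute();   // the task execution body
            OnPost();      // processing after task execution
        }

    protected:
        virtual void OnPrepare() = 0;
        virtual void OnExecute() = 0;
        virtual void OnPost() = 0;
    };

    // Example user task: a motion control task overriding the three phases.
    class MyMotionTask : public Task {
    protected:
        void OnPrepare() override { /* e.g., read sensor data */ }
        void OnExecute() override { /* e.g., compute and send motor commands */ }
        void OnPost() override    { /* e.g., release buffers, publish status */ }
    };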
According to embodiments of the present disclosure, part or all of some CPU tasks can be divided into multiple parallel subtasks. For example, a CPU task may include an image enhancement operation that scales the pixel value of every pixel in an image. Since each pixel value can be treated as independent, the task can be divided into multiple independently executed subtasks. According to embodiments of the present disclosure, an independently executed subtask is also referred to as a TaskLet. TaskLets are assigned to the CPU cores specified in the configuration file 212. For this case, embodiments of the present disclosure also provide an API for running TaskLets. An implementation example is given below:
LaunchTaskLet(InputVec, OutputVec, TaskLetFunc), where InputVec and OutputVec are the input data and output data, and their first dimension is the task partition dimension; that is, the input data is divided into multiple sub-inputs along the first dimension, and the output data is divided into multiple sub-outputs. TaskLetFunc is the TaskLet execution function, implemented by the user. For example, a TaskLet may include a function that scales each pixel. Through this API, a larger task can be divided into multiple independent TaskLets, thereby encapsulating multiple parallel subtasks. It should be understood that these independent subtasks will be executed respectively by the CPU cores specified in the configuration file 212. LaunchTaskLet can therefore automatically generate TaskLets according to the number of CPU cores specified in the configuration file. The generated parallel subtasks can then be scheduled to the corresponding CPU cores described in the configuration file 212 for execution. For a real-time task, this can reduce execution time to meet its real-time requirement. In some embodiments, a CPU core may use the reserved third thread to execute parallel subtasks. Executing parallel subtasks through reserved dedicated threads can further reduce the execution time of real-time tasks. A sketch of how LaunchTaskLet might be used is given below.
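As an illustration of the API just described (the exact signature of LaunchTaskLet beyond its three parameters is not given in the text, so the template form and the reference semantics below are assumptions), a pixel-scaling task might be partitioned as follows. The real framework would dispatch the slices to the reserved parallel-subtask threads of the CPU cores listed in the configuration file 212; plain std::thread workers stand in for them here, with the slice count hard-coded to 2 to match CPU cores 0 and 1 in the example configuration:

    #include <algorithm>
    #include <cstdint>
    #include <thread>
    #include <vector>

    template <typename In, typename Out, typename Func>
    void LaunchTaskLet(std::vector<In>& input, std::vector<Out>& output, Func func) {
        const size_t num_cores = 2;  // assumed: taken from the configuration file
        const size_t chunk = (input.size() + num_cores - 1) / num_cores;
        std::vector<std::thread> workers;
        for (size_t c = 0; c < num_cores; ++c) {
            workers.emplace_back([&, c] {
                size_t end = std::min(input.size(), (c + 1) * chunk);
                for (size_t i = c * chunk; i < end; ++i) func(input[i], output[i]);
            });
        }
        for (auto& w : workers) w.join();  // wait for all TaskLets
    }

    // Usage: an image-enhancement TaskLet that scales every pixel value.
    // `result` must be pre-sized to pixels.size().
    void BrightenImage(std::vector<uint8_t>& pixels, std::vector<uint8_t>& result) {
        LaunchTaskLet(pixels, result, [](uint8_t in, uint8_t& out) {
            int v = static_cast<int>(in * 1.2);
            out = static_cast<uint8_t>(v > 255 ? 255 : v);
        });
    }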
By contrast, traditional multi-core parallel computing (e.g., the OpenMP library) can only specify the number of CPU cores to use, not which specific CPU cores, so parallel subtasks may be assigned by the operating system to heavily loaded CPU cores, affecting the execution time of real-time tasks. According to embodiments of the present disclosure, by specifying the computing resources in the configuration file 212, the computing resources of a real-time task can be determined in advance, thereby avoiding this problem.
The exemplary system architecture 200 according to embodiments of the present disclosure has been described above. It should be understood that embodiments of the present disclosure may also include architectures different from this one; for example, any module in the system architecture 200 may be divided into more modules, and two or more modules may be combined into a single module. The present disclosure is not limited in this respect.
FIG. 3 shows a schematic flowchart of a process 300 for processing a task according to an embodiment of the present disclosure. The process 300 may be implemented in, for example, the robot development framework 120 of FIG. 1 and the robot development framework 220 of FIG. 2.
At block 310, based on the configuration file of a task to be executed, the real-time requirement of the task and the computing resource for executing the task are determined. In response to the triggering of a task, for example, a user input or the detection of a specific event, the application 110 or 210 generates a task to be executed. Taking a service robot as an example, in response to a change in gravitational acceleration collected by an accelerometer (the robot may be falling), the service robot may generate a motion task for controlling the robot to keep its balance. As another example, in response to an image of the surrounding environment captured by an image sensor, the service robot may generate a recognition task for recognizing a target in the image. As described above, the task implementation logic code 214 has already been implemented by the user through the abstract programming interface 225. In addition, the user has also written the configuration file 212 corresponding to the task to specify the corresponding computing resources.
The configuration file 212 may include real-time requirement information of the task, for example, whether the task is a real-time task or a non-real-time task. The configuration file 212 may also include task type information of the task, for example, whether the task is a control task for controlling the motion of a robot or an AI inference task. A control task for controlling robot motion can be assigned to one or more CPU cores for execution and may therefore also be called a CPU task. An AI inference task can be assigned to one or more NPUs or GPUs for execution and may therefore also be called an AI task. As described above, the computing resource for executing the task is specified in the configuration file 212.
At block 320, if the real-time requirement indicates that the task is a real-time task, the computing resource of the task is caused to execute the task. As an example, if the task type information of the configuration file 212 indicates that the task to be executed is a CPU task and the computing resource information indicates CPU core 0 and CPU core 1, the CPU task will be handed to, for example, the CPU core 142 and the CPU core 144 in the SoC chip 140 for execution. If the task type information of the configuration file 212 indicates that the task to be executed is an AI task and the computing resource information indicates NPU 0, the AI task will be handed to, for example, the AI processing unit 146 in the SoC chip 140 for execution. A dispatch sketch along these lines follows.
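The dispatch decision of blocks 310 and 320 can be summarized in a few lines. The structure below is a hypothetical sketch (the field layout follows the earlier assumed configuration sketch), not the framework's actual code:

    #include <vector>

    // Parsed view of a configuration file 212 (hypothetical field layout).
    struct TaskConfig {
        bool real;             // real-time requirement
        int type;              // 0 = CPU task, 1 = AI (NPU) task
        std::vector<int> cpu;  // CPU cores to use when type == 0
        std::vector<int> npu;  // AI processing units to use when type == 1
    };

    void Dispatch(const TaskConfig& cfg /*, Task& task */) {
        if (!cfg.real) {
            // Non-real-time tasks take the ordinary scheduling path (not shown).
            return;
        }
        if (cfg.type == 0) {
            // Real-time CPU task: pause the listed cores' non-real-time threads
            // and run on their reserved real-time threads (see FIGS. 4 and 5).
        } else {
            // Real-time AI task: enqueue its operators on the listed AI
            // processing units' real-time queue, resetting the unit if
            // necessary (see FIGS. 6 and 7).
        }
    }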
The above describes a solution for encapsulating the tasks of a robot application behind an abstract interface and managing the configuration of the resources the tasks need. Embodiments of the present disclosure use a configuration file to provide information such as a task's real-time requirement and required computing resources, thereby providing the required computing resources for real-time tasks in a more efficient and accurate manner. In this way, tasks with real-time processing requirements, such as conventional robot computing tasks and AI inference tasks, can be processed more quickly and efficiently, thereby optimizing the scheduling of various tasks and improving system processing efficiency.
Given that robot application systems contain both real-time CPU tasks and real-time AI inference tasks, embodiments of the present disclosure provide corresponding computing resource switching mechanisms to meet the execution-time jitter requirements of these two types of tasks.
Computing Resource Switching for Real-Time CPU Tasks
FIG. 4 shows a schematic diagram of an exemplary scheme 400 for switching computing resources for a real-time CPU task according to an embodiment of the present disclosure. The scheme 400 may be implemented in, for example, the robot development framework 120 of FIG. 1 and the robot development framework 220 of FIG. 2. The description here takes as an example a configuration file 212 that specifies CPU core 0 and CPU core 1 (e.g., corresponding to the CPU cores 142 and 144 of FIG. 1) as the computing resources of the real-time CPU task. It should be understood that the scheme 400 is equally applicable when the configuration file 212 specifies more or fewer CPU cores, or different CPU cores.
After the robot system starts up, three threads are generated for each CPU core of the SoC chip. For example, threads 401, 402, and 403 are generated for CPU core 0, threads 411, 412, and 413 for CPU core 1, threads 421, 422, and 423 for CPU core 2, and so on. After the threads of a CPU core are generated, they are kept in the memory of the robot system as reserved threads. The execution unit of these threads is the Task object abstractly encapsulated according to embodiments of the present disclosure, and the threads are used for different Task objects. For example, in CPU core 0, the thread 401 is used to execute non-real-time tasks, the thread 402 is used to execute real-time tasks, and the thread 403 is used to execute parallel subtasks. The threads 411, 412, and 413 of CPU core 1 are similar to those of CPU core 0.
When no task is assigned, a CPU core's threads may remain idle until a corresponding Task object is added to their execution queues.
As shown by the execution progress bars 406 and 416 at the bottom of FIG. 4, before the real-time task is triggered, CPU core 0 is executing a non-real-time CPU task using the non-real-time task thread 401, and CPU core 1 is also executing a non-real-time CPU task using the non-real-time task thread 411. In response to the real-time CPU task being triggered, CPU core 0 and CPU core 1 will switch computing resources to execute the real-time CPU task.
FIG. 5 shows a schematic flowchart of a process 500 for switching computing resources for a real-time CPU task according to an embodiment of the present disclosure. The process 500 may be implemented in, for example, the robot development framework 120 of FIG. 1 and the robot development framework 220 of FIG. 2. For ease of understanding, the process 500 is described in conjunction with FIG. 4.
At block 510, it is determined whether CPU cores 0 and 1 (e.g., corresponding to the CPU cores 142 and 144 of FIG. 1), which are associated with the real-time CPU task to be executed, are executing non-real-time tasks. In some embodiments, the states of the non-real-time task threads of CPU cores 0 and 1 can be checked to determine whether CPU cores 0 and 1 are executing non-real-time tasks. If either or both of CPU cores 0 and 1 are executing a non-real-time task, then at block 520 a signal to pause the non-real-time task (e.g., the SIG_STOP signal on a Linux™ operating system) is sent to the corresponding CPU core. SIG_STOP can invoke the hardware context switch mechanism for that CPU core so that the information of the paused non-real-time task (e.g., the CPU core's register data) is saved.
Then, after the non-real-time tasks have been paused, or once all of CPU cores 0 and 1 are ready, the method 500 proceeds to block 530. At block 530, the real-time CPU task is scheduled to a CPU core. For example, by adding a pointer or address of the Task object of the real-time CPU task to the execution queue of the real-time task thread 402 of CPU core 0, the real-time CPU task is scheduled so that CPU core 0 executes the task using the thread 402, as shown by the execution progress bar 407 and the corresponding arrow in FIG. 4.
Next, at block 540, it is determined whether parallel subtasks exist. In some embodiments, when execution reaches a function implemented (e.g., through inheritance) based on the abstract programming interface LaunchTaskLet described above, it is determined that there are multiple subtasks to be executed in parallel. Multiple parallel subtasks (TaskLets) can be generated from the real-time CPU task through this abstract programming interface.
Then, at block 550, the parallel subtasks are scheduled to the computing resources of the real-time CPU task. In some embodiments, the Task objects of the parallel TaskLets may be added to the parallel subtask threads 403 and 413 of CPU cores 0 and 1, so that CPU cores 0 and 1 execute the parallel subtasks using the threads 403 and 413, as shown by the execution progress bars 408 and 418 of FIG. 4.
At block 560, the execution results obtained by the respective CPU cores are combined to obtain a merged result. In this example, the real-time CPU task is specified by the configuration file 212 to be executed by two CPU cores. It should be understood that the specified computing resources of a real-time CPU task may include more or fewer CPU cores. That is, embodiments of the present disclosure place no limit on the scale of parallel subtasks.
At block 570, in response to completion of the real-time CPU task, a signal to resume the non-real-time tasks (e.g., the SIG_CONT signal on a Linux™ operating system) is sent to the CPU cores involved in the task. SIG_CONT can invoke the context switch mechanism for the CPU cores so that the previously paused non-real-time tasks are resumed using the saved execution information, as shown by the execution progress bars 409 and 419 of FIG. 4. A sketch of this pause/resume signaling is given below.
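For illustration only (the disclosure names the SIG_STOP and SIG_CONT signals but gives no code, so modeling the per-core non-real-time workers as processes addressed with the POSIX kill(2) call and the standard SIGSTOP/SIGCONT signals is an assumption), the pause/resume bracket around a real-time task might look like this:

    #include <signal.h>
    #include <sys/types.h>
    #include <vector>

    // Hypothetical handles to the non-real-time workers on each CPU core,
    // modeled as process IDs so that standard SIGSTOP/SIGCONT apply.
    struct CoreWorker { pid_t nonrt_pid; };

    void RunRealTimeTask(std::vector<CoreWorker>& cores /*, Task& task */) {
        // Block 520: pause the non-real-time work on every involved core;
        // the saved context allows fast resumption later.
        for (auto& c : cores) kill(c.nonrt_pid, SIGSTOP);

        // Blocks 530-560: execute the real-time task and its TaskLets on the
        // reserved real-time threads of these cores (omitted in this sketch).

        // Block 570: resume the previously paused non-real-time work.
        for (auto& c : cores) kill(c.nonrt_pid, SIGCONT);
    }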
Computing Resource Switching for Real-Time AI Inference Tasks
FIG. 6 shows a schematic diagram of a scheme 600 for switching computing resources for a real-time AI inference task according to an embodiment of the present disclosure. The scheme 600 may be implemented in, for example, the robot development framework 120 of FIG. 1 and the robot development framework 220 of FIG. 2. Merely as an example, FIG. 6 shows a scheme for switching computing resources for models 610 and 620 of non-real-time tasks and a model 630 of a real-time task. It should be understood that the computing resources of an AI processing unit can be switched for any number of non-real-time task models and real-time task models.
Here, the computing resource is an AI processing unit 650, for example, a parallel processing unit such as an NPU, a GPU, or an FPGA. Any of the models 610, 620, and 630 may be a trained neural network model, for example, a convolutional neural network model, a recurrent neural network model, or a graph neural network model; the present disclosure is not limited in this respect. A trained model can be used for AI inference tasks such as image recognition, object detection, and speech processing. As described above, according to the model structure related to an AI inference task, the AI inference task can be decomposed into several subtasks. Herein, a subtask obtained from an AI inference task or its corresponding model may also be referred to as an operator. An operator corresponds to the parallel computation of multiple nodes of one or more layers of the AI model. The operators can be executed serially to complete the AI inference task.
As shown in FIG. 6, the model 610 is decomposed into operators 1-1 to 1-4 and so on, arranged in sequence in an operator stream 611; the model 620 is decomposed into operators 2-1 to 2-4, arranged in sequence in an operator stream 621; and the model 630 is decomposed into operators 3-1 to 3-4, arranged in sequence in an operator stream 631. It should be understood that the number of operators obtained by decomposition is not limited to the number described with reference to FIG. 6 and may include more or fewer operators. An operator scheduler 640 can provide operators to the AI processing unit 650 for execution. Specifically, the operator scheduler 640 can add operators to the pending task queues of the AI processing unit 650.
According to embodiments of the present disclosure, the pending task queues of the AI processing unit 650 include a first task queue 651 and a second task queue 652. The first task queue 651 is for non-real-time AI tasks and includes the operators decomposed from the models 610 and 620 related to non-real-time tasks. The second task queue 652 is for real-time AI tasks and includes the operators decomposed from the model 630 related to the real-time task.
For fairness, the operator scheduler 640 may add the operators of the non-real-time task models 610 and 620 to the first task queue 651 in a round-robin manner. When multiple real-time task models exist, their operators may also be added to the second task queue 652 in a round-robin manner; alternatively, the operators of one real-time task model may all be added to the second task queue 652 before the operators of another real-time task model are added, which at least tries to satisfy the real-time requirement of the earlier real-time task.
The first task queue 651 and the second task queue 652 may be stored in the memory 150 in the form of circular queues. The operators themselves are also stored in the memory. Each element of the queues 651 and 652 may store a pointer or address pointing to an operator. The first task queue 651 and the second task queue 652 may have a preset depth, i.e., a maximum number of operators they can hold. The depth may be set according to the average number of operators in a model, for example, 10, 20, or another suitable value. When the operators in the queues 651 and 652 reach the maximum number, the operator scheduler 640 may stop fetching model operators from the corresponding operator streams 611, 621, and 631 until a slot becomes free. A sketch of such a circular operator queue follows.
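As a minimal sketch of such a circular queue (it reuses the hypothetical Operator type from the earlier decomposition sketch, and the default depth of 10 is just the example value from the text):

    #include <array>
    #include <cstddef>

    // Fixed-depth circular queue of operator pointers, as described for the
    // first task queue 651 and the second task queue 652.
    template <std::size_t Depth = 10>
    class OperatorQueue {
    public:
        bool Push(Operator* op) {               // called by the operator scheduler 640
            if (count_ == Depth) return false;  // full: the scheduler stops fetching
            slots_[tail_] = op;
            tail_ = (tail_ + 1) % Depth;
            ++count_;
            return true;
        }
        Operator* Pop() {                       // called by the AI processing unit 650
            if (count_ == 0) return nullptr;
            Operator* op = slots_[head_];
            head_ = (head_ + 1) % Depth;        // the indicator advances to the next operator
            --count_;
            return op;
        }
        bool Empty() const { return count_ == 0; }

    private:
        std::array<Operator*, Depth> slots_{};
        std::size_t head_ = 0, tail_ = 0, count_ = 0;
    };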
The AI processing unit 650 fetches operators from the first task queue 651 or the second task queue 652 for execution. In some embodiments, an indicator may be set for each queue, and the AI processing unit 650 fetches the corresponding operator according to the indicator. The indicator is then incremented to point to the next operator in the queue.
According to embodiments of the present disclosure, to meet the real-time requirement of real-time AI inference tasks, the AI processing unit 650 preferentially fetches pending operators from the second task queue 652. In other words, once operators of a real-time AI inference task have been added to the second task queue 652, the AI processing unit 650 needs to switch to the second task queue 652 and no longer execute the non-real-time task operators in the first task queue 651.
At this point, the operator scheduler 640 may determine, based on a policy, whether to issue a reset signal (Reset) to the AI processing unit 650. The reset NPU can then switch to executing the real-time task operators in the second task queue 652. The reset mechanism of the AI processing unit 650 includes a hardware reset circuit and a circuit for resource initialization after the reset. The resource initialization is embodied in the form of chip microcode. When the hardware reset signal is triggered, the AI processing unit 650 automatically executes this code, and the code executes very quickly. Thus, through the reset, the AI processing unit 650 can switch more quickly from executing the non-real-time operators in the first task queue 651 to executing the real-time operators in the second task queue 652, so as to meet the real-time requirement of the real-time AI inference task.
FIG. 7 shows a schematic flowchart of a process 700 for switching computing resources for a real-time AI inference task according to an embodiment of the present disclosure. The process 700 may be implemented in, for example, the robot development framework 120 of FIG. 1 and the robot development framework 220 of FIG. 2. For ease of understanding, the process 700 is described in conjunction with FIG. 6.
At block 702, the remaining time of the non-real-time task operator being executed by the AI processing unit 650 is computed. In some embodiments, the time required to execute each operator can be measured in advance and recorded in an operator information table. When the AI processing unit 650 starts executing an operator, the start time point is recorded. Then, if a real-time AI inference task is triggered while the operator is being executed, the elapsed execution time of the operator is obtained by subtracting the start time point from the time point at which the real-time AI inference task was triggered. The remaining time is then obtained by referring to the time required to execute the operator as recorded in the operator information table.
At block 704, it is determined whether the remaining time exceeds a preset threshold. In some embodiments, the preset threshold may be a parameter related to the hardware platform and is generally determined according to the time required to reset the NPU. For example, if the measured reset time on a given hardware platform is 1 ms, the threshold can be set to 1 ms. If the remaining time of the operator exceeds 1 ms, resetting the AI processing unit yields a larger time gain, and the real-time AI inference task can be executed earlier. A sketch of this remaining-time check is given below.
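The decision of blocks 702 and 704 amounts to simple arithmetic on pre-measured operator times. The sketch below assumes a hypothetical operator information table keyed by operator (again reusing the Operator type sketched earlier) and uses std::chrono for the time points:

    #include <chrono>
    #include <unordered_map>

    using Clock = std::chrono::steady_clock;
    using Micros = std::chrono::microseconds;

    // Hypothetical operator information table: pre-measured execution time
    // per operator, filled in ahead of time by offline testing.
    std::unordered_map<const Operator*, Micros> g_op_time_table;

    // Decide whether to reset the AI processing unit when a real-time AI task
    // arrives while `op` is running. `reset_threshold` corresponds to the
    // measured reset time of the platform (e.g., 1 ms in the text's example).
    bool ShouldReset(const Operator* op, Clock::time_point op_start,
                     Clock::time_point rt_trigger, Micros reset_threshold) {
        Micros elapsed = std::chrono::duration_cast<Micros>(rt_trigger - op_start);
        Micros remaining = g_op_time_table.at(op) - elapsed;  // block 702
        return remaining > reset_threshold;                   // block 704
    }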
If it is determined that the remaining time exceeds the preset threshold, then at block 706 the AI processing unit is reset. If it is determined that the remaining time is less than the preset threshold, then at block 710 the process waits for the non-real-time task operator to finish executing.
In the case where the NPU is reset, at block 708 the position of the non-real-time task operator in the first task queue is stored. It should be understood that, because the AI processing unit is reset, the non-real-time task operator that was being executed is interrupted and needs to be re-executed when execution of the first task queue 651 resumes.
At block 712, the process switches to the second task queue. In some embodiments, the operator scheduler 640 may stop fetching operators from the model corresponding to the non-real-time inference task and switch to fetching operators from the model corresponding to the real-time AI inference task, inserting them into the real-time operator execution queue 652.
At block 714, the real-time task operators in the second task queue are executed. In some embodiments, the AI processing unit 650 obtains, according to the indicator of the second task queue 652, the task data of the real-time task operator pointed to by the indicator and executes it.
At block 716, it is determined whether the real-time task operators in the second task queue have all been executed. If not, the process returns to block 714 and continues to execute the real-time task operators in the second task queue 652. That is, the AI processing unit 650 will keep executing the operators in the second task queue 652 until no real-time task operators remain in it.
If all the operators in the second task queue 652 have been executed, then at block 718 the process switches to the first task queue. In some embodiments, the operator scheduler 640 may resume fetching operators from the model corresponding to the non-real-time AI inference task and inserting them into the first task queue 651.
At block 720, it is determined whether a non-real-time task operator was interrupted. In some embodiments, whether there is an interrupted non-real-time task operator can be determined by checking whether the action of block 708 was performed. For example, if position information about the first task queue 651 was recorded, it indicates that the corresponding operator was interrupted.
If it is determined that a non-real-time task operator was interrupted, then at block 722 the interrupted non-real-time task operator is re-executed. Otherwise, at block 724, the next non-real-time task operator is executed. The AI processing unit 650 thereby resumes executing the non-real-time AI inference task.
Example Apparatus and Device
FIG. 8 shows a schematic block diagram of an apparatus 800 for processing a task according to an embodiment of the present disclosure. The apparatus 800 may be implemented in, for example, the robot development framework 120 of FIG. 1 and the robot development framework 220 of FIG. 2.
The apparatus 800 includes a task configuration determination unit 810. The task configuration determination unit 810 is configured to determine, based on a configuration file of a task to be executed, the real-time requirement of the task and the computing resource for executing the task. The apparatus 800 further includes a task control unit 820. The task control unit is configured to, if the real-time requirement indicates that the task is a real-time task, cause the computing resource to execute the task. In some embodiments, the configuration file includes real-time requirement information of the task, task type information, and information about the computing resource for executing the task.
In some embodiments, the computing resource includes at least one processing unit, for example a CPU core, and the task may be a control task for controlling the motion of a robot. The at least one processing unit has a first thread for executing non-real-time tasks and a second thread for executing real-time tasks. The task control unit may further be configured to cause the at least one processing unit to execute the task using the second thread.
In some embodiments, the task control unit may further be configured to, if it is determined that the at least one processing unit is executing a non-real-time task using the first thread, send to the at least one processing unit a signal to stop executing the non-real-time task.
In some embodiments, the at least one processing unit includes a plurality of processing units, for example multiple CPU cores, and the plurality of processing units have corresponding third threads. The task control unit may further be configured to generate a plurality of parallel subtasks from the task and to cause the plurality of processing units to execute the plurality of parallel subtasks using the third threads. The task control unit may then determine a merged processing result based on the results of the plurality of processing units executing the plurality of parallel subtasks.
In some embodiments, the computing resource may include a processing unit, for example a neural network processing unit or a graphics processing unit, and the task may be an artificial intelligence (AI) inference task. The processing unit may have a first task queue. The first task queue includes at least one non-real-time subtask of a non-real-time task. The task control unit may be configured to cause the processing unit to stop executing the at least one non-real-time subtask in the first task queue.
In some embodiments, the processing unit may further have a second task queue. The task control unit may further be configured to decompose the task into a plurality of real-time subtasks, add the plurality of real-time subtasks to the second task queue of the processing unit, and cause the processing unit to execute the plurality of real-time subtasks in the second task queue. A real-time subtask may be an operator of an AI model.
In some embodiments, the task control unit may further be configured to determine the remaining time required for the processing unit to complete the non-real-time subtask being executed. If the remaining time exceeds a preset threshold, the task control unit may cause the processing unit to be reset and cause the processing unit to execute the real-time task after the reset is complete. In some embodiments, if the remaining time is less than the preset threshold, the task control unit causes the processing unit to execute the real-time task after the non-real-time subtask is completed.
In some embodiments, the task control unit may also store position information of the stopped non-real-time subtask in the first task queue. Thus, in response to completion of execution of the task, the task control unit may cause the processing unit to resume executing the at least one non-real-time subtask in the first task queue based on the position information.
FIG. 9 shows a schematic block diagram of an example device 900 that can be used to implement embodiments of the present disclosure. The device 900 may be used to provide the example environment 100 shown in FIG. 1, for example, a robot application system. As shown, the device 900 includes a central processing unit (CPU) 901, which can perform various appropriate actions and processes, for example control tasks for controlling the motion of a robot, according to computer program instructions stored in a read-only memory (ROM) 902 or computer program instructions loaded from a storage unit 908 into a random access memory (RAM) 903. The RAM 903 can also store various programs and data needed for the operation of the device 900. The device 900 includes a graphics processing unit (GPU) and/or neural network processing unit (NPU) 911, which can perform parallel computations, for example AI inference tasks, according to computer program instructions stored in the read-only memory (ROM) 902 or computer program instructions loaded from the storage unit 908 into the random access memory (RAM) 903. The CPU 901, the GPU/NPU 911, the ROM 902, and the RAM 903 are connected to one another through a bus 904. An input/output (I/O) interface 905 is also connected to the bus 904.
Multiple components in the device 900 are connected to the I/O interface 905, including: an input/output unit 906, such as a keyboard, mouse, motor, display, or speaker; a sensor 907, such as an accelerometer, gravity sensor, or camera; a storage unit 908, such as a magnetic disk or optical disc; and a communication unit 909, such as a network card, modem, or wireless communication transceiver. The communication unit 909 allows the device 900 to exchange information/data with other devices through a computer network such as the Internet and/or various telecommunication networks.
The processes and processing described above, for example the processes 300, 500, and 700, are executed by the CPU 901 and/or the GPU/NPU 911. For example, in some embodiments, the processes 300, 500, and 700 may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as the storage unit 908. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 900 via the ROM 902 and/or the communication unit 909. When the computer program is loaded into the RAM 903 and executed by the CPU 901 and/or the GPU/NPU 911, one or more actions of the processes 300, 500, and 700 described above may be performed.
The present disclosure may be a method, an apparatus, a system, and/or a computer program product. The computer program product may include a computer-readable storage medium carrying computer-readable program instructions for carrying out various aspects of the present disclosure.
A computer-readable storage medium may be a tangible device that can retain and store instructions for use by an instruction execution device. The computer-readable storage medium may be, for example, but is not limited to, an electrical storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of computer-readable storage media include: a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a static random access memory (SRAM), a portable compact disc read-only memory (CD-ROM), a digital versatile disc (DVD), a memory stick, a floppy disk, a mechanically encoded device such as a punch card or a raised structure in a groove having instructions stored thereon, and any suitable combination of the foregoing. A computer-readable storage medium, as used herein, is not to be construed as a transient signal per se, such as a radio wave or other freely propagating electromagnetic wave, an electromagnetic wave propagating through a waveguide or other transmission medium (for example, a light pulse through a fiber-optic cable), or an electrical signal transmitted through a wire.
The computer-readable program instructions described herein may be downloaded from a computer-readable storage medium to respective computing/processing devices, or downloaded to an external computer or external storage device over a network, for example the Internet, a local area network, a wide area network, and/or a wireless network. The network may include copper transmission cables, optical fiber transmission, wireless transmission, routers, firewalls, switches, gateway computers, and/or edge servers. A network adapter card or network interface in each computing/processing device receives computer-readable program instructions from the network and forwards the computer-readable program instructions for storage in a computer-readable storage medium in the respective computing/processing device.
Computer program instructions for carrying out operations of the present disclosure may be assembly instructions, instruction set architecture (ISA) instructions, machine instructions, machine-dependent instructions, microcode, firmware instructions, state-setting data, or source code or object code written in any combination of one or more programming languages, including object-oriented programming languages such as Smalltalk and C++, and conventional procedural programming languages such as the "C" language or similar programming languages. The computer-readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case involving a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, an electronic circuit, for example a programmable logic circuit, a field-programmable gate array (FPGA), or a programmable logic array (PLA), is personalized by utilizing state information of the computer-readable program instructions, and the electronic circuit can execute the computer-readable program instructions, thereby implementing various aspects of the present disclosure.
Various aspects of the present disclosure are described herein with reference to flowcharts and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the present disclosure. It should be understood that each block of the flowcharts and/or block diagrams, and combinations of blocks in the flowcharts and/or block diagrams, can be implemented by computer-readable program instructions.
These computer-readable program instructions may be provided to a processing unit of a general-purpose computer, a special-purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, when executed by the processing unit of the computer or other programmable data processing apparatus, create an apparatus for implementing the functions/actions specified in one or more blocks of the flowcharts and/or block diagrams. These computer-readable program instructions may also be stored in a computer-readable storage medium, and these instructions cause a computer, a programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer-readable medium having instructions stored thereon comprises an article of manufacture including instructions that implement various aspects of the functions/actions specified in one or more blocks of the flowcharts and/or block diagrams.
The computer-readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices, causing a series of operational steps to be performed on the computer, other programmable data processing apparatus, or other devices to produce a computer-implemented process, such that the instructions executed on the computer, other programmable data processing apparatus, or other devices implement the functions/actions specified in one or more blocks of the flowcharts and/or block diagrams.
The flowcharts and block diagrams in the figures show the possible architectures, functions, and operations of systems, methods, and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowcharts or block diagrams may represent a module, a program segment, or a portion of instructions, which contains one or more executable instructions for implementing the specified logical functions. In some alternative implementations, the functions noted in the blocks may occur out of the order noted in the figures. For example, two successive blocks may in fact be executed substantially in parallel, or they may sometimes be executed in the reverse order, depending upon the functionality involved. It should also be noted that each block of the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by a dedicated hardware-based system that performs the specified functions or actions, or by a combination of dedicated hardware and computer instructions.
The embodiments of the present disclosure have been described above. The foregoing description is exemplary, not exhaustive, and is not limited to the disclosed embodiments. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein was chosen to best explain the principles of the embodiments, their practical application, or improvements over technologies in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

Claims (33)

  1. A method for processing a task, characterized by comprising:
    determining, based on a configuration file of a task to be executed, a real-time requirement of the task and a computing resource for executing the task; and
    if the real-time requirement indicates that the task is a real-time task, causing the computing resource to execute the task.
  2. The method according to claim 1, characterized in that the computing resource comprises at least one processing unit, the at least one processing unit having a first thread for executing non-real-time tasks and a second thread for executing real-time tasks, wherein causing the computing resource to execute the task comprises:
    causing the at least one processing unit to execute the task using the second thread.
  3. The method according to claim 2, characterized in that causing the computing resource to execute the task comprises:
    if it is determined that the at least one processing unit is executing a non-real-time task using the first thread, sending to the at least one processing unit a signal to stop executing the non-real-time task.
  4. The method according to claim 2, characterized in that the at least one processing unit comprises a plurality of processing units, the plurality of processing units having corresponding third threads, wherein causing the computing resource to execute the task comprises:
    generating a plurality of parallel subtasks from the task;
    causing the plurality of processing units to execute the plurality of parallel subtasks using the third threads; and
    determining a merged processing result based on results of the plurality of processing units executing the plurality of parallel subtasks.
  5. The method according to any one of claims 2 to 4, wherein the at least one processing unit is a CPU core.
  6. The method according to any one of claims 2 to 4, wherein the task is a control task for controlling motion of a robot.
  7. The method according to claim 1, characterized in that the computing resource comprises a processing unit, the processing unit having a first task queue, the first task queue comprising at least one non-real-time subtask of a non-real-time task;
    wherein causing the computing resource to execute the task comprises: causing the processing unit to stop executing the at least one non-real-time subtask in the first task queue.
  8. The method according to claim 7, characterized in that the processing unit further has a second task queue, and wherein causing the computing resource to execute the task comprises:
    decomposing the task into a plurality of real-time subtasks;
    adding the plurality of real-time subtasks to the second task queue of the processing unit; and
    causing the processing unit to execute the plurality of real-time subtasks in the second task queue.
  9. The method according to claim 7 or 8, characterized in that causing the processing unit to stop executing the at least one non-real-time subtask in the first task queue comprises:
    determining a remaining time required for the processing unit to complete the non-real-time subtask being executed; and
    if the remaining time exceeds a preset threshold, causing the processing unit to be reset.
  10. The method according to claim 9, characterized by further comprising:
    causing the processing unit to execute the task after the reset is complete.
  11. The method according to claim 9, characterized by further comprising:
    if the remaining time is less than the preset threshold, causing the processing unit to execute the task after the non-real-time subtask is completed.
  12. The method according to claim 7, characterized by further comprising:
    storing position information of the stopped non-real-time subtask in the first task queue; and
    in response to completion of execution of the task, causing the processing unit to resume executing the at least one non-real-time subtask in the first task queue based on the position information.
  13. The method according to any one of claims 7 to 12, characterized in that the processing unit is a neural network processing unit or a graphics processing unit.
  14. The method according to any one of claims 7 to 12, characterized in that the task is an artificial intelligence (AI) inference task.
  15. The method according to any one of claims 1 to 14, characterized in that the configuration file comprises real-time requirement information of the task, task type information, and information about the computing resource for executing the task.
  16. An apparatus for processing a task, characterized by comprising:
    a task configuration determination unit configured to determine, based on a configuration file of a task to be executed, a real-time requirement of the task and a computing resource for executing the task; and
    a task control unit configured to, if the real-time requirement indicates that the task is a real-time task, cause the computing resource to execute the task.
  17. The apparatus according to claim 16, characterized in that the computing resource comprises at least one processing unit, the at least one processing unit having a first thread for executing non-real-time tasks and a second thread for executing real-time tasks, and
    the task control unit is further configured to cause the at least one processing unit to execute the task using the second thread.
  18. The apparatus according to claim 17, characterized in that the task control unit is further configured to:
    if it is determined that the at least one processing unit is executing a non-real-time task using the first thread, send to the at least one processing unit a signal to stop executing the non-real-time task.
  19. The apparatus according to claim 17, characterized in that the at least one processing unit comprises a plurality of processing units, the plurality of processing units having corresponding third threads, and the task control unit is further configured to:
    generate a plurality of parallel subtasks from the task;
    cause the plurality of processing units to execute the plurality of parallel subtasks using the third threads; and
    determine a merged processing result based on results of the plurality of processing units executing the plurality of parallel subtasks.
  20. The apparatus according to any one of claims 17 to 19, wherein the at least one processing unit is a CPU core.
  21. The apparatus according to any one of claims 17 to 19, wherein the task is a control task for controlling motion of a robot.
  22. The apparatus according to claim 16, characterized in that the computing resource comprises a processing unit, the processing unit having a first task queue, the first task queue comprising at least one non-real-time subtask of a non-real-time task;
    the task control unit being configured to cause the processing unit to stop executing the at least one non-real-time subtask in the first task queue.
  23. The apparatus according to claim 22, characterized in that the processing unit further has a second task queue, and the task control unit is further configured to:
    decompose the task into a plurality of real-time subtasks;
    add the plurality of real-time subtasks to the second task queue of the processing unit; and
    cause the processing unit to execute the plurality of real-time subtasks in the second task queue.
  24. The apparatus according to claim 22 or 23, characterized in that the task control unit is further configured to:
    determine a remaining time required for the processing unit to complete the non-real-time subtask being executed; and
    if the remaining time exceeds a preset threshold, cause the processing unit to be reset.
  25. The apparatus according to claim 24, characterized in that the task control unit is further configured to:
    cause the processing unit to execute the task after the reset is complete.
  26. The apparatus according to claim 24, characterized in that the task control unit is further configured to:
    if the remaining time is less than the preset threshold, cause the processing unit to execute the task after the non-real-time subtask is completed.
  27. The apparatus according to claim 22, characterized in that the task control unit is further configured to:
    store position information of the stopped non-real-time subtask in the first task queue; and
    in response to completion of execution of the task, cause the processing unit to resume executing the at least one non-real-time subtask in the first task queue based on the position information.
  28. The apparatus according to any one of claims 22 to 27, characterized in that the processing unit is a neural network processing unit or a graphics processing unit.
  29. The apparatus according to any one of claims 22 to 27, characterized in that the task is an artificial intelligence (AI) inference task.
  30. The apparatus according to any one of claims 16 to 29, characterized in that the configuration file comprises real-time requirement information of the task, task type information, and information about the computing resource for executing the task.
  31. An electronic device, comprising:
    a processing unit and a memory;
    the processing unit executing instructions in the memory to cause the electronic device to perform the method according to any one of claims 1 to 15.
  32. A computer-readable storage medium having one or more computer instructions stored thereon, wherein the one or more computer instructions are executed by a processor to cause the processor to perform the method according to any one of claims 1 to 15.
  33. A computer program product comprising machine-executable instructions that, when executed by a device, cause the device to perform the method according to any one of claims 1 to 15.
PCT/CN2022/120604 2021-10-29 2022-09-22 Method, apparatus, electronic device and medium for processing tasks WO2023071643A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111275540.7 2021-10-29
CN202111275540.7A CN116069485A (zh) Method, apparatus, electronic device and medium for processing tasks

Publications (1)

Publication Number Publication Date
WO2023071643A1 true WO2023071643A1 (zh) 2023-05-04

Family

ID=86160209

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/120604 WO2023071643A1 (zh) Method, apparatus, electronic device and medium for processing tasks

Country Status (2)

Country Link
CN (1) CN116069485A (zh)
WO (1) WO2023071643A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116954721A (zh) * 2023-09-20 2023-10-27 天津南大通用数据技术股份有限公司 Asynchronous non-blocking splitting method for multimodal operators of an executor

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130198758A1 (en) * 2012-01-31 2013-08-01 Electronics And Telecommunications Research Institute Task distribution method and apparatus for multi-core system
US20180081712A1 (en) * 2016-09-16 2018-03-22 Kabushiki Kaisha Toshiba Information processing apparatus, information processing method, and computer program product
WO2019187719A1 (ja) * 2018-03-28 2019-10-03 Sony Corporation Information processing device, information processing method, and program
CN112416606A (zh) * 2020-12-16 2021-02-26 苏州挚途科技有限公司 Task scheduling method, apparatus, and electronic device

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130198758A1 (en) * 2012-01-31 2013-08-01 Electronics And Telecommunications Research Institute Task distribution method and apparatus for multi-core system
US20180081712A1 (en) * 2016-09-16 2018-03-22 Kabushiki Kaisha Toshiba Information processing apparatus, information processing method, and computer program product
WO2019187719A1 (ja) * 2018-03-28 2019-10-03 Sony Corporation Information processing device, information processing method, and program
CN112416606A (zh) * 2020-12-16 2021-02-26 苏州挚途科技有限公司 Task scheduling method, apparatus, and electronic device

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116954721A (zh) * 2023-09-20 2023-10-27 天津南大通用数据技术股份有限公司 Asynchronous non-blocking splitting method for multimodal operators of an executor
CN116954721B (zh) 2023-09-20 2023-12-15 天津南大通用数据技术股份有限公司 Asynchronous non-blocking splitting method for multimodal operators of an executor

Also Published As

Publication number Publication date
CN116069485A (zh) 2023-05-05

Similar Documents

Publication Publication Date Title
CN107885762B (zh) Intelligent big data system, and method and device for providing intelligent big data services
US20220058054A1 (en) Fpga acceleration for serverless computing
Huang et al. ShuffleDog: characterizing and adapting user-perceived latency of android apps
US11783169B2 (en) Methods and apparatus for thread-based scheduling in multicore neural networks
US9946563B2 (en) Batch scheduler management of virtual machines
CN111310936A (zh) Construction method, platform, apparatus, device, and storage medium for machine learning training
US20120324454A1 (en) Control Flow Graph Driven Operating System
US20220334868A1 (en) Synchronous business process execution engine for action orchestration in a single execution transaction context
Hu et al. On exploring image resizing for optimizing criticality-based machine perception
CN113954104B (zh) Multithreaded controller for a parallel robot
CN111190741A (zh) Scheduling method, device, and storage medium based on deep-learning node computation
KR20210021263A (ko) Methods and apparatus to enable out-of-order pipelined execution of static mapping of a workload
WO2023071643A1 (zh) Method, apparatus, electronic device and medium for processing tasks
Wu et al. Oops! it's too late. your autonomous driving system needs a faster middleware
Dissaux et al. Stood and cheddar: Aadl as a pivot language for analysing performances of real time architectures
Sai et al. Producer-Consumer problem using Thread pool
EP3779778A1 (en) Methods and apparatus to enable dynamic processing of a predefined workload
US10409762B2 (en) Remote direct memory access-based on static analysis of asynchronous blocks
Valigi Lessons learned building a self driving car on ros
Rouxel et al. PReGO: a generative methodology for satisfying real-time requirements on cots-based systems: Definition and experience report
JP2023544911A (ja) Method and apparatus for parallel quantum computing
Partap et al. On-Device CPU Scheduling for Robot Systems
Partap et al. On-device cpu scheduling for sense-react systems
KR100772522B1 (ko) XML-based service providing method and apparatus of a server for controlling a mobile home service robot
Renaux et al. Parallel gesture recognition with soft real-time guarantees

Legal Events

Date Code Title Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application

Ref document number: 22885530

Country of ref document: EP

Kind code of ref document: A1