CN116069485A - Method, apparatus, electronic device and medium for processing tasks - Google Patents


Info

Publication number
CN116069485A
Authority
CN
China
Prior art keywords
task
real
time
processing unit
tasks
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111275540.7A
Other languages
Chinese (zh)
Inventor
彭席汉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd filed Critical Huawei Technologies Co Ltd
Priority to CN202111275540.7A priority Critical patent/CN116069485A/en
Priority to PCT/CN2022/120604 priority patent/WO2023071643A1/en
Publication of CN116069485A publication Critical patent/CN116069485A/en
Pending legal-status Critical Current

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 - Arrangements for program control, e.g. control units
    • G06F 9/06 - Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46 - Multiprogramming arrangements
    • G06F 9/48 - Program initiating; Program switching, e.g. by interrupt
    • G06F 9/4806 - Task transfer initiation or dispatching
    • G06F 9/4843 - Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F 9/4881 - Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
    • G06F 9/50 - Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F 9/5005 - Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F 9/5027 - Allocation of resources to service a request, the resource being a machine, e.g. CPUs, servers, terminals
    • G06F 9/5038 - Allocation of resources to service a request, the resource being a machine, considering the execution order of a plurality of tasks, e.g. taking priority or time dependency constraints into consideration

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Numerical Control (AREA)

Abstract

The present disclosure provides methods, apparatus, electronic devices, and media for processing tasks. The method includes determining, based on a configuration file of a task to be executed, the real-time requirement of the task and the computing resources for executing the task. The method also includes causing the computing resources to execute the task if the real-time requirement indicates that the task is a real-time task. With embodiments of the present disclosure, tasks with real-time processing requirements, such as conventional robot computing tasks and AI reasoning tasks, can be processed more quickly and efficiently, thereby optimizing the scheduling of various tasks and improving system processing efficiency.

Description

Method, apparatus, electronic device and medium for processing tasks
Technical Field
Embodiments of the present disclosure relate generally to the field of computer technology, and more particularly to artificial intelligence (AI) technology. More particularly, embodiments of the present disclosure relate to methods, apparatuses, electronic devices, computer-readable storage media, and computer program products for processing tasks.
Background
In recent years, with the development and maturity of AI technology, AI has had a significant impact in fields such as image analysis (e.g., face recognition, text recognition), natural language processing, and speech recognition. Researchers have also begun to actively explore combining AI technology with motion control technology in the traditional robotics field, enabling robots to accomplish more open-ended tasks rather than only traditional automated tasks.
Such a robot is also called an AI robot. An AI robot collects data about the surrounding environment through sensors and then identifies objects in the environment using AI technology. For example, an industrial robot uses cameras and AI technology to transfer and sort articles on a conveyor belt, and a service robot recognizes whether there is an obstacle in the surrounding environment and takes corresponding measures such as stopping or avoiding the obstacle. In this case, the robot application system includes not only conventional body control tasks (e.g., controlling motor movement) but also AI sensing or reasoning tasks (e.g., recognizing or detecting objects in an image). Because the computing resources of a robot are limited and its tasks have certain real-time requirements, this presents a challenge to researchers.
Disclosure of Invention
Embodiments of the present disclosure provide a solution for handling real-time tasks in a robotic system.
According to a first aspect of the present disclosure, a method for processing tasks is provided. The method includes determining a real-time requirement of a task and a computing resource for executing the task based on a configuration file of the task to be executed, and causing the computing resource to execute the task if the real-time requirement indicates that the task is a real-time task.
In some embodiments, the computing resources include at least one processing unit having a first thread for performing non-real-time tasks and a second thread for performing real-time tasks. The at least one processing unit performs the task using a second thread. In some embodiments, if it is determined that the at least one processing unit is performing a non-real-time task with the first thread, a signal is sent to the at least one processing unit to stop performing the non-real-time task.
In some embodiments, the at least one processing unit comprises a plurality of processing units having respective third threads, wherein causing the computing resource to perform the task comprises: generating a plurality of parallel subtasks from the task; causing the plurality of processing units to perform the plurality of parallel subtasks using the third threads; and determining a combined processing result based on the results of the plurality of processing units executing the plurality of parallel subtasks.
In some embodiments, the at least one processing unit is a CPU core. The task is a control task for controlling the movement of the robot.
In some embodiments, the computing resource comprises a processing unit having a first task queue comprising at least one non-real-time subtask of a non-real-time task, wherein causing the computing resource to perform the task comprises: causing the processing unit to cease executing the at least one non-real-time subtask in the first task queue.
In some embodiments, the processing unit further has a second task queue, and causing the computing resource to perform the task includes: decomposing the task into a plurality of real-time subtasks; adding a plurality of real-time subtasks to the second task queue of the processing unit; and causing the processing unit to execute the plurality of real-time subtasks in the second task queue.
In some embodiments, causing the processing unit to stop executing the at least one non-real-time subtask in the first task queue comprises: determining the remaining time required for the processing unit to complete the non-real-time subtask being executed, and resetting the processing unit if the remaining time exceeds a preset threshold. After the reset is completed, the processing unit performs the task.
In some embodiments, if the remaining time is less than the preset threshold, the processing unit is caused to execute the task after completing the non-real-time subtask being executed.
In some embodiments, the method further includes storing location information of the non-real-time subtasks in the first task queue that were stopped from being executed, and in response to completion of execution of the tasks, causing the processing unit to resume execution of at least one non-real-time subtask in the first task queue based on the location information.
In some embodiments, the processing unit may be a neural network processing unit or a graphics processing unit. The task may be an artificial intelligence AI reasoning task.
In some embodiments, the configuration file includes real-time requirement information for the task, task type information, and information for computing resources used to perform the task.
According to a second aspect of the present disclosure, an apparatus for processing tasks is provided. The device comprises a task configuration determining unit configured to determine real-time requirements of a task to be executed and computing resources for executing the task based on a configuration file of the task. The apparatus further includes a task control unit configured to cause the computing resource to execute the task if the real-time requirement indicates that the task is a real-time task.
According to a third aspect of the present disclosure, there is provided an electronic device comprising a processing unit and a memory, the processing unit executing instructions in the memory, causing the electronic device to perform the method according to the first aspect of the present disclosure.
According to a fourth aspect of the present disclosure there is provided a computer readable storage medium having stored thereon one or more computer instructions, wherein execution of the one or more computer instructions by a processor causes the processor to perform a method according to the first aspect of the present disclosure.
According to a fifth aspect of the present disclosure, there is provided a computer program product comprising machine executable instructions which, when executed by an apparatus, cause the apparatus to perform the method according to the first aspect of the present disclosure.
Drawings
The above and other features, advantages and aspects of embodiments of the present disclosure will become more apparent by reference to the following detailed description when taken in conjunction with the accompanying drawings. In the drawings, wherein like or similar reference numerals designate like or similar elements, and wherein:
FIG. 1 illustrates a schematic diagram of an example environment in which various embodiments of the present disclosure may be implemented;
FIG. 2 shows a schematic block diagram of a system architecture according to an embodiment of the present disclosure;
FIG. 3 shows a schematic flow chart of a process for processing tasks according to an embodiment of the disclosure;
FIG. 4 illustrates a schematic diagram of a scheme for switching computing resources for real-time CPU tasks, according to an embodiment of the disclosure;
FIG. 5 shows a schematic flow chart diagram of a process of switching computing resources for real-time CPU tasks according to an embodiment of the disclosure;
FIG. 6 illustrates a schematic diagram of a scheme for switching computing resources for real-time AI reasoning tasks, in accordance with an embodiment of the disclosure;
FIG. 7 is a schematic flow chart diagram of a process of switching computing resources for real-time AI reasoning tasks in accordance with an embodiment of the disclosure;
FIG. 8 shows a schematic block diagram of a task processing device according to an embodiment of the present disclosure;
FIG. 9 shows a schematic block diagram of an example device that may be used to implement embodiments of the present disclosure.
Detailed Description
Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the present disclosure are shown in the accompanying drawings, it should be understood that the present disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided so that the present disclosure will be understood more thoroughly and completely. It should be understood that the drawings and embodiments of the present disclosure are for illustration purposes only and are not intended to limit the scope of the present disclosure.
In describing embodiments of the present disclosure, the term "comprising" and its like should be taken to be open-ended, i.e., including, but not limited to. The term "based on" should be understood as "based at least in part on". The term "one embodiment" or "the embodiment" should be understood as "at least one embodiment". The terms "first," "second," and the like, may refer to different or the same object. Other explicit and implicit definitions are also possible below.
The motion control tasks of the robot body have certain real-time requirements; for example, an industrial robotic arm needs to send motion control instructions at a fixed frequency. Real-time performance means that the execution time of a task satisfies a certain fluctuation requirement; if the execution time falls outside this range, the execution body of the robot is affected, for example, its movement becomes jerky or stalls. When an AI reasoning task is associated with a motion execution task, for example, when the motion execution task needs to be determined according to the processing result of the AI reasoning task, the AI reasoning task necessarily also has a certain real-time requirement.
Conventionally, when implementing a real-time task, a user needs to call a real-time process setting interface provided by the operating system kernel in the implementation code of the task, so that the execution environment of the task becomes the context of a real-time process; the operating system kernel then performs real-time task scheduling and computing resource allocation. If the task involves multi-core parallel computing or AI reasoning, the user also needs to explicitly invoke a parallel computing library (e.g., the OpenMP library) and the SDK interface of an AI model framework in the code of the task. This causes inconvenience to the user. In addition, task scheduling and resource allocation are handed over entirely to the operating system kernel, which may allocate real-time tasks to computing resources with heavier loads, so that the scheduling time is uncertain and the execution time of real-time tasks is affected.
In view of this, embodiments of the present disclosure utilize configuration files to provide information such as the real-time requirements of tasks and the computing resources they require, thereby providing the required computing resources for real-time tasks in a more efficient and accurate manner. In this way, tasks with real-time processing requirements, such as conventional robot computing tasks and AI reasoning tasks, can be processed more rapidly and efficiently, thereby optimizing the scheduling of various tasks and improving system processing efficiency.
Embodiments according to the present disclosure will be described below with further reference to the accompanying drawings.
Example Environment
FIG. 1 illustrates a schematic diagram of an example environment 100 in which various embodiments of the present disclosure may be implemented. As shown in fig. 1, environment 100 is a typical AI robot system architecture. The environment 100 includes an application 110, a robot development framework 120, and an operating system 130 as software portions of a robot system, and a system on chip (SoC) chip 140 and a memory 150 as hardware portions.
The application 110 is developed by a user to implement various functions and tasks associated with a particular application scenario. For example, for an industrial sorting robot, the application 110 may implement tasks of the robot such as motion control classes such as rotation, grabbing, translation, etc., and may also implement AI tasks such as identifying objects in conveyor belt images captured by cameras on the robot. For a service robot, the application 110 may implement motion control type tasks of the robot such as forward, backward, braking, steering, and may also implement AI tasks such as recognizing received voice information.
Depending on the application scenario, the task may have corresponding type information and real-time requirements. As shown, the application 110 includes a non-real-time CPU task 112, a real-time CPU task 114, a real-time AI task 116, and a real-time AI task 118. Herein, the motion control type tasks of the robot may be handed to Central Processing Unit (CPU) cores 142 and 144 in SoC chip 140 for execution, and accordingly, such tasks are also referred to as CPU tasks. The reasoning tasks (e.g., image recognition and detection) of the robot may be handed to AI processing units 146 and 148 in the SoC chip for execution, and accordingly, such tasks are also referred to as AI tasks. Fig. 1 schematically illustrates tasks 112, 114, 116, 118 having different real-time requirements and types, it being understood that the number of tasks is not limited herein. The application 110 may include more or fewer tasks, and the number of each task may be arbitrary.
The user may implement the various types of tasks 112, 114, 116, and 118 of the application 110 based on the robotic development framework 120. For example, the robotic development framework 120 may be the Robot Operating System (ROS). ROS™ is an open-source meta-operating system for robots that provides the services expected of an operating system, including hardware abstraction, low-level device control, implementation of commonly used functionality, inter-process message passing, and package management. It also provides the tools and library functions required to obtain, compile, write, and run code across computers. The user can create a robot task node (Node) through an application programming interface (API) provided by ROS, and the inter-node messaging mechanism is implemented by the ROS framework. The user only needs to call the API to implement the internal logic of a specific task. It should be appreciated that the robotic development framework 120 may also be a development framework other than ROS™, which the present disclosure does not limit.
The operating system 130 provides an interface between the application programs 110 and the robot development framework 120 on the one hand, and the hardware environment, i.e., the SoC chip 140 and the memory 150, on the other hand. The operating system 130 may be, for example, open-source Linux™ or any other commercial operating system.
The SoC chip 140 integrates several CPU cores 142 and 144 as well as neural network processing units (NPUs) 146 and 148. The CPU cores 142 and 144 have, for example, caches, control units, and arithmetic units (not shown), and typically execute code or instructions in a sequential manner, making them suitable for relatively complex logic control. Each CPU core 142 or 144 may have one or more threads and execute the threads in a time-multiplexed manner.
The AI processing units 146 and 148 employ a parallel computing architecture that is better suited to running AI models for processing data such as video and images. An AI processing unit may be, for example, a neural network processing unit (NPU). In some embodiments, the NPU may include multiple (e.g., 16, 32, 64, or more) parallel multiply-add modules, activation function modules, and the like. The multiply-add modules are used to compute matrix multiply-add, convolution, dot product, and similar operations. The activation function module implements the activation functions of the neural network, for example by parameter fitting. The AI processing units 146 and 148 can also be graphics processing units (GPUs) or other devices with a parallel computing architecture, such as field programmable gate arrays (FPGAs) and application-specific integrated circuits (ASICs).
Considering the respective strengths of the CPU cores 142, 144 and the AI processing units 146, 148, the CPU tasks 112 and 114 in the application program 110 of the robot may be executed by the CPU cores 142, 144, while the AI tasks 116 and 118 may be executed by the AI processing units 146, 148. FIG. 1 shows two CPU cores 142, 144 and two AI processing units 146, 148 of SoC chip 140; it should be understood that the present disclosure does not limit the number of CPU cores and AI processing units on the SoC. The SoC chip 140 may include more or fewer CPU cores and AI processing units.
In some embodiments according to the present disclosure, the memory 150 may also be referred to as internal memory or main memory. The memory 150 may be any type of memory existing now or developed in the future, such as DDR memory. The memory 150 stores the executable code and data required to run the application program 110, the robot development framework 120, and the operating system 130, for example, data of threads to be executed, image data acquired by sensors, and AI models for reasoning tasks, to be accessed and executed by the CPU cores and the NPUs.
It should be understood that embodiments of the present disclosure may also be implemented in environments other than those shown in fig. 1. For example, in the robot application system, the CPU core and the AI processing unit are not necessarily integrated on the same SoC chip, but may be implemented on different SoC chips or devices, respectively.
System architecture and process
Fig. 2 shows a schematic block diagram of a system architecture 200 according to an embodiment of the present disclosure. In general, the system architecture 200 includes an application 210 and a robot development framework 220. The application 210 may be implemented as the application 110 shown in fig. 1, and the robot development framework 220 may be implemented as the robot development framework 120 shown in fig. 1.
The application 210 includes two parts, a configuration file 212 for the task and task implementation logic code 214. These contents are written and implemented by the user. The configuration file 212 may be defined in an editable, readable format, such as JSON, YAML, or XML format. In some embodiments, the configuration file 212 defines the task name, the computing resources required by the task at runtime, and whether the task is a real-time task. An exemplary configuration file is given in table 1 below.
TABLE 1
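The table is reproduced as an image in the original publication. Based on the description that follows, a plausible JSON rendering of such a configuration file is sketched below; only the fields "type" and "real" are named explicitly in the text, so the remaining field names are illustrative assumptions rather than the exact names used in the original table.

    {
      "name": "my task",
      "type": 0,
      "real": true,
      "cpu_resources": [0, 1],
      "ai_resources": [0]
    }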
The exemplary configuration file indicates that the task named "my task" is a real-time CPU task, and is designated to be executed using CPU core 0 and CPU core 1 when it is a CPU task, or using AI processing unit 0 when it is an AI task. Which of the specified computing resources is used depends on the field "type": when the type is "0", the task is performed using the corresponding CPU resources, and when the type is "1", the task is performed using the corresponding NPU resource. It should be noted that if the required CPU resource is designated as a plurality of CPU cores, the task can be processed in parallel by those CPU cores when the conditions for parallel execution are satisfied.
It should be understood that the above configuration file is exemplary only and not limiting. According to embodiments of the present disclosure, the configuration file 212 is customizable; for example, it may include more, fewer, or different fields, and may include only one of the CPU resource field and the AI computing resource field. Thus, by writing the configuration file 212 of a task, a user can easily specify information such as the type of the task, its real-time requirement, and the computing resources used to perform it.
Task implementation logic code 214 includes user-implemented task-specific logic code. The task implementation logic code 214 is implemented based on an abstract class programming interface 225 provided by the robotic development framework 220. For example, the task implementation logic code 214 may call or inherit tools or library functions provided by the robotic development framework 220, and may enable communication between tasks by the development framework 220.
According to embodiments of the present disclosure, the user only needs to implement the task logic code, without manually creating the threads required to execute the tasks; likewise, the user only needs to edit the configuration file to allocate computing resources to tasks, so no additional coding is required and development time is saved. The corresponding work is accomplished by the robot development framework 220 according to embodiments of the present disclosure.
The robotic development framework 220 includes a configuration parameter management module 221, a thread resource scheduling management module 222, a task scheduling management module 223, an NPU operator scheduling management module 224, and an abstract class programming interface 225.
The configuration parameter management module 221 defines the description file used to describe the resources required for task execution, i.e., the configuration file 212 described above. The configuration parameter management module 221 parses the user-written configuration file 212 to obtain information related to task execution and uses this information to facilitate the scheduling of computing resources for tasks. For example, a real-time task whose "real" field is TRUE needs the corresponding CPU core or AI processing unit resources to serve it as soon as possible and to be able to monopolize those resources, so as to meet its real-time requirement.
The thread resource scheduling management module 222 (thread pool management module) is used to generate reserved threads and bind them to the CPU cores on the SoC chip. The reserved threads may include a first thread for non-real-time tasks (also referred to as a non-real-time thread), a second thread for real-time tasks (also referred to as a real-time thread), and a third thread for parallel subtasks (also referred to as a parallel subtask thread). When the robot is started, these threads may be generated and bound to the corresponding CPU cores. That is, each CPU core has three reserved threads for performing different kinds of tasks. The thread resource scheduling management module 222 allocates storage space in the memory 150 for the reserved threads, for storing the data of tasks to be executed in the future. Because threads are reserved for each CPU core in the memory 150, a CPU core can quickly switch to the corresponding task thread once it receives a task, thereby meeting real-time requirements and improving task execution efficiency.
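As a rough illustration of how reserved threads might be created and pinned to specific CPU cores on a Linux-based system, the following C++ sketch uses the standard pthread affinity API; the worker function and the overall structure are assumptions for illustration, not the framework's actual implementation.

    #include <pthread.h>
    #include <sched.h>
    #include <thread>
    #include <vector>

    // Pin the calling thread to a single CPU core (Linux-specific).
    static void BindToCore(int core_id) {
        cpu_set_t set;
        CPU_ZERO(&set);
        CPU_SET(core_id, &set);
        pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
    }

    // One reserved worker per role (non-real-time, real-time, parallel subtask).
    // In this sketch the worker returns immediately; the real framework would
    // keep it waiting on the execution queue of Task objects bound to it.
    static void ReservedWorker(int core_id) {
        BindToCore(core_id);
        // ... wait on the thread's execution queue and run Task objects ...
    }

    int main() {
        std::vector<std::thread> reserved;
        const int kNumCores = 2;                    // e.g. CPU core 0 and CPU core 1
        for (int core = 0; core < kNumCores; ++core) {
            for (int role = 0; role < 3; ++role) {  // three reserved threads per core
                reserved.emplace_back(ReservedWorker, core);
            }
        }
        for (auto& t : reserved) t.join();
        return 0;
    }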
The task scheduling management module 223 is configured to complete the allocation of tasks to the underlying thread resources when a task is scheduled. The task scheduling management module 223 may write the data of the task to be executed into the storage space of the corresponding reserved thread, and then send a switching signal to the CPU core so that the CPU core switches to the corresponding reserved thread. It should be appreciated that the threads of a CPU core are time multiplexed, and the CPU core executes only one of them at a time. For example, when a real-time CPU task 114 is triggered and assigned to a CPU core specified by the configuration file, the task scheduling management module 223 causes the CPU core to switch, through a context switch mechanism, from the non-real-time thread that is executing the non-real-time CPU task 112, for example, to the real-time thread.
If a non-real-time task is switched out during execution to yield the CPU core to a real-time task, the context in which the non-real-time task was executing needs to be saved. The context includes register data of the CPU core, such as the program counter. CPU cores are known to provide a hardwired context switching mechanism. With the hardwired context switch mechanism, the context of the interrupted thread is preserved and used subsequently to quickly resume the interrupted thread. The scheduling mechanism for real-time CPU tasks will be described in detail below with reference to FIGS. 4 and 5.
The operator scheduling management unit 224 is used to schedule AI tasks for execution by the NPU. Generally, the AI models (e.g., neural network models) involved in AI reasoning tasks have a layered computational structure, in which multiple nodes within each layer perform parallel computations such as multiply-add operations and activation function operations. A typical neural network model, for example a convolutional neural network (CNN), may have tens or even hundreds of layers. According to embodiments of the present disclosure, an AI model may be decomposed into several operators with reference to its layer structure. An operator may include the parallel computation of one or more layers. The operators are provided to the AI processing units, which are well suited to parallel computation, for execution. The AI processing unit completes the AI task by executing the operators of the AI model in sequence. In some embodiments, the AI processing unit can alternately execute operators from multiple AI reasoning tasks.
As described above, the AI tasks include a non-real-time AI task 116 and a real-time AI task 118. In order for the real-time AI task 118 to be processed quickly enough to meet its real-time requirements, the NPU operator scheduling management unit 224 provides a mechanism for scheduling real-time AI tasks by resetting the NPU, as will be described in detail below with reference to FIGS. 6 and 7.
The abstract class programming interface 225 provides the user with a task template for the task implementation logic code 214. For example, the abstract class programming interface 225 employs the inheritance mechanism provided by an object-oriented programming language to define the task implementation functions that a user must override. By way of example only, a task in a robotic system may be divided into three parts: preparation before task execution, the task execution body, and processing after task execution. These three parts can thus be abstracted into three functions to be implemented by the user. The task is eventually encapsulated as a Task object. The Task object may be loaded and executed by a CPU core or an AI processing unit. Table 2 below gives an example implementation in the C++ language.
TABLE 2
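The original Table 2 is reproduced as an image. A minimal C++ sketch of the kind of abstract task class described above might look as follows; the class and method names (Task, Prepare, Execute, PostProcess) are assumptions for illustration and are not necessarily those used in the original table.

    // Abstract task template: users inherit from Task and override the three
    // stages described above. The framework wraps the result as a Task object
    // that a CPU core or an AI processing unit can load and execute.
    class Task {
    public:
        virtual ~Task() = default;
        virtual void Prepare() = 0;      // preparation before task execution
        virtual void Execute() = 0;      // task execution body
        virtual void PostProcess() = 0;  // processing after task execution

        void Run() {                     // invoked by the scheduled thread
            Prepare();
            Execute();
            PostProcess();
        }
    };

    // Example user task: a motion-control step implemented by overriding the
    // three abstract functions.
    class MyMotionTask : public Task {
    public:
        void Prepare() override     { /* read sensor data, load parameters */ }
        void Execute() override     { /* compute and send motion commands  */ }
        void PostProcess() override { /* record status, release resources  */ }
    };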
According to embodiments of the present disclosure, some or all CPU tasks may be divided into multiple parallel subtasks. For example, a CPU task may include an image enhancement operation that scales the pixel value of each pixel in an image. Since each pixel value may be considered independent, the task can be divided into a plurality of independently executed subtasks. According to embodiments of the present disclosure, an independently executed subtask is also referred to as a tasklet (TaskLet). The TaskLets are assigned to the CPU cores specified in configuration file 212. For this case, embodiments of the present disclosure also provide an API interface for running TaskLets. An example of the interface is given below:
LaunchTaskLet(InputVec,OutputVec,TaskLetFunc),
where InputVec and OutputVec are the input data and output data; their first dimension is the task-splitting dimension, i.e., the input data is divided into multiple sub-inputs along the first dimension, and the output data is likewise divided into multiple sub-outputs. TaskLetFunc is the TaskLet execution function, which is implemented by the user. For example, the TaskLet may include a function that scales each pixel. Through this API interface, a larger task can be split into multiple independent TaskLets, thereby encapsulating multiple parallel subtasks. It should be appreciated that these independent subtasks will be executed separately by the CPU cores specified in configuration file 212. Thus, LaunchTaskLet may automatically generate TaskLets based on the number of CPU cores specified in the configuration file. The generated parallel subtasks may then be scheduled for execution on the corresponding CPU cores described in configuration file 212. For a real-time task, this can reduce execution time so as to meet its real-time requirement. In some embodiments, the CPU cores may utilize the reserved third threads to execute the parallel subtasks. Executing the parallel subtasks with reserved dedicated threads can further reduce the execution time of real-time tasks.
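As a sketch of how the pixel-scaling example above might be expressed with this interface, assuming the input and output are row-major containers whose first dimension (rows) is the splitting dimension; the element types and the call details are assumptions, so the LaunchTaskLet call itself is shown only as a comment.

    #include <cstdint>
    #include <vector>

    // Hypothetical data layout: the first dimension (rows) is split across TaskLets.
    using Rows = std::vector<std::vector<uint8_t>>;

    int main() {
        Rows inputVec(480, std::vector<uint8_t>(640));    // e.g. a 480x640 image
        Rows outputVec(480, std::vector<uint8_t>(640));

        // TaskLet execution function: scales the pixels of one row slice.
        auto taskLetFunc = [](const std::vector<uint8_t>& in,
                              std::vector<uint8_t>& out) {
            for (size_t i = 0; i < in.size(); ++i) {
                out[i] = static_cast<uint8_t>(in[i] * 1.2);   // simple enhancement
            }
        };
        (void)taskLetFunc;

        // The framework would split inputVec/outputVec along the first dimension
        // into as many TaskLets as there are CPU cores listed in the configuration
        // file, run them on the reserved parallel-subtask threads, and merge the
        // results:
        // LaunchTaskLet(inputVec, outputVec, taskLetFunc);
        return 0;
    }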
In contrast, conventional multi-core parallel computing (e.g., using the OpenMP library) can only specify the number of CPU cores to be used, not which CPU cores to use, so parallel subtasks may be allocated by the operating system to heavily loaded CPU cores, affecting the execution time of real-time tasks. According to embodiments of the present disclosure, by specifying the computing resources in the configuration file 212, the computing resources of a real-time task can be determined in advance, thereby avoiding this problem.
The above describes an exemplary system architecture 200 according to an embodiment of the present disclosure. It should be understood that embodiments of the present disclosure may also include architectures other than this, e.g., any module in system architecture 200 may be divided into more modules, and two or more modules may be combined to form a single module. The present disclosure is not limited in this regard.
Fig. 3 shows a schematic flow diagram of a process 300 for processing tasks according to an embodiment of the disclosure. The process 300 may be implemented in, for example, the robot development framework 120 of fig. 1 and the robot development framework 220 of fig. 2.
At block 310, based on the configuration file of the task to be performed, the real-time requirement of the task and the computing resources for performing the task are determined. In response to a trigger for a task, for example, a user input or the detection of a particular event, the application 110 or 210 generates a task to be performed. Taking a service robot as an example, in response to a change in gravitational acceleration acquired by the acceleration sensor (the robot may be falling), the service robot may generate a motion task for controlling the robot to keep its balance. For another example, in response to the image sensor capturing an image of the surrounding environment, the service robot may generate an identification task for identifying an object in the image. As described above, the task implementation logic code 214 has been implemented by the user through the abstract programming interface 225. In addition, the user has written a configuration file 212 corresponding to the task to specify the corresponding computing resources.
Configuration file 212 may include real-time requirement information for a task, for example, whether the task is a real-time task or a non-real-time task. The configuration file 212 may also include task type information of a task, for example, whether the task is a control task for controlling the motion of the robot or an AI reasoning task. The control tasks that control the movement of the robot may be allocated to one or more CPU cores or the like to be executed, and thus may also be referred to as CPU tasks. AI reasoning tasks may be assigned to one or more NPUs or GPUs for execution and may thus also be referred to as AI tasks. As described above, the computing resources for performing the task are specified in the configuration file 212.
At block 320, if the real-time requirement indicates that the task is a real-time task, the computing resources of the task are caused to execute the task. As an example, if the task type information of the configuration file 212 indicates that the task to be executed is a CPU task and the computing resource information indicates CPU core 0 and CPU core 1, the CPU task will be handed over to, for example, CPU core 142 and CPU core 144 in SoC chip 140 for execution. If the task type information of the configuration file 212 indicates that the task to be performed is an AI task and the computing resource information indicates NPU 0, the AI task will be submitted to, for example, the AI processing unit 146 in the SoC chip 140 for execution.
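A minimal sketch of this dispatch decision, assuming the parsed configuration is held in a simple struct (the struct, field, and function names are illustrative assumptions rather than the framework's actual interface):

    #include <vector>

    // Parsed view of the task configuration file (field names assumed).
    struct TaskConfig {
        int type = 0;                  // 0: CPU task, 1: AI task
        bool real = false;             // true: real-time task
        std::vector<int> cpu_cores;    // e.g. {0, 1}
        std::vector<int> ai_units;     // e.g. {0}
    };

    // Hypothetical dispatch following blocks 310-320: pick the resources named
    // in the configuration file and hand the task to them when it is real-time.
    void DispatchTask(const TaskConfig& cfg /*, Task& task */) {
        if (!cfg.real) {
            return;                    // non-real-time tasks follow ordinary scheduling
        }
        if (cfg.type == 0) {
            // Real-time CPU task: schedule onto the reserved real-time threads of
            // the CPU cores listed in cfg.cpu_cores (e.g. cores 0 and 1).
        } else {
            // Real-time AI task: decompose into operators and add them to the
            // real-time operator queue of the AI unit listed in cfg.ai_units.
        }
    }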
The above describes a scheme for abstract interface encapsulation of tasks in robotic applications and for configuration management of the resources required by those tasks. Embodiments of the present disclosure utilize configuration files to provide information on the real-time requirements, required computing resources, and so on of a task, thereby providing the required computing resources for real-time tasks in a more efficient and accurate manner. In this way, tasks with real-time processing requirements, such as conventional robot computing tasks and AI reasoning tasks, can be processed more rapidly and efficiently, thereby optimizing the scheduling of various tasks and improving system processing efficiency.
For the real-time CPU tasks and real-time AI reasoning tasks that exist in a robot application system, embodiments of the present disclosure provide corresponding computing resource switching mechanisms to meet the execution-time fluctuation requirements of these two types of tasks.
Computing resource switching for real-time CPU tasks
FIG. 4 illustrates a schematic diagram of an exemplary scheme 400 for switching computing resources for real-time CPU tasks, according to an embodiment of the disclosure. Scheme 400 may be implemented in, for example, robot development framework 120 described in fig. 1 and robot development framework 220 shown in fig. 2. Here, description will be made taking as an example that the configuration file 212 designates the computing resources of the real-time CPU task as the CPU core 0 and the CPU core 1 (for example, corresponding to the CPU cores 142 and 144 of fig. 1). It should be appreciated that scheme 400 applies equally when configuration file 212 specifies more or fewer CPU cores, or specifies other different CPU cores.
After the robot system is started, three threads are generated for each CPU core of the SoC chip. For example, thread 401, thread 402, and thread 403 are generated for CPU core 0, thread 411, thread 412, and thread 413 are generated for CPU core 1, thread 421, thread 422, and thread 423 are generated for CPU core 2, and so on. After they are generated, these threads are kept as reserved threads in the memory of the robot system. The execution units of these threads are the abstractly encapsulated Task objects according to embodiments of the present disclosure, and the threads are used for different kinds of Task objects. For example, in CPU core 0, thread 401 is used to perform non-real-time tasks, thread 402 is used to perform real-time tasks, and thread 403 is used to perform parallel subtasks. Threads 411, 412, and 413 in CPU core 1 are used similarly to those in CPU core 0.
When no tasks are allocated, the thread of the CPU core may be in an idle state until a corresponding task object is added to its execution queue.
As shown by execution progress bars 406 and 416 below fig. 4, before the real-time task is triggered, CPU core 0 is executing the non-real-time CPU task using non-real-time task thread 401 and CPU core 1 is also executing the non-real-time CPU task using non-real-time task thread 411. In response to the real-time CPU task being triggered, CPU core 0 and CPU core 1 will switch computing resources to execute the real-time CPU task.
Fig. 5 shows a schematic flow chart of a process 500 of switching computing resources for real-time CPU tasks according to an embodiment of the disclosure. The process 500 may be implemented in, for example, the robotic development framework 120 described in fig. 1 and the robotic development framework 220 shown in fig. 2. For ease of understanding, process 500 is described in connection with fig. 4.
At block 510, it is determined whether CPU cores 0 and 1 (e.g., corresponding to CPU cores 142 and 144 of FIG. 1) associated with the real-time CPU task to be performed are executing non-real-time tasks. In some embodiments, the state of the non-real-time task threads of CPU cores 0 and 1 may be checked, whereby it may be determined whether CPU cores 0 and 1 are executing non-real-time tasks. If either or both of CPU cores 0 and 1 are executing non-real-time tasks, then at block 520 a signal is sent to the corresponding CPU core to pause the non-real-time task (e.g., the SIGSTOP signal of the Linux™ operating system). The SIGSTOP signal may invoke the hardwired context switch mechanism of the CPU core to save the information of the suspended non-real-time task (e.g., the register data of the CPU core).
Then, after the non-real-time tasks have been suspended, or once CPU cores 0 and 1 are all ready, the method 500 proceeds to block 530. At block 530, the real-time CPU task is scheduled to one CPU core. For example, a pointer or address of the Task object of the real-time CPU task is added to the execution queue of the real-time task thread 402 of CPU core 0, so that CPU core 0 executes the task using thread 402, as indicated by execution progress bar 407 and the corresponding arrow of FIG. 4.
Next, at block 540, it is determined whether parallel subtasks exist. In some embodiments, when execution reaches a function implemented (e.g., via inheritance) based on the abstract programming interface LaunchTaskLet described above, it is determined that there are multiple subtasks to be executed in parallel. Multiple parallel subtasks (TaskLets) can be generated from the real-time CPU task through this abstract programming interface.
Then, at block 550, the parallel subtasks are scheduled to the computing resources of the real-time CPU task. In some embodiments, the Task objects of the parallel subtasks (TaskLets) may be added to the parallel subtask threads 403 and 413 of CPU cores 0 and 1, so that CPU cores 0 and 1 execute the parallel subtasks using threads 403 and 413, as shown by execution progress bars 408 and 418 of FIG. 4.
At block 560, the execution results obtained by the respective CPU cores are combined to obtain a combined result. In this example, the real-time CPU task is designated by configuration file 212 to be executed by two CPU cores. It should be appreciated that the computing resources specified for a real-time CPU task may include more or fewer CPU cores; that is, embodiments of the present disclosure do not limit the number of parallel subtasks.
At block 570, in response to completion of the real-time CPU task, a signal is sent to the CPU cores involved in the task to resume the non-real-time tasks (e.g., the SIGCONT signal of the Linux™ operating system). The SIGCONT signal may invoke the context switch mechanism of the CPU core to resume the previously suspended non-real-time task using the stored execution information, as shown by execution progress bars 409 and 419 of FIG. 4.
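As a rough illustration of the suspend/resume signalling in blocks 520 and 570, the following C++ sketch stops and later continues a separate worker process using the standard Linux SIGSTOP and SIGCONT signals; the disclosure itself signals reserved threads inside the framework, so treating the non-real-time work as a child process here is a simplifying assumption.

    #include <signal.h>
    #include <sys/types.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main() {
        pid_t worker = fork();
        if (worker == 0) {
            // Child: stands in for the non-real-time work running on a CPU core.
            for (;;) pause();          // idle placeholder workload
        }

        sleep(1);                      // let the non-real-time work run for a while

        kill(worker, SIGSTOP);         // block 520: pause the non-real-time work
        // ... execute the real-time CPU task on the now-available core(s) ...
        kill(worker, SIGCONT);         // block 570: resume the non-real-time work

        kill(worker, SIGTERM);         // tear down the illustration worker
        waitpid(worker, nullptr, 0);
        return 0;
    }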
Computing resource switching for real-time AI reasoning tasks
Fig. 6 illustrates a schematic diagram of a scheme 600 for switching computing resources for real-time AI reasoning tasks, according to an embodiment of the disclosure. Scheme 600 may be implemented in, for example, the robotic development framework 120 described in fig. 1 and the robotic development framework 220 shown in fig. 2. By way of example only, FIG. 6 illustrates a scheme for switching computing resources for non-real-time tasks model 610 and model 620 and real-time task model 630. It should be appreciated that the computing resources for the AI processing unit may be switched for any number of models of non-real-time tasks and models of real-time tasks.
Here, the computing resource is an AI processing unit 650, for example, a parallel processing unit such as NPU, GPU, FPGA or the like. Any of models 610, 620, and 630 may be a trained neural network model, such as a convolutional neural network model, a recurrent neural network model, a graph neural network model, and the like, to which the present disclosure is not limited. The trained models can be used for AI reasoning tasks such as image recognition, object detection, speech processing, etc. As described above, the AI reasoning task can be broken down into several subtasks based on a model structure associated with the AI reasoning task. Subtasks derived from AI inference tasks or corresponding models may also be referred to herein as operators. Operators correspond to parallel computation of multiple nodes of one or more layers of the AI model. Operators may be executed serially to complete AI reasoning tasks.
As shown in fig. 6, model 610 is decomposed into operators 1-1 through 1-4, etc. arranged in sequence in operator stream 611, model 620 is decomposed into operators 2-1 through 2-4 arranged in sequence in operator stream 621, and model 630 is decomposed into operators 3-1 through 3-4 arranged in sequence in operator stream 631. It should be appreciated that the number of decomposed operators is not limited to the number described with reference to fig. 6, and more or fewer operators may be included. The operator scheduler 640 may provide operators to the AI processing unit 650 for execution. Specifically, the operator scheduler 640 may add operators to the task queues to be executed of the AI processing unit 650.
According to an embodiment of the present disclosure, the task queues to be executed of the AI processing unit 650 include a first task queue 651 and a second task queue 652. The first task queue 651 is for non-real-time AI tasks, including operators decomposed from models 610 and 620 associated with the non-real-time tasks. The second task queue 652 is for real-time AI tasks, including operators decomposed from the model 630 associated with the real-time tasks.
To satisfy fairness, the operator scheduler 640 may add operators of the non-real-time task models 610 and 620 to the first task queue 651 in a round-robin fashion. When multiple real-time task models exist, their operators may also be added to the second task queue 652 in a round-robin fashion; alternatively, after all operators of one real-time task model have been added to the second task queue 652, operators of another real-time task model may be added, so that the real-time requirement of the earlier real-time task can be met as far as possible.
The first task queue 651 and the second task queue 652 may be stored in the memory 150 in the form of a circular queue. The operators themselves are also stored in memory. Each element in queues 651 and 652 may store a pointer or address to an operator. The first task queue 651 and the second task queue 652 may have a preset depth, i.e., a maximum number of operators that can be accommodated. The depth may be set according to the average operator number of the model, e.g. 10, 20 or other suitable value. When the operators in the queues 651 and 652 reach a maximum number, the operator scheduler 640 may stop acquiring operators of the model from the corresponding operator flows 611, 621, and 631 until there are empty locations.
The AI processing unit 650 obtains operators from the first task queue 651 or the second task queue 652 for execution. In some embodiments, an indicator may be set for each queue, respectively, from which the AI processing unit 650 obtains the corresponding operator. The indicator is then incremented to point to the next operator in the queue.
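A minimal sketch of the kind of fixed-depth circular operator queue described above, with a consumer-side indicator; the structure is an illustrative assumption rather than the actual in-memory layout used by the framework.

    #include <array>
    #include <cstddef>

    struct Operator;   // an element stores a pointer (or address) to an operator

    // Fixed-depth circular queue as described above; the depth is chosen near the
    // average operator count of a model, e.g. 10 or 20.
    template <std::size_t Depth>
    class OperatorQueue {
    public:
        bool Push(Operator* op) {                // operator scheduler side
            if (count_ == Depth) return false;   // queue full: stop fetching operators
            slots_[tail_] = op;
            tail_ = (tail_ + 1) % Depth;
            ++count_;
            return true;
        }
        Operator* Pop() {                        // AI processing unit side
            if (count_ == 0) return nullptr;     // queue empty
            Operator* op = slots_[head_];        // element pointed to by the indicator
            head_ = (head_ + 1) % Depth;         // advance the indicator
            --count_;
            return op;
        }
    private:
        std::array<Operator*, Depth> slots_{};
        std::size_t head_ = 0, tail_ = 0, count_ = 0;
    };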
According to embodiments of the present disclosure, in order to meet the real-time requirement of a real-time AI reasoning task, the AI processing unit 650 preferentially obtains operators to be executed from the second task queue 652. In other words, once operators of a real-time AI reasoning task are added to the second task queue 652, the AI processing unit 650 needs to switch to the second task queue 652 rather than continue executing the non-real-time task operators in the first task queue 651.
At this point, the operator scheduler 640 may determine, based on a policy, whether to issue a reset signal (Reset) to the AI processing unit 650. The reset NPU may then switch to executing the real-time task operators in the second task queue 652. The reset mechanism of the AI processing unit 650 includes a circuit for hardware reset and a circuit for resource initialization after reset. The resource initialization is embodied in the form of chip microcode: when the hardware reset signal is triggered, the AI processing unit 650 automatically executes this microcode, and it executes very quickly. Thus, by resetting, the AI processing unit 650 can switch more quickly from executing the non-real-time operators in the first task queue 651 to executing the real-time operators in the second task queue 652, so as to meet the real-time requirement of the real-time AI reasoning task.
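The reset policy mentioned above, which is detailed as process 700 below, essentially weighs the remaining execution time of the running non-real-time operator against the cost of a reset. A simplified C++ sketch of that decision follows; the Operator struct and the timing helper are assumptions for illustration, not the NPU driver's actual interface.

    #include <chrono>

    using Clock = std::chrono::steady_clock;

    struct Operator {
        Clock::duration expected_cost;   // measured in advance and recorded in the
                                         // operator information table
    };

    // Time the hardware needs to reset, measured on the actual platform (e.g. 1 ms).
    const Clock::duration kResetThreshold = std::chrono::milliseconds(1);

    // Called when a real-time operator arrives while a non-real-time operator is
    // still running: reset only if the remaining time exceeds the reset cost.
    bool ShouldResetNpu(const Operator& running, Clock::time_point started) {
        auto elapsed   = Clock::now() - started;           // trigger time - start time
        auto remaining = running.expected_cost - elapsed;
        return remaining > kResetThreshold;
    }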
Fig. 7 shows a schematic flow diagram of a process 700 for switching computing resources for real-time AI reasoning tasks, according to an embodiment of the disclosure. The process 700 may be implemented in, for example, the robotic development framework 120 described in fig. 1 and the robotic development framework 220 shown in fig. 2. For ease of understanding, process 700 is described in connection with fig. 6.
At block 702, the remaining time of the non-real-time task operator being executed by the AI processing unit 650 is calculated. In some embodiments, the time required to execute each operator may be measured in advance and recorded in an operator information table. When the AI processing unit 650 starts executing an operator, the execution start time point is recorded. If a real-time AI reasoning task is triggered while the operator is executing, the start time point is subtracted from the trigger time point of the real-time AI reasoning task to obtain the time for which the operator has already executed. The remaining time is then obtained by comparing this elapsed time with the execution time of the operator recorded in the operator information table.
At block 704, it is determined whether the remaining time exceeds a preset threshold. In some embodiments, the preset threshold may be a parameter associated with the hardware platform, typically determined based on the time required for the NPU to reset. For example, when the reset time measured on an actual hardware platform is 1 ms, the threshold may be set to 1 ms. If the remaining time of the operator exceeds 1 ms, the time gained by resetting the AI processing unit is larger, and the real-time AI reasoning task can be executed earlier.
If it is determined at block 704 that the remaining time exceeds the preset threshold, then at block 706 the AI processing unit is reset. If it is determined that the remaining time is less than the preset threshold, then at block 710 the method waits for the non-real-time task operator to finish executing.
In the case where the NPU is reset, the location of the interrupted non-real-time task operator in the first task queue is stored at block 708. It should be appreciated that, since the AI processing unit is reset, the non-real-time task operator that was executing is interrupted and needs to be re-executed when execution of the first task queue 651 is resumed.
At block 712, a switch is made to the second task queue. In some embodiments, the operator scheduler 640 may stop fetching operators from the models corresponding to non-real-time reasoning tasks and switch to fetching operators from the model corresponding to the real-time AI reasoning task, inserting them into the second task queue 652.
At block 714, the real-time task operator in the second task queue is executed. In some embodiments, the AI processing unit 650 obtains and executes the task data of the real-time task operator pointed to by the indicator of the second task queue 652.
At block 716, it is determined whether the real-time task operators in the second task queue have all been executed. If not, the process returns to block 714 and the real-time task operators in the second task queue 652 continue to be executed. That is, the AI processing unit 650 keeps executing the operators in the second task queue 652 until no real-time task operators remain in it.
If all operators in the second task queue 652 have been executed, then at block 718 a switch is made back to the first task queue. In some embodiments, the operator scheduler 640 may resume fetching operators from the models corresponding to the non-real-time AI reasoning tasks and insert them into the first task queue 651.
At block 720, it is determined whether a non-real-time task operator was interrupted. In some embodiments, this may be determined by checking whether the action of block 708 was performed. For example, if location information about the first task queue 651 has been recorded, the corresponding operator's execution was interrupted.
If it is determined that a non-real-time task operator was interrupted, then at block 722 the interrupted non-real-time task operator is re-executed. Otherwise, at block 724, the next non-real-time task operator is executed. The AI processing unit 650 thus resumes executing the non-real-time AI reasoning tasks.
Example apparatus and device
Fig. 8 shows a schematic block diagram of an apparatus 800 for processing tasks according to an embodiment of the disclosure. The apparatus 800 may be implemented in, for example, the robot development framework 120 of fig. 1 and the robot development framework 220 of fig. 2.
The apparatus 800 comprises a task configuration determination unit 810. The task configuration determination unit 810 is configured to determine, based on a configuration file of a task to be executed, the real-time requirements of the task and the computing resources for executing the task. The apparatus 800 further comprises a task control unit 820. The task control unit 820 is configured to cause the computing resources to execute the task if the real-time requirement indicates that the task is a real-time task. In some embodiments, the configuration file includes real-time requirement information of the task, task type information, and information on the computing resources used to execute the task.
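A hypothetical example of such a configuration file, parsed with Python's standard json module, is shown below; the field names (task_name, real_time, task_type, computing_resource) are assumptions for illustration and are not a format defined by the disclosure.

```python
import json

# Illustrative configuration file contents (hypothetical field names).
config_text = """
{
    "task_name": "trajectory_control",
    "real_time": true,
    "task_type": "control",
    "computing_resource": {"unit": "cpu", "core_id": 2}
}
"""

config = json.loads(config_text)
is_real_time = config["real_time"]       # real-time requirement of the task
resource = config["computing_resource"]  # computing resource for executing the task
print(is_real_time, resource)
```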
In some embodiments, the computing resources include at least one processing unit, such as a CPU core, and the tasks may be control tasks for controlling the motion of the robot. The at least one processing unit has a first thread for performing non-real-time tasks and a second thread for performing real-time tasks. The task control unit may be further configured to cause the at least one processing unit to execute the task with the second thread.
In some embodiments, the task control unit may be further configured to send a signal to the at least one processing unit to stop executing the non-real-time task if it is determined that the at least one processing unit is executing the non-real-time task with the first thread.
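The signalling between the two threads can be pictured with the sketch below. It is only an illustration of the pattern: Python threads do not provide hard real-time guarantees, and a practical implementation would rely on OS-level scheduling priorities (for example SCHED_FIFO) and core pinning instead.

```python
import threading
import time

stop_non_real_time = threading.Event()  # stands in for the "stop" signal


def non_real_time_worker() -> None:
    # First thread: runs best-effort work until told to yield the core.
    while not stop_non_real_time.is_set():
        time.sleep(0.01)  # placeholder for a slice of non-real-time work
    print("non-real-time thread paused")


def real_time_worker(task: str) -> None:
    # Second thread: executes the real-time task as soon as it is scheduled.
    print(f"real-time thread running {task}")


t1 = threading.Thread(target=non_real_time_worker)
t1.start()
stop_non_real_time.set()  # signal the first thread to stop executing
t2 = threading.Thread(target=real_time_worker, args=("motion_control_step",))
t2.start()
t1.join()
t2.join()
```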
In some embodiments, the at least one processing unit includes a plurality of processing units, e.g., a plurality of CPU cores, and the plurality of processing units have respective third threads. The task control unit may be further configured to generate a plurality of parallel subtasks from the task, and to cause the plurality of processing units to execute the plurality of parallel subtasks using the third thread. The task control unit may then determine a combined processing result based on the results of the plurality of processing units executing the plurality of parallel subtasks.
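The fan-out/merge behaviour can be sketched with a process pool standing in for the plurality of processing units; the subtask itself (summing squares over a slice of the input) is purely illustrative.

```python
from concurrent.futures import ProcessPoolExecutor


def run_subtask(chunk: list) -> int:
    # Hypothetical parallel subtask, e.g. one slice of a larger computation.
    return sum(x * x for x in chunk)


def run_parallel(task_data: list, n_units: int = 4) -> int:
    # Generate one parallel subtask per processing unit ("third threads").
    chunks = [task_data[i::n_units] for i in range(n_units)]
    with ProcessPoolExecutor(max_workers=n_units) as pool:
        partial_results = list(pool.map(run_subtask, chunks))
    # Merge the per-unit results into the combined processing result.
    return sum(partial_results)


if __name__ == "__main__":
    print(run_parallel(list(range(1000))))
```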
In some embodiments, the computing resources may include processing units, such as neural network processing units or graphics processing units, and the tasks may be artificial intelligence AI reasoning tasks. The processing unit may have a first task queue. The first task queue includes at least one non-real-time sub-task of the non-real-time task. The task control unit may be configured to cause the processing unit to stop executing at least one non-real-time subtask in the first task queue.
In some embodiments, the processing unit may also have a second task queue. The task control unit may be further configured to decompose the task into a plurality of real-time subtasks, to add the plurality of real-time subtasks to a second task queue of the processing unit, and to cause the processing unit to execute the plurality of real-time subtasks in the second task queue. The real-time subtask may be an operator of the AI model.
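A minimal sketch of decomposing a model into operator subtasks and appending them to the second task queue is given below; the plain deque and the operator names are placeholders, not the actual queue implementation of the processing unit.

```python
from collections import deque

# Hypothetical model description: an ordered list of operators (subtasks).
real_time_model = ["conv1", "batchnorm1", "relu1", "fc1", "softmax"]

second_task_queue = deque()  # real-time operator queue of the processing unit


def submit_real_time_task(model_operators: list, queue: deque) -> None:
    # Decompose the AI reasoning task into per-operator subtasks and append
    # them to the second task queue for the processing unit to execute.
    for op in model_operators:
        queue.append(op)


submit_real_time_task(real_time_model, second_task_queue)
print(list(second_task_queue))
```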
In some embodiments, the task control unit may be further configured to determine a remaining time required for the processing unit to complete the non-real-time subtasks being performed. The task control unit may cause the processing unit to be reset if the remaining time exceeds a preset threshold, and cause the processing unit to perform the real-time task after the reset is completed. In some embodiments, the task control unit causes the processing unit to execute the real-time task after the non-real-time subtask is completed if the remaining time is less than a preset threshold.
In some embodiments, the task control unit may further store the position, in the first task queue, of the non-real-time subtask whose execution was stopped. Thus, in response to completion of execution of the task, the task control unit may cause the processing unit to resume execution of at least one non-real-time subtask in the first task queue based on the position information.
Fig. 9 shows a schematic block diagram of an example device 900 that may be used to implement embodiments of the present disclosure. The device 900 may be used to provide the example environment 100, such as a robotic application, shown in fig. 1. As shown, the device 900 includes a Central Processing Unit (CPU) 901, which can perform various appropriate actions and processes, such as control tasks for controlling the movement of a robot, according to computer program instructions stored in a Read Only Memory (ROM) 902 or loaded from a storage unit 908 into a Random Access Memory (RAM) 903. The RAM 903 can also store various programs and data required for the operation of the device 900. The device 900 further includes a Graphics Processing Unit (GPU) and/or a Neural Network Processing Unit (NPU) 911 that can perform parallel computations, such as AI reasoning tasks, according to computer program instructions stored in the ROM 902 or loaded from the storage unit 908 into the RAM 903. The CPU 901, the GPU/NPU 911, the ROM 902 and the RAM 903 are connected to one another via a bus 904. An input/output (I/O) interface 905 is also connected to the bus 904.
Various components in device 900 are connected to I/O interface 905, including: an input-output unit 906 such as a keyboard, a mouse, a motor, a display, a speaker, and the like; a sensor 907 such as an acceleration sensor, a gravity sensor, a camera, or the like; a storage unit 908 such as a magnetic disk, an optical disk, or the like; and a communication unit 909 such as a network card, modem, wireless communication transceiver, or the like. The communication unit 909 allows the device 900 to exchange information/data with other devices through a computer network such as the internet and/or various telecommunications networks.
The various processes described above, such as the processes 300, 500 and 700, are performed by the CPU 901 and/or the GPU/NPU 911. For example, in some embodiments, the processes 300, 500 and 700 may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as the storage unit 908. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 900 via the ROM 902 and/or the communication unit 909. When the computer program is loaded into the RAM 903 and executed by the CPU 901 and/or the GPU/NPU 911, one or more acts of the processes 300, 500 and 700 described above may be performed.
The present disclosure may be methods, apparatus, systems, and/or computer program products. The computer program product may include a computer readable storage medium having computer readable program instructions embodied thereon for performing aspects of the present disclosure.
The computer readable storage medium may be a tangible device that can hold and store instructions for use by an instruction execution device. The computer readable storage medium may be, for example, but not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium include: a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a static random access memory (SRAM), a portable compact disc read-only memory (CD-ROM), a digital versatile disk (DVD), a memory stick, a floppy disk, a mechanically encoded device such as punch cards or raised structures in a groove having instructions recorded thereon, and any suitable combination of the foregoing. A computer readable storage medium, as used herein, is not to be construed as being a transitory signal per se, such as a radio wave or other freely propagating electromagnetic wave, an electromagnetic wave propagating through a waveguide or other transmission medium (e.g., a light pulse passing through a fiber-optic cable), or an electrical signal transmitted through a wire.
The computer readable program instructions described herein may be downloaded from a computer readable storage medium to respective computing/processing devices or to an external computer or external storage device over a network, such as the Internet, a local area network, a wide area network and/or a wireless network. The network may comprise copper transmission cables, optical transmission fibers, wireless transmission, routers, firewalls, switches, gateway computers and/or edge servers. A network interface card or network interface in each computing/processing device receives computer readable program instructions from the network and forwards the computer readable program instructions for storage in a computer readable storage medium within the respective computing/processing device.
Computer program instructions for carrying out operations of the present disclosure may be assembly instructions, instruction set architecture (ISA) instructions, machine-dependent instructions, microcode, firmware instructions, state-setting data, or source code or object code written in any combination of one or more programming languages, including an object-oriented programming language such as Smalltalk or C++, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The computer readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, electronic circuitry, such as programmable logic circuitry, field-programmable gate arrays (FPGA), or programmable logic arrays (PLA), may be personalized with state information of the computer readable program instructions and may execute the computer readable program instructions, thereby implementing aspects of the present disclosure.
Various aspects of the present disclosure are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the disclosure. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer-readable program instructions.
These computer readable program instructions may be provided to a processing unit of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processing unit of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer readable program instructions may also be stored in a computer readable storage medium that can direct a computer, programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer readable medium having the instructions stored therein includes an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.
The computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer, other programmable apparatus or other devices implement the functions/acts specified in the flowchart and/or block diagram block or blocks.
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The embodiments of the present disclosure have been described above. The foregoing description is illustrative, not exhaustive, and is not limited to the embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein was chosen in order to best explain the principles of the embodiments, the practical application, or the technical improvement over technologies found in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

Claims (33)

1. A method for processing tasks, comprising:
determining real-time requirements of a task to be executed and computing resources for executing the task based on a configuration file of the task; and
if the real-time requirement indicates that the task is a real-time task, causing the computing resource to execute the task.
2. The method of claim 1, wherein the computing resource comprises at least one processing unit having a first thread for performing non-real-time tasks and a second thread for performing real-time tasks, wherein causing the computing resource to perform the tasks comprises:
causing the at least one processing unit to execute the task with the second thread.
3. The method of claim 2, wherein causing the computing resource to perform the task comprises:
sending a signal to the at least one processing unit to stop performing the non-real-time task if it is determined that the at least one processing unit is performing a non-real-time task with the first thread.
4. The method of claim 2, wherein the at least one processing unit comprises a plurality of processing units having respective third threads, wherein causing the computing resource to perform the task comprises:
generating a plurality of parallel subtasks from the task;
causing the plurality of processing units to perform the plurality of parallel subtasks using the third thread; and
determining a merged processing result based on the results of the plurality of processing units executing the plurality of parallel subtasks.
5. The method of any of claims 2 to 4, wherein the at least one processing unit is a CPU core.
6. The method according to any one of claims 2 to 4, wherein the task is a control task for controlling the movement of a robot.
7. The method of claim 1, wherein the computing resource comprises a processing unit having a first task queue comprising at least one non-real-time sub-task of a non-real-time task;
wherein causing the computing resource to perform the task comprises: causing the processing unit to cease executing the at least one non-real-time subtask in the first task queue.
8. The method of claim 7, wherein the processing unit further has a second task queue, and wherein causing the computing resource to perform the task comprises:
decomposing the task into a plurality of real-time subtasks;
adding the plurality of real-time subtasks to the second task queue of the processing unit; and
causing the processing unit to execute the plurality of real-time subtasks in the second task queue.
9. The method of claim 7 or 8, wherein causing the processing unit to stop executing the at least one non-real-time subtask in the first task queue comprises:
determining a remaining time required for the processing unit to complete the non-real-time subtasks being performed; and
causing the processing unit to be reset if the remaining time exceeds a preset threshold.
10. The method as recited in claim 9, further comprising:
causing the processing unit to perform the task after the reset is completed.
11. The method as recited in claim 9, further comprising:
causing the processing unit to execute the task after the non-real-time subtask is completed if the remaining time is less than the preset threshold.
12. The method as recited in claim 7, further comprising:
storing position information, in the first task queue, of the non-real-time subtask whose execution was stopped; and
in response to completion of execution of the task, causing the processing unit to resume execution of the at least one non-real-time subtask in the first task queue based on the position information.
13. The method according to any one of claims 7 to 12, wherein the processing unit is a neural network processing unit or a graphics processing unit.
14. The method according to any of claims 7 to 12, wherein the task is an artificial intelligence AI reasoning task.
15. The method according to any one of claims 1 to 14, wherein the configuration file includes real-time requirement information of the task, task type information, and information of a computing resource for executing the task.
16. An apparatus for processing tasks, comprising:
a task configuration determining unit configured to determine real-time requirements of a task to be executed and computing resources for executing the task based on a configuration file of the task; and
a task control unit configured to cause the computing resource to execute the task if the real-time requirement indicates that the task is a real-time task.
17. The apparatus of claim 16, wherein the computing resources comprise at least one processing unit having a first thread for performing non-real-time tasks and a second thread for performing real-time tasks, and
the task control unit is further configured to cause the at least one processing unit to execute the task with the second thread.
18. The apparatus of claim 17, wherein the task control unit is further configured to:
send a signal to the at least one processing unit to stop performing the non-real-time task if it is determined that the at least one processing unit is performing a non-real-time task with the first thread.
19. The apparatus of claim 17, wherein the at least one processing unit comprises a plurality of processing units having respective third threads, the task control unit further configured to:
generate a plurality of parallel subtasks from the task;
cause the plurality of processing units to perform the plurality of parallel subtasks using the third thread; and
determine a merged processing result based on the results of the plurality of processing units executing the plurality of parallel subtasks.
20. The apparatus of any of claims 17 to 19, wherein the at least one processing unit is a CPU core.
21. The apparatus according to any one of claims 17 to 19, wherein the task is a control task for controlling the movement of a robot.
22. The apparatus of claim 16, wherein the computing resource comprises a processing unit having a first task queue comprising at least one non-real-time sub-task of a non-real-time task;
the task control unit is configured to cause the processing unit to stop executing the at least one non-real-time subtask in the first task queue.
23. The apparatus of claim 22, wherein the processing unit further has a second task queue, and wherein the task control unit is further configured to:
decompose the task into a plurality of real-time subtasks;
add the plurality of real-time subtasks to the second task queue of the processing unit; and
cause the processing unit to execute the plurality of real-time subtasks in the second task queue.
24. The apparatus according to claim 22 or 23, wherein the task control unit is further configured to:
determine a remaining time required for the processing unit to complete the non-real-time subtasks being performed; and
cause the processing unit to be reset if the remaining time exceeds a preset threshold.
25. The apparatus of claim 24, wherein the task control unit is further configured to:
cause the processing unit to perform the task after the reset is completed.
26. The apparatus of claim 24, wherein the task control unit is further configured to:
cause the processing unit to execute the task after the non-real-time subtask is completed if the remaining time is less than the preset threshold.
27. The apparatus of claim 22, wherein the task control unit is further configured to:
store position information, in the first task queue, of the non-real-time subtask whose execution was stopped; and
cause the processing unit, in response to completion of execution of the task, to resume execution of the at least one non-real-time subtask in the first task queue based on the position information.
28. The apparatus according to any one of claims 22 to 27, wherein the processing unit is a neural network processing unit or a graphics processing unit.
29. The apparatus of any one of claims 22 to 27, wherein the task is an artificial intelligence AI reasoning task.
30. The apparatus of any one of claims 16 to 29, wherein the configuration file includes real-time requirement information of the task, task type information, and information of a computing resource for performing the task.
31. An electronic device, comprising:
a processing unit and a memory;
the processing unit executing instructions in the memory causing the electronic device to perform the method of any one of claims 1 to 15.
32. A computer-readable storage medium having stored thereon one or more computer instructions, wherein execution of the one or more computer instructions by a processor causes the processor to perform the method of any of claims 1 to 15.
33. A computer program product comprising machine executable instructions which, when executed by a device, cause the device to perform the method of any one of claims 1 to 15.
CN202111275540.7A 2021-10-29 2021-10-29 Method, apparatus, electronic device and medium for processing tasks Pending CN116069485A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202111275540.7A CN116069485A (en) 2021-10-29 2021-10-29 Method, apparatus, electronic device and medium for processing tasks
PCT/CN2022/120604 WO2023071643A1 (en) 2021-10-29 2022-09-22 Method and apparatus for processing task, electronic device, and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111275540.7A CN116069485A (en) 2021-10-29 2021-10-29 Method, apparatus, electronic device and medium for processing tasks

Publications (1)

Publication Number Publication Date
CN116069485A true CN116069485A (en) 2023-05-05

Family

ID=86160209

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111275540.7A Pending CN116069485A (en) 2021-10-29 2021-10-29 Method, apparatus, electronic device and medium for processing tasks

Country Status (2)

Country Link
CN (1) CN116069485A (en)
WO (1) WO2023071643A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116954721B (en) * 2023-09-20 2023-12-15 天津南大通用数据技术股份有限公司 Asynchronous non-blocking splitting method for multi-modal operator of actuator
CN118708360A (en) * 2024-08-26 2024-09-27 北京壁仞科技开发有限公司 Operator execution method, device, storage medium and program product

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101733117B1 (en) * 2012-01-31 2017-05-25 한국전자통신연구원 Task distribution method on multicore system and apparatus thereof
JP6615726B2 (en) * 2016-09-16 2019-12-04 株式会社東芝 Information processing apparatus, information processing method, and program
WO2019187719A1 (en) * 2018-03-28 2019-10-03 ソニー株式会社 Information processing device, information processing method, and program
CN112416606A (en) * 2020-12-16 2021-02-26 苏州挚途科技有限公司 Task scheduling method and device and electronic equipment

Also Published As

Publication number Publication date
WO2023071643A1 (en) 2023-05-04

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination