WO2019183861A1 - Method, device, and machine readable storage medium for task processing - Google Patents


Publication number
WO2019183861A1
WO2019183861A1 PCT/CN2018/080970 CN2018080970W WO2019183861A1 WO 2019183861 A1 WO2019183861 A1 WO 2019183861A1 CN 2018080970 W CN2018080970 W CN 2018080970W WO 2019183861 A1 WO2019183861 A1 WO 2019183861A1
Authority
WO
WIPO (PCT)
Prior art keywords
task
processing
processed
graphics processor
queue
Prior art date
Application number
PCT/CN2018/080970
Other languages
French (fr)
Chinese (zh)
Inventor
李庆
夏昌奇
张晓炜
Original Assignee
深圳市大疆创新科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳市大疆创新科技有限公司 filed Critical 深圳市大疆创新科技有限公司
Priority to PCT/CN2018/080970 priority Critical patent/WO2019183861A1/en
Priority to CN201880012037.2A priority patent/CN110494848A/en
Publication of WO2019183861A1 publication Critical patent/WO2019183861A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]

Definitions

  • the present invention relates to the field of image processing technologies, and in particular, to a task processing method, device, and machine readable storage medium.
  • the processing of data such as image or radar data can be based on platforms such as ARM (Advanced RISC Machines), DSP (Digital Signal Processor), or CPU (Central Processing Unit).
  • the amount of data that can be processed is affected by factors such as processor frequency, memory size, and transmission bandwidth.
  • in artificial intelligence applications such as autonomous driving, the amount of sensor data and the acquisition frequency push the data rate to GB/s or higher, so that platforms such as ARM, DSP, or CPU are no longer able to meet the needs of real-time processing.
  • the present invention provides a task processing method, apparatus, and machine readable storage medium.
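  • a first aspect of the present invention provides a task processing method for a device including a plurality of graphics processors, the method comprising: selecting a task to be processed from a processing queue; selecting, from the plurality of graphics processors, at least one target graphics processor corresponding to the to-be-processed task; assigning the to-be-processed task to the target graphics processor; and processing the to-be-processed task by the target graphics processor.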
  • a second aspect of the present invention provides a task processing apparatus including a scheduler and a plurality of graphics processors, wherein the scheduler is configured to select a task to be processed from a processing queue, select from the plurality of graphics processors at least one target graphics processor corresponding to the to-be-processed task, and assign the to-be-processed task to the target graphics processor;
  • the target graphics processor is configured to process the to-be-processed task.
  • a computer readable storage medium is also provided, the computer readable storage medium storing a plurality of computer instructions, and the task processing method is implemented when the computer instructions are executed.
  • in the embodiments of the present invention, a target graphics processor corresponding to a task to be processed may be selected from a plurality of graphics processors, the task to be processed is allocated to the target graphics processor, and the target graphics processor processes the task. Data is thus processed in real time on the graphics processors, so that GB/s-level data can be handled in real time.
  • multiple graphics processors can be managed reasonably and multiple tasks to be processed can be scheduled effectively, so that the graphics processors are utilized to the maximum extent and the processing speed is optimized, which helps ensure the accuracy and effectiveness of the processing results and increases reliability.
  • FIG. 1 is a schematic diagram of an application scenario of an embodiment of the present invention.
  • FIG. 2 is a flow chart of an embodiment of a task processing method of the present invention.
  • FIG. 3 is a schematic diagram of processing a task to be processed by a target graphics processor of the present invention.
  • FIG. 4 is a schematic diagram of data sharing by different processing threads of the present invention.
  • FIG. 5 is a block diagram of one embodiment of a task processing device of the present invention.
  • although the terms first, second, third, etc. may be used to describe various information in the present invention, such information should not be limited to these terms. These terms are only used to distinguish information of the same type from each other.
  • for example, first information may also be referred to as second information without departing from the scope of the invention.
  • similarly, second information may also be referred to as first information.
  • depending on the context, the word "if" may be interpreted as "when" or "in response to a determination".
  • a method for processing a task is provided in the embodiment of the present invention.
  • the method can be applied to a processing device including a plurality of graphics processing units (GPUs).
  • the type of the processing device is not limited, as long as it has multiple graphics processors.
  • the task processing method may be a multi-task real-time processing method based on multiple graphics processors that takes multi-channel sensor data as input, and it can be applied in scenarios such as automatic driving, assisted driving, and indoor and outdoor working robots.
  • the above sensor data may be image data, such as image data captured by a camera or data collected by a radar (such as a laser radar or a millimeter wave radar); the data type is not limited.
  • the present embodiment proposes a multi-task real-time processing method based on multiple graphics processors, which can ensure real-time processing of data and accuracy and effectiveness of processing results.
  • FIG. 1 is a schematic diagram of an application scenario according to an embodiment of the present invention.
  • the multi-task scheduling module is configured to cache tasks, perform task scheduling, and return task processing results to the upper-layer application.
  • the processing module is composed of a plurality of graphics processors, which perform the real-time processing of tasks.
  • the statistical monitoring module is configured to monitor and collect the status information of tasks and the resource information of the graphics processors, and to feed this information back to the multi-task scheduling module; the multi-task scheduling module then uses the status information of the tasks and the resource information of the graphics processors to perform task scheduling.
  • Embodiment 1:
  • referring to FIG. 2, which is a flowchart of a task processing method in an embodiment of the present invention, the method includes:
  • Step 201: Select a task to be processed from the processing queue.
  • tasks may be stored in the processing queue.
  • a task may be selected from the processing queue, and the selection manner is not limited.
  • for ease of distinction, the task selected from the processing queue may be referred to as a pending task (a to-be-processed task).
  • Step 202: Select at least one target graphics processor corresponding to the to-be-processed task from a plurality of graphics processors (i.e., all graphics processors).
  • a graphics processor selected from the plurality of graphics processors may be referred to as a target graphics processor corresponding to the task to be processed.
  • at least one target graphics processor corresponding to the to-be-processed task may be selected from all graphics processors according to the status information of the to-be-processed task and/or the resource information of each graphics processor; the selection manner is not limited.
  • Step 203: Assign the to-be-processed task to the target graphics processor.
  • assigning the to-be-processed task to the target graphics processor means assigning it to an idle processing thread of the target graphics processor.
  • a pending task can be assigned to all or some of the idle processing threads of the target graphics processor.
  • Step 204: Process the to-be-processed task by the target graphics processor.
  • the to-be-processed task can be processed by the idle processing thread, and the processing manner is not limited.
  • the pending task may include data (such as sensor data) and a task type, and the idle processing thread of the target graphics processor may perform the processing corresponding to the task type on the data.
  • in this embodiment, a target graphics processor corresponding to a task to be processed may be selected from a plurality of graphics processors, the task to be processed is allocated to the target graphics processor, and the target graphics processor processes the task. Data is thus processed in real time on the graphics processors, so that GB/s-level data can be handled in real time.
  • multiple graphics processors can be managed reasonably and multiple tasks to be processed can be scheduled effectively, so that the graphics processors are utilized to the maximum extent and the processing speed is optimized, which helps ensure the accuracy and effectiveness of the processing results and increases reliability.
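  • steps 201 to 204 can be illustrated with a minimal sketch. The class names `Task` and `Gpu`, the function `run_once`, and the "most idle threads" selection rule are assumptions made for illustration only; the embodiment leaves the selection manner open, and the `process` method merely stands in for real graphics processor work.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Task:
    name: str
    task_type: str
    data: object = None


@dataclass
class Gpu:
    gpu_id: int
    idle_threads: int                      # number of idle processing threads
    done: list = field(default_factory=list)

    def process(self, task):
        # Stand-in for real GPU work: occupy a thread, record the task, release the thread.
        self.idle_threads -= 1
        self.done.append(task.name)
        self.idle_threads += 1


def run_once(queue, gpus):
    task = queue.popleft()                             # Step 201: select a pending task
    target = max(gpus, key=lambda g: g.idle_threads)   # Step 202: select a target GPU (assumed rule)
    target.process(task)                               # Steps 203 and 204: assign and process
    return target.gpu_id


queue = deque([Task("t1", "A"), Task("t2", "B")])
gpus = [Gpu(0, idle_threads=1), Gpu(1, idle_threads=3)]
chosen = run_once(queue, gpus)
```

  • in this sketch the first pending task goes to the graphics processor with three idle threads, and the second task remains in the queue for the next scheduling round.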
  • Embodiment 2:
  • for step 201, after a task to be processed is received, if it is a static scheduling task whose time series and/or data have a dependency, it may be cached into the static processing queue; if it is a dynamic scheduling task whose time series and/or data are independent, it may be cached into the dynamic processing queue.
  • selecting a task to be processed from the processing queue may include: if the processing queue is a static processing queue, selecting from it a static scheduling task whose time series and/or data have a dependency; if the processing queue is a dynamic processing queue, selecting from it a dynamic scheduling task whose time series and/or data are independent.
  • the processing queue can be divided into a static processing queue (Static) and a dynamic processing queue (Dynamic).
  • the static processing queue may be used to store tasks whose time series have a dependency, tasks whose data have a dependency, or tasks whose time series and data both have a dependency; for ease of distinction, such tasks with dependencies are called static scheduling tasks.
  • the dynamic processing queue may be used to store tasks whose time series are independent, tasks whose data are independent, or tasks whose time series and data are both independent; for ease of distinction, such independent tasks may be referred to as dynamic scheduling tasks.
  • all tasks stored in the static processing queue are processed in the same graphics processor, or in associated graphics processors; that is, there are restrictions on the graphics processor.
  • for example, if a correspondence between the static processing queue and a graphics processor (such as graphics processor 1) is configured, all tasks of the static processing queue need to be processed by graphics processor 1.
  • all tasks stored in the dynamic processing queue can be processed in the same graphics processor or in different graphics processors. That is, there is no restriction on the graphics processor, and the tasks of the dynamic processing queue can be assigned to any graphics processor.
  • a task with a time-series dependency is a task whose processing depends on the previous task.
  • for example, a non-I frame task depends on the previous I frame task.
  • such tasks (for example, task 1 and task 2) can be stored in the static processing queue.
  • a task with a data dependency is a task in which the processing of one piece of data depends on the previous data; no limitation is imposed on this.
  • all tasks stored in the static processing queue can be processed in the same graphics processor or can be processed in the associated graphics processor.
  • a time-series independent task is a task whose processing does not depend on the previous task; that is, the task is an independent task.
  • examples include tasks such as image conversion and point cloud algorithm processing.
  • each such task is an independent task and does not depend on the previous task.
  • such a task (for example, task 3) can be stored in the dynamic processing queue.
  • a data-independent task is a task in which the processing of one piece of data is independent of other data; no limitation is imposed on this.
  • all tasks stored in the dynamic processing queue can be processed in the same graphics processor or in different graphics processors.
  • the multi-task scheduling module may select a pending task from the static processing queue, and may select another pending task from the static processing queue either after that pending task has been processed or while it is still being processed.
  • the multi-task scheduling module may select a pending task from the dynamic processing queue, and may select another pending task from the dynamic processing queue either after that pending task has been processed or while it is still being processed.
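  • the caching rule of this embodiment can be sketched as follows. The `Task` class, the `depends_on_previous` flag, and the function name `cache_task` are hypothetical names introduced for illustration; how a real system detects a time-series or data dependency is not specified here.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    depends_on_previous: bool   # True if the time series and/or data have a dependency


static_queue = deque()    # static scheduling tasks (restricted to the associated GPU)
dynamic_queue = deque()   # dynamic scheduling tasks (may run on any GPU)


def cache_task(task):
    """Cache a received task into the queue that matches its dependency type."""
    if task.depends_on_previous:
        static_queue.append(task)
    else:
        dynamic_queue.append(task)


# task1 and task2 depend on earlier tasks; task3 is independent.
for t in (Task("task1", True), Task("task2", True), Task("task3", False)):
    cache_task(t)
```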
  • Embodiment 3:
  • the task to be processed may be cached into a task queue (which may also be referred to as a large queue or a Task Queue), and the task queue may include multiple tasks to be processed.
  • for each pending task in the task queue, if the pending task is a static scheduling task whose time series and/or data have a dependency, it may be cached into the static processing queue; if the pending task is a dynamic scheduling task whose time series and/or data are independent, it may be cached into the dynamic processing queue.
  • selecting a task to be processed from the processing queue may include: if the processing queue is a static processing queue, selecting from it a static scheduling task whose time series and/or data have a dependency; if the processing queue is a dynamic processing queue, selecting from it a dynamic scheduling task whose time series and/or data are independent.
  • the processing procedure of the third embodiment is similar to that of the second embodiment, and details are not described herein again.
  • Embodiment 4:
  • selecting a task to be processed from the processing queue may include, but is not limited to: acquiring the priority of each to-be-processed task in the processing queue, and selecting the to-be-processed task with the highest priority from the processing queue based on those priorities.
  • obtaining the priority of a to-be-processed task in the processing queue may include: for each pending task in the processing queue, obtaining the task type of the to-be-processed task, and then querying a mapping table by the task type to obtain the priority corresponding to the task type, where the mapping table records the correspondence between task types and priorities.
  • the multi-task scheduling module can be configured with a mapping table. As shown in Table 1, the mapping table records the correspondence between task types and priorities. On this basis, assume that pending task 1 has task type A, pending task 2 has task type B, and pending task 3 has task type C. The multi-task scheduling module queries the mapping table by task type A to obtain priority 5 for pending task 1, by task type B to obtain priority 3 for pending task 2, and by task type C to obtain priority 1 for pending task 3.
  • assuming that a larger value indicates a higher priority, the multi-task scheduling module first selects pending task 1 from the processing queue, then pending task 2, then pending task 3, and so on.
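  • the priority lookup described in this embodiment can be sketched as follows. The dictionary `PRIORITY_BY_TYPE` plays the role of the mapping table and uses the priorities from the example above; the function name `select_by_priority` and the list-based queue are assumptions for illustration, with a larger value taken to mean a higher priority.

```python
# Mapping table: task type -> priority (values from the example; larger = higher).
PRIORITY_BY_TYPE = {"A": 5, "B": 3, "C": 1}


def select_by_priority(processing_queue):
    """Remove and return the pending task whose task type maps to the highest priority."""
    best = max(processing_queue, key=lambda task: PRIORITY_BY_TYPE[task["type"]])
    processing_queue.remove(best)
    return best


processing_queue = [
    {"name": "task3", "type": "C"},
    {"name": "task1", "type": "A"},
    {"name": "task2", "type": "B"},
]
order = [select_by_priority(processing_queue)["name"] for _ in range(3)]
```

  • regardless of arrival order, the tasks are selected in the order task 1, task 2, task 3.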
  • Embodiment 5: In order to select a target graphics processor, the status information of the task to be processed and/or the resource information of the graphics processors may be acquired first. The acquisition of this information is described in detail below.
  • Case 1: The multi-task scheduling module obtains the status information of the task to be processed.
  • the statistical monitoring module may obtain the status information of tasks that have already been processed, record the correspondence between the task type and the status information of each such task in a status information table, and send the status information table to the multi-task scheduling module.
  • after selecting a task to be processed from the processing queue, the multi-task scheduling module obtains the task type of that task and queries the status information table by the task type to obtain the status information corresponding to the task type. The obtained status information is taken as the status information of the task to be processed.
  • the statistical monitoring module can obtain the status information table shown in Table 2, and send the status information table to the multi-task scheduling module.
  • after the multi-task scheduling module selects pending task 4 from the processing queue, if the task type of pending task 4 is task type A, the status information A corresponding to task type A can be obtained; that is, the status information of pending task 4 is status information A.
  • Table 2
      Task type     | Status information
      Task type A   | Status information A
      Task type B   | Status information B
      Task type C   | Status information C
  • Case 2: The multi-task scheduling module acquires the resource information of the graphics processors.
  • the statistical monitoring module may acquire (e.g., periodically acquire) the resource information of each graphics processor and send it to the multi-task scheduling module, so that the multi-task scheduling module can obtain the resource information of each graphics processor.
  • the status information may include a task processing time, which is a time difference between the task completion time and the task reception time.
  • the resource information may include: the number of idle processing threads; and/or the state of the processing thread (eg, occupied state or idle state).
  • during the processing of each task, the statistical monitoring module may collect statistics on the status information of the task and notify the multi-task scheduling module of the correspondence between the task type and the status information.
  • the processing of a to-be-processed task may go through three stages: a task receiving and caching stage, a task scheduling and distribution stage, and a task processing stage. The statistical monitoring module may therefore collect statistics on the time consumed by these three stages; the total time consumed by the three stages is the status information of the task, that is, its task processing time.
  • the task processing time may be the time difference between the task completion time and the task receiving time. The statistical monitoring module may therefore also record the task receiving time and the task completion time of a task and calculate the difference between them; this time difference is the task processing time of the task.
  • each task to be processed has one task processing time, and the task processing times of different tasks to be processed can be the same or different.
  • when multiple to-be-processed tasks correspond to the same task type, the maximum value may be selected from their task processing times and used as the task processing time corresponding to the task type.
  • alternatively, the minimum value may be selected from the task processing times of the to-be-processed tasks corresponding to the task type and used as the task processing time corresponding to the task type.
  • alternatively, the average of the task processing times of the to-be-processed tasks corresponding to the task type may be calculated and used as the task processing time corresponding to the task type.
  • any of the maximum value, the minimum value, or the average value may be used as the task processing time corresponding to the task type; no limitation is imposed on this.
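  • the statistics described above can be sketched as follows. The function name `build_status_table`, the record layout, and the time values are hypothetical; the task processing time is computed as completion time minus receiving time, and the aggregate (maximum, minimum, or average) is passed in as a parameter.

```python
from collections import defaultdict
from statistics import mean


def build_status_table(records, aggregate=max):
    """records: iterable of (task_type, receive_time, completion_time)."""
    times = defaultdict(list)
    for task_type, received, completed in records:
        times[task_type].append(completed - received)   # task processing time
    # One representative processing time per task type.
    return {task_type: aggregate(values) for task_type, values in times.items()}


# Two processed tasks of type A (30 and 10 time units) and one of type B (5 time units).
records = [("A", 0, 30), ("A", 10, 20), ("B", 5, 10)]
table_max = build_status_table(records, max)
table_min = build_status_table(records, min)
table_avg = build_status_table(records, mean)
```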
  • during the processing of each to-be-processed task, the statistical monitoring module may collect statistics on the resource information of the graphics processors and notify the multi-task scheduling module of that resource information.
  • a graphics processor can support multi-task parallel processing: each graphics processor has multiple processing threads, and to-be-processed tasks are allocated to processing threads for processing. The statistical monitoring module can therefore monitor the processing threads of each graphics processor, for example the number of idle processing threads of the graphics processor and the state of each processing thread (e.g., occupied or idle).
  • selecting at least one target graphics processor corresponding to the to-be-processed task from the plurality of graphics processors may include: selecting the target graphics processor corresponding to the to-be-processed task from the plurality of graphics processors according to the status information of the to-be-processed task.
  • alternatively, the target graphics processor corresponding to the to-be-processed task is selected from the plurality of graphics processors according to the resource information of each graphics processor.
  • alternatively, the target graphics processor corresponding to the to-be-processed task is selected from the plurality of graphics processors according to both the status information of the to-be-processed task and the resource information of each graphics processor.
  • selecting the target graphics processor according to the status information of the to-be-processed task may include: determining the task processing time from the status information of the to-be-processed task and, for a plurality of pending tasks whose task processing time is greater than a time threshold (indicating that their processing time is relatively long), selecting different target graphics processors for different pending tasks.
  • the time threshold may be set empirically. A task processing time greater than the time threshold indicates a relatively long processing time. Assigning multiple such tasks to different graphics processors helps ensure that each task is scheduled in time, that the resources of all graphics processors are allocated and utilized reasonably, that task processing does not time out, and that long-running tasks do not occupy multiple processing threads of the same graphics processor at the same time.
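  • one possible way to spread long-running tasks is sketched below. The round-robin assignment, the function name `spread_long_tasks`, and the numeric values are assumptions for illustration; the text only requires that different long-running tasks receive different target graphics processors, which round-robin achieves as long as there are at least as many graphics processors as long-running tasks.

```python
from itertools import cycle


def spread_long_tasks(tasks, gpu_ids, time_threshold):
    """tasks: list of (name, task_processing_time). Returns {name: gpu_id} for long tasks."""
    long_tasks = [name for name, t in tasks if t > time_threshold]
    # Assumed policy: hand out GPUs in round-robin order so that long tasks
    # do not pile up on the same graphics processor.
    return dict(zip(long_tasks, cycle(gpu_ids)))


assignment = spread_long_tasks(
    [("t1", 80), ("t2", 5), ("t3", 95)], gpu_ids=[0, 1], time_threshold=50
)
```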
  • selecting the target graphics processor according to the resource information of the graphics processors may include: determining the number of idle processing threads of each graphics processor from its resource information and selecting, among the graphics processors, the one with the most idle processing threads as the target graphics processor. Since the graphics processor with the most idle processing threads is the least utilized, selecting it as the target graphics processor allows the resources of all graphics processors to be allocated and used reasonably.
  • alternatively, selecting the target graphics processor according to the resource information of the graphics processors may include: determining an idle processing thread from the resource information and taking the graphics processor to which the idle processing thread belongs as the target graphics processor. After the to-be-processed task is allocated to this target graphics processor, there is an idle processing thread available to process it, which avoids assigning the task to a graphics processor that has no idle processing thread.
  • when the state of a processing thread is idle, the processing thread is an idle processing thread; when the state of a processing thread is occupied, the processing thread is an occupied processing thread.
  • since the graphics processor to which an idle processing thread belongs is taken as the target graphics processor, the pending task is allocated while a processing thread is idle, so that the pending task can be processed in time.
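  • both resource-based selection rules can be sketched as follows. The function names `most_idle_gpu` and `any_idle_gpu` and the `'idle'`/`'occupied'` string encoding of thread states are hypothetical choices for illustration, and a non-empty set of graphics processors is assumed.

```python
def most_idle_gpu(thread_states):
    """thread_states: {gpu_id: list of 'idle'/'occupied'}. Returns a gpu_id or None."""
    idle_counts = {gpu: states.count("idle") for gpu, states in thread_states.items()}
    gpu, count = max(idle_counts.items(), key=lambda item: item[1])
    return gpu if count > 0 else None   # None when every thread is occupied


def any_idle_gpu(thread_states):
    """Return the first graphics processor that has at least one idle processing thread."""
    for gpu, states in thread_states.items():
        if "idle" in states:
            return gpu
    return None


states = {
    0: ["occupied", "idle"],
    1: ["idle", "idle"],
    2: ["occupied", "occupied"],
}
```

  • with these states, the first rule picks graphics processor 1 (two idle threads), while the second rule accepts graphics processor 0 because it already has one idle thread.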
  • Case 4: selecting the target graphics processor corresponding to the to-be-processed task from the plurality of graphics processors according to the status information of the task to be processed and the resource information of each graphics processor may include: for multiple pending tasks whose task processing time is greater than a time threshold, selecting different target graphics processors for different pending tasks; and, when selecting the target graphics processor, selecting from the plurality of graphics processors a graphics processor with more idle processing threads, or a graphics processor that has an idle processing thread.
  • selecting at least one target graphics processor corresponding to the to-be-processed task from the plurality of graphics processors may include: for a plurality of to-be-processed tasks that are processed in parallel, selecting a target graphics processor for each of them, where the target graphics processors corresponding to different pending tasks may be the same or different. For example, when different to-be-processed tasks correspond to different target graphics processors, the tasks processed in parallel are distributed across multiple graphics processors, so that the resources of all graphics processors are allocated and utilized reasonably and each pending task is processed in a timely manner.
  • selecting at least one target graphics processor corresponding to the to-be-processed task from the plurality of graphics processors may include: if the processing queue is a static processing queue, querying the graphics processor corresponding to the static processing queue among the plurality of graphics processors (as in the above embodiment, the correspondence between the static processing queue and a graphics processor may be pre-configured, and the graphics processor corresponding to the static processing queue can be found based on this correspondence), and taking the queried graphics processor as the target graphics processor corresponding to the to-be-processed task. If the processing queue is a dynamic processing queue, the target graphics processor is selected from all graphics processors as described in the above embodiment.
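  • the queue-dependent selection can be sketched as follows. The table `STATIC_QUEUE_GPU`, its contents, and the function name `select_target` are hypothetical; for the dynamic processing queue the sketch assumes the "most idle processing threads" rule, which is only one of the options described above.

```python
# Pre-configured correspondence between a static processing queue and a GPU (hypothetical values).
STATIC_QUEUE_GPU = {"static": 1}


def select_target(queue_name, idle_threads):
    """idle_threads: {gpu_id: number of idle processing threads}."""
    if queue_name in STATIC_QUEUE_GPU:
        # Static queue: the target is fixed by the pre-configured correspondence.
        return STATIC_QUEUE_GPU[queue_name]
    # Dynamic queue: any GPU may be chosen; here, the one with the most idle threads.
    return max(idle_threads, key=idle_threads.get)


idle = {0: 4, 1: 0}
```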
  • processing the to-be-processed task by the target graphics processor may include: the target graphics processor processes the to-be-processed task using algorithms such as image processing, feature point tracking, semi-global stereo block matching, radar and camera self-calibration, point cloud tracking, local maps, and deep learning; the processing is not limited to these.
  • when an image processing algorithm is used to process the to-be-processed task, the source image may be subjected to format conversion and distortion correction, and image data in an expected or special format may be output.
  • when a feature point tracking algorithm is used to process the to-be-processed task, the correspondence between the previous frame and the current frame can be obtained from the change of pixels in the image sequence over time and the correlation between adjacent frames.
  • in binocular imaging, the depth of a target object relative to the camera can be calculated from the disparity of the same target object between the two views and the positional relationship between the two cameras.
  • the following describes the processing of the to-be-processed task by the target graphics processor in combination with specific situations.
  • Case 1: While a pending task (such as a low-priority pending task) is being processed by the target graphics processor, if there is a pending task with a higher priority than the low-priority pending task, the low-priority pending task is interrupted and the higher-priority pending task is processed by the target graphics processor; after the higher-priority pending task has been processed, processing of the low-priority pending task is resumed.
  • for example, the target graphics processor is processing pending task 1 when a new pending task 2 is received.
  • the target graphics processor determines whether there is an idle processing thread. If there is one, pending task 2 is assigned to the idle processing thread. If not, the priority of pending task 1 is compared with the priority of pending task 2. If the priority of pending task 1 is higher, processing of pending task 1 continues and pending task 2 waits; after pending task 1 has been processed, pending task 2 is processed. If the priority of pending task 2 is higher, pending task 1 can be interrupted and pending task 2 processed; after pending task 2 has been processed, processing of pending task 1 is resumed.
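  • the decision described for Case 1 can be sketched as follows. The function name `on_new_task` and the returned action strings are hypothetical; the sketch assumes that a larger number means a higher priority and that a new task of equal priority waits, which the text does not specify.

```python
def on_new_task(running, new, idle_threads):
    """running/new: (name, priority) tuples. Returns the action taken for the new task."""
    if idle_threads > 0:
        return "assign_to_idle_thread"
    if new[1] > running[1]:
        # Higher-priority arrival: interrupt the running task, resume it afterwards.
        return "interrupt_running_then_resume"
    # Assumption: equal or lower priority waits until the running task completes.
    return "wait_until_running_completes"
```

  • for example, `on_new_task(("task1", 1), ("task2", 5), idle_threads=0)` yields the interrupt action, while the same call with one idle thread simply assigns task 2 to that thread.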
  • Case 2: While the pending task is being processed by the target graphics processor, if the pending task becomes abnormal, the pending task is interrupted, its priority is raised, and it is cached back into the processing queue. Because its priority has been raised, the task is selected preferentially from the processing queue, which prevents it from timing out while waiting.
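  • the requeue step of Case 2 can be sketched as follows. The function name `requeue_abnormal`, the `boost` amount, and the heap-based queue are assumptions for illustration; the text only states that the priority is raised and the task is cached back into the processing queue.

```python
import heapq


def requeue_abnormal(processing_queue, name, priority, boost=1):
    """Raise the priority of an interrupted task and cache it back into the queue."""
    # heapq is a min-heap, so priorities are stored negated (larger value = higher priority).
    heapq.heappush(processing_queue, (-(priority + boost), name))


processing_queue = []
heapq.heappush(processing_queue, (-3, "task2"))
heapq.heappush(processing_queue, (-1, "task3"))

# A task of priority 3 failed while running; it re-enters the queue with priority 4.
requeue_abnormal(processing_queue, "task_failed", priority=3)
next_task = heapq.heappop(processing_queue)[1]
```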
  • Case 3: Multiple pending tasks assigned to the target graphics processor are processed in parallel by the target graphics processor, where the parallel processing may include synchronous serial processing and kernel asynchronous processing.
  • Synchronous processing or kernel asynchronous processing may be used to process the multiple pending tasks in parallel; the processing manner is not limited. This can effectively save processing time and improve processing efficiency.
  • Case 4: When the target graphics processor processes the pending task, the address of the central processing unit is locked, and the data exchanged between the target graphics processor and the central processing unit is transferred through a DMA controller. For example, when the pending task involves a large amount of data, the address of the central processing unit is page-locked by calling the cudaHostRegister interface, and the data exchange between the central processing unit and the graphics processor is carried out by the DMA controller, which significantly improves bandwidth and reduces unnecessary data copies.
  • Case 5: When the target graphics processor processes the pending task, data is shared between different processing threads. Specifically, if multiple pending tasks are completed by multiple processing threads, data can be shared between those processing threads. This avoids repeated copies between host memory and video memory: host memory and video memory are shared among the processing threads, the copy step is omitted, and a large amount of processing time is saved.
  • The target graphics processor can process the pending task by means of image processing, feature point tracking, semi-global stereo block matching, radar and camera self-calibration, point cloud tracking, local map generation, deep learning, and so on.
  • The processing threads may include: an image processing thread, a feature point tracking processing thread, a semi-global stereo block matching (SBM) processing thread, a radar and camera self-calibration processing thread (hereinafter referred to as the self-calibration processing thread), a point cloud tracking processing thread, a map processing thread, and a deep learning processing thread.
  • The image processing thread is configured to receive the raw data and process it to obtain a grayscale image and an RGB image that have undergone distortion correction and epipolar rectification.
  • The feature point tracking processing thread is used to perform feature point detection and tracking on the image.
  • The semi-global stereo block matching processing thread is used to match the binocular images, obtain a disparity map, and calculate a three-dimensional point cloud.
  • The self-calibration processing thread is used to calibrate the extrinsic parameters between camera and camera, and between radar and camera.
  • The point cloud tracking processing thread is used to segment the 3D point cloud into different objects, perform target tracking and area detection, and send the result to the map processing thread.
  • The deep learning processing thread is used to perform detection and tracking on the RGB image and send the result to the map processing thread.
  • The map processing thread is configured to receive the processing result of the point cloud tracking processing thread and the processing result of the deep learning processing thread, and to generate the local map from the received information.
  • The output data may also be provided to the self-calibration processing thread, which processes the pending task according to the output data; or the output data may be provided to the feature point tracking processing thread, which processes the pending task according to the output data; or the output data may be provided to the semi-global stereo block matching processing thread, which processes the pending task according to the output data; or the output data may be provided to the deep learning processing thread, which processes the pending task according to the output data.
  • When the semi-global stereo block matching processing thread processes the pending task, its output data may also be provided to the point cloud tracking processing thread, which processes the pending task according to the output data.
  • When the point cloud tracking processing thread processes the pending task, its output data may also be provided to the map processing thread; in addition, when the deep learning processing thread processes the pending task, its output data may also be provided to the map processing thread. On this basis, the map processing thread may process the pending task according to the input data provided by the point cloud tracking processing thread and the input data provided by the deep learning processing thread.
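The data flow between the processing threads described above forms a dependency graph, which can be sketched as follows. The sketch captures only the ordering of the threads, not the shared host or video memory itself. The thread names are shorthand introduced here, and treating the image processing thread as the producer for the self-calibration, feature point tracking, block matching, and deep learning threads is an assumption, since the text does not name that producer explicitly.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+
from typing import Dict, List, Set

# consumer thread -> set of producer threads whose output data it reads
DEPENDS_ON: Dict[str, Set[str]] = {
    "image_processing": set(),
    "self_calibration": {"image_processing"},            # assumed producer
    "feature_point_tracking": {"image_processing"},      # assumed producer
    "semi_global_block_matching": {"image_processing"},  # assumed producer
    "deep_learning": {"image_processing"},               # assumed producer
    "point_cloud_tracking": {"semi_global_block_matching"},
    "map": {"point_cloud_tracking", "deep_learning"},
}


def stages(depends_on: Dict[str, Set[str]]) -> List[List[str]]:
    """Group threads into stages; every thread in a stage has all of its inputs ready."""
    sorter = TopologicalSorter(depends_on)
    sorter.prepare()
    result = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        result.append(ready)
        sorter.done(*ready)
    return result
```

Threads that land in the same stage have no data dependency on one another, so under these assumptions they are candidates for parallel execution on the graphics processor.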
  • An embodiment of the present invention further provides a task processing device.
  • The task processing device includes a scheduler and a plurality of graphics processors. The scheduler is configured to select a pending task from the processing queue, select at least one target graphics processor corresponding to the pending task from the plurality of graphics processors, and assign the pending task to the target graphics processor;
  • the target graphics processor is configured to process the pending task.
  • When the scheduler selects a pending task from the processing queue, it is specifically configured to: if the processing queue is a static processing queue, select from the static processing queue a pending task whose time series and/or data have dependencies.
  • When the scheduler selects a pending task from the processing queue, it is specifically configured to: if the processing queue is a dynamic processing queue, select from the dynamic processing queue a pending task whose time series and/or data are independent.
  • The scheduler is further configured to receive a pending task; if the pending task is a static scheduling task whose time series and/or data have dependencies, the pending task is cached into the static processing queue; if the pending task is a dynamic scheduling task whose time series and/or data are independent, the pending task is cached into the dynamic processing queue.
  • The scheduler is further configured to receive a pending task and cache it into a task queue. For a pending task in the task queue, if the pending task is a static scheduling task whose time series and/or data have dependencies, the pending task is cached into the static processing queue; if the pending task is a dynamic scheduling task whose time series and/or data are independent, the pending task is cached into the dynamic processing queue.
  • When the scheduler selects a pending task from the processing queue, it is specifically configured to: obtain the priority of each pending task in the processing queue, and select a high-priority pending task from the processing queue based on the priorities.
  • When the scheduler obtains the priority of a pending task in the processing queue, it is specifically configured to: obtain the task type of the pending task in the processing queue, and query a mapping table by the task type to obtain the priority corresponding to that task type; the mapping table is used to record the correspondence between task types and priorities.
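The mapping-table lookup can be sketched as below. The task types, the priority values, the convention that a larger number means a higher priority, and the default priority for unknown types are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Optional

# Hypothetical mapping table: task type -> priority (larger value = higher priority).
PRIORITY_TABLE: Dict[str, int] = {
    "image_processing": 3,
    "deep_learning": 2,
    "map": 1,
}


def select_task(queue: List[dict],
                table: Dict[str, int] = PRIORITY_TABLE,
                default: int = 0) -> Optional[dict]:
    """Remove and return the queued task whose type maps to the highest priority."""
    if not queue:
        return None
    # max() returns the first maximum, so tasks of equal priority leave in arrival order.
    best = max(range(len(queue)), key=lambda i: table.get(queue[i]["type"], default))
    return queue.pop(best)
```

Because priorities are attached to task types rather than to individual tasks, changing a priority only requires editing one table entry.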
  • When the scheduler selects at least one target graphics processor corresponding to the pending task from the plurality of graphics processors, it is specifically configured to: select the at least one target graphics processor from the plurality of graphics processors according to the state information of the pending task and/or the resource information of the graphics processors.
  • The scheduler is further configured to obtain the task type of the pending task and query a state information table by the task type to obtain the state information corresponding to that task type, where the state information table is used to record the correspondence between task types and state information; the obtained state information is determined to be the state information of the pending task.
  • The task processing device further includes a monitor, configured to acquire the state information of a pending task that has been processed, record in the state information table the correspondence between the task type of that pending task and its state information, and send the state information table to the scheduler.
  • The state information includes a task processing time; the task processing time is the time difference between the task completion time and the task receiving time.
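A minimal sketch of the monitor's state information table follows. The text defines the task processing time as completion time minus receiving time; keeping every sample and reporting the average per task type is an assumption made here, and the `Monitor` class and its method names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


class Monitor:
    """Collects task processing times and exposes them as a per-type state information table."""

    def __init__(self) -> None:
        self._samples: Dict[str, List[float]] = defaultdict(list)

    def record(self, task_type: str, received_at: float, completed_at: float) -> None:
        # Task processing time = task completion time - task receiving time.
        self._samples[task_type].append(completed_at - received_at)

    def state_table(self) -> Dict[str, float]:
        # Assumption: the table stores the mean processing time observed for each task type.
        return {t: sum(v) / len(v) for t, v in self._samples.items()}
```

The scheduler can then look up the expected processing time of a newly received task by its task type before the task has run.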
  • The monitor is further configured to acquire the resource information of the graphics processors and send it to the scheduler;
  • the scheduler is further configured to acquire the resource information of the graphics processors;
  • the resource information includes the number of idle processing threads and/or the state of each processing thread, the state being either occupied or idle.
  • When the scheduler selects at least one target graphics processor corresponding to the pending task from the plurality of graphics processors according to the state information of the pending task and/or the resource information of the graphics processors, it is specifically configured to: determine the task processing time according to the state information and, for multiple pending tasks whose task processing time is greater than a time threshold, select different target graphics processors for the different pending tasks.
  • When the scheduler selects at least one target graphics processor corresponding to the pending task from the plurality of graphics processors according to the state information of the pending task and/or the resource information of the graphics processors, it is specifically configured to: determine the number of idle processing threads according to the resource information, and select the graphics processor with the largest number of idle processing threads from the plurality of graphics processors as the target graphics processor.
  • When the scheduler selects at least one target graphics processor corresponding to the pending task from the plurality of graphics processors according to the state information of the pending task and/or the resource information of the graphics processors, it is specifically configured to: determine an idle processing thread according to the resource information, and determine the graphics processor to which the idle processing thread belongs as the target graphics processor corresponding to the pending task.
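Two of the selection strategies above can be sketched as follows. The data shapes (a dictionary from GPU id to idle thread count, and task dictionaries with a `processing_time` field) are assumptions, and the round-robin assignment is only one possible way of giving long-running tasks different graphics processors.

```python
from typing import Dict, List


def pick_most_idle(idle_threads: Dict[int, int]) -> int:
    """Return the id of the graphics processor with the most idle processing threads."""
    # Sorting the ids first makes ties resolve to the lowest GPU id.
    return max(sorted(idle_threads), key=lambda gpu: idle_threads[gpu])


def spread_long_tasks(tasks: List[dict], gpu_ids: List[int], threshold: float) -> Dict[str, int]:
    """Assign tasks whose processing time exceeds the threshold to GPUs in round-robin order."""
    long_tasks = [t for t in tasks if t["processing_time"] > threshold]
    return {t["name"]: gpu_ids[i % len(gpu_ids)] for i, t in enumerate(long_tasks)}
```

Note that the round-robin sketch wraps around when there are more long-running tasks than graphics processors, in which case some of them necessarily share a processor.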
  • When the scheduler selects at least one target graphics processor corresponding to the pending task from the plurality of graphics processors, it is specifically configured to: for multiple pending tasks that are to be processed in parallel, select a corresponding target graphics processor for each of the multiple pending tasks, where the target graphics processors corresponding to different pending tasks are the same or different.
  • When the scheduler selects at least one target graphics processor corresponding to the pending task from the plurality of graphics processors, it is specifically configured to: if the processing queue is a static processing queue, query the plurality of graphics processors for the graphics processor corresponding to the static processing queue, and determine the queried graphics processor as the target graphics processor corresponding to the pending task.
  • When the target graphics processor processes the pending task, it is specifically configured to: if there is a pending task with a higher priority than the pending task being processed, interrupt the pending task being processed and process the higher-priority pending task; after the higher-priority pending task has been processed, resume the interrupted pending task.
  • The target graphics processor is specifically configured to: when the pending task encounters an exception, interrupt the pending task, raise its priority, and cache it back into the processing queue.
  • The target graphics processor is specifically configured to: process in parallel the multiple pending tasks assigned to the target graphics processor, where the parallel processing includes synchronous serial processing and kernel asynchronous processing.
  • When processing the pending task, the target graphics processor is specifically configured to: lock the address of the central processing unit, and transfer through a DMA controller the data exchanged between the target graphics processor and the central processing unit.
  • When processing the pending task, the target graphics processor is specifically configured to perform data sharing through different processing threads.
  • When the target graphics processor performs data sharing through different processing threads, it is specifically configured to: provide the output data to the self-calibration processing thread, which processes the pending task according to the output data; or provide the output data to the deep learning processing thread, which processes the pending task according to the output data.
  • When the target graphics processor performs data sharing through different processing threads, it is specifically configured to: when the semi-global stereo block matching processing thread processes the pending task, provide the output data to the point cloud tracking processing thread, which processes the pending task according to the output data.
  • When the target graphics processor performs data sharing through different processing threads, it is specifically configured to: when the point cloud tracking processing thread processes the pending task, provide the output data to the map processing thread; when the deep learning processing thread processes the pending task, provide the output data to the map processing thread; the map processing thread then processes the pending task according to the input data provided by the point cloud tracking processing thread and the input data provided by the deep learning processing thread.
  • An embodiment of the present invention further provides a computer-readable storage medium that stores a number of computer instructions; when the computer instructions are executed, the above task processing method is implemented.
  • The systems, apparatuses, modules, or units set forth in the above embodiments may be implemented by a computer chip or an entity, or by a product having a certain function.
  • A typical implementation device is a computer, and the specific form of the computer may be a personal computer, a laptop computer, a cellular phone, a camera phone, a smart phone, a personal digital assistant, a media player, a navigation device, an email transceiver, and a game control.
  • Embodiments of the invention may be provided as a method, a system, or a computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware. Moreover, embodiments of the invention may take the form of a computer program product embodied on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) that contain computer-usable program code.
  • These computer program instructions can also be stored in a computer-readable memory that can direct a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture that includes instruction means.
  • The instruction means implements the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.

Abstract

Provided are a method, a device, and a machine readable storage medium for task processing. The method comprises: selecting a task to be processed from a processing queue; selecting at least one target graphics processor corresponding to the task from a plurality of graphics processors; allocating the task to the target graphics processor; and processing the task by means of the target graphics processor. By use of embodiments of the present invention, real-time data processing is ensured, multiple graphics processors can be reasonably managed, multiple tasks to be processed are effectively scheduled, the utilization rates of the graphics processors are maximized, the processing speed is optimized, accuracy and effectiveness of a processing result are effectively ensured, and reliability is improved.

Description

Task processing method, device, and machine-readable storage medium
Technical Field
The present invention relates to the field of image processing technologies, and in particular to a task processing method, a device, and a machine-readable storage medium.
Background
At present, data such as images or radar data can be processed on platforms such as ARM (Advanced RISC Machines), DSP (Digital Signal Processing), or CPU (Central Processing Unit). The amount of data such platforms can process is limited by factors such as processor clock frequency, memory size, and transmission bandwidth. In artificial intelligence application scenarios such as autonomous driving, the volume of sensor data and the acquisition frequency push the data rate to GB/s or higher, so platforms such as ARM, DSP, or CPU can no longer meet the requirements of real-time processing.
Summary of the Invention
The present invention provides a task processing method, a device, and a machine-readable storage medium.
A first aspect of the present invention provides a task processing method applied to a device that includes a plurality of graphics processors, the method comprising: selecting a pending task from a processing queue;
selecting, from the plurality of graphics processors, at least one target graphics processor corresponding to the pending task; assigning the pending task to the target graphics processor; and
processing the pending task by means of the target graphics processor.
A second aspect of the present invention provides a task processing device comprising a scheduler and a plurality of graphics processors. The scheduler is configured to select a pending task from a processing queue, select from the plurality of graphics processors at least one target graphics processor corresponding to the pending task, and assign the pending task to the target graphics processor;
the target graphics processor is configured to process the pending task.
A third aspect of the present invention provides a computer-readable storage medium that stores a number of computer instructions; when the computer instructions are executed, the above task processing method is implemented.
Based on the above technical solution, in the embodiments of the present invention, a target graphics processor corresponding to a pending task can be selected from a plurality of graphics processors, the pending task can be assigned to that target graphics processor, and the pending task can be processed by it. Data is thus processed in real time on graphics processors, which ensures real-time data processing and allows data on the order of GB/s to be processed in real time. In addition, the plurality of graphics processors can be managed sensibly and multiple pending tasks can be scheduled effectively, so that graphics processor utilization is maximized, processing speed is optimized, the accuracy and validity of the processing results are effectively ensured, and reliability is increased.
Brief Description of the Drawings
To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are only some of the embodiments recorded in the present invention, and a person of ordinary skill in the art can obtain other drawings based on these drawings.
FIG. 1 is a schematic diagram of an application scenario of an embodiment of the present invention;
FIG. 2 is a flowchart of one embodiment of the task processing method of the present invention;
FIG. 3 is a schematic diagram of processing a pending task by means of the target graphics processor according to the present invention;
FIG. 4 is a schematic diagram of data sharing through different processing threads according to the present invention;
FIG. 5 is a block diagram of one embodiment of the task processing device of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the scope of protection of the present invention. In addition, where there is no conflict, the following embodiments and the features in them may be combined with each other.
The terms used in the present invention are for the purpose of describing particular embodiments only and are not intended to limit the present invention. The singular forms "a", "said", and "the" used in the present invention and the claims are also intended to include the plural forms, unless the context clearly indicates otherwise. It should be understood that the term "and/or" as used herein refers to any and all possible combinations of one or more of the associated listed items.
Although the terms first, second, third, and so on may be used in the present invention to describe various pieces of information, the information should not be limited by these terms. These terms are only used to distinguish information of the same type from one another. For example, without departing from the scope of the present invention, first information may also be referred to as second information, and similarly, second information may also be referred to as first information. Depending on the context, the word "if" as used herein may be interpreted as "at the time of", "when", or "in response to determining".
An embodiment of the present invention proposes a task processing method that can be applied to a processing device including a plurality of graphics processors (Graphics Processing Unit, GPU for short). The type of the processing device is not limited, provided that it has a plurality of graphics processors. The task processing method may be a multi-task real-time processing method based on multiple graphics processors that takes multi-channel sensor data as input, and it can be applied in scenarios such as autonomous driving, assisted driving, and indoor and outdoor working robots. The sensor data may be image data, such as image data captured by a camera or image data collected by a radar (for example, a lidar or a millimeter-wave radar); the data type is not limited.
In artificial intelligence application scenarios, such as those in which data must be processed in real time, the amount of data multiplies as the number of sensors and the acquisition frame rate increase, and ARM, DSP, or CPU platforms can no longer process the data in real time. This embodiment therefore proposes a multi-task real-time processing method based on multiple graphics processors, which can ensure real-time data processing as well as the accuracy and validity of the processing results.
Considering that multiple tasks need to be processed in parallel and that different tasks have different processing requirements, this embodiment draws on the strong computing performance of graphics processors and proposes real-time task processing based on multiple graphics processors. It can schedule multiple tasks in parallel and supports task fairness, priority, independence, interruption, and resumption. The plurality of graphics processors can be managed sensibly and multiple pending tasks can be scheduled effectively, so that graphics processor utilization is maximized, processing speed is optimized, the accuracy and validity of the processing results are effectively ensured, and reliability is increased.
FIG. 1 is a schematic diagram of an application scenario of an embodiment of the present invention. The multi-task scheduling module is used to cache tasks, perform task scheduling, return task processing results to the upper-layer application, and so on. The processing module consists of a plurality of graphics processors, which process the tasks in real time. The statistical monitoring module monitors and collects statistics on the state information of tasks and the resource information of the graphics processors, and feeds both back to the multi-task scheduling module, which uses them to schedule tasks.
Embodiment 1:
FIG. 2 is a flowchart of the task processing method in an embodiment of the present invention. The method includes:
Step 201: Select a pending task from the processing queue.
Specifically, after a task is received, it can be stored in the processing queue; when a task needs to be processed, a task can be selected from the processing queue, and the selection manner is not limited. For ease of distinction, the task selected from the processing queue is referred to as the pending task.
Step 202: Select, from a plurality of graphics processors (that is, all the graphics processors), at least one target graphics processor corresponding to the pending task. For ease of distinction, a graphics processor selected from the plurality of graphics processors is referred to as the target graphics processor corresponding to the pending task.
Specifically, after the pending task has been selected from the processing queue, at least one target graphics processor corresponding to the pending task can be selected from all the graphics processors according to the state information of the pending task and/or the resource information of each graphics processor; the selection manner is not limited.
Step 203: Assign the pending task to the target graphics processor.
Specifically, after the pending task has been selected from the processing queue and a target graphics processor has been selected for it, the pending task can be assigned to the target graphics processor, that is, to an idle processing thread of the target graphics processor. For example, the pending task can be assigned to all or some of the idle processing threads of the target graphics processor.
Step 204: Process the pending task by means of the target graphics processor.
Specifically, after the pending task has been assigned to an idle processing thread of the target graphics processor, the pending task can be processed by that idle processing thread; the processing procedure is not limited. For example, the pending task may include data (such as sensor data) and a task type, and the idle processing thread of the target graphics processor can perform, on that data, the processing corresponding to the task type.
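Steps 201 to 204 can be sketched end to end as follows. This is an illustration under stated assumptions, not the claimed implementation: graphics processors are modeled only as counters of idle processing threads, the target is chosen as the processor with the most idle threads (one of several selection manners the text allows), and the `HANDLERS` table with its `sum` and `max` task types is hypothetical and runs on the CPU.

```python
from collections import deque
from typing import Callable, Deque, Dict, Optional, Tuple

# Hypothetical processing per task type; each handler stands in for work done on a GPU thread.
HANDLERS: Dict[str, Callable[[list], object]] = {
    "sum": lambda data: sum(data),
    "max": lambda data: max(data),
}


def process_next(queue: Deque[dict], idle_threads: Dict[int, int]) -> Optional[Tuple[int, object]]:
    """Run one task through steps 201-204; return (gpu id, result), or None if nothing could run."""
    if not queue:
        return None
    task = queue.popleft()                                            # step 201: select a pending task
    gpu = max(sorted(idle_threads), key=lambda g: idle_threads[g])    # step 202: select the target GPU
    if idle_threads[gpu] == 0:
        queue.appendleft(task)                                        # no idle thread anywhere: keep waiting
        return None
    idle_threads[gpu] -= 1                                            # step 203: occupy an idle thread
    try:
        result = HANDLERS[task["type"]](task["data"])                 # step 204: process by task type
    finally:
        idle_threads[gpu] += 1                                        # the thread becomes idle again
    return gpu, result
```

For example, with idle thread counts `{0: 1, 1: 3}`, a task `{"type": "sum", "data": [1, 2, 3]}` is assigned to graphics processor 1 and yields 6.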
Based on the foregoing technical solution, in the embodiments of the present invention, a target graphics processor corresponding to a to-be-processed task may be selected from a plurality of graphics processors, the task may be assigned to the target graphics processor, and the task may be processed by the target graphics processor. Data is thus processed in real time on graphics processors, which ensures the real-time performance of data processing and allows data on the order of GB/s to be processed in real time. In addition, the plurality of graphics processors can be managed reasonably and multiple to-be-processed tasks can be scheduled effectively, so that the graphics processors are utilized to the fullest extent, the processing speed is optimized, the accuracy and validity of the processing results are effectively ensured, and reliability is increased.
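Steps 201 to 204 can be illustrated with a minimal Python sketch. It is not the claimed implementation: the `Task` and `Gpu` classes, the `schedule_once` function, and the choice of "most idle threads" as the selection rule are all assumptions made for illustration, and the processing in step 204 is replaced by a stand-in that only records the task name.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Task:
    name: str
    task_type: str
    data: object = None


@dataclass
class Gpu:
    gpu_id: int
    idle_threads: int
    done: list = field(default_factory=list)


def schedule_once(queue, gpus):
    """Run steps 201-204 for one task; return the GPU used, or None."""
    if not queue:
        return None
    task = queue.popleft()                       # step 201: select a task
    candidates = [g for g in gpus if g.idle_threads > 0]
    if not candidates:
        queue.appendleft(task)                   # no idle thread: keep the task queued
        return None
    # step 202 (assumed rule): pick the GPU with the most idle threads
    target = max(candidates, key=lambda g: g.idle_threads)
    target.idle_threads -= 1                     # step 203: occupy one idle thread
    target.done.append(task.name)                # step 204: stand-in for real processing
    target.idle_threads += 1                     # the thread becomes idle again
    return target


queue = deque([Task("task1", "A")])
gpus = [Gpu(0, idle_threads=1), Gpu(1, idle_threads=3)]
used = schedule_once(queue, gpus)
```

In this example the task goes to the second graphics processor because it has more idle processing threads.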
Embodiment 2:
Before step 201, after a to-be-processed task is received: if the task is a static scheduling task whose time series and/or data have dependencies, the task may be buffered in a static processing queue; if the task is a dynamic scheduling task whose time series and/or data are independent, the task may be buffered in a dynamic processing queue. On this basis, selecting the to-be-processed task from the processing queue in step 201 may include: if the processing queue is the static processing queue, selecting from it a to-be-processed task whose time series and/or data have dependencies; if the processing queue is the dynamic processing queue, selecting from it a to-be-processed task whose time series and/or data are independent.
In this embodiment, the processing queue may be divided into a static processing queue (Static) and a dynamic processing queue (Dynamic). The static processing queue may store tasks whose time series have dependencies, tasks whose data have dependencies, or tasks whose time series and data both have dependencies. For ease of distinction, such dependent tasks are referred to as static scheduling tasks. The dynamic processing queue may store tasks whose time series are independent, tasks whose data are independent, or tasks whose time series and data are both independent. For ease of distinction, such independent tasks are referred to as dynamic scheduling tasks.
Moreover, all tasks stored in the static processing queue may be processed in the same graphics processor or in related graphics processors; that is, the choice of graphics processor is restricted. A correspondence between the static processing queue and a graphics processor (for example, graphics processor 1) may be configured in advance, so that all tasks in that static processing queue must be processed by graphics processor 1. By contrast, tasks stored in the dynamic processing queue may be processed in the same graphics processor or in different graphics processors; that is, the choice of graphics processor is not restricted, and the tasks in the dynamic processing queue may be assigned to any graphics processor.
A task whose time series has a dependency is a task whose processing depends on a previous task. For example, in H264 encoding, a non-I-frame task depends on the most recent preceding I-frame task. Accordingly, after the multi-task scheduling module receives task 1 and task 2, if task 1 and task 2 have a time-series dependency, both may be stored in the static processing queue. Similarly, a task whose data has a dependency is a task in which the processing of one piece of data depends on a previous piece of data; this is not limited. All tasks stored in the static processing queue may be processed in the same graphics processor or in related graphics processors.
A task whose time series is independent is a task whose processing does not depend on a previous task, that is, an independent task. For example, tasks such as image conversion and point cloud algorithm processing are independent tasks that do not depend on a previous task. Accordingly, after the multi-task scheduling module receives task 3, if task 3 is independent in time series, task 3 may be stored in the dynamic processing queue. Similarly, a task whose data is independent is a task in which the processing of one piece of data is unrelated to other data; this is not limited. All tasks stored in the dynamic processing queue may be processed in the same graphics processor or in different graphics processors.
For the to-be-processed tasks in the static processing queue, the multi-task scheduling module may select a to-be-processed task from the static processing queue and, whether or not processing of that task has completed, may select another to-be-processed task from the static processing queue. Likewise, for the to-be-processed tasks in the dynamic processing queue, the multi-task scheduling module may select a to-be-processed task from the dynamic processing queue and, whether or not processing of that task has completed, may select another to-be-processed task from the dynamic processing queue.
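The routing of received tasks into the two queues might be sketched as follows. The `has_dependency` flag is a hypothetical field standing in for whatever test decides that a task's time series and/or data depend on earlier tasks; the source does not say how that is determined.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    # Hypothetical flag: True if the time series and/or data depend on earlier tasks.
    has_dependency: bool


def route(task, static_queue, dynamic_queue):
    """Buffer a received task in the static or the dynamic processing queue."""
    if task.has_dependency:
        static_queue.append(task)    # static scheduling task
        return "static"
    dynamic_queue.append(task)       # dynamic scheduling task
    return "dynamic"


static_q, dynamic_q = deque(), deque()
route(Task("h264_non_i_frame", True), static_q, dynamic_q)
route(Task("image_conversion", False), static_q, dynamic_q)
```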
Embodiment 3:
Before step 201, after a to-be-processed task is received, the task may be buffered in a task queue (also called a large queue or Task Queue), which may hold multiple to-be-processed tasks. For each to-be-processed task in the task queue: if the task is a static scheduling task whose time series and/or data have dependencies, the task may be buffered in the static processing queue; if the task is a dynamic scheduling task whose time series and/or data are independent, the task may be buffered in the dynamic processing queue. On this basis, selecting the to-be-processed task from the processing queue in step 201 may include: if the processing queue is the static processing queue, selecting from it a to-be-processed task whose time series and/or data have dependencies; if the processing queue is the dynamic processing queue, selecting from it a to-be-processed task whose time series and/or data are independent. The processing procedure of Embodiment 3 is similar to that of Embodiment 2 and is not repeated here.
Embodiment 4:
In step 201, selecting the to-be-processed task from the processing queue may include, but is not limited to: obtaining the priority of each to-be-processed task in the processing queue, and preferentially selecting higher-priority tasks from the processing queue. Obtaining the priority of each to-be-processed task in the processing queue may include: for each to-be-processed task in the processing queue, obtaining the task type of the task, and then querying a mapping table by the task type to obtain the priority corresponding to that task type. The mapping table records the correspondence between task types and priorities.
The multi-task scheduling module may be configured with a mapping table, shown in Table 1, that records the correspondence between task types and priorities. Suppose to-be-processed task 1 has task type A, to-be-processed task 2 has task type B, and to-be-processed task 3 has task type C. The multi-task scheduling module queries the mapping table by task type A and obtains priority 5 for task 1, queries by task type B and obtains priority 3 for task 2, and queries by task type C and obtains priority 1 for task 3. Because priority 5 of task 1 is higher than priority 3 of task 2, and priority 3 of task 2 is higher than priority 1 of task 3, the multi-task scheduling module first selects task 1 from the processing queue, then task 2, then task 3, and so on.
Table 1
Task type | Priority
Task type A | 5
Task type B | 3
Task type C | 1
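The lookup described for Table 1 can be sketched in Python as a dictionary keyed by task type. The names `PRIORITY_BY_TYPE` and `selection_order` are illustrative assumptions; the priorities are the ones in Table 1, with a larger number meaning a higher priority, as in the worked example above.

```python
# Table 1 as a lookup: task type -> priority (larger number = higher priority).
PRIORITY_BY_TYPE = {"A": 5, "B": 3, "C": 1}


def selection_order(tasks):
    """tasks: list of (task_name, task_type); return names, highest priority first."""
    ranked = sorted(tasks, key=lambda t: PRIORITY_BY_TYPE[t[1]], reverse=True)
    return [name for name, _ in ranked]


order = selection_order([("task3", "C"), ("task1", "A"), ("task2", "B")])
```

Because Python's sort is stable, tasks of equal priority keep their queue order in this sketch; the source does not specify a tie-breaking rule.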
Embodiment 5:
To select the target graphics processor, the status information of the to-be-processed task and/or the resource information of the graphics processors may first be obtained. The procedure for obtaining this information is described in detail below.
Case 1: the multi-task scheduling module obtains the status information of the to-be-processed task.
During the processing of to-be-processed tasks, the statistical monitoring module may obtain the status information of tasks whose processing has completed, record in a status information table the correspondence between the task type of each such task and its status information, and send the status information table to the multi-task scheduling module. On this basis, after selecting a to-be-processed task from the processing queue, the multi-task scheduling module first obtains the task type of the task, queries the status information table by the task type to obtain the status information corresponding to that task type, and takes the obtained status information as the status information of the to-be-processed task.
For example, after to-be-processed task 1, to-be-processed task 2, and to-be-processed task 3 have completed processing, the statistical monitoring module may obtain the status information table shown in Table 2 and send it to the multi-task scheduling module. On this basis, after the multi-task scheduling module selects to-be-processed task 4 from the processing queue, if the task type of task 4 is task type A, status information A corresponding to task type A is obtained; that is, the status information of task 4 is status information A.
Table 2
Task type | Status information
Task type A | Status information A
Task type B | Status information B
Task type C | Status information C
Case 2: the multi-task scheduling module obtains the resource information of the graphics processors.
During the processing of to-be-processed tasks, the statistical monitoring module may obtain (for example, periodically) the resource information of each graphics processor and send it to the multi-task scheduling module, so that the multi-task scheduling module has the resource information of each graphics processor.
In the foregoing embodiments, the status information may include a task processing time, which is the time difference between the task completion time and the task reception time. The resource information may include the number of idle processing threads and/or the state of each processing thread (for example, occupied or idle).
During the processing of each to-be-processed task, the statistical monitoring module may collect statistics on the status information of the task and notify the multi-task scheduling module of the correspondence between task type and status information. Specifically, the processing of a to-be-processed task may pass through three stages: a task reception and buffering stage, a task scheduling and distribution stage, and a task processing stage. The statistical monitoring module may therefore measure the time consumed by these three stages; that total time is the status information of the task, namely its task processing time. Equivalently, since the task processing time may be the time difference between the task completion time and the task reception time, the statistical monitoring module may record the task reception time and the task completion time of the task and compute their difference, which is the task processing time of the task.
When the statistical monitoring module sends the correspondence between task type and task processing time to the multi-task scheduling module, a task type may correspond to multiple to-be-processed tasks, each with its own task processing time, and these times may be the same or different. Accordingly, the maximum of the task processing times of the tasks of a given type may be selected as the task processing time for that task type. Alternatively, the minimum may be selected as the task processing time for that task type. Alternatively, the average of the task processing times may be computed and used as the task processing time for that task type. Alternatively, the maximum, minimum, and average may all be recorded as the task processing time for the task type. This is not limited.
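The per-type aggregation described above might be sketched as follows. The record layout `(task_type, reception_time, completion_time)` and the function name `summarize` are assumptions for illustration; the sketch simply keeps the maximum, minimum, and average for each task type so that any of the options above can be used.

```python
from collections import defaultdict
from statistics import mean


def summarize(records):
    """records: iterable of (task_type, reception_time, completion_time).

    Return {task_type: {"max": ..., "min": ..., "mean": ...}} of processing times.
    """
    by_type = defaultdict(list)
    for task_type, received, completed in records:
        # Task processing time = task completion time - task reception time.
        by_type[task_type].append(completed - received)
    return {
        task_type: {"max": max(times), "min": min(times), "mean": mean(times)}
        for task_type, times in by_type.items()
    }


table = summarize([("A", 0.0, 2.0), ("A", 1.0, 5.0), ("B", 0.0, 1.0)])
```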
During the processing of each to-be-processed task, the statistical monitoring module may also collect statistics on the resource information of the graphics processors and notify the multi-task scheduling module of that information. Specifically, a graphics processor may support parallel processing of multiple tasks: each graphics processor has multiple processing threads, and to-be-processed tasks are assigned to processing threads for processing. The statistical monitoring module may therefore monitor the processing threads of each graphics processor, for example the number of idle processing threads and the state of each processing thread (occupied or idle).
Embodiment 6:
In step 202, selecting from the plurality of graphics processors at least one target graphics processor corresponding to the to-be-processed task may include: selecting the target graphics processor according to the status information of the to-be-processed task; or selecting the target graphics processor according to the resource information of each graphics processor; or selecting the target graphics processor according to both the status information of the to-be-processed task and the resource information of each graphics processor.
Case 1: selecting the target graphics processor according to the status information of the to-be-processed task may include: determining the task processing time from the status information of the task and, for multiple to-be-processed tasks whose task processing time is greater than a time threshold (that is, tasks with relatively long processing times), selecting different target graphics processors for the different tasks.
The time threshold may be set empirically. A task processing time greater than the time threshold indicates a relatively long processing time. Assigning multiple such tasks to different graphics processors ensures that each to-be-processed task is scheduled in time, that the resources of all graphics processors are allocated and used reasonably, that task processing does not time out, and that tasks with long processing times do not occupy multiple processing threads of the same graphics processor at the same moment.
Case 2: selecting the target graphics processor according to the resource information of the graphics processors may include: determining the number of idle processing threads from the resource information of each graphics processor, and selecting the graphics processor with the most idle processing threads as the target graphics processor. Because the graphics processor with the most idle processing threads is the one whose resources are least used, selecting it as the target graphics processor allocates and uses the resources of all graphics processors reasonably.
Case 3: selecting the target graphics processor according to the resource information of the graphics processors may include: determining the idle processing threads from the resource information of each graphics processor, and determining a graphics processor that has an idle processing thread as the target graphics processor. Because the target graphics processor is one that has an idle processing thread, after the to-be-processed task is assigned to it, the target graphics processor has an idle processing thread available to process the task; this avoids assigning the task to a graphics processor with no idle processing thread.
A processing thread whose state is idle is an idle processing thread, and a processing thread whose state is occupied is an occupied processing thread. Because the target graphics processor is one that has an idle processing thread, a to-be-processed task is assigned only when a processing thread is idle, which ensures that the task can be processed in time.
Case 4: selecting the target graphics processor according to both the status information of the to-be-processed task and the resource information of each graphics processor includes: for multiple to-be-processed tasks whose task processing time is greater than the time threshold, selecting different target graphics processors for the different tasks; and, when selecting a target graphics processor, selecting from the plurality of graphics processors one with more idle processing threads, or one that has an idle processing thread.
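The four cases can be combined in a short Python sketch. Everything named here (`select_gpus`, the tuple layout, the dictionary of idle-thread counts) is assumed for illustration. One behaviour is an assumption not stated in the source: if there are more long-running tasks than graphics processors, the sketch falls back to any graphics processor that still has an idle thread.

```python
def select_gpus(tasks, idle_threads, threshold):
    """Pick a target GPU per task.

    tasks: list of (task_name, expected_processing_time)
    idle_threads: dict gpu_id -> number of idle processing threads
    Return dict task_name -> gpu_id, or None when no GPU has an idle thread.
    """
    idle = dict(idle_threads)          # work on a copy
    used_by_long_tasks = set()
    assignment = {}
    for name, expected in tasks:
        # Case 3: only GPUs that have an idle processing thread qualify.
        candidates = [g for g, n in idle.items() if n > 0]
        is_long = expected > threshold
        if is_long:
            # Case 1: spread long tasks over different GPUs where possible.
            spread = [g for g in candidates if g not in used_by_long_tasks]
            candidates = spread or candidates   # assumed fallback
        if not candidates:
            assignment[name] = None
            continue
        # Case 2: prefer the GPU with the most idle processing threads.
        gpu = max(candidates, key=lambda g: idle[g])
        idle[gpu] -= 1
        if is_long:
            used_by_long_tasks.add(gpu)
        assignment[name] = gpu
    return assignment


plan = select_gpus(
    [("long1", 9.0), ("long2", 8.0), ("short", 1.0)],
    idle_threads={0: 4, 1: 2},
    threshold=5.0,
)
```

Here the two long tasks land on different graphics processors, and the short task goes to whichever graphics processor has the most idle threads left.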
Embodiment 7:
In step 202, selecting from the plurality of graphics processors at least one target graphics processor corresponding to the to-be-processed task may include: for multiple to-be-processed tasks that are processed in parallel, selecting a corresponding target graphics processor for each task, where the target graphics processors of different tasks may be the same or different. For example, when different tasks correspond to different target graphics processors, the tasks processed in parallel may be distributed across multiple graphics processors, so that the resources of all graphics processors are allocated and used reasonably and each to-be-processed task is processed in time.
In step 202, selecting from the plurality of graphics processors at least one target graphics processor corresponding to the to-be-processed task may also include: if the processing queue is a static processing queue, querying the plurality of graphics processors for the graphics processor corresponding to that static processing queue, and determining the graphics processor found as the target graphics processor of the to-be-processed task. As described in the foregoing embodiments, the correspondence between the static processing queue and a graphics processor may be configured in advance, and the graphics processor corresponding to the static processing queue can be found from that correspondence. If the processing queue is a dynamic processing queue, the target graphics processor is selected from all graphics processors, as described in the foregoing embodiments.
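The difference between the two queue kinds might be sketched as below. The queue name `static_h264`, the mapping `STATIC_QUEUE_TO_GPU`, and the function `target_gpu` are hypothetical; for the dynamic case the sketch reuses the "most idle threads" rule from Embodiment 6 as one possible choice.

```python
# Hypothetical pre-configured correspondence: static queue name -> GPU id.
STATIC_QUEUE_TO_GPU = {"static_h264": 1}


def target_gpu(queue_name, idle_threads):
    """Return the target GPU id for a task taken from the given queue.

    idle_threads: dict gpu_id -> number of idle processing threads.
    """
    if queue_name in STATIC_QUEUE_TO_GPU:
        # Static queue: the GPU is fixed by the pre-configured correspondence.
        return STATIC_QUEUE_TO_GPU[queue_name]
    # Dynamic queue: any GPU may be used; here, the one with the most idle threads.
    return max(idle_threads, key=idle_threads.get)


static_choice = target_gpu("static_h264", {0: 4, 1: 0})
dynamic_choice = target_gpu("dynamic", {0: 4, 1: 0})
```

Note that the static case ignores the idle-thread counts entirely in this sketch; a task bound to a busy graphics processor would have to wait for it.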
Embodiment 8:
In step 204, processing the to-be-processed task by the target graphics processor may include: the target graphics processor processing the task using algorithms such as image processing, feature point tracking, semi-global stereo block matching, radar and camera self-calibration, point cloud tracking, local map, and deep learning. This process is not limited. For example, when the task is processed using an image processing algorithm, format conversion and distortion correction may be performed on the source image, and image data in an expected or special format may be output. As another example, when the task is processed using a feature point tracking algorithm, the correspondence between the previous frame and the current frame may be obtained from the changes of pixels in the image sequence over time and the correlation between adjacent frames, and the motion information of objects between adjacent frames may then be computed. As another example, when the task is processed using a deep learning algorithm, the depth of a target object relative to the camera may be computed from the disparity of the same target object in binocular imaging and the positional relationship between the two cameras.
Processing of the to-be-processed task by the target graphics processor is described below with reference to specific cases.
Case 1: while the target graphics processor is processing a to-be-processed task (for example, a low-priority task), if there is a to-be-processed task with a higher priority than the low-priority task, the low-priority task is interrupted and the higher-priority task is processed by the target graphics processor; after the higher-priority task has been processed, the low-priority task is resumed.
Referring to FIG. 3, suppose the target graphics processor is processing to-be-processed task 1 when a new to-be-processed task 2 is received. The target graphics processor determines whether it has an idle processing thread. If so, task 2 is assigned to the idle processing thread. If not, the priority of task 1 is compared with the priority of task 2. If task 1 has the higher priority, processing of task 1 continues and task 2 waits; after task 1 has been processed, task 2 is processed. If task 2 has the higher priority, task 1 may be interrupted and task 2 processed; after task 2 has been processed, task 1 is resumed.
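The decision flow described for FIG. 3 can be written as a small Python function. The function name, the `(name, priority)` tuples, and the returned action strings are illustrative assumptions. The source does not say what happens when both priorities are equal; the sketch assumes the new task waits in that case.

```python
def on_new_task(running, new, idle_threads):
    """Decide what to do when a new task arrives at the target GPU.

    running, new: (task_name, priority) tuples; larger priority = more urgent.
    idle_threads: number of idle processing threads on the target GPU.
    """
    if idle_threads > 0:
        return "assign_to_idle_thread"   # no conflict: use a free thread
    if new[1] > running[1]:
        return "interrupt_running"       # run the new task, resume the old one afterwards
    return "wait"                        # assumed to include the equal-priority case


decision_idle = on_new_task(("task1", 3), ("task2", 5), idle_threads=1)
decision_preempt = on_new_task(("task1", 3), ("task2", 5), idle_threads=0)
decision_wait = on_new_task(("task1", 5), ("task2", 3), idle_threads=0)
```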
Case 2: while the target graphics processor is processing a to-be-processed task, if an exception occurs in the task, the task is interrupted, its priority is raised, and it is buffered back into the processing queue. Because its priority has been raised, the task will be selected from the processing queue preferentially, which prevents it from timing out while waiting.
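Re-queuing a failed task with a raised priority might be sketched with Python's `heapq` module. The helper names and the `boost` parameter are assumptions; the source says only that the priority is raised, not by how much. Priorities are negated because `heapq` is a min-heap, and a counter keeps insertion order among equal priorities.

```python
import heapq
import itertools

_sequence = itertools.count()   # tie-breaker: earlier insertions come out first


def push(queue, priority, name):
    """Add a task; a larger priority number is served first."""
    heapq.heappush(queue, (-priority, next(_sequence), name))


def pop(queue):
    """Remove and return (task_name, priority) of the most urgent task."""
    neg_priority, _, name = heapq.heappop(queue)
    return name, -neg_priority


def requeue_after_exception(queue, name, priority, boost=1):
    """Put an interrupted task back with a raised priority (boost is assumed)."""
    raised = priority + boost
    push(queue, raised, name)
    return raised


processing_queue = []
push(processing_queue, 3, "task_a")
push(processing_queue, 3, "task_b")
new_priority = requeue_after_exception(processing_queue, "failed_task", 3)
first = pop(processing_queue)
second = pop(processing_queue)
```

The failed task re-enters behind two tasks of its original priority but is selected first because its priority was raised.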
Case 3: the target graphics processor processes in parallel the multiple to-be-processed tasks assigned to it, where the parallel processing may include synchronous serial processing and kernel asynchronous processing.
When the target graphics processor processes multiple to-be-processed tasks, it may process them in parallel by means such as synchronous serial processing or kernel asynchronous processing. The processing manner is not limited. This can effectively save processing time and improve processing efficiency.
Case 4: when the target graphics processor processes a to-be-processed task, the memory addresses of the central processor are page-locked, and the data exchanged between the target graphics processor and the central processor is transferred by a DMA controller. For example, when the data volume of the task is large, the memory addresses of the central processor are page-locked by calling the cudaHostRegister interface, and data exchange between the central processor and the graphics processor is carried out by the DMA controller, which significantly improves bandwidth and reduces unnecessary data copies.
Case 5: While the target graphics processor is processing to-be-processed tasks, data is shared among different processing threads. Specifically, if multiple to-be-processed tasks are completed by multiple processing threads, data can be shared between the processing threads to avoid repeated copies between host memory and video memory. Video memory and host memory can be shared within the processing threads, which eliminates the copy step and saves a large amount of processing time.
In this embodiment, the target graphics processor may process to-be-processed tasks using algorithms such as image processing, feature point tracking, semi-global stereo block matching, radar-camera self-calibration, point cloud tracking, local mapping, and deep learning. Accordingly, the processing threads may include: an image processing thread, a feature point tracking processing thread, a semi-global block matching (SGBM) processing thread, a radar-camera self-calibration processing thread (hereinafter referred to as the self-calibration processing thread), a point cloud tracking processing thread, a map processing thread, and a deep learning processing thread.
The image processing thread receives raw data and processes it to obtain undistorted, epipolar-rectified grayscale and RGB images. The feature point tracking processing thread performs feature point detection and tracking on the images. The semi-global stereo block matching processing thread matches the binocular images to obtain a disparity map and computes a three-dimensional point cloud. The self-calibration processing thread calibrates the extrinsic parameters between cameras and between the radar and a camera. The point cloud tracking processing thread segments the three-dimensional point cloud into different objects, performs target tracking and region detection, and sends the results to the map processing thread. The deep learning processing thread detects and tracks targets in the RGB images and sends the results to the map processing thread. The map processing thread receives the processing results of the point cloud tracking processing thread and of the deep learning processing thread, and uses the received information to generate a local map.
FIG. 4 is a schematic diagram of data sharing among different processing threads.
When the image processing thread processes a to-be-processed task, it may provide its output data to the self-calibration processing thread, which processes a to-be-processed task according to that output data; or to the feature point tracking processing thread, which processes a to-be-processed task according to that output data; or to the semi-global stereo block matching processing thread, which processes a to-be-processed task according to that output data; or to the deep learning processing thread, which processes a to-be-processed task according to that output data.
When the semi-global stereo block matching processing thread processes a to-be-processed task, it may provide its output data to the point cloud tracking processing thread, which processes a to-be-processed task according to that output data. Further, when the point cloud tracking processing thread processes a to-be-processed task, it may provide its output data to the map processing thread; likewise, when the deep learning processing thread processes a to-be-processed task, it may also provide its output data to the map processing thread. On this basis, the map processing thread may process a to-be-processed task according to the input data provided by the point cloud tracking processing thread and the input data provided by the deep learning processing thread.
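The data flow described above for FIG. 4 can be written down as a small dependency graph. The following Python sketch is illustrative only; the stage names and the helper functions `inputs_of` and `run_order` are hypothetical and do not come from the source.

```python
from graphlib import TopologicalSorter

# Producer thread -> consumer threads, following the description of FIG. 4.
DATA_FLOW = {
    "image_processing": ["self_calibration", "feature_tracking", "sgbm", "deep_learning"],
    "sgbm": ["point_cloud_tracking"],
    "point_cloud_tracking": ["map"],
    "deep_learning": ["map"],
}


def inputs_of(stage, flow=DATA_FLOW):
    """Return the stages whose output data feeds the given stage."""
    return sorted(src for src, dsts in flow.items() if stage in dsts)


def run_order(flow=DATA_FLOW):
    """Return one valid order in which every stage runs after its producers."""
    sorter = TopologicalSorter()
    for src, dsts in flow.items():
        for dst in dsts:
            sorter.add(dst, src)  # dst depends on the output of src
    return list(sorter.static_order())
```

Here `inputs_of("map")` lists the two producers of the map processing thread, and `run_order()` returns one of several valid orderings, so only relative positions in the result are meaningful.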
Embodiment 8:
Based on the same inventive concept as the above method, an embodiment of the present invention further provides a task processing device. As shown in FIG. 5, the task processing device includes a scheduler and multiple graphics processors. The scheduler is configured to select a to-be-processed task from a processing queue, select from the multiple graphics processors at least one target graphics processor corresponding to the to-be-processed task, and assign the to-be-processed task to the target graphics processor. The target graphics processor is configured to process the to-be-processed task.
When selecting a to-be-processed task from the processing queue, the scheduler is specifically configured to: if the processing queue is a static processing queue, select from the static processing queue a to-be-processed task whose time series and/or data have dependencies.
When selecting a to-be-processed task from the processing queue, the scheduler is specifically configured to: if the processing queue is a dynamic processing queue, select from the dynamic processing queue a to-be-processed task whose time series and/or data are independent.
The scheduler is further configured to: receive a to-be-processed task; if the to-be-processed task is a static scheduling task whose time series and/or data have dependencies, cache the task into the static processing queue; if the to-be-processed task is a dynamic scheduling task whose time series and/or data are independent, cache the task into the dynamic processing queue.
The scheduler is further configured to: receive a to-be-processed task and cache it into a task queue; and, for each to-be-processed task in the task queue, if the task is a static scheduling task whose time series and/or data have dependencies, cache the task into the static processing queue; if the task is a dynamic scheduling task whose time series and/or data are independent, cache the task into the dynamic processing queue.
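The two-stage queuing just described (task queue first, then static or dynamic processing queue) might be sketched as follows. This is a simplified illustration: the field names `time_dependent` and `data_dependent` are hypothetical, and treating a task with either kind of dependency as static is one assumed reading of "and/or".

```python
from collections import deque


def route_tasks(task_queue, static_queue, dynamic_queue):
    """Drain the task queue into the static or dynamic processing queue."""
    while task_queue:
        task = task_queue.popleft()
        # Assumption: any time-series or data dependency makes the task static.
        if task.get("time_dependent") or task.get("data_dependent"):
            static_queue.append(task)
        else:
            dynamic_queue.append(task)


# Usage: one dependent task and one independent task.
incoming = deque([
    {"name": "sgbm", "data_dependent": True},
    {"name": "self_calibration"},
])
static_q, dynamic_q = deque(), deque()
route_tasks(incoming, static_q, dynamic_q)
```

After the call, the task queue is empty and each task sits in exactly one processing queue.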
When selecting a to-be-processed task from the processing queue, the scheduler is configured to: obtain the priorities of the to-be-processed tasks in the processing queue; and, based on the priorities, preferentially select a high-priority to-be-processed task from the processing queue.
When obtaining the priority of a to-be-processed task in the processing queue, the scheduler is specifically configured to: obtain the task type of the to-be-processed task in the processing queue; and query a mapping table by the task type to obtain the priority corresponding to the task type, where the mapping table records the correspondence between task types and priorities.
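As a rough illustration of the mapping-table lookup, the sketch below selects the next task by looking up each task's type in a table. The table contents, the `default` priority for unknown types, and the task layout are assumptions made for this example.

```python
# Hypothetical mapping table: task type -> priority (larger means more urgent).
PRIORITY_BY_TYPE = {"image_processing": 3, "sgbm": 2, "map": 1}


def select_next(processing_queue, table=PRIORITY_BY_TYPE, default=0):
    """Remove and return the queued task whose task type maps to the highest priority."""
    if not processing_queue:
        return None
    # max() returns the first maximal element, so ties keep their queue order.
    best = max(processing_queue, key=lambda task: table.get(task["type"], default))
    processing_queue.remove(best)
    return best
```

Keeping priorities in a table keyed by task type means a priority can be changed in one place without touching the queued tasks themselves.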
In one example, when selecting from the multiple graphics processors at least one target graphics processor corresponding to the to-be-processed task, the scheduler is specifically configured to: select the at least one target graphics processor from the multiple graphics processors according to state information of the to-be-processed task and/or resource information of the graphics processors.
The scheduler is further configured to: obtain the task type of the to-be-processed task; query a state information table by the task type to obtain the state information corresponding to the task type, where the state information table records the correspondence between task types and state information; and determine the obtained state information as the state information of the to-be-processed task.
In one example, referring to FIG. 4, the task processing device further includes a monitor, configured to obtain the state information of a to-be-processed task whose processing has been completed, record in the state information table the correspondence between the task type of that task and its state information, and send the state information table to the scheduler.
In the above embodiment, the state information includes a task processing time; the task processing time is the difference between the task completion time and the task reception time.
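The monitor's bookkeeping can be sketched in a few lines of Python. The class and method names are hypothetical, and overwriting the entry with the most recent observation is an assumption; the source only states that the table records the correspondence between task type and state information.

```python
class Monitor:
    """Records, per task type, the processing time of the last completed task."""

    def __init__(self):
        self.state_table = {}  # task type -> state information

    def record(self, task_type, received_at, completed_at):
        # Task processing time = task completion time - task reception time.
        self.state_table[task_type] = {"processing_time": completed_at - received_at}

    def lookup(self, task_type):
        """Return the state information for a task type, or None if unseen."""
        return self.state_table.get(task_type)
```

A scheduler holding a copy of `state_table` can then estimate how long a newly received task of a known type is likely to take.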
The monitor is further configured to obtain resource information of the graphics processors and send it to the scheduler; the scheduler is further configured to obtain the resource information of the graphics processors.
The resource information includes: the number of idle processing threads; and/or the state of each processing thread, where the state is either occupied or idle.
When selecting the at least one target graphics processor from the multiple graphics processors according to the state information of the to-be-processed task and/or the resource information of the graphics processors, the scheduler is specifically configured to: determine the task processing time according to the state information and, for multiple to-be-processed tasks whose task processing time is greater than a time threshold, select different target graphics processors for the different to-be-processed tasks.
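One way to picture this rule is the following sketch, which spreads long-running tasks over distinct graphics processors. The function name, the data layout, and the round-robin wrap-around used when there are more long tasks than graphics processors are assumptions, since the source does not cover that case.

```python
def assign_long_tasks(tasks, processing_time_by_type, gpu_ids, threshold):
    """Map each long-running task to a graphics processor, using a different one per task.

    Tasks whose recorded processing time does not exceed the threshold are left
    unassigned here; they would be placed by some other rule.
    """
    long_tasks = [
        task for task in tasks
        if processing_time_by_type.get(task["type"], 0.0) > threshold
    ]
    # Assumption: wrap around only when long tasks outnumber graphics processors.
    return {
        task["name"]: gpu_ids[index % len(gpu_ids)]
        for index, task in enumerate(long_tasks)
    }
```

Placing two slow tasks on different graphics processors keeps either one from queuing behind the other.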
When selecting the at least one target graphics processor from the multiple graphics processors according to the state information of the to-be-processed task and/or the resource information of the graphics processors, the scheduler is specifically configured to: determine the number of idle processing threads according to the resource information, and select from the multiple graphics processors the graphics processor with the most idle processing threads as the target graphics processor.
When selecting the at least one target graphics processor from the multiple graphics processors according to the state information of the to-be-processed task and/or the resource information of the graphics processors, the scheduler is specifically configured to: determine an idle processing thread according to the resource information, and determine the graphics processor to which the idle processing thread belongs as the target graphics processor corresponding to the to-be-processed task.
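The two resource-based selection rules above could be sketched as below. The representation of the resource information (a dictionary from graphics processor id to a list of thread states) and the function names are hypothetical.

```python
def idle_count(thread_states):
    """Count the processing threads that are in the idle state."""
    return sum(1 for state in thread_states if state == "idle")


def pick_most_idle(resource_info):
    """Return the graphics processor id with the most idle processing threads."""
    return max(resource_info, key=lambda gpu: idle_count(resource_info[gpu]))


def pick_first_idle(resource_info):
    """Return the first graphics processor with any idle thread, or None if all are occupied."""
    return next(
        (gpu for gpu, states in resource_info.items() if "idle" in states),
        None,
    )
```

The first rule balances load across graphics processors, while the second only guarantees that the chosen graphics processor can start the task immediately.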
In one example, when selecting from the multiple graphics processors at least one target graphics processor corresponding to the to-be-processed task, the scheduler is specifically configured to: for multiple to-be-processed tasks that are to be processed in parallel, select a corresponding target graphics processor for each of them, where the target graphics processors corresponding to different to-be-processed tasks may be the same or different.
When selecting from the multiple graphics processors at least one target graphics processor corresponding to the to-be-processed task, the scheduler is specifically configured to: if the processing queue is a static processing queue, look up among the multiple graphics processors the graphics processor corresponding to the static processing queue, and determine the graphics processor found as the target graphics processor corresponding to the to-be-processed task.
When processing the to-be-processed task, the target graphics processor is specifically configured to: if there is a to-be-processed task with a higher priority than the current to-be-processed task, interrupt the current task and process the higher-priority task; and, after the higher-priority task has been processed, resume the interrupted task. When processing the to-be-processed task, the target graphics processor is specifically configured to: if the to-be-processed task encounters an exception, interrupt the task, raise its priority, and cache it into the processing queue. When processing the to-be-processed task, the target graphics processor is specifically configured to: process in parallel the multiple to-be-processed tasks assigned to the target graphics processor, where the parallel processing includes synchronous serial processing and kernel asynchronous processing.
When processing the to-be-processed task, the target graphics processor is specifically configured to: while processing the task, lock the address of the central processing unit, and transfer the data exchanged between the target graphics processor and the central processing unit through a DMA controller.
When processing the to-be-processed task, the target graphics processor is specifically configured to: while processing the task, share data among different processing threads.
When sharing data among different processing threads, the target graphics processor is specifically configured to:
when the image processing thread processes a to-be-processed task, provide the output data to the self-calibration processing thread, which processes a to-be-processed task according to the output data; or
provide the output data to the feature point tracking processing thread, which processes a to-be-processed task according to the output data; or
provide the output data to the semi-global stereo block matching processing thread, which processes a to-be-processed task according to the output data; or
provide the output data to the deep learning processing thread, which processes a to-be-processed task according to the output data.
When sharing data among different processing threads, the target graphics processor is specifically configured to: when the semi-global stereo block matching processing thread processes a to-be-processed task, provide the output data to the point cloud tracking processing thread, which processes a to-be-processed task according to the output data. When sharing data among different processing threads, the target graphics processor is specifically configured to: when the point cloud tracking processing thread processes a to-be-processed task, provide the output data to the map processing thread; when the deep learning processing thread processes a to-be-processed task, provide the output data to the map processing thread; and the map processing thread processes a to-be-processed task according to the input data provided by the point cloud tracking processing thread and the input data provided by the deep learning processing thread.
Embodiment 9:
Based on the same inventive concept as the above method, an embodiment of the present invention further provides a computer-readable storage medium. The computer-readable storage medium stores a number of computer instructions, and when the computer instructions are executed, the task processing method described above is implemented.
The systems, apparatuses, modules, or units described in the above embodiments may be implemented by a computer chip or an entity, or by a product having a certain function. A typical implementation device is a computer, which may take the form of a personal computer, a laptop computer, a cellular phone, a camera phone, a smartphone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
For convenience of description, the above apparatus is described in terms of separate units divided by function. Of course, when implementing the present invention, the functions of the units may be implemented in one or more pieces of software and/or hardware.
Those skilled in the art should understand that the embodiments of the present invention may be provided as a method, a system, or a computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the embodiments of the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, and optical storage) that contain computer-usable program code.
The present invention is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to embodiments of the present invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce an apparatus for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction apparatus, and the instruction apparatus implements the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps are performed on the computer or other programmable device to produce computer-implemented processing, such that the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
Those skilled in the art should understand that the embodiments of the present invention may be provided as a method, a system, or a computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (which may include but are not limited to disk storage, CD-ROM, and optical storage) that contain computer-usable program code.
The above are merely embodiments of the present invention and are not intended to limit the present invention. Those skilled in the art may make various modifications and variations to the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within the scope of the claims of the present invention.

Claims (51)

  1. A task processing method, applied to a device comprising multiple graphics processors, the method comprising:
    selecting a to-be-processed task from a processing queue;
    selecting, from the multiple graphics processors, at least one target graphics processor corresponding to the to-be-processed task, and assigning the to-be-processed task to the target graphics processor; and
    processing the to-be-processed task by the target graphics processor.
  2. The method according to claim 1, wherein
    the selecting a to-be-processed task from a processing queue comprises:
    if the processing queue is a static processing queue, selecting from the static processing queue a to-be-processed task whose time series and/or data have dependencies.
  3. The method according to claim 1, wherein
    the selecting a to-be-processed task from a processing queue comprises: if the processing queue is a dynamic processing queue, selecting from the dynamic processing queue a to-be-processed task whose time series and/or data are independent.
  4. The method according to claim 2 or 3, wherein
    before the selecting a to-be-processed task from a processing queue, the method further comprises:
    receiving a to-be-processed task;
    if the to-be-processed task is a static scheduling task whose time series and/or data have dependencies, caching the to-be-processed task into the static processing queue; and
    if the to-be-processed task is a dynamic scheduling task whose time series and/or data are independent, caching the to-be-processed task into the dynamic processing queue.
  5. The method according to claim 2 or 3, wherein
    before the selecting a to-be-processed task from a processing queue, the method further comprises:
    receiving a to-be-processed task and caching the to-be-processed task into a task queue;
    for a to-be-processed task in the task queue, if the to-be-processed task is a static scheduling task whose time series and/or data have dependencies, caching the to-be-processed task into the static processing queue; and
    if the to-be-processed task is a dynamic scheduling task whose time series and/or data are independent, caching the to-be-processed task into the dynamic processing queue.
  6. The method according to claim 1, wherein
    the selecting a to-be-processed task from a processing queue comprises:
    obtaining the priorities of the to-be-processed tasks in the processing queue; and
    based on the priorities, preferentially selecting a high-priority to-be-processed task from the processing queue.
  7. The method according to claim 6, wherein
    the obtaining the priorities of the to-be-processed tasks in the processing queue comprises:
    obtaining the task type of a to-be-processed task in the processing queue; and
    querying a mapping table by the task type to obtain the priority corresponding to the task type;
    wherein the mapping table records the correspondence between task types and priorities.
  8. The method according to claim 1, wherein the selecting, from the multiple graphics processors, at least one target graphics processor corresponding to the to-be-processed task comprises:
    selecting, from the multiple graphics processors, the at least one target graphics processor corresponding to the to-be-processed task according to state information of the to-be-processed task and/or resource information of the graphics processors.
  9. The method according to claim 8, wherein before the selecting, from the multiple graphics processors, the at least one target graphics processor corresponding to the to-be-processed task according to the state information of the to-be-processed task and/or the resource information of the graphics processors, the method further comprises:
    obtaining the task type of the to-be-processed task;
    querying a state information table by the task type to obtain the state information corresponding to the task type, wherein the state information table records the correspondence between task types and state information; and
    determining the obtained state information as the state information of the to-be-processed task.
  10. The method according to claim 9, wherein before the querying a state information table by the task type to obtain the state information corresponding to the task type, the method further comprises:
    obtaining the state information of a to-be-processed task whose processing has been completed, and recording in the state information table the correspondence between the task type of that to-be-processed task and its state information.
  11. The method according to claim 8, wherein the state information comprises a task processing time, the task processing time being the difference between the task completion time and the task reception time.
  12. 根据权利要求8所述的方法,其特征在于,所述根据所述待处理任务的状态信息和/或图形处理器的资源信息,从所述多个图形处理器中选择与所述待处理任务对应的至少一个目标图形处理器之前,还包括:The method according to claim 8, wherein the selecting and the to-be-processed task are selected from the plurality of graphics processors according to status information of the to-be-processed task and/or resource information of a graphics processor Before corresponding to at least one target graphics processor, the method further includes:
    获取图形处理器的资源信息;其中,所述资源信息包括:空闲处理线程的数量;和/或,处理线程的状态;所述状态为占用状态或者空闲状态。Obtaining resource information of the graphics processor; wherein the resource information includes: a number of idle processing threads; and/or a state of processing a thread; the state is an occupied state or an idle state.
  13. 根据权利要求8所述的方法,其特征在于,所述根据所述待处理任务的状态信息和/或图形处理器的资源信息,从所述多个图形处理器中选择与所述待处理任务对应的至少一个目标图形处理器,包括:The method according to claim 8, wherein the selecting and the to-be-processed task are selected from the plurality of graphics processors according to status information of the to-be-processed task and/or resource information of a graphics processor Corresponding at least one target graphics processor, including:
    根据所述状态信息确定任务处理时间,针对任务处理时间大于时间阈值的多个待处理任务,为不同的待处理任务选择不同的目标图形处理器。The task processing time is determined according to the state information, and for the plurality of to-be-processed tasks whose task processing time is greater than the time threshold, different target graphics processors are selected for different to-be-processed tasks.
  14. The method according to claim 8, wherein selecting, from the plurality of graphics processors, at least one target graphics processor corresponding to the to-be-processed task according to the status information of the to-be-processed task and/or the resource information of the graphics processors comprises:
    determining the number of idle processing threads according to the resource information, and selecting, from the plurality of graphics processors, the graphics processor with the most idle processing threads as the target graphics processor.
  15. The method according to claim 8, wherein selecting, from the plurality of graphics processors, at least one target graphics processor corresponding to the to-be-processed task according to the status information of the to-be-processed task and/or the resource information of the graphics processors comprises:
    determining an idle processing thread according to the resource information, and determining the graphics processor corresponding to the idle processing thread as the target graphics processor corresponding to the to-be-processed task.
  16. The method according to claim 1, wherein selecting, from the plurality of graphics processors, at least one target graphics processor corresponding to the to-be-processed task comprises:
    for a plurality of to-be-processed tasks that are processed in parallel, selecting a corresponding target graphics processor for each of the plurality of to-be-processed tasks, wherein the target graphics processors corresponding to different to-be-processed tasks are the same or different.
  17. The method according to claim 1, wherein selecting, from the plurality of graphics processors, at least one target graphics processor corresponding to the to-be-processed task comprises:
    if the processing queue is a static processing queue, querying, among the plurality of graphics processors, the graphics processor corresponding to the static processing queue, and determining the queried graphics processor as the target graphics processor corresponding to the to-be-processed task.
  18. The method according to claim 1, wherein, while the to-be-processed task is being processed by the target graphics processor, the method further comprises:
    if there is a to-be-processed task with a higher priority than the to-be-processed task, interrupting the to-be-processed task and processing the higher-priority to-be-processed task by the target graphics processor; and
    after processing of the higher-priority to-be-processed task is completed, resuming the to-be-processed task.
  19. The method according to claim 1, wherein, while the to-be-processed task is being processed by the target graphics processor, the method further comprises:
    if an exception occurs in the to-be-processed task, interrupting the to-be-processed task, raising the priority of the to-be-processed task, and caching the to-be-processed task into the processing queue.
  20. The method according to claim 1, wherein processing the to-be-processed task by the target graphics processor comprises:
    processing in parallel, by the target graphics processor, a plurality of to-be-processed tasks allocated to the target graphics processor, wherein the parallel processing comprises synchronous serial processing and kernel asynchronous processing.
  21. The method according to claim 1, wherein processing the to-be-processed task by the target graphics processor comprises:
    when processing the to-be-processed task, latching an address of the central processing unit, and transmitting, through a DMA controller, the interaction data between the target graphics processor and the central processing unit.
  22. The method according to claim 1, wherein processing the to-be-processed task by the target graphics processor comprises:
    when processing the to-be-processed task, sharing data through different processing threads.
  23. The method according to claim 22, wherein sharing data through different processing threads comprises:
    when an image processing thread processes a to-be-processed task, providing output data to a self-calibration processing thread, the self-calibration processing thread processing a to-be-processed task according to the output data; or
    providing the output data to a feature point tracking processing thread, the feature point tracking processing thread processing a to-be-processed task according to the output data; or
    providing the output data to a semi-global stereo block matching processing thread, the semi-global stereo block matching processing thread processing a to-be-processed task according to the output data; or
    providing the output data to a deep learning processing thread, the deep learning processing thread processing a to-be-processed task according to the output data.
  24. The method according to claim 22, wherein sharing data through different processing threads comprises: when a semi-global stereo block matching processing thread processes a to-be-processed task, providing output data to a point cloud tracking processing thread, the point cloud tracking processing thread processing a to-be-processed task according to the output data.
  25. The method according to claim 22, wherein sharing data through different processing threads comprises:
    when a point cloud tracking processing thread processes a to-be-processed task, providing output data to a map processing thread; when a deep learning processing thread processes a to-be-processed task, providing output data to the map processing thread; and the map processing thread processing a to-be-processed task according to the input data provided by the point cloud tracking processing thread and the input data provided by the deep learning processing thread.
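The selection rules of claims 13 and 14 can be illustrated with a minimal Python sketch. It is not taken from the application: the names `Gpu`, `Task`, `pick_most_idle` and `spread_long_tasks` are hypothetical, ranking the graphics processors by idle threads when spreading long tasks is an added assumption, and the claims do not say what happens when there are more long-running tasks than graphics processors, so the sketch simply raises an error in that case.

```python
from dataclasses import dataclass


@dataclass
class Gpu:
    name: str
    idle_threads: int  # resource information: number of idle processing threads


@dataclass
class Task:
    name: str
    processing_time_ms: float  # status information: completion time minus receiving time


def pick_most_idle(gpus):
    """Claim 14 (sketch): choose the graphics processor with the most idle threads."""
    return max(gpus, key=lambda g: g.idle_threads)


def spread_long_tasks(tasks, gpus, threshold_ms):
    """Claim 13 (sketch): give each task above the time threshold a different GPU."""
    long_tasks = [t for t in tasks if t.processing_time_ms > threshold_ms]
    if len(long_tasks) > len(gpus):
        # Assumption: the claims leave this case open, so the sketch refuses it.
        raise ValueError("more long-running tasks than graphics processors")
    # Assumption: hand out the GPUs with the most idle threads first.
    ranked = sorted(gpus, key=lambda g: g.idle_threads, reverse=True)
    return {t.name: g.name for t, g in zip(long_tasks, ranked)}
```

With three graphics processors and two tasks above the threshold, `spread_long_tasks` returns a mapping in which the two long tasks land on different graphics processors, while short tasks are left for another rule such as `pick_most_idle`.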
  26. A task processing device, comprising a scheduler and a plurality of graphics processors, wherein:
    the scheduler is configured to select a to-be-processed task from a processing queue, select, from the plurality of graphics processors, at least one target graphics processor corresponding to the to-be-processed task, and allocate the to-be-processed task to the target graphics processor; and
    the target graphics processor is configured to process the to-be-processed task.
  27. The device according to claim 26, wherein, when selecting the to-be-processed task from the processing queue, the scheduler is specifically configured to: if the processing queue is a static processing queue, select from the static processing queue a to-be-processed task whose time sequence and/or data have dependencies.
  28. The device according to claim 26, wherein, when selecting the to-be-processed task from the processing queue, the scheduler is specifically configured to: if the processing queue is a dynamic processing queue, select from the dynamic processing queue a to-be-processed task whose time sequence and/or data are independent.
  29. The device according to claim 27 or 28, wherein
    the scheduler is further configured to: receive a to-be-processed task; if the to-be-processed task is a static scheduling task whose time sequence and/or data have dependencies, cache the to-be-processed task into a static processing queue; and if the to-be-processed task is a dynamic scheduling task whose time sequence and/or data are independent, cache the to-be-processed task into a dynamic processing queue.
  30. The device according to claim 27 or 28, wherein the scheduler is further configured to: receive a to-be-processed task and cache the to-be-processed task into a task queue; and, for a to-be-processed task in the task queue, cache the to-be-processed task into a static processing queue if it is a static scheduling task whose time sequence and/or data have dependencies, or cache it into a dynamic processing queue if it is a dynamic scheduling task whose time sequence and/or data are independent.
  31. The device according to claim 26, wherein, when selecting the to-be-processed task from the processing queue, the scheduler is specifically configured to: acquire the priorities of the to-be-processed tasks in the processing queue; and preferentially select, based on the priorities, a high-priority to-be-processed task from the processing queue.
  32. The device according to claim 31, wherein, when acquiring the priorities of the to-be-processed tasks in the processing queue, the scheduler is specifically configured to: acquire the task type of a to-be-processed task in the processing queue; and query a mapping table by the task type to obtain the priority corresponding to the task type, wherein the mapping table records the correspondence between task types and priorities.
  33. The device according to claim 26, wherein, when selecting, from the plurality of graphics processors, at least one target graphics processor corresponding to the to-be-processed task, the scheduler is specifically configured to: select, from the plurality of graphics processors, at least one target graphics processor corresponding to the to-be-processed task according to the status information of the to-be-processed task and/or the resource information of the graphics processors.
  34. The device according to claim 33, wherein the scheduler is further configured to: acquire the task type of the to-be-processed task; query a status information table by the task type to obtain the status information corresponding to the task type, wherein the status information table records the correspondence between task types and status information; and determine the obtained status information as the status information of the to-be-processed task.
  35. The device according to claim 34, further comprising:
    a monitor configured to acquire the status information of a to-be-processed task whose processing has been completed, record in the status information table the correspondence between the task type of that task and its status information, and send the status information table to the scheduler.
  36. The device according to claim 33, wherein the status information comprises a task processing time, the task processing time being the difference between the task completion time and the task receiving time.
  37. The device according to claim 33, further comprising a monitor configured to acquire the resource information of the graphics processors and send the resource information of the graphics processors to the scheduler;
    the scheduler being further configured to acquire the resource information of the graphics processors;
    wherein the resource information comprises the number of idle processing threads and/or the state of a processing thread, the state being an occupied state or an idle state.
  38. The device according to claim 33, wherein, when selecting, from the plurality of graphics processors, at least one target graphics processor corresponding to the to-be-processed task according to the status information of the to-be-processed task and/or the resource information of the graphics processors, the scheduler is specifically configured to:
    determine a task processing time according to the status information and, for a plurality of to-be-processed tasks whose task processing time is greater than a time threshold, select different target graphics processors for different to-be-processed tasks.
  39. The device according to claim 33, wherein, when selecting, from the plurality of graphics processors, at least one target graphics processor corresponding to the to-be-processed task according to the status information of the to-be-processed task and/or the resource information of the graphics processors, the scheduler is specifically configured to:
    determine the number of idle processing threads according to the resource information, and select, from the plurality of graphics processors, the graphics processor with the most idle processing threads as the target graphics processor.
  40. The device according to claim 33, wherein, when selecting, from the plurality of graphics processors, at least one target graphics processor corresponding to the to-be-processed task according to the status information of the to-be-processed task and/or the resource information of the graphics processors, the scheduler is specifically configured to:
    determine an idle processing thread according to the resource information, and determine the graphics processor corresponding to the idle processing thread as the target graphics processor corresponding to the to-be-processed task.
  41. The device according to claim 26, wherein, when selecting, from the plurality of graphics processors, at least one target graphics processor corresponding to the to-be-processed task, the scheduler is specifically configured to: for a plurality of to-be-processed tasks that are processed in parallel, select a corresponding target graphics processor for each of the plurality of to-be-processed tasks, wherein the target graphics processors corresponding to different to-be-processed tasks are the same or different.
  42. The device according to claim 26, wherein,
    when selecting, from the plurality of graphics processors, at least one target graphics processor corresponding to the to-be-processed task, the scheduler is specifically configured to: if the processing queue is a static processing queue, query, among the plurality of graphics processors, the graphics processor corresponding to the static processing queue, and determine the queried graphics processor as the target graphics processor corresponding to the to-be-processed task.
  43. The device according to claim 26, wherein, when processing the to-be-processed task, the target graphics processor is specifically configured to: if there is a to-be-processed task with a higher priority than the to-be-processed task, interrupt the to-be-processed task and process the higher-priority to-be-processed task; and, after processing of the higher-priority to-be-processed task is completed, resume the to-be-processed task.
  44. The device according to claim 26, wherein,
    when processing the to-be-processed task, the target graphics processor is specifically configured to: if an exception occurs in the to-be-processed task, interrupt the to-be-processed task, raise the priority of the to-be-processed task, and cache the to-be-processed task into the processing queue.
  45. The device according to claim 26, wherein, when processing the to-be-processed task, the target graphics processor is specifically configured to: process in parallel a plurality of to-be-processed tasks allocated to the target graphics processor, wherein the parallel processing comprises synchronous serial processing and kernel asynchronous processing.
  46. The device according to claim 26, wherein,
    when processing the to-be-processed task, the target graphics processor is specifically configured to: latch an address of the central processing unit, and transmit, through a DMA controller, the interaction data between the target graphics processor and the central processing unit.
  47. The device according to claim 26, wherein,
    when processing the to-be-processed task, the target graphics processor is specifically configured to: share data through different processing threads.
  48. The device according to claim 47, wherein,
    when sharing data through different processing threads, the target graphics processor is specifically configured to:
    when an image processing thread processes a to-be-processed task, provide output data to a self-calibration processing thread, the self-calibration processing thread processing a to-be-processed task according to the output data; or
    provide the output data to a feature point tracking processing thread, the feature point tracking processing thread processing a to-be-processed task according to the output data; or
    provide the output data to a semi-global stereo block matching processing thread, the semi-global stereo block matching processing thread processing a to-be-processed task according to the output data; or
    provide the output data to a deep learning processing thread, the deep learning processing thread processing a to-be-processed task according to the output data.
  49. The device according to claim 47, wherein, when sharing data through different processing threads, the target graphics processor is specifically configured to: when a semi-global stereo block matching processing thread processes a to-be-processed task, provide output data to a point cloud tracking processing thread, the point cloud tracking processing thread processing a to-be-processed task according to the output data.
  50. The device according to claim 47, wherein,
    when sharing data through different processing threads, the target graphics processor is specifically configured to:
    when a point cloud tracking processing thread processes a to-be-processed task, provide output data to a map processing thread; when a deep learning processing thread processes a to-be-processed task, provide output data to the map processing thread; the map processing thread processing a to-be-processed task according to the input data provided by the point cloud tracking processing thread and the input data provided by the deep learning processing thread.
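The scheduler behaviour described in claims 29, 31, 32 and 44 (routing received tasks to a static or dynamic processing queue, selecting by priority through a task-type mapping table, and re-queuing a failed task with a raised priority) might be sketched in Python as follows. Everything named here is hypothetical: the `Scheduler` class, the entries of `PRIORITY_BY_TYPE`, the convention that a lower number means a higher priority, and raising the priority by exactly one step are illustrative assumptions, not details from the application.

```python
import heapq
import itertools

# Hypothetical mapping table: task type -> priority (lower number = higher priority).
PRIORITY_BY_TYPE = {"obstacle_avoidance": 0, "feature_tracking": 1, "map_update": 2}


class Scheduler:
    def __init__(self):
        self.static_queue = []   # tasks whose time sequence and/or data have dependencies
        self.dynamic_queue = []  # tasks whose time sequence and/or data are independent
        self._seq = itertools.count()  # tie-breaker: FIFO order within one priority

    def receive(self, task_type, name, has_dependency):
        """Claim 29 (sketch): cache the task into the static or the dynamic queue."""
        queue = self.static_queue if has_dependency else self.dynamic_queue
        priority = PRIORITY_BY_TYPE[task_type]  # claim 32: look up priority by task type
        heapq.heappush(queue, (priority, next(self._seq), name))

    def select(self, queue):
        """Claim 31 (sketch): take the highest-priority task from a queue."""
        priority, _, name = heapq.heappop(queue)
        return priority, name

    def requeue_after_exception(self, queue, priority, name):
        """Claim 44 (sketch): raise the priority of a failed task and cache it again."""
        raised = max(priority - 1, 0)  # assumption: raise by one level, floor at 0
        heapq.heappush(queue, (raised, next(self._seq), name))
```

A heap keyed on `(priority, sequence)` is only one way to obtain priority-first selection; for the static queue, a real scheduler would also have to respect the dependency order between tasks, which this sketch does not model.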
  51. A computer-readable storage medium storing a plurality of computer instructions which, when executed, implement the task processing method according to any one of claims 1 to 25.
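The data sharing between processing threads described in claims 23 to 25 can be pictured as a small pipeline: the image processing thread feeds the semi-global stereo block matching thread and the deep learning thread, the block matching output feeds point cloud tracking, and the map stage combines the point cloud and deep learning outputs. The Python sketch below is only an illustration under stated assumptions: ordinary CPU threads and `queue.Queue` objects stand in for the processing threads of a graphics processor, the stage functions are placeholder string transforms, the map stage runs in the calling thread, and the names `stage` and `run_pipeline` are hypothetical.

```python
import queue
import threading


def stage(fn, inbox, outboxes):
    """Run fn on each item from inbox and hand the output to every downstream queue."""
    def run():
        while True:
            item = inbox.get()
            if item is None:  # sentinel: pass the shutdown signal downstream
                for out in outboxes:
                    out.put(None)
                return
            result = fn(item)
            for out in outboxes:
                out.put(result)
    thread = threading.Thread(target=run)
    thread.start()
    return thread


def run_pipeline(frames):
    to_image, to_sgbm, to_dl = queue.Queue(), queue.Queue(), queue.Queue()
    to_cloud, cloud_to_map, dl_to_map = queue.Queue(), queue.Queue(), queue.Queue()
    threads = [
        stage(lambda f: f"img({f})", to_image, [to_sgbm, to_dl]),   # image processing
        stage(lambda x: f"sgbm({x})", to_sgbm, [to_cloud]),         # stereo block matching
        stage(lambda x: f"cloud({x})", to_cloud, [cloud_to_map]),   # point cloud tracking
        stage(lambda x: f"dl({x})", to_dl, [dl_to_map]),            # deep learning
    ]
    for frame in frames:
        to_image.put(frame)
    to_image.put(None)
    maps = []
    while True:  # map stage: combine one point cloud result with one deep learning result
        cloud, labels = cloud_to_map.get(), dl_to_map.get()
        if cloud is None or labels is None:
            break
        maps.append((cloud, labels))
    for thread in threads:
        thread.join()
    return maps
```

Because each queue has a single producer and a single consumer, results stay in frame order, so the map stage can pair the two inputs for the same frame by simply reading one item from each queue.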
PCT/CN2018/080970 2018-03-28 2018-03-28 Method, device, and machine readable storage medium for task processing WO2019183861A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/CN2018/080970 WO2019183861A1 (en) 2018-03-28 2018-03-28 Method, device, and machine readable storage medium for task processing
CN201880012037.2A CN110494848A (en) 2018-03-28 2018-03-28 Task processing method, equipment and machine readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2018/080970 WO2019183861A1 (en) 2018-03-28 2018-03-28 Method, device, and machine readable storage medium for task processing

Publications (1)

Publication Number Publication Date
WO2019183861A1 true WO2019183861A1 (en) 2019-10-03

Family ID=68062482

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/080970 WO2019183861A1 (en) 2018-03-28 2018-03-28 Method, device, and machine readable storage medium for task processing

Country Status (2)

Country Link
CN (1) CN110494848A (en)
WO (1) WO2019183861A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112015553A (en) * 2020-08-27 2020-12-01 深圳壹账通智能科技有限公司 Data processing method, device, equipment and medium based on machine learning model
CN113342493A (en) * 2021-06-15 2021-09-03 上海哔哩哔哩科技有限公司 Task execution method and device and computer equipment

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11954518B2 (en) * 2019-12-20 2024-04-09 Nvidia Corporation User-defined metered priority queues
CN111190712A (en) * 2019-12-25 2020-05-22 北京推想科技有限公司 Task scheduling method, device, equipment and medium
CN111209112A (en) * 2019-12-31 2020-05-29 杭州迪普科技股份有限公司 Exception handling method and device
CN111625358B (en) * 2020-05-25 2023-06-20 浙江大华技术股份有限公司 Resource allocation method and device, electronic equipment and storage medium
CN111694648B (en) * 2020-06-09 2023-08-15 阿波罗智能技术(北京)有限公司 Task scheduling method and device and electronic equipment
CN111708639A (en) * 2020-06-22 2020-09-25 中国科学技术大学 Task scheduling system and method, storage medium and electronic device
CN111858078A (en) * 2020-07-15 2020-10-30 江门市俐通环保科技有限公司 High-concurrency data erasing method, device, equipment and storage medium
CN112214020A (en) * 2020-09-23 2021-01-12 北京特种机械研究所 Method and device for establishing task framework and processing tasks of AGV (automatic guided vehicle) scheduling system
CN115955550B (en) * 2023-03-15 2023-06-27 浙江宇视科技有限公司 Image analysis method and system for GPU cluster
CN116954954A (en) * 2023-09-20 2023-10-27 摩尔线程智能科技(北京)有限责任公司 Method and device for processing multi-task queues, storage medium and electronic equipment

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110161675A1 (en) * 2009-12-30 2011-06-30 Nvidia Corporation System and method for gpu based encrypted storage access
CN107122243A (en) * 2017-04-12 2017-09-01 杭州远算云计算有限公司 Heterogeneous Cluster Environment and CFD computational methods for CFD simulation calculations
CN107273331A (en) * 2017-06-30 2017-10-20 山东超越数控电子有限公司 A kind of heterogeneous computing system and method based on CPU+GPU+FPGA frameworks

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104156264B (en) * 2014-08-01 2017-10-10 西北工业大学 A kind of base band signal process tasks in parallel real-time scheduling method based on many GPU
US20170300361A1 (en) * 2016-04-15 2017-10-19 Intel Corporation Employing out of order queues for better gpu utilization

Also Published As

Publication number Publication date
CN110494848A (en) 2019-11-22

Similar Documents

Publication Publication Date Title
WO2019183861A1 (en) Method, device, and machine readable storage medium for task processing
US8144149B2 (en) System and method for dynamically load balancing multiple shader stages in a shared pool of processing units
US10140157B2 (en) Multiple process scheduling of threads using process queues
US9176794B2 (en) Graphics compute process scheduling
JP5722327B2 (en) Hardware based scheduling of GPU work
KR101788267B1 (en) Optimizing communication of system call requests
US20070091088A1 (en) System and method for managing the computation of graphics shading operations
US9176795B2 (en) Graphics processing dispatch from user mode
US20130212594A1 (en) Method of optimizing performance of hierarchical multi-core processor and multi-core processor system for performing the method
US20120229481A1 (en) Accessibility of graphics processing compute resources
US8743131B2 (en) Course grain command buffer
US20150163324A1 (en) Approach to adaptive allocation of shared resources in computer systems
CN111309649B (en) Data transmission and task processing method, device and equipment
US20150134884A1 (en) Method and system for communicating with non-volatile memory
US9104491B2 (en) Batch scheduler management of speculative and non-speculative tasks based on conditions of tasks and compute resources
US20220114120A1 (en) Image processing accelerator
US20150134883A1 (en) Method and system for communicating with non-volatile memory via multiple data paths
US20130219386A1 (en) Dynamic allocation of compute resources
CN114647527A (en) Measuring and detecting idle processing periods and determining root causes thereof in cloud-based streaming applications
JP2023519405A (en) Method and task scheduler for scheduling hardware accelerators
CN114816777A (en) Command processing device, method, electronic device and computer readable storage medium
US10095408B2 (en) Reducing negative effects of insufficient data throughput for real-time processing
EP3752262B1 (en) Asynchronous camera frame allocation
US20130262834A1 (en) Hardware Managed Ordered Circuit
CN112114967B (en) GPU resource reservation method based on service priority

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 18912728; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 18912728; Country of ref document: EP; Kind code of ref document: A1)