CN115880131A - Driving region heterogeneous calculation acceleration method, system, device and medium - Google Patents


Info

Publication number
CN115880131A
Authority
CN
China
Prior art keywords
image
preprocessing
module
inputting
dvpp
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211698514.XA
Other languages
Chinese (zh)
Inventor
宋嘉文
薛壮壮
鄂贵
袁宝煜
车启谣
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hozon New Energy Automobile Co Ltd
Original Assignee
Hozon New Energy Automobile Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hozon New Energy Automobile Co Ltd filed Critical Hozon New Energy Automobile Co Ltd
Priority to CN202211698514.XA priority Critical patent/CN115880131A/en
Publication of CN115880131A publication Critical patent/CN115880131A/en
Pending legal-status Critical Current


Classifications

    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D - CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 - Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Image Processing (AREA)

Abstract

A drivable-region heterogeneous computing acceleration method, system, device and medium are realized by time-staggered pipeline parallel computation over multi-path fisheye images. The method comprises the following steps: inputting at least one first image into a digital visual preprocessing module DVPP for processing at a first moment; inputting the first image processed by the DVPP into an artificial intelligence preprocessing module AIPP for processing at a second moment, and inputting at least one second image into the DVPP for processing; inputting the AIPP-processed first image into a neural network inference module AICore for calculation at a third moment, inputting the DVPP-processed second image into the AIPP for processing, and inputting at least one third image into the DVPP for processing; inputting the AICore-processed first image into a CPU module for post-processing calculation at a fourth moment to obtain a first image task processing result, inputting the AIPP-processed second image into the AICore for calculation, inputting the DVPP-processed third image into the AIPP for processing, and inputting at least one fourth image into the DVPP for processing; and, after the CPU module has processed all the images, obtaining the task processing results and the travelable area of the vehicle in the images.

Description

Driving region heterogeneous calculation acceleration method, system, device and medium
Technical Field
The application belongs to the technical field of methods for accelerating heterogeneous computation of drivable regions, and particularly relates to a method, a system, equipment and a medium for accelerating heterogeneous computation of drivable regions.
Background
At present, the travelable-region algorithm is widely applied in assisted parking systems. Image processing tasks on an existing mobile data center MDC are generally performed by computing the processing steps serially, which in practice raises the computation peak of a single computing unit while leaving the other computing units idle during processing, so that the utilization rate of the computing units is low.
The travelable-region algorithm in automatic driving technology needs to process the input images of multiple fisheye cameras to obtain a travelable area, and it places a high real-time requirement on input image processing. The original travelable-region algorithm applied in assisted parking systems is mainly based on serial CPU and GPU computation over the fisheye camera images. Such serial computation does not exploit the diversity of heterogeneous computing hardware or the multi-camera characteristic of the fisheye cameras in the parking task for acceleration, so computing resources are used in a concentrated manner within a single computing unit with high delay, and the requirement of actually processing multi-path images cannot be met.
In view of this, it is necessary to provide a method, system, device and medium for accelerating heterogeneous computation of a drivable area, so as to solve the problems of concentrated use of computing resources and high delay that arise when the drivable area is obtained by serial CPU and GPU computation over the fisheye camera images without exploiting the diversity of heterogeneous computing hardware and the multi-camera characteristic of the fisheye cameras in the parking task.
Disclosure of Invention
The application provides a driving-capable area heterogeneous computing acceleration method, a system, equipment and a medium, and solves the problems of centralized computing resource use and high delay existing when a driving-capable area is obtained through serial computing based on a fish-eye camera CPU and a GPU. According to the method and the device, heterogeneous acceleration is performed by utilizing the diversity of heterogeneous computing hardware and the multipath characteristics of the fisheye cameras in the parking task, the time for processing the input images is shortened, and the time efficiency of the input image processing is improved.
The purpose of the application and the technical problem to be solved are realized by adopting the following technical scheme.
The application provides a driving-capable area heterogeneous computing acceleration method, which is realized by time-staggered pipeline parallel computation of multiple paths of fisheye images, and comprises the following steps:
at a first moment, inputting at least one first image in the multi-path fisheye images into a digital visual preprocessing module DVPP for image preprocessing;
at the second moment, inputting the first image processed by the digital visual preprocessing module DVPP into an artificial intelligence preprocessing module AIPP for artificial intelligence preprocessing, and inputting at least one second image in the multi-path fisheye images into the digital visual preprocessing module DVPP for image preprocessing;
at the third moment, inputting the first image processed by the artificial intelligence preprocessing module AIPP into a neural network inference module AICore for neural network inference calculation, inputting the second image processed by the digital vision preprocessing module DVPP into the artificial intelligence preprocessing module AIPP for artificial intelligence preprocessing, and inputting at least one third image in the multi-path fisheye images into the digital vision preprocessing module DVPP for image preprocessing;
at the fourth moment, inputting a first image processed by the neural network inference module AICore into a Central Processing Unit (CPU) module for post-processing calculation to obtain a first image task processing result, inputting a second image processed by the artificial intelligence preprocessing module AIPP into the neural network inference module AICore for neural network inference calculation, inputting a third image processed by the digital vision preprocessing module DVPP into the artificial intelligence preprocessing module AIPP for artificial intelligence preprocessing, and inputting at least one fourth image in the multi-path fisheye images into the digital vision preprocessing module DVPP for image preprocessing;
and obtaining a plurality of multi-path fisheye image task processing results after the CPU module post-processes all the multi-path fisheye images, and obtaining a travelable area of the vehicle in the image based on the multi-path fisheye image task processing results.
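The staggered schedule in the steps above can be sketched with a minimal model (not from the patent; stage and function names are illustrative): one image advances one stage per "moment", so once the pipeline fills, all four heterogeneous compute cores are busy simultaneously.

```python
# Minimal sketch of the staggered four-stage pipeline described above.
# Stage names follow the document (DVPP, AIPP, AICore, CPU); each
# image advances exactly one stage per "moment".

STAGES = ["DVPP", "AIPP", "AICore", "CPU"]

def stage_at(image_index: int, moment: int):
    """Return the stage that image `image_index` (1-based) occupies at
    `moment` (1-based), or None if it has not entered the pipeline yet
    or has already finished."""
    step = moment - image_index  # 0 = DVPP, ..., 3 = CPU
    if 0 <= step < len(STAGES):
        return STAGES[step]
    return None

# At the fourth moment the first image reaches CPU post-processing
# while images 2-4 occupy AICore, AIPP and DVPP respectively.
print([stage_at(i, 4) for i in (1, 2, 3, 4)])
# → ['CPU', 'AICore', 'AIPP', 'DVPP']
```

At every moment from the fourth onward (until the last image drains out), each of the four cores holds a different image, which is exactly the "no core idles" property the method claims.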
Optionally, the digital visual preprocessing module DVPP performs the image preprocessing, the artificial intelligence preprocessing module AIPP performs the artificial intelligence preprocessing, and the neural network inference module AICore performs the neural network inference calculation based on a graphics processing unit GPU module.
Optionally, the GPU module of the graphics processor has at least one GPU chip; the GPU chip is used for the digital visual preprocessing module DVPP to carry out image preprocessing, the artificial intelligence preprocessing module AIPP to carry out artificial intelligence preprocessing and the neural network inference module AICore to carry out neural network inference calculation;
the CPU module of the central processing unit is at least provided with one CPU chip, and the CPU chip is used for post-processing calculation and obtaining an image task processing result.
Optionally, the GPU chip and the CPU chip are both integrated multi-core chips, and both may use a multi-thread and/or multi-process manner to perform image processing and calculation.
Optionally, the GPU module of the graphics processor has at least three GPU chips, and the CPU module of the central processing unit has at least one CPU chip;
at least a first GPU chip is used for the digital visual preprocessing module DVPP to carry out image preprocessing, at least a second GPU chip is used for the artificial intelligence preprocessing module AIPP to carry out artificial intelligence preprocessing, and at least a third GPU chip is used for the neural network inference module AICore to carry out neural network inference calculation.
Optionally, a first processing duration for the digital visual preprocessing module DVPP to perform image preprocessing, a second processing duration for the artificial intelligence preprocessing module AIPP to perform artificial intelligence preprocessing, and a third processing duration for the neural network inference module AICore to perform neural network inference calculation are compared, and images with shorter processing durations are processed with priority.
Optionally, the multi-path fisheye images are four images acquired by four fisheye cameras, and the number of image paths matches the number of pipelines computed in parallel.
Optionally, multiple image frames marked with different timestamps are respectively assigned to the multiple pipelines for parallel computation; the assignment of the image frames is performed by the central processing unit CPU module in a multi-process and/or multi-thread manner.
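A hedged sketch of this CPU-side frame distribution follows (the pipeline count, queue layout and worker names are illustrative assumptions, not from the patent): timestamped frames are dispatched round-robin to worker threads, one per pipeline, matching the "number of image paths equals number of pipelines" rule.

```python
# Sketch: distributing timestamped frames to parallel pipelines from
# the CPU side using threads and queues (illustrative, not the
# patent's implementation).
import queue
import threading

NUM_PIPELINES = 4
pipelines = [queue.Queue() for _ in range(NUM_PIPELINES)]
results = queue.Queue()

def worker(pipe_id: int, inbox: "queue.Queue"):
    while True:
        frame = inbox.get()
        if frame is None:       # poison pill → shut this pipeline down
            break
        ts, _data = frame
        # ... DVPP → AIPP → AICore would run here on a GPU chip ...
        results.put((pipe_id, ts))

threads = [threading.Thread(target=worker, args=(i, q))
           for i, q in enumerate(pipelines)]
for t in threads:
    t.start()

# Round-robin frames with distinct timestamps onto the pipelines.
for ts in range(8):
    pipelines[ts % NUM_PIPELINES].put((ts, b"frame-bytes"))
for q in pipelines:
    q.put(None)
for t in threads:
    t.join()

print(results.qsize())  # → 8
```

A multi-process variant would swap `threading.Thread` for `multiprocessing.Process` and `queue.Queue` for `multiprocessing.Queue`; the dispatch logic is unchanged.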
The application also provides a driving-capable region heterogeneous computing acceleration system, which is realized by time-flow parallel computing based on multi-path fisheye images, and the driving-capable region heterogeneous computing acceleration system comprises:
the digital visual preprocessing module DVPP is used for inputting at least one first image in the multi-path fisheye images into the digital visual preprocessing module DVPP for image preprocessing at a first moment;
the artificial intelligence preprocessing module AIPP is used for inputting the first image processed by the digital vision preprocessing module DVPP into the artificial intelligence preprocessing module AIPP for artificial intelligence preprocessing at the second moment and inputting at least one second image in the multi-path fisheye images into the digital vision preprocessing module DVPP for image preprocessing at the same time;
the neural network inference module AICore is used for inputting the first image processed by the artificial intelligence preprocessing module AIPP into the neural network inference module AICore for neural network inference calculation at a third moment, inputting the second image processed by the digital vision preprocessing module DVPP into the artificial intelligence preprocessing module AIPP for artificial intelligence preprocessing at the same time, and inputting at least one third image in the multi-path fisheye images into the digital vision preprocessing module DVPP for image preprocessing;
the central processing unit CPU module is used for, at a fourth moment, inputting a first image processed by the neural network inference module AICore into the central processing unit CPU module for post-processing calculation to obtain a first image task processing result, inputting a second image processed by the artificial intelligence preprocessing module AIPP into the neural network inference module AICore for neural network inference calculation, inputting a third image processed by the digital vision preprocessing module DVPP into the artificial intelligence preprocessing module AIPP for artificial intelligence preprocessing, and inputting at least one fourth image in the multi-path fisheye images into the digital vision preprocessing module DVPP for image preprocessing;
and the travelable region acquisition module is used for obtaining a plurality of multipath fisheye image task processing results after the central processing unit CPU module carries out post-processing on all the multipath fisheye images, and acquiring travelable regions of vehicles in the images based on the multipath fisheye image task processing results.
The present application also provides an electronic device, including:
a memory for storing non-transitory computer readable instructions; and
a processor for executing the computer readable instructions such that the computer readable instructions, when executed by the processor, implement the method described above.
The present application also provides a computer-readable storage medium comprising computer instructions which, when run on a device, cause the device to perform the method described above.
Compared with the prior art, the method has obvious advantages and beneficial effects. By means of the technical scheme, the method and the device have at least one of the following advantages and beneficial effects:
1. The application provides a drivable-region heterogeneous computation acceleration method, which is realized by time-staggered pipeline parallel computation of multi-path fisheye images, and comprises the following steps: at a first moment, inputting at least one first image in the multi-path fisheye images into a digital visual preprocessing module DVPP for image preprocessing; at a second moment, inputting the first image processed by the digital visual preprocessing module DVPP into an artificial intelligence preprocessing module AIPP for artificial intelligence preprocessing, and inputting at least one second image in the multi-path fisheye images into the digital visual preprocessing module DVPP for image preprocessing; at a third moment, inputting the first image processed by the artificial intelligence preprocessing module AIPP into a neural network inference module AICore for neural network inference calculation, inputting the second image processed by the digital visual preprocessing module DVPP into the artificial intelligence preprocessing module AIPP for artificial intelligence preprocessing, and inputting at least one third image in the multi-path fisheye images into the digital visual preprocessing module DVPP for image preprocessing; at a fourth moment, inputting the first image processed by the neural network inference module AICore into a central processing unit CPU module for post-processing calculation to obtain a first image task processing result, inputting the second image processed by the artificial intelligence preprocessing module AIPP into the neural network inference module AICore for neural network inference calculation, inputting the third image processed by the digital visual preprocessing module DVPP into the artificial intelligence preprocessing module AIPP for artificial intelligence preprocessing, and inputting at least one fourth image in the multi-path fisheye images into the digital visual preprocessing module DVPP for image preprocessing; and obtaining a plurality of multi-path fisheye image task processing results after the central processing unit CPU module post-processes all the multi-path fisheye images, and obtaining a travelable area of the vehicle in the images based on the multi-path fisheye image task processing results. The method adopts a multi-path fisheye, multi-pipeline parallel computing scheme: when one path of image enters one of the heterogeneous compute cores, the other paths of images enter at certain intervals, ensuring that the remaining cores are not in an idle waiting state while one heterogeneous compute core is computing. By utilizing the diversity of heterogeneous computing hardware and the multi-camera characteristic of the fisheye cameras in the parking task for heterogeneous acceleration, the time for processing the input images is shortened and the timeliness of input image processing is improved.
2. The image preprocessing by the digital vision preprocessing module DVPP, the artificial intelligence preprocessing by the artificial intelligence preprocessing module AIPP and the neural network inference calculation by the neural network inference module AICore are realized based on the graphics processing unit GPU module. When the GPU module and the central processing unit CPU module serve as the heterogeneous compute cores for the time-staggered pipeline parallel computation of the multi-path fisheye images, none of the remaining heterogeneous compute cores is left in an idle waiting state: the fisheye images of the other paths are computed in parallel, and heterogeneous acceleration is applied to the multi-path characteristic of the fisheye cameras by the heterogeneous computing hardware (the GPU module), which shortens the input image processing time and improves its timeliness.
3. The GPU chip and the CPU chip are integrated multi-core chips, and image processing calculation can be performed in a multi-thread and/or multi-process manner. The graphics processing unit GPU module has at least three GPU chips, and the central processing unit CPU module has at least one CPU chip: at least a first GPU chip is used by the digital visual preprocessing module DVPP for image preprocessing, at least a second GPU chip is used by the artificial intelligence preprocessing module AIPP for artificial intelligence preprocessing, and at least a third GPU chip is used by the neural network inference module AICore for neural network inference calculation. Using at least three GPU chips for the DVPP, AIPP and AICore modules respectively realizes the time-staggered pipeline parallel computation of the multiple images of the multi-path fisheye images and guarantees the independence of each image's computation. In addition, since the GPU chip and the CPU chip are integrated multi-core chips, at least one image in each path of the multi-path fisheye images can be processed in a multi-thread and/or multi-process manner; tasks are queued and processed in order by multiple threads and/or processes, which solves the problems of concentrated use of computing resources in the GPU chip and the CPU chip and of high delay that occur when the travelable area is obtained by serial CPU and GPU computation.
4. The first processing duration of image preprocessing by the digital vision preprocessing module DVPP, the second processing duration of artificial intelligence preprocessing by the artificial intelligence preprocessing module AIPP and the third processing duration of neural network inference calculation by the neural network inference module AICore are compared, and images with shorter processing durations are processed with priority. When one path of image enters a heterogeneous compute core, the other paths of images enter their corresponding heterogeneous compute cores at certain intervals, so that while one heterogeneous compute core in the GPU chip or the CPU chip is computing, the remaining heterogeneous compute cores are also in a computing state rather than an idle waiting state. Meanwhile, for images whose preprocessing duration is long, the heterogeneous calculation of the image is carried out in the integrated multi-core GPU and CPU chips in a multi-thread and/or multi-process manner, so that multiple heterogeneous compute cores run in parallel at the same time. This accelerates the preprocessing of long-duration images, performs parallel optimized accelerated computation through heterogeneous computing, and improves the timeliness of the GPU chip and the CPU chip in the accelerated heterogeneous computation of the travelable area of the multi-path fisheye images.
5. The multi-path fisheye images are four images acquired by four fisheye cameras, and the number of image paths matches the number of pipelines computed in parallel. With this match, the first moment can be located at the time each pipeline begins processing within a preset period; the heterogeneous accelerated computation of the four acquired fisheye images is divided into different time periods by the first, second, third and fourth moments, so that when one path of image enters the heterogeneous computation of one pipeline, the remaining paths of images can enter the heterogeneous computation of the other pipelines sequentially at certain time-period intervals. In other words, the multiple pipelines perform heterogeneous computation simultaneously and in parallel, which improves the timeliness of the parallel heterogeneous computation of the multiple pipelines.
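The staggered-entry behaviour described in the advantages above can be modelled with a standard pipeline-scheduling recurrence (an assumed general model, not a formula from the patent; the stage durations below are illustrative): an image starts a stage only once it has finished the previous stage and the preceding image has vacated that stage.

```python
# Sketch: pipeline schedule for n images over the four stages, with
# illustrative per-stage durations in milliseconds.
DUR = {"DVPP": 5.0, "AIPP": 3.0, "AICore": 8.0, "CPU": 4.0}
ORDER = ["DVPP", "AIPP", "AICore", "CPU"]

def schedule(n: int):
    """Return finish[k][stage]: when image k finishes each stage."""
    finish = [dict() for _ in range(n)]
    for k in range(n):
        done_prev_stage = 0.0
        for s in ORDER:
            vacated = finish[k - 1][s] if k > 0 else 0.0
            end = max(done_prev_stage, vacated) + DUR[s]
            finish[k][s] = end
            done_prev_stage = end
    return finish

fin = schedule(4)
makespan = fin[-1]["CPU"]          # total pipelined time
serial_total = sum(DUR.values()) * 4  # all four images fully serial
print(makespan, serial_total)      # → 44.0 80.0
```

With these assumed durations the slowest stage (AICore) becomes the bottleneck, yet the pipelined makespan is still far below the fully serial total, which is the effect the patent's formulas quantify.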
The foregoing description is only an overview of the technical solutions of the present invention, and in order to make the technical means of the present invention more clearly understood, the present invention may be implemented in accordance with the content of the description, and in order to make the above and other objects, features, and advantages of the present invention more clearly understood, the following preferred embodiments are described in detail with reference to the accompanying drawings.
Drawings
Fig. 1 is a schematic flowchart of a driving area heterogeneous computing acceleration method according to an embodiment of the present application;
FIG. 2 is a schematic structural diagram of a drivable region heterogeneous computing acceleration system according to an embodiment of the present application;
FIG. 3 is a schematic diagram of a four-way fisheye four-way pipeline parallel computing architecture according to an embodiment of the present application;
fig. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
To further explain the technical means and effects of the present invention adopted to achieve the predetermined objects, the following detailed description of the embodiments, structures, features and effects thereof according to the present invention will be given with reference to the accompanying drawings and preferred embodiments.
The application provides a driving-capable area heterogeneous computing acceleration method, a system, equipment and a medium, and solves the problems of centralized computing resource use and high delay when a driving-capable area is obtained by serial computing based on a fish-eye camera CPU and a GPU. According to the method and the device, heterogeneous acceleration is performed by utilizing the diversity of heterogeneous computing hardware and the multipath characteristics of the fisheye cameras in the parking task, the time for processing the input image is shortened, and the timeliness of the input image processing is improved.
The application provides a driving-capable region heterogeneous computing acceleration method, which is realized by time-staggered pipeline parallel computation of multiple paths of fisheye images. As shown in FIG. 1, the driving-capable region heterogeneous computing acceleration method comprises the following steps:
s1, at a first moment, inputting at least one first image in the multi-path fisheye images into a digital visual preprocessing module DVPP for image preprocessing;
s2, at a second moment, inputting the first image processed by the digital visual preprocessing module DVPP into an artificial intelligence preprocessing module AIPP for artificial intelligence preprocessing, and inputting at least one second image in the multi-path fisheye images into the digital visual preprocessing module DVPP for image preprocessing;
s3, at a third moment, inputting the first image processed by the artificial intelligence preprocessing module AIPP into a neural network inference module AICore for neural network inference calculation, inputting the second image processed by the digital vision preprocessing module DVPP into the artificial intelligence preprocessing module AIPP for artificial intelligence preprocessing, and inputting at least one third image in the multi-path fisheye images into the digital vision preprocessing module DVPP for image preprocessing;
s4, at the fourth moment, inputting the first image processed by the neural network inference module AICore into a Central Processing Unit (CPU) module for post-processing calculation to obtain a first image task processing result, inputting the second image processed by the artificial intelligence preprocessing module AIPP into the neural network inference module AICore for neural network inference calculation, inputting the third image processed by the digital vision preprocessing module DVPP into the artificial intelligence preprocessing module AIPP for artificial intelligence preprocessing, and inputting at least one fourth image in the multi-path fisheye images into the digital vision preprocessing module DVPP for image preprocessing;
and S5, obtaining a plurality of multi-path fisheye image task processing results after the CPU module carries out post-processing on the multi-path fisheye images, and obtaining the travelable area of the vehicle in the image based on the multi-path fisheye image task processing results.
For example, performing an image parallel computing task over multiple fisheye pipelines on a mobile data center MDC generally includes: when one path of image enters one pipeline's heterogeneous compute core, the other paths of images enter the heterogeneous compute cores of their corresponding pipelines at preset times with a certain interval, so that while one heterogeneous compute core in the multi-pipeline system is computing, the other heterogeneous compute cores are not left waiting idle with resources unused. Specifically, at the first moment, a first image of the multi-path fisheye images is input into the digital visual preprocessing module DVPP for image preprocessing. When the first image has finished image preprocessing, it enters the artificial intelligence preprocessing module AIPP after a time interval T_DVPP; that is, at the second moment, the first image processed by the digital visual preprocessing module DVPP is input into the artificial intelligence preprocessing module AIPP for artificial intelligence preprocessing, and at the same time a second image of the multi-path fisheye images is input into the digital visual preprocessing module DVPP. After a time interval T_AIPP, i.e. at the third moment, the first image processed by the artificial intelligence preprocessing module AIPP is input into the neural network inference module AICore for neural network inference calculation, the second image processed by the digital vision preprocessing module DVPP is input into the artificial intelligence preprocessing module AIPP for artificial intelligence preprocessing, and a third image of the multi-path fisheye images is input into the digital vision preprocessing module DVPP for image preprocessing.
After a time interval T_AICore, i.e. at the fourth moment, the first image processed by the neural network inference module AICore is input into the central processing unit CPU module for post-processing calculation to obtain the first image task processing result; the time the CPU module needs to post-process one image task is denoted T_CPU. Meanwhile, the second image processed by the artificial intelligence preprocessing module AIPP is input into the neural network inference module AICore for neural network inference calculation, the third image processed by the digital vision preprocessing module DVPP is input into the artificial intelligence preprocessing module AIPP for artificial intelligence preprocessing, and a fourth image of the multi-path fisheye images is input into the digital vision preprocessing module DVPP for image preprocessing. If the multi-path fisheye images are four fisheye images and four pipelines compute in parallel, four fisheye image task processing results are obtained once the central processing unit CPU module has post-processed all four fisheye images, and the travelable area of the vehicle in the four fisheye images is obtained based on these results. The total time T_bsum of the four-way pipeline parallel computation with drivable-region heterogeneous acceleration is given by formula (1):
T_bsum = T_DVPP + T_AIPP + T_AICore + T_CPU × 4    (1)
The total time T_csum of serial computation is calculated as in formula (2):
T_csum = T_DVPP × 4 + T_AIPP × 4 + T_AICore × 4 + T_CPU × 4    (2)
Compared with the serial total time T_csum, the total time T_bsum of the four-way pipeline parallel computation for travelable region heterogeneous acceleration saves a time T_save, calculated as in formula (3):
T_save = T_DVPP × 3 + T_AIPP × 3 + T_AICore × 3    (3)
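Formulas (1)–(3) can be sketched numerically as follows; the stage durations below are hypothetical examples for illustration, not values from the application:

```python
# Pipeline vs. serial total time for four-way travelable-region computation.
# Stage durations are hypothetical (ms); N is the number of fisheye images.
T_DVPP, T_AIPP, T_AICore, T_CPU = 5.0, 3.0, 8.0, 4.0
N = 4

# Formula (1): four-way pipeline parallel total time
T_bsum = T_DVPP + T_AIPP + T_AICore + T_CPU * N

# Formula (2): fully serial total time (every stage repeated N times)
T_csum = (T_DVPP + T_AIPP + T_AICore + T_CPU) * N

# Formula (3): time saved by pipelining (the first three stages overlap)
T_save = (T_DVPP + T_AIPP + T_AICore) * (N - 1)

# The saving is exactly the serial total minus the pipelined total.
assert abs((T_csum - T_bsum) - T_save) < 1e-9
```

With these example durations the pipeline finishes in 32 ms versus 80 ms serially, saving 48 ms.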
The number of images or image frames processed by each path's modules may be plural, and the effect of four-way pipeline parallel computation for travelable region heterogeneous acceleration can likewise be achieved by computing multiple frames with different timestamps in parallel. This travelable region heterogeneous computing acceleration scheme not only shortens the processing time, but also avoids the high computation peak on a single heterogeneous computing core that arises when multiple instance images are launched for accelerated parallel computation. The present application performs heterogeneous accelerated computation by exploiting the diversity of the heterogeneous computing hardware GPU and the multi-path characteristics of the fisheye cameras in the parking task, shortening the time for processing input images or image frames and improving the timeliness of that processing.
Optionally, the digital visual preprocessing module DVPP performs the image preprocessing, the artificial intelligence preprocessing module AIPP performs the artificial intelligence preprocessing, and the neural network inference module AICore performs the neural network inference calculation, all based on a graphics processor GPU module.
For example, the image preprocessing performed by the digital visual preprocessing module DVPP includes: video decoding (VDEC), video encoding (VENC), JPEG decoding (JPEGD), JPEG encoding (JPEGE), PNG decoding (PNGD), and visual preprocessing (VPC). When input data enters the data engine and the engine finds that the data format does not meet the processing requirements of the subsequent AI Core, the digital visual preprocessing module can be started to preprocess the data. The preprocessing of data-stream images/pictures/image frames proceeds as follows:
(1) First, Matrix moves the data from memory into the DVPP buffer for caching.
(2) According to the format of the specific data, the preprocessing engine completes parameter configuration and data transmission through the programming interface provided by the DVPP.
(3) After the programming interface is started, the DVPP transmits the configuration parameters and the raw data to the driver, and the DVPP driver calls the PNG or JPEG decoding module for initialization and task issuing.
(4) The PNG or JPEG decoding module in the DVPP dedicated hardware starts the actual operation to complete decoding of the picture, yielding data in YUV or RGB format that meets the requirements of subsequent processing.
(5) After decoding is completed, Matrix continues to call the VPC through the same mechanism to further convert the picture into the YUV420SP format; because YUV420SP data has high storage efficiency and occupies little bandwidth, more data can be transmitted under the same bandwidth to satisfy the high computing throughput of the AI Core. Meanwhile, the DVPP can also complete the cropping and scaling of images/pictures/image frames. For example, in typical cropping and zero-padding operations for changing the size of an image/picture/image frame, the VPC extracts the portion to be processed from the original image/picture/image frame and zero-pads it, thereby preserving edge feature information during convolutional neural network calculation. The zero-padding operation requires four padding sizes, namely the top, bottom, left and right padding sizes, and the edges of the image/picture/image frame are expanded into the zero-padded area, finally yielding a zero-padded image/picture/image frame that can be calculated directly.
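The zero-padding operation with four padding sizes described in step (5) can be sketched as below; the function name, image representation and pixel values are illustrative and are not the DVPP/VPC API:

```python
# Hedged sketch of the VPC zero-padding step: expand an image with four
# pad sizes (top, bottom, left, right) so edge features survive the
# convolution. An image is modelled as a list of rows of pixel values.
def zero_pad(img, top, bottom, left, right):
    h, w = len(img), len(img[0])
    out = [[0] * (w + left + right) for _ in range(h + top + bottom)]
    for r in range(h):                    # copy original pixels
        for c in range(w):                # into the centre region
            out[top + r][left + c] = img[r][c]
    return out

patch = [[1] * 6 for _ in range(4)]       # cropped region of interest, 4x6
padded = zero_pad(patch, 1, 1, 2, 2)      # pad top/bottom by 1, left/right by 2
```

The padded result is 6×10: the original 4×6 patch sits in the centre and every border pixel is zero, so a convolution can read edge positions without running off the image.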
The image/picture/image frame data after this series of preprocessing may be handled in, but is not limited to, the following two ways:
The image/picture/image frame data may be further preprocessed by the AIPP according to the model requirements (alternatively, if the data output by the DVPP already meets the image/picture/image frame requirements, the AIPP may be skipped), and the qualified image/picture/image frame data is then sent to the AI Core under the control of the AI CPU for the required neural network calculation.
Alternatively, the output image/picture/image frame data is uniformly encoded by the JPEG encoding module to complete encoding post-processing, the data is placed into the DVPP buffer, and Matrix finally takes the data out for subsequent operation while releasing the DVPP computing resources and reclaiming the cache.
Throughout the preprocessing process, Matrix completes the function calls to the different modules. The DVPP, as a customized data-supply module, adopts a heterogeneous or dedicated processing mode to rapidly transform image/picture/image frame data, providing a sufficient data source for the AI Core and meeting the large-data-volume and large-bandwidth requirements of neural network calculation.
Optionally, the GPU module of the graphics processor has at least one GPU chip; the GPU chip is used for the digital visual preprocessing module DVPP to carry out image preprocessing, the artificial intelligence preprocessing module AIPP to carry out artificial intelligence preprocessing and the neural network inference module AICore to carry out neural network inference calculation;
the CPU module of the central processing unit is at least provided with one CPU chip, and the CPU chip is used for post-processing calculation and obtaining an image task processing result.
It should be noted that, when the graphics processor GPU module has one GPU chip and the central processing unit CPU module has one CPU chip, the CPU chip and the GPU chip are generally multi-core chips. The GPU chip may perform, at different moments of different time periods, such as the first moment, the second moment, and so on, the image preprocessing of the digital visual preprocessing module DVPP, the artificial intelligence preprocessing of the artificial intelligence preprocessing module AIPP, and the neural network inference calculation of the neural network inference module AICore, and may simultaneously perform heterogeneous parallel computation of the image/picture/image frame data in the multi-way pipelines of each preprocessing module in the form of a preset multi-task process.
When the graphics processor GPU module has a plurality of GPU chips and the central processing unit CPU module has a plurality of CPU chips, the heterogeneous parallel computation of the image/picture/image frame data in the multi-way pipelines of each preprocessing module is similar to the manner described above and is not repeated here.
The present application performs heterogeneous acceleration on the multi-path characteristics of the fisheye cameras by using the heterogeneous computing hardware graphics processor GPU module, shortening the time for processing input images and improving the timeliness of input image processing.
Optionally, the GPU chip and the CPU chip are both integrated multi-core chips, and both may use a multi-thread and/or multi-process manner to perform image processing and calculation.
It should be noted that, when the GPU chip and the CPU chip perform parallel image processing calculation in a multi-thread and multi-process manner, as one implementation, the multi-way pipeline calculation of each preprocessing module may be controlled in the CPU chip in a preset multi-task, multi-process manner, while a single thread or multiple threads within a certain process regulate at least one of the preprocessing modules, so as to improve the efficiency of the multi-way pipeline parallel computation of each preprocessing module.
In addition, the GPU chip and the CPU chip may also perform parallel processing and calculation of images in a multi-thread or multi-process manner, which is not described herein again.
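A minimal sketch of driving four pipelines concurrently is given below, with a thread pool standing in for the preset multi-task processes; the stage functions are placeholders for illustration and are not the real DVPP/AIPP/AICore/CPU operations:

```python
from concurrent.futures import ThreadPoolExecutor

# Stand-in stage functions; each represents one heterogeneous module.
def dvpp(x):   return x + 1      # stand-in for image preprocessing
def aipp(x):   return x * 2      # stand-in for AI preprocessing
def aicore(x): return x - 3      # stand-in for neural network inference
def cpu(x):    return x ** 2     # stand-in for post-processing

def run_pipeline(image):
    # One path of the pipeline: the image flows through all four stages.
    for stage in (dvpp, aipp, aicore, cpu):
        image = stage(image)
    return image

# Four fisheye "images" dispatched to four concurrent workers, one per path.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(run_pipeline, [1, 2, 3, 4]))
```

A process pool (`ProcessPoolExecutor`) could be substituted where true multi-process control is wanted; the thread form keeps the sketch self-contained.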
Optionally, the GPU module of the graphics processor has at least three GPU chips, and the CPU module of the central processing unit has at least one CPU chip;
at least a first GPU chip is used for the digital visual preprocessing module DVPP to carry out image preprocessing, at least a second GPU chip is used for the artificial intelligence preprocessing module AIPP to carry out artificial intelligence preprocessing, and at least a third GPU chip is used for the neural network inference module AICore to carry out neural network inference calculation.
It should be noted that the graphics processor GPU module has at least three GPU chips, which are respectively assigned to the image preprocessing modules. This avoids the situation where, while the digital visual preprocessing module DVPP performs the image preprocessing, the artificial intelligence preprocessing module AIPP performs the artificial intelligence preprocessing, and the neural network inference module AICore performs the neural network inference calculation, a single heterogeneous computing core of a GPU chip processing different image/picture/image frame data exhibits a high computation peak, concentrated use of computing resources, and high delay.
Optionally, the digital visual preprocessing module DVPP performs the image preprocessing for a first processing duration, the artificial intelligence preprocessing module AIPP performs the artificial intelligence preprocessing for a second processing duration, and the neural network inference module AICore performs the neural network inference calculation for a third processing duration, and the preprocessing module with the shorter processing duration is processed preferentially.
It should be noted that, by comparing the durations of the preprocessing modules in image processing and prioritizing the preprocessing module with the shorter processing duration, each preprocessing module can, after processing its own image, transition smoothly and stably to the corresponding next preprocessing module (or queue in timestamp order and be preprocessed at a preset time). Prioritizing the preprocessing module with the shorter processing duration optimizes and sustains the parallel processing time sequence of each preprocessing module, shortens the queuing time of image preprocessing at each preprocessing module, and further improves the heterogeneous processing timeliness of each preprocessing module.
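The shorter-duration-first idea above can be sketched minimally; the per-module durations are assumed values for illustration only:

```python
# Compare (hypothetical) per-module processing durations and dispatch
# queued preprocessing work in ascending-duration order, so the fastest
# module hands its output on first and stage handoffs stay smooth.
durations = {"DVPP": 5.0, "AIPP": 3.0, "AICore": 8.0}  # ms, assumed
priority_order = sorted(durations, key=durations.get)   # shortest first
```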
Optionally, the multi-path fisheye images are four paths of images acquired by four fisheye cameras, and the number of image paths matches the number of multi-way pipelines in the parallel computation.
For example, four fisheye cameras are selected to acquire images, yielding four fisheye images, and the multiple pipelines carry out micro-batch parallel computation of multiple paths simultaneously in the same time dimension. If only one set of variable domains is adopted, same-name variables of different batches in the same path-number processing stage cannot be managed effectively, and unnecessary variables cannot be released in time. Therefore, this implementation adopts, but is not limited to, two variable domains: a local variable domain corresponding to each path's micro-batch, and a global variable domain. The variable domain of each path's micro-batch is responsible for storing the intermediate variables of the forward calculation in that path's processing stage, used for backward calculation of that micro-batch's gradient; the global variable domain is responsible for storing the gradients contributed by each path's micro-batch relative to the global variable domain. By scheduling the different variable domains, local variables and global variables can be managed effectively, redundant variables are released in time, and the overhead of excessive resource idleness or extraneous computation outside each pipeline path of each processing module is reduced.
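A minimal sketch of the two-variable-domain scheme, assuming one dictionary per micro-batch plus one shared global dictionary; all names and values are illustrative:

```python
class VariableDomains:
    """Per-micro-batch local scopes plus one global gradient scope."""
    def __init__(self, num_pipes):
        self.local = [dict() for _ in range(num_pipes)]  # per micro-batch
        self.global_ = dict()                            # shared gradients

    def store_forward(self, pipe, name, value):
        # Intermediate forward variable, kept for the backward pass.
        self.local[pipe][name] = value

    def release_local(self, pipe):
        # Free this micro-batch's redundant variables once backward is done.
        self.local[pipe].clear()

    def accumulate_grad(self, name, grad):
        # Each micro-batch contributes its gradient to the global domain.
        self.global_[name] = self.global_.get(name, 0.0) + grad

doms = VariableDomains(num_pipes=4)
doms.store_forward(0, "act", 1.5)   # forward intermediate of pipe 0
doms.accumulate_grad("w", 0.1)      # gradient from micro-batch 0
doms.accumulate_grad("w", 0.2)      # gradient from micro-batch 1
doms.release_local(0)               # pipe 0's locals freed promptly
```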
Optionally, the plurality of image frames marked with different timestamps are respectively configured in the multi-channel pipeline for parallel computation; the configuration of the plurality of image frames is configured based on the CPU module of the central processing unit in a multi-process and/or multi-thread mode.
It should be noted that, by marking a plurality of image frames with different timestamps, the size of the image frames in the image or video stream can be preset, and the frames are distinguished by their timestamps. The plurality of image frames marked with different timestamps are then configured, based on the central processing unit CPU module, in a multi-process and/or multi-thread manner, which ensures that they are processed more continuously in the multi-way pipeline parallel computation. At the same time, the concentration of computing resource usage in the GPU chip and the CPU chip during parallel computation of the frames is reduced, the delay time of the computation is reduced, and the frame rate of parallel heterogeneous computation of the image frames is improved.
The present application further provides a system for accelerating computation of heterogeneous travelable regions, which is implemented based on parallel computation of multiple fisheye images according to time and pipeline, as shown in fig. 2, the system 200 for accelerating computation of heterogeneous travelable regions includes:
the digital visual preprocessing module DVPP210 is used for inputting at least one first image in the multi-path fisheye images into the digital visual preprocessing module DVPP for image preprocessing at a first moment;
the artificial intelligence preprocessing module AIPP220 is used for inputting the first image processed by the digital vision preprocessing module DVPP into the artificial intelligence preprocessing module AIPP for artificial intelligence preprocessing at the second moment and inputting at least one second image in the multi-path fisheye images into the digital vision preprocessing module DVPP for image preprocessing at the same time;
the neural network inference module AICore230 is used for inputting the first image processed by the artificial intelligence preprocessing module AIPP into the neural network inference module AICore for neural network inference calculation at a third moment, inputting the second image processed by the digital vision preprocessing module DVPP into the artificial intelligence preprocessing module AIPP for artificial intelligence preprocessing at the same time, and inputting at least one third image in the multi-path fisheye images into the digital vision preprocessing module DVPP for image preprocessing;
the central processing unit CPU module 240 is configured to input, at a fourth moment, the first image processed by the neural network inference module AICore into the central processing unit CPU module for post-processing calculation to obtain a first image task processing result, input, at the same time, the second image processed by the artificial intelligence preprocessing module AIPP into the neural network inference module AICore for neural network inference calculation, input the third image processed by the digital visual preprocessing module DVPP into the artificial intelligence preprocessing module AIPP for artificial intelligence preprocessing, and input at least one fourth image in the multi-path fisheye images into the digital visual preprocessing module DVPP for image preprocessing;
and the travelable region acquisition module 250 is used for acquiring a plurality of multipath fisheye image task processing results after the central processing unit CPU module performs all post-processing on the multipath fisheye images, and acquiring travelable regions of vehicles in the images based on the multipath fisheye image task processing results.
It should be noted that the operations of the digital visual preprocessing module DVPP210, the artificial intelligence preprocessing module AIPP220, the neural network inference module AICore230, the central processing unit CPU module 240 and the drivable area acquisition module 250 of the drivable area heterogeneous computing acceleration system 200 may refer to the drivable area heterogeneous computing acceleration method described above and are not repeated here.
In an embodiment of the present application, as shown in fig. 3, and not limited to one cycle, at the moment when the preset time is t1, a first image/timestamped image frame is selected from the four images captured by the four fisheye cameras and input into the digital visual preprocessing module DVPP of the first pipeline for preprocessing. At the moment when the preset time is t2, the first image processed by the digital visual preprocessing module DVPP of the first pipeline is input into the artificial intelligence preprocessing module AIPP for artificial intelligence preprocessing; meanwhile, a second image/timestamped image frame among the four fisheye images is input into the digital visual preprocessing module DVPP of the second pipeline for preprocessing. At the moment when the preset time is t3, the first image/timestamped image frame processed by the artificial intelligence preprocessing module AIPP is input into the neural network inference module AICore of the first pipeline for the first pipeline's neural network inference calculation, the second image/timestamped image frame processed by the digital visual preprocessing module DVPP of the second pipeline is input into the artificial intelligence preprocessing module AIPP of the second pipeline for the second pipeline's artificial intelligence preprocessing, and the third image/timestamped image frame among the four fisheye images is input into the digital visual preprocessing module DVPP of the third pipeline for the third pipeline's image preprocessing.
At the moment when the preset time is t4, the first image/timestamped image frame processed by the neural network inference module AICore of the first pipeline is input into the central processing unit CPU module of the first pipeline for subsequent post-processing calculation to obtain a first image/timestamped image frame task processing result; the second image/timestamped image frame processed by the artificial intelligence preprocessing module AIPP of the second pipeline is input into the neural network inference module AICore of the second pipeline for the second pipeline's neural network inference calculation; the third image/timestamped image frame processed by the digital visual preprocessing module DVPP of the third pipeline is input into the artificial intelligence preprocessing module AIPP of the third pipeline for the third pipeline's artificial intelligence preprocessing; and the fourth image/timestamped image frame among the four fisheye images is input into the digital visual preprocessing module DVPP of the fourth pipeline for the fourth pipeline's preprocessing. After the central processing unit CPU module of the fourth pipeline has completed subsequent processing of all four timestamped fisheye images/image frames, four image frame task processing results are obtained, and the drivable areas of vehicles in the four timestamped fisheye images/image frames are obtained based on those four task processing results.
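The t1–t4 schedule of this embodiment can be sketched as a table showing which pipeline stage each image occupies at each tick; the helper below is illustrative only:

```python
# At tick k (0-based), image i occupies stage (k - i) of the
# DVPP -> AIPP -> AICore -> CPU pipeline, if that stage index is valid.
STAGES = ["DVPP", "AIPP", "AICore", "CPU"]

def schedule(num_images=4):
    ticks = {}
    # Draining the pipeline takes num_images + len(STAGES) - 1 ticks.
    for k in range(num_images + len(STAGES) - 1):
        active = {}
        for i in range(num_images):
            s = k - i
            if 0 <= s < len(STAGES):
                active[f"image{i + 1}"] = STAGES[s]
        ticks[f"t{k + 1}"] = active
    return ticks

sched = schedule()
```

At t1 only image1 is in DVPP; at t4 all four stages are busy at once (image1 in CPU, image2 in AICore, image3 in AIPP, image4 in DVPP), which is the fully-overlapped state the embodiment describes.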
The present application further provides an electronic device, as shown in fig. 4, the electronic device 400 includes:
a memory 410 for storing non-transitory computer readable instructions 430; and
a processor 420 for executing the computer-readable instructions 430, such that the computer-readable instructions 430 when executed by the processor 420 implement the method described above.
The present application also provides a computer-readable storage medium comprising computer instructions which, when run on a device, cause the device to perform the method described above.
It should be noted that any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing specific logical functions or steps in the process, and alternate implementations are included within the scope of the preferred embodiment of the present application in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of the present application.
The logic and/or steps represented in the flowcharts or otherwise described herein, e.g., an ordered listing of executable instructions that can be considered to implement logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of the computer-readable medium would include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CDROM). Additionally, the computer-readable medium could even be paper or another suitable medium upon which the program is printed, as the program can be electronically captured, via for instance optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner if necessary, and then stored in a computer memory.
It should be understood that portions of the present application may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, various steps or methods may be implemented in software or firmware stored in a memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, any one or combination of the following technologies, which are well known in the art, may be used: a discrete logic circuit having a logic gate circuit for implementing a logic function on a data signal, an application specific integrated circuit having an appropriate combinational logic gate circuit, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), or the like.
It will be understood by those skilled in the art that all or part of the steps carried out in the method of implementing the above embodiments may be implemented by hardware related to instructions of a program, which may be stored in a computer readable storage medium, and the program, when executed, includes one or a combination of the steps of the method embodiments.
In addition, functional units in the embodiments of the present application may be integrated into one processing module, or each unit may exist alone physically, or two or more units are integrated into one module. The integrated module can be partially realized in a hardware mode, and can also be realized in a software functional module mode. The integrated module, if implemented in the form of a software functional module and sold or used as a separate product, may also be stored in a computer-readable storage medium.
The storage medium mentioned above may be a read-only memory, a magnetic or optical disk, etc. Although embodiments of the present application have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present application, and that variations, modifications, substitutions and alterations may be made to the above embodiments by those of ordinary skill in the art within the scope of the present application.
Although the present invention has been described with reference to the preferred embodiments, it will be understood by those skilled in the art that various changes may be made and equivalents may be substituted for elements thereof without departing from the scope of the present invention.

Claims (10)

1. A driving-capable region heterogeneous computing acceleration method is characterized in that the driving-capable region heterogeneous computing acceleration method is realized based on multi-path fisheye images according to time-flow parallel computing, and comprises the following steps:
at a first moment, inputting at least one first image in the multi-path fisheye images into a digital visual preprocessing module DVPP for image preprocessing;
at the second moment, inputting the first image processed by the digital visual preprocessing module DVPP into an artificial intelligence preprocessing module AIPP for artificial intelligence preprocessing, and simultaneously inputting at least one second image in the multi-path fisheye images into the digital visual preprocessing module DVPP for image preprocessing;
at the third moment, inputting the first image processed by the artificial intelligence preprocessing module AIPP into a neural network inference module AICore for neural network inference calculation, inputting the second image processed by the digital vision preprocessing module DVPP into the artificial intelligence preprocessing module AIPP for artificial intelligence preprocessing, and inputting at least one third image in the multi-path fisheye images into the digital vision preprocessing module DVPP for image preprocessing;
at the fourth moment, inputting the first image processed by the neural network inference module AICore into a Central Processing Unit (CPU) module for post-processing calculation to obtain a first image task processing result, inputting the second image processed by the artificial intelligence preprocessing module AIPP into the neural network inference module AICore for neural network inference calculation, inputting the third image processed by the digital vision preprocessing module DVPP into the artificial intelligence preprocessing module AIPP for artificial intelligence preprocessing, and inputting at least one fourth image in the multi-path fisheye images into the digital vision preprocessing module DVPP for image preprocessing;
and obtaining a plurality of multi-path fisheye image task processing results after the CPU module carries out post-processing on the multi-path fisheye images, and obtaining the drivable area of the vehicle in the image based on the multi-path fisheye image task processing results.
2. The heterogeneous computation acceleration method according to claim 1, characterized in that said digital visual preprocessing module DVPP performs said image preprocessing, said artificial intelligence preprocessing module AIPP performs said artificial intelligence preprocessing, and said neural network inference module AICore performs said neural network inference computation based on a graphics processor GPU module.
3. The heterogeneous computing acceleration method of claim 2, wherein the graphics processor GPU module has at least one GPU chip; the GPU chip is used for the digital visual preprocessing module DVPP to carry out image preprocessing, the artificial intelligence preprocessing module AIPP to carry out artificial intelligence preprocessing and the neural network inference module AICore to carry out neural network inference calculation;
the CPU module of the central processing unit is at least provided with one CPU chip, and the CPU chip is used for post-processing calculation and obtaining an image task processing result.
4. The heterogeneous computing acceleration method according to claim 3, characterized in that the GPU chip and the CPU chip are both integrated multi-core chips, and both can adopt multi-thread and/or multi-process modes to perform image processing and computation.
5. The heterogeneous computing acceleration method of claim 4, wherein the graphics processor GPU module has at least three GPU chips and the central processor CPU module has at least one CPU chip;
at least a first GPU chip is used for the digital visual preprocessing module DVPP to carry out image preprocessing, at least a second GPU chip is used for the artificial intelligence preprocessing module AIPP to carry out artificial intelligence preprocessing, and at least a third GPU chip is used for the neural network inference module AICore to carry out neural network inference calculation.
6. The heterogeneous computation acceleration method according to claim 5, characterized in that the digital visual preprocessing module DVPP performs the image preprocessing for a first processing duration, the artificial intelligence preprocessing module AIPP performs the artificial intelligence preprocessing for a second processing duration, and the neural network inference module AICore performs the neural network inference calculation for a third processing duration, and the preprocessing module with the shorter processing duration is processed preferentially.
7. The heterogeneous computation acceleration method according to claim 6, characterized in that the multi-path fisheye images are four paths of images acquired by four fisheye cameras, and the number of image paths matches the number of multi-way pipelines computed in parallel.
8. A drivable region heterogeneous computing acceleration system is characterized by being realized based on multi-path fisheye images according to time-flow parallel computing and comprises the following components:
the digital visual preprocessing module DVPP is used for inputting at least one first image in the multi-path fisheye images into the digital visual preprocessing module DVPP for image preprocessing at a first moment;
the artificial intelligence preprocessing module AIPP, used for, at a second time, inputting the first image processed by the digital vision preprocessing module DVPP into the artificial intelligence preprocessing module AIPP for artificial intelligence preprocessing, and simultaneously inputting at least one second image of the multi-path fisheye images into the digital vision preprocessing module DVPP for image preprocessing;
the neural network inference module AICore, used for, at a third time, inputting the first image processed by the artificial intelligence preprocessing module AIPP into the neural network inference module AICore for neural network inference computation, inputting the second image processed by the digital vision preprocessing module DVPP into the artificial intelligence preprocessing module AIPP for artificial intelligence preprocessing, and inputting at least one third image of the multi-path fisheye images into the digital vision preprocessing module DVPP for image preprocessing;
the central processing unit CPU module, used for, at a fourth time, inputting the first image processed by the neural network inference module AICore into the central processing unit CPU module for post-processing computation to obtain a first-image task processing result, inputting the second image processed by the artificial intelligence preprocessing module AIPP into the neural network inference module AICore for neural network inference computation, inputting the third image processed by the digital vision preprocessing module DVPP into the artificial intelligence preprocessing module AIPP for artificial intelligence preprocessing, and inputting at least one fourth image of the multi-path fisheye images into the digital vision preprocessing module DVPP for image preprocessing;
and the travelable area acquisition module, used for acquiring the task processing results of the multi-path fisheye images after the central processing unit CPU module post-processes them, and for obtaining the travelable area of the vehicle in the images based on those task processing results.
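The staggered schedule recited in claim 8 (image N entering DVPP while image N-1 is in AIPP, image N-2 in AICore, and image N-3 in CPU post-processing) is a classic four-stage software pipeline. A minimal sketch using one thread and one queue per stage; the stage functions are hypothetical stand-ins, not the patent's actual DVPP/AIPP/AICore implementations:

```python
import queue
import threading

def run_stage(fn, src, dst):
    """Pull items from `src`, apply the stage function, push to `dst`;
    a None sentinel shuts the stage down and is forwarded downstream."""
    while True:
        item = src.get()
        if item is None:
            dst.put(None)
            return
        dst.put(fn(item))

# Hypothetical stand-ins for the four stages of claim 8.
stages = [
    lambda img: f"dvpp({img})",    # digital vision preprocessing
    lambda img: f"aipp({img})",    # AI preprocessing
    lambda img: f"aicore({img})",  # neural network inference
    lambda img: f"post({img})",    # CPU post-processing
]

queues = [queue.Queue() for _ in range(len(stages) + 1)]
threads = [
    threading.Thread(target=run_stage, args=(fn, queues[i], queues[i + 1]))
    for i, fn in enumerate(stages)
]
for t in threads:
    t.start()

# Feed the four fisheye images; at steady state each stage holds a
# different image, matching the time-staggered schedule of claim 8.
for img in ["fish1", "fish2", "fish3", "fish4"]:
    queues[0].put(img)
queues[0].put(None)

results = []
while (out := queues[-1].get()) is not None:
    results.append(out)
for t in threads:
    t.join()
```

Because each stage is a single worker draining a FIFO queue, per-image ordering is preserved end to end, so the travelable-area post-processing receives results in acquisition order.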
9. An electronic device, comprising:
a memory for storing non-transitory computer readable instructions; and
a processor for executing the computer readable instructions such that the computer readable instructions, when executed by the processor, implement the method of any of claims 1 to 7.
10. A computer readable storage medium comprising computer instructions which, when run on a device, cause the device to perform the method of any of claims 1 to 7.
CN202211698514.XA 2022-12-28 2022-12-28 Driving region heterogeneous calculation acceleration method, system, device and medium Pending CN115880131A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211698514.XA CN115880131A (en) 2022-12-28 2022-12-28 Driving region heterogeneous calculation acceleration method, system, device and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211698514.XA CN115880131A (en) 2022-12-28 2022-12-28 Driving region heterogeneous calculation acceleration method, system, device and medium

Publications (1)

Publication Number Publication Date
CN115880131A true CN115880131A (en) 2023-03-31

Family

ID=85756890

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211698514.XA Pending CN115880131A (en) 2022-12-28 2022-12-28 Driving region heterogeneous calculation acceleration method, system, device and medium

Country Status (1)

Country Link
CN (1) CN115880131A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117971427A (en) * 2024-01-31 2024-05-03 上海为旌科技有限公司 Image processing method and image processing chip
CN118279857A (en) * 2024-05-31 2024-07-02 苏州元脑智能科技有限公司 Target detection method, electronic device, product and medium


Similar Documents

Publication Publication Date Title
CN115880131A (en) Driving region heterogeneous calculation acceleration method, system, device and medium
CN109409513B (en) Task processing method based on neural network and related equipment
CN106358003B (en) A kind of video analysis accelerated method based on thread level production line
US10241799B2 (en) Out-of-order command execution with sliding windows to maintain completion statuses
KR20220000333A (en) Video frame segmentation using reduced resolution neural network and masks from previous frames
US11816871B2 (en) Real-time low latency computer vision/machine learning compute accelerator with smart convolutional neural network scheduler
JP2000353099A (en) Flow control method in active pipeline
CN115220921B (en) Resource scheduling method, image processor, image pickup device, and medium
US11967150B2 (en) Parallel video processing systems
CN113613066A (en) Real-time video special effect rendering method, system, device and storage medium
DE102023101265A1 (en) Object detection in image stream processing using optical flow with dynamic regions of interest
CN114445735A (en) Vehicle-end multi-channel video stream reasoning analysis method and system
CN113535366A (en) High-performance distributed combined multi-channel video real-time processing method
CN112214299B (en) Multi-core processor and task scheduling method and device thereof
CN106210727A (en) Video spatial scalable code stream coded method based on neural network processor array and framework
CN114363478B (en) Signal processing unit, method, acceleration unit, electronic device, and system-on-chip
CN112162942B (en) Multi-modal image processing hardware acceleration system
CN109978801B (en) Image processing method and image processing device
CN113453010B (en) Processing method based on high-performance concurrent video real-time processing framework
CN111221652A (en) Data processing method and device
CN114693504B (en) Image processing method of Gaussian mixture model based on FPGA
CN117931458B (en) Inference service scheduling method, device, processor and chip
CN117915126A (en) Multi-algorithm scheduling method based on video input
CN115941857B (en) Defogging circuit and method
US11800258B2 (en) High-performance CNN inference model at the pixel-parallel CMOS image sensor

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination