CN113610135B - Image processing method, device, computer equipment and storage medium - Google Patents
- Publication number
- CN113610135B CN113610135B CN202110875653.4A CN202110875653A CN113610135B CN 113610135 B CN113610135 B CN 113610135B CN 202110875653 A CN202110875653 A CN 202110875653A CN 113610135 B CN113610135 B CN 113610135B
- Authority
- CN
- China
- Prior art keywords
- image data
- processor
- model
- heterogeneous
- heterogeneous processor
- Prior art date
- Legal status: Active (the status is an assumption and not a legal conclusion; Google has not performed a legal analysis)
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
Abstract
The embodiment of the invention provides an image processing method, an image processing apparatus, computer equipment and a storage medium. The method comprises: in the same detection period, sequentially calling a plurality of cameras to collect multiple frames of first image data; in the detection period, each time first image data is stored to the central processor, copying that first image data from the central processor to a heterogeneous processor; in the detection period, preprocessing the first image data in the heterogeneous processor for a first target model to obtain second image data, the first target model being an image processing model that is to process the first image data; and, in the heterogeneous processor, calling the first target model to process the second image data. The gap time between successive camera captures is thus used to copy and preprocess the already-collected first image data, hiding and compressing the copying and preprocessing inside the detection period, which significantly reduces the time cost of the whole perception pipeline.
Description
Technical Field
The embodiment of the invention relates to the technical field of computer vision, in particular to an image processing method, an image processing device, computer equipment and a storage medium.
Background
In the perception system of an autonomous vehicle, many modules perceive the image data captured by cameras, for example to detect targets such as traffic lights, lane lines, tail lights, pedestrians and vehicles.
Target detection methods generally build neural-network, machine-learning or similar models and run inference on the image data to obtain detection results. A complete perception pass comprises data acquisition, preprocessing and model inference, so its time cost is relatively high and its real-time performance is poor, which degrades autonomous-driving decisions.
Disclosure of Invention
The embodiment of the invention provides an image processing method, an image processing apparatus, computer equipment and a storage medium, to address the high time cost of perceiving image data in autonomous driving.
In a first aspect, an embodiment of the present invention provides an image processing method, including:
in the same detection period, sequentially calling a plurality of cameras to acquire multi-frame first image data;
copying the first image data from the central processor to a heterogeneous processor every time the first image data is stored to the central processor in the detection period;
In the detection period, preprocessing the first image data for a first target model in the heterogeneous processor to obtain second image data, wherein the first target model is an image processing model for processing the first image data;
and in the heterogeneous processor, the first target model is called to process the second image data.
In a second aspect, an embodiment of the present invention further provides an image processing apparatus, including:
the image data acquisition module is used for sequentially calling a plurality of cameras to acquire multi-frame first image data in the same detection period;
a cycle copy module for copying the first image data from the central processor to a heterogeneous processor every time the first image data is stored to the central processor in the detection cycle;
the period preprocessing module is used for preprocessing the first image data for a first target model in the heterogeneous processor in the detection period to obtain second image data, wherein the first target model is an image processing model that is to process the first image data;
and the image data inference module is used for calling the first target model to process the second image data in the heterogeneous processor.
In a third aspect, an embodiment of the present invention further provides a computer apparatus, including:
one or more processors;
a memory for storing one or more programs,
the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the image processing method as described in the first aspect.
In a fourth aspect, an embodiment of the present invention further provides a computer-readable storage medium, on which a computer program is stored, which when executed by a processor implements the image processing method according to the first aspect.
In this embodiment, within the same detection period, a plurality of cameras are sequentially called to collect multiple frames of first image data. In the detection period, each time first image data is stored to the central processor, it is copied from the central processor to the heterogeneous processor, where it is preprocessed for the first target model to obtain second image data, the first target model being an image processing model that is to process the first image data; the first target model is then called in the heterogeneous processor to process the second image data. Exploiting the way cameras are fused with other perception sensors in autonomous driving, the gap time between successive camera captures is used to copy and preprocess the already-collected first image data, hiding and compressing the copying and preprocessing inside the detection period. This significantly reduces the time cost of the whole perception pipeline, preserves real-time performance, and thereby safeguards the accuracy of autonomous-driving decisions.
Drawings
Fig. 1 is a schematic structural diagram of a vehicle according to an embodiment of the present invention;
fig. 2 is a flowchart of an image processing method according to a first embodiment of the present invention;
fig. 3 is a schematic diagram of a sensing operation of a camera and a lidar according to a first embodiment of the present invention;
FIG. 4 is an exemplary diagram of a page lock memory according to a first embodiment of the present invention;
FIG. 5 is a schematic diagram showing a comparison of a CPU and a GPU according to a first embodiment of the present invention;
FIG. 6 is an exemplary diagram of a pre-process provided in accordance with a first embodiment of the present invention;
fig. 7 is a flowchart of an image processing method according to a second embodiment of the present invention;
fig. 8 is a diagram illustrating a device-to-device (D2D) copy according to a second embodiment of the present invention;
fig. 9 is a flowchart of an image processing method according to a third embodiment of the present invention;
fig. 10 is a schematic structural diagram of an image processing apparatus according to a fourth embodiment of the present invention;
fig. 11 is a schematic structural diagram of a computer device according to a fifth embodiment of the present invention.
Detailed Description
The invention is described in further detail below with reference to the drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting thereof. It should be further noted that, for convenience of description, only some, but not all of the structures related to the present invention are shown in the drawings.
Currently, in an autonomous-driving scenario, a complete image-data perception pass proceeds as follows: after a complete detection period, the CPU (central processing unit) preprocesses the image data collected by the cameras and copies the preprocessed image data to heterogeneous computing devices (i.e., heterogeneous processors) such as a GPU (graphics processing unit), a TPU (tensor processing unit) or an NPU (neural-network processing unit); image processing models such as neural networks or other machine-learning models are deployed on the heterogeneous processors, and these models are invoked to perceive the preprocessed image data.
In this flow, the cameras usually do not capture image data synchronously but sequentially: within a detection period, the cameras earlier in the order have already captured their image data while those later in the order are still waiting, so time is wasted. The CPU's preprocessing of image data is slow, and copying image data from the CPU to the heterogeneous processor incurs a large latency. Moreover, the same piece of image data is copied once for each image processing model, so redundant copies exist.
Referring to fig. 1, a vehicle 100 to which an embodiment of an image processing apparatus in an embodiment of the present invention can be applied is shown.
As shown in fig. 1, a vehicle 100 may include a drive control apparatus 101, a vehicle body bus 102, an ECU (Electronic Control Unit) 103, an ECU 104, an ECU 105, a sensor 106, a sensor 107, a sensor 108, and an actuator 109, an actuator 110, and an actuator 111.
The driving control apparatus (also referred to as the on-board brain) 101 is responsible for the overall intelligent control of the entire vehicle 100. The driving control apparatus 101 may be a separately provided controller, such as a CPU, a heterogeneous processor (e.g., GPU, TPU, NPU), a programmable logic controller (PLC), a single-chip microcomputer or an industrial controller; it may also be equipment composed of other electronic devices with input/output ports and operation-control functions, or a computer device installed with a vehicle-driving-control application. The driving control apparatus may analyze and process the data sent by each ECU and/or each sensor received on the body bus 102, make a corresponding decision, and send an instruction corresponding to the decision to the body bus.
The body bus 102 may be a bus that connects the drive control apparatus 101, the ECU 103, the ECU 104, the ECU 105, the sensor 106, the sensor 107, the sensor 108, and other apparatuses of the vehicle 100 not shown. Because the high performance and reliability of the CAN (Controller Area Network) bus are widely accepted, the body bus commonly used in motor vehicles is currently the CAN bus. Of course, it is understood that the body bus may be another type of bus.
The body bus 102 may send the instruction sent by the driving control device 101 to the ECU 103, the ECU 104, the ECU 105, and the ECU 103, the ECU 104, and the ECU 105 analyze the instruction and send the instruction to the corresponding executing device for execution.
The sensors 106, 107, 108 include, but are not limited to, lidar, cameras, and the like. Typically 4-8 groups of wide-angle, high-sensitivity cameras are mounted around the vehicle body; the groups contain different numbers of cameras, commonly 2-4 each.
It should be noted that, the image processing method provided by the embodiment of the present invention may be executed by the driving control apparatus 101, and accordingly, the image processing device is generally provided in the driving control apparatus 101.
It should be understood that the numbers of vehicles, drive control devices, body buses, ECUs, actuators, and sensors in fig. 1 are merely illustrative. There may be any number of each, as required by the implementation.
Example 1
Fig. 2 is a flowchart of an image processing method according to a first embodiment of the present invention, where the method may be applied to the case of adjusting the timing of copying image data from a CPU to a heterogeneous processor, preprocessing the image data on the heterogeneous processor, and sensing the image data, and the method may be performed by an image processing apparatus, which may be implemented by software and/or hardware, may be configured in a computer device, for example, a vehicle, and the like, and specifically includes the following steps:
step 201, in the same detection period, sequentially calling a plurality of cameras to acquire multi-frame first image data.
The vehicle in this embodiment may support automatic driving, which may refer to the ability of the vehicle itself to have environmental awareness, path planning, and to autonomously implement vehicle control, that is, humanoid driving by electronically controlling the vehicle.
Depending on how much of the driving task the vehicle handles, autonomous vehicles can be categorized into L0 (No Automation), L1 (Driver Assistance), L2 (Partial Automation), L3 (Conditional Automation), L4 (High Automation) and L5 (Full Automation).
The autonomously driven vehicle in this embodiment may be any vehicle meeting L1-L5. In L1-L3 the system only assists the driver, while from L4 onward driving is handed over to the system, so the autonomous vehicle here is preferably one meeting L4 or L5.
A plurality of cameras are mounted on the vehicle at different angles so as to cover the vehicle's surroundings. They usually cooperate with other perception sensors (such as lidar, millimeter-wave radar and ultrasonic radar) to collect multiple frames of first image data, each camera collecting one frame. In this embodiment, the time the cameras take, in cooperation with the other perception sensors, to collect these frames is recorded as a detection period; within the same detection period, the cameras can therefore be called one by one, in their arrangement order, according to the perception characteristics of the other sensors.
Taking lidar as an example of the other perception sensors: one full rotation (360°) of the lidar is one detection period. While the vehicle drives autonomously, the lidar is kept rotating and collects point cloud data of the surroundings during rotation; the point cloud and the first image data can then be perceived together. When the lidar's angle matches a camera's angle, that is, when the lidar scans the camera's visible range, a dedicated synchronizer triggers the camera to collect first image data.
For example, as shown in fig. 3, three cameras are mounted on the vehicle: the left camera 311 faces 60° to the front-left, the front camera 312 faces straight ahead, and the right camera 313 faces 60° to the front-right. The lidar 300 rotates clockwise; when it reaches 60° front-left, the synchronizer triggers camera 311 to collect first image data, when it reaches straight ahead it triggers camera 312, and when it reaches 60° front-right it triggers camera 313.
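The patent gives no code for the synchronizer, but the angle-matched triggering can be illustrated with a short sketch. Everything below is a hypothetical illustration: the 0.5° tolerance, the CameraTrigger fields and the capture callback are assumptions, not the patent's implementation.

```cpp
#include <cmath>
#include <vector>

// Hypothetical camera trigger: fires once per lidar revolution when the
// lidar azimuth sweeps past the camera's mounting azimuth.
struct CameraTrigger {
    double mount_deg;   // e.g. -60 (front-left), 0 (front), +60 (front-right)
    bool   fired = false;
    void reset() { fired = false; }
};

// Called for every lidar azimuth sample within one revolution (= one
// detection period). `capture(i)` stands in for the real camera driver call.
template <typename CaptureFn>
void on_lidar_azimuth(double azimuth_deg, std::vector<CameraTrigger>& cams,
                      CaptureFn capture) {
    for (std::size_t i = 0; i < cams.size(); ++i) {
        if (!cams[i].fired &&
            std::fabs(azimuth_deg - cams[i].mount_deg) < 0.5 /*deg*/) {
            cams[i].fired = true;
            capture(i);  // camera i collects one frame of first image data
        }
    }
}
```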
Step 202, copying the first image data from the central processor to the heterogeneous processor each time the first image data is stored to the central processor during the detection period.
In general, the CPU acts as the control center: each camera is connected to the CPU, and the first image data a camera collects is stored in the CPU's memory.
Within a complete detection period there is a gap between the moments at which any two successive cameras capture their first image data. This gap time can be used to copy the first image data from the CPU's memory to the heterogeneous processor's memory over PCI-E (peripheral component interconnect express).
Considering the CPU as a Host, the heterogeneous processor as a Device, and the operation of copying the first image data from the memory of the CPU to the memory of the heterogeneous processor may be denoted as H2D (Host to Device).
For example, as shown in fig. 3, after the lidar 300 passes 60° front-left and before it reaches straight ahead there is a gap time 321, during which an H2D operation is performed on the first image data captured by camera 311; after it passes straight ahead and before it reaches 60° front-right there is a gap time 322, during which an H2D operation is performed on the first image data captured by camera 312.
In general, every heterogeneous processor may copy each frame of first image data from the CPU as soon as it is stored in the central processor during the detection period. Of course, a heterogeneous processor that will not preprocess the first image data need not copy it from the CPU; this embodiment does not limit this.
By default the CPU allocates pageable memory: the entire virtual and physical address spaces are cut into contiguous, fixed-size segments called pages, each page's size being determined by the hardware and typically 4 KB.
Virtual addresses are mapped to physical addresses through a page table, which is consulted by the CPU's memory management unit (MMU); through the MMU the CPU can directly find the physical memory address to be accessed.
As shown in fig. 4, a heterogeneous Device such as a GPU cannot directly access data in the Host's pageable memory. Therefore, page-locked memory readable by the heterogeneous processor may be allocated in advance in the CPU for each camera, and the image data collected by each camera is stored in its corresponding page-locked memory, which serves as the staging area for transfers from the Host to the heterogeneous Device.
Further, when there are multiple cameras, a memory pool can be requested from the CPU for them, and within this pool each camera is allocated page-locked memory readable by the heterogeneous processor.
The memory pool manager runs as follows:

Initialization: the memory map contains a single entry of available memory whose size is the total available memory of the pool. One mem_chunk is allocated from the memory chunk pool to point to that first block in the memory map; its other fields are filled according to the concrete memory-pool implementation, and it is then added to the memory chunk set.

Allocating memory: when a block of memory is requested, a suitable memory block is first searched for in the memory chunk set. If a block satisfying the request is found, the corresponding chunk is located in the memory map, the matching block structure inside that chunk is modified, the chunk's entry in the memory chunk set is updated accordingly, and the start address of the allocated memory is returned; otherwise NULL is returned.

Releasing memory: when a block of memory is freed, its chunk is first found from the block's start address; the manager then tries to merge that chunk with its neighbours, modifying the corresponding block inside the chunk and the chunk's entry in the memory chunk set, or adds a new mem_chunk to the memory chunk set.
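The chunk bookkeeping above can be illustrated with a minimal first-fit pool. This is a sketch under stated assumptions (one contiguous arena, a std::list standing in for both the memory map and the chunk set), not the patent's actual pool code.

```cpp
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <list>

struct MemChunk {
    std::uint8_t* addr;
    std::size_t   size;
    bool          free;
};

class MemoryPool {
public:
    // Initialization: one chunk describing all available memory.
    MemoryPool(void* arena, std::size_t bytes) {
        chunks_.push_back({static_cast<std::uint8_t*>(arena), bytes, true});
    }
    // Allocating memory: first-fit search, split, return start address.
    void* allocate(std::size_t bytes) {
        for (auto it = chunks_.begin(); it != chunks_.end(); ++it) {
            if (it->free && it->size >= bytes) {
                if (it->size > bytes)  // split off the unused remainder
                    chunks_.insert(std::next(it),
                                   {it->addr + bytes, it->size - bytes, true});
                it->size = bytes;
                it->free = false;
                return it->addr;
            }
        }
        return nullptr;  // NULL on failure
    }
    // Releasing memory: find the chunk by start address, merge neighbours.
    void release(void* p) {
        for (auto it = chunks_.begin(); it != chunks_.end(); ++it) {
            if (it->addr == p) { it->free = true; merge(it); return; }
        }
    }
private:
    void merge(std::list<MemChunk>::iterator it) {
        auto next = std::next(it);
        if (next != chunks_.end() && next->free) {
            it->size += next->size;
            chunks_.erase(next);
        }
        if (it != chunks_.begin()) {
            auto prev = std::prev(it);
            if (prev->free) { prev->size += it->size; chunks_.erase(it); }
        }
    }
    std::list<MemChunk> chunks_;
};
```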
As shown in fig. 4, during the detection period, each time the first image data collected by a camera is stored into that camera's page-locked memory, the heterogeneous processor's driver stack (for example CUDA, the Compute Unified Device Architecture) may copy the first image data from the page-locked memory into the heterogeneous processor's memory, for example its DRAM (dynamic random access memory).
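With CUDA as the driver stack, the per-camera page-locked staging area and the gap-time H2D copy might look like the following sketch; the CameraSlot structure and the function names are illustrative assumptions, not the patent's code.

```cpp
#include <cuda_runtime.h>
#include <cstddef>

// One slot per camera: page-locked host memory sized to that camera's frame,
// a matching device buffer, and a stream for the gap-time H2D copy.
struct CameraSlot {
    void*        pinned;   // page-locked staging area on the Host
    void*        device;   // destination in heterogeneous-processor DRAM
    std::size_t  bytes;
    cudaStream_t stream;
};

bool init_slot(CameraSlot& s, std::size_t frame_bytes) {
    s.bytes = frame_bytes;
    if (cudaHostAlloc(&s.pinned, frame_bytes, cudaHostAllocDefault) != cudaSuccess)
        return false;
    if (cudaMalloc(&s.device, frame_bytes) != cudaSuccess) return false;
    return cudaStreamCreate(&s.stream) == cudaSuccess;
}

// Called as soon as a camera's frame lands in its page-locked memory; the
// copy overlaps the gap before the next camera fires (the H2D operation).
void copy_frame_h2d(CameraSlot& s) {
    cudaMemcpyAsync(s.device, s.pinned, s.bytes,
                    cudaMemcpyHostToDevice, s.stream);
    // No synchronize here: preprocessing kernels are queued on the same
    // stream, so they start as soon as the copy finishes.
}
```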
Further, once settings such as camera resolution have been initialized, the size of the first image data each camera collects is fixed. The volume of a camera's first image data can therefore be queried per camera, and page-locked memory matching that volume (the same size, or slightly larger) and readable by the heterogeneous processor is requested in the central processor for that camera. This saves CPU memory and reduces the cost of moving first image data between pageable and page-locked memory.
Of course, other ways of applying for the lock page memory may be used besides volume matching with the first image data, for example, applying for the lock page memory for the heterogeneous processor according to a default volume, applying for the lock page memory for the heterogeneous processor according to a maximum volume uniformly, and the like, which is not limited in this embodiment.
In addition, besides page-locked memory, the CPU may transfer the first image data to the heterogeneous processor through other transfer mechanisms (such as direct memory access, DMA), which this embodiment does not limit.
Step 203, in the detection period, preprocessing the first image data for the first object model in the heterogeneous processor to obtain second image data.
In this embodiment, one or more image processing models may be deployed in advance on the heterogeneous processors according to the requirements of autonomous driving, and each image processing model independently implements a function that processes image data.
In one deployment approach, the image processing models are divided into primary branch models and secondary branch models.
The primary branch models can be ordered into a sequence of nodes, where a primary branch model at a downstream node depends on the primary branch model at an upstream node, i.e., the downstream model's input is the upstream model's output. Primary branch models are generally the image processing models most important to autonomous driving and are used most frequently. Deploying them in the same heterogeneous processor lets them pass data to each other directly inside that processor, which reduces data transmission between different processors, lowers transfer overhead and improves processing efficiency.
A primary branch model may also depend on a secondary branch model, i.e., take the secondary branch model's output as input; the secondary branch model thus supports the primary one. Secondary branch models are generally less important to autonomous driving than primary branch models and are used less frequently, so they are deployed on one or more other heterogeneous processors, which balances the computing load.
Taking GPUs as an example: the machine has GPU1 (card 1) and GPU0 (card 0). GPU0 is the integrated graphics card, i.e., the one on the motherboard, while GPU1 is a discrete graphics card, i.e., a standalone card; discrete graphics generally outperform integrated graphics.
In this example, the primary branch models are deployed on GPU1, for example an image processing model that fuses image data with point cloud data, one that detects targets from the fused image and point cloud data, one that predicts target speed, and one that tracks targets. The primary branch models thus get the higher computing performance, which keeps them running normally and preserves autonomous-driving performance.
The secondary branch models are deployed on GPU0, for example an image processing model that segments image data, one that classifies traffic lights, one that classifies tail lights, and so on.
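A rough sketch of this placement, assuming the CUDA device indices match the card numbers above; the load_* calls are placeholders for real engine or graph construction, not an actual API.

```cpp
#include <cuda_runtime.h>

// Hypothetical placement of the branch models across the two GPUs.
void deploy_models() {
    cudaSetDevice(1);  // GPU1 (discrete): primary branch models
    // load_fusion_model(); load_detection_model();
    // load_speed_model();  load_tracking_model();

    cudaSetDevice(0);  // GPU0 (integrated): secondary branch models
    // load_segmentation_model(); load_traffic_light_classifier();
    // load_tail_light_classifier();
}
```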
Some or all of the image processing models on a heterogeneous processor are models that are to process the first image data; these are recorded as first target models.
Different first target models have different requirements on the input image data. For example, a first target model that classifies the digits on a traffic light takes as input 64×64-pixel image data cropped from inside the traffic light, while a more complex first target model that detects targets such as pedestrians and vehicles takes image data at the original size (e.g., 1024×796).
In this embodiment, the first image data has already been transmitted to the heterogeneous processor during the gap time. Within the detection period, the image processing rules set for the first target model, such as scaling, translation and rotation, can therefore be determined in the heterogeneous processor, and the first image data is preprocessed according to those rules to obtain the second image data.
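As a concrete and deliberately simple example of such a rule running on the device, the sketch below applies a nearest-neighbour scaling kernel to an interleaved RGB frame already resident in heterogeneous-processor memory. The patent does not prescribe this kernel; a production pipeline would likely use bilinear filtering plus translation and rotation operators written in the same style.

```cpp
#include <cuda_runtime.h>

// Nearest-neighbour scaling of one interleaved RGB frame on the device.
__global__ void resize_nn(const unsigned char* src, int sw, int sh,
                          unsigned char* dst, int dw, int dh) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= dw || y >= dh) return;
    int sx = x * sw / dw;   // map destination pixel back to source pixel
    int sy = y * sh / dh;
    for (int c = 0; c < 3; ++c)   // interleaved RGB
        dst[(y * dw + x) * 3 + c] = src[(sy * sw + sx) * 3 + c];
}

// Enqueued on the same stream as the H2D copy, so preprocessing starts
// as soon as the first image data arrives on the device.
void scale_frame(const unsigned char* src_dev, int sw, int sh,
                 unsigned char* dst_dev, int dw, int dh, cudaStream_t s) {
    dim3 block(16, 16);
    dim3 grid((dw + block.x - 1) / block.x, (dh + block.y - 1) / block.y);
    resize_nn<<<grid, block, 0, s>>>(src_dev, sw, sh, dst_dev, dw, dh);
}
```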
On the other hand, heterogeneous processors have many processing cores (cores) and generally have high parallelism.
For example, as shown in fig. 5, the CPU and the GPU each contain arithmetic logic units (ALU), a control unit and a cache. In the CPU, the ALUs occupy about 25% of the chip, the control unit 25% and the cache 50%; in the GPU, the ALUs occupy about 90%, the control unit 5% and the cache 5%. The CPU therefore suits scenarios with complex control logic, while the GPU suits independent, parallel computation.
Migrating preprocessing such as scaling, translation and rotation from the CPU to the heterogeneous processor, and giving the different operators that implement the preprocessing their own processing cores, greatly improves preprocessing efficiency.
For example, as shown in fig. 6, when processing a batch of images, cropping operations over different regions of interest (ROI) can be executed in parallel on the heterogeneous processor across a batch of first images (e.g., Image0, Image1, Image2, ..., ImageN).
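A sketch of such a batched crop, assuming the image-pointer and ROI arrays already live in device memory and all source images share one stride; these layout assumptions are mine, not the patent's.

```cpp
#include <cuda_runtime.h>

struct Roi { int x, y, w, h; };  // one region of interest per image

// blockIdx.z selects the image, so every crop in the batch runs in
// parallel; `images`, `rois` and `crops` are device-side arrays.
__global__ void batch_crop(const unsigned char* const* images,
                           const Roi* rois,
                           unsigned char* const* crops,
                           int src_stride, int channels) {
    int img = blockIdx.z;
    Roi r = rois[img];
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= r.w || y >= r.h) return;
    for (int c = 0; c < channels; ++c)
        crops[img][(y * r.w + x) * channels + c] =
            images[img][(r.y + y) * src_stride + (r.x + x) * channels + c];
}
```

The grid is launched with z-extent equal to the batch size and x/y extents covering the largest ROI, so one kernel call serves the whole batch.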
On the other hand, because the first image data is preprocessed on the heterogeneous processor where the first target model resides, the corresponding first target model can be called directly on that same heterogeneous processor to process the preprocessed second image data, which reduces the time cost of data transmission.
Moreover, different image processing models share the same first image data, which greatly reduces redundant copying.
Further, if some image processing frameworks (such as TensorFlow) do not support parallelism on a heterogeneous processor, multiple virtual heterogeneous processors (virtual devices) may be built inside the heterogeneous processor in this embodiment as multiple virtual processors, each of which independently occupies cores of the heterogeneous processor and implements its own computation stream.
The first image data can then be preprocessed for the first target model in parallel across these virtual processors to obtain the second image data, which preserves preprocessing efficiency.
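The patent's virtual devices echo the logical-device mechanism of frameworks such as TensorFlow; in plain CUDA a rough analogue, sketched below under that assumption, is to give each first target model's preprocessing its own stream so the work overlaps on one physical device.

```cpp
#include <cuda_runtime.h>
#include <vector>

// One stream per "virtual processor"; kernels launched on different
// streams may execute concurrently on the same heterogeneous processor.
std::vector<cudaStream_t> make_virtual_processors(int n) {
    std::vector<cudaStream_t> streams(n);
    for (auto& s : streams) cudaStreamCreate(&s);
    return streams;
}
```

Preprocessing for the i-th first target model is then enqueued on streams[i], for example via the scale_frame sketch above.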
Step 204, in the heterogeneous processor, the first target model is called to process the second image data.
When all the first image data are preprocessed, the detection period is also finished, and at the moment, the corresponding first target model on the heterogeneous processor can be called to process the second image data.
In practical applications, different image processing models expect different input structures: for example, a model deployed with TensorRT takes a raw GPU memory pointer as input, while a TensorFlow model takes a Tensor declared through TensorFlow. The second image data is therefore arranged according to the required structure and fed into the first target model for processing.
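For the TensorRT case, handing over the preprocessed device buffer can be as direct as the sketch below; a TensorRT 8-style enqueueV2 call is assumed, and the binding order (0 = input, 1 = output) depends on the concrete engine.

```cpp
#include <NvInfer.h>
#include <cuda_runtime.h>

// Feed second image data, already in GPU memory, straight to the engine.
void infer(nvinfer1::IExecutionContext* ctx, void* input_dev,
           void* output_dev, cudaStream_t stream) {
    void* bindings[] = { input_dev, output_dev };  // raw GPU pointers
    ctx->enqueueV2(bindings, stream, nullptr);     // asynchronous inference
}
```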
In this embodiment, within the same detection period, a plurality of cameras are sequentially called to collect multiple frames of first image data. In the detection period, each time first image data is stored to the central processor, it is copied from the central processor to the heterogeneous processor, where it is preprocessed for the first target model to obtain second image data, the first target model being an image processing model that is to process the first image data; the first target model is then called in the heterogeneous processor to process the second image data. Exploiting the way cameras are fused with other perception sensors in autonomous driving, the gap time between successive camera captures is used to copy and preprocess the already-collected first image data, hiding and compressing the copying and preprocessing inside the detection period. This significantly reduces the time cost of the whole perception pipeline, preserves real-time performance, and thereby safeguards the accuracy of autonomous-driving decisions.
Example 2
Fig. 7 is a flowchart of an image processing method according to a second embodiment of the present invention, where the method further includes the steps of:
in step 701, in the same detection period, a plurality of cameras are sequentially called to collect multi-frame first image data.
Step 702, copying the first image data from the central processor to the heterogeneous processor when each first image data is stored in the central processor in the detection period.
In step 703, in the detection period, the first image data is preprocessed for the first target model in the heterogeneous processor to obtain the second image data.

The first target model is an image processing model that is to process the first image data.

Step 704, in the heterogeneous processor, the first target model is invoked to process the second image data.
Step 705, querying the current heterogeneous processor for the processor in which the second target model is located.
In this embodiment, the image processing models deployed on each processor (CPU, heterogeneous processors, etc.) and the dependencies between them, i.e., which model's output feeds which model's input, may be recorded through metadata, configuration information and the like.
When third image data is obtained, that is, image data output by any image processing model on the current heterogeneous processor, the current heterogeneous processor can query, through metadata, configuration information and the like, the processor on which the image processing model that is to process the third image data (i.e., the second target model) is located.
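A minimal sketch of such a lookup, with made-up model names and a three-processor layout standing in for whatever the metadata or configuration actually records:

```cpp
#include <map>
#include <string>

enum class Proc { CPU, GPU0, GPU1 };

// Hypothetical dependency table: which processor hosts each model.
Proc host_of(const std::string& model) {
    static const std::map<std::string, Proc> table = {
        {"detector",    Proc::GPU1},  // primary branch
        {"tl_classify", Proc::GPU0},  // secondary branch
        {"digit_read",  Proc::CPU},   // logic-heavy model
    };
    auto it = table.find(model);
    return it == table.end() ? Proc::CPU : it->second;
}
```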
In step 706, if that processor is the current heterogeneous processor, the third image data is retained in the current heterogeneous processor, waiting for the second target model to be called to process it.
If the processor on which the second target model is deployed is the current heterogeneous processor, the third image data may be retained in the current heterogeneous processor's memory, waiting for the second target model deployed there to be invoked to run inference on the third image data.
Step 707, if the processor is another heterogeneous processor, the third image data is transmitted to that heterogeneous processor, where the second target model is called to process it.
If the processor on which the second target model is deployed is another heterogeneous processor then, as shown in fig. 8, a copy operation denoted D2D (Device to Device) may be performed between the current heterogeneous processor and the other one by means of a memcpy-style memory-copy function: the third image data is copied from the current heterogeneous processor's memory to the other heterogeneous processor's memory, waiting for the second target model deployed there to be invoked to run inference on the third image data.
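With CUDA devices, such a D2D copy could be sketched as below; cudaMemcpyPeerAsync falls back to staging through the host when direct peer access is unavailable, so the call pattern, though an assumption here, stays the same either way.

```cpp
#include <cuda_runtime.h>
#include <cstddef>

// Copy third image data from the current device to the device hosting
// the second target model.
void copy_d2d(void* dst, int dst_dev, const void* src, int src_dev,
              std::size_t bytes, cudaStream_t stream) {
    cudaSetDevice(src_dev);
    // Returns an "already enabled" error on repeat calls; safe to ignore.
    cudaDeviceEnablePeerAccess(dst_dev, 0);
    cudaMemcpyPeerAsync(dst, dst_dev, src, src_dev, bytes, stream);
}
```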
Step 708, if the processor is the central processor, the third image data is transmitted to the central processor, where the second target model is called to process it.
The CPU offers strong logic computing power, so image processing models that partly rely on logic computation may be deployed on the CPU, for example an image processing model that identifies whether a detection box is horizontal or vertical, one that detects traffic-light positions, and one that detects digits.
If the processor on which the second target model is deployed is the CPU, the third image data can be copied from the current heterogeneous processor's memory to the CPU's memory, waiting for the second target model deployed on the CPU to be called to run inference on the third image data.
Example 3
Fig. 9 is a flowchart of an image processing method according to a third embodiment of the present invention, where the method is based on the foregoing embodiment, and further includes the following steps:
step 901, sequentially calling a plurality of cameras to acquire multi-frame first image data in the same detection period.
In step 902, in the detection period, each time first image data is stored to the central processor, the first image data is copied from the central processor to the heterogeneous processor.

In step 903, in the detection period, the first image data is preprocessed for the first target model in the heterogeneous processor to obtain the second image data.

The first target model is an image processing model that is to process the first image data.
Step 904, in the heterogeneous processor, the first target model is called to process the second image data.
Step 905, in the central processor, query the processor on which the third target model is located.
In the present embodiment, the image processing models deployed for each processor (CPU, heterogeneous processor, etc.) and the dependency relationship between the image processing models, that is, which image processing model output is which image processing model input, may be recorded by metadata, configuration information, etc.
When fourth image data is obtained, that is, image data output by any image processing model on the central processor, the CPU may query, through metadata, configuration information and the like, the processor on which the image processing model that is to process the fourth image data (i.e., the third target model) is located.
Step 906, if the processor is the central processor, the fourth image data is retained in the central processor, waiting for the third target model to be called to process it.
If the processor on which the third target model is deployed is the CPU, the fourth image data can be kept in the CPU's memory, waiting for the third target model deployed on the CPU to be invoked to run inference on the fourth image data.
In step 907, if the processor is a heterogeneous processor, the fourth image data is transmitted to the heterogeneous processor, where the third target model is called to process it.
If the processor on which the third target model is deployed is a heterogeneous processor, the fourth image data may be copied from the CPU's memory to the heterogeneous processor's memory by performing an H2D operation, waiting for the third target model deployed there to be invoked to run inference on the fourth image data.
It should be noted that this H2D operation may transmit the fourth image data directly from ordinary pageable memory, without staging it through page-locked memory.
It should be noted that, for simplicity of description, the method embodiments are shown as a series of acts, but it should be understood by those skilled in the art that the embodiments are not limited by the order of acts, as some steps may occur in other orders or concurrently in accordance with the embodiments. Further, those skilled in the art will appreciate that the embodiments described in the specification are presently preferred embodiments, and that the acts are not necessarily required by the embodiments of the invention.
Example 4
Fig. 10 is a block diagram of an image processing apparatus according to a fourth embodiment of the present invention, which may specifically include the following modules:
the image data acquisition module 1001 is configured to sequentially invoke a plurality of cameras to acquire a plurality of frames of first image data in the same detection period;
a cycle copy module 1002 for copying the first image data from the central processor to heterogeneous processors every time the first image data is stored to the central processor in the detection cycle;
a period preprocessing module, configured to preprocess, in the heterogeneous processor, the first image data for a first target model in the detection period to obtain second image data, where the first target model is an image processing model that is to process the first image data;
an image data inference module 1004, configured to invoke the first object model to process the second image data in the heterogeneous processor.
In one embodiment of the present invention, the image data acquisition module 1001 includes:
the radar rotation module is used for controlling the lidar to rotate and collecting point cloud data during rotation, one rotation of the lidar being a detection period;
And the synchronous acquisition module is used for calling the camera to acquire the first image data when the angle of the laser radar is matched with the angle of the camera.
In one embodiment of the invention, the cycle copy module 1002 includes:
the lock page memory allocation module is used for allocating lock page memories readable by the heterogeneous processor for each camera in the central processing unit;
and the page-locking memory copying module is used for copying the first image data from the page-locking memory to the heterogeneous processor every time the first image data acquired by the camera is stored in the page-locking memory corresponding to the camera in the detection period.
In one embodiment of the present invention, the lock page memory allocation module includes:
the volume query module is used for querying the volume of the first image data acquired by the cameras for each camera;
and the volume matching application module is used for applying for the lock page memory which is matched with the volume and readable by the heterogeneous processor for the camera in the central processing unit.
In one embodiment of the present invention, the period preprocessing module 1003 includes:
an image processing rule determining module configured to determine, in the heterogeneous processor, an image processing rule set for a first target model in the detection period;
And the rule processing module is used for preprocessing the first image data according to the image processing rule to obtain second image data.
In another embodiment of the present invention, the period preprocessing module 1003 includes:
a virtual processor construction module, configured to construct, in the heterogeneous processor, a plurality of virtual heterogeneous processors as a plurality of virtual processors, each of the virtual processors independently occupying cores of the heterogeneous processor;
and the virtual processor processing module is used for preprocessing the first image data for the first target model in parallel in the plurality of virtual processors to obtain second image data.
In another embodiment of the present invention, the image processing model is divided into a main branch model and a sub branch model;
the main branch model at a downstream node depends on the main branch model at an upstream node;
the primary branch model depends on the secondary branch model;
the primary branch model is deployed in the same heterogeneous processor, and the secondary branch model is deployed in other heterogeneous processors.
In one embodiment of the present invention, further comprising:
The first processor query module is used for querying a processor in which a second target model is located in the current heterogeneous processor, wherein the second target model is an image processing model of third image data to be processed, and the third image data is image data output by any image processing model in the current heterogeneous processor;
the first retaining module is used for retaining the third image data in the current heterogeneous processor if the processor is the current heterogeneous processor, so as to call the second target model to process the third image data;
the first heterogeneous copy module is configured to transmit the third image data to the other heterogeneous processors if the processor is the other heterogeneous processors, so as to call the second target model to process the third image data;
and the host copying module is used for transmitting the third image data to the central processing unit if the processor is the central processing unit so as to call the second target model to process the third image data.
In one embodiment of the present invention, further comprising:
the second processor query module is used for querying a processor in which a third target model is located in the central processing unit, wherein the third target model is an image processing model of fourth image data to be processed, and the fourth image data is image data output by any image processing model in the central processing unit;
The second retaining module is used for retaining the fourth image data in the central processing unit if the processor is the central processing unit so as to call the third target model to process the fourth image data;
and the second heterogeneous copy module is used for transmitting the fourth image data to the heterogeneous processor if the processor is the heterogeneous processor so as to call the third target model to process the fourth image data.
The image processing device provided by the embodiment of the invention can execute the image processing method provided by any embodiment of the invention, and has the corresponding functional modules and beneficial effects of the execution method.
Example 5
Fig. 11 is a schematic structural diagram of a computer device according to a fifth embodiment of the present invention. FIG. 11 illustrates a block diagram of an exemplary computer device 12 suitable for use in implementing embodiments of the present invention. The computer device 12 shown in fig. 11 is merely an example, and should not be construed as limiting the functionality and scope of use of embodiments of the present invention.
As shown in FIG. 11, the computer device 12 is in the form of a general purpose computing device. Components of computer device 12 may include, but are not limited to: one or more processors or processing units 16, a system memory 28, a bus 18 that connects the various system components, including the system memory 28 and the processing units 16.
Bus 18 represents one or more of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, a processor, and a local bus using any of a variety of bus architectures. By way of example, and not limitation, such architectures include the Industry Standard Architecture (ISA) bus, the Micro Channel Architecture (MCA) bus, the Enhanced ISA (EISA) bus, the Video Electronics Standards Association (VESA) local bus, and the Peripheral Component Interconnect (PCI) bus.
Computer device 12 typically includes a variety of computer system readable media. Such media can be any available media that is accessible by computer device 12 and includes both volatile and nonvolatile media, removable and non-removable media.
The system memory 28 may include computer system readable media in the form of volatile memory, such as Random Access Memory (RAM) 30 and/or cache memory 32. The computer device 12 may further include other removable/non-removable, volatile/nonvolatile computer system storage media. By way of example only, storage system 34 may be used to read from or write to non-removable, nonvolatile magnetic media (not shown in FIG. 11, commonly referred to as a "hard disk drive"). Although not shown in fig. 11, a magnetic disk drive for reading from and writing to a removable non-volatile magnetic disk (e.g., a "floppy disk"), and an optical disk drive for reading from or writing to a removable non-volatile optical disk (e.g., a CD-ROM, DVD-ROM, or other optical media) may be provided. In such cases, each drive may be coupled to bus 18 through one or more data medium interfaces. Memory 28 may include at least one program product having a set (e.g., at least one) of program modules configured to carry out the functions of embodiments of the invention.
A program/utility 40 having a set (at least one) of program modules 42 may be stored in, for example, memory 28, such program modules 42 including, but not limited to, an operating system, one or more application programs, other program modules, and program data, each or some combination of which may include an implementation of a network environment. Program modules 42 generally perform the functions and/or methods of the embodiments described herein.
The computer device 12 may also communicate with one or more external devices 14 (e.g., keyboard, pointing device, display 24, etc.), one or more devices that enable a user to interact with the computer device 12, and/or any devices (e.g., network card, modem, etc.) that enable the computer device 12 to communicate with one or more other computing devices. Such communication may occur through an input/output (I/O) interface 22. Moreover, computer device 12 may also communicate with one or more networks such as a Local Area Network (LAN), a Wide Area Network (WAN) and/or a public network, such as the Internet, through network adapter 20. As shown, network adapter 20 communicates with other modules of computer device 12 via bus 18. It should be appreciated that although not shown, other hardware and/or software modules may be used in connection with computer device 12, including, but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, data backup storage systems, and the like.
The processing unit 16 executes various functional applications and data processing by running programs stored in the system memory 28, for example, implementing the image processing method provided by the embodiment of the present invention.
Example 6
The sixth embodiment of the present invention further provides a computer-readable storage medium, on which a computer program is stored; when executed by a processor, the computer program implements each process of the image processing method described above and can achieve the same technical effects, which, to avoid repetition, are not described again here.
The computer readable storage medium may include, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples (a non-exhaustive list) of the computer-readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
Note that the above is only a preferred embodiment of the present invention and the technical principle applied. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the invention. Therefore, while the invention has been described in connection with the above embodiments, the invention is not limited to the embodiments, but may be embodied in many other equivalent forms without departing from the spirit or scope of the invention, which is set forth in the following claims.
Claims (11)
1. An image processing method, comprising:
in the same detection period, sequentially calling a plurality of cameras to acquire multi-frame first image data;
copying the first image data from the central processor to a heterogeneous processor every time the first image data is stored to the central processor in the detection period;
in the detection period, preprocessing the first image data for a first target model in the heterogeneous processor to obtain second image data, wherein the first target model is an image processing model for processing the first image data;
In the heterogeneous processor, invoking the first target model to process the second image data;
the image processing model is divided into a main branch model and a secondary branch model;
the main branch model at a downstream node depends on the main branch model at an upstream node;
the primary branch model depends on the secondary branch model;
the primary branch model is deployed in the same heterogeneous processor, and the secondary branch model is deployed in other heterogeneous processors.
2. The method according to claim 1, wherein sequentially invoking the plurality of cameras to acquire the plurality of frames of the first image data in the same detection period comprises:
controlling the laser radar to rotate, and collecting point cloud data in the rotating process, wherein one rotation of the laser radar is a detection period;
and when the angle of the laser radar is matched with the angle of the camera, invoking the camera to acquire first image data.
3. The method of claim 1, wherein the copying the first image data from the central processor to a heterogeneous processor each time the first image data is stored to the central processor in the detection period comprises:
Distributing a lock page memory readable by a heterogeneous processor for each camera in a central processing unit;
and in the detection period, when the first image data acquired by the camera is stored in the page lock memory corresponding to the camera, copying the first image data from the page lock memory to the heterogeneous processor.
4. The method of claim 3, wherein said allocating, in the central processor, heterogeneous processor-readable lock page memory for each of said cameras comprises:
inquiring the volume of the first image data acquired by the cameras for each camera;
and applying for the lock page memory which is matched with the volume and readable by the heterogeneous processor for the camera in the central processing unit.
5. The method of claim 1, wherein the preprocessing the first image data for a first object model in the heterogeneous processor during the detection period to obtain second image data comprises:
in the detection period, determining, in the heterogeneous processor, an image processing rule set for a first target model;
and preprocessing the first image data according to the image processing rule to obtain second image data.
6. The method of claim 1, wherein the preprocessing the first image data for a first object model in the heterogeneous processor during the detection period to obtain second image data comprises:
in the detection period, constructing, in the heterogeneous processor, a plurality of virtual heterogeneous processors as a plurality of virtual processors, wherein each virtual processor independently occupies cores of the heterogeneous processor;
and in the plurality of virtual processors, preprocessing the first image data for the first target model in parallel to obtain second image data.
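Claim 6's virtual processors each independently occupy a share of the heterogeneous processor's cores; on NVIDIA devices such partitioning is typically realized with MPS or MIG. The sketch below only approximates the resulting parallelism with per-camera CUDA streams and is not the claimed partitioning itself.

```cpp
// Approximation only: claim 6's virtual processors each own a share of the
// accelerator's cores (on NVIDIA hardware, typically via MPS or MIG). Here
// per-camera CUDA streams merely emulate the resulting parallelism.
#include <cuda_runtime.h>
#include <vector>

std::vector<cudaStream_t> make_virtual_lanes(size_t n_cameras) {
    std::vector<cudaStream_t> lanes(n_cameras);
    for (cudaStream_t& s : lanes)
        cudaStreamCreate(&s);      // one independent lane per camera
    return lanes;
}
// Preprocessing kernels launched on different lanes may execute concurrently,
// so the frames of all cameras are preprocessed in parallel.
```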
7. The method according to any one of claims 1-6, further comprising:
querying, in the current heterogeneous processor, the processor in which a second target model is located, wherein the second target model is an image processing model for processing third image data, and the third image data is image data output by any image processing model in the current heterogeneous processor;
if the processor is the current heterogeneous processor, retaining the third image data in the current heterogeneous processor so as to invoke the second target model to process the third image data;
if the processor is another heterogeneous processor, transmitting the third image data to that heterogeneous processor so as to invoke the second target model to process the third image data;
and if the processor is the central processor, transmitting the third image data to the central processor so as to invoke the second target model to process the third image data.
8. The method according to any one of claims 1-6, further comprising:
querying, in the central processor, the processor in which a third target model is located, wherein the third target model is an image processing model for processing fourth image data, and the fourth image data is image data output by any image processing model in the central processor;
if the processor is the central processor, retaining the fourth image data in the central processor so as to invoke the third target model to process the fourth image data;
and if the processor is the heterogeneous processor, transmitting the fourth image data to the heterogeneous processor so as to invoke the third target model to process the fourth image data.
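Claims 7 and 8 state the same routing rule from two sides: an intermediate result stays wherever the next model is deployed, and crosses the CPU/accelerator boundary only when the placements differ. A hedged sketch follows; the types and placement enum are illustrative, while the CUDA copy calls are real runtime APIs.

```cpp
// Hedged sketch of the routing rule of claims 7-8: the intermediate image
// data stays wherever the next model is deployed and crosses the boundary
// only when placements differ. Types and names are illustrative.
#include <cuda_runtime.h>
#include <cstdlib>

enum class Placement { CurrentDevice, OtherDevice, Cpu };

struct Tensor { void* data; size_t bytes; int device; };  // device -1 == CPU

void route_output(Tensor& t, Placement next_model, int other_device) {
    switch (next_model) {
    case Placement::CurrentDevice:
        break;                              // retain in place, zero copies
    case Placement::OtherDevice: {
        void* dst = nullptr;
        cudaSetDevice(other_device);
        cudaMalloc(&dst, t.bytes);
        // device-to-device transfer between heterogeneous processors
        cudaMemcpyPeer(dst, other_device, t.data, t.device, t.bytes);
        t = {dst, t.bytes, other_device};
        break;
    }
    case Placement::Cpu: {
        void* host = std::malloc(t.bytes);
        cudaMemcpy(host, t.data, t.bytes, cudaMemcpyDeviceToHost);
        t = {host, t.bytes, -1};            // now resident on the CPU
        break;
    }
    }
}
```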
9. An image processing apparatus, comprising:
an image data acquisition module, configured to sequentially invoke a plurality of cameras in the same detection period to acquire multiple frames of first image data;
a period copy module, configured to copy the first image data from the central processor to a heterogeneous processor each time the first image data is stored to the central processor in the detection period;
a period preprocessing module, configured to preprocess, in the heterogeneous processor, the first image data for a first target model in the detection period to obtain second image data, wherein the first target model is an image processing model for processing the first image data;
an image data inference module, configured to invoke, in the heterogeneous processor, the first target model to process the second image data;
wherein the image processing model is divided into a main branch model and a secondary branch model;
the main branch model at a downstream node depends on the main branch model at an upstream node;
the main branch model depends on the secondary branch model;
and the main branch model is deployed in the same heterogeneous processor, while the secondary branch model is deployed in other heterogeneous processors.
10. A computer device, the computer device comprising:
one or more processors;
a memory for storing one or more programs;
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the image processing method of any one of claims 1-8.
11. A computer-readable storage medium, on which a computer program is stored, which computer program, when being executed by a processor, implements the image processing method according to any one of claims 1-8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110875653.4A CN113610135B (en) | 2021-07-30 | 2021-07-30 | Image processing method, device, computer equipment and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113610135A (en) | 2021-11-05
CN113610135B (en) | 2024-04-02
Family
ID=78306313
Family Applications (1)
Application Number | Title | Priority Date | Filing Date
---|---|---|---
CN202110875653.4A (Active) | Image processing method, device, computer equipment and storage medium | 2021-07-30 | 2021-07-30
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113610135B (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8179282B1 (en) * | 2007-11-30 | 2012-05-15 | Cal Poly Corporation | Consensus based vehicle detector verification system |
CN110430444A (en) * | 2019-08-12 | 2019-11-08 | 北京中科寒武纪科技有限公司 | A kind of video stream processing method and system |
CN110781849A (en) * | 2019-10-30 | 2020-02-11 | 北京锐安科技有限公司 | Image processing method, device, equipment and storage medium |
CN112214627A (en) * | 2019-07-12 | 2021-01-12 | 上海赜睿信息科技有限公司 | Search method, readable storage medium and electronic device |
Similar Documents
Publication | Title
---|---
CN112540671B (en) | Remote operation of vision-based smart robotic system
US11494370B2 | Hardware-controlled updating of a physical operating parameter for in-field fault detection
JP2024109937A | Variational Grasp Generation
CN114902292A | Determining object orientation from images using machine learning
JP2022538813A | Intersection Region Detection and Classification for Autonomous Machine Applications
CN112824061A | Guiding uncertainty-awareness policy optimization: combining model-free and model-based strategies for efficient sample learning
CN114556372A | Processor and system for transforming tensor operations in machine learning
CN114503158A | Neural network for image registration and image segmentation using registration simulator training
DE112020004302T5 | Training strategy search using reinforcement learning
JP2023502575A | Training and inference using neural networks to predict the orientation of objects in images
JP2023500608A | Processor and system for identifying out-of-distribution input data in neural networks
CN115271061A | Dynamic weight update for neural networks
CN114970852A | Generating frames of neural simulations using one or more neural networks
CN114556823B (en) | Parallel CRC Implementation on Image Processing Unit
CN114556420A | Image alignment neural network
CN114766034A | Object detection system based on machine learning
CN114596250A | Object detection and collision avoidance using neural networks
CN112215053B | Multi-sensor multi-object tracking
JP2022115021A | Machine learning techniques to improve video conferencing applications
CN115053236A | Techniques for training and reasoning using multiple processor resources
JP2023031237A | Object Tracking Using LiDAR Data for Autonomous Machine Applications
CN114764611A | Parallel execution of non-maxima suppression
CN115855022A | Performing autonomous path navigation using deep neural networks
CN115130667A | Access tensor
CN115438783A | Neural network classification technique
Legal Events
Code | Title
---|---
PB01 | Publication
SE01 | Entry into force of request for substantive examination
GR01 | Patent grant