WO2020238008A1 - Moving object detection and intelligent driving control method, apparatus, medium, and device - Google Patents
Moving object detection and intelligent driving control method, apparatus, medium, and device
- Publication number
- WO2020238008A1 (PCT/CN2019/114611)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- image
- processed
- disparity
- value
- weight distribution
- Prior art date
Links
- 238000000034 method Methods 0.000 title claims abstract description 94
- 238000001514 detection method Methods 0.000 title claims abstract description 51
- 230000033001 locomotion Effects 0.000 claims abstract description 123
- 230000003287 optical effect Effects 0.000 claims abstract description 45
- 238000004590 computer program Methods 0.000 claims abstract description 24
- 238000003860 storage Methods 0.000 claims abstract description 14
- 238000009826 distribution Methods 0.000 claims description 179
- 238000012545 processing Methods 0.000 claims description 104
- 238000013527 convolutional neural network Methods 0.000 claims description 88
- 238000010586 diagram Methods 0.000 claims description 46
- 230000008859 change Effects 0.000 claims description 33
- 238000012549 training Methods 0.000 claims description 25
- 230000008569 process Effects 0.000 claims description 18
- 238000004458 analytical method Methods 0.000 claims description 16
- 230000009466 transformation Effects 0.000 claims description 5
- 238000013528 artificial neural network Methods 0.000 claims description 4
- 230000001131 transforming effect Effects 0.000 claims description 3
- 238000001914 filtration Methods 0.000 claims 2
- 238000004891 communication Methods 0.000 description 16
- 238000005516 engineering process Methods 0.000 description 9
- 238000004422 calculation algorithm Methods 0.000 description 8
- 238000013519 translation Methods 0.000 description 8
- 238000006073 displacement reaction Methods 0.000 description 5
- 239000011159 matrix material Substances 0.000 description 5
- 238000004364 calculation method Methods 0.000 description 4
- 230000006870 function Effects 0.000 description 4
- 238000003384 imaging method Methods 0.000 description 4
- 238000012360 testing method Methods 0.000 description 4
- 239000013598 vector Substances 0.000 description 4
- 238000013459 approach Methods 0.000 description 3
- 230000000007 visual effect Effects 0.000 description 3
- 230000009286 beneficial effect Effects 0.000 description 2
- 238000012986 modification Methods 0.000 description 2
- 230000004048 modification Effects 0.000 description 2
- 239000007787 solid Substances 0.000 description 2
- 238000010276 construction Methods 0.000 description 1
- 238000009795 derivation Methods 0.000 description 1
- 238000013461 design Methods 0.000 description 1
- 238000000605 extraction Methods 0.000 description 1
- 230000014509 gene expression Effects 0.000 description 1
- 239000004973 liquid crystal related substance Substances 0.000 description 1
- 230000004807 localization Effects 0.000 description 1
- 238000004519 manufacturing process Methods 0.000 description 1
- 238000013507 mapping Methods 0.000 description 1
- 238000012544 monitoring process Methods 0.000 description 1
- 238000010606 normalization Methods 0.000 description 1
- 238000005457 optimization Methods 0.000 description 1
- 230000008447 perception Effects 0.000 description 1
- 239000004065 semiconductor Substances 0.000 description 1
Images
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/20—Analysis of motion
- G06T7/269—Analysis of motion using gradient-based methods
-
- G—PHYSICS
- G05—CONTROLLING; REGULATING
- G05D—SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
- G05D1/00—Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
- G05D1/02—Control of position or course in two dimensions
- G05D1/021—Control of position or course in two dimensions specially adapted to land vehicles
- G05D1/0231—Control of position or course in two dimensions specially adapted to land vehicles using optical position detecting means
- G05D1/0246—Control of position or course in two dimensions specially adapted to land vehicles using optical position detecting means using a video camera in combination with image processing means
- G05D1/0251—Control of position or course in two dimensions specially adapted to land vehicles using optical position detecting means using a video camera in combination with image processing means extracting 3D information from a plurality of images taken from different locations, e.g. stereo vision
-
- B—PERFORMING OPERATIONS; TRANSPORTING
- B60—VEHICLES IN GENERAL
- B60W—CONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
- B60W30/00—Purposes of road vehicle drive control systems not related to the control of a particular sub-unit, e.g. of systems using conjoint control of vehicle sub-units
- B60W30/08—Active safety systems predicting or avoiding probable or impending collision or attempting to minimise its consequences
- B60W30/09—Taking automatic action to avoid collision, e.g. braking and steering
-
- B—PERFORMING OPERATIONS; TRANSPORTING
- B60—VEHICLES IN GENERAL
- B60W—CONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
- B60W30/00—Purposes of road vehicle drive control systems not related to the control of a particular sub-unit, e.g. of systems using conjoint control of vehicle sub-units
- B60W30/08—Active safety systems predicting or avoiding probable or impending collision or attempting to minimise its consequences
- B60W30/095—Predicting travel path or likelihood of collision
- B60W30/0956—Predicting travel path or likelihood of collision the prediction being responsive to traffic or environmental parameters
-
- G—PHYSICS
- G05—CONTROLLING; REGULATING
- G05D—SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
- G05D1/00—Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
- G05D1/02—Control of position or course in two dimensions
-
- G—PHYSICS
- G05—CONTROLLING; REGULATING
- G05D—SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
- G05D1/00—Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
- G05D1/02—Control of position or course in two dimensions
- G05D1/021—Control of position or course in two dimensions specially adapted to land vehicles
- G05D1/0212—Control of position or course in two dimensions specially adapted to land vehicles with means for defining a desired trajectory
- G05D1/0214—Control of position or course in two dimensions specially adapted to land vehicles with means for defining a desired trajectory in accordance with safety or protection criteria, e.g. avoiding hazardous areas
-
- G—PHYSICS
- G05—CONTROLLING; REGULATING
- G05D—SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
- G05D1/00—Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
- G05D1/02—Control of position or course in two dimensions
- G05D1/021—Control of position or course in two dimensions specially adapted to land vehicles
- G05D1/0212—Control of position or course in two dimensions specially adapted to land vehicles with means for defining a desired trajectory
- G05D1/0221—Control of position or course in two dimensions specially adapted to land vehicles with means for defining a desired trajectory involving a learning process
-
- G—PHYSICS
- G05—CONTROLLING; REGULATING
- G05D—SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
- G05D1/00—Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
- G05D1/02—Control of position or course in two dimensions
- G05D1/021—Control of position or course in two dimensions specially adapted to land vehicles
- G05D1/0212—Control of position or course in two dimensions specially adapted to land vehicles with means for defining a desired trajectory
- G05D1/0223—Control of position or course in two dimensions specially adapted to land vehicles with means for defining a desired trajectory involving speed control of the vehicle
-
- G—PHYSICS
- G05—CONTROLLING; REGULATING
- G05D—SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
- G05D1/00—Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
- G05D1/02—Control of position or course in two dimensions
- G05D1/021—Control of position or course in two dimensions specially adapted to land vehicles
- G05D1/0276—Control of position or course in two dimensions specially adapted to land vehicles using signals provided by a source external to the vehicle
-
- G—PHYSICS
- G05—CONTROLLING; REGULATING
- G05D—SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
- G05D1/00—Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
- G05D1/02—Control of position or course in two dimensions
- G05D1/021—Control of position or course in two dimensions specially adapted to land vehicles
- G05D1/0276—Control of position or course in two dimensions specially adapted to land vehicles using signals provided by a source external to the vehicle
- G05D1/0278—Control of position or course in two dimensions specially adapted to land vehicles using signals provided by a source external to the vehicle using satellite positioning signals, e.g. GPS
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T17/00—Three dimensional [3D] modelling, e.g. data description of 3D objects
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T3/00—Geometric image transformations in the plane of the image
- G06T3/18—Image warping, e.g. rearranging pixels individually
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/20—Analysis of motion
- G06T7/215—Motion-based segmentation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/20—Analysis of motion
- G06T7/246—Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
- G06T7/248—Analysis of motion using feature-based methods, e.g. the tracking of corners or segments involving reference images or patches
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/20—Analysis of motion
- G06T7/285—Analysis of motion using a sequence of stereo image pairs
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/50—Depth or shape recovery
- G06T7/55—Depth or shape recovery from multiple images
- G06T7/579—Depth or shape recovery from multiple images from motion
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/50—Depth or shape recovery
- G06T7/55—Depth or shape recovery from multiple images
- G06T7/593—Depth or shape recovery from multiple images from stereo images
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/44—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
- G06V10/443—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering
- G06V10/449—Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters
- G06V10/451—Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters with interaction between the filter responses, e.g. cortical complex cells
- G06V10/454—Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/50—Context or environment of the image
- G06V20/56—Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/50—Context or environment of the image
- G06V20/56—Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
- G06V20/58—Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/20—Movements or behaviour, e.g. gesture recognition
-
- B—PERFORMING OPERATIONS; TRANSPORTING
- B60—VEHICLES IN GENERAL
- B60W—CONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
- B60W2420/00—Indexing codes relating to the type of sensors based on the principle of their operation
- B60W2420/40—Photo, light or radio wave sensitive means, e.g. infrared sensors
- B60W2420/403—Image sensing, e.g. optical camera
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2200/00—Indexing scheme for image data processing or generation, in general
- G06T2200/08—Indexing scheme for image data processing or generation, in general involving all processing steps from image acquisition to 3D model generation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10016—Video; Image sequence
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10016—Video; Image sequence
- G06T2207/10021—Stereoscopic video; Stereoscopic image sequence
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10024—Color image
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10028—Range image; Depth image; 3D point clouds
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20072—Graph-based image processing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20228—Disparity calculation for image-based rendering
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/30—Subject of image; Context of image processing
- G06T2207/30248—Vehicle exterior or interior
- G06T2207/30252—Vehicle exterior; Vicinity of vehicle
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/30—Subject of image; Context of image processing
- G06T2207/30248—Vehicle exterior or interior
- G06T2207/30252—Vehicle exterior; Vicinity of vehicle
- G06T2207/30261—Obstacle
Definitions
- the present disclosure relates to computer vision technology, and in particular to a moving object detection method, a moving object detection device, an intelligent driving control method, an intelligent driving control device, electronic equipment, a computer-readable storage medium, and a computer program.
- The perceived moving objects and their moving directions can be provided to the decision-making layer, so that the decision-making layer can make decisions based on the perception results.
- For example, the decision-making layer can control the vehicle to slow down or even stop, so as to ensure the safe driving of the vehicle.
- the embodiments of the present disclosure provide a moving object detection technical solution.
- A moving object detection method is provided, including: acquiring depth information of pixels in an image to be processed; acquiring optical flow information between the image to be processed and a reference image, where the reference image and the image to be processed are two images with a time-series relationship obtained by continuous shooting with a camera device; obtaining, according to the depth information and the optical flow information, a three-dimensional motion field of the pixels in the image to be processed relative to the reference image; and determining the moving object in the image to be processed according to the three-dimensional motion field.
- An intelligent driving control method is provided, including: acquiring a video stream of the road where a vehicle is located through a camera device provided on the vehicle; performing moving object detection on at least one video frame included in the video stream by using the above-mentioned moving object detection method, to determine the moving object in the video frame; and generating and outputting a vehicle control instruction according to the moving object.
- A moving object detection device is provided, including: a first acquisition module, configured to acquire depth information of pixels in an image to be processed; a second acquisition module, configured to acquire optical flow information between the image to be processed and a reference image, where the reference image and the image to be processed are two images with a time-series relationship obtained by continuous shooting with a camera device; a third acquisition module, configured to obtain, according to the depth information and the optical flow information, a three-dimensional motion field of the pixels in the image to be processed relative to the reference image; and a moving object determining module, configured to determine the moving object in the image to be processed according to the three-dimensional motion field.
- An intelligent driving control device is provided, including: a fourth acquisition module, configured to acquire a video stream of the road where a vehicle is located through a camera device provided on the vehicle; the above-mentioned moving object detection device, configured to perform moving object detection on at least one video frame included in the video stream and determine the moving object in the video frame; and a control module, configured to generate and output a control instruction for the vehicle according to the moving object.
- An electronic device is provided, including: a processor, a memory, a communication interface, and a communication bus, where the processor, the memory, and the communication interface communicate with each other via the communication bus; the memory is used to store at least one executable instruction, and the executable instruction causes the processor to execute the above method.
- a computer-readable storage medium having a computer program stored thereon, and when the computer program is executed by a processor, it implements any method embodiment of the present disclosure.
- a computer program including computer instructions, which, when the computer instructions run in the processor of the device, implement any method embodiment of the present disclosure.
- The present disclosure uses the depth information of pixels in the image to be processed and the optical flow information between the image to be processed and the reference image to obtain the three-dimensional motion field of the pixels in the image to be processed relative to the reference image. Since the three-dimensional motion field can reflect a moving object, the present disclosure can use the three-dimensional motion field to determine the moving object in the image to be processed. It can be seen that the technical solution provided by the present disclosure is beneficial to improving the accuracy of sensing moving objects, thereby helping to improve the safety of intelligent driving of a vehicle.
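- The following sketch (not part of the original disclosure) illustrates this overall idea in Python: depth and optical flow are combined into a per-pixel three-dimensional motion field, from which a motion mask is derived. The intrinsic matrix K, the motion_threshold, and the reuse of a single depth map for both images are simplifying assumptions made purely for illustration.

```python
import numpy as np

def detect_moving_objects(depth, flow, K, motion_threshold=0.5):
    """Illustrative pipeline: depth + optical flow -> 3D motion field -> motion mask.

    depth: HxW depth map of the image to be processed (e.g., in meters)
    flow:  HxWx2 optical flow between the image to be processed and the reference image (pixels)
    K:     3x3 camera intrinsic matrix (assumed known)
    """
    h, w = depth.shape
    fx, fy, cx, cy = K[0, 0], K[1, 1], K[0, 2], K[1, 2]

    # Back-project every pixel of the image to be processed into 3D camera coordinates.
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    P_cur = np.stack([(u - cx) * depth / fx, (v - cy) * depth / fy, depth], axis=-1)

    # Approximate the 3D position of the same scene point in the reference image by
    # shifting each pixel by the optical flow; reusing the same depth map here is a
    # simplification made only for this sketch.
    u_ref, v_ref = u - flow[..., 0], v - flow[..., 1]
    P_ref = np.stack([(u_ref - cx) * depth / fx, (v_ref - cy) * depth / fy, depth], axis=-1)

    # The per-pixel 3D displacement approximates the three-dimensional motion field;
    # thresholding its magnitude yields a coarse moving-object mask.
    motion_field = P_cur - P_ref
    motion_mask = np.linalg.norm(motion_field, axis=-1) > motion_threshold
    return motion_field, motion_mask
```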
- FIG. 1 is a flowchart of an embodiment of the moving object detection method of the present disclosure
- Figure 2 is a schematic diagram of an image to be processed in the present disclosure
- FIG. 3 is a schematic diagram of an embodiment of the first disparity map of the image to be processed shown in FIG. 2;
- FIG. 4 is a schematic diagram of an embodiment of the first disparity map of the image to be processed in the present disclosure
- FIG. 5 is a schematic diagram of an embodiment of the convolutional neural network of the present disclosure.
- FIG. 6 is a schematic diagram of an embodiment of the first weight distribution diagram of the first disparity map of the present disclosure
- FIG. 7 is a schematic diagram of another embodiment of the first weight distribution diagram of the first disparity map of the present disclosure.
- FIG. 8 is a schematic diagram of an embodiment of the second weight distribution diagram of the first disparity map of the present disclosure.
- FIG. 9 is a schematic diagram of an embodiment of the third disparity map of the present disclosure.
- FIG. 10 is a schematic diagram of an implementation manner of the second weight distribution diagram of the third disparity map shown in FIG. 9;
- FIG. 11 is a schematic diagram of an embodiment of the present disclosure to optimize and adjust the first disparity map of the image to be processed
- FIG. 12 is a schematic diagram of an embodiment of the three-dimensional coordinate system of the present disclosure.
- FIG. 13 is a schematic diagram of an embodiment of the reference image and the image after Warp processing in the present disclosure
- FIG. 14 is a schematic diagram of an embodiment of the Warp processed image, the image to be processed, and the optical flow diagram of the image to be processed relative to the reference image of the present disclosure
- FIG. 15 is a schematic diagram of an embodiment of the image to be processed and its motion mask of the present disclosure
- FIG. 16 is a schematic diagram of an embodiment of a moving object detection frame formed by the present disclosure.
- FIG. 17 is a flowchart of an embodiment of the convolutional neural network training method of the present disclosure.
- FIG. 19 is a schematic structural diagram of an embodiment of the moving object detection device of the present disclosure.
- FIG. 20 is a schematic structural diagram of an embodiment of the intelligent driving control device of the present disclosure.
- Fig. 21 is a block diagram of an exemplary device for implementing the embodiments of the present disclosure.
- the embodiments of the present disclosure can be applied to electronic devices such as terminal devices, computer systems, and servers, which can operate with many other general or special computing system environments or configurations.
- Examples of well-known terminal devices, computing systems, environments, and/or configurations suitable for use with electronic devices such as terminal devices, computer systems, and servers include, but are not limited to: personal computer systems, server computer systems, thin clients, thick clients, handheld or laptop devices, microprocessor-based systems, set-top boxes, programmable consumer electronics, network personal computers, minicomputer systems, mainframe computer systems, and distributed cloud computing environments including any of the above systems.
- Electronic devices such as terminal devices, computer systems, and servers can be described in the general context of computer system executable instructions (such as program modules) executed by the computer system.
- Program modules may include routines, programs, object programs, components, logic, data structures, and the like, which perform specific tasks or implement specific abstract data types.
- the computer system/server can be implemented in a distributed cloud computing environment. In the distributed cloud computing environment, tasks are executed by remote processing equipment linked through a communication network.
- program modules may be located on a storage medium of a local or remote computing system including a storage device.
- FIG. 1 is a flowchart of an embodiment of the moving object detection method of the present disclosure. As shown in FIG. 1, the method of this embodiment includes: step S100, step S110, step S120, and step S130. The steps are described in detail below.
- the present disclosure may use the disparity map of the image to be processed to obtain the depth information of the pixels (such as all pixels) in the image to be processed. That is, first, the disparity map of the image to be processed is acquired, and then, according to the disparity map of the image to be processed, the depth information of the pixels in the image to be processed is acquired.
- the disparity map of the image to be processed is referred to as the first disparity map of the image to be processed in the following.
- the first disparity map in the present disclosure is used to describe the disparity of the image to be processed. Parallax can be considered as the difference in the position of the target object when observing the same target object from two points at a certain distance.
- An example of the image to be processed is shown in Figure 2.
- An example of the first disparity map of the image to be processed shown in FIG. 2 is shown in FIG. 3.
- the first disparity map of the image to be processed in the present disclosure may also be expressed in the form shown in FIG. 4.
- Each number in FIG. 4 (such as 0, 1, 2, 3, 4, 5, etc.) respectively represents: the disparity of the pixel at the position (x, y) in the image to be processed. It should be particularly noted that FIG. 4 does not show a complete first disparity map.
- the image to be processed in the present disclosure is usually a monocular image. That is, the image to be processed is usually an image obtained by shooting with a monocular camera device.
- the present disclosure can realize moving object detection without the need to provide a binocular camera device, thereby helping to reduce the cost of moving object detection.
- the present disclosure may use a convolutional neural network successfully trained in advance to obtain the first disparity map of the image to be processed.
- The image to be processed is input into the convolutional neural network, the convolutional neural network performs disparity analysis processing on the image to be processed and outputs a disparity analysis processing result, and the present disclosure can obtain the first disparity map of the image to be processed based on this result.
- By using the convolutional neural network to obtain the first disparity map of the image to be processed, the disparity map can be obtained without performing pixel-by-pixel disparity calculation on two images and without camera calibration, which helps to improve the convenience and real-time performance of obtaining the disparity map.
- the convolutional neural network in the present disclosure generally includes but is not limited to: multiple convolutional layers (Conv) and multiple deconvolutional layers (Deconv).
- the convolutional neural network of the present disclosure can be divided into two parts, namely an encoding part and a decoding part.
- The image to be processed that is input into the convolutional neural network (such as the image to be processed shown in FIG. 2) is encoded by the encoding part (i.e., feature extraction processing), the encoding processing result of the encoding part is provided to the decoding part, and the decoding part decodes the encoding processing result and outputs a decoding processing result.
- The present disclosure can obtain the first disparity map of the image to be processed (such as the disparity map shown in FIG. 3) based on the decoding processing result.
- the coding part in the convolutional neural network includes but is not limited to: multiple convolutional layers, and multiple convolutional layers are connected in series.
- the decoding part in the convolutional neural network includes, but is not limited to: multiple convolutional layers and multiple deconvolutional layers, and multiple convolutional layers and multiple deconvolutional layers are arranged at intervals and connected in series.
- An example of the convolutional neural network of the present disclosure is shown in FIG. 5.
- the first rectangle on the left represents the image to be processed that is input into the convolutional neural network
- the first rectangle on the right represents the disparity map output by the convolutional neural network.
- Each rectangle from the second rectangle to the 15th rectangle on the left represents a convolutional layer
- all the rectangles from the 16th rectangle on the left to the second rectangle on the right represent deconvolution layers and convolution layers arranged alternately; for example:
- the 16th rectangle on the left represents the deconvolution layer
- the 17th rectangle on the left represents the convolution layer
- the 18th rectangle on the left represents the deconvolution layer
- the 19th rectangle on the left represents the convolution layer.
- the convolutional neural network of the present disclosure may merge the low-level information and high-level information in the convolutional neural network by means of skip connection.
- the output of at least one convolutional layer in the encoding part is provided to at least one deconvolutional layer in the decoding part through a jump connection.
- The input of each layer in the convolutional neural network usually includes the output of the previous layer (such as a convolutional layer or a deconvolutional layer), while the input of at least one deconvolutional layer (such as some or all of the deconvolutional layers) includes: the upsampling (Upsample) result of the output of the previous convolutional layer, and the output of the convolutional layer of the encoding part that is skip-connected to this deconvolutional layer.
- In FIG. 5, the solid arrows drawn from below the convolutional layers on the right side represent the outputs of the previous convolutional layers;
- the dotted arrows in FIG. 5 represent the upsampling results provided to the deconvolutional layers;
- the solid arrows drawn from above the convolutional layers on the left side of FIG. 5 represent the outputs of the convolutional layers that are skip-connected to the deconvolutional layers.
- the present disclosure does not limit the number of jump connections and the network structure of the convolutional neural network.
- the present disclosure helps to improve the accuracy of the disparity map generated by the convolutional neural network by fusing the low-level information and the high-level information in the convolutional neural network.
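- As an illustration only, a minimal encoder-decoder with skip connections in the spirit of FIG. 5 could be sketched in PyTorch as follows; the layer counts, channel widths, and activation choices are assumptions and do not reproduce the network of the disclosure.

```python
import torch
import torch.nn as nn

class DisparityNet(nn.Module):
    """Minimal encoder-decoder with skip connections, loosely following FIG. 5."""
    def __init__(self):
        super().__init__()
        # Encoding part: stacked convolutions that downsample the input image.
        self.enc1 = nn.Sequential(nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(inplace=True))
        self.enc2 = nn.Sequential(nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(inplace=True))
        self.enc3 = nn.Sequential(nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(inplace=True))
        # Decoding part: deconvolution (transposed convolution) and convolution layers
        # arranged alternately; skip connections bring in encoder features.
        self.dec3 = nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1)
        self.conv3 = nn.Conv2d(64 + 64, 64, 3, padding=1)
        self.dec2 = nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1)
        self.conv2 = nn.Conv2d(32 + 32, 32, 3, padding=1)
        self.dec1 = nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1)
        self.out = nn.Conv2d(16, 1, 3, padding=1)  # single-channel disparity map

    def forward(self, x):
        e1 = self.enc1(x)                      # low-level features
        e2 = self.enc2(e1)
        e3 = self.enc3(e2)                     # high-level features
        d3 = torch.relu(self.conv3(torch.cat([self.dec3(e3), e2], dim=1)))  # skip connection
        d2 = torch.relu(self.conv2(torch.cat([self.dec2(d3), e1], dim=1)))  # skip connection
        return torch.relu(self.out(self.dec1(d2)))  # non-negative disparity
```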
- The convolutional neural network of the present disclosure is obtained by training with binocular image samples. For the training process of the convolutional neural network, refer to the description in the following embodiments, which will not be described in detail here.
- the present disclosure may also optimize and adjust the first disparity map of the image to be processed obtained by using the convolutional neural network, so as to obtain a more accurate first disparity map.
- the present disclosure may use the disparity map of the horizontal mirror image (for example, the left mirror image or the right mirror image) of the image to be processed to optimize and adjust the first disparity map of the image to be processed.
- the horizontal mirror image of the image to be processed is referred to as the first horizontal mirror image
- the disparity image of the first horizontal mirror image is referred to as the second disparity image.
- a specific example of optimizing and adjusting the first disparity map in the present disclosure is as follows:
- Step A: Obtain a horizontal mirror image of the second disparity map.
- the first horizontal mirror image in the present disclosure is intended to indicate that the mirror image is a mirror image formed by performing a horizontal mirror image processing (not a vertical mirror image processing) on the image to be processed.
- the horizontal mirror image of the second disparity map is referred to as the second horizontal mirror image below.
- the second horizontal mirror image in the present disclosure refers to a mirror image formed after the second disparity image is mirrored in the horizontal direction. The second horizontal mirror image is still a disparity image.
- The present disclosure may first perform left mirror processing or right mirror processing on the image to be processed (because the result of left mirror processing is the same as the result of right mirror processing, either may be performed) to obtain the first horizontal mirror image; then obtain the disparity map of the first horizontal mirror image (i.e., the second disparity map); and finally perform left mirror processing or right mirror processing on the second disparity map (again, because the left mirror processing result of the second disparity map is the same as its right mirror processing result, either may be performed) to obtain the second horizontal mirror image.
- the second horizontal mirror image is referred to as the third disparity image below.
- When performing horizontal mirroring processing on the image to be processed, the present disclosure does not need to consider whether the image to be processed is mirrored as a left-eye image or as a right-eye image; that is, regardless of whether the image to be processed is used as the left-eye image or the right-eye image, the present disclosure can perform left mirror processing or right mirror processing on it to obtain the first horizontal mirror image.
- Similarly, when performing horizontal mirroring processing on the second disparity map, it is not necessary to consider whether the second disparity map should be left mirrored or right mirrored.
- For the convolutional neural network used to generate the first disparity map of the image to be processed, if the left-eye image samples in the binocular image samples are used as input during training, then in testing and practical applications the successfully trained convolutional neural network treats the input image to be processed as a left-eye image; that is, the image to be processed in the present disclosure is regarded as the left-eye image to be processed.
- Conversely, if the right-eye image samples are used as input during training, then in testing and practical applications the successfully trained convolutional neural network treats the input image to be processed as a right-eye image; that is, the image to be processed in the present disclosure is regarded as the right-eye image to be processed.
- The present disclosure may also use the aforementioned convolutional neural network to obtain the second disparity map: the first horizontal mirror image is input into the convolutional neural network, the convolutional neural network performs disparity analysis processing on the first horizontal mirror image and outputs a disparity analysis processing result, and the present disclosure can obtain the second disparity map based on the output disparity analysis processing result.
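- A minimal sketch of Step A, assuming a callable predict_disparity that stands in for the convolutional neural network described above:

```python
import numpy as np

def compute_third_disparity(image, predict_disparity):
    """Sketch of Step A: mirror the image, predict disparity, mirror the result back.

    predict_disparity is an assumed callable that maps an HxWx3 image to an HxW
    disparity map (e.g., the convolutional neural network described above).
    """
    first_disparity = predict_disparity(image)            # first disparity map
    first_mirror = np.flip(image, axis=1)                  # first horizontal mirror image
    second_disparity = predict_disparity(first_mirror)     # second disparity map
    third_disparity = np.flip(second_disparity, axis=1)    # second horizontal mirror image, i.e. the third disparity map
    return first_disparity, third_disparity
```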
- Step B: Obtain the weight distribution map of the disparity map of the image to be processed (i.e., the first disparity map) and the weight distribution map of the second horizontal mirror image (i.e., the third disparity map).
- the weight distribution map of the first disparity map is used to describe the respective weight values of multiple disparity values (for example, all disparity values) in the first disparity map.
- the weight distribution map of the first disparity map may include, but is not limited to: a first weight distribution map of the first disparity map and a second weight distribution map of the first disparity map.
- The first weight distribution map of the first disparity map is a weight distribution map set uniformly for the first disparity maps of a plurality of different images to be processed; that is, it is oriented to the first disparity maps of multiple different images to be processed, and the first disparity maps of different images to be processed use the same first weight distribution map. Therefore, the present disclosure may refer to the first weight distribution map of the first disparity map as the global weight distribution map of the first disparity map.
- the global weight distribution map of the first disparity map is used to describe the global weight values corresponding to multiple disparity values (such as all disparity values) in the first disparity map.
- The second weight distribution map of the first disparity map is a weight distribution map set for the first disparity map of a single image to be processed; that is, it is oriented to the first disparity map of a single image to be processed, and the first disparity maps of different images to be processed use different second weight distribution maps. Therefore, the present disclosure may refer to the second weight distribution map of the first disparity map as the local weight distribution map of the first disparity map.
- the local weight distribution map of the first disparity map is used to describe the respective local weight values of multiple disparity values (such as all disparity values) in the first disparity map.
- the weight distribution map of the third disparity map is used to describe the respective weight values of the multiple disparity values in the third disparity map.
- the weight distribution map of the third disparity map may include, but is not limited to: the first weight distribution map of the third disparity map and the second weight distribution map of the third disparity map.
- The first weight distribution map of the third disparity map is a weight distribution map set uniformly for the third disparity maps of multiple different images to be processed; that is, it is oriented to the third disparity maps of multiple different images to be processed, and the third disparity maps of different images to be processed use the same first weight distribution map. Therefore, the present disclosure may refer to the first weight distribution map of the third disparity map as the global weight distribution map of the third disparity map.
- the global weight distribution map of the third disparity map is used to describe the respective global weight values of multiple disparity values (such as all disparity values) in the third disparity map.
- The second weight distribution map of the third disparity map is a weight distribution map set for the third disparity map of a single image to be processed; that is, it is oriented to the third disparity map of a single image to be processed, and the third disparity maps of different images to be processed use different second weight distribution maps. Therefore, the present disclosure may refer to the second weight distribution map of the third disparity map as the local weight distribution map of the third disparity map.
- the local weight distribution map of the third disparity map is used to describe the respective local weight values of multiple disparity values (such as all disparity values) in the third disparity map.
- the first weight distribution map of the first disparity map includes: at least two left and right separated regions, and different regions have different weight values.
- the magnitude relationship between the weight value of the area on the left and the weight value of the area on the right is usually related to whether the image to be processed is used as the left-eye image to be processed or the right-eye image to be processed.
- FIG. 6 is a first weight distribution diagram of the disparity map shown in FIG. 3, and the first weight distribution diagram is divided into five regions, namely, region 1, region 2, region 3, region 4, and region 5 shown in FIG. 6 .
- the weight value of area 1 is less than the weight value of area 2
- the weight value of area 2 is less than the weight value of area 3
- the weight value of area 3 is less than the weight value of area 4
- the weight value of area 4 is less than the weight value of area 5.
- any region in the first weight distribution map of the first disparity map may have the same weight value, or may have different weight values.
- the weight value on the left side of the region is usually not greater than the weight value on the right side of the region.
- For example, the weight value of area 1 may be 0, that is, the disparity corresponding to area 1 in the first disparity map is completely unreliable; the weight value of area 2 may gradually increase from 0 and approach 0.5 from left to right; the weight value of area 3 is 0.5; the weight value of area 4 may gradually increase from a value greater than 0.5 and approach 1 from left to right; and the weight value of area 5 is 1, that is, in the first disparity map, the disparity corresponding to area 5 is completely credible.
- FIG. 7 shows the first weight distribution map of the first disparity map when the image to be processed is used as the right-eye image.
- This first weight distribution map is also divided into five regions, namely region 1, region 2, region 3, region 4, and region 5.
- the weight value of area 5 is less than the weight value of area 4
- the weight value of area 4 is less than the weight value of area 3
- the weight value of area 3 is less than the weight value of area 2
- the weight value of area 2 is less than the weight value of area 1.
- any region in the first weight distribution map of the first disparity map may have the same weight value, or may have different weight values.
- the weight value on the right side in the region is usually not greater than the weight value on the left side in the region.
- For example, the weight value of area 5 in FIG. 7 can be 0, that is, in the first disparity map, the disparity corresponding to area 5 is completely unreliable; the weight value of area 4 can gradually increase from 0 and approach 0.5 from right to left; the weight value of area 3 is 0.5; the weight value of area 2 can gradually increase from a value greater than 0.5 and approach 1 from right to left; and the weight value of area 1 is 1, that is, in the first disparity map, the disparity corresponding to area 1 is completely credible.
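- For illustration, a global (first) weight distribution map with the five-region structure described above could be generated as follows; the equal-width region boundaries and the exact ramps are assumptions, since the disclosure only constrains the ordering of the weights.

```python
import numpy as np

def global_weight_map(width, height, left_eye=True):
    """Illustrative global (first) weight distribution map with five left-right regions.

    The region boundaries and ramp shapes below are assumptions for illustration only.
    """
    b = [0, width // 5, 2 * width // 5, 3 * width // 5, 4 * width // 5, width]  # assumed boundaries
    row = np.empty(width, dtype=np.float32)
    row[b[0]:b[1]] = 0.0                                  # region 1: completely unreliable
    row[b[1]:b[2]] = np.linspace(0.0, 0.5, b[2] - b[1])   # region 2: ramp up toward 0.5
    row[b[2]:b[3]] = 0.5                                  # region 3: constant 0.5
    row[b[3]:b[4]] = np.linspace(0.5, 1.0, b[4] - b[3])   # region 4: ramp up toward 1
    row[b[4]:b[5]] = 1.0                                  # region 5: completely credible
    if not left_eye:
        row = row[::-1].copy()  # mirror the ordering when the image is used as a right-eye image
    return np.tile(row, (height, 1))
```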
- the first weight distribution map of the third disparity map includes: at least two left and right separated regions, and different regions have different weight values.
- the relationship between the weight value of the area on the left and the weight value of the area on the right is usually related to whether the image to be processed is used as the left-eye image or the right-eye image.
- In one case, the weight value of the area on the right is greater than the weight value of the area on the left; any region in the first weight distribution map of the third disparity map may have the same weight value throughout, or may have different weight values, and in the latter case the weight value on the left side of the region is usually not greater than the weight value on the right side of the region.
- In the other case, the weight value of the area on the left is greater than the weight value of the area on the right; any region in the first weight distribution map of the third disparity map may have the same weight value throughout, or may have different weight values, and in the latter case the weight value on the right side of the region is usually not greater than the weight value on the left side of the region.
- the manner of setting the second weight distribution map of the first disparity map may include the following steps:
- First, horizontal mirror processing (for example, left mirror processing or right mirror processing) is performed on the first disparity map to obtain a mirror disparity map, which is referred to below as the fourth disparity map.
- Then, for any pixel in the fourth disparity map, if the disparity value of the pixel in the fourth disparity map is greater than the first variable corresponding to the pixel, the weight value of the pixel in the second weight distribution map of the first disparity map of the image to be processed is set to the first value; otherwise, the weight value of the pixel is set to the second value.
- the first value in this disclosure is greater than the second value. For example, the first value is 1 and the second value is 0.
- An example of the second weight distribution map of the first disparity map is shown in FIG. 8.
- the weight values of the white areas in FIG. 8 are all 1, which indicates that the disparity value at this position is completely reliable.
- the weight value of the black area in FIG. 8 is 0, which means that the disparity value at this position is completely unreliable.
- the first variable corresponding to the pixel in the present disclosure may be set according to the disparity value of the corresponding pixel in the first disparity map and a constant value greater than zero.
- the product of the disparity value of the corresponding pixel in the first disparity map and a constant value greater than zero is used as the first variable corresponding to the corresponding pixel in the fourth disparity map.
- the second weight distribution map of the first disparity map may be expressed by the following formula (1):
- where L_l represents the second weight distribution map of the first disparity map, Re represents the disparity value of the corresponding pixel in the fourth disparity map, and d_l represents the disparity value of the corresponding pixel in the first disparity map.
- The second weight distribution map of the third disparity map may be set as follows: for any pixel in the first disparity map, if the disparity value of the pixel in the first disparity map is greater than the second variable corresponding to the pixel, the weight value of the pixel in the second weight distribution map of the third disparity map is set to the first value; otherwise, it is set to the second value.
- the first value in the present disclosure is greater than the second value. For example, the first value is 1 and the second value is 0.
- the second variable corresponding to the pixel in the present disclosure may be set according to the disparity value of the corresponding pixel in the fourth disparity map and a constant value greater than zero.
- the first disparity map is left/right mirrored to form a mirror disparity map, that is, the fourth disparity map, and then the product of the disparity value of the corresponding pixel in the fourth disparity map and a constant value greater than zero , As the second variable corresponding to the corresponding pixel in the first disparity map.
- Based on the image to be processed in FIG. 2, an example of the third disparity map formed by the present disclosure is shown in FIG. 9.
- An example of the second weight distribution map of the third disparity map shown in FIG. 9 is shown in FIG. 10.
- the weight values of the white areas in FIG. 10 are all 1, which means that the disparity value at this position is completely reliable.
- the weight value of the black area in FIG. 10 is 0, which means that the disparity value at this position is completely unreliable.
- the second weight distribution map of the third disparity map may be expressed by the following formula (2):
- where L_l′ represents the second weight distribution map of the third disparity map, Re represents the disparity value of the corresponding pixel in the fourth disparity map, and d_l represents the disparity value of the corresponding pixel in the first disparity map.
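- Since formulas (1) and (2) are not reproduced above, the following sketch reflects only one possible reading of the textual description; the constant c and the direction of the comparisons are assumptions.

```python
import numpy as np

def local_weight_maps(first_disparity, third_disparity, c=1.0):
    """One possible reading of the local (second) weight distribution maps described above.

    c stands for the unspecified constant value greater than zero; the exact constant and
    the comparisons used by formulas (1) and (2) are assumptions made for illustration.
    """
    # Fourth disparity map: the first disparity map mirrored in the horizontal direction.
    fourth_disparity = np.flip(first_disparity, axis=1)

    # Second weight distribution map of the first disparity map (formula (1) analogue):
    # a pixel is credible where the mirrored disparity exceeds the first variable.
    L_first = (fourth_disparity > c * first_disparity).astype(np.float32)

    # Second weight distribution map of the third disparity map (formula (2) analogue):
    # a pixel is credible where the first disparity exceeds the second variable.
    L_third = (first_disparity > c * fourth_disparity).astype(np.float32)
    return L_first, L_third
```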
- Step C: According to the weight distribution map of the first disparity map of the image to be processed and the weight distribution map of the third disparity map, the first disparity map of the image to be processed is optimized and adjusted, and the optimized and adjusted disparity map is used as the finally obtained disparity map of the image to be processed.
- The present disclosure may use the first weight distribution map and the second weight distribution map of the first disparity map to adjust multiple disparity values in the first disparity map to obtain the adjusted first disparity map; similarly, the first weight distribution map and the second weight distribution map of the third disparity map may be used to adjust multiple disparity values in the third disparity map to obtain the adjusted third disparity map.
- an example of obtaining the optimized and adjusted first disparity map of the image to be processed is as follows:
- First, the first weight distribution map and the second weight distribution map of the first disparity map are merged to obtain a third weight distribution map, which can be expressed by the following formula (3):
- where W_l represents the third weight distribution map, M_l represents the first weight distribution map of the first disparity map, and L_l represents the second weight distribution map of the first disparity map.
- Similarly, the first weight distribution map and the second weight distribution map of the third disparity map are merged to obtain a fourth weight distribution map, which can be expressed by the following formula (4):
- where W_l′ represents the fourth weight distribution map, M_l′ represents the first weight distribution map of the third disparity map, and L_l′ represents the second weight distribution map of the third disparity map.
- Then, the multiple disparity values in the first disparity map are adjusted according to the third weight distribution map to obtain the adjusted first disparity map. For example, for the disparity value of any pixel in the first disparity map, the disparity value of the pixel is replaced with the product of the disparity value of the pixel and the weight value of the pixel at the corresponding position in the third weight distribution map. After performing the above-mentioned replacement processing on all pixels in the first disparity map, the adjusted first disparity map is obtained.
- the multiple disparity values in the third disparity map are adjusted according to the fourth weight distribution map to obtain the adjusted third disparity map.
- For example, for the disparity value of any pixel in the third disparity map, the disparity value of the pixel is replaced with the product of the disparity value of the pixel and the weight value of the pixel at the corresponding position in the fourth weight distribution map.
- the finally obtained disparity map of the image to be processed can be expressed by the following formula (5):
- where d_final represents the finally obtained disparity map of the image to be processed (as shown in the first image on the right in FIG. 11); W_l represents the third weight distribution map (shown in the first image in the upper left of FIG. 11); W_l′ represents the fourth weight distribution map (shown in the first image in the lower left of FIG. 11); d_l represents the first disparity map (shown in the second image in the upper left of FIG. 11); and d_l′ represents the third disparity map (shown in the second image in the lower left of FIG. 11).
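- Formula (5) itself is not reproduced above; one natural reading, used in the sketch below, is that the two weighted disparity maps are summed element-wise. This is an assumption made for illustration only.

```python
import numpy as np

def fuse_disparity_maps(first_disparity, third_disparity, W_first, W_third):
    """Sketch of the final fusion step, assuming an element-wise weighted sum."""
    adjusted_first = W_first * first_disparity   # per-pixel product with the third weight distribution map
    adjusted_third = W_third * third_disparity   # per-pixel product with the fourth weight distribution map
    return adjusted_first + adjusted_third       # finally obtained disparity map of the image to be processed
```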
- the present disclosure does not limit the execution order of the two steps of merging the first weight distribution map and the second weight distribution map.
- the two merging processing steps can be executed simultaneously or sequentially.
- the present disclosure does not limit the sequence of adjusting the disparity value in the first disparity map and adjusting the disparity value in the third disparity map.
- The two adjustment steps can be performed at the same time or successively.
- When the image to be processed is used as the left-eye image, there are usually phenomena such as missing parallax on the left side and the left edge of objects being occluded; these phenomena will cause the disparity values of the corresponding areas in the disparity map of the image to be processed to be inaccurate.
- When the image to be processed is used as the right-eye image, there are usually phenomena such as missing parallax on the right side and the right edge of objects being occluded; these phenomena will likewise cause the disparity values of the corresponding areas in the disparity map of the image to be processed to be inaccurate.
- The present disclosure performs left/right mirror processing on the image to be processed, performs mirror processing on the disparity map of the mirror image, and then uses the mirror-processed disparity map to optimize and adjust the disparity map of the image to be processed, which is beneficial to reducing the phenomenon that the disparity values of the corresponding areas in the disparity map of the image to be processed are inaccurate, thereby helping to improve the accuracy of moving object detection.
- The method for obtaining the first disparity map of the image to be processed in the present disclosure includes, but is not limited to, obtaining the first disparity map of the image to be processed by stereo matching, for example, using stereo matching algorithms such as the BM (Block Matching) algorithm, the SGBM (Semi-Global Block Matching) algorithm, or the GC (Graph Cuts) algorithm.
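- For illustration, a classical stereo matching alternative (SGBM) can be invoked via OpenCV as sketched below when a binocular image pair is available; the file names are placeholders.

```python
import cv2

# Illustrative use of semi-global block matching (SGBM) to obtain a disparity map
# from a binocular image pair; "left.png" and "right.png" are placeholder file names.
left = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE)
right = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)

matcher = cv2.StereoSGBM_create(minDisparity=0, numDisparities=64, blockSize=5)
disparity = matcher.compute(left, right).astype("float32") / 16.0  # SGBM outputs fixed-point values scaled by 16
```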
- a convolutional neural network used to obtain a disparity map of a binocular image is used to perform disparity processing on the image to be processed, thereby obtaining the first disparity map of the image to be processed.
- the present disclosure may use the following formula (6) to obtain the depth information of the pixels in the image to be processed: Depth = f_x · b / Disparity
- Depth represents the depth value of the pixel;
- f_x is a known value, which represents the focal length of the camera device in the horizontal direction (the X-axis direction in the three-dimensional coordinate system);
- b is a known value, which represents the baseline of the binocular image samples used by the convolutional neural network from which the disparity map is obtained; b belongs to the calibration parameters of the binocular camera device;
- Disparity represents the disparity of the pixel.
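- a minimal sketch of formula (6), assuming the disparity map is given as a numpy array and f_x and b come from the camera calibration (function and variable names are illustrative):

```python
import numpy as np

def disparity_to_depth(disparity, fx, baseline, eps=1e-6):
    """Formula (6): Depth = f_x * b / Disparity; eps guards against division by zero."""
    return fx * baseline / np.maximum(disparity, eps)
```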
- the to-be-processed image and the reference image in the present disclosure may be two images that are formed in a sequence relationship during continuous shooting (such as multiple continuous shooting or recording) of the same camera.
- the time interval for forming two images is usually short to ensure that the contents of the two images are mostly the same.
- the time interval for forming two images may be the time interval between two adjacent video frames.
- the time interval for forming two images may be the time interval between two adjacent photos in the continuous photographing mode of the camera device.
- the image to be processed may be a video frame (such as the current video frame) in a video captured by the camera device, and the reference image of the image to be processed is another video frame in the video, for example, the previous video frame of the current video frame.
- the image to be processed may be one of multiple photos taken by the camera device in continuous photography mode, and the reference image of the image to be processed may be another photo among the multiple photos, such as the previous or next photo of the image to be processed.
- the image to be processed and the reference image in the present disclosure may both be RGB (Red Green Blue) images or the like.
- the camera device in the present disclosure may be a camera device installed on a moving object, for example, a camera device installed on vehicles, trains, and airplanes.
- the reference image in the present disclosure is usually a monocular image. That is, the reference image is usually an image obtained by shooting with a monocular camera.
- the present disclosure can realize the detection of moving objects without having to set up a binocular camera device, thereby helping to reduce the cost of detecting moving objects.
- the optical flow information between the image to be processed and the reference image in the present disclosure can be considered as the two-dimensional motion field of the pixels in the image to be processed and the reference image.
- the optical flow information does not represent the real movement of the pixels in three-dimensional space.
- the present disclosure can introduce the pose change of the camera device between capturing the image to be processed and capturing the reference image; that is, the present disclosure obtains the optical flow information between the image to be processed and the reference image according to the pose change information of the camera device, which helps eliminate the interference caused by the position and posture change of the camera device from the obtained optical flow information.
- the method of obtaining the optical flow information between the image to be processed and the reference image according to the pose change information of the camera device of the present disclosure may include the following steps:
- Step 1 Obtain the pose change information of the camera device between capturing the image to be processed and capturing the reference image.
- the pose change information in the present disclosure refers to the difference between the pose of the camera when the image to be processed is captured and the pose when the reference image is captured.
- the pose change information is based on the three-dimensional space.
- the pose change information may include: translation information of the camera device and rotation information of the camera device.
- the translation information of the camera device may include: the displacement of the camera device on three coordinate axes (the coordinate system shown in FIG. 12).
- the rotation information of the camera device may be: a rotation vector based on Roll, Yaw, and Pitch.
- the rotation information of the camera device may include rotation component vectors based on the three rotation directions of Roll, Yaw, and Pitch.
- the rotation information of the camera device can be expressed as the following formula (7): R is a 3×3 rotation matrix whose entries R_11 through R_33 are products of sines and cosines of the Euler angles (φ, θ, ψ);
- the Euler angles (φ, θ, ψ) represent the rotation angles based on Roll, Yaw and Pitch.
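- purely for illustration, a rotation matrix can be composed from Roll/Yaw/Pitch angles as sketched below; the axis order used here is an assumption, since the disclosure only states that R is built from the three Euler angles:

```python
import numpy as np

def euler_to_rotation(roll, pitch, yaw):
    """Compose a 3x3 rotation matrix from Euler angles given in radians.
    The composition order R = Rz(yaw) @ Ry(pitch) @ Rx(roll) is an assumption."""
    cr, sr = np.cos(roll), np.sin(roll)
    cp, sp = np.cos(pitch), np.sin(pitch)
    cy, sy = np.cos(yaw), np.sin(yaw)
    Rx = np.array([[1, 0, 0], [0, cr, -sr], [0, sr, cr]])
    Ry = np.array([[cp, 0, sp], [0, 1, 0], [-sp, 0, cp]])
    Rz = np.array([[cy, -sy, 0], [sy, cy, 0], [0, 0, 1]])
    return Rz @ Ry @ Rx
```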
- the present disclosure may use vision technology to obtain the pose change information of the camera device between capturing the image to be processed and the reference image, for example, using SLAM (Simultaneous Localization And Mapping) to obtain the pose change information.
- the present disclosure may use the RGBD (Red Green Blue Depth) model of the open-source ORB (Oriented FAST and Rotated BRIEF, a feature descriptor)-SLAM framework to obtain the pose change information.
- the inputs to the model may include the image to be processed (an RGB image) and the depth maps of the image to be processed and the reference image.
- the present disclosure may also use other methods to obtain the pose change information, for example, using GPS (Global Positioning System) and angular velocity sensors to obtain the pose change information.
- the present disclosure may use a 4×4 homogeneous matrix as shown in the following formula (8) to represent the pose change information: T_l^c = [[R, t], [0 0 0, 1]]
- T_l^c represents the pose change information (i.e., the pose change matrix) of the camera device between capturing the image to be processed (such as the current video frame c) and the reference image (such as the previous video frame l of the current video frame c);
- R represents the rotation information of the camera device, which is a 3×3 matrix; t represents the translation information of the camera device, that is, the translation vector;
- t can be represented by the three translation components t_x, t_y and t_z, where t_x represents the translation component in the X-axis direction, t_y represents the translation component in the Y-axis direction, and t_z represents the translation component in the Z-axis direction.
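- a short sketch of assembling the 4×4 homogeneous pose-change matrix of formula (8) from R and t (an illustrative helper, not the original implementation):

```python
import numpy as np

def make_pose(R, t):
    """Build T = [[R, t], [0, 0, 0, 1]] from a 3x3 rotation and a 3-vector translation."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = np.asarray(t).reshape(3)
    return T
```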
- Step 2 Establish a correspondence between the pixel value of the pixel in the image to be processed and the pixel value of the pixel in the reference image according to the pose change information.
- the pose of the camera device when shooting the image to be processed is usually different from its pose when shooting the reference image. Therefore, the three-dimensional coordinate system corresponding to the image to be processed (that is, the three-dimensional coordinate system of the camera device when shooting the image to be processed) is different from the three-dimensional coordinate system corresponding to the reference image (that is, the three-dimensional coordinate system of the camera device when shooting the reference image).
- the three-dimensional spatial position of the pixel may be converted first, so that the pixel in the image to be processed and the pixel in the reference image are in the same three-dimensional coordinate system.
- the present disclosure may first obtain, according to the depth information obtained above and the parameters (known values) of the camera device, the first coordinates of the pixels (such as all pixels) in the image to be processed in the three-dimensional coordinate system of the camera device corresponding to the image to be processed. That is, the present disclosure first converts the pixels in the image to be processed into three-dimensional space, so as to obtain the coordinates of these pixels in the three-dimensional space (i.e., their three-dimensional coordinates). For example, the present disclosure may use the following formula (9) to obtain the three-dimensional coordinates of any pixel in the image to be processed: Z = f_x · b / Disparity, X = (u - c_x) · Z / f_x, Y = (v - c_y) · Z / f_y
- Z represents the depth value of the pixel;
- X, Y, and Z represent the three-dimensional coordinates of the pixel (i.e., the first coordinate);
- f_x represents the focal length of the camera device in the horizontal direction (the X-axis direction in the three-dimensional coordinate system);
- f_y represents the focal length of the camera device in the vertical direction (the Y-axis direction in the three-dimensional coordinate system);
- (u, v) represents the two-dimensional coordinates of the pixel in the image to be processed;
- c_x, c_y represent the principal point coordinates of the camera device;
- Disparity represents the disparity of the pixel.
- any one of the pixels can be expressed as P_i(X_i, Y_i, Z_i);
- P_i^c represents the three-dimensional coordinates of the i-th pixel in the image to be processed, namely P_i(X_i, Y_i, Z_i); c indicates the image to be processed;
- the value range of i is related to the number of the pixels. For example, if the number of pixels is N (N is an integer greater than 1), the value range of i can be 1 to N or 0 to N-1.
- the present disclosure may convert the first coordinates of the multiple pixels into the three-dimensional coordinate system corresponding to the reference image according to the aforementioned pose change information, so as to obtain the second coordinates of the multiple pixels.
- the present disclosure may use the following formula (10) to obtain the second coordinate of any pixel in the image to be processed: P_i^l = T_l^c · P_i^c
- P_i^l represents the second coordinate of the i-th pixel in the image to be processed;
- T_l^c represents the pose change information (such as the pose change matrix) of the camera device between capturing the image to be processed (such as the current video frame c) and the reference image (such as the previous video frame l of the current video frame c); P_i^c represents the first coordinate of the i-th pixel in the image to be processed.
- the present disclosure may perform projection processing on the second coordinates of the multiple pixels based on the two-dimensional coordinate system of the two-dimensional image, so as to obtain the projected two-dimensional coordinates of the pixels of the image to be processed after conversion into the three-dimensional coordinate system corresponding to the reference image.
- the present disclosure may use the following formula (11) to obtain the projected two-dimensional coordinates: u = f_x · X / Z + c_x, v = f_y · Y / Z + c_y
- (u, v) represents the projected two-dimensional coordinates of the pixel in the image to be processed;
- f_x represents the focal length of the camera device in the horizontal direction (the X-axis direction in the three-dimensional coordinate system);
- f_y represents the focal length of the camera device in the vertical direction (the Y-axis direction in the three-dimensional coordinate system);
- c_x, c_y represent the principal point coordinates of the camera device;
- (X, Y, Z) represents the second coordinate of the pixel in the image to be processed.
- the present disclosure may establish the correspondence between the pixel values of the pixels in the image to be processed and the pixel values of the pixels in the reference image according to the projected two-dimensional coordinates and the two-dimensional coordinates of the reference image.
- the correspondence between pixel values can be expressed as: for any position that is the same in the image formed by the projected two-dimensional coordinates and in the reference image, the pixel value of that pixel in the image to be processed corresponds to the pixel value of that pixel in the reference image.
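- the chain of formulas (9) to (11) can be sketched as follows, assuming a pinhole camera model with intrinsics (f_x, f_y, c_x, c_y), a per-pixel depth map, and the 4×4 pose-change matrix T_l^c; the function name and structure are illustrative:

```python
import numpy as np

def warp_coordinates(depth, T_l_c, fx, fy, cx, cy):
    """For every pixel (u, v) of the image to be processed: back-project with its
    depth (formula 9), move it into the reference frame (formula 10), and project
    it back to 2D (formula 11). Returns the projected (u, v) coordinates."""
    h, w = depth.shape
    v, u = np.mgrid[0:h, 0:w].astype(np.float64)
    Z = depth
    X = (u - cx) * Z / fx                                # formula (9)
    Y = (v - cy) * Z / fy
    P = np.stack([X, Y, Z, np.ones_like(Z)], axis=-1)    # homogeneous coordinates
    P_ref = P @ T_l_c.T                                  # formula (10): P_l = T_l^c * P_c
    X2, Y2, Z2 = P_ref[..., 0], P_ref[..., 1], P_ref[..., 2]
    Z2 = np.maximum(Z2, 1e-6)                            # guard against division by zero
    u_proj = fx * X2 / Z2 + cx                           # formula (11)
    v_proj = fy * Y2 / Z2 + cy
    return u_proj, v_proj
```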
- Step 3 Perform transformation processing on the reference image according to the above corresponding relationship.
- the present disclosure may use the foregoing correspondence relationship to perform Warp (warping) processing on the reference image, thereby transforming the reference image into the image to be processed.
- an example of Warp processing on the reference image is shown in FIG. 13.
- the left image in FIG. 13 is the reference image
- the right image in FIG. 13 is the image formed after Warp processing is performed on the reference image.
- Step 4 Calculate the optical flow information between the image to be processed and the reference image based on the image to be processed and the transformed image.
- the optical flow information in the present disclosure includes, but is not limited to: dense optical flow information.
- the optical flow information is calculated for all pixels in the image.
- the present disclosure may use visual technology to obtain optical flow information, for example, use OpenCV (Open Source Computer Vision Library) to obtain optical flow information.
- the present disclosure can input the image to be processed and the transformed image into a model based on OpenCV, and the model outputs the optical flow information between the two input images, so that the present disclosure obtains the optical flow information between the image to be processed and the reference image.
- the algorithm used in this model to calculate the optical flow information includes but is not limited to: the Gunnar Farnebäck algorithm.
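- a minimal OpenCV sketch of computing dense optical flow between the warped (transformed) reference image and the image to be processed with the Gunnar Farnebäck algorithm; the parameter values below are common defaults, not values taken from the disclosure:

```python
import cv2

def dense_flow(warped_reference_gray, image_gray):
    """Dense optical flow (one (du, dv) per pixel), computed on single-channel images.
    Positional parameters: prev, next, flow, pyr_scale, levels, winsize,
    iterations, poly_n, poly_sigma, flags."""
    return cv2.calcOpticalFlowFarneback(
        warped_reference_gray, image_gray, None,
        0.5, 3, 15, 3, 5, 1.2, 0)
```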
- the optical flow information of any pixel in the image to be processed obtained by the present disclosure is expressed as I_of(Δu, Δv);
- the optical flow information of the pixel usually conforms to the following formula (12): (u_{t+1}, v_{t+1}) = (u_t + Δu, v_t + Δv)
- I_t(u_t, v_t) represents a pixel in the reference image;
- I_{t+1}(u_{t+1}, v_{t+1}) represents the pixel at the corresponding position in the image to be processed.
- the reference image after Warp processing (such as the previous video frame after Warp processing), the image to be processed (such as the current video frame) and the optical flow information obtained by calculation are shown in FIG. 14.
- the top image in Figure 14 is the reference image after Warp processing
- the middle image in Figure 14 is the image to be processed
- the bottom image in Figure 14 is the optical flow information between the image to be processed and the reference image.
- Optical flow information for the reference image is added for the convenience of detailed comparison.
- S120 Obtain a three-dimensional motion field of the pixels in the image to be processed relative to the reference image according to the depth information and the optical flow information.
- the present disclosure can obtain, according to the depth information and the optical flow information, the three-dimensional motion field of the pixels (such as all pixels) in the image to be processed relative to the reference image (which may be referred to as the three-dimensional motion field of the pixels in the image to be processed).
- the three-dimensional motion field in the present disclosure can be considered as: a three-dimensional motion field formed by scene motion in three-dimensional space.
- the three-dimensional motion field of the pixels in the image to be processed can be considered as: the three-dimensional spatial displacement of the pixels in the image to be processed between the image to be processed and the reference image.
- the three-dimensional motion field can be represented by a scene flow (Scene Flow).
- the present disclosure may use the following formula (13) to obtain the scene flow I_sf(ΔX, ΔY, ΔZ) of the multiple pixels in the image to be processed:
- (ΔX, ΔY, ΔZ) represents the displacement of any pixel in the image to be processed along the three coordinate axis directions of the three-dimensional coordinate system;
- ΔI_depth represents the depth value change of the pixel;
- (Δu, Δv) represents the optical flow information of the pixel, that is, the displacement of the pixel in the two-dimensional image between the image to be processed and the reference image;
- f_x represents the focal length of the camera device in the horizontal direction (the X-axis direction in the three-dimensional coordinate system);
- f_y represents the focal length of the camera device in the vertical direction (the Y-axis direction in the three-dimensional coordinate system);
- c_x, c_y represent the principal point coordinates of the camera device.
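- one hedged way to realize the idea behind formula (13) is to back-project each pixel before and after its 2D motion and take the difference of the two 3D points; this is a reconstruction from the listed symbols, not necessarily the exact formula of the disclosure:

```python
def scene_flow(u, v, du, dv, Z, dZ, fx, fy, cx, cy):
    """Approximate 3D displacement (dX, dY, dZ) of a pixel from its optical flow
    (du, dv) and its depth change dZ, via pinhole back-projection before and after
    the 2D motion. Works with scalars or numpy arrays."""
    X0 = (u - cx) * Z / fx
    Y0 = (v - cy) * Z / fy
    Z1 = Z + dZ
    X1 = (u + du - cx) * Z1 / fx
    Y1 = (v + dv - cy) * Z1 / fy
    return X1 - X0, Y1 - Y0, dZ
```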
- S130 Determine a moving object in the image to be processed according to the three-dimensional motion field.
- the present disclosure may determine the motion information of the object in the image to be processed in the three-dimensional space according to the three-dimensional motion field.
- the movement information of an object in the three-dimensional space can indicate whether the object is a moving object.
- the present disclosure may first obtain the motion information of the pixels in the image to be processed in the three-dimensional space according to the three-dimensional motion field; then, perform clustering processing on the pixels according to the motion information of the pixels in the three-dimensional space; and finally, determine the motion information of the object in the image to be processed in the three-dimensional space according to the result of the clustering processing, so as to determine the moving object in the image to be processed.
- the motion information of pixels in the image to be processed in the three-dimensional space may include, but is not limited to: the speed of multiple pixels (such as all pixels) in the image to be processed in the three-dimensional space.
- the speed here is usually in the form of a vector, that is, the speed of the pixel in the present disclosure can reflect the speed of the pixel and the direction of the speed of the pixel.
- the present disclosure can easily obtain the motion information of the pixels in the image to be processed in the three-dimensional space by using the three-dimensional motion field.
- the three-dimensional space in the present disclosure includes: a three-dimensional space based on a three-dimensional coordinate system.
- the three-dimensional coordinate system may be: the three-dimensional coordinate system of the camera device that takes the image to be processed.
- the Z axis of the three-dimensional coordinate system is usually the optical axis of the imaging device, that is, the depth direction.
- an example of the X axis, Y axis, Z axis and origin of the three-dimensional coordinate system of the present disclosure is shown in FIG. 12.
- the X-axis points horizontally to the right;
- the Y-axis points toward the bottom of the vehicle;
- the Z-axis points toward the front of the vehicle;
- the origin of the three-dimensional coordinate system is at the optical center of the camera device.
- the present disclosure can calculate, according to the three-dimensional motion field and the time difference Δt between the camera device capturing the image to be processed and capturing the reference image, the speed of the pixels in the image to be processed along the three coordinate axis directions of the three-dimensional coordinate system of the camera device corresponding to the image to be processed, for example, v_x = ΔX/Δt, v_y = ΔY/Δt, v_z = ΔZ/Δt;
- v_x, v_y and v_z respectively represent the speed of any pixel in the image to be processed along the three coordinate axis directions of the three-dimensional coordinate system of the camera device corresponding to the image to be processed;
- (ΔX, ΔY, ΔZ) represents the displacement of the pixel in the image to be processed along the three coordinate axis directions of the three-dimensional coordinate system of the camera device corresponding to the image to be processed;
- Δt represents the time difference between the camera device capturing the image to be processed and capturing the reference image.
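- a one-line sketch of turning the scene flow into per-pixel velocities using the time difference Δt (illustrative only):

```python
def pixel_velocity(dX, dY, dZ, dt):
    """Per-pixel velocity: the 3D displacement divided by the capture time difference."""
    return dX / dt, dY / dt, dZ / dt
```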
- the present disclosure may first determine the motion area in the image to be processed, and perform clustering processing on the pixels in the motion area. For example, according to the motion information of the pixels in the motion area in the three-dimensional space, the pixels in the motion area are clustered. For another example, according to the motion information of the pixels in the motion area in the three-dimensional space and the position of the pixels in the three-dimensional space, the pixels in the motion area are clustered.
- the present disclosure may use a motion mask to determine the motion area in the image to be processed. For example, the present disclosure may obtain the motion mask of the image to be processed according to the motion information of the pixel in the three-dimensional space.
- the present disclosure may filter the speeds of multiple pixels (such as all pixels) in the image to be processed according to a preset speed threshold, so as to form a motion mask of the image to be processed according to the result of the filter processing.
- the present disclosure can use the following formula (17) to obtain the motion mask of the image to be processed:
- I_motion represents a pixel in the motion mask; if the speed of the pixel exceeds the preset speed threshold, the value of the pixel in the motion mask is set to 1, otherwise it is set to 0.
- the present disclosure may refer to an area composed of pixels with a value of 1 in the motion mask as a motion area, and the size of the motion mask is the same as the size of the image to be processed. Therefore, the present disclosure can determine the motion region in the image to be processed based on the motion region in the motion mask.
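- a small sketch of the thresholding behind formula (17), assuming per-pixel velocity arrays and a preset speed threshold (names and threshold handling are illustrative):

```python
import numpy as np

def motion_mask(vx, vy, vz, speed_threshold):
    """Pixels whose speed magnitude exceeds the threshold are marked 1 (motion area),
    all others 0; the mask has the same size as the image to be processed."""
    speed = np.sqrt(vx ** 2 + vy ** 2 + vz ** 2)
    return (speed > speed_threshold).astype(np.uint8)
```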
- An example of the motion mask in the present disclosure is shown in FIG. 15.
- the bottom image in Figure 15 is the image to be processed, and the top image in Figure 15 is the motion mask of the image to be processed.
- the black part in the picture above is the non-motion area, and the gray part in the picture above is the motion area.
- the moving area in the picture above is basically the same as the moving object in the picture below.
- as the accuracy of the motion information of the pixels in the three-dimensional space increases, the accuracy of the present disclosure in determining the motion area in the image to be processed will also increase.
- the present disclosure may first perform standardization processing on the three-dimensional spatial position information and motion information of the pixels in the motion area, so that the three-dimensional space coordinate values of the pixels in the motion area are converted into a predetermined coordinate interval (such as [0, 1]) and the speeds of the pixels in the motion area are converted into a predetermined speed interval (such as [0, 1]). After that, density clustering processing is performed using the transformed three-dimensional space coordinate values and speeds, thereby obtaining at least one cluster.
- the standardization processing in the present disclosure includes but is not limited to: min-max (minimum-maximum) standardization processing, Z-score standardization processing, and the like.
- the min-max standardization processing of the three-dimensional spatial position information of the pixels in the motion area can be expressed by the following formula (18), and the min-max standardization processing of the motion information of the pixels in the motion area can be expressed by the following formula (19): X* = (X - X_min)/(X_max - X_min), Y* = (Y - Y_min)/(Y_max - Y_min), Z* = (Z - Z_min)/(Z_max - Z_min); v_x* = (v_x - v_xmin)/(v_xmax - v_xmin), v_y* = (v_y - v_ymin)/(v_ymax - v_ymin), v_z* = (v_z - v_zmin)/(v_zmax - v_zmin)
- (X, Y, Z) represents the three-dimensional spatial position information of a pixel in the motion area in the image to be processed;
- (X*, Y*, Z*) represents the position of the pixel after min-max standardization processing; (X_min, Y_min, Z_min) represents the minimum X coordinate, minimum Y coordinate and minimum Z coordinate in the three-dimensional spatial position information of all pixels in the motion area;
- (X_max, Y_max, Z_max) represents the maximum X coordinate, maximum Y coordinate and maximum Z coordinate in the three-dimensional spatial position information of all pixels in the motion area;
- (v_x, v_y, v_z) represents the speed of a pixel in the motion area along the three coordinate axis directions in the three-dimensional space; (v_x*, v_y*, v_z*) represents the speed of (v_x, v_y, v_z) after min-max standardization processing; (v_xmin, v_ymin, v_zmin) represents the minimum speed of all pixels in the motion area along the three coordinate axis directions in the three-dimensional space; (v_xmax, v_ymax, v_zmax) represents the maximum speed of all pixels in the motion area along the three coordinate axis directions in the three-dimensional space.
- the clustering algorithm used in the clustering process of the present disclosure includes, but is not limited to: a density clustering algorithm.
- for example, DBSCAN (Density-Based Spatial Clustering of Applications with Noise) and so on.
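- a hedged sketch of the clustering step, combining min-max standardization (formulas (18) and (19)) with scikit-learn's DBSCAN; the eps and min_samples values are placeholders, not parameters given by the disclosure:

```python
import numpy as np
from sklearn.cluster import DBSCAN

def cluster_motion_pixels(xyz, vel, eps=0.05, min_samples=20):
    """xyz, vel: (N, 3) arrays of positions and velocities of the pixels in the
    motion area. Both are min-max scaled to [0, 1] before density clustering."""
    def minmax(a):
        return (a - a.min(axis=0)) / (a.max(axis=0) - a.min(axis=0) + 1e-12)
    features = np.hstack([minmax(xyz), minmax(vel)])
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(features)
    return labels  # -1 marks noise; every other label is one moving object instance
```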
- Each cluster obtained by clustering corresponds to a moving object instance, that is, each cluster can be regarded as a moving object in the image to be processed.
- the present disclosure can determine the moving object instance corresponding to the cluster according to the speed and direction of multiple pixels (such as all pixels) in the cluster.
- the average speed and average direction of all pixels in a cluster may be used to represent the speed and direction of the moving object instance corresponding to that cluster.
- the present disclosure may use the following formula (20) to express the speed and direction of the moving object instance corresponding to a cluster: for example, the velocity (magnitude and direction) of the moving object instance can be taken as the average of the velocities of all pixels in the cluster.
- the present disclosure may also determine, based on the position information (i.e., the two-dimensional coordinates in the image to be processed) of multiple pixels (such as all pixels) belonging to the same cluster, the moving object detection box (Bounding-Box) in the image to be processed of the moving object instance corresponding to that cluster.
- the present disclosure can calculate the maximum column coordinate u_max and the minimum column coordinate u_min of all pixels in the cluster in the image to be processed, and calculate the maximum row coordinate v_max and the minimum row coordinate v_min of all pixels in the cluster (note: it is assumed that the origin of the image coordinate system is located at the upper left corner of the image).
- the coordinates of the moving object detection frame obtained in the present disclosure in the image to be processed can be expressed as (u min , v min , u max , v max ).
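- a short sketch of deriving the detection box of one cluster from the image coordinates of its pixels (names are illustrative):

```python
import numpy as np

def cluster_bounding_box(u_coords, v_coords):
    """Returns (u_min, v_min, u_max, v_max), with the image origin at the top-left corner."""
    return (int(np.min(u_coords)), int(np.min(v_coords)),
            int(np.max(u_coords)), int(np.max(v_coords)))
```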
- an example of the moving object detection frame in the image to be processed determined by the present disclosure is shown in the lower figure in FIG. 16; if the moving object detection frame is reflected in the motion mask, it is as shown in the upper figure in FIG. 16.
- the multiple rectangular boxes in the upper and lower figures of FIG. 16 are all moving object detection boxes obtained in the present disclosure.
- the present disclosure may also determine the position information of the moving object in the three-dimensional space according to the position information of multiple pixels belonging to the same cluster in the three-dimensional space.
- the position information of the moving object in the three-dimensional space includes but is not limited to: the coordinate of the moving object on the horizontal coordinate axis (X coordinate axis), the coordinate of the moving object on the depth coordinate axis (Z coordinate axis), and the height of the moving object in the vertical direction (that is, the height of the moving object), etc.
- the present disclosure may first determine, based on the position information of all pixels belonging to the same cluster in the three-dimensional space, the distances between all pixels in the cluster and the camera device, and then use the position information of the pixel with the smallest distance in the three-dimensional space as the position information of the moving object in the three-dimensional space.
- the present disclosure may use the following formula (21) to calculate the distances between the multiple pixels in a cluster and the camera device, and select the minimum distance: d_min = min_i sqrt(X_i^2 + Z_i^2)
- d_min represents the minimum distance;
- X_i represents the X coordinate of the i-th pixel in the cluster;
- Z_i represents the Z coordinate of the i-th pixel in the cluster.
- the X and Z coordinates of the pixel with the minimum distance can be used as the position information of the moving object in the three-dimensional space, as shown in the following formula (22): O_X = X_close, O_Z = Z_close
- O_X represents the coordinate of the moving object on the horizontal coordinate axis, that is, the X coordinate of the moving object;
- O_Z represents the coordinate of the moving object on the depth direction coordinate axis (Z coordinate axis), that is, the Z coordinate of the moving object;
- X_close represents the X coordinate of the pixel with the smallest distance calculated above;
- Z_close represents the Z coordinate of the pixel with the smallest distance calculated above.
- the present disclosure may use the following formula (23) to calculate the height of the moving object: O_H = Y_max - Y_min
- O_H represents the height of the moving object in the three-dimensional space;
- Y_max represents the maximum Y coordinate of all pixels in the cluster in the three-dimensional space;
- Y_min represents the minimum Y coordinate of all pixels in the cluster in the three-dimensional space.
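- a combined sketch of formulas (21) to (23): the object position is taken from the cluster pixel closest to the camera in the X-Z plane, and the object height is the Y-extent of the cluster (illustrative, with the X-Z distance being an inference from the listed symbols):

```python
import numpy as np

def object_position_and_height(X, Y, Z):
    """X, Y, Z: 1-D arrays of the 3D coordinates of all pixels in one cluster."""
    d = np.sqrt(X ** 2 + Z ** 2)          # formula (21): distance in the X-Z plane
    i = int(np.argmin(d))
    O_X, O_Z = X[i], Z[i]                 # formula (22): coordinates of the closest pixel
    O_H = Y.max() - Y.min()               # formula (23): height of the moving object
    return O_X, O_Z, O_H
```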
- the flow of an embodiment of training a convolutional neural network in the present disclosure is shown in FIG. 17.
- the image samples input to the convolutional neural network of the present disclosure may always be left-eye image samples of binocular image samples, or may always be right-eye image samples of binocular image samples.
- if the convolutional neural network is trained with left-eye image samples, the successfully trained convolutional neural network will treat the input image to be processed as a left-eye image in testing or actual application scenarios.
- if the convolutional neural network is trained with right-eye image samples, the successfully trained convolutional neural network will treat the input image to be processed as a right-eye image in testing or actual application scenarios.
- S1710 Input one image of the binocular image sample into the convolutional neural network to be trained, perform disparity analysis processing via the convolutional neural network, and obtain a disparity map of the left-eye image sample and a disparity map of the right-eye image sample based on the output of the convolutional neural network.
- S1720 Reconstruct the right-eye image according to the disparity map of the left-eye image sample and the right-eye image sample.
- the method of reconstructing the right-eye image in the present disclosure includes but is not limited to: performing reprojection calculation on the disparity map of the left-eye image sample and the right-eye image sample to obtain the reconstructed right-eye image.
- S1730 Reconstruct the left-eye image according to the disparity map of the right-eye image sample and the left-eye image sample.
- the method of reconstructing the left-eye image in the present disclosure includes but is not limited to: performing re-projection calculation on the right-eye image sample and the disparity map of the left-eye image sample to obtain the reconstructed left-eye image.
- S1740 Adjust the network parameters of the convolutional neural network according to the difference between the reconstructed left-eye image and the left-eye image sample, and the difference between the reconstructed right-eye image and the right-eye image sample.
- the loss function used in the present disclosure when determining the difference includes, but is not limited to: L1 loss function, smooth loss function, lr-Consistency loss function, etc.
- the present disclosure back-propagates the calculated loss to adjust the network parameters of the convolutional neural network (such as the weights of the convolution kernels); the gradients calculated based on the chain rule of the convolutional neural network can be used to back-propagate the loss, which helps to improve the training efficiency of the convolutional neural network.
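- a numpy sketch of the reconstruction idea behind S1720 to S1740: one eye's view is reconstructed by horizontally resampling the other eye's image according to a disparity map, and an L1 difference serves as the loss; the sampling scheme and shift convention are assumptions, not taken from the original:

```python
import numpy as np

def reconstruct_view(source_img, disparity, sign=-1):
    """Reconstruct one eye's view by horizontally resampling the other eye's image
    according to a per-pixel disparity map (nearest-neighbour sampling).
    The sign of the shift depends on the stereo convention and is an assumption."""
    h, w = disparity.shape
    v, u = np.mgrid[0:h, 0:w]
    src_u = np.clip((u + sign * disparity).round().astype(int), 0, w - 1)
    return source_img[v, src_u]

def l1_loss(reconstructed, target):
    """Mean absolute difference used as the reconstruction loss."""
    return float(np.mean(np.abs(np.asarray(reconstructed, float) - np.asarray(target, float))))
```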
- the predetermined iterative conditions in the present disclosure may include: the difference between the left-eye image reconstructed based on the disparity map output by the convolutional neural network and the left-eye image sample, and the difference between the right-eye image reconstructed based on the disparity map output by the convolutional neural network and the right-eye image sample, both meet the predetermined difference requirement. If the differences meet the requirement, the training of the convolutional neural network is successfully completed this time.
- the predetermined iterative conditions in the present disclosure may also include: training the convolutional neural network, and the number of binocular image samples used reaches a predetermined number requirement, etc.
- if the number of binocular image samples used reaches the predetermined number requirement, but the difference between the left-eye image reconstructed based on the disparity map output by the convolutional neural network and the left-eye image samples, or the difference between the right-eye image reconstructed based on the disparity map output by the convolutional neural network and the right-eye image samples, does not meet the predetermined difference requirement, the training of the convolutional neural network is not successful this time.
- FIG. 18 is a flowchart of an embodiment of the intelligent driving control method of the present disclosure.
- the intelligent driving control method of the present disclosure can be applied but not limited to: an automatic driving (such as a fully unassisted automatic driving) environment or an assisted driving environment.
- the camera device includes, but is not limited to, an RGB-based camera device.
- S1810 Perform moving object detection on at least one video frame included in the video stream to obtain a moving object in the video frame, for example, obtain motion information of an object in the video frame in a three-dimensional space.
- S1820 Generate and output a vehicle control instruction according to the moving object in the video frame. For example, according to the motion information of the object in the video frame in the three-dimensional space, a vehicle control instruction is generated and output to control the vehicle.
- the control commands generated by the present disclosure include but are not limited to: speed maintaining control commands, speed adjustment control commands (such as deceleration commands and acceleration commands), direction maintaining control commands, direction adjustment control commands (such as left steering commands, right steering commands, left lane merging commands, or right lane merging commands), whistle commands, warning prompt control commands, or driving mode switching control commands (such as switching to automatic cruise driving mode), etc.
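- purely as an illustration of how the detected motion information might be mapped to a control command, a toy rule is sketched below; the rule and its threshold are assumptions, not part of the disclosure:

```python
def pick_command(object_distance_z, object_speed_z, safe_distance_m=30.0):
    """Toy rule: decelerate if a detected moving object ahead is close and
    approaching along the Z (depth) axis, otherwise keep the current speed."""
    approaching = object_speed_z < 0
    if object_distance_z < safe_distance_m and approaching:
        return "speed adjustment control command: decelerate"
    return "speed maintaining control command"
```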
- the moving object detection technology of the present disclosure can be applied in the field of intelligent driving control, and can also be applied in other fields; for example, it can realize moving object detection in industrial manufacturing, moving object detection in indoor fields such as supermarkets, and moving object detection in the security field, etc.
- the present disclosure does not limit the applicable scenarios of moving object detection technology.
- the moving object detection device provided by the present disclosure is shown in FIG. 19.
- the device shown in FIG. 19 includes: a first acquisition module 1900, a second acquisition module 1910, a third acquisition module 1920, and a moving object determination module 1930.
- the device may further include: a training module.
- the first acquiring module 1900 is used to acquire the depth information of the pixels in the image to be processed.
- the first acquisition module 1900 may include: a first sub-module and a second sub-module.
- the first sub-module is used to obtain the first disparity map of the image to be processed.
- the second sub-module is used to obtain the depth information of the pixels in the image to be processed according to the first disparity map of the image to be processed.
- the image to be processed in the present disclosure includes: a monocular image.
- the first sub-module includes: a first unit, a second unit and a third unit.
- the first unit is used to input the image to be processed into the convolutional neural network, perform disparity analysis processing through the convolutional neural network, and obtain the first disparity map of the image to be processed based on the output of the convolutional neural network.
- the convolutional neural network is obtained by the training module using binocular image samples.
- the second unit is used to obtain the second horizontal mirror image of the second disparity map of the first horizontal mirror image of the image to be processed;
- the first horizontal mirror image of the image to be processed is a mirror image formed by performing mirror processing on the image to be processed in the horizontal direction, and the second horizontal mirror image of the second disparity map is a mirror image formed by performing mirror processing on the second disparity map in the horizontal direction.
- the third unit is used to adjust the disparity of the first disparity map of the image to be processed according to the weight distribution map of the first disparity map of the image to be processed and the weight distribution map of the second horizontal mirror image of the second disparity map, so as to finally obtain the first disparity map of the image to be processed.
- the second unit may input the first horizontal mirror image of the image to be processed into the convolutional neural network, perform disparity analysis processing via the convolutional neural network, and obtain the second disparity map of the first horizontal mirror image of the image to be processed based on the output of the neural network; the second unit then performs mirror processing on the second disparity map of the first horizontal mirror image of the image to be processed, so as to obtain the second horizontal mirror image of the second disparity map of the first horizontal mirror image of the image to be processed.
- the weight distribution map in the present disclosure includes: at least one of a first weight distribution map and a second weight distribution map; the first weight distribution map is a weight distribution map uniformly set for multiple images to be processed; the second weight distribution map is a weight distribution map set separately for different images to be processed.
- the first weight distribution map includes at least two left and right regions, and different regions have different weight values.
- the weight value of the region on the right is greater than the weight value of the region on the left
- the weight value of the region on the left is greater than the weight value of the region on the right
- the third unit is also used to set a second weight distribution map of the first disparity map of the image to be processed.
- the third unit performs horizontal mirror processing on the first disparity map of the image to be processed to form a mirror disparity map; for any pixel in the mirror disparity map, if the disparity value of the pixel is greater than the first variable corresponding to the pixel, the weight value of the pixel in the second weight distribution map of the image to be processed is set to the first value; otherwise, it is set to the second value; where the first value is greater than the second value.
- the first variable corresponding to the pixel is set according to the disparity value of the pixel in the first disparity map of the image to be processed and a constant value greater than zero.
- the third unit is further configured to set a second weight distribution map of the second horizontal mirror image of the second disparity map, for example, for any pixel in the second horizontal mirror image of the second disparity map, If the disparity value of the pixel in the first disparity map of the image to be processed is greater than the second variable corresponding to the pixel, the third unit will map the second weight distribution of the second horizontal mirror image of the second disparity map The weight value of the pixel is set to the first value, otherwise, the third unit sets it to the second value; wherein the first value is greater than the second value.
- the second variable corresponding to the pixel is set according to the disparity value of the corresponding pixel in the horizontal mirror image of the first disparity map of the image to be processed and a constant value greater than zero.
- the third unit may be further configured to: first, adjust the disparity values in the first disparity map of the image to be processed according to the first weight distribution map and the second weight distribution map of the first disparity map of the image to be processed; then, adjust the disparity values in the second horizontal mirror image of the second disparity map according to the first weight distribution map and the second weight distribution map of the second horizontal mirror image of the second disparity map; and finally, merge the first disparity map after the disparity value adjustment and the second horizontal mirror image after the disparity value adjustment, so as to finally obtain the first disparity map of the image to be processed.
- for the specific operations performed by the first acquisition module 1900 and the sub-modules and units it includes, reference may be made to the foregoing description of S100, which is not repeated in detail here.
- the second acquisition module 1910 is used to acquire the optical flow information between the image to be processed and the reference image.
- the reference image and the image to be processed are two images with a time series relationship obtained based on the continuous shooting of the camera.
- the image to be processed is a video frame in the video shot by the camera device, and the reference image of the image to be processed includes: the previous video frame of the video frame.
- the second acquisition module 1910 may include: a third submodule, a fourth submodule, a fifth submodule, and a sixth submodule.
- the third sub-module is used to obtain the pose change information of the image to be processed and the reference image taken by the camera;
- the fourth sub-module is used to establish the pixel value of the pixel in the to-be-processed image and the reference image according to the pose change information The corresponding relationship between the pixel values of the pixels;
- the fifth sub-module is used to transform the reference image according to the above-mentioned correspondence;
- the sixth sub-module is used to calculate the reference image based on the image to be processed and the transformed reference image Optical flow information between the image to be processed and the reference image.
- the fourth sub-module may first obtain, according to the depth information and the preset parameters of the camera device, the first coordinates of the pixels in the image to be processed in the three-dimensional coordinate system of the camera device corresponding to the image to be processed; then, the fourth sub-module may convert the first coordinates into the second coordinates in the three-dimensional coordinate system of the camera device corresponding to the reference image according to the pose change information; after that, based on the two-dimensional coordinate system of the two-dimensional image, the fourth sub-module performs projection processing on the second coordinates to obtain the projected two-dimensional coordinates of the image to be processed; finally, the fourth sub-module establishes the correspondence between the pixel values of the pixels in the image to be processed and the pixel values of the pixels in the reference image according to the projected two-dimensional coordinates of the image to be processed and the two-dimensional coordinates of the reference image.
- for the specific operations performed by the second acquisition module 1910 and the sub-modules and units it includes, reference may be made to the foregoing description of S110, which is not repeated in detail here.
- the third acquiring module 1920 is configured to acquire the three-dimensional motion field of the pixels in the image to be processed relative to the reference image according to the depth information and the optical flow information.
- for the specific operations performed by the third acquisition module 1920, reference may be made to the above description of S120, which will not be described in detail here.
- the moving object determining module 1930 is used to determine the moving object in the image to be processed according to the three-dimensional motion field.
- the module for determining a moving object may include: a seventh sub-module, an eighth sub-module, and a ninth sub-module.
- the seventh sub-module is used to obtain the motion information of the pixels in the image to be processed in the three-dimensional space according to the three-dimensional motion field.
- the seventh sub-module can calculate, according to the three-dimensional motion field and the time difference between capturing the image to be processed and the reference image, the speed of the pixels in the image to be processed along the three coordinate axis directions of the three-dimensional coordinate system of the camera device corresponding to the image to be processed.
- the eighth sub-module is used for clustering the pixels according to the motion information of the pixels in the three-dimensional space.
- the eighth sub-module includes: the fourth unit, the fifth unit, and the sixth unit.
- the fourth unit is used to obtain the motion mask of the image to be processed according to the motion information of the pixel in the three-dimensional space.
- the motion information of the pixel in the three-dimensional space includes: the speed of the pixel in the three-dimensional space.
- the fourth unit can filter the speed of the pixel in the image to be processed according to the preset speed threshold to form the motion mask of the image to be processed .
- the fifth unit is used to determine the motion area in the image to be processed according to the motion mask.
- the sixth unit is used for clustering the pixels in the motion area according to the three-dimensional space position information and motion information of the pixels in the motion area. For example, the sixth unit can convert the three-dimensional coordinate values of the pixels in the motion area into a predetermined coordinate interval; afterwards, the sixth unit converts the speeds of the pixels in the motion area into a predetermined speed interval; finally, the sixth unit performs density clustering processing on the pixels in the motion area to obtain at least one cluster.
- the ninth sub-module is used to determine the moving object in the image to be processed according to the result of the clustering processing. For example, for any cluster, the ninth sub-module can determine the speed and direction of the moving object according to the speeds and directions of multiple pixels in the cluster, where one cluster is taken as one moving object in the image to be processed.
- the ninth sub-module is also used to determine the moving object detection frame in the image to be processed according to the spatial position information of the pixels belonging to the same cluster.
- for the specific operations performed by the moving object determining module 1930 and the sub-modules and units included therein, reference may be made to the foregoing description of S130, which is not repeated in detail here.
- the training module is used to input one image of a binocular image sample into the convolutional neural network to be trained and perform disparity analysis processing through the convolutional neural network; based on the output of the convolutional neural network, the training module obtains the disparity map of the left-eye image sample and the disparity map of the right-eye image sample; the training module reconstructs the right-eye image based on the disparity map of the left-eye image sample and the right-eye image sample; the training module reconstructs the left-eye image based on the disparity map of the right-eye image sample and the left-eye image sample; and the training module adjusts the network parameters of the convolutional neural network based on the difference between the reconstructed left-eye image and the left-eye image sample, and the difference between the reconstructed right-eye image and the right-eye image sample.
- the specific operations performed by the training module can be referred to the above description with respect to FIG. 17, which will not be described in detail here.
- the intelligent driving control device provided by the present disclosure is shown in FIG. 20.
- the device shown in FIG. 20 includes: a fourth acquisition module 2000, a moving object detection device 2010, and a control module 2020.
- the fourth acquisition module 2000 is used to acquire the video stream of the road where the vehicle is located through the camera device provided on the vehicle.
- the moving object detection device 2010 is configured to perform moving object detection on at least one video frame included in the video stream, and determine the moving object in the video frame.
- the structure of the moving object detection device 2010 and the specific operations performed by each module, sub-module, and unit can be referred to the description of FIG. 19, which is not described in detail here.
- the control module 2020 is used to generate and output vehicle control commands according to the moving objects.
- the control commands generated and output by the control module 2020 include, but are not limited to: speed maintaining control commands, speed adjustment control commands, direction maintaining control commands, direction adjustment control commands, warning prompt control commands, and driving mode switching control commands.
- FIG. 21 shows an exemplary device 2100 suitable for implementing the present disclosure.
- the device 2100 may be a control system/electronic system configured in a car, a mobile terminal (for example, a smart mobile phone), a personal computer (PC, for example, a desktop or notebook computer), a tablet, a server, or the like.
- the device 2100 includes one or more processors, a communication part, etc.; the one or more processors may be: one or more central processing units (CPU) 2101, and/or, one or more image processors (GPU) 2113 for running the neural network, etc.
- the processor can execute various appropriate actions and processing based on executable instructions stored in the read-only memory (ROM) 2102 or executable instructions loaded from the storage part 2108 into the random access memory (RAM) 2103.
- the communication part 2112 may include but is not limited to a network card, and the network card may include but is not limited to an IB (Infiniband) network card.
- the processor can communicate with the read-only memory 2102 and/or the random access memory 2103 to execute executable instructions, connect to the communication part 2112 via the bus 2104, and communicate with other target devices via the communication part 2112, thereby completing the corresponding steps in the present disclosure.
- the RAM 2103 can also store various programs and data required for device operation.
- the CPU 2101, the ROM 2102, and the RAM 2103 are connected to each other through a bus 2104.
- ROM2102 is an optional module.
- the RAM 2103 stores executable instructions, or writes executable instructions into the ROM 2102 during runtime, and the executable instructions cause the central processing unit 2101 to execute the steps included in the above-mentioned moving object detection method or intelligent driving control method.
- An input/output (I/O) interface 2105 is also connected to the bus 2104.
- the communication unit 2112 may be integrated, or may be configured to have multiple sub-modules (for example, multiple IB network cards) and be connected to the bus respectively.
- the following components are connected to the I/O interface 2105: an input part 2106 including a keyboard, a mouse, etc.; an output part 2107 including a cathode ray tube (CRT), a liquid crystal display (LCD), speakers, etc.; a storage part 2108 including a hard disk, etc.; and a communication part 2109 including a network interface card such as a LAN card, a modem, etc. The communication part 2109 performs communication processing via a network such as the Internet.
- the driver 2110 is also connected to the I/O interface 2105 as needed.
- a removable medium 2111 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, etc., is installed on the drive 2110 as needed, so that the computer program read from it is installed in the storage portion 2108 as needed.
- FIG. 21 is only an optional implementation. In specific practice, the number and types of the components in FIG. 21 can be selected, deleted, added or replaced according to actual needs; for different functional components, implementations such as separate arrangement or integrated arrangement can also be adopted. For example, the GPU 2113 and the CPU 2101 can be arranged separately, or the GPU 2113 can be integrated on the CPU 2101; the communication part can be arranged separately, or can be integrated on the CPU 2101 or the GPU 2113, etc. These alternative embodiments all fall within the protection scope of the present disclosure.
- the process described below with reference to the flowcharts can be implemented as a computer software program.
- the embodiments of the present disclosure include a computer program product, which includes a computer program tangibly contained on a machine-readable medium.
- the computer program includes program code for executing the steps shown in the flowchart.
- the program code may include instructions corresponding to the steps in the method provided by the present disclosure.
- the computer program may be downloaded and installed from the network through the communication part 2109, and/or installed from the removable medium 2111.
- when the computer program is executed by the central processing unit (CPU) 2101, the instructions described in the present disclosure for realizing the above-mentioned corresponding steps are executed.
- the embodiments of the present disclosure also provide a computer program product for storing computer-readable instructions, which, when executed, cause a computer to execute the moving object detection method or the intelligent driving control method described in any of the foregoing embodiments.
- the computer program product can be specifically implemented by hardware, software, or a combination thereof.
- the computer program product is specifically embodied as a computer storage medium.
- the computer program product is specifically embodied as a software product, such as a software development kit (SDK), etc.
- the embodiments of the present disclosure also provide another moving object detection method or intelligent driving control method and corresponding devices and electronic equipment, computer storage media, computer programs, and computer program products.
- the method includes: a first device sends a moving object detection instruction or an intelligent driving control instruction to a second device, and the instruction causes the second device to execute the moving object detection method or the intelligent driving control method in any of the foregoing possible embodiments; and the first device receives the moving object detection result or the intelligent driving control result sent by the second device.
- the visually moving object detection instruction or the intelligent driving control instruction may be specifically a calling instruction
- the first device may instruct the second device to perform the moving object detection operation or the intelligent driving control operation by calling, and respond accordingly
- the second device can execute the steps and/or processes in any embodiment of the above-mentioned moving object detection method or intelligent driving control method.
- the method and apparatus, electronic equipment, and computer-readable storage medium of the present disclosure may be implemented in many ways.
- the method and apparatus, electronic equipment, and computer-readable storage medium of the present disclosure can be implemented by software, hardware, firmware or any combination of software, hardware, and firmware.
- the above order of the steps of the method is for illustration only; the steps of the method of the present disclosure are not limited to the order specifically described above unless otherwise specifically stated.
- the present disclosure can also be implemented as programs recorded in a recording medium, and these programs include machine-readable instructions for implementing the method according to the present disclosure.
- the present disclosure also covers a recording medium storing a program for executing the method according to the present disclosure.
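The calling pattern described above, in which a first device issues a moving object detection or intelligent driving control instruction and a second device executes the method and returns the result, can be pictured with a minimal sketch. This is only an illustration under assumed names (`FirstDevice`, `SecondDevice`, `detect_moving_objects`); the disclosure does not prescribe any particular API or transport.

```python
# Minimal sketch of the first-device / second-device calling pattern described above.
# All names here (FirstDevice, SecondDevice, detect_moving_objects) are hypothetical
# illustrations, not part of the disclosure.

class SecondDevice:
    def handle_instruction(self, instruction, payload):
        # In response to a calling instruction, the second device runs the
        # moving object detection (or intelligent driving control) method.
        if instruction == "detect_moving_objects":
            return self.detect_moving_objects(payload)
        raise ValueError(f"unknown instruction: {instruction}")

    def detect_moving_objects(self, frames):
        # Placeholder for the detection pipeline of the foregoing embodiments:
        # depth -> optical flow -> 3D motion field -> clustering.
        return {"moving_objects": [], "num_frames": len(frames)}


class FirstDevice:
    def __init__(self, second_device):
        self.second_device = second_device

    def request_detection(self, frames):
        # The first device sends the instruction and receives the result.
        return self.second_device.handle_instruction("detect_moving_objects", frames)


result = FirstDevice(SecondDevice()).request_detection(frames=["frame_0", "frame_1"])
print(result)
```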
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Multimedia (AREA)
- Automation & Control Theory (AREA)
- Radar, Positioning & Navigation (AREA)
- Remote Sensing (AREA)
- Aviation & Aerospace Engineering (AREA)
- Evolutionary Computation (AREA)
- Artificial Intelligence (AREA)
- Health & Medical Sciences (AREA)
- General Health & Medical Sciences (AREA)
- Software Systems (AREA)
- Transportation (AREA)
- Mechanical Engineering (AREA)
- Life Sciences & Earth Sciences (AREA)
- Data Mining & Analysis (AREA)
- Computing Systems (AREA)
- Computer Graphics (AREA)
- Biomedical Technology (AREA)
- Databases & Information Systems (AREA)
- Medical Informatics (AREA)
- Molecular Biology (AREA)
- Biodiversity & Conservation Biology (AREA)
- Geometry (AREA)
- Evolutionary Biology (AREA)
- Human Computer Interaction (AREA)
- Social Psychology (AREA)
- Psychiatry (AREA)
- Electromagnetism (AREA)
- General Engineering & Computer Science (AREA)
- Bioinformatics & Computational Biology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Image Analysis (AREA)
- Length Measuring Devices By Optical Means (AREA)
- Traffic Control Systems (AREA)
- Image Processing (AREA)
Abstract
Description
Claims (61)
- A moving object detection method, comprising: acquiring depth information of pixels in an image to be processed; acquiring optical flow information between the image to be processed and a reference image, wherein the reference image and the image to be processed are two temporally related images obtained by continuous shooting of a camera device; acquiring, according to the depth information and the optical flow information, a three-dimensional motion field of the pixels in the image to be processed relative to the reference image; and determining a moving object in the image to be processed according to the three-dimensional motion field.
- The method according to claim 1, wherein the image to be processed is a video frame in a video captured by the camera device, and the reference image of the image to be processed comprises: a video frame preceding that video frame.
- The method according to claim 1 or 2, wherein acquiring the depth information of the pixels in the image to be processed comprises: acquiring a first disparity map of the image to be processed; and acquiring the depth information of the pixels in the image to be processed according to the first disparity map.
- The method according to claim 3, wherein the image to be processed comprises a monocular image, and acquiring the first disparity map of the image to be processed comprises: inputting the image to be processed into a convolutional neural network, performing disparity analysis processing via the convolutional neural network, and obtaining the first disparity map of the image to be processed based on the output of the convolutional neural network; wherein the convolutional neural network is trained using binocular image samples.
- The method according to claim 4, wherein acquiring the first disparity map of the image to be processed further comprises: acquiring a second horizontal mirror map of a second disparity map of a first horizontal mirror map of the image to be processed, the first horizontal mirror map of the image to be processed being a mirror image formed by horizontally mirroring the image to be processed, and the second horizontal mirror map of the second disparity map being a mirror image formed by horizontally mirroring the second disparity map; and performing disparity adjustment on the first disparity map according to a weight distribution map of the first disparity map and a weight distribution map of the second horizontal mirror map, to finally obtain the first disparity map of the image to be processed.
- The method according to claim 5, wherein acquiring the second horizontal mirror map of the second disparity map of the first horizontal mirror map of the image to be processed comprises: inputting the first horizontal mirror map of the image to be processed into the convolutional neural network, performing disparity analysis processing via the convolutional neural network, and obtaining the second disparity map of the first horizontal mirror map of the image to be processed based on the output of the neural network; and mirroring the second disparity map to obtain the second horizontal mirror map.
- The method according to claim 5 or 6, wherein the weight distribution map comprises at least one of a first weight distribution map and a second weight distribution map; the first weight distribution map is a weight distribution map set uniformly for a plurality of images to be processed; and the second weight distribution map is a weight distribution map set separately for different images to be processed.
- The method according to claim 7, wherein the first weight distribution map comprises at least two regions arranged side by side from left to right, and different regions have different weight values.
- The method according to claim 7 or 8, wherein, in a case where the image to be processed is used as a left-eye image: for any two regions in the first weight distribution map of the first disparity map, the weight value of the region on the right is greater than the weight value of the region on the left; and for any two regions in the first weight distribution map of the second horizontal mirror map, the weight value of the region on the right is greater than the weight value of the region on the left.
- The method according to claim 9, wherein: for at least one region in the first weight distribution map of the first disparity map, the weight value of the left part of the region is not greater than the weight value of the right part of the region; and for at least one region in the first weight distribution map of the second horizontal mirror map, the weight value of the left part of the region is not greater than the weight value of the right part of the region.
- The method according to claim 7 or 8, wherein, in a case where the image to be processed is used as a right-eye image: for any two regions in the first weight distribution map of the first disparity map, the weight value of the region on the left is greater than the weight value of the region on the right; and for any two regions in the first weight distribution map of the second horizontal mirror map, the weight value of the region on the left is greater than the weight value of the region on the right.
- The method according to claim 11, wherein: for at least one region in the first weight distribution map of the first disparity map, the weight value of the right part of the region is not greater than the weight value of the left part of the region; and for at least one region in the first weight distribution map of the second horizontal mirror map, the weight value of the right part of the region is not greater than the weight value of the left part of the region.
- The method according to any one of claims 7 to 12, wherein the second weight distribution map of the first disparity map is set in a manner comprising: horizontally mirroring the first disparity map to form a mirrored disparity map; and, for any pixel in the mirrored disparity map, if the disparity value of the pixel is greater than a first variable corresponding to the pixel, setting the weight value of the pixel in the second weight distribution map of the first disparity map to a first value, and otherwise setting it to a second value; wherein the first value is greater than the second value.
- The method according to claim 13, wherein the first variable corresponding to the pixel is set according to the disparity value of the pixel in the first disparity map and a constant value greater than zero.
- The method according to any one of claims 7 to 14, wherein the second weight distribution map of the second horizontal mirror map is set in a manner comprising: for any pixel in the second horizontal mirror map, if the disparity value of the pixel in the first disparity map is greater than a second variable corresponding to the pixel, setting the weight value of the pixel in the second weight distribution map of the second horizontal mirror map to a first value, and otherwise setting it to a second value; wherein the first value is greater than the second value.
- The method according to claim 15, wherein the second variable corresponding to the pixel is set according to the disparity value of the corresponding pixel in a horizontal mirror map of the first disparity map and a constant value greater than zero.
- The method according to any one of claims 7 to 16, wherein performing disparity adjustment on the first disparity map according to the weight distribution map of the first disparity map and the weight distribution map of the second horizontal mirror map comprises: adjusting disparity values in the first disparity map according to the first weight distribution map and the second weight distribution map of the first disparity map; adjusting disparity values in the second horizontal mirror map according to the first weight distribution map and the second weight distribution map of the second horizontal mirror map; and merging the disparity-adjusted first disparity map and the disparity-adjusted second horizontal mirror map, to finally obtain the first disparity map of the image to be processed.
- The method according to any one of claims 4 to 17, wherein the training process of the convolutional neural network comprises: inputting one of the binocular image samples into the convolutional neural network to be trained, performing disparity analysis processing via the convolutional neural network, and obtaining a disparity map of a left-eye image sample and a disparity map of a right-eye image sample based on the output of the convolutional neural network; reconstructing a right-eye image according to the left-eye image sample and the disparity map of the right-eye image sample; reconstructing a left-eye image according to the right-eye image sample and the disparity map of the left-eye image sample; and adjusting network parameters of the convolutional neural network according to the difference between the reconstructed left-eye image and the left-eye image sample and the difference between the reconstructed right-eye image and the right-eye image sample.
- The method according to any one of claims 1 to 18, wherein acquiring the optical flow information between the image to be processed and the reference image comprises: acquiring pose change information of the camera device between capturing the image to be processed and capturing the reference image; establishing, according to the pose change information, a correspondence between pixel values of pixels in the image to be processed and pixel values of pixels in the reference image; transforming the reference image according to the correspondence; and calculating the optical flow information between the image to be processed and the reference image according to the image to be processed and the transformed reference image.
- The method according to claim 19, wherein establishing, according to the pose change information, the correspondence between pixel values of pixels in the image to be processed and pixel values of pixels in the reference image comprises: acquiring, according to the depth information and preset parameters of the camera device, first coordinates of pixels in the image to be processed in a three-dimensional coordinate system of the camera device corresponding to the image to be processed; converting, according to the pose change information, the first coordinates into second coordinates in a three-dimensional coordinate system of the camera device corresponding to the reference image; projecting the second coordinates based on a two-dimensional coordinate system of a two-dimensional image, to obtain projected two-dimensional coordinates of the image to be processed; and establishing the correspondence between pixel values of pixels in the image to be processed and pixel values of pixels in the reference image according to the projected two-dimensional coordinates of the image to be processed and the two-dimensional coordinates of the reference image.
- The method according to any one of claims 1 to 20, wherein determining the moving object in the image to be processed according to the three-dimensional motion field comprises: acquiring, according to the three-dimensional motion field, motion information of the pixels in the image to be processed in three-dimensional space; clustering the pixels according to the motion information of the pixels in three-dimensional space; and determining the moving object in the image to be processed according to a result of the clustering.
- The method according to claim 21, wherein acquiring, according to the three-dimensional motion field, the motion information of the pixels in the image to be processed in three-dimensional space comprises: calculating, according to the three-dimensional motion field and a time difference between capturing the image to be processed and capturing the reference image, velocities of the pixels in the image to be processed along the directions of the three coordinate axes of the three-dimensional coordinate system of the camera device corresponding to the image to be processed.
- The method according to claim 21 or 22, wherein clustering the pixels according to the motion information of the pixels in three-dimensional space comprises: acquiring a motion mask of the image to be processed according to the motion information of the pixels in three-dimensional space; determining a motion region in the image to be processed according to the motion mask; and clustering the pixels in the motion region according to three-dimensional spatial position information and motion information of the pixels in the motion region.
- The method according to claim 23, wherein the motion information of the pixels in three-dimensional space comprises speed magnitudes of the pixels in three-dimensional space, and acquiring the motion mask of the image to be processed according to the motion information of the pixels in three-dimensional space comprises: filtering the speed magnitudes of the pixels in the image to be processed according to a preset speed threshold, to form the motion mask of the image to be processed.
- The method according to claim 23 or 24, wherein clustering the pixels in the motion region according to the three-dimensional spatial position information and the motion information of the pixels in the motion region comprises: converting three-dimensional spatial coordinate values of the pixels in the motion region into a predetermined coordinate interval; converting speeds of the pixels in the motion region into a predetermined speed interval; and performing density clustering on the pixels in the motion region according to the converted three-dimensional spatial coordinate values and the converted speeds, to obtain at least one cluster.
- The method according to claim 25, wherein determining the moving object in the image to be processed according to the result of the clustering comprises: for any cluster, determining a speed magnitude and a speed direction of a moving object according to the speed magnitudes and speed directions of a plurality of pixels in the cluster; wherein one cluster is taken as one moving object in the image to be processed.
- The method according to any one of claims 21 to 26, wherein determining the moving object in the image to be processed according to the result of the clustering further comprises: determining a moving object detection box in the image to be processed according to spatial position information of pixels belonging to the same cluster.
- An intelligent driving control method, comprising: acquiring a video stream of a road surface on which a vehicle is located by means of a camera device provided on the vehicle; performing moving object detection on at least one video frame included in the video stream using the method according to any one of claims 1 to 27, to determine a moving object in the video frame; and generating and outputting a control instruction for the vehicle according to the moving object.
- The method according to claim 28, wherein the control instruction comprises at least one of the following: a speed keeping control instruction, a speed adjustment control instruction, a direction keeping control instruction, a direction adjustment control instruction, a warning prompt control instruction, and a driving mode switching control instruction.
- A moving object detection apparatus, comprising: a first acquisition module configured to acquire depth information of pixels in an image to be processed; a second acquisition module configured to acquire optical flow information between the image to be processed and a reference image, wherein the reference image and the image to be processed are two temporally related images obtained by continuous shooting of a camera device; a third acquisition module configured to acquire, according to the depth information and the optical flow information, a three-dimensional motion field of the pixels in the image to be processed relative to the reference image; and a moving object determination module configured to determine a moving object in the image to be processed according to the three-dimensional motion field.
- The apparatus according to claim 30, wherein the image to be processed is a video frame in a video captured by the camera device, and the reference image of the image to be processed comprises: a video frame preceding that video frame.
- The apparatus according to claim 30 or 31, wherein the first acquisition module comprises: a first sub-module configured to acquire a first disparity map of the image to be processed; and a second sub-module configured to acquire the depth information of the pixels in the image to be processed according to the first disparity map of the image to be processed.
- The apparatus according to claim 32, wherein the image to be processed comprises a monocular image, and the first sub-module comprises: a first unit configured to input the image to be processed into a convolutional neural network, perform disparity analysis processing via the convolutional neural network, and obtain the first disparity map of the image to be processed based on the output of the convolutional neural network; wherein the convolutional neural network is trained using binocular image samples.
- The apparatus according to claim 33, wherein the first sub-module further comprises: a second unit configured to acquire a second horizontal mirror map of a second disparity map of a first horizontal mirror map of the image to be processed, the first horizontal mirror map of the image to be processed being a mirror image formed by horizontally mirroring the image to be processed, and the second horizontal mirror map of the second disparity map being a mirror image formed by horizontally mirroring the second disparity map; and a third unit configured to perform disparity adjustment on the first disparity map according to a weight distribution map of the first disparity map and a weight distribution map of the second horizontal mirror map, to finally obtain the first disparity map of the image to be processed.
- The apparatus according to claim 34, wherein the second unit is configured to: input the first horizontal mirror map into the convolutional neural network, perform disparity analysis processing via the convolutional neural network, and obtain the second disparity map based on the output of the neural network; and mirror the second disparity map to obtain the second horizontal mirror map.
- The apparatus according to claim 34 or 35, wherein the weight distribution map comprises at least one of a first weight distribution map and a second weight distribution map; the first weight distribution map is a weight distribution map set uniformly for a plurality of images to be processed; and the second weight distribution map is a weight distribution map set separately for different images to be processed.
- The apparatus according to claim 36, wherein the first weight distribution map comprises at least two regions arranged side by side from left to right, and different regions have different weight values.
- The apparatus according to claim 36 or 37, wherein, in a case where the image to be processed is used as a left-eye image: for any two regions in the first weight distribution map of the first disparity map, the weight value of the region on the right is greater than the weight value of the region on the left; and for any two regions in the first weight distribution map of the second horizontal mirror map, the weight value of the region on the right is greater than the weight value of the region on the left.
- The apparatus according to claim 38, wherein: for at least one region in the first weight distribution map of the first disparity map, the weight value of the left part of the region is not greater than the weight value of the right part of the region; and for at least one region in the first weight distribution map of the second horizontal mirror map, the weight value of the left part of the region is not greater than the weight value of the right part of the region.
- The apparatus according to claim 36 or 37, wherein, in a case where the image to be processed is used as a right-eye image: for any two regions in the first weight distribution map of the first disparity map, the weight value of the region on the left is greater than the weight value of the region on the right; and for any two regions in the first weight distribution map of the second horizontal mirror map, the weight value of the region on the left is greater than the weight value of the region on the right.
- The apparatus according to claim 40, wherein: for at least one region in the first weight distribution map of the first disparity map, the weight value of the right part of the region is not greater than the weight value of the left part of the region; and for at least one region in the first weight distribution map of the second horizontal mirror map, the weight value of the right part of the region is not greater than the weight value of the left part of the region.
- The apparatus according to any one of claims 36 to 41, wherein the third unit is further configured to set the second weight distribution map of the first disparity map, and the manner in which the third unit sets the second weight distribution map of the first disparity map comprises: horizontally mirroring the first disparity map to form a mirrored disparity map; and, for any pixel in the mirrored disparity map, if the disparity value of the pixel is greater than a first variable corresponding to the pixel, setting the weight value of the pixel in the second weight distribution map of the first disparity map to a first value, and otherwise setting it to a second value; wherein the first value is greater than the second value.
- The apparatus according to claim 42, wherein the first variable corresponding to the pixel is set according to the disparity value of the pixel in the first disparity map and a constant value greater than zero.
- The apparatus according to any one of claims 36 to 43, wherein the third unit is further configured to set the second weight distribution map of the second horizontal mirror map, and the manner in which the third unit sets the second weight distribution map of the second horizontal mirror map of the second disparity map comprises: for any pixel in the second horizontal mirror map, if the disparity value of the pixel in the first disparity map is greater than a second variable corresponding to the pixel, setting the weight value of the pixel in the second weight distribution map of the second horizontal mirror map to a first value, and otherwise setting it to a second value; wherein the first value is greater than the second value.
- The apparatus according to claim 44, wherein the second variable corresponding to the pixel is set according to the disparity value of the corresponding pixel in a horizontal mirror map of the first disparity map and a constant value greater than zero.
- The apparatus according to any one of claims 36 to 45, wherein the third unit is configured to: adjust disparity values in the first disparity map according to the first weight distribution map and the second weight distribution map of the first disparity map; adjust disparity values in the second horizontal mirror map according to the first weight distribution map and the second weight distribution map of the second horizontal mirror map; and merge the disparity-adjusted first disparity map and the disparity-adjusted second horizontal mirror map, to finally obtain the first disparity map of the image to be processed.
- The apparatus according to any one of claims 33 to 46, further comprising a training module configured to: input one of the binocular image samples into the convolutional neural network to be trained, perform disparity analysis processing via the convolutional neural network, and obtain a disparity map of a left-eye image sample and a disparity map of a right-eye image sample based on the output of the convolutional neural network; reconstruct a right-eye image according to the left-eye image sample and the disparity map of the right-eye image sample; reconstruct a left-eye image according to the right-eye image sample and the disparity map of the left-eye image sample; and adjust network parameters of the convolutional neural network according to the difference between the reconstructed left-eye image and the left-eye image sample and the difference between the reconstructed right-eye image and the right-eye image sample.
- The apparatus according to any one of claims 30 to 47, wherein the second acquisition module comprises: a third sub-module configured to acquire pose change information of the camera device between capturing the image to be processed and capturing the reference image; a fourth sub-module configured to establish, according to the pose change information, a correspondence between pixel values of pixels in the image to be processed and pixel values of pixels in the reference image; a fifth sub-module configured to transform the reference image according to the correspondence; and a sixth sub-module configured to calculate the optical flow information between the image to be processed and the reference image according to the image to be processed and the transformed reference image.
- The apparatus according to claim 48, wherein the fourth sub-module is configured to: acquire, according to the depth information and preset parameters of the camera device, first coordinates of pixels in the image to be processed in a three-dimensional coordinate system of the camera device corresponding to the image to be processed; convert, according to the pose change information, the first coordinates into second coordinates in a three-dimensional coordinate system of the camera device corresponding to the reference image; project the second coordinates based on a two-dimensional coordinate system of a two-dimensional image, to obtain projected two-dimensional coordinates of the image to be processed; and establish the correspondence between pixel values of pixels in the image to be processed and pixel values of pixels in the reference image according to the projected two-dimensional coordinates of the image to be processed and the two-dimensional coordinates of the reference image.
- The apparatus according to any one of claims 30 to 49, wherein the moving object determination module comprises: a seventh sub-module configured to acquire, according to the three-dimensional motion field, motion information of the pixels in the image to be processed in three-dimensional space; an eighth sub-module configured to cluster the pixels according to the motion information of the pixels in three-dimensional space; and a ninth sub-module configured to determine the moving object in the image to be processed according to a result of the clustering.
- The apparatus according to claim 50, wherein the seventh sub-module is configured to: calculate, according to the three-dimensional motion field and a time difference between capturing the image to be processed and capturing the reference image, velocities of the pixels in the image to be processed along the directions of the three coordinate axes of the three-dimensional coordinate system of the camera device corresponding to the image to be processed.
- The apparatus according to claim 50 or 51, wherein the eighth sub-module comprises: a fourth unit configured to acquire a motion mask of the image to be processed according to the motion information of the pixels in three-dimensional space; a fifth unit configured to determine a motion region in the image to be processed according to the motion mask; and a sixth unit configured to cluster the pixels in the motion region according to three-dimensional spatial position information and motion information of the pixels in the motion region.
- The apparatus according to claim 52, wherein the motion information of the pixels in three-dimensional space comprises speed magnitudes of the pixels in three-dimensional space, and the fourth unit is configured to: filter the speed magnitudes of the pixels in the image to be processed according to a preset speed threshold, to form the motion mask of the image to be processed.
- The apparatus according to claim 52 or 53, wherein the sixth unit is configured to: convert three-dimensional spatial coordinate values of the pixels in the motion region into a predetermined coordinate interval; convert speeds of the pixels in the motion region into a predetermined speed interval; and perform density clustering on the pixels in the motion region according to the converted three-dimensional spatial coordinate values and the converted speeds, to obtain at least one cluster.
- The apparatus according to claim 54, wherein the ninth sub-module is configured to: for any cluster, determine a speed magnitude and a speed direction of a moving object according to the speed magnitudes and speed directions of a plurality of pixels in the cluster; wherein one cluster is taken as one moving object in the image to be processed.
- The apparatus according to any one of claims 50 to 55, wherein the ninth sub-module is further configured to: determine a moving object detection box in the image to be processed according to spatial position information of pixels belonging to the same cluster.
- An intelligent driving control apparatus, comprising: a fourth acquisition module configured to acquire a video stream of a road surface on which a vehicle is located by means of a camera device provided on the vehicle; the moving object detection apparatus according to any one of claims 1 to 27, configured to perform moving object detection on at least one video frame included in the video stream, to determine a moving object in the video frame; and a control module configured to generate and output a control instruction for the vehicle according to the moving object.
- The apparatus according to claim 57, wherein the control instruction comprises at least one of the following: a speed keeping control instruction, a speed adjustment control instruction, a direction keeping control instruction, a direction adjustment control instruction, a warning prompt control instruction, and a driving mode switching control instruction.
- An electronic device, comprising: a memory configured to store a computer program; and a processor configured to execute the computer program stored in the memory, wherein when the computer program is executed, the method according to any one of claims 1 to 29 is implemented.
- A computer-readable storage medium having a computer program stored thereon, wherein when the computer program is executed by a processor, the method according to any one of claims 1 to 29 is implemented.
- A computer program comprising computer instructions, wherein when the computer instructions are run in a processor of a device, the method according to any one of claims 1 to 29 is implemented.
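Claims 3 and 4 derive per-pixel depth information from a disparity map predicted by a CNN from a monocular image. The conversion itself follows the standard stereo relation; the sketch below is a minimal illustration in which the disparity array, focal length, and baseline are placeholder assumptions rather than values from the disclosure.

```python
import numpy as np

def disparity_to_depth(disparity, focal_px, baseline_m, eps=1e-6):
    """Convert a disparity map (in pixels) to a depth map (in metres).

    Standard stereo relation: depth = focal_length * baseline / disparity.
    """
    return focal_px * baseline_m / np.maximum(disparity, eps)

# Hypothetical values for illustration only (stand-in for the CNN's disparity output).
disparity = np.random.uniform(1.0, 64.0, size=(192, 640)).astype(np.float32)
depth = disparity_to_depth(disparity, focal_px=720.0, baseline_m=0.54)
print(depth.shape, float(depth.min()), float(depth.max()))
```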
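Claims 5 to 17 refine the first disparity map by combining it with the mirrored disparity map of the horizontally mirrored input, weighting both with a fixed left/right "first" weight distribution map and a per-image "second" weight distribution map before merging. The sketch below is one plausible reading of that scheme for a left-eye input; the concrete weight values, the positive thresholding constant, and the merge rule are assumptions for illustration, not the disclosure's exact settings.

```python
import numpy as np

def first_weight_map(shape, left_w=0.3, right_w=0.7):
    # Two side-by-side regions; for a left-eye image the right region gets the larger weight.
    h, w = shape
    wmap = np.full((h, w), left_w, dtype=np.float32)
    wmap[:, w // 2:] = right_w
    return wmap

def second_weight_maps(disp, disp_mirrored, c=1.0, first_val=1.0, second_val=0.1):
    # Per-image weights: compare each pixel's disparity in one map against the other map
    # plus a constant greater than zero (the "first variable" / "second variable").
    w_disp = np.where(disp_mirrored > disp + c, first_val, second_val).astype(np.float32)
    w_mirr = np.where(disp > disp_mirrored + c, first_val, second_val).astype(np.float32)
    return w_disp, w_mirr

def fuse_disparities(disp, disp_mirrored):
    """Adjust both disparity maps with their weight maps and merge them."""
    w1 = first_weight_map(disp.shape)
    w1_m = first_weight_map(disp_mirrored.shape)
    w2, w2_m = second_weight_maps(disp, disp_mirrored)
    weighted_sum = disp * w1 * w2 + disp_mirrored * w1_m * w2_m
    weight_sum = w1 * w2 + w1_m * w2_m
    return weighted_sum / np.maximum(weight_sum, 1e-6)

disp = np.random.uniform(1, 64, (192, 640)).astype(np.float32)           # CNN disparity of the image
disp_mirrored = np.random.uniform(1, 64, (192, 640)).astype(np.float32)  # disparity of the mirrored image, mirrored back
fused = fuse_disparities(disp, disp_mirrored)
print(fused.shape)
```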
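Claims 19 and 20 compute optical flow between the image to be processed and its reference image only after compensating the camera's ego-motion: pixels are unprojected to first coordinates with the depth map and camera intrinsics, transformed into the reference camera's frame with the pose change, re-projected to 2D, and the reference image is warped accordingly before flow is computed. Below is a rough sketch of that pipeline; the intrinsics, the pose values, and the use of OpenCV's Farneback flow are illustrative assumptions, not the disclosure's exact implementation.

```python
import cv2
import numpy as np

def warp_reference(ref_gray, depth, K, R, t):
    """Warp the reference image towards the image to be processed using depth and pose.

    K: 3x3 intrinsics; (R, t): pose change from the processed image's camera frame to the
    reference image's camera frame. Returns the warped reference image.
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).T           # 3 x N homogeneous pixels
    cam_pts = np.linalg.inv(K) @ pix * depth.reshape(1, -1)                     # first coordinates (processed camera)
    cam_pts_ref = R @ cam_pts + t.reshape(3, 1)                                 # second coordinates (reference camera)
    proj = K @ cam_pts_ref                                                      # project back to the 2D image plane
    proj = proj[:2] / np.maximum(proj[2:3], 1e-6)
    map_x = proj[0].reshape(h, w).astype(np.float32)
    map_y = proj[1].reshape(h, w).astype(np.float32)
    return cv2.remap(ref_gray, map_x, map_y, interpolation=cv2.INTER_LINEAR)

# Hypothetical inputs for illustration.
h, w = 192, 640
img = np.random.randint(0, 255, (h, w), dtype=np.uint8)         # image to be processed (grayscale)
ref = np.random.randint(0, 255, (h, w), dtype=np.uint8)         # reference image (previous frame)
depth = np.random.uniform(1.0, 80.0, (h, w)).astype(np.float32)
K = np.array([[720.0, 0, w / 2], [0, 720.0, h / 2], [0, 0, 1.0]])
R, t = np.eye(3), np.array([0.0, 0.0, 0.1])                      # small forward ego-motion (assumed)

ref_warped = warp_reference(ref, depth, K, R, t)
# Residual optical flow after ego-motion compensation (Farneback used only as an example).
flow = cv2.calcOpticalFlowFarneback(ref_warped, img, None, 0.5, 3, 15, 3, 5, 1.2, 0)
print(flow.shape)  # (h, w, 2)
```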
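Claims 21 to 24 turn the 3D motion field into per-axis velocities by dividing by the capture time difference, and threshold the resulting speed magnitude to obtain a motion mask. A minimal numpy sketch follows; the speed threshold and frame interval are illustrative assumptions.

```python
import numpy as np

def motion_mask_from_scene_flow(scene_flow, dt, speed_thresh=2.0):
    """scene_flow: H x W x 3 per-pixel displacement (metres) between the two frames.

    Returns per-pixel velocities along the camera's three axes, the speed magnitude,
    and a boolean motion mask keeping only pixels whose speed exceeds the preset threshold.
    """
    velocity = scene_flow / dt                       # H x W x 3, metres per second
    speed = np.linalg.norm(velocity, axis=-1)        # H x W speed magnitude
    mask = speed > speed_thresh
    return velocity, speed, mask

scene_flow = np.random.normal(0.0, 0.2, (192, 640, 3)).astype(np.float32)
velocity, speed, mask = motion_mask_from_scene_flow(scene_flow, dt=1.0 / 30.0)
print(float(mask.mean()))   # fraction of pixels marked as moving
```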
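Claims 25 to 27 normalize the 3D coordinates and velocities of pixels inside the motion region into predetermined intervals, apply density clustering, take each cluster as one moving object, and derive the object's speed, direction, and a detection box from its pixels. The sketch below uses scikit-learn's DBSCAN as an example density-clustering algorithm; the normalization ranges, DBSCAN parameters, and input arrays are assumptions.

```python
import numpy as np
from sklearn.cluster import DBSCAN

def cluster_moving_pixels(xyz, vel, pix_uv, eps=0.3, min_samples=20):
    """xyz, vel: N x 3 positions/velocities of motion-region pixels; pix_uv: N x 2 pixel coords."""
    def to_unit(x):
        span = x.max(axis=0) - x.min(axis=0)
        return (x - x.min(axis=0)) / np.maximum(span, 1e-6)     # map into the interval [0, 1]

    features = np.hstack([to_unit(xyz), to_unit(vel)])
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(features)

    objects = []
    for lab in set(labels) - {-1}:                              # -1 marks DBSCAN noise points
        sel = labels == lab
        mean_vel = vel[sel].mean(axis=0)                        # object speed magnitude and direction
        u_min, v_min = pix_uv[sel].min(axis=0)                  # detection box from the cluster's
        u_max, v_max = pix_uv[sel].max(axis=0)                  # pixel positions
        objects.append({"speed": float(np.linalg.norm(mean_vel)),
                        "direction": mean_vel / (np.linalg.norm(mean_vel) + 1e-6),
                        "box": (int(u_min), int(v_min), int(u_max), int(v_max))})
    return objects

# Hypothetical motion-region pixels.
n = 500
xyz = np.random.uniform(-10, 10, (n, 3))
vel = np.random.uniform(-3, 3, (n, 3))
pix_uv = np.random.randint(0, 640, (n, 2))
print(len(cluster_moving_pixels(xyz, vel, pix_uv)))
```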
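Claim 18 trains the disparity CNN without ground-truth disparity: one image of a binocular pair is fed through the network, left- and right-view disparity maps are obtained, each view is reconstructed from the other using its disparity map, and the photometric differences drive the parameter update. Below is a compact PyTorch-style sketch of such a loop; the tiny network, the horizontal-shift warping, and the L1 loss are stand-in assumptions, not the disclosure's actual architecture or loss.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyDispNet(nn.Module):
    # Stand-in for the disparity CNN: predicts left- and right-view disparity (2 channels).
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                                 nn.Conv2d(16, 2, 3, padding=1), nn.Sigmoid())

    def forward(self, x):
        d = 0.3 * self.net(x)                    # disparities as a fraction of image width
        return d[:, :1], d[:, 1:]                # (left disparity, right disparity)

def warp_horizontally(img, disp, sign):
    """Reconstruct one view from the other by sampling along the horizontal axis."""
    b, _, h, w = img.shape
    ys, xs = torch.meshgrid(torch.linspace(-1, 1, h), torch.linspace(-1, 1, w), indexing="ij")
    base = torch.stack([xs, ys], dim=-1).unsqueeze(0).expand(b, -1, -1, -1).to(img)
    shift = torch.zeros_like(base)
    shift[..., 0] = sign * 2.0 * disp[:, 0]      # disparity shifts the x sampling coordinate
    return F.grid_sample(img, base + shift, align_corners=True)

net = TinyDispNet()
optim = torch.optim.Adam(net.parameters(), lr=1e-4)
left = torch.rand(2, 3, 64, 128)                 # binocular image samples (hypothetical)
right = torch.rand(2, 3, 64, 128)

disp_l, disp_r = net(left)                       # feed one view of the stereo pair
right_rec = warp_horizontally(left, disp_r, sign=-1.0)   # rebuild right view from left + right disparity
left_rec = warp_horizontally(right, disp_l, sign=1.0)    # rebuild left view from right + left disparity
loss = F.l1_loss(left_rec, left) + F.l1_loss(right_rec, right)
optim.zero_grad(); loss.backward(); optim.step()         # adjust the network parameters
print(float(loss))
```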
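Claims 28 and 29 map the moving objects detected in a road-facing video stream to vehicle control instructions such as speed keeping or adjustment, direction keeping or adjustment, warning prompts, and driving-mode switching. The decision rule below is purely illustrative; the disclosure does not fix any particular mapping from detections to instructions, and the distance, closing-speed, and time-to-collision thresholds are assumptions.

```python
def control_instruction(moving_objects, ttc_warn_s=2.0, ttc_brake_s=1.0):
    """moving_objects: list of dicts with hypothetical 'distance_m' and 'closing_speed_mps' keys."""
    instructions = ["speed_keeping", "direction_keeping"]       # default: keep speed and direction
    for obj in moving_objects:
        closing = max(obj["closing_speed_mps"], 1e-6)
        ttc = obj["distance_m"] / closing                       # time to collision, in seconds
        if ttc < ttc_brake_s:
            return ["speed_adjustment", "warning_prompt"]       # adjust speed and warn immediately
        if ttc < ttc_warn_s:
            instructions = ["speed_keeping", "warning_prompt"]  # keep speed but warn the driver
    return instructions

print(control_instruction([{"distance_m": 12.0, "closing_speed_mps": 8.0}]))
```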
Priority Applications (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR1020217001946A KR20210022703A (ko) | 2019-05-29 | 2019-10-31 | 운동 물체 검출 및 지능형 운전 제어 방법, 장치, 매체 및 기기 |
JP2020567917A JP7091485B2 (ja) | 2019-05-29 | 2019-10-31 | 運動物体検出およびスマート運転制御方法、装置、媒体、並びに機器 |
SG11202013225PA SG11202013225PA (en) | 2019-05-29 | 2019-10-31 | Methods, devices, media, and apparatuses of detecting moving object, and of intelligent driving control |
US17/139,492 US20210122367A1 (en) | 2019-05-29 | 2020-12-31 | Methods, devices, media, and apparatuses of detecting moving object, and of intelligent driving control |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910459420.9 | 2019-05-29 | ||
CN201910459420.9A CN112015170A (zh) | 2019-05-29 | 2019-05-29 | 运动物体检测及智能驾驶控制方法、装置、介质及设备 |
Related Child Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US17/139,492 Continuation US20210122367A1 (en) | 2019-05-29 | 2020-12-31 | Methods, devices, media, and apparatuses of detecting moving object, and of intelligent driving control |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2020238008A1 true WO2020238008A1 (zh) | 2020-12-03 |
Family
ID=73501844
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2019/114611 WO2020238008A1 (zh) | 2019-05-29 | 2019-10-31 | 运动物体检测及智能驾驶控制方法、装置、介质及设备 |
Country Status (6)
Country | Link |
---|---|
US (1) | US20210122367A1 (zh) |
JP (1) | JP7091485B2 (zh) |
KR (1) | KR20210022703A (zh) |
CN (1) | CN112015170A (zh) |
SG (1) | SG11202013225PA (zh) |
WO (1) | WO2020238008A1 (zh) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114037740A (zh) * | 2021-11-09 | 2022-02-11 | 北京字节跳动网络技术有限公司 | 图像数据流的处理方法、装置及电子设备 |
Families Citing this family (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113727141B (zh) * | 2020-05-20 | 2023-05-12 | 富士通株式会社 | 视频帧的插值装置以及方法 |
CN112784738B (zh) * | 2021-01-21 | 2023-09-19 | 上海云从汇临人工智能科技有限公司 | 运动目标检测告警方法、装置以及计算机可读存储介质 |
CN113096151B (zh) * | 2021-04-07 | 2022-08-09 | 地平线征程(杭州)人工智能科技有限公司 | 对目标的运动信息进行检测的方法和装置、设备和介质 |
CN113553986B (zh) * | 2021-08-02 | 2022-02-08 | 浙江索思科技有限公司 | 一种船舶上运动目标检测方法及系统 |
CN113781539A (zh) * | 2021-09-06 | 2021-12-10 | 京东鲲鹏(江苏)科技有限公司 | 深度信息获取方法、装置、电子设备和计算机可读介质 |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8866821B2 (en) * | 2009-01-30 | 2014-10-21 | Microsoft Corporation | Depth map movement tracking via optical flow and velocity prediction |
CN104318561A (zh) * | 2014-10-22 | 2015-01-28 | 上海理工大学 | 基于双目立体视觉与光流融合的车辆运动信息检测方法 |
JP2016081108A (ja) * | 2014-10-10 | 2016-05-16 | トヨタ自動車株式会社 | 物体検出装置 |
CN107341815A (zh) * | 2017-06-01 | 2017-11-10 | 哈尔滨工程大学 | 基于多目立体视觉场景流的剧烈运动检测方法 |
CN109272493A (zh) * | 2018-08-28 | 2019-01-25 | 中国人民解放军火箭军工程大学 | 一种基于递归卷积神经网络的单目视觉里程计方法 |
Family Cites Families (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5627905A (en) * | 1994-12-12 | 1997-05-06 | Lockheed Martin Tactical Defense Systems | Optical flow detection system |
JP2010204805A (ja) | 2009-03-02 | 2010-09-16 | Konica Minolta Holdings Inc | 周辺監視装置および該方法 |
JP2011209070A (ja) | 2010-03-29 | 2011-10-20 | Daihatsu Motor Co Ltd | 画像処理装置 |
CN102867311B (zh) * | 2011-07-07 | 2015-11-25 | 株式会社理光 | 目标跟踪方法和目标跟踪设备 |
CN104902246B (zh) * | 2015-06-17 | 2020-07-28 | 浙江大华技术股份有限公司 | 视频监视方法和装置 |
CN105100771A (zh) * | 2015-07-14 | 2015-11-25 | 山东大学 | 一种基于场景分类和几何标注的单视点视频深度获取方法 |
CN107330924A (zh) * | 2017-07-07 | 2017-11-07 | 郑州仁峰软件开发有限公司 | 一种基于单目摄像头识别运动物体的方法 |
CN107808388B (zh) * | 2017-10-19 | 2021-10-12 | 中科创达软件股份有限公司 | 包含运动目标的图像处理方法、装置及电子设备 |
CN109727273B (zh) * | 2018-12-29 | 2020-12-04 | 北京茵沃汽车科技有限公司 | 一种基于车载鱼眼相机的移动目标检测方法 |
CN109727275B (zh) * | 2018-12-29 | 2022-04-12 | 北京沃东天骏信息技术有限公司 | 目标检测方法、装置、系统和计算机可读存储介质 |
CN111247557A (zh) * | 2019-04-23 | 2020-06-05 | 深圳市大疆创新科技有限公司 | 用于移动目标物体检测的方法、系统以及可移动平台 |
- 2019
  - 2019-05-29 CN CN201910459420.9A patent/CN112015170A/zh active Pending
  - 2019-10-31 KR KR1020217001946A patent/KR20210022703A/ko not_active Application Discontinuation
  - 2019-10-31 WO PCT/CN2019/114611 patent/WO2020238008A1/zh active Application Filing
  - 2019-10-31 JP JP2020567917A patent/JP7091485B2/ja active Active
  - 2019-10-31 SG SG11202013225PA patent/SG11202013225PA/en unknown
- 2020
  - 2020-12-31 US US17/139,492 patent/US20210122367A1/en not_active Abandoned
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8866821B2 (en) * | 2009-01-30 | 2014-10-21 | Microsoft Corporation | Depth map movement tracking via optical flow and velocity prediction |
JP2016081108A (ja) * | 2014-10-10 | 2016-05-16 | トヨタ自動車株式会社 | 物体検出装置 |
CN104318561A (zh) * | 2014-10-22 | 2015-01-28 | 上海理工大学 | 基于双目立体视觉与光流融合的车辆运动信息检测方法 |
CN107341815A (zh) * | 2017-06-01 | 2017-11-10 | 哈尔滨工程大学 | 基于多目立体视觉场景流的剧烈运动检测方法 |
CN109272493A (zh) * | 2018-08-28 | 2019-01-25 | 中国人民解放军火箭军工程大学 | 一种基于递归卷积神经网络的单目视觉里程计方法 |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114037740A (zh) * | 2021-11-09 | 2022-02-11 | 北京字节跳动网络技术有限公司 | 图像数据流的处理方法、装置及电子设备 |
Also Published As
Publication number | Publication date |
---|---|
US20210122367A1 (en) | 2021-04-29 |
JP2021528732A (ja) | 2021-10-21 |
KR20210022703A (ko) | 2021-03-03 |
SG11202013225PA (en) | 2021-01-28 |
CN112015170A (zh) | 2020-12-01 |
JP7091485B2 (ja) | 2022-06-27 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2020238008A1 (zh) | 运动物体检测及智能驾驶控制方法、装置、介质及设备 | |
KR102319177B1 (ko) | 이미지 내의 객체 자세를 결정하는 방법 및 장치, 장비, 및 저장 매체 | |
US11100310B2 (en) | Object three-dimensional detection method and apparatus, intelligent driving control method and apparatus, medium and device | |
WO2020108311A1 (zh) | 目标对象3d检测方法、装置、介质及设备 | |
WO2019179464A1 (zh) | 用于预测目标对象运动朝向的方法、车辆控制方法及装置 | |
US11064178B2 (en) | Deep virtual stereo odometry | |
US11049270B2 (en) | Method and apparatus for calculating depth map based on reliability | |
WO2020258703A1 (zh) | 障碍物检测方法、智能驾驶控制方法、装置、介质及设备 | |
WO2022156626A1 (zh) | 一种图像的视线矫正方法、装置、电子设备、计算机可读存储介质及计算机程序产品 | |
EP4165547A1 (en) | Image augmentation for analytics | |
US20220138977A1 (en) | Two-stage depth estimation machine learning algorithm and spherical warping layer for equi-rectangular projection stereo matching | |
CN110060230B (zh) | 三维场景分析方法、装置、介质及设备 | |
US20210334569A1 (en) | Image depth determining method and living body identification method, circuit, device, and medium | |
US11823415B2 (en) | 3D pose estimation in robotics | |
US11961266B2 (en) | Multiview neural human prediction using implicit differentiable renderer for facial expression, body pose shape and clothes performance capture | |
US20210078597A1 (en) | Method and apparatus for determining an orientation of a target object, method and apparatus for controlling intelligent driving control, and device | |
CN113129352A (zh) | 一种稀疏光场重建方法及装置 | |
WO2023082822A1 (zh) | 图像数据的处理方法和装置 | |
CN113592706B (zh) | 调整单应性矩阵参数的方法和装置 | |
CN107274477B (zh) | 一种基于三维空间表层的背景建模方法 | |
US20220138978A1 (en) | Two-stage depth estimation machine learning algorithm and spherical warping layer for equi-rectangular projection stereo matching | |
US11417063B2 (en) | Determining a three-dimensional representation of a scene | |
KR20240012426A (ko) | 비제약 이미지 안정화 | |
CN111260544B (zh) | 数据处理方法及装置、电子设备和计算机存储介质 | |
Kong et al. | Self-supervised indoor 360-degree depth estimation via structural regularization |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| ENP | Entry into the national phase | Ref document number: 2020567917; Country of ref document: JP; Kind code of ref document: A |
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 19931428; Country of ref document: EP; Kind code of ref document: A1 |
| ENP | Entry into the national phase | Ref document number: 20217001946; Country of ref document: KR; Kind code of ref document: A |
| NENP | Non-entry into the national phase | Ref country code: DE |
| 122 | Ep: pct application non-entry in european phase | Ref document number: 19931428; Country of ref document: EP; Kind code of ref document: A1 |