WO2020238008A1 - Moving object detection and intelligent driving control method, apparatus, medium and device - Google Patents

Moving object detection and intelligent driving control method, apparatus, medium and device

Info

Publication number
WO2020238008A1
WO2020238008A1 (PCT/CN2019/114611)
Authority
WO
WIPO (PCT)
Prior art keywords
image
processed
disparity
value
weight distribution
Prior art date
Application number
PCT/CN2019/114611
Other languages
English (en)
French (fr)
Inventor
姚兴华
刘润涛
曾星宇
Original Assignee
北京市商汤科技开发有限公司
Priority date
Filing date
Publication date
Application filed by 北京市商汤科技开发有限公司
Priority to KR1020217001946A (publication KR20210022703A)
Priority to JP2020567917A (publication JP7091485B2)
Priority to SG11202013225PA (publication SG11202013225PA)
Publication of WO2020238008A1
Priority to US17/139,492 (publication US20210122367A1)

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • G06T7/269Analysis of motion using gradient-based methods
    • GPHYSICS
    • G05CONTROLLING; REGULATING
    • G05DSYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
    • G05D1/02Control of position or course in two dimensions
    • G05D1/021Control of position or course in two dimensions specially adapted to land vehicles
    • G05D1/0231Control of position or course in two dimensions specially adapted to land vehicles using optical position detecting means
    • G05D1/0246Control of position or course in two dimensions specially adapted to land vehicles using optical position detecting means using a video camera in combination with image processing means
    • G05D1/0251Control of position or course in two dimensions specially adapted to land vehicles using optical position detecting means using a video camera in combination with image processing means extracting 3D information from a plurality of images taken from different locations, e.g. stereo vision
    • BPERFORMING OPERATIONS; TRANSPORTING
    • B60VEHICLES IN GENERAL
    • B60WCONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W30/00Purposes of road vehicle drive control systems not related to the control of a particular sub-unit, e.g. of systems using conjoint control of vehicle sub-units
    • B60W30/08Active safety systems predicting or avoiding probable or impending collision or attempting to minimise its consequences
    • B60W30/09Taking automatic action to avoid collision, e.g. braking and steering
    • BPERFORMING OPERATIONS; TRANSPORTING
    • B60VEHICLES IN GENERAL
    • B60WCONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W30/00Purposes of road vehicle drive control systems not related to the control of a particular sub-unit, e.g. of systems using conjoint control of vehicle sub-units
    • B60W30/08Active safety systems predicting or avoiding probable or impending collision or attempting to minimise its consequences
    • B60W30/095Predicting travel path or likelihood of collision
    • B60W30/0956Predicting travel path or likelihood of collision the prediction being responsive to traffic or environmental parameters
    • GPHYSICS
    • G05CONTROLLING; REGULATING
    • G05DSYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
    • G05D1/02Control of position or course in two dimensions
    • GPHYSICS
    • G05CONTROLLING; REGULATING
    • G05DSYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
    • G05D1/02Control of position or course in two dimensions
    • G05D1/021Control of position or course in two dimensions specially adapted to land vehicles
    • G05D1/0212Control of position or course in two dimensions specially adapted to land vehicles with means for defining a desired trajectory
    • G05D1/0214Control of position or course in two dimensions specially adapted to land vehicles with means for defining a desired trajectory in accordance with safety or protection criteria, e.g. avoiding hazardous areas
    • GPHYSICS
    • G05CONTROLLING; REGULATING
    • G05DSYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
    • G05D1/02Control of position or course in two dimensions
    • G05D1/021Control of position or course in two dimensions specially adapted to land vehicles
    • G05D1/0212Control of position or course in two dimensions specially adapted to land vehicles with means for defining a desired trajectory
    • G05D1/0221Control of position or course in two dimensions specially adapted to land vehicles with means for defining a desired trajectory involving a learning process
    • GPHYSICS
    • G05CONTROLLING; REGULATING
    • G05DSYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
    • G05D1/02Control of position or course in two dimensions
    • G05D1/021Control of position or course in two dimensions specially adapted to land vehicles
    • G05D1/0212Control of position or course in two dimensions specially adapted to land vehicles with means for defining a desired trajectory
    • G05D1/0223Control of position or course in two dimensions specially adapted to land vehicles with means for defining a desired trajectory involving speed control of the vehicle
    • GPHYSICS
    • G05CONTROLLING; REGULATING
    • G05DSYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
    • G05D1/02Control of position or course in two dimensions
    • G05D1/021Control of position or course in two dimensions specially adapted to land vehicles
    • G05D1/0276Control of position or course in two dimensions specially adapted to land vehicles using signals provided by a source external to the vehicle
    • GPHYSICS
    • G05CONTROLLING; REGULATING
    • G05DSYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
    • G05D1/02Control of position or course in two dimensions
    • G05D1/021Control of position or course in two dimensions specially adapted to land vehicles
    • G05D1/0276Control of position or course in two dimensions specially adapted to land vehicles using signals provided by a source external to the vehicle
    • G05D1/0278Control of position or course in two dimensions specially adapted to land vehicles using signals provided by a source external to the vehicle using satellite positioning signals, e.g. GPS
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/23Clustering techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00Three dimensional [3D] modelling, e.g. data description of 3D objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00Geometric image transformations in the plane of the image
    • G06T3/18Image warping, e.g. rearranging pixels individually
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • G06T7/215Motion-based segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • G06T7/246Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G06T7/248Analysis of motion using feature-based methods, e.g. the tracking of corners or segments involving reference images or patches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • G06T7/285Analysis of motion using a sequence of stereo image pairs
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/50Depth or shape recovery
    • G06T7/55Depth or shape recovery from multiple images
    • G06T7/579Depth or shape recovery from multiple images from motion
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/50Depth or shape recovery
    • G06T7/55Depth or shape recovery from multiple images
    • G06T7/593Depth or shape recovery from multiple images from stereo images
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/44Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G06V10/443Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering
    • G06V10/449Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters
    • G06V10/451Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters with interaction between the filter responses, e.g. cortical complex cells
    • G06V10/454Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/50Context or environment of the image
    • G06V20/56Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/50Context or environment of the image
    • G06V20/56Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • G06V20/58Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20Movements or behaviour, e.g. gesture recognition
    • BPERFORMING OPERATIONS; TRANSPORTING
    • B60VEHICLES IN GENERAL
    • B60WCONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W2420/00Indexing codes relating to the type of sensors based on the principle of their operation
    • B60W2420/40Photo, light or radio wave sensitive means, e.g. infrared sensors
    • B60W2420/403Image sensing, e.g. optical camera
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2200/00Indexing scheme for image data processing or generation, in general
    • G06T2200/08Indexing scheme for image data processing or generation, in general involving all processing steps from image acquisition to 3D model generation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10016Video; Image sequence
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10016Video; Image sequence
    • G06T2207/10021Stereoscopic video; Stereoscopic image sequence
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10024Color image
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10028Range image; Depth image; 3D point clouds
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20072Graph-based image processing
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20228Disparity calculation for image-based rendering
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30248Vehicle exterior or interior
    • G06T2207/30252Vehicle exterior; Vicinity of vehicle
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30248Vehicle exterior or interior
    • G06T2207/30252Vehicle exterior; Vicinity of vehicle
    • G06T2207/30261Obstacle

Definitions

  • the present disclosure relates to computer vision technology, and in particular to a moving object detection method, a moving object detection device, an intelligent driving control method, an intelligent driving control device, electronic equipment, a computer-readable storage medium, and a computer program.
  • Perceived moving objects and their directions of motion can be provided to the decision-making layer so that the decision-making layer can make decisions based on these perception results.
  • For example, the decision-making layer can control the vehicle to slow down or even stop to ensure the safe driving of the vehicle.
  • the embodiments of the present disclosure provide a moving object detection technical solution.
  • A method for detecting a moving object includes: acquiring depth information of pixels in an image to be processed; acquiring optical flow information between the image to be processed and a reference image, where the reference image and the image to be processed are two images with a time-series relationship obtained by continuous shooting with a camera device; obtaining, according to the depth information and the optical flow information, the three-dimensional motion field of the pixels in the image to be processed relative to the reference image; and determining the moving object in the image to be processed according to the three-dimensional motion field.
  • An intelligent driving control method includes: acquiring, through a camera device provided on a vehicle, a video stream of the road where the vehicle is located; detecting, using the above-mentioned moving object detection method, at least one video frame included in the video stream to determine the moving object in the video frame; and generating and outputting a control instruction of the vehicle according to the moving object.
  • A moving object detection device includes: a first acquisition module for acquiring depth information of pixels in an image to be processed; a second acquisition module for acquiring optical flow information between the image to be processed and a reference image, where the reference image and the image to be processed are two images with a time-series relationship obtained by continuous shooting with a camera device; a third acquisition module for obtaining, according to the depth information and the optical flow information, the three-dimensional motion field of the pixels in the image to be processed relative to the reference image; and a moving object determination module for determining the moving object in the image to be processed according to the three-dimensional motion field.
  • An intelligent driving control device includes: a fourth acquisition module for acquiring, through a camera device provided on a vehicle, a video stream of the road where the vehicle is located; the above-mentioned moving object detection device, for detecting a moving object in at least one video frame included in the video stream and determining the moving object in the video frame; and a control module for generating and outputting a control instruction of the vehicle according to the moving object.
  • an electronic device including: a processor, a memory, a communication interface, and a communication bus.
  • The processor, the memory, and the communication interface communicate with each other via the communication bus;
  • the memory is used to store at least one executable instruction, and the executable instruction causes the processor to execute the above method.
  • a computer-readable storage medium having a computer program stored thereon, and when the computer program is executed by a processor, it implements any method embodiment of the present disclosure.
  • a computer program including computer instructions, which, when the computer instructions run in the processor of the device, implement any method embodiment of the present disclosure.
  • The present disclosure uses the depth information of the pixels in the image to be processed together with the optical flow information between the image to be processed and the reference image to obtain the three-dimensional motion field of the pixels in the image to be processed relative to the reference image. Since the three-dimensional motion field can reflect moving objects, the present disclosure can use the three-dimensional motion field to determine the moving object in the image to be processed. It can be seen from this that the technical solution provided by the present disclosure is beneficial to improving the accuracy of sensing moving objects, thereby helping to improve the safety of intelligent driving of the vehicle.
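  • As an illustration of this overall flow, the following sketch back-projects pixels with their depth, uses the optical flow to locate the corresponding points of the reference image, and thresholds the resulting three-dimensional motion field. The function name, the pinhole camera model, the reuse of the current depth for the reference positions, and the threshold value are assumptions made for this sketch and are not fixed by the disclosure.

```python
import numpy as np

def detect_moving_objects(depth, flow, K, motion_threshold=0.5):
    """Sketch: depth + optical flow -> 3D motion field -> moving-object mask.

    depth: (H, W) depth of each pixel of the image to be processed
    flow:  (H, W, 2) optical flow from the reference image to the image to be processed
    K:     (3, 3) camera intrinsics (pinhole model assumed)
    """
    H, W = depth.shape
    fx, fy, cx, cy = K[0, 0], K[1, 1], K[0, 2], K[1, 2]

    # Back-project every pixel of the image to be processed to a 3D point.
    xs, ys = np.meshgrid(np.arange(W), np.arange(H))
    X = (xs - cx) / fx * depth
    Y = (ys - cy) / fy * depth
    P_current = np.stack([X, Y, depth], axis=-1)

    # Pixel positions in the reference image given by the optical flow,
    # back-projected with the same depth as an approximation.
    xs_ref, ys_ref = xs - flow[..., 0], ys - flow[..., 1]
    X_ref = (xs_ref - cx) / fx * depth
    Y_ref = (ys_ref - cy) / fy * depth
    P_ref = np.stack([X_ref, Y_ref, depth], axis=-1)

    # 3D motion field (motion of each pixel relative to the reference image).
    motion_3d = P_current - P_ref

    # Pixels whose 3D motion magnitude exceeds a threshold are treated as moving.
    return np.linalg.norm(motion_3d, axis=-1) > motion_threshold
```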
  • FIG. 1 is a flowchart of an embodiment of the moving object detection method of the present disclosure
  • Figure 2 is a schematic diagram of an image to be processed in the present disclosure
  • FIG. 3 is a schematic diagram of an embodiment of the first disparity map of the image to be processed shown in FIG. 2;
  • FIG. 4 is a schematic diagram of an embodiment of the first disparity map of the image to be processed in the present disclosure
  • FIG. 5 is a schematic diagram of an embodiment of the convolutional neural network of the present disclosure.
  • FIG. 6 is a schematic diagram of an embodiment of the first weight distribution diagram of the first disparity map of the present disclosure
  • FIG. 7 is a schematic diagram of another embodiment of the first weight distribution diagram of the first disparity map of the present disclosure.
  • FIG. 8 is a schematic diagram of an embodiment of the second weight distribution diagram of the first disparity map of the present disclosure.
  • FIG. 9 is a schematic diagram of an embodiment of the third disparity map of the present disclosure.
  • FIG. 10 is a schematic diagram of an implementation manner of the second weight distribution diagram of the third disparity map shown in FIG. 9;
  • FIG. 11 is a schematic diagram of an embodiment of the present disclosure to optimize and adjust the first disparity map of the image to be processed
  • FIG. 12 is a schematic diagram of an embodiment of the three-dimensional coordinate system of the present disclosure.
  • FIG. 13 is a schematic diagram of an embodiment of the reference image and the image after Warp processing in the present disclosure
  • FIG. 14 is a schematic diagram of an embodiment of the Warp processed image, the image to be processed, and the optical flow diagram of the image to be processed relative to the reference image of the present disclosure
  • FIG. 15 is a schematic diagram of an embodiment of the image to be processed and its motion mask of the present disclosure;
  • FIG. 16 is a schematic diagram of an embodiment of a moving object detection frame formed by the present disclosure.
  • FIG. 17 is a flowchart of an embodiment of the convolutional neural network training method of the present disclosure.
  • FIG. 19 is a schematic structural diagram of an embodiment of the moving object detection device of the present disclosure.
  • FIG. 20 is a schematic structural diagram of an embodiment of the intelligent driving control device of the present disclosure.
  • Fig. 21 is a block diagram of an exemplary device for implementing the embodiments of the present disclosure.
  • The embodiments of the present disclosure can be applied to electronic devices such as terminal devices, computer systems, and servers, which can operate together with numerous other general-purpose or special-purpose computing system environments or configurations.
  • Examples of well-known terminal devices, computing systems, environments, and/or configurations suitable for use with electronic devices such as terminal devices, computer systems, and servers include, but are not limited to: personal computer systems, server computer systems, thin clients, thick clients, handheld or laptop devices, microprocessor-based systems, set-top boxes, programmable consumer electronics, networked personal computers, minicomputer systems, mainframe computer systems, and distributed cloud computing environments that include any of the above systems.
  • Electronic devices such as terminal devices, computer systems, and servers can be described in the general context of computer system executable instructions (such as program modules) executed by the computer system.
  • program modules can include routines, programs, target programs, components, logic, and data structures, etc., which perform specific tasks or implement specific abstract data types.
  • the computer system/server can be implemented in a distributed cloud computing environment. In the distributed cloud computing environment, tasks are executed by remote processing equipment linked through a communication network.
  • program modules may be located on a storage medium of a local or remote computing system including a storage device.
  • FIG. 1 is a flowchart of an embodiment of the moving object detection method of the present disclosure. As shown in FIG. 1, the method of this embodiment includes: step S100, step S110, step S120, and step S130. The steps are described in detail below.
  • the present disclosure may use the disparity map of the image to be processed to obtain the depth information of the pixels (such as all pixels) in the image to be processed. That is, first, the disparity map of the image to be processed is acquired, and then, according to the disparity map of the image to be processed, the depth information of the pixels in the image to be processed is acquired.
  • the disparity map of the image to be processed is referred to as the first disparity map of the image to be processed in the following.
  • the first disparity map in the present disclosure is used to describe the disparity of the image to be processed. Parallax can be considered as the difference in the position of the target object when observing the same target object from two points at a certain distance.
  • An example of the image to be processed is shown in Figure 2.
  • An example of the first disparity map of the image to be processed shown in FIG. 2 is shown in FIG. 3.
  • the first disparity map of the image to be processed in the present disclosure may also be expressed in the form shown in FIG. 4.
  • Each number in FIG. 4 (such as 0, 1, 2, 3, 4, 5, etc.) respectively represents: the disparity of the pixel at the position (x, y) in the image to be processed. It should be particularly noted that FIG. 4 does not show a complete first disparity map.
  • the image to be processed in the present disclosure is usually a monocular image. That is, the image to be processed is usually an image obtained by shooting with a monocular camera device.
  • the present disclosure can realize moving object detection without the need to provide a binocular camera device, thereby helping to reduce the cost of moving object detection.
  • the present disclosure may use a convolutional neural network successfully trained in advance to obtain the first disparity map of the image to be processed.
  • For example, the image to be processed is input into the convolutional neural network, the convolutional neural network performs disparity analysis processing on it and outputs the disparity analysis result, and the present disclosure can obtain the first disparity map of the image to be processed based on that result.
  • By using the convolutional neural network to obtain the first disparity map of the image to be processed, the disparity map can be obtained without performing pixel-by-pixel disparity calculation on two images and without camera calibration, which helps to improve the convenience and real-time performance of obtaining the disparity map.
  • the convolutional neural network in the present disclosure generally includes but is not limited to: multiple convolutional layers (Conv) and multiple deconvolutional layers (Deconv).
  • the convolutional neural network of the present disclosure can be divided into two parts, namely an encoding part and a decoding part.
  • The image to be processed that is input into the convolutional neural network (for example, the image shown in FIG. 2) is encoded by the encoding part (i.e., feature extraction processing), and the encoding result is provided to the decoding part.
  • The decoding part decodes the encoding result and outputs the decoding result.
  • The present disclosure can obtain the first disparity map of the image to be processed (for example, the disparity map shown in FIG. 3) based on the decoding result output by the decoding part.
  • the coding part in the convolutional neural network includes but is not limited to: multiple convolutional layers, and multiple convolutional layers are connected in series.
  • the decoding part in the convolutional neural network includes, but is not limited to: multiple convolutional layers and multiple deconvolutional layers, and multiple convolutional layers and multiple deconvolutional layers are arranged at intervals and connected in series.
  • FIG. 5 An example of the convolutional neural network of the present disclosure is shown in FIG. 5.
  • the first rectangle on the left represents the image to be processed in the input convolutional neural network
  • the first rectangle on the right represents the disparity map output by the convolutional neural network.
  • Each rectangle from the second rectangle to the 15th rectangle on the left represents a convolutional layer
  • All the rectangles from the 16th rectangle on the left to the second rectangle on the right represent deconvolutional layers and convolutional layers arranged alternately.
  • the 16th rectangle on the left represents the deconvolution layer
  • the 17th rectangle on the left represents the convolution layer
  • the 18th rectangle on the left represents the deconvolution layer
  • the 19th rectangle on the left represents the convolution layer.
  • the convolutional neural network of the present disclosure may merge the low-level information and high-level information in the convolutional neural network by means of skip connection.
  • In other words, the output of at least one convolutional layer in the encoding part is provided to at least one deconvolutional layer in the decoding part through a skip connection.
  • The input of each convolutional layer in the convolutional neural network usually includes the output of the previous layer (such as a convolutional layer or a deconvolutional layer), while the input of at least one deconvolutional layer (such as some or all of the deconvolutional layers) includes: the upsampled (Upsample) result of the output of the previous convolutional layer, and the output of the convolutional layer of the encoding part that is skip-connected to that deconvolutional layer.
  • In FIG. 5, the solid arrows drawn below the convolutional layers on the right side represent the outputs of the preceding convolutional layers;
  • the dotted arrows represent the upsampling results provided to the deconvolutional layers;
  • the solid arrows drawn above the convolutional layers on the left side represent the outputs of the convolutional layers that are skip-connected to the deconvolutional layers.
  • The present disclosure does not limit the number of skip connections or the network structure of the convolutional neural network.
  • the present disclosure helps to improve the accuracy of the disparity map generated by the convolutional neural network by fusing the low-level information and the high-level information in the convolutional neural network.
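  • For illustration only, the following PyTorch-style sketch shows a small encoder-decoder network with skip connections of the kind described above; the number of layers, channel widths, activation choices, and upsampling via transposed convolutions are assumptions of this sketch and do not reproduce the exact network of FIG. 5.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DisparityNet(nn.Module):
    """Toy encoder-decoder with skip connections for single-image disparity estimation."""

    def __init__(self):
        super().__init__()
        # Encoding part: stacked convolutional layers (feature extraction).
        self.enc1 = nn.Conv2d(3, 32, 3, stride=2, padding=1)
        self.enc2 = nn.Conv2d(32, 64, 3, stride=2, padding=1)
        self.enc3 = nn.Conv2d(64, 128, 3, stride=2, padding=1)
        # Decoding part: deconvolution (transposed conv) and convolution layers interleaved.
        self.dec3 = nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1)
        self.conv3 = nn.Conv2d(64 + 64, 64, 3, padding=1)   # skip connection from enc2
        self.dec2 = nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1)
        self.conv2 = nn.Conv2d(32 + 32, 32, 3, padding=1)   # skip connection from enc1
        self.dec1 = nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1)
        self.out = nn.Conv2d(16, 1, 3, padding=1)            # one-channel disparity map

    def forward(self, x):
        e1 = F.relu(self.enc1(x))
        e2 = F.relu(self.enc2(e1))
        e3 = F.relu(self.enc3(e2))
        d3 = F.relu(self.dec3(e3))
        d3 = F.relu(self.conv3(torch.cat([d3, e2], dim=1)))  # fuse low-level and high-level features
        d2 = F.relu(self.dec2(d3))
        d2 = F.relu(self.conv2(torch.cat([d2, e1], dim=1)))
        d1 = F.relu(self.dec1(d2))
        return F.relu(self.out(d1))                          # non-negative disparity values
```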
  • The convolutional neural network of the present disclosure is obtained by training with binocular image samples. For the training process of the convolutional neural network, refer to the description in the following embodiments; it will not be elaborated here.
  • the present disclosure may also optimize and adjust the first disparity map of the image to be processed obtained by using the convolutional neural network, so as to obtain a more accurate first disparity map.
  • the present disclosure may use the disparity map of the horizontal mirror image (for example, the left mirror image or the right mirror image) of the image to be processed to optimize and adjust the first disparity map of the image to be processed.
  • the horizontal mirror image of the image to be processed is referred to as the first horizontal mirror image
  • The disparity map of the first horizontal mirror image is referred to as the second disparity map.
  • a specific example of optimizing and adjusting the first disparity map in the present disclosure is as follows:
  • Step A: Obtain a horizontal mirror image of the second disparity map.
  • The term "first horizontal mirror image" in the present disclosure indicates that this mirror image is formed by performing horizontal mirroring (not vertical mirroring) on the image to be processed.
  • The horizontal mirror image of the second disparity map is referred to below as the second horizontal mirror image.
  • The second horizontal mirror image in the present disclosure refers to the mirror image formed after the second disparity map is mirrored in the horizontal direction; the second horizontal mirror image is still a disparity map.
  • The present disclosure may first perform left mirror processing or right mirror processing on the image to be processed (since the left and right mirroring results are the same, either may be used) to obtain the first horizontal mirror image; then obtain the disparity map of the first horizontal mirror image, i.e., the second disparity map; and finally perform left mirror processing or right mirror processing on the second disparity map (again, the left and right mirroring results are the same) to obtain the second horizontal mirror image.
  • The second horizontal mirror image is referred to below as the third disparity map.
  • When performing horizontal mirroring on the image to be processed, the present disclosure does not need to consider whether the image to be processed is mirrored as a left-eye image or as a right-eye image; that is, regardless of whether the image to be processed is used as a left-eye image or a right-eye image, left mirror processing or right mirror processing can be performed on it to obtain the first horizontal mirror image.
  • Likewise, when performing horizontal mirroring on the second disparity map, it is not necessary to consider whether the second disparity map should be mirrored left or right.
  • For the convolutional neural network used to generate the first disparity map of the image to be processed: if the left-eye image samples of the binocular image samples are provided as input to the convolutional neural network for training, then the successfully trained convolutional neural network treats the input image to be processed as a left-eye image in testing and practical applications; that is, the image to be processed in the present disclosure is treated as a left-eye image to be processed.
  • Conversely, if the right-eye image samples are used as the training input, the successfully trained convolutional neural network treats the input image to be processed as a right-eye image in testing and practical applications;
  • that is, the image to be processed in the present disclosure is treated as a right-eye image to be processed.
  • the present disclosure may also use the aforementioned convolutional neural network to obtain the second disparity map.
  • For example, the first horizontal mirror image is input into the convolutional neural network, the convolutional neural network performs disparity analysis processing on it and outputs the disparity analysis result,
  • and the present disclosure can obtain the second disparity map based on the output disparity analysis result.
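  • A minimal sketch of the mirroring steps described above, assuming a hypothetical helper `predict_disparity` that wraps the disparity-estimating convolutional neural network:

```python
import numpy as np

def third_disparity_map(image, predict_disparity):
    """Compute the second horizontal mirror image (the third disparity map).

    image:             (H, W, 3) image to be processed
    predict_disparity: callable mapping an image to its disparity map (e.g. a trained CNN)
    """
    first_mirror = np.fliplr(image)                       # first horizontal mirror image
    second_disparity = predict_disparity(first_mirror)    # second disparity map
    return np.fliplr(second_disparity)                    # mirror again: the third disparity map
```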
  • Step B: Obtain the weight distribution map of the first disparity map of the image to be processed and the weight distribution map of the second horizontal mirror image (i.e., the third disparity map).
  • the weight distribution map of the first disparity map is used to describe the respective weight values of multiple disparity values (for example, all disparity values) in the first disparity map.
  • the weight distribution map of the first disparity map may include, but is not limited to: a first weight distribution map of the first disparity map and a second weight distribution map of the first disparity map.
  • The first weight distribution map of the first disparity map is a weight distribution map set uniformly for the first disparity maps of a plurality of different images to be processed;
  • that is, the first disparity maps of different images to be processed use the same first weight distribution map. Therefore, the present disclosure may refer to the first weight distribution map of the first disparity map as the global weight distribution map of the first disparity map.
  • the global weight distribution map of the first disparity map is used to describe the global weight values corresponding to multiple disparity values (such as all disparity values) in the first disparity map.
  • The second weight distribution map of the first disparity map is a weight distribution map set for the first disparity map of a single image to be processed;
  • that is, the first disparity maps of different images to be processed use different second weight distribution maps. Therefore, the present disclosure may refer to the second weight distribution map of the first disparity map as the local weight distribution map of the first disparity map.
  • the local weight distribution map of the first disparity map is used to describe the respective local weight values of multiple disparity values (such as all disparity values) in the first disparity map.
  • the weight distribution map of the third disparity map is used to describe the respective weight values of the multiple disparity values in the third disparity map.
  • the weight distribution map of the third disparity map may include, but is not limited to: the first weight distribution map of the third disparity map and the second weight distribution map of the third disparity map.
  • The first weight distribution map of the third disparity map is a weight distribution map set uniformly for the third disparity maps of a plurality of different images to be processed;
  • that is, the third disparity maps of different images to be processed use the same first weight distribution map.
  • Therefore, the present disclosure may refer to the first weight distribution map of the third disparity map as the global weight distribution map of the third disparity map.
  • the global weight distribution map of the third disparity map is used to describe the respective global weight values of multiple disparity values (such as all disparity values) in the third disparity map.
  • The second weight distribution map of the third disparity map is a weight distribution map set for the third disparity map of a single image to be processed;
  • that is, the third disparity maps of different images to be processed use different second weight distribution maps. Therefore, the present disclosure may refer to the second weight distribution map of the third disparity map as the local weight distribution map of the third disparity map.
  • the local weight distribution map of the third disparity map is used to describe the respective local weight values of multiple disparity values (such as all disparity values) in the third disparity map.
  • the first weight distribution map of the first disparity map includes: at least two left and right separated regions, and different regions have different weight values.
  • the magnitude relationship between the weight value of the area on the left and the weight value of the area on the right is usually related to whether the image to be processed is used as the left-eye image to be processed or the right-eye image to be processed.
  • FIG. 6 is a first weight distribution diagram of the disparity map shown in FIG. 3, and the first weight distribution diagram is divided into five regions, namely, region 1, region 2, region 3, region 4, and region 5 shown in FIG. 6 .
  • the weight value of area 1 is less than the weight value of area 2
  • the weight value of area 2 is less than the weight value of area 3
  • the weight value of area 3 is less than the weight value of area 4
  • the weight value of area 4 is less than the weight value of area 5.
  • Any region in the first weight distribution map of the first disparity map may have a uniform weight value or varying weight values.
  • In the case that a region has varying weight values, the weight value on the left side of the region is usually not greater than the weight value on the right side of the region.
  • For example, the weight value of region 1 may be 0, that is, in the first disparity map the disparity corresponding to region 1 is completely unreliable; the weight value of region 2 may gradually increase from 0 toward 0.5 from left to right; the weight value of region 3 is 0.5; the weight value of region 4 may gradually increase from a value greater than 0.5 toward 1 from left to right; and the weight value of region 5 is 1, that is, in the first disparity map the disparity corresponding to region 5 is completely credible.
  • FIG. 7 shows the first weight distribution map of the first disparity map for the case where the image to be processed is used as a right-eye image.
  • This first weight distribution map is likewise divided into five regions, namely region 1, region 2, region 3, region 4, and region 5.
  • the weight value of area 5 is less than the weight value of area 4
  • the weight value of area 4 is less than the weight value of area 3
  • the weight value of area 3 is less than the weight value of area 2
  • the weight value of area 2 is less than the weight value of area 1.
  • Any region in the first weight distribution map of the first disparity map may have a uniform weight value or varying weight values.
  • In the case that a region has varying weight values, the weight value on the right side of the region is usually not greater than the weight value on the left side of the region.
  • For example, the weight value of region 5 in FIG. 7 may be 0, that is, in the first disparity map the disparity corresponding to region 5 is completely unreliable; the weight value of region 4 may gradually increase from 0 toward 0.5 from right to left; the weight value of region 3 is 0.5; the weight value of region 2 may gradually increase from a value greater than 0.5 toward 1 from right to left; and the weight value of region 1 is 1, that is, in the first disparity map the disparity corresponding to region 1 is completely credible.
  • the first weight distribution map of the third disparity map includes: at least two left and right separated regions, and different regions have different weight values.
  • the relationship between the weight value of the area on the left and the weight value of the area on the right is usually related to whether the image to be processed is used as the left-eye image or the right-eye image.
  • In one case, the weight value of the area on the right is greater than the weight value of the area on the left.
  • any region in the first weight distribution map of the third disparity map may have the same weight value, or may have different weight values. In the case that a region in the first weight distribution map of the third disparity map has different weight values, the weight value on the left side in the region is usually not greater than the weight value on the right side in the region.
  • In the other case, the weight value of the region on the left is greater than that of the region on the right.
  • Any region in the first weight distribution map of the third disparity map may have a uniform weight value or varying weight values.
  • In the case that a region has varying weight values, the weight value on the right side of the region is usually not greater than the weight value on the left side of the region.
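  • The global weight maps described above can be illustrated with a simple horizontal ramp; the ramp width of 20% of the image width and the continuous (rather than five-region) profile are assumptions of this sketch, not values specified by the disclosure.

```python
import numpy as np

def global_weight_map(height, width, left_to_right=True):
    """Sketch of a global (first) weight distribution map.

    Weights ramp from 0 on the unreliable border up to 1 on the reliable side, as in the
    left-eye case (left_to_right=True); the mirrored ramp covers the right-eye case.
    """
    ramp_width = max(1, int(0.2 * width))                   # assumed ramp width
    row = np.ones(width, dtype=np.float32)
    row[:ramp_width] = np.linspace(0.0, 1.0, ramp_width)    # 0 -> 1 across the border regions
    if not left_to_right:
        row = row[::-1]                                     # ramp from the other side
    return np.tile(row, (height, 1))
```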
  • the manner of setting the second weight distribution map of the first disparity map may include the following steps:
  • First, horizontal mirror processing (for example, left mirror processing or right mirror processing) is performed on the first disparity map to obtain a mirrored disparity map, which is referred to below as the fourth disparity map.
  • Then, for any pixel in the fourth disparity map, if the disparity value of the pixel in the fourth disparity map is greater than the first variable corresponding to the pixel, the weight value of the pixel in the second weight distribution map of the first disparity map of the image to be processed is set to the first value; otherwise, the weight value of the pixel is set to the second value.
  • the first value in this disclosure is greater than the second value. For example, the first value is 1 and the second value is 0.
  • FIG. 8 an example of the second weight distribution map of the first disparity map is shown in FIG. 8.
  • the weight values of the white areas in FIG. 8 are all 1, which indicates that the disparity value at this position is completely reliable.
  • the weight value of the black area in FIG. 8 is 0, which means that the disparity value at this position is completely unreliable.
  • the first variable corresponding to the pixel in the present disclosure may be set according to the disparity value of the corresponding pixel in the first disparity map and a constant value greater than zero.
  • the product of the disparity value of the corresponding pixel in the first disparity map and a constant value greater than zero is used as the first variable corresponding to the corresponding pixel in the fourth disparity map.
  • The second weight distribution map of the first disparity map may be expressed by formula (1), in which:
  • L_l represents the second weight distribution map of the first disparity map;
  • Re represents the disparity value of the corresponding pixel in the fourth disparity map;
  • d_l represents the disparity value of the corresponding pixel in the first disparity map.
  • The second weight distribution map of the third disparity map may be set as follows: for any pixel in the first disparity map, if the disparity value of the pixel in the first disparity map is greater than the second variable corresponding to the pixel, the weight value of the pixel in the second weight distribution map of the third disparity map is set to the first value; otherwise, it is set to the second value.
  • the first value in the present disclosure is greater than the second value. For example, the first value is 1 and the second value is 0.
  • the second variable corresponding to the pixel in the present disclosure may be set according to the disparity value of the corresponding pixel in the fourth disparity map and a constant value greater than zero.
  • For example, the first disparity map is left/right mirrored to form a mirrored disparity map, that is, the fourth disparity map, and then the product of the disparity value of the corresponding pixel in the fourth disparity map and a constant value greater than zero is used as the second variable corresponding to the corresponding pixel in the first disparity map.
  • Based on the image to be processed in FIG. 2, an example of the resulting third disparity map is shown in FIG. 9.
  • An example of the second weight distribution map of the third disparity map shown in FIG. 9 is shown in FIG. 10.
  • the weight values of the white areas in FIG. 10 are all 1, which means that the disparity value at this position is completely reliable.
  • the weight value of the black area in FIG. 10 is 0, which means that the disparity value at this position is completely unreliable.
  • The second weight distribution map of the third disparity map may be expressed by formula (2), in which:
  • L_l' represents the second weight distribution map of the third disparity map;
  • Re represents the disparity value of the corresponding pixel in the fourth disparity map;
  • d_l represents the disparity value of the corresponding pixel in the first disparity map.
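  • Since formulas (1) and (2) themselves are not reproduced here, the following sketch renders the comparisons described in the surrounding text; the value of the constant c and the element-wise form of the comparison are assumptions of this sketch.

```python
import numpy as np

def local_weight_maps(d_first, c=1.0):
    """Sketch of the second (local) weight distribution maps of formulas (1) and (2).

    d_first: first disparity map d_l of the image to be processed
    c:       constant value greater than zero (its exact value is assumed here)

    Returns binary maps whose entries are the first value (1) where the disparity
    is considered reliable and the second value (0) otherwise.
    """
    d_fourth = np.fliplr(d_first)   # fourth disparity map: mirror of the first disparity map
    L_first = (d_fourth > c * d_first).astype(np.float32)  # formula (1): local weights of the first disparity map
    L_third = (d_first > c * d_fourth).astype(np.float32)  # formula (2): local weights of the third disparity map
    return L_first, L_third
```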
  • Step C: According to the weight distribution map of the first disparity map of the image to be processed and the weight distribution map of the third disparity map, the first disparity map of the image to be processed is optimized and adjusted, and the optimized and adjusted disparity map is taken as the finally obtained disparity map of the image to be processed.
  • For example, the present disclosure may use the first weight distribution map and the second weight distribution map of the first disparity map to adjust multiple disparity values in the first disparity map to obtain the adjusted first disparity map; similarly, the multiple disparity values in the third disparity map may be adjusted to obtain the adjusted third disparity map, and the final disparity map of the image to be processed is obtained from the two adjusted disparity maps.
  • an example of obtaining the optimized and adjusted first disparity map of the image to be processed is as follows:
  • The third weight distribution map, obtained by merging the first weight distribution map and the second weight distribution map of the first disparity map, can be expressed by formula (3), in which:
  • W_l represents the third weight distribution map;
  • M_l represents the first weight distribution map of the first disparity map;
  • L_l represents the second weight distribution map of the first disparity map.
  • The fourth weight distribution map, obtained by merging the first weight distribution map and the second weight distribution map of the third disparity map, can be expressed by formula (4), in which:
  • W_l' represents the fourth weight distribution map;
  • M_l' represents the first weight distribution map of the third disparity map;
  • L_l' represents the second weight distribution map of the third disparity map.
  • The multiple disparity values in the first disparity map are adjusted according to the third weight distribution map to obtain the adjusted first disparity map. For example, for the disparity value of any pixel in the first disparity map, the disparity value of the pixel is replaced with the product of the disparity value of the pixel and the weight value of the pixel at the corresponding position in the third weight distribution map. After performing this replacement processing on all pixels in the first disparity map, the adjusted first disparity map is obtained.
  • the multiple disparity values in the third disparity map are adjusted according to the fourth weight distribution map to obtain the adjusted third disparity map.
  • For example, for the disparity value of any pixel in the third disparity map, the disparity value of the pixel is replaced with the product of the disparity value of the pixel and the weight value of the pixel at the corresponding position in the fourth weight distribution map. After performing this replacement processing on all pixels in the third disparity map, the adjusted third disparity map is obtained.
  • The finally obtained disparity map of the image to be processed can be expressed by formula (5), in which:
  • d_final represents the finally obtained disparity map of the image to be processed (shown as the first image on the right in FIG. 11);
  • W_l represents the third weight distribution map (shown as the first image at the upper left in FIG. 11);
  • W_l' represents the fourth weight distribution map (shown as the first image at the lower left in FIG. 11);
  • d_l represents the first disparity map (shown as the second image at the upper left in FIG. 11);
  • d_l' represents the third disparity map (shown as the second image at the lower left in FIG. 11).
  • the present disclosure does not limit the execution order of the two steps of merging the first weight distribution map and the second weight distribution map.
  • the two merging processing steps can be executed simultaneously or sequentially.
  • the present disclosure does not limit the sequence of adjusting the disparity value in the first disparity map and adjusting the disparity value in the third disparity map.
  • The two adjustment steps can be performed at the same time or successively.
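  • A sketch of the adjustment and fusion described above; merging the global and local weight maps by element-wise multiplication (formulas (3) and (4)) and combining the two adjusted disparity maps by a plain sum (formula (5)) are assumptions of this sketch, since the formulas themselves are not reproduced here.

```python
import numpy as np

def fuse_disparity(d_first, d_third, M_first, M_third, L_first, L_third):
    """Optimize/adjust the first disparity map using the third disparity map and the weight maps."""
    W_first = M_first * L_first   # third weight distribution map (formula (3); element-wise merge assumed)
    W_third = M_third * L_third   # fourth weight distribution map (formula (4); element-wise merge assumed)
    # Each disparity value is replaced by its product with the corresponding weight value.
    adjusted_first = W_first * d_first
    adjusted_third = W_third * d_third
    # Final disparity map of the image to be processed (formula (5); a plain sum is assumed here).
    return adjusted_first + adjusted_third
```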
  • When the image to be processed is used as a left-eye image, there are usually phenomena such as missing disparity on the left side and occlusion of the left edges of objects, which cause the disparity values of the corresponding areas in the disparity map of the image to be processed to be inaccurate.
  • When the image to be processed is used as a right-eye image, there are usually phenomena such as missing disparity on the right side and occlusion of the right edges of objects, which likewise cause the disparity values of the corresponding areas in the disparity map to be inaccurate.
  • The present disclosure performs left/right mirror processing on the image to be processed, mirrors the disparity map of the mirror image, and then uses the mirrored disparity map to optimize and adjust the disparity map of the image to be processed. This is beneficial to reducing the phenomenon that the disparity values of the corresponding areas in the disparity map are inaccurate, thereby helping to improve the accuracy of moving object detection.
  • The method for obtaining the first disparity map of the image to be processed in the present disclosure includes, but is not limited to, obtaining the first disparity map by stereo matching, for example, using stereo matching algorithms such as the BM (Block Matching) algorithm, the SGBM (Semi-Global Block Matching) algorithm, or the GC (Graph Cuts) algorithm to obtain the first disparity map of the image to be processed.
  • For example, a convolutional neural network used to obtain the disparity map of a binocular image may be used to perform disparity processing on the image to be processed, thereby obtaining the first disparity map of the image to be processed.
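  • For the stereo-matching alternative, a minimal OpenCV sketch using the SGBM matcher on a rectified left/right pair; the matcher parameters shown are illustrative defaults, not values specified by the disclosure.

```python
import cv2

def disparity_by_sgbm(left_gray, right_gray):
    """Compute a disparity map with semi-global block matching (SGBM).

    left_gray, right_gray: rectified 8-bit grayscale stereo images.
    """
    matcher = cv2.StereoSGBM_create(
        minDisparity=0,
        numDisparities=64,   # must be a multiple of 16
        blockSize=5,
    )
    # StereoSGBM returns fixed-point disparities scaled by 16.
    disparity = matcher.compute(left_gray, right_gray).astype("float32") / 16.0
    return disparity
```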
  • The present disclosure may use the following formula (6) to obtain the depth information of the pixels in the image to be processed: Depth = (f_x × b) / Disparity, in which:
  • Depth represents the depth value of the pixel;
  • f_x is a known value representing the focal length of the camera device in the horizontal direction (the X-axis direction of the three-dimensional coordinate system);
  • b is a known value representing the baseline of the binocular image samples used to train the convolutional neural network that produces the disparity map; b belongs to the calibration parameters of the binocular camera device;
  • Disparity represents the disparity of the pixel.
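  • Formula (6) is the standard pinhole relation between disparity and depth; a direct sketch follows (the small epsilon guarding against zero disparity is an addition of this sketch).

```python
import numpy as np

def depth_from_disparity(disparity, fx, baseline, eps=1e-6):
    """Depth = f_x * b / Disparity (formula (6)).

    disparity: disparity map of the image to be processed
    fx:        horizontal focal length of the camera device (in pixels)
    baseline:  baseline b of the binocular camera used to train the disparity network
    """
    return fx * baseline / np.maximum(disparity, eps)   # guard against zero disparity
```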
  • the to-be-processed image and the reference image in the present disclosure may be two images that are formed in a sequence relationship during continuous shooting (such as multiple continuous shooting or recording) of the same camera.
  • the time interval for forming two images is usually short to ensure that the contents of the two images are mostly the same.
  • the time interval for forming two images may be the time interval between two adjacent video frames.
  • the time interval for forming two images may be the time interval between two adjacent photos in the continuous photographing mode of the camera device.
  • For example, the image to be processed may be a video frame (such as the current video frame) in the video captured by the camera device, and the reference image of the image to be processed is another video frame in the video; for example, the reference image is the video frame preceding the current video frame.
  • Alternatively, the image to be processed may be one of multiple photos taken by the camera device in continuous shooting mode, and the reference image of the image to be processed may be another photo among the multiple photos, such as the photo immediately before or after the image to be processed.
  • the image to be processed and the reference image in the present disclosure may both be RGB (Red Green Blue) images or the like.
  • the camera device in the present disclosure may be a camera device installed on a moving object, for example, a camera device installed on vehicles, trains, and airplanes.
  • the reference image in the present disclosure is usually a monocular image. That is, the reference image is usually an image obtained by shooting with a monocular camera.
  • the present disclosure can realize the detection of moving objects without having to set up a binocular camera device, thereby helping to reduce the cost of detecting moving objects.
  • the optical flow information between the image to be processed and the reference image in the present disclosure can be considered as the two-dimensional motion field of the pixels in the image to be processed and the reference image.
  • The optical flow information does not represent the real motion of the pixels in three-dimensional space.
  • In view of this, the present disclosure can take into account the pose change of the camera device between capturing the image to be processed and capturing the reference image; that is, the present disclosure obtains the optical flow information between the image to be processed and the reference image according to the pose change information of the camera device, which helps eliminate, from the obtained optical flow information, the interference caused by the position and posture change of the camera device.
  • the method of obtaining the optical flow information between the image to be processed and the reference image according to the pose change information of the camera device of the present disclosure may include the following steps:
  • Step 1 Obtain the pose change information of the image to be processed and the reference image captured by the camera.
  • the pose change information in the present disclosure refers to the difference between the pose of the camera when the image to be processed is captured and the pose when the reference image is captured.
  • the pose change information is based on the three-dimensional space.
  • the pose change information may include: translation information of the camera device and rotation information of the camera device.
  • the translation information of the camera device may include: the displacement of the camera device on three coordinate axes (the coordinate system shown in FIG. 12).
  • the rotation information of the camera device may be: a rotation vector based on Roll, Yaw, and Pitch.
  • the rotation information of the camera device may include rotation component vectors based on the three rotation directions of Roll, Yaw, and Pitch.
  • The rotation information of the camera device can be expressed as the following formula (7):
    R = [ R_11  R_12  R_13 ;  R_21  R_22  R_23 ;  R_31  R_32  R_33 ]    (7)
  • where R represents the rotation information and is a 3×3 matrix whose entries take the standard Euler-angle form:
    R_11 = cosψ·cosφ − cosθ·sinφ·sinψ
    R_12 = −cosψ·sinφ − cosθ·cosφ·sinψ
    R_13 = sinψ·sinθ
    R_21 = sinψ·cosφ + cosθ·sinφ·cosψ
    R_22 = cosθ·cosφ·cosψ − sinψ·sinφ
    R_23 = −cosψ·sinθ
    R_31 = sinθ·sinφ
    R_32 = sinθ·cosφ
    R_33 = cosθ
  • The Euler angles (φ, θ, ψ) represent the rotation angles based on Roll, Yaw and Pitch.
  • The present disclosure may use visual techniques to obtain the pose change information between capturing the image to be processed and capturing the reference image, for example, using SLAM (Simultaneous Localization and Mapping) to obtain the pose change information.
  • For example, the present disclosure may use the RGBD (Red Green Blue Depth) model of the open-source ORB-SLAM (Oriented FAST and Rotated BRIEF SLAM) framework to obtain the pose change information; the inputs of the model may include the image to be processed (an RGB image), the reference image, and the depth maps of the image to be processed and the reference image.
  • The present disclosure may also use other methods to obtain the pose change information, for example, using GPS (Global Positioning System) together with angular velocity sensors to obtain the pose change information.
  • The present disclosure may use a 4×4 homogeneous matrix as shown in the following formula (8) to represent the pose change information:
    T_l^c = [ R  t ; 0  1 ]    (8)
  • where T_l^c represents the pose change information (such as the pose change matrix) between the camera device capturing the image to be processed (such as the current video frame c) and capturing the reference image (such as the previous video frame l of the current video frame c); R represents the rotation information of the camera device, namely the 3×3 matrix above; t represents the translation information of the camera device, namely the translation vector; and the bottom row (0 0 0 1) completes the homogeneous form.
  • t can be represented by the three translation components t_x, t_y and t_z, where t_x represents the translation component in the X-axis direction, t_y represents the translation component in the Y-axis direction, and t_z represents the translation component in the Z-axis direction.
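  • The sketch below is purely illustrative: it assembles the 4×4 pose change matrix of formula (8) from a rotation and a translation, where the Z–X–Z Euler-angle convention is an assumption consistent with the reconstruction of formula (7) above, not a requirement of the disclosure:

      import numpy as np

      def euler_zxz_to_matrix(phi, theta, psi):
          # Rotation matrix R = Rz(psi) @ Rx(theta) @ Rz(phi); angles in radians (assumed convention).
          rz_phi = np.array([[np.cos(phi), -np.sin(phi), 0.0],
                             [np.sin(phi),  np.cos(phi), 0.0],
                             [0.0, 0.0, 1.0]])
          rx_theta = np.array([[1.0, 0.0, 0.0],
                               [0.0, np.cos(theta), -np.sin(theta)],
                               [0.0, np.sin(theta),  np.cos(theta)]])
          rz_psi = np.array([[np.cos(psi), -np.sin(psi), 0.0],
                             [np.sin(psi),  np.cos(psi), 0.0],
                             [0.0, 0.0, 1.0]])
          return rz_psi @ rx_theta @ rz_phi

      def pose_matrix(rotation, translation):
          # Homogeneous pose change matrix T = [[R, t], [0, 1]].
          t_mat = np.eye(4)
          t_mat[:3, :3] = rotation
          t_mat[:3, 3] = translation
          return t_mat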
  • Step 2 Establish a correspondence between the pixel value of the pixel in the image to be processed and the pixel value of the pixel in the reference image according to the pose change information.
  • Since the pose of the camera device when shooting the image to be processed is usually different from its pose when shooting the reference image, the three-dimensional coordinate system corresponding to the image to be processed (that is, the three-dimensional coordinate system of the camera device when shooting the image to be processed) is different from the three-dimensional coordinate system corresponding to the reference image (that is, the three-dimensional coordinate system of the camera device when shooting the reference image).
  • the three-dimensional spatial position of the pixel may be converted first, so that the pixel in the image to be processed and the pixel in the reference image are in the same three-dimensional coordinate system.
  • The present disclosure may first obtain, according to the depth information obtained above and the parameters (known values) of the camera device, the coordinates of the pixels (such as all pixels) in the image to be processed in the three-dimensional coordinate system of the camera device corresponding to the image to be processed. That is, the present disclosure first converts the pixels in the image to be processed into three-dimensional space, so as to obtain the coordinates of the pixels in three-dimensional space (i.e., their three-dimensional coordinates). For example, the present disclosure may use the following formula (9) to obtain the three-dimensional coordinates of any pixel in the image to be processed:
    Z = f_x × b / Disparity,   X = (u − c_x) × Z / f_x,   Y = (v − c_y) × Z / f_y    (9)
  • where Z represents the depth value of the pixel; (X, Y, Z) represents the three-dimensional coordinates of the pixel (i.e., the first coordinate); f_x represents the focal length of the camera device in the horizontal direction (the X-axis direction of the three-dimensional coordinate system); f_y represents the focal length of the camera device in the vertical direction (the Y-axis direction of the three-dimensional coordinate system); (u, v) represents the two-dimensional coordinates of the pixel in the image to be processed; (c_x, c_y) represents the principal point coordinates of the camera device; and Disparity represents the disparity of the pixel, b being the baseline defined above.
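  • A minimal sketch of formula (9), back-projecting pixels into the camera coordinate system from their depth values; the variable and function names are assumptions:

      import numpy as np

      def backproject(u, v, depth, fx, fy, cx, cy):
          # First coordinates (X, Y, Z) in the camera frame of the image to be processed.
          z = np.asarray(depth, dtype=np.float32)
          x = (np.asarray(u) - cx) * z / fx
          y = (np.asarray(v) - cy) * z / fy
          return np.stack([x, y, z], axis=-1)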
  • The first coordinate of any pixel may be expressed as P_i(X_i, Y_i, Z_i), where P_i^c represents the three-dimensional coordinates of the i-th pixel in the image to be processed, namely P_i(X_i, Y_i, Z_i); the superscript c denotes the image to be processed, and the value range of i is related to the number of pixels. For example, if the number of pixels is N (N is an integer greater than 1), the value range of i can be 1 to N or 0 to N−1.
  • The present disclosure may then convert the first coordinates of the multiple pixels into the three-dimensional coordinate system of the camera device corresponding to the reference image according to the aforementioned pose change information, thereby obtaining the second coordinates of the multiple pixels.
  • The present disclosure may use the following formula (10) to obtain the second coordinate of any pixel in the image to be processed:
    P_i^l = T_l^c × P_i^c    (10)
  • where P_i^l represents the second coordinate of the i-th pixel in the image to be processed; T_l^c represents the pose change information (such as the pose change matrix) between the camera device capturing the image to be processed (such as the current video frame c) and capturing the reference image (such as the previous video frame l of the current video frame c); and P_i^c represents the first coordinate of the i-th pixel in the image to be processed, expressed in homogeneous form when multiplied by the 4×4 matrix.
  • After obtaining the second coordinates, the present disclosure may perform projection processing on the second coordinates of the multiple pixels based on the two-dimensional coordinate system of the two-dimensional image, so as to obtain the projected two-dimensional coordinates of the image to be processed in the coordinate system corresponding to the reference image.
  • The present disclosure may use the following formula (11) to obtain the projected two-dimensional coordinates:
    u = f_x × X / Z + c_x,   v = f_y × Y / Z + c_y    (11)
  • where (u, v) represents the projected two-dimensional coordinates of the pixel in the image to be processed; f_x represents the focal length of the camera device in the horizontal direction (the X-axis direction of the three-dimensional coordinate system); f_y represents the focal length of the camera device in the vertical direction (the Y-axis direction of the three-dimensional coordinate system); (c_x, c_y) represents the principal point coordinates of the camera device; and (X, Y, Z) represents the second coordinate of the pixel in the image to be processed.
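  • A minimal sketch of formulas (10) and (11): transforming the first coordinates with the pose change matrix and projecting the result back to two-dimensional image coordinates; variable and function names are illustrative assumptions:

      import numpy as np

      def transform_points(points_xyz, pose_t):
          # points_xyz: (N, 3) first coordinates; pose_t: 4x4 pose change matrix T_l^c.
          ones = np.ones((points_xyz.shape[0], 1), dtype=points_xyz.dtype)
          homogeneous = np.concatenate([points_xyz, ones], axis=1)   # (N, 4)
          return (homogeneous @ pose_t.T)[:, :3]                     # second coordinates

      def project(points_xyz, fx, fy, cx, cy):
          # Pinhole projection: u = fx * X / Z + cx, v = fy * Y / Z + cy.
          x, y, z = points_xyz[:, 0], points_xyz[:, 1], points_xyz[:, 2]
          u = fx * x / z + cx
          v = fy * y / z + cy
          return np.stack([u, v], axis=-1)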
  • The present disclosure may then establish the correspondence between the pixel values of the pixels in the image to be processed and the pixel values of the pixels in the reference image according to the projected two-dimensional coordinates and the two-dimensional coordinates of the reference image.
  • The correspondence can be expressed as: for any position shared by the image formed from the projected two-dimensional coordinates and the reference image, the pixel value of the corresponding pixel in the image to be processed corresponds to the pixel value of the pixel at that position in the reference image.
  • Step 3 Perform transformation processing on the reference image according to the above corresponding relationship.
  • The present disclosure may use the foregoing correspondence to perform Warp (warping) processing on the reference image, thereby warping the reference image toward the image to be processed.
  • FIG. 13 An example of Warp processing on the reference image is shown in Figure 13.
  • the left image in FIG. 13 is the reference image
  • the right image in FIG. 13 is the image formed after Warp processing is performed on the reference image.
  • Step 4 Calculate the optical flow information between the image to be processed and the reference image based on the image to be processed and the transformed image.
  • the optical flow information in the present disclosure includes, but is not limited to: dense optical flow information.
  • the optical flow information is calculated for all pixels in the image.
  • the present disclosure may use visual technology to obtain optical flow information, for example, use OpenCV (Open Source Computer Vision Library) to obtain optical flow information.
  • The present disclosure can input the image to be processed and the transformed image into an OpenCV-based model, and the model outputs the optical flow information between the two input images, so that the present disclosure obtains the optical flow information between the image to be processed and the reference image.
  • The algorithm used by this model to calculate the optical flow information includes, but is not limited to, the Gunnar Farnebäck algorithm.
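  • As an illustration only (not the specific implementation of the disclosure), dense optical flow between the warped reference frame and the image to be processed could be computed with OpenCV's Farnebäck implementation; the placeholder arrays and numeric parameters below are assumptions:

      import cv2
      import numpy as np

      # Placeholder 8-bit grayscale frames standing in for the warped reference
      # image and the image to be processed.
      warped_reference_gray = np.zeros((480, 640), dtype=np.uint8)
      current_gray = np.zeros((480, 640), dtype=np.uint8)

      # Gunnar Farnebäck dense optical flow; the numeric parameters are typical defaults.
      flow = cv2.calcOpticalFlowFarneback(
          warped_reference_gray, current_gray, None,
          0.5, 3, 15, 3, 5, 1.2, 0)

      # flow[..., 0] holds the per-pixel horizontal displacement (delta u),
      # flow[..., 1] the vertical displacement (delta v).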
  • If the optical flow information of any pixel in the image to be processed obtained by the present disclosure is expressed as I_of(Δu, Δv), the optical flow information of the pixel usually conforms to the following formula (12):
    (u_{t+1}, v_{t+1}) = (u_t + Δu, v_t + Δv)    (12)
  • where I_t(u_t, v_t) represents a pixel in the reference image, and I_{t+1}(u_{t+1}, v_{t+1}) represents the pixel at the corresponding position in the image to be processed.
  • the reference image after Warp processing (such as the previous video frame after Warp processing), the image to be processed (such as the current video frame) and the optical flow information obtained by calculation are shown in FIG. 14.
  • the top image in Figure 14 is the reference image after Warp processing
  • the middle image in Figure 14 is the image to be processed
  • the bottom image in Figure 14 is the optical flow information between the image to be processed and the reference image.
  • Optical flow information for the reference image is added for the convenience of detailed comparison.
  • S120 Obtain a three-dimensional motion field of the pixels in the image to be processed relative to the reference image according to the depth information and the optical flow information.
  • the present disclosure can obtain the three-dimensional motion field of the pixels (such as all pixels) in the image to be processed relative to the reference image (which may be referred to as The three-dimensional motion field of the pixels in the image to be processed).
  • The three-dimensional motion field in the present disclosure can be considered as the motion field formed by scene motion in three-dimensional space.
  • The three-dimensional motion field of the pixels in the image to be processed can be considered as the three-dimensional spatial displacement of the pixels in the image to be processed between the image to be processed and the reference image.
  • The three-dimensional motion field can be represented by a scene flow (Scene Flow).
  • The present disclosure may use the following formula (13) to obtain the scene flow I_sf(ΔX, ΔY, ΔZ) of multiple pixels in the image to be processed, i.e., by back-projecting each pixel into three-dimensional space in the two images and taking the difference:
    ΔX = ((u + Δu) − c_x) × (Z + ΔI_depth) / f_x − (u − c_x) × Z / f_x
    ΔY = ((v + Δv) − c_y) × (Z + ΔI_depth) / f_y − (v − c_y) × Z / f_y
    ΔZ = ΔI_depth    (13)
  • where (ΔX, ΔY, ΔZ) represents the displacement of any pixel in the image to be processed along the three coordinate axes of the three-dimensional coordinate system; ΔI_depth represents the change of the depth value of the pixel between the two images; (Δu, Δv) represents the optical flow information of the pixel, that is, the displacement of the pixel in the two-dimensional image between the image to be processed and the reference image; f_x represents the focal length of the camera device in the horizontal direction (the X-axis direction of the three-dimensional coordinate system); f_y represents the focal length of the camera device in the vertical direction (the Y-axis direction of the three-dimensional coordinate system); and (c_x, c_y) represents the principal point coordinates of the camera device.
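  • The sketch below shows one plausible way, consistent with the symbols listed for formula (13) but otherwise an assumption, to obtain the scene flow by back-projecting each pixel in both frames and taking the difference:

      import numpy as np

      def scene_flow(u, v, depth, flow_u, flow_v, depth_change, fx, fy, cx, cy):
          # Back-project the pixel at (u, v) with its depth, and the pixel displaced
          # by the optical flow with its changed depth, then take the 3D difference.
          z0 = np.asarray(depth, dtype=np.float32)
          z1 = z0 + depth_change
          x0 = (u - cx) * z0 / fx
          y0 = (v - cy) * z0 / fy
          x1 = (u + flow_u - cx) * z1 / fx
          y1 = (v + flow_v - cy) * z1 / fy
          return np.stack([x1 - x0, y1 - y0, z1 - z0], axis=-1)   # (dX, dY, dZ)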
  • S130: Determine the moving object in the image to be processed according to the three-dimensional motion field.
  • The present disclosure may determine the motion information of the objects in the image to be processed in three-dimensional space according to the three-dimensional motion field.
  • The motion information of an object in three-dimensional space can indicate whether the object is a moving object.
  • The present disclosure may first obtain the motion information of the pixels in the image to be processed in three-dimensional space according to the three-dimensional motion field; then perform clustering processing on the pixels according to the motion information of the pixels in three-dimensional space; and finally, according to the result of the clustering processing, determine the motion information of the objects in the image to be processed in three-dimensional space, so as to determine the moving objects in the image to be processed.
  • the motion information of pixels in the image to be processed in the three-dimensional space may include, but is not limited to: the speed of multiple pixels (such as all pixels) in the image to be processed in the three-dimensional space.
  • the speed here is usually in the form of a vector, that is, the speed of the pixel in the present disclosure can reflect the speed of the pixel and the direction of the speed of the pixel.
  • By using the three-dimensional motion field, the present disclosure can conveniently obtain the motion information of the pixels in the image to be processed in three-dimensional space.
  • the three-dimensional space in the present disclosure includes: a three-dimensional space based on a three-dimensional coordinate system.
  • the three-dimensional coordinate system may be: the three-dimensional coordinate system of the camera device that takes the image to be processed.
  • the Z axis of the three-dimensional coordinate system is usually the optical axis of the imaging device, that is, the depth direction.
  • FIG. 12 an example of the X axis, Y axis, Z axis and origin of the three-dimensional coordinate system of the present disclosure is shown in FIG. 12.
  • the X-axis points to the horizontal to the right
  • the Y-axis points to the bottom of the vehicle
  • the Z-axis points to the front of the vehicle.
  • The origin of the three-dimensional coordinate system is at the optical center of the camera device.
  • According to the three-dimensional motion field and the time difference Δt between the camera device capturing the image to be processed and capturing the reference image, the present disclosure can calculate the speed of the pixels in the image to be processed along the three coordinate axes of the three-dimensional coordinate system of the camera device corresponding to the image to be processed, for example:
    v_x = ΔX / Δt,   v_y = ΔY / Δt,   v_z = ΔZ / Δt
  • where v_x, v_y and v_z respectively represent the speed of any pixel in the image to be processed along the three coordinate axes of the three-dimensional coordinate system of the camera device corresponding to the image to be processed; (ΔX, ΔY, ΔZ) represents the displacement of the pixel along the three coordinate axes of that coordinate system; and Δt represents the time difference between the camera device capturing the image to be processed and capturing the reference image.
  • the present disclosure may first determine the motion area in the image to be processed, and perform clustering processing on the pixels in the motion area. For example, according to the motion information of the pixels in the motion area in the three-dimensional space, the pixels in the motion area are clustered. For another example, according to the motion information of the pixels in the motion area in the three-dimensional space and the position of the pixels in the three-dimensional space, the pixels in the motion area are clustered.
  • the present disclosure may use a motion mask to determine the motion area in the image to be processed. For example, the present disclosure may obtain the motion mask of the image to be processed according to the motion information of the pixel in the three-dimensional space.
  • the present disclosure may filter the speeds of multiple pixels (such as all pixels) in the image to be processed according to a preset speed threshold, so as to form a motion mask of the image to be processed according to the result of the filter processing.
  • The present disclosure can use the following formula (17) to obtain the motion mask of the image to be processed:
    I_motion = 1, if the speed of the pixel is greater than the preset speed threshold; I_motion = 0, otherwise    (17)
  • where I_motion represents a pixel in the motion mask: if the speed of the pixel in the image to be processed is greater than the preset speed threshold, the value of the corresponding pixel in the motion mask is 1; otherwise, it is 0.
  • the present disclosure may refer to an area composed of pixels with a value of 1 in the motion mask as a motion area, and the size of the motion mask is the same as the size of the image to be processed. Therefore, the present disclosure can determine the motion region in the image to be processed based on the motion region in the motion mask.
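  • A minimal sketch (with illustrative variable names and threshold) of turning the scene flow into per-pixel speed and a binary motion mask in the spirit of formula (17):

      import numpy as np

      def motion_mask(scene_flow_xyz, delta_t, speed_threshold=1.0):
          # Per-pixel velocity (vx, vy, vz) = (dX, dY, dZ) / dt.
          velocity = scene_flow_xyz / float(delta_t)
          speed = np.linalg.norm(velocity, axis=-1)
          # Pixels faster than the preset threshold are marked 1, others 0.
          return (speed > speed_threshold).astype(np.uint8), velocity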
  • An example of the motion mask in the present disclosure is shown in FIG. 15.
  • the bottom image in Figure 15 is the image to be processed, and the top image in Figure 15 is the motion mask of the image to be processed.
  • the black part in the picture above is the non-motion area, and the gray part in the picture above is the motion area.
  • the moving area in the picture above is basically the same as the moving object in the picture below.
  • the accuracy of the present disclosure for determining the motion area in the image to be processed will also increase.
  • For example, the present disclosure may first perform standardization processing on the three-dimensional spatial position information and motion information of the pixels in the motion area, so that the three-dimensional spatial coordinate values of the pixels in the motion area are converted into a predetermined coordinate interval (such as [0, 1]) and the speeds of the pixels in the motion area are converted into a predetermined speed interval (such as [0, 1]). After that, density clustering processing is performed using the converted three-dimensional spatial coordinate values and speeds, thereby obtaining at least one cluster.
  • The standardization processing in the present disclosure includes, but is not limited to, min-max (minimum-maximum) standardization processing, Z-score standardization processing, and the like.
  • The min-max standardization processing of the three-dimensional spatial position information of the pixels in the motion area can be expressed by the following formula (18), and the min-max standardization processing of the motion information of the pixels in the motion area can be expressed by the following formula (19):
    X* = (X − X_min) / (X_max − X_min),   Y* = (Y − Y_min) / (Y_max − Y_min),   Z* = (Z − Z_min) / (Z_max − Z_min)    (18)
    v_x* = (v_x − v_xmin) / (v_xmax − v_xmin),   v_y* = (v_y − v_ymin) / (v_ymax − v_ymin),   v_z* = (v_z − v_zmin) / (v_zmax − v_zmin)    (19)
  • where (X, Y, Z) represents the three-dimensional spatial position information of a pixel in the motion area of the image to be processed; (X*, Y*, Z*) represents the position after min-max standardization; (X_min, Y_min, Z_min) represents the minimum X, Y and Z coordinates among the three-dimensional spatial positions of all pixels in the motion area; and (X_max, Y_max, Z_max) represents the maximum X, Y and Z coordinates among the three-dimensional spatial positions of all pixels in the motion area.
  • (v_x, v_y, v_z) represents the speed of a pixel in the motion area along the three coordinate axes in three-dimensional space; (v_x*, v_y*, v_z*) represents the speed after min-max standardization; (v_xmin, v_ymin, v_zmin) represents the minimum speeds of all pixels in the motion area along the three coordinate axes in three-dimensional space; and (v_xmax, v_ymax, v_zmax) represents the maximum speeds of all pixels in the motion area along the three coordinate axes in three-dimensional space.
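  • A minimal sketch of the min-max standardization of formulas (18) and (19), applied per feature column under the assumed layout of one row per motion-area pixel:

      import numpy as np

      def min_max_normalize(features, eps=1e-12):
          # features: (N, D) array, e.g. columns (X, Y, Z, vx, vy, vz) of the motion-area pixels.
          f_min = features.min(axis=0)
          f_max = features.max(axis=0)
          return (features - f_min) / (f_max - f_min + eps)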
  • The clustering algorithm used in the clustering processing of the present disclosure includes, but is not limited to, density clustering algorithms, for example, DBSCAN (Density-Based Spatial Clustering of Applications with Noise).
  • Each cluster obtained by clustering corresponds to a moving object instance, that is, each cluster can be regarded as a moving object in the image to be processed.
  • the present disclosure can determine the moving object instance corresponding to the cluster according to the speed and direction of multiple pixels (such as all pixels) in the cluster.
  • For example, the average speed and average direction of all pixels in the cluster may be used to represent the speed and direction of the moving object instance corresponding to the cluster.
  • That is, the present disclosure may use the following formula (20) to express the speed and direction of the moving object instance corresponding to a cluster, namely the average of the velocity vectors of the pixels in the cluster:
    v_obj = (1 / N) × Σ_{i=1..N} v_i    (20)
  • where N is the number of pixels in the cluster and v_i is the velocity vector of the i-th pixel in the cluster.
  • The present disclosure may also determine, based on the position information (i.e., the two-dimensional coordinates in the image to be processed) of multiple pixels (such as all pixels) belonging to the same cluster, the moving object detection box (Bounding-Box) in the image to be processed of the moving object instance corresponding to the cluster.
  • For example, the present disclosure can calculate the maximum column coordinate u_max and the minimum column coordinate u_min of all pixels in the cluster in the image to be processed, and calculate the maximum row coordinate v_max and the minimum row coordinate v_min of all pixels in the cluster (note: it is assumed that the origin of the image coordinate system is located at the upper left corner of the image).
  • The coordinates of the moving object detection box obtained in the present disclosure in the image to be processed can then be expressed as (u_min, v_min, u_max, v_max).
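  • The sketch below (with assumed parameter values) clusters the normalized position/velocity features with DBSCAN and derives a (u_min, v_min, u_max, v_max) detection box for each cluster:

      import numpy as np
      from sklearn.cluster import DBSCAN

      def cluster_and_boxes(features, pixel_uv, eps=0.05, min_samples=30):
          # features: (N, 6) normalized (X, Y, Z, vx, vy, vz); pixel_uv: (N, 2) image coordinates.
          labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(features)
          boxes = {}
          for label in set(labels):
              if label == -1:                      # noise points are ignored
                  continue
              uv = pixel_uv[labels == label]
              u_min, v_min = uv.min(axis=0)
              u_max, v_max = uv.max(axis=0)
              boxes[label] = (u_min, v_min, u_max, v_max)
          return labels, boxes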
  • FIG. 16 an example of the moving object detection frame in the image to be processed determined by the present disclosure is shown in the lower figure in FIG. 16. If the moving object detection frame is reflected in the moving mask, it is as shown in the upper figure in Figure 16.
  • the multiple rectangular boxes in the upper and lower figures of FIG. 16 are all moving object detection boxes obtained in the present disclosure.
  • the present disclosure may also determine the position information of the moving object in the three-dimensional space according to the position information of multiple pixels belonging to the same cluster in the three-dimensional space.
  • The position information of the moving object in three-dimensional space includes, but is not limited to: the coordinate of the moving object on the horizontal coordinate axis (X coordinate axis), the coordinate of the moving object on the depth coordinate axis (Z coordinate axis), and the height of the moving object in the vertical direction (that is, the height of the moving object), etc.
  • For example, the present disclosure may first determine the distances between all pixels in a cluster and the camera device based on the position information of all pixels belonging to the same cluster in three-dimensional space, and then use the position information of the closest pixel in three-dimensional space as the position information of the moving object in three-dimensional space.
  • The present disclosure may use the following formula (21) to calculate the distances between multiple pixels in a cluster and the camera device, and select the minimum distance:
    d_min = min_i sqrt(X_i² + Z_i²)    (21)
  • where d_min represents the minimum distance, X_i represents the X coordinate of the i-th pixel in the cluster, and Z_i represents the Z coordinate of the i-th pixel in the cluster.
  • The X and Z coordinates of the pixel with the minimum distance can then be used as the position information of the moving object in three-dimensional space, as shown in the following formula (22):
    O_X = X_close,   O_Z = Z_close    (22)
  • where O_X represents the coordinate of the moving object on the horizontal coordinate axis, that is, the X coordinate of the moving object; O_Z represents the coordinate of the moving object on the depth-direction coordinate axis (Z coordinate axis), that is, the Z coordinate of the moving object; X_close represents the X coordinate of the pixel with the minimum distance calculated above; and Z_close represents the Z coordinate of the pixel with the minimum distance calculated above.
  • The present disclosure may use the following formula (23) to calculate the height of the moving object:
    O_H = Y_max − Y_min    (23)
  • where O_H represents the height of the moving object in three-dimensional space, Y_max represents the maximum Y coordinate of all pixels of the cluster in three-dimensional space, and Y_min represents the minimum Y coordinate of all pixels of the cluster in three-dimensional space.
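  • A minimal sketch of formulas (21) to (23): taking the closest cluster point as the object position and the Y-extent of the cluster as its height (variable and function names are assumptions):

      import numpy as np

      def object_position_and_height(cluster_xyz):
          # cluster_xyz: (N, 3) three-dimensional coordinates of the pixels in one cluster.
          distances = np.sqrt(cluster_xyz[:, 0] ** 2 + cluster_xyz[:, 2] ** 2)   # formula (21)
          closest = cluster_xyz[np.argmin(distances)]
          o_x, o_z = closest[0], closest[2]                                      # formula (22)
          o_h = cluster_xyz[:, 1].max() - cluster_xyz[:, 1].min()                # formula (23)
          return o_x, o_z, o_h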
  • FIG. 17 The flow of an embodiment of training a convolutional neural network in the present disclosure is shown in FIG. 17.
  • the image samples input to the convolutional neural network of the present disclosure may always be left-eye image samples of binocular image samples, or may always be right-eye image samples of binocular image samples.
  • If left-eye image samples are always used during training, the successfully trained convolutional neural network will, in testing or actual application scenarios, treat the input image to be processed as a left-eye image.
  • If right-eye image samples are always used during training, the successfully trained convolutional neural network will, in testing or actual application scenarios, treat the input image to be processed as a right-eye image.
  • S1710 Perform disparity analysis processing via a convolutional neural network, and obtain a disparity map of the left-eye image sample and a disparity map of the right-eye image sample based on the output of the convolutional neural network.
  • S1720 Reconstruct the right-eye image according to the disparity map of the left-eye image sample and the right-eye image sample.
  • the method of reconstructing the right-eye image in the present disclosure includes but is not limited to: performing reprojection calculation on the disparity map of the left-eye image sample and the right-eye image sample to obtain the reconstructed right-eye image.
  • S1730 Reconstruct the left-eye image according to the disparity map of the right-eye image sample and the left-eye image sample.
  • the method of reconstructing the left-eye image in the present disclosure includes but is not limited to: performing re-projection calculation on the right-eye image sample and the disparity map of the left-eye image sample to obtain the reconstructed left-eye image.
  • S1740 Adjust the network parameters of the convolutional neural network according to the difference between the reconstructed left-eye image and the left-eye image sample, and the difference between the reconstructed right-eye image and the right-eye image sample.
  • the loss function used in the present disclosure when determining the difference includes, but is not limited to: L1 loss function, smooth loss function, lr-Consistency loss function, etc.
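  • Purely as a hedged illustration of how such terms could be combined (the disclosure names the loss functions above but does not fix weights or exact forms here; all function and weight names below are assumptions), a PyTorch-style sketch:

      import torch
      import torch.nn.functional as F

      def disparity_smoothness(disp):
          # First-order smoothness penalty on the disparity map; one common choice, assumed here.
          dx = torch.abs(disp[..., :, 1:] - disp[..., :, :-1]).mean()
          dy = torch.abs(disp[..., 1:, :] - disp[..., :-1, :]).mean()
          return dx + dy

      def reconstruction_loss(recon_left, left, recon_right, right,
                              disp_left, disp_right, w_smooth=0.1):
          # L1 photometric terms for both reconstructed views plus a smoothness term.
          photometric = F.l1_loss(recon_left, left) + F.l1_loss(recon_right, right)
          smooth = disparity_smoothness(disp_left) + disparity_smoothness(disp_right)
          return photometric + w_smooth * smooth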
  • The present disclosure propagates the calculated loss back to adjust the network parameters of the convolutional neural network (such as the weights of the convolution kernels). For example, the loss can be back-propagated using gradients calculated by the chain rule of derivation through the convolutional neural network, which helps improve the training efficiency of the convolutional neural network.
  • The predetermined iterative conditions in the present disclosure may include: the difference between the left-eye image reconstructed based on the disparity map output by the convolutional neural network and the left-eye image sample, and the difference between the right-eye image reconstructed based on the disparity map output by the convolutional neural network and the right-eye image sample, meet a predetermined difference requirement. If the differences meet the requirement, this training of the convolutional neural network is completed successfully.
  • The predetermined iterative conditions in the present disclosure may also include: the number of binocular image samples used for training the convolutional neural network reaches a predetermined number requirement, etc.
  • If the number of binocular image samples used reaches the predetermined number requirement but the difference between the reconstructed left-eye image and the left-eye image sample, or the difference between the reconstructed right-eye image and the right-eye image sample, does not meet the predetermined difference requirement, this training of the convolutional neural network is not successful.
  • FIG. 18 is a flowchart of an embodiment of the intelligent driving control method of the present disclosure.
  • the intelligent driving control method of the present disclosure can be applied but not limited to: an automatic driving (such as a fully unassisted automatic driving) environment or an assisted driving environment.
  • the camera device includes, but is not limited to, an RGB-based camera device.
  • S1810 Perform moving object detection on at least one video frame included in the video stream to obtain a moving object in the video frame, for example, obtain motion information of an object in the video frame in a three-dimensional space.
  • S1820 Generate and output a vehicle control instruction according to the moving object in the video frame. For example, according to the motion information of the object in the video frame in the three-dimensional space, a vehicle control instruction is generated and output to control the vehicle.
  • The control commands generated by the present disclosure include, but are not limited to: speed maintaining control commands, speed adjustment control commands (such as deceleration commands, acceleration commands, etc.), direction maintaining control commands, direction adjustment control commands (such as left steering commands, right steering commands, left lane merging commands, or right lane merging commands, etc.), whistling commands, warning prompt control commands, or driving mode switching control commands (such as switching to automatic cruise driving mode, etc.).
  • The moving object detection technology of the present disclosure can be applied in the field of intelligent driving control, and can also be applied in other fields; for example, it can realize moving object detection in industrial manufacturing, moving object detection in indoor fields such as supermarkets, and moving object detection in the security field, etc.
  • the present disclosure does not limit the applicable scenarios of moving object detection technology.
  • the moving object detection device provided by the present disclosure is shown in FIG. 19.
  • the device shown in FIG. 19 includes: a first acquisition module 1900, a second acquisition module 1910, a third acquisition module 1920, and a moving object determination module 1930.
  • the device may further include: a training module.
  • the first acquiring module 1900 is used to acquire the depth information of the pixels in the image to be processed.
  • the first acquisition module 1900 may include: a first sub-module and a second sub-module.
  • the first sub-module is used to obtain the first disparity map of the image to be processed.
  • the second sub-module is used to obtain the depth information of the pixels in the image to be processed according to the first disparity map of the image to be processed.
  • the image to be processed in the present disclosure includes: a monocular image.
  • the first sub-module includes: a first unit, a second unit and a third unit.
  • the first unit is used to input the image to be processed into the convolutional neural network, perform disparity analysis processing through the convolutional neural network, and obtain the first disparity map of the image to be processed based on the output of the convolutional neural network.
  • the convolutional neural network is obtained by the training module using binocular image samples.
  • The second unit is used to obtain the second disparity map of the first horizontal mirror image of the image to be processed and the second horizontal mirror image of that second disparity map.
  • The first horizontal mirror image of the image to be processed is the mirror image formed by performing mirror processing on the image to be processed in the horizontal direction, and the second horizontal mirror image of the second disparity map is the mirror image formed by performing mirror processing on the second disparity map in the horizontal direction.
  • the third unit is used to adjust the disparity of the first disparity map of the image to be processed according to the weight distribution map of the first disparity map of the image to be processed and the weight distribution map of the second horizontal mirror image of the second disparity map to finally obtain The first disparity map of the image to be processed.
  • For example, the second unit may input the first horizontal mirror image of the image to be processed into the convolutional neural network, perform disparity analysis processing via the convolutional neural network, and obtain the second disparity map of the first horizontal mirror image of the image to be processed based on the output of the neural network; the second unit then performs mirror processing on this second disparity map to obtain its second horizontal mirror image.
  • The weight distribution map in the present disclosure includes at least one of a first weight distribution map and a second weight distribution map; the first weight distribution map is a weight distribution map set uniformly for a plurality of images to be processed, and the second weight distribution map is a weight distribution map set separately for different images to be processed.
  • the first weight distribution map includes at least two left and right regions, and different regions have different weight values.
  • For example, the weight value of the region on the right is greater than the weight value of the region on the left.
  • For another example, the weight value of the region on the left is greater than the weight value of the region on the right.
  • the third unit is also used to set a second weight distribution map of the first disparity map of the image to be processed.
  • For example, the third unit performs horizontal mirroring processing on the first disparity map of the image to be processed to form a mirror disparity map; for any pixel in the mirror disparity map, if the disparity value of the pixel is greater than the first variable corresponding to the pixel, the weight value of the pixel in the second weight distribution map of the image to be processed is set to the first value; otherwise, it is set to the second value, where the first value is greater than the second value.
  • the first variable corresponding to the pixel is set according to the disparity value of the pixel in the first disparity map of the image to be processed and a constant value greater than zero.
  • the third unit is further configured to set a second weight distribution map of the second horizontal mirror image of the second disparity map, for example, for any pixel in the second horizontal mirror image of the second disparity map, If the disparity value of the pixel in the first disparity map of the image to be processed is greater than the second variable corresponding to the pixel, the third unit will map the second weight distribution of the second horizontal mirror image of the second disparity map The weight value of the pixel is set to the first value, otherwise, the third unit sets it to the second value; wherein the first value is greater than the second value.
  • the second variable corresponding to the pixel is set according to the disparity value of the corresponding pixel in the horizontal mirror image of the first disparity map of the image to be processed and a constant value greater than zero.
  • the third unit may be further configured to: firstly, adjust the disparity value in the first disparity map of the image to be processed according to the first weight distribution map and the second weight distribution map of the first disparity map of the image to be processed ; Afterwards, the third unit adjusts the disparity value in the second horizontal mirror image of the second disparity map according to the first weight distribution map and the second weight distribution map of the second horizontal mirror image of the second disparity map; finally, the first The three units merge the first disparity map after the disparity value adjustment and the second horizontal mirror image after the disparity value adjustment, and finally obtain the first disparity map of the image to be processed.
  • For the specific operations performed by the first acquisition module 1900 and the sub-modules and units it includes, reference may be made to the foregoing description of S100, which is not repeated here.
  • the second acquisition module 1910 is used to acquire the optical flow information between the image to be processed and the reference image.
  • the reference image and the image to be processed are two images with a time series relationship obtained based on the continuous shooting of the camera.
  • the image to be processed is a video frame in the video shot by the camera device, and the reference image of the image to be processed includes: the previous video frame of the video frame.
  • the second acquisition module 1910 may include: a third submodule, a fourth submodule, a fifth submodule, and a sixth submodule.
  • the third sub-module is used to obtain the pose change information of the image to be processed and the reference image taken by the camera;
  • the fourth sub-module is used to establish the pixel value of the pixel in the to-be-processed image and the reference image according to the pose change information The corresponding relationship between the pixel values of the pixels;
  • the fifth sub-module is used to transform the reference image according to the above-mentioned correspondence;
  • the sixth sub-module is used to calculate the reference image based on the image to be processed and the transformed reference image Optical flow information between the image to be processed and the reference image.
  • the fourth sub-module may first obtain the first coordinates of the pixels in the image to be processed in the three-dimensional coordinate system of the camera device corresponding to the image to be processed according to the depth information and the preset parameters of the camera; then, the fourth sub-module The first coordinate can be converted to the second coordinate in the three-dimensional coordinate system of the camera device corresponding to the reference image according to the pose change information; after that, based on the two-dimensional coordinate system of the two-dimensional image, the fourth sub-module is The coordinates are projected to obtain the projected two-dimensional coordinates of the image to be processed; finally, the fourth sub-module establishes the pixel value and reference of the pixels in the image to be processed according to the projected two-dimensional coordinates of the image to be processed and the two-dimensional coordinates of the reference image Correspondence between the pixel values of pixels in the image.
  • For the specific operations performed by the second acquisition module 1910 and the sub-modules and units it includes, refer to the foregoing description of S110, which is not repeated here.
  • the third acquiring module 1920 is configured to acquire the three-dimensional motion field of the pixels in the image to be processed relative to the reference image according to the depth information and the optical flow information.
  • For the specific operations performed by the third acquisition module 1920, refer to the foregoing description of S120, which is not repeated here.
  • The moving object determining module 1930 is used to determine the moving object in the image to be processed according to the three-dimensional motion field.
  • the module for determining a moving object may include: a seventh sub-module, an eighth sub-module, and a ninth sub-module.
  • The seventh sub-module is used to obtain the motion information of the pixels in the image to be processed in three-dimensional space according to the three-dimensional motion field.
  • For example, the seventh sub-module can calculate, according to the three-dimensional motion field and the time difference between capturing the image to be processed and capturing the reference image, the speed of the pixels in the image to be processed along the three coordinate axes of the three-dimensional coordinate system of the camera device corresponding to the image to be processed.
  • the eighth sub-module is used for clustering the pixels according to the motion information of the pixels in the three-dimensional space.
  • the eighth sub-module includes: the fourth unit, the fifth unit, and the sixth unit.
  • the fourth unit is used to obtain the motion mask of the image to be processed according to the motion information of the pixel in the three-dimensional space.
  • the motion information of the pixel in the three-dimensional space includes: the speed of the pixel in the three-dimensional space.
  • the fourth unit can filter the speed of the pixel in the image to be processed according to the preset speed threshold to form the motion mask of the image to be processed .
  • the fifth unit is used to determine the motion area in the image to be processed according to the motion mask.
  • the sixth unit is used for clustering the pixels in the motion area according to the three-dimensional space position information and motion information of the pixels in the motion area. For example, the sixth unit can convert the three-dimensional coordinate values of the pixels in the motion area into a predetermined coordinate interval; afterwards, the sixth unit converts the speed of the pixels in the motion area into a predetermined speed interval; finally, the sixth unit converts Perform density clustering processing on the pixels in the motion area to obtain at least one cluster.
  • The ninth sub-module is used to determine the moving object in the image to be processed according to the result of the clustering processing. For example, for any cluster, the ninth sub-module can determine the speed and direction of the moving object according to the speeds and directions of multiple pixels in the cluster, where one cluster corresponds to one moving object in the image to be processed.
  • the ninth sub-module is also used to determine the moving object detection frame in the image to be processed according to the spatial position information of the pixels belonging to the same cluster.
  • For the specific operations performed by the moving object determining module 1930 and the sub-modules and units it includes, reference may be made to the foregoing description of S130, which is not repeated here.
  • The training module is used to input one image of a binocular image sample into the convolutional neural network to be trained and perform disparity analysis processing via the convolutional neural network; based on the output of the convolutional neural network, the training module obtains the disparity map of the left-eye image sample and the disparity map of the right-eye image sample; the training module reconstructs the right-eye image based on the disparity map of the left-eye image sample and the right-eye image sample; the training module reconstructs the left-eye image based on the disparity map of the right-eye image sample and the left-eye image sample; and the training module adjusts the network parameters of the convolutional neural network according to the difference between the reconstructed left-eye image and the left-eye image sample and the difference between the reconstructed right-eye image and the right-eye image sample.
  • the specific operations performed by the training module can be referred to the above description with respect to FIG. 17, which will not be described in detail here.
  • the intelligent driving control device provided by the present disclosure is shown in FIG. 20.
  • the device shown in FIG. 20 includes: a fourth acquisition module 2000, a moving object detection device 2010, and a control module 2020.
  • the fourth acquisition module 2000 is used to acquire the video stream of the road where the vehicle is located through the camera device provided on the vehicle.
  • the moving object detection device 2010 is configured to perform moving object detection on at least one video frame included in the video stream, and determine the moving object in the video frame.
  • the structure of the moving object detection device 2010 and the specific operations performed by each module, sub-module, and unit can be referred to the description of FIG. 19, which is not described in detail here.
  • the control module 2020 is used to generate and output vehicle control commands according to the moving objects.
  • the control commands generated and output by the control module 2020 include, but are not limited to: speed maintaining control commands, speed adjustment control commands, direction maintaining control commands, direction adjustment control commands, warning prompt control commands, and driving mode switching control commands.
  • FIG. 21 shows an exemplary device 2100 suitable for implementing the present disclosure.
  • The device 2100 may be a control system/electronic system configured in a car, a mobile terminal (for example, a smart mobile phone), a personal computer (PC, for example, a desktop or notebook computer), a tablet, a server, or the like.
  • The device 2100 includes one or more processors, a communication part, and the like. The one or more processors may be, for example, one or more central processing units (CPU) 2101 and/or one or more graphics processors (GPU) 2113 that run the neural network, etc.
  • The processor can execute various appropriate actions and processing according to executable instructions stored in the read-only memory (ROM) 2102 or executable instructions loaded from the storage part 2108 into the random access memory (RAM) 2103.
  • the communication part 2112 may include but is not limited to a network card, and the network card may include but is not limited to an IB (Infiniband) network card.
  • the processor can communicate with the read-only memory 2102 and/or the random access memory 2103 to execute executable instructions, connect to the communication unit 2112 via the bus 2104, and communicate with other target devices via the communication unit 2112, thereby completing the corresponding in this disclosure step.
  • the RAM 2103 can also store various programs and data required for device operation.
  • the CPU 2101, the ROM 2102, and the RAM 2103 are connected to each other through a bus 2104.
  • ROM2102 is an optional module.
  • the RAM 2103 stores executable instructions, or writes executable instructions into the ROM 2102 during runtime, and the executable instructions cause the central processing unit 2101 to execute the steps included in the above-mentioned moving object detection method or intelligent driving control method.
  • An input/output (I/O) interface 2105 is also connected to the bus 2104.
  • the communication unit 2112 may be integrated, or may be configured to have multiple sub-modules (for example, multiple IB network cards) and be connected to the bus respectively.
  • the following components are connected to the I/O interface 2105: an input part 2106 including a keyboard, a mouse, etc.; an output part 2107 including a cathode ray tube (CRT), a liquid crystal display (LCD), etc., and speakers, etc.; a storage part 2108 including a hard disk, etc. ; And a communication part 2109 including a network interface card such as a LAN card, a modem, etc. The communication section 2109 performs communication processing via a network such as the Internet.
  • the driver 2110 is also connected to the I/O interface 2105 as needed.
  • a removable medium 2111 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, etc., is installed on the drive 2110 as needed, so that the computer program read from it is installed in the storage portion 2108 as needed.
  • FIG. 21 is only an optional implementation. In specific practice, the number and types of the components in FIG. 21 can be selected, deleted, added or replaced according to actual needs; for different functional components, implementations such as separate arrangement or integrated arrangement can also be used. For example, the GPU 2113 and the CPU 2101 can be arranged separately, or the GPU 2113 can be integrated on the CPU 2101; the communication part can be arranged separately, or can be integrated on the CPU 2101 or the GPU 2113, etc. These alternative implementations all fall within the protection scope of the present disclosure.
  • the process described below with reference to the flowcharts can be implemented as a computer software program.
  • In particular, the embodiments of the present disclosure include a computer program product, which includes a computer program tangibly embodied on a machine-readable medium.
  • the computer program includes program code for executing the steps shown in the flowchart.
  • the program code may include instructions corresponding to the steps in the method provided by the present disclosure.
  • the computer program may be downloaded and installed from the network through the communication part 2109, and/or installed from the removable medium 2111.
  • the computer program is executed by the central processing unit (CPU) 2101, the instructions described in the present disclosure for realizing the above-mentioned corresponding steps are executed.
  • The embodiments of the present disclosure also provide a computer program product for storing computer-readable instructions which, when executed, cause a computer to execute the moving object detection method or the intelligent driving control method described in any of the foregoing embodiments.
  • The computer program product can be specifically implemented by hardware, software or a combination thereof.
  • In an optional example, the computer program product is specifically embodied as a computer storage medium.
  • In another optional example, the computer program product is specifically embodied as a software product, such as a software development kit (SDK), etc.
  • the embodiments of the present disclosure also provide another moving object detection method or intelligent driving control method and corresponding devices and electronic equipment, computer storage media, computer programs, and computer program products.
  • the method includes: a first device sends a moving object detection instruction or an intelligent driving control instruction to a second device, and the instruction causes the second device to execute the moving object detection method or the intelligent driving control method in any of the foregoing possible embodiments; A device receives the moving object detection result or the intelligent driving control result sent by the second device.
  • In some embodiments, the moving object detection instruction or the intelligent driving control instruction may specifically be a calling instruction. The first device may instruct the second device, by means of calling, to perform the moving object detection operation or the intelligent driving control operation; accordingly, in response to receiving the calling instruction, the second device can execute the steps and/or processes in any embodiment of the above moving object detection method or intelligent driving control method.
  • the method and apparatus, electronic equipment, and computer-readable storage medium of the present disclosure may be implemented in many ways.
  • the method and apparatus, electronic equipment, and computer-readable storage medium of the present disclosure can be implemented by software, hardware, firmware or any combination of software, hardware, and firmware.
  • the above-mentioned order of the steps for the method is for illustration only, and the steps of the method of the present disclosure are not limited to the order specifically described above, unless otherwise specifically stated.
  • the present disclosure can also be implemented as programs recorded in a recording medium, and these programs include machine-readable instructions for implementing the method according to the present disclosure.
  • the present disclosure also covers a recording medium storing a program for executing the method according to the present disclosure.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Automation & Control Theory (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Aviation & Aerospace Engineering (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Transportation (AREA)
  • Mechanical Engineering (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Computing Systems (AREA)
  • Computer Graphics (AREA)
  • Biomedical Technology (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Molecular Biology (AREA)
  • Biodiversity & Conservation Biology (AREA)
  • Geometry (AREA)
  • Evolutionary Biology (AREA)
  • Human Computer Interaction (AREA)
  • Social Psychology (AREA)
  • Psychiatry (AREA)
  • Electromagnetism (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Image Analysis (AREA)
  • Length Measuring Devices By Optical Means (AREA)
  • Traffic Control Systems (AREA)
  • Image Processing (AREA)

Abstract

一种运动物体检测方法和装置、智能驾驶控制方法和装置、电子设备、计算机可读存储介质以及计算机程序,其中的运动物体检测方法包括:获取待处理图像中像素的深度信息(S100);获取待处理图像和参考图像之间的光流信息(S110);其中,参考图像和待处理图像是基于摄像装置的连续拍摄而获得的具有时序关系的两幅图像;根据深度信息和光流信息,获取待处理图像中像素相对于参考图像的三维运动场(S120);根据该三维运动场,确定待处理图像中的运动物体(S130)。

Description

运动物体检测及智能驾驶控制方法、装置、介质及设备
本公开要求在2019年5月29日提交中国专利局、申请号为201910459420.9、发明名称为“运动物体检测及智能驾驶控制方法、装置、介质及设备”的中国专利申请的优先权,其全部内容通过引用结合在本公开中。
技术领域
本公开涉及计算机视觉技术,尤其是涉及一种运动物体检测方法、运动物体检测装置、智能驾驶控制方法、智能驾驶控制装置、电子设备、计算机可读存储介质以及计算机程序。
背景技术
在智能驾驶以及安防监控等技术领域中,需要感知运动物体及其运动方向。感知到的运动物体及其运动方向,可以提供给决策层,使决策层基于感知结果进行决策。例如,在智能驾驶系统中,在感知到处于道路旁边的运动物体(如人或者动物等)向道路中央靠近时,决策层可以控制车辆减速行驶,甚至停车,以保障车辆的安全行驶。
发明内容
本公开实施方式提供一种运动物体检测技术方案。
根据本公开实施方式第一方面,提供一种运动物体检测方法,该方法包括:获取待处理图像中的像素的深度信息;获取所述待处理图像和参考图像之间的光流信息;其中,所述参考图像和所述待处理图像是基于摄像装置的连续拍摄而获得的具有时序关系的两幅图像;根据所述深度信息和光流信息,获取所述待处理图像中的像素相对于所述参考图像的三维运动场;根据所述三维运动场,确定所述待处理图像中的运动物体。
根据本公开实施方式第二方面,提供一种智能驾驶控制方法,包括:通过车辆上设置的摄像装置获取所述车辆所在路面的视频流;采用如上述运动物体检测方法,对所述视频流包括的至少一视频帧进行运动物体检测,确定该视频帧中的运动物体;根据所述运动物体生成并输出所述车辆的控制指令。
根据本公开实施方式第三方面,提供一种运动物体检测装置,包括:第一获取模块,用于获取待处理图像中的像素的深度信息;第二获取模块,用于获取所述待处理图像和参考图像之间的光流信息;其中,所述参考图像和所述待处理图像是基于摄像装置的连续拍摄而获得的具有时序关系的两幅图像;第三获取模块,用于根据所述深度信息和光流信息,获取所述待处理图像中的像素相对于所述参考图像的三维运动场;确定运动物体模块,用于根据所述三维运动场,确定所述待处理图像中的运动物体。
根据本公开实施方式第四方面,提供一种智能驾驶控制装置,该装置包括:第四获取模块,用于通过车辆上设置的摄像装置获取所述车辆所在路面的视频流;上述运动物体检测装置,用于对所述视频流包括的至少一视频帧进行运动物体检测,确定该视频帧中的运动物体;控制模块,用于根据所述运动物体生成并输出所述车辆的控制指令。
根据本公开实施例的第五方面,提供了一种电子设备,包括:处理器、存储器、通信接口和通信总线,所述处理器、所述存储器和所述通信接口通过所述通信总线完成相互间的通信;所述存储器用于存放至少一可执行指令,所述可执行指令使所述处理器执行上述方法。
根据本公开实施方式第六方面,提供一种计算机可读存储介质,其上存储有计算机程序,该计算机程序被处理器执行时,实现本公开任一方法实施方式。
根据本公开实施方式的第七方面,提供一种计算机程序,包括计算机指令,当所述计算机指令在设备的处理器中运行时,实现本公开任一方法实施方式。
基于本公开提供的运动物体检测方法、智能驾驶控制方法、装置、电子设备、计算机可读存储介质以及计算机程序,本公开通过利用待处理图像中的像素的深度信息以及待处理图像和参考图像之间的光流信息,可以获得待处理图像中的像素相对于参考图像的三维运动场,由于三维运动场可以反映出运动物体,因此,本公开可以利用三维运动场确定待处理图像中的运动物体。由此可知,本公开提供的技术方案有利于提高感知运动物体的准确性,从而有利于提高车辆智能行驶的安全性。
下面通过附图和实施方式,对本公开的技术方案做进一步的详细描述。
附图说明
构成说明书的一部分的附图描述了本公开的实施方式,并连同描述一起用于解释本公开的原理。
参照附图,根据下面的详细描述,可以更加清楚地理解本公开,其中:
图1为本公开的运动物体检测方法一个实施方式的流程图;
图2为本公开的待处理图像的一个示意图;
图3为图2所示的待处理图像的第一视差图一个实施方式的示意图;
图4为本公开的待处理图像的第一视差图一个实施方式的示意图;
图5为本公开的卷积神经网络一个实施方式的示意图;
图6为本公开的第一视差图的第一权重分布图一个实施方式的示意图;
图7为本公开的第一视差图的第一权重分布图另一个实施方式的示意图;
图8为本公开的第一视差图的第二权重分布图一个实施方式的示意图;
图9为本公开的第三视差图一个实施方式的示意图;
图10为图9所示的第三视差图的第二权重分布图一个实施方式示意图;
图11为本公开的对待处理图像的第一视差图进行优化调整实施方式示意图;
图12为本公开的三维坐标系一个实施方式的示意图;
图13为本公开的参考图像以及Warp处理后的图像一个实施方式示意图;
图14为本公开的Warp处理后的图像、待处理图像以及待处理图像相对于参考图像的光流图一个实施方式示意图;
图15为本公开的待处理图像及其运动掩膜的一个实施方式示意图;
图16为本公开形成的运动物体检测框一个实施方式示意图;
图17为本公开的卷积神经网络训练方法一个实施方式的流程图;
图18为本公开的智能驾驶控制方法一个实施方式的流程图;
图19为本公开的运动物体检测装置一个实施方式的结构示意图;
图20为本公开的智能驾驶控制装置一个实施方式的结构示意图;
图21为实现本公开实施方式的一示例性设备的框图。
具体实施例
现在将参照附图来详细描述本公开的各种示例性实施例。应注意到:除非另外具体说明,否则在这些实施例中阐述的部件和步骤的相对布置、数字表达式和数值不限制本公开的范围。
同时,应当明白,为了便于描述,附图中所示出的各个部分的尺寸并不是按照实际的比例关系绘制的。以下对至少一个示例性实施例的描述实际上仅仅是说明性的,决不作为对本公开及其应用或使用的任何限制。对于相关领域普通技术人员已知的技术、方法以及设备可能不作详细讨论,但在适当情况下,所述技术、方法和设备应当被视为说明书的一部分。应当注意到:相似的标号和字母在下面的附图中表示类似项,因此,一旦某一项在一个附图中被定义,则在随后的附图中不需要对其进行进一步讨论。
本公开实施例可以应用于终端设备、计算机系统及服务器等电子设备,其可与众多其它通用或者专用的计算系统环境或者配置一起操作。适于与终端设备、计算机系统以及服务器等电子设备一起使用的众所周知的终端设备、计算系统、环境和/或配置的例子,包括但不限于:个人计算机系统、服务器计算机系统、瘦客户机、厚客户机、手持或膝上设备、基于微处理器的系统、机顶盒、可编程消费电子产品、网络个人电脑、小型计算机系统﹑大型计算机系统和包括上述任何系统的分布式云计算技术环境,等等。
终端设备、计算机系统以及服务器等电子设备可以在由计算机系统执行的计算机系统可执行指令(诸如程序模块)的一般语境下描述。通常,程序模块可以包括例程、程序、目标程序、组件、逻辑以及数据结构等等,它们执行特定的任务或者实现特定的抽象数据类型。计算机系统/服务器可以在分布式云计算环境中实施,分布式云计算环境中,任务是由通过通信网络链接的远程处理设备执行的。在分布式云计算环境中,程序模块可以位于包括存储设备的本地或远程计算系统存储介质上。
示例性实施例
图1为本公开的运动物体检测方法一个实施例的流程图。如图1所示,该实施例方法包括:步骤S100、步骤S110、步骤S120以及步骤S130。下面对各步骤进行详细描述。
S100、获取待处理图像中的像素的深度信息。
在一个可选示例中,本公开可以借助待处理图像的视差图,来获得待处理图像中的像素(如所有像素)的深度信息。即,首先,获取待处理图像的视差图,然后,根据待处理图像的视差图,获取待 处理图像中的像素的深度信息。
在一个可选示例中,为清晰描述,下述将待处理图像的视差图称为待处理图像的第一视差图。本公开中的第一视差图用于描述待处理图像的视差。视差可以认为指,从相距一定距离的两个点位置处,观察同一个目标对象时,所产生的目标对象位置差异。待处理图像的一个例子如图2所示。图2所示的待处理图像的第一视差图的一个例子如图3所示。可选的,本公开中的待处理图像的第一视差图还可以表示为如图4所示的形式。图4中的各数字(如0、1、2、3、4和5等)分别表示:待处理图像中的(x,y)位置处的像素的视差。需要特别说明的是,图4并没有示出一个完整的第一视差图。
在一个可选示例中,本公开中的待处理图像通常为单目图像。即待处理图像通常是利用单目摄像装置进行拍摄,所获得的图像。在待处理图像为单目图像的情况下,本公开可以在不需要设置双目摄像装置的情况下,实现运动物体检测,从而有利于降低运动物体检测成本。
在一个可选示例中,本公开可以利用预先成功训练的卷积神经网络,来获得待处理图像的第一视差图。例如,将待处理图像输入至卷积神经网络中,经由该卷积神经网络对待处理图像进行视差分析处理,该卷积神经网络输出视差分析处理结果,从而本公开可以基于视差分析处理结果,获得待处理图像的第一视差图。通过利用卷积神经网络来获得待处理图像的第一视差图,可以在不需要使用两个图像进行逐像素视差计算,且不需要进行摄像装置标定的情况下,获得视差图。有利于提高获得视差图的便捷性和实时性。
在一个可选示例中,本公开中的卷积神经网络通常包括但不限于:多个卷积层(Conv)以及多个反卷积层(Deconv)。本公开的卷积神经网络可以被划分为两个部分,即编码部分和解码部分。输入至卷积神经网络中的待处理图像(如图2所示的待处理图像),由编码部分对其进行编码处理(即特征提取处理),编码部分的编码处理结果被提供给解码部分,由解码部分对编码处理结果进行解码处理,并输出解码处理结果。本公开可以根据卷积神经网络输出的解码处理结果,获得待处理图像的第一视差图(如图3所示的视差图)。可选的,卷积神经网络中的编码部分包括但不限于:多个卷积层,且多个卷积层串联。卷积神经网络中的解码部分包括但不限于:多个卷积层和多个反卷积层,多个卷积层和多个反卷积层相互间隔设置,且串联连接。
本公开的卷积神经网络的一个例子,如图5所示。图5中,左侧第1个长方形表示输入卷积神经网络中的待处理图像,右侧第1个长方形表示卷积神经网络输出的视差图。左侧第2个长方形至第15个长方形中的每一个长方形均表示卷积层,左侧第16个长方形至右侧第2个长方形中的所有长方形表示相互间隔设置的反卷积层和卷积层,如左侧第16个长方形表示反卷积层,左侧第17个长方形表示卷积层,左侧第18个长方形表示反卷积层,左侧第19个长方形表示卷积层,以此类推,直到右侧第2个长方形,且右侧第2个长方形表示反卷积层。
在一个可选示例中,本公开的卷积神经网络可通过跳连接(Skip Connect)的方式,使卷积神经网络中的低层信息和高层信息融合。例如,将编码部分中的至少一卷积层的输出通过跳连接的方式提供给解码部分中的至少一反卷积层。可选的,卷积神经网络中的所有卷积层的输入通常包括:上一层(如卷积层或者反卷积层)的输出,卷积神经网络中的至少一反卷积层(如部分反卷积层或者所有反卷积层)的输入包括:上一卷积层的输出的上采样(Upsample)结果和与该反卷积层跳连接的编码部分的卷积层的输出。例如,由图5右侧的卷积层的下方引出的实线箭头所指向的内容表示上一卷积层的输出,图5中的虚线箭头表示提供给反卷积层的上采样结果,由图5左侧的卷积层的上方引出的实线箭头表示与反卷积层跳连接的卷积层的输出。本公开不限制跳连接的数量以及卷积神经网络的网络结构。本公开通过将卷积神经网络中的低层信息和高层信息进行融合,有利于提高卷积神经网络生成的视差图的准确性。可选的,本公开的卷积神经网络是利用双目图像样本训练获得的。该卷积神经网络的训练过程可以参见下述实施方式中的描述。在此不再详细说明。
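As an illustration of the encoder-decoder structure described above (serial convolution layers for encoding, interleaved deconvolution and convolution layers for decoding, and skip connections fusing low-level with high-level features), the following PyTorch sketch shows a heavily simplified disparity network. It is not a reproduction of the network of FIG. 5; the layer counts, channel widths, the sigmoid output, and the names DisparityNet / enc* / dec* are illustrative assumptions.

```python
import torch
import torch.nn as nn

class DisparityNet(nn.Module):
    """Simplified encoder-decoder disparity network sketch (not the exact FIG. 5 topology)."""
    def __init__(self):
        super().__init__()
        # Encoder: serial convolution layers, progressively downsampling
        self.enc1 = nn.Sequential(nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU())
        self.enc2 = nn.Sequential(nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU())
        self.enc3 = nn.Sequential(nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU())
        # Decoder: deconvolution layers upsampling back to input resolution
        self.dec3 = nn.Sequential(nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU())
        self.dec2 = nn.Sequential(nn.ConvTranspose2d(128, 32, 4, stride=2, padding=1), nn.ReLU())
        self.dec1 = nn.ConvTranspose2d(64, 1, 4, stride=2, padding=1)

    def forward(self, x):
        e1 = self.enc1(x)                                  # 1/2 resolution features
        e2 = self.enc2(e1)                                 # 1/4 resolution features
        e3 = self.enc3(e2)                                 # 1/8 resolution features
        d3 = self.dec3(e3)
        d2 = self.dec2(torch.cat([d3, e2], dim=1))         # skip connection: fuse encoder feature e2
        disp = self.dec1(torch.cat([d2, e1], dim=1))       # skip connection: fuse encoder feature e1
        return torch.sigmoid(disp)                         # normalized single-channel disparity map
```

A deeper stack of convolution and deconvolution layers, as sketched in FIG. 5, follows the same pattern of concatenating encoder outputs into the decoder.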
在一个可选示例中,本公开还可以对利用卷积神经网络获得的待处理图像的第一视差图进行优化调整,以便于获得更为准确的第一视差图。可选的,本公开可以利用待处理图像的水平镜像图(例如,左镜像图或者右镜像图)的视差图,对待处理图像的第一视差图进行优化调整。为便于描述,下述将待处理图像的水平镜像图称为第一水平镜像图,将第一水平镜像图的视差图称为第二视差图。本公开对第一视差图进行优化调整的一个具体例子如下:
步骤A、获取第二视差图的水平镜像图。
可选的,本公开中的第一水平镜像图意在表明:该镜像图是对待处理图像进行水平方向上的镜像处理(不是竖直方向上的镜像处理),所形成的镜像图。为便于描述,下述将第二视差图的水平镜像图称为第二水平镜像图。可选的,本公开中的第二水平镜像图是指,对第二视差图进行水平方向上的镜像处理后,形成的镜像图。第二水平镜像图仍然是视差图。
可选的,本公开可以先对待处理图像进行左镜像处理或右镜像处理(由于左镜像处理结果与右镜像处理结果相同,因此,本公开对待处理图像进行左镜像处理或右镜像处理均可),获得第一水平镜像图;然后,再获取第一水平镜像图的视差图;最后,再对该第二视差图进行左镜像处理或者右镜像处理(由于第二视差图的左镜像处理结果与右镜像处理结果相同,因此,本公开对第二视差图进行左镜像处理或者右镜像处理均可),从而获得第二水平镜像图。为了方便描述,下述将第二水平镜像图称为第三视差图。
由上述描述可知,本公开在对待处理图像进行水平镜像处理时,可以不考虑待处理图像是被作为左目图像进行镜像处理,还是被作为右目图像进行镜像处理。也就是说,无论待处理图像被作为左目图像,还是被作为右目图像,本公开均可以对待处理图像进行左镜像处理或者右镜像处理,从而获得第一水平镜像图。同样,本公开在对第二视差图进行水平镜像处理时,也可以不考虑是应该对第二视差图进行左镜像处理,还是应该对第二视差图进行右镜像处理。
需要说明的是,在训练用于生成待处理图像的第一视差图的卷积神经网络的过程,如果以双目图像样本中的左目图像样本作为输入,提供给卷积神经网络,进行训练,则成功训练后的卷积神经网络在测试以及实际应用中,会将输入的待处理图像作为左目图像,也就是说,本公开的待处理图像被作为待处理左目图像。如果以双目图像样本中的右目图像样本作为输入,提供给卷积神经网络,进行训练,则成功训练后的卷积神经网络在测试以及实际应用中,会将输入的待处理图像作为右目图像,也就是说,本公开的待处理图像被作为待处理右目图像。
可选的,本公开同样可以利用上述卷积神经网络,来获得第二视差图。例如,将第一水平镜像图输入至卷积神经网络中,经由该卷积神经网络对第一水平镜像图进行视差分析处理,卷积神经网络输出视差分析处理结果,从而本公开可以根据输出的视差分析处理结果,获得第二视差图。
步骤B、获取待处理图像的视差图(即第一视差图)的权重分布图以及第二水平镜像图(即第三视差图)的权重分布图。
在一个可选示例中,第一视差图的权重分布图用于描述第一视差图中的多个视差值(例如,所有视差值)各自对应的权重值。第一视差图的权重分布图可以包括但不限于:第一视差图的第一权重分布图以及第一视差图的第二权重分布图。
可选的,上述第一视差图的第一权重分布图是针对多个不同的待处理图像的第一视差图统一设置的权重分布图,即第一视差图的第一权重分布图可以面向多个不同的待处理图像的第一视差图,也就是说,不同待处理图像的第一视差图使用同一个第一权重分布图,因此,本公开可以将第一视差图的第一权重分布图称为第一视差图的全局权重分布图。第一视差图的全局权重分布图用于描述第一视差图中的多个视差值(如所有视差值)各自对应的全局权重值。
可选的,上述第一视差图的第二权重分布图是针对单个待处理图像的第一视差图而设置的权重分布图,即第一视差图的第二权重分布图是面向单个待处理图像的第一视差图,也就是说,不同待处理图像的第一视差图,使用了不同的第二权重分布图,因此,本公开可以将第一视差图的第二权重分布图称为第一视差图的局部权重分布图。第一视差图的局部权重分布图用于描述第一视差图中的多个视差值(如所有视差值)各自对应的局部权重值。
在一个可选示例中,第三视差图的权重分布图用于描述第三视差图中的多个视差值各自对应的权重值。第三视差图的权重分布图可以包括但不限于:第三视差图的第一权重分布图以及第三视差图的第二权重分布图。
可选的,上述第三视差图的第一权重分布图是针对多个不同的待处理图像的第三视差图统一设置的权重分布图,即第三视差图的第一权重分布图面向多个不同的待处理图像的第三视差图的,也就是说,不同待处理图像的第三视差图使用了同一个第一权重分布图,因此,本公开可以将第三视差图的第一权重分布图称为第三视差图的全局权重分布图。第三视差图的全局权重分布图用于描述第三视差图中的多个视差值(如所有视差值)各自对应的全局权重值。
可选的,上述第三视差图的第二权重分布图是针对单个待处理图像的第三视差图而设置的权重分布图,即第三视差图的第二权重分布图是面向单个待处理图像的第三视差图的,也就是说,不同待处理图像的第三视差图使用了不同的第二权重分布图,因此,本公开可以将第三视差图的第二权重分布图称为第三视差图的局部权重分布图。第三视差图的局部权重分布图用于描述第三视差图中的多个视差值(如所有视差值)各自对应的局部权重值。
在一个可选示例中,第一视差图的第一权重分布图包括:至少两个左右分列的区域,不同区域具有不同的权重值。可选的,位于左侧的区域的权重值与位于右侧的区域的权重值的大小关系,通常与待处理图像被作为待处理左目图像,还是被作为待处理右目图像,相关。
例如,在待处理图像被作为左目图像的情况下,对于第一视差图的第一权重分布图中的任意两个区域而言,位于右侧的区域的权重值大于位于左侧的区域的权重值。图6为图3所示的视差图的第一权重分布图,该第一权重分布图被划分为五个区域,即图6所示的区域1、区域2、区域3、区域4以及区域5。区域1的权重值小于区域2的权重值,区域2的权重值小于区域3的权重值,区域3的权重值小于区域4的权重值,区域4的权重值小于区域5的权重值。另外,第一视差图的第一权重分布图中的任一区域内可以具有相同的权重值,也可以具有不同的权重值。在第一视差图的第一权重分布图中的一个区域内具有不同的权重值的情况下,区域内左侧的权重值通常不大于该区域内右侧的权重值。可选的,图6所示的区域1的权重值可以为0,即在第一视差图中,区域1对应视差是完全不可信的;区域2的权重值可以从左侧到右侧,由0逐渐增大并接近0.5;区域3的权重值为0.5;区域4的权重值可以从左侧到右侧,由一大于0.5的数值逐渐增大并接近1;区域5的权重值为1,即在第一视差图中,区域5对应视差是完全可信的。
再例如,在待处理图像被作为右目图像的情况下,对于第一视差图的第一权重分布图中的任意两个区域而言,位于左侧的区域的权重值大于位于右侧的区域的权重值。图7示出了待处理图像被作为右目图像的第一视差图的第一权重分布图,第一权重分布图被划分为五个区域,即图7中的区域1、区域2、区域3、区域4和区域5。区域5的权重值小于区域4的权重值,区域4的权重值小于区域3的权重值,区域3的权重值小于区域2的权重值,区域2的权重值小于区域1的权重值。另外,第一视差图的第一权重分布图中的任一区域内可以具有相同的权重值,也可以具有不同的权重值。在第一视差图的第一权重分布图中的一个区域具有不同的权重值的情况下,该区域内右侧的权重值通常不大于该区域内左侧的权重值。可选的,图7中的区域5的权重值可为0,即在第一视差图中,区域5对应视差是完全不可信的;区域4的权重值可以从右侧到左侧,由0逐渐增大并接近0.5;区域3的权重值为0.5;区域2的权重值可以从右侧到左侧,由一大于0.5的数值逐渐增大并接近1;区域1的权重值为1,即在第一视差图中,区域1对应视差是完全可信的。
可选的,第三视差图的第一权重分布图包括:至少两个左右分列的区域,不同区域具有不同的权重值。位于左侧的区域的权重值与位于右侧的区域的权重值的大小关系,通常与待处理图像被作为左目图像,还是被作为右目图像,相关。
例如,在待处理图像被作为左目图像的情况下,对于第三视差图的第一权重分布图中的任意两个区域而言,位于右侧的区域的权重值大于位于左侧的区域的权重值。另外,第三视差图的第一权重分布图中的任一区域内可以具有相同的权重值,也可以具有不同的权重值。在第三视差图的第一权重分布图中的一个区域内具有不同的权重值的情况下,该区域内左侧的权重值通常不大于该区域内右侧的权重值。
再例如,在待处理图像被作为右目图像的情况下,对于第三视差图的第一权重分布图中的任意两个区域而言,位于左侧的区域的权重值大于位于右侧的区域的权重值。另外,第三视差图的第一权重分布图中的任一区域内可以具有相同的权重值,也可以具有不同的权重值。在第三视差图的第一权重分布图中的一个区域内具有不同的权重值的情况下,该区域内右侧的权重值通常不大于该区域内左侧的权重值。
可选的,第一视差图的第二权重分布图的设置方式可以包括下述步骤:
首先,对第一视差图进行水平镜像处理(例如,左镜像处理或者右镜像处理),形成镜像视差图。为了便于描述,下述称为第四视差图。
其次,对于第四视差图中的任一像素点而言,如果该像素点的视差值大于该像素点对应的第一变量,则将待处理图像的第一视差图的第二权重分布图中的该像素点的权重值设置为第一值,否则,该像素点的权重值被设置为第二值。本公开中的第一值大于第二值。例如,第一值为1,第二值为0。
可选的,第一视差图的第二权重分布图的一个例子如图8所示。图8中的白色区域的权重值均为1,表示该位置处的视差值完全可信。图8中的黑色区域的权重值为0,表示该位置处的视差值完全不可信。
可选的,本公开中的像素点对应的第一变量可以是根据第一视差图中的相应像素点的视差值以及大于零的常数值设置的。例如,将第一视差图中的相应像素点的视差值与大于零的常数值的乘积,作为第四视差图中的相应像素点对应的第一变量。
可选的,第一视差图的第二权重分布图可以使用下述公式(1)表示:
L l=1（若d l′>thresh 1·d l），否则L l=0        公式(1)
在上述公式(1)中，L l表示第一视差图的第二权重分布图；d l′表示第四视差图的相应像素点的视差值；d l表示第一视差图中的相应像素点的视差值；thresh 1表示大于零的常数值，thresh 1的取值范围可以为1.1-1.5，如thresh 1=1.2或者thresh 1=1.25等。
在一个可选示例中,第三视差图的第二权重分布图的设置方式可以为:对于第一视差图中的任一像素点而言,如果第一视差图中的该像素点的视差值大于该像素点对应的第二变量,则将第三视差图的第二权重分布图中的该像素点的权重值设置为第一值,否则,设置为第二值。可选的,本公开中的第一值大于第二值。例如,第一值为1,第二值为0。
可选的,本公开中的像素点对应的第二变量可以是根据第四视差图中的相应像素点的视差值以及大于零的常数值设置的。例如,先对第一视差图进行左/右镜像处理,形成镜像视差图,即第四视差图,然后,将第四视差图中的相应像素点的视差值与大于零的常数值的乘积,作为第一视差图中的相应像素点对应的第二变量。
可选的,本公开基于图2的待处理图像,所形成的第三视差图的一个例子如图9所示。图9所示的第三视差图的第二权重分布图的一个例子,如图10所示。图10中的白色区域的权重值均为1,表示该位置处的视差值是完全可信的。图10中的黑色区域的权重值为0,表示该位置处的视差值完全不可信。
可选的,第三视差图的第二权重分布图可以使用下述公式(2)表示:
L l′=1（若d l>thresh 2·d l′），否则L l′=0        公式(2)
在上述公式(2)中，L l′表示第三视差图的第二权重分布图；d l′表示第四视差图的相应像素点的视差值；d l表示第一视差图中的相应像素点的视差值；thresh 2表示大于零的常数值，thresh 2的取值范围可以为1.1-1.5，如thresh 2=1.2或者thresh 2=1.25等。
步骤C、根据待处理图像的第一视差图的权重分布图、以及第三视差图的权重分布图,对待处理图像的第一视差图进行优化调整,优化调整后的视差图即为最终获得的待处理图像的视差图。
在一个可选示例中,本公开可以利用第一视差图的第一权重分布图和第二权重分布图对第一视差图中的多个视差值进行调整,获得调整后的第一视差图;利用第三视差图的第一权重分布图和第二权重分布图,对第三视差图中的多个视差值进行调整,获得调整后的第三视差图;之后,对调整后的第一视差图和调整后的第三视差图进行合并处理,从而获得优化调整后的待处理图像的第一视差图。
可选的,获得优化调整后的待处理图像的第一视差图的一个例子如下:
首先,对第一视差图的第一权重分布图和第一视差图的第二权重分布图进行合并处理,获得第三权重分布图。第三权重分布图可以采用下述公式(3)表示:
W l=M l+L l·0.5     公式(3)
在公式(3)中,W l表示第三权重分布图;M l表示第一视差图的第一权重分布图;L l表示第一视差图的第二权重分布图;其中的0.5也可以变换为其他常数值。
其次,对第三视差图的第一权重分布图和第三视差图的第二权重分布图进行合并处理,获得第四权重分布图。第四权重分布图可以采用下述公式(4)表示:
W l'=M l'+L l'·0.5         公式(4)
在公式(4)中,W l'表示第四权重分布图,M l'表示第三视差图的第一权重分布图;L l'表示第三视差图的第二权重分布图;其中的0.5也可以变换为其他常数值。
再次,根据第三权重分布图调整第一视差图中的多个视差值,获得调整后的第一视差图。例如,针对第一视差图中的任一像素点的视差值而言,将该像素点的视差值替换为:该像素点的视差值与第 三权重分布图中的相应位置处的像素点的权重值的乘积。在对第一视差图中的所有像素点均进行了上述替换处理后,获得调整后的第一视差图。
之后,根据第四权重分布图调整第三视差图中的多个视差值,获得调整后的第三视差图。例如,针对第三视差图中的任一像素点的视差值而言,将该像素点的视差值替换为:该像素点的视差值与第四权重分布图中的相应位置处的像素点的权重值的乘积。在对第三视差图中的所有像素点均进行了上述替换处理后,获得调整后的第三视差图。
最后,合并调整后的第一视差图和调整后的第三视差图,最终获得待处理图像的视差图(即最终的第一视差图)。最终获得的待处理图像的视差图可以采用下述公式(5)表示:
d final=W l·d l+W l'·d l″        公式(5)
在公式(5)中，d final表示最终获得的待处理图像的视差图(如图11中的右侧第1幅图所示)；W l表示第三权重分布图(如图11中的左上第1幅图所示)；W l'表示第四权重分布图(如图11中的左下第1幅图所示)；d l表示第一视差图(如图11中的左上第2幅图所示)；d l″表示第三视差图(如图11中的左下第2幅图所示)。
需要说明的是,本公开不限制对第一权重分布图和第二权重分布图进行合并处理的两个步骤的执行顺序,例如,两个合并处理的步骤可以同时执行,也可以先后执行。另外,本公开也不限制对第一视差图中的视差值进行调整和对第三视差图中的视差值进行调整的先后执行顺序,例如,两个调整的步骤可以同时进行,也可以先后执行。
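A minimal NumPy sketch of the refinement procedure of formulas (1) through (5) follows, assuming the image to be processed is treated as the left-view image. The function name refine_disparity, the parameter names, and the element-wise weighted sum used as the final merge are assumptions based on the description above; M_l and M_l_flip stand for the pre-defined global (first) weight maps of the first and third disparity maps.

```python
import numpy as np

def refine_disparity(d_l, d_mirror, M_l, M_l_flip, thresh1=1.2, thresh2=1.2):
    """Fuse the image's disparity with the mirrored-branch disparity (sketch of 公式(1)-(5))."""
    d_flip = np.fliplr(d_l)        # fourth disparity map: horizontal mirror of the first disparity map
    d_third = np.fliplr(d_mirror)  # third disparity map: mirror of the mirrored image's disparity map
    L_l = (d_flip > thresh1 * d_l).astype(np.float32)       # 公式(1): local weight of the first map
    L_l_flip = (d_l > thresh2 * d_flip).astype(np.float32)  # 公式(2): local weight of the third map
    W_l = M_l + 0.5 * L_l                                   # 公式(3)
    W_l_flip = M_l_flip + 0.5 * L_l_flip                    # 公式(4)
    return W_l * d_l + W_l_flip * d_third                   # 公式(5), assuming an element-wise weighted sum
```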
可选的,在待处理图像被作为左目图像的情况下,通常会存在左侧视差缺失以及物体的左侧边缘被遮挡等现象,这些现象会导致的待处理图像的视差图中的相应区域的视差值不准确。同样的,在待处理图像被作为待处理右目图像的情况下,通常会存在右侧视差缺失以及物体的右侧边缘被遮挡等现象,这些现象会导致的待处理图像的视差图中的相应区域的视差值不准确。本公开通过对待处理图像进行左/右镜像处理,并对该镜像图的视差图进行镜像处理,进而利用镜像处理后的视差图来优化调整待处理图像的视差图,有利于减弱待处理图像的视差图中的相应区域的视差值不准确的现象,从而有利于提高运动物体检测的精度。
在一个可选示例中,在待处理图像为双目图像的应用场景中,本公开获得待处理图像的第一视差图的方式包括但不限于:利用立体匹配的方式获得待处理图像的第一视差图。例如,利用BM(Block Matching,块匹配)算法、SGBM(Semi-Global Block Matching,半全局块匹配)算法、或者GC(Graph Cuts,图割)算法等立体匹配算法获得待处理图像的第一视差图。再例如,利用用于获取双目图像的视差图的卷积神经网络,对待处理图像进行视差处理,从而获得待处理图像的第一视差图。
在一个可选示例中,本公开在获得了待处理图像的第一视差图之后,可以利用下述公式(6)来获得待处理图像中的像素的深度信息:
Depth=f x·b/Disparity        公式(6)
在上述公式(6)中,Depth表示像素的深度值;f x为已知值,表示摄像装置的水平方向(三维坐标系中的X轴方向)焦距;b为已知值,表示获得视差图的卷积神经网络所使用的双目图像样本的基线(baseline),b属于双目摄像装置的标定参数;Disparity表示像素的视差。
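Formula (6) amounts to a one-line array operation; in the sketch below the disparity map, the horizontal focal length fx and the baseline b are assumed known, and the small eps guarding against zero disparity is an added assumption.

```python
import numpy as np

def disparity_to_depth(disparity, fx, b, eps=1e-6):
    """Depth = fx * b / Disparity (公式(6)); eps avoids division by zero."""
    return fx * b / np.maximum(disparity, eps)
```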
S110、获取待处理图像和参考图像之间的光流信息。
在一个可选示例中,本公开中的待处理图像和参考图像可以为:在同一摄像装置的连续拍摄(如多张连续照相或者录像)过程中,所形成的存在时序关系的两幅图像。形成两幅图像的时间间隔通常较短,以保证两幅图像的画面内容大部分相同。例如,形成两幅图像的时间间隔可以为相邻的两视频帧之间的时间间隔。再例如,形成两幅图像的时间隔间可以为摄像装置的连续照相模式的相邻两照片之间的时间间隔。可选的,待处理图像可以为摄像装置所拍摄的视频中的一视频帧(如当前视频帧),而待处理图像的参考图像为该视频中的另一视频帧,如参考图像为当前视频帧的前一视频帧。本公开 也不排除参考图像为当前视频帧的后一视频帧的情况。可选的,待处理图像可以为摄像装置基于连续照相模式所拍摄的多张照片中的其中一张照片,而待处理图像的参考图像可以为多张照片中的另一张照片,如待处理图像的前一张照片或者后一张照片等。本公开中的待处理图像和参考图像可以均为RGB(Red Green Blue,红绿蓝)图像等。本公开中的摄像装置可以为设置在移动物体上的摄像装置,例如,设置于车辆、火车以及飞机等交通工具上的摄像装置。
在一个可选示例中,本公开中的参考图像通常为单目图像。即参考图像通常是利用单目摄像装置进行拍摄,所获得的图像。在待处理图像和参考图像均为单目图像的情况下,本公开可以在不需要设置双目摄像装置的情况下,实现运动物体检测,从而有利于降低运动物体检测成本。
在一个可选示例中,本公开中的待处理图像和参考图像之间的光流信息可以认为是待处理图像和参考图像中的像素的二维运动场,光流信息并不能表征出像素在三维空间中的真实运动。本公开在获取待处理图像和参考图像之间的光流信息的过程中,可以引入摄像装置在拍摄待处理图像和参考图像时的位姿变化,即本公开根据摄像装置的位姿变化信息,来获取待处理图像和参考图像之间的光流信息,从而有利于消除获得的光流信息中的由于摄像装置的位姿变化而引入的干扰。本公开的根据摄像装置的位姿变化信息,获取待处理图像和参考图像之间的光流信息的方式,可以包括如下步骤:
步骤1、获取摄像装置拍摄待处理图像和参考图像的位姿变化信息。
可选的,本公开中的位姿变化信息是指:摄像装置在拍摄待处理图像时的位姿,与在拍摄参考图像时的位姿之间的差异。该位姿变化信息为基于三维空间的位姿变化信息。该位姿变化信息可以包括:摄像装置的平移信息以及摄像装置的旋转信息。其中的摄像装置的平移信息可以包括:摄像装置分别在三个坐标轴(如图12所示的坐标系)上的位移量。其中的摄像装置的旋转信息可以为:基于Roll、Yaw和Pitch的旋转向量。也就是说,摄像装置的旋转信息可以包括:基于Roll、Yaw和Pitch,这三个旋转方向的旋转分量向量。
例如,摄像装置的旋转信息可以表示为如下公式(7)所示:
R=
[R 11 R 12 R 13]
[R 21 R 22 R 23]
[R 31 R 32 R 33]        公式(7)
在上述公式(7)中:
R表示旋转信息,为3×3的矩阵;R 11表示cosαcosγ-cosβsinαsinγ,
R 12表示-cosβcosγsinα-cosαsinγ,R 13表示sinαsinβ,
R 21表示cosγsinα+cosαcosβsinγ,R 22表示cosαcosβcosγ-sinαsinγ,
R 23表示-cosαsinβ，R 31表示sinβsinγ，R 32表示cosγsinβ，R 33表示cosβ，
欧拉角(α,β,γ)表示基于Roll、Yaw和Pitch的旋转角。
可选的,本公开可以利用视觉技术,来获取摄像装置拍摄待处理图像和参考图像的位姿变化信息,例如,利用SLAM(Simultaneous Localization And Mapping,即时定位与地图构建)方式,获取位姿变化信息。进一步的,本公开可以利用开源ORB(Oriented FAST and Rotated BRIEF,定向快速和旋转摘要,一种描述子)-SLAM框架的RGBD(Red Green Blue Detph)模型,获取位姿变化信息。例如,将待处理图像(RGB图像)、待处理图像的深度图以及参考图像(RGB图像)输入RGBD模型,根据RGBD模型的输出获得位姿变化信息。另外,本公开也可以采用其他方式获得位姿变化信息,例如,利用GPS(Global Positioning System,全球定位系统)和角速度传感器,获得位姿变化信息等。
可选的,本公开可以采用如下述公式(8)所示的4×4的齐次矩阵,来表示位姿变化信息:
T l c=
[R  t]
[0  0  0  1]        公式(8)
在上述公式(8)中,T l c表示摄像装置拍摄待处理图像(如当前视频帧c)和参考图像(如当前视频帧c的前一视频帧l)的位姿变化信息,如位姿变化矩阵;R表示摄像装置的旋转信息,为3×3的矩阵,即
公式(7)所示的旋转矩阵；
t表示摄像装置的平移信息,即平移向量;t可以利用t x、t y和t z三个平移分量来表示,t x表示X轴方向的平移分量,t y表示在Y轴方向的平移分量,t z表示在Z轴方向的平移分量。
步骤2、根据位姿变化信息,建立待处理图像中的像素的像素值与参考图像中的像素的像素值之间的对应关系。
可选的,在摄像装置处于运动状态下,摄像装置在拍摄待处理图像时的位姿与拍摄参考图像时的位姿通常不相同,因此,待处理图像对应的三维坐标系(即摄像装置拍摄待处理图像时的三维坐标系)与参考图像对应的三维坐标系(即摄像装置拍摄参考图像时的三维坐标系)不相同。本公开在建立对应关系时,可以先针对像素的三维空间位置进行转换,使待处理图像中的像素和参考图像中的像素在同一三维坐标系中。
可选的,本公开可以先根据上述获得的深度信息和摄像装置的参数(已知值),获取待处理图像中的像素(如所有像素)在待处理图像对应的摄像装置的三维坐标系中的第一坐标;即本公开先将待处理图像中的像素转换到三维空间中,从而获得像素在三维空间中的坐标(即三维坐标)。例如,本公开可以利用下述公式(9)获得待处理图像中的任一像素的三维坐标:
Z=f x·b/Disparity，X=(u-c x)·Z/f x，Y=(v-c y)·Z/f y        公式(9)
在上述公式(9)中,Z表示像素的深度值,X、Y和Z表示像素的三维坐标(即第一坐标);f x表示摄像装置的水平方向(三维坐标系中的X轴方向)焦距;f y表示摄像装置的竖直方向(三维坐标系中的Y轴方向)焦距;(u,v)表示像素在待处理图像中的二维坐标;c x,c y表示摄像装置的像主点坐标;Disparity表示像素的视差。
可选的,假定待处理图像中的任一像素被表示为p i(u i,v i),且多个像素均被转换到三维空间后,任一像素被表示为P i(X i,Y i,Z i),那么,三维空间中的多个像素(如所有像素)所形成的三维空间点集可以表示为{P i c}。其中,P i c表示待处理图像中的第i个像素的三维坐标,即P i(X i,Y i,Z i);c表示待处理图像,i的取值范围与多个像素的数量相关。例如,多个像素的数量为N(N为大于1的整数),则i的取值范围可以为1至N或者0至N-1。
可选的,在获得了待处理图像中的多个像素(如所有像素)的第一坐标之后,本公开可以根据上述位姿变化信息,将多个像素的第一坐标分别转换到参考图像对应的摄像装置的三维坐标系中,获得多个像素的第二坐标。例如,本公开可以利用下述公式(10)获得待处理图像中的任一像素的第二坐标:
P i l=T l cP i c      公式(10)
在上述公式(10)中,P i l表示待处理图像中的第i个像素的第二坐标,T l c表示摄像装置拍摄待处理图像(如当前视频帧c)和参考图像(如当前视频帧c的前一视频帧l)的位姿变化信息,如位姿变化矩阵,即
公式(8)所示的4×4齐次矩阵；
P i c表示待处理图像中的第i个像素的第一坐标。
可选的,在获得了待处理图像中的多个像素的第二坐标之后,本公开可以基于二维图像的二维坐标系,对多个像素的第二坐标进行投影处理,从而获得被转换到参考图像对应的三维坐标系的待处理图像的投影二维坐标。例如,本公开可以利用下述公式(11)获得投影二维坐标:
u=f x·X/Z+c x，v=f y·Y/Z+c y        公式(11)
在上述公式(11)中,(u,v)表示待处理图像中的像素的投影二维坐标;f x表示摄像装置的水平方向(三维坐标系中的X轴方向)焦距;f y表示摄像装置的竖直方向(三维坐标系中的Y轴方向)焦距;c x,c y表示摄像装置的像主点坐标;(X,Y,Z)表示待处理图像中的像素的第二坐标。
可选的,在获得了待处理图像中的像素的投影二维坐标之后,本公开可以根据投影二维坐标和参考图像的二维坐标,建立待处理图像中的像素的像素值与参考图像中的像素的像素值之间的对应关系。该对应关系可以表示出:对于投影二维坐标所形成的图像和参考图像中的相同位置处的任一像素而言,该像素在待处理图像中的像素值以及该像素在参考图像中的像素值。
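The following NumPy sketch strings together formulas (9) through (11): back-project every pixel of the image to be processed using its depth, transform it with the 4×4 pose change matrix T_lc into the reference camera frame, and re-project it to 2D. The function name build_correspondence is an assumption; fx, fy, cx, cy are the camera intrinsics described above.

```python
import numpy as np

def build_correspondence(depth, T_lc, fx, fy, cx, cy):
    """Project pixels of the image to be processed into the reference view (sketch of 公式(9)-(11))."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    Z = depth
    X = (u - cx) * Z / fx                                   # 公式(9): back-project to the camera frame
    Y = (v - cy) * Z / fy
    P_c = np.stack([X, Y, Z, np.ones_like(Z)], axis=0).reshape(4, -1)
    P_l = T_lc @ P_c                                        # 公式(10): apply the 4x4 pose change matrix
    u_l = fx * P_l[0] / P_l[2] + cx                         # 公式(11): re-project to the 2D image plane
    v_l = fy * P_l[1] / P_l[2] + cy
    return u_l.reshape(h, w), v_l.reshape(h, w)             # per-pixel sampling coordinates in the reference image
```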
步骤3、根据上述对应关系,对参考图像进行变换处理。
可选的,本公开可以利用上述对应关系,对参考图像进行Warp(卷绕)处理,从而将参考图像变换到待处理图像中。对参考图像进行Warp处理的一个例子,如图13所示。图13中的左图为参考图像,图13中的右图为对参考图像进行Warp处理后所形成的图像。
步骤4、根据待处理图像和变换处理后的图像,计算待处理图像和参考图像之间的光流信息。
可选的,本公开的光流信息包括但不限于:稠密光流信息。例如,针对图像中的所有像素点均计算出光流信息。本公开可以利用视觉技术,来获取光流信息,例如,利用OpenCV(Open Source Computer Vision Library,开源计算机视觉库)方式,获取光流信息。进一步的,本公开可以将待处理图像和变换处理后的图像输入基于OpenCV的模型中,该模型会输出输入的两张图像之间的光流信息,从而本公开获得待处理图像和参考图像之间的光流信息。该模型所采用的计算光流信息的算法包括但不限于:Gunnar Farneback(人名)算法。
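Using OpenCV, the Warp step and the dense optical flow step can be sketched as below. Here reference_img, current_img and the pixel maps map_x, map_y (from the correspondence step above) are assumed inputs, and the Farneback parameters shown are ordinary defaults rather than values prescribed by the disclosure.

```python
import cv2
import numpy as np

# map_x, map_y come from build_correspondence() above (assumed helper)
warped_ref = cv2.remap(reference_img, map_x.astype(np.float32), map_y.astype(np.float32),
                       interpolation=cv2.INTER_LINEAR)          # warp the reference image into the current view
prev_gray = cv2.cvtColor(warped_ref, cv2.COLOR_BGR2GRAY)
curr_gray = cv2.cvtColor(current_img, cv2.COLOR_BGR2GRAY)
flow = cv2.calcOpticalFlowFarneback(prev_gray, curr_gray, None,
                                    0.5, 3, 15, 3, 5, 1.2, 0)   # dense per-pixel flow (Δu, Δv)
```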
可选的,假定本公开所获得的待处理图像中的任一像素的光流信息表示为I of(Δu,Δv),那么,该像素的光流信息通常符合下述公式(12):
I t(u t,v t)+I of(Δu,Δv)=I t+1(u t+1,v t+1)       公式(12)
在上述公式(12)中,I t(u t,v t)表示参考图像中的一像素;I t+1(u t+1,v t+1)表示待处理图像中的相应位置处的像素。
可选的,Warp处理后的参考图像(如Warp处理后的前一视频帧)、待处理图像(如当前视频帧)以及计算获得的光流信息如图14所示。图14中的上图为Warp处理后的参考图像,图14中的中图为待处理图像,图14中的下图为待处理图像和参考图像之间的光流信息,即待处理图像相对于参考图 像的光流信息。图14中的竖线是为了方便细节对比而后添加的。
S120、根据深度信息和光流信息,获取待处理图像中的像素相对于参考图像的三维运动场。
在一个可选示例中,本公开在获得了深度信息和光流信息之后,可以根据深度信息和光流信息,获取待处理图像中的像素(如所有像素)相对于参考图像的三维运动场(可以简称为待处理图像中的像素的三维运动场)。本公开中的三维运动场可以认为是:由三维空间中的场景运动所形成的三维运动场。换而言之,待处理图像中的像素的三维运动场可以认为是:待处理图像中的像素在待处理图像和参考图像之间的三维空间位移。三维运动场可以使用场景流(Scene Flow)表示。
可选的,本公开可以使用下述公式(13)来获得待处理图像中的多个像素的场景流I sf(ΔX,ΔY,ΔZ):
ΔX=((u+Δu-c x)·(Z+ΔI depth)-(u-c x)·Z)/f x，ΔY=((v+Δv-c y)·(Z+ΔI depth)-(v-c y)·Z)/f y，ΔZ=ΔI depth        公式(13)
在上述公式(13)中,(ΔX,ΔY,ΔZ)表示待处理图像中的任一像素在三维坐标系的三个坐标轴方向上的位移;ΔI depth表示该像素的深度值,(Δu,Δv)表示该像素的光流信息,即该像素在待处理图像和参考图像之间的二维图像中的位移;f x表示摄像装置的水平方向(三维坐标系中的X轴方向)焦距;f y表示摄像装置的竖直方向(三维坐标系中的Y轴方向)焦距;c x,c y表示摄像装置的像主点坐标。
S130、根据三维运动场,确定待处理图像中的运动物体。
在一个可选示例中,本公开可以根据三维运动场,确定待处理图像中的物体在三维空间中的运动信息。物体在三维空间中的运动信息可以表示出该物体是否为运动物体。可选的,本公开可以先根据三维运动场,获取待处理图像中的像素在三维空间的运动信息;然后,根据像素在三维空间的运动信息,对像素进行聚类处理;最后,根据聚类处理的结果,确定待处理图像中的物体在三维空间的运动信息,以确定待处理图像中的运动物体。
在一个可选示例中,待处理图像中的像素在三维空间的运动信息可以包括但不限于:待处理图像中的多个像素(如所有像素)在三维空间的速度。这里的速度通常是为矢量形式,即本公开中的像素的速度可以体现出像素的速度大小和像素的速度方向。本公开通过借助三维运动场,可以便捷的获得待处理图像中的像素在三维空间的运动信息。
在一个可选示例中,本公开中的三维空间包括:基于三维坐标系的三维空间。其中的三维坐标系可以是:拍摄待处理图像的摄像装置的三维坐标系。该三维坐标系的Z轴通常是摄像装置的光轴,即深度方向。在摄像装置设置于车辆上的应用场景中的情况下,本公开的三维坐标系的X轴、Y轴、Z轴和原点的一个例子如图12所示。从图12的车辆自身角度而言(即面向车辆前方角度而言),X轴指向为水平向右,Y轴指向为车辆下方,Z轴指向为车辆前方,三维坐标系的原点位于摄像装置的光心位置。
在一个可选示例中,本公开可以根据三维运动场以及摄像装置拍摄待处理图像和参考图像之间的时间差Δt,计算待处理图像中的像素在待处理图像对应的摄像装置的三维坐标系的三个坐标轴方向上的速度。进一步的,本公开可以通过下述公式(14)获得速度:
v x=ΔX/Δt，v y=ΔY/Δt，v z=ΔZ/Δt        公式(14)
在上述公式(14)中,v x、v y和v z分别表示待处理图像中的任一像素在待处理图像对应的摄像装置的三维坐标系的三个坐标轴方向上的速度;(ΔX,ΔY,ΔZ)表示待处理图像中的该像素在待处理图像对应的摄像装置的三维坐标系的三个坐标轴方向上的位移;Δt表示摄像装置拍摄待处理图像和参考图像之间的时间差。
上述速度的速度大小|v|可以表示为下述公式(15)所示的形式:
|v|=√(v x²+v y²+v z²)        公式(15)
上述速度的速度方向e v可以表示为下述公式(16)所示的形式：
e v=(v x,v y,v z)/|v|        公式(16)
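Formulas (14) through (16) translate directly into array operations. The sketch below assumes scene_flow is an H×W×3 array of (ΔX, ΔY, ΔZ) and dt is the capture interval between the two images; the function name is an assumption.

```python
import numpy as np

def pixel_velocity(scene_flow, dt):
    """Per-pixel velocity, speed and direction from the 3D motion field (公式(14)-(16))."""
    v = scene_flow / dt                                   # 公式(14): (vx, vy, vz) = (ΔX, ΔY, ΔZ) / Δt
    speed = np.linalg.norm(v, axis=-1)                    # 公式(15): speed magnitude |v|
    direction = v / np.maximum(speed[..., None], 1e-6)    # 公式(16): unit direction vector
    return v, speed, direction
```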
在一个可选示例中,本公开可以先确定出待处理图像中的运动区域,并针对运动区域中的像素进行聚类处理。例如,根据运动区域中的像素在三维空间的运动信息,对运动区域中的像素进行聚类处理。再例如,根据运动区域中的像素在三维空间的运动信息以及像素在三维空间中的位置,对运动区域中的像素进行聚类处理。可选的,本公开可以利用运动掩膜确定出待处理图像中的运动区域。例如,本公开可以根据像素在三维空间的运动信息,获取待处理图像的运动掩膜(Motion Mask)。
可选的,本公开可以根据预设速度阈值,对待处理图像中的多个像素(如所有像素)的速度大小进行过滤处理,从而根据过滤处理的结果,形成待处理图像的运动掩膜。例如,本公开可以利用下述公式(17)获得待处理图像的运动掩膜:
I motion=1（若|v|≥v_thresh），否则I motion=0        公式(17)
在上述公式(17)中,I motion表示运动掩膜中的一个像素;如果该像素的速度大小|v|大于等于预设速度阈值v_thresh,则该像素的取值为1,表示该像素属于待处理图像中的运动区域;否则,该像素的取值为0,表示该像素不属于待处理图像中的运动区域。
可选的,本公开可以将运动掩膜中取值为1的像素所组成的区域称为运动区域,运动掩膜的大小与待处理图像的大小相同。因此,本公开可以根据运动掩膜中的运动区域确定出待处理图像中的运动区域。本公开中的运动掩膜的一个例子如图15所示。图15下图为待处理图像,图15上图是待处理图像的运动掩膜。上图中的黑色部分为非运动区域,上图中的灰色部分为运动区域。上图中的运动区域与下图中的运动物体基本相符。另外,随着获取深度信息、位姿变化信息以及计算光流信息的技术的提高,本公开确定待处理图像中的运动区域的精度也会随之提高。
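The motion mask of formula (17) is a plain threshold on the per-pixel speed; speed and v_thresh are the quantities defined above, and the 0/1 encoding matches the description.

```python
import numpy as np

motion_mask = (speed >= v_thresh).astype(np.uint8)   # 公式(17): 1 inside the moving region, 0 elsewhere
ys, xs = np.nonzero(motion_mask)                     # pixel coordinates belonging to the moving region
```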
在一个可选示例中,本公开在根据运动区域中的像素的三维空间位置信息和运动信息,进行聚类处理时,可以先对运动区域中的像素的三维空间位置信息和运动信息分别进行标准化处理,从而使运 动区域中的像素的三维空间坐标值转化到预定坐标区间(如[0,1])中,并使运动区域中的像素的速度转化到预定速度区间(如[0,1])中。之后,在利用转化后的三维空间坐标值和速度,进行密度聚类处理,从而获得至少一个类簇。
可选的,本公开中的标准化处理包括但不限于:min-max(最小-最大)标准化处理、以及Z-score(分值)标准化处理等。
例如,对运动区域中的像素的三维空间位置信息进行min-max标准化处理可以通过下述公式(18)表示,对运动区域中的像素的运动信息进行min-max标准化处理可以通过下述公式(19)表示:
X*=(X-X min)/(X max-X min)，Y*=(Y-Y min)/(Y max-Y min)，Z*=(Z-Z min)/(Z max-Z min)        公式(18)
在上述公式(18)中,(X,Y,Z)表示待处理图像中的运动区域中的一像素的三维空间位置信息;(X *,Y *,Z *)表示该像素的标准化处理后的像素的三维空间位置信息;(X min,Y min,Z min)表示运动区域中的所有像素的三维空间位置信息中的最小X坐标、最小Y坐标和最小Z坐标;(X max,Y max,Z max)表示运动区域中的所有像素的三维空间位置信息中的最大X坐标、最大Y坐标和最大Z坐标。
v x*=(v x-v xmin)/(v xmax-v xmin)，v y*=(v y-v ymin)/(v ymax-v ymin)，v z*=(v z-v zmin)/(v zmax-v zmin)        公式(19)
在上述公式(19)中，(v x,v y,v z)表示运动区域中的像素在三维空间中的三个坐标轴方向的速度；(v x*,v y*,v z*)表示对(v x,v y,v z)进行min-max标准化处理后的速度；(v xmin,v ymin,v zmin)表示运动区域中的所有像素在三维空间中的三个坐标轴方向的最小速度；(v xmax,v ymax,v zmax)表示运动区域中的所有像素在三维空间中的三个坐标轴方向的最大速度。
在一个可选示例中,本公开在聚类处理时所采用的聚类算法包括但不限于:密度聚类算法。例如,DBSCAN(Density-Based Spatial Clustering of Applications with Noise,具有噪声的基于密度的聚类方法)等。通过聚类获得的每一个类簇对应一个运动物体实例,即每一个类簇均可以被作为待处理图像中的一个运动物体。
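A sketch of the normalization and density clustering described above, using scikit-learn's DBSCAN, is shown below. The 6-dimensional feature (normalized position plus normalized velocity), the eps and min_samples values, and the function name cluster_moving_pixels are assumptions; the disclosure only requires min-max normalization followed by density clustering.

```python
import numpy as np
from sklearn.cluster import DBSCAN

def cluster_moving_pixels(xyz, vel, motion_mask, eps=0.05, min_samples=30):
    """Normalize position/velocity of moving pixels and run density clustering (公式(18)(19) + DBSCAN)."""
    ys, xs = np.nonzero(motion_mask)
    feats = np.concatenate([xyz[ys, xs], vel[ys, xs]], axis=1)   # (N, 6): X, Y, Z, vx, vy, vz
    fmin, fmax = feats.min(axis=0), feats.max(axis=0)
    feats = (feats - fmin) / np.maximum(fmax - fmin, 1e-6)       # min-max normalization to [0, 1]
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(feats)   # label -1 marks noise
    return ys, xs, labels                                        # one label per moving pixel
```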
在一个可选示例中,针对任一个类簇而言,本公开可以根据该类簇中的多个像素(如所有像素)的速度大小和速度方向,确定出该类簇所对应的运动物体实例的速度大小和速度方向。可选的,本公开可以用该类簇中的所有像素的平均速度大小以及平均方向,来表示该类簇所对应的运动物体实例的速度大小和方向。例如,本公开可使用下述公式(20)来表示一类簇所对应的运动物体实例的速度大小和方向:
|v o|=(1/n)·Σ|v i|，e o=(1/n)·Σe i        公式(20)
在上述公式(20)中，|v o|表示聚类处理获得的一类簇所对应的运动物体实例的速度大小；|v i|表示该类簇中的第i个像素的速度大小；n表示该类簇所包含的像素的数量；e o表示一类簇所对应的运动物体实例的速度方向；e i表示该类簇中的第i个像素的速度方向。
在一个可选示例中,本公开还可以根据属于同一个类簇的多个像素(如所有像素)在二维图像中的位置信息(即在待处理图像中的二维坐标),确定该类簇所对应的运动物体实例在待处理图像中的运动物体检测框(Bounding-Box)。例如,对于一个类簇而言,本公开可以计算该类簇中的所有像素在待处理图像中的最大列坐标u max以及最小列坐标u min,并计算该类簇中的所有像素的最大的行坐标v max以及最小行坐标v min(注:假定图像坐标系的原点位于图像的左上角)。本公开所获得的运动物体检测框在待处理图像中的坐标可以表示为(u min,v min,u max,v max)。
可选的,本公开确定出的待处理图像中的运动物体检测框的一个例子如图16中的下图所示。如果在运动掩膜中体现出运动物体检测框,则如图16中的上图所示。图16的上图和下图中的多个矩形框均为本公开获得的运动物体检测框。
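For each cluster returned by the clustering step, the detection box and the instance velocity of formula (20) can be computed as below. Here us, vs, speed and direction are assumed to be arrays over the moving-region pixels (for example us = xs, vs = ys, with the per-pixel speed and direction sampled at those pixels), which is an assumed packaging of the quantities above.

```python
import numpy as np

def cluster_boxes_and_velocity(us, vs, labels, speed, direction):
    """Per-cluster box (u_min, v_min, u_max, v_max) and mean speed/direction (公式(20))."""
    results = {}
    for k in set(labels) - {-1}:                      # label -1 is DBSCAN noise, skip it
        idx = labels == k
        box = (us[idx].min(), vs[idx].min(), us[idx].max(), vs[idx].max())
        results[k] = (box, speed[idx].mean(), direction[idx].mean(axis=0))
    return results
```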
在一个可选示例中,本公开也可以根据属于同一个类簇的多个像素在三维空间中的位置信息,确定运动物体在三维空间中的位置信息。运动物体在三维空间中的位置信息包括但不限于:运动物体在水平方向坐标轴(X坐标轴)上的坐标、运动物体在深度方向坐标轴(Z坐标轴)上的坐标以及运动物体在竖直方向上的高度(即运动物体的高度)等。
可选的,本公开可以先根据属于同一个类簇的所有像素在三维空间中的位置信息,确定一该类簇中的所有像素与摄像装置之间的距离,然后,将距离最近的像素在三维空间中的位置信息,作为运动物体在三维空间中的位置信息。
可选的,本公开可以采用下述公式(21)来计算一个类簇中的多个像素与摄像装置之间的距离,并选取出最小距离:
d min=min i √(X i²+Z i²)        公式(21)
在上述公式(21)中,d min表示最小距离;X i表示一个类簇中的第i个像素的X坐标;Z i表示一个类簇中的第i个像素的Z坐标。
在确定出最小距离后,可以将具有该最小距离的像素的X坐标和Z坐标作为该运动物体在三维空间中的位置信息,如下述公式(22)所示:
O X=X close
O Z=Z close         公式(22)
在上述公式(22)中,O X表示运动物体在水平方向坐标轴上的坐标,即运动物体的X坐标;O Z表示运动物体在深度方向坐标轴(Z坐标轴)上的坐标,即运动物体的Z坐标;X close表示上述计算出具有最小距离的像素的X坐标;Z close表示上述计算出具有最小距离的像素的Z坐标。
可选的,本公开可以采用下述公式(23)来计算运动物体的高度:
O H=Y max-Y min                 公式(23)
在上述公式(23)中,O H表示运动物体在三维空间中的高度;Y max表示一类簇中的所有像素在三维空间中的最大Y坐标;Y min表示一类簇中的所有像素在三维空间中的最小Y坐标。
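Formulas (21) through (23) reduce to a few lines of array code; X, Y, Z below are the 3D coordinates of the pixels of one cluster, and the function name is an assumption.

```python
import numpy as np

def object_position_and_height(X, Y, Z):
    """Nearest-point position and vertical height for one cluster (公式(21)-(23))."""
    d = np.sqrt(X ** 2 + Z ** 2)          # 公式(21): ground-plane distance of each pixel to the camera
    i = np.argmin(d)
    O_X, O_Z = X[i], Z[i]                 # 公式(22): coordinates of the closest pixel
    O_H = Y.max() - Y.min()               # 公式(23): object height along the vertical axis
    return O_X, O_Z, O_H
```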
本公开训练卷积神经网络的一个实施方式的流程,如图17所示。
S1700、将双目图像样本中的一目图像样本输入至待训练的卷积神经网络中。
可选的,本公开输入卷积神经网络中的图像样本可以始终为双目图像样本的左目图像样本,也可以始终为双目图像样本的右目图像样本。在输入卷积神经网络中的图像样本始终为双目图像样本的左目图像样本的情况下,成功训练后的卷积神经网络,在测试或者实际应用场景中,会将输入的待处理图像作为待处理左目图像。在输入卷积神经网络中的图像样本始终为双目图像样本的右目图像样本的情况下,成功训练后的卷积神经网络,在测试或者实际应用场景中,会将输入的待处理图像作为待处理右目图像。
S1710、经由卷积神经网络进行视差分析处理,基于该卷积神经网络的输出,获得左目图像样本的视差图和右目图像样本的视差图。
S1720、根据左目图像样本及右目图像样本的视差图重建右目图像。
可选的,本公开重建右目图像的方式包括但不限于:对左目图像样本以及右目图像样本的视差图进行重投影计算,从而获得重建的右目图像。
S1730、根据右目图像样本及左目图像样本的视差图重建左目图像。
可选的,本公开重建左目图像的方式包括但不限于:对右目图像样本以及左目图像样本的视差图进行重投影计算,从而获得重建的左目图像。
S1740、根据重建的左目图像和左目图像样本之间的差异、以及重建的右目图像和右目图像样本之间的差异,调整卷积神经网络的网络参数。
可选的,本公开在确定差异时,所采用的损失函数包括但不限于:L1损失函数、smooth损失函数以及lr-Consistency损失函数等。另外,本公开在将计算出的损失反向传播,以调整卷积神经网络的网络参数(如卷积核的权值)时,可以基于卷积神经网络的链式求导所计算出的梯度,来反向传播损失,从而有利于提高卷积神经网络的训练效率。
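A hedged sketch of the reconstruction loss used in the training flow of FIG. 17 is given below, showing only the L1 term; warp_with_disparity stands for the reprojection/reconstruction step and is an assumed helper, and the sign convention of the disparity warp depends on which view is fed to the network. The smooth and lr-consistency terms mentioned above would be added to this loss.

```python
import torch
import torch.nn.functional as F

def reconstruction_loss(left, right, disp_left, disp_right):
    """L1 photometric loss between reconstructed and real views (training sketch)."""
    right_rec = warp_with_disparity(left, disp_right)    # reconstruct the right view from the left sample
    left_rec = warp_with_disparity(right, -disp_left)    # reconstruct the left view from the right sample
    return F.l1_loss(left_rec, left) + F.l1_loss(right_rec, right)
```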
在一个可选示例中,在针对卷积神经网络的训练达到预定迭代条件时,本次训练过程结束。本公开中的预定迭代条件可以包括:基于卷积神经网络输出的视差图而重建的左目图像与左目图像样本之间的差异、以及基于卷积神经网络输出的视差图而重建的右目图像和右目图像样本之间的差异,满足预定差异要求。在该差异满足要求的情况下,本次对卷积神经网络成功训练完成。本公开中的预定迭代条件也可以包括:对卷积神经网络进行训练,所使用的双目图像样本的数量达到预定数量要求等。在使用的双目图像样本的数量达到预定数量要求,然而,基于卷积神经网络输出的视差图而重建的左目图像与左目图像样本之间的差异、以及基于卷积神经网络输出的视差图而重建的右目图像和右目图像样本之间的差异,并未满足预定差异要求情况下,本次对卷积神经网络并未训练成功。
图18为本公开的智能驾驶控制方法的一个实施例的流程图。本公开的智能驾驶控制方法可以适用但不限于:自动驾驶(如完全无人辅助的自动驾驶)环境或辅助驾驶环境中。
S1800、通过车辆上设置的摄像装置获取车辆所在路面的视频流。该摄像装置包括但不限于:基于RGB的摄像装置等。
S1810、对视频流包括的至少一视频帧进行运动物体检测,获得视频帧中的运动物体,例如,获得视频帧中的物体在三维空间的运动信息。本步骤的具体实现过程可参见上述方法实施方式中针对图1的描述,在此不再详细说明。
S1820、根据视频帧中的运动物体生成并输出车辆的控制指令。例如,根据视频帧中的物体在三维空间的运动信息生成并输出车辆的控制指令,以控制车辆。
可选的,本公开生成的控制指令包括但不限于:速度保持控制指令、速度调整控制指令(如减速行驶指令、加速行驶指令等)、方向保持控制指令、方向调整控制指令(如左转向指令、右转向指令、向左侧车道并线指令、或者向右侧车道并线指令等)、鸣笛指令、预警提示控制指令或者驾驶模式切换控制指令(如切换为自动巡航驾驶模式等)。
需要特别说明的是,本公开的运动物体检测技术除了可以适用于智能驾驶控制领域之外,还可以应用在其他领域中;例如,可以实现工业制造中的运动物体检测、超市等室内领域的运动物体检测以及安防领域中的运动物体检测等等,本公开不限制运动物体检测技术的适用场景。
本公开提供的运动物体检测装置如图19所示。图19所示的装置包括:第一获取模块1900、第二获取模块1910、第三获取模块1920以及确定运动物体模块1930。可选的,该装置还可以包括:训练模块。
第一获取模块1900用于获取待处理图像中的像素的深度信息。可选的,第一获取模块1900可包括:第一子模块和第二子模块。第一子模块用于获取待处理图像的第一视差图。第二子模块用于根据待处理图像的第一视差图,获取待处理图像中的像素的深度信息。可选的,本公开中的待处理图像包括:单目图像。第一子模块包括:第一单元、第二单元和第三单元。其中的第一单元用于将待处理图像输入至卷积神经网络中,经由卷积神经网络进行视差分析处理,基于卷积神经网络的输出,获得待处理图像的第一视差图。其中,所述卷积神经网络是训练模块利用双目图像样本,训练获得的。其中的第二单元用于获取待处理图像的第一水平镜像图的第二视差图的第二水平镜像图,待处理图像的第一水平镜像图是对待处理图像进行水平方向上的镜像处理所形成的镜像图,第二视差图的第二水平镜像图是对第二视差图进行水平方向上的镜像处理所形成的镜像图。其中的第三单元用于根据待处理图像的第一视差图的权重分布图以及第二视差图的第二水平镜像图的权重分布图,对待处理图像的第一视差图进行视差调整,最终获得待处理图像的第一视差图。
可选的,第二单元可以将待处理图像的第一水平镜像图输入至卷积神经网络中,经由卷积神经网络进行视差分析处理,基于神经网络的输出,获得待处理图像的第一水平镜像图的第二视差图;第二单元对待处理图像的第一水平镜像图的第二视差图进行镜像处理,获得待处理图像的第一水平镜像图的第二视差图的第二水平镜像图。
可选的,本公开中的权重分布图包括:第一权重分布图以及第二权重分布图中的至少一个;第一权重分布图是针对多个待处理图像统一设置的权重分布图;第二权重分布图是针对不同待处理图像分别设置的权重分布图。第一权重分布图包括至少两个左右分列的区域,不同区域具有不同的权重值。
在待处理图像被作为左目图像的情况下:对于待处理图像的第一视差图的第一权重分布图中的任意两个区域而言,位于右侧的区域的权重值大于位于左侧的区域的权重值;对于第二视差图的第二水平镜像图的第一权重分布图中的任意两个区域而言,位于右侧的区域的权重值大于位于左侧的区域的权重值。对于待处理图像的第一视差图的第一权重分布图中的至少一区域而言,该区域中左侧部分的权重值不大于该区域中右侧部分的权重值;对于第二视差图的第二水平镜像图的第一权重分布图中的至少一区域而言,该区域中左侧部分的权重值不大于该区域中右侧部分的权重值。
在待处理图像被作为右目图像的情况下:对于待处理图像的第一视差图的第一权重分布图中的任意两个区域而言,位于左侧的区域的权重值大于位于右侧的区域的权重值;对于第二视差图的第二水平镜像图的第一权重分布图中的任意两个区域而言,位于左侧的区域的权重值大于位于右侧的区域的权重值。对于待处理图像的第一视差图的第一权重分布图中的至少一区域而言,该区域中右侧部分的权重值不大于该区域中左侧部分的权重值;对于第二视差图的第二水平镜像图的第一权重分布图中的至少一区域而言,该区域中右侧部分的权重值不大于该区域中左侧部分的权重值。
可选的,第三单元还用于设置待处理图像的第一视差图的第二权重分布图,例如,第三单元对待处理图像的第一视差图进行水平镜像处理,形成镜像视差图;对于镜像视差图中的任一像素点而言,如果该像素点的视差值大于该像素点对应的第一变量,则将待处理图像的第二权重分布图中的该像素点的权重值设置为第一值,否则,设置为第二值;其中,第一值大于第二值。其中,像素点对应的第一变量是根据待处理图像的第一视差图中的该像素点的视差值以及大于零的常数值,设置的。
可选的,第三单元还用于设置第二视差图的第二水平镜像图的第二权重分布图,例如,对于第二视差图的第二水平镜像图中的任一像素点而言,如果待处理图像的第一视差图中的该像素点的视差值大于该像素点对应的第二变量,则第三单元将第二视差图的第二水平镜像图的第二权重分布图中的该像素点的权重值设置为第一值,否则,第三单元将其设置为第二值;其中,第一值大于第二值。其中,像素点对应的第二变量是根据待处理图像的第一视差图的水平镜像图中的相应像素点的视差值以及大于零的常数值,设置的。
可选的,第三单元可以进一步用于:首先,根据待处理图像的第一视差图的第一权重分布图和第二权重分布图,调整待处理图像的第一视差图中的视差值;之后,第三单元根据第二视差图的第二水平镜像图的第一权重分布图和第二权重分布图,调整第二视差图的第二水平镜像图中的视差值;最后,第三单元合并视差值调整后的第一视差图和视差值调整后的第二水平镜像图,最终获得待处理图像的第一视差图。第一获取模块1900以及其包括的各子模块和单元具体执行的操作,可以参见上述针对S100的描述,在此不再详细说明。
第二获取模块1910用于获取待处理图像和参考图像之间的光流信息。其中的参考图像和待处理 图像是基于摄像装置的连续拍摄而获得的具有时序关系的两幅图像。例如,待处理图像为摄像装置拍摄的视频中的一视频帧,待处理图像的参考图像包括:视频帧的前一视频帧。
可选的,第二获取模块1910可以包括:第三子模块、第四子模块、第五子模块和第六子模块。其中的第三子模块用于获取摄像装置拍摄待处理图像和参考图像的位姿变化信息;第四子模块用于根据位姿变化信息,建立待处理图像中的像素的像素值与参考图像中的像素的像素值之间的对应关系;第五子模块,用于根据上述对应关系,对参考图像进行变换处理;第六子模块,用于根据待处理图像和变换处理后的参考图像,计算待处理图像和参考图像之间的光流信息。其中的第四子模块可以先根据深度信息和摄像装置的预设参数,获取待处理图像中的像素在待处理图像对应的摄像装置的三维坐标系中的第一坐标;之后,第四子模块可以根据位姿变化信息,将第一坐标转换到所述参考图像对应的摄像装置的三维坐标系中的第二坐标;之后,基于二维图像的二维坐标系,第四子模块对第二坐标进行投影处理,获得待处理图像的投影二维坐标;最后,第四子模块根据待处理图像的投影二维坐标和参考图像的二维坐标,建立待处理图像中的像素的像素值与参考图像中的像素的像素值之间的对应关系。第二获取模块1910以及其包括的各子模块和单元具体执行的操作,可参见上述针对S110的描述,在此不再详细说明。
第三获取模块1920用于根据深度信息和光流信息,获取待处理图像中的像素相对于参考图像的三维运动场。第三获取模块1920具体执行的操作,可以参见上述针对S120的描述,在此不再详细说明。
确定运动物体模块1930用于根据三维运动场,确定待处理图像中的运动物体。可选的,确定运动物体模块可以包括:第七子模块、第八子模块和第九子模块。第七子模块用于根据三维运动场,获取待处理图像中的像素在三维空间的运动信息。例如,第七子模块可以根据三维运动场以及拍摄待处理图像和参考图像之间的时间差,计算待处理图像中的像素在待处理图像对应的摄像装置的三维坐标系的三个坐标轴方向上的速度。第八子模块用于根据像素在三维空间的运动信息对像素进行聚类处理。例如,第八子模块包括:第四单元、第五单元和第六单元。第四单元用于根据像素在三维空间的运动信息,获取待处理图像的运动掩膜。其中的像素在三维空间的运动信息包括:像素在三维空间的速度大小,第四单元可以根据预设速度阈值,对待处理图像中的像素的速度大小进行过滤处理,形成待处理图像的运动掩膜。第五单元用于根据运动掩膜,确定待处理图像中的运动区域。第六单元用于根据运动区域中的像素的三维空间位置信息和运动信息,对运动区域中的像素进行聚类处理。例如,第六单元可以将运动区域中的像素的三维空间坐标值转化到预定坐标区间;之后,第六单元将运动区域中的像素的速度转化到预定速度区间;最后,第六单元根据转化后的三维空间坐标值和转化后的速度,对运动区域中的像素进行密度聚类处理,获得至少一个类簇。第九子模块用于根据聚类处理的结果,确定待处理图像中的运动物体。例如,针对任一个类簇,第九子模块可以根据该类簇中的多个像素的速度大小和速度方向,确定运动物体的速度大小和速度方向;其中,一个类簇被作为待处理图像中的一个运动物体。第九子模块还用于:根据属于同一个类簇的像素的空间位置信息,确定待处理图像中的运动物体检测框。确定运动物体模块1930以及其包括的各子模块和单元具体执行的操作,可以参见上述针对S130的描述,在此不再详细说明。
训练模块用于将双目图像样本中的其中一目图像样本输入至待训练的卷积神经网络中,经由卷积神经网络进行视差分析处理,基于卷积神经网络的输出,训练模块获得左目图像样本的视差图和右目图像样本的视差图;训练模块根据左目图像样本及右目图像样本的视差图重建右目图像;训练模块根据右目图像样本及左目图像样本的视差图重建左目图像;训练模块根据重建的左目图像和左目图像样本之间的差异、以及重建的右目图像和右目图像样本之间的差异,调整卷积神经网络的网络参数。训练模块执行的具体操作可以参见上述针对图17的描述,在此不再详细说明。
本公开提供的智能驾驶控制装置如图20所示。图20所示的装置包括:第四获取模块2000、运动物体检测装置2010以及控制模块2020。其中的第四获取模块2000用于通过车辆上设置的摄像装置获取车辆所在路面的视频流。运动物体检测装置2010用于对视频流包括的至少一视频帧进行运动物体检测,确定该视频帧中的运动物体。运动物体检测装置2010的结构以及各模块、子模块和单元具体执行的操作可以参见上述图19的描述,在此不再详细说明。控制模块2020用于根据运动物体生成并输出车辆的控制指令。控制模块2020生成并输出的控制指令包括但不限于:速度保持控制指令、速度调整控制指令、方向保持控制指令、方向调整控制指令、预警提示控制指令、驾驶模式切换控制指令。
示例性设备
图21示出了适于实现本公开的示例性设备2100,设备2100可以是汽车中配置的控制系统/电子 系统、移动终端(例如,智能移动电话等)、个人计算机(PC,例如,台式计算机或者笔记型计算机等)、平板电脑以及服务器等。图21中,设备2100包括一个或者多个处理器、通信部等,所述一个或者多个处理器可以为:一个或者多个中央处理单元(CPU)2101,和/或,一个或者多个利用神经网络进行视觉跟踪的图像处理器(GPU)2113等,处理器可以根据存储在只读存储器(ROM)2102中的可执行指令或者从存储部分2108加载到随机访问存储器(RAM)2103中的可执行指令而执行各种适当的动作和处理。通信部2112可以包括但不限于网卡,所述网卡可以包括但不限于IB(Infiniband)网卡。处理器可与只读存储器2102和/或随机访问存储器2103中通信以执行可执行指令,通过总线2104与通信部2112相连、并经通信部2112与其他目标设备通信,从而完成本公开中的相应步骤。
上述各指令所执行的操作可以参见上述方法实施例中的相关描述,在此不再详细说明。此外,在RAM 2103中,还可以存储有装置操作所需的各种程序以及数据。CPU2101、ROM2102以及RAM2103通过总线2104彼此相连。
在有RAM2103的情况下,ROM2102为可选模块。RAM2103存储可执行指令,或在运行时向ROM2102中写入可执行指令,可执行指令使中央处理单元2101执行上述运动物体检测方法或者智能驾驶控制方法所包括的步骤。输入/输出(I/O)接口2105也连接至总线2104。通信部2112可以集成设置,也可以设置为具有多个子模块(例如,多个IB网卡),并分别与总线连接。
以下部件连接至I/O接口2105:包括键盘、鼠标等的输入部分2106;包括诸如阴极射线管(CRT)、液晶显示器(LCD)等以及扬声器等的输出部分2107;包括硬盘等的存储部分2108;以及包括诸如LAN卡、调制解调器等的网络接口卡的通信部分2109。通信部分2109经由诸如因特网的网络执行通信处理。驱动器2110也根据需要连接至I/O接口2105。可拆卸介质2111,诸如磁盘、光盘、磁光盘、半导体存储器等等,根据需要安装在驱动器2110上,以便于从其上读出的计算机程序根据需要被安装在存储部分2108中。
需要特别说明的是，如图21所示的架构仅为一种可选实现方式，在具体实践过程中，可根据实际需要对上述图21的部件数量和类型进行选择、删减、增加或替换；在不同功能部件设置上，也可采用分离设置或集成设置等实现方式，例如，GPU2113和CPU2101可分离设置，再如，可将GPU2113集成在CPU2101上，通信部可分离设置，也可集成设置在CPU2101或GPU2113上等。这些可替换的实施方式均落入本公开的保护范围。
特别地,根据本公开的实施方式,下文参考流程图描述的过程可以被实现为计算机软件程序,例如,本公开实施方式包括一种计算机程序产品,其包含有形地包含在机器可读介质上的计算机程序,计算机程序包含用于执行流程图所示的步骤的程序代码,程序代码可包括对应执行本公开提供的方法中的步骤对应的指令。
在这样的实施方式中,该计算机程序可以通过通信部分2109从网络上被下载及安装,和/或从可拆卸介质2111被安装。在该计算机程序被中央处理单元(CPU)2101执行时,执行本公开中记载的实现上述相应步骤的指令。
在一个或多个可选实施方式中，本公开实施例还提供了一种计算机程序产品，用于存储计算机可读指令，所述指令被执行时使得计算机执行上述任意实施例中所述的运动物体检测方法或者智能驾驶控制方法。该计算机程序产品可以具体通过硬件、软件或其结合的方式实现。在一个可选例子中，所述计算机程序产品具体体现为计算机存储介质，在另一个可选例子中，所述计算机程序产品具体体现为软件产品，例如软件开发包（Software Development Kit，SDK）等等。
在一个或多个可选实施方式中,本公开实施例还提供了另一种运动物体检测方法或者智能驾驶控制方法及其对应的装置和电子设备、计算机存储介质、计算机程序以及计算机程序产品,其中的方法包括:第一装置向第二装置发送运动物体检测指示或者智能驾驶控制指示,该指示使得第二装置执行上述任一可能的实施例中的运动物体检测方法或者智能驾驶控制方法;第一装置接收第二装置发送的运动物体检测结果或者智能驾驶控制结果。
在一些实施例中，该运动物体检测指示或者智能驾驶控制指示可以具体为调用指令，第一装置可以通过调用的方式指示第二装置执行运动物体检测操作或者智能驾驶控制操作，相应地，响应于接收到调用指令，第二装置可以执行上述运动物体检测方法或者智能驾驶控制方法中的任意实施例中的步骤和/或流程。
应理解,本公开实施例中的“第一”、“第二”等术语仅仅是为了区分,而不应理解成对本公开实施例的限定。还应理解,在本公开中,“多个”可以指两个或两个以上,“至少一个”可以指一个、两个或两个以上。还应理解,对于本公开中提及的任一部件、数据或结构,在没有明确限定或者在前后文给出相反启示的情况下,一般可以理解为一个或多个。还应理解,本公开对各个实施例的描述着 重强调各个实施例之间的不同之处,其相同或相似之处可以相互参考,为了简洁,不再一一赘述。
可能以许多方式来实现本公开的方法和装置、电子设备以及计算机可读存储介质。例如,可通过软件、硬件、固件或者软件、硬件、固件的任何组合来实现本公开的方法和装置、电子设备以及计算机可读存储介质。用于方法的步骤的上述顺序仅是为了进行说明,本公开的方法的步骤不限于以上具体描述的顺序,除非以其它方式特别说明。此外,在一些实施方式中,还可将本公开实施为记录在记录介质中的程序,这些程序包括用于实现根据本公开的方法的机器可读指令。因而,本公开还覆盖存储用于执行根据本公开的方法的程序的记录介质。
本公开的描述,是为了示例和描述起见而给出的,而并不是无遗漏的或者将本公开限于所公开的形式。很多修改和变化对于本领域的普通技术人员而言,是显然的。选择和描述实施方式是为了更好说明本公开的原理以及实际应用,并且使本领域的普通技术人员能够理解本公开实施例可以从而设计适于特定用途的带有各种修改的各种实施方式。

Claims (61)

  1. 一种运动物体检测方法,其特征在于,包括:
    获取待处理图像中的像素的深度信息;
    获取所述待处理图像和参考图像之间的光流信息;其中,所述参考图像和所述待处理图像是基于摄像装置的连续拍摄而获得的具有时序关系的两幅图像;
    根据所述深度信息和光流信息,获取所述待处理图像中的像素相对于所述参考图像的三维运动场;
    根据所述三维运动场,确定所述待处理图像中的运动物体。
  2. 根据权利要求1所述的方法,其特征在于,所述待处理图像为所述摄像装置拍摄的视频中的一视频帧,所述待处理图像的参考图像包括:所述视频帧的前一视频帧。
  3. 根据权利要求1或2所述的方法,其特征在于,所述获取待处理图像中的像素的深度信息,包括:
    获取待处理图像的第一视差图;
    根据所述第一视差图,获取所述待处理图像中的像素的深度信息。
  4. 根据权利要求3所述的方法,其特征在于,所述待处理图像包括:单目图像,所述获取待处理图像的第一视差图,包括:
    将待处理图像输入至卷积神经网络中,经由所述卷积神经网络进行视差分析处理,基于所述卷积神经网络的输出,获得所述待处理图像的第一视差图;
    其中,所述卷积神经网络是利用双目图像样本,训练获得的。
  5. 根据权利要求4所述的方法,其特征在于,所述获取待处理图像的第一视差图,还包括:
    获取所述待处理图像的第一水平镜像图的第二视差图的第二水平镜像图,所述待处理图像的第一水平镜像图是对所述待处理图像进行水平方向上的镜像处理所形成的镜像图,所述第二视差图的第二水平镜像图是对所述第二视差图进行水平方向上的镜像处理所形成的镜像图;
    根据所述第一视差图的权重分布图、以及所述第二水平镜像图的权重分布图,对所述第一视差图进行视差调整,最终获得所述待处理图像的第一视差图。
  6. 根据权利要求5所述的方法,其特征在于,所述获取所述待处理图像的第一水平镜像图的第二视差图的第二水平镜像图,包括:
    将待处理图像的第一水平镜像图输入至卷积神经网络中,经由所述卷积神经网络进行视差分析处理,基于所述神经网络的输出,获得所述待处理图像的第一水平镜像图的第二视差图;
    对所述第二视差图进行镜像处理,获得所述第二水平镜像图。
  7. 根据权利要求5或6所述的方法,其特征在于,所述权重分布图包括:第一权重分布图以及第二权重分布图中的至少一个;
    所述第一权重分布图是针对多个待处理图像统一设置的权重分布图;
    所述第二权重分布图是针对不同待处理图像分别设置的权重分布图。
  8. 根据权利要求7所述的方法,其特征在于,所述第一权重分布图包括至少两个左右分列的区域,不同区域具有不同的权重值。
  9. 根据权利要求7或8所述的方法,其特征在于,在所述待处理图像被作为左目图像的情况下:
    对于所述第一视差图的第一权重分布图中的任意两个区域而言,位于右侧的区域的权重值大于位于左侧的区域的权重值;
    对于所述第二水平镜像图的第一权重分布图中的任意两个区域而言,位于右侧的区域的权重值大于位于左侧的区域的权重值。
  10. 根据权利要求9所述的方法,其特征在于:
    对于所述第一视差图的第一权重分布图中的至少一区域而言,该区域中左侧部分的权重值不大于该区域中右侧部分的权重值;
    对于所述第二水平镜像图的第一权重分布图中的至少一区域而言,该区域中左侧部分的权重值不大于该区域中右侧部分的权重值。
  11. 根据权利要求7或8所述的方法,其特征在于,在所述待处理图像被作为右目图像的情况下:
    对于所述第一视差图的第一权重分布图中的任意两个区域而言,位于左侧的区域的权重值大于位于右侧的区域的权重值;
    对于所述第二水平镜像图的第一权重分布图中的任意两个区域而言,位于左侧的区域的权重值大 于位于右侧的区域的权重值。
  12. 根据权利要求11所述的方法,其特征在于:
    对于所述第一视差图的第一权重分布图中的至少一区域而言,该区域中右侧部分的权重值不大于该区域中左侧部分的权重值;
    对于所述第二水平镜像图的第一权重分布图中的至少一区域而言,该区域中右侧部分的权重值不大于该区域中左侧部分的权重值。
  13. 根据权利要求7至12中任一项所述的方法,其特征在于,所述第一视差图的第二权重分布图的设置方式包括:
    对所述第一视差图进行水平镜像处理,形成镜像视差图;
    对于所述镜像视差图中的任一像素点而言,如果该像素点的视差值大于该像素点对应的第一变量,则将所述第一视差图的第二权重分布图中的该像素点的权重值设置为第一值,否则,设置为第二值;
    其中,所述第一值大于第二值。
  14. 根据权利要求13所述的方法,其特征在于,所述像素点对应的第一变量是根据所述第一视差图中的该像素点的视差值以及大于零的常数值,设置的。
  15. 根据权利要求7至14中任一项所述的方法,其特征在于,所述第二水平镜像图的第二权重分布图的设置方式包括:
    对于所述第二水平镜像图中的任一像素点而言,如果所述第一视差图中的该像素点的视差值大于该像素点对应的第二变量,则将所述第二水平镜像图的第二权重分布图中的该像素点的权重值设置为第一值,否则,设置为第二值;
    其中,所述第一值大于第二值。
  16. 根据权利要求15所述的方法,其特征在于,所述像素点对应的第二变量是根据所述第一视差图的水平镜像图中的相应像素点的视差值以及大于零的常数值,设置的。
  17. 根据权利要求7至16所述的方法,其特征在于,所述根据所述第一视差图的权重分布图、以及所述第二水平镜像图的权重分布图,对所述第一视差图进行视差调整,包括:
    根据所述第一视差图的第一权重分布图和第二权重分布图,调整所述第一视差图中的视差值;
    根据所述第二水平镜像图的第一权重分布图和第二权重分布图,调整所述第二水平镜像图中的视差值;
    合并视差值调整后的第一视差图和视差值调整后的第二水平镜像图,最终获得所述待处理图像的第一视差图。
  18. 根据权利要求4至17中任一项所述的方法,其特征在于,所述卷积神经网络的训练过程,包括:
    将双目图像样本中的其中一目图像样本输入至待训练的卷积神经网络中,经由所述卷积神经网络进行视差分析处理,基于所述卷积神经网络的输出,获得左目图像样本的视差图和右目图像样本的视差图;
    根据所述左目图像样本及所述右目图像样本的视差图重建右目图像;
    根据所述右目图像样本及所述左目图像样本的视差图重建左目图像;
    根据重建的左目图像和左目图像样本之间的差异、以及重建的右目图像和右目图像样本之间的差异,调整所述卷积神经网络的网络参数。
  19. 根据权利要求1至18中任一项所述的方法,其特征在于,所述获取所述待处理图像和参考图像之间的光流信息,包括:
    获取摄像装置拍摄所述待处理图像和所述参考图像的位姿变化信息;
    根据所述位姿变化信息,建立所述待处理图像中的像素的像素值与所述参考图像中的像素的像素值之间的对应关系;
    根据所述对应关系,对参考图像进行变换处理;
    根据所述待处理图像和所述变换处理后的参考图像,计算所述待处理图像和参考图像之间的光流信息。
  20. 根据权利要求19所述的方法,其特征在于,所述根据所述位姿变化信息,建立所述待处理图像中的像素的像素值与所述参考图像中的像素的像素值之间的对应关系,包括:
    根据所述深度信息和摄像装置的预设参数,获取所述待处理图像中的像素在待处理图像对应的摄像装置的三维坐标系中的第一坐标;
    根据所述位姿变化信息,将所述第一坐标转换到所述参考图像对应的摄像装置的三维坐标系中的第二坐标;
    基于二维图像的二维坐标系,对所述第二坐标进行投影处理,获得所述待处理图像的投影二维坐标;
    根据所述待处理图像的投影二维坐标和所述参考图像的二维坐标,建立所述待处理图像中的像素的像素值与所述参考图像中的像素的像素值之间的对应关系。
  21. 根据权利要求1-20任一所述的方法,其特征在于,根据所述三维运动场,确定所述待处理图像中的运动物体,包括:
    根据所述三维运动场,获取所述待处理图像中的像素在三维空间的运动信息;
    根据所述像素在三维空间的运动信息,对所述像素进行聚类处理;
    根据所述聚类处理的结果,确定所述待处理图像中的运动物体。
  22. 根据权利要求21所述的方法,其特征在于,所述根据所述三维运动场,获取所述待处理图像中的像素在三维空间的运动信息,包括:
    根据所述三维运动场以及拍摄所述待处理图像和所述参考图像之间的时间差,计算所述待处理图像中的像素在待处理图像对应的摄像装置的三维坐标系的三个坐标轴方向上的速度。
  23. 根据权利要求21或22所述的方法,其特征在于,所述根据所述像素在三维空间的运动信息,对所述像素进行聚类处理,包括:
    根据所述像素在三维空间的运动信息,获取所述待处理图像的运动掩膜;
    根据所述运动掩膜,确定待处理图像中的运动区域;
    根据运动区域中的像素的三维空间位置信息和运动信息,对所述运动区域中的像素进行聚类处理。
  24. 根据权利要求23所述的方法,其特征在于,所述像素在三维空间的运动信息包括:像素在三维空间的速度大小,所述根据所述像素在三维空间的运动信息,获取所述待处理图像的运动掩膜,包括:
    根据预设速度阈值,对所述待处理图像中的像素的速度大小进行过滤处理,形成所述待处理图像的运动掩膜。
  25. 根据权利要求23或24所述的方法,其特征在于,所述根据运动区域中的像素的三维空间位置信息和运动信息,对所述运动区域中的像素进行聚类处理,包括:
    将所述运动区域中的像素的三维空间坐标值转化到预定坐标区间;
    将所述运动区域中的像素的速度转化到预定速度区间;
    根据转化后的三维空间坐标值和转化后的速度,对所述运动区域中的像素进行密度聚类处理,获得至少一个类簇。
  26. 根据权利要求25所述的方法,其特征在于,所述根据所述聚类处理的结果,确定所述待处理图像中的运动物体,包括:
    针对任一个类簇,根据该类簇中的多个像素的速度大小和速度方向,确定运动物体的速度大小和速度方向;
    其中,一个类簇被作为待处理图像中的一个运动物体。
  27. 根据权利要求21至26中任一项所述的方法,其特征在于,所述根据所述聚类处理的结果,确定所述待处理图像中的运动物体,还包括:
    根据属于同一个类簇的像素的空间位置信息,确定所述待处理图像中的运动物体检测框。
  28. 一种智能驾驶控制方法,其特征在于,包括:
    通过车辆上设置的摄像装置获取所述车辆所在路面的视频流;
    采用如权利要求1-27中任一项所述的方法,对所述视频流包括的至少一视频帧进行运动物体检测,确定该视频帧中的运动物体;
    根据所述运动物体生成并输出所述车辆的控制指令。
  29. 根据权利要求28所述的方法,其特征在于,所述控制指令包括以下至少之一:速度保持控制指令、速度调整控制指令、方向保持控制指令、方向调整控制指令、预警提示控制指令、驾驶模式切换控制指令。
  30. 一种运动物体检测装置,其特征在于,包括:
    第一获取模块,用于获取待处理图像中的像素的深度信息;
    第二获取模块,用于获取所述待处理图像和参考图像之间的光流信息;其中,所述参考图像和所 述待处理图像是基于摄像装置的连续拍摄而获得的具有时序关系的两幅图像;
    第三获取模块,用于根据所述深度信息和光流信息,获取所述待处理图像中的像素相对于所述参考图像的三维运动场;
    确定运动物体模块,用于根据所述三维运动场,确定所述待处理图像中的运动物体。
  31. 根据权利要求30所述的装置,其特征在于,所述待处理图像为所述摄像装置拍摄的视频中的一视频帧,所述待处理图像的参考图像包括:所述视频帧的前一视频帧。
  32. 根据权利要求30或31所述的装置,其特征在于,所述第一获取模块包括:
    第一子模块,用于获取待处理图像的第一视差图;
    第二子模块,用于根据所述待处理图像的第一视差图,获取所述待处理图像中的像素的深度信息。
  33. 根据权利要求32所述的装置,其特征在于,所述待处理图像包括:单目图像,所述第一子模块,包括:
    第一单元,用于将待处理图像输入至卷积神经网络中,经由所述卷积神经网络进行视差分析处理,基于所述卷积神经网络的输出,获得所述待处理图像的第一视差图;
    其中,所述卷积神经网络是利用双目图像样本,训练获得的。
  34. 根据权利要求33所述的装置,其特征在于,所述第一子模块,还包括:
    第二单元,用于获取所述待处理图像的第一水平镜像图的第二视差图的第二水平镜像图,所述待处理图像的第一水平镜像图是对所述待处理图像进行水平方向上的镜像处理所形成的镜像图,所述第二视差图的第二水平镜像图是对所述第二视差图进行水平方向上的镜像处理所形成的镜像图;
    第三单元,用于根据所述第一视差图的权重分布图、以及所述第二水平镜像图的权重分布图,对所述第一视差图进行视差调整,最终获得所述待处理图像的第一视差图。
  35. 根据权利要求34所述的装置,其特征在于,所述第二单元用于:
    将所述第一水平镜像图输入至卷积神经网络中,经由所述卷积神经网络进行视差分析处理,基于所述神经网络的输出,获得所述第二视差图;
    对所述第二视差图进行镜像处理,获得所述第二水平镜像图。
  36. 根据权利要求34或35所述的装置,其特征在于,所述权重分布图包括:第一权重分布图以及第二权重分布图中的至少一个;
    所述第一权重分布图是针对多个待处理图像统一设置的权重分布图;
    所述第二权重分布图是针对不同待处理图像分别设置的权重分布图。
  37. 根据权利要求36所述的装置,其特征在于,所述第一权重分布图包括至少两个左右分列的区域,不同区域具有不同的权重值。
  38. 根据权利要求36或37所述的装置,其特征在于,在所述待处理图像被作为左目图像的情况下:
    对于所述第一视差图的第一权重分布图中的任意两个区域而言,位于右侧的区域的权重值大于位于左侧的区域的权重值;
    对于所述第二水平镜像图的第一权重分布图中的任意两个区域而言,位于右侧的区域的权重值大于位于左侧的区域的权重值。
  39. 根据权利要求38所述的装置,其特征在于:
    对于所述第一视差图的第一权重分布图中的至少一区域而言,该区域中左侧部分的权重值不大于该区域中右侧部分的权重值;
    对于所述第二水平镜像图的第一权重分布图中的至少一区域而言,该区域中左侧部分的权重值不大于该区域中右侧部分的权重值。
  40. 根据权利要求36或37所述的装置,其特征在于,在所述待处理图像被作为右目图像的情况下:
    对于所述第一视差图的第一权重分布图中的任意两个区域而言,位于左侧的区域的权重值大于位于右侧的区域的权重值;
    对于所述第二水平镜像图的第一权重分布图中的任意两个区域而言,位于左侧的区域的权重值大于位于右侧的区域的权重值。
  41. 根据权利要求40所述的装置,其特征在于:
    对于所述第一视差图的第一权重分布图中的至少一区域而言,该区域中右侧部分的权重值不大于该区域中左侧部分的权重值;
    对于所述第二水平镜像图的第一权重分布图中的至少一区域而言,该区域中右侧部分的权重值不 大于该区域中左侧部分的权重值。
  42. 根据权利要求36至41中任一项所述的装置,其特征在于,所述第三单元还用于设置所述第一视差图的第二权重分布图,第三单元设置所述第一视差图的第二权重分布图的方式包括:
    对所述第一视差图进行水平镜像处理,形成镜像视差图;
    对于所述镜像视差图中的任一像素点而言,如果该像素点的视差值大于该像素点对应的第一变量,则将所述第一视差图的第二权重分布图中的该像素点的权重值设置为第一值,否则,设置为第二值;
    其中,所述第一值大于第二值。
  43. 根据权利要求42所述的装置,其特征在于,所述像素点对应的第一变量是根据所述第一视差图中的该像素点的视差值以及大于零的常数值,设置的。
  44. 根据权利要求36至43中任一项所述的装置,其特征在于,所述第三单元还用于设置所述第二水平镜像图的第二权重分布图,所述第三单元设置所述第二视差图的第二水平镜像图的第二权重分布图的方式包括:
    对于所述第二水平镜像图中的任一像素点而言,如果所述第一视差图中的该像素点的视差值大于该像素点对应的第二变量,则将所述第二水平镜像图的第二权重分布图中的该像素点的权重值设置为第一值,否则,设置为第二值;
    其中,所述第一值大于第二值。
  45. 根据权利要求44所述的装置,其特征在于,所述像素点对应的第二变量是根据所述第一视差图的水平镜像图中的相应像素点的视差值以及大于零的常数值,设置的。
  46. 根据权利要求36至45所述的装置,其特征在于,所述第三单元用于:
    根据所述第一视差图的第一权重分布图和第二权重分布图,调整所述第一视差图中的视差值;
    根据所述第二水平镜像图的第一权重分布图和第二权重分布图,调整所述第二水平镜像图中的视差值;
    合并视差值调整后的第一视差图和视差值调整后的第二水平镜像图,最终获得所述待处理图像的第一视差图。
  47. 根据权利要求33至46中任一项所述的装置,其特征在于,所述装置还包括:训练模块,用于:
    将双目图像样本中的其中一目图像样本输入至待训练的卷积神经网络中,经由所述卷积神经网络进行视差分析处理,基于所述卷积神经网络的输出,获得左目图像样本的视差图和右目图像样本的视差图;
    根据所述左目图像样本及所述右目图像样本的视差图重建右目图像;
    根据所述右目图像样本及所述左目图像样本的视差图重建左目图像;
    根据重建的左目图像和左目图像样本之间的差异、以及重建的右目图像和右目图像样本之间的差异,调整所述卷积神经网络的网络参数。
  48. 根据权利要求30至47中任一项所述的装置,其特征在于,所述第二获取模块,包括:
    第三子模块,用于获取摄像装置拍摄所述待处理图像和所述参考图像的位姿变化信息;
    第四子模块,用于根据所述位姿变化信息,建立所述待处理图像中的像素的像素值与所述参考图像中的像素的像素值之间的对应关系;
    第五子模块,用于根据所述对应关系,对参考图像进行变换处理;
    第六子模块,用于根据所述待处理图像和所述变换处理后的参考图像,计算所述待处理图像和参考图像之间的光流信息。
  49. 根据权利要求48所述的装置,其特征在于,所述第四子模块用于:
    根据所述深度信息和摄像装置的预设参数,获取所述待处理图像中的像素在待处理图像对应的摄像装置的三维坐标系中的第一坐标;
    根据所述位姿变化信息,将所述第一坐标转换到所述参考图像对应的摄像装置的三维坐标系中的第二坐标;
    基于二维图像的二维坐标系,对所述第二坐标进行投影处理,获得所述待处理图像的投影二维坐标;
    根据所述待处理图像的投影二维坐标和所述参考图像的二维坐标,建立所述待处理图像中的像素的像素值与所述参考图像中的像素的像素值之间的对应关系。
  50. 根据权利要求30-49任一所述的装置,其特征在于,所述确定运动物体模块,包括:
    第七子模块,用于根据所述三维运动场,获取所述待处理图像中的像素在三维空间的运动信息;
    第八子模块,用于根据所述像素在三维空间的运动信息,对所述像素进行聚类处理;
    第九子模块,用于根据所述聚类处理的结果,确定所述待处理图像中的运动物体。
  51. 根据权利要求50所述的装置,其特征在于,所述第七子模块用于:
    根据所述三维运动场以及拍摄所述待处理图像和所述参考图像之间的时间差,计算所述待处理图像中的像素在待处理图像对应的摄像装置的三维坐标系的三个坐标轴方向上的速度。
  52. 根据权利要求50或51所述的装置,其特征在于,所述第八子模块包括:
    第四单元,用于根据所述像素在三维空间的运动信息,获取所述待处理图像的运动掩膜;
    第五单元,用于根据所述运动掩膜,确定待处理图像中的运动区域;
    第六单元,用于根据运动区域中的像素的三维空间位置信息和运动信息,对所述运动区域中的像素进行聚类处理。
  53. 根据权利要求52所述的装置,其特征在于,所述像素在三维空间的运动信息包括:像素在三维空间的速度大小,所述第四单元用于:
    根据预设速度阈值,对所述待处理图像中的像素的速度大小进行过滤处理,形成所述待处理图像的运动掩膜。
  54. 根据权利要求52或53所述的装置,其特征在于,所述第六单元用于:
    将所述运动区域中的像素的三维空间坐标值转化到预定坐标区间;
    将所述运动区域中的像素的速度转化到预定速度区间;
    根据转化后的三维空间坐标值和转化后的速度,对所述运动区域中的像素进行密度聚类处理,获得至少一个类簇。
  55. 根据权利要求54所述的装置,其特征在于,所述第九子模块用于:
    针对任一个类簇,根据该类簇中的多个像素的速度大小和速度方向,确定运动物体的速度大小和速度方向;
    其中,一个类簇被作为待处理图像中的一个运动物体。
  56. 根据权利要求50至55中任一项所述的装置,其特征在于,所述第九子模块还用于:
    根据属于同一个类簇的像素的空间位置信息,确定所述待处理图像中的运动物体检测框。
  57. 一种智能驾驶控制装置,其特征在于,包括:
    第四获取模块,用于通过车辆上设置的摄像装置获取所述车辆所在路面的视频流;
    权利要求1-27中任一项所述的运动物体检测装置,用于对所述视频流包括的至少一视频帧进行运动物体检测,确定该视频帧中的运动物体;
    控制模块,用于根据所述运动物体生成并输出所述车辆的控制指令。
  58. 根据权利要求57所述的装置,其特征在于,所述控制指令包括以下至少之一:速度保持控制指令、速度调整控制指令、方向保持控制指令、方向调整控制指令、预警提示控制指令、驾驶模式切换控制指令。
  59. 一种电子设备,包括:
    存储器,用于存储计算机程序;
    处理器,用于执行所述存储器中存储的计算机程序,且所述计算机程序被执行时,实现上述权利要求1-29中任一项所述的方法。
  60. 一种计算机可读存储介质,其上存储有计算机程序,该计算机程序被处理器执行时,实现上述权利要求1-29中任一项所述的方法。
  61. 一种计算机程序,包括计算机指令,当所述计算机指令在设备的处理器中运行时,实现上述权利要求1-29中任一项所述的方法。
PCT/CN2019/114611 2019-05-29 2019-10-31 运动物体检测及智能驾驶控制方法、装置、介质及设备 WO2020238008A1 (zh)

Priority Applications (4)

Application Number Priority Date Filing Date Title
KR1020217001946A KR20210022703A (ko) 2019-05-29 2019-10-31 운동 물체 검출 및 지능형 운전 제어 방법, 장치, 매체 및 기기
JP2020567917A JP7091485B2 (ja) 2019-05-29 2019-10-31 運動物体検出およびスマート運転制御方法、装置、媒体、並びに機器
SG11202013225PA SG11202013225PA (en) 2019-05-29 2019-10-31 Methods, devices, media, and apparatuses of detecting moving object, and of intelligent driving control
US17/139,492 US20210122367A1 (en) 2019-05-29 2020-12-31 Methods, devices, media, and apparatuses of detecting moving object, and of intelligent driving control

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910459420.9 2019-05-29
CN201910459420.9A CN112015170A (zh) 2019-05-29 2019-05-29 运动物体检测及智能驾驶控制方法、装置、介质及设备

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/139,492 Continuation US20210122367A1 (en) 2019-05-29 2020-12-31 Methods, devices, media, and apparatuses of detecting moving object, and of intelligent driving control

Publications (1)

Publication Number Publication Date
WO2020238008A1 true WO2020238008A1 (zh) 2020-12-03

Family

ID=73501844

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/114611 WO2020238008A1 (zh) 2019-05-29 2019-10-31 运动物体检测及智能驾驶控制方法、装置、介质及设备

Country Status (6)

Country Link
US (1) US20210122367A1 (zh)
JP (1) JP7091485B2 (zh)
KR (1) KR20210022703A (zh)
CN (1) CN112015170A (zh)
SG (1) SG11202013225PA (zh)
WO (1) WO2020238008A1 (zh)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113727141B (zh) * 2020-05-20 2023-05-12 富士通株式会社 视频帧的插值装置以及方法
CN112784738B (zh) * 2021-01-21 2023-09-19 上海云从汇临人工智能科技有限公司 运动目标检测告警方法、装置以及计算机可读存储介质
CN113096151B (zh) * 2021-04-07 2022-08-09 地平线征程(杭州)人工智能科技有限公司 对目标的运动信息进行检测的方法和装置、设备和介质
CN113553986B (zh) * 2021-08-02 2022-02-08 浙江索思科技有限公司 一种船舶上运动目标检测方法及系统
CN113781539A (zh) * 2021-09-06 2021-12-10 京东鲲鹏(江苏)科技有限公司 深度信息获取方法、装置、电子设备和计算机可读介质

Family Cites Families (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5627905A (en) * 1994-12-12 1997-05-06 Lockheed Martin Tactical Defense Systems Optical flow detection system
JP2010204805A (ja) 2009-03-02 2010-09-16 Konica Minolta Holdings Inc 周辺監視装置および該方法
JP2011209070A (ja) 2010-03-29 2011-10-20 Daihatsu Motor Co Ltd 画像処理装置
CN102867311B (zh) * 2011-07-07 2015-11-25 株式会社理光 目标跟踪方法和目标跟踪设备
CN104902246B (zh) * 2015-06-17 2020-07-28 浙江大华技术股份有限公司 视频监视方法和装置
CN105100771A (zh) * 2015-07-14 2015-11-25 山东大学 一种基于场景分类和几何标注的单视点视频深度获取方法
CN107330924A (zh) * 2017-07-07 2017-11-07 郑州仁峰软件开发有限公司 一种基于单目摄像头识别运动物体的方法
CN107808388B (zh) * 2017-10-19 2021-10-12 中科创达软件股份有限公司 包含运动目标的图像处理方法、装置及电子设备
CN109727273B (zh) * 2018-12-29 2020-12-04 北京茵沃汽车科技有限公司 一种基于车载鱼眼相机的移动目标检测方法
CN109727275B (zh) * 2018-12-29 2022-04-12 北京沃东天骏信息技术有限公司 目标检测方法、装置、系统和计算机可读存储介质
CN111247557A (zh) * 2019-04-23 2020-06-05 深圳市大疆创新科技有限公司 用于移动目标物体检测的方法、系统以及可移动平台

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8866821B2 (en) * 2009-01-30 2014-10-21 Microsoft Corporation Depth map movement tracking via optical flow and velocity prediction
JP2016081108A (ja) * 2014-10-10 2016-05-16 トヨタ自動車株式会社 物体検出装置
CN104318561A (zh) * 2014-10-22 2015-01-28 上海理工大学 基于双目立体视觉与光流融合的车辆运动信息检测方法
CN107341815A (zh) * 2017-06-01 2017-11-10 哈尔滨工程大学 基于多目立体视觉场景流的剧烈运动检测方法
CN109272493A (zh) * 2018-08-28 2019-01-25 中国人民解放军火箭军工程大学 一种基于递归卷积神经网络的单目视觉里程计方法

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114037740A (zh) * 2021-11-09 2022-02-11 北京字节跳动网络技术有限公司 图像数据流的处理方法、装置及电子设备

Also Published As

Publication number Publication date
US20210122367A1 (en) 2021-04-29
JP2021528732A (ja) 2021-10-21
KR20210022703A (ko) 2021-03-03
SG11202013225PA (en) 2021-01-28
CN112015170A (zh) 2020-12-01
JP7091485B2 (ja) 2022-06-27

Similar Documents

Publication Publication Date Title
WO2020238008A1 (zh) 运动物体检测及智能驾驶控制方法、装置、介质及设备
KR102319177B1 (ko) 이미지 내의 객체 자세를 결정하는 방법 및 장치, 장비, 및 저장 매체
US11100310B2 (en) Object three-dimensional detection method and apparatus, intelligent driving control method and apparatus, medium and device
WO2020108311A1 (zh) 目标对象3d检测方法、装置、介质及设备
WO2019179464A1 (zh) 用于预测目标对象运动朝向的方法、车辆控制方法及装置
US11064178B2 (en) Deep virtual stereo odometry
US11049270B2 (en) Method and apparatus for calculating depth map based on reliability
WO2020258703A1 (zh) 障碍物检测方法、智能驾驶控制方法、装置、介质及设备
WO2022156626A1 (zh) 一种图像的视线矫正方法、装置、电子设备、计算机可读存储介质及计算机程序产品
EP4165547A1 (en) Image augmentation for analytics
US20220138977A1 (en) Two-stage depth estimation machine learning algorithm and spherical warping layer for equi-rectangular projection stereo matching
CN110060230B (zh) 三维场景分析方法、装置、介质及设备
US20210334569A1 (en) Image depth determining method and living body identification method, circuit, device, and medium
US11823415B2 (en) 3D pose estimation in robotics
US11961266B2 (en) Multiview neural human prediction using implicit differentiable renderer for facial expression, body pose shape and clothes performance capture
US20210078597A1 (en) Method and apparatus for determining an orientation of a target object, method and apparatus for controlling intelligent driving control, and device
CN113129352A (zh) 一种稀疏光场重建方法及装置
WO2023082822A1 (zh) 图像数据的处理方法和装置
CN113592706B (zh) 调整单应性矩阵参数的方法和装置
CN107274477B (zh) 一种基于三维空间表层的背景建模方法
US20220138978A1 (en) Two-stage depth estimation machine learning algorithm and spherical warping layer for equi-rectangular projection stereo matching
US11417063B2 (en) Determining a three-dimensional representation of a scene
KR20240012426A (ko) 비제약 이미지 안정화
CN111260544B (zh) 数据处理方法及装置、电子设备和计算机存储介质
Kong et al. Self-supervised indoor 360-degree depth estimation via structural regularization

Legal Events

Date Code Title Description
ENP  Entry into the national phase  (Ref document number: 2020567917; Country of ref document: JP; Kind code of ref document: A)
121  Ep: the epo has been informed by wipo that ep was designated in this application  (Ref document number: 19931428; Country of ref document: EP; Kind code of ref document: A1)
ENP  Entry into the national phase  (Ref document number: 20217001946; Country of ref document: KR; Kind code of ref document: A)
NENP  Non-entry into the national phase  (Ref country code: DE)
122  Ep: pct application non-entry in european phase  (Ref document number: 19931428; Country of ref document: EP; Kind code of ref document: A1)