WO2021218123A1 - Method and Device for Detecting Vehicle Pose - Google Patents


Info

Publication number
WO2021218123A1
Authority
WO
WIPO (PCT)
Prior art keywords
vehicle
image
viewpoint image
coordinates
feature vector
Application number
PCT/CN2020/130107
Other languages
English (en)
French (fr)
Inventor
张伟
叶晓青
谭啸
孙昊
文石磊
章宏武
丁二锐
Original Assignee
北京百度网讯科技有限公司
Application filed by 北京百度网讯科技有限公司
Priority to EP20934056.1A (EP4050562A4)
Priority to JP2022540700A (JP2023510198A)
Publication of WO2021218123A1
Priority to US17/743,402 (US20220270289A1)


Classifications

    • G06T7/194 Segmentation; Edge detection involving foreground-background segmentation
    • G06T7/593 Depth or shape recovery from multiple images from stereo images
    • G06T7/70 Determining position or orientation of objects or cameras
    • G06T7/74 Determining position or orientation of objects or cameras using feature-based methods involving reference images or patches
    • G06T7/80 Analysis of captured images to determine intrinsic or extrinsic camera parameters, i.e. camera calibration
    • H04N13/128 Adjusting depth or disparity
    • G06T2207/10004 Still image; Photographic image
    • G06T2207/10012 Stereo images
    • G06T2207/10028 Range image; Depth image; 3D point clouds
    • G06T2207/20021 Dividing image into blocks, subimages or windows
    • G06T2207/20081 Training; Learning
    • G06T2207/20084 Artificial neural networks [ANN]
    • G06T2207/20228 Disparity calculation for image-based rendering
    • G06T2207/30244 Camera pose
    • G06T2207/30252 Vehicle exterior; Vicinity of vehicle
    • H04N2013/0081 Depth or disparity estimation from stereoscopic image signals
    • H04N2013/0092 Image segmentation from stereoscopic image signals

Definitions

  • The present application discloses a method and device for detecting the pose of a vehicle. It relates to the field of computer technology, in particular to the field of autonomous driving.
  • Three-dimensional vehicle tracking is an indispensable technology in application scenarios such as autonomous driving and robotics.
  • Its inherent difficulty is obtaining accurate depth information so that each vehicle can be accurately detected and located.
  • According to how depth information is acquired, three-dimensional pose detection techniques fall into three categories: those based on monocular vision, those based on binocular vision, and those based on lidar.
  • One representative binocular method is Stereo-RCNN. It simultaneously performs two-dimensional detection on the left and right images and matches their detection boxes, then regresses two-dimensional keypoints and the three-dimensional length, width and height from features extracted within the left and right detection boxes, and finally uses the keypoints to set up 3D-to-2D projection equations that are solved for the three-dimensional pose of the vehicle.
  • The other is Pseudo-LiDAR.
  • This method first estimates pixel-level disparity over the entire image, then obtains a sparse pseudo point cloud, and applies a point cloud 3D detection model trained on real lidar point cloud data to the pseudo point cloud to predict the three-dimensional pose of the vehicle.
  • The embodiments of the present application provide a method, device, equipment, and storage medium for detecting the pose of a vehicle.
  • An embodiment of the present application provides a method for detecting the pose of a vehicle.
  • The method includes: inputting a vehicle left-viewpoint image and a vehicle right-viewpoint image into a part prediction and mask segmentation network model constructed based on prior data of vehicle parts, and determining the foreground pixels in a reference image and the position coordinates of each foreground pixel, where the position coordinates characterize the position of the foreground pixel in the coordinate system of the vehicle to be detected, and the reference image is the vehicle left-viewpoint image or the vehicle right-viewpoint image;
  • converting, based on the disparity map of the vehicle left-viewpoint image and the vehicle right-viewpoint image, the position coordinates of the foreground pixels and the camera internal parameters of the reference image, the coordinates of the foreground pixels in the reference image into coordinates in the camera coordinate system to obtain a pseudo point cloud; and inputting the pseudo point cloud into a pre-trained pose prediction model to obtain the pose information of the vehicle to be detected.
  • An embodiment of the present application provides a device for detecting the pose of a vehicle.
  • The device includes: an image segmentation module configured to input a vehicle left-viewpoint image and a vehicle right-viewpoint image into a part prediction and mask segmentation network model constructed based on prior data of vehicle parts, and to determine the foreground pixels in a reference image and the position coordinates of each foreground pixel, where the position coordinates characterize the position of the foreground pixel in the coordinate system of the vehicle to be detected and the reference image is the vehicle left-viewpoint image or the vehicle right-viewpoint image;
  • a point cloud generation module configured to convert, based on the disparity map of the vehicle left-viewpoint image and the vehicle right-viewpoint image, the position coordinates of the foreground pixels and the camera internal parameters of the reference image, the coordinates of the foreground pixels in the reference image into coordinates in the camera coordinate system to obtain a pseudo point cloud;
  • and a pose prediction module configured to input the pseudo point cloud into a pre-trained pose prediction model to obtain the pose information of the vehicle to be detected.
  • This solves the problem of occlusion reducing the accuracy of three-dimensional vehicle pose prediction: performing part prediction and mask segmentation on the captured vehicle left-viewpoint and right-viewpoint images based on prior data of vehicle parts yields more accurate segmentation results and thus improves the accuracy of vehicle pose prediction.
  • Fig. 1 is an exemplary system architecture diagram to which the embodiments of the present application can be applied;
  • Fig. 2 is a schematic diagram according to a first embodiment of the present application;
  • Fig. 3 is a schematic diagram of an application scenario of the method for detecting the pose of a vehicle according to an embodiment of the present application;
  • Fig. 4 is a schematic diagram according to a second embodiment of the present application;
  • Fig. 5 is a block diagram of an electronic device used to implement the method for detecting the pose of a vehicle according to an embodiment of the present application;
  • Fig. 6 is a scene diagram of a computer storage medium that can implement the embodiments of the present application.
  • FIG. 1 shows an exemplary system architecture 100 to which the method or the apparatus for detecting the pose of a vehicle of an embodiment of the present application can be applied.
  • The system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105.
  • The network 104 provides the medium for communication links between the terminal devices 101, 102, 103 and the server 105.
  • The network 104 may include various connection types, such as wired or wireless communication links, or fiber optic cables.
  • A user can use the terminal devices 101, 102, 103 to interact with the server 105 through the network 104 to receive or send data, for example, to send acquired left-viewpoint and right-viewpoint images of the vehicle to be detected to the server 105 and to receive the pose information of the vehicle to be detected as detected by the server 105.
  • The terminal devices 101, 102, 103 may be hardware or software.
  • When the terminal devices 101, 102, 103 are hardware, they may be various electronic devices capable of exchanging data with the server, including but not limited to smartphones, tablet computers, and in-vehicle computers.
  • When the terminal devices 101, 102, 103 are software, they can be installed in the electronic devices listed above, and may be implemented as multiple pieces of software or software modules for providing distributed services, or as a single piece of software or software module. No specific limitation is made here.
  • The server 105 may be a server that provides data processing services, for example, a background data server that processes the left-viewpoint and right-viewpoint images of the vehicle to be detected uploaded by the terminal devices 101, 102, and 103.
  • The method for detecting the pose of a vehicle provided by the embodiments of the present application may be executed by the server 105; accordingly, the device for detecting the pose of a vehicle may be set in the server 105.
  • In this case, the terminal device can send the scene images collected by a binocular camera, or the left-viewpoint and right-viewpoint images of the vehicle to be detected, to the server 105 through the network, and the server 105 predicts the pose information of the vehicle from them.
  • The method may also be executed by a terminal device, such as an on-board computer; accordingly, the device for detecting the pose of a vehicle may be set in the terminal device.
  • In this case, the on-board computer can extract the left-viewpoint and right-viewpoint images of the vehicle to be detected from the scene images collected by the binocular camera and then predict the pose information of the vehicle to be detected from them. This application places no limitation on this.
  • FIG. 2 shows a flowchart of the first embodiment of the method for detecting the pose of a vehicle disclosed in the present application, including the following steps:
  • Step S201 Input the vehicle left-viewpoint image and the vehicle right-viewpoint image into the part prediction and mask segmentation network model constructed based on prior data of vehicle parts, and determine the foreground pixels in the reference image and the position coordinates of each foreground pixel.
  • The position coordinates characterize the position of a foreground pixel in the coordinate system of the vehicle to be detected.
  • The reference image is the vehicle left-viewpoint image or the vehicle right-viewpoint image.
  • Foreground pixels are the pixels of the reference image that lie within the contour area of the vehicle to be detected, that is, points on the surface of the vehicle to be detected in the actual scene.
  • The vehicle left-viewpoint image and the vehicle right-viewpoint image are two frames of the vehicle to be detected extracted from scene images collected by a binocular camera, and the pose information predicted by the execution subject is the pose of the vehicle to be detected as it appears in the reference image.
  • The execution subject can input the scene left-viewpoint image and the scene right-viewpoint image of the same scene, collected by the binocular camera, into a pre-built Stereo-RPN model, which simultaneously performs two-dimensional detection and detection-box matching on the two scene images.
  • The two images of the same vehicle instance segmented from the two scene images are then the vehicle left-viewpoint image and the vehicle right-viewpoint image of that vehicle.
  • The execution subject can also obtain the vehicle left-viewpoint image and the vehicle right-viewpoint image directly through a pre-trained extraction network for vehicle left- and right-viewpoint images. Afterwards, either image can be selected as the reference image according to actual needs: for example, the image in which the vehicle to be detected has the smaller occluded area can be selected to obtain higher accuracy, or one of the two images can be selected at random.
  • When constructing the part prediction and mask segmentation network model, prior data of vehicle parts is introduced to improve the accuracy of segmenting foreground pixels from the reference image.
  • The part prediction and mask segmentation network model includes a part prediction sub-network and a mask segmentation sub-network.
  • The part prediction sub-network determines the position coordinates of each foreground pixel, and the mask segmentation sub-network determines the foreground pixels in the reference image.
  • The execution subject can construct a mask based on the contour of the vehicle, take the pixels of the input vehicle left-viewpoint image and vehicle right-viewpoint image that fall within the mask area as foreground pixels, and perform foreground-background segmentation on both images.
  • This yields the set of foreground pixels in the vehicle left-viewpoint image and in the vehicle right-viewpoint image, respectively. Arranging the foreground pixels according to their pixel coordinates in the corresponding image yields the image contour of the vehicle to be detected in that image. When the reference image contains a large occluded area, its foreground-background segmentation boundary may be inaccurate.
  • In that case, the accuracy of foreground-background segmentation of the reference image will be lower than that of the other frame.
  • The foreground pixels extracted from the other frame can then be compared with those extracted from the reference image to improve the accuracy of segmenting foreground pixels from the reference image.
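As a rough illustration of how a binary vehicle mask becomes a set of foreground pixels, and how arranging those pixels by their coordinates recovers the vehicle's image silhouette, the NumPy sketch below may help. The function names are hypothetical and the mask is assumed to be given; it does not reproduce the patent's mask segmentation sub-network.

```python
import numpy as np


def foreground_pixels(mask: np.ndarray) -> np.ndarray:
    """Return the (u, v) pixel coordinates of every pixel inside a binary mask.

    `mask` is an H x W array whose non-zero entries mark the vehicle region
    (hypothetical input; in the patent it comes from the mask sub-network).
    """
    rows, cols = np.nonzero(mask)           # v = row index, u = column index
    return np.stack([cols, rows], axis=1)   # shape (N, 2), ordered as (u, v)


def silhouette_from_pixels(uv: np.ndarray, height: int, width: int) -> np.ndarray:
    """Scatter foreground pixels back onto an empty image to recover the vehicle silhouette."""
    canvas = np.zeros((height, width), dtype=bool)
    canvas[uv[:, 1], uv[:, 0]] = True
    return canvas


if __name__ == "__main__":
    mask = np.zeros((4, 5), dtype=np.uint8)
    mask[1:3, 1:4] = 1                       # a 2 x 3 "vehicle"
    uv = foreground_pixels(mask)
    print(uv.shape)                          # (6, 2)
    print(np.array_equal(silhouette_from_pixels(uv, 4, 5), mask.astype(bool)))  # True
```

Comparing the pixel sets of the two views would additionally require mapping one view into the other (for example via disparity); that step is not specified in the text and is not sketched here.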
  • The part prediction sub-network establishes a vehicle coordinate system for the foreground pixels extracted from the reference image, based on the image formed by their pixel coordinates, and obtains the coordinates of the foreground pixels in that vehicle coordinate system. These are the position coordinates of the foreground pixels, which characterize where each foreground pixel lies on the vehicle to be detected.
  • Alternatively, only the reference image may be input into the part prediction and mask segmentation network model constructed based on prior data of vehicle parts, to obtain the foreground pixels in the reference image and the position coordinates of each foreground pixel.
  • Step S202 Based on the disparity map of the vehicle left-viewpoint image and the vehicle right-viewpoint image, the position coordinates of the foreground pixels and the camera internal parameters of the reference image, convert the coordinates of the foreground pixels in the reference image into coordinates in the camera coordinate system to obtain the pseudo point cloud.
  • The feature information of each foreground pixel in the pseudo point cloud includes not only the positional feature of the pixel in the reference image but also its positional feature on the vehicle to be detected.
  • The execution subject can generate the pseudo point cloud as follows. First, based on the disparity map of the vehicle left-viewpoint image and the vehicle right-viewpoint image, calculate the depth value of each foreground pixel in the reference image; then, using the camera internal parameters of the reference image, convert the two-dimensional coordinates of the foreground pixels in the reference image into three-dimensional coordinates in the camera coordinate system to obtain a point cloud composed of the foreground pixels.
  • At this point, the point cloud only contains the point cloud coordinates of the foreground pixels.
  • The position coordinates of the foreground pixels are then aggregated into this point cloud to obtain a pseudo point cloud composed of foreground pixels.
  • For example, the pseudo point cloud data has a feature dimension of N×6, where N is the number of foreground pixels: N×3 dimensions are the pseudo point cloud coordinates of the foreground pixels, and the other N×3 dimensions are their position coordinates.
  • Determining the depth value of a pixel from the disparity map and converting its two-dimensional coordinates into three-dimensional coordinates using the camera internal parameters are mature techniques in computer vision and are not limited in this application.
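The first construction path (depth from disparity, back-projection with the camera internal parameters, then appending the position coordinates to form an N×6 array) can be sketched as follows. This is a minimal sketch assuming a rectified stereo pair, a pinhole model with focal lengths `fu`, `fv`, principal point (`cu`, `cv`) and baseline `baseline`, and the standard relation z = fu·b/d; the function name and argument layout are illustrative, not taken from the patent.

```python
import numpy as np


def build_pseudo_point_cloud(uv, part_coords, disparity, fu, fv, cu, cv, baseline):
    """Assemble an N x 6 pseudo point cloud from foreground pixels (illustrative sketch).

    uv             : (N, 2) integer pixel coordinates (u, v) of foreground pixels in the reference image
    part_coords    : (N, 3) position coordinates of the same pixels in the vehicle coordinate system
    disparity      : (H, W) disparity map aligned with the reference image, in pixels
    fu, fv, cu, cv : pinhole camera internal parameters of the reference image
    baseline       : stereo baseline (its unit becomes the unit of x, y, z)
    """
    uv = np.asarray(uv)
    u = uv[:, 0].astype(np.float64)
    v = uv[:, 1].astype(np.float64)
    d = disparity[uv[:, 1], uv[:, 0]].astype(np.float64)

    # Depth from disparity for a rectified stereo pair; assumes d > 0 on the vehicle.
    z = fu * baseline / d
    # Back-project each pixel through the pinhole model into camera coordinates.
    x = (u - cu) * z / fu
    y = (v - cv) * z / fv
    xyz = np.stack([x, y, z], axis=1)  # (N, 3) pseudo point cloud coordinates

    # Append the per-pixel vehicle-frame position coordinates -> (N, 6).
    return np.concatenate([xyz, np.asarray(part_coords, dtype=np.float64)], axis=1)
```

Pixels with zero or negative disparity would have to be filtered out before the division; that filtering is omitted here for brevity.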
  • The execution subject can also determine the pseudo point cloud as follows: determine the depth values of the foreground pixels based on the camera internal parameters of the reference image and the disparity map of the vehicle left-viewpoint image and the vehicle right-viewpoint image; obtain initial coordinates of the foreground pixels in the camera coordinate system from their coordinates in the reference image and their depth values; and update the initial coordinates based on the position coordinates of the foreground pixels to obtain their pseudo point cloud coordinates.
  • In this implementation, the execution subject does not simply aggregate the position coordinates of the foreground pixels into the point cloud data. Instead, it uses them as constraints to correct the initial coordinates of the foreground pixels and builds the pseudo point cloud from the corrected coordinates, thereby obtaining more accurate point cloud data.
  • Step S203 Input the pseudo point cloud into the pre-trained pose prediction model to obtain the pose information of the vehicle to be detected.
  • The execution subject can input the pseudo point cloud obtained in step S202 into a pre-trained Dense Fusion model. The PointNet network in the Dense Fusion model generates a geometric feature vector and a position feature vector from the pseudo point cloud coordinates and the position coordinates of the foreground pixels, respectively.
  • The geometric feature vector and the position feature vector are then input into a pixel-level fusion network, which predicts the camera external parameters of the reference image (the camera's rotation matrix and translation matrix) from them. Based on these external parameters, the coordinates of each foreground pixel in the world coordinate system are determined, from which the pose information of the vehicle to be detected is obtained.
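The final step of the passage above, mapping points with predicted camera external parameters, can be illustrated as follows. The sketch assumes the convention X_cam = R·X_world + t (R a 3×3 rotation matrix, t a length-3 translation), so the inverse is X_world = Rᵀ(X_cam − t). The text does not state which convention the fusion network uses, and the function name is hypothetical.

```python
import numpy as np


def camera_to_world(points_cam: np.ndarray, R: np.ndarray, t: np.ndarray) -> np.ndarray:
    """Map (N, 3) camera-frame points to the world frame.

    Assumes the external parameters satisfy X_cam = R @ X_world + t with R orthonormal,
    so the inverse mapping is X_world = R.T @ (X_cam - t).
    """
    points_cam = np.asarray(points_cam, dtype=np.float64)
    # Row-vector form of R.T @ (p - t) applied to every point p at once.
    return (points_cam - t) @ R
```

If the model instead predicts the world-to-camera inverse, the same function applies with R and t swapped for their inverses.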
  • The method for detecting the pose of a vehicle disclosed in the above embodiments performs part prediction and mask segmentation on the captured vehicle left-viewpoint and right-viewpoint images based on prior data of vehicle parts, obtaining more accurate segmentation results and thereby improving the accuracy of vehicle pose prediction.
  • Fig. 3 shows an application scenario for detecting the pose of a vehicle provided by the present application.
  • In this scenario, the execution subject 301 may be an on-board computer in an unmanned vehicle equipped with a binocular camera.
  • In real time, the on-board computer extracts the left-viewpoint and right-viewpoint images of each vehicle to be detected in the scene from the scene images collected by the binocular camera, and from them determines the reference image and the disparity map of each vehicle to be detected.
  • It then determines the foreground pixels and the position coordinates of each foreground pixel from the reference image, generates a pseudo point cloud from the obtained foreground pixels, and finally predicts the pose information of each vehicle to be detected in the scene, thereby supporting path planning for the unmanned vehicle.
  • FIG. 4 shows a flowchart of a second embodiment of the method for detecting the pose of a vehicle disclosed in the present application, including the following steps:
  • Step S401 Extract the original left-viewpoint image and the original right-viewpoint image of the vehicle to be detected from the scene left-viewpoint image and the scene right-viewpoint image of the same scene collected by the binocular camera, respectively.
  • The execution subject may input the scene left-viewpoint image and the scene right-viewpoint image into the Stereo-RPN network model and extract from them the original left-viewpoint image and the original right-viewpoint image of the vehicle to be detected.
  • Step S402 The original left viewpoint image and the original right viewpoint image are respectively scaled to a preset size to obtain the vehicle left viewpoint image and the vehicle right viewpoint image.
  • The execution subject scales the original left-viewpoint image and the original right-viewpoint image obtained in step S401 to a preset size, respectively, obtaining a vehicle left-viewpoint image and a vehicle right-viewpoint image of higher resolution and consistent size.
  • Step S403 Determine the camera internal parameters of the vehicle left-viewpoint image and of the vehicle right-viewpoint image based on the initial camera internal parameters of the scene left-viewpoint image, the initial camera internal parameters of the scene right-viewpoint image, and the zoom factor.
  • Because the vehicle left-viewpoint image and the vehicle right-viewpoint image are obtained by scaling, their camera internal parameters differ from those of the scene left-viewpoint image and the scene right-viewpoint image from which they were extracted.
  • The execution subject may determine the camera internal parameters of the vehicle left-viewpoint image and the vehicle right-viewpoint image through formula (1) and formula (2), respectively.
  • $P_1$ and $P_2$ represent the camera internal parameters corresponding to the scene left-viewpoint image and the scene right-viewpoint image, respectively;
  • $P_3$ and $P_4$ represent the camera internal parameters of the vehicle left-viewpoint image and the vehicle right-viewpoint image, respectively;
  • $k$ represents the horizontal zoom factor of the vehicle left-viewpoint image relative to the original left-viewpoint image;
  • $m$ represents the vertical zoom factor of the vehicle left-viewpoint image relative to the original left-viewpoint image;
  • $f_u$ and $f_v$ represent the focal lengths of the camera;
  • $c_u$ and $c_v$ represent the principal point offsets;
  • $b_x$ represents the baseline relative to the reference camera.
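Formulas (1) and (2) are not reproduced in this text. Purely as a hedged illustration of how a 3×4 camera matrix changes when an image is cropped and resized, the sketch below assumes KITTI-style matrices of the form [[f_u, 0, c_u, −f_u·b_x], [0, f_v, c_v, 0], [0, 0, 1, 0]], a crop whose top-left corner is (x0, y0) in the scene image, followed by scaling by k horizontally and m vertically. It applies the standard image-plane affine update, giving f_u' = k·f_u, f_v' = m·f_v, c_u' = k(c_u − x0), c_v' = m(c_v − y0); it is not necessarily the exact form of formulas (1) and (2). The crop offsets `x0`, `y0` and both function names are assumptions (set the offsets to 0 when no offset applies).

```python
import numpy as np


def scale_projection(P: np.ndarray, k: float, m: float, x0: float = 0.0, y0: float = 0.0) -> np.ndarray:
    """Update a 3x4 projection matrix for a crop at (x0, y0) followed by a (k, m) resize.

    A scene-image pixel (u, v) lands at (k * (u - x0), m * (v - y0)) in the resized crop,
    so the projection matrix is pre-multiplied by that affine map.
    """
    A = np.array([[k, 0.0, -k * x0],
                  [0.0, m, -m * y0],
                  [0.0, 0.0, 1.0]])
    return A @ P


def project(P: np.ndarray, X: np.ndarray) -> np.ndarray:
    """Project a 3D point X (reference-camera frame) to pixel coordinates."""
    h = P @ np.append(X, 1.0)
    return h[:2] / h[2]


if __name__ == "__main__":
    fu, fv, cu, cv, bx = 700.0, 700.0, 600.0, 180.0, 0.54  # hypothetical values
    P2 = np.array([[fu, 0.0, cu, -fu * bx],
                   [0.0, fv, cv, 0.0],
                   [0.0, 0.0, 1.0, 0.0]])
    P4 = scale_projection(P2, k=2.0, m=2.0, x0=500.0, y0=100.0)
    X = np.array([1.0, 0.5, 10.0])
    u, v = project(P2, X)
    print(np.allclose(project(P4, X), [2.0 * (u - 500.0), 2.0 * (v - 100.0)]))  # True
```

The check in the usage block confirms the defining property: projecting with the updated matrix gives the same pixel as projecting with the original matrix and then cropping and scaling.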
  • Step S404 Based on the internal camera parameters of the left viewpoint image of the vehicle and the internal camera parameters of the right viewpoint image of the vehicle, a disparity map of the left viewpoint image of the vehicle and the right viewpoint image of the vehicle is determined.
  • The execution subject may input the vehicle left-viewpoint image and the vehicle right-viewpoint image into a PSMNet model to obtain the corresponding disparity map.
  • Because the scaled vehicle left-viewpoint image and vehicle right-viewpoint image have higher resolution, the disparity map obtained in step S404 is more accurate than one predicted directly from the original left-viewpoint image and the original right-viewpoint image.
  • Step S405 Input the vehicle left-viewpoint image and the vehicle right-viewpoint image into the part prediction and mask segmentation network model, respectively, to obtain the encoded feature vector of each image.
  • The part prediction and mask segmentation network model adopts an encoder-decoder framework. After the vehicle left-viewpoint image and the vehicle right-viewpoint image are input into it, the encoder generates an encoded feature vector for each of them.
  • Step S406 Fuse the encoded feature vector of the vehicle left-viewpoint image and the encoded feature vector of the vehicle right-viewpoint image to obtain a fused encoded feature vector.
  • This fuses the features of the vehicle left-viewpoint image and the vehicle right-viewpoint image.
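The text does not specify the fusion operator used in step S406. One common choice, assumed here purely for illustration, is to concatenate the two encoder feature maps along the channel axis and mix them with a 1×1 convolution (the same linear map applied at every spatial location) followed by ReLU. Shapes, names and the random weights below are hypothetical, not the patent's trained model.

```python
import numpy as np


def fuse_features(feat_left: np.ndarray, feat_right: np.ndarray, weight: np.ndarray) -> np.ndarray:
    """Fuse two (C, H, W) encoder feature maps into one (C_out, H, W) map.

    Concatenates along channels to (2C, H, W), then applies a 1x1 convolution,
    i.e. the same (C_out, 2C) linear map at every spatial location, plus ReLU.
    """
    stacked = np.concatenate([feat_left, feat_right], axis=0)  # (2C, H, W)
    fused = np.einsum("oc,chw->ohw", weight, stacked)          # 1x1 convolution, no bias
    return np.maximum(fused, 0.0)                              # ReLU


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    C, H, W, C_out = 8, 4, 6, 16
    left = rng.standard_normal((C, H, W))
    right = rng.standard_normal((C, H, W))
    w = rng.standard_normal((C_out, 2 * C)) * 0.1  # untrained, hypothetical weights
    print(fuse_features(left, right, w).shape)      # (16, 4, 6)
```

Because every output channel can draw on both views, the decoder in step S407 sees information from the less occluded image even where the reference image is occluded.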
  • Step S407 Decode the fused encoded feature vector to obtain the foreground pixels in the reference image and the position coordinates of each foreground pixel.
  • The reference image is the vehicle left-viewpoint image or the vehicle right-viewpoint image.
  • Because the fused encoded feature vector contains features of both the vehicle left-viewpoint image and the vehicle right-viewpoint image, the adverse effect of occluded regions in the reference image on segmentation accuracy can be avoided.
  • Step S408 Based on the disparity map of the vehicle left-viewpoint image and the vehicle right-viewpoint image, the position coordinates of the foreground pixels and the camera internal parameters of the reference image, convert the coordinates of the foreground pixels in the reference image into coordinates in the camera coordinate system to obtain the pseudo point cloud.
  • Because the images were scaled, the effect of the zoom factor must be taken into account when constructing the pseudo point cloud. For example, the vehicle left-viewpoint image and the vehicle right-viewpoint image can first be restored to their original sizes according to the zoom factor, and the two-dimensional coordinates of the foreground pixels in the reference image can then be converted into three-dimensional coordinates in the camera coordinate system using the camera internal parameters of the scene left-viewpoint image and the scene right-viewpoint image, yielding the pseudo point cloud.
  • Alternatively, the execution subject need not restore the vehicle left-viewpoint image and the vehicle right-viewpoint image to their original sizes and can determine the coordinates of the foreground pixels in the camera coordinate system directly through the following steps, illustrated here in combination with formula (1) and formula (2).
  • the reference image is the left view image of the vehicle
  • its coordinates in the reference image are (kx, my), which corresponds to the left view image of the vehicle
  • the parallax compensation of the right viewpoint image of the vehicle is The baseline distance between the camera internal reference P 3 and P 4 of the vehicle's left viewpoint image and the vehicle's right viewpoint image It can be obtained by the following formula (3).
  • For any foreground pixel in the reference image with coordinates (u, v), its 3D coordinates (x, y, z) in the camera coordinate system can be calculated by formula (4).
  • Here d(u, v) denotes the disparity value of the foreground pixel, which can be obtained in step S404.
  • The execution subject can then input the pseudo point cloud into the pre-built pose prediction model and predict the pose information of the vehicle to be detected through steps S409 to S412 below.
  • The pose prediction model is the Dense Fusion model with its CNN (Convolutional Neural Network) module removed. The color interpolation in the Dense Fusion model is used for part prediction.
  • Step S409: Determine the global feature vector of the vehicle to be detected based on the pseudo point cloud coordinates and the part coordinates of the foreground pixels.
  • The execution subject can input the pseudo point cloud obtained in step S408 into the pre-built pose prediction model. The PointNet in the pose prediction model generates a geometric feature vector from the pseudo point cloud coordinates of the foreground pixels and a part feature vector from their part coordinates.
  • An MLP (multilayer perceptron) module then fuses the geometric feature vector and the part feature vector, and an average pooling layer produces the global feature vector.
  • The global feature vector represents the overall characteristics of the vehicle to be detected.
  • Step S410: Sample a preset number of foreground pixels from the pseudo point cloud.
  • All foreground pixels in the pseudo point cloud lie on the surface of the vehicle to be detected. A preset number of them can therefore be sampled at random, which reduces computation without compromising the accuracy of the predicted pose information.
  • Step S411: Predict the camera extrinsics of the reference image based on the pseudo point cloud coordinates, the part coordinates, and the global feature vector of the preset number of foreground pixels.
  • The execution subject inputs the pseudo point cloud coordinates, part coordinates, and global feature vector of the sampled foreground pixels together into the pose prediction and refinement sub-network of the pose prediction model. As a result, the feature vector of each foreground pixel contains three parts: the geometric feature vector from its pseudo point cloud coordinates, the part feature vector from its part coordinates, and the global feature vector.
  • Step S412: Determine the pose information of the vehicle to be detected based on the camera extrinsics of the reference image. The camera extrinsics of the reference image and the pseudo point cloud coordinates of the foreground pixels together determine the coordinates of the foreground pixels in the world coordinate system. These coordinates give the pose information of the vehicle to be detected.
  • Optionally, the method may further include: using the fused encoded feature vector as a stereo feature vector; and obtaining a 3D fitting score based on the stereo feature vector and the global feature vector. The 3D fitting score is used to guide the training of the pose prediction model.
  • For example, the execution subject can input the stereo feature vector and the global feature vector into a fully connected network to obtain the 3D fitting score.
  • The 3D fitting score evaluates the pose information output by the pose prediction model more accurately, which improves the prediction accuracy of the pose prediction model.
  • Compared with the first embodiment, the second embodiment makes two changes. It obtains vehicle left-viewpoint and right-viewpoint images of the same size by scaling, and it determines the foreground pixels in the reference image by fusing the features of the two images.
  • This prevents a long distance to the vehicle from reducing the accuracy of its pose prediction and further improves the accuracy of vehicle pose prediction.
  • FIG. 5 shows a block diagram of an electronic device for the method for detecting a vehicle pose disclosed in the present application.
  • The electronic device includes an image segmentation module 501. This module is configured to input the vehicle left-viewpoint image and the vehicle right-viewpoint image into a part prediction and mask segmentation network model constructed based on vehicle part prior data. It then determines the foreground pixels in the reference image and the part coordinates of each foreground pixel.
  • The part coordinates represent the position of the foreground pixel in the coordinate system of the vehicle to be detected.
  • The reference image is the vehicle left-viewpoint image or the vehicle right-viewpoint image.
  • A point cloud generation module 502 is configured to use the disparity map of the vehicle left-viewpoint and right-viewpoint images, the part coordinates of the foreground pixels, and the camera intrinsics of the reference image. With these, it converts the coordinates of the foreground pixels in the reference image into coordinates in the camera coordinate system to obtain a pseudo point cloud.
  • A pose prediction module 503 is configured to input the pseudo point cloud into a pre-trained pose prediction model to obtain the pose information of the vehicle to be detected.
  • The device further includes an image zoom module configured to determine the vehicle left-viewpoint and right-viewpoint images as follows:
    ◦ extract the original left-viewpoint image and the original right-viewpoint image of the vehicle to be detected from a scene left-viewpoint image and a scene right-viewpoint image of the same scene collected by a binocular camera; and
    ◦ scale the original left-viewpoint and right-viewpoint images to a preset size to obtain the vehicle left-viewpoint and right-viewpoint images.
  The device further includes a disparity map generation module configured to determine the disparity map of the vehicle left-viewpoint and right-viewpoint images as follows:
    ◦ determine the camera intrinsics of the vehicle left-viewpoint image and of the vehicle right-viewpoint image from the initial camera intrinsics of the scene left-viewpoint image, the initial camera intrinsics of the scene right-viewpoint image, and the scaling factors; and
    ◦ determine the disparity map of the vehicle left-viewpoint and right-viewpoint images from these camera intrinsics.
  • The part prediction and mask segmentation network model adopts an encoder-decoder framework. The image segmentation module 501 is further configured to:
    ◦ input the vehicle left-viewpoint and right-viewpoint images into the model separately to obtain the encoded feature vector of each image;
    ◦ fuse the two encoded feature vectors to obtain a fused encoded feature vector; and
    ◦ decode the fused encoded feature vector to obtain the foreground pixels in the reference image and the part coordinates of each foreground pixel.
  • The pose prediction module 503 is further configured to:
    ◦ determine the global feature vector of the vehicle to be detected from the pseudo point cloud coordinates and part coordinates of the foreground pixels;
    ◦ sample a preset number of foreground pixels from the pseudo point cloud;
    ◦ predict the camera extrinsics of the reference image from the pseudo point cloud coordinates, part coordinates, and global feature vector of those foreground pixels; and
    ◦ determine the pose information of the vehicle to be detected from the camera extrinsics.
  • The device further includes a model training module configured to: use the fused encoded feature vector as a stereo feature vector; and obtain a 3D fitting score based on the stereo feature vector and the global feature vector. The 3D fitting score is used to guide the training of the pose prediction model.
  • The point cloud generation module 502 is further configured to:
    ◦ determine the depth values of the foreground pixels from the camera intrinsics of the reference image and the disparity map of the vehicle left-viewpoint and right-viewpoint images;
    ◦ obtain the initial coordinates of the foreground pixels in the camera coordinate system from their coordinates in the reference image and their depth values; and
    ◦ update the initial coordinates using the part coordinates of the foreground pixels to obtain their pseudo point cloud coordinates.
  • The present application further provides an electronic device and a readable storage medium.
  • FIG. 6 is a block diagram of an electronic device for the computer-storable-medium method according to an embodiment of the present application.
  • Electronic devices are intended to represent various forms of digital computers, such as laptop computers, desktop computers, workstations, personal digital assistants, servers, blade servers, mainframe computers, and other suitable computers.
  • Electronic devices can also represent various forms of mobile devices, such as personal digital processing, cellular phones, smart phones, wearable devices, and other similar computing devices.
  • The components shown herein, their connections and relationships, and their functions are merely examples. They are not intended to limit the implementations of the present application described and/or claimed herein.
  • the electronic device includes one or more processors 601, a memory 602, and interfaces for connecting various components, including a high-speed interface and a low-speed interface.
  • the various components are connected to each other using different buses, and can be installed on a common motherboard or installed in other ways as needed.
  • the processor may process instructions executed in the electronic device, including instructions stored in or on the memory to display graphical information of the GUI on an external input/output device (such as a display device coupled to an interface).
  • In other implementations, multiple processors and/or multiple buses can be used together with multiple memories, if required.
  • multiple electronic devices can be connected, and each device provides part of the necessary operations (for example, as a server array, a group of blade servers, or a multi-processor system).
  • In FIG. 6, one processor 601 is taken as an example.
  • the memory 602 is a non-transitory computer-readable storage medium provided by this application.
  • the memory stores instructions executable by at least one processor, so that the at least one processor executes the computer storable medium method provided in this application.
  • the non-transitory computer-readable storage medium of the present application stores computer instructions, and the computer instructions are used to make a computer execute the method of the computer-readable medium provided by the present application.
  • The memory 602 can be used to store non-transitory software programs, non-transitory computer-executable programs, and modules, such as the program instructions/modules corresponding to the computer storage medium method in the embodiments of the present application (for example, the image segmentation module 501, the point cloud generation module 502, and the pose prediction module 503 shown in FIG. 5).
  • the processor 601 executes various functional applications and data processing of the server by running the non-transitory software programs, instructions, and modules stored in the memory 602, that is, implementing the computer-storable medium method in the foregoing method embodiment.
  • the memory 602 may include a storage program area and a storage data area.
  • The storage program area may store an operating system and an application program required by at least one function. The storage data area may store data created through use of the electronic device for the computer storage medium, and the like.
  • the memory 602 may include a high-speed random access memory, and may also include a non-transitory memory, such as at least one magnetic disk storage device, a flash memory device, or other non-transitory solid-state storage devices.
  • the memory 602 may optionally include a memory remotely provided with respect to the processor 601, and these remote memories may be connected to an electronic device of a computer storable medium through a network. Examples of the aforementioned networks include, but are not limited to, the Internet, corporate intranets, local area networks, mobile communication networks, and combinations thereof.
  • the electronic device of the computer storage medium method may further include: an input device 603 and an output device 604.
  • the processor 601, the memory 602, the input device 603, and the output device 604 may be connected by a bus or in other ways. In FIG. 6, the connection by a bus is taken as an example.
  • The input device 603 can receive input numeric or character information and generate key signal inputs related to user settings and function control of the electronic device for the computer storage medium. Examples include a touch screen, keypad, mouse, trackpad, touchpad, pointing stick, one or more mouse buttons, trackball, joystick, or other input devices.
  • the output device 604 may include a display device, an auxiliary lighting device (for example, LED), a tactile feedback device (for example, a vibration motor), and the like.
  • the display device may include, but is not limited to, a liquid crystal display (LCD), a light emitting diode (LED) display, and a plasma display. In some embodiments, the display device may be a touch screen.
  • Various implementations of the systems and techniques described herein can be realized in digital electronic circuitry, integrated circuit systems, ASICs (application-specific integrated circuits), computer hardware, firmware, software, and/or combinations thereof. These implementations may be carried out in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor. That processor may be special-purpose or general-purpose. It can receive data and instructions from a storage system, at least one input device, and at least one output device, and can transmit data and instructions to the storage system, the at least one input device, and the at least one output device.
  • The terms “machine-readable medium” and “computer-readable medium” refer to any computer program product, device, and/or apparatus used to provide machine instructions and/or data to a programmable processor (for example, magnetic disks, optical disks, memory, programmable logic devices (PLDs)). This includes machine-readable media that receive machine instructions as machine-readable signals.
  • machine-readable signal refers to any signal used to provide machine instructions and/or data to a programmable processor.
  • To provide interaction with a user, the systems and techniques described here can be implemented on a computer that has two things. The first is a display device for displaying information to the user (for example, a CRT (cathode ray tube) or LCD (liquid crystal display) monitor). The second is a keyboard and a pointing device (for example, a mouse or a trackball) through which the user can provide input to the computer.
  • Other types of devices can also be used to provide interaction with the user. For example, the feedback provided to the user can be any form of sensory feedback (visual, auditory, or tactile). Input from the user can also be received in any form, including acoustic, voice, or tactile input.
  • the systems and technologies described herein can be implemented in a computing system that includes back-end components (for example, as a data server), or a computing system that includes middleware components (for example, an application server), or a computing system that includes front-end components (for example, A user computer with a graphical user interface or a web browser, through which the user can interact with the implementation of the system and technology described herein), or includes such back-end components, middleware components, Or any combination of front-end components in a computing system.
  • the components of the system can be connected to each other through any form or medium of digital data communication (for example, a communication network). Examples of communication networks include: local area network (LAN), wide area network (WAN), and the Internet.
  • the computer system can include clients and servers.
  • the client and server are generally far away from each other and usually interact through a communication network.
  • the relationship between the client and the server is generated by computer programs that run on the corresponding computers and have a client-server relationship with each other.
  • Part prediction and mask segmentation are performed on the collected vehicle left-viewpoint and right-viewpoint images based on vehicle part prior data. This yields more accurate segmentation results and thereby improves the accuracy of vehicle pose prediction.

Abstract

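A method and apparatus for detecting a vehicle pose, relating to the fields of computer vision and autonomous driving. The specific implementation is as follows:
- input a vehicle left-viewpoint image and a vehicle right-viewpoint image into a part prediction and mask segmentation network model to determine the foreground pixels in a reference image and their part coordinates;
- obtain a pseudo point cloud from the disparity map of the vehicle left-viewpoint and right-viewpoint images, the part coordinates of the foreground pixels, and the camera intrinsics of the reference image; and
- input the pseudo point cloud into a pre-trained pose prediction model to obtain the pose information of the vehicle to be detected.
Part prediction and mask segmentation of the collected vehicle left-viewpoint and right-viewpoint images are based on vehicle part prior data. This yields more accurate segmentation results and thereby improves the accuracy of vehicle pose prediction.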

Description

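Method and Apparatus for Detecting Vehicle Pose
This application claims priority to Chinese Patent Application No. 202010347485.7, filed on April 28, 2020 and entitled "Method and Apparatus for Detecting Vehicle Pose". The entire contents of that application are incorporated herein by reference.
Technical Field
The present application discloses a method and apparatus for detecting a vehicle pose. It relates to the field of computer technology, and in particular to the field of autonomous driving.
Background
Three-dimensional vehicle tracking is an indispensable technology in application scenarios such as autonomous driving and robotics. Its inherent difficulty is obtaining accurate depth information so that each vehicle can be precisely detected and localized. 3D pose detection techniques can be divided into three categories according to how they obtain depth information: those based on monocular vision, those based on binocular (stereo) vision, and those based on LiDAR.
In the related art, there are two methods for predicting the 3D pose of a vehicle from binocular vision.
- Stereo-RCNN performs 2D detection and detection-box matching on the left and right images simultaneously. It then regresses 2D keypoints and 3D length, width, and height from features extracted from the left and right boxes. Finally, it builds 3D-to-2D projection equations from the keypoints and solves them for the vehicle's 3D pose.
- Pseudo-LiDAR first performs pixel-level disparity estimation on the whole image to obtain a relatively sparse pseudo point cloud. It then applies a point-cloud 3D detection model, trained on real LiDAR point cloud data, to the pseudo point cloud to predict the vehicle's 3D pose.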
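Summary
Embodiments of the present application provide a method, apparatus, device, and storage medium for detecting a vehicle pose.
According to a first aspect, an embodiment of the present application provides a method for detecting a vehicle pose. The method includes the following steps:
- Input a vehicle left-viewpoint image and a vehicle right-viewpoint image into a part prediction and mask segmentation network model constructed based on vehicle part prior data. Determine the foreground pixels in a reference image and the part coordinates of each foreground pixel. The part coordinates represent the position of the foreground pixel in the coordinate system of the vehicle to be detected. The reference image is the vehicle left-viewpoint image or the vehicle right-viewpoint image.
- Using the disparity map of the vehicle left-viewpoint and right-viewpoint images, the part coordinates of the foreground pixels, and the camera intrinsics of the reference image, convert the coordinates of the foreground pixels in the reference image into coordinates in the camera coordinate system to obtain a pseudo point cloud.
- Input the pseudo point cloud into a pre-trained pose prediction model to obtain the pose information of the vehicle to be detected.
According to a second aspect, an embodiment of the present application provides an apparatus for detecting a vehicle pose. The apparatus includes the following modules:
- An image segmentation module, configured to input a vehicle left-viewpoint image and a vehicle right-viewpoint image into a part prediction and mask segmentation network model constructed based on vehicle part prior data. The module determines the foreground pixels in a reference image and the part coordinates of each foreground pixel. The part coordinates represent the position of the foreground pixel in the coordinate system of the vehicle to be detected. The reference image is the vehicle left-viewpoint image or the vehicle right-viewpoint image.
- A point cloud generation module, configured to use the disparity map of the two images, the part coordinates of the foreground pixels, and the camera intrinsics of the reference image to convert the coordinates of the foreground pixels in the reference image into coordinates in the camera coordinate system, obtaining a pseudo point cloud.
- A pose prediction module, configured to input the pseudo point cloud into a pre-trained pose prediction model to obtain the pose information of the vehicle to be detected.
The above embodiments of the present application address the problem that occlusion reduces the accuracy of 3D vehicle pose prediction. Part prediction and mask segmentation of the collected vehicle left-viewpoint and right-viewpoint images are based on vehicle part prior data. This yields more accurate segmentation results and thereby improves the accuracy of vehicle pose prediction.
It should be understood that this section is not intended to identify key or important features of the embodiments of the present disclosure, nor to limit the scope of the present disclosure. Other features of the present disclosure will become readily understood from the following description.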
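Brief Description of the Drawings
The accompanying drawings are provided for a better understanding of the solution and do not limit the present application. In the drawings:
FIG. 1 is a diagram of an exemplary system architecture to which embodiments of the present application can be applied;
FIG. 2 is a schematic diagram according to a first embodiment of the present application;
FIG. 3 is a schematic diagram of a scenario embodiment of the method for detecting a vehicle pose provided by embodiments of the present application;
FIG. 4 is a schematic diagram according to a second embodiment of the present application;
FIG. 5 is a block diagram of an electronic device for implementing the method for detecting a vehicle pose according to embodiments of the present application;
FIG. 6 is a scenario diagram of a computer-storable medium that can implement embodiments of the present application.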
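Detailed Description
Exemplary embodiments of the present application are described below with reference to the accompanying drawings. Various details of the embodiments are included to aid understanding and should be regarded as merely exemplary. Those of ordinary skill in the art will therefore recognize that various changes and modifications can be made to the embodiments described herein without departing from the scope and spirit of the present application. Likewise, descriptions of well-known functions and structures are omitted below for clarity and conciseness.
FIG. 1 shows an exemplary system architecture 100 to which the method or apparatus for detecting a vehicle pose of embodiments of the present application can be applied.
As shown in FIG. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 provides the communication links between the terminal devices 101, 102, 103 and the server 105. It may include various connection types, such as wired or wireless communication links or fiber-optic cables.
A user may use the terminal devices 101, 102, 103 to interact with the server 105 through the network 104 to receive or send data. For example, the user may send acquired left-viewpoint and right-viewpoint images of a vehicle to be detected to the server 105 and receive the pose information of that vehicle as detected by the server 105.
The terminal devices 101, 102, 103 may be hardware or software. As hardware, they may be any electronic devices capable of exchanging data with a server, including but not limited to smartphones, tablet computers, and in-vehicle computers. As software, they may be installed in the electronic devices listed above. They may be implemented as multiple pieces of software or software modules (for example, for providing distributed services) or as a single piece of software or software module. No specific limitation is imposed here.
The server 105 may be a server that provides data processing services. For example, it may be a back-end data server that processes the left-viewpoint and right-viewpoint images of the vehicle to be detected uploaded by the terminal devices 101, 102, 103.
The method for detecting a vehicle pose provided by embodiments of the present application can be deployed in two ways:
- Execution by the server 105. The apparatus for detecting a vehicle pose is then provided in the server 105. A terminal device sends the server 105, via the network, either the scene images collected by a binocular camera or the left-viewpoint and right-viewpoint images of the vehicle to be detected. The server 105 predicts the pose information of the vehicle from them.
- Execution by a terminal device, such as an in-vehicle computer. The apparatus is then provided in the terminal device. The in-vehicle computer extracts the left-viewpoint and right-viewpoint images of the vehicle to be detected from scene images collected by an on-board binocular camera, and then predicts the vehicle's pose information from them.
The present application does not limit this choice.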
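With continued reference to FIG. 2, FIG. 2 shows a flowchart of a first embodiment of the method for detecting a vehicle pose disclosed in the present application. The method includes the following steps.
Step S201: Input the vehicle left-viewpoint image and the vehicle right-viewpoint image into a part prediction and mask segmentation network model constructed based on vehicle part prior data. Determine the foreground pixels in the reference image and the part coordinates of each foreground pixel. The part coordinates represent the position of the foreground pixel in the coordinate system of the vehicle to be detected. The reference image is the vehicle left-viewpoint image or the vehicle right-viewpoint image.
In this embodiment, foreground pixels are the pixels in the reference image that lie within the contour region of the vehicle to be detected, i.e., points on the surface of that vehicle in the actual scene.
In this embodiment, the vehicle left-viewpoint image and the vehicle right-viewpoint image are two frames of the vehicle to be detected, extracted from scene images collected by a binocular camera. The pose information predicted by the execution subject is the pose of the vehicle as it appears in the reference image.
As an example, the execution subject may input the scene left-viewpoint image and scene right-viewpoint image of the same scene, collected by the binocular camera, into a pre-built Stereo-RPN model. That model performs 2D detection and detection-box matching on both scene images simultaneously. The two image patches of the same vehicle instance segmented from the two scene images are that vehicle's left-viewpoint image and right-viewpoint image. The execution subject may also obtain the two vehicle images directly through a pre-trained network for extracting vehicle left-viewpoint and right-viewpoint images. Either image can then be chosen as the reference image according to actual needs. For example, the image in which the occluded area of the vehicle is smaller can be chosen for higher accuracy. Alternatively, one of the two frames can be chosen at random.
In this embodiment, vehicle part prior data is introduced when constructing the part prediction and mask segmentation network model, which improves the accuracy of segmenting foreground pixels from the reference image. The model includes two sub-networks. The part prediction sub-network determines the part coordinates of each foreground pixel. The mask segmentation sub-network determines the foreground pixels in the reference image.
As an example, the execution subject may construct a mask based on the vehicle contour. Pixels inside the mask region in the input vehicle left-viewpoint and right-viewpoint images are treated as foreground pixels. Foreground-background segmentation is performed on both images, producing a set of foreground pixels for each image. Arranging the foreground pixels by their pixel coordinates in either image yields the contour of the vehicle in that image.
A large occluded region in the reference image may make its foreground-background boundary inaccurate. The segmentation accuracy of the reference image may therefore be lower than that of the other frame. In this case, the foreground pixels extracted from the other frame can be compared with those extracted from the reference image, which improves the accuracy of segmenting foreground pixels from the reference image.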
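The mask-based foreground extraction and the cross-view comparison described above can be illustrated with a minimal sketch. It rests on these assumptions:
- each image has a binary mask (non-zero means inside the vehicle contour);
- the images are rectified, with the left image as the reference;
- a per-pixel disparity map is available.
The function names are hypothetical, and so is the comparison rule: keep a reference pixel only if its matching right-image pixel (u - d, v) is also foreground. The application does not specify its comparison procedure.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int]  # (u, v) = (column, row)


def foreground_pixels(mask: Sequence[Sequence[int]]) -> List[Pixel]:
    """Return (u, v) for every pixel whose mask value is non-zero, row by row."""
    return [(u, v) for v, row in enumerate(mask) for u, val in enumerate(row) if val]


def cross_check(
    ref_mask: Sequence[Sequence[int]],
    other_mask: Sequence[Sequence[int]],
    disparity: Sequence[Sequence[float]],
) -> List[Pixel]:
    """Hypothetical left-right consistency check between the two masks.

    Keeps a foreground pixel (u, v) of the reference (left) mask only if the
    corresponding pixel (u - d, v) of the other (right) mask is also foreground.
    """
    height = len(other_mask)
    width = len(other_mask[0]) if height else 0
    kept = []
    for u, v in foreground_pixels(ref_mask):
        u_other = int(round(u - disparity[v][u]))
        if 0 <= u_other < width and v < height and other_mask[v][u_other]:
            kept.append((u, v))
    return kept
```

Pixels that fail the check are candidates for occlusion-induced segmentation errors. A real system would more likely reason over learned features than over hard masks.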
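Next, the part prediction network uses prior data on the vehicle's 3D parts to establish a vehicle coordinate system. This coordinate system is built for the image formed by the foreground pixels extracted from the reference image, arranged by their pixel coordinates. The coordinates of a foreground pixel in this vehicle coordinate system are its part coordinates, which characterize the part of the vehicle where that pixel lies.
In some optional implementations of this embodiment, only the reference image is input into the part prediction and mask segmentation network model constructed based on vehicle part prior data. This yields the foreground pixels in the reference image and the part coordinates of each foreground pixel.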
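Step S202: Use the disparity map of the vehicle left-viewpoint and right-viewpoint images, the part coordinates of the foreground pixels, and the camera intrinsics of the reference image. With these, convert the coordinates of the foreground pixels in the reference image into coordinates in the camera coordinate system to obtain a pseudo point cloud.
In this embodiment, the feature information of each foreground pixel in the pseudo point cloud has two components. One is the pixel's position feature in the reference image. The other is its part feature on the vehicle to be detected.
As an example, the execution subject may generate the pseudo point cloud as follows:
1. Compute the depth value of each foreground pixel in the reference image from the disparity map of the vehicle left-viewpoint and right-viewpoint images.
2. Using the camera intrinsics of the reference image, convert the 2D coordinates of the foreground pixels in the reference image into 3D coordinates in the camera coordinate system. The resulting point cloud of foreground pixels contains only their point cloud coordinates.
3. Aggregate the part coordinates of the foreground pixels into this point cloud to obtain the pseudo point cloud.
For example, if the reference image contains N foreground pixels, the pseudo point cloud data has feature dimension N×6. N×3 of the dimensions are the pseudo point cloud coordinates of the foreground pixels, and the other N×3 are their part coordinates.
Determining pixel depth from a disparity map, and converting 2D pixel coordinates into 3D coordinates using the camera intrinsics, are mature techniques in computer vision. The present application does not limit how they are performed.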
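The application treats these conversions as standard, so the sketch below uses the textbook relations for rectified stereo:
- z = fu * baseline / d
- x = (u - cu) * z / fu
- y = (v - cv) * z / fv
It then appends each pixel's part coordinates to form the N×6 layout described above. The parameter names (fu, fv, cu, cv, baseline) and the list-of-lists data layout are assumptions for illustration. The optional refinement of the coordinates using part coordinates is not modeled.

```python
from typing import List, Sequence, Tuple


def pixel_to_camera(u: float, v: float, d: float, fu: float, fv: float,
                    cu: float, cv: float, baseline: float) -> Tuple[float, float, float]:
    """Back-project pixel (u, v) with disparity d into the camera frame."""
    if d <= 0:
        raise ValueError("disparity must be positive to recover a finite depth")
    z = fu * baseline / d   # depth from disparity (rectified stereo)
    x = (u - cu) * z / fu   # pinhole model, horizontal axis
    y = (v - cv) * z / fv   # pinhole model, vertical axis
    return x, y, z


def build_pseudo_point_cloud(pixels: Sequence[Tuple[int, int]],
                             disparity: Sequence[Sequence[float]],
                             part_coords: Sequence[Sequence[float]],
                             fu: float, fv: float, cu: float, cv: float,
                             baseline: float) -> List[List[float]]:
    """Return an N x 6 list: camera-frame (x, y, z) followed by the part coordinates."""
    cloud = []
    for (u, v), part in zip(pixels, part_coords):
        x, y, z = pixel_to_camera(u, v, disparity[v][u], fu, fv, cu, cv, baseline)
        cloud.append([x, y, z, *part])
    return cloud
```

For example, with fu = fv = 100, cu = 50, cv = 40, and a baseline of 0.5, pixel (60, 40) with disparity 10 lies at depth 5 and becomes the row [0.5, 0.0, 5.0, <part coordinates>].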
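In some optional implementations of this embodiment, the execution subject may instead determine the pseudo point cloud as follows:
1. Determine the depth values of the foreground pixels from the camera intrinsics of the reference image and the disparity map of the vehicle left-viewpoint and right-viewpoint images.
2. Obtain the initial coordinates of the foreground pixels in the camera coordinate system from their coordinates in the reference image and their depth values.
3. Update the initial coordinates using the part coordinates of the foreground pixels to obtain their pseudo point cloud coordinates.
In this implementation, the execution subject does not simply aggregate the part coordinates into the point cloud data. It uses them as constraints to correct the initial coordinates of the foreground pixels, then builds the pseudo point cloud from the corrected coordinates. This yields more accurate point cloud data.
Step S203: Input the pseudo point cloud into a pre-trained pose prediction model to obtain the pose information of the vehicle to be detected.
As an example, the execution subject may input the pseudo point cloud obtained in step S202 into a pre-trained Dense Fusion model. The PointNet network in that model generates a geometric feature vector from the pseudo point cloud coordinates of the foreground pixels and a part feature vector from their part coordinates. Both vectors are input into a pixel-level fusion network. From them, the fusion network predicts the camera extrinsics of the reference image, i.e., the camera's rotation matrix and translation matrix. The camera extrinsics then determine the coordinates of each foreground pixel in the world coordinate system, which gives the pose information of the vehicle to be detected.
Converting the 3D camera-frame coordinates of image pixels into world coordinates based on the camera extrinsics is a mature technique in computer vision and is not described further here.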
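For completeness, here is a minimal sketch of that conversion. It assumes the predicted extrinsics (R, t) map world coordinates to camera coordinates as X_c = R X_w + t, so that X_w = R^T (X_c - t). This convention is an assumption. If the model's extrinsics are defined in the opposite direction, the two functions swap roles.

```python
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]
Mat3 = Sequence[Sequence[float]]


def world_to_camera(p: Sequence[float], R: Mat3, t: Sequence[float]) -> Vec3:
    """Assumed convention: X_c = R @ X_w + t."""
    return tuple(sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3))


def camera_to_world(p: Sequence[float], R: Mat3, t: Sequence[float]) -> Vec3:
    """Inverse transform X_w = R^T @ (X_c - t); a rotation's transpose is its inverse."""
    diff = [p[i] - t[i] for i in range(3)]
    return tuple(sum(R[j][i] * diff[j] for j in range(3)) for i in range(3))
```

Applying camera_to_world to every point of the pseudo point cloud places the vehicle's surface points in the world frame. A round trip through world_to_camera is a quick sanity check of the convention.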
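In the method for detecting a vehicle pose of the above embodiment, part prediction and mask segmentation of the collected vehicle left-viewpoint and right-viewpoint images are based on vehicle part prior data. This yields more accurate segmentation results and thereby improves the accuracy of vehicle pose prediction.
With continued reference to FIG. 3, FIG. 3 shows an application scenario for detecting a vehicle pose provided by the present application. In the scenario of FIG. 3, the execution subject 301 may be the in-vehicle computer of a driverless car equipped with a binocular camera. The in-vehicle computer proceeds as follows:
1. Extract the vehicle left-viewpoint and right-viewpoint images of each vehicle to be detected from scene images collected in real time by the binocular camera.
2. Determine a reference image and a disparity map from each vehicle's pair of images.
3. Determine the foreground pixels and the part coordinates of each foreground pixel from the reference image.
4. Generate a pseudo point cloud from the resulting foreground pixels.
5. Predict the pose information of each vehicle in the scene, which supports path planning for the driverless car.
With continued reference to FIG. 4, FIG. 4 shows a flowchart of a second embodiment of the method for detecting a vehicle pose disclosed in the present application. The method includes the following steps.
Step S401: Extract the original left-viewpoint image and the original right-viewpoint image of the vehicle to be detected from the scene left-viewpoint image and the scene right-viewpoint image of the same scene collected by a binocular camera.
As an example, the execution subject may input the scene left-viewpoint and right-viewpoint images into a Stereo-RPN network model, which extracts the original left-viewpoint and right-viewpoint images of the vehicle to be detected.
Step S402: Scale the original left-viewpoint image and the original right-viewpoint image to a preset size to obtain the vehicle left-viewpoint image and the vehicle right-viewpoint image.
Generally, the farther the binocular camera is from the vehicle to be detected, the smaller the vehicle images obtained in step S401. The two images also differ in size, so pose information predicted from them is relatively inaccurate. In this embodiment, the execution subject therefore scales the original left-viewpoint and right-viewpoint images obtained in step S401 to a preset size. This produces vehicle left-viewpoint and right-viewpoint images of higher resolution and consistent size.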
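Step S403: Determine the camera intrinsics of the vehicle left-viewpoint image and of the vehicle right-viewpoint image. These are derived from the initial camera intrinsics of the scene left-viewpoint image, the initial camera intrinsics of the scene right-viewpoint image, and the scaling factors.
In this embodiment, the vehicle left-viewpoint and right-viewpoint images are obtained by scaling. Their camera intrinsics therefore differ from those of the scene left-viewpoint and right-viewpoint images.
As an example, the execution subject may determine the camera intrinsics of the vehicle left-viewpoint image and of the vehicle right-viewpoint image using formulas (1) and (2), respectively.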
[Formula (1): equation image not reproduced in this text]
[Formula (2): equation image not reproduced in this text]
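In these formulas:
- P1 and P2 denote the camera intrinsics of the scene left-viewpoint image and the scene right-viewpoint image, respectively;
- P3 and P4 denote the camera intrinsics of the vehicle left-viewpoint image and the vehicle right-viewpoint image, respectively;
- k denotes the horizontal scaling factor of the vehicle left-viewpoint image relative to the original left-viewpoint image;
- m denotes the vertical scaling factor of the vehicle left-viewpoint image relative to the original right-viewpoint image;
- f_u and f_v denote the camera focal lengths;
- c_u and c_v denote the principal point offsets;
- b_x denotes the baseline relative to the reference camera.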
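The images of formulas (1) and (2) are not reproduced here, so the sketch below shows one standard way to derive the intrinsics of a resized image. It assumes, as the later text states, that a point (x, y) of the original image maps to (kx, my) in the vehicle image, i.e., no crop offset is applied. Left-multiplying a 3×4 projection matrix P by diag(k, m, 1) has two effects. It scales f_u, c_u, and the baseline term -f_u·b_x by k. It scales f_v and c_v by m. The helper names are illustrative assumptions, and so is the choice of k = target width / original width and m = target height / original height. This is not claimed to match the application's formulas.

```python
from typing import List, Sequence, Tuple

Matrix34 = List[List[float]]


def zoom_factors(orig_w: int, orig_h: int, target_w: int, target_h: int) -> Tuple[float, float]:
    """Horizontal factor k and vertical factor m for resizing to a preset size."""
    return target_w / orig_w, target_h / orig_h


def scale_projection(P: Sequence[Sequence[float]], k: float, m: float) -> Matrix34:
    """Return diag(k, m, 1) @ P: row 0 scaled by k, row 1 by m, row 2 unchanged."""
    return [[k * val for val in P[0]], [m * val for val in P[1]], [float(val) for val in P[2]]]


def project(P: Sequence[Sequence[float]], X: Sequence[float]) -> Tuple[float, float]:
    """Project a homogeneous 3D point X = (x, y, z, 1) to pixel coordinates (u, v)."""
    h = [sum(P[r][c] * X[c] for c in range(4)) for r in range(3)]
    return h[0] / h[2], h[1] / h[2]
```

As a check, projecting the same 3D point with the scaled matrix should give exactly (k·u, m·v), where (u, v) is its projection with the original matrix.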
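Step S404: Determine the disparity map of the vehicle left-viewpoint image and the vehicle right-viewpoint image based on their camera intrinsics.
As an example, the execution subject may input the vehicle left-viewpoint and right-viewpoint images into a PSMNet model to obtain the corresponding disparity map. For a distant vehicle, the scaled images have higher resolution. The disparity map obtained in step S404 is therefore more accurate than one predicted directly from the original left-viewpoint and right-viewpoint images.
Step S405: Input the vehicle left-viewpoint image and the vehicle right-viewpoint image separately into the part prediction and mask segmentation network model to obtain the encoded feature vector of each image.
In this embodiment, the part prediction and mask segmentation network model adopts an encoder-decoder framework. After the two images are input into the model separately, the encoder in the model generates an encoded feature vector for each of them.
Step S406: Fuse the encoded feature vector of the vehicle left-viewpoint image with that of the vehicle right-viewpoint image to obtain a fused encoded feature vector.
In this embodiment, the two encoded feature vectors are fused, for example by addition, concatenation, or a linear transformation. This achieves feature fusion of the vehicle left-viewpoint image and the vehicle right-viewpoint image.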
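The three fusion options named above can be sketched on plain feature vectors. This is a toy illustration. Real encoders produce feature maps rather than short lists, and the weight matrix used by the linear option is a hypothetical learned parameter.

```python
from typing import List, Optional, Sequence


def fuse_features(left: Sequence[float], right: Sequence[float], mode: str = "concat",
                  weight: Optional[Sequence[Sequence[float]]] = None) -> List[float]:
    """Fuse two encoded feature vectors by addition, concatenation, or a linear map."""
    if mode == "add":
        if len(left) != len(right):
            raise ValueError("addition requires vectors of equal length")
        return [a + b for a, b in zip(left, right)]
    if mode == "concat":
        return list(left) + list(right)
    if mode == "linear":
        if weight is None:
            raise ValueError("linear fusion needs a weight matrix")
        stacked = list(left) + list(right)  # concatenate first, then project
        if any(len(row) != len(stacked) for row in weight):
            raise ValueError("each weight row must match the concatenated length")
        return [sum(w * x for w, x in zip(row, stacked)) for row in weight]
    raise ValueError(f"unknown fusion mode: {mode}")
```

The three options trade off differently:
- Addition keeps the feature dimension fixed but requires equal lengths.
- Concatenation preserves both inputs unchanged at the cost of a longer vector.
- The linear option lets a learned matrix choose the output size.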
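Step S407: Decode the fused encoded feature vector to obtain the foreground pixels in the reference image and the part coordinates of each foreground pixel. The reference image is the vehicle left-viewpoint image or the vehicle right-viewpoint image.
In this embodiment, the fused encoded feature vector contains the features of both images. This avoids the adverse effect of occluded regions in the reference image on segmentation accuracy.
Step S408: Use the disparity map of the vehicle left-viewpoint and right-viewpoint images, the part coordinates of the foreground pixels, and the camera intrinsics of the reference image. With these, convert the coordinates of the foreground pixels in the reference image into coordinates in the camera coordinate system to obtain a pseudo point cloud.
In this embodiment, the vehicle left-viewpoint and right-viewpoint images are obtained by scaling the original images. The effect of the scaling factors must therefore be considered when constructing the pseudo point cloud. For example, the vehicle left-viewpoint and right-viewpoint images can first be restored to their original size according to the scaling factors. The 2D coordinates of the foreground pixels in the reference image can then be converted into 3D coordinates in the camera coordinate system using the camera intrinsics of the scene left-viewpoint and right-viewpoint images, which yields the pseudo point cloud.
In some optional implementations of this embodiment, the execution subject need not restore the two images to their original size. It can determine the coordinates of the foreground pixels in the camera coordinate system directly through the following steps, illustrated here with formulas (1) and (2).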
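Assume the reference image is the vehicle left-viewpoint image. A point with coordinates (x, y) in the original left-viewpoint image then has coordinates (kx, my) in the reference image. For this point, the disparity compensation between the vehicle left-viewpoint image and the vehicle right-viewpoint image is [equation image not reproduced in this text]. The baseline distance between the camera intrinsics P3 and P4 of the two images, [equation image not reproduced in this text], can be obtained by formula (3).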
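For any foreground pixel in the reference image with coordinates (u, v), its 3D coordinates (x, y, z) in the camera coordinate system can be calculated by formula (4):
[Formulas (3) and (4): equation images not reproduced in this text]
Here d_(u,v) denotes the disparity value of the foreground pixel, which can be obtained in step S404.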
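Formulas (3) and (4) are not reproduced in this text, so the sketch below only illustrates the underlying idea, under explicit assumptions:
- Each vehicle image is a crop of its original image. The left crop starts at column x_left, the right crop at column x_right, and both start at row y_top.
- Both crops are resized by k horizontally and m vertically.
Under these assumptions, the disparity measured between the zoomed crops is d_z = k·d - k·(x_left - x_right). The original disparity is therefore d = d_z / k + (x_left - x_right), which is one plausible form of the "disparity compensation". With the original intrinsics, the pixel can then be back-projected without resizing the images back. All parameter names are hypothetical, and the sketch is not claimed to match formulas (3) and (4). With zero crop offsets, it reduces to the (kx, my) mapping described above.

```python
from typing import Tuple


def zoomed_pixel_to_camera(u_z: float, v_z: float, d_z: float, k: float, m: float,
                           x_left: float, x_right: float, y_top: float,
                           fu: float, fv: float, cu: float, cv: float,
                           baseline: float) -> Tuple[float, float, float]:
    """Back-project a pixel of the zoomed reference (left) crop into the camera frame.

    Assumes both crops share y_top and are resized by the same factors k
    (horizontal) and m (vertical). fu, fv, cu, cv and baseline are the ORIGINAL
    left-camera intrinsics.
    """
    # Undo the zoom and crop to recover original-image pixel coordinates.
    u = u_z / k + x_left
    v = v_z / m + y_top
    # Disparity compensation: convert zoomed-crop disparity back to original pixels.
    d = d_z / k + (x_left - x_right)
    if d <= 0:
        raise ValueError("compensated disparity must be positive")
    z = fu * baseline / d
    return (u - cu) * z / fu, (v - cv) * z / fv, z
```

As a worked example, take the original pixel (60, 40), whose right-image match is at column 50 (disparity 10). With crops at x_left = 55, x_right = 48, y_top = 30 and zoom factors k = 2, m = 4, the zoomed coordinates are (10, 40) and the zoomed disparity is 6. The function recovers disparity 10 and returns the same 3D point (0.5, 0, 5) as unzoomed back-projection with fu = fv = 100, cu = 50, cv = 40, baseline 0.5.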
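The execution subject can then input the pseudo point cloud into the pre-built pose prediction model and predict the pose information of the vehicle to be detected through steps S409 to S412 below.
In this embodiment, the pose prediction model is the Dense Fusion model with its CNN (Convolutional Neural Network) module removed. The color interpolation in the Dense Fusion model is used for part prediction.
Step S409: Determine the global feature vector of the vehicle to be detected based on the pseudo point cloud coordinates and part coordinates of the foreground pixels.
The execution subject can input the pseudo point cloud obtained in step S408 into the pre-built pose prediction model. The PointNet in that model generates a geometric feature vector from the pseudo point cloud coordinates of the foreground pixels and a part feature vector from their part coordinates. An MLP (multilayer perceptron) module then fuses the two vectors. An average pooling layer turns the result into the global feature vector, which represents the overall characteristics of the vehicle to be detected.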
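The data flow of Step S409 can be sketched as follows: per-point geometric and part features are fused by an MLP and then average-pooled into one global vector. Here a single linear layer with ReLU stands in for the MLP, and the weights are hypothetical. In the described model the weights would be learned, and PointNet would produce the per-point features.

```python
from typing import List, Sequence


def relu(x: Sequence[float]) -> List[float]:
    return [max(0.0, a) for a in x]


def linear(W: Sequence[Sequence[float]], b: Sequence[float], x: Sequence[float]) -> List[float]:
    """Compute W @ x + b for a single vector."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]


def global_feature(geo_feats: Sequence[Sequence[float]], part_feats: Sequence[Sequence[float]],
                   W: Sequence[Sequence[float]], b: Sequence[float]) -> List[float]:
    """Fuse per-point [geometric | part] features with a one-layer MLP, then average-pool."""
    if not geo_feats or len(geo_feats) != len(part_feats):
        raise ValueError("need the same, non-zero number of geometric and part features")
    fused = [relu(linear(W, b, list(g) + list(p))) for g, p in zip(geo_feats, part_feats)]
    n = len(fused)
    return [sum(f[j] for f in fused) / n for j in range(len(fused[0]))]
```

Average pooling makes the global vector independent of the order of the points, which matters because a pseudo point cloud has no natural ordering.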
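Step S410: Sample a preset number of foreground pixels from the pseudo point cloud.
In this embodiment, all foreground pixels in the pseudo point cloud lie on the surface of the vehicle to be detected. A preset number of them can therefore be sampled at random, which reduces computation without affecting the accuracy of the predicted pose information.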
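Here is a minimal sketch of the random sampling in Step S410. Drawing without replacement when enough points exist is the natural reading of the text. The text does not say what happens when a vehicle has fewer foreground pixels than the preset number. The wrap-around padding below is therefore an assumption, chosen as a simple, common option.

```python
import random
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def sample_points(cloud: Sequence[T], num_points: int, seed: Optional[int] = None) -> List[T]:
    """Randomly pick num_points entries of the pseudo point cloud."""
    if not cloud:
        raise ValueError("cannot sample from an empty point cloud")
    rng = random.Random(seed)
    n = len(cloud)
    if n >= num_points:
        indices = rng.sample(range(n), num_points)     # without replacement
    else:
        indices = [i % n for i in range(num_points)]   # assumed wrap-around padding
    return [cloud[i] for i in indices]
```

Passing a fixed seed makes the subset reproducible, which helps when comparing pose predictions across runs.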
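Step S411: Predict the camera extrinsics of the reference image based on the pseudo point cloud coordinates, part coordinates, and global feature vector of the preset number of foreground pixels.
The execution subject inputs the pseudo point cloud coordinates, part coordinates, and global feature vector of the sampled foreground pixels together into the pose prediction and refinement sub-network of the pose prediction model. The feature vector of each foreground pixel then contains three parts: the geometric feature vector from its pseudo point cloud coordinates, the part feature vector from its part coordinates, and the global feature vector. The camera extrinsics of the reference image (i.e., the rotation matrix and translation matrix) are predicted from these per-pixel feature vectors. Extrinsics obtained this way are more accurate.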
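The per-point feature layout described in Step S411 can be sketched as follows: each sampled pixel's geometric and part features are concatenated with the shared global vector. The text does not detail how the sub-network turns these into a single set of extrinsics. The second helper shows one option from the original Dense Fusion design: each point predicts a pose hypothesis with a confidence, and the most confident one is kept. Whether this application's sub-network works that way is an assumption.

```python
from typing import List, Sequence, TypeVar

P = TypeVar("P")


def dense_point_features(geo: Sequence[Sequence[float]], part: Sequence[Sequence[float]],
                         global_vec: Sequence[float]) -> List[List[float]]:
    """Concatenate [geometric | part | global] for every sampled foreground pixel."""
    return [list(g) + list(p) + list(global_vec) for g, p in zip(geo, part)]


def select_most_confident(poses: Sequence[P], confidences: Sequence[float]) -> P:
    """Return the per-point pose hypothesis with the highest confidence."""
    if not poses or len(poses) != len(confidences):
        raise ValueError("need one confidence per pose hypothesis")
    best = max(range(len(confidences)), key=lambda i: confidences[i])
    return poses[best]
```

Appending the same global vector to every point gives each local prediction context about the whole vehicle, which is the stated motivation for combining the three feature types.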
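Step S412: Determine the pose information of the vehicle to be detected based on the camera extrinsics of the reference image. The camera extrinsics and the pseudo point cloud coordinates of the foreground pixels together determine the coordinates of the foreground pixels in the world coordinate system. These coordinates give the pose information of the vehicle to be detected.
Some optional implementations of the above embodiment may further include two steps: using the fused encoded feature vector as a stereo feature vector; and obtaining a 3D fitting score based on the stereo feature vector and the global feature vector. The 3D fitting score is used to guide the training of the pose prediction model. For example, the execution subject may input the stereo feature vector and the global feature vector into a fully connected network to obtain the 3D fitting score. The score allows the pose information output by the pose prediction model to be evaluated more accurately, which can improve the model's prediction accuracy.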
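Here is a minimal sketch of the fitting-score head. It assumes a single fully connected layer followed by a sigmoid, so that the score lies in (0, 1). The layer width, the activation, and the weights are all hypothetical. The text states only that the stereo and global feature vectors are fed to a fully connected network.

```python
import math
from typing import Sequence


def fitting_score(stereo_vec: Sequence[float], global_vec: Sequence[float],
                  weights: Sequence[float], bias: float) -> float:
    """Score in (0, 1) from a one-layer fully connected head over [stereo | global]."""
    features = list(stereo_vec) + list(global_vec)
    if len(weights) != len(features):
        raise ValueError("weight length must match the concatenated feature length")
    logit = sum(w * x for w, x in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-logit))
```

Because the head sees the stereo features as well as the pose model's global features, it can act as a learned quality signal during training.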
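As can be seen from FIG. 4, the second embodiment differs from the first embodiment shown in FIG. 2 in two ways. It obtains vehicle left-viewpoint and right-viewpoint images of consistent size by scaling. It also determines the foreground pixels in the reference image by fusing the features of the two images. This prevents a long distance to the vehicle from degrading the accuracy of its pose prediction and further improves the accuracy of vehicle pose prediction.
FIG. 5 shows a block diagram of an electronic device for the method for detecting a vehicle pose disclosed in the present application. The electronic device includes the following modules:
- An image segmentation module 501, configured to input the vehicle left-viewpoint image and the vehicle right-viewpoint image into a part prediction and mask segmentation network model constructed based on vehicle part prior data. The module determines the foreground pixels in the reference image and the part coordinates of each foreground pixel. The part coordinates represent the position of the foreground pixel in the coordinate system of the vehicle to be detected. The reference image is the vehicle left-viewpoint image or the vehicle right-viewpoint image.
- A point cloud generation module 502, configured to use the disparity map of the two images, the part coordinates of the foreground pixels, and the camera intrinsics of the reference image to convert the coordinates of the foreground pixels in the reference image into coordinates in the camera coordinate system, obtaining a pseudo point cloud.
- A pose prediction module 503, configured to input the pseudo point cloud into a pre-trained pose prediction model to obtain the pose information of the vehicle to be detected.
In this embodiment, the apparatus further includes an image zoom module configured to determine the vehicle left-viewpoint and right-viewpoint images as follows:
1. Extract the original left-viewpoint image and the original right-viewpoint image of the vehicle to be detected from the scene left-viewpoint image and the scene right-viewpoint image of the same scene collected by a binocular camera.
2. Scale the original left-viewpoint and right-viewpoint images to a preset size to obtain the vehicle left-viewpoint and right-viewpoint images.
The apparatus further includes a disparity map generation module configured to determine the disparity map of the vehicle left-viewpoint and right-viewpoint images as follows:
1. Determine the camera intrinsics of the vehicle left-viewpoint image and of the vehicle right-viewpoint image from the initial camera intrinsics of the scene left-viewpoint image, the initial camera intrinsics of the scene right-viewpoint image, and the scaling factors.
2. Determine the disparity map of the two vehicle images from these camera intrinsics.
In this embodiment, the part prediction and mask segmentation network model adopts an encoder-decoder framework. The image segmentation module 501 is further configured to:
1. Input the vehicle left-viewpoint and right-viewpoint images into the model separately to obtain the encoded feature vector of each image.
2. Fuse the two encoded feature vectors to obtain a fused encoded feature vector.
3. Decode the fused encoded feature vector to obtain the foreground pixels in the reference image and the part coordinates of each foreground pixel.
In this embodiment, the pose prediction module 503 is further configured to:
1. Determine the global feature vector of the vehicle to be detected from the pseudo point cloud coordinates and part coordinates of the foreground pixels.
2. Sample a preset number of foreground pixels from the pseudo point cloud.
3. Predict the camera extrinsics of the reference image from the pseudo point cloud coordinates, part coordinates, and global feature vector of those foreground pixels.
4. Determine the pose information of the vehicle to be detected from the camera extrinsics.
In this embodiment, the apparatus further includes a model training module configured to use the fused encoded feature vector as a stereo feature vector and to obtain a 3D fitting score from the stereo feature vector and the global feature vector. The 3D fitting score is used to guide the training of the pose prediction model.
In this embodiment, the point cloud generation module 502 is further configured to:
1. Determine the depth values of the foreground pixels from the camera intrinsics of the reference image and the disparity map of the two vehicle images.
2. Obtain the initial coordinates of the foreground pixels in the camera coordinate system from their coordinates in the reference image and their depth values.
3. Update the initial coordinates using the part coordinates of the foreground pixels to obtain their pseudo point cloud coordinates.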
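According to embodiments of the present application, the present application further provides an electronic device and a readable storage medium.
FIG. 6 is a block diagram of an electronic device for the computer-storable-medium method according to an embodiment of the present application. The electronic device is intended to represent various forms of digital computers, such as laptop computers, desktop computers, workstations, personal digital assistants, servers, blade servers, mainframe computers, and other suitable computers. The electronic device may also represent various forms of mobile apparatuses, such as personal digital processors, cellular phones, smartphones, wearable devices, and other similar computing apparatuses. The components shown herein, their connections and relationships, and their functions are merely examples. They are not intended to limit the implementations of the present application described and/or claimed herein.
As shown in FIG. 6, the electronic device includes one or more processors 601, a memory 602, and interfaces for connecting the components, including a high-speed interface and a low-speed interface. The components are interconnected by different buses and may be mounted on a common motherboard or in other ways as required. The processor may process instructions executed within the electronic device, including instructions stored in or on the memory for displaying graphical information of a GUI on an external input/output apparatus (such as a display device coupled to an interface). In other implementations, multiple processors and/or multiple buses may be used with multiple memories, if required. Likewise, multiple electronic devices may be connected, each providing part of the necessary operations (for example, as a server array, a group of blade servers, or a multiprocessor system). In FIG. 6, one processor 601 is taken as an example.
The memory 602 is the non-transitory computer-readable storage medium provided by the present application. The memory stores instructions executable by at least one processor, so that the at least one processor performs the computer-storable-medium method provided by the present application. The non-transitory computer-readable storage medium of the present application stores computer instructions that cause a computer to perform that method.
As a non-transitory computer-readable storage medium, the memory 602 may store non-transitory software programs, non-transitory computer-executable programs, and modules. Examples are the program instructions/modules corresponding to the computer-storable-medium method in embodiments of the present application, such as the image segmentation module 501, the point cloud generation module 502, and the pose prediction module 503 shown in FIG. 5. The processor 601 runs the non-transitory software programs, instructions, and modules stored in the memory 602 to execute the server's various functional applications and data processing. In this way it implements the computer-storable-medium method of the above method embodiments.
The memory 602 may include a program storage area and a data storage area. The program storage area may store an operating system and an application program required by at least one function. The data storage area may store data created through use of the electronic device for the computer-storable medium, and the like. In addition, the memory 602 may include high-speed random access memory. It may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid-state storage device. In some embodiments, the memory 602 optionally includes memories located remotely from the processor 601. These remote memories may be connected to the electronic device for the computer-storable medium via a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The electronic device for the computer-storable-medium method may further include an input apparatus 603 and an output apparatus 604. The processor 601, the memory 602, the input apparatus 603, and the output apparatus 604 may be connected by a bus or in other ways. In FIG. 6, connection by a bus is taken as an example.
The input apparatus 603 can receive input numeric or character information and generate key signal inputs related to user settings and function control of the electronic device for the computer-storable medium. Examples include a touchscreen, keypad, mouse, trackpad, touchpad, pointing stick, one or more mouse buttons, trackball, and joystick. The output apparatus 604 may include a display device, an auxiliary lighting apparatus (for example, an LED), a haptic feedback apparatus (for example, a vibration motor), and the like. The display device may include, but is not limited to, a liquid crystal display (LCD), a light-emitting diode (LED) display, and a plasma display. In some implementations, the display device may be a touchscreen.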
此处描述的系统和技术的各种实施方式可以在数字电子电路系统、集成电路系统、专用ASIC(专用集成电路)、计算机硬件、固件、软件、和/或它们的组合中实现。这些各种实施方式可以包括:实施在一个或者多个计算机程序中,该一个或者多个计算机程序可在包括至少一个可编程处理器的可编程系统上执行和/或解释,该可编程处理器可以是专用或者通用可编程处理器,可以从存储系统、至少一个输入装置、和至少一个输出装置接收数据和指令,并且将数据和指令传输至该存储系统、该至少一个输 入装置、和该至少一个输出装置。
这些计算程序(也称作程序、软件、软件应用、或者代码)包括可编程处理器的机器指令,并且可以利用高级过程和/或面向对象的编程语言、和/或汇编/机器语言来实施这些计算程序。如本文使用的,术语“机器可读介质”和“计算机可读介质”指的是用于将机器指令和/或数据提供给可编程处理器的任何计算机程序产品、设备、和/或装置(例如,磁盘、光盘、存储器、可编程逻辑装置(PLD)),包括,接收作为机器可读信号的机器指令的机器可读介质。术语“机器可读信号”指的是用于将机器指令和/或数据提供给可编程处理器的任何信号。
为了提供与用户的交互,可以在计算机上实施此处描述的系统和技术,该计算机具有:用于向用户显示信息的显示装置(例如,CRT(阴极射线管)或者LCD(液晶显示器)监视器);以及键盘和指向装置(例如,鼠标或者轨迹球),用户可以通过该键盘和该指向装置来将输入提供给计算机。其它种类的装置还可以用于提供与用户的交互;例如,提供给用户的反馈可以是任何形式的传感反馈(例如,视觉反馈、听觉反馈、或者触觉反馈);并且可以用任何形式(包括声输入、语音输入或者、触觉输入)来接收来自用户的输入。
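The systems and techniques described herein may be implemented in a computing system that includes a back-end component (for example, a data server), a computing system that includes a middleware component (for example, an application server), a computing system that includes a front-end component (for example, a user computer having a graphical user interface or a web browser through which the user can interact with implementations of the systems and techniques described herein), or a computing system that includes any combination of such back-end, middleware, or front-end components. The components of the system may be interconnected by any form or medium of digital data communication (for example, a communication network). Examples of communication networks include local area networks (LANs), wide area networks (WANs), and the Internet.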
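A computer system may include a client and a server. The client and the server are generally remote from each other and typically interact through a communication network. The client-server relationship arises from computer programs that run on the respective computers and have a client-server relationship with each other.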
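According to the technical solutions of the embodiments of the present application, part prediction and mask segmentation are performed on the captured vehicle left-viewpoint and right-viewpoint images on the basis of vehicle part prior data, which yields more accurate segmentation results and thereby improves the accuracy of vehicle pose prediction.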
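It should be understood that steps may be reordered, added, or deleted using the various forms of the flows shown above. For example, the steps described in the present application may be performed in parallel, sequentially, or in a different order, as long as the desired results of the technical solutions disclosed in the present application can be achieved; no limitation is imposed herein.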
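The specific implementations described above do not limit the protection scope of the present application. Those skilled in the art should understand that various modifications, combinations, sub-combinations, and substitutions may be made depending on design requirements and other factors. Any modification, equivalent replacement, improvement, and the like made within the spirit and principles of the present application shall fall within the protection scope of the present application.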

Claims (14)

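  1. A method for detecting a vehicle pose, comprising:
    inputting a vehicle left-viewpoint image and a vehicle right-viewpoint image into a part prediction and mask segmentation network model constructed based on part prior data of vehicles, and determining foreground pixels in a reference image and part coordinates of each foreground pixel, wherein the part coordinates represent a position of the foreground pixel in a coordinate system of a vehicle to be detected, and the reference image is the vehicle left-viewpoint image or the vehicle right-viewpoint image;
    converting, based on a disparity map of the vehicle left-viewpoint image and the vehicle right-viewpoint image, the part coordinates of the foreground pixels, and camera intrinsics of the reference image, coordinates of the foreground pixels in the reference image into coordinates of the foreground pixels in a camera coordinate system, to obtain a pseudo point cloud; and
    inputting the pseudo point cloud into a pre-trained pose prediction model to obtain pose information of the vehicle to be detected.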
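  2. The method according to claim 1, wherein the vehicle left-viewpoint image and the vehicle right-viewpoint image are determined by:
    extracting, from a scene left-viewpoint image and a scene right-viewpoint image of a same scene captured by a binocular camera, an original left-viewpoint image and an original right-viewpoint image of the vehicle to be detected, respectively;
    scaling the original left-viewpoint image and the original right-viewpoint image to a preset size, respectively, to obtain the vehicle left-viewpoint image and the vehicle right-viewpoint image;
    and the disparity map of the vehicle left-viewpoint image and the vehicle right-viewpoint image is determined by:
    determining, based on initial camera intrinsics of the scene left-viewpoint image, initial camera intrinsics of the scene right-viewpoint image, and a scaling coefficient, camera intrinsics of the vehicle left-viewpoint image and camera intrinsics of the vehicle right-viewpoint image, respectively;
    determining, based on the camera intrinsics of the vehicle left-viewpoint image and the camera intrinsics of the vehicle right-viewpoint image, the disparity map of the vehicle left-viewpoint image and the vehicle right-viewpoint image.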
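  3. The method according to claim 1 or 2, wherein the part prediction and mask segmentation network model is a model employing an encoder-decoder framework; and
    the inputting the vehicle left-viewpoint image and the vehicle right-viewpoint image into the part prediction and mask segmentation network model constructed based on part prior data of vehicles, and determining the foreground pixels in the reference image and the part coordinates of each foreground pixel comprises:
    inputting the vehicle left-viewpoint image and the vehicle right-viewpoint image into the part prediction and mask segmentation network model, respectively, to obtain an encoded feature vector of the vehicle left-viewpoint image and an encoded feature vector of the vehicle right-viewpoint image;
    fusing the encoded feature vector of the vehicle left-viewpoint image and the encoded feature vector of the vehicle right-viewpoint image to obtain a fused encoded feature vector; and
    decoding the fused encoded feature vector to obtain the foreground pixels in the reference image and the part coordinates of each foreground pixel.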
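  4. The method according to claim 3, wherein the inputting the pseudo point cloud into the pre-trained pose prediction model to obtain the pose information of the vehicle to be detected comprises:
    determining a global feature vector of the vehicle to be detected based on pseudo point cloud coordinates and the part coordinates of the foreground pixels;
    sampling a preset number of foreground pixels from the pseudo point cloud;
    predicting camera extrinsics of the reference image based on the pseudo point cloud coordinates and the part coordinates of the preset number of foreground pixels and the global feature vector; and
    determining the pose information of the vehicle to be detected based on the camera extrinsics.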
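  5. The method according to claim 4, further comprising:
    taking the fused encoded feature vector as a stereo feature vector; and
    obtaining a 3D fitting score based on the stereo feature vector and the global feature vector, wherein the 3D fitting score is used to guide training of the pose prediction model.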
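  6. The method according to any one of claims 1 to 5, wherein the converting, based on the disparity map of the vehicle left-viewpoint image and the vehicle right-viewpoint image, the part coordinates of the foreground pixels, and the camera intrinsics of the reference image, the coordinates of the foreground pixels in the reference image into the coordinates of the foreground pixels in the camera coordinate system to obtain the pseudo point cloud comprises:
    determining depth values of the foreground pixels based on the camera intrinsics of the reference image and the disparity map of the vehicle left-viewpoint image and the vehicle right-viewpoint image;
    obtaining initial coordinates of the foreground pixels in the camera coordinate system based on the coordinates of the foreground pixels in the reference image and the depth values; and
    updating the initial coordinates based on the part coordinates of the foreground pixels to obtain pseudo point cloud coordinates of the foreground pixels.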
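  7. An apparatus for detecting a vehicle pose, comprising:
    an image segmentation module, configured to input a vehicle left-viewpoint image and a vehicle right-viewpoint image into a part prediction and mask segmentation network model constructed based on part prior data of vehicles, and determine foreground pixels in a reference image and part coordinates of each foreground pixel, wherein the part coordinates represent a position of the foreground pixel in a coordinate system of a vehicle to be detected, and the reference image is the vehicle left-viewpoint image or the vehicle right-viewpoint image;
    a point cloud generation module, configured to convert, based on a disparity map of the vehicle left-viewpoint image and the vehicle right-viewpoint image, the part coordinates of the foreground pixels, and camera intrinsics of the reference image, coordinates of the foreground pixels in the reference image into coordinates of the foreground pixels in a camera coordinate system, to obtain a pseudo point cloud; and
    a pose prediction module, configured to input the pseudo point cloud into a pre-trained pose prediction model to obtain pose information of the vehicle to be detected.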
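  8. The apparatus according to claim 7, further comprising an image scaling module configured to determine the vehicle left-viewpoint image and the vehicle right-viewpoint image by:
    extracting, from a scene left-viewpoint image and a scene right-viewpoint image of a same scene captured by a binocular camera, an original left-viewpoint image and an original right-viewpoint image of the vehicle to be detected, respectively;
    scaling the original left-viewpoint image and the original right-viewpoint image to a preset size, respectively, to obtain the vehicle left-viewpoint image and the vehicle right-viewpoint image;
    and the apparatus further comprises a disparity map generation module configured to determine the disparity map of the vehicle left-viewpoint image and the vehicle right-viewpoint image by:
    determining, based on initial camera intrinsics of the scene left-viewpoint image, initial camera intrinsics of the scene right-viewpoint image, and a scaling coefficient, camera intrinsics of the vehicle left-viewpoint image and camera intrinsics of the vehicle right-viewpoint image, respectively;
    determining, based on the camera intrinsics of the vehicle left-viewpoint image and the camera intrinsics of the vehicle right-viewpoint image, the disparity map of the vehicle left-viewpoint image and the vehicle right-viewpoint image.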
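  9. The apparatus according to claim 7 or 8, wherein the part prediction and mask segmentation network model is a model employing an encoder-decoder framework; and the image segmentation module is further configured to:
    input the vehicle left-viewpoint image and the vehicle right-viewpoint image into the part prediction and mask segmentation network model, respectively, to obtain an encoded feature vector of the vehicle left-viewpoint image and an encoded feature vector of the vehicle right-viewpoint image;
    fuse the encoded feature vector of the vehicle left-viewpoint image and the encoded feature vector of the vehicle right-viewpoint image to obtain a fused encoded feature vector; and
    decode the fused encoded feature vector to obtain the foreground pixels in the reference image and the part coordinates of each foreground pixel.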
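  10. The apparatus according to claim 9, wherein the pose prediction module is further configured to:
    determine a global feature vector of the vehicle to be detected based on pseudo point cloud coordinates and the part coordinates of the foreground pixels;
    sample a preset number of foreground pixels from the pseudo point cloud;
    predict camera extrinsics of the reference image based on the pseudo point cloud coordinates and the part coordinates of the preset number of foreground pixels and the global feature vector; and
    determine the pose information of the vehicle to be detected based on the camera extrinsics.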
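  11. The apparatus according to claim 10, further comprising a model training module configured to:
    take the fused encoded feature vector as a stereo feature vector; and
    obtain a 3D fitting score based on the stereo feature vector and the global feature vector, wherein the 3D fitting score is used to guide training of the pose prediction model.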
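  12. The apparatus according to any one of claims 7 to 11, wherein the point cloud generation module is further configured to:
    determine depth values of the foreground pixels based on the camera intrinsics of the reference image and the disparity map of the vehicle left-viewpoint image and the vehicle right-viewpoint image;
    obtain initial coordinates of the foreground pixels in the camera coordinate system based on the coordinates of the foreground pixels in the reference image and the depth values; and
    update the initial coordinates based on the part coordinates of the foreground pixels to obtain pseudo point cloud coordinates of the foreground pixels.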
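  13. An electronic device, comprising:
    at least one processor; and
    a memory communicatively connected to the at least one processor; wherein
    the memory stores instructions executable by the at least one processor, and the instructions, when executed by the at least one processor, cause the at least one processor to perform the method according to any one of claims 1 to 6.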
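  14. A non-transitory computer-readable storage medium storing computer instructions, wherein the computer instructions are used to cause a computer to perform the method according to any one of claims 1 to 6.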
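Claims 2 and 8 derive the camera intrinsics of each cropped and rescaled vehicle image from the scene image's initial intrinsics and a scaling coefficient, without giving a formula. The sketch below shows standard pinhole-camera bookkeeping for a crop followed by a resize, under stated assumptions: zero skew, a crop window whose top-left corner is `(x0, y0)` in scene pixels, per-axis scale factors, and a known stereo baseline `baseline` (not mentioned in the claims). `Intrinsics`, `crop_and_scale_intrinsics`, and `depth_from_crop_disparity` are hypothetical names, and the half-pixel offset that some resize routines introduce is ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Intrinsics:
    """Pinhole intrinsics in pixels (zero skew assumed)."""
    fx: float
    fy: float
    cx: float
    cy: float


def crop_and_scale_intrinsics(k: Intrinsics, x0: float, y0: float,
                              crop_w: float, crop_h: float,
                              out_w: float, out_h: float) -> Intrinsics:
    """Intrinsics of the crop [x0, x0+crop_w) x [y0, y0+crop_h) resized to out_w x out_h.

    Cropping shifts the principal point; resizing multiplies the focal lengths
    and the shifted principal point by the per-axis scaling coefficients.
    """
    sx = out_w / crop_w
    sy = out_h / crop_h
    return Intrinsics(fx=k.fx * sx, fy=k.fy * sy,
                      cx=(k.cx - x0) * sx, cy=(k.cy - y0) * sy)


def depth_from_crop_disparity(d_crop: float, k_left: Intrinsics,
                              k_right: Intrinsics, baseline: float) -> float:
    """Depth from a disparity measured between two vehicle crops.

    Assumes both crops share the same scale, so fx is equal in both:
    Z = fx' * B / (d' - (cx_left' - cx_right')).
    """
    denom = d_crop - (k_left.cx - k_right.cx)
    if denom <= 0:
        raise ValueError("non-positive corrected disparity")
    return k_left.fx * baseline / denom


# Hypothetical example: a 200x100 vehicle crop resized to a 400x200 preset size.
scene_k = Intrinsics(fx=700.0, fy=700.0, cx=600.0, cy=180.0)
vehicle_k = crop_and_scale_intrinsics(scene_k, x0=500, y0=100,
                                      crop_w=200, crop_h=100,
                                      out_w=400, out_h=200)
print(vehicle_k)  # Intrinsics(fx=1400.0, fy=1400.0, cx=200.0, cy=160.0)
```

Because the left and right crops generally start at different x-offsets, their principal points differ after cropping, so a disparity measured directly between the two vehicle images must be corrected by `cx_left' - cx_right'` before it can be converted to depth. This is one plausible reading of why the claims use the intrinsics of both vehicle images when determining the disparity map.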
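The first two steps of claims 6 and 12 correspond to standard rectified-stereo triangulation: depth `Z = fx * B / d`, then back-projection `X = (u - cx) * Z / fx`, `Y = (v - cy) * Z / fy`. The sketch below assumes a rectified stereo pair with a known baseline `baseline` in metres (the claims do not name it), a disparity in pixels of the reference image, and a disparity map indexed as `disparity_map[v][u]`. It stops at the initial camera-frame coordinates, because the claims do not specify how the part coordinates are used to update them. All function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Point3 = Tuple[float, float, float]


def disparity_to_depth(disparity: float, fx: float, baseline: float) -> float:
    """Depth Z = fx * B / d for a rectified stereo pair (d in pixels, B in metres)."""
    if disparity <= 0:
        raise ValueError("disparity must be positive")
    return fx * baseline / disparity


def back_project(u: float, v: float, depth: float,
                 fx: float, fy: float, cx: float, cy: float) -> Point3:
    """Pixel (u, v) at the given depth -> (X, Y, Z) in the camera frame."""
    return ((u - cx) * depth / fx, (v - cy) * depth / fy, depth)


def initial_pseudo_points(foreground: Sequence[Tuple[int, int]],
                          disparity_map: Sequence[Sequence[float]],
                          fx: float, fy: float, cx: float, cy: float,
                          baseline: float) -> List[Point3]:
    """Initial camera-frame coordinates of foreground pixels (u, v).

    Pixels without a valid (positive) disparity are skipped.
    """
    points: List[Point3] = []
    for u, v in foreground:
        d = disparity_map[v][u]
        if d <= 0:  # no valid stereo match for this pixel
            continue
        z = disparity_to_depth(d, fx, baseline)
        points.append(back_project(u, v, z, fx, fy, cx, cy))
    return points


# Hypothetical 2x2 disparity map; pixel (u=1, v=0) has no valid match.
disp = [[36.0, 0.0],
        [0.0, 72.0]]
pts = initial_pseudo_points([(0, 0), (1, 0), (1, 1)], disp,
                            fx=720.0, fy=720.0, cx=0.0, cy=0.0, baseline=0.5)
print([p[2] for p in pts])  # [10.0, 5.0]
```

Doubling the disparity halves the depth, which is why nearby vehicles (large disparities) are triangulated more precisely than distant ones for a fixed disparity error.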
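Claims 4 and 10 sample a preset number of foreground pixels from the pseudo point cloud, presumably so that the pose prediction model receives a fixed-size input, but the sampling scheme is not stated. The sketch below assumes a common choice for point-set networks: uniform random sampling without replacement when enough points exist, and padding by sampling with replacement otherwise. `sample_fixed_size` is a hypothetical name.

```python
import random
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def sample_fixed_size(points: Sequence[T], n: int,
                      seed: Optional[int] = None) -> List[T]:
    """Return exactly n points drawn uniformly from `points`.

    Without replacement if len(points) >= n; otherwise every point is kept
    once and the remainder is filled by sampling with replacement.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if not points:
        raise ValueError("cannot sample from an empty point cloud")
    rng = random.Random(seed)  # local RNG so results are reproducible per seed
    if len(points) >= n:
        return rng.sample(list(points), n)
    return list(points) + rng.choices(points, k=n - len(points))


cloud = [(float(i), 0.0, 10.0) for i in range(5)]
print(len(sample_fixed_size(cloud, 3, seed=0)))  # 3
print(len(sample_fixed_size(cloud, 8, seed=0)))  # 8
```

Keeping every real point before padding means small, distant vehicles with few foreground pixels still contribute all of their observations.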
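In claim 4, a learned model predicts the camera extrinsics of the reference image from the pseudo point cloud coordinates (camera frame) and the part coordinates (vehicle frame) of the sampled pixels. To illustrate what those extrinsics relate, the sketch below swaps in a classical, non-learned technique instead: a closed-form least-squares rigid alignment restricted to a rotation about the camera's y axis, `R_y(theta) = [[c, 0, s], [0, 1, 0], [-s, 0, c]]`, plus a translation. It assumes the vehicle rests on a ground plane parallel to the camera's x-z plane and that part coordinates and pseudo point cloud coordinates correspond one-to-one. It is not the patent's pose prediction model, and `fit_yaw_and_translation` is a hypothetical name.

```python
import math
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def rot_y(theta: float, p: Vec3) -> Vec3:
    """Apply R_y(theta) = [[c, 0, s], [0, 1, 0], [-s, 0, c]] to p."""
    c, s = math.cos(theta), math.sin(theta)
    x, y, z = p
    return (c * x + s * z, y, -s * x + c * z)


def _mean(points: Sequence[Vec3]) -> Vec3:
    n = len(points)
    return (sum(p[0] for p in points) / n,
            sum(p[1] for p in points) / n,
            sum(p[2] for p in points) / n)


def fit_yaw_and_translation(part_pts: Sequence[Vec3],
                            cam_pts: Sequence[Vec3]) -> Tuple[float, Vec3]:
    """Least-squares (theta, t) such that rot_y(theta, part) + t ~= cam.

    With centred points a = p - mean(p), b = q - mean(q), the optimum is
    theta = atan2(sum(bx*az - bz*ax), sum(bx*ax + bz*az)), t = mean(q) - R mean(p).
    """
    if len(part_pts) != len(cam_pts) or len(part_pts) < 2:
        raise ValueError("need at least two corresponding point pairs")
    pm, qm = _mean(part_pts), _mean(cam_pts)
    s_sum = c_sum = 0.0
    for p, q in zip(part_pts, cam_pts):
        ax, az = p[0] - pm[0], p[2] - pm[2]
        bx, bz = q[0] - qm[0], q[2] - qm[2]
        s_sum += bx * az - bz * ax
        c_sum += bx * ax + bz * az
    theta = math.atan2(s_sum, c_sum)
    rp = rot_y(theta, pm)
    return theta, (qm[0] - rp[0], qm[1] - rp[1], qm[2] - rp[2])


# Synthetic check: rotate four part coordinates by 0.3 rad, then translate.
parts = [(1.0, 0.0, 0.0), (0.0, 0.5, 2.0), (-1.0, 0.0, 0.0), (0.0, -0.5, -2.0)]
true_t = (2.0, 1.0, 10.0)
cam = [tuple(a + b for a, b in zip(rot_y(0.3, p), true_t)) for p in parts]
theta, t = fit_yaw_and_translation(parts, cam)
print(round(theta, 6), [round(v, 6) for v in t])  # 0.3 [2.0, 1.0, 10.0]
```

Under these assumptions `t` is the position of the vehicle-frame origin in the camera frame and `theta` is the heading. Real pseudo point clouds are noisy and partially occluded, so such a fit would at most serve as a sanity check on predicted extrinsics or be wrapped in a robust estimator.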
PCT/CN2020/130107 2020-04-28 2020-11-19 Method and apparatus for detecting vehicle pose WO2021218123A1 (zh)

Priority Applications (3)

Application Number Priority Date Filing Date Title
EP20934056.1A EP4050562A4 (en) 2020-04-28 2020-11-19 METHOD AND DEVICE FOR DETECTING THE POSITION OF A VEHICLE
JP2022540700A JP2023510198A (ja) 2020-04-28 2020-11-19 Method and apparatus for detecting vehicle pose
US17/743,402 US20220270289A1 (en) 2020-04-28 2022-05-12 Method and apparatus for detecting vehicle pose

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010347485.7A CN111539973B (zh) 2020-04-28 2020-04-28 Method and apparatus for detecting vehicle pose
CN202010347485.7 2020-04-28

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/743,402 Continuation US20220270289A1 (en) 2020-04-28 2022-05-12 Method and apparatus for detecting vehicle pose

Publications (1)

Publication Number Publication Date
WO2021218123A1 true WO2021218123A1 (zh) 2021-11-04

Family

ID=71977314

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/130107 WO2021218123A1 (zh) 2020-04-28 2020-11-19 Method and apparatus for detecting vehicle pose

Country Status (5)

Country Link
US (1) US20220270289A1 (zh)
EP (1) EP4050562A4 (zh)
JP (1) JP2023510198A (zh)
CN (1) CN111539973B (zh)
WO (1) WO2021218123A1 (zh)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
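CN111539973B (zh) * 2020-04-28 2021-10-01 北京百度网讯科技有限公司 Method and apparatus for detecting vehicle pose
CN112766206A (zh) * 2021-01-28 2021-05-07 深圳市捷顺科技实业股份有限公司 High-position video vehicle detection method and apparatus, electronic device, and storage medium
CN116013091B (zh) * 2023-03-24 2023-07-07 山东康威大数据科技有限公司 Tunnel monitoring system and analysis method based on traffic-flow big data
CN116740498A (zh) * 2023-06-13 2023-09-12 北京百度网讯科技有限公司 Model pre-training method, model training method, object processing method and apparatus
CN116993817B (zh) * 2023-09-26 2023-12-08 深圳魔视智能科技有限公司 Method and apparatus for determining pose of target vehicle, computer device, and storage medium
CN117496477B (zh) * 2024-01-02 2024-05-03 广汽埃安新能源汽车股份有限公司 Point cloud object detection method and apparatus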

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107505644A (zh) * 2017-07-28 2017-12-22 武汉理工大学 System and method for generating 3D high-precision maps based on vehicle-mounted multi-sensor fusion
CN108534782A (zh) * 2018-04-16 2018-09-14 电子科技大学 Real-time vehicle localization method on landmark maps based on a binocular vision system
CN108749819A (zh) * 2018-04-03 2018-11-06 吉林大学 Binocular-vision-based tire vertical force estimation system and estimation method
US20200125869A1 (en) * 2018-10-17 2020-04-23 Automotive Research & Testing Center Vehicle detecting method, nighttime vehicle detecting method based on dynamic light intensity and system thereof
CN111539973A (zh) * 2020-04-28 2020-08-14 北京百度网讯科技有限公司 Method and apparatus for detecting vehicle pose

Family Cites Families (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
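KR100922429B1 (ko) * 2007-11-13 2009-10-16 포항공과대학교 산학협력단 Method for detecting people using stereo images
GB2492779B (en) * 2011-07-11 2016-03-16 Toshiba Res Europ Ltd An image processing method and system
JP6431404B2 (ja) * 2015-02-23 2018-11-28 株式会社デンソーアイティーラボラトリ Posture estimation model generation device and posture estimation device
JP6551336B2 (ja) * 2016-08-12 2019-07-31 株式会社デンソー Periphery audit device
CN106447661A (zh) * 2016-09-28 2017-02-22 深圳市优象计算技术有限公司 Fast depth map generation method
CN106908775B (zh) * 2017-03-08 2019-10-18 同济大学 Real-time localization method for unmanned vehicles based on laser reflection intensity
CN108381549B (zh) * 2018-01-26 2021-12-14 广东三三智能科技有限公司 Binocular-vision-guided fast robot grasping method, apparatus, and storage medium
GB201804400D0 (en) * 2018-03-20 2018-05-02 Univ Of Essex Enterprise Limited Localisation, mapping and network training
CN108765496A (zh) * 2018-05-24 2018-11-06 河海大学常州校区 Multi-viewpoint automotive surround-view driver assistance system and method
CN108961339B (zh) * 2018-07-20 2020-10-20 深圳辰视智能科技有限公司 Deep-learning-based point cloud object pose estimation method, apparatus, and device
CN109360240B (zh) * 2018-09-18 2022-04-22 华南理工大学 Binocular-vision-based localization method for small unmanned aerial vehicles
CN109278640A (zh) * 2018-10-12 2019-01-29 北京双髻鲨科技有限公司 Blind spot detection system and method
CN110082779A (zh) * 2019-03-19 2019-08-02 同济大学 Vehicle pose localization method and system based on 3D LiDAR
CN110208783B (zh) * 2019-05-21 2021-05-14 同济人工智能研究院(苏州)有限公司 Intelligent vehicle localization method based on environmental contours
CN110738200A (zh) * 2019-12-23 2020-01-31 广州赛特智能科技有限公司 Lane line 3D point cloud map construction method, electronic device, and storage medium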

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP4050562A1

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
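CN114419564A (zh) * 2021-12-24 2022-04-29 北京百度网讯科技有限公司 Vehicle pose detection method, apparatus, device, medium, and autonomous vehicle
CN114419564B (zh) * 2021-12-24 2023-09-01 北京百度网讯科技有限公司 Vehicle pose detection method, apparatus, device, medium, and autonomous vehicle
CN116206068A (zh) * 2023-04-28 2023-06-02 北京科技大学 Method and apparatus for generating and constructing 3D driving scenes based on real datasets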

Also Published As

Publication number Publication date
EP4050562A1 (en) 2022-08-31
CN111539973A (zh) 2020-08-14
EP4050562A4 (en) 2023-01-25
US20220270289A1 (en) 2022-08-25
JP2023510198A (ja) 2023-03-13
CN111539973B (zh) 2021-10-01

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20934056

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2020934056

Country of ref document: EP

Effective date: 20220523

ENP Entry into the national phase

Ref document number: 2022540700

Country of ref document: JP

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE