WO2023082822A1 - Image data processing method and apparatus - Google Patents

Image data processing method and apparatus (图像数据的处理方法和装置)

Info

Publication number
WO2023082822A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
road surface
neural network
aspect ratio
feature
Prior art date
Application number
PCT/CN2022/118735
Other languages
English (en)
French (fr)
Inventor
陈腾
隋伟
谢佳锋
张骞
黄畅
Original Assignee
北京地平线信息技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京地平线信息技术有限公司
Priority to US18/549,231 priority Critical patent/US20240169712A1/en
Priority to JP2023553068A priority patent/JP2024508024A/ja
Priority to EP22891627.6A priority patent/EP4290456A1/en
Publication of WO2023082822A1 publication Critical patent/WO2023082822A1/zh

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/50Depth or shape recovery
    • G06T7/55Depth or shape recovery from multiple images
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/253Fusion techniques of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T11/002D [Two Dimensional] image generation
    • G06T11/003Reconstruction from projections, e.g. tomography
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00Three dimensional [3D] modelling, e.g. data description of 3D objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/50Depth or shape recovery
    • G06T7/55Depth or shape recovery from multiple images
    • G06T7/579Depth or shape recovery from multiple images from motion
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/44Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/50Context or environment of the image
    • G06V20/56Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • G06V20/588Recognition of the road, e.g. of lane markings; Recognition of the vehicle driving pattern in relation to the road
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30244Camera pose
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30248Vehicle exterior or interior
    • G06T2207/30252Vehicle exterior; Vicinity of vehicle
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Definitions

  • the present disclosure relates to the technical field of image processing, in particular to a method and device for processing image data.
  • the plane parallax method models a 3D scene from the differences between two views of the same target or scene. The method relies on a specific reference plane and can recover, for a pixel in the scene, its height above that plane and its distance to the observation point, that is, the pixel aspect ratio (height-to-depth ratio) of that pixel.
  • a method for processing image data including:
  • the first image and the second image are processed by the first neural network to obtain a homography matrix, wherein the first image is taken at a first moment, the second image is taken at a second moment, and the first image and the second image have road surface elements in the same area;
  • according to the homography matrix, a mapped image feature of the first image feature is determined, wherein the first image feature is a feature extracted based on the first image;
  • the features of the fused image are processed by using a second neural network to obtain a first pixel aspect ratio of the second image.
  • an image data processing device including:
  • a homography matrix determination module configured to use the first neural network to process the first image and the second image to obtain a homography matrix, wherein the first image is taken at the first moment, the second image is taken at the second moment, and the first image and the second image have road surface elements in the same area;
  • a mapped image feature determination module configured to determine a mapped image feature of a first image feature according to the homography matrix, wherein the first image feature is a feature extracted based on the first image;
  • a fusion module configured to fuse the mapped image features and second image features to obtain fused image features, wherein the second image features are features extracted based on the second image;
  • the first pixel aspect ratio determination module is configured to use the second neural network to process the fusion image features to obtain the first pixel aspect ratio of the second image.
  • a computer-readable storage medium stores a computer program, and the computer program is used to execute the image data processing method described in the first aspect above.
  • an electronic device includes: a processor, and a memory for storing instructions executable by the processor;
  • the processor is configured to read the executable instructions from the memory, and execute the instructions to implement the image data processing method described in the first aspect above.
  • according to the image data processing method and device provided by the above embodiments of the present disclosure, the first image and the second image, which are captured by the camera and have road surface elements in a common area, are processed by using the first neural network to obtain a homography matrix; the first image feature is then mapped through the homography matrix to obtain a mapped image feature, and the mapped image feature is fused with the second image feature to obtain a fused image feature; the second neural network is used to process the fused image feature to determine the first pixel aspect ratio.
  • the first pixel aspect ratio is the ratio between the height of the pixel of the target object relative to the road surface and the pixel depth in the second image, and this ratio can be used for 3D scene modeling.
  • the image data processing method of the embodiment of the present disclosure can obtain dense and accurate pixel aspect ratio based on the image data, and then can assist in 3D scene modeling.
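  • as a worked example with assumed values: a point on a target object that is 1.5 m above the road surface and at a depth of 30 m from the camera has a pixel aspect ratio of 1.5/30 = 0.05; such dense per-pixel ratios can then assist 3D scene modeling.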
  • FIG. 1 is a schematic flow diagram of a method for processing image data in an embodiment of the present disclosure
  • FIG. 2 is a schematic flow diagram of step S1 in an embodiment of the present disclosure
  • Fig. 3 is a working principle diagram of the first neural network in an example of the present disclosure
  • FIG. 4 is a schematic flowchart of a method for processing image data in another embodiment of the present disclosure.
  • Fig. 5 is a schematic flow chart after step S4 in another embodiment of the present disclosure.
  • Fig. 6 is a schematic flow chart after step S4 in another embodiment of the present disclosure.
  • Fig. 7 is a schematic flow diagram of step S5″ in an embodiment of the present disclosure.
  • Fig. 8 is a schematic flow chart of step S5″-6 in one embodiment of the present disclosure.
  • Fig. 9 is a structural block diagram of an image data processing device in an embodiment of the present disclosure.
  • FIG. 10 is a structural block diagram of a homography matrix determination module 100 in an embodiment of the present disclosure.
  • Fig. 11 is a structural block diagram of an image data processing device in another embodiment of the present disclosure.
  • Fig. 12 is a structural block diagram of an image data processing device in another embodiment of the present disclosure.
  • Fig. 13 is a structural block diagram of an image data processing device in another embodiment of the present disclosure.
  • Fig. 14 is a structural block diagram of an overall loss value determination module 1000 in an embodiment of the present disclosure.
  • Fig. 15 is a structural diagram of an electronic device provided by an embodiment of the present disclosure.
  • plural may refer to two or more than two, and “at least one” may refer to one, two or more than two.
  • the term "and/or" in the present disclosure is only an association relationship describing associated objects, indicating that there may be three relationships, for example, A and/or B may indicate: A exists alone, and A and B exist at the same time , there are three cases of B alone.
  • the character "/" in the present disclosure generally indicates that the contextual objects are an "or" relationship.
  • Embodiments of the present disclosure may be applied to electronic devices such as terminal devices, computer systems, servers, etc., which may operate with numerous other general purpose or special purpose computing system environments or configurations.
  • Examples of well-known terminal devices, computing systems, environments and/or configurations suitable for use with electronic devices such as terminal devices, computer systems and servers include, but are not limited to: personal computer systems, server computer systems, thin clients, thick clients, handheld or laptop devices, microprocessor-based systems, set-top boxes, programmable consumer electronics, networked personal computers, minicomputer systems, mainframe computer systems, and distributed cloud computing technology environments including any of the foregoing, and the like.
  • Electronic devices such as terminal devices, computer systems, servers, etc. may be described in the general context of computer system-executable instructions, such as program modules, being executed by the computer system.
  • program modules may include routines, programs, objects, components, logic, data structures, etc., that perform particular tasks or implement particular abstract data types.
  • the computer system/server can be practiced in distributed cloud computing environments where tasks are performed by remote processing devices that are linked through a communications network.
  • program modules may be located in both local and remote computing system storage media including storage devices.
  • Fig. 1 is a schematic flowchart of a method for processing image data in an embodiment of the present disclosure. This embodiment can be applied on the server, as shown in Figure 1, including the following steps:
  • a camera is set on the vehicle, and camera internal parameters and camera external parameters are preset. During the driving of the vehicle, images are captured by the camera.
  • a first image taken by the same camera at a first moment and a second image taken by the same camera at a second moment are acquired.
  • the image may be captured in a manner of capturing a video, or multiple frames of images may be captured in a manner of capturing an image at intervals.
  • the first moment and the second moment may be separated by M frames, where M is an integer greater than 0.
  • this embodiment uses the road surfaces in the first image and the second image as reference planes required by the plane parallax method.
  • the feature extraction network is used to extract the features of the first image to obtain the features of the first image
  • the feature extraction network is used to extract the features of the second image to obtain the features of the second image.
  • the feature extraction network may belong to the first neural network, or may be a network independent of the first neural network.
  • in this embodiment, feature extraction is performed with the same down-sampling scheme for both images. For example, the first image, originally of dimension 3*h*w (h and w represent the image width and length, respectively), is down-sampled to obtain a first feature map of dimension n*h'*w' as the first image feature, and the second image, originally of dimension 3*h*w, is down-sampled to obtain a second feature map of dimension n*h'*w' as the second image feature.
  • n is the number of channels
  • h' can be 1/32, 1/64, etc. of h.
  • w' is 1/32, 1/64, etc. of w.
  • the values of h' and w' can be the same or different.
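  • as an illustration of the shared down-sampling feature extraction described above, the following sketch uses a small convolutional extractor; the layer sizes, the channel count n=64 and the 1/32 down-sampling factor are assumptions, since the embodiments do not fix a particular backbone:

```python
import torch
import torch.nn as nn

class FeatureExtractor(nn.Module):
    """Shared extractor: a 3*h*w image -> an n*h'*w' feature map (here h' = h/32, w' = w/32)."""
    def __init__(self, n=64):
        super().__init__()
        layers, c_in = [], 3
        for c_out in (16, 32, 64, 64, n):               # five stride-2 convolutions = 1/32 down-sampling
            layers += [nn.Conv2d(c_in, c_out, 3, stride=2, padding=1), nn.ReLU(inplace=True)]
            c_in = c_out
        self.net = nn.Sequential(*layers)

    def forward(self, image):                           # image: (B, 3, h, w)
        return self.net(image)                          # features: (B, n, h/32, w/32)

extractor = FeatureExtractor(n=64)
first_feat = extractor(torch.randn(1, 3, 384, 1280))    # first image feature: (1, 64, 12, 40)
second_feat = extractor(torch.randn(1, 3, 384, 1280))   # second image feature: (1, 64, 12, 40)
```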
  • the first image feature and the second image feature are processed by using the first neural network to obtain a homography matrix for aligning the road surfaces in the first image and the second image.
  • an exemplary way to calculate the homography matrix is the standard plane-induced form consistent with the quantities defined below (the exact expression in the original filing is given as a formula image): H = K (R − t·Nᵀ / d) K⁻¹,
  • where H represents the homography matrix, for example a 3×3 matrix; K represents the camera intrinsic matrix and K⁻¹ the inverse of K; d represents the height of the camera relative to the road surface, which can be obtained through calibration; R and t denote the relative camera rotation matrix (e.g. 3×3) and relative translation matrix (e.g. 1×3) between the first image and the second image, respectively; and N denotes the road surface normal.
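  • the following minimal sketch assembles the plane-induced homography from the quantities listed above; the intrinsics, relative pose, normal and camera height values are made-up assumptions, and the sign convention of the t·Nᵀ/d term may differ from the exact expression in the original filing:

```python
import numpy as np

def plane_homography(K, R, t, N, d):
    """H = K (R - t N^T / d) K^-1, the standard plane-induced homography for a
    reference plane with unit normal N at distance d from the camera."""
    t = t.reshape(3, 1)
    N = N.reshape(1, 3)
    H = K @ (R - (t @ N) / d) @ np.linalg.inv(K)
    return H / H[2, 2]                         # normalize so the bottom-right element is 1

K = np.array([[1000.0, 0.0, 640.0],
              [0.0, 1000.0, 360.0],
              [0.0, 0.0, 1.0]])                # assumed camera intrinsics
R = np.eye(3)                                  # relative rotation between the two moments
t = np.array([0.0, 0.0, 1.2])                  # relative translation in metres (assumed)
N = np.array([0.0, -1.0, 0.0])                 # road surface normal (assumed camera convention)
d = 1.5                                        # calibrated camera height above the road, metres

H = plane_homography(K, R, t, N, d)            # 3x3 homography aligning the road surface
```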
  • the dimension of the mapped image feature is the same as the dimensions of the first image feature and the second image feature; following the example in step S1, the dimension of the mapped image feature is n*h'*w'.
  • the mapped image feature and the second image feature are superimposed according to the channel dimension to obtain the fused image feature.
  • the dimension of the fused image feature is 2n*h'*w'.
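  • a rough sketch of the mapping and fusion steps is given below: the first image feature is inverse-warped with the homography (expressed at feature resolution) and concatenated with the second image feature along the channel dimension; the feature sizes and the identity homography are placeholder assumptions:

```python
import torch
import torch.nn.functional as F

def warp_by_homography(feat, H):
    """Inverse-warp a (B, C, Hf, Wf) feature map with a 3x3 homography H that maps
    target-view coordinates to source-view coordinates at feature resolution."""
    B, C, Hf, Wf = feat.shape
    ys, xs = torch.meshgrid(torch.arange(Hf), torch.arange(Wf), indexing="ij")
    pts = torch.stack([xs, ys, torch.ones_like(xs)], dim=-1).reshape(-1, 3).float()
    src = (H.float() @ pts.T).T
    src = src[:, :2] / src[:, 2:3].clamp(min=1e-6)          # simple guard against division by zero
    src[:, 0] = 2 * src[:, 0] / (Wf - 1) - 1                # normalize x to [-1, 1]
    src[:, 1] = 2 * src[:, 1] / (Hf - 1) - 1                # normalize y to [-1, 1]
    grid = src.reshape(1, Hf, Wf, 2).expand(B, -1, -1, -1)
    return F.grid_sample(feat, grid, align_corners=True)

first_feat = torch.randn(1, 64, 12, 40)     # first image feature, n*h'*w'
second_feat = torch.randn(1, 64, 12, 40)    # second image feature, n*h'*w'

# a homography estimated in full-image pixel coordinates would first have to be rescaled
# to feature resolution (S @ H @ inv(S), with S the down-sampling scale); identity used here
H_feat = torch.eye(3)

mapped_feat = warp_by_homography(first_feat, H_feat)        # mapped image feature, n*h'*w'
fused_feat = torch.cat([mapped_feat, second_feat], dim=1)   # fused image feature, 2n*h'*w'
```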
  • the second neural network is a pre-trained model, such as a deep learning model.
  • the second neural network can predict the pixel aspect ratio based on the fused image features.
  • the pixel aspect ratio predicted by the second neural network for the fused image feature is used as the first pixel aspect ratio.
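  • purely as an illustration of the role of the second neural network, the sketch below maps the 2n-channel fused feature to one pixel aspect ratio value per feature-map location; the layer choices and the sigmoid output range are assumptions, not the architecture of the original filing:

```python
import torch
import torch.nn as nn

# stand-in for the "second neural network": fused 2n-channel features -> dense ratio map
head = nn.Sequential(
    nn.Conv2d(128, 64, 3, padding=1), nn.ReLU(inplace=True),
    nn.Conv2d(64, 1, 1), nn.Sigmoid())        # one value per location, assumed (0, 1) range

fused_feat = torch.randn(1, 128, 12, 40)      # 2n*h'*w' fused image feature
gamma_pred = head(fused_feat)                 # (1, 1, 12, 40): predicted first pixel aspect ratio
```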
  • the first image and the second image, captured by the camera and having road surface elements in a common area, are processed by using the first neural network to obtain a homography matrix; the first image feature is then mapped through the homography matrix to obtain the mapped image feature, and the mapped image feature is fused with the second image feature to obtain the fused image feature; the fused image feature is processed by the second neural network to determine the first pixel aspect ratio.
  • the first pixel aspect ratio is the ratio between the height of the pixel of the target object relative to the road surface and the pixel depth in the second image, and this ratio can be used for 3D scene modeling.
  • the image data processing method of the embodiments of the present disclosure can obtain dense and accurate pixel aspect ratios based on the image data, thereby assisting 3D scene modeling.
  • step S1 includes:
  • S1-1 Fuse the first image feature and the second image feature to obtain a third image feature.
  • Fig. 3 is a working principle diagram of the first neural network in an example of the present disclosure.
  • the feature extraction network belongs to the first neural network.
  • the feature extraction network extracts the first image feature and the second image feature
  • the first image feature and the second image feature are input to the feature fusion module for fusion.
  • the feature fusion module can superimpose the first image feature and the second image feature according to the channel dimension to obtain the fused image feature. For example, when the dimension of the first image feature is n*h'*w', and the dimension of the second image feature is n*h'*w', then the dimension of the third image feature is 2n*h'*w'.
  • the feature fusion module inputs the third image feature to the road surface sub-network, and the road surface sub-network predicts according to the third image feature, and outputs road surface normal information.
  • the road surface sub-network is a network model that predicts according to the input image features with road surface characteristics, and outputs road surface normal information.
  • the road surface normal information can be expressed by the road surface equation N = [n_X, n_Y, n_Z], where n_X, n_Y and n_Z are the three-dimensional coordinates in the road surface coordinate system.
  • S1-3 Utilize the posture sub-network in the first neural network to process the features of the third image to determine the relative posture of the camera between the first image and the second image.
  • the feature fusion module inputs the third image feature into the attitude sub-network, and the attitude sub-network performs prediction based on the third image feature, and outputs the relative pose of the camera.
  • the attitude sub-network is a network model that predicts according to the input image features and outputs the relative attitude of the camera.
  • the camera relative pose includes a camera relative rotation matrix and a relative translation matrix.
  • S1-4 Determine the homography matrix based on the normal information of the road surface, the relative pose of the camera and the pre-stored height of the camera relative to the road surface.
  • the first neural network can determine the homography matrix by using the homography matrix calculation method above through the height of the camera relative to the road surface, road surface normal information, and the relative pose of the camera. It should be noted that the first neural network may also output the height of the camera relative to the road surface, road surface normal information, and the relative pose of the camera, and then other modules other than the first neural network may determine the homography matrix.
  • the road surface sub-network and the pose sub-network in the first neural network are used to separately process the fused third image feature; for example, the road surface sub-network and the pose sub-network can process the third image feature obtained by superimposing the first image feature and the second image feature in the channel dimension, so as to obtain the road surface normal information and the relative camera pose. Based on the road surface normal information, the relative camera pose and the pre-stored height of the camera relative to the road surface, the homography matrix can be accurately determined.
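  • the following sketch illustrates one possible layout of the road surface sub-network and the pose sub-network acting on the fused third image feature; the shared backbone, the axis-angle pose parameterization (which would still need to be converted to the 3×3 rotation matrix and 1×3 translation used above) and all layer sizes are assumptions:

```python
import torch
import torch.nn as nn

class RoadAndPoseHeads(nn.Module):
    """Sketch of a road surface sub-network and a pose sub-network sharing a small
    backbone over the fused third image feature (2n channels)."""
    def __init__(self, c=128):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(c, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.normal_head = nn.Linear(64, 3)    # road surface normal N = [nX, nY, nZ]
        self.pose_head = nn.Linear(64, 6)      # axis-angle rotation (3) + translation (3)

    def forward(self, third_feat):
        z = self.backbone(third_feat)
        normal = torch.nn.functional.normalize(self.normal_head(z), dim=-1)  # unit normal
        pose = self.pose_head(z)               # relative camera pose (to be converted to R, t)
        return normal, pose

heads = RoadAndPoseHeads()
road_normal, relative_pose = heads(torch.randn(1, 128, 12, 40))  # third image feature, 2n*h'*w'
```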
  • Fig. 4 is a schematic flowchart of a method for processing image data in another embodiment of the present disclosure. As shown in Figure 4, in this embodiment, after step S4, it also includes:
  • S5 Determine a second pixel aspect ratio of the target object based on radar scanning data corresponding to the target object in the second image within the acquisition time of the second image.
  • the vehicle is provided with an on-board radar.
  • the acquisition time of the second image is t2, and the radar scan data near the vehicle at time t2 is acquired through the vehicle-mounted radar.
  • by analyzing the second image and the radar scan data near the vehicle, the radar scan data corresponding to the target object can be extracted from the radar scan data near the vehicle according to the analysis result.
  • based on the extracted radar scan data, the position of the target object relative to the vehicle and the volume of the target object can be accurately obtained, and the true value of the pixel aspect ratio at time t2 can then be generated, which is recorded as the second pixel aspect ratio.
  • S6 Adjust the parameters of the second neural network based on the difference between the first pixel aspect ratio and the second pixel aspect ratio.
  • by using the true value of the pixel aspect ratio determined from the radar data at the same moment as supervision information for the second neural network, and by using the difference between the true value and the predicted value of the pixel aspect ratio, the parameters of the second neural network can be reasonably adjusted, which improves the prediction accuracy of the second neural network.
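  • as a small numeric illustration of this supervision (all values assumed): a radar-scanned point on the target object gives a height above the road and a depth, their ratio is the second pixel aspect ratio, and its difference from the predicted first pixel aspect ratio drives the parameter adjustment:

```python
# assumed radar measurements for one point on the target object
point_height_above_road = 1.4              # metres above the road surface
point_depth = 28.0                         # metres along the camera axis
gamma_gt = point_height_above_road / point_depth     # second pixel aspect ratio = 0.05

gamma_pred = 0.047                         # first pixel aspect ratio from the second neural network
difference = abs(gamma_pred - gamma_gt)    # used (e.g. as an L1 term) to adjust the network parameters
```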
  • Fig. 5 is a schematic flow chart after step S4 in another embodiment of the present disclosure. As shown in Figure 5, in this embodiment, after step S4, it also includes:
  • S5′ Perform image reconstruction on the first image using the homography matrix to obtain a first reconstructed image.
  • the homography matrix is used to perform image reconstruction on the first image in a manner of reverse mapping to obtain the first reconstructed image.
  • if the matrix parameters of the homography matrix are optimal, the first reconstructed image and the second image will be aligned in the road surface part; if the matrix parameters of the homography matrix are not optimal, there will be pixel displacement between the first reconstructed image and the second image in the road surface part.
  • based on the pixel displacement between the first reconstructed image and the second image on the road surface elements in the same area, the matrix parameters of the homography matrix can be adjusted reasonably.
  • since the homography matrix is determined based on the road surface normal information predicted by the road surface sub-network, the relative camera pose predicted by the pose sub-network, and the pre-stored height of the camera relative to the road surface, the parameters of the road surface sub-network and the pose sub-network can be reasonably adjusted through back propagation by means of the homography matrix with the adjusted matrix parameters.
  • in other words, the pixel displacement between the first reconstructed image and the second image on the road surface elements in the same area makes it possible to reasonably adjust the matrix parameters of the homography matrix, and using the homography matrix with the adjusted matrix parameters as supervision information to reasonably adjust the parameters of the road surface sub-network and the pose sub-network can improve the prediction accuracy of the road surface sub-network and the pose sub-network.
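  • a minimal sketch of the reconstruction-and-supervision idea, using OpenCV's warpPerspective for the reverse mapping; the synthetic frames, the rectangular road mask and the use of a mean absolute difference as a stand-in for the pixel-displacement measure are all assumptions:

```python
import cv2
import numpy as np

H = np.eye(3)                                   # homography predicted by the first neural network
first_image = np.random.randint(0, 255, (384, 1280, 3), dtype=np.uint8)    # stand-in for the first image
second_image = np.random.randint(0, 255, (384, 1280, 3), dtype=np.uint8)   # stand-in for the second image
road_mask = np.zeros((384, 1280), dtype=bool)
road_mask[200:, :] = True                       # assumed road surface region (lower part of the image)

h, w = second_image.shape[:2]
first_reconstructed = cv2.warpPerspective(first_image, H, (w, h))   # reverse-mapping reconstruction

# residual difference on the shared road surface region; with optimal matrix parameters
# this residual (and the corresponding pixel displacement) would be close to zero
road_residual = np.abs(first_reconstructed[road_mask].astype(np.float32)
                       - second_image[road_mask].astype(np.float32)).mean()
```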
  • Fig. 6 is a schematic flow chart after step S4 in another embodiment of the present disclosure. As shown in Figure 6, in this embodiment, after step S4, it also includes:
  • this embodiment takes the first neural network and the second neural network as a whole.
  • based on the first pixel aspect ratio and the road surface mask of the second image, the overall photometric loss value can be calculated through the photometric loss function.
  • based on the radar scan data corresponding to the target object in the second image within the acquisition time of the second image, the position of the target object relative to the vehicle and the volume of the target object can be obtained, which in turn can assist in determining the overall supervision loss value.
  • based on the overall photometric loss value and the supervision loss value, an overall loss value can be determined.
  • the first neural network and the second neural network are considered as a whole, and by calculating the overall loss value, the parameters of the first neural network and the second neural network can be reasonably adjusted to improve the first neural network. and the prediction accuracy of the second neural network.
  • Fig. 7 is a schematic flow chart of step S5″ in an embodiment of the present disclosure. As shown in Fig. 7, in this embodiment, step S5″ includes:
  • S5′′-1 Based on the radar scan data corresponding to the target object in the second image, determine a second pixel aspect ratio of the target object.
  • the vehicle is provided with an on-board radar.
  • the acquisition time of the second image is t2, and the radar scan data near the vehicle at time t2 is acquired through the vehicle-mounted radar.
  • by analyzing the second image and the radar scan data near the vehicle, the radar scan data corresponding to the target object can be extracted from the radar scan data near the vehicle according to the analysis result.
  • based on the extracted radar scan data, the position of the target object relative to the vehicle and the volume of the target object can be accurately obtained, and the true value of the pixel aspect ratio at time t2 can then be generated, which is recorded as the second pixel aspect ratio.
  • S5′′-2 Determine the first loss value based on the first pixel aspect ratio and the second pixel aspect ratio.
  • the first loss value can be obtained by subtracting the first pixel aspect ratio from the second pixel aspect ratio.
  • S5′′-3 Perform image reconstruction on the first image using the homography matrix to obtain a first reconstructed image.
  • the homography matrix is used to perform image reconstruction on the first image in a manner of reverse mapping to obtain the first reconstructed image.
  • S5′′-4 Based on the first pixel aspect ratio, determine the pixel displacement between the first image area and the second image area.
  • the first image area is the remaining image area in the first reconstructed image except the road surface image area
  • the second image area is the remaining image area in the second image except the road surface image area.
  • the homography matrix achieves basic alignment of the first reconstructed image and the second image in the road surface image area (if the matrix parameters of the homography matrix are not optimal, some pixels in the road surface part of the first reconstructed image and the second image are still displaced), but the remaining image areas of the first reconstructed image and the second image other than the road surface image area are not aligned.
  • the remaining image areas in the first reconstructed image and the second image except the road surface image area are compared pixel by pixel to obtain the pixel displacement between the first image area and the second image area.
  • S5′′-5 Based on the pixel displacement between the first image area and the second image area, adjust the pixel position of the first reconstructed image to obtain the second reconstructed image.
  • based on the pixel displacement between the first image area and the second image area, the pixel positions of the first reconstructed image are adjusted, so that the second reconstructed image and the second image can be pixel-aligned in the road surface image area.
  • since the first reconstructed image has already been basically aligned with the second image in the road surface image area, the second reconstructed image is basically aligned with the second image over the whole image.
  • S5′′-6 Determine a second loss value based on the second reconstructed image, the second image, and the road surface mask of the second image.
  • the luminosity loss between the second reconstructed image and the second image may be calculated based on the second reconstructed image, the second image, and the road surface mask of the second image as the second loss value.
  • S5′′-7 Determine the overall loss value based on the first loss value and the second loss value.
  • the overall loss value can be obtained by adding the first loss value and the second loss value.
  • within the acquisition time of the second image, the overall loss value can be reasonably determined based on the radar scan data corresponding to the target object in the second image, the first pixel aspect ratio, and the road surface mask of the second image, so that the parameters of the first neural network and the second neural network can be reasonably adjusted based on the overall loss value, thereby improving the prediction accuracy of the first neural network and the second neural network.
  • Fig. 8 is a schematic flow chart of step S5″-6 in an embodiment of the present disclosure. As shown in Fig. 8, in this embodiment, step S5″-6 includes:
  • S5′′-6-1 Determine the global photometric error between the second reconstructed image and the second image.
  • in an optional manner, the full-image photometric error is determined based on a photometric error function; a common form consistent with the quantities defined below (the exact expression in the original filing is given as a formula image) is L_p(It, Isw) = α·(1 − SSIM(It, Isw))/2 + (1 − α)·|It − Isw|, and L_photo1 = L_p(It, Isw),
  • where L_p represents the photometric loss, α represents the weight and is a constant, It represents the second image, Isw represents the second reconstructed image, SSIM(It, Isw) represents the structural similarity parameter between the second image and the second reconstructed image, and L_photo1 represents the full-image photometric error.
  • S5′′-6-2 Based on the photometric error of the whole image and the road surface mask of the second image, determine the photometric error between the second reconstructed image and the second image on the road surface image area.
  • in an optional manner, the photometric error between the second reconstructed image and the second image in the road surface image area is determined by the following formula: L_photo2 = mask_ground * L_photo1,
  • where L_photo2 represents the photometric error between the second reconstructed image and the second image in the road surface image area, and mask_ground represents the road surface mask of the second image.
  • S5′′-6-3 Determine the second loss value based on the photometric error of the whole image and the photometric error between the second reconstructed image and the second image on the road surface image area.
  • specifically, the second loss value is determined by the following formula: L_photoT = L_photo1 + L_photo2.
  • in this embodiment, based on the second reconstructed image, the second image and the road surface mask of the second image, the second loss value between the second reconstructed image and the second image can be reasonably determined, so that the parameters of the first neural network and the second neural network can be reasonably adjusted based on the second loss value, thereby improving the prediction accuracy of the first neural network and the second neural network.
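  • the sketch below ties the loss terms together: a simplified SSIM-plus-absolute-difference photometric error (L_photo1), its road-masked version (L_photo2), their sum (L_photoT) as the second loss value, and the radar-supervised first loss value added on top to form the overall loss; the SSIM window, α = 0.85 and all tensor values are assumptions:

```python
import torch
import torch.nn.functional as F

def ssim(x, y, c1=0.01 ** 2, c2=0.03 ** 2):
    """Simplified per-pixel SSIM between two image batches of shape (B, C, H, W)."""
    mu_x, mu_y = F.avg_pool2d(x, 3, 1, 1), F.avg_pool2d(y, 3, 1, 1)
    sigma_x = F.avg_pool2d(x * x, 3, 1, 1) - mu_x ** 2
    sigma_y = F.avg_pool2d(y * y, 3, 1, 1) - mu_y ** 2
    sigma_xy = F.avg_pool2d(x * y, 3, 1, 1) - mu_x * mu_y
    num = (2 * mu_x * mu_y + c1) * (2 * sigma_xy + c2)
    den = (mu_x ** 2 + mu_y ** 2 + c1) * (sigma_x + sigma_y + c2)
    return num / den

def photometric_error(i_t, i_sw, alpha=0.85):
    # L_p(It, Isw): weighted SSIM term plus absolute-difference term (assumed form)
    return alpha * (1 - ssim(i_t, i_sw)).clamp(0, 2) / 2 + (1 - alpha) * (i_t - i_sw).abs()

i_t = torch.rand(1, 3, 384, 1280)                            # second image It
i_sw = torch.rand(1, 3, 384, 1280)                           # second reconstructed image Isw
mask_ground = (torch.rand(1, 1, 384, 1280) > 0.5).float()    # road surface mask of the second image

l_photo1 = photometric_error(i_t, i_sw).mean()                   # full-image photometric error
l_photo2 = (photometric_error(i_t, i_sw) * mask_ground).mean()   # road-area photometric error
l_photo_total = l_photo1 + l_photo2                              # L_photoT: the second loss value

first_loss = torch.tensor(0.003)            # |gamma_pred - gamma_gt| from the radar supervision (assumed)
overall_loss = first_loss + l_photo_total   # overall loss used to adjust both neural networks
```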
  • Any image data processing method provided in the embodiments of the present disclosure may be executed by any appropriate device capable of data processing, including but not limited to: terminal devices, servers, and the like.
  • any image data processing method provided in the embodiments of the present disclosure may be executed by a processor; for example, the processor executes any image data processing method mentioned in the embodiments of the present disclosure by calling corresponding instructions stored in a memory. Details are not repeated below.
  • Fig. 9 is a structural block diagram of an image data processing device in an embodiment of the present disclosure.
  • the image data processing device includes: a homography matrix determination module 100 , a mapped image feature determination module 200 , a fusion module 300 and a first pixel aspect ratio determination module 400 .
  • the homography matrix determination module 100 is used to process the first image and the second image by using the first neural network to obtain the homography matrix, wherein the first image is taken at the first moment, the second image is taken at the second moment, and the first image and the second image have road surface elements in the same area;
  • the mapped image feature determination module 200 is used to determine the mapped image feature of the first image feature according to the homography matrix, wherein the first image feature is a feature extracted based on the first image;
  • the fusion module 300 is used to fuse the mapped image feature and the second image feature to obtain a fused image feature, wherein the second image feature is a feature extracted based on the second image;
  • the first pixel aspect ratio determination module 400 is used to process the fused image feature by using a second neural network to obtain the first pixel aspect ratio of the second image.
  • Fig. 10 is a structural block diagram of a homography matrix determination module 100 in an embodiment of the present disclosure. As shown in Figure 10, in this embodiment, the homography matrix determination module 100 includes:
  • a fusion unit 101 configured to fuse the first image feature and the second image feature to obtain a third image feature
  • a road surface normal information determining unit 102 configured to use the road surface sub-network in the first neural network to process the third image feature to determine road surface normal information
  • a camera relative pose determining unit 103 configured to use the pose sub-network in the first neural network to process the third image features, and determine the relative camera pose between the first image and the second image;
  • the homography matrix determining unit 104 is configured to determine the homography matrix based on the road surface normal information, the camera relative pose and the pre-stored height of the camera relative to the road surface.
  • Fig. 11 is a structural block diagram of an image data processing device in another embodiment of the present disclosure. As shown in Figure 11, in this embodiment, the image data processing device also includes:
  • the second pixel aspect ratio determination module 500 is configured to determine the second pixel aspect ratio of the target object based on the radar scanning data corresponding to the target object in the second image within the acquisition time of the second image ;
  • the first parameter adjustment module 600 is configured to adjust the parameters of the second neural network based on the difference between the first pixel aspect ratio and the second pixel aspect ratio.
  • Fig. 12 is a structural block diagram of an image data processing device in another embodiment of the present disclosure. As shown in Figure 12, in this embodiment, the image data processing device also includes:
  • An image reconstruction module 700 configured to use the homography matrix to perform image reconstruction on the first image to obtain a first reconstructed image
  • a homography matrix parameter adjustment module 800 configured to adjust the matrix parameters of the homography matrix based on the pixel displacement between the first reconstructed image and the second image on the road surface elements in the same area;
  • the first network parameter adjustment module 900 is configured to adjust the parameters of the road surface sub-network and the attitude sub-network based on the homography matrix after the matrix parameters are adjusted.
  • Fig. 13 is a structural block diagram of an image data processing device in another embodiment of the present disclosure. As shown in Figure 13, in this embodiment, the image data processing device also includes:
  • the overall loss value determination module 1000 is configured to, within the acquisition time of the second image, based on the first pixel aspect ratio, the road surface mask of the second image, and the corresponding target object in the second image Radar scan data to determine the overall loss value;
  • the second network parameter adjustment module 1100 is configured to adjust the parameters of the first neural network and the second neural network based on the overall loss value.
  • Fig. 14 is a structural block diagram of an overall loss value determination module 1000 in an embodiment of the present disclosure. As shown in Figure 14, in this embodiment, the overall loss value determination module 1000 includes:
  • a second pixel aspect ratio determining unit 1001 configured to determine a second pixel aspect ratio of the target object based on the radar scan data
  • a first loss value determining unit 1002 configured to determine a first loss value based on the first pixel aspect ratio and the second pixel aspect ratio;
  • the first reconstructed image unit 1003 is configured to use the homography matrix to perform image reconstruction on the first image to obtain a first reconstructed image;
  • a pixel displacement determining unit 1004 configured to determine a pixel displacement between a first image area and a second image area based on the first pixel aspect ratio, wherein the first image area is the pixel displacement in the first reconstructed image except The remaining image area other than the road surface image area, the second image area is the remaining image area in the second image except the road surface image area;
  • the second reconstructed image unit 1005 is configured to adjust the pixel position of the first reconstructed image based on the pixel displacement to obtain a second reconstructed image
  • a second loss value determining unit 1006 configured to determine a second loss value based on the second reconstructed image, the second image, and the road surface mask of the second image;
  • An overall loss value determining unit 1007 configured to determine the overall loss value based on the first loss value and the second loss value.
  • the second loss value determining unit 1006 is specifically configured to determine the full-image photometric error between the second reconstructed image and the second image; the second loss value determining unit 1006 is also configured to Based on the photometric error of the full image and the road surface mask of the second image, determine the photometric error between the second reconstructed image and the second image on the road surface image area; the second loss value determination unit 1006 is also used to The second loss value is determined based on the photometric error of the full image and the photometric error between the second reconstructed image and the second image on the road surface image area.
  • the electronic device includes one or more processors 10 and memory 20 .
  • Processor 10 may be a central processing unit (CPU) or other form of processing unit having data processing capabilities and/or instruction execution capabilities, and may control other components in the electronic device to perform desired functions.
  • Memory 20 may include one or more computer program products, which may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory.
  • the volatile memory may include, for example, random access memory (RAM) and/or cache memory (cache).
  • the non-volatile memory may include, for example, a read-only memory (ROM), a hard disk, a flash memory, and the like.
  • One or more computer program instructions may be stored on the computer-readable storage medium, and the processor 10 may run the program instructions to implement the image data processing methods of the various embodiments of the present disclosure described above and/or other desired functionality.
  • Various contents such as input signals, signal components, noise components, etc. may also be stored in the computer-readable storage medium.
  • the electronic device may further include: an input device 30 and an output device 40, and these components are interconnected through a bus system and/or other forms of connection mechanisms (not shown).
  • the input device 30 can be, for example, a keyboard, a mouse, and the like.
  • the output device 40 may include, for example, a display, a speaker, a printer, and a communication network and remote output devices connected thereto.
  • the electronic device may also include any other suitable components according to specific applications.
  • the computer readable storage medium may utilize any combination of one or more readable media.
  • the readable medium may be a readable signal medium or a readable storage medium.
  • a readable storage medium may include, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, device, or device, or any combination thereof, for example. More specific examples (non-exhaustive list) of readable storage media include: electrical connection with one or more conductors, portable disk, hard disk, random access memory (RAM), read only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), optical storage devices, magnetic storage devices, or any suitable combination of the foregoing.
  • the methods and apparatus of the present disclosure may be implemented in many ways.
  • the methods and apparatuses of the present disclosure may be implemented by software, hardware, firmware or any combination of software, hardware, and firmware.
  • the above sequence of steps for the method is for illustration only, and the steps of the method of the present disclosure are not limited to the sequence specifically described above unless specifically stated otherwise.
  • the present disclosure can also be implemented as programs recorded in recording media, the programs including machine-readable instructions for realizing the method according to the present disclosure.
  • the present disclosure also covers a recording medium storing a program for executing the method according to the present disclosure.
  • each component or each step can be decomposed and/or reassembled. These decompositions and/or recombinations should be considered equivalents of the present disclosure.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Geometry (AREA)
  • Evolutionary Biology (AREA)
  • Computer Graphics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

Embodiments of the present disclosure disclose an image data processing method and apparatus, wherein the processing method includes: processing a first image and a second image by using a first neural network to obtain a homography matrix; determining a mapped image feature of a first image feature according to the homography matrix; fusing the mapped image feature and a second image feature to obtain a fused image feature; and processing the fused image feature by using a second neural network to obtain a first pixel aspect ratio (height-to-depth ratio) of the second image. Embodiments of the present disclosure can obtain a dense and accurate pixel aspect ratio, which in turn can assist 3D scene modeling.

Description

图像数据的处理方法和装置
本申请要求在2021年11月10日提交的、申请号为202111329386.7、发明名称为“图像数据的处理方法和装置”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。
技术领域
本公开涉及图像处理技术领域,尤其是一种图像数据的处理方法和装置。
背景技术
平面视差方法基于两个视角观测同一目标或场景的差异来建模3D场景,该方法依赖于某个特定平面,可以恢复场景中一个像素点到平面的高度和到观测点的距离,即该像素点的像素高深比。
目前的平面视差方法依赖于光流估计得到两个视角下对应点的匹配结果。光流方法不能得到稠密的估计结果并且受噪声影响大。如何基于图像数据得到稠密且准确的像素高深比,是一个亟待解决的问题。
发明内容
为了解决上述技术问题,提出了本公开。
根据本公开实施例的第一方面,提供了一种图像数据的处理方法,包括:
利用第一神经网络对第一图像和第二图像进行处理,得到单应性矩阵,其中,所述第一图像为第一时刻拍摄,所述第二图像为第二时刻拍摄,且所述第一图像和所述第二图像具有相同区域的路面元素;
根据所述单应性矩阵,确定第一图像特征的映射图像特征,其中,所述第一图像特征为基于所述第一图像提取的特征;
对所述映射图像特征和第二图像特征进行融合,得到融合图像特征,其中,所述第二图像特征为基于所述第二图像提取的特征;
利用第二神经网络对所述融合图像特征进行处理,得到所述第二图像的第一像素高深比。
根据本公开实施例的第二方面,提供了一种图像数据的处理装置,包括:
单应性矩阵确定模块,用于利用第一神经网络对第一图像和第二图像进行处理,得到单应性矩阵,其中,所述第一图像为第一时刻拍摄,所述第二图像为第二时刻拍摄,且所述第一图像和所述第二图像具有相同区域的路面元素;
映射图像特征确定模块,用于根据所述单应性矩阵,确定第一图像特征的映射图像特征,其中,所述第一图像特征为基于所述第一图像提取的特征;
融合模块,用于对所述映射图像特征和第二图像特征进行融合,得到融合图像特征,其中,所述第二图像特征为基于所述第二图像提取的特征;
第一像素高深比确定模块,用于利用第二神经网络对所述融合图像特征进行处理,得到所述第二图像的第一像素高深比。
根据本公开实施例的第三方面,提供了一种计算机可读存储介质,所述计算机可读存储介质存储有计算机程序,所述计算机程序用于执行上述第一方面所述的图像数据的处理方法。
根据本公开实施例的第四方面,提供了一种电子设备,所述电子设备包括:
处理器;
用于存储所述处理器可执行指令的存储器;
所述处理器,用于从所述存储器中读取所述可执行指令,并执行所述指令以实现上述第一方面所述的图像数据的处理方法。
基于本公开上述实施例提供的图像数据的处理方法和装置,利用第一神经网络对相机拍摄的、具有共同区域路面元素的第一图像和第二图像进行处理,得到单应性矩阵;接着通过单应性矩阵对第一图像特征进行映射得到映射图像特征,并将映射图像特征与第二图像特征进行融合,得到融合图像特征;利用第二神经网络对融合图像特征进行处理,确定第一像素高深比。其中,第一像素高深比为第二图像中目标物的像素相对于路面的高度与像素深度之间的比值,该比值可以用于3D场景建模。本公开实施例的图像数据的处理方法,基于图像数据可以得到稠密且准确的像素高深比,进而可以辅助进行3D场景建模。
下面通过附图和实施例,对本公开的技术方案做进一步的详细描述。
附图说明
通过结合附图对本公开实施例进行更详细的描述,本公开的上述以及其他目的、特征和优势将变得更 加明显。附图用来提供对本公开实施例的进一步理解,并且构成说明书的一部分,与本公开实施例一起用于解释本公开,并不构成对本公开的限制。在附图中,相同的参考标号通常代表相同部件或步骤。
图1是本公开一个实施例中图像数据的处理方法的流程示意图;
图2是本公开一个实施例中步骤S1的流程示意图;
图3是本公开一个示例中第一神经网络的工作原理图;
图4是本公开另一个实施例中图像数据的处理方法的流程示意图;
图5是本公开又一个实施例中在步骤S4之后的流程示意图;
图6是本公开再一个实施例中在步骤S4之后的流程示意图;
图7是本公开一个实施例中步骤S5″的流程示意图;
图8是本公开一个实施例中步骤S5″-6的流程示意图;
图9是本公开一个实施例中图像数据的处理装置的结构框图;
图10是本公开一个实施例中单应性矩阵确定模块100的结构框图;
图11是本公开另一个实施例中图像数据的处理装置的结构框图;
图12是本公开又一个实施例中图像数据的处理装置的结构框图;
图13是本公开再一个实施例中图像数据的处理装置的结构框图;
图14是本公开一个实施例中整体损失值确定模块1000的结构框图;
图15是本公开一个实施例提供的电子设备的结构图。
具体实施方式
下面,将参考附图详细地描述根据本公开的示例实施例。显然,所描述的实施例仅仅是本公开的一部分实施例,而不是本公开的全部实施例,应理解,本公开不受这里描述的示例实施例的限制。
应注意到:除非另外具体说明,否则在这些实施例中阐述的部件和步骤的相对布置、数字表达式和数值不限制本公开的范围。
本领域技术人员可以理解,本公开实施例中的“第一”、“第二”等术语仅用于区别不同步骤、设备或模块等,既不代表任何特定技术含义,也不表示它们之间的必然逻辑顺序。
还应理解,在本公开实施例中,“多个”可以指两个或两个以上,“至少一个”可以指一个、两个或两个以上。
还应理解,对于本公开实施例中提及的任一部件、数据或结构,在没有明确限定或者在前后文给出相反启示的情况下,一般可以理解为一个或多个。
另外,本公开中术语“和/或”,仅仅是一种描述关联对象的关联关系,表示可以存在三种关系,例如,A和/或B,可以表示:单独存在A,同时存在A和B,单独存在B这三种情况。另外,本公开中字符“/”,一般表示前后关联对象是一种“或”的关系。
还应理解,本公开对各个实施例的描述着重强调各个实施例之间的不同之处,其相同或相似之处可以相互参考,为了简洁,不再一一赘述。
以下对至少一个示例性实施例的描述实际上仅仅是说明性的,决不作为对本公开及其应用或使用的任何限制。
对于相关领域普通技术人员已知的技术、方法和设备可能不作详细讨论,但在适当情况下,所述技术、方法和设备应当被视为说明书的一部分。
应注意到:相似的标号和字母在下面的附图中表示类似项,因此,一旦某一项在一个附图中被定义,则在随后的附图中不需要对其进行进一步讨论。
本公开实施例可以应用于终端设备、计算机系统、服务器等电子设备,其可与众多其它通用或专用计算系统环境或配置一起操作。适于与终端设备、计算机系统、服务器等电子设备一起使用的众所周知的终端设备、计算系统、环境和/或配置的例子包括但不限于:个人计算机系统、服务器计算机系统、瘦客户机、厚客户机、手持或膝上设备、基于微处理器的系统、机顶盒、可编程消费电子产品、网络个人电脑、小型计算机系统﹑大型计算机系统和包括上述任何系统的分布式云计算技术环境,等等。
终端设备、计算机系统、服务器等电子设备可以在由计算机系统执行的计算机系统可执行指令(诸如程序模块)的一般语境下描述。通常,程序模块可以包括例程、程序、目标程序、组件、逻辑、数据结构等等,它们执行特定的任务或者实现特定的抽象数据类型。计算机系统/服务器可以在分布式云计算环境中实施,分布式云计算环境中,任务是由通过通信网络链接的远程处理设备执行的。在分布式云计算环境中,程序模块可以位于包括存储设备的本地或远程计算系统存储介质上。
示例性方法
图1是本公开一个实施例中图像数据的处理方法的流程示意图。本实施例可应用在服务器上,如图1所示,包括如下步骤:
S1:利用第一神经网络对第一图像和第二图像进行处理,得到单应性矩阵。其中,第一图像为第一时刻拍摄,第二图像为第二时刻拍摄,且第一图像和第二图像具有相同区域的路面元素。
具体地,车辆上设置有相机,并预先设有相机内参和相机外参。在车辆行驶过程中,通过相机拍摄图像。
获取同一个相机在第一时刻拍摄的第一图像,以及该相机在第二时刻拍摄的第二图像。其中,可以通过拍摄视频的方式拍摄图像,也可以通过每间隔一段时间拍摄一次图像的方式拍摄多帧图像。在本实施例中,第一时刻和第二时刻可以间隔M帧,M为大于0的整数。
由于在驾驶场景下拍摄的图像中通常包括路面,因此本实施例将第一图像和第二图像中的路面作为平面视差法所需的参考平面。
利用特征提取网络对第一图像进行特征提取,得到第一图像特征,并利用特征提取网络对第二图像进行特征提取,得到第二图像特征。其中,特征提取网络可以属于第一神经网络,也可以是独立于第一神经网络之外的网络。在本实施例中,按照相同的下采样方式进行特征提取。例如将原本为3*h*w维度(h和w分别代表图像宽和长)的第一图像进行下采样,得到n*h’*w’维度的第一特征图,作为第一图像特征。并将原本为3*h*w维度的第二图像进行下采样,得到n*h’*w’维度的第二特征图,作为第二图像特征。其中n为通道数,h’可以是h的1/32,1/64等。w’为w的1/32,1/64等。h’和w’的取值可以相同,也可以不相同。
利用第一神经网络对第一图像特征和第二图像特征进行处理,得到用于对齐第一图像和第二图像中路面的单应性矩阵。示例性地,单应性矩阵的计算方式如下:
（单应性矩阵H的计算公式，原文此处为公式图像，符号含义见下文说明）
其中,H表示单应性矩阵,例如可以采用元素为3×3的矩阵,K表示相机内参,K -1表示K的逆矩阵,d表示相机相对于路面的高度,d可以通过标定得到,R和t分别表示第一图像和第二图像之间的相机相对旋转矩阵(例如3×3)和相对平移矩阵(例如1×3),N表示路面法线。
S2:根据单应性矩阵,确定第一图像特征的映射图像特征。
具体地,利用单应性矩阵将第一图像特征映射到第二图像特征的视角上,得到映射图像特征,映射图像特征的维度与第一图像特征和第二图像特征的维度相同,按照步骤S1的示例,映射图像特征的维度为n*h’*w’。
S3:对映射图像特征和第二图像特征进行融合,得到融合图像特征。
在一种可选的方式中,将映射图像特征和第二图像特征按照通道维度进行叠加处理,得到融合图像特征。按照步骤S1和S2的示例,融合图像特征的维度为2n*h’*w’。
S4:利用第二神经网络对融合图像特征进行处理,得到第二图像的第一像素高深比。
具体地,第二神经网络是预先训练好的模型,例如深度学习模型。第二神经网络可以基于融合图像特征预测出像素高深比。在本实施例中,将第二神经网络对融合图像特征预测出的像素高深比作为第一像素高深比。
在本实施例中,利用第一神经网络对相机拍摄的、具有共同区域路面元素的第一图像和第二图像进行处理,得到单应性矩阵;接着通过单应性矩阵对第一图像特征进行映射得到映射图像特征,并将映射图像特征与第二图像特征进行融合,得到融合图像特征;利用第二神经网络对融合图像特征进行处理,确定第一像素高深比。其中,第一像素高深比为第二图像中目标物的像素相对于路面的高度与像素深度之间的比值,该比值可以用于3D场景建模。本公开实施例的图像数据的处理方法,可以基于图像数据得到稠密且准确的像素高深比,进而辅助3D场景建模。
图2是本公开一个实施例中步骤S1的流程示意图。如图2所示,在本实施例中,步骤S1包括:
S1-1:对第一图像特征和第二图像特征进行融合,得到第三图像特征。
图3是本公开一个示例中第一神经网络的工作原理图。如图3所示,在本示例中,特征提取网络属于第一神经网络。特征提取网络提取出第一图像特征和第二图像特征之后,将第一图像特征和第二图像特征输入到特征融合模块进行融合。其中,特征融合模块可以将第一图像特征和第二图像特征按照通道维度进行叠加处理,得到融合图像特征。例如当第一图像特征的维度是n*h’*w’,且第二图像特征的维度是n*h’*w’时,则第三图像特征的维度为2n*h’*w’。
S1-2:利用第一神经网络中的路面子网络对第三图像特征进行处理,确定路面法线信息。
请继续参考图3,特征融合模块将第三图像特征输入给路面子网络,由路面子网络根据第三图像特征进行预测,输出路面法线信息。其中,路面子网络为根据输入的具有路面特征的图像特征进行预测,输出路面法线信息的网络模型。在本实施例中,路面法线信息可以通过N=[n X,n Y,n z]的路面方程进行表示。其中,n X,n Y和n z为在路面坐标系中三维坐标。
S1-3:利用第一神经网络中的姿态子网络对第三图像特征进行处理,确定第一图像与第二图像之间的相机相对姿态。
请继续参考图3,特征融合模块将第三图像特征输入姿态子网络,由姿态子网络根据第三图像特征进行预测,输出相机相对姿态。其中,姿态子网络为根据输入图像特征进行预测,输出相机相对姿态的网络模型。在本实施例中,相机相对姿态包括相机相对旋转矩阵和相对平移矩阵。
S1-4:基于路面法线信息、相机相对姿态和预存的相机相对于路面的高度,确定单应性矩阵。
请继续参考图3,第一神经网络通过相机相对于路面的高度、路面法线信息和相机相对姿态,可以采用上文中的单应性矩阵的计算方式确定单应性矩阵。需要说明的是,也可以由第一神经网络输出相机相对于路面的高度、路面法线信息和相机相对姿态,然后由第一神经网络之外的其他模块确定单应性矩阵。
在本实施例中,利用第一神经网络中路面子网络和姿态子网络分别对融合后的第三图像特征进行处理,例如可以利用路面子网络和姿态子网络对第一图像特征和第二图像特征在通道维度上叠加得到的第三图像特征进行处理,得到路面法线信息和相机相对姿态,基于路面法线信息、相机相对姿态和预存的相机相对于路面的高度、可以准确地确定单应性矩阵。
图4是本公开另一个实施例中图像数据的处理方法的流程示意图。如图4所示,在本实施例中,在步骤S4之后,还包括:
S5:在第二图像的采集时间内,基于与第二图像中的目标对象对应的雷达扫描数据,确定目标对象的第二像素高深比。
具体地,车辆设置有车载雷达。第二图像的采集时刻为t 2,通过车载雷达获取t 2时刻车辆附近的雷达扫描数据。通过对第二图像和车辆附近的雷达扫描数据进行分析,从而可以根据分析结果从车辆附近的雷达扫描数据中提取出目标对象对应的雷达扫描数据。根据提取出的雷达扫描数据可以准确得到目标对象的相对于车辆的位置,以及目标对象的体积,进而可以生成t 2时刻像素高深比的真值,记为第二像素高深比。
S6:基于第一像素高深比与第二像素高深比的差值,对第二神经网络进行参数调整。
具体地,基于在t 2时刻,像素高深比的真值(即第二像素高深比)与像素高深比的预测值(即第一像素高深比)之间的差值,通过反向传播的方式对第二神经网络进行参数调整。
在本实施例中,通过在同一时刻,将雷达数据确定的像素高深比的真值作为第二神经网络的监督信息,通过像素高深比的真值与预测值之间差值,合理地调整第二神经网络的参数,提升第二神经网络的预测准确性。
图5是本公开又一个实施例中在步骤S4之后的流程示意图。如图5所示,在本实施例中,在步骤S4之后,还包括:
S5ˊ:利用单应性矩阵对第一图像进行图像重建,得到第一重建图像。
具体地,利用单应性矩阵,对第一图像采用反向映射的方式进行图像重建,得到第一重建图像。
S6ˊ:基于第一重建图像与第二图像之间在相同区域路面元素上的像素位移,调整单应性矩阵的矩阵参数。
具体地,如果单应性矩阵的矩阵参数达到最优,则第一重建图像与第二图像在路面部分会对齐;如果单应性矩阵的矩阵参数没有达到最优,则第一重建图像与第二图像在路面部分会存在像素位移。
基于第一重建图像与第二图像之间在相同区域路面元素上的像素位移,可以合理的调整单应性矩阵的矩阵参数。
S7ˊ:基于调整矩阵参数后的单应性矩阵,对路面子网络和姿态子网络进行参数调整。
具体地,由于单应性矩阵是根据路面子网络预测出的路面法线信息、姿态自网络预测出的相机相对姿态,以及预存的相机相对路面高度确定。因此,通过调整矩阵参数后的单应性矩阵,通过反向传播的方式合理调整路面子网络和姿态子网络的参数。
在本实施例中,将第一重建图像与第二图像之间在相同区域路面元素上的像素位移可以合理的调整单应性矩阵的矩阵参数,将调整矩阵参数后的单应性矩阵作为监督信息,合理地调整路面子网络和姿态子网络的参数,能够提升路面子网络和姿态子网络的预测准确性。
图6是本公开再一个实施例中在步骤S4之后的流程示意图。如图6所示,在本实施例中,在步骤S4之后,还包括:
S5″:在第二图像的采集时间内,基于第一像素高深比、第二图像的路面掩码以及与第二图像中的目标对象对应的雷达扫描数据,确定整体损失值。
具体地,本实施例将第一神经网络和第二神经网络作为一个整体。其中,基于第一像素高深比和第二图像的路面掩码,通过光度损失函数可以计算出整体的光度损失值。基于在第二图像的采集时间内与第二 图像中的目标对象对应的雷达扫描数据,可以得到目标图像相对于车辆的位置和目标对象的体积,进而可以辅助确定整体的监督损失值。基于整体的光度损失值和监督损失值,可以确定整体损失值。
S6″:基于整体损失值,对第一神经网络和第二神经网络进行参数调整。
在本实施例中,将第一神经网络和第二神经网络作为一个整体,通过计算出整体损失值,可以对第一神经网络和第二神经网络的参数进行合理的调整,提升第一神经网络和第二神经网络的预测准确率。
图7是本公开一个实施例中步骤S5″的流程示意图。如图7所示,在本实施例中,步骤S5″包括:
S5″-1:基于第二图像中的目标对象对应的雷达扫描数据,确定目标对象的第二像素高深比。
具体地,车辆设置有车载雷达。第二图像的采集时刻为t 2,通过车载雷达获取t 2时刻车辆附近的雷达扫描数据。通过对第二图像和车辆附近的雷达扫描数据进行分析,从而可以根据分析结果从车辆附近的雷达扫描数据中提取出目标对象对应的雷达扫描数据。根据提取出的雷达扫描数据可以准确得到目标对象的相对于车辆的位置,以及目标对象的体积,进而可以生成t 2时刻像素高深比的真值,记为第二像素高深比。
S5″-2:基于第一像素高深比和第二像素高深比,确定第一损失值。其中,可以将第一像素高深比与第二像素高深比进行相减,得到第一损失值。
S5″-3:利用单应性矩阵对第一图像进行图像重建,得到第一重建图像。
具体地,利用单应性矩阵,对第一图像采用反向映射的方式进行图像重建,得到第一重建图像。
S5″-4:基于第一像素高深比,确定第一图像区域与第二图像区域之间的像素位移。其中,第一图像区域为第一重建图像中除路面图像区域以外的剩余图像区域,第二图像区域为第二图像中除路面图像区域以外的剩余图像区域。
具体地,基于单应性矩阵实现了第一重建图像与第二图像在路面图像区域上的基本对齐(如果单应性矩阵的矩阵参数没有达到最优,则第一重建图像与第二图像在路面部分仍然存在部分像素有位移),但是第一重建图像与第二图像除了路面图像区域以外的剩余图像区域,则没有对齐。对第一重建图像与第二图像中除了路面图像区域以外的剩余图像区域,逐个像素进行对比,得到第一图像区域与第二图像区域之间的像素位移。
S5″-5:基于第一图像区域与第二图像区域之间的像素位移,对第一重建图像的像素位置进行调整,得到第二重建图像。
具体地,基于第一图像区域与第二图像区域之间的像素位移,对第一重建图像的像素位置进行调整,可以使得第二重建图像与第二图像在路面图像区域上实现像素对齐。结合第一重建图像已经实现了与第二图像在路面图像区域的基本对齐,则实现了第二重建图像与第二图像在全图上的基本对齐。
S5″-6:基于第二重建图像、第二图像和第二图像的路面掩码,确定第二损失值。
具体地,可以基于第二重建图像、第二图像和第二图像的路面掩码计算第二重建图像与第二图像之间的光度损失,作为第二损失值。
S5″-7:基于第一损失值和第二损失值,确定整体损失值。其中,可以将第一损失值与第二损失值进行相加,得到整体损失值。
在本实施例中,在第二图像的采集时间内,可以基于与第二图像中的目标对象对应的雷达扫描数据、第一像素高深比以及第二图像的路面掩码,合理地确定整体损失值,以便基于整体损失值对第一神经网络和第二神经网络的参数进行合理的调整,进而提升第一神经网络和第二神经网络的预测准确率。
图8是本公开一个实施例中步骤S5″-6的流程示意图。如图8所示,在本实施例中,步骤S5″-6包括:
S5″-6-1:确定第二重建图像与第二图像之间的全图光度误差。
在一种可选的方式中,基于光度误差函数,通过以下公式确定全图光度误差:
（光度损失函数L_p的计算公式，原文此处为公式图像，符号含义见下文说明）
L_photo1=L p(It,Isw)
其中,L p表示光度损失系数,α表示权重,且α为常数,It表示第二图像,Isw表示第二重建图像,SSIM(It,Isw)表示第二图像与第二重建图像之间的结构相似参数,L_photo1表示全图光度误差。
S5″-6-2:基于全图光度误差和第二图像的路面掩码,确定第二重建图像与第二图像在路面图像区域上的光度误差。
在一种可选的方式中,通过以下公式确定第二重建图像与第二图像在路面图像区域上的光度误差:
L_photo2=mask_ground*L_photo1
其中,L_photo2表示第二重建图像与第二图像在路面图像区域上的光度误差,mask_ground表示第二图像的路面掩码。
S5″-6-3:基于全图光度误差和第二重建图像与第二图像在路面图像区域上的光度误差,确定第二损失 值。
具体地,通过以下公式确定第二损失值:
L_photoT=L_photo1+L_photo2
在本实施例中,基于第二重建图像、第二图像和第二图像的路面掩码,可以合理地确定第二重建图像和第二图像之间的第二损失值,以便基于第二损失值对第一神经网络和第二神经网络的参数进行合理的调整,进而提升第一神经网络和第二神经网络的预测准确率。
本公开实施例提供的任一种图像数据的处理方法,可以由任意适当的具有数据处理能力的设备执行,包括但不限于:终端设备和服务器等。或者,本公开实施例提供的任一种图像数据的处理方法可以由处理器执行,如处理器通过调用存储器存储的相应指令来执行本公开实施例提及的任一种图像数据的处理方法。下文不再赘述。
示例性装置
图9是本公开一个实施例中图像数据的处理装置的结构框图。如图9所示,在本实施例中,图像数据的处理装置包括:单应性矩阵确定模块100、映射图像特征确定模块200、融合模块300和第一像素高深比确定模块400。
其中,单应性矩阵确定模块100用于利用第一神经网络对第一图像和第二图像进行处理,得到单应性矩阵,其中,所述第一图像为第一时刻拍摄,所述第二图像为第二时刻拍摄,且所述第一图像和所述第二图像具有相同区域的路面元素;映射图像特征确定模块200用于根据所述单应性矩阵,确定第一图像特征的映射图像特征,其中,所述第一图像特征为基于所述第一图像提取的特征;融合模块300用于对所述映射图像特征和所述第二图像特征进行融合,得到融合图像特征,其中,所述第二图像特征为基于所述第二图像提取的特征;第一像素高深比确定模块400用于利用第二神经网络对所述融合图像特征进行处理,得到所述第二图像的第一像素高深比。
图10是本公开一个实施例中单应性矩阵确定模块100的结构框图。如图10所示,在本实施例中,单应性矩阵确定模块100包括:
融合单元101,用于对所述第一图像特征和所述第二图像特征进行融合,得到第三图像特征;
路面法线信息确定单元102,用于利用所述第一神经网络中的路面子网络对所述第三图像特征进行处理,确定路面法线信息;
相机相对姿态确定单元103,用于利用所述第一神经网络中的姿态子网络对所述第三图像特征进行处理,确定所述第一图像与所述第二图像之间的相机相对姿态;
单应性矩阵确定单元104,用于基于所述路面法线信息、所述相机相对姿态和预存的相机相对于路面的高度,确定所述单应性矩阵。
图11是本公开另一个实施例中图像数据的处理装置的结构框图。如图11所示,在本实施例中,图像数据的处理装置还包括:
第二像素高深比确定模块500,用于在所述第二图像的采集时间内,基于与所述第二图像中的目标对象对应的雷达扫描数据,确定所述目标对象的第二像素高深比;
第一参数调整模块600,用于基于所述第一像素高深比与所述第二像素高深比的差值,对所述第二神经网络进行参数调整。
图12是本公开又一个实施例中图像数据的处理装置的结构框图。如图12所示,在本实施例中,图像数据的处理装置还包括:
图像重建模块700,用于利用所述单应性矩阵对所述第一图像进行图像重建,得到第一重建图像;
单应性矩阵参数调整模块800,用于基于所述第一重建图像与所述第二图像之间在所述相同区域路面元素上的像素位移,调整所述单应性矩阵的矩阵参数;
第一网络参数调整模块900,用于基于调整矩阵参数后的单应性矩阵,对所述路面子网络和所述姿态子网络进行参数调整。
图13是本公开再一个实施例中图像数据的处理装置的结构框图。如图13所示,在本实施例中,图像数据的处理装置还包括:
整体损失值确定模块1000,用于在所述第二图像的采集时间内,基于所述第一像素高深比、所述第二图像的路面掩码以及与所述第二图像中的目标对象对应的雷达扫描数据,确定整体损失值;
第二网络参数调整模块1100,用于基于所述整体损失值,对所述第一神经网络和所述第二神经网络进行参数调整。
图14是本公开一个实施例中整体损失值确定模块1000的结构框图。如图14所示,在本实施例中,整体损失值确定模块1000包括:
第二像素高深比确定单元1001,用于基于所述雷达扫描数据,确定所述目标对象的第二像素高深比;
第一损失值确定单元1002,用于基于所述第一像素高深比和所述第二像素高深比,确定第一损失值;
第一重建图像单元1003,用于利用所述单应性矩阵对所述第一图像进行图像重建,得到第一重建图像;
像素位移确定单元1004,用于基于所述第一像素高深比,确定第一图像区域与第二图像区域之间的像素位移,其中,所述第一图像区域为所述第一重建图像中除了路面图像区域以外的剩余图像区域,所述第二图像区域为所述第二图像中除了路面图像区域以外的剩余图像区域;
第二重建图像单元1005,用于基于所述像素位移对所述第一重建图像的像素位置进行调整,得到第二重建图像;
第二损失值确定单元1006,用于基于所述第二重建图像、所述第二图像和所述第二图像的路面掩码,确定第二损失值;
整体损失值确定单元1007,用于基于所述第一损失值和所述第二损失值,确定所述整体损失值。
在本公开的一个实施例中,第二损失值确定单元1006具体用于确定所述第二重建图像与所述第二图像之间的全图光度误差;第二损失值确定单元1006还用于基于所述全图光度误差和所述第二图像的路面掩码,确定所述第二重建图像与所述第二图像在路面图像区域上的光度误差;第二损失值确定单元1006还用于基于所述全图光度误差和所述第二重建图像与所述第二图像在路面图像区域上的光度误差,确定所述第二损失值。
需要说明的是,本公开实施例的图像数据的处理装置的具体实施方式与本公开实施例的图像数据的处理方法的具体实施方式类似,具体参见图像数据的处理方法部分,为了减少冗余,不作赘述。
示例性电子设备
下面,参考图15来描述本公开一个实施例提供的电子设备。如图15所示,电子设备包括一个或多个处理器10和存储器20。
处理器10可以是中央处理单元(CPU)或者具有数据处理能力和/或指令执行能力的其他形式的处理单元,并且可以控制电子设备中的其他组件以执行期望的功能。
存储器20可以包括一个或多个计算机程序产品,所述计算机程序产品可以包括各种形式的计算机可读存储介质,例如易失性存储器和/或非易失性存储器。所述易失性存储器例如可以包括随机存取存储器(RAM)和/或高速缓冲存储器(cache)等。所述非易失性存储器例如可以包括只读存储器(ROM)、硬盘、闪存等。在所述计算机可读存储介质上可以存储一个或多个计算机程序指令,处理器10可以运行所述程序指令,以实现上文所述的本公开的各个实施例的图像数据的处理方法以及/或者其他期望的功能。在所述计算机可读存储介质中还可以存储诸如输入信号、信号分量、噪声分量等各种内容。
在一个示例中,电子设备还可以包括:输入装置30和输出装置40,这些组件通过总线系统和/或其他形式的连接机构(未示出)互连。输入装置30可以例如键盘、鼠标等。输出装置40可以包括例如显示器、扬声器、打印机、以及通信网络及其所连接的远程输出设备等。
当然,为了简化,图15中仅示出了该电子设备中与本公开有关的组件中的一些,省略了诸如总线、输入/输出接口等的组件。除此之外,根据具体应用情况,电子设备还可以包括任何其他适当的组件。
示例性计算机可读存储介质
计算机可读存储介质可以采用一个或多个可读介质的任意组合。可读介质可以是可读信号介质或者可读存储介质。可读存储介质例如可以包括但不限于电、磁、光、电磁、红外线、或半导体的系统、装置或器件,或者任意以上的组合。可读存储介质的更具体的例子(非穷举的列表)包括:具有一个或多个导线的电连接、便携式盘、硬盘、随机存取存储器(RAM)、只读存储器(ROM)、可擦式可编程只读存储器(EPROM或闪存)、光纤、便携式紧凑盘只读存储器(CD-ROM)、光存储器件、磁存储器件、或者上述的任意合适的组合。
The basic principles of the present disclosure have been described above with reference to specific embodiments. However, it should be noted that the merits, advantages, effects, and the like mentioned in the present disclosure are merely examples rather than limitations, and shall not be regarded as necessarily possessed by every embodiment of the present disclosure. In addition, the specific details disclosed above are provided only for illustration and ease of understanding, rather than limitation, and they do not restrict the present disclosure to being implemented by using these specific details.
Each embodiment in this specification is described in a progressive manner, each embodiment focuses on its differences from the other embodiments, and for the same or similar parts of the embodiments, reference may be made to one another. Since the system embodiments substantially correspond to the method embodiments, they are described relatively briefly, and reference may be made to the description of the method embodiments for relevant parts.
The block diagrams of the devices, apparatuses, equipment, and systems involved in the present disclosure are merely illustrative examples and are not intended to require or imply that connection, arrangement, or configuration must be performed in the manner shown in the block diagrams. As will be recognized by those skilled in the art, these devices, apparatuses, equipment, and systems may be connected, arranged, or configured in any manner. Words such as "including", "comprising", and "having" are open-ended, mean "including but not limited to", and may be used interchangeably therewith. The words "or" and "and" used herein mean "and/or" and may be used interchangeably therewith, unless the context clearly indicates otherwise. The word "such as" used herein means the phrase "such as but not limited to" and may be used interchangeably therewith.
The methods and apparatuses of the present disclosure may be implemented in many ways, for example, by software, hardware, firmware, or any combination of software, hardware, and firmware. The above order of the steps of the methods is only for illustration, and the steps of the methods of the present disclosure are not limited to the order specifically described above, unless otherwise specifically stated. In addition, in some embodiments, the present disclosure may also be implemented as programs recorded in a recording medium, the programs including machine-readable instructions for implementing the methods according to the present disclosure. Therefore, the present disclosure also covers the recording medium storing the programs for executing the methods according to the present disclosure.
It should also be noted that, in the apparatuses, devices, and methods of the present disclosure, the components or steps may be decomposed and/or recombined. Such decompositions and/or recombinations shall be regarded as equivalent solutions of the present disclosure.
The above description of the disclosed aspects is provided to enable any person skilled in the art to make or use the present disclosure. Various modifications to these aspects will be readily apparent to those skilled in the art, and the general principles defined herein may be applied to other aspects without departing from the scope of the present disclosure. Therefore, the present disclosure is not intended to be limited to the aspects shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
The above description has been given for the purposes of illustration and description. Furthermore, this description is not intended to limit the embodiments of the present disclosure to the forms disclosed herein. Although a number of example aspects and embodiments have been discussed above, those skilled in the art will recognize certain variations, modifications, changes, additions, and sub-combinations thereof.

Claims (10)

  1. An image data processing method, comprising:
    processing a first image and a second image by using a first neural network to obtain a homography matrix, wherein the first image is captured at a first moment, the second image is captured at a second moment, and the first image and the second image contain road surface elements of a same region;
    determining, according to the homography matrix, a mapped image feature of a first image feature, wherein the first image feature is a feature extracted based on the first image;
    fusing the mapped image feature and a second image feature to obtain a fused image feature, wherein the second image feature is a feature extracted based on the second image;
    processing the fused image feature by using a second neural network to obtain a first pixel height-depth ratio of the second image.
  2. The image data processing method according to claim 1, wherein the processing a first image and a second image by using a first neural network to obtain a homography matrix comprises:
    fusing the first image feature of the first image and the second image feature of the second image to obtain a third image feature;
    processing the third image feature by using a road surface sub-network in the first neural network to determine road surface normal information;
    processing the third image feature by using a pose sub-network in the first neural network to determine a camera relative pose between the first image and the second image;
    determining the homography matrix based on the road surface normal information, the camera relative pose, and a pre-stored height of a camera relative to a road surface.
  3. The image data processing method according to claim 1, after the processing the fused image feature by using a second neural network to obtain a first pixel height-depth ratio of the second image, further comprising:
    determining, within an acquisition time of the second image, a second pixel height-depth ratio of a target object in the second image based on radar scan data corresponding to the target object;
    adjusting parameters of the second neural network based on a difference between the first pixel height-depth ratio and the second pixel height-depth ratio.
  4. The image data processing method according to claim 2, after the processing a first image and a second image by using a first neural network to obtain a homography matrix, further comprising:
    performing image reconstruction on the first image by using the homography matrix to obtain a first reconstructed image;
    adjusting matrix parameters of the homography matrix based on a pixel displacement between the first reconstructed image and the second image on the road surface elements of the same region;
    adjusting parameters of the road surface sub-network and the pose sub-network based on the homography matrix with the adjusted matrix parameters.
  5. The image data processing method according to claim 1, after the processing the fused image feature by using a second neural network to obtain a first pixel height-depth ratio of the second image, further comprising:
    determining, within an acquisition time of the second image, an overall loss value based on the first pixel height-depth ratio, a road surface mask of the second image, and radar scan data corresponding to a target object in the second image;
    adjusting parameters of the first neural network and the second neural network based on the overall loss value.
  6. The image data processing method according to claim 5, wherein the determining, within the acquisition time of the second image, an overall loss value based on the first pixel height-depth ratio, the road surface mask of the second image, and the radar scan data corresponding to the target object in the second image comprises:
    determining a second pixel height-depth ratio of the target object based on the radar scan data;
    determining a first loss value based on the first pixel height-depth ratio and the second pixel height-depth ratio;
    performing image reconstruction on the first image by using the homography matrix to obtain a first reconstructed image;
    determining, based on the first pixel height-depth ratio, a pixel displacement between a first image region and a second image region, wherein the first image region is a remaining image region of the first reconstructed image other than a road surface image region, and the second image region is a remaining image region of the second image other than a road surface image region;
    adjusting pixel positions of the first reconstructed image based on the pixel displacement to obtain a second reconstructed image;
    determining a second loss value based on the second reconstructed image, the second image, and the road surface mask of the second image;
    determining the overall loss value based on the first loss value and the second loss value.
  7. The image data processing method according to claim 6, wherein the determining a second loss value based on the second reconstructed image, the second image, and the road surface mask of the second image comprises:
    determining a full-image photometric error between the second reconstructed image and the second image;
    determining, based on the full-image photometric error and the road surface mask of the second image, a photometric error between the second reconstructed image and the second image over a road surface image region;
    determining the second loss value based on the full-image photometric error and the photometric error between the second reconstructed image and the second image over the road surface image region.
  8. An image data processing apparatus, comprising:
    a homography matrix determination module, configured to process a first image and a second image by using a first neural network to obtain a homography matrix, wherein the first image is captured at a first moment, the second image is captured at a second moment, and the first image and the second image contain road surface elements of a same region;
    a mapped image feature determination module, configured to determine, according to the homography matrix, a mapped image feature of a first image feature, wherein the first image feature is a feature extracted based on the first image;
    a fusion module, configured to fuse the mapped image feature and a second image feature to obtain a fused image feature, wherein the second image feature is a feature extracted based on the second image;
    a first pixel height-depth ratio determination module, configured to process the fused image feature by using a second neural network to obtain a first pixel height-depth ratio of the second image.
  9. A computer-readable storage medium storing a computer program, wherein the computer program is used for executing the image data processing method according to any one of claims 1 to 7.
  10. An electronic device, comprising:
    a processor;
    a memory configured to store instructions executable by the processor;
    the processor being configured to read the executable instructions from the memory and execute the instructions to implement the image data processing method according to any one of claims 1 to 7.
PCT/CN2022/118735 2021-11-10 2022-09-14 Image data processing method and apparatus WO2023082822A1 (zh)

Priority Applications (3)

Application Number Priority Date Filing Date Title
US18/549,231 US20240169712A1 (en) 2021-11-10 2022-09-14 Image data processing method and apparatus
JP2023553068A JP2024508024A (ja) 2021-11-10 2022-09-14 Image data processing method and apparatus
EP22891627.6A EP4290456A1 (en) 2021-11-10 2022-09-14 Image data processing method and apparatus

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111329386.7 2021-11-10
CN202111329386.7A CN114049388A (zh) 2021-11-10 2021-11-10 Image data processing method and apparatus

Publications (1)

Publication Number Publication Date
WO2023082822A1 true WO2023082822A1 (zh) 2023-05-19

Family

ID=80208590

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/118735 WO2023082822A1 (zh) 2021-11-10 2022-09-14 Image data processing method and apparatus

Country Status (5)

Country Link
US (1) US20240169712A1 (zh)
EP (1) EP4290456A1 (zh)
JP (1) JP2024508024A (zh)
CN (1) CN114049388A (zh)
WO (1) WO2023082822A1 (zh)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114049388A (zh) * 2021-11-10 2022-02-15 北京地平线信息技术有限公司 图像数据的处理方法和装置
CN114170325A (zh) * 2021-12-14 2022-03-11 北京地平线信息技术有限公司 确定单应性矩阵的方法、装置、介质、设备和程序产品

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190147621A1 (en) * 2017-11-16 2019-05-16 Nec Europe Ltd. System and method for real-time large image homography processing
CN110378250A (zh) * 2019-06-28 2019-10-25 深圳先进技术研究院 Training method and apparatus for a neural network for scene cognition, and terminal device
CN113160294A (zh) * 2021-03-31 2021-07-23 中国科学院深圳先进技术研究院 Image scene depth estimation method and apparatus, terminal device, and storage medium
CN113379896A (zh) * 2021-06-15 2021-09-10 上海商汤智能科技有限公司 Three-dimensional reconstruction method and apparatus, electronic device, and storage medium
CN113592706A (zh) * 2021-07-28 2021-11-02 北京地平线信息技术有限公司 Method and apparatus for adjusting homography matrix parameters
CN113592940A (zh) * 2021-07-28 2021-11-02 北京地平线信息技术有限公司 Method and apparatus for determining a target object position based on an image
CN114049388A (zh) * 2021-11-10 2022-02-15 北京地平线信息技术有限公司 Image data processing method and apparatus

Also Published As

Publication number Publication date
CN114049388A (zh) 2022-02-15
EP4290456A1 (en) 2023-12-13
US20240169712A1 (en) 2024-05-23
JP2024508024A (ja) 2024-02-21

Legal Events

Date Code Title Description
121  Ep: the epo has been informed by wipo that ep was designated in this application
     Ref document number: 22891627; Country of ref document: EP; Kind code of ref document: A1
WWE  Wipo information: entry into national phase
     Ref document number: 2023553068; Country of ref document: JP
WWE  Wipo information: entry into national phase
     Ref document number: 18549231; Country of ref document: US
WWE  Wipo information: entry into national phase
     Ref document number: 2022891627; Country of ref document: EP
ENP  Entry into the national phase
     Ref document number: 2022891627; Country of ref document: EP; Effective date: 20230907