WO2024060923A1 - Depth estimation method and apparatus for moving object, electronic device, and storage medium - Google Patents

Depth estimation method and apparatus for moving object, electronic device, and storage medium

Info

Publication number
WO2024060923A1
Authority
WO
WIPO (PCT)
Prior art keywords
depth
target
moving object
inverse
video
Prior art date
Application number
PCT/CN2023/114570
Other languages
English (en)
French (fr)
Inventor
温佳伟
宋小东
郭亨凯
Original Assignee
北京字跳网络技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京字跳网络技术有限公司
Publication of WO2024060923A1

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/13Edge detection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/50Depth or shape recovery
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/50Depth or shape recovery
    • G06T7/55Depth or shape recovery from multiple images

Definitions

  • The present disclosure relates to the field of image processing technology, for example, to a depth estimation method and apparatus for a moving object, an electronic device, and a storage medium.
  • In the related art, an image is input into a Simultaneous Localization and Mapping (SLAM) system, and the SLAM system is used to extract the scene depth information in the image, so that the depth of objects in the image can be estimated based on the scene depth information.
  • However, this depth estimation method is only suitable for static objects; for dynamic objects in videos, it is difficult to achieve effective depth estimation.
  • the present disclosure provides a depth estimation method, a device, an electronic device, and a storage medium for moving objects to achieve the effect of accurately estimating the depth information of moving objects in videos.
  • The present disclosure provides a depth estimation method for a moving object, which includes: determining a video processing type; determining, according to the video processing type, a target processing method for depth estimation of the moving object; and determining, based on the target processing method, the depth estimation value of the moving object in the video frame to be processed.
  • the present disclosure also provides a depth estimation device for a moving object, which device includes:
  • a video processing type determination module configured to determine the video processing type
  • a target processing method determination module configured to determine a target processing method for depth estimation of the moving object according to the video processing type
  • the depth estimation value determination module is configured to determine the depth estimation value of the moving object in the video frame to be processed based on the target processing method.
  • the present disclosure also provides an electronic device, which includes:
  • one or more processors;
  • a storage device configured to store one or more programs
  • when the one or more programs are executed by the one or more processors, the one or more processors are caused to implement the above-mentioned depth estimation method for a moving object.
  • the present disclosure also provides a storage medium containing computer-executable instructions, which, when executed by a computer processor, are used to perform the above-mentioned depth estimation method of a moving object.
  • the present disclosure also provides a computer program product, including a computer program carried on a non-transitory computer-readable medium, the computer program including program code for executing the above depth estimation method of a moving object.
  • Figure 1 is a schematic flowchart of a depth estimation method for a moving object provided by an embodiment of the present disclosure
  • Figure 2 is a schematic flowchart of another depth estimation method for moving objects provided by an embodiment of the present disclosure
  • Figure 3 is a schematic structural diagram of a depth estimation device for moving objects provided by an embodiment of the present disclosure
  • FIG. 4 is a schematic structural diagram of an electronic device provided by an embodiment of the present disclosure.
  • the term “include” and its variations denote open-ended inclusion, that is, “including, but not limited to”.
  • the term “based on” means “based at least in part on.”
  • the term “one embodiment” means “at least one embodiment”; the term “another embodiment” means “at least one additional embodiment”; and the term “some embodiments” means “at least some embodiments”. Relevant definitions of other terms will be given in the description below.
  • a prompt message is sent to the user to clearly remind the user that the operation requested will require the acquisition and use of the user's personal information. Therefore, users can autonomously choose whether to provide personal information to software or hardware such as electronic devices, applications, servers or storage media that perform the operations of the technical solution of the present disclosure based on the prompt information.
  • the method of sending prompt information to the user may be, for example, a pop-up window, and the prompt information may be presented in the form of text in the pop-up window.
  • the pop-up window can also contain a selection control for the user to choose "agree” or "disagree” to provide personal information to the electronic device.
  • the data involved in this technical solution shall comply with the requirements of corresponding laws, regulations and relevant regulations.
  • an exemplary description of the application scenario can be provided.
  • the system can parse the scene depth information in the video to estimate the depth information of the objects contained in the video frame based on the scene depth information.
  • However, the current depth information estimation method can only estimate the depth information of static objects in video frames; depth information cannot be estimated accurately for dynamic objects in video frames.
  • Based on the solution of the embodiments of the present disclosure, the depth information of the moving object in the video frame can be estimated using the scene depth information and three-dimensional space information provided by the SLAM system, thereby achieving accurate estimation of the depth information of dynamic objects in video frames.
  • Figure 1 is a schematic flowchart of a depth estimation method for a moving object provided by an embodiment of the present disclosure.
  • the embodiment of the present disclosure is suitable for estimating the depth information of a moving object in a video frame.
  • The method can be executed by a depth estimation device for a moving object, and the device may be implemented in the form of software and/or hardware, for example, through an electronic device, which may be a mobile terminal, a personal computer (PC), or a server.
  • the method includes:
  • The device for executing the depth estimation method of a moving object can be integrated into application software that supports special effect video processing functions, and the software can be installed in an electronic device; for example, the electronic device can be a mobile terminal or a PC.
  • The application software can be a type of software for image/video processing; such application software will not be enumerated here one by one, as long as it can achieve image/video processing. It can also be a specially developed application that adds and displays special effects, or the function can be integrated in a corresponding page, so that the user can process special effect videos through the page integrated in the PC.
  • the technical solution of this embodiment can be executed in the process of real-time photography based on the mobile terminal, or can be executed after the system receives the video data actively uploaded by the user.
  • the solution of the disclosed embodiment can be applied to augmented reality (Augmented Reality, AR), virtual reality (Virtual Reality, VR) and autonomous driving in various application scenarios.
  • the video processing type may be a video processing method determined based on the user's upload method of the video to be processed.
  • Video processing types include real-time processing types and post-processing types.
  • In practical applications, if the video to be processed is captured in real time based on the camera device of the mobile terminal, and depth estimation of the moving objects contained in the video to be processed is performed on the mobile terminal, the video processing type of the current video to be processed can be determined to be the real-time processing type; if the video to be processed is a video that has already been shot and is actively uploaded to the system by the user, and depth estimation is performed on the moving objects contained in the received video to be processed, the video processing type of the video to be processed can be the post-processing type.
  • In this embodiment, if the video data received by the system is captured in real time based on the camera device of the mobile terminal, the video processing type can be determined as the real-time processing type; if the video data received by the system is complete video data that has already been shot, the video processing type can be determined as the post-processing type.
  • the advantage of this setting is that it can enhance the diversity of the processing methods for the depth estimation of moving objects, so that the depth estimation of moving objects in the video frame to be processed can be performed in real time based on the mobile terminal, and the depth estimation of moving objects in the complete video can also be performed, which improves the diversity of video processing and meets the personalized needs of users.
  • When it is detected that the user triggers a special effect operation, the camera device of the mobile terminal can face the user in real time to collect the video to be processed, and the video to be processed is parsed according to a pre-written program to obtain multiple video frames to be processed; at this time, the video processing type can be determined as the real-time processing type.
  • the video frame to be processed may include moving objects.
  • the moving object can be any object whose posture or position information changes in the frame, such as a user or an animal.
  • Depth estimation can be a subtask in the field of computer vision; its purpose is to obtain the distance between an object and the shooting point, and it can provide depth information for a series of tasks such as three-dimensional reconstruction, distance perception, SLAM, visual odometry, video frame interpolation, and image reconstruction.
  • the depth information of the moving object can be the distance between the pixel corresponding to the moving object and the shooting point in the final image, or it can be expressed by the position coordinates of the pixel in the camera coordinate system.
  • the target processing method for depth estimation of the moving object in the video frame can be a depth mean estimation method corresponding to the real-time processing type.
  • the depth mean estimation method can be to determine the depth values of some pixel points associated with the moving object and average these depth values, so that the final average depth value can be used as the depth information of the moving object.
  • The user can shoot videos of moving objects in real time based on the camera device of the mobile terminal and upload them to the mobile terminal in real time; therefore, the video captured in real time and obtained by the system is the video to be processed, and multiple video frames to be processed can be obtained by parsing the video to be processed based on a pre-written program.
  • the depth estimation value may be the distance between at least one pixel corresponding to the moving object and the shooting point, or may be the coordinate value of at least one pixel corresponding to the moving object in the camera coordinate system.
  • the target processing method can be a depth mean estimation method.
  • The target pixels in the moving object that satisfy the depth mean estimation conditions can be determined first, and the depth mean can then be determined based on the depth values of these target pixels, so that the finally obtained depth mean can be used as the depth estimation value of the moving object.
  • Determining the depth estimation value of the moving object in the video frame to be processed may include: determining the shooting parameters corresponding to the video frame to be processed and the pixel parameters of the moving object; determining the target pixels based on the shooting parameters, the pixel parameters, and the constraints; and determining the depth estimation value of the moving object based on the point cloud data of the target pixels.
  • the shooting parameters may be camera pose parameters after pose optimization of the video frame to be processed.
  • The camera position information and rotation information can be obtained based on the gyroscope and inertial measurement unit in the camera device corresponding to the video frame to be processed, so as to determine the initial pose of the video frame to be processed based on the camera position information and rotation information; the initial pose is then optimized based on bundle adjustment (BA), and the optimized pose is used as the shooting parameter corresponding to the video frame to be processed.
  • the advantage of this setting is that it can provide a higher BA speed for the simultaneous positioning and mapping system, thus ensuring the real-time processing of video frames by the system.
  • The pixel parameters can be the pixel coordinates of at least one pixel constituting the moving object in the video frame to be processed. When a moving object is shot to obtain multiple video frames to be processed, the video frames to be processed contain not only the moving object but also the scene where the moving object is located; therefore, when determining the pixel parameters of the moving object, a mask image of the moving object can be determined first, so that the pixel coordinates of at least one pixel constituting the moving object can be determined based on the mask image.
  • The constraint condition may be a spatial geometric information constraint; that is, when a pixel is observed at a specific position, it is determined whether the state of the pixel corresponds to that position: if the state of the pixel corresponds to its observation position, it can be determined that the pixel satisfies the constraint; if the state of the pixel does not correspond to its observation position, it can be determined that the pixel does not satisfy the constraint.
  • After the video frame to be processed is obtained, the initial pose of the video frame to be processed can be determined based on the parameters of the sensors of the camera device corresponding to the video frame to be processed; the initial pose is then optimized based on the pose optimization method, and the optimized pose is used as the shooting parameter corresponding to the video frame to be processed. Meanwhile, the pixel coordinates of the moving object in the video frame to be processed are determined as the pixel parameters, and the target pixels are determined based on the shooting parameters, the pixel parameters, and the constraints, so that the depth estimation value of the moving object can be determined based on the point cloud data of the target pixels.
  • The advantage of this setting is that the multiple pixels of the moving object can be divided into dynamic pixels and static pixels based on the constraints, and the dynamic pixels can be filtered out as tracking pixels for the moving object, which improves the accuracy of the depth estimation value of the moving object and improves the positioning effect of the moving object in the video frame to be processed.
  • the initial pose of the video frame to be processed can be first determined, and the initial pose can be optimized based on the pose optimization method to obtain the shooting parameters corresponding to the video frame to be processed.
  • Meanwhile, the pixel coordinates of at least one pixel corresponding to the moving object are determined to obtain the pixel parameters; based on the shooting parameters, the pixel parameters, and the constraints, the pixels among the multiple pixels corresponding to the moving object that satisfy the constraints are determined, and these pixels are taken as the target pixels.
  • Determining the target pixels based on the shooting parameters, the pixel parameters, and the constraints includes: triangulating the shooting parameters and the pixel parameters to obtain point cloud data corresponding to the pixel parameters; determining back-projection pixel parameters based on the point cloud data and the constraints; and determining the target pixels based on the pixel parameters and the back-projection pixel parameters.
  • the triangulation process may be to determine corresponding point cloud data based on a corner point detection algorithm.
  • the corner detection algorithm may be the KLT corner detection method, also known as the KLT optical flow tracking method.
  • the KLT corner detection method determines a reference key frame suitable for tracking among multiple key frames and determines the feature points of the reference key frame, thereby determining the corresponding point cloud data (PCD) based on the feature points.
  • Point cloud data is usually used in reverse engineering. It is data recorded in the form of points; these points can represent coordinates in three-dimensional space, as well as information such as color or light intensity. In practical applications, point cloud data generally also includes point coordinate accuracy, spatial resolution, surface normal vectors, and so on, and is generally saved in the PCD format. In this format, point cloud data is highly operable and can improve the speed of point cloud registration and fusion in subsequent processes, which will not be described in detail in the embodiments of the present disclosure.
  • the shooting parameters and pixel parameters can be triangulated based on the corner detection algorithm, so that three-dimensional point cloud data corresponding to the pixel parameters can be obtained.
  • According to the point cloud data and the constraints, the parameters of the point cloud data in the camera coordinate system are determined; that is, the three-dimensional point cloud data is converted into the form of two-dimensional coordinates, and the converted two-dimensional coordinate parameters can be used as the back-projection pixel parameters. Since the point cloud data is determined based on the pixel parameters, the back-projection pixel parameters are determined based on the point cloud data, and both the pixel parameters and the back-projection pixel parameters are two-dimensional coordinate parameters, the target pixels can be determined by checking whether the pixel parameters are consistent with the corresponding back-projection pixel parameters; that is, the pixels whose parameters are inconsistent with the corresponding back-projection pixel parameters are taken as the target pixels.
  • the pixels of moving objects are determined based on the mask image.
  • the model deployed on the mobile terminal is usually used to process the video frame to be processed to obtain the mask image corresponding to the moving object.
  • In general, in order to improve the processing efficiency of the mobile terminal and reduce the memory footprint of the model on the mobile terminal, the model deployed on the mobile terminal is usually a model with a simple structure and a fast processing speed. When this model is used to perform moving-object mask processing on the video frame to be processed, the resulting mask image may be larger than the actual size of the moving object, so that static background points that do not belong to the moving object are also included. Static pixels can generally satisfy the constraints, while dynamic pixels cannot.
  • Therefore, dynamic pixels and static pixels can be distinguished by determining whether the pixels corresponding to the moving object satisfy the constraints, so that different processing methods can be adopted for different pixels, and the depth estimation value of the moving object can finally be obtained.
  • the advantage of this setting is that the pixels of moving objects can be determined more accurately, and different processing methods can be adopted for different pixels, which improves the accuracy of depth estimation of moving objects.
  • Exemplarily, based on the point cloud data and the constraints, the back-projection pixel parameters can be determined according to the following formula:

$$ s_i \begin{bmatrix} u_i \\ v_i \\ 1 \end{bmatrix} = K \exp(\xi^{\wedge}) \begin{bmatrix} X_i \\ Y_i \\ Z_i \\ 1 \end{bmatrix} $$

  • where $s_i$ can represent the depth value of any pixel, $(u_i, v_i)$ can represent the pixel coordinates of that pixel, $K$ can represent the camera intrinsic parameters, $\exp(\xi^{\wedge})$ can represent the camera pose, that is, the $R$, $T$ matrix, and $(X_i, Y_i, Z_i)$ can represent the three-dimensional point cloud coordinates of that pixel.
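  • As an illustrative sketch (not part of the original disclosure; the function name, array layouts, and use of numpy are assumed for illustration), the back-projection above could be implemented as follows:

```python
import numpy as np

def back_project(points_3d, K, T_cam_world):
    """Back-project 3D point cloud coordinates into pixel coordinates.

    points_3d:   (N, 3) array of point cloud coordinates (X_i, Y_i, Z_i).
    K:           (3, 3) camera intrinsic matrix.
    T_cam_world: (4, 4) SE(3) camera pose matrix, i.e. exp(xi^).
    Returns (N, 2) back-projected pixel coordinates and (N,) depths s_i.
    """
    n = points_3d.shape[0]
    homogeneous = np.hstack([points_3d, np.ones((n, 1))])   # (N, 4)
    cam_coords = (T_cam_world @ homogeneous.T)[:3]          # (3, N) camera frame
    uv_scaled = K @ cam_coords                              # (3, N)
    depths = uv_scaled[2]                                   # s_i for each point
    pixels = (uv_scaled[:2] / depths).T                     # (N, 2) -> (u_i, v_i)
    return pixels, depths
```

  • The back-projected pixels returned by such a routine could then be compared with the observed pixel parameters to decide which points are kept as target pixels.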
  • the depth estimate of the moving object can be determined based on the point cloud data of the target pixel.
  • Determining the depth estimation value of the moving object based on the point cloud data of the target pixels includes: determining, based on the point cloud data of the target pixels, at least two to-be-used video frames to which the target pixels belong; and determining the depth estimation value of the moving object based on the depth values of the target pixels in the at least two to-be-used video frames.
  • After the target pixels are obtained, these target pixels can be triangulated to obtain the point cloud data corresponding to the target pixels; the point cloud data can be observed in multiple to-be-processed video frames containing the moving object, and at least two to-be-processed video frames in which the point cloud data can be observed are used as the to-be-used video frames.
  • After the at least two to-be-used video frames are determined, the depth values of the target pixels in the camera coordinate system can be determined and averaged, and the finally obtained average depth value serves as the depth estimation value of the moving object. The advantage of this setting is that a rough estimate of the depth information of the moving object can be achieved on the mobile terminal, which improves the efficiency of the depth estimation of the moving object.
  • If the moving object is stationary, the multiple pixels of the moving object determined based on the mask image all satisfy the constraints; that is, the pixel parameters of the multiple pixels are consistent with the back-projection pixel parameters.
  • These pixels can be triangulated to obtain point cloud data corresponding to these pixels, and these point cloud data can be stored in the SLAM system so that the depth estimate of the moving object can be determined through the SLAM system.
  • This embodiment determines the depth estimation value of the moving object in the video frame to be processed when the video processing type is the real-time processing type. When the video processing type is the post-processing type, the corresponding target processing method changes accordingly; the post-processing type is elaborated below.
  • The technical solution of the embodiment of the present disclosure determines the video processing type, determines the target processing method for depth estimation of the moving object according to the video processing type, and finally determines, based on the target processing method, the depth estimation value of the moving object in the video frame to be processed. It solves the problem in the related art that only the depth information of static objects can be estimated, achieves accurate estimation of the depth information of moving objects in video frames, broadens the applicable scope of depth estimation, meets the personalized needs of users, and improves the user experience.
  • FIG. 2 is a schematic flowchart of another depth estimation method for moving objects provided by an embodiment of the present disclosure.
  • On the basis of the foregoing embodiment, when the video processing type is the post-processing type, the corresponding target processing method can be the inverse depth estimation method, and the depth estimation value of the moving object can then be determined based on the inverse depth estimation method. Technical terms that are the same as or correspond to those in the above embodiment will not be repeated here.
  • the method includes the following steps:
  • The above embodiment determines the depth estimation value of the moving object in the video frame to be processed when the video processing type is the real-time processing type. On this basis, when the video processing type is the post-processing type, the corresponding target processing method changes accordingly; the post-processing type is explained below.
  • In this embodiment, a video upload control can be developed in advance. When a trigger operation on the video upload control in the application is detected, the video actively uploaded by the user can be received and used as the video to be processed, and the video to be processed is parsed based on a pre-written program to obtain multiple video frames to be processed.
  • the video frame to be processed contains moving objects.
  • the moving objects can be users, animals, or any object whose posture or position information changes in the frame.
  • When a complete video to be processed is received, the video frames containing the moving object can be used as the video frames to be processed, and special effects processing is performed on these video frames to obtain the corresponding special effects video frames; this video processing method can be regarded as the post-processing type.
  • According to the post-processing type, the target processing method for depth estimation of the moving object is determined to be the inverse depth estimation method. In this embodiment, after the video to be processed is received and the video processing type is determined to be the post-processing type, the target processing method for depth estimation of the moving object in the video frame to be processed can be determined as the inverse depth estimation method.
  • the inverse depth estimation method may be to determine the depth estimate of the moving object based on the inverse depth value of at least one pixel corresponding to the moving object.
  • When the video processing method is the post-processing type, that is, depth estimation of the moving object is performed on complete video data, unlike the real-time processing type, after the complete video data is received, the depth information of each pixel in each video frame to be processed can be determined, so that the depth information of the moving object is estimated based on this depth information. However, the depth information of different pixels in each video frame to be processed has a large distribution range, and the depth distribution is unstable; therefore, inverse depth information corresponding to the depth information can be determined, so that the depth estimation value of the moving object is determined based on the inverse depth information.
  • the advantage of this setting is that the inverse depth distribution form is more consistent with the Gaussian distribution form and will be more stable, so that the depth estimate of the moving object will be more accurate.
  • Each video frame to be processed includes distant-view pixels and near-view pixels. Since the distance between the distant-view pixels and the shooting point is relatively large, the disparity of these pixels is small. The inverse depth method can be used to weaken the influence of the distant-view pixels on the calculation process: the depth values of the distant-view pixels and the near-view pixels are converted into inverse depth values, and subsequent calculations are performed based on these inverse depth values, thereby improving calculation accuracy.
  • S230 Determine a depth estimation value of a moving object in a to-be-processed video frame based on an inverse depth estimation method.
  • In this embodiment, after the target processing method is determined to be the inverse depth estimation method, the depth estimation value of the moving object in the video frame to be processed can be determined based on the inverse depth estimation method.
  • Based on the inverse depth estimation method, determining the depth estimation value of the moving object in the video frame to be processed includes: triangulating each video frame to be processed in the target video to obtain the inverse depth value of each pixel in each video frame to be processed; and determining the depth estimation value of the moving object by clustering multiple inverse depth values in the same video frame to be processed.
  • the target video may be a video actively uploaded by the user, and the depth information of the moving objects in the video needs to be determined.
  • In this embodiment, each video frame to be processed can be triangulated based on the corner detection algorithm to obtain the point cloud data corresponding to each video frame to be processed; the point cloud data can then be converted into the camera coordinate system according to the translation and rotation matrices to obtain the depth value of each pixel in the camera coordinate system. These depth values are inverted, that is, the negative first power of the depth values is determined, to obtain the inverse depth value of each pixel, so that multiple inverse depth values in the same video frame to be processed can be clustered to determine the depth estimation value of the moving object.
  • The advantage of this setting is that estimating the depth of the moving object based on the inverse depth value of each pixel can weaken the influence of distant pixels on the depth estimation, thereby improving the accuracy of the depth estimation and improving the display effect of the freeze point of the moving object at different timestamps of the target video.
  • the clustering process may be a classification process for multiple inverse depth values, and may be a binary classification process, that is, the multiple inverse depth values are divided into two major categories.
  • Determining the depth estimation value of the moving object by clustering multiple inverse depth values in the same video frame to be processed includes: sorting the multiple inverse depth values by magnitude, and determining the depth difference between every two adjacent inverse depth values; and obtaining the two target inverse depth values with the largest depth difference, and determining the depth estimation value of the moving object based on the multiple inverse depth values that are greater than the target inverse depth value.
  • In this embodiment, each inverse depth value can first be determined and the multiple inverse depth values sorted by magnitude; the difference between every two adjacent inverse depth values is then determined as the depth difference, and the two adjacent inverse depth values corresponding to the maximum depth difference are determined. These two inverse depth values are used as the target inverse depth values, based on which the multiple inverse depth values can be divided into two major categories: one category consists of the multiple inverse depth values greater than the target inverse depth value, and the other of the multiple inverse depth values smaller than the target inverse depth value.
  • the target inverse depth value used can be either of the two target inverse depth values, which can achieve the effect of classifying multiple inverse depth values.
  • Before determining the depth estimation value of the moving object based on the multiple inverse depth values that are greater than the target inverse depth value, the method further includes: if the ratio between the number of inverse depth values greater than or less than the target inverse depth value and the total number of inverse depth values is less than a preset ratio, deleting the inverse depth values that are greater than or less than the target inverse depth value, and re-performing the operation of determining the target inverse depth value.
  • The preset ratio can be set as required; for example, the preset ratio can be 5%.
  • In practical applications, after the multiple inverse depth values are divided into two categories, the ratio between the number of inverse depth values in each category and the total number of inverse depth values in the current video frame to be processed can be determined. If the ratio corresponding to either category is less than the preset ratio, the inverse depth values in that category can be deleted, and the target inverse depth value is re-determined to classify the remaining inverse depth values, so that the depth estimation value of the moving object can finally be determined based on the multiple inverse depth values that are greater than the target inverse depth value.
  • Determining a depth estimation value of a moving object based on multiple inverse depth values greater than a target inverse depth value includes: performing average processing on multiple inverse depth values greater than the target inverse depth value to obtain an inverse depth average, and determining a depth estimation value of the moving object according to the inverse depth average.
  • In this embodiment, the multiple inverse depth values that are greater than the target inverse depth value can be averaged, and the obtained inverse depth mean can be inverted again to obtain the depth mean corresponding to the inverse depth mean, which can serve as the depth estimation value of the moving object. The advantage of this setting is that the depth information of the moving object is determined based on the depth information of the near-view pixels, which can improve the accuracy of the depth estimation.
  • For each video frame to be processed that contains the moving object, the above technical solution can be used to determine the depth estimation value of the moving object in that frame. Furthermore, after the depth estimation value of the moving object in each video frame to be processed is obtained, the multiple video frames to be processed can be spliced, so that the depth estimation value of the moving object in the complete target video can be obtained.
  • The technical solution of the disclosed embodiment determines that the video processing type is the post-processing type, determines according to the post-processing type that the target processing method for depth estimation of moving objects is the inverse depth estimation method, and finally determines, based on the inverse depth estimation method, the depth estimation value of the moving object in the video frame to be processed. This solves the problem in the related art that only the depth information of static objects can be estimated, achieves accurate estimation of the depth information of moving objects in video frames, broadens the applicable scope of depth estimation, meets the personalized needs of users, and improves the user experience.
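  • As a hedged sketch of the inverse-depth clustering described above (the function name, the 5% default, and the use of numpy are assumptions for illustration): the inverse depth values are sorted, split at the largest gap between adjacent values, a class whose share falls below the preset ratio is deleted and the split redone, and finally the mean of the values greater than the target inverse depth value is inverted:

```python
import numpy as np

def estimate_depth_from_inverse(inv_depths, preset_ratio=0.05):
    """Cluster per-pixel inverse depth values and return a depth estimate."""
    vals = np.sort(np.asarray(inv_depths, dtype=float))
    while vals.size >= 2:
        gaps = np.diff(vals)                    # differences of adjacent values
        split = int(np.argmax(gaps)) + 1        # largest gap -> two target values
        lower, upper = vals[:split], vals[split:]
        if upper.size / vals.size < preset_ratio:
            vals = lower                        # tiny class: delete and re-split
        elif lower.size / vals.size < preset_ratio:
            vals = upper                        # tiny class: delete and re-split
        else:
            return 1.0 / upper.mean()           # mean of the larger inverse depths
    return 1.0 / vals.mean() if vals.size else None
```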
  • Figure 3 is a schematic structural diagram of a depth estimation device for moving objects provided by an embodiment of the present disclosure. As shown in Figure 3, the device includes: a video processing type determination module 310, a target processing method determination module 320, and a depth estimation value determination module 330.
  • the video processing type determination module 310 is configured to determine the video processing type; the target processing method determination module 320 is configured to determine the target processing method for depth estimation of the moving object based on the video processing type; the depth estimation value determination module 330 is configured to determine the depth estimation value of the moving object in the video frame to be processed based on the target processing method.
  • the video processing type includes real-time processing type and post-processing type.
  • the target processing method includes a depth mean estimation method corresponding to the real-time processing type, or an inverse depth estimation method corresponding to the post-processing type.
  • the target processing method includes a depth mean estimation method
  • the depth estimate value determination module 330 includes: a shooting parameter determination sub-module, a target pixel point determination sub-module and a depth estimate value determination sub-module.
  • The shooting parameter determination sub-module is configured to determine the shooting parameters corresponding to the video frame to be processed and the pixel parameters of the moving object; the target pixel determination sub-module is configured to determine the target pixels based on the shooting parameters, the pixel parameters, and the constraints; the depth estimation value determination sub-module is configured to determine the depth estimation value of the moving object based on the point cloud data of the target pixels.
  • the target pixel point determination sub-module includes: a point cloud data determination unit, a back-projection pixel parameter determination unit and a target pixel point determination unit.
  • a point cloud data determination unit is configured to triangulate the shooting parameters and the pixel parameters to obtain point cloud data corresponding to the pixel parameters; a back-projection pixel parameter determination unit is configured to perform a triangulation process based on the The point cloud data and the constraint conditions determine the back-projection pixel parameters; the target pixel point determination unit is configured to determine the target pixel point based on the pixel point parameters and the back-projection pixel parameters.
  • the depth estimation value determination sub-module includes: a video frame to be used determination unit and a depth estimation value determination unit.
  • The to-be-used video frame determination unit is configured to determine, based on the point cloud data of the target pixels, at least two to-be-used video frames to which the target pixels belong; the depth estimation value determination unit is configured to determine the depth estimation value of the moving object based on the depth values of the target pixels in the at least two to-be-used video frames.
  • the target processing method includes an inverse depth estimation method
  • the depth estimation value determination module 330 further includes: an inverse depth value determination submodule and a depth estimation value determination submodule.
  • The inverse depth value determination submodule is configured to triangulate each to-be-processed video frame in the target video to obtain the inverse depth value of each pixel in each to-be-processed video frame; the depth estimation value determination submodule is configured to determine the depth estimation value of the moving object by clustering multiple inverse depth values in the same video frame to be processed.
  • the depth estimation value determination submodule includes: a depth difference determination unit and a depth estimation value determination unit.
  • The depth difference determination unit is configured to sort the multiple inverse depth values by magnitude and determine the depth difference between every two adjacent inverse depth values; the depth estimation value determination unit is configured to obtain the two target inverse depth values with the largest depth difference and determine the depth estimation value of the moving object based on the multiple inverse depth values that are greater than the target inverse depth value.
  • the device further includes: an inverse depth value deletion module.
  • the inverse depth value deleting module is configured to, before determining the depth estimation value of the moving object based on multiple inverse depth values greater than the target inverse depth value, delete the inverse depth values greater than or less than the target inverse depth value if the ratio between the number of inverse depth values greater than or less than the target inverse depth value and the total number of inverse depth values is less than a preset ratio, and re-execute the operation of determining the target inverse depth value.
  • the depth estimation value determination unit is configured to perform average processing on multiple inverse depth values greater than the target inverse depth value to obtain an inverse depth average, and determine the depth estimation value of the moving object according to the inverse depth average.
  • The technical solution of the embodiment of the present disclosure determines the video processing type, determines the target processing method for depth estimation of the moving object according to the video processing type, and finally determines, based on the target processing method, the depth estimation value of the moving object in the video frame to be processed. It solves the problem in the related art that only the depth information of static objects can be estimated, achieves accurate estimation of the depth information of moving objects in video frames, broadens the applicable scope of depth estimation, meets the personalized needs of users, and improves the user experience.
  • the depth estimation device for moving objects can execute the depth estimation method for moving objects provided by any embodiment of the disclosure, and has functional modules and effects corresponding to the execution method.
  • The multiple units and modules included in the above device are only divided according to functional logic, but the division is not limited thereto, as long as the corresponding functions can be achieved; in addition, the names of the multiple functional units are only for the convenience of distinguishing them from each other and are not used to limit the protection scope of the embodiments of the present disclosure.
  • FIG. 4 is a schematic structural diagram of an electronic device provided by an embodiment of the present disclosure.
  • Terminal devices in embodiments of the present disclosure may include mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, personal digital assistants (Personal Digital Assistant, PDA), tablet computers (Portable Android Device, PAD), portable multimedia players (Portable Media Player, PMP), and vehicle-mounted terminals (such as vehicle-mounted navigation terminals), as well as fixed terminals such as digital televisions (TV) and desktop computers.
  • the electronic device 500 shown in FIG. 4 is only an example and should not bring any limitations to the functions and usage scope of the embodiments of the present disclosure.
  • the electronic device 500 may include a processing device (e.g., a central processing unit, a graphics processing unit, etc.) 501, which can perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 502 or a program loaded from a storage device 508 to a random access memory (RAM) 503.
  • In the RAM 503, various programs and data required for the operation of the electronic device 500 are also stored.
  • the processing device 501, the ROM 502, and the RAM 503 are connected to each other via a bus 504.
  • An input/output (I/O) interface 505 is also connected to the bus 504.
  • The following devices can be connected to the I/O interface 505: input devices 506 including, for example, a touch screen, a touch pad, a keyboard, a mouse, a camera, a microphone, an accelerometer, and a gyroscope; output devices 507 including, for example, a liquid crystal display (LCD), a speaker, and a vibrator; storage devices 508 including, for example, a magnetic tape and a hard disk; and a communication device 509.
  • Communication device 509 may allow electronic device 500 to communicate wirelessly or wiredly with other devices to exchange data.
  • Although FIG. 4 illustrates the electronic device 500 with various means, it is not required that all of the illustrated means be implemented or provided; more or fewer means may alternatively be implemented or provided.
  • an embodiment of the present disclosure includes a computer program product, which includes a computer program carried on a non-transitory computer-readable medium, and the computer program contains program code for executing the method shown in the flowchart.
  • the computer program can be downloaded and installed from the network through the communication device 509, or installed from the storage device 508, or installed from the ROM 502.
  • the processing device 501 executes the above functions defined in the method of the embodiment of the present disclosure.
  • The electronic device provided by the embodiments of the present disclosure belongs to the same concept as the depth estimation method for moving objects provided by the above embodiments. Technical details not described in detail in this embodiment can be found in the above embodiments, and this embodiment has the same effects as the above embodiments.
  • Embodiments of the present disclosure provide a computer storage medium on which a computer program is stored.
  • When the program is executed by a processor, the depth estimation method for a moving object provided in the above embodiments is implemented.
  • the computer-readable medium mentioned above in the present disclosure may be a computer-readable signal medium or a computer-readable storage medium, or any combination of the above two.
  • the computer-readable storage medium may be, for example, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, device or device, or any combination thereof.
  • Examples of computer-readable storage media may include: an electrical connection having one or more wires, a portable computer disk, a hard disk, RAM, ROM, erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disc read-only memory (CD-ROM), optical storage devices, magnetic storage devices, or any suitable combination of the above.
  • a computer-readable storage medium may be any tangible medium that contains or stores a program for use by or in connection with an instruction execution system, apparatus, or device.
  • a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code therein. Such propagated data signals may take many forms, including electromagnetic signals, optical signals, or any suitable combination of the above.
  • a computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium that can send, propagate, or transmit a program for use by or in connection with an instruction execution system, apparatus, or device .
  • Program code embodied on a computer-readable medium can be transmitted using any appropriate medium, including: wire, optical cable, radio frequency (Radio Frequency, RF), etc., or any suitable combination of the above.
  • the client and server may communicate using any currently known or future developed network protocol such as the HyperText Transfer Protocol (HTTP), and may interact with any form or medium of digital data communication (e.g., a communication network).
  • Examples of communication networks include a local area network (LAN), a wide area network (WAN), an internetwork (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any currently known or future developed network.
  • the above-mentioned computer-readable medium may be included in the above-mentioned electronic device; it may also exist independently without being assembled into the electronic device.
  • The above computer-readable medium carries one or more programs. When the one or more programs are executed by the electronic device, the electronic device is caused to: determine the video processing type; determine, based on the video processing type, the target processing method for depth estimation of the moving object; and determine, based on the target processing method, the depth estimation value of the moving object in the video frame to be processed.
  • Computer program code for performing the operations of the present disclosure may be written in one or more programming languages, or a combination thereof, including object-oriented programming languages, such as Java, Smalltalk, C++, and conventional procedural programming languages, such as "C" or similar programming languages.
  • the program code may be executed entirely on the user's computer, partially on the user's computer, as a separate software package, partially on the user's computer and partially on a remote computer, or entirely on a remote computer or server.
  • the remote computer may be connected to the user's computer through any type of network, including a LAN or WAN, or may be connected to an external computer (e.g., through the Internet using an Internet service provider).
  • Each block in the flowchart or block diagrams may represent a module, segment, or portion of code that contains one or more executable instructions for implementing the specified logical functions.
  • the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown one after another may actually execute substantially in parallel, or they may sometimes execute in the reverse order, depending on the functionality involved.
  • Each block of the block diagrams and/or flowchart illustrations, and combinations of blocks in the block diagrams and/or flowchart illustrations, can be implemented by special-purpose hardware-based systems that perform the specified functions or operations, or by a combination of special-purpose hardware and computer instructions.
  • the units involved in the embodiments of the present disclosure can be implemented in software or hardware.
  • the name of the unit does not constitute a limitation on the unit itself.
  • the first acquisition unit can also be described as "the unit that acquires at least two Internet Protocol addresses.”
  • Exemplary types of hardware logic components that may be used include: field programmable gate arrays (FPGA), application specific integrated circuits (ASIC), application specific standard parts (ASSP), systems on chip (SOC), complex programmable logic devices (CPLD), and so on.
  • a machine-readable medium may be a tangible medium that may contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.
  • the machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium.
  • Machine-readable media may include electronic, magnetic, optical, electromagnetic, infrared, or semiconductor systems, apparatuses, or devices, or any suitable combination of the foregoing. Examples of machine-readable storage media include an electrical connection based on one or more wires, a portable computer disk, a hard disk, RAM, ROM, EPROM or flash memory, optical fiber, CD-ROM, optical storage devices, magnetic storage devices, or any suitable combination of the above.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

A depth estimation method and apparatus for a moving object, an electronic device, and a storage medium. The depth estimation method for a moving object includes: determining a video processing type (S110); determining, according to the video processing type, a target processing method for performing depth estimation on the moving object (S120); and determining, based on the target processing method, a depth estimation value of the moving object in a video frame to be processed (S130).

Description

Depth estimation method and apparatus for moving object, electronic device, and storage medium
This application claims priority to Chinese Patent Application No. 202211160924.9, filed with the Chinese Patent Office on September 22, 2022, the entire contents of which are incorporated herein by reference.
Technical Field
The present disclosure relates to the field of image processing technology, for example, to a depth estimation method and apparatus for a moving object, an electronic device, and a storage medium.
Background
With the development of computer vision technology, Simultaneous Localization and Mapping (SLAM) algorithms are widely used in fields such as augmented reality, virtual reality, autonomous driving, and the localization and navigation of robots or drones.
In the related art, an image is input into a SLAM system, and the SLAM system is used to extract the scene depth information in the image, so that the depth of objects in the image is estimated based on the scene depth information. However, this depth estimation approach is only suitable for static objects; for dynamic objects in a video, it is difficult to achieve effective depth estimation.
Summary
The present disclosure provides a depth estimation method and apparatus for a moving object, an electronic device, and a storage medium, to achieve accurate estimation of the depth information of a moving object in a video.
In a first aspect, the present disclosure provides a depth estimation method for a moving object, the method including:
determining a video processing type;
determining, according to the video processing type, a target processing method for performing depth estimation on a moving object; and
determining, based on the target processing method, a depth estimation value of the moving object in a video frame to be processed.
In a second aspect, the present disclosure further provides a depth estimation apparatus for a moving object, the apparatus including:
a video processing type determination module configured to determine a video processing type;
a target processing method determination module configured to determine, according to the video processing type, a target processing method for performing depth estimation on a moving object; and
a depth estimation value determination module configured to determine, based on the target processing method, a depth estimation value of the moving object in a video frame to be processed.
In a third aspect, the present disclosure further provides an electronic device, the electronic device including:
one or more processors; and
a storage device configured to store one or more programs,
where, when the one or more programs are executed by the one or more processors, the one or more processors are caused to implement the above depth estimation method for a moving object.
In a fourth aspect, the present disclosure further provides a storage medium containing computer-executable instructions which, when executed by a computer processor, are used to perform the above depth estimation method for a moving object.
In a fifth aspect, the present disclosure further provides a computer program product, including a computer program carried on a non-transitory computer-readable medium, the computer program containing program code for performing the above depth estimation method for a moving object.
Brief Description of the Drawings
Figure 1 is a schematic flowchart of a depth estimation method for a moving object provided by an embodiment of the present disclosure;
Figure 2 is a schematic flowchart of another depth estimation method for a moving object provided by an embodiment of the present disclosure;
Figure 3 is a schematic structural diagram of a depth estimation apparatus for a moving object provided by an embodiment of the present disclosure;
Figure 4 is a schematic structural diagram of an electronic device provided by an embodiment of the present disclosure.
Detailed Description
Embodiments of the present disclosure are described below with reference to the accompanying drawings. Although some embodiments of the present disclosure are shown in the drawings, the present disclosure can be implemented in many forms; these embodiments are provided for understanding the present disclosure. The drawings and embodiments of the present disclosure are for illustrative purposes only.
The multiple steps described in the method embodiments of the present disclosure can be executed in different orders and/or in parallel. In addition, the method embodiments may include additional steps and/or omit the steps shown. The scope of the present disclosure is not limited in this respect.
The term "include" and its variations as used herein denote open-ended inclusion. The term "based on" means "based at least in part on". The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one additional embodiment"; the term "some embodiments" means "at least some embodiments". Relevant definitions of other terms will be given in the description below.
The concepts of "first", "second", and the like mentioned in the present disclosure are only used to distinguish different apparatuses, modules, or units, and are not used to limit the order of the functions performed by these apparatuses, modules, or units or their interdependence.
The modifiers "one" and "multiple" mentioned in the present disclosure are illustrative rather than restrictive; those skilled in the art should understand that, unless the context clearly indicates otherwise, they should be understood as "one or more".
The names of the messages or information exchanged between multiple apparatuses in the embodiments of the present disclosure are for illustrative purposes only and are not used to limit the scope of these messages or information.
Before the technical solutions disclosed in the embodiments of the present disclosure are used, the user shall be informed, in an appropriate manner in accordance with relevant laws and regulations, of the type, scope of use, and usage scenarios of the personal information involved in the present disclosure, and the user's authorization shall be obtained.
For example, in response to receiving an active request from the user, prompt information is sent to the user to clearly remind the user that the operation requested will require the acquisition and use of the user's personal information, so that the user can autonomously choose, based on the prompt information, whether to provide personal information to software or hardware such as an electronic device, application, server, or storage medium that performs the operations of the technical solution of the present disclosure.
As an implementation, in response to receiving an active request from the user, the prompt information may be sent to the user, for example, in the form of a pop-up window, in which the prompt information may be presented as text. In addition, the pop-up window may also carry a selection control for the user to choose whether to "agree" or "disagree" to provide personal information to the electronic device.
The above process of notifying the user and obtaining the user's authorization is only illustrative and does not limit the implementation of the present disclosure; other methods that satisfy relevant laws and regulations may also be applied to the implementation of the present disclosure.
The data involved in this technical solution (including the data itself and the acquisition or use of the data) shall comply with the requirements of corresponding laws, regulations, and relevant provisions.
Before introducing this technical solution, an exemplary description of the application scenario may be given. Exemplarily, when a user shoots a video with the camera device of a mobile terminal and uploads the captured video to a system based on the SLAM algorithm, or selects a target video from a database and actively uploads it to a system based on the SLAM algorithm, the system can parse the scene depth information in the video to estimate the depth information of the objects contained in the video frames based on the scene depth information. However, current depth information estimation methods can only estimate the depth information of static objects in video frames and cannot perform accurate depth estimation for dynamic objects in video frames. In this case, based on the solution of the embodiments of the present disclosure, the depth information of a moving object in a video frame can be estimated using the scene depth information and three-dimensional space information provided by the SLAM system, thereby achieving accurate estimation of the depth information of dynamic objects in video frames.
Figure 1 is a schematic flowchart of a depth estimation method for a moving object provided by an embodiment of the present disclosure. The embodiment of the present disclosure is suitable for estimating the depth information of a moving object in a video frame. The method can be executed by a depth estimation apparatus for a moving object, and the apparatus can be implemented in the form of software and/or hardware, for example, through an electronic device, which can be a mobile terminal, a personal computer (PC), or a server.
As shown in Figure 1, the method includes:
S110. Determine a video processing type.
In this embodiment, the apparatus for executing the depth estimation method for a moving object provided by the embodiments of the present disclosure can be integrated into application software that supports special effect video processing functions, and the software can be installed in an electronic device; for example, the electronic device can be a mobile terminal or a PC. The application software can be a type of software for image/video processing; such application software will not be enumerated here one by one, as long as it can achieve image/video processing. It can also be a specially developed application that adds and displays special effects, or the function can be integrated in a corresponding page, so that the user can process special effect videos through the page integrated in the PC.
The technical solution of this embodiment can be executed in the process of real-time shooting based on a mobile terminal, or can be executed after the system receives video data actively uploaded by a user. Meanwhile, the solution of the embodiments of the present disclosure can be applied in various application scenarios such as augmented reality (AR), virtual reality (VR), and autonomous driving.
In this embodiment, the video processing type can be a video processing method determined based on the way the user uploads the video to be processed. The video processing types include a real-time processing type and a post-processing type. In practical applications, if the video to be processed is captured in real time by the user based on the camera device of the mobile terminal, and depth estimation of the moving objects contained in the video to be processed is performed on the mobile terminal, the video processing type of the current video to be processed can be determined to be the real-time processing type; if the video to be processed is a video that has already been shot and is actively uploaded to the system by the user, and depth estimation is performed on the moving objects contained in the received video to be processed, the video processing type of the video to be processed can be the post-processing type.
In this embodiment, if the video data received by the system is captured in real time based on the camera device of the mobile terminal, the video processing type can be determined as the real-time processing type; if the video data received by the system is complete video data that has already been shot, the video processing type can be determined as the post-processing type. The advantage of this setting is that it enhances the diversity of processing methods for depth estimation of moving objects, so that depth estimation of moving objects in video frames to be processed can be performed in real time based on the mobile terminal, and depth estimation of moving objects in a complete video can also be performed, which improves the diversity of video processing and meets the personalized needs of users.
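As a minimal illustrative sketch (the enum and function names below are assumptions, not part of the original disclosure), the selection of the target processing method according to the video processing type could look like this:

```python
from enum import Enum

class VideoProcessingType(Enum):
    REAL_TIME = "real_time"          # frames streamed live from the camera
    POST_PROCESSING = "post"         # a complete, already-captured video

def select_target_method(video_type: VideoProcessingType) -> str:
    """Map the video processing type to a depth estimation method."""
    if video_type is VideoProcessingType.REAL_TIME:
        return "depth_mean_estimation"      # average depths of target pixels
    return "inverse_depth_estimation"       # cluster per-pixel inverse depths
```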
S120. Determine, according to the video processing type, a target processing method for performing depth estimation on the moving object.
In this embodiment, when it is detected that the user triggers a special effect operation, the camera device of the mobile terminal can face the user in real time to collect the video to be processed, and the video to be processed is parsed according to a pre-written program to obtain multiple video frames to be processed; at this time, the video processing type can be determined as the real-time processing type. Correspondingly, the video frames to be processed can include a moving object. The moving object can be any object in the frame whose posture or position information changes, for example, a user or an animal.
Depth estimation can be a subtask in the field of computer vision; its purpose is to obtain the distance between an object and the shooting point, and it can provide depth information for a series of tasks such as three-dimensional reconstruction, distance perception, SLAM, visual odometry, video frame interpolation, and image reconstruction. The depth information of the moving object can be the distance between the pixels corresponding to the moving object in the finally presented image and the shooting point, or it can be expressed by the position coordinates of those pixels in the camera coordinate system.
In this embodiment, when the video processing type is determined to be the real-time processing type, it can be determined that the target processing method for performing depth estimation on the moving object in the video frame can be a depth mean estimation method corresponding to the real-time processing type. The depth mean estimation method can be to determine the depth values of some pixels associated with the moving object and to average these depth values, so that the finally obtained average depth value can be used as the depth information of the moving object.
S130. Determine, based on the target processing method, a depth estimation value of the moving object in the video frame to be processed.
In this embodiment, the user can shoot a video of the moving object in real time based on the camera device of the mobile terminal and upload it to the mobile terminal in real time; therefore, the real-time captured video obtained by the system is the video to be processed, and multiple video frames to be processed can be obtained by parsing the video to be processed based on a pre-written program. The depth estimation value can be the distance between at least one pixel corresponding to the moving object and the shooting point, or it can be the coordinate value of at least one pixel corresponding to the moving object in the camera coordinate system.
In this embodiment, the target processing method can be the depth mean estimation method. When determining the depth estimation value of the moving object based on the target processing method, the target pixels in the moving object that satisfy the depth mean estimation conditions can be determined first, and the depth mean can then be determined based on the depth values of these target pixels, so that the finally obtained depth mean can be used as the depth estimation value of the moving object.
Determining, based on the target processing method, the depth estimation value of the moving object in the video frame to be processed can include: determining the shooting parameters corresponding to the video frame to be processed and the pixel parameters of the moving object; determining the target pixels based on the shooting parameters, the pixel parameters, and the constraints; and determining the depth estimation value of the moving object based on the point cloud data of the target pixels.
In this embodiment, the shooting parameters can be the camera pose parameters of the video frame to be processed after pose optimization. The camera position information and rotation information can be obtained based on the gyroscope and inertial measurement unit in the camera device corresponding to the video frame to be processed, so as to determine the initial pose of the video frame to be processed based on the camera position information and rotation information; the initial pose is then optimized based on bundle adjustment (BA), and the optimized pose is used as the shooting parameter corresponding to the video frame to be processed. The advantage of this setting is that it can provide a higher BA speed for the simultaneous localization and mapping system, thereby ensuring the real-time performance of the system's processing of video frames. The pixel parameters can be the pixel coordinates of at least one pixel constituting the moving object in the video frame to be processed. When a moving object is shot to obtain multiple video frames to be processed, the video frames to be processed contain not only the moving object but also the scene where the moving object is located; therefore, when determining the pixel parameters of the moving object, a mask image of the moving object can be determined first, so that the pixel coordinates of at least one pixel constituting the moving object can be determined based on the mask image.
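As a brief sketch of extracting the pixel parameters from the mask image (the function name and the numpy array layout are illustrative assumptions):

```python
import numpy as np

def pixel_params_from_mask(mask):
    """Return the (u, v) pixel coordinates of the moving object.

    mask: (H, W) array that is nonzero where the segmentation model
          marked the moving object in the video frame to be processed.
    """
    vs, us = np.nonzero(mask)               # row (v) and column (u) indices
    return np.stack([us, vs], axis=1)       # (N, 2) pixel coordinates
```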
In this embodiment, the constraint condition may be a spatial geometric information constraint, i.e., when a pixel is observed at a particular position, it is determined whether the state of the pixel corresponds to that particular position; if the state of the pixel corresponds to its observation position, it may be determined that the pixel satisfies the constraint condition; if the state of the pixel does not correspond to its observation position, it may be determined that the pixel does not satisfy the constraint condition.
In this embodiment, after the video frame to be processed is obtained, the initial pose of the video frame to be processed may be determined based on the parameters of the sensors of the camera apparatus corresponding to the video frame to be processed; the initial pose is then optimized based on a pose-optimization method, and the optimized pose is used as the shooting parameters corresponding to the video frame to be processed. Meanwhile, the pixel coordinates of the moving object in the video frame to be processed are determined as the pixel parameters; the target pixels are determined based on the shooting parameters, the pixel parameters, and the constraint condition, so that the depth estimation value of the moving object can be determined based on the point cloud data of the target pixels. The benefit of such a setting is that the multiple pixels of the moving object can be divided into dynamic pixels and static pixels based on the constraint condition, and the dynamic pixels can be selected as the tracking pixels of the moving object, which improves the accuracy of the depth estimation value of the moving object and improves the localization effect of the moving object in the video frame to be processed.
In practical applications, the initial pose of the video frame to be processed may first be determined and optimized based on the pose-optimization method to obtain the shooting parameters corresponding to the video frame to be processed; meanwhile, the pixel coordinates of at least one pixel corresponding to the moving object are determined to obtain the pixel parameters; based on the shooting parameters, the pixel parameters, and the constraint condition, the pixels that do not satisfy the constraint condition among the multiple pixels corresponding to the moving object are determined and used as the target pixels.
Determining the target pixels based on the shooting parameters, the pixel parameters, and the constraint condition includes: performing triangulation processing on the shooting parameters and the pixel parameters to obtain the point cloud data corresponding to the pixel parameters; determining back-projection pixel parameters based on the point cloud data and the constraint condition; and determining the target pixels based on the pixel parameters and the back-projection pixel parameters.
In this embodiment, the triangulation processing may be determining the corresponding point cloud data based on a corner-detection algorithm. The corner-detection algorithm may be the KLT corner-detection method, also known as KLT optical-flow tracking. The KLT method determines, among multiple key frames, a reference key frame suitable for tracking and determines the feature points of this reference key frame, so that the corresponding point cloud data (PCD) can be determined based on these feature points. Point cloud data, commonly used in reverse engineering, is data recorded in the form of points; these points can represent coordinates in three-dimensional space as well as information such as color or light intensity. In practical applications, point cloud data generally also includes contents such as point-coordinate accuracy, spatial resolution, and surface normal vectors, and is generally saved in the PCD format, in which point cloud data is highly operable and can speed up point-cloud registration and fusion in subsequent processes; details are not repeated in the embodiments of the present disclosure.
In practical applications, after the shooting parameters and the pixel parameters are determined, the shooting parameters and the pixel parameters can be triangulated based on the corner-detection algorithm to obtain the three-dimensional point cloud data corresponding to the pixel parameters. According to the point cloud data and the constraint condition, the parameters of the point cloud data in the camera coordinate system are determined; that is, the three-dimensional point cloud data is converted into the form of two-dimensional coordinates, and the converted two-dimensional coordinate parameters may be used as the back-projection pixel parameters. Since the point cloud data is determined based on the pixel parameters and the back-projection pixel parameters are determined based on the point cloud data, and both the pixel parameters and the back-projection pixel parameters are two-dimensional coordinate parameters, the target pixels can be determined by checking whether the pixel parameters are consistent with the corresponding back-projection pixel parameters; that is, the pixels whose pixel parameters are inconsistent with the corresponding back-projection pixel parameters are used as the target pixels. The pixels of the moving object are determined based on the mask image; in practical applications, a model deployed on the mobile terminal is usually used to process the video frame to be processed to obtain the mask image corresponding to the moving object. Generally, in order to improve the processing efficiency of the mobile terminal and reduce the memory usage of the model on the mobile terminal, the model deployed on the mobile terminal is usually a model with a simple structure and a fast processing speed; when such a model is applied to generate the mask image of the moving object from the video frame to be processed, the resulting mask image may be larger than the actual size of the moving object, so static background points that do not belong to the moving object are also included. Static pixels can generally satisfy the constraint condition, while dynamic pixels do not; therefore, dynamic pixels and static pixels can be distinguished by determining whether the pixels corresponding to the moving object satisfy the constraint condition, so that different processing methods can be adopted for different pixels and the depth estimation value of the moving object can finally be obtained. The benefit of such a setting is that the pixels of the moving object can be determined more accurately, different processing methods can be adopted for different pixels, and the accuracy of the depth estimation value of the moving object is improved.
Exemplarily, based on the point cloud data and the constraint condition, the back-projection pixel parameters may be determined according to the following formula:

s_i · [u_i, v_i, 1]^T = K · exp(ξ) · [X_i, Y_i, Z_i, 1]^T

where s_i denotes the depth value of any pixel, (u_i, v_i) denotes the pixel coordinates of that pixel, K denotes the camera intrinsic matrix, exp(ξ) denotes the camera pose, i.e., the R, T matrices, and (X_i, Y_i, Z_i) denotes the three-dimensional point cloud coordinates of that pixel.
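As a hedged sketch of the consistency check between pixel parameters and back-projection pixel parameters (illustrative only; OpenCV is assumed tooling, and the 2-pixel tolerance is an assumed threshold, not taken from the disclosure):

```python
import cv2
import numpy as np

def select_dynamic_pixels(points_3d, observed_px, rvec, tvec, K, tol=2.0):
    """Return a boolean mask of target (dynamic) pixels.

    points_3d:   (N, 3) triangulated point cloud for the masked object pixels.
    observed_px: (N, 2) pixel coordinates of the same points in the frame.
    rvec, tvec:  refined camera pose (the "shooting parameters").
    tol:         reprojection tolerance in pixels (assumed value).
    """
    reprojected, _ = cv2.projectPoints(points_3d, rvec, tvec, K, None)
    error = np.linalg.norm(reprojected.reshape(-1, 2) - observed_px, axis=1)
    # Static background points reproject consistently (small error) and thus
    # satisfy the constraint; dynamic points on the moving object do not.
    return error > tol
```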
After the target pixels are determined, the depth estimation value of the moving object can be determined according to the point cloud data of the target pixels.
Determining the depth estimation value of the moving object based on the point cloud data of the target pixels includes: determining, according to the point cloud data of the target pixels, at least two to-be-used video frames to which the target pixels belong; and determining the depth estimation value of the moving object according to the depth values of the target pixels in the at least two to-be-used video frames.
In this embodiment, after the target pixels are obtained, these target pixels can be triangulated to obtain the point cloud data corresponding to the target pixels; the point cloud data is then observed in the multiple video frames to be processed that contain the moving object, and the at least two video frames to be processed in which the point cloud data can be observed are used as the to-be-used video frames.
In practical applications, after the at least two to-be-used video frames to which the target pixels belong are determined, the depth values of the target pixels in the camera coordinate system can be determined and averaged, and the finally obtained mean depth value can be used as the depth estimation value of the moving object. The benefit of such a setting is that a rough estimate of the depth information of the moving object can be obtained on the mobile terminal, which improves the efficiency of the depth estimation of the moving object.
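A minimal sketch of this averaging step (illustrative only; `depths_per_frame` is an assumed input holding each target pixel's camera-space depth in every to-be-used frame where it is observed):

```python
import numpy as np

def depth_mean_estimate(depths_per_frame):
    """depths_per_frame: list of 1-D arrays, one per to-be-used video frame,
    each holding the camera-space depths of the observed target pixels."""
    all_depths = np.concatenate(depths_per_frame)
    return float(all_depths.mean())  # rough per-object depth estimate
```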
If the moving object is in a static state, the multiple pixels of the moving object determined based on the mask image all satisfy the constraint condition, i.e., the pixel parameters of the multiple pixels are consistent with the back-projection pixel parameters. In this case, these pixels can be triangulated to obtain the point cloud data corresponding to these pixels, and the point cloud data can be stored in the SLAM system, so that the depth estimation value of the moving object can be determined by the SLAM system.
This embodiment determines the depth estimation value of the moving object in the video frame to be processed when the video processing type is the real-time processing type. On the basis of this embodiment, when the video processing type is the post-processing type, the corresponding target processing manner changes accordingly; the post-processing type is elaborated below.
In the technical solution of the embodiments of the present disclosure, the video processing type is determined; according to the video processing type, the target processing manner for performing depth estimation on the moving object is determined; finally, based on the target processing manner, the depth estimation value of the moving object in the video frame to be processed is determined. This solves the problem in the related art that only the depth information of static objects can be estimated, achieves the effect of accurately estimating the depth information of moving objects in video frames, broadens the applicable scope of depth estimation, satisfies the personalized needs of users, and improves the user experience.
FIG. 2 is a schematic flowchart of another depth estimation method for a moving object provided by an embodiment of the present disclosure. On the basis of the foregoing embodiment, when the video processing type is the post-processing type, the corresponding target processing manner may be the inverse-depth estimation manner, and the depth estimation value of the moving object can then be determined based on the inverse-depth estimation manner. For the implementation, reference may be made to the technical solution of this embodiment. Technical terms identical or corresponding to those in the above embodiments are not repeated here.
As shown in FIG. 2, the method includes the following steps:
S210. Determine that the video processing type is the post-processing type.
The above embodiment determines the depth estimation value of the moving object in the video frame to be processed when the video processing type is the real-time processing type. On the basis of the above embodiment, when the video processing type is the post-processing type, the corresponding target processing manner changes accordingly; the post-processing type is described below.
In this embodiment, a video-upload control may be developed in advance; when a trigger operation by the user on the video-upload control in the application is detected, the video actively uploaded by the user may be received and used as the video to be processed, and parsing the video to be processed based on a pre-written program yields multiple video frames to be processed. Correspondingly, the video frames to be processed contain a moving object, and the moving object may be a user, an animal, or any object in the captured scene whose posture or position information changes. When the complete video to be processed is received, the video frames containing the moving object may be used as the video frames to be processed, and special-effects processing may be performed on these video frames to obtain the corresponding special-effects video frames; this video processing manner may be regarded as the post-processing type.
S220. Determine, according to the post-processing type, that the target processing manner for performing depth estimation on the moving object is the inverse-depth estimation manner.
In this embodiment, after the video to be processed is received and the video processing type is determined to be the post-processing type, the target processing manner for performing depth estimation on the moving object in the video frame to be processed can be determined to be the inverse-depth estimation manner. The inverse-depth estimation manner may be determining the depth estimation value of the moving object based on the inverse depth value of at least one pixel corresponding to the moving object.
When the video processing manner is the post-processing type, i.e., depth estimation of the moving object is performed on complete video data, unlike the real-time processing type, after the complete video data is received, the depth information of every pixel in every video frame to be processed in the video data can be determined, and the depth information of the moving object is then estimated based on this depth information. However, the depth information of different pixels in each video frame to be processed has a wide distribution range, and the form of the depth distribution is unstable; therefore, the inverse-depth information corresponding to this depth information can be determined, so that the depth estimation value of the moving object is determined based on the inverse-depth information. The benefit of such a setting is that the inverse-depth distribution conforms better to a Gaussian distribution and is more stable, so that the determined depth estimation value of the moving object is also more accurate.
Each video frame to be processed includes distant-view pixels and near-view pixels. For the distant-view pixels, since the distance between these pixels and the shooting point is large, the parallax of these pixels is small, and when the point cloud data corresponding to these distant-view pixels is determined, the accuracy of the point cloud data is also low. Therefore, the inverse-depth approach can be adopted to weaken the influence of the distant-view pixels on the computation: the depth values of the distant-view and near-view pixels are converted into inverse depth values, and subsequent computations can then be performed based on these inverse depth values, thereby achieving the effect of improving computational accuracy.
S230. Determine, based on the inverse-depth estimation manner, the depth estimation value of the moving object in the video frame to be processed.
In this embodiment, after the target processing manner is determined to be the inverse-depth estimation manner, the inverse depth value of every pixel in the video frame to be processed can be determined, so that the depth estimation value of the moving object can be determined based on these inverse depth values.
Determining, based on the inverse-depth estimation manner, the depth estimation value of the moving object in the video frame to be processed includes: performing triangulation processing on each video frame to be processed in the target video to obtain the inverse depth value of each pixel in each video frame to be processed; and determining the depth estimation value of the moving object by performing clustering processing on the multiple inverse depth values in the same video frame to be processed.
In this embodiment, the target video may be a video actively uploaded by the user, for which the depth information of the moving object in the video needs to be determined. In practical applications, when the multiple video frames to be processed in the target video are received, each video frame to be processed can be triangulated based on the corner-detection algorithm to obtain the point cloud data corresponding to each video frame to be processed; the point cloud data corresponding to each video frame to be processed can be transformed into the camera coordinate system according to the translation-rotation matrix to obtain the depth value of each pixel in the camera coordinate system; then, these depth values are inverted, i.e., each depth value is raised to the power of minus one, to obtain the inverse depth value of each pixel, so that the depth estimation value of the moving object can be determined by performing clustering processing on the multiple inverse depth values in the same video frame to be processed. The benefit of such a setting is that performing the depth estimation of the moving object based on the inverse depth value of each pixel can weaken the influence of distant-view pixels on the depth estimation, thereby improving the accuracy of the depth estimation value and improving the display effect of the freeze points of the moving object in the target video at different timestamps.
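A minimal sketch of this depth-to-inverse-depth conversion (illustrative; `R`, `t`, and `points_world` are assumed inputs obtained from the translation-rotation matrix and the triangulation step):

```python
import numpy as np

def inverse_depths(points_world, R, t, eps=1e-9):
    """Transform world-frame points into the camera frame and return 1/z.

    points_world: (N, 3) triangulated points.
    R, t:         rotation matrix (3x3) and translation vector (3,) of the frame,
                  mapping world coordinates to camera coordinates.
    """
    points_cam = points_world @ R.T + t   # camera-frame coordinates
    z = points_cam[:, 2]                  # per-pixel depth values
    return 1.0 / np.maximum(z, eps)       # guard against non-positive depth
```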
The clustering processing may be classifying the multiple inverse depth values, and may be a binary classification, i.e., dividing the multiple inverse depth values into two classes.
Determining the depth estimation value of the moving object by performing clustering processing on the multiple inverse depth values in the same video frame to be processed includes: after sorting the multiple inverse depth values by magnitude, determining the depth difference between every two adjacent inverse depth values; obtaining the two target inverse depth values with the largest depth difference, and determining the depth estimation value of the moving object based on the multiple inverse depth values greater than the target inverse depth value.
In practical applications, for the multiple inverse depth values in the same video frame to be processed, the magnitude of each inverse depth value can first be determined and the multiple inverse depth values sorted by magnitude; then the difference between every two adjacent inverse depth values is determined as the depth difference, the two adjacent inverse depth values corresponding to the largest depth difference are determined, and these two inverse depth values are used as the target inverse depth values. Based on these two target inverse depth values, the multiple inverse depth values can be divided into two classes: one class of inverse depth values greater than the target inverse depth value and another class of inverse depth values smaller than the target inverse depth value. Finally, the depth estimation value of the moving object can be determined based on the multiple inverse depth values greater than the target inverse depth value. The benefit of such a setting is that near-view pixels and distant-view pixels can be classified based on the multiple inverse depth values, so that the depth information of the moving object can be determined based on the depth information of the near-view pixels.
In this embodiment, since the multiple inverse depth values are sorted in descending order, the two target inverse depth values are two numerically adjacent values among the multiple inverse depth values. Therefore, when determining the depth estimation value of the moving object based on the multiple inverse depth values greater than the target inverse depth value, the target inverse depth value used may be either of the two target inverse depth values; either achieves the effect of classifying the multiple inverse depth values.
When classifying the multiple inverse depth values based on the target inverse depth values, if the number of inverse depth values in either class is smaller than a preset threshold, the inverse depth values in that class may be considered to contain a certain error. To improve the accuracy of the depth estimation value of the moving object, these inverse depth values can be deleted, and the sorting and classification operation can be performed again on the remaining inverse depth values, so that after the re-classification, the depth estimation value of the moving object is determined based on the multiple inverse depth values greater than the target inverse depth value in the re-classification result.
On this basis, before determining the depth estimation value of the moving object based on the multiple inverse depth values greater than the target inverse depth value, the method further includes: if the ratio of the number of inverse depth values greater than or smaller than the target inverse depth value to the total number of inverse depth values is smaller than a preset proportion, deleting those inverse depth values greater than or smaller than the target inverse depth value, and performing the operation of determining the target inverse depth values again.
In this embodiment, the preset proportion may be set to any value; for example, the preset proportion may be 5%.
In practical applications, after the multiple inverse depth values are divided, based on the target inverse depth values, into the inverse depth values greater than the target inverse depth value and the inverse depth values smaller than the target inverse depth value, the ratio of the number of inverse depth values in each of the two classes to the total number of inverse depth values in the current video frame to be processed can be determined; if the ratio corresponding to either class is smaller than the preset proportion, the inverse depth values in that class can be deleted, the remaining inverse depth values re-sorted by magnitude, the difference between every two adjacent inverse depth values determined, the two inverse depth values with the largest difference used as the target inverse depth values, and the remaining inverse depth values classified based on the target inverse depth values, so that the depth estimation value of the moving object can finally be determined based on the multiple inverse depth values greater than the target inverse depth value. The benefit of such a setting is that inverse depth values with large errors can be filtered out and deleted, thereby improving the accuracy of the depth estimation value of the moving object.
Determining the depth estimation value of the moving object based on the multiple inverse depth values greater than the target inverse depth value includes: performing mean processing on the multiple inverse depth values greater than the target inverse depth value to obtain an inverse-depth mean, and determining the depth estimation value of the moving object according to the inverse-depth mean.
After the multiple inverse depth values greater than the target inverse depth value are obtained, since the pixels corresponding to these inverse depth values are the near-view pixels of the video frame to be processed, relatively accurate computation results can be obtained when computing based on near-view pixels; moreover, a moving object is generally located in the foreground part of the video frame to be processed. Therefore, when determining the depth estimation value of the moving object, computing based on the multiple inverse depth values greater than the target inverse depth value yields a more precise depth estimation result.
In practical applications, mean processing can be performed on the multiple inverse depth values greater than the target inverse depth value, and the resulting inverse-depth mean is inverted again to obtain the depth mean corresponding to the inverse-depth mean; this depth mean can be used as the depth estimation value of the moving object. The benefit of such a setting is that determining the depth information of the moving object based on the depth information of the near-view pixels achieves the effect of improving the accuracy of the depth estimation.
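Putting the sorting, largest-gap split, proportion filter, and mean-inversion steps together, a hedged end-to-end sketch might read as follows (illustrative only; the 5% proportion is the example value given above, and all identifiers are hypothetical):

```python
import numpy as np

def estimate_depth_from_inverse_depths(inv_depths, min_ratio=0.05):
    """Split inverse depths at the largest adjacent gap; average the near-view class.

    inv_depths: 1-D array of inverse depth values from one frame to be processed.
    min_ratio:  preset proportion; a class holding fewer than this share of the
                values is treated as erroneous and removed (example value: 5%).
    """
    inv = np.sort(np.asarray(inv_depths, dtype=float))[::-1]  # descending order
    while inv.size >= 2:
        gaps = inv[:-1] - inv[1:]           # differences of adjacent values
        split = int(np.argmax(gaps))        # largest gap -> two target values
        near = inv[:split + 1]              # values greater than the target
        far = inv[split + 1:]               # values smaller than the target
        if near.size / inv.size < min_ratio:
            inv = far                       # drop the suspect small class, redo
        elif far.size / inv.size < min_ratio:
            inv = near
        else:
            # Mean of the near-view (larger) inverse depths, inverted back to a
            # metric depth, serves as the moving object's depth estimate.
            return 1.0 / near.mean()
    return 1.0 / inv.mean() if inv.size else float("nan")
```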
For any video frame to be processed in the target video, the above technical method can be used to determine the depth estimation value of the moving object in the frame; then, after the depth estimation value of the moving object in each video frame to be processed is obtained, the multiple video frames to be processed can be stitched together to obtain the depth estimation values of the moving object throughout the complete target video.
In the technical solution of the embodiments of the present disclosure, the video processing type is determined to be the post-processing type; according to the post-processing type, the target processing manner for performing depth estimation on the moving object is determined to be the inverse-depth estimation manner; finally, based on the inverse-depth estimation manner, the depth estimation value of the moving object in the video frame to be processed is determined. This solves the problem in the related art that only the depth information of static objects can be estimated, achieves the effect of accurately estimating the depth information of moving objects in video frames, broadens the applicable scope of depth estimation, satisfies the personalized needs of users, and improves the user experience.
FIG. 3 is a schematic structural diagram of a depth estimation apparatus for a moving object provided by an embodiment of the present disclosure. As shown in FIG. 3, the apparatus includes: a video processing type determination module 310, a target processing manner determination module 320, and a depth estimation value determination module 330.
The video processing type determination module 310 is configured to determine the video processing type; the target processing manner determination module 320 is configured to determine, according to the video processing type, the target processing manner for performing depth estimation on the moving object; the depth estimation value determination module 330 is configured to determine, based on the target processing manner, the depth estimation value of the moving object in the video frame to be processed.
On the basis of the above technical solution, the video processing type includes a real-time processing type and a post-processing type.
On the basis of the above technical solution, the target processing manner includes a depth-mean estimation manner corresponding to the real-time processing type, or an inverse-depth estimation manner corresponding to the post-processing type.
On the basis of the above technical solution, the target processing manner includes the depth-mean estimation manner, and the depth estimation value determination module 330 includes: a shooting parameter determination submodule, a target pixel determination submodule, and a depth estimation value determination submodule.
The shooting parameter determination submodule is configured to determine the shooting parameters corresponding to the video frame to be processed and the pixel parameters of the moving object; the target pixel determination submodule is configured to determine the target pixels based on the shooting parameters, the pixel parameters, and the constraint condition; the depth estimation value determination submodule is configured to determine the depth estimation value of the moving object based on the point cloud data of the target pixels.
On the basis of the above technical solution, the target pixel determination submodule includes: a point cloud data determination unit, a back-projection pixel parameter determination unit, and a target pixel determination unit.
The point cloud data determination unit is configured to perform triangulation processing on the shooting parameters and the pixel parameters to obtain the point cloud data corresponding to the pixel parameters; the back-projection pixel parameter determination unit is configured to determine the back-projection pixel parameters based on the point cloud data and the constraint condition; the target pixel determination unit is configured to determine the target pixels based on the pixel parameters and the back-projection pixel parameters.
On the basis of the above technical solution, the depth estimation value determination submodule includes: a to-be-used video frame determination unit and a depth estimation value determination unit.
The to-be-used video frame determination unit is configured to determine, according to the point cloud data of the target pixels, the at least two to-be-used video frames to which the target pixels belong; the depth estimation value determination unit is configured to determine the depth estimation value of the moving object according to the depth values of the target pixels in the at least two to-be-used video frames.
On the basis of the above technical solution, the target processing manner includes the inverse-depth estimation manner, and the depth estimation value determination module 330 further includes: an inverse depth value determination submodule and a depth estimation value determination submodule.
The inverse depth value determination submodule is configured to perform triangulation processing on each video frame to be processed in the target video to obtain the inverse depth value of each pixel in each video frame to be processed; the depth estimation value determination submodule is configured to determine the depth estimation value of the moving object by performing clustering processing on the multiple inverse depth values in the same video frame to be processed.
On the basis of the above technical solution, the depth estimation value determination submodule includes: a depth difference determination unit and a depth estimation value determination unit.
The depth difference determination unit is configured to determine, after sorting the multiple inverse depth values by magnitude, the depth difference between every two adjacent inverse depth values; the depth estimation value determination unit is configured to obtain the two target inverse depth values with the largest depth difference and determine the depth estimation value of the moving object based on the multiple inverse depth values greater than the target inverse depth value.
On the basis of the above technical solution, the apparatus further includes: an inverse depth value deletion module.
The inverse depth value deletion module is configured to, before the depth estimation value of the moving object is determined based on the multiple inverse depth values greater than the target inverse depth value, if the ratio of the number of inverse depth values greater than or smaller than the target inverse depth value to the total number of inverse depth values is smaller than the preset proportion, delete those inverse depth values greater than or smaller than the target inverse depth value, and perform the operation of determining the target inverse depth values again.
On the basis of the above technical solution, the depth estimation value determination unit is configured to perform mean processing on the multiple inverse depth values greater than the target inverse depth value to obtain the inverse-depth mean, and determine the depth estimation value of the moving object according to the inverse-depth mean.
In the technical solution of the embodiments of the present disclosure, the video processing type is determined; according to the video processing type, the target processing manner for performing depth estimation on the moving object is determined; finally, based on the target processing manner, the depth estimation value of the moving object in the video frame to be processed is determined. This solves the problem in the related art that only the depth information of static objects can be estimated, achieves the effect of accurately estimating the depth information of moving objects in video frames, broadens the applicable scope of depth estimation, satisfies the personalized needs of users, and improves the user experience.
The depth estimation apparatus for a moving object provided by the embodiments of the present disclosure can execute the depth estimation method for a moving object provided by any embodiment of the present disclosure, and has the functional modules and effects corresponding to the executed method.
The multiple units and modules included in the above apparatus are only divided according to functional logic, but are not limited to the above division, as long as the corresponding functions can be implemented; in addition, the names of the multiple functional units are only for the convenience of distinguishing them from each other and are not used to limit the protection scope of the embodiments of the present disclosure.
FIG. 4 is a schematic structural diagram of an electronic device provided by an embodiment of the present disclosure. Referring to FIG. 4, it shows a schematic structural diagram of an electronic device 500 (e.g., the terminal device or server in FIG. 4) suitable for implementing the embodiments of the present disclosure. The terminal device in the embodiments of the present disclosure may include mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, personal digital assistants (PDA), tablet computers (Portable Android Device, PAD), portable multimedia players (PMP), and in-vehicle terminals (e.g., in-vehicle navigation terminals), as well as fixed terminals such as digital televisions (TV) and desktop computers. The electronic device 500 shown in FIG. 4 is merely an example and should not impose any limitation on the functions and scope of use of the embodiments of the present disclosure.
As shown in FIG. 4, the electronic device 500 may include a processing apparatus (e.g., a central processing unit, a graphics processor, etc.) 501, which can perform various appropriate actions and processing according to a program stored in a read-only memory (ROM) 502 or a program loaded from a storage apparatus 508 into a random access memory (RAM) 503. The RAM 503 also stores various programs and data required for the operation of the electronic device 500. The processing apparatus 501, the ROM 502, and the RAM 503 are connected to each other via a bus 504. An input/output (I/O) interface 505 is also connected to the bus 504.
Generally, the following apparatuses may be connected to the I/O interface 505: an input apparatus 506 including, for example, a touch screen, a touch pad, a keyboard, a mouse, a camera, a microphone, an accelerometer, a gyroscope, etc.; an output apparatus 507 including, for example, a liquid crystal display (LCD), a speaker, a vibrator, etc.; a storage apparatus 508 including, for example, a magnetic tape, a hard disk, etc.; and a communication apparatus 509. The communication apparatus 509 may allow the electronic device 500 to communicate wirelessly or by wire with other devices to exchange data. Although FIG. 4 shows the electronic device 500 with multiple apparatuses, it is not required to implement or have all the apparatuses shown; more or fewer apparatuses may alternatively be implemented or provided.
According to embodiments of the present disclosure, the process described above with reference to the flowchart may be implemented as a computer software program. For example, an embodiment of the present disclosure includes a computer program product, which includes a computer program carried on a non-transitory computer-readable medium, the computer program containing program code for executing the method shown in the flowchart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication apparatus 509, or installed from the storage apparatus 508, or installed from the ROM 502. When the computer program is executed by the processing apparatus 501, the above-mentioned functions defined in the methods of the embodiments of the present disclosure are executed.
The electronic device provided by the embodiments of the present disclosure belongs to the same concept as the depth estimation method for a moving object provided by the above embodiments; for technical details not described in detail in this embodiment, reference may be made to the above embodiments, and this embodiment has the same effects as the above embodiments.
An embodiment of the present disclosure provides a computer storage medium on which a computer program is stored; when the program is executed by a processor, the depth estimation method for a moving object provided by the above embodiments is implemented.
The computer-readable medium described above in the present disclosure may be a computer-readable signal medium, a computer-readable storage medium, or any combination of the two. The computer-readable storage medium may be, for example, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. Examples of the computer-readable storage medium may include: an electrical connection having one or more wires, a portable computer disk, a hard disk, RAM, ROM, an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In the present disclosure, a computer-readable storage medium may be any tangible medium containing or storing a program that can be used by or in combination with an instruction-execution system, apparatus, or device. In the present disclosure, a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, in which computer-readable program code is carried. Such a propagated data signal may take many forms, including an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium, which can send, propagate, or transmit a program for use by or in combination with an instruction-execution system, apparatus, or device. The program code contained on the computer-readable medium may be transmitted by any suitable medium, including: an electrical wire, an optical cable, radio frequency (RF), etc., or any suitable combination of the above.
In some implementations, the client and the server can communicate using any currently known or future-developed network protocol such as HyperText Transfer Protocol (HTTP), and can be interconnected with digital data communication (e.g., a communication network) in any form or medium. Examples of communication networks include a local area network (LAN), a wide area network (WAN), an internetwork (e.g., the Internet), a peer-to-peer network (e.g., an ad hoc peer-to-peer network), and any currently known or future-developed network.
The above computer-readable medium may be contained in the above electronic device; it may also exist alone without being assembled into the electronic device.
The above computer-readable medium carries one or more programs; when the above one or more programs are executed by the electronic device, the electronic device is caused to: determine the video processing type; determine, according to the video processing type, the target processing manner for performing depth estimation on the moving object; and determine, based on the target processing manner, the depth estimation value of the moving object in the video frame to be processed.
Computer program code for performing the operations of the present disclosure may be written in one or more programming languages or combinations thereof, including object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code may be executed entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the case involving a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a LAN or WAN, or may be connected to an external computer (e.g., through the Internet using an Internet service provider).
The flowcharts and block diagrams in the drawings illustrate the possible architectures, functions, and operations of systems, methods, and computer program products according to various embodiments of the present disclosure. In this regard, each block in a flowchart or block diagram may represent a module, program segment, or portion of code containing one or more executable instructions for implementing the specified logical function. It should also be noted that in some alternative implementations, the functions noted in the blocks may occur in an order different from that noted in the drawings. For example, two blocks shown in succession may in fact be executed substantially in parallel, and they may sometimes be executed in the reverse order, depending on the functions involved. It should also be noted that each block in the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
The units described in the embodiments of the present disclosure may be implemented by software or by hardware. In some cases, the name of a unit does not constitute a limitation on the unit itself; for example, the first acquisition unit may also be described as "a unit for acquiring at least two Internet Protocol addresses".
The functions described herein above may be performed at least in part by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include: field-programmable gate arrays (FPGA), application-specific integrated circuits (ASIC), application-specific standard products (ASSP), systems on chip (SOC), complex programmable logic devices (CPLD), and so on.
In the context of the present disclosure, a machine-readable medium may be a tangible medium that may contain or store a program for use by or in combination with an instruction-execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the above. Examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer disk, a hard disk, RAM, ROM, EPROM or flash memory, an optical fiber, a CD-ROM, an optical storage device, a magnetic storage device, or any suitable combination of the above.
In addition, although multiple operations are depicted in a particular order, this should not be understood as requiring that these operations be performed in the particular order shown or in sequential order. In certain circumstances, multitasking and parallel processing may be advantageous. Likewise, although multiple implementation details are contained in the above discussion, these should not be construed as limiting the scope of the present disclosure. Some features described in the context of separate embodiments can also be implemented in combination in a single embodiment; conversely, various features described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination.

Claims (14)

  1. A depth estimation method for a moving object, comprising:
    determining a video processing type;
    determining, according to the video processing type, a target processing manner for performing depth estimation on a moving object;
    determining, based on the target processing manner, a depth estimation value of the moving object in a video frame to be processed.
  2. The method according to claim 1, wherein the video processing type comprises a real-time processing type and a post-processing type.
  3. The method according to claim 2, wherein the target processing manner comprises a depth-mean estimation manner corresponding to the real-time processing type, or an inverse-depth estimation manner corresponding to the post-processing type.
  4. The method according to claim 1, wherein the target processing manner comprises a depth-mean estimation manner, and the determining, based on the target processing manner, a depth estimation value of the moving object in a video frame to be processed comprises:
    determining shooting parameters corresponding to the video frame to be processed and pixel parameters of the moving object;
    determining target pixels based on the shooting parameters, the pixel parameters, and a constraint condition;
    determining the depth estimation value of the moving object based on point cloud data of the target pixels.
  5. The method according to claim 4, wherein the determining target pixels based on the shooting parameters, the pixel parameters, and a constraint condition comprises:
    performing triangulation processing on the shooting parameters and the pixel parameters to obtain the point cloud data corresponding to the pixel parameters;
    determining back-projection pixel parameters based on the point cloud data and the constraint condition;
    determining the target pixels based on the pixel parameters and the back-projection pixel parameters.
  6. The method according to claim 4, wherein the determining the depth estimation value of the moving object based on point cloud data of the target pixels comprises:
    determining, according to the point cloud data of the target pixels, at least two to-be-used video frames to which the target pixels belong;
    determining the depth estimation value of the moving object according to depth values of the target pixels in the at least two to-be-used video frames.
  7. The method according to claim 1, wherein the target processing manner comprises an inverse-depth estimation manner, and the determining, based on the target processing manner, a depth estimation value of the moving object in a video frame to be processed comprises:
    performing triangulation processing on each video frame to be processed in a target video to obtain an inverse depth value of each pixel in each video frame to be processed;
    determining the depth estimation value of the moving object by performing clustering processing on multiple inverse depth values in a same video frame to be processed.
  8. The method according to claim 7, wherein the determining the depth estimation value of the moving object by performing clustering processing on multiple inverse depth values in a same video frame to be processed comprises:
    after sorting the multiple inverse depth values by magnitude, determining a depth difference between every two adjacent inverse depth values;
    obtaining two target inverse depth values with a largest depth difference, and determining the depth estimation value of the moving object based on multiple inverse depth values greater than the target inverse depth value.
  9. The method according to claim 8, before the determining the depth estimation value of the moving object based on multiple inverse depth values greater than the target inverse depth value, further comprising:
    in a case where a ratio of a number of inverse depth values greater than or smaller than the target inverse depth value to a total number of inverse depth values is smaller than a preset proportion, deleting the inverse depth values greater than or smaller than the target inverse depth value, and performing the operation of determining the target inverse depth values again.
  10. The method according to claim 8, wherein the determining the depth estimation value of the moving object based on multiple inverse depth values greater than the target inverse depth value comprises:
    performing mean processing on the multiple inverse depth values greater than the target inverse depth value to obtain an inverse-depth mean, and determining the depth estimation value of the moving object according to the inverse-depth mean.
  11. A depth estimation apparatus for a moving object, comprising:
    a video processing type determination module configured to determine a video processing type;
    a target processing manner determination module configured to determine, according to the video processing type, a target processing manner for performing depth estimation on a moving object;
    a depth estimation value determination module configured to determine, based on the target processing manner, a depth estimation value of the moving object in a video frame to be processed.
  12. An electronic device, comprising:
    at least one processor;
    a storage apparatus configured to store at least one program;
    wherein, when the at least one program is executed by the at least one processor, the at least one processor is caused to implement the depth estimation method for a moving object according to any one of claims 1-10.
  13. A storage medium containing computer-executable instructions, wherein the computer-executable instructions, when executed by a computer processor, are used to perform the depth estimation method for a moving object according to any one of claims 1-10.
  14. A computer program product, comprising a computer program carried on a non-transitory computer-readable medium, wherein the computer program contains program code for performing the depth estimation method for a moving object according to any one of claims 1-10.