WO2023093739A1 - Multi-view three-dimensional reconstruction method - Google Patents

Multi-view three-dimensional reconstruction method

Info

Publication number
WO2023093739A1
Authority
WO
WIPO (PCT)
Prior art keywords
point cloud
target object
mask
image
view
Application number
PCT/CN2022/133598
Other languages
French (fr)
Chinese (zh)
Inventor
付明亮
董智超
周振坤
凌康
Original Assignee
华为技术有限公司
Application filed by 华为技术有限公司 (Huawei Technologies Co., Ltd.)
Publication of WO2023093739A1

Classifications

    • G PHYSICS
      • G06 COMPUTING; CALCULATING OR COUNTING
        • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
          • G06T 17/00 Three dimensional [3D] modelling, e.g. data description of 3D objects
          • G06T 5/00 Image enhancement or restoration
            • G06T 5/80 Geometric correction
          • G06T 7/00 Image analysis
            • G06T 7/10 Segmentation; Edge detection
              • G06T 7/13 Edge detection
          • G06T 2207/00 Indexing scheme for image analysis or image enhancement
            • G06T 2207/20 Special algorithmic details
              • G06T 2207/20092 Interactive image processing based on input by user
                • G06T 2207/20104 Interactive definition of region of interest [ROI]

Definitions

  • the embodiments of the present application relate to the field of computer technology, and in particular to a method and device for multi-view three-dimensional reconstruction.
  • Multi-view stereo (MVS) aims to recover the three-dimensional scene surface from a set of calibrated two-dimensional images and estimated camera parameters, and is widely applied in autonomous driving, augmented reality, digital presentation and preservation of cultural relics, city-scale measurement, and other fields.
  • in the process of 3D reconstruction, it is necessary to outline the region to be processed in various shapes from the processed image, so as to identify and extract the region of interest (ROI) of the main body of the scene.
  • current multi-view stereo reconstruction technology has greatly stimulated research on computation-intensive scenarios; however, on the one hand, current ROI extraction techniques are prone to incomplete-ROI problems such as false detection and missed detection, and on the other hand, methods based on specific markers or user interaction are not suitable for automated or high-volume reconstruction scenarios.
  • in computation-intensive scenarios, a large number of calculations occur in the background area, and the computational overhead of the background area is excessive, which reduces the efficiency of 3D reconstruction.
  • the embodiments of the present application provide a method for multi-view 3D reconstruction, which can realize complete and efficient multi-view 3D reconstruction without user interaction, target object category restriction, or calibration pattern constraint.
  • in a first aspect, a method for multi-view 3D reconstruction is provided, including: determining a second 3D point cloud according to a first image sequence of a target object, a first instance mask, a first 3D point cloud, and pose information of an image acquisition device, where the first image sequence includes a plurality of de-distorted images obtained by shooting around the target object, the first instance mask includes a segmentation mask of the target object and a segmentation mask of the background object in the first image sequence, the first 3D point cloud includes a sparse point cloud of the target object and a sparse point cloud of the background object in the first image sequence, the pose information includes parameter information when the image acquisition device shoots around the target object, and the second 3D point cloud includes a sparse point cloud of the target object; acquiring a 2D view area according to the second 3D point cloud, where the 2D view area includes a region of interest of the target object; and generating a third 3D point cloud according to the 2D view area, where the third 3D point cloud includes a dense 3D point cloud of the target object, and the dense 3D point cloud is used to display the target object.
  • based on the visual-axis prior information of the image acquisition device, the automatic extraction of the 2D ROI is realized by obtaining the 3D ROI of the target object, and the 3D reconstruction of the target object is further realized. In this way, ROI false detection, missed detection, and incompleteness can be avoided, and complete multi-view 3D reconstruction of the target object can be realized efficiently.
  • the determining a second 3D point cloud according to the first image sequence of the target object, the first instance mask, the first 3D point cloud, and the pose information of the image acquisition device includes: determining a 3D spherical model of the target object according to the pose information; acquiring a 2D circular image of the target object according to the 3D spherical model; removing the segmentation mask of the background object according to the 2D circular image and determining the segmentation mask of the target object; and determining the second 3D point cloud according to the first 3D point cloud and the segmentation mask of the target object.
  • the 3D spherical model is determined based on the visual axis information of the image acquisition device, and the background mask is removed according to the projection of the spherical model on the 2D viewpoint image, so as to avoid ROI false detection and missed detection problems.
  • a least square method is used to fit the center and radius of the 3D spherical model according to the pose information.
  • the 3D spherical model is fitted from the visual-axis information of the image acquisition device using the least square method, ensuring the accuracy of the position and outline of the 3D spherical model.
  • determining the segmentation mask of the target object according to the overlap between the 2D circular image and the first instance mask includes: when the 2D circular image overlaps with partial masks included in the first instance mask, determining that the overlapping partial masks are the segmentation mask of the target object, and removing the non-overlapping masks; or, when the 2D circular image does not overlap with partial masks included in the first instance mask, removing the non-overlapping partial masks and determining the remaining segmentation masks as the segmentation mask of the target object, where the non-overlapping partial masks are the segmentation masks of the background object.
  • the background mask is removed based on the 2D projection image of the 3D spherical model, ensuring the accuracy of the target object's mask and thereby the accuracy and integrity of the ROI; this further reduces the computation in the background area and can effectively improve reconstruction efficiency.
  • the first 3D point cloud is projected onto a 2D view image, and the second 3D point cloud is determined according to the overlap between the 2D view image and the segmentation mask of the target object, including: when the 2D view image of a partial point cloud of the first 3D point cloud overlaps with the segmentation mask of the target object, determining the overlapping partial point cloud of the first 3D point cloud as the second 3D point cloud, and removing the remaining point clouds in the first 3D point cloud that do not overlap with the segmentation mask of the target object.
  • the 3D point cloud of the target object is determined based on the 2D projection image of the sparse point cloud, making the 3D point cloud of the target object more accurate, which facilitates complete ROI extraction and thus complete multi-view 3D reconstruction of the target object.
  • a 2D convex hull area is obtained according to the second 3D point cloud and the pose information, where the 2D convex hull area includes a 2D point set within the outer contour of the target object, and the 2D point set includes the 2D projection points of the second 3D point cloud; edge detection is performed on the 2D convex hull area to obtain a 2D extended point set, where the edge detection is used to remove, according to the edge point set, the sparse point cloud of the background object included in the 2D convex hull area, and the 2D extended point set includes the 2D point set and the edge point set; the 2D view area of the target object is acquired according to the 2D extended point set.
  • the expansion of the 2D point set is realized based on the 2D convex hull and edge detection, ensuring the integrity of the 2D ROI and thereby realizing complete multi-view 3D reconstruction of the target object.
  • the parameter information when the image acquisition device shoots around the target object includes a degree of freedom parameter when the image acquisition device moves relative to the target object.
  • 3D reconstruction is performed based on the degree of freedom parameters of the image acquisition device, which can ensure the accuracy of 3D ROI and 2D ROI, thereby realizing a complete multi-view 3D reconstruction of the target object.
  • the image acquisition device includes multiple devices.
  • in a second aspect, a device for multi-view 3D reconstruction is provided, including: a first determination module, configured to determine a second 3D point cloud according to a first image sequence of a target object, a first instance mask, a first 3D point cloud, and pose information of an image acquisition device, where the first image sequence includes a plurality of de-distorted images obtained by shooting around the target object, the first instance mask includes a segmentation mask of the target object and a segmentation mask of the background object in the first image sequence, the first 3D point cloud includes a sparse point cloud of the target object and a sparse point cloud of the background object in the first image sequence, the pose information includes parameter information when the image acquisition device shoots around the target object, and the second 3D point cloud includes a sparse point cloud of the target object; a second determination module, configured to acquire a 2D view area according to the second 3D point cloud, where the 2D view area includes a region of interest of the target object; and a construction module, configured to generate a third 3D point cloud according to the 2D view area.
  • based on the visual-axis prior information of the image acquisition device, the automatic extraction of the 2D ROI is realized by obtaining the 3D ROI of the target object, and the 3D reconstruction of the target object is further realized. In this way, ROI false detection, missed detection, and incompleteness can be avoided, and complete multi-view 3D reconstruction of the target object can be realized efficiently.
  • the first determination module is specifically configured to: determine a 3D spherical model of the target object according to the pose information; acquire a 2D circular image of the target object according to the 3D spherical model; remove the segmentation mask of the background object according to the 2D circular image and determine the segmentation mask of the target object; and determine the second 3D point cloud according to the first 3D point cloud and the segmentation mask of the target object.
  • the 3D spherical model is determined based on the visual axis information of the image acquisition device, and the background mask is removed according to the projection of the spherical model on the 2D viewpoint image, so as to avoid ROI false detection and missed detection problems.
  • the first determination module is specifically configured to use a least square method to fit the center and radius of the 3D spherical model according to the camera pose information.
  • the 3D spherical model is fitted from the visual-axis information of the image acquisition device using the least square method, ensuring the accuracy of the position and outline of the 3D spherical model.
  • the first determination module is specifically configured to determine the segmentation mask of the target object according to the overlap between the 2D circular image and the first instance mask, including: when the 2D circular image overlaps with partial masks included in the first instance mask, determining that the overlapping partial masks are the segmentation mask of the target object, and removing the non-overlapping masks; or, when the 2D circular image does not overlap with partial masks included in the first instance mask, removing the non-overlapping partial masks and determining the remaining segmentation masks as the segmentation mask of the target object, where the non-overlapping partial masks are the segmentation masks of the background object.
  • the background mask is removed based on the 2D projection image of the 3D spherical model, ensuring the accuracy of the target object's mask and thereby the accuracy and integrity of the ROI; this further reduces the computation in the background area and can effectively improve reconstruction efficiency.
  • the first determination module is specifically configured to project the first 3D point cloud onto a 2D view image and determine the second 3D point cloud according to the overlap between the 2D view image and the segmentation mask of the target object, including: when the 2D view image of a partial point cloud of the first 3D point cloud overlaps with the segmentation mask of the target object, determining the overlapping partial point cloud of the first 3D point cloud as the second 3D point cloud, and removing the remaining point clouds in the first 3D point cloud that do not overlap with the segmentation mask of the target object.
  • the 3D point cloud of the target object is determined based on the 2D projection image of the sparse point cloud, making the 3D point cloud of the target object more accurate, which facilitates complete ROI extraction and thus complete multi-view 3D reconstruction of the target object.
  • the second determination module is specifically configured to: obtain a 2D convex hull area according to the second 3D point cloud and the pose information, where the 2D convex hull area includes a 2D point set within the outer contour of the target object, and the 2D point set includes the 2D projection points of the second 3D point cloud; perform edge detection on the 2D convex hull area to obtain a 2D extended point set, where the edge detection is used to remove, according to the edge point set, the sparse point cloud of the background object included in the 2D convex hull area, and the 2D extended point set includes the 2D point set and the edge point set; and acquire the 2D view area of the target object according to the 2D extended point set.
  • the expansion of the 2D point set is realized based on the 2D convex hull and edge detection, ensuring the integrity of the 2D ROI and thereby realizing complete multi-view 3D reconstruction of the target object.
  • the parameter information when the image acquisition device shoots around the target object includes a degree of freedom parameter when the image acquisition device moves relative to the target object.
  • 3D reconstruction is performed based on the degree of freedom parameters of the image acquisition device, which can ensure the accuracy of 3D ROI and 2D ROI, thereby realizing a complete multi-view 3D reconstruction of the target object.
  • the image acquisition device includes multiple devices.
  • a device for multi-view 3D reconstruction is provided, including a processor and a memory, where the memory is used to store a computer program, and the processor is used to call and run the computer program from the memory, so that the device executes the method in the first aspect and its various possible implementations.
  • there are one or more processors and one or more memories.
  • the memory can be integrated with the processor, or the memory can be set separately from the processor.
  • a computer-readable storage medium is provided, which stores program code for execution by a device, where the program code includes instructions for executing the method in the first aspect or the second aspect.
  • a computer program product containing instructions is provided; when the computer program product runs on a computer, the computer is caused to execute the method in any one of the implementations of the foregoing aspects.
  • a chip is provided, including a processor and a data interface, where the processor reads, through the data interface, instructions stored in a memory, and executes the method in any one of the above aspects.
  • the chip may further include a memory storing instructions, and the processor is configured to execute the instructions stored in the memory; when the instructions are executed, the processor executes the method in any one of the implementations of the above aspects.
  • the aforementioned chip may specifically be a field-programmable gate array (FPGA) or an application-specific integrated circuit (ASIC).
  • FIG. 1 shows a schematic structural diagram of a system architecture provided by an embodiment of the present application
  • FIG. 2 shows a schematic diagram of a scene structure provided by an embodiment of the present application
  • Fig. 3 shows a schematic diagram of a product realization form provided by the embodiment of the present application
  • Fig. 4 shows a flowchart of a method for multi-view 3D reconstruction provided by an embodiment of the present application
  • Fig. 5 shows a structural diagram of an apparatus for multi-view 3D reconstruction provided by an embodiment of the present application.
  • the system architecture 100 includes an image acquisition module 110 and a model reconstruction module 120 .
  • the image acquisition module 110 is one of the bases for model reconstruction, and high-quality large-scale image acquisition is the key to reconstructing high-quality models.
  • the image acquisition device 111 is used to acquire the original image, wherein the image acquisition device 111 can be any device with a shooting function, which is not specifically limited in the embodiment of the present application;
  • the image preprocessing device 112 may be used to perform screening, filtering, and de-distortion processing of the original image, where the method for performing de-distortion processing on the image is not limited in this embodiment of the present application.
  • the image acquisition module 110 also includes an image sequence library 113 for storing image sequences.
  • the image sequence can be used in the model reconstruction module to reconstruct the three-dimensional model.
  • the model reconstruction module 120 includes a view reconstruction device 121, which can perform sparse reconstruction and dense reconstruction based on the image sequences maintained in the image sequence library 113; the target 3D view is further obtained by the view fusion device 122.
  • view reconstruction device 121 and view fusion device 122 can be used as independent devices, or can be coupled as one device to reconstruct the target 3D view.
  • the foregoing is only an exemplary description, and the embodiments of the present application are not limited thereto.
  • the image sequences maintained in the image sequence library 113 are not necessarily all acquired by the image acquisition device 111, and may also be received from other devices.
  • the view reconstruction device 121 does not necessarily reconstruct the 3D view entirely based on the image sequences maintained by the image sequence library 113, and may also obtain image sequences from the cloud or elsewhere to reconstruct the 3D view. The above description should not be regarded as a limitation on the embodiments of this application.
  • execution devices can be terminals, such as mobile terminals, tablet computers, notebook computers, augmented reality (AR)/virtual reality (VR) devices, and vehicle-mounted terminals, and can also be servers, the cloud, and so on.
  • the above-mentioned view reconstruction device 121 can reconstruct and obtain different three-dimensional objects based on different image sequences for different targets or different tasks, so as to provide users with desired results.
  • FIG. 1 is only a schematic diagram of a system architecture provided by an embodiment of the present application, and the positional relationship among devices, devices, modules, etc. shown in the figure does not constitute any limitation.
  • FIG. 2 shows a schematic diagram of a scene structure provided by an embodiment of the present application. This application scenario is applicable to the above-mentioned system 100 .
  • the application scene input includes two stages: the image sequence 2010 collected around the object, and the construction input 2020.
  • the construction input 2020 stage usually includes the sparsely reconstructed 3D point cloud output by the structure-from-motion (SFM) algorithm, the pose of the image acquisition device, and the de-distorted images.
  • the pose of the image acquisition device can be understood as the degree-of-freedom parameters of the movement of the image acquisition device relative to the object when the image acquisition device shoots around the object.
  • the 2D ROI extraction stage 2030 mainly refers to using related algorithms to complete 2D ROI extraction on the de-distorted image sequence; subsequently, the viewpoint sequence images after 2D ROI extraction are used as the input of dense reconstruction 2040, which outputs depth maps, and the depth maps and normal maps are synthesized into a 3D point cloud through view fusion 2050.
  • the execution subject of the multi-view 3D reconstruction method provided by the embodiments of the present disclosure is generally a computer device with certain computing capability; such computer devices include, for example, terminal devices, servers, or other processing devices, where a terminal device can be user equipment (UE), a mobile device, a user terminal, a terminal, a cellular phone, a cordless phone, a personal digital assistant (PDA), a handheld device, a computing device, a vehicle-mounted device, a wearable device, and so on.
  • the method for multi-view three-dimensional reconstruction may be implemented in a manner in which a processor invokes computer-readable instructions stored in a memory.
  • FIG. 3 shows a schematic flowchart of a method for multi-view 3D reconstruction provided by an embodiment of the present application.
  • the method 300 shown in FIG. 3 may be applied to the system 100 shown in FIG. 1 , and the method 300 may be executed by the above execution device.
  • the method 300 may be processed by a CPU, or other processors suitable for three-dimensional reconstruction, which is not limited in this embodiment of the present application.
  • the image acquisition device captures multiple sets of discrete viewpoint images surrounding the target object, and the discrete viewpoint image sets include multiple two-dimensional images for displaying the target object.
  • the set of discrete viewpoint images constitutes an image sequence as the input, and the reconstruction module outputs a corresponding 3D image of the target object; the 3D image can display a stereoscopic view of the target object.
  • the image acquisition device may be any electronic device with a shooting function, for example, a mobile phone, a camera, a computer, and the like. This embodiment of the present application does not limit it.
  • the target object may be any object in a scene space on which the user wants to perform 3D reconstruction.
  • an image sequence may also be called an image set, a view set, an image collection, or other similar terms.
  • the embodiment of the present application uses the image sequence as an example for description, and the embodiment of the present application does not limit this.
  • the method 300 includes step S310 to step S330. Step S310 to step S330 will be described in detail below.
  • the above-mentioned image sequence may be distorted images obtained by shooting the target object with an image acquisition device; a de-distorted image sequence is obtained after de-distortion processing, and the first image sequence is an example of the de-distorted image sequence, that is, the first image sequence includes a plurality of de-distorted discrete viewpoint images.
  • each object included in the current scene can be treated as an instance; the first image sequence is input frame by frame into the trained instance segmentation model to segment each instance and obtain the segmentation mask of each instance.
  • the segmentation mask includes the contour of the corresponding object and the pixels within the contour.
  • the first instance mask includes a segmentation mask of the background object and a segmentation mask of the target object in the first sequence of images. The method of obtaining the instance mask is not limited in this embodiment of the application.
  • the first 3D point cloud includes a sparse 3D point cloud output according to the first image sequence.
  • the sparse 3D point cloud includes a sparse 3D point cloud of a target object and a sparse 3D point cloud of a background object.
  • the first three-dimensional point cloud may be acquired by using an SFM algorithm.
  • the pose information of the image acquisition device (for clarity and simplicity, hereinafter referred to as the pose information) can be understood as the parameter information when the image acquisition device shoots around the target object.
  • the parameter information may be degree-of-freedom parameters. It should be understood that when the image acquisition device moves around the target object during shooting, each image acquisition device produces a spatial position change relative to the target object; this spatial position change can be converted into a coordinate system according to the above parameter information, so that the movement track of the image acquisition device can be determined.
  • the pose information of the image acquisition device may be acquired by using the SFM algorithm.
  • the degree of freedom parameter may include 3 position vector parameters and 3 Euler angle parameters, that is, the parameter information may include 6 degree of freedom parameters.
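  • as a minimal sketch only (the patent does not prescribe a particular parameterization; the XYZ Euler convention below is an assumption), such a 6-degree-of-freedom pose could be assembled into a 4x4 pose matrix as follows:

```python
import numpy as np
from scipy.spatial.transform import Rotation

def pose_matrix(position, euler_xyz_deg):
    """Assemble a 4x4 pose from the 6 degree-of-freedom parameters:
    3 position vector parameters and 3 Euler angle parameters."""
    T = np.eye(4)
    # XYZ order and degrees are illustrative assumptions.
    T[:3, :3] = Rotation.from_euler("xyz", euler_xyz_deg, degrees=True).as_matrix()
    T[:3, 3] = position
    return T
```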
  • the second 3D point cloud is determined according to the above-mentioned first image sequence, the first instance mask, the first 3D point cloud, and the pose information of the image acquisition device.
  • the second three-dimensional point cloud may be understood as a sparse point cloud of the target object.
  • the 3D spherical model of the target object is determined according to the above pose information, and the 3D spherical model may be understood as a 3D ball of interest including the target object.
  • the visual axes formed by the image acquisition devices approximately intersect at a point on the object; with that intersection point as the center of the sphere and the length of the visual axis as the radius, a 3D ball of interest enclosing the target object is formed.
  • 0.2 times the length of the largest dimension of the first 3D point cloud can be taken as the length of the visual axis, that is, the length of the radius. It should be understood that this value is an empirical value in actual implementation, and this embodiment does not impose any restriction on it.
  • the viewing axis vector corresponding to the pose information can be used to fit the 3D ball of interest.
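  • a minimal sketch of such a fit, assuming each pose contributes a camera center c_j and a unit viewing-axis direction d_j (names chosen here for illustration), solves in the least-squares sense for the point closest to all viewing axes and applies the empirical 0.2-times-largest-extent rule for the radius:

```python
import numpy as np

def fit_ball_of_interest(centers, axes, points_sfm):
    """Fit the 3D ball of interest from the camera poses.

    centers: (N, 3) camera optical centers
    axes:    (N, 3) unit viewing-axis directions (assumed not all parallel)
    points_sfm: (M, 3) sparse SFM point cloud
    """
    A = np.zeros((3, 3))
    b = np.zeros(3)
    for c, d in zip(centers, axes):
        P = np.eye(3) - np.outer(d, d)  # projector orthogonal to axis d
        A += P
        b += P @ c
    center = np.linalg.solve(A, b)      # least-squares nearest point to all axes
    # Empirical radius: 0.2 times the largest extent of the sparse point cloud.
    radius = 0.2 * (points_sfm.max(axis=0) - points_sfm.min(axis=0)).max()
    return center, radius
```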
  • a 2D circular image of the target object is acquired according to the 3D spherical model, and the 2D circular image can be understood as a 2D circle of interest including the target object.
  • the 3D spherical model is back-projected to the 2D viewpoint image according to the pose information to form a 2D circular image.
  • the projection matrix corresponding to the pose information may be used to back-project the 3D spherical model onto the first image sequence to calculate the 2D circular image.
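  • a sketch of this back-projection under a pinhole camera assumption (K is the 3x3 intrinsic matrix and (R, t) the world-to-camera extrinsics of one view; these names are illustrative, not the patent's):

```python
import numpy as np

def project_ball(center, radius, K, R, t):
    """Back-project the 3D ball of interest into one view as a 2D circle."""
    Xc = R @ center + t            # sphere center in camera coordinates
    u, v, w = K @ Xc
    cx, cy = u / w, v / w          # projected circle center, in pixels
    f = 0.5 * (K[0, 0] + K[1, 1])  # mean focal length
    r_px = f * radius / Xc[2]      # approximate projected radius
    return (cx, cy), r_px
```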
  • segmentation mask of the background object is removed according to the 2D circular image, and the segmentation mask of the target object is determined.
  • the partial instance masks that overlap the 2D circular image belong to the target object, and the other instance masks are the instance masks of the background objects, that is, the instance masks of the background objects can be removed.
  • the segmentation mask of the target object is determined according to the overlap between the 2D viewpoint image and the first instance mask.
  • when the 2D circle of interest overlaps with partial masks included in the first instance mask, the overlapping partial masks are determined to be the segmentation mask of the target object, and the non-overlapping masks are removed; when they do not overlap, the non-overlapping partial masks are removed, and the remaining segmentation masks are determined as the segmentation mask of the target object, where the non-overlapping partial masks are the segmentation masks of the background object.
  • the second 3D point cloud is determined according to the first 3D point cloud and the segmentation mask of the target object.
  • the first 3D point cloud is back-projected to the 2D viewpoint image according to the pose information, and the second 3D point cloud is determined according to the overlap between the 2D viewpoint image and the segmentation mask of the target object.
  • the projection matrix corresponding to the pose information is used to project the first 3D point cloud onto the first image sequence, and according to the overlap relationship between the 2D projected points and the segmentation mask of the target object, the overlapping part of the point cloud is extracted as the sparse point cloud of the target object, which is the second 3D point cloud. At this point, the other point clouds in the first 3D point cloud need to be removed, so as to obtain the second 3D point cloud.
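  • a per-view sketch of this extraction, assuming a pinhole projection and a boolean segmentation mask of the target object (the names and the single-view scope are assumptions):

```python
import numpy as np

def extract_target_points(points, K, R, t, target_mask):
    """Keep the sparse points whose 2D projection falls on the target
    object's segmentation mask in this view; the rest are removed.

    points: (M, 3) first 3D point cloud; target_mask: (H, W) boolean mask.
    """
    Xc = R @ points.T + t[:, None]  # points in camera coordinates, (3, M)
    uvw = K @ Xc
    with np.errstate(divide="ignore", invalid="ignore"):
        u = np.round(uvw[0] / uvw[2]).astype(int)
        v = np.round(uvw[1] / uvw[2]).astype(int)
    H, W = target_mask.shape
    valid = (Xc[2] > 0) & (u >= 0) & (u < W) & (v >= 0) & (v < H)
    keep = np.zeros(len(points), dtype=bool)
    keep[valid] = target_mask[v[valid], u[valid]]
    return points[keep]             # second 3D point cloud (this view)
```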
  • a 2D convex hull area is obtained according to the second 3D point cloud and the pose information; the 2D convex hull area includes a 2D point set within the outer contour of the target object, and the 2D point set includes the 2D projection points of the second 3D point cloud.
  • the 2D convex hull region can be understood as a set of pixel points within the approximate outermost contour of the target object, and the 2D point set included in the outer contour ensures the integrity of the target object. It can be understood that the outer contour includes all 2D point sets of the target object, and also includes 2D point sets of some background objects.
  • the second 3D point cloud is back-projected to the 2D viewpoint image according to the pose information, and the 2D convex hull area is determined according to the 2D viewpoint image.
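  • a sketch of the convex hull computation for one view, using OpenCV as an illustrative choice (the patent does not fix a particular hull implementation):

```python
import cv2
import numpy as np

def convex_hull_region(points_2d, image_shape):
    """Rasterize the 2D convex hull of the projected target points.

    points_2d: (M, 2) 2D projections of the second 3D point cloud.
    Returns a binary image of the 2D convex hull area.
    """
    hull = cv2.convexHull(points_2d.astype(np.int32))  # hull vertices
    region = np.zeros(image_shape[:2], dtype=np.uint8)
    cv2.fillConvexPoly(region, hull, 255)              # fill the hull interior
    return region
```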
  • edge detection is performed on the 2D convex hull area to obtain a 2D extended point set; the edge detection is used to remove, according to the edge point set, the sparse point cloud of the background object included in the 2D convex hull area, and the 2D extended point set includes the 2D point set and the edge point set. The 2D view area of the target object is acquired according to the 2D extended point set.
  • the 2D convex hull area is not the exact outline of the target object. Therefore, by performing edge detection on the 2D convex hull area, the precise edge point set of the target object can be obtained, thereby obtaining the 2D extended point set. It can be understood that the 2D extended point set includes the 2D point set and the edge point set.
  • the purpose of edge detection is to determine the precise outline of the target object based on the edge point set. Therefore, it is necessary to remove the sparse point cloud of the background object included in the 2D convex hull area, that is, to remove through morphological operations the points inside the 2D convex hull area other than the edge point set. It can be understood that the 2D convex hull area is thereby further reduced, so that it approaches the outline of the target object more closely, and the 2D view area of the target object is obtained.
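  • one plausible realization of this edge-detection-plus-erosion refinement (the Canny thresholds, kernel size, and iteration cap are all assumptions, not values from the patent):

```python
import cv2
import numpy as np

def refine_roi(view_img, hull_region):
    """Shrink the convex hull area toward the object contour: detect edges
    inside the hull, then repeatedly erode the region while re-adding edge
    points, so the region converges onto the edge-anchored outline."""
    gray = cv2.cvtColor(view_img, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(gray, 50, 150)
    edges = cv2.bitwise_and(edges, hull_region)  # keep edges inside the hull
    roi = hull_region.copy()
    kernel = np.ones((5, 5), np.uint8)
    for _ in range(50):                          # iteration cap (assumed)
        shrunk = cv2.bitwise_or(cv2.erode(roi, kernel), edges)
        if np.array_equal(shrunk, roi):          # converged
            break
        roi = shrunk
    return roi
```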
  • dense reconstruction is further performed according to the 2D viewing area to obtain a third 3D point cloud, and the third 3D point cloud is used to display a 3D image of the target object.
  • the 3D ROI of the target object is obtained to realize the automatic extraction of 2D ROI, and the 3D reconstruction of the target object is further realized.
  • ROI misdetection, missed detection, and incompleteness can be avoided, and the complete multi-view 3D reconstruction of the target object can be realized efficiently.
  • Fig. 4 shows a flowchart of a method for multi-view 3D reconstruction provided by an embodiment of the present application.
  • the method 400 shown in FIG. 4 may be applied to the system 100 shown in FIG. 1 , and the method 400 includes specific implementation steps of the above-mentioned method 300 .
  • the method 400 includes six steps S4010 to S4060. The specific implementation process of each step will be described in detail below.
  • Step S4010 acquiring an image sequence.
  • the image sequence may be a set of images obtained by surrounding shooting of the target object by multiple image acquisition devices. It can be understood that the image set includes multiple images showing the target object from various angles.
  • the image sequence may be acquired directly from the image acquisition devices, from other devices, or from the cloud or elsewhere; this embodiment of the present application does not limit it.
  • regardless of how the image sequence is obtained, it is captured by multiple image acquisition devices surrounding the target object.
  • Step S4020 construct input.
  • the input information includes a first image sequence 4021 , a first instance segmentation mask 4022 , a first 3D point cloud 4023 , and pose information 4024 .
  • the first image sequence 4021 consists of the images obtained after de-distortion processing of the image sequence, which makes subsequent calculations more accurate.
  • the first instance segmentation mask 4022 includes the segmentation masks of all objects under the shooting lens of the image acquisition device in a 3D reconstruction scene; specifically, the first instance segmentation mask includes the segmentation masks of the target object and the background objects.
  • the first 3D point cloud 4023 includes sparse point clouds of target objects and background objects.
  • pose information 4024 includes parameter information when the image acquisition device shoots around the target object.
  • for details, refer to step S310 in the method 300, which is not repeated in this embodiment of the present application.
  • the constructed input information can be expressed as:
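  • a form consistent with the symbol definitions below (the exact set notation is an assumption) is: Input = { points_sfm, {pose_j}_{j=1..N}, {view_img_j}_{j=1..N} }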
  • Input represents the input information of the construction.
  • points_sfm represents the first 3D point cloud reconstructed by the SFM algorithm, that is, the sparse point cloud.
  • the pose information may consist of 6 parameters, including 3 representing position vectors and 3 representing attitude vectors.
  • view_img_j represents the viewpoint image j of the first image sequence, that is, the viewpoint image from which distortion has been removed.
  • Step S4030 determining a second 3D point cloud according to the input information.
  • determining the second 3D point cloud according to the input information includes the following specific steps:
  • Step 4031 Determine the 3D ball of interest of the target object according to the above pose information.
  • the viewing axis vector corresponding to the pose information is used to fit the 3D ball of interest.
  • the least square method can be used to fit the focal point where the camera viewing axes converge, so as to represent the 3D ball of interest.
  • the 3D ball of interest fitted by the least square method can be expressed as:
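  • a form consistent with the symbol definitions below (writing axis_j for the viewing axis of view j is an assumed notation) is: S(x, y, z, r) = LS({axis_j}_{j=1..N})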
  • S(x, y, z, r) represents the 3D interest ball with (x, y, z) as the center and r as the radius fitted by the least square method;
  • LS(·) represents the least-squares fitting algorithm.
  • N represents the total number of viewpoint images, that is, the total number of viewpoint images included in the first image sequence.
  • x, y, z, r, N are all positive integers greater than 1.
  • Step 4032 Obtain the 2D circle of interest of the target object according to the above pose information and the 3D circle of interest.
  • the 3D ball of interest is back-projected to the 2D viewpoint image according to the projection matrix corresponding to the pose information to form a 2D circle of interest.
  • Step 4033 Remove the segmentation mask of the background object according to the 2D circle of interest.
  • the segmentation mask of the background object is removed according to the 2D circle of interest, and the segmentation mask of the target object is determined.
  • the segmentation mask of the determined target object can be expressed as:
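  • a form consistent with the symbol definitions below (writing mask_{j,m} for the m-th predicted instance mask on viewpoint image j is an assumed notation) is: refined_mask_j = Refine(S, pose_j, {mask_{j,m}}_{m=1..M_j})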
  • Refine(·) represents the refinement function of the segmentation mask, and it can be understood that according to this function, the 3D ball of interest can be back-projected to the 2D viewpoint image to determine the segmentation mask of the target object.
  • M_j denotes the number of instances in the prediction results of the instance segmentation model on viewpoint image j.
  • refined_mask represents the segmentation mask of the determined target object.
  • whether a mask overlaps with the 2D circle of interest determines whether it is removed as a background object's mask. For example, if the 2D circle of interest obtained by projecting the 3D spherical model onto the 2D viewpoint image overlaps with part of the masks in the first instance mask, the overlapping masks can be determined to be the segmentation mask of the target object, and the other, non-overlapping masks are masks of background objects and can be removed.
  • Step 4034 Determine the second 3D point cloud according to the first 3D point cloud and the segmentation mask of the target object.
  • the first 3D point cloud is back-projected to the 2D viewpoint image
  • the second 3D point cloud is determined according to the overlap between the 2D viewpoint image and the segmentation mask of the target object.
  • the second 3D point cloud can be understood as a 3D bounding box, and the determined 3D bounding box can be expressed as:
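  • a form consistent with the definitions below (PC_roi denoting the second 3D point cloud, following its use in step S4041) is: PC_roi = BB(points_sfm, {pose_j}_{j=1..N}, {refined_mask_j}_{j=1..N})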
  • BB(·) represents the calculation function of the 3D bounding box. This function back-projects the first 3D point cloud onto the 2D viewpoint images and determines the final second 3D point cloud by judging whether each projected 2D point falls on refined_mask.
  • Step S4040 acquiring a 2D view area according to the second 3D point cloud.
  • obtaining the 2D view area according to the second 3D point cloud includes the following specific steps:
  • Step S4041 extracting a 2D convex hull area.
  • the 2D convex hull region is obtained according to the second 3D point cloud and the pose information, and the obtained 2D convex hull region can be expressed as:
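  • a form consistent with the definition of CH(·) below is: convex_hull_j = CH(PC_roi, pose_j)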
  • CH(·) represents the convex hull calculation function, which uses the pose parameters corresponding to view j to back-project the second 3D point cloud PC_roi onto the corresponding view, and then calculates the convex hull convex_hull based on the 2D projected points.
  • the 2D point set can be obtained through the extraction of the 2D convex hull area.
  • the 2D point set can be understood as all 2D projection points of the 3D point cloud that fall within the convex hull area; the point set includes all points of the target object, and also contains some points of background objects.
  • Step S4042 determine a 2D extension point set.
  • edge detection is performed on the 2D convex hull area to obtain an edge point set, and a relatively accurate 2D outline of the target object can be determined according to the edge point set.
  • the 2D extension point set can be expressed as:
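  • a form consistent with the definition of Edge(·) below (writing ext_points_j for the extended interest point set is an assumed name) is: ext_points_j = Edge(convex_hull_j, view_img_j)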
  • Edge(·) represents the 2D interest point set expansion function, which performs edge detection on the area inside the convex hull convex_hull_j on the viewpoint image view_img_j, and combines the edge detection results with the 2D interest point set to obtain the extended interest point set.
  • the above-mentioned determined 2D extension point set includes the above-mentioned 2D point set and edge point set.
  • Step S4043: acquire the 2D view area through a morphological operation.
  • the purpose of edge detection is to determine the precise outline of the target object according to the edge point set. Therefore, it is necessary to remove the sparse point cloud of the background object included in the 2D convex hull area, that is, to remove through a morphological operation the points in the 2D convex hull area other than the edge point set.
  • the morphological operation can be expressed as:
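  • a form consistent with the definition of Erosion(·) below (writing roi_j for the 2D ROI on view j is an assumed name) is: roi_j = Erosion(convex_hull_j, ext_points_j)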
  • Erosion(·) represents the 2D ROI extraction function, which erodes from the boundary of the convex hull convex_hull_j toward the boundary determined by the edge point set; the final result is the 2D ROI on this view.
  • Step S4050 acquiring a third 3D point cloud according to the 2D view area.
  • dense reconstruction is further performed according to the 2D view area to obtain a 3D point cloud.
  • the dense reconstruction may be any dense reconstruction method in 3D reconstruction, which is not limited in this embodiment of the present application.
  • Step S4060 complete the 3D reconstruction of the target object through view fusion.
  • view fusion is performed on the third three-dimensional point cloud to obtain a final reconstructed image, which is used to display the three-dimensional image of the target object.
  • the 3D ROI of the target object is obtained to realize the automatic extraction of 2D ROI, and the 3D reconstruction of the target object is further realized.
  • Fig. 5 shows a structural block diagram of an apparatus for multi-view 3D reconstruction provided by an embodiment of the present application.
  • the device 500 for multi-view three-dimensional reconstruction includes: a first determination module 510 , a second determination module 520 , and a construction module 530 .
  • the first determining module 510 is configured to determine the second 3D point cloud according to the first image sequence of the target object, the first instance mask, the first 3D point cloud, and the pose information of the image acquisition device.
  • the first determination module 510 determines the 3D spherical model of the target object according to the pose information; acquires a 2D circular image of the target object according to the 3D spherical model; removes the segmentation mask of the background object according to the 2D circular image to determine the segmentation mask of the target object; and determines the second 3D point cloud according to the first 3D point cloud and the segmentation mask of the target object.
  • the first determining module 510 uses a least square method to fit the center and radius of the 3D spherical model according to the camera pose information, thereby determining the 3D spherical model.
  • the first determination module 510 determines the segmentation mask of the target object according to the overlap between the 2D circular image and the first instance mask.
  • when the 2D circular image overlaps with partial masks included in the first instance mask, it determines that the overlapping partial masks are the segmentation mask of the target object and removes the non-overlapping masks; or, when the 2D circular image does not overlap with partial masks included in the first instance mask, it removes the non-overlapping partial masks and determines the remaining segmentation masks as the segmentation mask of the target object, where the non-overlapping partial masks are the segmentation masks of the background object.
  • the first determination module 510 is configured to project the first 3D point cloud onto a 2D view image, and to determine the second 3D point cloud according to the overlap between the 2D view image and the segmentation mask of the target object.
  • when the 2D view image of a part of the first 3D point cloud overlaps with the segmentation mask of the target object, the overlapping partial point cloud of the first 3D point cloud is determined as the second 3D point cloud, and the remaining point clouds in the first 3D point cloud that do not overlap with the segmentation mask of the target object are removed.
  • the second determining module 520 is configured to acquire a 2D view area according to the second 3D point cloud.
  • the second determination module 520 obtains a 2D convex hull area according to the second 3D point cloud and the pose information, where the 2D convex hull area includes a 2D point set within the outer contour of the target object, and the 2D point set includes the 2D projection points of the second 3D point cloud; performs edge detection on the 2D convex hull area to obtain a 2D extended point set, where the edge detection is used to remove, according to the edge point set, the sparse point cloud of the background object included in the 2D convex hull area, and the 2D extended point set includes the 2D point set and the edge point set; and obtains the 2D view area of the target object according to the 2D extended point set.
  • the construction module 530 is used for generating a third 3D point cloud according to the 2D view area.
  • the disclosed systems, devices and methods may be implemented in other ways.
  • the device embodiments described above are only illustrative.
  • the division of the units is only a logical function division. In actual implementation, there may be other division methods.
  • multiple units or components can be combined or integrated into another system, or some features may be ignored or not implemented.
  • the mutual coupling or direct coupling or communication connection shown or discussed may be through some interfaces, and the indirect coupling or communication connection of devices or units may be in electrical, mechanical or other forms.
  • the units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units, that is, they may be located in one place, or may be distributed to multiple network units. Part or all of the units can be selected according to actual needs to achieve the purpose of the solution of this embodiment.
  • each functional unit in each embodiment of the present application may be integrated into one processing unit, each unit may exist separately physically, or two or more units may be integrated into one unit.
  • if the functions described above are realized in the form of software functional units and sold or used as independent products, they can be stored in a computer-readable storage medium.
  • the technical solution of the present application, in essence, or the part that contributes to the prior art, or a part of the technical solution, can be embodied in the form of a software product; the computer software product is stored in a storage medium and includes several instructions used to cause a computer device (which may be a personal computer, a server, or a network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of the present application.
  • the aforementioned storage media include: USB flash drives, mobile hard disks, read-only memory (ROM), random access memory (RAM), magnetic disks, optical discs, and other media that can store program code.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Graphics (AREA)
  • Geometry (AREA)
  • Software Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Processing Or Creating Images (AREA)
  • Image Analysis (AREA)

Abstract

Embodiments of the present application provide a multi-view three-dimensional reconstruction method, comprising: determining a second three-dimensional point cloud according to a first image sequence, a first instance mask and a first three-dimensional point cloud of a target object and pose information of an image acquisition device; obtaining a 2D view region according to the second three-dimensional point cloud, the 2D view region comprising a region of interest of the target object; and generating a third three-dimensional point cloud according to the 2D view region, wherein the third three-dimensional point cloud comprises a dense three-dimensional point cloud of the target object, and the dense three-dimensional point cloud is used for displaying the target object. Therefore, complete and efficient multi-view three-dimensional reconstruction can be realized on the premise of no user interaction, no target object category limitation, and no calibration pattern constraint.

Description

一种多视图三维重建的方法A method for multi-view 3D reconstruction
本申请要求于2021年11月25日提交中国专利局、申请号为202111414740.6、申请名称为“一种多视图三维重建的方法”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。This application claims the priority of the Chinese patent application with the application number 202111414740.6 and the application title "A Method for Multi-View Three-dimensional Reconstruction" submitted to the China Patent Office on November 25, 2021, the entire contents of which are incorporated by reference in this application middle.
技术领域technical field
本申请实施例涉及计算机技术领域,尤其涉及一种多视图三维重建的方法及装置。The embodiments of the present application relate to the field of computer technology, and in particular to a method and device for multi-view three-dimensional reconstruction.
背景技术Background technique
多视图立体视觉(multi-view stereo,MVS)旨在从一组经过标定的二维图像和估计的相机参数中恢复三维场景表面,广泛应于自动驾驶、增强现实、文物数字化呈现与保护、城市尺度的测量等领域。Multi-view stereo vision (multi-view stereo, MVS) aims to restore the three-dimensional scene surface from a set of calibrated two-dimensional images and estimated camera parameters, and is widely used in autonomous driving, augmented reality, digital presentation and protection of cultural relics, urban scale measurement etc.
现有技术中,三维重建过程中需要从被处理的图像中以各种形状勾勒出需要处理的区域,从而识别并提取出场景主体感兴趣区域(region of interest,ROI)。例如,借助用户交互确定输入图像序列上初始帧的兴趣区域;再例如,借助前景分割算法实现图像序列上前景兴趣区域的分割;再例如,借助场景中预置的标志图案实现兴趣区域的标记和提取。当前的多视图立体重建技术极大的启发了密集计算场景的研究,但一方面,当前的ROI提取技术容易出现错检、漏检等ROI不完整的问题,另一方面,基于特定标志或用户交互的方法不适合自动化或大批量重建的场景。此外,密集计算场景中,大量的计算发生在背景区域,背景区域的计算开销过大,影响三维重建的效率。In the prior art, in the process of 3D reconstruction, it is necessary to outline the region to be processed in various shapes from the processed image, so as to identify and extract the region of interest (region of interest, ROI) of the main body of the scene. For example, use user interaction to determine the ROI of the initial frame on the input image sequence; another example, use the foreground segmentation algorithm to realize the segmentation of the foreground ROI on the image sequence; another example, use the preset logo pattern in the scene to realize the marking and extract. The current multi-view stereo reconstruction technology has greatly inspired the research of intensive computing scenarios, but on the one hand, the current ROI extraction technology is prone to incomplete ROI problems such as false detection and missed detection; on the other hand, based on specific signs or user Interactive methods are not suitable for automation or high-volume reconstruction scenarios. In addition, in intensive computing scenarios, a large number of calculations occur in the background area, and the calculation overhead of the background area is too large, which affects the efficiency of 3D reconstruction.
如何在无用户交互、无目标物体类别限制及无标定图案约束的前提下实现完整并且高效的多视图三维重建,成为业界亟需解决的问题。How to achieve a complete and efficient multi-view 3D reconstruction without user interaction, no restriction on object types, and no constraints on calibration patterns has become an urgent problem to be solved in the industry.
发明内容Contents of the invention
本申请实施例提供一种多视图三维重建的方法,能够在无用户交互、无目标物体类别限制及无标定图案约束的前提下实现完整并且高效的多视图三维重建。The embodiment of the present application provides a method for multi-view 3D reconstruction, which can realize complete and efficient multi-view 3D reconstruction without user interaction, target object type restriction and calibration pattern restriction.
第一方面,提供了一种多视图三维重建的方法,包括:根据目标物体的第一图像序列、第一实例掩码、第一三维点云和图像采集设备的位姿信息确定第二三维点云,其中,所述第一图像序列包括使用对所述目标物体进行环绕拍摄后进行去畸变的多幅图像,所述第一实例掩码包括所述第一图像序列中的目标物体的分割掩码和背景物体的分割掩码,所述第一三维点云包括所述第一图像序列中的目标物体的稀疏点云和背景物体的稀疏点云,所述位姿信息包括所述图像采集设备环绕所述目标物体拍摄时的参数信息,所述第二三维点云包括所述目标物体的稀疏点云;根据所述第二三维点云获取2D视图区域,所述2D视图区域包括所述目标物体的兴趣区域;根据所述2D视图区域生成第三三维点云,所述第三三维点云包括所述目标物体的稠密三维点云,所述稠密三维点云用于展示所述目标物体。In the first aspect, a method for multi-view 3D reconstruction is provided, including: determining the second 3D point according to the first image sequence of the target object, the first instance mask, the first 3D point cloud, and the pose information of the image acquisition device cloud, wherein the first sequence of images includes a plurality of images that are de-distorted after surrounding shooting of the target object, and the first instance mask includes a segmentation mask of the target object in the first sequence of images code and the segmentation mask of the background object, the first three-dimensional point cloud includes the sparse point cloud of the target object and the sparse point cloud of the background object in the first image sequence, and the pose information includes the image acquisition device Parameter information when shooting around the target object, the second 3D point cloud includes a sparse point cloud of the target object; acquire a 2D view area according to the second 3D point cloud, and the 2D view area includes the target A region of interest of the object; generating a third 3D point cloud according to the 2D view region, the third 3D point cloud including a dense 3D point cloud of the target object, and the dense 3D point cloud is used to display the target object.
基于上述技术方案,在本申请的多视图三维重建场景中,基于图像采集设备的视轴先 验信息,通过获取目标物体的3D ROI实现2D ROI的自动化提取,进一步实现目标物体的三维重建。从而能够避免ROI错检,漏检,不完整的问题,高效的实现完整的目标物体的多视图三维重建。Based on the above technical solution, in the multi-view 3D reconstruction scene of the present application, based on the visual axis prior information of the image acquisition device, the automatic extraction of 2D ROI is realized by obtaining the 3D ROI of the target object, and the 3D reconstruction of the target object is further realized. In this way, it is possible to avoid ROI misdetection, missed detection, and incomplete problems, and efficiently realize the complete multi-view 3D reconstruction of the target object.
With reference to the first aspect, in a possible implementation, determining the second 3D point cloud according to the first image sequence of the target object, the first instance mask, the first 3D point cloud, and the pose information of the image acquisition device includes: determining a 3D spherical model of the target object according to the pose information; acquiring a 2D circular image of the target object according to the 3D spherical model; removing the segmentation masks of the background objects according to the 2D circular image to determine the segmentation mask of the target object; and determining the second 3D point cloud according to the first 3D point cloud and the segmentation mask of the target object.

Based on the above technical solution, in the present application, the 3D spherical model is determined from the visual-axis information of the image acquisition device, and the background masks are removed according to the projection of the spherical model onto the 2D viewpoint images, thereby avoiding ROI false detection and missed detection.

With reference to the first aspect, in a possible implementation, the center and radius of the 3D spherical model are fitted from the pose information using the least-squares method.

Based on the above technical solution, fitting the 3D spherical model from the visual-axis information of the image acquisition device by least squares ensures the accuracy of the position and contour of the 3D spherical model.

With reference to the first aspect, in a possible implementation, determining the segmentation mask of the target object according to the overlap between the 2D circular image and the first instance mask includes: when a partial mask included in the first instance mask overlaps the 2D circular image, determining the overlapping partial mask to be the segmentation mask of the target object and removing the non-overlapping masks; or, when a partial mask included in the first instance mask does not overlap the 2D circular image, removing the non-overlapping partial mask and determining the remaining segmentation masks to be the segmentation mask of the target object, where the non-overlapping partial mask is a segmentation mask of a background object.

Based on the above technical solution, removing the background masks based on the 2D projection of the 3D spherical model ensures the accuracy of the target object's mask, and therefore the precision and completeness of the ROI; it further reduces computation in the background region, which effectively improves reconstruction efficiency.

With reference to the first aspect, in a possible implementation, the first 3D point cloud is projected onto a 2D view image, and the second 3D point cloud is determined according to the overlap between the 2D view image and the segmentation mask of the target object, including: when the 2D projections of a part of the first 3D point cloud overlap the segmentation mask of the target object, determining that overlapping part of the first 3D point cloud to be the second 3D point cloud, and removing the remaining points of the first 3D point cloud that do not overlap the segmentation mask of the target object.

Based on the above technical solution, determining the 3D point cloud of the target object from the 2D projection of the sparse point cloud makes the target object's 3D point cloud more accurate and facilitates complete ROI extraction, thereby achieving a complete multi-view 3D reconstruction of the target object.

With reference to the first aspect, in a possible implementation, a 2D convex hull region is acquired according to the second 3D point cloud and the pose information, where the 2D convex hull region includes a 2D point set within the outer contour of the target object, and the 2D point set includes the 2D projection points of the second 3D point cloud; edge detection is performed on the 2D convex hull region to acquire a 2D extended point set, where the edge detection is used to remove, according to an edge point set, the sparse points of the background objects included in the 2D convex hull region, and the 2D extended point set includes the 2D point set and the edge point set; and the 2D view region of the target object is acquired according to the 2D extended point set.

Based on the above technical solution, the 2D point set is expanded based on the 2D convex hull and edge detection, which ensures a complete 2D ROI and thereby a complete multi-view 3D reconstruction of the target object.

With reference to the first aspect, in a possible implementation, the parameter information of the image acquisition device while shooting around the target object includes degree-of-freedom parameters of the motion of the image acquisition device relative to the target object.

Based on the above technical solution, performing 3D reconstruction based on the degree-of-freedom parameters of the image acquisition device ensures the accuracy of the 3D ROI and the 2D ROI, thereby achieving a complete multi-view 3D reconstruction of the target object.

With reference to the first aspect, in a possible implementation, there are multiple image acquisition devices.
According to a second aspect, a multi-view three-dimensional reconstruction apparatus is provided, including: a first determining module, configured to determine a second 3D point cloud according to a first image sequence of a target object, a first instance mask, a first 3D point cloud, and pose information of an image acquisition device, where the first image sequence includes a plurality of images obtained by shooting around the target object and then removing distortion, the first instance mask includes the segmentation mask of the target object and the segmentation masks of background objects in the first image sequence, the first 3D point cloud includes the sparse point cloud of the target object and the sparse point clouds of the background objects in the first image sequence, the pose information includes parameter information of the image acquisition device while it shoots around the target object, and the second 3D point cloud includes the sparse point cloud of the target object; a second determining module, configured to acquire a 2D view region according to the second 3D point cloud, where the 2D view region includes a region of interest of the target object; and a construction module, configured to generate a third 3D point cloud according to the 2D view region, where the third 3D point cloud includes a dense 3D point cloud of the target object, and the dense 3D point cloud is used to display the target object.

Based on the above technical solution, in the multi-view 3D reconstruction scenario of the present application, the 2D ROI is extracted automatically by obtaining the 3D ROI of the target object based on prior information about the visual axes of the image acquisition device, and the 3D reconstruction of the target object is then completed. This avoids ROI false detection, missed detection, and incompleteness, and efficiently achieves a complete multi-view 3D reconstruction of the target object.
With reference to the second aspect, in a possible implementation, the first determining module is specifically configured to: determine a 3D spherical model of the target object according to the pose information; acquire a 2D circular image of the target object according to the 3D spherical model; remove the segmentation masks of the background objects according to the 2D circular image to determine the segmentation mask of the target object; and determine the second 3D point cloud according to the first 3D point cloud and the segmentation mask of the target object.

Based on the above technical solution, in the present application, the 3D spherical model is determined from the visual-axis information of the image acquisition device, and the background masks are removed according to the projection of the spherical model onto the 2D viewpoint images, thereby avoiding ROI false detection and missed detection.

With reference to the second aspect, in a possible implementation, the first determining module is specifically configured to fit the center and radius of the 3D spherical model from the camera pose information using the least-squares method.

Based on the above technical solution, fitting the 3D spherical model from the visual-axis information of the image acquisition device by least squares ensures the accuracy of the position and contour of the 3D spherical model.

With reference to the second aspect, in a possible implementation, the first determining module is specifically configured to determine the segmentation mask of the target object according to the overlap between the 2D circular image and the first instance mask, including: when a partial mask included in the first instance mask overlaps the 2D circular image, determining the overlapping partial mask to be the segmentation mask of the target object and removing the non-overlapping masks; or, when a partial mask included in the first instance mask does not overlap the 2D circular image, removing the non-overlapping partial mask and determining the remaining segmentation masks to be the segmentation mask of the target object, where the non-overlapping partial mask is a segmentation mask of a background object.

Based on the above technical solution, removing the background masks based on the 2D projection of the 3D spherical model ensures the accuracy of the target object's mask, and therefore the precision and completeness of the ROI; it further reduces computation in the background region, which effectively improves reconstruction efficiency.

With reference to the second aspect, in a possible implementation, the first determining module is specifically configured to project the first 3D point cloud onto a 2D view image and determine the second 3D point cloud according to the overlap between the 2D view image and the segmentation mask of the target object, including: when the 2D projections of a part of the first 3D point cloud overlap the segmentation mask of the target object, determining that overlapping part of the first 3D point cloud to be the second 3D point cloud, and removing the remaining points of the first 3D point cloud that do not overlap the segmentation mask of the target object.

Based on the above technical solution, determining the 3D point cloud of the target object from the 2D projection of the sparse point cloud makes the target object's 3D point cloud more accurate and facilitates complete ROI extraction, thereby achieving a complete multi-view 3D reconstruction of the target object.

With reference to the second aspect, in a possible implementation, the second determining module is specifically configured to: acquire a 2D convex hull region according to the second 3D point cloud and the pose information, where the 2D convex hull region includes a 2D point set within the outer contour of the target object, and the 2D point set includes the 2D projection points of the second 3D point cloud; perform edge detection on the 2D convex hull region to acquire a 2D extended point set, where the edge detection is used to remove, according to an edge point set, the sparse points of the background objects included in the 2D convex hull region, and the 2D extended point set includes the 2D point set and the edge point set; and acquire the 2D view region of the target object according to the 2D extended point set.

Based on the above technical solution, the 2D point set is expanded based on the 2D convex hull and edge detection, which ensures a complete 2D ROI and thereby a complete multi-view 3D reconstruction of the target object.

With reference to the second aspect, in a possible implementation, the parameter information of the image acquisition device while shooting around the target object includes degree-of-freedom parameters of the motion of the image acquisition device relative to the target object.

Based on the above technical solution, performing 3D reconstruction based on the degree-of-freedom parameters of the image acquisition device ensures the accuracy of the 3D ROI and the 2D ROI, thereby achieving a complete multi-view 3D reconstruction of the target object.

With reference to the second aspect, in a possible implementation, there are multiple image acquisition devices.
According to a third aspect, a multi-view three-dimensional reconstruction apparatus is provided, including a processor and a memory, where the memory is configured to store a computer program, and the processor is configured to call and run the computer program from the memory, so that the apparatus performs the multi-view three-dimensional reconstruction method in the first aspect and its various possible implementations.
Optionally, there are one or more processors and one or more memories.

Optionally, the memory may be integrated with the processor, or the memory and the processor may be arranged separately.
According to a fourth aspect, a computer-readable storage medium is provided. The computer-readable medium stores program code for execution by a device, and the program code includes instructions for performing the method in the first aspect or any of its possible implementations.
According to a fifth aspect, a computer program product containing instructions is provided; when the computer program product runs on a computer, the computer is caused to perform the method in any one of the implementations of the foregoing aspects.

According to a sixth aspect, a chip is provided. The chip includes a processor and a data interface, and the processor reads, through the data interface, instructions stored in a memory to perform the method in any one of the implementations of the foregoing aspects.

Optionally, as an implementation, the chip may further include a memory in which instructions are stored; the processor is configured to execute the instructions stored in the memory, and when the instructions are executed, the processor is configured to perform the method in any one of the implementations of the foregoing aspects.

The above chip may specifically be a field-programmable gate array (FPGA) or an application-specific integrated circuit (ASIC).
Description of Drawings
FIG. 1 is a schematic structural diagram of a system architecture provided by an embodiment of the present application;

FIG. 2 is a schematic diagram of a scenario provided by an embodiment of the present application;

FIG. 3 is a schematic diagram of a product implementation form provided by an embodiment of the present application;

FIG. 4 is a flowchart of a multi-view three-dimensional reconstruction method provided by an embodiment of the present application;

FIG. 5 is a structural diagram of a multi-view three-dimensional reconstruction apparatus provided by an embodiment of the present application.
Detailed Description of Embodiments
The technical solutions in the embodiments of the present application are described below with reference to the accompanying drawings.

It should be understood that the names of all nodes and devices in this application are merely set for convenience of description and may differ in actual applications; this application should not be understood as limiting the names of the various nodes and devices. On the contrary, any name having the same or a similar function as a node or device used in this application is regarded as the method of this application or an equivalent replacement thereof and falls within the protection scope of this application; details are not repeated below.
To facilitate understanding of the embodiments of the present application, a schematic structural diagram of a system architecture 100 according to an embodiment of the present application is first briefly described with reference to FIG. 1. As shown in FIG. 1, the system architecture 100 includes an image acquisition module 110 and a model reconstruction module 120. The image acquisition module 110 is one of the foundations of model reconstruction; high-quality, large-scale image acquisition is the key to reconstructing a high-quality model.

In the image acquisition module 110, as shown in FIG. 1, an image acquisition device 111 is configured to acquire original images, where the image acquisition device 111 may be any device with a shooting function, which is not specifically limited in the embodiments of the present application; an image preprocessing device 112 may be configured to screen, filter, and de-distort the original images, where the method for de-distorting the images is not limited in the embodiments of the present application.

The image acquisition module 110 further includes an image sequence library 113 for storing image sequences. These image sequences can be used by the model reconstruction module to reconstruct the three-dimensional model.

As shown in FIG. 1, the model reconstruction module 120 includes a view reconstruction device 121, which can perform sparse reconstruction and dense reconstruction based on the image sequences maintained in the image sequence library 113; a view fusion device then obtains the target three-dimensional view.

It should be noted that the view reconstruction device 121 and the view fusion device 122 may be independent devices, or may be coupled into a single device that reconstructs the target three-dimensional view. This is merely an illustrative description and is not limited in the embodiments of the present application.

It should be noted that, in practical applications, the image sequences maintained in the image sequence library 113 are not necessarily all acquired by the image acquisition device 111; they may also be received from other devices. In addition, the view reconstruction device 121 does not necessarily reconstruct the three-dimensional view entirely based on the image sequences maintained in the image sequence library 113; it may also obtain image sequences from the cloud or elsewhere for the reconstruction. The above description should not be regarded as a limitation on the embodiments of the present application.

The embodiments of the present application can be applied to different systems or devices, such as an execution device. The execution device may be a terminal, such as a mobile phone, a tablet computer, a notebook computer, an augmented reality (AR)/virtual reality (VR) device, or a vehicle-mounted terminal, or may be a server, a cloud, or the like.

It is worth noting that the view reconstruction device 121 can reconstruct different three-dimensional objects based on different image sequences for different targets or tasks, thereby providing users with the desired results.

It should be noted that FIG. 1 is merely a schematic diagram of a system architecture provided by an embodiment of the present application, and the positional relationships among the devices, components, and modules shown in the figure do not constitute any limitation.
FIG. 2 is a schematic diagram of a scenario provided by an embodiment of the present application. This application scenario is applicable to the above system 100.
In this embodiment of the present application, the input of the application scenario involves two stages: an image sequence 2010 acquired around the object, and a construction input 2020. The construction input 2020 stage typically includes the sparsely reconstructed 3D point cloud output by the structure-from-motion (SFM) algorithm, the pose of the image acquisition device, and the de-distorted images. The pose of the image acquisition device can be understood as the degree-of-freedom parameters of its motion relative to the object while shooting around the object.

The 2D ROI extraction stage 2030 mainly refers to using related algorithms to extract the 2D ROIs on the de-distorted image sequence; the viewpoint sequence images after 2D ROI extraction are then used as the input of dense reconstruction 2040, which outputs depth maps, and view fusion 2050 synthesizes the depth maps and normal maps into a 3D point cloud.
It should be noted that the above scenario is merely illustrative; the embodiments of the present application can be used in various multi-view stereo reconstruction scenarios, which is not limited here.

To facilitate understanding of this embodiment, the multi-view three-dimensional reconstruction method disclosed in this embodiment is first described in detail. The execution subject of the multi-view three-dimensional reconstruction method provided by the embodiments of the present disclosure is generally a computer device with a certain computing capability, for example a terminal device, a server, or another processing device. The terminal device may be user equipment (UE), a mobile device, a user terminal, a terminal, a cellular phone, a cordless phone, a personal digital assistant (PDA), a handheld device, a computing device, a vehicle-mounted device, a wearable device, or the like. In some possible implementations, the multi-view three-dimensional reconstruction method may be implemented by a processor invoking computer-readable instructions stored in a memory.
The multi-view three-dimensional reconstruction method of the embodiments of the present application is described below with reference to FIG. 3, which shows a schematic flowchart of a multi-view three-dimensional reconstruction method provided by an embodiment of the present application. The method 300 shown in FIG. 3 can be applied to the system 100 shown in FIG. 1, and the method 300 may be executed by the above execution device. Optionally, the method 300 may be processed by a CPU or by another processor suitable for three-dimensional reconstruction, which is not limited in this embodiment of the present application.
In the multi-view three-dimensional reconstruction scenario of the embodiments of the present application, the image acquisition device captures multiple discrete viewpoint image sets around the target object; each discrete viewpoint image set includes multiple two-dimensional images showing the target object. The discrete viewpoint image sets form an image sequence that serves as the input; after the reconstruction module, the corresponding output is a 3D image of the target object, and this 3D image can display a stereoscopic view of the target object.
The image acquisition device may be any electronic device with a shooting function, for example a mobile phone, a camera, or a computer; this is not limited in the embodiments of the present application.

It should be understood that the target object may be any object in a scene space that the user wants to reconstruct in three dimensions.

It should be understood that an image sequence may also be called an image collection, a view collection, an image set, or a similar term; the embodiments of the present application use the term image sequence as an example for description, which is not limiting.

The method 300 includes steps S310 to S330, which are described in detail below.
S310: Determine a second 3D point cloud according to a first image sequence of the target object, a first instance mask, a first 3D point cloud, and pose information of the image acquisition device.

In this embodiment of the present application, the above image sequence may consist of distorted images obtained by the image acquisition device shooting the target object; after de-distortion processing, a de-distorted image sequence is obtained. The first image sequence is an example of such a de-distorted image sequence, i.e., the first image sequence includes multiple de-distorted discrete viewpoint images.

In this embodiment of the present application, each object included in the current scene can be regarded as an instance. Feeding the first image sequence frame by frame into a trained instance segmentation model segments each instance and yields a segmentation mask for each instance; the segmentation mask includes the contour of the corresponding object and the pixels within that contour. The first instance mask includes the segmentation masks of the background objects and the segmentation mask of the target object in the first image sequence. How the instance masks are obtained is not limited in this embodiment of the present application.

It should be understood that the background objects are all objects in the current shooting scene other than the target object.

In this embodiment of the present application, the first 3D point cloud includes a sparse 3D point cloud output from the first image sequence; this sparse 3D point cloud includes the sparse point cloud of the target object and the sparse point cloud of the background objects.

In a possible implementation, the first 3D point cloud may be obtained using the SFM algorithm.

In this embodiment of the present application, the pose information of the image acquisition device (hereinafter simply the pose information, for clarity and brevity) can be understood as the parameter information of the image acquisition device while it shoots around the target object. The parameter information may be degree-of-freedom parameters. It should be understood that the image acquisition device moves around the target object while shooting, and each acquisition position has a spatial relationship to the target object; according to the above parameter information, this changing spatial relationship can be expressed in a coordinate system, so that the motion trajectory of the image acquisition device can be determined.

In a possible implementation, the pose information of the image acquisition device may be obtained using the SFM algorithm.

As an example and not a limitation, the degree-of-freedom parameters may include 3 position-vector parameters and 3 Euler-angle parameters, i.e., the parameter information may include 6 degree-of-freedom parameters.
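For illustration only, the following minimal sketch assembles such a 6-parameter pose into a 4x4 homogeneous transform; the Euler-angle composition order (here Z*Y*X) and the function name are assumptions, since the embodiments do not fix a particular parameterization.

    import numpy as np

    def pose_to_matrix(t, euler):
        """Assemble a 6-DoF pose (3 position parameters t, 3 Euler angles in
        radians) into a 4x4 homogeneous transform; Z*Y*X order is assumed."""
        ax, ay, az = euler
        Rx = np.array([[1, 0, 0],
                       [0, np.cos(ax), -np.sin(ax)],
                       [0, np.sin(ax), np.cos(ax)]])
        Ry = np.array([[np.cos(ay), 0, np.sin(ay)],
                       [0, 1, 0],
                       [-np.sin(ay), 0, np.cos(ay)]])
        Rz = np.array([[np.cos(az), -np.sin(az), 0],
                       [np.sin(az), np.cos(az), 0],
                       [0, 0, 1]])
        T = np.eye(4)
        T[:3, :3] = Rz @ Ry @ Rx  # rotation from the three Euler angles
        T[:3, 3] = t              # translation from the three position parameters
        return T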
In this embodiment of the present application, the second 3D point cloud is determined according to the above first image sequence, first instance mask, first 3D point cloud, and pose information of the image acquisition device.

In this embodiment of the present application, the second 3D point cloud can be understood as the sparse point cloud of the target object.

In a possible implementation, a 3D spherical model of the target object is determined according to the above pose information; the 3D spherical model can be understood as a 3D ball of interest enclosing the target object.

It can be understood that, when the image acquisition devices shoot around the target object, the visual axes formed by the devices approximately converge at a point on the object; taking this convergence point as the sphere center and the visual-axis length as the radius yields a 3D ball of interest that encloses the target object.

As an example, in practice 0.2 times the longest dimension of the first 3D point cloud can be taken as the visual-axis length, i.e., the radius. It should be understood that this value is an empirical one from practice and imposes no limitation on the embodiments of the present application.

Specifically, the 3D ball of interest can be fitted using the visual-axis vectors corresponding to the pose information.

Further, a 2D circular image of the target object is acquired according to the 3D spherical model; the 2D circular image can be understood as a 2D circle of interest enclosing the target object.

It can be understood that the 3D spherical model is back-projected onto the 2D viewpoint images according to the pose information, forming the 2D circular images.

Specifically, the projection matrices corresponding to the pose information may be used to back-project the 3D spherical model onto the first image sequence to compute the 2D circular images.
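For illustration, a minimal sketch of this back-projection is given below; the pinhole model with intrinsic matrix K, the world-to-camera rotation R and translation t, and the small-angle radius approximation are all assumptions for the sketch, since the text only refers to the projection matrix derived from the pose:

    import numpy as np

    def project_sphere(center_w, radius, K, R, t):
        """Project a 3D sphere (center_w, radius) into a view as a 2D circle.
        Returns the circle center in pixels and an approximate pixel radius."""
        c_cam = R @ center_w + t      # sphere center in camera frame
        uvw = K @ (c_cam / c_cam[2])  # pinhole projection
        f = 0.5 * (K[0, 0] + K[1, 1])  # mean focal length in pixels
        r_px = f * radius / c_cam[2]   # small-angle radius approximation
        return uvw[:2], r_px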
Further, the segmentation masks of the background objects are removed according to the 2D circular image, and the segmentation mask of the target object is determined.

In one implementation, when the 3D spherical model is projected onto a 2D viewpoint image, if the projection of its sphere center falls on one of the partial masks in the first instance mask, that partial mask can be determined to be the segmentation mask of the target object; the remaining instance masks are instance masks of background objects and can be removed.

In another implementation, the segmentation mask of the target object is determined according to the overlap between the 2D viewpoint image and the first instance mask.

As an example, when a partial mask included in the first instance mask overlaps the 2D circular image, the overlapping partial mask is determined to be the segmentation mask of the target object, and the non-overlapping masks are removed.

As another example, when a partial mask included in the first instance mask does not overlap the 2D circular image, the non-overlapping partial mask is removed, and the remaining segmentation masks are determined to be the segmentation mask of the target object; the non-overlapping partial mask is a segmentation mask of a background object.

Finally, the second 3D point cloud is determined according to the above first 3D point cloud and the segmentation mask of the target object.

It can be understood that the first 3D point cloud is back-projected onto the 2D viewpoint images according to the pose information, and the second 3D point cloud is determined by the overlap between the projected points and the segmentation mask of the target object.

Specifically, the projection matrices corresponding to the pose information are used to project the first 3D point cloud onto the first image sequence, and the second 3D point cloud is extracted according to the overlap between the 2D projected points and the segmentation mask of the target object.

As an example, when the projected 2D points of a subset of the first 3D point cloud fall on the segmentation mask of the target object, i.e., the projection overlaps the segmentation mask of the target object, that subset is judged to be the sparse point cloud of the target object, i.e., the second 3D point cloud. The other points of the first 3D point cloud then need to be removed to obtain the second 3D point cloud.
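As a single-view illustration of this test (a minimal sketch; the embodiments apply the test across the viewpoint images, and the pinhole intrinsics K are an assumption):

    import numpy as np

    def filter_points_by_mask(points_w, mask, K, R, t):
        """Keep the 3D points whose 2D projections land on the target mask.
        points_w: (N, 3) world points; mask: (H, W) 0/1 segmentation mask."""
        cam = R @ points_w.T + t[:, None]  # world -> camera, shape (3, N)
        in_front = cam[2] > 0              # discard points behind the camera
        uv = K @ (cam / cam[2])            # pinhole projection
        u = np.round(uv[0]).astype(int)
        v = np.round(uv[1]).astype(int)
        h, w = mask.shape
        valid = (u >= 0) & (u < w) & (v >= 0) & (v < h) & in_front
        keep = np.zeros(len(points_w), dtype=bool)
        keep[valid] = mask[v[valid], u[valid]] == 1  # mask uses 1 for object
        return points_w[keep]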
S320: Acquire a 2D view region according to the second 3D point cloud.

In a possible implementation, a 2D convex hull region is acquired according to the second 3D point cloud and the pose information; the 2D convex hull region includes a 2D point set within the outer contour of the target object, and the 2D point set includes the 2D projection points of the second 3D point cloud.

The 2D convex hull region can be understood as the set of pixels within an approximate outermost contour of the target object; the 2D point set enclosed by this outer contour guarantees the completeness of the target object. It can be understood that the outer contour encloses all 2D points of the target object and also some 2D points of background objects.

Specifically, the second 3D point cloud is back-projected onto the 2D viewpoint images according to the pose information, and the 2D convex hull region is determined from the projected points.

Further, edge detection is performed on the 2D convex hull region to acquire a 2D extended point set; the edge detection is used to remove, according to an edge point set, the sparse points of background objects included in the 2D convex hull region, where the 2D extended point set includes the 2D point set and the edge point set. The 2D view region of the target object is acquired according to the 2D extended point set.

It can be understood that the 2D convex hull region is not the exact contour of the target object; therefore, edge detection is performed on the 2D convex hull region to obtain a precise edge point set of the target object and hence the 2D extended point set. It can be understood that the 2D extended point set includes the 2D point set and the edge point set.

It should be noted that the purpose of the edge detection is to determine the precise contour of the target object from the edge point set; therefore, the sparse points of background objects included in the 2D convex hull region need to be removed, i.e., a morphological operation removes the points within the 2D convex hull region that lie outside the edge point set. It can be understood that the 2D convex hull region is thereby further shrunk toward a more precise contour of the target object, yielding the 2D view region of the target object.

S330: Generate a third 3D point cloud according to the 2D view region.

In this embodiment of the present application, dense reconstruction is further performed according to the 2D view region to obtain the third 3D point cloud, which is used to display the three-dimensional image of the target object.

According to the technical solution provided by the embodiments of the present application, in a multi-view 3D reconstruction scenario, the 2D ROI is extracted automatically by obtaining the 3D ROI of the target object based on prior information about the visual axes of the image acquisition device, and the 3D reconstruction of the target object is then completed. This avoids ROI false detection, missed detection, and incompleteness, and efficiently achieves a complete multi-view 3D reconstruction of the target object.
FIG. 4 shows a flowchart of a multi-view three-dimensional reconstruction method provided by an embodiment of the present application. The method 400 shown in FIG. 4 can be applied to the system 100 shown in FIG. 1, and the method 400 includes specific implementation steps of the above method 300.

The method 400 includes six steps, S4010 to S4060. The specific implementation of each step is described in detail below.

Step S4010: Acquire an image sequence.

In this embodiment of the present application, the image sequence may be a set of images obtained by multiple image acquisition devices shooting around the target object; it can be understood that the image set includes multiple images showing the target object from various angles.

It should be understood that the image sequence may be obtained directly from the image acquisition devices, obtained from other devices, or obtained from the cloud or elsewhere; this is not limited in the embodiments of the present application.

It should be noted that, regardless of how the image sequence is obtained, it is captured by multiple image acquisition devices shooting around the object.
Step S4020: Construct the input.

The input information includes a first image sequence 4021, a first instance segmentation mask 4022, a first 3D point cloud 4023, and pose information 4024.

The first image sequence 4021 is the image sequence after de-distortion processing, so that subsequent computations are more accurate.

The first instance segmentation mask 4022 includes the segmentation masks of all objects captured by the image acquisition devices in the 3D reconstruction scene; specifically, it includes the segmentation masks of the target object and of the background objects.

The first 3D point cloud 4023 includes the sparse point clouds of the target object and the background objects.

The pose information 4024 includes the parameter information of the image acquisition devices while shooting around the target object.

For a detailed explanation of the above input information, refer to step S310 of the method 300; details are not repeated here.

In a possible implementation, the constructed input information can be expressed as:
$$\mathrm{Input} = \left\{\, mask_i^j,\ points_{sfm},\ pose_j,\ view\_img_j \,\right\}, \quad j = 1, \ldots, N, \; i = 1, \ldots, M_j$$
where $\mathrm{Input}$ denotes the constructed input information;
$mask_i^j$ denotes the i-th instance segmentation mask in the prediction result of the instance segmentation model on viewpoint image j, where a segmentation mask is represented as a 0/1 matrix, 0 denoting the non-object region and 1 denoting the object region, and the matrix dimensions equal the image resolution; j and i are positive integers;
$points_{sfm}$ denotes the first 3D point cloud reconstructed by the SFM algorithm, i.e., the sparse point cloud;
$pose_j$ denotes the pose information corresponding to viewpoint image j computed by the SFM algorithm; the pose information may consist of 6 parameters, 3 representing the position vector and 3 representing the orientation vector;
$view\_img_j$ denotes viewpoint image j of the first image sequence, i.e., the de-distorted viewpoint image.
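As a purely illustrative sketch of this input structure (all field names are hypothetical, chosen to mirror the symbols above):

    from dataclasses import dataclass
    import numpy as np

    @dataclass
    class ReconstructionInput:
        """Constructed input for one capture session (illustrative only)."""
        view_imgs: list          # N de-distorted viewpoint images view_img_j
        masks: list              # per view j, a list of M_j 0/1 instance masks
        points_sfm: np.ndarray   # (P, 3) sparse point cloud from SFM
        poses: np.ndarray        # (N, 6) per-view pose: 3 position + 3 orientation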
Step S4030: Determine the second 3D point cloud according to the input information.

In this embodiment of the present application, determining the second 3D point cloud according to the input information includes the following specific steps.

Step 4031: Determine the 3D ball of interest of the target object according to the above pose information.
In a possible implementation, the 3D ball of interest is fitted using the visual-axis vectors corresponding to the pose information. Specifically, the least-squares method can be used to fit the focal point where the camera visual axes converge, thereby characterizing the 3D ball of interest. The 3D ball of interest fitted by least squares can be expressed as:
$$S(x, y, z, r) = LS\left(\{pose_j\}_{j=1}^{N}\right)$$
where S(x, y, z, r) denotes the 3D ball of interest fitted by the least-squares method, with (x, y, z) as the sphere center and r as the radius; LS(·) denotes the least-squares fitting algorithm; and N denotes the total number of viewpoint images, i.e., the number of viewpoint images included in the first image sequence, where N is a positive integer.
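A minimal sketch of this fit follows; it assumes each pose yields a camera center c_j and a unit viewing direction d_j (an assumption, since the embodiments only specify the 6 pose parameters), finds the least-squares intersection point of the viewing axes by solving a small linear system, and uses the empirical radius mentioned earlier (0.2 times the largest extent of the sparse cloud):

    import numpy as np

    def fit_interest_sphere(centers, directions, points_sfm):
        """Least-squares intersection of the camera viewing axes.
        centers: (N, 3) camera centers; directions: (N, 3) unit view axes."""
        A = np.zeros((3, 3))
        b = np.zeros(3)
        for c, d in zip(centers, directions):
            P = np.eye(3) - np.outer(d, d)  # projector orthogonal to the axis
            A += P
            b += P @ c
        center = np.linalg.solve(A, b)      # point closest to all axes
        # Empirical radius: 0.2 x the largest extent of the sparse cloud.
        extent = points_sfm.max(axis=0) - points_sfm.min(axis=0)
        radius = 0.2 * float(extent.max())
        return center, radius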
Step 4032: Acquire the 2D circle of interest of the target object according to the above pose information and the 3D ball of interest.

In a possible implementation, the 3D ball of interest is back-projected onto the 2D viewpoint images according to the projection matrices corresponding to the pose information, forming the 2D circles of interest.

Step 4033: Remove the segmentation masks of the background objects according to the 2D circle of interest.

In a possible implementation, the segmentation masks of the background objects are removed according to the 2D circle of interest, and the segmentation mask of the target object is determined. The determined segmentation mask of the target object can be expressed as:
$$refined\_mask_j = Refine\left(S,\ \{mask_i^j\}_{i=1}^{M_j}\right)$$
where Refine(·) denotes the refinement function for the segmentation masks, which can be understood as back-projecting the 3D ball of interest onto the 2D viewpoint images to determine the segmentation mask of the target object; $M_j$ denotes the number of instances in the prediction result of the instance segmentation model on viewpoint image j; and $refined\_mask_j$ denotes the determined segmentation mask of the target object on viewpoint image j.
It can be understood that, according to the above function, the masks of the background objects are removed by checking whether the 2D circle of interest and $mask_i^j$ overlap. For example, if the 2D circle of interest obtained by projecting the 3D spherical model onto a 2D viewpoint image overlaps a partial mask in the first instance mask, the mask of that overlapping part can be determined to be the segmentation mask of the target object, and the masks of the remaining, non-overlapping parts are background masks that can be removed.
Step 4034: Determine the second 3D point cloud according to the first 3D point cloud and the segmentation mask of the target object.

In a possible implementation, the first 3D point cloud is back-projected onto the 2D viewpoint images, and the second 3D point cloud is determined by the overlap between the projected points and the segmentation mask of the target object. The second 3D point cloud can be understood as a 3D bounding box, whose determination can be expressed as:
$$PC_{roi} = BB\left(points_{sfm},\ refined\_mask,\ \{pose_j\}_{j=1}^{N}\right)$$
where BB(·) denotes the 3D bounding box computation function; this function back-projects the first 3D point cloud onto the 2D viewpoint images and determines the final second 3D point cloud by judging whether each projected 2D point falls on $refined\_mask$.
Step S4040: Acquire the 2D view region according to the second 3D point cloud.

In this embodiment of the present application, acquiring the 2D view region according to the second 3D point cloud includes the following specific steps.

Step S4041: Extract the 2D convex hull region.

In a possible implementation, the 2D convex hull region is acquired according to the second 3D point cloud and the pose information, which can be expressed as:
$$convex\_hull_j = CH\left(PC_{roi},\ pose_j\right)$$
where CH(·) denotes the convex hull computation function; this function uses the pose parameters $pose_j$ corresponding to view j to back-project the second 3D point cloud $PC_{roi}$ onto the corresponding view, and then computes the convex hull $convex\_hull_j$ based on the 2D projected points.
It can be understood that a 2D point set is obtained through the extraction of the 2D convex hull region; this 2D point set can be understood as all the 2D points (and the corresponding 3D points) within the convex hull region. Specifically, the point set includes all points of the target object and also some points of background objects.
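For illustration, a minimal sketch of this step is given below; it assumes the pixel coordinates of $PC_{roi}$ projected into view j are already available (for example from a projection helper like the one sketched earlier), and uses OpenCV's convexHull as one possible hull routine rather than a prescribed one:

    import cv2
    import numpy as np

    def convex_hull_region(proj_pts, img_shape):
        """Rasterize the convex hull of the projected ROI points as a 0/1 mask.
        proj_pts: (P, 2) pixel coordinates of the second point cloud in view j."""
        hull = cv2.convexHull(proj_pts.astype(np.int32))  # hull vertices
        mask = np.zeros(img_shape[:2], dtype=np.uint8)
        cv2.fillConvexPoly(mask, hull, 1)                 # fill hull interior
        return mask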
Step S4042: Determine the 2D extended point set.

In this embodiment of the present application, edge detection is performed on the 2D convex hull region to obtain an edge point set, from which a relatively precise 2D contour of the target object can be determined.

In a possible implementation, the 2D extended point set can be expressed as:
$$pts\_ext_j = Edge\left(view\_img_j,\ convex\_hull_j,\ pts_j\right)$$
where Edge(·) denotes the 2D interest-point-set expansion function; this function performs edge detection on the region inside the convex hull $convex\_hull_j$ on viewpoint image $view\_img_j$, and takes the union of the edge-detection result and the 2D interest point set $pts_j$ to obtain the extended interest point set $pts\_ext_j$.
It can be understood that the 2D extended point set determined above includes the above 2D point set and the edge point set.
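One way to realize Edge(·) is sketched below; the Canny detector and its thresholds are assumptions for illustration, since the embodiments do not name a specific edge detector:

    import cv2
    import numpy as np

    def extend_point_set(view_img, hull_mask, proj_pts):
        """Union of the projected ROI points and the edges inside the hull."""
        gray = cv2.cvtColor(view_img, cv2.COLOR_BGR2GRAY)
        edges = cv2.Canny(gray, 50, 150)       # thresholds are illustrative
        edges[hull_mask == 0] = 0              # keep edges inside the hull only
        ev, eu = np.nonzero(edges)             # edge pixels as (row, col)
        edge_pts = np.stack([eu, ev], axis=1)  # convert to (x, y) convention
        return np.unique(np.vstack([proj_pts.astype(int), edge_pts]), axis=0)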
Step S4043: Acquire the 2D view region by a morphological operation.

In this embodiment of the present application, the purpose of the edge detection is to determine the precise contour of the target object from the edge point set; therefore, the sparse points of background objects included in the 2D convex hull region need to be removed, i.e., a morphological operation removes the points within the 2D convex hull region that lie outside the edge point set.

In a possible implementation, the morphological operation can be expressed as:
$$roi_j = Erosion\left(convex\_hull_j,\ pts\_ext_j\right)$$
where Erosion(·) denotes the 2D ROI extraction function; this function performs an erosion operation from the boundary of the convex hull $convex\_hull_j$ inward to the boundary determined by $pts\_ext_j$, and the resulting $roi_j$ is the 2D ROI on that view.
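A rough sketch of this shrinking step is given below; it assumes the extended point set has been rasterized into a 0/1 image of the same size, and the iterative erode-and-clamp scheme is one illustrative realization of the erosion toward the detected boundary, not a prescribed operator:

    import cv2
    import numpy as np

    def erode_to_roi(hull_mask, ext_mask, max_iter=100):
        """Erode the hull mask inward until it rests on the (thickened)
        boundary formed by the extended point set; masks are 0/1 uint8."""
        kernel = np.ones((3, 3), np.uint8)
        barrier = cv2.dilate(ext_mask, kernel)  # thicken the contour barrier
        roi = hull_mask.copy()
        for _ in range(max_iter):
            eroded = np.maximum(cv2.erode(roi, kernel), barrier)
            if np.array_equal(eroded, roi):     # converged onto the barrier
                break
            roi = eroded
        return roi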
步骤S4050,根据2D视图区域获取第三3D点云。Step S4050, acquiring a third 3D point cloud according to the 2D view area.
一种可能的实施方式中,根据2D视图区域进一步进行稠密重建,获得3D点云,该稠密重建可以采用任意三维重建中稠密重建的方式,本申请实施例对此不作限定。In a possible implementation manner, dense reconstruction is further performed according to the 2D view area to obtain a 3D point cloud. The dense reconstruction may be any dense reconstruction method in 3D reconstruction, which is not limited in this embodiment of the present application.
Step S4060, complete the 3D reconstruction of the target object through view fusion.
In a possible implementation, view fusion is performed on the third 3D point cloud to obtain the final reconstruction result, which is used to display the 3D image of the target object.
According to the technical solution provided by the embodiments of the present application, in a multi-view 3D reconstruction scene, the 2D ROI is extracted automatically by obtaining the 3D ROI of the target object based on the visual-axis prior information of the image acquisition device, and the 3D reconstruction of the target object is then carried out. This avoids false detection, missed detection, and incompleteness of the ROI, and efficiently achieves complete multi-view 3D reconstruction of the target object.
Fig. 5 shows a structural block diagram of an apparatus for multi-view 3D reconstruction provided by an embodiment of the present application. The apparatus 500 for multi-view 3D reconstruction includes: a first determination module 510, a second determination module 520, and a construction module 530.
The first determination module 510 is configured to determine the second 3D point cloud according to the first image sequence of the target object, the first instance mask, the first 3D point cloud, and the pose information of the image acquisition device.
In a possible implementation, the first determination module 510 determines a 3D spherical model of the target object according to the pose information; obtains a 2D circular image of the target object according to the 3D spherical model; removes the segmentation mask of the background object according to the 2D circular image to determine the segmentation mask of the target object; and determines the second 3D point cloud according to the first 3D point cloud and the segmentation mask of the target object.
In a possible implementation, the first determination module 510 fits the center and radius of the 3D spherical model from the camera pose information using the least squares method, thereby determining the 3D spherical model.
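The least-squares sphere fit can be linearized, since |x - c|² = r² is linear in (c, r² - |c|²). A minimal sketch, assuming the camera optical centers have already been recovered from the pose information:

```python
import numpy as np

def fit_sphere(centers):
    """Least-squares sphere fit to the camera optical centers.

    Uses the linearization |x|^2 = 2 c . x + k with k = r^2 - |c|^2,
    solved as A p = b for p = (c_x, c_y, c_z, k).

    centers: (N, 3) array of camera centers, N >= 4.
    """
    A = np.hstack([2.0 * centers, np.ones((len(centers), 1))])
    b = (centers ** 2).sum(axis=1)
    p, *_ = np.linalg.lstsq(A, b, rcond=None)
    center = p[:3]
    radius = float(np.sqrt(p[3] + center @ center))
    return center, radius
```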
In a possible implementation, the first determination module 510 determines the segmentation mask of the target object according to the overlap between the 2D circular image and the first instance mask.
As an optional example, when the 2D circular image overlaps with a partial mask included in the first instance mask, the overlapping partial mask is determined to be the segmentation mask of the target object and the non-overlapping masks are removed; alternatively, when a partial mask included in the first instance mask does not overlap with the 2D circular image, the non-overlapping partial mask is removed and the remaining segmentation mask is determined to be the segmentation mask of the target object, the non-overlapping partial mask being the segmentation mask of the background object.
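A minimal sketch of this overlap test, assuming binary instance masks and a rasterized circle mask; the simple pixel-intersection criterion is an illustrative choice:

```python
import numpy as np

def select_target_masks(circle_mask, instance_masks):
    """Split instance masks into target and background by testing pixel
    overlap with the 2D circular image of the 3D spherical model.

    circle_mask:    (H, W) binary mask of the projected circle.
    instance_masks: list of (H, W) binary masks from instance segmentation.
    """
    target, background = [], []
    for m in instance_masks:
        if np.any((circle_mask > 0) & (m > 0)):
            target.append(m)        # overlapping -> part of the target object
        else:
            background.append(m)    # non-overlapping -> background, removed
    return target, background
```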
In a possible implementation, the first determination module 510 is configured to project the first 3D point cloud onto a 2D view image, and determine the second 3D point cloud according to the overlap between the 2D view image and the segmentation mask of the target object.
As an optional example, when the 2D view image of a part of the first 3D point cloud overlaps with the segmentation mask of the target object, the overlapping part of the first 3D point cloud is determined to be the second 3D point cloud, and the remaining points of the first 3D point cloud that do not overlap with the segmentation mask of the target object are removed.
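This projection-and-filter step can be sketched as follows, under the same assumed pose and intrinsics conventions as above:

```python
import numpy as np

def filter_points_by_mask(points, R, t, K, target_mask):
    """Project the first (sparse) 3D point cloud into a view and keep only
    the points whose projections fall inside the target object's
    segmentation mask; all other points are treated as background.
    """
    cam = points @ R.T + t                     # world -> camera frame
    valid = cam[:, 2] > 1e-6                   # points in front of the camera
    uv = cam[valid] @ K.T
    uv = (uv[:, :2] / uv[:, 2:3]).astype(int)  # pixel coordinates (x, y)
    h, w = target_mask.shape
    inb = (uv[:, 0] >= 0) & (uv[:, 0] < w) & (uv[:, 1] >= 0) & (uv[:, 1] < h)
    keep = np.zeros(len(points), dtype=bool)
    idx = np.flatnonzero(valid)[inb]
    keep[idx] = target_mask[uv[inb, 1], uv[inb, 0]] > 0
    return points[keep]                        # second 3D point cloud
```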
The second determination module 520 is configured to obtain the 2D view region according to the second 3D point cloud.
In a possible implementation, the second determination module 520 obtains a 2D convex hull region according to the second 3D point cloud and the pose information, where the 2D convex hull region includes a 2D point set within the outer contour of the target object and the 2D point set includes the 2D projected points of the second 3D point cloud; performs edge detection on the 2D convex hull region to obtain a 2D extended point set, where the edge detection is used to remove, according to an edge point set, the sparse point cloud of background objects included in the 2D convex hull region, and the 2D extended point set includes the 2D point set and the edge point set; and obtains the 2D view region of the target object according to the 2D extended point set.
The construction module 530 is configured to generate the third 3D point cloud according to the 2D view region.
Those of ordinary skill in the art may appreciate that the units and algorithm steps of the examples described in conjunction with the embodiments disclosed herein can be implemented by electronic hardware, or by a combination of computer software and electronic hardware. Whether these functions are executed by hardware or software depends on the specific application and design constraints of the technical solution. Skilled artisans may use different methods to implement the described functions for each specific application, but such implementations should not be regarded as exceeding the scope of the present application.
Those skilled in the art can clearly understand that, for convenience and brevity of description, for the specific working processes of the systems, apparatuses, and units described above, reference may be made to the corresponding processes in the foregoing method embodiments, which are not repeated here.
In the several embodiments provided in this application, it should be understood that the disclosed systems, apparatuses, and methods may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative; for instance, the division into units is only a division by logical function, and there may be other divisions in actual implementation: multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, apparatuses, or units, and may be electrical, mechanical, or in other forms.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the present application may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit.
If the functions are implemented in the form of software functional units and sold or used as independent products, they may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application, in essence, or the part contributing to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or some of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
The above are merely specific implementations of the present application, but the protection scope of the present application is not limited thereto. Any changes or substitutions that a person skilled in the art could readily conceive within the technical scope disclosed in the present application shall fall within the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (19)

  1. A method for multi-view three-dimensional reconstruction, characterized in that the method comprises:
    determining a second three-dimensional point cloud according to a first image sequence of a target object, a first instance mask, a first three-dimensional point cloud, and pose information of an image acquisition device, wherein the first image sequence comprises a plurality of de-distorted images obtained by shooting around the target object, the first instance mask comprises a segmentation mask of the target object and a segmentation mask of a background object in the first image sequence, the first three-dimensional point cloud comprises a sparse point cloud of the target object and a sparse point cloud of the background object in the first image sequence, the pose information comprises parameter information of the image acquisition device when shooting around the target object, and the second three-dimensional point cloud comprises the sparse point cloud of the target object;
    acquiring a 2D view region according to the second three-dimensional point cloud, wherein the 2D view region comprises a region of interest of the target object;
    generating a third three-dimensional point cloud according to the 2D view region, wherein the third three-dimensional point cloud comprises a dense three-dimensional point cloud of the target object, and the dense three-dimensional point cloud is used to display the target object.
  2. The method according to claim 1, characterized in that determining the second three-dimensional point cloud according to the first image sequence of the target object, the first instance mask, the first three-dimensional point cloud, and the pose information of the image acquisition device comprises:
    determining a 3D spherical model of the target object according to the pose information;
    acquiring a 2D circular image of the target object according to the 3D spherical model;
    removing the segmentation mask of the background object according to the 2D circular image, and determining the segmentation mask of the target object;
    determining the second three-dimensional point cloud according to the first three-dimensional point cloud and the segmentation mask of the target object.
  3. The method according to claim 2, characterized in that determining the 3D spherical model of the target object according to the pose information comprises:
    fitting a center and a radius of the 3D spherical model according to the camera pose information by using a least squares method.
  4. The method according to claim 2, characterized in that removing the segmentation mask of the background object according to the 2D circular image and determining the segmentation mask of the target object comprises:
    determining the segmentation mask of the target object according to the overlap between the 2D circular image and the first instance mask, comprising:
    when the 2D circular image overlaps with a partial mask included in the first instance mask, determining the overlapping partial mask to be the segmentation mask of the target object, and removing the non-overlapping masks; or
    when a partial mask included in the first instance mask does not overlap with the 2D circular image, removing the non-overlapping partial mask, and determining the remaining segmentation mask to be the segmentation mask of the target object, wherein the non-overlapping partial mask is the segmentation mask of the background object.
  5. The method according to claim 2, characterized in that determining the second three-dimensional point cloud according to the first three-dimensional point cloud and the segmentation mask of the target object comprises:
    projecting the first three-dimensional point cloud onto a 2D view image, and determining the second three-dimensional point cloud according to the overlap between the 2D view image and the segmentation mask of the target object, comprising:
    when the 2D view image of a part of the first three-dimensional point cloud overlaps with the segmentation mask of the target object, determining the overlapping part of the first three-dimensional point cloud to be the second three-dimensional point cloud, and removing the remaining points of the first three-dimensional point cloud that do not overlap with the segmentation mask of the target object.
  6. The method according to claim 1, characterized in that acquiring the 2D view region according to the second three-dimensional point cloud comprises:
    acquiring a 2D convex hull region according to the second three-dimensional point cloud and the pose information, wherein the 2D convex hull region comprises a 2D point set within an outer contour of the target object, and the 2D point set comprises 2D projected points of the second three-dimensional point cloud;
    performing edge detection on the 2D convex hull region to acquire a 2D extended point set, wherein the edge detection is used to remove, according to an edge point set, the sparse point cloud of the background object included in the 2D convex hull region, and the 2D extended point set comprises the 2D point set and the edge point set;
    acquiring the 2D view region of the target object according to the 2D extended point set.
  7. The method according to any one of claims 1 to 6, characterized in that the parameter information of the image acquisition device when shooting around the target object comprises degree-of-freedom parameters of the image acquisition device when moving relative to the target object.
  8. The method according to any one of claims 1 to 7, characterized in that there are a plurality of the image acquisition devices.
  9. An apparatus for multi-view three-dimensional reconstruction, characterized in that the apparatus comprises:
    a first determination module, configured to determine a second three-dimensional point cloud according to a first image sequence of a target object, a first instance mask, a first three-dimensional point cloud, and pose information of an image acquisition device, wherein the first image sequence comprises a plurality of de-distorted images obtained by shooting around the target object, the first instance mask comprises a segmentation mask of the target object and a segmentation mask of a background object in the first image sequence, the first three-dimensional point cloud comprises a sparse point cloud of the target object and a sparse point cloud of the background object in the first image sequence, the pose information comprises parameter information of the image acquisition device when shooting around the target object, and the second three-dimensional point cloud comprises the sparse point cloud of the target object;
    a second determination module, configured to acquire a 2D view region according to the second three-dimensional point cloud, wherein the 2D view region comprises a region of interest of the target object;
    a construction module, configured to generate a third three-dimensional point cloud according to the 2D view region, wherein the third three-dimensional point cloud comprises a dense three-dimensional point cloud of the target object, and the dense three-dimensional point cloud is used to display the target object.
  10. The apparatus for multi-view three-dimensional reconstruction according to claim 9, characterized in that the first determination module is specifically configured to: determine a 3D spherical model of the target object according to the pose information; acquire a 2D circular image of the target object according to the 3D spherical model; remove the segmentation mask of the background object according to the 2D circular image, and determine the segmentation mask of the target object; and determine the second three-dimensional point cloud according to the first three-dimensional point cloud and the segmentation mask of the target object.
  11. The apparatus for multi-view three-dimensional reconstruction according to claim 10, characterized in that the first determination module is specifically configured to fit a center and a radius of the 3D spherical model according to the camera pose information by using a least squares method.
  12. The apparatus for multi-view three-dimensional reconstruction according to claim 10, characterized in that the first determination module is specifically configured to determine the segmentation mask of the target object according to the overlap between the 2D circular image and the first instance mask, comprising: when the 2D circular image overlaps with a partial mask included in the first instance mask, determining the overlapping partial mask to be the segmentation mask of the target object, and removing the non-overlapping masks; or when a partial mask included in the first instance mask does not overlap with the 2D circular image, removing the non-overlapping partial mask, and determining the remaining segmentation mask to be the segmentation mask of the target object, wherein the non-overlapping partial mask is the segmentation mask of the background object.
  13. The apparatus for multi-view three-dimensional reconstruction according to claim 10, characterized in that the first determination module is specifically configured to project the first three-dimensional point cloud onto a 2D view image, and determine the second three-dimensional point cloud according to the overlap between the 2D view image and the segmentation mask of the target object, comprising: when the 2D view image of a part of the first three-dimensional point cloud overlaps with the segmentation mask of the target object, determining the overlapping part of the first three-dimensional point cloud to be the second three-dimensional point cloud, and removing the remaining points of the first three-dimensional point cloud that do not overlap with the segmentation mask of the target object.
  14. The apparatus for multi-view three-dimensional reconstruction according to claim 9, characterized in that the second determination module is specifically configured to: acquire a 2D convex hull region according to the second three-dimensional point cloud and the pose information, wherein the 2D convex hull region comprises a 2D point set within an outer contour of the target object, and the 2D point set comprises 2D projected points of the second three-dimensional point cloud; perform edge detection on the 2D convex hull region to acquire a 2D extended point set, wherein the edge detection is used to remove, according to an edge point set, the sparse point cloud of the background object included in the 2D convex hull region, and the 2D extended point set comprises the 2D point set and the edge point set; and acquire the 2D view region of the target object according to the 2D extended point set.
  15. The apparatus for multi-view three-dimensional reconstruction according to any one of claims 9 to 14, characterized in that the parameter information of the image acquisition device when shooting around the target object comprises degree-of-freedom parameters of the image acquisition device when moving relative to the target object.
  16. The apparatus for multi-view three-dimensional reconstruction according to any one of claims 9 to 15, characterized in that there are a plurality of the image acquisition devices.
  17. An electronic device, characterized in that the electronic device comprises:
    a processor and a memory, wherein the memory is configured to store program instructions, and the processor is configured to invoke the program instructions to perform the method according to any one of claims 1 to 8.
  18. A computer-readable storage medium, characterized in that the computer-readable storage medium stores program code for execution by a device, and the program code comprises instructions for performing the method according to any one of claims 1 to 8.
  19. A chip, characterized in that the chip comprises a processor and a data interface, wherein the processor reads, through the data interface, instructions stored in a memory to perform the method according to any one of claims 1 to 8.
PCT/CN2022/133598 2021-11-25 2022-11-23 Multi-view three-dimensional reconstruction method WO2023093739A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111414740.6A CN116168143A (en) 2021-11-25 2021-11-25 Multi-view three-dimensional reconstruction method
CN202111414740.6 2021-11-25

Publications (1)

Publication Number Publication Date
WO2023093739A1 true WO2023093739A1 (en) 2023-06-01

Family

ID=86420653

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/133598 WO2023093739A1 (en) 2021-11-25 2022-11-23 Multi-view three-dimensional reconstruction method

Country Status (2)

Country Link
CN (1) CN116168143A (en)
WO (1) WO2023093739A1 (en)


Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111145238A (en) * 2019-12-12 2020-05-12 中国科学院深圳先进技术研究院 Three-dimensional reconstruction method and device of monocular endoscope image and terminal equipment
CN113129329A (en) * 2019-12-31 2021-07-16 中移智行网络科技有限公司 Method and device for constructing dense point cloud based on base station target segmentation
US20210350616A1 (en) * 2020-05-07 2021-11-11 Toyota Research Institute, Inc. System and method for estimating depth uncertainty for self-supervised 3d reconstruction
CN113192206A (en) * 2021-04-28 2021-07-30 华南理工大学 Three-dimensional model real-time reconstruction method and device based on target detection and background removal

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117173463A (en) * 2023-08-30 2023-12-05 北京长木谷医疗科技股份有限公司 Bone joint model reconstruction method and device based on multi-classification sparse point cloud
CN116993923A (en) * 2023-09-22 2023-11-03 长沙能川信息科技有限公司 Three-dimensional model making method, system, computer equipment and storage medium for converter station
CN116993923B (en) * 2023-09-22 2023-12-26 长沙能川信息科技有限公司 Three-dimensional model making method, system, computer equipment and storage medium for converter station
CN117274512A (en) * 2023-11-23 2023-12-22 岭南现代农业科学与技术广东省实验室河源分中心 Plant multi-view image processing method and system
CN117274512B (en) * 2023-11-23 2024-04-26 岭南现代农业科学与技术广东省实验室河源分中心 Plant multi-view image processing method and system

Also Published As

Publication number Publication date
CN116168143A (en) 2023-05-26

Similar Documents

Publication Publication Date Title
CN111243093B (en) Three-dimensional face grid generation method, device, equipment and storage medium
CN108509848B (en) The real-time detection method and system of three-dimension object
WO2023093739A1 (en) Multi-view three-dimensional reconstruction method
CN107223269B (en) Three-dimensional scene positioning method and device
CN109003325B (en) Three-dimensional reconstruction method, medium, device and computing equipment
JP7453470B2 (en) 3D reconstruction and related interactions, measurement methods and related devices and equipment
WO2020001168A1 (en) Three-dimensional reconstruction method, apparatus, and device, and storage medium
US20200058153A1 (en) Methods and Devices for Acquiring 3D Face, and Computer Readable Storage Media
CN111710036B (en) Method, device, equipment and storage medium for constructing three-dimensional face model
WO2020034785A1 (en) Method and device for processing three-dimensional model
WO2019196745A1 (en) Face modelling method and related product
CN113012293A (en) Stone carving model construction method, device, equipment and storage medium
CN110276774B (en) Object drawing method, device, terminal and computer-readable storage medium
CN112651881B (en) Image synthesizing method, apparatus, device, storage medium, and program product
CN115439607A (en) Three-dimensional reconstruction method and device, electronic equipment and storage medium
CN113361365B (en) Positioning method, positioning device, positioning equipment and storage medium
CN111161398A (en) Image generation method, device, equipment and storage medium
WO2023116430A1 (en) Video and city information model three-dimensional scene fusion method and system, and storage medium
CN113643414A (en) Three-dimensional image generation method and device, electronic equipment and storage medium
JP6347610B2 (en) Image processing apparatus and three-dimensional spatial information acquisition method
WO2019042028A1 (en) All-around spherical light field rendering method
JP6086491B2 (en) Image processing apparatus and database construction apparatus thereof
CN112562067A (en) Method for generating large-batch point cloud data sets
CN111652807B (en) Eye adjusting and live broadcasting method and device, electronic equipment and storage medium
KR20220026423A (en) Method and apparatus for three dimesiontal reconstruction of planes perpendicular to ground

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22897822

Country of ref document: EP

Kind code of ref document: A1