WO2019161813A1

WO2019161813A1 - Dynamic scene three-dimensional reconstruction method, apparatus and system, server, and medium

Info

Publication number: WO2019161813A1
Application number: PCT/CN2019/083816
Authority: WO
Inventors: 方璐; 刘烨斌; 许岚; 程巍; 戴琼海
Original assignee: 清华-伯克利深圳学院筹备办公室
Priority date: 2018-02-23
Filing date: 2019-04-23
Publication date: 2019-08-29
Also published as: CN108335353A; US20210074012A1; CN108335353B; US11954870B2

Abstract

Disclosed are a dynamic scene three-dimensional reconstruction method, apparatus and system, a server, and a medium. The method comprises: obtaining a plurality of continuous depth image sequences of a dynamic scene, the plurality of continuous depth image sequences being photographed by an unmanned aerial vehicle array equipped with depth cameras; combining the plurality of continuous depth image sequences, and establishing a three-dimensional reconstruction model of the dynamic scene; calculating a target observation point of the unmanned aerial vehicle array according to the three-dimensional reconstruction model and a current position of the unmanned aerial vehicle array; instructing the unmanned aerial vehicle array to move to the target observation point for photographing, and updating the three-dimensional reconstruction model according to a plurality of continuous depth image sequences photographed by the unmanned aerial vehicle array at the target observation point.

Description

3D reconstruction method for dynamic scene, device and system, server, medium

The present application claims the priority of the Chinese Patent Application No. 201101155616.4, filed on Jan. 23, 2011, the entire disclosure of which is hereby incorporated by reference.

Technical field

Embodiments of the present invention relate to the field of computer vision technology, for example, to a three-dimensional reconstruction method of a dynamic scene, an apparatus and system, a server, and a medium.

Background technique

With the gradual popularization of consumer-grade depth cameras, especially the latest Iphone X has a built-in depth camera based on structured light, which makes virtual reality and mixed reality applications based on dynamic 3D reconstruction possible, and has broad application prospects. Important application value.

Related dynamic scene 3D reconstruction methods often rely on expensive laser scanners, although the accuracy is high, but the scanning process relies on additional wearable sensors, which compromises the comfort of the collector. In addition, the camera array system can also be used to realize dynamic scene 3D reconstruction. However, this method is limited by the fixed camera array, the shooting space is very limited, and additional human resources are needed to control the camera and select the shooting viewpoint. The rebuild process cannot be fully automated.

Summary of the invention

Embodiments of the present invention provide a three-dimensional reconstruction method for a dynamic scene, a device and a system, a server, and a medium, to overcome the defect of how to automatically complete the three-dimensional reconstruction of the dynamic scene without affecting the comfort of the collector and being restricted by the shooting space.

In a first aspect, an embodiment of the present invention provides a method for three-dimensional reconstruction of a dynamic scene, where the method includes:

Acquiring a plurality of consecutive depth image sequences of the dynamic scene, wherein the plurality of consecutive depth image sequences are captured by a drone array equipped with a depth camera;

Performing fusion on the plurality of consecutive depth image sequences to establish a three-dimensional reconstruction model of the dynamic scene;

Calculating a target observation point of the UAV array according to the three-dimensional reconstruction model and the current pose of the UAV array;

The UAV array is instructed to move to the target observation point for shooting, and the three-dimensional reconstruction model is updated according to a plurality of consecutive depth image sequences captured by the UAV array at the target observation point.

In a second aspect, the embodiment of the present invention further provides a three-dimensional reconstruction device for a dynamic scene, the device comprising:

An image sequence acquisition module, configured to acquire a plurality of consecutive depth image sequences of the dynamic scene, wherein the plurality of consecutive depth image sequences are captured by a drone array equipped with a depth camera;

An image fusion module is configured to fuse the plurality of consecutive depth image sequences to establish a three-dimensional reconstruction model of the dynamic scene;

a target observation point calculation module, configured to calculate a target observation point of the UAV array according to the three-dimensional reconstruction model and the current pose of the UAV array;

And a reconstruction model updating module is configured to instruct the UAV array to move to the target observation point for shooting, and update the three-dimensional reconstruction model according to a plurality of consecutive depth image sequences captured by the UAV array at the target observation point.

In a third aspect, an embodiment of the present invention further provides a three-dimensional reconstruction system for a dynamic scene, where the system includes an unmanned aerial vehicle array and a three-dimensional reconstruction platform;

Wherein each drone in the UAV array is equipped with a depth camera, which is set to capture a depth image sequence of the dynamic scene;

The three-dimensional reconstruction platform includes a three-dimensional reconstruction device for a dynamic scene according to any one of the embodiments of the present application, and is configured to generate a three-dimensional reconstruction model of the dynamic scene according to a plurality of consecutive depth image sequences captured by the UAV array.

In a fourth aspect, the embodiment of the present invention further provides a server, including:

One or more processors;

a storage device configured to store one or more programs,

When the one or more programs are executed by the one or more processors, the one or more processors implement a three-dimensional reconstruction method of a dynamic scene as described in any of the embodiments of the present application.

In a fifth aspect, the embodiment of the present invention further provides a computer readable storage medium, where the computer program is stored, and the program is executed by the processor to implement a three-dimensional reconstruction method of a dynamic scene according to any embodiment of the present application. .

BRIEF abstract

1 is a flowchart of a method for reconstructing a dynamic scene according to Embodiment 1 of the present invention;

2 is a flowchart of a method for reconstructing a dynamic scene according to Embodiment 2 of the present invention;

3 is a flowchart of a method for reconstructing a dynamic scene according to Embodiment 3 of the present invention;

4 is a schematic structural diagram of a three-dimensional reconstruction apparatus for a dynamic scene according to Embodiment 4 of the present invention;

5 is a schematic structural diagram of a three-dimensional reconstruction system for a dynamic scene according to Embodiment 5 of the present invention;

FIG. 6 is a schematic structural diagram of a server according to Embodiment 6 of the present invention.

Detailed ways

The present application will be further described in detail below with reference to the accompanying drawings and embodiments. It is understood that the specific embodiments described herein are merely illustrative of the application and are not intended to be limiting. In addition, it should be noted that, for the convenience of description, only some but not all of the structures related to the present application are shown in the drawings.

Embodiment 1

1 is a flowchart of a method for three-dimensional reconstruction of a dynamic scene according to Embodiment 1 of the present invention. The present embodiment is applicable to a situation in which a dynamic scene is three-dimensionally reconstructed, such as a scene in which a dancer dances on a stage. The method can be performed by a three-dimensional reconstruction device of a dynamic scene, the device can be implemented in software and/or hardware, and can be integrated in a server. As shown in FIG. 1, the method may include, for example, steps S110 to S140.

In step S110, a plurality of consecutive depth image sequences of the dynamic scene are acquired.

The plurality of consecutive depth image sequences are captured by a drone array equipped with a depth camera.

The UAS array may include multiple UAVs, for example, 3 or 5 U.S. units, for example, may be configured according to the actual needs of the dynamic scenario, which is not limited in this embodiment of the present invention. Different drones in the UAV array can simultaneously capture dynamic scenes at different viewpoints to obtain depth image sequences of dynamic scenes from different angles for better 3D reconstruction.

Each drone is equipped with a depth camera for taking deep images of dynamic scenes. A depth image is an image or image channel that contains information about the distance from the surface of the scene object of the viewpoint. Each pixel value in the depth image is the actual distance of the camera from the object, by which a three-dimensional model can be constructed. The shooting by the drone array equipped with the depth camera is not restricted by the shooting space as in the fixed camera array of the related art, and the drone array can be controlled to automatically take the image.

Initially, the drone array can be controlled to be located at the initial position above the dynamic scene and simultaneously photographed, because the dynamic scene is three-dimensionally reconstructed, and the position or posture of the person or scene in the dynamic scene changes in real time, so each The unmanned person continues to shoot and sends the captured continuous depth image sequence to the 3D reconstruction device for processing in real time. The continuous depth image sequence refers to a sequence of depth images continuously captured in chronological order. Generally, the depth camera can continuously capture 30 frames of images per second, and each image is arranged in chronological order to obtain a sequence of images.

In an embodiment, the acquiring a plurality of consecutive depth image sequences of the dynamic scene includes:

Obtaining a plurality of original depth image sequences of the dynamic scene captured by the UAV array;

Aligning the plurality of original depth image sequences according to a synchronization time stamp to obtain the plurality of consecutive depth image sequences.

Wherein, the plurality of original depth image sequences are original image sequences captured by different drones, and although the UAV array is simultaneously photographed, there is a certain error in time, therefore, it is required These original depth image sequences are aligned according to the synchronization time stamp, thereby ensuring temporal consistency of different depth image sequences captured by different drones from different perspectives, thereby improving the accuracy of the reconstruction model.

In step S120, the plurality of consecutive depth image sequences are fused to establish a three-dimensional reconstruction model of the dynamic scene.

In an embodiment, the continuous depth image sequence may be projected into the three-dimensional space according to the internal reference matrix of the depth camera to obtain a three-dimensional point cloud, and then the three-dimensional point cloud is registered and fused, and finally a three-dimensional reconstruction model is established. When implemented, it can be performed by using the registration and fusion algorithms in the related art, and will not be repeated here.

In step S130, the target observation point of the UAV array is determined according to the three-dimensional reconstruction model and the current pose of the UAV array.

Since the position or posture of a person or a scene in a dynamic scene changes in real time, how to make the drone array conform to the change of the dynamic scene at the most suitable position is an important issue that determines the effect of reconstructing the model. Therefore, in the embodiment of the present invention, the target observation point of the UAV array at the next moment is calculated in real time, and the target observation point is the optimal observation point, thereby indicating that the UAV array moves to the target observation in real time. The point is taken to update the reconstructed model to achieve accurate reproduction of the dynamic scene.

Wherein, the pose can be represented by a rotation angle and a translation distance of the drone, and correspondingly, the control of the drone can also include two parameters of rotation and translation. After the target observation point is determined, the drone can be controlled to move to the optimal target observation point, and the rotation angle of the drone from the current observation point to the target observation point is controlled, that is, the optimal shooting of the drone is controlled. Viewpoint. The target observation point can be determined according to a preset standard, and the shooting points that may exist in the drone are evaluated, and the shooting point whose evaluation result meets the standard is determined as the best target observation point. Wherein, the possible shooting points may be determined according to the current pose of the UAV array, and the evaluation process may be performed according to the possible existing observation points and the established three-dimensional reconstruction model, and the evaluation may exist in different possible In the observation point, the effect of the three-dimensional reconstruction model established by the depth image sequence captured by which observation point conforms to a preset criterion, for example, by calculating the energy function of the observation point that may exist.

In step S140, the drone array is instructed to move to the target observation point for shooting, and the three-dimensional reconstruction model is updated according to a plurality of consecutive depth image sequences captured by the UAV array at the target observation point.

The technical solution of the embodiment uses the UAV array to capture the dynamic scene, and performs image fusion according to the captured multiple consecutive depth image sequences to obtain a three-dimensional reconstruction model of the dynamic scene. Therefore, it is not necessary to rely on additional equipment to ensure the The comfort of the collector. Moreover, in the reconstruction process, the calculation of the target observation point in real time indicates that the UAV array moves to the target observation point for shooting, and the model is updated according to a plurality of consecutive depth image sequences captured by the UAV array at the target observation point, thereby A more accurate 3D reconstruction model is obtained, and the reconstruction process can be completed automatically without being restricted by the shooting space.

Embodiment 2

FIG. 2 is a flowchart of a three-dimensional reconstruction method of a dynamic scene according to Embodiment 2 of the present invention. As shown in FIG. 2, the method may include, for example, steps S210 to S270.

In step S210, a plurality of consecutive depth image sequences of the dynamic scene are acquired.

In step S220, the plurality of consecutive depth image sequences are fused, and the key frame reconstructed body is determined according to a preset period.

In step S230, in each preset period, determining a deformation parameter of the non-rigid deformation node in the current key frame reconstruction body, and updating the reconstruction model in the current key frame reconstruction body to the current data frame reconstruction according to the deformation parameter In the body.

The current data frame reconstructed body refers to a reconstructed body in real time at each moment.

In step S240, a three-dimensional reconstruction model of the dynamic scene is extracted from the current data frame reconstructed body.

In step S250, the current key frame reconstructed body is replaced with the current data frame reconstructed body as the key frame reconstructed body in the next preset period.

Wherein, in step S220-step S250, the depth image is actually merged by using a key frame strategy, and a real-time reconstruction model of the dynamic scene is obtained.

In an embodiment, an initial three-dimensional reconstruction model may be established by image fusion according to a plurality of consecutive depth image sequences, and then a key frame reconstruction body of the three-dimensional reconstruction model is determined according to a preset period, for example, 100 frames, and each preset is During the period, the operations of steps S230 to S250 are performed. The non-rigid deformation node may represent a node in which a character or a scene changes in a dynamic scene, and the reconstruction model in the current key frame reconstruction body is updated to the current data frame reconstruction body by determining the deformation parameter of the non-rigid deformation node, and then from the current The 3D reconstruction model is extracted from the data frame reconstruction body, so that the details of the dynamic scene can be captured, the accuracy of the reconstruction model can be improved, errors and confusion can be avoided, and the jam can be avoided. Finally, the current data frame reconstruction body is used to replace the current key frame reconstruction body as the key frame reconstruction body in the next preset period, thereby implementing the iteration of the current data frame reconstruction body and the key frame reconstruction body, thereby realizing each of the dynamic scenes. A recurrence of a dynamically changing scene.

Among them, the reconstructed body can be understood as a hypothesis in the process of 3D reconstruction. It is assumed that the reconstructed body can surround the entire dynamic scene (or reconstructed object). The reconstructed body is composed of many uniform voxels, through registration, fusion and other algorithms. A three-dimensional reconstruction model of the dynamic scene is built on top of this reconstructed body. The nodes in the reconstructed body and the deformation parameters of the nodes represent the characteristics of the dynamic scene. Therefore, the three-dimensional reconstruction model of the dynamic scene can be extracted from the reconstructed body. In this embodiment, by integrating the depth image by the key frame strategy described above, errors caused by data fusion when the point cloud registration is inaccurate can be avoided.

Wherein, the deformation parameter includes rotation and translation parameters of each deformation node, and the calculation process thereof can be obtained, for example, by solving an energy equation of a non-rigid motion constraint, the energy equation being composed of a non-rigid motion constraint term and a local rigid motion constraint term. They are represented by the following formulas:

Wherein, in the non-rigid motion constraint terms in E _n,

with

Representing the vertex coordinates of the model and its normal direction driven by the non-rigid motion, respectively, u _i represents the position coordinates of the 3D point cloud in the same matching point pair, and c _i represents the i-th element in the set of matching point pairs.

with

Represents the vertex coordinates of the model and its normal direction driven by non-rigid motion. Non-rigid motion constraints guarantee term E _n through the non-rigid motion model driven and aligned as the three-dimensional point cloud obtained from the depth map.

In the local non-rigid motion constraint term E _g , i represents the i-th vertex on the model,

Represents a collection of adjacent vertices around the ith vertex on the model,

with

Representing the driving effects of known non-rigid motion on the surface vertices v _i and v _j of the model, respectively.

with

It represents the positional transformation effect of the non-rigid motion acting on v _i and v _j simultaneously on v _j , that is, to ensure that the non-rigid driving effects of adjacent vertices on the model are as uniform as possible.

The non-rigid motion constraint term E _n ensures that the model driven by the non-rigid motion is aligned with the three-dimensional point cloud obtained from the depth image as much as possible, and the local rigid motion constraint E _g can be guaranteed while the model as a whole is subjected to the local rigid constraint motion. A large amount of reasonable non-rigid motion can also be well solved. In order to make use of the exponential mapping method, the deformed vertices are approximated as follows:

among them,

The cumulative transformation matrix of the model vertex v _i up to the previous frame is a known quantity; I is a four-dimensional unit matrix;

make

That is, the model vertices after the previous frame transformation are transformed:

For each vertex, the unknown parameter that requires the solution is the six-dimensional transformation parameter x = (v ₁ , v ₂ , v ₃ , w _x , w _y , w _z ) ^T .

In step S260, the target observation point of the UAV array is determined according to the three-dimensional reconstruction model and the current pose of the UAV array.

In step S270, the drone array is instructed to move to the target observation point for shooting, and the three-dimensional reconstruction model is updated according to a plurality of consecutive depth image sequences captured by the drone array at the target observation point.

The technical solution of the embodiment utilizes an unmanned aerial vehicle array to capture a dynamic scene, performs image fusion according to a plurality of consecutive continuous depth image sequences, and adopts a key frame strategy to finally obtain a three-dimensional reconstruction model of the dynamic scene, which is not subject to shooting. On the basis of space constraints and automatic completion of the reconstruction process, the accuracy of the reconstruction model is improved, and errors caused by inaccurate registration of point clouds are avoided. At the same time, there is no need to rely on additional equipment to ensure the comfort of the collector.

Embodiment 3

FIG. 3 is a flowchart of a three-dimensional reconstruction method of a dynamic scene according to Embodiment 3 of the present invention. In this embodiment, the calculation operation of the target observation point is further optimized based on the foregoing embodiment. As shown in FIG. 3, the method may include, for example, steps S310 to S360.

In step S310, a plurality of consecutive depth image sequences of the dynamic scene are acquired.

In step S320, the plurality of consecutive depth image sequences are fused to establish a three-dimensional reconstruction model of the dynamic scene.

In step S330, according to the current pose of the UAV array, the spatial neighborhood is rasterized to establish a set of candidate observation points.

The current pose of the UAV array characterizes the current observation point of each UAV in the UAV array, including the current coordinates and shooting angle. The spatial neighborhood range is delineated according to the current pose and the preset distance, and the candidate observation points are determined by rasterizing the spatial neighborhood, that is, each node represents a candidate observation point after rasterization.

In step S340, the total energy value of each of the candidate observation points in the set of candidate observation points is determined using the validity energy function.

In step S350, candidate observation points whose total energy values meet the preset criteria are taken as the target observation points.

In an embodiment, the validity energy function includes a depth energy term, a central energy term, and a motion energy term;

Wherein, the depth energy term is used to determine the extent to which the average depth value of the candidate observation point is close to the target depth value;

The central energy term is used to determine how close the reconstructed model observed by the candidate observation point is to the central portion of the captured image frame;

The kinetic energy term is used to determine the amount of motion occurring in the dynamic scene observed by the candidate observation point.

In an embodiment, the validity energy function is represented by the following formula:

E _t =λ _d E _d +λ _c E _c +λ _m E _m

Where E _t is the total energy term, Ed is the depth energy term, Ec is the central energy term, Em is the motion energy term, and λ _d , λ _c and λ _m correspond to the depth energy term, the central energy term and the motion energy term, respectively. Weight coefficient;

The depth energy term, the center energy term, and the motion energy term are respectively represented by the following formula:

E _d =ψ(d _avg -d _o )

Where T _c and T _V are the poses of the UAV array and the candidate observation points in the reconstruction model respectively; t _v is the translation component of the pose of the candidate observation point; x _n is the voxel of the reconstruction model hit by the ray N _x is the normal of the voxel; x _i is the node where the reconstructed model is non-rigid deformed; x' _i is the node after the non-rigid deformation; π() is the projection perspective transform from the three-dimensional space to the two-dimensional image plane d _avg and d _o represent the average depth value and the target depth value of the candidate observation point respectively; the ψ() function represents the penalty term for the distance; r is the light of the candidate observation point projected through the reconstructed model; du and dv represent respectively The average projected pixel abscissa and ordinate of the reconstructed model at the candidate observation points; λ is the damping factor; φ1 and φ2 are used to calculate the motion information of all the rays of the candidate observation points and the motion information of all the observed deformation nodes, respectively.

Through the weighted summation of the depth energy term, the central energy term and the motion energy term, the candidate observation points can be comprehensively evaluated to determine the effect of the 3D reconstruction model established by the depth image sequence captured by the UAV at which observation point. The standard is to comprehensively consider the average depth, average center degree and cumulative motion information of the depth image collected at the candidate observation points, so that the depth image acquired at the target observation point is more favorable for the current dynamic scene reconstruction. In an embodiment, candidate observation points with the largest total energy value may be selected as the optimal target observation points.

In step S360, the drone array is instructed to move to the target observation point for shooting, and the three-dimensional reconstruction model is updated according to a plurality of consecutive depth image sequences captured by the drone array at the target observation point.

The technical solution of the embodiment uses a UAV array to capture a dynamic scene, performs image fusion according to the captured multiple consecutive depth image sequences, obtains a three-dimensional reconstruction model of the dynamic scene, and performs candidate observation points through a validity energy function. Calculate and evaluate, determine the optimal target observation point, and instruct the drone array to move to the target observation point for shooting, thereby not only achieving automatic shooting and reconstruction, but also improving the reconstruction effect of the three-dimensional model, and is simple and easy Line, has broad application prospects.

Embodiment 4

4 is a schematic structural diagram of a three-dimensional reconstruction apparatus for a dynamic scene according to Embodiment 4 of the present invention. This embodiment can be applied to a case where a dynamic scene is three-dimensionally reconstructed, such as a scene in which a dancer dances on a stage. The three-dimensional reconstruction device of the dynamic scene provided by the embodiment of the present invention can perform the three-dimensional reconstruction method of the dynamic scene provided by any embodiment of the present application, and has the corresponding functional modules and beneficial effects of the execution method. As shown in FIG. 4, the apparatus includes an image sequence acquisition module 410, an image fusion module 420, a target observation point calculation module 430, and a reconstruction model update module 440.

The image sequence acquisition module 410 is configured to acquire a plurality of consecutive depth image sequences of the dynamic scene, wherein the plurality of consecutive depth image sequences are captured by a drone array equipped with a depth camera.

The image fusion module 420 is configured to fuse the plurality of consecutive depth image sequences to establish a three-dimensional reconstruction model of the dynamic scene.

The target observation point calculation module 430 is configured to determine a target observation point of the UAV array according to the three-dimensional reconstruction model and the current pose of the UAV array.

The reconstruction model update module 440 is configured to instruct the UAV array to move to the target observation point for shooting, and update the three-dimensional reconstruction model according to a plurality of consecutive depth image sequences captured by the UAV array at the target observation point.

In an embodiment, the image sequence acquisition module 410 includes:

An original image sequence acquiring unit, configured to acquire a plurality of original depth image sequences of the dynamic scene captured by the UAV array;

And an image sequence alignment unit configured to align the plurality of original depth image sequences according to a synchronization time stamp to obtain the plurality of consecutive depth image sequences.

In an embodiment, the image fusion module 420 is further configured to:

The plurality of consecutive depth image sequences are fused, and the key frame reconstructed body is determined according to a preset period, and in each preset period, the following operations are performed:

Determining a deformation parameter of the non-rigid deformation node in the current key frame reconstruction body, and updating the reconstruction model in the current key frame reconstruction body to the current data frame reconstruction body according to the deformation parameter, wherein the current data frame reconstruction body is Refers to the real-time reconstructed body at each moment;

Extracting a three-dimensional reconstruction model of the dynamic scene from a current data frame reconstructed body;

The current data frame reconstructed body is replaced with the current data frame reconstructed body as a key frame reconstructed body in the next preset period.

In an embodiment, the target observation point calculation module 430 includes:

The candidate observation point establishing unit is configured to rasterize the spatial neighborhood according to the current pose of the UAV array to establish a set of candidate observation points;

An energy value calculation unit configured to determine a total energy value of each of the candidate observation points in the set of candidate observation points by using a validity energy function;

The target observation point determining unit is configured to select a candidate observation point whose total energy value conforms to a preset criterion as the target observation point.

E _t =λ _d E _d +λ _c E _c +λ _m E _m

E _d =ψ(d _avg -d _o )

Embodiment 5

FIG. 5 is a schematic structural diagram of a three-dimensional reconstruction system for a dynamic scene according to Embodiment 5 of the present invention. As shown in FIG. 5, the UAV array 1 and the three-dimensional reconstruction platform 2 are included.

Among them, each drone in the UAV array 1 is equipped with a depth camera, which is set to capture a depth image sequence of a dynamic scene. Exemplarily, FIG. 5 shows that the UAV array 1 includes three UAVs, namely the UAV 11, the UAV 12 and the UAV 13, but the embodiment of the present invention is in the UAV array. The number of drones is not limited, and can be configured according to the actual situation of the dynamic scene to be reconstructed.

The three-dimensional reconstruction platform 2 includes the three-dimensional reconstruction device 21 of the dynamic scene described in any of the above embodiments, and is configured to generate a three-dimensional reconstruction model of the dynamic scene according to the plurality of consecutive depth image sequences captured by the UAV array.

In an embodiment, the three-dimensional reconstruction platform 2 further includes a wireless communication module 22, and is wirelessly connected to the UAV array 1 and configured to receive a plurality of consecutive depth image sequences captured by the UAV array, and is further configured to set the three-dimensional reconstruction device 22 The calculated position information of the target observation point is sent to the drone array 1.

Correspondingly, each of the UAV arrays 1 further includes a navigation module configured to control the drone to move to the target observation point to capture the dynamic scene according to the position information.

Embodiment 6

FIG. 6 is a schematic structural diagram of a server according to Embodiment 6 of the present invention. FIG. 6 shows a block diagram of an exemplary server 612 suitable for use in implementing embodiments of the present invention. The server 612 shown in FIG. 6 is merely an example and should not impose any limitation on the function and scope of use of the embodiments of the present invention.

As shown in Figure 6, server 612 is represented in the form of a general purpose server. Components of server 612 may include, but are not limited to, one or more processors 616, storage 628, and bus 618 that connect different system components, including storage 628 and processor 616.

Bus 618 represents one or more of several types of bus structures, including a memory device bus or memory device controller, a peripheral bus, a graphics acceleration port, a processor, or a local bus using any of a variety of bus structures. For example, these architectures include, but are not limited to, the Industry Subversive Alliance (ISA) bus, the Micro Channel Architecture (MAC) bus, the Enhanced ISA Bus, and the Video Electronics Standards Association. Association, VESA) Local Bus and Peripheral Component Interconnect (PCI) bus.

Server 612 typically includes a variety of computer system readable media. These media can be any available media that can be accessed by server 612, including volatile and non-volatile media, removable and non-removable media.

Storage device 628 can include computer system readable media in the form of volatile memory, such as random access memory (RAM) 630 and/or cache memory 632. Server 612 may further include other removable/non-removable, volatile/non-volatile computer system storage media. By way of example only, storage system 634 can be used to read and write non-removable, non-volatile magnetic media (not shown in Figure 6, commonly referred to as "hard disk drives"). Although not shown in FIG. 6, a disk drive for reading and writing to a removable non-volatile disk (for example, a "floppy disk"), and a removable non-volatile disk such as a read-only disk (Compact Disc Read) may be provided. -Only Memory, CD-ROM), Digital Video Disc-Read Only Memory (DVD-ROM) or other optical media. In these cases, each drive can be coupled to bus 618 via one or more data medium interfaces. Storage device 628 can include at least one program product having a set (eg, at least one) of program modules configured to perform the functions of the various embodiments of the present application.

A program/utility 640 having a set (at least one) of program modules 642, which may be stored, for example, in storage device 628, such program program 642 includes but is not limited to an operating system, one or more applications, other program modules, and programs Data, each of these examples or some combination may include an implementation of a network environment. Program module 642 typically performs the functions and/or methods of the embodiments described herein.

Server 612 may also be in communication with one or more external devices 614 (eg, a keyboard, pointing device, display 624, etc.), and may also be in communication with one or more devices that enable a user to interact with the server 612, and/or Server 612 can communicate with any device (e.g., network card, modem, etc.) that is in communication with one or more other computing devices. This communication can take place via an input/output (I/O) interface 622. Moreover, the server 612 can also communicate with one or more networks (eg, a local area network (LAN), a wide area network (WAN), and/or a public network, such as the Internet) through the network adapter 620. As shown in FIG. 6, network adapter 620 communicates with other modules of server 612 via bus 618. It should be understood that although not shown in the figures, other hardware and/or software modules may be utilized in connection with server 612, including but not limited to: microcode, device drivers, redundant processors, external disk drive arrays, disk arrays (Redundant Arrays of Independent Disks, RAID) systems, tape drives, and data backup storage systems.

The processor 616 performs a three-dimensional reconstruction method of the dynamic scene provided by the embodiment of the present invention by executing a program stored in the storage device 628, and the method includes:

Determining a target observation point of the UAV array according to the three-dimensional reconstruction model and the current pose of the UAV array;

Example 7

The seventh embodiment of the present invention further provides a computer readable storage medium, where the computer program is stored, and the program is executed by the processor to implement a three-dimensional reconstruction method of a dynamic scene according to the embodiment of the present invention, including:

The computer storage medium of the embodiments of the present invention may employ any combination of one or more computer readable mediums. The computer readable medium can be a computer readable signal medium or a computer readable storage medium. The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples (non-exhaustive lists) of computer readable storage media include: electrical connections having one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read only memory (ROM), Erasable Programmable Read Only Memory (EPROM), optical fiber, portable compact disk read only memory (CD-ROM), optical storage device, magnetic storage device, or any suitable combination of the foregoing . In this document, a computer readable storage medium can be any tangible medium that can contain or store a program, which can be used by or in connection with an instruction execution system, apparatus or device.

The computer readable signal medium may comprise a data signal that is propagated in the baseband or as part of a carrier, carrying computer readable program code. Such propagated data signals can take a variety of forms including, but not limited to, electromagnetic signals, optical signals, or any suitable combination of the foregoing. The computer readable signal medium can also be any computer readable medium other than a computer readable storage medium, which can transmit, propagate, or transport a program for use by or in connection with the instruction execution system, apparatus, or device. .

Program code embodied on a computer readable medium can be transmitted by any suitable medium, including but not limited to wireless, wire, optical cable, radio frequency (RF), and the like, or any suitable combination of the foregoing.

Computer program code for performing the operations of the present application may be written in one or more programming languages, or a combination thereof, including an object oriented programming language such as Java, Smalltalk, C++, and conventional Procedural programming language—such as the "C" language or a similar programming language. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer, partly on the remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer can be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or can be connected to an external computer (eg, using an Internet service provider) Internet connection).

Claims

A three-dimensional reconstruction method for a dynamic scene, comprising:

Acquiring a plurality of consecutive depth image sequences of the dynamic scene, wherein the plurality of consecutive depth image sequences are captured by a drone array equipped with a depth camera;

Performing fusion on the plurality of consecutive depth image sequences to establish a three-dimensional reconstruction model of the dynamic scene;

Determining a target observation point of the UAV array according to the three-dimensional reconstruction model and the current pose of the UAV array;

The UAV array is instructed to move to the target observation point for shooting, and the three-dimensional reconstruction model is updated according to a plurality of consecutive depth image sequences captured by the UAV array at the target observation point.
The method of claim 1, wherein the acquiring a plurality of consecutive depth image sequences of the dynamic scene comprises:

Obtaining a plurality of original depth image sequences of the dynamic scene captured by the UAV array;

Aligning the plurality of original depth image sequences according to a synchronization time stamp to obtain the plurality of consecutive depth image sequences.
The method according to claim 1 or 2, wherein the merging of the plurality of consecutive depth image sequences to establish a three-dimensional reconstruction model of the dynamic scene comprises:

The plurality of consecutive depth image sequences are fused, and the key frame reconstructed body is determined according to a preset period, and in each preset period, the following operations are performed:

Determining a deformation parameter of the non-rigid deformation node in the current key frame reconstruction body, and updating the reconstruction model in the current key frame reconstruction body to the current data frame reconstruction body according to the deformation parameter, wherein the current data frame reconstruction body is Refers to the real-time reconstructed body at each moment;

Extracting a three-dimensional reconstruction model of the dynamic scene from a current data frame reconstructed body;

The current data frame reconstructed body is replaced with the current data frame reconstructed body as a key frame reconstructed body in the next preset period.
The method according to claim 1, wherein the target observation points of the UAV array are determined according to the three-dimensional reconstruction model and the current pose of the UAV array, including:

According to the current pose of the UAV array, the spatial neighborhood of the UAV array is rasterized to establish a set of candidate observation points;

Determining a total energy value of each candidate observation point in the set of candidate observation points by using a validity energy function;

A candidate observation point whose total energy value conforms to a preset criterion is taken as the target observation point.
The method of claim 4 wherein said effectiveness energy function comprises a depth energy term, a central energy term, and a kinematic energy term;

Wherein, the depth energy term is used to determine the extent to which the average depth value of the candidate observation point is close to the target depth value;

The central energy term is used to determine how close the reconstructed model observed by the candidate observation point is to the central portion of the captured image frame;

The kinetic energy term is used to determine the amount of motion occurring in the dynamic scene observed by the candidate observation point.
The method of claim 5 wherein said validity energy function is represented by the following formula:

E t =λ d E d +λ c E c +λ m E m

Where E t is the total energy term, Ed is the depth energy term, Ec is the central energy term, Em is the motion energy term, and λ d , λ c and λ m correspond to the depth energy term, the central energy term and the motion energy term, respectively. Weight coefficient;

The depth energy term, the center energy term, and the motion energy term are respectively represented by the following formula:

E d =ψ(d avg -d o )

Where T c and T V are the poses of the UAV array and the candidate observation points in the reconstruction model respectively; t v is the translation component of the pose of the candidate observation point; x n is the voxel of the reconstruction model hit by the ray N x is the normal of the voxel; x i is the node where the reconstructed model is non-rigid deformed; x' i is the node after the non-rigid deformation; π() is the projection perspective transform from the three-dimensional space to the two-dimensional image plane d avg and d o represent the average depth value and the target depth value of the candidate observation point respectively; the ψ() function represents the penalty term for the distance; r is the light of the candidate observation point projected through the reconstructed model; du and dv represent respectively The average projected pixel abscissa and ordinate of the reconstructed model at the candidate observation points; λ is the damping factor; φ1 and φ2 are used to calculate the motion information of all the rays of the candidate observation points and the motion information of all the observed deformation nodes, respectively.
A three-dimensional reconstruction device for a dynamic scene, comprising:

An image sequence acquisition module, configured to acquire a plurality of consecutive depth image sequences of the dynamic scene, wherein the plurality of consecutive depth image sequences are captured by a drone array equipped with a depth camera;

An image fusion module is configured to fuse the plurality of consecutive depth image sequences to establish a three-dimensional reconstruction model of the dynamic scene;

a target observation point calculation module, configured to determine a target observation point of the UAV array according to the three-dimensional reconstruction model and the current pose of the UAV array;

And a reconstruction model updating module is configured to instruct the UAV array to move to the target observation point for shooting, and update the three-dimensional reconstruction model according to a plurality of consecutive depth image sequences captured by the UAV array at the target observation point.
The apparatus of claim 7, wherein the image sequence acquisition module comprises:

An original image sequence acquiring unit, configured to acquire a plurality of original depth image sequences of the dynamic scene captured by the UAV array;

And an image sequence alignment unit configured to align the plurality of original depth image sequences according to a synchronization time stamp to obtain the plurality of consecutive depth image sequences.
The apparatus according to claim 7 or 8, wherein the image fusion module is further configured to:

The plurality of consecutive depth image sequences are fused, and the key frame reconstructed body is determined according to a preset period, and in each preset period, the following operations are performed:

Determining a deformation parameter of the non-rigid deformation node in the current key frame reconstruction body, and updating the reconstruction model in the current key frame reconstruction body to the current data frame reconstruction body according to the deformation parameter, wherein the current data frame reconstruction body is Refers to the real-time reconstructed body at each moment;

Extracting a three-dimensional reconstruction model of the dynamic scene from a current data frame reconstructed body;

The current data frame reconstructed body is replaced with the current data frame reconstructed body as a key frame reconstructed body in the next preset period.
The apparatus of claim 7, wherein the target observation point calculation module comprises:

The candidate observation point establishing unit is configured to rasterize the spatial neighborhood according to the current pose of the UAV array to establish a set of candidate observation points;

An energy value calculation unit configured to determine a total energy value of each of the candidate observation points in the set of candidate observation points by using a validity energy function;

The target observation point determining unit is configured to select a candidate observation point whose total energy value conforms to a preset criterion as the target observation point.
The apparatus of claim 10, wherein the validity energy function comprises a depth energy term, a center energy term, and a motion energy term;

Wherein, the depth energy term is used to determine the extent to which the average depth value of the candidate observation point is close to the target depth value;

The central energy term is used to determine how close the reconstructed model observed by the candidate observation point is to the central portion of the captured image frame;

The kinetic energy term is used to determine the amount of motion occurring in the dynamic scene observed by the candidate observation point.
The apparatus of claim 11 wherein said validity energy function is represented by the following formula:

E t =λ d E d +λ c E c +λ m E m

Where E t is the total energy term, Ed is the depth energy term, Ec is the central energy term, Em is the motion energy term, and λ d , λ c and λ m correspond to the depth energy term, the central energy term and the motion energy term, respectively. Weight coefficient;

The depth energy term, the center energy term, and the motion energy term are respectively represented by the following formula:

E d =ψ(d avg -d o )

Where T c and T V are the poses of the UAV array and the candidate observation points in the reconstruction model respectively; t v is the translation component of the pose of the candidate observation point; x n is the voxel of the reconstruction model hit by the ray N x is the normal of the voxel; x i is the node where the reconstructed model is non-rigid deformed; x' i is the node after the non-rigid deformation; π() is the projection perspective transform from the three-dimensional space to the two-dimensional image plane d avg and d o represent the average depth value and the target depth value of the candidate observation point respectively; the ψ() function represents the penalty term for the distance; r is the light of the candidate observation point projected through the reconstructed model; du and dv represent respectively The average projected pixel abscissa and ordinate of the reconstructed model at the candidate observation points; λ is the damping factor; φ1 and φ2 are used to calculate the motion information of all the rays of the candidate observation points and the motion information of all the observed deformation nodes, respectively.
A three-dimensional reconstruction system for dynamic scenes, including a drone array and a three-dimensional reconstruction platform;

Wherein each of the drone arrays is equipped with a depth camera, and the depth camera is set to capture a depth image sequence of the dynamic scene;

The three-dimensional reconstruction platform includes the three-dimensional reconstruction device of the dynamic scene according to any one of claims 7 to 12, configured to generate a three-dimensional reconstruction model of the dynamic scene according to the plurality of consecutive depth image sequences captured by the drone array.
The system of claim 13 wherein

The three-dimensional reconstruction platform further includes a wireless communication module wirelessly connected to the UAV array, the wireless communication module being configured to receive a plurality of consecutive depth image sequences captured by the UAV array, and further configured to determine the three-dimensional reconstruction device The location information of the target observation point is sent to the drone array;

Each of the UAV arrays further includes a navigation module, and the navigation module is configured to control the drone to move to the target observation point to capture the dynamic scene according to the location information.
A server that includes:

At least one processor;

a storage device configured to store at least one program,

When the at least one program is executed by the at least one processor, the at least one processor implements the three-dimensional reconstruction method of the dynamic scene according to any one of claims 1-6.
A computer readable storage medium having stored thereon a computer program, the program being executed by a processor to implement a three-dimensional reconstruction method of a dynamic scene according to any one of claims 1-6.