CN116310245A - Method, device, medium, equipment and product for mounting prop on volume video - Google Patents

Method, device, medium, equipment and product for mounting prop on volume video

Info

Publication number
CN116310245A
CN116310245A (application CN202211611427.6A)
Authority
CN
China
Prior art keywords
model
dimensional
dimensional model
points
mounting
Prior art date
Legal status
Pending
Application number
CN202211611427.6A
Other languages
Chinese (zh)
Inventor
邵志兢
张煜
孙伟
吕云
Current Assignee
Zhuhai Prometheus Vision Technology Co ltd
Original Assignee
Zhuhai Prometheus Vision Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Zhuhai Prometheus Vision Technology Co ltd
Priority to CN202211611427.6A
Publication of CN116310245A
Legal status: Pending

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T19/00 - Manipulating 3D models or images for computer graphics
    • G06T19/20 - Editing of 3D images, e.g. changing shapes or colours, aligning objects or positioning parts
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00 - Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2219/00 - Indexing scheme for manipulating 3D models or images for computer graphics
    • G06T2219/20 - Indexing scheme for editing of 3D models
    • G06T2219/2004 - Aligning objects, relative positioning of parts
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P - CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00 - Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/30 - Computing systems specially adapted for manufacturing

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Computer Graphics (AREA)
  • General Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Geometry (AREA)
  • Architecture (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Computer Hardware Design (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Image Processing (AREA)

Abstract

The application discloses a method, a device, a medium, equipment and a product for mounting props on volume video, relating to the field of computer technology. The method comprises the following steps: detecting the model skeleton points of each frame of three-dimensional model in the volume video, and acquiring a preset object model carrying preset skeleton points, on which mounting points are calibrated; aligning the model skeleton points of the preset object model with the model skeleton points of each three-dimensional model respectively to obtain an aligned object model corresponding to each three-dimensional model; performing shape adjustment on the aligned object model corresponding to each three-dimensional model to obtain a target object model matched with the shape of each three-dimensional model; and obtaining a target volume video based on the sequence of target object models, wherein each frame of target object model in the target volume video carries the mounting points calibrated on the preset object model, and the mounting points are used for mounting virtual props. The scheme of the application can effectively improve the prop mounting effect and the user experience.

Description

Method, device, medium, equipment and product for mounting prop on volume video
Technical Field
The application relates to the technical field of computers, in particular to a method, a device, a medium, equipment and a product for mounting props on a volume video.
Background
A volume video is a model sequence of continuous three-dimensional models. When a volume video is applied in virtual scenes, game engines and the like, virtual props often need to be mounted on the model sequence: for example, glasses mounted on the head of each three-dimensional model in the sequence, objects mounted in the hands, accessories mounted on the body, and so on. Because the content is a continuous sequence of three-dimensional models, determining the mounting points differs from the two-dimensional picture case: three-dimensional coordinates must be obtained, and these coordinates must track the changing content of the three-dimensional models across the whole model sequence. For example, a hand must be tracked across many consecutive three-dimensional models to form a mounting point for mounting a prop.
At present, related schemes generally perform complex fitting calculations on many consecutive three-dimensional models to track the mounting points. This places high requirements on the three-dimensional models and is prone to tracking errors, so the prop mounting effect is poor and the user experience suffers.
Disclosure of Invention
The embodiments of the application provide a scheme that can improve the mounting effect of virtual props on volume video and improve the user experience.
To solve the above technical problems, the embodiments of the application provide the following technical scheme:
according to one embodiment of the application, a method of mounting props on a volumetric video, the method comprising: detecting model skeleton points of each frame of three-dimensional model in the volume video, and acquiring a preset object model carrying preset skeleton points, wherein mounting points are calibrated on the preset object model; respectively aligning model skeleton points of the preset object models with model skeleton points of each three-dimensional model to obtain aligned object models corresponding to each three-dimensional model; performing shape adjustment on the alignment object model corresponding to each three-dimensional model to obtain a target object model matched with the shape of each three-dimensional model; and obtaining a target volume video based on the sequence of the target object models, wherein each frame of target object model in the target volume video is provided with a mounting point on the preset object model, and the mounting point is used for mounting the virtual prop.
In some embodiments of the present application, detecting model bone points of each frame of three-dimensional model in the volumetric video includes: acquiring a multi-angle two-dimensional image corresponding to each frame of three-dimensional model in the volume video; detecting image skeleton points on the multi-angle two-dimensional image corresponding to each frame of three-dimensional model; and mapping the image skeleton points on the multi-angle two-dimensional image corresponding to each frame of three-dimensional model to a three-dimensional space through reprojection, so as to obtain the model skeleton points of each frame of three-dimensional model.
In some embodiments of the present application, detecting the model skeleton points of each frame of three-dimensional model in the volume video includes: inputting each frame of three-dimensional model in the volume video into a skeleton point detection neural network for detection, to obtain the model skeleton points of each frame of three-dimensional model.
In some embodiments of the present application, the aligning the model skeleton points of the preset object model with the model skeleton points of each three-dimensional model to obtain an aligned object model corresponding to each three-dimensional model includes: calculating relative shape parameters of the preset object model and each three-dimensional model; and respectively aligning model skeleton points of the preset object model to model skeleton points of each three-dimensional model according to the relative shape parameters of the preset object model and each three-dimensional model to obtain aligned object models corresponding to each three-dimensional model.
In some embodiments of the present application, the adjusting the shape of the alignment object model corresponding to each three-dimensional model to obtain a target object model matched with the shape of each three-dimensional model includes: calculating vertex displacement parameters of each three-dimensional model and the corresponding alignment object model; and according to the vertex displacement parameters of the three-dimensional models and the corresponding alignment object models, performing vertex adjustment on the alignment object models corresponding to each three-dimensional model to obtain target object models matched with the shapes of the three-dimensional models.
In some embodiments of the present application, the method further comprises: displaying a mounting point selection interface; and obtaining the preset object model calibrated with the mounting points according to the selection operation in the mounting point selection interface.
In some embodiments of the present application, the method further comprises: detecting a preset part in the preset object model; and generating corresponding mounting points on the preset positions to obtain the preset object model calibrated with the mounting points.
In some embodiments of the present application, the obtaining a target volume video based on the sequence of the target object models includes: connecting the target object models corresponding to the three-dimensional models in series according to the order of the three-dimensional models in the volume video, to obtain the target volume video.
According to one embodiment of the present application, an apparatus for mounting props on a volumetric video, the apparatus comprising: the initialization module is used for detecting model skeleton points of each frame of three-dimensional model in the volume video, and acquiring a preset object model carrying preset skeleton points, wherein mounting points are calibrated on the preset object model; the alignment module is used for respectively carrying out alignment treatment on the model skeleton points of the preset object model and the model skeleton points of each three-dimensional model to obtain an alignment object model corresponding to each three-dimensional model; the adjustment module is used for carrying out shape adjustment on the alignment object model corresponding to each three-dimensional model to obtain a target object model matched with the shape of each three-dimensional model; the generation module is used for obtaining a target volume video based on the sequence of the target object models, each frame of target object model in the target volume video is provided with a mounting point on the preset object model, and the mounting point is used for mounting the virtual prop.
In some embodiments of the present application, the initialization module is configured to: acquiring a multi-angle two-dimensional image corresponding to each frame of three-dimensional model in the volume video; detecting image skeleton points on the multi-angle two-dimensional image corresponding to each frame of three-dimensional model; and mapping the image skeleton points on the multi-angle two-dimensional image corresponding to each frame of three-dimensional model to a three-dimensional space through reprojection, so as to obtain the model skeleton points of each frame of three-dimensional model.
In some embodiments of the present application, the initialization module is configured to: input each frame of three-dimensional model in the volume video into a skeleton point detection neural network for detection, to obtain the model skeleton points of each frame of three-dimensional model.
In some embodiments of the present application, the alignment module is configured to: calculating relative shape parameters of the preset object model and each three-dimensional model; and respectively aligning model skeleton points of the preset object model to model skeleton points of each three-dimensional model according to the relative shape parameters of the preset object model and each three-dimensional model to obtain aligned object models corresponding to each three-dimensional model.
In some embodiments of the present application, the adjustment module is configured to: calculating vertex displacement parameters of each three-dimensional model and the corresponding alignment object model; and according to the vertex displacement parameters of the three-dimensional models and the corresponding alignment object models, performing vertex adjustment on the alignment object models corresponding to each three-dimensional model to obtain target object models matched with the shapes of the three-dimensional models.
In some embodiments of the present application, the apparatus further comprises a selection operation module for: displaying a mounting point selection interface; and obtaining the preset object model calibrated with the mounting points according to the selection operation in the mounting point selection interface.
In some embodiments of the present application, the apparatus further comprises an automatic selection module for: detecting a preset part in the preset object model; and generating corresponding mounting points on the preset positions to obtain the preset object model calibrated with the mounting points.
In some embodiments of the present application, the generating module is configured to: connect the target object models corresponding to the three-dimensional models in series according to the order of the three-dimensional models in the volume video, to obtain the target volume video.
According to another embodiment of the present application, a storage medium has stored thereon a computer program which, when executed by a processor of a computer, causes the computer to perform the method described in the embodiments of the present application.
According to another embodiment of the present application, an electronic device may include: a memory storing a computer program; and the processor reads the computer program stored in the memory to execute the method according to the embodiment of the application.
According to another embodiment of the present application, a computer program product or computer program includes computer instructions stored in a computer readable storage medium. The computer instructions are read from the computer-readable storage medium by a processor of a computer device, and executed by the processor, cause the computer device to perform the methods provided in the various alternative implementations described in the embodiments of the present application.
In the scheme of mounting props on volume video, the model skeleton points of each frame of three-dimensional model in the volume video are detected, and a preset object model carrying preset skeleton points is acquired, with mounting points calibrated on the preset object model; the model skeleton points of the preset object model are aligned with the model skeleton points of each three-dimensional model respectively to obtain an aligned object model corresponding to each three-dimensional model; the aligned object model corresponding to each three-dimensional model is shape-adjusted to obtain a target object model matched with the shape of each three-dimensional model; and a target volume video is obtained based on the sequence of the target object models, wherein each frame of target object model in the target volume video carries the mounting points calibrated on the preset object model, and the mounting points are used for mounting virtual props.
In this way, through the steps, the whole sequence represented by the target object model is obtained, a new target volume video corresponding to the initial volume video is obtained, and the mounting points on the preset object model can be accurately and efficiently fitted to the target object model of each frame in the target volume video, so that no special requirement is provided for the three-dimensional model, the mounting points are accurately tracked in the whole sequence, the prop mounting effect is effectively improved, and the user experience is improved.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the description of the embodiments will be briefly introduced below, it being obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 shows a schematic diagram of a system to which embodiments of the present application may be applied.
Fig. 2 illustrates a flow chart of a method of mounting props on a volumetric video according to one embodiment of the present application.
Fig. 3 shows a schematic representation of a bone point according to an embodiment of the present application.
FIG. 4 shows a schematic diagram of an alignment model according to one embodiment of the present application.
FIG. 5 illustrates a schematic diagram of adjusting model vertices according to one embodiment of the present application.
Fig. 6 shows a block diagram of an apparatus for mounting props on a volumetric video according to another embodiment of the present application.
Fig. 7 shows a block diagram of an electronic device according to an embodiment of the present application.
Detailed Description
The following description of the embodiments of the present application will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are only some, but not all, of the embodiments of the present application. All other embodiments, which can be made by those skilled in the art based on the embodiments herein without making any inventive effort, are intended to be within the scope of the present application.
Fig. 1 shows a schematic diagram of a system 100 to which embodiments of the present application may be applied. As shown in fig. 1, the system 100 may include a server 101 and a terminal 102.
The server 101 may be an independent physical server, a server cluster or a distributed system formed by a plurality of physical servers, or a cloud server providing cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDNs, basic cloud computing services such as big data and artificial intelligence platforms, and the like.
The terminal 102 may be any device, including but not limited to a mobile phone, a computer, a smart voice interaction device, a smart home appliance, a vehicle terminal, a VR/AR device, a smart watch, and the like. In one embodiment, the server 101 or the terminal 102 may be a node device in a blockchain network or in an internet-of-vehicles platform.
In one implementation of this example, the server 101 or the terminal 102 may: detect the model skeleton points of each frame of three-dimensional model in the volume video, and acquire a preset object model carrying preset skeleton points, with mounting points calibrated on the preset object model; align the model skeleton points of the preset object model with the model skeleton points of each three-dimensional model respectively to obtain an aligned object model corresponding to each three-dimensional model; perform shape adjustment on the aligned object model corresponding to each three-dimensional model to obtain a target object model matched with the shape of each three-dimensional model; and obtain a target volume video based on the sequence of the target object models, wherein each frame of target object model in the target volume video carries the mounting points calibrated on the preset object model, and the mounting points are used for mounting virtual props.
Fig. 2 schematically illustrates a flow chart of a method of mounting props on a volumetric video according to one embodiment of the present application. The execution subject of the method for mounting props on the volume video may be any device, such as the server 101 or the terminal 102 shown in fig. 1.
As shown in fig. 2, the method for mounting props on a volume video may include steps S210 to S240.
Step S210, detecting model skeleton points of each frame of three-dimensional model in the volume video, and acquiring a preset object model carrying preset skeleton points, wherein mounting points are calibrated on the preset object model; step S220, aligning the model skeleton points of the preset object model with the model skeleton points of each three-dimensional model respectively to obtain an aligned object model corresponding to each three-dimensional model; step S230, carrying out shape adjustment on the alignment object model corresponding to each three-dimensional model to obtain a target object model matched with the shape of each three-dimensional model; step S240, obtaining a target volume video based on the sequence of the target object models, where each frame of target object model in the target volume video is provided with a mounting point on the preset object model, and the mounting point is used for mounting a virtual prop.
A volume video is a model sequence of multiple frames of three-dimensional models; the three-dimensional models may correspond to characters, animals, and the like, and the volume video can present the behavior of the object (such as dancing) through consecutive frames of three-dimensional models. Corresponding skeleton points can be detected for the three-dimensional model of each frame; the skeleton points corresponding to a three-dimensional model are its model skeleton points. In one example, referring to fig. 3, panels (1) to (4) of fig. 3 respectively show the model skeleton points of a three-dimensional model projected onto two-dimensional images of different angles (shown as skeleton lines on the human body).
The preset object model can be a model created through three-dimensional software, preset skeleton points can be carried on the preset object model, and mounting points can be calibrated on the preset object model.
The model skeleton points of the preset object model are aligned with the model skeleton points of each three-dimensional model respectively, so that the preset object model is aligned to each three-dimensional model, yielding an aligned object model matching the posture of each three-dimensional model. In one example, referring to fig. 4, panels (2), (4), (6) and (8) of fig. 4 show, from different angles, the aligned object model (shown as the white model) obtained after the preset object model is aligned with a three-dimensional model.
The aligned object model corresponding to each three-dimensional model is then shape-adjusted to obtain a target object model matched with the shape of that three-dimensional model. Referring to fig. 5, in one example, the aligned object model shown in panel (1) of fig. 5, which corresponds to the three-dimensional model of panel (2), is shape-adjusted to obtain the target object model shown in panel (3), matched with the shape of the three-dimensional model of panel (2).
Finally, based on the sequence of target object models corresponding to the three-dimensional models, a new target volume video is obtained. Each frame of target object model in the target volume video carries the mounting points calibrated on the preset object model, so virtual props can be mounted at the mounting points on the target object model of each frame of the target volume video.
In this way, through steps S210 to S240, the whole sequence is re-expressed by target object models, yielding a new target volume video corresponding to the initial volume video. The mounting points on the preset object model are thereby fitted accurately and efficiently onto the target object model of every frame in the target volume video: no special requirements are imposed on the three-dimensional models, the mounting points are tracked accurately across the whole sequence, the prop mounting effect is effectively improved, and the user experience is improved.
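For illustration only, the flow of steps S210 to S240 can be sketched in Python as below. This is a hypothetical outline, not the patented implementation; the per-step operations are passed in as callables because their names and signatures are assumptions of this sketch (concrete sketches for them are given in the embodiments that follow).

```python
# Hypothetical outline of steps S210-S240. The three callables are
# placeholders for the operations described in the text, not a real API.

def mount_props_on_volume_video(volume_video, preset_model,
                                detect_skeleton_points,
                                align_to_skeleton,
                                adjust_vertices):
    """volume_video: per-frame 3D models; preset_model carries preset
    skeleton points and the calibrated mounting points."""
    target_models = []
    for frame_model in volume_video:
        skeleton = detect_skeleton_points(frame_model)        # S210
        aligned = align_to_skeleton(preset_model, skeleton)   # S220
        target = adjust_vertices(aligned, frame_model)        # S230
        target_models.append(target)  # mounting points travel with the model
    return target_models              # S240: sequence = the target volume video
```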
Further embodiments of the steps performed when mounting props on a volume video in the embodiment of fig. 2 are described below.
In one embodiment, step S210, the detecting model skeleton points of each frame of three-dimensional model in the volumetric video includes: acquiring a multi-angle two-dimensional image corresponding to each frame of three-dimensional model in the volume video; detecting image skeleton points on the multi-angle two-dimensional image corresponding to each frame of three-dimensional model; and mapping the image skeleton points on the multi-angle two-dimensional image corresponding to each frame of three-dimensional model to a three-dimensional space through reprojection, so as to obtain the model skeleton points of each frame of three-dimensional model.
The multi-angle two-dimensional images corresponding to a three-dimensional model may be the multiple multi-view two-dimensional images used to reconstruct the three-dimensional model (for example, fig. 3 shows 4 multi-view two-dimensional images used to reconstruct a three-dimensional model), or may be multiple two-dimensional images captured of the three-dimensional model from different angles.
In this embodiment, after the multi-angle two-dimensional images corresponding to a given frame's three-dimensional model in the volume video are obtained, a preset two-dimensional skeleton point detection neural network may be used to detect the image skeleton points on those multi-angle two-dimensional images; panels (1) to (4) of fig. 3 respectively show the detected image skeleton points on two-dimensional images of different angles (shown as skeleton lines on the human body). The image skeleton points on the multi-angle two-dimensional images corresponding to the frame are then mapped into three-dimensional space through reprojection, obtaining the model skeleton points of that frame's three-dimensional model in three-dimensional space. The preset two-dimensional skeleton point detection neural network may be a pre-trained convolutional neural network.
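The patent does not spell out the reprojection step; one standard way to lift matched 2D detections into 3D is linear (DLT) triangulation from each view's projection matrix. The numpy sketch below is an assumption of this description, not the patented method.

```python
import numpy as np

def triangulate_point(proj_mats, points_2d):
    """Recover one 3D skeleton point from its 2D detections in several views.
    proj_mats: list of 3x4 camera projection matrices; points_2d: matching
    list of (u, v) pixel coordinates of the same skeleton point."""
    rows = []
    for P, (u, v) in zip(proj_mats, points_2d):
        rows.append(u * P[2] - P[0])   # each view adds two linear constraints
        rows.append(v * P[2] - P[1])   # on the homogeneous 3D point
    _, _, vt = np.linalg.svd(np.stack(rows))
    X = vt[-1]                         # null-space (least-squares) solution
    return X[:3] / X[3]                # de-homogenize to (x, y, z)
```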
In one embodiment, step S210, the detecting model skeleton points of each frame of three-dimensional model in the volume video, includes: inputting each frame of three-dimensional model in the volume video into a skeleton point detection neural network for detection, to obtain the model skeleton points of each frame of three-dimensional model.
In this embodiment, a given frame's three-dimensional model is input directly into a skeleton point detection neural network, which processes the model to obtain the model skeleton points on it. The skeleton point detection neural network may be a pre-trained convolutional neural network.
In one embodiment, step S220 includes performing alignment processing on model skeleton points of the preset object model and model skeleton points of each three-dimensional model to obtain aligned object models corresponding to each three-dimensional model, where the alignment processing includes: calculating relative shape parameters of the preset object model and each three-dimensional model; and respectively aligning model skeleton points of the preset object model to model skeleton points of each three-dimensional model according to the relative shape parameters of the preset object model and each three-dimensional model to obtain aligned object models corresponding to each three-dimensional model.
The relative shape parameters may include shape parameters and joint rotation parameters. The relative shape parameters of the preset object model and a given three-dimensional model can be calculated from the coordinate data of the model points (such as model vertices and skeleton points) of the two models in the same three-dimensional space. According to these relative shape parameters, the model skeleton points of the preset object model can then be aligned to the model skeleton points of the three-dimensional model, obtaining the aligned object model corresponding to that three-dimensional model.
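As a concrete illustration, the rigid part of such an alignment (scale, rotation and translation taking the preset model's skeleton points onto a frame's skeleton points) can be solved in closed form with the Umeyama method. A full implementation would also solve per-joint rotations and shape parameters, which this sketch omits as an assumption.

```python
import numpy as np

def similarity_align(src_joints, dst_joints):
    """Least-squares scale/rotation/translation taking the preset model's
    skeleton points (src) onto a frame's skeleton points (dst).
    Both are (N, 3) arrays with joints in the same order."""
    mu_s, mu_d = src_joints.mean(0), dst_joints.mean(0)
    S, D = src_joints - mu_s, dst_joints - mu_d
    U, sigma, Vt = np.linalg.svd(S.T @ D)        # cross-covariance SVD
    sign = np.sign(np.linalg.det(U @ Vt))        # guard against reflections
    R = (U @ np.diag([1.0, 1.0, sign]) @ Vt).T
    scale = (sigma * [1.0, 1.0, sign]).sum() / (S ** 2).sum()
    t = mu_d - scale * R @ mu_s
    return scale, R, t                           # apply as: scale * R @ v + t

# Applying the transform to all vertices of the preset model yields the
# aligned object model for this frame (pose alignment only, in this sketch).
```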
Further, in one embodiment, step S220, the aligning of the model skeleton points of the preset object model with the model skeleton points of each three-dimensional model to obtain the aligned object model corresponding to each three-dimensional model, may also be performed as follows: a model alignment page is displayed, and the user manually aligns the model skeleton points of the preset object model with the model skeleton points of each three-dimensional model in the model alignment page, obtaining the aligned object model corresponding to each three-dimensional model.
In one embodiment, step S230, the adjusting the shape of the aligned object model corresponding to each three-dimensional model to obtain a target object model matched with the shape of each three-dimensional model includes: calculating vertex displacement parameters of each three-dimensional model and the corresponding alignment object model; and according to the vertex displacement parameters of the three-dimensional models and the corresponding alignment object models, performing vertex adjustment on the alignment object models corresponding to each three-dimensional model to obtain target object models matched with the shapes of the three-dimensional models.
The vertex displacement parameters of a three-dimensional model and its aligned object model can be calculated in the same three-dimensional space from the coordinate data of the vertices (i.e., points on the model surface) of the aligned object model and of the three-dimensional model; a vertex displacement parameter may be the relative displacement between corresponding vertices on the same parts of the two models. According to these vertex displacement parameters, the positions of the vertices of the aligned object model can be adjusted so that they align with the vertices of the three-dimensional model. The vertex-adjusted aligned object model is the target object model matched with the shape of the three-dimensional model: as shown in fig. 5, after vertex adjustment of the aligned object model of panel (1) (whose surface carries no detail shape), the target object model of panel (3) is obtained, whose surface detail and size are consistent with the three-dimensional model shown in panel (2).
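A simple realization of this vertex adjustment takes, for each vertex of the aligned object model, the displacement to its nearest point on the frame's three-dimensional model. The k-d-tree nearest-neighbor correspondence below is an assumption of this illustration; the text itself speaks of corresponding vertices on the same parts of the two models.

```python
import numpy as np
from scipy.spatial import cKDTree

def adjust_vertices(aligned_verts, frame_verts):
    """Move each vertex of the aligned object model onto the surface of the
    frame's 3D model. aligned_verts: (N, 3); frame_verts: (M, 3)."""
    tree = cKDTree(frame_verts)
    _, idx = tree.query(aligned_verts)               # nearest surface point
    displacement = frame_verts[idx] - aligned_verts  # vertex displacement parameters
    return aligned_verts + displacement              # target object model vertices
```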
In one embodiment, the method further comprises: displaying a mounting point selection interface; and obtaining the preset object model calibrated with the mounting points according to the selection operation in the mounting point selection interface.
In this embodiment, the relevant user can manually calibrate the mounting points on the preset object model through the mounting point selection interface, thereby manually selecting the mounting points for the volume video as a whole.
In one embodiment, the method further comprises: detecting a preset part in the preset object model; and generating corresponding mounting points on the preset positions to obtain the preset object model calibrated with the mounting points.
In this embodiment, a predetermined part (for example, a hand, an eye, etc.) in the preset object model is detected automatically, and corresponding mounting points are then generated automatically on the predetermined part, yielding the preset object model calibrated with mounting points; the mounting points are thus selected automatically for the volume video as a whole.
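Purely as an illustration, automatic calibration might locate a predetermined part via the preset skeleton points and drop a mounting point there. The joint index map and the model attributes below are hypothetical, not part of the patent.

```python
# Hypothetical: place a mounting point at a predetermined part, located via
# the preset skeleton points. The joint index map is an assumption.
JOINT_INDEX = {"right_hand": 21, "head": 15}

def add_mount_point(preset_model, part="right_hand"):
    joint = preset_model.skeleton_points[JOINT_INDEX[part]]  # detected part
    preset_model.mount_points[part] = joint.copy()           # calibrate point
    return preset_model
```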
Further to the foregoing embodiments, the obtaining of the target volume video based on the sequence of the target object models includes: connecting the target object models corresponding to the three-dimensional models in series according to the order of the three-dimensional models in the volume video, to obtain the target volume video.
A corresponding target object model is obtained for each frame of three-dimensional model in the volume video; connecting the obtained target object models in series in the order of their corresponding three-dimensional models yields a target volume video that is consistent in presentation with the original volume video but is fitted with the mounting points.
Further, in the foregoing embodiments of the application, the volume video in step S210 (also called volumetric video, spatial video, volumetric three-dimensional video, or six-degree-of-freedom video) is a technology that captures information in three-dimensional space (such as depth information and color information) and generates a sequence of dynamic three-dimensional models. Compared with traditional video, volume video adds the concept of space to video, using three-dimensional models to better restore the real three-dimensional world, instead of simulating its sense of space with a two-dimensional planar video plus camera movement. Because a volume video is a sequence of three-dimensional models, users can watch it from any viewing angle according to their preference, giving a higher degree of realism and immersion than two-dimensional planar video.
Alternatively, in the foregoing embodiment of the present application, each frame of the three-dimensional model in the volumetric video described in step S210 may be reconstructed as follows:
firstly, acquiring color images and depth images of different visual angles of a shooting object (the color images and the depth images of different visual angles are a plurality of two-dimensional images of multiple visual angles for reconstructing a three-dimensional model) and camera parameters corresponding to the color images; and training a neural network model implicitly expressing a three-dimensional model of the shooting object according to the acquired color image and the corresponding depth image and camera parameters, and extracting an isosurface based on the trained neural network model to realize three-dimensional reconstruction of the shooting object so as to obtain the three-dimensional model of the shooting object.
It should be noted that, in the embodiments of the present application, the neural network model of which architecture is adopted is not particularly limited, and may be selected by those skilled in the art according to actual needs. For example, a multi-layer perceptron (Multilayer Perceptron, MLP) without a normalization layer may be selected as a base model for model training.
The three-dimensional model reconstruction method provided in the present application will be described in detail below.
First, multiple color cameras and depth cameras can synchronously capture the object to be three-dimensionally reconstructed from multiple angles, obtaining color images of the object from multiple different angles and their corresponding depth images. That is, at the same shooting moment (actual shooting times whose difference is less than or equal to a time threshold are considered the same moment), the color camera at each angle captures a color image of the object from its angle, and correspondingly the depth camera at each angle captures a depth image from its angle. The object may be anything, including but not limited to living objects such as people, animals, and plants, or inanimate objects such as machines, furniture, and dolls.
Therefore, the color images of the object at different visual angles are provided with the corresponding depth images, namely, when shooting, the color cameras and the depth cameras can adopt the configuration of a camera set, and the color cameras at the same visual angle are matched with the depth cameras to synchronously shoot the same object. For example, a studio may be built, in which a central area is a photographing area, around which a plurality of sets of color cameras and depth cameras are paired at a certain angle interval in a horizontal direction and a vertical direction. When the object is in the shooting area surrounded by the color cameras and the depth cameras, the color images and the corresponding depth images of the object at different visual angles can be obtained through shooting by the color cameras and the depth cameras.
In addition, camera parameters of the color camera corresponding to each color image are further acquired. The camera parameters include internal parameters and external parameters of the color camera, which can be determined through calibration, wherein the internal parameters of the color camera are parameters related to the characteristics of the color camera, including but not limited to data such as focal length and pixels of the color camera, and the external parameters of the color camera are parameters of the color camera in a world coordinate system, including but not limited to data such as position (coordinates) of the color camera and rotation direction of the camera.
As described above, after color images of a plurality of different perspectives of an object at the same shooting time and corresponding depth images thereof are acquired, the object can be three-dimensionally reconstructed according to the color images and the corresponding depth images thereof. Different from the mode of converting depth information into point cloud to perform three-dimensional reconstruction in the related technology, the method and the device train a neural network model to realize implicit expression of the three-dimensional model of the object, so that three-dimensional reconstruction of the object is realized based on the neural network model.
Optionally, the application selects a multi-layer perceptron (Multilayer Perceptron, MLP) that does not include a normalization layer as the base model, and trains as follows:
converting pixel points in each color image into rays based on corresponding camera parameters; sampling a plurality of sampling points on the rays, and determining first coordinate information of each sampling point and an SDF value of each sampling point from a pixel point; inputting the first coordinate information of the sampling points into a basic model to obtain a predicted SDF value and a predicted RGB color value of each sampling point output by the basic model; based on a first difference between the predicted SDF value and the SDF value and a second difference between the predicted RGB color value and the RGB color value of the pixel point, adjusting parameters of the basic model until a preset stop condition is met; and taking the basic model meeting the preset stopping condition as a neural network model of the three-dimensional model of the implicit expression object.
First, a pixel point in a color image is converted into a ray based on the camera parameters corresponding to that color image; the ray may be a ray passing through the pixel point and perpendicular to the color image plane. Then, a plurality of sampling points are sampled on the ray; the sampling may be performed in two steps: some sampling points are first sampled uniformly, and further sampling points are then sampled at the key positions based on the depth value of the pixel point, ensuring that as many sampling points as possible lie near the model surface. Then, the first coordinate information of each sampling point in the world coordinate system and the signed distance field (SDF) value of each sampling point are calculated from the camera parameters and the depth value of the pixel point; the SDF value may be the difference between the depth value of the pixel point and the distance from the sampling point to the imaging plane of the camera. This difference is signed: when it is positive, the sampling point is outside the three-dimensional model; when it is negative, the sampling point is inside the three-dimensional model; and when it is zero, the sampling point is on the surface of the three-dimensional model. Then, after sampling is completed and the SDF value of each sampling point has been calculated, the first coordinate information of the sampling points in the world coordinate system is input into the base model (the base model is configured to map input coordinate information to an SDF value and an RGB color value and output them); the SDF value output by the base model is recorded as the predicted SDF value, and the RGB color value it outputs as the predicted RGB color value. Finally, the parameters of the base model are adjusted based on a first difference between the predicted SDF value and the SDF value of the sampling point, and a second difference between the predicted RGB color value and the RGB color value of the pixel point corresponding to the sampling point.
In addition, for other pixel points in the color image, sampling is performed in the above manner, and then coordinate information of the sampling point in the world coordinate system is input to the basic model to obtain a corresponding predicted SDF value and a predicted RGB color value, which are used for adjusting parameters of the basic model until a preset stopping condition is met, for example, the preset stopping condition may be configured to reach a preset number of iterations of the basic model, or the preset stopping condition may be configured to converge the basic model. And when the iteration of the basic model meets the preset stopping condition, obtaining the neural network model capable of accurately and implicitly expressing the three-dimensional model of the object. Finally, an isosurface extraction algorithm can be adopted to extract the three-dimensional model surface of the neural network model, so that a three-dimensional model of the object is obtained.
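A minimal PyTorch sketch of such a base model and its two loss terms is given below. The network width and depth, learning rate, and equal weighting of the two differences are assumptions, and details an actual implementation would need (positional encoding, batching over rays, the stopping condition) are omitted.

```python
import torch
import torch.nn as nn

# Base model: an MLP without normalization layers, mapping a 3D point to
# an SDF value and an RGB color value, as described above.
class SDFColorMLP(nn.Module):
    def __init__(self, hidden=256, depth=8):
        super().__init__()
        layers, dim = [], 3
        for _ in range(depth):
            layers += [nn.Linear(dim, hidden), nn.ReLU()]
            dim = hidden
        self.backbone = nn.Sequential(*layers)
        self.head = nn.Linear(hidden, 4)               # 1 SDF + 3 RGB outputs

    def forward(self, xyz):
        out = self.head(self.backbone(xyz))
        return out[:, :1], torch.sigmoid(out[:, 1:])   # (sdf, rgb)

model = SDFColorMLP()
opt = torch.optim.Adam(model.parameters(), lr=1e-4)

def train_step(sample_xyz, sample_sdf, pixel_rgb):
    """sample_xyz: (N, 3) sampled points; sample_sdf: (N, 1) SDF values
    computed from the depth image; pixel_rgb: (N, 3) source pixel colors."""
    pred_sdf, pred_rgb = model(sample_xyz)
    loss = ((pred_sdf - sample_sdf) ** 2).mean() \
         + ((pred_rgb - pixel_rgb) ** 2).mean()        # first + second difference
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```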
Optionally, in some embodiments, determining an imaging plane of the color image based on camera parameters; and determining that the rays passing through the pixel points in the color image and perpendicular to the imaging surface are rays corresponding to the pixel points.
The coordinate information of the color image in the world coordinate system, namely the imaging surface, can be determined according to the camera parameters of the color camera corresponding to the color image. Then, it can be determined that the ray passing through the pixel point in the color image and perpendicular to the imaging plane is the ray corresponding to the pixel point.
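In code, the ray "through the pixel point and perpendicular to the imaging plane" can be formed from the camera parameters: the pixel's position on the imaging plane follows from the intrinsics and extrinsics, and the plane's normal is the optical axis. The numpy sketch below assumes a world-to-camera convention in which a world point X maps to the camera frame as R @ X + t; that convention is an assumption of this illustration.

```python
import numpy as np

def pixel_ray(u, v, K, R, t):
    """Ray through pixel (u, v), perpendicular to the imaging plane.
    K: 3x3 intrinsics; R, t: world-to-camera rotation and translation."""
    # Pixel back-projected onto the imaging plane at unit depth (camera frame).
    p_cam = np.linalg.inv(K) @ np.array([u, v, 1.0])
    origin = R.T @ (p_cam - t)                    # pixel position in world frame
    direction = R.T @ np.array([0.0, 0.0, 1.0])   # plane normal = optical axis
    return origin, direction / np.linalg.norm(direction)
```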
Optionally, in some embodiments, determining second coordinate information and rotation angle of the color camera in the world coordinate system according to the camera parameters; and determining an imaging surface of the color image according to the second coordinate information and the rotation angle.
Optionally, in some embodiments, the first number of first sampling points are equally spaced on the ray; determining a plurality of key sampling points according to the depth values of the pixel points, and sampling a second number of second sampling points according to the key sampling points; the first number of first sampling points and the second number of second sampling points are determined as a plurality of sampling points obtained by sampling on the rays.
First, n (i.e., the first number of) first sampling points are sampled uniformly on the ray, where n is a positive integer greater than 2. Then, according to the depth value of the pixel point, a preset number of key sampling points closest to the pixel point are determined from the n first sampling points, or the key sampling points whose distance from the pixel point is less than a distance threshold are determined from the n first sampling points. Then, m second sampling points are resampled around the determined key sampling points, where m is a positive integer greater than 1. Finally, the n + m sampling points obtained are taken as the plurality of sampling points sampled on the ray. Resampling the m sampling points around the key sampling points makes the training of the model more accurate near the surface of the three-dimensional model, improving the reconstruction accuracy of the three-dimensional model.
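The two-stage sampling might look as follows. The near/far bounds, the window width, and resampling uniformly in a window around the depth value (which is what the key sampling points approximate) are assumptions of this sketch.

```python
import numpy as np

def sample_along_ray(origin, direction, depth, n=64, m=32,
                     near=0.1, far=5.0, window=0.05):
    """Two-stage sampling on one ray: n uniformly spaced first samples in
    [near, far], then m extra samples concentrated in a small window around
    the pixel's depth value (the key region near the model surface)."""
    t_uniform = np.linspace(near, far, n)
    t_surface = np.random.uniform(depth - window, depth + window, m)
    t = np.sort(np.concatenate([t_uniform, t_surface]))
    points = origin + t[:, None] * direction   # (n + m, 3) sample coordinates
    sdf = depth - t                            # signed distance along the ray:
    return points, sdf                         # + outside, 0 on surface, - inside
```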
Optionally, in some embodiments, determining a depth value corresponding to the pixel point according to a depth image corresponding to the color image; calculating an SDF value of each sampling point from the pixel point based on the depth value; and calculating coordinate information of each sampling point according to the camera parameters and the depth values.
After a plurality of sampling points are sampled on the rays corresponding to each pixel point, for each sampling point, determining the distance between the shooting position of the color camera and the corresponding point on the object according to the camera parameters and the depth value of the pixel point, and then calculating the SDF value of each sampling point one by one and the coordinate information of each sampling point based on the distance.
After training of the base model is completed, the trained base model can predict the SDF value corresponding to the coordinate information of any given point; this predicted SDF value indicates the positional relationship (inside, outside, or on the surface) between the point and the three-dimensional model of the object, thereby implicitly expressing the three-dimensional model of the object and yielding the neural network model that implicitly expresses it.
Finally, isosurface extraction is performed on the neural network model, for example drawing the surface of the three-dimensional model with the marching cubes (MC) isosurface extraction algorithm, to obtain the surface of the three-dimensional model and, from it, the three-dimensional model of the object.
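Concretely, extraction can evaluate the predicted SDF values on a regular grid and run marching cubes, e.g. with scikit-image. The grid bounds and resolution below are assumptions, and a real implementation would evaluate the grid in chunks to bound memory.

```python
import numpy as np
import torch
from skimage import measure

def extract_mesh(model, res=128, bound=1.0):
    """Evaluate the trained MLP's SDF on a res^3 grid spanning
    [-bound, bound]^3 and extract the zero level set with marching cubes."""
    xs = np.linspace(-bound, bound, res, dtype=np.float32)
    grid = np.stack(np.meshgrid(xs, xs, xs, indexing="ij"), -1).reshape(-1, 3)
    with torch.no_grad():
        sdf, _ = model(torch.from_numpy(grid))   # predicted SDF per grid point
    volume = sdf.numpy().reshape(res, res, res)
    spacing = (2 * bound / (res - 1),) * 3
    verts, faces, _, _ = measure.marching_cubes(volume, level=0.0,
                                                spacing=spacing)
    return verts - bound, faces                  # vertices back in world units
```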
According to the above three-dimensional reconstruction scheme, the three-dimensional model of the object is modeled implicitly by a neural network, and depth information is added to improve the training speed and accuracy of the model. By continuously performing this three-dimensional reconstruction on the photographed object over time, three-dimensional models of the object at different moments are obtained, and the sequence formed by these three-dimensional models in time order is a volume video of the photographed object. In this way, a volume video can be shot for any object to present specific content: for example, a dancing subject can be shot to obtain a volume video whose dance can be watched from any angle, a teaching subject can be shot to obtain a volume video whose teaching can be watched from any angle, and so on.
To facilitate better implementation of the method for mounting props on volume video provided by the embodiments of the application, the embodiments of the application also provide a device for mounting props on volume video based on this method. Terms have the same meanings as in the method above; for implementation details, refer to the description in the method embodiments. Fig. 6 shows a block diagram of an apparatus for mounting props on a volume video according to one embodiment of the present application.
As shown in fig. 6, the apparatus 300 for mounting props on a volumetric video may include: an initialization module 310, an alignment module 320, an adjustment module 330, and a generation module 340.
The initialization module 310 may be configured to detect a model skeleton point of each frame of three-dimensional model in the volumetric video, and obtain a preset object model carrying a preset skeleton point, where the preset object model is calibrated with a mounting point; the alignment module 320 may be configured to perform alignment processing on the model skeleton points of the preset object model and the model skeleton points of each three-dimensional model, so as to obtain an aligned object model corresponding to each three-dimensional model; the adjustment module 330 may be configured to perform shape adjustment on the alignment object model corresponding to each three-dimensional model, so as to obtain a target object model matched with the shape of each three-dimensional model; the generating module 340 may be configured to obtain a target volume video based on the sequence of the target object models, where each frame of the target object model in the target volume video has a mounting point on the preset object model, and the mounting point is used to mount a virtual prop.
In some embodiments of the present application, the initialization module is configured to: acquiring a multi-angle two-dimensional image corresponding to each frame of three-dimensional model in the volume video; detecting image skeleton points on the multi-angle two-dimensional image corresponding to each frame of three-dimensional model; and mapping the image skeleton points on the multi-angle two-dimensional image corresponding to each frame of three-dimensional model to a three-dimensional space through reprojection, so as to obtain the model skeleton points of each frame of three-dimensional model.
In some embodiments of the present application, the initialization module is configured to: input each frame of three-dimensional model in the volume video into a skeleton point detection neural network for detection, to obtain the model skeleton points of each frame of three-dimensional model.
In some embodiments of the present application, the alignment module is configured to: calculating relative shape parameters of the preset object model and each three-dimensional model; and respectively aligning model skeleton points of the preset object model to model skeleton points of each three-dimensional model according to the relative shape parameters of the preset object model and each three-dimensional model to obtain aligned object models corresponding to each three-dimensional model.
In some embodiments of the present application, the adjustment module is configured to: calculating vertex displacement parameters of each three-dimensional model and the corresponding alignment object model; and according to the vertex displacement parameters of the three-dimensional models and the corresponding alignment object models, performing vertex adjustment on the alignment object models corresponding to each three-dimensional model to obtain target object models matched with the shapes of the three-dimensional models.
In some embodiments of the present application, the apparatus further comprises a selection operation module for: displaying a mounting point selection interface; and obtaining the preset object model calibrated with the mounting points according to the selection operation in the mounting point selection interface.
In some embodiments of the present application, the apparatus further comprises an automatic selection module for: detecting a preset part in the preset object model; and generating corresponding mounting points on the preset positions to obtain the preset object model calibrated with the mounting points.
In some embodiments of the present application, the generating module is configured to: connect the target object models corresponding to the three-dimensional models in series according to the order of the three-dimensional models in the volume video, to obtain the target volume video.
It should be noted that although in the above detailed description several modules or units of a device for action execution are mentioned, such a division is not mandatory. Indeed, the features and functions of two or more modules or units described above may be embodied in one module or unit, in accordance with embodiments of the present application. Conversely, the features and functions of one module or unit described above may be further divided into a plurality of modules or units to be embodied.
In addition, the embodiment of the application further provides an electronic device, which may be a terminal or a server, as shown in fig. 7, which shows a schematic structural diagram of the electronic device according to the embodiment of the application, specifically:
The electronic device may include a processor 401 with one or more processing cores, a memory 402 with one or more computer-readable storage media, a power supply 403, an input unit 404, and other components. Those skilled in the art will appreciate that the electronic device structure shown in fig. 7 does not limit the electronic device, which may include more or fewer components than shown, combine certain components, or arrange components differently. Wherein:
the processor 401 is the control center of the electronic device. It connects the various parts of the entire computer device using various interfaces and lines, and performs the various functions of the computer device and processes data by running or executing software programs and/or modules stored in the memory 402 and calling data stored in the memory 402, thereby monitoring the electronic device as a whole. Optionally, the processor 401 may include one or more processing cores; preferably, the processor 401 may integrate an application processor and a modem processor, where the application processor mainly handles the operating system, user interface, application programs, and the like, and the modem processor mainly handles wireless communication. It will be appreciated that the modem processor may also not be integrated into the processor 401.
The memory 402 may be used to store software programs and modules, and the processor 401 executes various functional applications and performs data processing by running the software programs and modules stored in the memory 402. The memory 402 may mainly include a program storage area and a data storage area, wherein the program storage area may store the operating system, an application program required for at least one function (such as a sound playing function or an image playing function), and the like, and the data storage area may store data created according to the use of the computer device, and the like. In addition, the memory 402 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. Accordingly, the memory 402 may also include a memory controller to provide the processor 401 with access to the memory 402.
The electronic device further comprises a power supply 403 for supplying power to the various components. Preferably, the power supply 403 may be logically connected to the processor 401 through a power management system, so that charging, discharging, and power consumption management are handled by the power management system. The power supply 403 may also include one or more of a direct-current or alternating-current power supply, a recharging system, a power failure detection circuit, a power converter or inverter, a power status indicator, and the like.
The electronic device may further comprise an input unit 404, which may be used for receiving input numeric or character information and for generating keyboard, mouse, joystick, optical, or trackball signal inputs related to user settings and function control.
Although not shown, the electronic device may further include a display unit and the like, which are not described herein. Specifically, in this embodiment, the processor 401 in the electronic device loads the executable files corresponding to the processes of one or more computer programs into the memory 402 according to the following instructions, and executes the computer programs stored in the memory 402, thereby implementing the functions of the foregoing embodiments of the present application.
The processor 401 may perform the following steps: detecting model skeleton points of the three-dimensional model of each frame in the volume video, and acquiring a preset object model carrying preset skeleton points, wherein mounting points are calibrated on the preset object model; aligning the skeleton points of the preset object model with the model skeleton points of each three-dimensional model, respectively, to obtain an aligned object model corresponding to each three-dimensional model; performing shape adjustment on the aligned object model corresponding to each three-dimensional model to obtain a target object model matched with the shape of each three-dimensional model; and obtaining a target volume video based on the sequence of target object models, wherein the target object model of each frame in the target volume video carries the mounting points of the preset object model, and the mounting points are used for mounting virtual props.
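Read as pseudocode rather than as the claimed implementation, these steps chain together roughly as follows; `detect`, `align`, and `adjust` are hypothetical stand-ins for the skeleton detection, alignment, and shape-adjustment operations described above:

```python
from typing import Callable, List, Sequence

def mount_prop_pipeline(frames: Sequence,           # one 3D model per frame
                        preset_model,               # carries mounting points
                        detect: Callable,           # model skeleton points
                        align: Callable,            # skeleton alignment
                        adjust: Callable) -> List:  # shape adjustment
    """Sketch of the overall flow: per frame, detect skeleton points,
    align the preset object model to them, and adjust its shape. The
    resulting sequence is the target volume video, each frame of which
    inherits the mounting points calibrated on the preset model, so a
    virtual prop can be attached at those points."""
    targets = []
    for frame_model in frames:
        skeleton = detect(frame_model)
        aligned = align(preset_model, skeleton)
        targets.append(adjust(aligned, frame_model))
    return targets
```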
It will be appreciated by those of ordinary skill in the art that all or part of the steps of the various methods of the above embodiments may be performed by a computer program, or by related hardware under the control of a computer program; the computer program may be stored in a computer-readable storage medium and loaded and executed by a processor.
To this end, the embodiments of the present application further provide a storage medium in which a computer program is stored, the computer program being loadable by a processor to perform the steps of any of the methods provided by the embodiments of the present application.
The storage medium may include: Read-Only Memory (ROM), Random Access Memory (RAM), a magnetic disk, an optical disc, and the like.
Since the computer program stored in the storage medium can perform the steps of any method provided in the embodiments of the present application, it can achieve the beneficial effects achievable by those methods, which are detailed in the foregoing embodiments and are not repeated herein.
Other embodiments of the present application will be apparent to those skilled in the art from consideration of the specification and practice of the embodiments disclosed herein. This application is intended to cover any variations, uses, or adaptations that follow the general principles of the application and include such departures from the present disclosure as come within known or customary practice in the art to which the application pertains.
It will be understood that the present application is not limited to the embodiments described above and shown in the drawings, and that various modifications and changes can be made without departing from its scope.

Claims (12)

1. A method of mounting props on a volume video, the method comprising:
detecting model skeleton points of the three-dimensional model of each frame in the volume video, and acquiring a preset object model carrying preset skeleton points, wherein mounting points are calibrated on the preset object model;
aligning the skeleton points of the preset object model with the model skeleton points of each three-dimensional model, respectively, to obtain an aligned object model corresponding to each three-dimensional model;
performing shape adjustment on the aligned object model corresponding to each three-dimensional model to obtain a target object model matched with the shape of each three-dimensional model;
and obtaining a target volume video based on the sequence of target object models, wherein the target object model of each frame in the target volume video carries the mounting points of the preset object model, and the mounting points are used for mounting virtual props.
2. The method of claim 1, wherein detecting model skeleton points of the three-dimensional model of each frame in the volume video comprises:
acquiring multi-angle two-dimensional images corresponding to the three-dimensional model of each frame in the volume video;
detecting image skeleton points on the multi-angle two-dimensional images corresponding to the three-dimensional model of each frame;
and mapping the image skeleton points on the multi-angle two-dimensional images corresponding to the three-dimensional model of each frame into three-dimensional space through reprojection, to obtain the model skeleton points of the three-dimensional model of each frame.
3. The method of claim 1, wherein detecting model skeleton points of the three-dimensional model of each frame in the volume video comprises:
inputting the three-dimensional model of each frame in the volume video into a skeleton point detection neural network for detection processing, to obtain the model skeleton points of the three-dimensional model of each frame.
4. The method according to claim 1, wherein aligning the skeleton points of the preset object model with the model skeleton points of each three-dimensional model to obtain an aligned object model corresponding to each three-dimensional model comprises:
calculating relative shape parameters between the preset object model and each three-dimensional model;
and aligning the skeleton points of the preset object model to the model skeleton points of each three-dimensional model according to the relative shape parameters, respectively, to obtain the aligned object model corresponding to each three-dimensional model.
5. The method according to claim 1, wherein performing shape adjustment on the aligned object model corresponding to each three-dimensional model to obtain a target object model matched with the shape of each three-dimensional model comprises:
calculating vertex displacement parameters between each three-dimensional model and its corresponding aligned object model;
and performing vertex adjustment on the aligned object model corresponding to each three-dimensional model according to the vertex displacement parameters, to obtain a target object model matched with the shape of that three-dimensional model.
6. The method according to claim 1, further comprising:
displaying a mounting point selection interface;
and obtaining the preset object model with calibrated mounting points according to a selection operation in the mounting point selection interface.
7. The method according to claim 1, further comprising:
detecting a preset part in the preset object model;
and generating corresponding mounting points at the preset part to obtain the preset object model with calibrated mounting points.
8. The method of claim 1, wherein obtaining a target volume video based on the sequence of target object models comprises:
connecting the target object models corresponding to the three-dimensional models in series, in the order of the three-dimensional models in the volume video, to obtain the target volume video.
9. An apparatus for mounting props on a volume video, the apparatus comprising:
an initialization module, configured to detect model skeleton points of the three-dimensional model of each frame in the volume video and acquire a preset object model carrying preset skeleton points, wherein mounting points are calibrated on the preset object model;
an alignment module, configured to align the skeleton points of the preset object model with the model skeleton points of each three-dimensional model, respectively, to obtain an aligned object model corresponding to each three-dimensional model;
an adjustment module, configured to perform shape adjustment on the aligned object model corresponding to each three-dimensional model to obtain a target object model matched with the shape of each three-dimensional model;
and a generation module, configured to obtain a target volume video based on the sequence of target object models, wherein the target object model of each frame in the target volume video carries the mounting points of the preset object model, and the mounting points are used for mounting virtual props.
10. A storage medium having stored thereon a computer program which, when executed by a processor of a computer, causes the computer to perform the method of any of claims 1 to 8.
11. An electronic device, comprising: a memory storing a computer program; and a processor that reads the computer program stored in the memory to perform the method of any one of claims 1 to 8.
12. A computer program product, characterized in that the computer program product comprises a computer program which, when executed by a processor, implements the method of any one of claims 1 to 8.
CN202211611427.6A 2022-12-14 2022-12-14 Method, device, medium, equipment and product for mounting prop on volume video Pending CN116310245A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211611427.6A CN116310245A (en) 2022-12-14 2022-12-14 Method, device, medium, equipment and product for mounting prop on volume video

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211611427.6A CN116310245A (en) 2022-12-14 2022-12-14 Method, device, medium, equipment and product for mounting prop on volume video

Publications (1)

Publication Number Publication Date
CN116310245A true CN116310245A (en) 2023-06-23

Family

ID=86794846

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211611427.6A Pending CN116310245A (en) 2022-12-14 2022-12-14 Method, device, medium, equipment and product for mounting prop on volume video

Country Status (1)

Country Link
CN (1) CN116310245A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination