CN116248939A - Performance synchronization method and device, electronic equipment and storage medium


Info

Publication number
CN116248939A
CN116248939A (application CN202211611383.7A)
Authority
CN
China
Prior art keywords
performance
progress
virtual object
determining
action
Prior art date
Legal status
Pending
Application number
CN202211611383.7A
Other languages
Chinese (zh)
Inventor
邵志兢
张煜
孙伟
吕云
Current Assignee
Zhuhai Prometheus Vision Technology Co ltd
Original Assignee
Zhuhai Prometheus Vision Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Zhuhai Prometheus Vision Technology Co ltd filed Critical Zhuhai Prometheus Vision Technology Co ltd
Priority to CN202211611383.7A priority Critical patent/CN116248939A/en
Publication of CN116248939A publication Critical patent/CN116248939A/en
Pending legal-status Critical Current


Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N 21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N 21/4302 Content synchronisation processes, e.g. decoder synchronisation
    • H04N 21/4307 Synchronising the rendering of multiple content streams or additional data on devices, e.g. synchronisation of audio on a mobile phone with the video output on the TV screen
    • H04N 21/43072 Synchronising the rendering of multiple content streams or additional data on devices, of multiple content streams on the same device
    • H04N 21/44 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N 21/44008 Processing of video elementary streams, involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream


Abstract

The embodiments of the present application disclose a performance synchronization method and device, an electronic device, and a storage medium. The method comprises the following steps: the electronic device acquires an initial video of an interactive performance between a performance object and a virtual object in a volumetric video; determines, in the initial video, a first performance progress of the performance object and a second performance progress of the virtual object; determines, according to the first performance progress and the second performance progress, a target time period during which the performances of the performance object and the virtual object are not synchronized; and, within the target time period, adjusts the second performance progress according to the first performance progress, so as to synchronize the performance progress of the virtual object with that of the performance object. The performance progress of the interactive performance between the virtual object and the performance object is thereby synchronized.

Description

Performance synchronization method and device, electronic equipment and storage medium
Technical Field
The present application relates to the field of computer technologies, and in particular, to a performance synchronization method, a performance synchronization device, an electronic device, and a storage medium.
Background
With the development of computer technology, interaction between a real character and a virtual character can be achieved in some scenarios in the real world. For example, a dialogue between a real character and a virtual character may be staged in order to shoot a volumetric video movie.
However, the virtual character is invisible to the naked eye, so during the performance the real character cannot pace his or her own performance against that of the virtual character. As a result, the performance progress of the real character and that of the virtual character may be out of sync in the final volumetric video movie.
Disclosure of Invention
The embodiments of the present application provide a performance synchronization method and device, an electronic device, and a storage medium. The performance synchronization method can synchronize the performance progress of a virtual object with that of a performance object.
In a first aspect, an embodiment of the present application provides a performance synchronization method, including:
acquiring an initial video of an interactive performance between a performance object and a virtual object in a volumetric video;
determining, in the initial video, a first performance progress of the performance object and a second performance progress of the virtual object;
determining, according to the first performance progress and the second performance progress, a target time period during which the performances of the performance object and the virtual object are not synchronized;
and, within the target time period, adjusting the second performance progress according to the first performance progress, so as to synchronize the performance progress of the virtual object with that of the performance object.
In a second aspect, embodiments of the present application provide a performance synchronization device, comprising:
an acquisition module, configured to acquire an initial video of an interactive performance between a performance object and a virtual object in a volumetric video;
a first determining module, configured to determine, in the initial video, a first performance progress of the performance object and a second performance progress of the virtual object;
a second determining module, configured to determine, according to the first performance progress and the second performance progress, a target time period during which the performances of the performance object and the virtual object are not synchronized;
and an adjusting module, configured to adjust, within the target time period, the second performance progress according to the first performance progress, so as to synchronize the performance progress of the virtual object with that of the performance object.
In a third aspect, an embodiment of the present application provides an electronic device, comprising: a memory storing executable program code, and a processor coupled to the memory; the processor invokes the executable program code stored in the memory to perform the steps of the performance synchronization method provided by the embodiments of the present application.
In a fourth aspect, embodiments of the present application provide a storage medium storing a plurality of instructions adapted to be loaded by a processor to perform steps in a performance synchronization method provided by embodiments of the present application.
In the embodiments of the present application, the electronic device acquires an initial video of an interactive performance between a performance object and a virtual object in a volumetric video; determines, in the initial video, a first performance progress of the performance object and a second performance progress of the virtual object; determines, according to the first performance progress and the second performance progress, a target time period during which the performances of the performance object and the virtual object are not synchronized; and, within the target time period, adjusts the second performance progress according to the first performance progress, so as to synchronize the performance progress of the virtual object with that of the performance object. The performance progress of the interactive performance between the virtual object and the performance object is thereby synchronized.
Drawings
In order to illustrate the technical solutions of the embodiments of the present application more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present application; a person skilled in the art can obtain other drawings from them without inventive effort.
Fig. 1 is a schematic view of a shooting system according to an embodiment of the present application.
Fig. 2 is a first flowchart of a performance synchronization method provided in an embodiment of the present application.
Fig. 3 is a second flowchart of a performance synchronization method provided in an embodiment of the present application.
Fig. 4 is a schematic view of a scenario of a performance synchronization method provided in an embodiment of the present application.
Fig. 5 is a schematic structural diagram of a performance synchronization device according to an embodiment of the present application.
Fig. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, rather than all, of the embodiments of the present application. All other embodiments obtained by a person skilled in the art based on the embodiments herein without inventive effort fall within the scope of protection of the present application.
With the development of computer technology, interaction between a real character and a virtual character can be achieved in some scenarios in the real world. For example, a dialogue between a real character and a virtual character may be staged in order to shoot a volumetric video movie.
However, the virtual character is invisible to the naked eye, so during the performance the real character cannot pace his or her own performance against that of the virtual character. As a result, the performance progress of the real character and that of the virtual character may be out of sync in the final volumetric video movie.
In order to solve the above technical problem, the embodiments of the present application provide a performance synchronization method and device, an electronic device, and a storage medium. The performance synchronization method can synchronize the performance progress of a virtual object with that of a performance object.
Volumetric video (also known as spatial video, volumetric three-dimensional video, or six-degrees-of-freedom video, etc.) is a technique that generates a sequence of three-dimensional models by capturing information (such as depth information and color information) in three-dimensional space. Compared with traditional video, volumetric video adds the concept of space to video: it uses three-dimensional models to genuinely restore the real three-dimensional world, rather than simulating its sense of space with two-dimensional plane video plus camera movement. Because a volumetric video is a sequence of three-dimensional models, the user can watch it from any viewing angle according to personal preference, which gives a higher degree of fidelity and immersion than two-dimensional plane video.
In some scenarios, the virtual object in the volumetric video is captured in a studio. For example, with a camera array system deployed as a matrix, data such as the color information, texture information, and depth information of a performer are captured and extracted by professional acquisition devices such as infrared (IR) cameras and 4K ultra-high-definition industrial cameras, and the model information corresponding to the virtual object in the volumetric video is then computed from these data.
Referring to fig. 1, fig. 1 is a schematic view of a shooting system according to an embodiment of the present application.
As shown in fig. 1, the shooting system includes an electronic device, a signal source, a camera array, and a microphone. The camera array includes a plurality of cameras, each located at a different position; the signal source is connected with each camera in the camera array, and the electronic device is connected with both the signal source and the camera array. The electronic device may be a computer, a server, or another electronic device with sufficient computing capability.
When the cameras in the camera array need to shoot a subject placed within the array, the electronic device controls the signal source to send a pulse control signal to every camera at the same time; upon receiving the pulse control signal, each camera shoots the subject, so that all cameras are triggered synchronously.
In some embodiments, the camera array includes a plurality of positions, each position may hold a plurality of camera modules, and each camera module may contain a plurality of cameras. For example, in the space perpendicular to the ground, different camera modules are arranged at different heights; each camera module may include a color camera for capturing color images and may also include a depth camera, so that the image captured by one camera module may comprise a color image and a depth image.
After the camera array finishes shooting the subject, the electronic device receives, from each camera in the array, the captured images together with the times at which they were taken, and then performs subsequent image processing based on the received images and their corresponding times.
While the subject is being photographed, the electronic device may also record the sound the subject emits, for example through the microphone shown in fig. 1. The microphone may be placed above the area enclosed by the camera array, or on the subject itself, so as to capture the sound.
In some implementations, the captured images received by the electronic device are used as the image information for subsequently generating the volumetric video, and the received audio is used as the corresponding sound information of the volumetric video.
In some embodiments, after the image information and depth information of the subject are obtained, a three-dimensional model of the subject, that is, the three-dimensional model of the virtual object in the volumetric video, can be constructed.
Optionally, in the present application, the three-dimensional models used to construct the volumetric video may be reconstructed as follows:
first, color images and depth images of the subject at different viewing angles, together with the camera parameters corresponding to the color images, are acquired; a neural network model that implicitly expresses the three-dimensional model of the subject is then trained on the acquired color images, the corresponding depth images, and the camera parameters; finally, isosurface extraction is performed on the trained neural network model, achieving the three-dimensional reconstruction of the subject and yielding its three-dimensional model.
It should be noted that the embodiments of the present application do not particularly limit the architecture of the neural network model, which may be selected by those skilled in the art according to actual needs. For example, a multilayer perceptron (MLP) without a normalization layer may be selected as the base model for training.
The three-dimensional model reconstruction method provided in the present application will be described in detail below.
First, a plurality of color cameras and depth cameras are used to synchronously shoot, from multiple viewing angles, the target object to be three-dimensionally reconstructed (the target object is the subject), obtaining color images and corresponding depth images of the target object at multiple different viewing angles. That is, at the same shooting moment (shooting moments whose actual difference is less than or equal to a time threshold are considered the same), the color camera at each viewing angle captures a color image of the target object at that angle, and correspondingly the depth camera at each viewing angle captures a depth image at that angle. The target object may be any object, including but not limited to living objects such as people, animals, and plants, and inanimate objects such as machines, furniture, and dolls.
In this way, each color image of the target object at a given viewing angle has a corresponding depth image. That is, when shooting, the color cameras and depth cameras may be configured as camera pairs, with the color camera and the depth camera at the same viewing angle shooting the same target object synchronously. For example, a studio may be built whose central area is the shooting area; around it, pairs of color cameras and depth cameras are arranged at certain angular intervals in the horizontal and vertical directions. When the target object is in the shooting area surrounded by these cameras, its color images and corresponding depth images at different viewing angles are obtained.
In addition, the camera parameters of the color camera corresponding to each color image are acquired. The camera parameters include the intrinsic and extrinsic parameters of the color camera, which can be determined through calibration. The intrinsic parameters are parameters related to the characteristics of the color camera itself, including but not limited to its focal length and pixel size; the extrinsic parameters are the parameters of the color camera in the world coordinate system, including but not limited to its position (coordinates) and rotation direction.
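As a concrete illustration of these parameters, the following is a minimal pinhole-camera sketch in Python with NumPy, showing intrinsics, extrinsics, and the back-projection of a pixel into a world-space ray used later in training; the numeric values and the helper name pixel_to_ray are illustrative assumptions, not taken from the patent.

```python
# A minimal sketch of intrinsic/extrinsic camera parameters; values are
# illustrative only, not calibration results from the patent.
import numpy as np

fx = fy = 1200.0           # focal length in pixels (intrinsic)
cx, cy = 960.0, 540.0      # principal point (intrinsic)
K = np.array([[fx, 0., cx],
              [0., fy, cy],
              [0., 0., 1.]])

R = np.eye(3)                 # camera rotation in the world frame (extrinsic)
t = np.array([0., 0., 2.0])   # camera position in the world frame (extrinsic)

def pixel_to_ray(u, v):
    """Back-project pixel (u, v) into a unit ray direction in world space."""
    d_cam = np.linalg.inv(K) @ np.array([u, v, 1.0])
    d_world = R.T @ d_cam
    return d_world / np.linalg.norm(d_world)
```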
As described above, once the color images of the target object at different viewing angles and their corresponding depth images at the same shooting moment are obtained, the target object can be three-dimensionally reconstructed from them. Unlike the related-art approach of converting depth information into a point cloud for three-dimensional reconstruction, the present application trains a neural network model to implicitly express the three-dimensional model of the target object, so that the three-dimensional reconstruction is carried out on the basis of the neural network model.
Optionally, the present application selects a multilayer perceptron (MLP) without a normalization layer as the base model and trains it as follows:
converting the pixel points in each color image into rays based on the corresponding camera parameters;
sampling a plurality of sampling points on each ray, and determining the first coordinate information of each sampling point and the SDF value of each sampling point relative to the pixel point;
inputting the first coordinate information of the sampling points into the base model to obtain the predicted SDF value and the predicted RGB color value that the base model outputs for each sampling point;
adjusting the parameters of the base model based on a first difference between the predicted SDF value and the SDF value and a second difference between the predicted RGB color value and the RGB color value of the pixel point, until a preset stopping condition is met;
and taking the base model that meets the preset stopping condition as the neural network model implicitly expressing the three-dimensional model of the target object.
First, a pixel point in a color image is converted into a ray based on the camera parameters corresponding to that color image; the ray may be one that passes through the pixel point and is perpendicular to the color image plane. Next, a plurality of sampling points are sampled on the ray. The sampling may be performed in two steps: some sampling points are first sampled uniformly, and further sampling points are then taken at key positions based on the depth value of the pixel point, ensuring that as many sampling points as possible fall near the model surface. Then, the first coordinate information of each sampling point in the world coordinate system and the signed distance field (SDF) value of each sampling point are calculated from the camera parameters and the depth value of the pixel point. The SDF value may be the difference between the depth value of the pixel point and the distance from the sampling point to the imaging plane of the camera; this difference is signed: a positive value means the sampling point is outside the three-dimensional model, a negative value means it is inside, and zero means it is on the surface. After the sampling is completed and the SDF value of each sampling point has been calculated, the first coordinate information of the sampling points in the world coordinate system is input into the base model (the base model is configured to map the input coordinate information to an SDF value and an RGB color value and output them); the SDF value output by the base model is recorded as the predicted SDF value, and the RGB color value it outputs is recorded as the predicted RGB color value. Finally, the parameters of the base model are adjusted based on the first difference between the predicted SDF value and the SDF value of the sampling point and the second difference between the predicted RGB color value and the RGB color value of the pixel point corresponding to the sampling point.
In addition, the other pixel points in the color image are sampled in the same way, and the coordinate information of their sampling points in the world coordinate system is input into the base model to obtain the corresponding predicted SDF values and predicted RGB color values, which are used to adjust the parameters of the base model until a preset stopping condition is met. For example, the preset stopping condition may be that the base model reaches a preset number of iterations, or that the base model converges. When the iteration of the base model meets the preset stopping condition, a neural network model that accurately and implicitly expresses the three-dimensional model of the subject is obtained. Finally, an isosurface extraction algorithm can be used to extract the surface of the three-dimensional model from the neural network model, yielding the three-dimensional model of the subject.
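To make this training procedure concrete, the following is a minimal sketch in PyTorch, assuming the sampling points, their SDF labels, and the pixel colors have already been prepared as tensors; the class name, function names, and network sizes are illustrative assumptions, not the patent's implementation.

```python
# A minimal sketch of the implicit-SDF training described above, assuming
# PyTorch; squared-error losses stand in for the "first/second differences".
import torch
import torch.nn as nn

class SDFColorMLP(nn.Module):
    """MLP without normalization layers mapping a 3D point to (SDF, RGB)."""
    def __init__(self, hidden=256, n_layers=8):
        super().__init__()
        layers, dim = [], 3
        for _ in range(n_layers):
            layers += [nn.Linear(dim, hidden), nn.ReLU()]
            dim = hidden
        self.backbone = nn.Sequential(*layers)
        self.head = nn.Linear(hidden, 4)  # 1 SDF value + 3 RGB channels

    def forward(self, xyz):
        out = self.head(self.backbone(xyz))
        return out[..., :1], torch.sigmoid(out[..., 1:])  # sdf, rgb

def train_step(model, optimizer, samples_xyz, sdf_gt, rgb_gt):
    # samples_xyz: (N, 3) world coordinates of sampling points on camera rays
    # sdf_gt:      (N, 1) signed distances derived from the depth images
    # rgb_gt:      (N, 3) colors of the pixels that generated the rays
    sdf_pred, rgb_pred = model(samples_xyz)
    loss = ((sdf_pred - sdf_gt) ** 2).mean() + ((rgb_pred - rgb_gt) ** 2).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```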
Optionally, in some embodiments, the imaging plane of the color image is determined based on the camera parameters, and the ray that passes through a pixel point in the color image and is perpendicular to the imaging plane is determined as the ray corresponding to that pixel point.
That is, the coordinate information of the color image in the world coordinate system, i.e. its imaging plane, can be determined from the camera parameters of the corresponding color camera; the ray passing through a pixel point in the color image and perpendicular to the imaging plane is then taken as the ray corresponding to that pixel point.
Optionally, in some embodiments, the second coordinate information and the rotation angle of the color camera in the world coordinate system are determined from the camera parameters, and the imaging plane of the color image is determined from the second coordinate information and the rotation angle.
Optionally, in some embodiments, a first number of first sampling points are sampled at equal intervals on the ray; a number of key sampling points are determined from the depth value of the pixel point, and a second number of second sampling points are sampled around the key sampling points; the first number of first sampling points and the second number of second sampling points together form the plurality of sampling points sampled on the ray, as in the sketch below.
Specifically, n first sampling points (n being a positive integer greater than 2, i.e. the first number) are first sampled uniformly on the ray. Then, according to the depth value of the pixel point, a preset number of key sampling points closest to the pixel point are determined among the n first sampling points, or the key sampling points whose distance from the pixel point is smaller than a distance threshold are determined among them. Next, m second sampling points (m being a positive integer greater than 1) are resampled around the determined key sampling points. Finally, the n+m points obtained are taken as the plurality of sampling points sampled on the ray. Because the m additional points are sampled around the key sampling points, the model is trained more accurately near the surface of the three-dimensional model, which improves the reconstruction accuracy.
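A sketch of this two-stage sampling, assuming NumPy, a ray given by its origin and unit direction, and the depth value of the generating pixel; the band width, the near/far bounds, and the helper name are illustrative assumptions.

```python
# Two-stage ray sampling: n uniform samples, then m samples concentrated
# around the key samples near the observed depth. Illustrative sketch only.
import numpy as np

def sample_points(ray_o, ray_d, depth, n=64, m=32, near=0.1, far=5.0,
                  band=0.05):
    # Stage 1: n uniformly spaced samples along the ray.
    t_uniform = np.linspace(near, far, n)
    # Key samples: uniform samples whose distance along the ray is within
    # `band` of the pixel's depth value, i.e. closest to the surface.
    key = t_uniform[np.abs(t_uniform - depth) < band]
    if key.size == 0:
        key = np.array([depth])
    # Stage 2: m extra samples drawn around the key samples.
    t_fine = np.random.uniform(key.min() - band, key.max() + band, m)
    t_all = np.sort(np.concatenate([t_uniform, t_fine]))
    points = ray_o + t_all[:, None] * ray_d   # (n+m, 3) world coordinates
    # SDF label per the patent's convention (positive outside the model),
    # approximating the distance to the imaging plane by the ray distance.
    sdf = depth - t_all
    return points, sdf
```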
Optionally, in some embodiments, the depth value corresponding to the pixel point is determined from the depth image corresponding to the color image; the SDF value of each sampling point relative to the pixel point is calculated based on the depth value; and the coordinate information of each sampling point is calculated from the camera parameters and the depth value.
After the sampling points are sampled on the ray corresponding to each pixel point, the distance between the shooting position of the color camera and the corresponding point on the target object is determined for each sampling point from the camera parameters and the depth value of the pixel point, and the SDF value and the coordinate information of each sampling point are then calculated one by one based on this distance.
After the training of the base model is completed, for the coordinate information of any given point, the trained base model can predict its corresponding SDF value, and the predicted SDF value indicates the positional relationship (inside, outside, or on the surface) between that point and the three-dimensional model of the target object. This realizes the implicit expression of the three-dimensional model of the target object, i.e. the neural network model implicitly expressing the three-dimensional model is obtained.
Finally, isosurface extraction is performed on the neural network model, for example by drawing the surface of the three-dimensional model with the marching cubes (MC) isosurface extraction algorithm, so as to obtain the model surface and, from it, the three-dimensional model of the target object.
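A sketch of this extraction step, assuming the trained model from the sketch above and scikit-image's marching cubes implementation; the grid resolution and bounds are illustrative assumptions.

```python
# Extract the zero level set of the learned SDF with marching cubes.
import numpy as np
import torch
from skimage import measure

@torch.no_grad()
def extract_mesh(model, resolution=128, bound=1.0):
    axis = np.linspace(-bound, bound, resolution, dtype=np.float32)
    grid = np.stack(np.meshgrid(axis, axis, axis, indexing="ij"), -1)
    xyz = torch.from_numpy(grid.reshape(-1, 3))
    sdf, _ = model(xyz)                       # query the predicted SDF
    volume = sdf.numpy().reshape(resolution, resolution, resolution)
    # The zero level set of the SDF is the model surface.
    verts, faces, normals, _ = measure.marching_cubes(volume, level=0.0)
    # Rescale vertices from voxel indices back to world coordinates.
    verts = verts / (resolution - 1) * 2 * bound - bound
    return verts, faces, normals
```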
According to the above three-dimensional reconstruction scheme, the three-dimensional model of the target object is implicitly modeled by a neural network, and depth information is added to improve the training speed and accuracy of the model. By continuously performing this three-dimensional reconstruction on the subject over time, three-dimensional models of the subject at different moments are obtained, and the sequence formed by these three-dimensional models in chronological order is the volumetric video of the subject. In this way, a volumetric video presenting specific content can be shot of any subject: for example, shooting a dancing subject yields a volumetric video of the dance viewable from any angle, shooting a teaching subject yields a teaching volumetric video viewable from any angle, and so on.
It should be noted that the volumetric video referred to in the following embodiments of the present application may be obtained with the above volumetric video shooting method.
In some embodiments, after the volumetric video is obtained, the data corresponding to it may be uploaded to the cloud, for example the three-dimensional model data of the virtual object in the volumetric video. When the data need to be called later, the three-dimensional model data of the virtual object required by the user can be downloaded from the cloud.
In some embodiments, the user may use the downloaded volumetric video locally; for example, it can be imported into tools such as UE4, UE5, or Unity 3D and fused with virtual scenes or CG effects. Real-time rendering of the volumetric video can also be realized, for example in real-time AR and similar scenarios.
For a more detailed understanding of the performance synchronization method provided in the embodiments of the present application, please refer to fig. 2, which is a first flowchart of the performance synchronization method. The performance synchronization method may include the following steps:
110. Acquire an initial video of an interactive performance between the performance object and the virtual object in the volumetric video.
It will be appreciated that the virtual object in the volumetric video is invisible in the real world, so when the performance object performs with the virtual object, it cannot see the virtual object.
In some scenarios, such as an AR shoot of the performance object and the virtual object, the performance object may dance together with the virtual object, both performing the same dance movements, thereby achieving a duet dance effect.
In some scenarios, the performance object may shoot a movie together with the virtual object; for example, the performance object cooperates with the virtual object according to a set script and lines, making it possible to shoot a movie with the virtual object even in condition-limited scenes.
In some embodiments, the electronic device may photograph the performance object in a preset shooting scene to obtain shooting information, and generate the initial video from the shooting information and the model information of the virtual object in the volumetric video.
For example, the shooting information of the performance object may be obtained by photographing it with an electronic device having a shooting function, such as a mobile phone or a video camera. The shooting information may include two-dimensional or three-dimensional image data of the performance object, audio data of the performance object, and ambient audio data of the environment.
In some embodiments, while shooting the performance object, the electronic device may also call up the volumetric video in real time, rendering the picture corresponding to the virtual object into the shooting scene in real time, so that the performance object and the virtual object perform on the same stage and the initial video is obtained. In the initial video, the picture data corresponding to the virtual object can be edited or deleted, which facilitates both the editing of the initial video by a post-production creator and the subsequent synchronization of the performance progress of the performance object and the virtual object by the electronic device.
It should be noted that the initial video may be an ordinary two-dimensional video, such as one shot with a mobile phone, or a three-dimensional volumetric video, such as one captured by the professional camera matrix in the above embodiments.
120. Determine, in the initial video, a first performance progress of the performance object and a second performance progress of the virtual object.
It may be appreciated that the first performance progress of the performance object can be determined from the actions and voice of the performance object: for example, when the body movement of the performance object reaches a certain action set in the script, or when its spoken dialogue reaches a certain line set in the script, the first performance progress of the performance object can be determined.
Similarly, the second performance progress of the virtual object can be determined from the actions and voice of the virtual object: for example, when the body movement of the virtual object reaches a certain action set in the script, or when its spoken dialogue reaches a certain line set in the script, the second performance progress of the virtual object can be determined.
In some embodiments, the electronic device may obtain first action information of the performance object and determine the first action information as the first performance progress, and obtain second action information of the virtual object and determine the second action information as the second performance progress.
For example, during the performance of the performance object, first action information of the performance object may be obtained, in which a transition from one action to the next may occur. The electronic device can identify such a transition with a gesture recognition model: when the similarity between the action of the performance object in one frame and its action in the next frame is lower than a preset value, an action transition has occurred in the performance object, and the performance progress is recorded accordingly.
The electronic device may also set a performance progress flag for each action of the performance object, e.g. flag 1 for the first action, flag 2 for the second action, flag 3 for the third action, ..., and flag N for the Nth action.
Similarly, the electronic device may obtain second action information of the virtual object during the performance of the virtual object, in which a transition from one action to the next may occur. The electronic device can identify the transition with the gesture recognition model: when the similarity between the action of the virtual object in one frame and its action in the next frame is lower than a preset value, an action transition has occurred in the virtual object, and the performance progress is recorded accordingly.
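As an illustration of this frame-to-frame comparison, the following Python sketch detects action transitions from per-frame pose keypoints; the cosine-similarity measure, the threshold, and all names are assumptions for illustration, since the patent does not specify its gesture recognition model.

```python
# Detect action transitions: similarity between consecutive frames drops
# below a preset value. Poses are assumed to be (J, 2) keypoint arrays.
import numpy as np

def pose_similarity(pose_a, pose_b):
    a = (pose_a - pose_a.mean(0)).ravel()   # center to remove translation
    b = (pose_b - pose_b.mean(0)).ravel()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def find_transitions(poses, threshold=0.9):
    """Return frame indices where similarity to the previous frame drops."""
    return [i for i in range(1, len(poses))
            if pose_similarity(poses[i - 1], poses[i]) < threshold]
```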
130. Determine, according to the first performance progress and the second performance progress, a target time period during which the performances of the performance object and the virtual object are not synchronized.
Because the performance object cannot see the virtual object, and the virtual object has no ability to think and cannot pace its performance against the performance object, the first performance progress of the performance object and the second performance progress of the virtual object may be out of sync in the initial video. For example, the first performance progress and the second performance progress are unsynchronized during some period of time in the initial video.
In some implementations, before determining the target time period of unsynchronized performance from the first performance progress and the second performance progress, the electronic device may determine, from the first action information, a first time period during which the performance object undergoes an action transition; within the first time period, recognize the gesture of the performance object from the first action information to obtain first gesture information; and, within the first time period, recognize the gesture of the virtual object from the second action information to obtain second gesture information.
For example, during the performance of the performance object, a transition from one action to the next may occur. The electronic device can identify the transition with the gesture recognition model, and when the action of the performance object transitions between one frame and the next, the time period during which the transition occurs is determined as the first time period.
The first action information contains the different actions of the performance object at different time points; recognizing the gesture of the performance object with the gesture recognition model yields the gesture information of the actions at those time points, i.e. the first gesture information.
The second action information contains the different actions of the virtual object at different time points; recognizing the gesture of the virtual object with the gesture recognition model yields the gesture information of the actions at those time points, i.e. the second gesture information.
In some implementations, the electronic device can determine a gesture similarity from the first gesture information and the second gesture information; if the gesture similarity is lower than a preset similarity threshold, the performances of the performance object and the virtual object are not synchronized, and the first time period is determined as the target time period.
For example, the electronic device may compare the first gesture information and the second gesture information at the corresponding time points, so as to determine the gesture similarity between the performance object and the virtual object. When the gesture similarity is greater than or equal to the preset similarity threshold, the performance progress of the performance object and that of the virtual object are synchronized; when it is lower than the preset similarity threshold, the performances are not synchronized, and the first time period is determined as the target time period.
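Building on the illustrative pose_similarity above, the following sketch collects unsynchronized spans into target time periods; the threshold and the frame-rate handling are assumptions, not values from the patent.

```python
# Collect spans where the two performances diverge into (start_s, end_s).
def find_target_periods(first_gestures, second_gestures, fps, threshold=0.8):
    out_of_sync = [pose_similarity(a, b) < threshold
                   for a, b in zip(first_gestures, second_gestures)]
    periods, start = [], None
    for i, bad in enumerate(out_of_sync):
        if bad and start is None:
            start = i                              # span opens
        elif not bad and start is not None:
            periods.append((start / fps, i / fps))  # span closes
            start = None
    if start is not None:
        periods.append((start / fps, len(out_of_sync) / fps))
    return periods
```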
140. Within the target time period, adjust the second performance progress according to the first performance progress, so as to synchronize the performance progress of the virtual object with that of the performance object.
Because the shooting information of the performance object has already been captured, it is the second performance progress of the virtual object that is adjusted, so that the second performance progress is synchronized with the first performance progress and the performance progress of the virtual object matches that of the performance object.
In some implementations, the electronic device can determine, from the first action information, a first action sequence of the performance object within the target time period; determine, from the second action information, a second action sequence of the virtual object; and, taking the first action sequence as the reference, control the second action sequence to align with the first action sequence, so as to synchronize the performance progress of the virtual object with that of the performance object.
For example, if the first action information records the actions of the performance object at different time points, those actions can be assembled into a first action sequence in chronological order; likewise, if the second action information records the actions of the virtual object at different time points, those actions can be assembled into a second action sequence in chronological order.
The second action sequence is then aligned with the first action sequence, with the first action sequence as the reference, so as to synchronize the performance progress of the virtual object with that of the performance object.
Specifically, the electronic device performs dynamic time warping (DTW) between the second action sequence and the first action sequence, with the first action sequence as the reference, so as to synchronize the performance progress of the virtual object with that of the performance object.
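A minimal DTW sketch over the two action sequences, reusing the illustrative pose_similarity above as the frame distance; this is the classic dynamic program, given as an assumption since the patent does not spell out its DTW variant.

```python
# Classic O(n*m) DTW: warp second_seq (virtual object) onto first_seq
# (performance object, the reference) and return the aligned index pairs.
import numpy as np

def dtw_align(first_seq, second_seq):
    n, m = len(first_seq), len(second_seq)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = 1.0 - pose_similarity(first_seq[i - 1], second_seq[j - 1])
            cost[i, j] = d + min(cost[i - 1, j],      # skip a reference frame
                                 cost[i, j - 1],      # hold the virtual frame
                                 cost[i - 1, j - 1])  # advance both
    # Backtrack the cheapest warping path.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = np.argmin([cost[i - 1, j - 1], cost[i - 1, j], cost[i, j - 1]])
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return path[::-1]   # (reference frame, virtual frame) pairs
```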
Aligning the second action sequence with the first action sequence, with the first as the reference, is typically applied when the performance object and the virtual object perform the same actions, for example when they dance the same dance: unsynchronized dance movements may occur during the dance, and this alignment synchronizes the performance progress of the virtual object with that of the performance object.
In some embodiments, the electronic device may also determine, in the initial video, a first identification value for the action and/or voice of the performance object in each frame, and determine the first performance progress from the first identification values; and determine, in the initial video, a second identification value for the action and/or voice of the virtual object in each frame, and determine the second performance progress from the second identification values.
Specifically, the electronic device may pre-establish a mapping relationship between the identification values of the actions and/or voice of the performance object and the identification values of the actions and/or voice of the virtual object.
For example, in a scene where the performance object and the virtual object shoot a movie together, when the performance object speaks a line, the virtual object responds to that line, and when the performance object performs an action, the virtual object likewise responds by performing an action.
Identification values can thus be set for the actions and/or voice of the performance object, identification values can be set for the actions and/or voice of the virtual object, and a mapping relationship can be established between the two.
The electronic device may determine, according to the mapping relationship, the time period in which the first identification value and the second identification value do not match, and determine that unmatched time period as the target time period.
For example, when the first identification value and the second identification value do not match within a period of time, the action and/or voice of the performance object is not synchronized with the action and/or voice of the virtual object during that period, and that period is the target time period.
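A sketch of finding unmatched time periods from the identification-value mapping; the data layout (per-frame identification values and a dict mapping performance-object values to virtual-object values) is an assumption for illustration.

```python
# Spans where the virtual object's id differs from the id mapped from the
# performance object's id become target time periods.
def find_unmatched_periods(first_ids, second_ids, id_mapping, fps):
    """first_ids/second_ids: per-frame identification values."""
    periods, start = [], None
    for i, (fid, sid) in enumerate(zip(first_ids, second_ids)):
        mismatch = id_mapping.get(fid) != sid
        if mismatch and start is None:
            start = i
        elif not mismatch and start is not None:
            periods.append((start / fps, i / fps))
            start = None
    if start is not None:
        periods.append((start / fps, len(first_ids) / fps))
    return periods

# Usage: with an illustrative mapping {1: 101, 2: 102, 3: 103}, any span in
# which second_ids diverges from the mapped first_ids is returned in seconds.
```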
In some embodiments, the electronic device may determine, within the target time period, the matching second identification value from the first identification value according to the mapping relationship, and set the action and/or voice of the virtual object corresponding to the matching second identification value in the frame corresponding to the first identification value, alongside the action and/or voice of the performance object corresponding to the first identification value.
For example, the electronic device may determine the matching second identification value from the first identification value and the mapping relationship, then determine the corresponding action and/or voice of the virtual object in the initial video from that second identification value, and finally place the action and/or voice of the virtual object corresponding to the matching second identification value in the frame corresponding to the first identification value, together with the action and/or voice of the performance object corresponding to the first identification value. The performance progress of the performance object and the virtual object is thereby synchronized.
In the embodiments of the present application, the electronic device acquires an initial video of an interactive performance between a performance object and a virtual object in a volumetric video; determines, in the initial video, a first performance progress of the performance object and a second performance progress of the virtual object; determines, according to the first performance progress and the second performance progress, a target time period during which the performances of the performance object and the virtual object are not synchronized; and, within the target time period, adjusts the second performance progress according to the first performance progress, so as to synchronize the performance progress of the virtual object with that of the performance object. The performance progress of the interactive performance between the virtual object and the performance object is thereby synchronized.
With continued reference to fig. 3, fig. 3 is a second flowchart of a performance synchronization method according to an embodiment of the present application. The performance synchronization method may include the following steps:
201. Photograph the performance object in a preset shooting scene to obtain shooting information.
For example, the shooting information of the performance object may be obtained by photographing it with an electronic device having a shooting function, such as a mobile phone or a video camera. The shooting information may include two-dimensional or three-dimensional image data of the performance object, audio data of the performance object, and ambient audio data of the environment.
202. Generate the initial video from the shooting information and the model information of the virtual object in the volumetric video.
In some embodiments, while shooting the performance object, the electronic device may also call up the volumetric video in real time, rendering the picture corresponding to the virtual object into the shooting scene in real time, so that the performance object and the virtual object perform on the same stage and the initial video is obtained. In the initial video, the picture data corresponding to the virtual object can be edited or deleted, which facilitates both the editing of the initial video by a post-production creator and the subsequent synchronization of the performance progress of the performance object and the virtual object by the electronic device.
It should be noted that the initial video may be an ordinary two-dimensional video, such as one shot with a mobile phone, or a three-dimensional volumetric video, such as one captured by the professional camera matrix in the above embodiments.
203. Acquire first action information of the performance object and determine the first action information as the first performance progress; acquire second action information of the virtual object and determine the second action information as the second performance progress.
For example, during the performance of the performance object, first action information of the performance object may be obtained, in which a transition from one action to the next may occur. The electronic device can identify such a transition with a gesture recognition model: when the similarity between the action of the performance object in one frame and its action in the next frame is lower than a preset value, an action transition has occurred in the performance object, and the performance progress is recorded accordingly.
The electronic device may also set a performance progress flag for each action of the performance object, e.g. flag 1 for the first action, flag 2 for the second action, flag 3 for the third action, ..., and flag N for the Nth action.
Similarly, the electronic device may obtain second action information of the virtual object during the performance of the virtual object, in which a transition from one action to the next may occur. The electronic device can identify the transition with the gesture recognition model: when the similarity between the action of the virtual object in one frame and its action in the next frame is lower than a preset value, an action transition has occurred in the virtual object, and the performance progress is recorded accordingly.
204. Determine, from the first action information, a first time period during which the performance object undergoes an action transition.
For example, during the performance of the performance object, a transition from one action to the next may occur. The electronic device can identify the transition with the gesture recognition model, and when the action of the performance object transitions between one frame and the next, the time period during which the transition occurs is determined as the first time period.
205. Within the first time period, recognize the gesture of the performance object from the first action information to obtain first gesture information, and recognize the gesture of the virtual object from the second action information to obtain second gesture information.
The first action information contains the different actions of the performance object at different time points; recognizing the gesture of the performance object with the gesture recognition model yields the gesture information of the actions at those time points, i.e. the first gesture information.
The second action information contains the different actions of the virtual object at different time points; recognizing the gesture of the virtual object with the gesture recognition model yields the gesture information of the actions at those time points, i.e. the second gesture information.
206. Determine the gesture similarity from the first gesture information and the second gesture information.
For example, the electronic device may compare the first gesture information and the second gesture information at the corresponding time points, so as to determine the gesture similarity between the performance object and the virtual object. When the gesture similarity is greater than or equal to a preset similarity threshold, the performance progress of the performance object and that of the virtual object are synchronized.
207. If the gesture similarity is lower than the preset similarity threshold, the performances of the performance object and the virtual object are not synchronized, and the first time period is determined as the target time period.
That is, when the gesture similarity is lower than the preset similarity threshold, the performance object and the virtual object are performing out of sync, and the first time period is determined as the target time period.
208. Determine, from the first action information, a first action sequence of the performance object within the target time period, and determine, from the second action information, a second action sequence of the virtual object.
For example, if the first action information records the actions of the performance object at different time points, those actions can be assembled into a first action sequence in chronological order; likewise, if the second action information records the actions of the virtual object at different time points, those actions can be assembled into a second action sequence in chronological order.
209. Align the second action sequence with the first action sequence, taking the first action sequence as the reference, so as to synchronize the performance progress of the virtual object with that of the performance object.
The second action sequence is aligned with the first action sequence, with the first action sequence as the reference, so as to synchronize the performance progress of the virtual object with that of the performance object.
Specifically, the electronic device performs dynamic time warping (DTW) between the second action sequence and the first action sequence, with the first action sequence as the reference (see the DTW sketch above), so as to synchronize the performance progress of the virtual object with that of the performance object.
Aligning the second action sequence with the first action sequence, with the first as the reference, is typically applied when the performance object and the virtual object perform the same actions, for example when they dance the same dance: unsynchronized dance movements may occur during the dance, and this alignment synchronizes the performance progress of the virtual object with that of the performance object.
In the embodiment of the application, the electronic device photographs the performance object in a preset photographing scene to obtain photographing information. And generating an initial video according to the shooting information and the model information of the virtual object in the volume video. And acquiring first action information of the performing object, and determining the first action information as a first performance progress. And acquiring second action information of the virtual object, and determining the second action information as a second performance progress. A first time period for the performance object to undergo a motion transition is determined based on the first motion information.
In the first time period, the gesture of the performance object is recognized according to the first action information to obtain first gesture information, and the gesture of the virtual object is recognized according to the second action information to obtain second gesture information. The gesture similarity is determined according to the first gesture information and the second gesture information. If the gesture similarity is lower than the preset similarity threshold, the performance object and the virtual object perform asynchronously, and the first time period is determined to be the target time period.
Finally, a first action sequence of the performance object in the target time period is determined according to the first action information, and a second action sequence of the virtual object is determined according to the second action information. Taking the first action sequence as the reference, the second action sequence is controlled to be aligned with the first action sequence, so as to control the performance progress synchronization of the virtual object and the performance object. In this way, when the virtual object and the performance object perform interactively, their performance progress is kept synchronized.
With continued reference to fig. 4, fig. 4 is a schematic view of a scenario of a performance synchronization method according to an embodiment of the present application.
Of the performance object and the virtual object in the initial video, the virtual object cannot see the performance object. The virtual object has no thinking ability and cannot control its performance progress according to the performance object. Therefore, in the initial video, the first performance progress of the performance object and the second performance progress of the virtual object may be out of sync. Specifically, as shown on the left of fig. 4 before adjustment, the hand motions of the performance object and the virtual object are different.
In the embodiment of the application, the electronic device acquires the initial video of the interactive performance between the performance object and the virtual object in the volume video; determines a first performance progress of the performance object and a second performance progress of the virtual object in the initial video; determines, according to the first performance progress and the second performance progress, a target time period during which the performance object and the virtual object perform asynchronously; and, in the target time period, adjusts the second performance progress according to the first performance progress so as to control the performance progress synchronization of the virtual object and the performance object. In this way, the performance progress of the interactive performance of the virtual object and the performance object is synchronized.
As shown in the right diagram of fig. 4, the second performance progress of the virtual object is adjusted so that it is synchronized with the first performance progress of the performance object. Finally, hand action synchronization of the performance object and the virtual object is achieved.
Referring to fig. 5, fig. 5 is a schematic structural diagram of a performance synchronization device according to an embodiment of the present application. The performance synchronizer 300 may include:
The obtaining module 310 is configured to obtain an initial video after the interactive performance is performed by the performance object and the virtual object in the volume video.
The obtaining module 310 is further configured to photograph the performance object in a preset photographing scene to obtain photographing information;
and generate an initial video according to the photographing information and the model information of the virtual object in the volume video.
A first determining module 320 for determining a first performance progress of the performance object and a second performance progress of the virtual object in the initial video.
The first determining module 320 is further configured to acquire first action information of the performance object and determine the first action information as the first performance progress;
and acquire second action information of the virtual object and determine the second action information as the second performance progress.
The first determining module 320 is further configured to determine, according to the first action information, a first time period during which the performance object undergoes an action transition, before the target time period during which the performance object and the virtual object perform asynchronously is determined according to the first performance progress and the second performance progress;
recognize, in the first time period, the gesture of the performance object according to the first action information to obtain first gesture information;
and recognize, in the first time period, the gesture of the virtual object according to the second action information to obtain second gesture information.
The first determining module 320 is further configured to determine, in the initial video, a first identification value of the action and/or voice of the performance object in each frame, and determine the first performance progress according to the first identification value;
and determine, in the initial video, a second identification value of the action and/or voice of the virtual object in each frame, and determine the second performance progress according to the second identification value.
The second determining module 330 is configured to determine, according to the first performance progress and the second performance progress, a target period of time during which the performance object and the virtual object perform out of synchronization.
The second determining module 330 is further configured to determine a gesture similarity according to the first gesture information and the second gesture information;
if the gesture similarity is lower than a preset similarity threshold, the performance object and the virtual object perform asynchronously, and the first time period is determined to be a target time period.
The second determining module 330 is further configured to pre-establish a mapping relationship between the identification value of the action and/or voice of the performance object and the identification value of the action and/or voice of the virtual object, before the target time period during which the performance object and the virtual object perform asynchronously is determined according to the first performance progress and the second performance progress.
The second determining module 330 is further configured to determine, according to the mapping relationship, a time period in which the first identification value and the second identification value do not match, and determine the time period in which the first identification value and the second identification value do not match as the target time period.
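As an illustrative sketch of this matching step (the identification values, the mapping, and the frame rate below are hypothetical, since the patent does not fix their encoding), consecutive frames whose first identification value does not map to the observed second identification value can be grouped into candidate target time periods:

```python
# Hypothetical mapping: identification value of the performance object's
# action/voice -> identification value the virtual object should exhibit.
id_mapping = {"bow": "bow", "wave_left": "wave_right", "sing_verse1": "dance_verse1"}

def find_unmatched_periods(first_ids, second_ids, fps=30.0):
    """Group consecutive mismatched frames into (start_sec, end_sec) target periods."""
    periods, start = [], None
    for frame, (fid, sid) in enumerate(zip(first_ids, second_ids)):
        mismatch = id_mapping.get(fid) != sid
        if mismatch and start is None:
            start = frame
        elif not mismatch and start is not None:
            periods.append((start / fps, frame / fps))
            start = None
    if start is not None:
        periods.append((start / fps, len(first_ids) / fps))
    return periods
```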
And the adjusting module 340 is configured to adjust the second performance progress according to the first performance progress during the target time period, so as to control the performance progress synchronization of the virtual object and the performance object.
The adjusting module 340 is further configured to determine, according to the first action information, a first action sequence of the performance object in the target time period;
determining a second action sequence of the virtual object according to the second action information;
and controlling the second action sequence to be aligned with the first action sequence by taking the first action sequence as a reference so as to control the performance progress synchronization of the virtual object and the performance object.
The adjusting module 340 is further configured to perform dynamic time warping processing on the second action sequence and the first action sequence, taking the first action sequence as the reference, so as to control performance progress synchronization of the virtual object and the performance object.
The adjusting module 340 is further configured to determine, in the target period, a second matched identification value in the mapping relationship according to the first identification value;
and setting, in the frame corresponding to the first identification value, the action and/or voice of the virtual object corresponding to the matched second identification value, so that it accompanies the action and/or voice of the performance object corresponding to the first identification value.
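Continuing the sketch above, the adjustment step could then look as follows; the frame objects and the `set_virtual_action` hook are hypothetical stand-ins for whatever rendering interface an implementation exposes:

```python
def adjust_virtual_object(frames, first_ids, start_frame, end_frame):
    """Within the target period, set the virtual object's action/voice in each
    frame to the one mapped from the performance object's identification value."""
    for index in range(start_frame, end_frame):
        matched_second_id = id_mapping.get(first_ids[index])
        if matched_second_id is not None:
            # Hypothetical setter: place the virtual object's matching action
            # and/or voice into the frame corresponding to the first id value.
            frames[index].set_virtual_action(matched_second_id)
```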
In the embodiment of the application, the performance synchronization device acquires an initial video of the interactive performance between the performance object and the virtual object in the volume video; determines a first performance progress of the performance object and a second performance progress of the virtual object in the initial video; determines, according to the first performance progress and the second performance progress, a target time period during which the performance object and the virtual object perform asynchronously; and, in the target time period, adjusts the second performance progress according to the first performance progress so as to control the performance progress synchronization of the virtual object and the performance object. In this way, the performance progress of the interactive performance of the virtual object and the performance object is synchronized.
Accordingly, the present embodiment also provides an electronic device. As shown in fig. 6, the electronic device 400 may include a memory 401 with one or more computer-readable storage media, an input unit 402, a display unit 403, a sensor 404, a processor 405 with one or more processing cores, and a power supply 406. Those skilled in the art will appreciate that the structure shown in fig. 6 does not limit the electronic device, which may include more or fewer components than shown, combine certain components, or arrange the components differently. Wherein:
The memory 401 may be used to store software programs and modules, and the processor 405 executes various functional applications and performs data processing by running the software programs and modules stored in the memory 401. The memory 401 may mainly include a program storage area and a data storage area: the program storage area may store an operating system, an application program required for at least one function (such as a sound playing function or an image playing function), and the like; the data storage area may store data created according to the use of the electronic device (such as audio data or phonebooks), and the like. In addition, the memory 401 may include high-speed random access memory and may also include non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device. Accordingly, the memory 401 may further include a memory controller to provide the processor 405 and the input unit 402 with access to the memory 401.
The input unit 402 may be used to receive input numeric or character information and to generate keyboard, mouse, joystick, optical, or trackball signal inputs related to user settings and function control. In particular, in one specific embodiment, the input unit 402 may include a touch-sensitive surface as well as other input devices. The touch-sensitive surface, also referred to as a touch display screen or a touch pad, may collect touch operations by a user on or near it (such as operations performed with a finger, a stylus, or any other suitable object or accessory) and drive the corresponding connection apparatus according to a preset program. Alternatively, the touch-sensitive surface may comprise two parts: a touch detection device and a touch controller. The touch detection device detects the position of the user's touch, detects the signal generated by the touch operation, and transmits the signal to the touch controller; the touch controller receives the touch information from the touch detection device, converts it into touch point coordinates, sends the coordinates to the processor 405, and can receive and execute commands from the processor 405. In addition, the touch-sensitive surface may be implemented in various types, such as resistive, capacitive, infrared, and surface acoustic wave. Besides the touch-sensitive surface, the input unit 402 may also include other input devices, which may include, but are not limited to, one or more of a physical keyboard, function keys (such as volume control keys or a switch key), a trackball, a mouse, a joystick, and the like.
The display unit 403 may be used to display information entered by or provided to the user, as well as the various graphical user interfaces of the electronic device, which may be composed of graphics, text, icons, video, and any combination thereof. The display unit 403 may include a display panel, which may optionally be configured in the form of a liquid crystal display (LCD, Liquid Crystal Display), an organic light-emitting diode (OLED, Organic Light-Emitting Diode), or the like. Further, the touch-sensitive surface may overlay the display panel; when a touch operation on or near the touch-sensitive surface is detected, the operation is passed to the processor 405 to determine the type of touch event, and the processor 405 then provides a corresponding visual output on the display panel based on that type. Although in fig. 6 the touch-sensitive surface and the display panel implement the input and output functions as two separate components, in some embodiments the touch-sensitive surface may be integrated with the display panel to implement both.
The electronic device may also include at least one sensor 404, such as a light sensor, a motion sensor, or other sensors. Specifically, the light sensor may include an ambient light sensor, which can adjust the brightness of the display panel according to the ambient light, and a proximity sensor, which can turn off the display panel and/or the backlight when the electronic device is moved to the ear. As one kind of motion sensor, a gravity acceleration sensor can detect the magnitude of acceleration in all directions (generally three axes), can detect the magnitude and direction of gravity when the device is stationary, and can be used for applications that recognize the attitude of the electronic device (such as landscape/portrait switching, related games, and magnetometer attitude calibration), vibration-recognition functions (such as a pedometer or tap detection), and the like. Other sensors that may also be configured in the electronic device, such as a gyroscope, a barometer, a hygrometer, a thermometer, and an infrared sensor, are not described in detail herein.
The processor 405 is a control center of the electronic device, connects various parts of the entire electronic device using various interfaces and lines, and performs various functions of the electronic device and processes data by running or executing software programs and/or modules stored in the memory 401, and calling data stored in the memory 401, thereby performing overall monitoring of the electronic device. Optionally, the processor 405 may include one or more processing cores; preferably, the processor 405 may integrate an application processor and a modem processor, wherein the application processor primarily handles operating systems, user interfaces, applications, etc., and the modem processor primarily handles wireless communications. It will be appreciated that the modem processor described above may not be integrated into the processor 405.
The electronic device also includes a power supply 406 (e.g., a battery) for powering the various components, which may be logically connected to the processor 405 via a power management system so as to perform functions such as managing charge, discharge, and power consumption via the power management system. The power supply 406 may also include one or more of any components, such as a direct current or alternating current power supply, a recharging system, a power failure detection circuit, a power converter or inverter, a power status indicator, and the like.
Although not shown, the electronic device may further include a camera, a Bluetooth module, and the like, which are not described herein. In particular, in this embodiment, the processor 405 in the electronic device loads the computer program stored in the memory 401 and, by running it, implements the various functions of the performance synchronization method:
acquiring an initial video after interactive performance of a performance object and a virtual object in a volume video;
determining a first performance progress of the performance object and a second performance progress of the virtual object in the initial video;
determining, according to the first performance progress and the second performance progress, a target time period during which the performance object and the virtual object perform asynchronously;
and in the target time period, adjusting the second performance progress according to the first performance progress so as to control the performance progress synchronization of the virtual object and the performance object.
Those of ordinary skill in the art will appreciate that all or a portion of the steps of the various methods of the above embodiments may be performed by instructions, or by instructions controlling associated hardware, which may be stored in a computer-readable storage medium and loaded and executed by a processor.
To this end, embodiments of the present application provide a computer readable storage medium having stored therein a plurality of instructions capable of being loaded by a processor to perform steps in any of the performance synchronization methods provided by embodiments of the present application. For example, the instructions may perform the steps of:
acquiring an initial video after interactive performance of a performance object and a virtual object in a volume video;
determining a first performance progress of the performance object and a second performance progress of the virtual object in the initial video;
determining, according to the first performance progress and the second performance progress, a target time period during which the performance object and the virtual object perform asynchronously;
and in the target time period, adjusting the second performance progress according to the first performance progress so as to control the performance progress synchronization of the virtual object and the performance object.
The specific implementation of each operation above may be referred to the previous embodiments, and will not be described herein.
Wherein the storage medium may include: read-only memory (ROM, Read Only Memory), random access memory (RAM, Random Access Memory), a magnetic disk or an optical disc, and the like.
Because the instructions stored in the storage medium may perform the steps in any performance synchronization method provided in the embodiments of the present application, they can achieve the beneficial effects of any performance synchronization method provided in the embodiments of the present application; see the previous embodiments for details, which are not repeated herein.
The foregoing describes in detail the performance synchronization method, apparatus, electronic device, and storage medium provided in the embodiments of the present application. Specific examples are used herein to illustrate the principles and implementations of the present application, and the above description of the embodiments is only intended to help understand the method and its core idea. Meanwhile, those skilled in the art may make changes to the specific embodiments and the application scope in light of the ideas of the present application. In summary, the content of this description should not be construed as limiting the present application.

Claims (13)

1. A performance synchronization method, comprising:
acquiring an initial video after interactive performance of a performance object and a virtual object in a volume video;
determining a first performance progress of the performance object and a second performance progress of the virtual object in the initial video;
determining a target time period for which the performance object and the virtual object perform asynchronously according to the first performance progress and the second performance progress;
and in the target time period, adjusting the second performance progress according to the first performance progress so as to control the performance progress synchronization of the virtual object and the performance object.
2. A performance synchronization method according to claim 1, wherein the acquiring the initial video after the interactive performance of the performance object and the virtual object in the volume video comprises:
shooting the performing object in a preset shooting scene to obtain shooting information;
and generating the initial video according to the shooting information and the model information of the virtual object in the volume video.
3. The performance synchronization method of claim 1, wherein the determining a first performance progress of the performance object and a second performance progress of the virtual object in the initial video comprises:
acquiring first action information of the performance object, and determining the first action information as the first performance progress;
and acquiring second action information of the virtual object, and determining the second action information as the second performance progress.
4. A performance synchronization method according to claim 3, wherein prior to the determining a target period of time for which the performance object and the virtual object perform out of synchronization according to the first performance progress and the second performance progress, the method further comprises:
determining a first time period for the action transition of the performance object according to the first action information;
in the first time period, recognizing the gesture of the performance object according to the first action information to obtain first gesture information;
and in the first time period, recognizing the gesture of the virtual object according to the second action information to obtain second gesture information.
5. The performance synchronization method according to claim 4, wherein the determining, according to the first performance progress and the second performance progress, a target period of time for which the performance object and the virtual object perform asynchronously comprises:
determining the gesture similarity according to the first gesture information and the second gesture information;
if the gesture similarity is lower than a preset similarity threshold, the performance object and the virtual object perform asynchronously, and the first time period is determined to be the target time period.
6. A performance synchronization method according to claim 3, wherein said adjusting the second performance progress in accordance with the first performance progress during the target time period to control performance progress synchronization of the virtual object and the performance object comprises:
determining a first action sequence of the performance object within the target time period according to the first action information;
determining a second action sequence of the virtual object according to the second action information;
and controlling the second action sequence to be aligned with the first action sequence by taking the first action sequence as a reference so as to control the performance progress synchronization of the virtual object and the performance object.
7. The performance synchronization method according to claim 6, wherein the controlling the second motion sequence to align with the first motion sequence based on the first motion sequence to control performance progress synchronization of the virtual object and the performance object comprises:
and performing dynamic time warping processing on the second action sequence and the first action sequence by taking the first action sequence as a reference, so as to control the performance progress synchronization of the virtual object and the performance object.
8. The performance synchronization method of claim 1, wherein the determining a first performance progress of the performance object and a second performance progress of the virtual object in the initial video comprises:
determining a first identification value of actions and/or voices of the performance object in each frame in the initial video, and determining the first performance progress according to the first identification value;
and in the initial video, determining a second identification value of the action and/or the voice of the virtual object of each frame, and determining the second performance progress according to the second identification value.
9. The performance synchronization method of claim 8, wherein prior to the determining, based on the first performance progress and the second performance progress, a target period of time for which the performance object and the virtual object perform asynchronously, the method further comprises:
pre-establishing a mapping relation between the identification value of the action and/or voice of the performance object and the identification value of the action and/or voice of the virtual object;
The determining, according to the first performance progress and the second performance progress, a target time period for which the performance object and the virtual object perform asynchronously includes:
and according to the mapping relation, determining a time period when the first identification value and the second identification value are not matched, and determining the time period when the first identification value and the second identification value are not matched as the target time period.
10. The performance synchronization method according to claim 9, wherein the adjusting the second performance progress according to the first performance progress during the target period to control performance progress synchronization of the virtual object and the performance object comprises:
in the target time period, determining a matched second identification value in the mapping relation according to the first identification value;
and setting the actions and/or voices of the virtual object corresponding to the matched second identification value and the actions and/or voices of the performance object corresponding to the first identification value in the frame corresponding to the first identification value.
11. A performance synchronization device, comprising:
the acquisition module is used for acquiring an initial video after interactive performance of the performance object and the virtual object in the volume video;
A first determining module for determining a first performance progress of the performance object and a second performance progress of the virtual object in the initial video;
a second determining module, configured to determine a target time period for which the performance object and the virtual object perform asynchronously according to the first performance progress and the second performance progress;
and the adjusting module is used for adjusting the second performance progress according to the first performance progress in the target time period so as to control the performance progress synchronization of the virtual object and the performance object.
12. An electronic device, comprising:
a memory storing executable program code, a processor coupled to the memory;
the processor invokes the executable program code stored in the memory to perform the steps in the performance synchronization method of any one of claims 1 to 10.
13. A storage medium storing a plurality of instructions adapted to be loaded by a processor to perform the steps in the performance synchronization method of any one of claims 1 to 10.
CN202211611383.7A 2022-12-14 2022-12-14 Performance synchronization method and device, electronic equipment and storage medium Pending CN116248939A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211611383.7A CN116248939A (en) 2022-12-14 2022-12-14 Performance synchronization method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN116248939A (en) 2023-06-09

Family

ID=86633913

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211611383.7A Pending CN116248939A (en) 2022-12-14 2022-12-14 Performance synchronization method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN116248939A (en)

Similar Documents

Publication Publication Date Title
JP7457082B2 (en) Reactive video generation method and generation program
KR20210123399A (en) Animated image driving method based on artificial intelligence, and related devices
US20120194645A1 (en) Living room movie creation
CN104049749A (en) Method and apparatus to generate haptic feedback from video content analysis
CN113426117B (en) Shooting parameter acquisition method and device for virtual camera, electronic equipment and storage medium
CN108200334A (en) Image capturing method, device, storage medium and electronic equipment
CN113709543A (en) Video processing method and device based on virtual reality, electronic equipment and medium
US20200230488A1 (en) Pre-Visualization Device
CN115442658B (en) Live broadcast method, live broadcast device, storage medium, electronic equipment and product
CN115578494B (en) Method, device and equipment for generating intermediate frame and storage medium
CN116129526A (en) Method and device for controlling photographing, electronic equipment and storage medium
CN116109974A (en) Volumetric video display method and related equipment
CN115442519B (en) Video processing method, apparatus and computer readable storage medium
CN116248939A (en) Performance synchronization method and device, electronic equipment and storage medium
CN116095353A (en) Live broadcast method and device based on volume video, electronic equipment and storage medium
CN115294213A (en) Calibration tower, camera calibration method and device, electronic equipment and storage medium
CN115442520A (en) Image shooting method, image processing method and shooting system
CN115802021A (en) Volume video generation method and device, electronic equipment and storage medium
CN116156141A (en) Volume video playing method and device, electronic equipment and storage medium
US20230154126A1 (en) Creating a virtual object response to a user input
CN116170652A (en) Method and device for processing volume video, computer equipment and storage medium
US20240020901A1 (en) Method and application for animating computer generated images
US20240048780A1 (en) Live broadcast method, device, storage medium, electronic equipment and product
CN115442634A (en) Image compression method, device, storage medium, electronic equipment and product
CN116188636A (en) Video processing method, apparatus, device, storage medium, and program product

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination