CN116129526A - Method and device for controlling photographing, electronic equipment and storage medium - Google Patents

Method and device for controlling photographing, electronic equipment and storage medium

Info

Publication number
CN116129526A
CN116129526A (application CN202310080244.4A)
Authority
CN
China
Prior art keywords
gesture
virtual object
shooting
control
shooting picture
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310080244.4A
Other languages
Chinese (zh)
Inventor
邵志兢
张煜
孙伟
吕云
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhuhai Prometheus Vision Technology Co ltd
Original Assignee
Zhuhai Prometheus Vision Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhuhai Prometheus Vision Technology Co ltd filed Critical Zhuhai Prometheus Vision Technology Co ltd
Priority to CN202310080244.4A
Publication of CN116129526A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20 Movements or behaviour, e.g. gesture recognition
    • G06V40/28 Recognition of hand or arm movements, e.g. recognition of deaf sign language
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/017 Gesture based interaction, e.g. based on a set of recognized hand gestures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00 Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/70 Determining position or orientation of objects or cameras
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N5/00 Details of television systems
    • H04N5/222 Studio circuitry; Studio devices; Studio equipment
    • H04N5/262 Studio circuits, e.g. for mixing, switching-over, change of character of image, other special effects; Cameras specially adapted for the electronic generation of special effects
    • H04N5/265 Mixing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10016 Video; Image sequence
    • G06T2207/10021 Stereoscopic video; Stereoscopic image sequence

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Human Computer Interaction (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Graphics (AREA)
  • Geometry (AREA)
  • Software Systems (AREA)
  • Signal Processing (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The application provides a shooting control method and apparatus, an electronic device and a computer-readable storage medium. The shooting control method comprises the following steps: presenting a shooting picture containing a user character; performing gesture recognition on the user character in the shooting picture to obtain the current gesture of the user character; if the current gesture matches a preset position control gesture, acquiring the pointing position point of the current gesture; and placing a virtual object in the shooting picture according to the pointing position point, so as to control the relative position of the virtual object and the user character in the shooting picture. With the method and the apparatus, the user character can control and know the position of the virtual object, so that the user character can make actions and expressions that coordinate more naturally with the virtual object when co-shooting with it, which reduces the abruptness of the co-shot picture and makes the co-shooting effect more natural.

Description

Method and device for controlling photographing, electronic equipment and storage medium
Technical Field
The application relates to the technical field of shooting processing, and in particular to a shooting control method and apparatus, an electronic device and a computer-readable storage medium.
Background
With the development of image capturing technology, users can capture various videos or images with electronic devices, and their shooting requirements are becoming more and more diverse; for example, users wish to co-shoot with virtual objects (such as objects in a volumetric video).
However, the inventors of the embodiments of the present application found the following problem in actual research and development: during co-shooting, the user character does not know the relative position of the virtual object and the user character in the picture, so the actions, expressions and the like of the user character and the virtual object are uncoordinated, and the resulting co-shot picture looks abrupt.
Disclosure of Invention
The application provides a shooting control method and apparatus, an electronic device and a computer-readable storage medium, which enable the user character to control and know the position of the virtual object, so that the user character can make actions and expressions that coordinate more naturally with the virtual object when co-shooting with it, reducing the abruptness of the co-shot picture and making the co-shooting effect more natural.
In a first aspect, the present application provides a shooting control method, the method including:
presenting a shooting picture containing a user character;
performing gesture recognition on the user character in the shooting picture to obtain the current gesture of the user character;
if the current gesture matches a preset position control gesture, acquiring the pointing position point of the current gesture;
and placing a virtual object in the shooting picture according to the pointing position point, so as to control the relative position of the virtual object and the user character in the shooting picture.
In a second aspect, the present application provides a shooting control apparatus, the apparatus comprising:
a display unit for presenting a shooting picture containing a user character;
a recognition unit for performing gesture recognition on the user character in the shooting picture to obtain the current gesture of the user character;
an acquisition unit for acquiring the pointing position point of the current gesture if the current gesture matches a preset position control gesture;
and a control unit for placing a virtual object in the shooting picture according to the pointing position point, so as to control the relative position of the virtual object and the user character in the shooting picture.
In some embodiments, the acquiring unit is specifically configured to:
if the current gesture matches a preset position control gesture, acquiring a ray formed, in the three-dimensional space where the shooting picture is located, by the pointing direction of the finger corresponding to the current gesture;
and acquiring an intersection point between the ray and the supporting surface of the virtual object to serve as a pointing position point of the current gesture.
In some embodiments, before acquiring the intersection point between the ray and the supporting surface of the virtual object as the pointing position point of the current gesture, the acquisition unit is further configured to:
adding the standing surface of the user role in the shooting picture to an alternative plane set of the shooting picture;
adding a plane with an included angle smaller than a preset included angle threshold value between the shooting picture and the standing surface into the alternative plane set;
from each plane of the candidate plane set, a plane which has an intersection point with the ray and is closest to the starting point of the ray is acquired as a supporting surface of the virtual object.
In some embodiments, the acquiring unit is specifically configured to:
detecting a gesture type of a current gesture of the user character;
and if the gesture type is a position control gesture and the current gesture is matched with a preset position control gesture, acquiring a pointing position point of the current gesture.
In some embodiments, the control unit is specifically configured to:
if the gesture type is an orientation control gesture, acquiring the associated orientation of the current gesture;
and controlling the relative orientation of the virtual object and the user character in the shooting picture according to the associated orientation.
In some embodiments, the control unit is specifically configured to:
in response to a change in the orientation of the user character, the orientation of the virtual object is updated such that the relative orientation of the user character and the virtual object remains the associated orientation.
In some embodiments, the control unit is specifically configured to:
if the gesture type is a distance control gesture, acquiring the associated distance of the current gesture;
and controlling the relative distance between the virtual object and the user character in the shooting picture according to the associated distance.
In some embodiments, the control unit is specifically configured to:
and in response to a touch operation on a shooting control, shooting the shooting picture to obtain a target co-shot video of the virtual object and the user character.
In some embodiments, the control unit is specifically configured to:
shooting the shooting picture in response to a touch operation on a shooting control, to obtain a preliminary co-shot video of the virtual object and the user character;
performing control-gesture recognition on the video frames in the preliminary co-shot video to obtain target video frames containing control gestures;
and filtering the target video frames out of the preliminary co-shot video to obtain the target co-shot video.
In some embodiments, the control unit is specifically configured to:
when the current gesture is matched with a preset control gesture, detecting whether a camera of the shooting picture is in a shooting state or not;
if the camera is in a shooting state, switching the camera from the shooting state to a pause state;
and once the virtual object in the shooting picture has been placed at the pointing position point, switching the camera from the pause state back to the shooting state.
In some embodiments, the virtual object is a three-dimensional model in a volumetric video, and the control unit is specifically configured to:
and placing a three-dimensional model in the volume video in the shooting picture according to the pointing position point.
In a third aspect, the present application further provides an electronic device, which includes a processor and a memory, the memory storing a computer program, wherein the processor executes any of the shooting control methods provided herein when calling the computer program in the memory.
In a fourth aspect, the present application further provides a computer-readable storage medium having a computer program stored thereon, the computer program being loaded by a processor to perform any of the shooting control methods provided herein.
In the present application, gesture recognition is performed on the user character in the shooting picture to obtain the current gesture of the user character; if the current gesture matches the preset position control gesture, the pointing position point of the current gesture is acquired; and the virtual object is placed in the shooting picture according to the pointing position point. The user character can thus control the position of the virtual object through gestures and has a general grasp of where the virtual object is placed, and can therefore make actions and expressions that coordinate more naturally with the virtual object when co-shooting with it, which reduces the abruptness of the co-shot picture and makes the co-shooting effect more natural.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the description of the embodiments will be briefly introduced below, it being obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a schematic view of a scene of a shooting control system provided in an embodiment of the present application;
fig. 2 is a schematic flow chart of a shooting control method provided in an embodiment of the present application;
fig. 3 is a schematic diagram illustrating a comparison between a shot image and a shot scene provided in an embodiment of the present application;
fig. 4 is a schematic view of a scene of a shooting picture provided in an embodiment of the present application;
fig. 5 is another scene schematic diagram of a shooting picture provided in the embodiment of the present application;
fig. 6 is another scene schematic diagram of a shooting screen provided in an embodiment of the present application;
fig. 7 is a schematic structural diagram of an embodiment of a photographing control apparatus provided in the embodiment of the present application;
fig. 8 is a schematic structural diagram of an embodiment of an electronic device provided in an embodiment of the present application.
Detailed Description
The following description of the embodiments of the present application will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are only some, but not all, of the embodiments of the present application. All other embodiments, which can be made by those skilled in the art based on the embodiments herein without making any inventive effort, are intended to be within the scope of the present application.
In the description of the embodiments of the present application, it should be understood that the terms "first," "second," and the like are used for descriptive purposes only and are not to be construed as indicating or implying a relative importance or an implicit indication of the number of features indicated. Thus, a feature defining "a first" or "a second" may explicitly or implicitly include one or more of the described features. In the description of the embodiments of the present application, the meaning of "a plurality" is two or more, unless explicitly defined otherwise.
The following description is presented to enable any person skilled in the art to make and use the application. In the following description, details are set forth for purposes of explanation. It will be apparent to one of ordinary skill in the art that the present application may be practiced without these specific details. In other instances, well-known processes have not been described in detail in order to avoid unnecessarily obscuring descriptions of the embodiments of the present application. Thus, the present application is not intended to be limited to the embodiments shown, but is to be accorded the widest scope consistent with the principles and features disclosed in the embodiments of the present application.
Volumetric video (also known as spatial video, volumetric three-dimensional video, or 6-degree-of-freedom video, etc.) is a technique that generates a sequence of three-dimensional models by capturing information (e.g., depth information, color information, etc.) in three-dimensional space. Compared with traditional video, volumetric video adds the concept of space to video and uses three-dimensional models to better restore the real three-dimensional world, instead of simulating the sense of space of the real three-dimensional world with two-dimensional planar video plus camera moves. Because a volumetric video is a sequence of three-dimensional models, a user can adjust to any viewing angle to watch it according to their own preference, so compared with two-dimensional planar video it offers a higher degree of restoration and immersion.
Alternatively, in the present application, the three-dimensional model used to construct the volumetric video may be reconstructed as follows:
firstly, color images and depth images of the shooting object from different viewing angles, together with the camera parameters corresponding to the color images, are acquired; then a neural network model that implicitly expresses the three-dimensional model of the shooting object is trained according to the acquired color images, the corresponding depth images and the camera parameters, and iso-surface extraction is performed based on the trained neural network model to realize three-dimensional reconstruction of the shooting object and obtain its three-dimensional model.
It should be noted that the embodiments of the present application do not particularly limit which network architecture the neural network model adopts; it may be selected by those skilled in the art according to actual needs. For example, a multilayer perceptron (Multilayer Perceptron, MLP) without a normalization layer may be selected as the base model for model training.
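By way of illustration only, the following is a minimal sketch of such a base model, assuming a PyTorch implementation; the class name, layer widths and depth are assumptions of this example rather than details of the application, the only constraints taken from the text being that no normalization layer is used and that the model maps a sampling point's coordinates to an SDF value and an RGB color value.

```python
# A minimal sketch (not the application's reference code) of an MLP base model that
# maps a 3D sample point to a predicted SDF value and an RGB colour, with no
# normalization layers. Layer widths and depth are illustrative assumptions.
import torch
import torch.nn as nn

class ImplicitSurfaceMLP(nn.Module):
    def __init__(self, in_dim: int = 3, hidden_dim: int = 256, num_layers: int = 8):
        super().__init__()
        layers = []
        dim = in_dim
        for _ in range(num_layers):
            layers += [nn.Linear(dim, hidden_dim), nn.ReLU(inplace=True)]
            dim = hidden_dim
        self.backbone = nn.Sequential(*layers)
        self.sdf_head = nn.Linear(hidden_dim, 1)    # signed distance to the surface
        self.rgb_head = nn.Linear(hidden_dim, 3)    # colour of the sample point

    def forward(self, xyz: torch.Tensor):
        feat = self.backbone(xyz)                   # (N, hidden_dim)
        sdf = self.sdf_head(feat)                   # (N, 1)
        rgb = torch.sigmoid(self.rgb_head(feat))    # (N, 3), values in [0, 1]
        return sdf, rgb
```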
The three-dimensional model reconstruction method provided in the present application will be described in detail below.
Firstly, a plurality of color cameras and depth cameras can be synchronously adopted to shoot a target object (the target object is a shooting object) which needs to be subjected to three-dimensional reconstruction at multiple visual angles, so as to obtain color images and corresponding depth images of the target object at multiple different visual angles, namely, at the same shooting moment (the difference value of actual shooting moments is smaller than or equal to a time threshold, namely, the shooting moments are considered to be the same), the color cameras at all visual angles shoot to obtain color images of the target object at the corresponding visual angles, and correspondingly, the depth cameras at all visual angles shoot to obtain depth images of the target object at the corresponding visual angles. The target object may be any object, including but not limited to living objects such as a person, an animal, and a plant, or inanimate objects such as a machine, furniture, and a doll.
Therefore, the color images of the target object at different visual angles are provided with the corresponding depth images, namely, when shooting, the color cameras and the depth cameras can adopt the configuration of a camera set, and the color cameras at the same visual angle are matched with the depth cameras to synchronously shoot the same target object. For example, a studio may be built, in which a central area is a photographing area, around which a plurality of sets of color cameras and depth cameras are paired at a certain angle interval in a horizontal direction and a vertical direction. When the target object is in the shooting area surrounded by the color cameras and the depth cameras, the color images and the corresponding depth images of the target object at different visual angles can be obtained through shooting by the color cameras and the depth cameras.
In addition, camera parameters of the color camera corresponding to each color image are further acquired. The camera parameters include internal parameters and external parameters of the color camera, which can be determined through calibration, wherein the internal parameters of the color camera are parameters related to the characteristics of the color camera, including but not limited to data such as focal length and pixels of the color camera, and the external parameters of the color camera are parameters of the color camera in a world coordinate system, including but not limited to data such as position (coordinates) of the color camera and rotation direction of the camera.
As described above, after obtaining the color images of the target object at different viewing angles and the corresponding depth images thereof at the same shooting time, the three-dimensional reconstruction of the target object can be performed according to the color images and the corresponding depth images thereof. Different from the mode of converting depth information into point cloud to perform three-dimensional reconstruction in the related technology, the method and the device train a neural network model to achieve implicit expression of the three-dimensional model of the target object, so that three-dimensional reconstruction of the target object is achieved based on the neural network model.
Optionally, the application selects a multi-layer perceptron (Multilayer Perceptron, MLP) that does not include a normalization layer as the base model, and trains as follows:
converting pixel points in each color image into rays based on corresponding camera parameters;
sampling a plurality of sampling points on the rays, and determining first coordinate information of each sampling point and an SDF value of each sampling point from a pixel point;
inputting the first coordinate information of the sampling points into a basic model to obtain a predicted SDF value and a predicted RGB color value of each sampling point output by the basic model;
based on a first difference between the predicted SDF value and the SDF value and a second difference between the predicted RGB color value and the RGB color value of the pixel point, adjusting parameters of the basic model until a preset stop condition is met;
and taking the basic model that meets the preset stopping condition as the neural network model implicitly expressing the three-dimensional model of the target object.
Firstly, a pixel point in a color image is converted into a ray based on the camera parameters corresponding to the color image, where the ray may be a ray that passes through the pixel point and is perpendicular to the color image plane. Then, a plurality of sampling points are sampled on the ray; the sampling may be performed in two steps: some sampling points are first sampled uniformly, and then further sampling points are taken at key positions based on the depth value of the pixel point, so as to ensure that as many sampling points as possible are sampled near the model surface. Next, the first coordinate information of each sampling point in the world coordinate system and the signed distance field (Signed Distance Field, SDF) value of each sampling point are calculated according to the camera parameters and the depth value of the pixel point, where the SDF value may be the difference between the depth value of the pixel point and the distance from the sampling point to the imaging plane of the camera; the difference is signed: when it is positive the sampling point is outside the three-dimensional model, when it is negative the sampling point is inside the three-dimensional model, and when it is zero the sampling point is on the surface of the three-dimensional model. Then, after the sampling is completed and the SDF value corresponding to each sampling point has been calculated, the first coordinate information of the sampling points in the world coordinate system is input into the basic model (the basic model is configured to map the input coordinate information to an SDF value and an RGB color value and output them); the SDF value output by the basic model is recorded as the predicted SDF value, and the RGB color value output by the basic model is recorded as the predicted RGB color value. Finally, the parameters of the basic model are adjusted based on a first difference between the predicted SDF value and the SDF value corresponding to the sampling point and a second difference between the predicted RGB color value and the RGB color value of the pixel point corresponding to the sampling point.
In addition, for other pixel points in the color image, sampling is performed in the above manner, and then coordinate information of the sampling point in the world coordinate system is input to the basic model to obtain a corresponding predicted SDF value and a predicted RGB color value, which are used for adjusting parameters of the basic model until a preset stopping condition is met, for example, the preset stopping condition may be configured to reach a preset number of iterations of the basic model, or the preset stopping condition may be configured to converge the basic model. When the iteration of the basic model meets the preset stopping condition, the neural network model which can accurately and implicitly express the three-dimensional model of the shooting object is obtained. Finally, an isosurface extraction algorithm can be adopted to extract the three-dimensional model surface of the neural network model, so that a three-dimensional model of the shooting object is obtained.
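As a hedged illustration of the parameter adjustment described above, the following sketch shows one optimisation step on a batch of sampling points, assuming the ImplicitSurfaceMLP sketched earlier, PyTorch, and an L1 penalty on the first (SDF) and second (RGB) differences; the loss form and weights are assumptions of this example, not a prescription of the application.

```python
# A condensed sketch of one training step: penalise the difference between predicted
# and ground-truth SDF values and between predicted and pixel RGB colours.
import torch

def training_step(model, optimizer, sample_xyz, gt_sdf, gt_rgb,
                  sdf_weight=1.0, rgb_weight=1.0):
    """sample_xyz: (N, 3) world coordinates of sampled points on camera rays.
    gt_sdf: (N, 1) SDF values derived from the depth image.
    gt_rgb: (N, 3) colours of the pixels the rays pass through."""
    pred_sdf, pred_rgb = model(sample_xyz)
    loss = sdf_weight * torch.nn.functional.l1_loss(pred_sdf, gt_sdf) \
         + rgb_weight * torch.nn.functional.l1_loss(pred_rgb, gt_rgb)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```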
Optionally, in some embodiments, determining an imaging plane of the color image based on camera parameters; and determining that the rays passing through the pixel points in the color image and perpendicular to the imaging surface are rays corresponding to the pixel points.
The coordinate information of the color image in the world coordinate system, namely the imaging surface, can be determined according to the camera parameters of the color camera corresponding to the color image. Then, it can be determined that the ray passing through the pixel point in the color image and perpendicular to the imaging plane is the ray corresponding to the pixel point.
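A possible formulation of this ray construction is sketched below (my own notation, not code from the application): the pixel is lifted onto the imaging plane using the intrinsic matrix K and the camera-to-world rotation R and position t, and the ray direction is taken as the imaging-plane normal, i.e. the camera's viewing axis.

```python
# A sketch of building, for one pixel, the ray that passes through that pixel and is
# perpendicular to the imaging plane.
import numpy as np

def pixel_ray(u, v, K, R, t):
    """K: 3x3 intrinsics; R: 3x3 camera-to-world rotation; t: camera position (world)."""
    # Point on the imaging plane at unit depth, in camera coordinates.
    p_cam = np.linalg.inv(K) @ np.array([u, v, 1.0])
    origin = R @ p_cam + t                      # pixel position in world coordinates
    direction = R @ np.array([0.0, 0.0, 1.0])   # imaging-plane normal = viewing axis
    return origin, direction / np.linalg.norm(direction)
```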
Optionally, in some embodiments, determining second coordinate information and rotation angle of the color camera in the world coordinate system according to the camera parameters; and determining an imaging surface of the color image according to the second coordinate information and the rotation angle.
Optionally, in some embodiments, the first number of first sampling points are equally spaced on the ray; determining a plurality of key sampling points according to the depth values of the pixel points, and sampling a second number of second sampling points according to the key sampling points; the first number of first sampling points and the second number of second sampling points are determined as a plurality of sampling points obtained by sampling on the rays.
Firstly uniformly sampling n (i.e. a first number) first sampling points on rays, wherein n is a positive integer greater than 2; then, according to the depth value of the pixel point, determining a preset number of key sampling points closest to the pixel point from n first sampling points, or determining key sampling points smaller than a distance threshold from the pixel point from n first sampling points; then, resampling m second sampling points according to the determined key sampling points, wherein m is a positive integer greater than 1; and finally, determining the n+m sampling points obtained by sampling as a plurality of sampling points obtained by sampling on the rays. The m sampling points are sampled again at the key sampling points, so that the training effect of the model is more accurate at the surface of the three-dimensional model, and the reconstruction accuracy of the three-dimensional model is improved.
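The two-stage sampling can be sketched as follows, assuming the ray is parameterised by depth along its direction; the counts n and m, the number of key samples and the resampling radius are illustrative assumptions of this example.

```python
# A sketch of the two-stage sampling: n uniformly spaced points, then m extra points
# drawn around the key samples closest to the pixel's depth (i.e. near the surface).
import numpy as np

def sample_on_ray(pixel_depth, near, far, n=64, m=32, key_radius=0.05, num_key=4):
    # Stage 1: n uniformly spaced first sampling points along the ray.
    coarse = np.linspace(near, far, n)
    # Key sampling points: the first samples closest to the pixel's depth value.
    key = coarse[np.argsort(np.abs(coarse - pixel_depth))[:num_key]]
    # Stage 2: m second sampling points resampled around the key samples.
    fine = np.random.uniform(key.min() - key_radius, key.max() + key_radius, size=m)
    return np.sort(np.concatenate([coarse, fine]))  # n + m sample depths
```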
Optionally, in some embodiments, determining a depth value corresponding to the pixel point according to a depth image corresponding to the color image; calculating an SDF value of each sampling point from the pixel point based on the depth value; and calculating coordinate information of each sampling point according to the camera parameters and the depth values.
After a plurality of sampling points are sampled on the rays corresponding to each pixel point, for each sampling point, determining the distance between the shooting position of the color camera and the corresponding point on the target object according to the camera parameters and the depth value of the pixel point, and then calculating the SDF value of each sampling point one by one and the coordinate information of each sampling point based on the distance.
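A sketch of this computation is given below, reusing the ray of the pixel (see pixel_ray above); it assumes the sample's distance from the imaging plane equals its sampled depth along the ray, which holds for the perpendicular rays described earlier.

```python
# A sketch of the SDF computation: the signed distance of a sample is the pixel's
# depth minus the sample's distance along the ray, so it is positive outside the
# surface, negative inside, and zero on the surface.
import numpy as np

def sample_coords_and_sdf(origin, direction, sample_depths, pixel_depth):
    """origin, direction: the pixel's ray (see pixel_ray above).
    sample_depths: distances of the samples along the ray.
    pixel_depth: depth value of the pixel from the depth image."""
    coords = origin[None, :] + sample_depths[:, None] * direction[None, :]  # (N, 3)
    sdf = pixel_depth - sample_depths                                       # (N,)
    return coords, sdf
```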
After the training of the basic model is completed, for the coordinate information of any given point, the corresponding SDF value of the basic model after the training is completed can be predicted by the basic model after the training is completed, and the predicted SDF value indicates the position relationship (internal, external or surface) between the point and the three-dimensional model of the target object, so as to realize the implicit expression of the three-dimensional model of the target object and obtain the neural network model for implicitly expressing the three-dimensional model of the target object.
Finally, iso-surface extraction is performed on the neural network model, for example by drawing the surface of the three-dimensional model with an iso-surface extraction algorithm such as marching cubes (MC), so as to obtain the surface of the three-dimensional model, and the three-dimensional model of the target object is then obtained from that surface.
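For illustration, the iso-surface extraction step might look like the following sketch, which evaluates the trained model's SDF on a dense grid and runs marching cubes via scikit-image; the use of scikit-image, the grid resolution and the bounding box are assumptions of this example.

```python
# A sketch of extracting the zero iso-surface of the trained implicit model with
# marching cubes: evaluate the SDF on a dense grid and triangulate the level-0 surface.
import numpy as np
import torch
from skimage.measure import marching_cubes

def extract_mesh(model, bbox_min, bbox_max, resolution=256):
    xs = [np.linspace(bbox_min[i], bbox_max[i], resolution) for i in range(3)]
    grid = np.stack(np.meshgrid(*xs, indexing="ij"), axis=-1).reshape(-1, 3)
    with torch.no_grad():
        sdf, _ = model(torch.from_numpy(grid).float())
    volume = sdf.numpy().reshape(resolution, resolution, resolution)
    spacing = [(bbox_max[i] - bbox_min[i]) / (resolution - 1) for i in range(3)]
    verts, faces, normals, _ = marching_cubes(volume, level=0.0, spacing=spacing)
    return verts + np.asarray(bbox_min), faces, normals
```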
According to the above three-dimensional reconstruction scheme, the three-dimensional model of the target object is implicitly modelled by a neural network, and depth information is added to improve the training speed and accuracy of the model. By continuously performing three-dimensional reconstruction of the shooting object in time sequence with this scheme, three-dimensional models of the shooting object at different moments are obtained, and the sequence of these three-dimensional models arranged in time order is the volumetric video of the shooting object. In this way, volumetric video can be shot of any shooting object to obtain a volumetric video with specific content; for example, a dancing shooting object can be captured to obtain a volumetric video of the dance viewable from any angle, a teaching shooting object can be captured to obtain a teaching volumetric video viewable from any angle, and so on.
It should be noted that, the volume video according to the following embodiments of the present application may be obtained by shooting using the above volume video shooting method.
The embodiments of the present application provide a shooting control method and apparatus, an electronic device and a computer-readable storage medium. The shooting control apparatus may be integrated in an electronic device, and the electronic device may be a server or a terminal.
The shooting control method of the embodiments of the present application can be applied to the production flow and use of volumetric video; for example, a shot subject (such as a performer) in a volumetric video is used as a virtual object, and co-shooting is performed according to the shooting control method of the embodiments of the present application. Illustratively, the production flow and use of volumetric video are roughly as follows:
the first step: shooting and collecting
The performer enters a camera array system deployed according to a matrix, and the data such as color information, material information, depth information and the like of the performer are shot and extracted through professional-level acquisition equipment such as an infrared IR camera, a 4K ultra-high definition industrial camera and the like.
And a second step of: material generation
After the data are acquired, the material is uploaded to the cloud, where algorithms invoked on the cloud side can automatically generate a volumetric video (a sequence of dynamic 3D character models).
And a third step of: using volumetric video
Through a plug-in, the volumetric video is imported into UE4/UE5/Unity 3D, where it can be fused with virtual scenes or CG special effects, rendered in real time, or used for AR co-shooting and the like.
The execution body of the shooting control method in the embodiments of the present application may be the shooting control apparatus provided in the embodiments of the present application, or different types of electronic devices, such as a server device, a physical host, or user equipment (UE), into which the shooting control apparatus is integrated, where the shooting control apparatus may be implemented in hardware or software, and the UE may specifically be a terminal device such as a smart phone, a tablet computer, a notebook computer, a palmtop computer, a desktop computer, or a personal digital assistant (Personal Digital Assistant, PDA). The electronic device may operate independently or as part of a device cluster, and may have an integrated camera or establish a network connection with a camera so as to shoot the shooting scene and form the shooting picture; the electronic device may also have an integrated display screen or establish a network connection with a display screen so as to display the shooting picture during shooting.
For example, the shooting control method in the embodiments of the present application may be applied to a shooting control system as shown in fig. 1. The shooting control system includes a terminal 101 and a server 102, where the terminal 101 may be a device including both receiving and transmitting hardware, i.e., a device capable of two-way communication over a two-way communication link. The terminal 101 may specifically be a terminal provided with a camera, such as a mobile phone, a tablet computer or a notebook computer, for capturing the shooting scene to obtain the shooting picture; the terminal 101 may also be a camera installed at the shooting site for capturing the scene picture. The terminal 101 and the server 102 may communicate bidirectionally through a network, and the server 102 may be an independent server, or a server network or server cluster composed of servers, including but not limited to a computer, a network host, a single network server, a set of multiple network servers, or a cloud server composed of multiple servers, where a cloud server consists of a large number of computers or web servers based on cloud computing. The server 102 may further include a display screen for displaying the shooting picture. The terminal 101 and the server 102 may jointly implement the shooting control method; for example, the terminal 101 may transmit the shooting picture to the server 102, whereby the server 102 presents the shooting picture containing the user character, performs gesture recognition on the user character in the shooting picture to obtain the current gesture of the user character, acquires the pointing position point of the current gesture if the current gesture matches a preset position control gesture, and places a virtual object in the shooting picture according to the pointing position point so as to control the relative position of the virtual object and the user character in the shooting picture.
Those skilled in the art will appreciate that the application environment shown in fig. 1 is merely one application scenario of the present application and does not limit its application scenarios; other application environments may include more or fewer computer devices than shown in fig. 1. For example, only one server 102 is shown in fig. 1, but it will be appreciated that the shooting control system may further include one or more other servers, which is not limited herein.
It should be further noted that, the schematic view of the scene of the photographing control system shown in fig. 1 is only an example, and the photographing control system and the scene described in the embodiments of the present invention are for more clearly describing the technical solutions of the embodiments of the present invention, and do not constitute a limitation on the technical solutions provided by the embodiments of the present invention, and those skilled in the art can know that, with the evolution of the photographing control system and the appearance of a new service scene, the technical solutions provided by the embodiments of the present invention are equally applicable to similar technical problems.
Next, the shooting control method of the embodiments of the present application is described with an electronic device as the execution body, where for illustration the electronic device integrates a camera and a display screen; to simplify the description, the execution body will be omitted in the following method embodiments.
Referring to fig. 2, fig. 2 is a schematic flow chart of a shooting control method provided in an embodiment of the present application. It should be noted that although a logical order is depicted in the flowchart of fig. 2 or other figures, in some cases the steps shown or described may be performed in a different order than that depicted. The shooting control method includes the following steps 201 to 204:
201. a photographic screen is presented that includes the user character.
The user character refers to the user who co-shoots with the virtual object, for example, a person in the shooting scene being photographed.
The virtual object refers to an object that is not present in the shooting scene but is presented in the shooting picture and co-shoots with the user character. For example, as shown in fig. 3, the virtual object is a puppy, which is not present in the shooting scene (as indicated by the shooting scene within the dashed box in fig. 3) but is present in the shooting picture.
The virtual object may specifically be a three-dimensional model in the volumetric video, or may also be a two-dimensional model.
The shooting scene refers to a real scene where a user character is shot.
The shooting picture is the picture formed by capturing the shooting scene. It may specifically be the picture captured when the camera is turned on but not yet formally in the shooting state, or the picture captured after the camera is turned on and has formally entered the shooting state (for example, after the user starts shooting via a 'start' button).
For example, when the camera of the electronic device is opened, the camera captures a current shooting scene to form a shooting picture, and the shooting picture containing the user role is presented on the display screen of the electronic device. In some embodiments, when the camera of the electronic device is turned on, the virtual object may also be presented simultaneously, that is, in step 201, the photographed image including the user character and the virtual object is presented simultaneously. In other embodiments, the virtual object may be presented only when the camera takes a formal photograph, that is, a photograph including the user character but not the virtual object may be presented in step 201.
202. And carrying out gesture recognition on the user role in the shooting picture to obtain the current gesture of the user role.
The current gesture refers to a gesture of the user character, which is obtained by performing gesture recognition on the user character. For example, the current gesture of the user character may be "index finger spread, remaining four fingers curved", "five fingers combined spread", and so on.
Illustratively, first, the shot screen presented in step 201 may be captured to obtain a shot screen image; and then, carrying out gesture recognition according to the shot picture image by a gesture recognition algorithm to obtain the current gesture of the user role.
For example, first, a preset gesture recognition algorithm is trained on a training data set (containing a plurality of sample images, each annotated with the user's hand region and the gesture type corresponding to that hand region), so that the algorithm learns the features of the various gestures, yielding a trained gesture recognition algorithm (capable of detecting the hand region in an image and determining the gesture type corresponding to that hand region). The preset gesture recognition algorithm may be an open-source network model usable for classification tasks, such as an EfficientNet model, a YOLOv3 network, or a MobileNet network. Specifically, an open-source network (usable for classification tasks) with its model parameters at default values can be adopted as the preset gesture recognition algorithm.
The gesture types required to be learned by the gesture recognition algorithm may be set according to the gesture required to be recognized in the actual service scenario, for example, if the gesture required to recognize the user role is "the index finger stretches and the other four fingers bend", 2 gesture types (one is "the index finger stretches and the other four fingers bend", and the other is "the other gesture") may be set to train the preset gesture recognition algorithm; for another example, if it is required to recognize whether the gesture of the user character is a category 1 (such as "index finger stretch, remaining four fingers bend" gesture), a category 2 (such as "five fingers merge stretch" gesture), a category 3 (a form other than category 1 and category 2), 3 gesture categories (one is "index finger stretch, remaining four fingers bend", one is "five fingers merge stretch", and the other is "other gesture") may be set to train the preset gesture recognition algorithm.
Then, the shot picture presented in the step 201 is captured to obtain a shot picture image, and the shot picture image is input into a trained gesture recognition algorithm to call the trained gesture recognition algorithm to classify the shot picture image: the method comprises the steps of detecting a hand area of a user role in a shot picture image, and classifying the hand area in the shot picture image to obtain a gesture type of the user role in the shot picture image to serve as a current gesture of the user role.
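A schematic sketch of this inference step is shown below; the detector and classifier objects, the label set and the cropping logic are assumptions for illustration and do not reflect a specific model of the application.

```python
# A sketch of step 202: capture the presented shooting picture as an image, detect a
# hand region of the user character, and classify the cropped region into one of the
# trained gesture types, which is taken as the current gesture.
import torch

GESTURE_LABELS = ["index_finger_point", "five_fingers_open", "other"]  # illustrative

def recognize_current_gesture(frame, hand_detector, gesture_classifier):
    """frame: HxWx3 image tensor of the captured shooting picture.
    hand_detector / gesture_classifier: trained models as described above."""
    boxes = hand_detector(frame)                 # hand regions of the user character
    if len(boxes) == 0:
        return None                              # no hand visible, nothing to match
    x0, y0, x1, y1 = boxes[0]
    crop = frame[y0:y1, x0:x1].permute(2, 0, 1).unsqueeze(0).float()
    with torch.no_grad():
        logits = gesture_classifier(crop)
    return GESTURE_LABELS[int(logits.argmax(dim=1))]
```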
203. And if the current gesture is matched with the preset position control gesture, acquiring a pointing position point of the current gesture.
The preset position control gesture refers to a preset gesture for controlling the placement position of the virtual object. For example, the preset position control gesture may be "index finger stretch, remaining four fingers bend", or "five fingers combined stretch", etc.
The preset position control gesture is merely an example, and in fact, the specific presentation form of the preset position control gesture may be set according to the actual service scene requirement, and in this embodiment, the specific presentation form of the preset position control gesture is not limited.
The pointing position point refers to the position pointed to by the current gesture. It may specifically be the position point pointed to by the finger corresponding to the current gesture (see case (1) below), or a position point associated with the current gesture in advance (see case (2) below).
After the current gesture of the user character is identified in step 202, whether the current gesture matches the preset position control gesture is detected; if it matches, step 203 is entered. Otherwise, if the current gesture does not match the preset position control gesture, no further processing may be performed, or step 202 may be re-executed to perform gesture recognition on the user character in the shooting picture and obtain its current gesture, until the current gesture of the user character is detected to match the preset position control gesture, and then step 203 is entered.
There are various ways of determining the pointing position point in step 203, including, illustratively:
case (1): the pointing position is the position point pointed by the finger corresponding to the current gesture. In this case, step 203 may specifically include the following steps 2031A to 2032A:
2031A, if the current gesture is matched with a preset position control gesture, acquiring a ray formed in a three-dimensional space where the shooting picture is located, where the finger corresponding to the current gesture points.
2032A, obtaining an intersection point between the ray and the supporting surface of the virtual object, as a pointing position point of the current gesture.
The three-dimensional space in which the shooting picture is located is a three-dimensional space of a shooting scene captured corresponding to the shooting picture.
The ray refers to the ray formed, in the three-dimensional space where the shooting picture is located, by the pointing direction of the finger corresponding to the current gesture. As shown in fig. 3, the ray can be understood as a ray that takes the fingertip corresponding to the current gesture as its starting point and extends in the direction in which the finger points.
In some embodiments, in step 2032A, the standing surface of the user character (such as the ground) may be directly used as the supporting surface of the virtual object, where the intersection point between the ray and the standing surface of the user character may be directly used as the pointing position point of the current gesture. For example, as shown in fig. 4, assuming that the preset position control gesture is "the index finger stretches and the remaining four fingers bend", the current gesture of the user character is identified as "the index finger stretches and the remaining four fingers bend" by step 202, the current gesture is matched with the preset position control gesture, a ray formed by the finger pointing direction (i.e., the pointing direction of the index finger) corresponding to the current gesture of the user character in the three-dimensional space where the photographed image is located can be identified, and then an intersection point (shown as a point a in fig. 4) between the ray and the standing surface of the user character is taken as the pointing position point of the current gesture.
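For the simple case above, the pointing position point reduces to a ray-plane intersection; a sketch follows, with the standing surface modelled as a plane given by a point and a unit normal (a modelling assumption of this example).

```python
# A sketch of the geometric step in 2032A: intersect the ray cast from the fingertip
# along the index-finger direction with the standing surface, and take the
# intersection as the pointing position point (point A in fig. 4).
import numpy as np

def ray_plane_intersection(ray_origin, ray_dir, plane_point, plane_normal, eps=1e-6):
    denom = float(np.dot(plane_normal, ray_dir))
    if abs(denom) < eps:
        return None                          # ray parallel to the plane: no pointing point
    t = float(np.dot(plane_normal, plane_point - ray_origin)) / denom
    if t < 0:
        return None                          # intersection lies behind the fingertip
    return ray_origin + t * ray_dir          # pointing position point on the plane
```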
In some embodiments, multiple planes exist in the shooting scene at the same time (such as the floor plane and the step planes of a staircase), and in step 2032A any plane of the shooting scene may be designated as the supporting surface of the virtual object. In this case, before step 2032A, the supporting surface of the virtual object may be determined first, and then step 2032A is performed to obtain the intersection point between the ray and the supporting surface of the virtual object as the pointing position point of the current gesture. The process of determining the supporting surface of the virtual object may specifically include: adding the standing surface of the user character in the shooting picture to a candidate plane set of the shooting picture; adding planes in the shooting picture whose included angle with the standing surface is smaller than a preset angle threshold to the candidate plane set; and from the planes of the candidate plane set, acquiring the plane that has an intersection point with the ray and is closest to the starting point of the ray as the supporting surface of the virtual object.
The starting point of the ray refers to the starting point of the finger pointing, and can be, for example, a fingertip.
Wherein, the plane intersection point refers to the intersection point between the ray and the plane.
Among the planes of the candidate plane set, the plane closest to the starting point of the ray is the plane that has an intersection point with the ray and whose intersection point is at the smallest distance from the starting point of the ray.
For example, as shown in fig. 6, the shot picture includes a desktop, a wall surface, and a ground, the standing surface of the user character is the ground, wherein an included angle between the desktop and the ground is smaller than a preset included angle threshold (e.g., 5 °), an intersection point of the ray and the ground is a point a, and an intersection point of the ray and the desktop is a point B, and then the ground and the desktop are added into the alternative plane set; then, whether the intersection point exists between the ray and each plane (i.e. the ground and the desktop) in the alternative plane set is calculated, and a plane (e.g. the desktop in fig. 6) closest to the origin of the ray and having the intersection point with the ray (e.g. the intersection point of the ray with the ground is the point a and the intersection point of the ray with the desktop is the point B in fig. 6) is selected as the supporting surface of the virtual object.
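Combining the candidate plane set with the ray-plane intersection sketched earlier, the supporting-surface selection might be implemented as follows; the plane representation and the 5° default threshold mirror the example above but remain illustrative assumptions.

```python
# A sketch (building on ray_plane_intersection above) of choosing the supporting
# surface: the standing surface and every plane whose angle with it is below the
# threshold form the candidate plane set; among the candidates the ray actually hits,
# the one whose intersection is nearest the ray's starting point is returned together
# with the pointing position point.
import numpy as np

def select_support_surface(ray_origin, ray_dir, standing_plane, other_planes,
                           angle_threshold_deg=5.0):
    """Each plane is a (point, unit_normal) tuple."""
    candidates = [standing_plane]
    n_stand = standing_plane[1]
    for plane in other_planes:
        cos_angle = abs(float(np.dot(n_stand, plane[1])))
        if np.degrees(np.arccos(np.clip(cos_angle, -1.0, 1.0))) < angle_threshold_deg:
            candidates.append(plane)         # e.g. a desktop nearly parallel to the ground
    best = None
    for plane in candidates:
        hit = ray_plane_intersection(ray_origin, ray_dir, plane[0], plane[1])
        if hit is None:
            continue
        dist = float(np.linalg.norm(hit - ray_origin))
        if best is None or dist < best[0]:
            best = (dist, plane, hit)        # keep the plane closest to the ray's start
    return None if best is None else (best[1], best[2])  # (support surface, pointing point)
```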
In order to ensure that the virtual object can be placed normally, the supporting surface of the virtual object should be the ground or a plane parallel to the ground (such as a desktop or each step plane of a staircase). The standing surface of the user character and the planes whose included angle with the standing surface is smaller than the preset angle threshold are therefore added to the candidate plane set of the shooting picture, and the plane that has an intersection point with the ray and is closest to the starting point of the ray is then selected from the candidate plane set as the supporting surface of the virtual object. In a first aspect, adding the standing surface and the nearly parallel planes to the candidate plane set keeps every plane that could be the supporting surface available for the supporting-surface decision while filtering out some non-supporting surfaces, thereby reducing the amount of computation needed to determine the pointing position point of the current gesture. In a second aspect, while allowing the user to designate the supporting surface of the virtual object, planes that intersect the ray but on which the virtual object cannot normally be placed (such as a wall in the shooting scene) are excluded, preventing the virtual object from being placed on an incorrect plane. In a third aspect, because the plane whose intersection point is closest to the starting point of the ray is taken as the supporting surface, misjudgment of the supporting surface when the ray intersects several planes at once is avoided (for example, as shown in fig. 6, the ray intersects both the desktop and the ground; in this case the plane closest to the origin of the ray, i.e., the desktop, is the supporting surface of the virtual object). In a fourth aspect, the user can designate the placement plane of the virtual object instead of a certain plane (such as the ground) being fixed as the placement plane, which improves the diversity of control over the virtual object.
Case (2): the pointing position point is a position point associated with the current gesture in advance. For example, the preset position control gestures include gesture A, gesture B and gesture C, each of which is associated in advance with a position point; if the current gesture is recognized as gesture C, the current gesture matches a preset position control gesture, and the position 1 meter in front of the user character (the position point associated with gesture C) is used as the pointing position point of the current gesture.
204. And placing a virtual object in the shooting picture according to the pointing position point so as to control the relative position of the virtual object and the user role in the shooting picture.
For example, as shown in fig. 4 and fig. 5, when the pointing position point is point A, the virtual object (such as a puppy) is placed at pointing position point A, so that the virtual object in the shooting picture is placed at the position desired by the user. The user character can thus control the relative position of the virtual object and the user character in the shooting picture, so that even if the user cannot see the shooting picture during shooting, the user still has general control over and awareness of where the virtual object is placed, and can therefore make actions and expressions that coordinate more naturally with the virtual object, which reduces the abruptness of the co-shot video or image and makes its effect more natural.
Further, in order to let the person being photographed (i.e., the user character in the shooting picture) better control and know the orientation information of the virtual object, besides controlling the placement point of the virtual object through gestures, the orientation of the virtual object (for example, its orientation relative to the user character or relative to the camera) and the relative distance between the virtual object and the user character can also be controlled through gestures. In this case, in step 203, the gesture type of the current gesture of the user character is detected; and if the gesture type is a position control gesture and the current gesture matches a preset position control gesture, the pointing position point of the current gesture is acquired. For example, if the preset position control gesture is "five fingers together and extended", then when the current gesture is "index finger extended, remaining four fingers bent" the pointing position point of the current gesture is not acquired, and when the current gesture is "five fingers together and extended" the current gesture is confirmed to be the preset position control gesture and the pointing position point of the current gesture is further acquired. This ensures that the pointing position point is acquired only when the user is indeed controlling the position of the virtual object, avoiding invalid detection of the pointing position point when the current gesture is not a control of the virtual object's position or is instead a control of other orientation information such as relative orientation or relative distance, and ensuring that the user can control different aspects of the virtual object's placement (such as placement point, placement distance and placement orientation) through different types of gestures.
If the gesture type is an orientation control gesture, the associated orientation of the current gesture is acquired, and the relative orientation of the virtual object and the user character in the shooting picture is controlled according to the associated orientation. For example, the preset orientation control gestures are "index finger parallel to the ground pointing forward" and "index finger parallel to the ground pointing backward", whose associated orientations are respectively, for example, that the virtual object faces away from the user character and that the virtual object faces the user character. If the current gesture is "index finger parallel to the ground pointing forward", it matches a preset orientation control gesture, and the virtual object in the shooting picture is turned according to the associated orientation of the current gesture, i.e. so that the virtual object faces away from the user character; in this way the relative orientation of the virtual object and the user character in the shooting picture is controlled to be "facing away from the user character". The user can therefore control the orientation of the virtual object through gestures, and even without seeing the shooting picture can still roughly know and control that orientation, make actions and expressions that coordinate more naturally with the virtual object, reduce the abruptness of the co-shot video, and make the effect of the co-shot video more natural.
Further, the orientation of the user character may change after the orientation control gesture is made, for example from facing the camera to facing away from the camera, while what the user essentially intends to control is the orientation of the virtual object relative to the user character (for example, keeping the virtual object facing the user character). Therefore, after the relative orientation of the virtual object and the user character in the shooting picture has been controlled according to the associated orientation, when a change in the orientation of the user character is detected, the orientation of the virtual object may be updated in response to that change, so that the relative orientation of the user character and the virtual object is maintained as the associated orientation.
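Keeping the relative orientation fixed while the user character turns reduces to recomputing the object's facing from the character's new facing. A minimal sketch, assuming orientations are expressed as yaw angles in degrees about the vertical axis; the function name and this convention are assumptions of the sketch, not part of the patent.

```python
def update_virtual_object_yaw(user_yaw_deg, associated_relative_yaw_deg):
    """Recompute the virtual object's facing so that the relative orientation
    between the user character and the virtual object stays at the associated value."""
    return (user_yaw_deg + associated_relative_yaw_deg) % 360.0

# Example: the associated orientation is "facing the user character", i.e. a relative
# yaw of 180 degrees. If the user character turns from 0 to 90 degrees, the virtual
# object is updated from 180 to 270 degrees and keeps facing the user character.
object_yaw = update_virtual_object_yaw(90.0, 180.0)   # -> 270.0
```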
If the gesture type is a distance control gesture, the associated distance of the current gesture is acquired, and the relative distance between the virtual object and the user character in the shooting picture is controlled according to the associated distance. For example, the preset distance control gestures are "1 finger raised" and "2 fingers raised", whose associated distances are respectively 1 meter and 2 meters from the user character. If the current gesture is "1 finger raised", it matches a preset distance control gesture, and the virtual object in the shooting picture is moved according to the associated distance of the current gesture so that it is 1 meter away from the user character, i.e. the relative distance between the virtual object and the user character in the shooting picture is controlled to be 1 meter. The user can therefore control the distance of the virtual object through gestures, and even without seeing the shooting picture can still roughly know and control that distance, make actions and expressions that coordinate more naturally with the virtual object, reduce the abruptness of the co-shot video, and make the effect of the co-shot video more natural.
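The dispatch over position, orientation and distance control gestures described in the preceding paragraphs can be sketched as follows. The `CoShootScene` class, the dictionary layout of `gesture` and all field names are placeholders invented for this sketch; they are not interfaces defined by the patent.

```python
class CoShootScene:
    """Minimal stand-in for the co-shooting scene state (illustrative only)."""
    def __init__(self):
        self.object_position = None
        self.relative_yaw_deg = 0.0
        self.relative_distance_m = 1.0

    def place_virtual_object(self, point):
        self.object_position = point

    def set_relative_orientation(self, yaw_deg):
        self.relative_yaw_deg = yaw_deg

    def set_relative_distance(self, meters):
        self.relative_distance_m = meters

def handle_control_gesture(gesture, scene, pointing_point=None):
    """Route a recognized control gesture to the matching azimuth control."""
    if gesture["type"] == "position" and pointing_point is not None:
        scene.place_virtual_object(pointing_point)        # e.g. the ray/plane intersection
    elif gesture["type"] == "orientation":
        scene.set_relative_orientation(gesture["associated_yaw_deg"])
    elif gesture["type"] == "distance":
        scene.set_relative_distance(gesture["associated_distance_m"])
    # Any other gesture is ignored, so no pointing-point detection is wasted on it.

# Example: "1 finger raised" is pre-associated with a relative distance of 1 meter.
scene = CoShootScene()
handle_control_gesture({"type": "distance", "associated_distance_m": 1.0}, scene)
```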
The photographer (who may be the user character himself or another person taking part in the co-shooting) can press the shooting control at any time to put the camera into the shooting state and record the shooting picture, obtaining the target co-shot video of the virtual object and the user character; that is, the electronic device shoots the shooting picture in response to a touch operation on the shooting control and obtains the target co-shot video of the virtual object and the user character. Further, if the user character makes control gestures after the shooting state has been entered, those gestures are also recorded, which enlarges the shot video data (this is especially noticeable when a three-dimensional model in a volumetric video is used as the virtual object) or forces the user to cut the control-gesture frames out of the video afterwards. To avoid this, the shooting picture can be shot in response to the touch operation on the shooting control to obtain a preliminary co-shot video of the virtual object and the user character; control gesture recognition is then performed on the video frames of the preliminary co-shot video to obtain target video frames containing control gestures (for example, video frames containing a position control gesture, an orientation control gesture or a distance control gesture are identified as target video frames); and the target video frames are filtered out of the preliminary co-shot video to obtain the target co-shot video. In this way the video frames containing control gestures are filtered out before saving, which reduces the memory occupied by the target co-shot video to a certain extent and reduces subsequent cutting work.
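The frame-filtering step can be sketched as a single pass over the preliminary co-shot video. The gesture recognizer is left as a callable parameter because the patent does not fix a particular recognition method; the toy frame dictionaries in the example are illustrative only.

```python
def filter_control_gesture_frames(preliminary_frames, is_control_gesture_frame):
    """Drop every frame in which a control gesture is recognized, yielding the
    frames of the target co-shot video."""
    return [frame for frame in preliminary_frames
            if not is_control_gesture_frame(frame)]

# Example with a toy recognizer: frames tagged with a position, orientation or
# distance control gesture are treated as target video frames and filtered out.
CONTROL_GESTURE_TYPES = {"position", "orientation", "distance"}
frames = [{"id": 0, "gesture": None},
          {"id": 1, "gesture": "position"},
          {"id": 2, "gesture": None}]
target_video = filter_control_gesture_frames(
    frames, lambda f: f["gesture"] in CONTROL_GESTURE_TYPES)   # keeps frames 0 and 2
```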
Further, after the photographer presses the shooting control, i.e. while the camera is in the shooting state, whether the current gesture is a preset control gesture (for example a position control gesture, an orientation control gesture or a distance control gesture) can be recognized automatically; if so, shooting is automatically paused and continued after the control is completed. That is, the method for controlling photographing further includes: when the current gesture matches a preset control gesture, detecting whether the camera of the shooting picture is in the shooting state; if the camera is in the shooting state, switching the camera from the shooting state to a pause state; and switching the camera from the pause state back to the shooting state once the virtual object in the shooting picture has been placed at the pointing position point. In this way the control gestures of the person being shot are not recorded: the pictures in which such a gesture appears are automatically skipped or filtered. For example, even if the "shoot" button has already been pressed, shooting is automatically paused while the person being shot is controlling the position of the virtual object with a gesture, and automatically continues after the virtual object has been placed at the pointing position point, which reduces the amount of co-shot video data.
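The automatic pause-and-resume behaviour is essentially a small state machine driven by three events: the shooting control being pressed, a control gesture being detected, and the virtual object being placed. The class and event names below are illustrative assumptions of this sketch, not taken from the patent.

```python
class CameraState:
    SHOOTING, PAUSED = "shooting", "paused"

class AutoPauseController:
    """Pauses recording while a control gesture is being made and resumes once
    the virtual object has been placed at the pointing position point."""
    def __init__(self):
        self.state = CameraState.PAUSED
        self._auto_paused = False

    def on_shoot_control_pressed(self):
        self.state = CameraState.SHOOTING
        self._auto_paused = False

    def on_control_gesture_detected(self):
        if self.state == CameraState.SHOOTING:
            self.state = CameraState.PAUSED     # skip the frames showing the gesture
            self._auto_paused = True

    def on_virtual_object_placed(self):
        if self._auto_paused:
            self.state = CameraState.SHOOTING   # continue the co-shot recording
            self._auto_paused = False

controller = AutoPauseController()
controller.on_shoot_control_pressed()       # photographer presses the shooting control
controller.on_control_gesture_detected()    # user character makes a control gesture
controller.on_virtual_object_placed()       # object placed -> shooting resumes
```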
Therefore, this embodiment allows the position, orientation, distance and the like of the virtual object to be controlled through the gestures of the person being co-shot (i.e. the user), so that the user can roughly know and control the position of the virtual object (such as a three-dimensional model in a volumetric video), make actions and expressions that coordinate more naturally with the virtual object, reduce the abruptness of the co-shot video, and make the effect of the co-shot video more natural. It also avoids, to a certain extent, the problem that the photographer has to manually adjust the position of the virtual object in the shooting picture and inform the person being shot, who otherwise cannot accurately and quickly know where the virtual object is.
In order to better implement the method for controlling photographing in the embodiments of the present application, based on the above method, an embodiment of the present application further provides a photographing control apparatus, as shown in fig. 7, which is a schematic structural diagram of an embodiment of the photographing control apparatus in the embodiments of the present application, where the photographing control apparatus 700 includes:
a display unit 701 for presenting a photographing screen including a user character;
the recognition unit 702 is configured to perform gesture recognition on the user character in the shot image, so as to obtain a current gesture of the user character;
an obtaining unit 703, configured to obtain a pointing position point of the current gesture if the current gesture is matched with a preset position control gesture;
And a control unit 704, configured to place a virtual object in the shooting picture according to the pointing position point, so as to control a relative position of the virtual object and the user character in the shooting picture.
In some embodiments, the acquiring unit 703 is specifically configured to:
if the current gesture is matched with a preset position control gesture, acquiring a ray formed in a three-dimensional space where the shooting picture is located, wherein the ray is pointed by a finger corresponding to the current gesture;
and acquiring an intersection point between the ray and the supporting surface of the virtual object to serve as a pointing position point of the current gesture.
In some embodiments, before acquiring the intersection point between the ray and the supporting surface of the virtual object as the pointing position point of the current gesture, the acquiring unit 703 is specifically configured to:
adding the standing surface of the user character in the shooting picture to a candidate plane set of the shooting picture;
adding a plane, in the shooting picture, whose included angle with the standing surface is smaller than a preset included angle threshold to the candidate plane set;
and acquiring, from the planes of the candidate plane set, a plane which has an intersection point with the ray and whose intersection point is closest to the starting point of the ray, as the supporting surface of the virtual object.
In some embodiments, the acquiring unit 703 is specifically configured to:
detecting a gesture type of a current gesture of the user character;
and if the gesture type is a position control gesture and the current gesture is matched with a preset position control gesture, acquiring a pointing position point of the current gesture.
In some embodiments, the control unit 704 is specifically configured to:
if the gesture type is an orientation control gesture, acquiring the associated orientation of the current gesture;
and controlling the relative orientation of the virtual object and the user character in the shooting picture according to the associated orientation.
In some embodiments, the control unit 704 is specifically configured to:
in response to a change in the orientation of the user character, the orientation of the virtual object is updated such that the relative orientation of the user character and the virtual object remains the associated orientation.
In some embodiments, the control unit 704 is specifically configured to:
if the gesture type is a distance control gesture, acquiring the associated distance of the current gesture;
and controlling the relative distance between the virtual object and the user character in the shooting picture according to the associated distance.
In some embodiments, the control unit 704 is specifically configured to:
and in response to a touch operation on a shooting control, shooting the shooting picture to obtain a target co-shot video of the virtual object and the user character.
In some embodiments, the control unit 704 is specifically configured to:
shooting the shooting picture in response to a touch operation on a shooting control, to obtain a preliminary co-shot video of the virtual object and the user character;
performing control gesture recognition on the video frames in the preliminary co-shot video to obtain target video frames containing control gestures;
and filtering the target video frames out of the preliminary co-shot video to obtain the target co-shot video.
In some embodiments, the control unit 704 is specifically configured to:
when the current gesture is matched with a preset control gesture, detecting whether a camera of the shooting picture is in a shooting state or not;
if the camera is in a shooting state, switching the camera from the shooting state to a pause state;
and switching the camera from a pause state to a shooting state until the virtual object in the shooting picture is placed at the pointing position point.
In some embodiments, the virtual object is a three-dimensional model in a volumetric video, and the control unit 704 is specifically configured to:
and placing a three-dimensional model in the volume video in the shooting picture according to the pointing position point.
Therefore, the photographing control apparatus 700 in this embodiment of the present application can achieve the following technical effects: the user character can control the position of the virtual object through gestures while being co-shot with the virtual object, and can therefore make actions and expressions that coordinate more naturally with the virtual object when co-shooting with it, which reduces the abruptness of the co-shot picture and makes the effect of the co-shot picture more natural.
In implementation, each of the above units may be implemented as an independent entity, or combined arbitrarily and implemented as one or several entities; for the specific implementation of each unit, reference may be made to the foregoing method embodiments, and details are not repeated here.
Correspondingly, an embodiment of the present application further provides an electronic device, which may be a terminal such as a smart phone, a tablet computer, a notebook computer, a touch screen, a personal computer (PC, Personal Computer) or a personal digital assistant (Personal Digital Assistant, PDA). As shown in fig. 8, fig. 8 is a schematic structural diagram of an electronic device according to an embodiment of the present application. The electronic device 800 includes a processor 801 having one or more processing cores, a memory 802 having one or more computer-readable storage media, and a computer program stored on the memory 802 and executable on the processor. The processor 801 is electrically connected to the memory 802. It will be appreciated by those skilled in the art that the electronic device structure shown in the figures does not limit the electronic device, and more or fewer components than shown may be included, or certain components may be combined, or a different arrangement of components may be used.
The processor 801 is a control center of the electronic device 800, connects various parts of the entire electronic device 800 using various interfaces and lines, and performs various functions of the electronic device 800 and processes data by running or loading software programs and/or modules stored in the memory 802, and calling data stored in the memory 802, thereby performing overall monitoring of the electronic device 800.
In this embodiment of the present application, the processor 801 in the electronic device 800 loads instructions corresponding to the processes of one or more application programs into the memory 802 according to the steps of any of the above-mentioned methods for controlling photographing, and runs the application programs stored in the memory 802, so as to implement the specific processes of the above-mentioned method for controlling photographing.
Optionally, as shown in fig. 8, the electronic device 800 further includes: a touch display 803, a radio frequency circuit 804, an audio circuit 805, an input unit 806, and a power supply 807. The processor 801 is electrically connected to the touch display 803, the radio frequency circuit 804, the audio circuit 805, the input unit 806, and the power supply 807, respectively. It will be appreciated by those skilled in the art that the electronic device structure shown in fig. 8 is not limiting of the electronic device and may include more or fewer components than shown, or may combine certain components, or a different arrangement of components.
The touch display 803 may be used to display a graphical user interface and receive operation instructions generated by a user acting on the graphical user interface. The touch display 803 may include a display panel and a touch panel. Wherein the display panel may be used to display information entered by a user or provided to a user as well as various graphical user interfaces of the electronic device, which may be composed of graphics, text, icons, video, and any combination thereof.
The radio frequency circuit 804 may be configured to transmit and receive radio frequency signals, so as to establish wireless communication with a network device or another electronic device and to exchange signals with the network device or the other electronic device.
The audio circuit 805 may be used to provide an audio interface between the user and the electronic device through a speaker and a microphone. On the one hand, the audio circuit 805 may convert received audio data into an electrical signal and transmit it to the speaker, which converts it into a sound signal for output; on the other hand, the microphone converts collected sound signals into electrical signals, which are received by the audio circuit 805 and converted into audio data; the audio data is then output to the processor 801 for processing and sent, for example, to another electronic device via the radio frequency circuit 804, or output to the memory 802 for further processing. The audio circuit 805 may also include an earphone jack to provide communication between peripheral earphones and the electronic device.
The input unit 806 may be used to receive input numbers, character information, or user characteristic information (e.g., fingerprint, iris, facial information, etc.), and to generate keyboard, mouse, joystick, optical, or trackball signal inputs related to user settings and function control.
The power supply 807 is used to supply power to the various components of the electronic device 800. Optionally, the power supply 807 may be logically connected to the processor 801 through a power management system, so that functions such as charging, discharging and power consumption management are implemented through the power management system. The power supply 807 may also include one or more of a direct current or alternating current power supply, a recharging system, a power failure detection circuit, a power converter or inverter, a power status indicator, and other arbitrary components.
Although not shown in fig. 8, the electronic device 800 may further include a camera, a sensor, a wireless fidelity module, a bluetooth module, etc., which are not described herein.
Those of ordinary skill in the art will appreciate that all or a portion of the steps of the various methods of the above embodiments may be performed by instructions, or by instructions controlling associated hardware, which may be stored in a computer-readable storage medium and loaded and executed by a processor.
To this end, an embodiment of the present application provides a computer-readable storage medium in which a plurality of computer programs are stored, and the computer programs can be loaded by a processor to perform any of the methods for controlling photographing provided by the embodiments of the present application.
The computer-readable storage medium may include: a read-only memory (ROM, Read Only Memory), a random access memory (RAM, Random Access Memory), a magnetic disk or an optical disk, and the like.
For the above embodiments of the photographing control apparatus, the computer-readable storage medium and the electronic device, each embodiment is described with its own emphasis, and for parts of one embodiment that are not described in detail, reference may be made to the related descriptions of other embodiments. It will be clearly understood by those skilled in the art that, for convenience and brevity of description, for the detailed working process and the beneficial effects of the above-described photographing control apparatus, computer-readable storage medium, electronic device and their corresponding units, reference may be made to the description of the photographing control method in the above embodiments, and details are not repeated here.
The method, apparatus, electronic device and computer-readable storage medium for controlling photographing provided by the embodiments of the present application have been described in detail above. Specific examples are used herein to illustrate the principles and implementations of the present application, and the description of the above embodiments is only intended to help understand the method and core ideas of the present application. Meanwhile, those skilled in the art may make changes to the specific implementations and application scope according to the ideas of the present application. In summary, the content of this description should not be construed as limiting the present application.

Claims (14)

1. A method for controlling photographing, the method comprising:
presenting a shooting picture containing a user character;
performing gesture recognition on the user character in the shooting picture to obtain a current gesture of the user character;
if the current gesture is matched with a preset position control gesture, acquiring a pointing position point of the current gesture;
and placing a virtual object in the shooting picture according to the pointing position point, so as to control the relative position of the virtual object and the user character in the shooting picture.
2. The method of claim 1, wherein if the current gesture matches a preset position control gesture, obtaining the pointing position point of the current gesture comprises:
if the current gesture is matched with a preset position control gesture, acquiring a ray formed in a three-dimensional space where the shooting picture is located, wherein the ray is pointed by a finger corresponding to the current gesture;
and acquiring an intersection point between the ray and the supporting surface of the virtual object to serve as a pointing position point of the current gesture.
3. The method according to claim 2, wherein before acquiring the intersection point between the ray and the supporting surface of the virtual object as the pointing position point of the current gesture, the method further comprises:
adding the standing surface of the user character in the shooting picture to a candidate plane set of the shooting picture;
adding a plane, in the shooting picture, whose included angle with the standing surface is smaller than a preset included angle threshold to the candidate plane set;
and acquiring, from the planes of the candidate plane set, a plane which has an intersection point with the ray and whose intersection point is closest to the starting point of the ray, as the supporting surface of the virtual object.
4. The method of claim 1, wherein if the current gesture matches a preset position control gesture, obtaining the pointing position point of the current gesture comprises:
detecting a gesture type of a current gesture of the user character;
and if the gesture type is a position control gesture and the current gesture is matched with a preset position control gesture, acquiring a pointing position point of the current gesture.
5. The method of claim 4, further comprising:
if the gesture type is an orientation control gesture, acquiring the associated orientation of the current gesture;
and controlling the relative orientation of the virtual object and the user character in the shooting picture according to the associated orientation.
6. The method according to claim 5, wherein after controlling the relative orientation of the virtual object and the user character in the shooting picture according to the associated orientation, the method further comprises:
in response to a change in the orientation of the user character, the orientation of the virtual object is updated such that the relative orientation of the user character and the virtual object remains the associated orientation.
7. The method of claim 4, further comprising:
if the gesture type is a distance control gesture, acquiring the associated distance of the current gesture;
and controlling the relative distance between the virtual object and the user character in the shooting picture according to the associated distance.
8. The method of claim 1, further comprising:
in response to a touch operation on a shooting control, shooting the shooting picture to obtain a target co-shot video of the virtual object and the user character.
9. The method of claim 8, further comprising:
shooting the shooting picture in response to a touch operation on a shooting control, to obtain a preliminary co-shot video of the virtual object and the user character;
performing control gesture recognition on the video frames in the preliminary co-shot video to obtain target video frames containing control gestures;
and filtering the target video frames out of the preliminary co-shot video to obtain the target co-shot video.
10. The method of claim 1, further comprising:
when the current gesture is matched with a preset control gesture, detecting whether a camera of the shooting picture is in a shooting state or not;
if the camera is in a shooting state, switching the camera from the shooting state to a pause state;
and switching the camera from a pause state to a shooting state until the virtual object in the shooting picture is placed at the pointing position point.
11. The method of any one of claims 1-10, wherein the virtual object is a three-dimensional model in a volumetric video.
12. A photographing control apparatus, characterized in that the photographing control apparatus comprises:
a display unit, for presenting a shooting picture containing a user character;
an identification unit, for performing gesture recognition on the user character in the shooting picture to obtain a current gesture of the user character;
an acquisition unit, for acquiring a pointing position point of the current gesture if the current gesture is matched with a preset position control gesture;
and a control unit, for placing a virtual object in the shooting picture according to the pointing position point, so as to control the relative position of the virtual object and the user character in the shooting picture.
13. An electronic device, comprising a processor and a memory, wherein the memory stores a computer program, and the processor executes the photographing control method according to any one of claims 1 to 11 when calling the computer program in the memory.
14. A computer-readable storage medium, having stored thereon a computer program that is loaded by a processor to perform the photographing control method according to any one of claims 1 to 11.
CN202310080244.4A 2023-02-01 2023-02-01 Method and device for controlling photographing, electronic equipment and storage medium Pending CN116129526A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310080244.4A CN116129526A (en) 2023-02-01 2023-02-01 Method and device for controlling photographing, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310080244.4A CN116129526A (en) 2023-02-01 2023-02-01 Method and device for controlling photographing, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN116129526A true CN116129526A (en) 2023-05-16

Family

ID=86304376

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310080244.4A Pending CN116129526A (en) 2023-02-01 2023-02-01 Method and device for controlling photographing, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN116129526A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117523679A (en) * 2024-01-08 2024-02-06 成都运达科技股份有限公司 Driver gesture recognition method, system and storage medium

Similar Documents

Publication Publication Date Title
WO2020010979A1 (en) Method and apparatus for training model for recognizing key points of hand, and method and apparatus for recognizing key points of hand
CN110647865B (en) Face gesture recognition method, device, equipment and storage medium
US11138434B2 (en) Electronic device for providing shooting mode based on virtual character and operation method thereof
CN108229369B (en) Image shooting method and device, storage medium and electronic equipment
CN109299315B (en) Multimedia resource classification method and device, computer equipment and storage medium
CN110147805B (en) Image processing method, device, terminal and storage medium
EP3395066B1 (en) Depth map generation apparatus, method and non-transitory computer-readable medium therefor
WO2020103526A1 (en) Photographing method and device, storage medium and terminal device
CN108712603B (en) Image processing method and mobile terminal
KR20140010541A (en) Method for correcting user's gaze direction in image, machine-readable storage medium and communication terminal
CN108881544B (en) Photographing method and mobile terminal
WO2016165614A1 (en) Method for expression recognition in instant video and electronic equipment
WO2021143216A1 (en) Face liveness detection method and related apparatus
KR20200117695A (en) Electronic device and method for controlling camera using external electronic device
CN113538696A (en) Special effect generation method and device, storage medium and electronic equipment
CN110827195A (en) Virtual article adding method and device, electronic equipment and storage medium
CN111589138B (en) Action prediction method, device, equipment and storage medium
CN116129526A (en) Method and device for controlling photographing, electronic equipment and storage medium
WO2019218879A1 (en) Photographing interaction method and apparatus, storage medium and terminal device
CN108055461B (en) Self-photographing angle recommendation method and device, terminal equipment and storage medium
CN112818733B (en) Information processing method, device, storage medium and terminal
CN112153300A (en) Multi-view camera exposure method, device, equipment and medium
CN116109974A (en) Volumetric video display method and related equipment
CN111982293B (en) Body temperature measuring method and device, electronic equipment and storage medium
CN111310701B (en) Gesture recognition method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination