CN117221712A - Method for photographing, electronic equipment and storage medium

Info

Publication number: CN117221712A
Application number: CN202311472969.4A
Authority: CN (China)
Original language: Chinese (zh)
Legal status: Pending
Inventors: 孟文, 王枫桥
Assignee (original and current): Honor Device Co Ltd
Filing: Application filed by Honor Device Co Ltd; priority to CN202311472969.4A
Prior art keywords: image, server, depth, shooting, background
Classification (Landscapes): Studio Devices (AREA)
Abstract

The embodiment of the application provides a method for taking a group photo, an electronic device, and a storage medium, which are applied to the technical field of terminals and can obtain a more realistic and natural group photo by fusing the three-dimensional (depth) information of the captured images. The group photographing method is applied to a first device that includes at least one depth camera, and includes the following steps: after the first device establishes a remote group-photo connection with at least one second device, displaying a first interface, where the first interface is a shooting preview interface; in response to a first operation by the group-photo initiator, sending a first captured image to a server, where the first captured image is a depth image acquired by the at least one depth camera and includes a background image and a first portrait of the group-photo initiator; and displaying a second interface, where the second interface includes the group photo, and the group photo is obtained by the server through depth fusion of the depth information of the background image, the first portrait, and a second portrait in a second captured image acquired by the at least one second device.

Description

Method for photographing, electronic equipment and storage medium
Technical Field
The embodiments of the application relate to the field of terminal technology, and in particular to a group photographing method, an electronic device, and a storage medium.
Background
Currently, the camera function is a commonly used function of electronic devices, through which a user can make video calls, take photos, record videos, and so on. In daily life, friends, family members, or colleagues often want a group photo to commemorate a particular moment or to share important occasions. However, due to time constraints, geographic distance, or other limitations, it is difficult for users to be photographed at the same time and place as their friends and relatives, and a group photo can often be obtained only through post-hoc image synthesis.
However, existing image synthesis techniques (such as video-call or photo compositing techniques) have many limitations: each person in the synthesized photo looks stiff and unnatural, and it is difficult to synthesize a smooth, natural-looking group photo.
Disclosure of Invention
The embodiments of the application provide a group photographing method, an electronic device, and a storage medium, in which the resulting group photo is made more realistic and natural by fusing the depth information of depth images.
In order to achieve the above objective, the embodiments of the present application adopt the following technical solutions:
In a first aspect, an embodiment of the present application provides a group photographing method, applied to a first device that includes at least one depth camera. The method includes:
after the first device establishes a remote group-photo connection with at least one second device, displaying a first interface, where the first interface is a shooting preview interface; in response to a first operation by the group-photo initiator, sending a first captured image to a server, where the first captured image is a depth image acquired by the at least one depth camera and includes a background image and a first portrait of the group-photo initiator; and displaying a second interface, where the second interface includes the group photo, and the group photo is obtained by the server through depth fusion of the depth information of the background image, the first portrait, and a second portrait in a second captured image acquired by the at least one second device.
For example, the first device may be the device used by the group-photo initiator, and the second device may be a device used by a group-photo invitee. There may be one or more second devices. For example, the group-photo initiator may invite one friend for remote group photographing, or may invite multiple friends.
Because the first captured image and the second captured image are depth images, all the participants in the group photo obtained by depth-fusing the first captured image and the second captured image appear more realistic and natural.
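To make the first-aspect flow concrete, the following is a minimal client-side sketch of what the first device might do. It is an illustration only: the server address, endpoint paths, payload fields, and the capture_depth_frame helper are assumptions introduced here and are not defined by the application.

```python
# Hypothetical sketch of the first device's side of the group-photo flow.
# Endpoints, payload fields, and capture_depth_frame() are illustrative assumptions.
import io
import numpy as np
import requests

SERVER = "https://example-photo-server/api"  # placeholder server address


def capture_depth_frame() -> tuple[np.ndarray, np.ndarray]:
    """Stand-in for the depth camera: returns an RGB frame and a depth map (meters)."""
    rgb = np.zeros((480, 640, 3), dtype=np.uint8)
    depth = np.full((480, 640), 3.0, dtype=np.float32)
    return rgb, depth


def send_first_captured_image(session_id: str) -> dict:
    """Upload the initiator's depth capture and wait for the fused group photo."""
    rgb, depth = capture_depth_frame()
    buf = io.BytesIO()
    np.savez_compressed(buf, rgb=rgb, depth=depth)
    buf.seek(0)
    resp = requests.post(
        f"{SERVER}/sessions/{session_id}/captures",
        files={"capture": ("capture.npz", buf, "application/octet-stream")},
        data={"role": "initiator"},
        timeout=30,
    )
    resp.raise_for_status()
    # The server returns the fused group photo once the invitees' captures arrive.
    return resp.json()
```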
Optionally, before sending the first captured image to the server in response to the first operation of the group-photo initiator, the group photographing method may further include:
sending a first depth image to the server, where the first depth image is a preview image acquired by the at least one depth camera; and after receiving posture prompt information from the server, displaying a third interface.
The posture prompt information is used to prompt the shooting posture of the group-photo initiator or the group-photo invitee, and is generated by the server according to the depth information of the first depth image and of a second depth image acquired by the at least one second device.
It can be understood that, during shooting preview, the first device and the second device can send their preview images to the server, so that the server predicts the shooting posture of each participant from the depth information of the preview images and prompts each participant to pose at the predicted position, making the postures of the participants in the subsequently composited image more harmonious and natural.
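As one hedged illustration of how such a prompt could be derived from preview depth maps, the sketch below compares each participant's median subject distance and suggests stepping closer or farther so their apparent scales roughly match. The thresholds and the prompt wording are assumptions made for this example, not part of the application.

```python
# Illustrative only: derive a simple posture/position prompt from two preview depth maps.
import numpy as np


def subject_distance(depth: np.ndarray, near: float = 0.3, far: float = 5.0) -> float:
    """Median distance of pixels assumed to belong to the foreground subject."""
    subject = depth[(depth > near) & (depth < far)]
    return float(np.median(subject)) if subject.size else float("nan")


def posture_prompt(initiator_depth: np.ndarray, invitee_depth: np.ndarray,
                   tolerance: float = 0.3) -> str:
    """Suggest a distance adjustment so both portraits end up at a similar scale."""
    d1 = subject_distance(initiator_depth)
    d2 = subject_distance(invitee_depth)
    if np.isnan(d1) or np.isnan(d2):
        return "Please make sure you are visible to the depth camera."
    if abs(d1 - d2) <= tolerance:
        return "Hold this pose; distances already match."
    closer, farther = ("invitee", "initiator") if d2 < d1 else ("initiator", "invitee")
    return f"Ask the {closer} to step back about {abs(d1 - d2):.1f} m to match the {farther}."
```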
Optionally, before receiving the posture prompt information from the server, the group photographing method may further include:
in response to a second operation of the group-photo initiator, sending a group-photo template request to the server.
The group-photo template request is used by the server to generate the posture prompt information according to the group-photo template corresponding to the request, the first depth image, and the second depth image.
It can be understood that the postures, positions, and so on of the participants differ between group-photo templates, and the group-photo initiator can select the template he or she wants. The server can then generate the posture prompt information from the postures and positions of the people in the selected template together with the depth information of the first depth image and the second depth image.
Optionally, after receiving the posture prompt information from the server and displaying the third interface, the method may further include:
sending a third depth image to the server, where the third depth image is a preview image of the group-photo initiator acquired by the at least one depth camera after the posture has been adjusted according to the posture prompt information;
and displaying a fourth interface, where the fourth interface includes a pre-shot image, and the pre-shot image is obtained by the server by fusing the third depth image with a fourth depth image acquired by the at least one second device.
It can be understood that, before the group photo is captured, the first device may receive the pre-shot image generated by the server from the preview images. If the group-photo initiator is not satisfied with the pre-shot image displayed on the first device, the initiator can adjust posture, position, expression, clothing, and so on before triggering the capture operation, so that a group photo the participants are more satisfied with is obtained.
Optionally, the background image of the group photo is the background of the first captured image or the background of the second captured image.
That is, the background of the group photo may be the background of the shooting environment where the group-photo initiator is located, or may be the background of the shooting environment where a group-photo invitee is located; this is not limited here.
Optionally, before responding to the first operation of the group-photo initiator, the group photographing method may further include:
in response to a third operation on a first control, displaying a fourth interface, where the first control is used to switch the background of the group photo, and the background of the preview image displayed in the fourth interface is different from the background in the first interface.
That is, a control for switching the background of the group photo is displayed on the shooting interface of the first device, and the first device can switch the background in response to the group-photo initiator operating the first control.
Optionally, the camera parameters carried in the first captured image and the second captured image are the same, where the camera parameters include at least one of sensitivity, brightness, color, white balance parameters, or exposure parameters.
It can be understood that using the same camera parameters on the first device and the second device keeps the style of all participants in the captured images consistent.
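A minimal sketch of how such parameters might be represented and pushed from the initiator's device to the invitees' devices is shown below. The CameraParams fields mirror the parameters listed above, while the synchronization function and its callback are illustrative assumptions rather than anything specified by the application.

```python
# Illustrative sketch: propagate the initiator's camera parameters to invitee devices
# so that all captures share one style. The transport/callback is an assumption.
from dataclasses import dataclass, asdict
from typing import Callable


@dataclass
class CameraParams:
    iso: int                 # sensitivity
    brightness: float
    color_temperature: int   # color / white balance, in kelvin
    white_balance: str
    exposure_time_ms: float


def sync_camera_params(initiator: CameraParams,
                       apply_on_invitee: Callable[[dict], None]) -> dict:
    """Serialize the initiator's parameters and hand them to each invitee device."""
    params = asdict(initiator)
    apply_on_invitee(params)   # in practice this would be a message to the second device
    return params


# Usage example
if __name__ == "__main__":
    initiator_params = CameraParams(iso=200, brightness=0.5,
                                    color_temperature=5500, white_balance="daylight",
                                    exposure_time_ms=8.0)
    sync_camera_params(initiator_params, apply_on_invitee=print)
```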
Optionally, the group photo is an image that has undergone beautification, where the beautification includes at least one of skin smoothing, face slimming, blemish removal, eye enlargement, makeup, reshaping, whitening, wrinkle removal, hair styling, and dark-circle removal.
In one embodiment, the server may beautify the fused group photo to obtain a beautified group photo.
In another embodiment, after the first device and/or the second device receives the group photo returned by the server, the first device and/or the second device may first apply the beautification to the group photo and then store the beautified group photo.
In a second aspect, an embodiment of the present application provides another group photographing method, applied to a server. The method may include:
receiving a first captured image from a first device and a second captured image from at least one second device, where the first device and each second device include at least one depth camera, the first captured image includes a background image and a first portrait of the group-photo initiator, and the second captured image includes a background image and a second portrait of a group-photo invitee; fusing the depth information corresponding to the background image, the first portrait, and the second portrait to obtain the group photo; and sending the group photo to the first device and the at least one second device.
It can be understood that the first captured image collected by the first device and the second captured image collected by the second device are depth images. When fusing them, the server can use the depth information of the depth images to adjust the posture, position, expression, and so on of the participants, thereby obtaining a more realistic and natural group photo.
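The application does not spell out the fusion algorithm. One plausible, minimal reading of "depth fusion" is z-buffer style compositing: each portrait is placed into the chosen background and, wherever layers overlap, the pixel with the smallest depth wins. The sketch below illustrates that idea under those assumptions; it is not presented as the application's definitive method.

```python
# Hedged sketch of depth fusion as z-buffer compositing (one possible reading of the
# application, not its definitive algorithm). All inputs are aligned to one canvas.
import numpy as np


def depth_fuse(background_rgb: np.ndarray, background_depth: np.ndarray,
               portraits: list[tuple[np.ndarray, np.ndarray, np.ndarray]]) -> np.ndarray:
    """Composite portrait layers over the background, nearest depth wins.

    portraits: list of (rgb, depth, mask) already warped to background coordinates,
    where mask is a boolean array marking the person's pixels.
    """
    out_rgb = background_rgb.copy()
    z_buffer = background_depth.copy()
    for rgb, depth, mask in portraits:
        # A portrait pixel replaces the current pixel only where it is valid and closer.
        closer = mask & (depth < z_buffer)
        out_rgb[closer] = rgb[closer]
        z_buffer[closer] = depth[closer]
    return out_rgb


# Usage example with toy data
if __name__ == "__main__":
    h, w = 4, 4
    bg = np.zeros((h, w, 3), np.uint8)
    bg_d = np.full((h, w), 10.0, np.float32)
    person = (np.full((h, w, 3), 255, np.uint8),
              np.full((h, w), 2.0, np.float32),
              np.eye(h, w, dtype=bool))
    print(depth_fuse(bg, bg_d, [person])[:, :, 0])
```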
Optionally, before receiving the first captured image from the first device and the second captured image from the at least one second device, the method may further include:
receiving a first depth image sent by the first device and a second depth image sent by the at least one second device;
generating posture prompt information according to the depth information of the first depth image and the depth information of the second depth image, where the posture prompt information is used to prompt the shooting postures of the group-photo initiator and the group-photo invitee;
and sending the posture prompt information to the first device or the at least one second device.
It can be understood that the server can generate the posture prompt information from the depth information of the preview images so as to prompt each participant to pose at the predicted position, making the postures of the participants in the subsequently composited image more harmonious and natural.
Optionally, generating the posture prompt information according to the depth information of the first depth image and the depth information of the second depth image includes:
receiving a group-photo template request sent by the first device or the at least one second device;
and generating the posture prompt information according to the group-photo template corresponding to the request, the first depth image, and the second depth image.
It can be understood that a participant can first select the group-photo template, and the server can generate the posture prompt information based on the postures, positions, and other information of the people in that template, together with the depth information of the first depth image and the second depth image.
Optionally, before fusing the depth information corresponding to the background image, the first portrait, and the second portrait, the group photographing method may further include:
performing pose estimation on the first portrait and the second portrait using a pose estimation algorithm, and determining target poses and/or target positions of the first portrait and the second portrait.
It can be understood that the server can also adjust the posture, position, and so on of the participants before fusion, so that the fused group photo looks more natural.
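As an illustration of what "determining a target position" could look like in practice, the sketch below takes a person mask from each portrait and computes the translation needed to move that person's bounding box onto a slot defined by a template. The template representation (normalized slot centers) and the helper names are assumptions made here for the example.

```python
# Hedged sketch: compute target positions for each portrait from a template layout.
# The template format (normalized slot centers) is an assumption for illustration.
import numpy as np


def mask_center(mask: np.ndarray) -> tuple[float, float]:
    """Center (row, col) of the person's bounding box in the mask."""
    rows, cols = np.nonzero(mask)
    return (rows.min() + rows.max()) / 2.0, (cols.min() + cols.max()) / 2.0


def target_offsets(masks: list[np.ndarray], slots: list[tuple[float, float]],
                   canvas_hw: tuple[int, int]) -> list[tuple[int, int]]:
    """For each portrait mask, the (dy, dx) shift that moves it onto its template slot."""
    h, w = canvas_hw
    offsets = []
    for mask, (sy, sx) in zip(masks, slots):
        cy, cx = mask_center(mask)
        offsets.append((int(round(sy * h - cy)), int(round(sx * w - cx))))
    return offsets


# Usage example: two people, template places them at 1/3 and 2/3 of the width.
if __name__ == "__main__":
    m1 = np.zeros((100, 200), bool); m1[40:90, 20:60] = True
    m2 = np.zeros((100, 200), bool); m2[30:80, 150:190] = True
    print(target_offsets([m1, m2], slots=[(0.6, 0.33), (0.6, 0.66)], canvas_hw=(100, 200)))
```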
Optionally, fusing the depth information corresponding to the background image, the first portrait, and the second portrait includes:
fusing the depth information corresponding to the background image, the first portrait, and the second portrait based on the target poses and/or target positions of the first portrait and the second portrait.
Optionally, before receiving the first captured image from the first device and the second captured image from the at least one second device, the method further includes:
receiving a third depth image sent by the first device and a fourth depth image sent by the at least one second device;
fusing the third depth image and the fourth depth image to obtain a pre-shot image;
and sending the pre-shot image to the first device and the at least one second device.
It can be understood that the server generates the pre-shot image from the preview images. If the group-photo initiator is not satisfied with the pre-shot image displayed on the first device, the initiator can adjust shooting posture, position, expression, clothing, and so on before triggering the capture operation, so that a group photo the participants are more satisfied with is obtained.
Optionally, the background image of the group photo is the background of the first captured image or the background of the second captured image.
Optionally, before fusing the depth information corresponding to the background image, the first portrait, and the second portrait to obtain the group photo, the method further includes:
receiving a background switching request;
and switching the background image according to the background switching request.
Optionally, the camera parameters carried in the first captured image and the second captured image are the same, where the camera parameters include at least one of sensitivity, brightness, color, white balance parameters, or exposure parameters.
Optionally, before sending the group photo to the first device and the at least one second device, the method further includes:
beautifying the group photo, where the beautification includes at least one of skin smoothing, face slimming, blemish removal, eye enlargement, makeup, reshaping, whitening, wrinkle removal, hair styling, and dark-circle removal.
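Purely as an illustration of one such beautification step, the sketch below applies edge-preserving smoothing to skin regions with OpenCV's bilateral filter. The choice of filter, the skin mask, and the blend factor are assumptions for the example and are not prescribed by the application.

```python
# Hedged example of a single beautification step (skin smoothing), not the
# application's specified pipeline. Requires opencv-python and numpy.
import cv2
import numpy as np


def smooth_skin(bgr: np.ndarray, skin_mask: np.ndarray, strength: float = 0.6) -> np.ndarray:
    """Blend a bilateral-filtered version of the image into the masked skin area."""
    filtered = cv2.bilateralFilter(bgr, 9, 75, 75)
    mask3 = (skin_mask.astype(np.float32) * strength)[..., None]
    out = bgr.astype(np.float32) * (1.0 - mask3) + filtered.astype(np.float32) * mask3
    return out.astype(np.uint8)


# Usage example with a dummy image and a mask covering its central region
if __name__ == "__main__":
    img = np.random.randint(0, 256, (240, 320, 3), np.uint8)
    mask = np.zeros((240, 320), np.float32)
    mask[60:180, 80:240] = 1.0
    print(smooth_skin(img, mask).shape)
```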
In a third aspect, the present application provides an electronic device including: at least one depth camera and a display screen; one or more processors; and a memory. The memory stores one or more computer programs, the one or more computer programs including instructions that, when executed by the electronic device, cause the electronic device to perform the group photographing method of any one of the first aspect above.
In a fourth aspect, the present application provides a server including: one or more processors; and a memory. The memory stores one or more computer programs, the one or more computer programs including instructions that, when executed by the processor, cause the server to perform the group photographing method of any one of the second aspect above.
In a fifth aspect, the present application provides a computer-readable storage medium having instructions stored therein that, when run on an electronic device, cause the electronic device to perform the group photographing method of any one of the first aspect, and when run on a server, cause the server to perform the group photographing method of any one of the second aspect.
In a sixth aspect, the application provides a computer program product including computer instructions that, when run on an electronic device, cause the electronic device to perform the group photographing method of any one of the first aspect, and when run on a server, cause the server to perform the group photographing method of any one of the second aspect.
It can be understood that the electronic device of the third aspect, the server of the fourth aspect, the computer storage medium of the fifth aspect, and the computer program product of the sixth aspect are all configured to perform the corresponding methods provided above; therefore, for the advantageous effects they achieve, reference may be made to the advantageous effects of the corresponding methods, which are not repeated here.
Drawings
FIG. 1 is an exemplary diagram of remote photographing in the related art;
FIG. 2 is a schematic diagram of an application scenario of a group photographing method provided by an embodiment of the present application;
FIG. 3 is a schematic structural diagram of a server provided by an embodiment of the present application;
FIG. 4 is a schematic structural diagram of an electronic device provided by an embodiment of the present application;
FIG. 5 is a system architecture diagram provided by an embodiment of the present application;
FIG. 6 is a schematic diagram of an identity authentication and group-photo connection establishment process provided by an embodiment of the present application;
FIG. 7 is an exemplary diagram of logging in to a group-photo application provided by an embodiment of the present application;
FIG. 8 is an exemplary diagram of establishing a group-photo connection provided by an embodiment of the present application;
FIG. 9 is an exemplary diagram of a group-photo prompt provided by an embodiment of the present application;
FIG. 10 is a schematic flowchart of a group photographing method provided by an embodiment of the present application;
FIG. 11 is an exemplary diagram of switching the group-photo background provided by an embodiment of the present application;
FIG. 12 is an exemplary diagram of another way of switching the group-photo background provided by an embodiment of the present application;
FIG. 13 is an exemplary diagram of yet another way of switching the group-photo background provided by an embodiment of the present application;
FIG. 14 is an exemplary diagram of selecting a group-photo template provided by an embodiment of the present application;
FIG. 15 is an exemplary diagram of displaying a group photo provided by an embodiment of the present application;
FIG. 16 is an exemplary diagram of another way of displaying a group photo provided by an embodiment of the present application;
FIG. 17 is a schematic flowchart of another group photographing method provided by an embodiment of the present application;
FIG. 18 is an exemplary diagram of a group photo provided by an embodiment of the present application;
FIG. 19 is an exemplary diagram of another group photo provided by an embodiment of the present application;
FIG. 20 is an exemplary diagram of a group video provided by an embodiment of the present application;
FIG. 21 is an exemplary diagram of switching cameras provided by an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described below with reference to the accompanying drawings of the embodiments. In the description of the embodiments of the present application, unless otherwise indicated, "/" means "or"; for example, A/B may represent A or B. "And/or" herein merely describes an association relationship between associated objects and indicates that three relationships may exist; for example, A and/or B may mean: A exists alone, both A and B exist, or B exists alone.
The terms "first" and "second" are used below for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defining "a first" or "a second" may explicitly or implicitly include one or more such feature. In the description of the embodiments of the present application, unless otherwise indicated, the meaning of "a plurality" is two or more.
In embodiments of the application, words such as "exemplary" or "such as" are used to mean serving as an example, instance, or illustration. Any embodiment or design described herein as "exemplary" or "e.g." in an embodiment should not be taken as preferred or advantageous over other embodiments or designs. Rather, the use of words such as "exemplary" or "such as" is intended to present related concepts in a concrete fashion.
The electronic device provided by the embodiments of the application is equipped with one or more cameras and is installed with an application program having a group-photographing function (referred to as the group-photo application for short), so that functions such as remotely taking photos, shooting videos, or live streaming together with other users can be realized.
In one possible scenario, when multiple users at different geographic locations use the group-photo application on their respective electronic devices to take a group photo, each electronic device detects a trigger operation (such as a click or touch operation) of its user on the shooting control and, in response, performs a capture operation to collect one frame of image, and then sends the collected image to the server. The server synthesizes the images collected by the multiple electronic devices into a group photo, and then sends the group photo to the electronic devices of the users so that each device displays it on the display interface of its group-photo application.
FIG. 1 is an exemplary diagram of remote photographing in the related art. FIG. 1 (a) shows user 1 capturing image 1 with mobile phone 1, and FIG. 1 (b) shows user 2 capturing image 2 with mobile phone 2. Mobile phone 1 and mobile phone 2 upload image 1 and image 2 to the server, and after receiving them, the server combines image 1 and image 2 into a composite photo, such as the one shown in FIG. 1 (c).
As can be seen from the composite photo in FIG. 1 (c), after receiving image 1 from mobile phone 1 and image 2 from mobile phone 2, the server simply splices the two images together. The resulting photo therefore has problems such as an unnatural look among the users and a single shooting angle, so the experience of taking a group photo with such an application is poor.
Therefore, an embodiment of the application provides a group photographing method: after the first device or the second device, each equipped with at least one depth camera, collects depth images, the depth images are depth-fused, and the posture, position, and so on of each participant are adjusted during fusion, so that the fused group photo is more realistic and natural.
FIG. 2 is a schematic diagram of an application scenario of the group photographing method provided by an embodiment of the present application. The scenario includes a first device 210, a second device 220, a third device 230, a fourth device 240, and a server 250. The first device 210 through the fourth device 240 can each communicate with the server 250. The owner of the first device 210 is user 1, the owner of the second device 220 is user 2, the owner of the third device 230 is user 3, and the owner of the fourth device 240 is user 4. The first device 210 through the fourth device 240 each have a group-photographing function: for example, each installs a group-photo application with a photographing function, or each may load an applet through an already installed application to call the relevant group-photo components, or each may implement remote group photographing through a web application of an application that has the group-photographing function. The group-photo application may be a system application of the electronic device or a third-party application downloaded from a third-party application market, which is not specifically limited here.
Any one of the first device 210 through the fourth device 240 may act as the initiator of the remote group photo. For example, if the first device 210 initiates the remote group photo, the first device 210 determines, in response to an operation of user 1, the other users who will take the remote group photo with user 1. For example, user 1 may choose to take a group photo with user 2 and user 3. The first device 210 sends a group-photo request to the second device 220 and the third device 230; after receiving the request, the second device 220 and the third device 230 determine to take the remote group photo with the first device 210, and the first device 210, the second device 220, and the third device 230 each respond to their user's shooting operation and collect the corresponding group-photo image data.
It should be noted that the embodiment of the present application does not limit the specific form of the group-photo image data: if the group-photo initiator initiates a photo request, the group-photo image data is a photo; if the initiator initiates a short-video request, the group-photo image data is a short video; and if the initiator initiates a live-streaming request, the group-photo image data is a live stream.
Further, each of the first device 210 through the fourth device 240 is provided with at least one of a time-of-flight (TOF) depth camera, a structured-light camera, or a binocular camera. The images acquired by the first device 210 through the fourth device 240 via at least one of a TOF camera, a structured-light camera, or a binocular camera are depth images.
A depth image is also called a range image, and refers to an image in which the distance from an image collector to each point in a scene is taken as a pixel value.
The server 250 has image processing functions. For example, the server 250 may perform image segmentation on the received images sent by the first device 210 through the fourth device 240 to separate each user from the background image; the server 250 may also estimate the poses and the group layout of users 1 to 4 based on the position information and pose information of each user; and the server 250 may fuse the foreground image of each user with the selected background image to obtain a composite image.
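A depth camera makes the foreground/background split above particularly simple. As a hedged sketch (one possible approach, not the application's prescribed segmentation algorithm), a person standing clearly in front of the scene can be separated with a depth threshold:

```python
# Illustrative depth-based foreground segmentation; the margin threshold is an assumption.
import numpy as np


def segment_foreground(depth: np.ndarray, margin: float = 0.5) -> np.ndarray:
    """Return a boolean mask of pixels assumed to belong to the nearest subject.

    Pixels within `margin` meters of the closest valid depth are treated as foreground.
    """
    valid = depth > 0  # zeros often mean "no depth measurement"
    nearest = depth[valid].min()
    mask = valid & (depth <= nearest + margin)
    return mask


# Usage example: a toy depth map with a "person" at 1.5 m and a wall at 4 m.
if __name__ == "__main__":
    depth = np.full((120, 160), 4.0, np.float32)
    depth[30:110, 60:100] = 1.5
    mask = segment_foreground(depth)
    print("foreground pixels:", int(mask.sum()))
```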
Taking FIG. 2 as an example, with user 1 as the group-photo initiator and users 2 to 4 as the group-photo invitees: after users 1 to 4 upload their preview images to the server, the server can extract the background and foreground of each preview image to obtain the background image and the portraits of users 1 to 4. The server determines the layout of users 1 to 4 in the group-photo template in response to the user's template selection operation. The server can also determine the camera parameters of the second, third, and fourth devices according to the camera parameters of the first device, so as to ensure that the camera parameters of the second through fourth devices are consistent with those of the first device and that the shooting style of every participant in the group photo stays uniform.
After responding to the shooting requests of users 1 to 4, the first through fourth devices upload the captured images 1 to 4 to the server. After receiving images 1 to 4, the server performs pose estimation on the participants in images 1 to 4 and then performs image fusion to obtain the group photo.
Note that the number and types of electronic devices in FIG. 2 are only examples; the embodiment of the present application does not limit the number of users taking the group photo or the number and types of electronic devices. For example, when three users take a remote group photo, all three may use mobile phones, or all three may use phone watches, and so on.
Fig. 3 is a schematic structural diagram of a server according to an embodiment of the present application, where the server may be an independent physical server, may be a server cluster or a distributed system formed by a plurality of physical servers, or may be a cloud server that provides cloud computing services. The application is not limited in this regard. The server will be specifically described below.
It should be understood that the structure illustrated in the embodiments of the present application does not constitute a specific limitation on the server. In other embodiments, the server may include more or fewer components than in FIG. 3, or certain components may be combined, or certain components may be split, or a different arrangement of components may be provided. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.
As shown in FIG. 3, the server may include a processor 310, a memory 320, and a communication module 330. The processor 310 may be used to read and execute computer-readable instructions. Specifically, the processor 310 may include a controller, an arithmetic unit, and registers. The controller is mainly responsible for instruction decoding and for sending out control signals for the operations corresponding to the instructions. The arithmetic unit is mainly responsible for holding register operands, intermediate operation results, and the like that are temporarily stored during instruction execution. The registers are high-speed storage components of limited capacity that can be used to temporarily store instructions, data, and addresses.
The processor 310 may further include an image processing module 311. The image processing module 311 may be used to perform foreground segmentation, pose estimation, image fusion, and the like on the received depth images.
In a specific implementation, the hardware architecture of the processor 310 may be an application-specific integrated circuit (application specific integrated circuit, ASIC) architecture, a microprocessor without interlocked pipeline stages (MIPS) architecture, an advanced RISC machines (ARM) architecture, a network processor (NP) architecture, or the like.
The memory 320 is coupled to the processor 310 and is configured to store various software programs and/or multiple sets of instructions. In the embodiment of the application, the group photographing method may be integrated into a processor of the server, or may be stored in the memory of the server in the form of program code, and the processor of the server invokes the code stored in the memory to execute the group photographing method. In a specific implementation, the memory 320 may include a high-speed random access memory, and may also include a non-volatile memory, such as one or more magnetic disk storage devices, flash memory devices, or other non-volatile solid-state storage devices. The memory 320 may store an operating system, such as an embedded operating system like uos, VxWorks, or RTLinux.
The communication module 330 may be used to establish a communication connection between a server and other communication terminals (e.g., devices in fig. 2) through a network, and to transmit and receive data through the network. For example, in the case of networking of electronic devices, the server establishes a connection with the electronic device through the communication module 330 to facilitate transmission of subsequent images. For example, the server may receive an image from the electronic device that was reported when the electronic device acquired the image.
It should be understood that the configuration illustrated in this embodiment does not constitute a specific limitation on the server. In other embodiments, the server may include more or fewer components than shown, or may combine certain components, or split certain components, or a different arrangement of components. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.
The electronic device may be a mobile phone, a tablet computer, a personal computer (personal computer, PC), a personal digital assistant (personal digital assistant, PDA), a smart watch, a netbook, a wearable electronic device, an augmented reality (augmented reality, AR) device, a Virtual Reality (VR) device, a vehicle-mounted device, a smart car, or other electronic devices with remote shooting function, which is not limited in the embodiments of the present application.
In addition, the electronic device is provided with a three-dimensional camera for acquiring depth image information. For example, the electronic device may be provided with a time-of-flight (TOF) camera, a binocular vision camera, or the like. The electronic device is also installed with an application program having a group-photographing function (the group-photo application for short), through which functions such as remotely taking photos, shooting videos, or live streaming with other users can be realized.
Fig. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present application, as shown in fig. 4.
The electronic device 400 may include a processor 110, an external memory interface 120, an internal memory 121, a universal serial bus (universal serial bus, USB) interface 130, a charge management module 140, a power management module 141, a battery 142, an antenna 1, an antenna 2, a mobile communication module 150, a wireless communication module 160, an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, an earphone interface 170D, a sensor module 180, keys 190, a motor 191, an indicator 192, a camera 193, a display 194, and a subscriber identity module (subscriber identification module, SIM) card interface 195, etc. The sensor module 180 may include a pressure sensor, a gyro sensor, a barometric sensor, and the like.
It should be understood that the illustrated structure of the embodiment of the present application does not constitute a specific limitation on the electronic device 400. In other embodiments of the application, electronic device 400 may include more or fewer components than shown, or may combine certain components, or split certain components, or a different arrangement of components. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.
The processor 110 may include one or more processing units, such as: the processor 110 may include an application processor (application processor, AP), a modem processor, a graphics processor (graphics processing unit, GPU), an image signal processor (image signal processor, ISP), a controller, a memory, a video codec, a digital signal processor (digital signal processor, DSP), a baseband processor, and/or a neural network processor (neural-network processing unit, NPU), etc. Wherein the different processing units may be separate devices or may be integrated in one or more processors.
The controller may be a neural hub and a command center of the electronic device 400, among others. The controller can generate operation control signals according to the instruction operation codes and the time sequence signals to finish the control of instruction fetching and instruction execution.
A memory may also be provided in the processor 110 for storing instructions and data. In some embodiments, the memory in the processor 110 is a cache memory. The memory may hold instructions or data that the processor 110 has just used or recycled. If the processor 110 needs to reuse the instruction or data, it may be called directly from memory. Repeated accesses are avoided and the latency of the processor 110 is reduced, thereby improving the efficiency of the system.
In some embodiments, the processor 110 may include one or more interfaces. The interfaces may include an integrated circuit (inter-integrated circuit, I2C) interface, an integrated circuit built-in audio (inter-integrated circuit sound, I2S) interface, a pulse code modulation (pulse code modulation, PCM) interface, a universal asynchronous receiver transmitter (universal asynchronous receiver/transmitter, UART) interface, a mobile industry processor interface (mobile industry processor interface, MIPI), a general-purpose input/output (GPIO) interface, a subscriber identity module (subscriber identity module, SIM) interface, and/or a universal serial bus (universal serial bus, USB) interface, among others.
It should be understood that the interfacing relationship between the modules illustrated in the embodiments of the present application is only illustrative, and is not meant to limit the structure of the electronic device 400. In other embodiments of the present application, the electronic device 400 may also employ different interfaces in the above embodiments, or a combination of interfaces.
The charge management module 140 is configured to receive a charge input from a charger. The charger can be a wireless charger or a wired charger.
The power management module 141 is used for connecting the battery 142, and the charge management module 140 and the processor 110. The power management module 141 receives input from the battery 142 and/or the charge management module 140 and provides power to the processor 110, the internal memory 121, the external memory, the display 194, the camera 193, the wireless communication module 160, and the like.
The wireless communication function of the electronic device 400 may be implemented by the antenna 1, the antenna 2, the mobile communication module 150, the wireless communication module 160, a modem processor, a baseband processor, and the like.
The antennas 1 and 2 are used for transmitting and receiving electromagnetic wave signals. Each antenna in electronic device 400 may be used to cover a single or multiple communication bands. Different antennas may also be multiplexed to improve the utilization of the antennas. For example: the antenna 1 may be multiplexed into a diversity antenna of a wireless local area network. In other embodiments, the antenna may be used in conjunction with a tuning switch.
The mobile communication module 150 may provide a solution for wireless communication including 2G/3G/4G/5G, etc., applied to the electronic device 400. The mobile communication module 150 may include at least one filter, switch, power amplifier, low noise amplifier (low noise amplifier, LNA), etc. The mobile communication module 150 may receive electromagnetic waves from the antenna 1, perform processes such as filtering, amplifying, and the like on the received electromagnetic waves, and transmit the processed electromagnetic waves to the modem processor for demodulation. The mobile communication module 150 can amplify the signal modulated by the modem processor, and convert the signal into electromagnetic waves through the antenna 1 to radiate. In some embodiments, at least some of the functional modules of the mobile communication module 150 may be disposed in the processor 110. In some embodiments, at least some of the functional modules of the mobile communication module 150 may be provided in the same device as at least some of the modules of the processor 110.
The wireless communication module 160 may provide solutions for wireless communication including wireless local area network (wireless local area networks, WLAN) (e.g., wireless fidelity (wireless fidelity, wi-Fi) network), bluetooth (BT), global navigation satellite system (global navigation satellite system, GNSS), frequency modulation (frequency modulation, FM), near field wireless communication technology (near field communication, NFC), infrared technology (IR), etc., as applied to the electronic device 400. The wireless communication module 160 may be one or more devices that integrate at least one communication processing module. The wireless communication module 160 receives electromagnetic waves via the antenna 2, modulates the electromagnetic wave signals, filters the electromagnetic wave signals, and transmits the processed signals to the processor 110. The wireless communication module 160 may also receive a signal to be transmitted from the processor 110, frequency modulate it, amplify it, and convert it to electromagnetic waves for radiation via the antenna 2.
In some embodiments, antenna 1 and mobile communication module 150 of electronic device 400 are coupled, and antenna 2 and wireless communication module 160 are coupled, such that electronic device 400 may communicate with a network and other devices via wireless communication techniques. Wireless communication techniques may include global system for mobile communications (global system for mobile communications, GSM), general packet radio service (general packet radio service, GPRS), code division multiple access (code division multiple access, CDMA), wideband code division multiple access (wideband code division multiple access, WCDMA), time division code division multiple access (time-division code division multiple access, TD-SCDMA), long term evolution (long term evolution, LTE), BT, GNSS, WLAN, NFC, FM, and/or IR techniques, among others. The GNSS may include a global satellite positioning system (global positioning system, GPS), a global navigation satellite system (global navigation satellite system, GLONASS), a beidou satellite navigation system (beidou navigation satellite system, BDS), a quasi zenith satellite system (quasi-zenith satellite system, QZSS) and/or a satellite based augmentation system (satellite based augmentation systems, SBAS).
The electronic device 400 implements display functions through a GPU, a display screen 194, an application processor, and the like. The GPU is a microprocessor for image processing, and is connected to the display 194 and the application processor. The GPU is used to perform mathematical and geometric calculations for graphics rendering. Processor 110 may include one or more GPUs that execute program instructions to generate or change display information.
The display screen 194 is used to display images, videos, and the like.
The electronic device 400 may implement photographing functions through an ISP, a camera 193, a video codec, a GPU, a display screen 194, an application processor, and the like.
The ISP is used to process data fed back by the camera 193. For example, when photographing, the shutter is opened, light is transmitted to the camera photosensitive element through the lens, the optical signal is converted into an electrical signal, and the camera photosensitive element transmits the electrical signal to the ISP for processing, so that the electrical signal is converted into an image visible to naked eyes. ISP can also optimize the noise, brightness and skin color of the image. The ISP can also optimize parameters such as exposure, color temperature and the like of a shooting scene. In some embodiments, the ISP may be provided in the camera 193.
The camera 193 is used to capture still images or video. The object generates an optical image through the lens and projects the optical image onto the photosensitive element. The photosensitive element may be a charge coupled device (charge coupled device, CCD) or a Complementary Metal Oxide Semiconductor (CMOS) phototransistor. The photosensitive element converts the optical signal into an electrical signal, which is then transferred to the ISP to be converted into a digital image signal. The ISP outputs the digital image signal to the DSP for processing. The DSP converts the digital image signal into an image signal in a standard RGB, YUV, or the like format. In some embodiments, electronic device 400 may include 1 or N cameras 193, N being a positive integer greater than 1.
The digital signal processor is used for processing digital signals, and can process other digital signals besides digital image signals. For example, when the electronic device 400 is selecting a frequency bin, the digital signal processor is used to fourier transform the frequency bin energy, or the like.
Video codecs are used to compress or decompress digital video. The electronic device 400 may support one or more video codecs. Thus, the electronic device 400 may play or record video in a variety of encoding formats, such as: dynamic picture experts group (moving picture experts group, MPEG) 1, MPEG2, MPEG3, MPEG4, etc.
The NPU is a neural-network (NN) computing processor, and can rapidly process input information by referencing a biological neural network structure, for example, referencing a transmission mode between human brain neurons, and can also continuously perform self-learning. Applications such as intelligent awareness of the electronic device 400 may be implemented by the NPU, for example: image recognition, face recognition, speech recognition, text understanding, etc.
The external memory interface 120 may be used to connect an external memory card, such as a Micro SD card, to enable expansion of the memory capabilities of the electronic device 400. The external memory card communicates with the processor 110 through an external memory interface 120 to implement data storage functions. For example, files such as music, video, etc. are stored in an external memory card.
The internal memory 121 may be used to store computer-executable program code that includes instructions. The processor 110 executes various functional applications of the electronic device 400 and data processing by executing instructions stored in the internal memory 121. The internal memory 121 may include a storage program area and a storage data area. The storage program area may store an application program (such as a sound playing function, an image playing function, etc.) required for at least one function of the operating system, etc. The storage data area may store data created during use of the electronic device 400 (e.g., audio data, phonebook, etc.), and so on. In addition, the internal memory 121 may include a high-speed random access memory, and may further include a nonvolatile memory such as at least one magnetic disk storage device, a flash memory device, a universal flash memory (universal flash storage, UFS), and the like.
The electronic device 400 may implement audio functions through the audio module 170, speaker 170A, receiver 170B, microphone 170C, headphone interface 170D, and application processor, etc. Such as music playing, recording, etc.
The SIM card interface 195 is used to connect a SIM card. The SIM card may be inserted into the SIM card interface 195, or removed from the SIM card interface 195 to enable contact and separation with the electronic device 400. The electronic device 400 may support 1 or N SIM card interfaces, N being a positive integer greater than 1. The SIM card interface 195 may support Nano SIM cards, micro SIM cards, and the like. The same SIM card interface 195 may be used to insert multiple cards simultaneously. The types of the plurality of cards may be the same or different. The SIM card interface 195 may also be compatible with different types of SIM cards. The SIM card interface 195 may also be compatible with external memory cards. The electronic device 400 interacts with the network through the SIM card to realize functions such as communication and data communication. In some embodiments, electronic device 400 employs esims, namely: an embedded SIM card. The eSIM card can be embedded in the electronic device 400 and cannot be separated from the electronic device 400.
The software system of the electronic device may employ a layered architecture, an event driven architecture, a microkernel architecture, a microservice architecture, or a cloud architecture. The embodiment of the application takes a layered architecture as an example, and illustrates the software structure of the electronic equipment.
Fig. 5 is a system architecture diagram according to an embodiment of the present application.
It will be appreciated that the layered architecture divides the software into several layers, each with a clear role and division of labor, and the layers communicate with each other through software interfaces. The software system of the first device and the second device may include an application layer, an application framework layer (not shown in FIG. 5), a system library, and a kernel layer.
As shown in FIG. 5, the application layer includes the group-photo application.
The group-photo application is used to respond to operations of a participant and implement the remote group-photographing function.
The system library includes an image acquisition module and an image transmission module.
The image acquisition module is used to obtain the images captured by the camera. The image transmission module is used to upload the images captured by the camera to the server.
The kernel layer includes a camera driver and a display driver.
The camera driver is used to drive the camera to capture images in response to the user's trigger operation. The display driver is used to drive the display screen to display the preview image or the group photo.
The server includes an image processing module, a pose estimation module, and a parameter acquisition module.
The image processing module is used to extract the backgrounds of the images uploaded by the first device and the second device and to fuse a background image with the users' portraits to obtain the group photo.
The pose estimation module is used to estimate the pose, position, or angle of each participant.
The parameter acquisition module is used to obtain the camera parameters of the device that initiates the group photo and send them to the invited devices.
For ease of understanding, the following embodiments of the present application take an electronic device having the structure shown in FIG. 4 as an example, with the group-photo initiator's electronic device as the first device and a group-photo invitee's electronic device as the second device, to describe the group-photographing process.
The first device and the second device may be of the same type or of different types. For example, the first device and the second device may both be mobile phones, or the first device may be a mobile phone and the second device a smart watch, and so on.
In the embodiment of the application, before at least two users take a remote group photo, both the group-photo initiator and the group-photo invitee need to log in to the group-photo application and establish a group-photo connection.
FIG. 6 is a schematic flowchart of identity authentication and group-photo connection establishment provided by an embodiment of the present application. As shown in FIG. 6, the process may include steps 601 to 613.
S601, the first device responds to login operation of a user and sends login information to a server.
By way of example, FIG. 7 (a) shows a graphical user interface (GUI) of a mobile phone, namely the desktop 701 of the phone. When the phone detects that the user has clicked the icon 702 of the group-photo application on the desktop 701, it may launch the group-photo application and display another GUI, which may be referred to as the login interface 703, as shown in FIG. 7 (b). The login interface 703 displays "user name" and "password" entry fields, in which the user can enter previously registered login information (for example, a user name and login password).
It should be noted that implementing the remote group-photographing function by logging in to the group-photo application already installed on the phone, as shown in FIG. 7, is only an example. The phone may also call the relevant group-photo components through an applet loaded by an installed application, or implement the remote group-photographing function through a web application of an application that has the group-photographing function.
In the embodiment of the application, after detecting the user's trigger operation on the login control 704, the phone sends the user's login information to the server in response to the trigger operation.
S602, the server performs identity verification according to the login information.
In the embodiment of the application, after receiving the login information sent by the first device, the server matches the login information with the login information stored in the server in advance, so as to determine an identity authentication result according to the matching result.
If the login information sent by the first device and received by the server is matched with the login information stored in advance, the server determines that the identity authentication result is authentication success. If the login information sent by the first device and received by the server is not matched with the login information stored in advance, the server determines that the identity authentication result is authentication failure.
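The application only states that the server matches the received login information against stored login information. The sketch below is one conventional, illustrative way such a check could be implemented (salted password hashing with a constant-time comparison); the storage format and function names are assumptions, not part of the application.

```python
# Hedged illustration of server-side credential matching; the application does not
# prescribe this scheme. Uses only the Python standard library.
import hashlib
import hmac
import os

# Hypothetical in-memory store: username -> (salt, sha256(salt + password))
_USER_STORE: dict[str, tuple[bytes, bytes]] = {}


def register(username: str, password: str) -> None:
    salt = os.urandom(16)
    digest = hashlib.sha256(salt + password.encode()).digest()
    _USER_STORE[username] = (salt, digest)


def authenticate(username: str, password: str) -> bool:
    """Return True if the submitted login information matches the stored record."""
    record = _USER_STORE.get(username)
    if record is None:
        return False
    salt, stored_digest = record
    candidate = hashlib.sha256(salt + password.encode()).digest()
    return hmac.compare_digest(candidate, stored_digest)


# Usage example
if __name__ == "__main__":
    register("user_a", "s3cret")
    print(authenticate("user_a", "s3cret"), authenticate("user_a", "wrong"))
```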
S603, the server returns an identity authentication result to the first device.
S604, the first device judges whether the login is successful according to the identity authentication result.
S605, the first device prompts the user that login is successful.
S606, the first device prompts the user that login fails.
For example, when the identity authentication result returned by the server to the first device is that the authentication is successful, a prompt message may be displayed in the interface of the first device to prompt the user that the login is successful. As shown in fig. 7 (c), a prompt message "login successful" is displayed in the mobile phone interface.
When the authentication result returned by the server to the first device is authentication failure, prompt information can be displayed in the interface of the first device so as to prompt the user that login fails. As shown in fig. 7 (d), a prompt message "login failed, please re-login" is displayed on the mobile phone interface.
It should be noted that the prompt information shown in (c) and (d) in fig. 7 is merely illustrative, and the first device may also prompt the user identity authentication result by sending out a voice prompt, or the first device may also prompt the user identity authentication result by displaying other prompt information, which is not limited herein.
It should be noted that the above login verification flow of S601 to S606 applies not only to the first device of the co-photographing initiator, but also to the second device of the co-photographing invitee, and even to other electronic devices on which the co-photographing application is installed. The login verification process for the second device of the invitee is therefore not described separately in the embodiment of the present application.
S607, the first device sends a co-photographing request to the server in response to a selection operation of the co-photographing initiator.
Here, the selection operation refers to the operation by which the co-photographing initiator selects a co-photographing invitee.
After the first device has successfully logged in to the co-photographing application, the first device may add co-photographing contacts in response to selection operations of the co-photographing initiator. For example, the first device may, in response to an operation of the co-photographing initiator, search for the registered user name of a contact to be added (for example, user A). After the user name is found, the first device sends an add-friend request to user A's second device through the server. After receiving the request, the second device sends an indication of consent to the first device through the server in response to user A's operation. After receiving the consent indication forwarded by the server, the first device successfully adds user A as a co-photographing friend, and user A is subsequently displayed in the co-photographing contact list of the first device.
Subsequently, when the co-photographing initiator selects user A in the co-photographing contact list, the first device sends a co-photographing request to the server in response to the selection operation.
The foregoing is described by taking the case where user A agrees to be added as a co-photographing friend as an example. Alternatively, the second device may send an indication of refusal to the first device through the server in response to user A's operation. After receiving the refusal indication forwarded by the server, the first device may display prompt information in its display interface to indicate that user A has rejected the friend request.
S608, the server sends the co-photographing request to the second device.
S609, the second device sends co-photographing response information to the server.
After receiving the co-photographing request sent by the first device, the server forwards it to the second device. Upon receiving the request, the second device may display prompt information on its display interface to ask whether the invitee agrees to the co-photographing.
Illustratively, after the co-photographing initiator taps the co-photographing control shown in Fig. 8 (a), the mobile phone may display a co-photographing contact list in response to the tap operation. Fig. 8 (b) shows the contact list of the co-photographing friends that the initiator has added.
After the co-photographing initiator selects user D as the co-photographing invitee, prompt information may be displayed on user D's device to ask whether user D agrees to the co-photographing. As shown in Fig. 8 (c), assuming that user D's device is a smart watch, after the smart watch receives the co-photographing request forwarded by the server, the display interface of the smart watch shows a prompt such as "*** invites you to a co-photographing. Do you agree?". If user D agrees, the smart watch enters the co-photographing interface in response to user D's confirmation operation.
It should be noted that, in the embodiment of the present application, the number of co-photographing users is not limited. For example, the co-photographing initiator may select only one user in the co-photographing contact list as the invitee, or may select a plurality of users as invitees at the same time. As shown in Fig. 8 (b), the initiator may select only user D as the co-photographing invitee; as shown in Fig. 8 (d), the initiator may also select both user C and user D as co-photographing invitees.
S610, the server returns the co-photographing response information to the first device.
S611, the first device determines, according to the co-photographing response information, whether the co-photographing invitee has accepted the co-photographing request.
S612, the first device prompts that the co-photographing invitation succeeded.
S613, the first device prompts that the co-photographing invitation failed.
In the embodiment of the present application, the second device sends the co-photographing response information to the server in response to the operation of the co-photographing invitee, and the server returns the response information to the first device after receiving it.
If the co-photographing response information received by the first device indicates that the invitee agrees to the co-photographing, the first device prompts that the co-photographing invitation succeeded; if the response information indicates that the invitee refuses, the first device prompts that the invitation failed.
For example, as shown in Fig. 8 (c), assuming that user D taps the "yes" control, the mobile phone may, in response to user D's operation, display the prompt shown in Fig. 9 (a): "User D agrees to the co-photographing; the co-photographing is initiated successfully." Assuming that user D taps the "no" control, the mobile phone may, in response to user D's operation, display the prompt shown in Fig. 9 (b): "User D refuses the co-photographing; the co-photographing initiation failed."
In one scenario, when the co-photographing response information received by the first device from the server indicates agreement, the interface of the first device may display a first preview interface; after the second device logs in to the co-photographing application, the second device displays a second preview interface.

While the first device and the second device display their preview interfaces, each device uploads its captured preview image to the server. After receiving the preview images captured by the first device and the second device, the server fuses them to obtain a fused co-photographing preview image.

Similarly, after the first device and the second device upload their captured images to the server, the server fuses the images, obtains the co-photographing image, and returns it to the first device and the second device, so that both devices obtain the co-photographing image. The co-photographing process is described in detail below with reference to Fig. 10, taking the first device as the device that initiates the co-photographing and the second device as the invited device. As shown in Fig. 10, the co-photographing method may include the following steps:
S1001, the first device transmits a first depth image to the server in response to a shooting preview operation of the photo initiator.
S1002, the second device transmits a second depth image to the server in response to a photographing preview operation of the live invitee.
The depth image is an image in which a distance (depth) from a camera to each point in a scene is set as a pixel value.
Here, the first device and the second device are devices at different geographical locations; for example, the first device is at location X and the second device is at location Y.
After the first device displays the first preview interface, the first device sends the first depth image displayed in the first preview interface to the server. And similarly, after the second device displays the second preview interface, the second device sends the second depth image displayed in the second preview interface to the server.
For example, after the first device displays the first preview interface, the first device acquires a first depth image by using a front camera, and sends the acquired first depth image to the server. And after the second device displays the second preview interface, the second device acquires a second depth image by adopting a rear camera and sends the acquired second depth image to the server.
In the embodiment of the application, in the process of displaying the preview interface by the first device and the second device, the preview interface of the second device can be displayed in the first device, and the preview interface of the first device can be displayed in the second device. I.e. the preview interface can be shared in real time between the first device and the second device.
In some embodiments, after the first device and the second device establish the remote connection, the background image in the preview interfaces of the first device and the second device may be the background image captured by the camera of the first device or the background image captured by the camera of the second device, so as to meet the shooting requirements of the co-photographing users.
Illustratively, as shown in (a) of fig. 11, the background image displayed in the first preview interface of the first device is a background image collected by the camera of the first device, and as shown in (b) of fig. 11, the background image displayed in the second preview interface of the second device is also a background image collected by the camera of the first device.
Further, the first device and the second device may each switch the background of the live image in response to a switching operation (one example of a third operation) by the user.
In the first case, the photo initiator may switch the background image of the photo image at the first preview interface of the first device. The first preview interface of the first device displays a self-shot image of the live initiator, and the first device can switch the background image to the background where any live invitee is currently located in response to the operation of the live initiator. When the first device responds to the operation of the shooting initiator and switches the background image in the first preview interface to the background image collected by the camera of the second device, the background image in the second preview interface of the second device is also switched to the background image collected by the camera of the second device.
Illustratively, as shown in (a) of fig. 12, a first preview interface of the close-up initiator is displayed in the first device, and the first device may display a close-up contact list in response to operation of the "background switch" control by the close-up initiator, as shown in (b) of fig. 12. The first device responds to the selection operation of the user D, and can switch the shooting background to the background image acquired by the camera of the second device corresponding to the user D. The background image in the first preview interface of the first device is switched from the background image shown in (a) in fig. 12 to the background image shown in (c) in fig. 12. The background image in the second preview interface of the second device is switched from the background image shown in (b) in fig. 11 to the background image shown in (d) in fig. 12.
In the second case, the live invitee may also switch the background image of the live image at the second preview interface of the second device. When the background image collected by the device of the live initiator is displayed in the second preview interface of the second device, the second device may switch the shooting background to the background image collected by the camera of the second device in response to the operation of the live invitee. Likewise, the first device also switches the background image in the first preview interface to the background image captured by the camera of the second device in response to operation of the live invitee.
Illustratively, as shown in fig. 13 (a), the second device has displayed therein a self-shot image of the live invitee and a background image captured by the camera of the first device, and the second device may switch the background image to the background image captured by the camera of the second device in response to operation of the "background switch" control by the live invitee, the interface after the background switch being as shown in fig. 13 (b). Likewise, in response to operation of the "background switch" control by the live invitee, the background image of the first device is switched from (c) in fig. 13 to the background image shown in (d) in fig. 13.
It should be noted that the above-mentioned method for switching the shooting background in fig. 12 and 13 is only described as an example, and the specific background switching method may be determined according to the actual shooting situation, which is not limited herein. In addition, the background images displayed by the preview interfaces of the first device and the second device are changed in real time and are not a fixed picture. The background image changes with a change in the position of the live user (live initiator or live invitee) or with a movement of the electronic device of the live user.
When capturing the co-photographing images, the first device and the second device may each use a front camera, a rear camera, a wide-angle camera, or the like. For example, the first device uses a front camera to capture images while the second device uses a wide-angle camera.
It should be noted that the cameras used for image acquisition in the first device and the second device may be TOF cameras, binocular vision cameras, or structured-light cameras. The embodiment of the present application does not limit the camera types in the first device and the second device; any camera capable of acquiring depth information of an image can satisfy the scheme of the present application.
S1003, the server performs background extraction on the first depth image to obtain a first background frame and a first main body image.
And S1004, performing foreground extraction on the second depth image by the server to obtain a second main image.
In the embodiment of the present application, after receiving the first depth image sent by the first device and the second depth image sent by the second device, the server performs background extraction on the first depth image to obtain a first background frame and a first subject image, and performs foreground extraction on the second depth image to obtain a second subject image.
Optionally, the server may extract the first depth image and the second depth image by using an image segmentation algorithm, so as to obtain a corresponding background frame and a corresponding main image. The image segmentation algorithm may be a semantic segmentation algorithm or an instance segmentation algorithm, among others.
U-Net is a common semantic segmentation algorithm with an encoder-decoder structure: it progressively reduces the image resolution while extracting features, and then progressively restores the resolution to perform segmentation prediction. U-Net shows good performance on semantic segmentation tasks. Mask R-CNN is an instance segmentation algorithm that, on the basis of object detection, further provides an accurate segmentation for each instance. Mask R-CNN combines a region proposal network with a fully convolutional network, so the foreground region of each user in the image can be extracted accurately.
Alternatively, the server may also input the first depth image into a pre-trained semantic segmentation model, which outputs the first background frame and the first subject image. The semantic segmentation model is trained on a large number of training samples and therefore has the capability of extracting background information and foreground information from an image. Likewise, the server may input the second depth image into a pre-trained instance segmentation model, which outputs the second subject image.
The method for processing the first depth image and the second depth image by the server is only described as an example, and other methods for extracting the foreground or the background of the depth image are also applicable to the embodiment of the present application, which is not described herein.
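As a concrete illustration of the background and foreground extraction in S1003 and S1004, the sketch below separates the person region of an RGB frame from its background using an off-the-shelf semantic segmentation model. The choice of torchvision's pretrained DeepLabV3 and the person class index are illustrative assumptions; the embodiment does not prescribe a particular model, and the same masking logic applies to the output of an instance segmentation model such as Mask R-CNN.

```python
import numpy as np
import torch
from torchvision import transforms
from torchvision.models.segmentation import (deeplabv3_resnet50,
                                              DeepLabV3_ResNet50_Weights)

PERSON_CLASS = 15  # "person" index in the VOC-style label set of these weights

model = deeplabv3_resnet50(weights=DeepLabV3_ResNet50_Weights.DEFAULT).eval()
preprocess = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

def split_subject_background(rgb: np.ndarray):
    """Return (subject_image, background_image, mask) for an HxWx3 uint8 frame."""
    with torch.no_grad():
        inp = preprocess(rgb).unsqueeze(0)       # 1x3xHxW normalized tensor
        logits = model(inp)["out"][0]            # CxHxW class scores
    mask = (logits.argmax(0) == PERSON_CLASS).numpy()
    subject = rgb * mask[..., None]              # keep only the person pixels
    background = rgb * (~mask)[..., None]        # keep only the background pixels
    return subject, background, mask
```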
S1005, the first device transmits a request for a shooting template in response to the operation of the shooting initiator.
In the embodiment of the present application, a plurality of the photographing templates are stored in advance in the server, and the first device may send a request for the photographing templates to the server in response to a photographing template selection operation (an example of a second operation) of the photographing initiator.
Illustratively, as shown in Fig. 14 (a), a "co-photographing template" control is displayed in the first preview interface of the first device, and the first device displays co-photographing poses in response to a trigger operation of this control by the co-photographing initiator, as shown in Fig. 14 (b). The first device sends a co-photographing template request to the server in response to operation of the "standing side by side" control by the initiator. The poses in the co-photographing templates may include hugging, holding hands, kissing, standing side by side, and so on.
S1006, the server acquires depth information of the first subject image and the second subject image.
The depth information refers to three-dimensional coordinate values of each pixel point in the image.
In an embodiment of the present application, the first depth image acquired by the camera of the first device includes depth information. The second depth image acquired by the camera of the second device also includes depth information. The server may acquire depth information of the first subject image and the second subject image.
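Since each pixel of the depth image stores a distance, the depth information described here can be expanded into three-dimensional coordinates once the camera intrinsics are known. The sketch below shows this back-projection under a simple pinhole camera model; the focal length and principal point values are placeholders that would come from the actual depth camera's calibration.

```python
import numpy as np

def depth_to_points(depth: np.ndarray, fx: float, fy: float,
                    cx: float, cy: float) -> np.ndarray:
    """Back-project an HxW depth map (metres) into an HxWx3 array of
    (X, Y, Z) camera-space coordinates using a pinhole model."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))   # pixel coordinates
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=-1)

# Placeholder intrinsics for illustration only.
points = depth_to_points(np.random.rand(480, 640).astype(np.float32),
                         fx=525.0, fy=525.0, cx=319.5, cy=239.5)
print(points.shape)  # (480, 640, 3)
```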
It should be noted that, in the embodiment of the present application, the execution order of step S1005 and step S1006 is not limited: step S1005 may be executed before step S1006, step S1006 may be executed before step S1005, or the two steps may be executed simultaneously.
S1007, the server performs gesture estimation on the user based on the snap template request and the depth information, and determines a gesture frame of each user.
In the embodiment of the application, after receiving the request of the photographing template sent by the first device, the server determines the target photographing template according to the request of the photographing template. Wherein, the predicted gesture of the user can be included in the time template.
In some embodiments, the server may input the first subject image and the second subject image into a trained pose estimation model, respectively, and determine pose estimation results of the respective users to be photographed based on the output of the pose estimation model. The gesture estimation model is trained by a large number of samples and has the capability of predicting the gesture of the user.
In other embodiments, the server may further directly input the first depth image and the second depth image into a trained pose estimation model, and determine pose estimation results of each user to be photographed in time according to output of the pose estimation model.
Common pose estimation algorithms include OpenPose and PoseNet. OpenPose is a widely used pose estimation algorithm that detects human key-point positions, including the head, arms, torso, and other parts, and estimates the spatial structure of the pose. OpenPose performs well both in multi-person pose estimation and against complex backgrounds.
PoseNet is a lightweight pose estimation algorithm suitable for mobile devices and real-time applications. It processes the input image with a convolutional neural network and outputs the coordinates of the key points. PoseNet can estimate the user's pose in real time, which suits the real-time requirement of remote co-photographing.
In the embodiment of the application, after determining the gesture estimation result of the user to be photographed, the server can determine the corresponding gesture frame according to the gesture estimation result of the user to be photographed.
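One straightforward way to derive the pose frame mentioned above from a pose estimation result is to take the bounding box of the predicted key points and expand it by a margin. The sketch below assumes the key points have already been produced by a pose estimator such as OpenPose or PoseNet; the (x, y, confidence) key-point format, the confidence threshold, and the margin value are illustrative assumptions.

```python
from dataclasses import dataclass

@dataclass
class PoseFrame:
    left: int
    top: int
    right: int
    bottom: int

def pose_frame_from_keypoints(keypoints, image_w, image_h, margin=0.15):
    """keypoints: iterable of (x, y, confidence) tuples from a pose estimator.
    Returns a PoseFrame that bounds the confident key points plus a margin."""
    xs = [x for x, y, c in keypoints if c > 0.3]
    ys = [y for x, y, c in keypoints if c > 0.3]
    if not xs:
        raise ValueError("no confident keypoints")
    w, h = max(xs) - min(xs), max(ys) - min(ys)
    return PoseFrame(
        left=max(0, int(min(xs) - margin * w)),
        top=max(0, int(min(ys) - margin * h)),
        right=min(image_w, int(max(xs) + margin * w)),
        bottom=min(image_h, int(max(ys) + margin * h)),
    )
```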
S1008, the server sends the gesture box of the snap shot initiator to the first device.
S1009, the server sends the gesture frame of the live invitee to the second device.
For example, the server determines the predicted pose of the snap user from the target snap template, then sends the pose frame of the snap initiator to the first device and sends the pose frame of the snap invitee to the second device. The dashed box shown in (c) in fig. 14 is the posture box of the live initiator, and the dashed box shown in (d) in fig. 14 is the posture box of the live invitee.
And after the first equipment receives the gesture frame of the shooting initiator sent by the server, the gesture frame is displayed in a preview interface of the first equipment. The photograph initiator may adjust the pose and/or position of the photograph according to the pose frame displayed in the first device.
Similarly, after receiving the gesture frame of the live invitee sent by the server, the second device displays the gesture frame in the preview interface of the second device. The live invitee may adjust the pose and/or position of the photograph based on the pose frame displayed in the second device.
In some embodiments, when the server sends the pose frame to the first device or the second device, the server may also send prompt information to the first device and the second device to prompt the user's photographing pose or photographing position. As shown in Fig. 14 (d), when the server sends the pose frame of the co-photographing invitee to the second device, it also sends a prompt message such as "stand to the side and make a fist"; after receiving the pose frame and the prompt message from the server, the invitee can adjust the photographing pose and photographing position accordingly.
It should be noted that, in fig. 14, the server sends prompt information to the second device as an exemplary description, and the server may also send voice prompt information to the first device or the second device to prompt the time initiator or the time invitee to adjust the photographing gesture and the photographing position.
S1010, the first device transmits a first captured image to the server in response to the capturing operation.
S1011, the second device transmits a second captured image to the server in response to the capturing operation.
In the embodiment of the application, after a photographing initiator determines a photographing gesture and a photographing position, the photographing initiator triggers a photographing control, and after a first photographed image is acquired by a first device in response to a triggering operation (an example of a first operation) of the photographing control by the photographing initiator, the first device sends the first photographed image to a server.
Likewise, after the photo taking invitee determines the photo taking pose and the photo taking position, the photo taking invitee triggers the photo taking control, and the second device responds to the triggering operation of the photo taking invitee on the photo taking control, and after the second photo taking image is acquired, the second device sends the second photo taking image to the server.
In some embodiments, to improve the beautification of the live image, the live initiator may start the intelligent optimization function before the live initiator triggers the shooting control, so that the first device collects the beautified first shot image. Illustratively, as shown in (c) in fig. 14, after determining the photographing gesture and the photographing position, the photographing initiator triggers the intelligent optimization control 1410 first and then triggers the photographing control to acquire the first beautified photographed image. Similarly, as shown in (d) of fig. 14, after determining the photographing posture and the photographing position, the photo invitee triggers the intelligent optimization control 1420 first and then triggers the photographing control to acquire the second beautified photographed image.
In the embodiment of the application, in order to ensure that the shooting styles of all users in the simultaneous shooting images are uniform, before the first equipment and the second equipment acquire the images, the simultaneous shooting users can adjust the camera parameters of the first equipment and the camera parameters of the second equipment to be consistent. Optionally, the server may determine the camera parameters of the first device according to the received preview stream of the first device, and determine the camera parameters of the second device according to the preview stream of the second device. The server judges whether the camera parameters of the first device are the same as those of the second device so as to determine whether to adjust the camera parameters of the first device or the second device.
In one case, assuming that the background image acquired by the camera of the first device is the background of the live image, if the server determines that the camera parameters of the first device are different from the camera parameters of the second device, the server may send the camera parameters of the first device to the second device. After the second device receives the camera parameters of the first device sent by the server, the second device responds to the adjustment operation of the invitee in the process of photographing, and adjusts the camera parameters of the second device according to the camera parameters of the first device so that the adjusted camera parameters of the second device are consistent with the camera parameters of the first device.
The camera parameters include, but are not limited to, sensitivity, brightness, color, white balance parameters, exposure parameters, and the like. Therefore, the imaging effect of the image acquired by the second equipment is consistent with the imaging effect of the image acquired by the first equipment by adjusting the camera parameters of the second equipment, so that the uniformity of the styles of all users in the shot images acquired by the first equipment and the second equipment is ensured.
For example, assume that the server determines, from the preview stream of the first device, that the sensitivity value of the camera parameter of the first device is 200, the luminance value is 100, and the server determines that the sensitivity value of the camera parameter of the second device is 180, and the luminance value is 120. The server may send the camera parameter of the first device to the second device, and after the second device receives the camera parameter of the first device, the second device adjusts the sensitivity value of the camera parameter of the second device from 180 to 200 and adjusts the brightness value from 120 to 100 in response to an adjustment operation of the user.
It should be noted that the foregoing is described by taking the example of adjusting the camera parameters of the second device according to the camera parameters of the first device. Assuming instead that the background image captured by the camera of the second device serves as the background of the co-photographing image, when the server determines that the camera parameters of the first device differ from those of the second device, the camera parameters of the first device can be adjusted according to the camera parameters of the second device, so that the adjusted camera parameters of the first device are consistent with those of the second device.
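The camera parameter alignment described above can be reduced to comparing the two devices' parameter sets and sending the reference device's values to the other side. A minimal sketch follows; the parameter names and the direction of adjustment (the second device following the first device) mirror the example above and are assumptions rather than a fixed protocol.

```python
REFERENCE_KEYS = ("iso", "brightness", "white_balance", "exposure")

def camera_param_diff(reference: dict, target: dict) -> dict:
    """Return the parameters the target device should change, keyed by name,
    so that its imaging style matches the reference device."""
    return {
        key: reference[key]
        for key in REFERENCE_KEYS
        if key in reference and target.get(key) != reference[key]
    }

# Example matching the sensitivity/brightness values in the text above.
first_device = {"iso": 200, "brightness": 100}
second_device = {"iso": 180, "brightness": 120}
print(camera_param_diff(first_device, second_device))
# {'iso': 200, 'brightness': 100} -> sent to the second device for adjustment
```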
S1012, the server performs image fusion on the first shooting image and the second shooting image to obtain a simultaneous shooting image.
In the embodiment of the application, after receiving a first shooting image sent by a first device and a second shooting image sent by a second device, a server performs image fusion on the first shooting image and the second shooting image to obtain a fused snap-in image.
In an embodiment, the server may perform foreground extraction on the first shot image and the second shot image to obtain a background image and two foreground images including a human image, and the server inputs the background image and the foreground image into a trained depth fusion network, and the depth fusion network outputs the fused live images.
The training process of the server on the deep fusion network is as follows:
(1) Training set preparation. The server collects pairs of co-photographing training images, each pair including foreground images and a corresponding background image. Meanwhile, the server generates a mask image corresponding to each foreground image to mark the foreground region.
(2) Construction of the depth fusion network. The server may adopt a convolutional neural network, a generative adversarial network (GAN), or a similar network structure as the depth fusion network. The depth fusion network receives the foreground images, the background image, and the mask images as input, and generates a fused co-photographing image as output.
(3) Training process. The server uses the foreground images, background images, and mask images in the training set as input and the real co-photographing images as targets to train the depth fusion network. During training, the network parameters are optimized to minimize the difference between the generated image and the real co-photographing image.
The above-mentioned server uses a trained depth fusion network to fuse the first captured image and the second captured image, which is only described as an example, and the server may also use other image fusion methods to fuse the first captured image and the second captured image, which is not limited herein.
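The training procedure in (1) to (3) can be sketched as a standard supervised loop in which a fusion network takes the concatenated foreground, background, and mask as input and is optimized to reproduce the ground-truth group photo. The tiny convolutional network and L1 reconstruction loss below are placeholders for whichever CNN or GAN generator and training objective the server actually uses.

```python
import torch
import torch.nn as nn

class TinyFusionNet(nn.Module):
    """Placeholder fusion network: 7 input channels
    (3 foreground + 3 background + 1 mask) -> 3-channel fused image."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(7, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 3, 3, padding=1), nn.Sigmoid(),
        )

    def forward(self, fg, bg, mask):
        return self.net(torch.cat([fg, bg, mask], dim=1))

def train_step(model, optimizer, fg, bg, mask, target):
    """One optimization step minimising the gap to the real group photo."""
    optimizer.zero_grad()
    fused = model(fg, bg, mask)
    loss = nn.functional.l1_loss(fused, target)
    loss.backward()
    optimizer.step()
    return loss.item()

model = TinyFusionNet()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
# Dummy batch standing in for one training-set sample.
fg = torch.rand(1, 3, 128, 128); bg = torch.rand(1, 3, 128, 128)
mask = torch.rand(1, 1, 128, 128); target = torch.rand(1, 3, 128, 128)
print(train_step(model, optimizer, fg, bg, mask, target))
```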
In another embodiment, when fusing the first captured image and the second captured image, the server may first decompose each image into a base part and detail content. The base parts are fused by weighted averaging. For the detail content, a deep learning network extracts multi-layer features, several fusion candidates are generated from these features using an L1-norm and weighted-average strategy, and the final fused detail content is obtained from the candidates with a maximum-selection strategy. Finally, the fused base part and the fused detail content are combined, and the image is reconstructed to obtain the co-photographing image.
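The base/detail decomposition just described can be illustrated in a simplified form. In the sketch below, a box filter stands in for the unspecified decomposition filter, per-pixel absolute detail values stand in for the deep features that weight the detail content, and the multi-candidate maximum-selection step is collapsed into a single soft activity weighting; it is a minimal illustration of the two-scale idea rather than the exact procedure.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def two_scale_fuse(img_a: np.ndarray, img_b: np.ndarray, radius: int = 15) -> np.ndarray:
    """Fuse two aligned grayscale images (float arrays in [0, 1])."""
    # 1. Decompose each image into a base part and detail content.
    base_a = uniform_filter(img_a, size=radius)
    base_b = uniform_filter(img_b, size=radius)
    detail_a, detail_b = img_a - base_a, img_b - base_b

    # 2. Fuse the base parts by weighted averaging.
    fused_base = 0.5 * base_a + 0.5 * base_b

    # 3. Fuse the detail content with L1-norm (absolute-activity) weights.
    act_a = uniform_filter(np.abs(detail_a), size=3)
    act_b = uniform_filter(np.abs(detail_b), size=3)
    weight_a = act_a / (act_a + act_b + 1e-8)
    fused_detail = weight_a * detail_a + (1.0 - weight_a) * detail_b

    # 4. Reconstruct the fused image from base plus detail.
    return np.clip(fused_base + fused_detail, 0.0, 1.0)
```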
When there are a plurality of co-photographing invitees, so that the server needs to fuse the first captured image with a plurality of second captured images, the server may first fuse the first captured image with any one of the second captured images, and then continue to fuse the result with the next second captured image, until all of the second captured images have been fused, thereby obtaining the fused co-photographing image.
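The pairwise fusion order for multiple invitees amounts to folding a fusion operation over the list of captured images, as in the sketch below; fuse_pair stands for whichever of the fusion methods above the server actually applies.

```python
from functools import reduce

def fuse_all(first_image, second_images, fuse_pair):
    """Fuse the initiator's image with each invitee's image in turn.
    fuse_pair(a, b) -> fused image of a and b."""
    return reduce(fuse_pair, second_images, first_image)
```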
In some embodiments, the server may further perform pose estimation on the co-shooting users before fusing the first shooting image and the second shooting image, so as to obtain pose estimation results of the co-shooting users. Wherein the gesture estimation result includes the gesture, the relative position, and the angle of the snap user. Then, the server adjusts the gesture, position or angle of each photographing user according to the gesture estimation result of each photographing user. Therefore, the gesture, the relative position or the angle of the shooting user are adjusted, so that the adjusted shooting user is more natural.
For example, assuming that the photographing user is two persons, the server may identify the photographing user to determine information such as gender, age, etc. of the photographing user, assuming that the server determines that the two photographing users are male and female respectively, the server may estimate that the gesture estimation results of the two photographing users are hand-in-hand.
Further exemplary, assuming that a pair of parents and children are remotely photographed, as shown in (a) of fig. 15, after the first device acquires the first photographed image of the parents, the first device uploads the first photographed image to the server. As shown in (c) of fig. 15, after the second device acquires the second captured image of the son, the second device uploads the second captured image to the server. After the server receives the first shooting image and the second shooting image, when the first shooting image and the second shooting image are fused, the posture and the angle of the son are adjusted by the server, so that the posture of the father and the son after adjustment is more natural. As in the first device in fig. 15 (b), the live images of the parent and child are shown, and as in the second device in fig. 15 (d), the live images of the parent and child are also shown.
As another example, assuming that there are four co-photographing users, the server recognizes that the poses of three of them are the "V-sign" gesture while the pose of the remaining user is not. The server may estimate the pose of that user and, according to the pose estimation result, adjust the user's pose to the "V-sign" as well.
In some embodiments, the server may beautify the fused captured images. For example, the server may perform beautifying treatments such as skin grinding, face thinning, acne removal, large eyes, make-up, shaping, whitening, wrinkle removal, hair dressing, black eye removal, and the like on a snap-shot user in a snap-shot image. Therefore, the taken image shot by the user in time is more attractive.
Optionally, after the server obtains the fused snap images, the server may identify the user information of the snap users, so as to beautify each snap user according to the identified user information. For example, the server identifies the users as the elderly, middle-aged men and infants, respectively, and when the server beautifies the users, the beautifying operations for the elderly, middle-aged men and infants are not the same. The server performs beautifying operations such as wrinkle removal, skin polishing and the like on the old, performs beautifying operations such as shaping, whitening and the like on the middle-aged male, and does not need to perform beautifying operations on the baby.
Optionally, the server may further perform a beautifying operation on the shooting user according to scene information of the shooting background image. For example, the server recognizes that the scene information is dusk, and the server can perform face lighting processing on all the shooting users, so that all the shooting users in the processed shooting images are fused with the background more naturally.
Optionally, the server may also perform three-dimensional reconstruction of a co-photographing user's body. For example, assuming that user A is missing a left leg, the server may reconstruct user A's left leg in the process of fusing the captured images, so that user A appears with both legs in the fused image. In this way, the co-photographing image can better satisfy the user's photographing needs.
Optionally, the server may further adjust the facial features of the users in the co-photographing image so that the adjusted features appear more natural. For example, assume that in captured image 1 acquired by user 1's device, user 1's eyes are closed; when the server fuses the images, it may adjust user 1's eyes so that, in the fused co-photographing image, the eyes are open normally. As another example, the server may adjust the expression of each co-photographing user in the image so that the adjusted expressions appear more natural.
S1013, the server returns the live image to the first device.
S1014, the server returns the snap shot image to the second device.
For example, as shown in (c) and (d) in fig. 14, after the first device sends the first shot image to the server and the second device sends the second shot image to the server, the server may fuse the foreground image and the background image of the first shot image and the second shot image to obtain a fused live image. The server may return the live images to the first device and the second device, respectively. After the first device receives the live image, the live image is displayed on the display interface of the first device, as shown in (a) of fig. 16. Likewise, after receiving the still image, the second device displays the still image on the display interface of the second device, as shown in (b) of fig. 16.
After receiving the taken images, the first device responds to the operation of the taken initiator, and can edit, share or save the taken images. Similarly, after receiving the live image, the second device may save, share, or edit the live image in response to an operation of the live invitee.
In an embodiment, before the first device responds to the save operation of the in-time initiator to save the in-time image, the first device may also respond to the edit operation of the in-time initiator to edit the in-time image, and after the edit is completed, respond to the save operation of the in-time initiator to save the edited in-time image.
Optionally, after the first device receives the still image sent by the server, the still initiator may perform an editing operation on the still image. The editing operation comprises size adjustment, position adjustment, beautification treatment and the like. Then, the first device saves the edited live image in response to a save operation of the live initiator.
For example, it is assumed that after the first device receives the snap-shot image, the first device may perform expression adjustment, body type adjustment, and the like on the snap-shot initiator in response to the operation of the snap-shot initiator, so as to obtain an adjusted snap-shot image, so that the adjusted snap-shot image more meets the requirement of the snap-shot initiator.
After the first device and the second device receive the co-photographing image sent by the server, both devices may edit it, or only the first device edits it while the second device does not, or only the second device edits it while the first device does not. That is, the co-photographing initiator and the co-photographing invitee can each decide, according to their own needs, whether to edit the co-photographing image.

In another remote co-photographing method, the server may fuse the first background frame, the first subject image, and the second subject image, and then return the fused pre-shot image to the first device and the second device. If the initiator or the invitee is not satisfied with the fused pre-shot image, they can adjust their shooting pose or angle and then trigger the shooting operation, so as to obtain a co-photographing image that better meets their requirements. This remote co-photographing process is described in detail below with reference to Fig. 17. As shown in Fig. 17, the process may include the following steps:
S1701, the first device transmits the first depth image to the server in response to the photographing preview operation of the photograph initiator.
S1702, the second device sends a second depth image to the server in response to a shooting preview operation of the live invitee.
S1703, the server performs background extraction on the first depth image to obtain a first background frame and a first main image.
And S1704, performing foreground extraction on the second depth image by the server to obtain a second main body image.
S1705, the first device transmits a request for a shooting template in response to an operation of the shooting initiator.
S1706, the server acquires depth information of the first subject image and the second subject image.
S1707, the server estimates the gesture of the user based on the snap template request and the depth information, and determines the gesture frame of each user.
S1708, the server transmits the gesture frame of the photo initiator to the first device.
S1709, the server sends a gesture box of the live invitee to the second device.
The implementation procedures of S1701 to S1709 may be referred to the implementation procedures of S1001 to S1009, and will not be described here again.
S1710, the first device transmits the third depth image to the server. Correspondingly, the server receives the third depth image.
S1711, the second device sends the fourth depth image to the server. Correspondingly, the server receives the fourth depth image.
In the embodiment of the application, after the server sends the gesture frame of the shooting initiator to the first device, the shooting initiator adjusts the shooting gesture according to the gesture frame, and the first device acquires a third depth image and sends the third depth image to the server.
Similarly, after the server sends the gesture frame of the invitee to the second device, the second device acquires a fourth depth image after the invitee adjusts the shooting gesture according to the gesture frame, and sends the fourth depth image to the server.
And S1712, the server performs background extraction on the third depth image to obtain a third background frame and a third main body image.
S1713, the server performs foreground extraction on the fourth depth image to obtain a fourth subject image.
And S1714, the server fuses the third background frame, the third main body image and the fourth main body image to obtain a pre-shot image.
The manner in which the server extracts the background of the third depth image and the foreground of the fourth depth image is the same as the processing of the first depth image and the second depth image described above, and is not repeated here. The process by which the server obtains the pre-shot image through image fusion likewise follows the image fusion process described above and is not repeated here.
S1715, the server transmits the pre-shot image to the first device.
S1716, the server transmits the pre-shot image to the second device.
It can be understood that the server performs image fusion on preview images acquired by the first device and the second device to obtain a pre-shot image, and the server sends the pre-shot image to the first device and the second device.
If the shooting initiator is not satisfied with the pre-shot image received by the first device, the shooting initiator can adjust shooting posture, shooting angle, expression and the like. Likewise, if the live invitee is not satisfied with the pre-live image received by the second device, the live invitee may also adjust shooting pose, shooting angle, expression, and the like.
By way of example, Fig. 18 shows an exemplary diagram of a remote father-and-son co-photographing. As shown in Fig. 18 (a), the first device captures a third depth image and sends it to the server. The server fuses the preview images of the first device and the second device and, after obtaining the pre-shot image, returns it to the first device and the second device. The pre-shot image is displayed in the shooting preview interface of the first device, as shown in Fig. 18 (b). If the father is not satisfied with the pre-shot image, he adjusts his shooting position or angle; for example, the father adjusts the orientation of his face, his position relative to his son, the curve of his mouth, and so on.
S1717, the first device transmits the first captured image to the server in response to the capturing operation.
S1718, the second device transmits the second captured image to the server in response to the capturing operation.
After the shooting posture, angle or expression of the shooting initiator is adjusted, the shooting initiator triggers a shooting function, and the first equipment responds to shooting operation and sends a first shooting image to the server after the first shooting image is acquired.
Likewise, after the invitee in the time adjusts his own shooting posture, angle or expression, the invitee in the time triggers a shooting function, and the second device, in response to a shooting operation, acquires a second shot image and then sends the second shot image to the server.
S1719, the server performs image fusion on the first shooting image and the second shooting image to obtain a simultaneous shooting image.
S1720, the server returns the live image to the first device.
S1721, the server returns the live image to the second device.
Continuing with Fig. 18 as an example, after the father adjusts his posture, he triggers the shooting operation, and the first device sends the first captured image to the server in response to the shooting operation. Likewise, the second device sends the second captured image to the server. The server fuses the first captured image and the second captured image to obtain the co-photographing image and returns it to the first device and the second device. The co-photographing image is displayed in the first device in Fig. 18 (c) and in the second device in Fig. 18 (d).
Comparing Fig. 18 (b) and (c), it can be seen that after the father adjusts his shooting angle and posture, the co-photographing image obtained by the server's fusion is more natural; the server also adjusts the position of the synthesized portraits relative to the camera, so that the adjusted father-and-son group photo blends with the background image more realistically and naturally.
The implementation procedures of S1717 to S1721 may be referred to the implementation procedures of S1010 to S1014, and will not be described here.
In this way, the server performs depth fusion on the preview images of the first device and the second device and returns the fused pre-shot image to both devices, so that the co-photographing users can see the pre-shot image in the preview interface. When a co-photographing user is not satisfied with the pre-shot image, the user can adjust the shooting angle, shooting posture, and so on, so that the final co-photographing image meets the user's requirements. In addition, the co-photographing users can obtain a real and natural co-photographing image with a single shooting operation.
As another example, Fig. 19 shows a remote co-photographing of a family of four. The dad in Fig. 19 (a) is the co-photographing initiator, i.e. the first device is the initiating device, while the mom, the daughter, and the son are co-photographing invitees. Taking the background captured by the first device as the background of the co-photographing, the preview image captured by the first device is shown in Fig. 19 (a), and the preview images captured by the three second devices are shown in Fig. 19 (b), (c), and (d), respectively. After the first device and the three second devices send their preview images to the server, the server performs foreground extraction on the received preview images to obtain the portraits of the four co-photographing users and the background image, and then fuses them to obtain the pre-shot image. The server returns the pre-shot image to the first device and the three second devices; the pre-shot image received by any one of the devices is shown in Fig. 19 (e). If the co-photographing users are not satisfied with the pre-shot image, they may reselect the co-photographing template. After the users re-determine the template, the first device and the three second devices each respond to the shooting operations of their users and upload the captured images to the server.
After receiving the shooting images uploaded by the 4 devices, the server estimates the gestures of all the shooting users, and after determining the gesture estimation results of all the shooting users, the server performs image fusion on the shooting images to obtain the shooting images. The server returns the live image to the first device and 3 second devices. The still image received by any one device is shown in fig. 19 (f).
In addition, it should be noted that, after receiving the captured images sent by the first device and the three second devices, the server identifies the co-photographing users to determine user information such as gender, height, and body shape. The server can estimate the positions, poses, and so on of the co-photographing users from this user information; as shown in Fig. 19 (f), the server determines that dad and mom should be positioned behind the daughter and the son. In addition, the server also adjusts the positions of the fused users relative to the background image, for example keeping the four fused co-photographing users at a distance from the sea, so that the co-photographing image is more real and natural.
It should be noted that Fig. 19 illustrates the case of three co-photographing invitees as an example; the number of co-photographing invitees is not limited in the embodiment of the present application.
In the above embodiments, the co-photographing method is illustrated by taking photos as an example; however, the co-photographing users may also record videos together, live-stream together, and so on, which is not limited in the embodiment of the present application.
Illustratively, as shown in Fig. 20, a co-photographing user may receive a co-shot video returned by the server.
In the embodiment of the application, the first equipment and the second equipment can adopt the front camera, the rear camera or the wide-angle camera to shoot, so that shooting images can be acquired under different visual angles.
Illustratively, fig. 21 (a) is an image acquired by a front camera of the first device, and fig. 21 (b) is an image acquired by a wide-angle camera of the first device.
It will be appreciated that the electronic device or the like may include hardware structures and/or software modules that perform the functions described above. Those of skill in the art will readily appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as hardware or combinations of hardware and computer software. Whether a function is implemented as hardware or computer software driven hardware depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the embodiments of the present application.
The embodiment of the application can divide the functional modules of the electronic device and the like according to the method example, for example, each functional module can be divided corresponding to each function, and two or more functions can be integrated in one processing module. The integrated modules may be implemented in hardware or in software functional modules. It should be noted that, in the embodiment of the present application, the division of the modules is schematic, which is merely a logic function division, and other division manners may be implemented in actual implementation.
In the case of dividing the respective functional modules with the respective functions, one possible composition diagram of the electronic device involved in the above-described embodiment may include: a display unit, a transmission unit, a processing unit, etc. It should be noted that, all relevant contents of each step related to the above method embodiment may be cited to the functional description of the corresponding functional module, which is not described herein.
The embodiment of the application also provides electronic equipment which comprises one or more processors and one or more memories. The one or more memories are coupled to the one or more processors, the one or more memories being configured to store computer program code comprising computer instructions that, when executed by the one or more processors, cause the electronic device to perform the relevant method steps described above to implement the method of the above embodiments.
Embodiments of the present application also provide a computer-readable storage medium having stored therein computer instructions that, when executed on an electronic device, cause the electronic device to perform the above-described related method steps to implement the method of the above-described embodiments.
Embodiments of the present application also provide a computer program product comprising computer instructions which, when run on an electronic device, cause the electronic device to perform the above-described related method steps to implement the method of the above-described embodiments.
In addition, embodiments of the present application also provide an apparatus, which may be embodied as a chip, component or module, which may include a processor and a memory coupled to each other; the memory is configured to store computer-executable instructions, and when the apparatus is running, the processor may execute the computer-executable instructions stored in the memory, so that the apparatus executes the method for performing the method for photographing performed by the electronic device in the above method embodiments.
The electronic device, the computer readable storage medium, the computer program product or the apparatus provided in this embodiment are configured to execute the corresponding method provided above, and therefore, the advantages achieved by the electronic device, the computer readable storage medium, the computer program product or the apparatus can refer to the advantages in the corresponding method provided above, which are not described herein.
From the foregoing description of the embodiments, it will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-described division of functional modules is illustrated, and in practical application, the above-described functional allocation may be implemented by different functional modules according to needs, i.e. the internal structure of the apparatus is divided into different functional modules to implement all or part of the functions described above. The specific working processes of the above-described systems, devices and units may refer to the corresponding processes in the foregoing method embodiments, which are not described herein.
The functional units in the embodiments of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on such an understanding, the technical solutions of the embodiments of the present application, in essence, or the part contributing to the prior art, or all or part of the technical solutions, may be embodied in the form of a software product. The software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to perform all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes any medium that can store program code, such as a flash memory, a removable hard disk, a read-only memory, a random access memory, a magnetic disk, or an optical disc.
The foregoing is merely a description of specific embodiments of the present application, and the protection scope of the present application is not limited thereto; any change or substitution within the technical scope disclosed in the present application shall be covered by the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (21)

1. A method for photographing, applied to a first device, the first device comprising at least one depth camera, the method comprising:
displaying a first interface after the first device establishes a remote simultaneous shooting connection with at least one second device, wherein the first interface is a shooting preview interface;
in response to a first operation by a simultaneous shooting initiator, sending a first captured image to a server, wherein the first captured image is a depth image acquired by the at least one depth camera and comprises a background image and a first portrait of the simultaneous shooting initiator;
and displaying a second interface, wherein the second interface comprises a snap shot image, and the snap shot image is obtained by the server by performing depth fusion on depth information of the background image, the first portrait, and a second portrait in a second captured image acquired by the at least one second device.
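As a concrete illustration of the client-side flow recited in claim 1 above, a minimal sketch follows. The DepthCamera class, the /co-shoot/frames endpoint, and the JSON payload shape are all hypothetical placeholders introduced for illustration, not details disclosed in this application.

```python
# Sketch of the first-device flow: capture a depth frame on the shutter
# press, upload it, and receive the fused snap shot image in reply.
import json
import urllib.request

import numpy as np


class DepthCamera:
    """Stand-in for the device's depth camera driver (hypothetical)."""

    def capture(self) -> dict:
        rgb = np.zeros((480, 640, 3), dtype=np.uint8)        # colour frame
        depth = np.full((480, 640), 2.0, dtype=np.float32)   # metres per pixel
        return {"rgb": rgb.tolist(), "depth": depth.tolist()}


def send_first_captured_image(server_url: str, frame: dict) -> bytes:
    """Upload the initiator's depth image and return the fused snap shot image."""
    body = json.dumps(frame).encode("utf-8")
    req = urllib.request.Request(
        f"{server_url}/co-shoot/frames",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:   # the server fuses and replies
        return resp.read()                      # bytes of the snap shot image
```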
2. The method of claim 1, wherein, before the sending of the first captured image to the server in response to the first operation by the simultaneous shooting initiator, the method further comprises:
sending a first depth image to the server, wherein the first depth image is a preview image acquired by the at least one depth camera;
and displaying a third interface after receiving pose prompt information from the server, wherein the pose prompt information is used for prompting a shooting pose of the simultaneous shooting initiator or a simultaneous shooting invitee, and the pose prompt information is generated by the server according to depth information of the first depth image and depth information of a second depth image acquired by the at least one second device.
3. The method of claim 2, wherein, before the receiving of the pose prompt information from the server, the method further comprises:
in response to a second operation by the simultaneous shooting initiator, sending a simultaneous shooting template request to the server, wherein the simultaneous shooting template request is used by the server to generate the pose prompt information according to a simultaneous shooting template corresponding to the simultaneous shooting template request, the first depth image, and the second depth image.
4. The method according to claim 2 or 3, wherein, after the pose prompt information is received from the server and the third interface is displayed, the method further comprises:
sending a third depth image to the server, wherein the third depth image is a preview image acquired by the at least one depth camera after the simultaneous shooting initiator adjusts the pose according to the pose prompt information;
and displaying a fourth interface, wherein the fourth interface comprises a pre-shot image, and the pre-shot image is obtained by the server by fusing the third depth image with a fourth depth image acquired by the at least one second device.
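Claims 2-4 above describe a preview round trip: upload a preview depth frame, display the pose prompt returned by the server, then upload an adjusted frame and display the fused pre-shot image. The following is a hypothetical sketch of that loop with an assumed server interface; none of the names come from the application itself.

```python
# Hypothetical preview loop for the pose-prompt flow of claims 2-4.
from typing import Protocol


class CoShootServer(Protocol):
    """Interface assumed for the remote server (not from the patent)."""

    def submit_preview(self, depth_frame: bytes) -> str: ...
    def submit_adjusted(self, depth_frame: bytes) -> bytes: ...


def preview_loop(server: CoShootServer, capture, show) -> bytes:
    """Run one round of prompt plus adjustment and return the pre-shot image."""
    prompt = server.submit_preview(capture())     # first preview depth image
    show(f"third interface: {prompt}")            # e.g. "move half a step left"
    pre_shot = server.submit_adjusted(capture())  # new frame after pose change
    show("fourth interface: pre-shot preview")
    return pre_shot
```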
5. The method according to claim 1 or 2, wherein the background image of the snap shot image is the background of the first captured image or the background of the second captured image.
6. The method of claim 5, wherein, before the responding to the first operation by the simultaneous shooting initiator, the method further comprises:
in response to a third operation on a first control, displaying a fourth interface, wherein the first control is used for switching the background of the snap shot image, and the background of a preview image displayed in the fourth interface is different from the background in the first interface.
7. The method according to any one of claims 1-3, wherein camera parameters carried in the first captured image and the second captured image are the same, the camera parameters comprising at least one of sensitivity, brightness, color, a white balance parameter, or an exposure parameter.
8. The method of claim 1 or 2, wherein the snap shot image is an image after beautification processing, the beautification processing comprising at least one of skin smoothing, face slimming, acne removal, eye enlargement, makeup, reshaping, whitening, wrinkle removal, hair styling, or dark circle removal.
9. A method for photographing, applied to a server, the method comprising:
receiving a first captured image from a first device and a second captured image from at least one second device, wherein each of the first device and the second device comprises at least one depth camera, the first captured image comprises a background image and a first portrait of a simultaneous shooting initiator, and the second captured image comprises the background image and a second portrait of a simultaneous shooting invitee;
fusing depth information respectively corresponding to the background image, the first portrait, and the second portrait to obtain a snap shot image;
and sending the snap shot image to the first device and the at least one second device.
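The application does not spell out the fusion algorithm itself; one simple, hypothetical reading of "depth fusion" is a z-buffer composite in which, at every pixel, the layer with the smallest depth value wins. A minimal sketch under that assumption, with invented layer and mask conventions:

```python
# Hypothetical z-buffer style fusion of a background and two portrait layers.
# Each layer is an (H, W, 3) colour image plus an (H, W) depth map in metres;
# portrait pixels outside the person are marked with depth = +inf.
import numpy as np


def fuse_layers(layers):
    """Return the colour image whose pixels come from the nearest layer."""
    colors = np.stack([c for c, _ in layers])   # (N, H, W, 3)
    depths = np.stack([d for _, d in layers])   # (N, H, W)
    nearest = np.argmin(depths, axis=0)         # (H, W) index of winning layer
    h, w = nearest.shape
    rows, cols = np.mgrid[0:h, 0:w]
    return colors[nearest, rows, cols]          # (H, W, 3) fused image


# Tiny usage example on a 2x2 frame: the portrait (depth 1 m) occludes the
# background (depth 5 m) only where its mask marks a person.
bg_rgb = np.zeros((2, 2, 3), dtype=np.uint8)
bg_depth = np.full((2, 2), 5.0)
person_rgb = np.full((2, 2, 3), 255, dtype=np.uint8)
person_depth = np.array([[1.0, np.inf], [np.inf, np.inf]])
fused = fuse_layers([(bg_rgb, bg_depth), (person_rgb, person_depth)])
```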
10. The method of claim 9, wherein, before the receiving of the first captured image from the first device and the second captured image from the at least one second device, the method further comprises:
receiving a first depth image sent by the first device and a second depth image sent by the at least one second device;
generating pose prompt information according to depth information of the first depth image and depth information of the second depth image, wherein the pose prompt information is used for prompting shooting poses of the simultaneous shooting initiator and the simultaneous shooting invitee;
and sending the pose prompt information to the first device or the at least one second device.
11. The method of claim 10, wherein the generating of the pose prompt information according to the depth information of the first depth image and the depth information of the second depth image comprises:
receiving a simultaneous shooting template request sent by the first device or the at least one second device;
and generating the pose prompt information according to a simultaneous shooting template corresponding to the simultaneous shooting template request, the first depth image, and the second depth image.
12. The method according to any one of claims 9-11, wherein, before the fusing of the depth information respectively corresponding to the background image, the first portrait, and the second portrait, the method further comprises:
performing pose estimation on the first portrait and the second portrait by using a pose estimation algorithm, and determining target poses and/or target positions of the first portrait and the second portrait.
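No particular pose estimation algorithm is named in the claim; as a hypothetical stand-in, a coarse target position for each portrait can be derived from the centroid, mean depth, and extent of its person mask, as sketched below. A real system would more likely use a keypoint-based pose estimator.

```python
# Hypothetical coarse "pose" summary from a person mask: centroid, mean
# depth, and apparent height.  Invented for illustration only.
import numpy as np


def coarse_pose(depth: np.ndarray, person_mask: np.ndarray) -> dict:
    """Summarise where a portrait sits in the frame and how far away it is."""
    ys, xs = np.nonzero(person_mask)                        # pixels of the person
    return {
        "centroid": (float(xs.mean()), float(ys.mean())),   # pixel position
        "mean_depth_m": float(depth[person_mask].mean()),   # distance to camera
        "height_px": int(ys.max() - ys.min() + 1),          # apparent size
    }
```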
13. The method of claim 12, wherein the fusing of the depth information respectively corresponding to the background image, the first portrait, and the second portrait comprises:
fusing the depth information respectively corresponding to the background image, the first portrait, and the second portrait based on the target poses and/or the target positions of the first portrait and the second portrait.
14. The method of any one of claims 9-11, wherein, before the receiving of the first captured image from the first device and the second captured image from the at least one second device, the method further comprises:
receiving a third depth image sent by the first device and a fourth depth image sent by the at least one second device;
fusing the third depth image and the fourth depth image to obtain a pre-shot image;
and sending the pre-shot image to the first device and the at least one second device.
15. The method according to any one of claims 9-11, wherein the background image of the snap shot image is the background of the first captured image or the background of the second captured image.
16. The method according to any one of claims 9-11, wherein, before the fusing of the depth information respectively corresponding to the background image, the first portrait, and the second portrait to obtain the snap shot image, the method further comprises:
receiving a background switching request;
and switching the background image according to the background switching request.
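Claim 16 implies that a background switch only changes which background layer feeds the fusion, leaving the portrait layers untouched. A minimal, hypothetical sketch of that bookkeeping on the server, with invented session keys:

```python
# Hypothetical handling of a background-switch request on the server: the
# portraits are kept as-is and only the background layer fed to the fusion
# step is replaced.  Keys and structure are illustrative, not from the patent.
def apply_background_switch(session: dict, request: dict) -> None:
    """Point the session at the requested party's background before fusing."""
    wanted = request.get("background", "initiator")   # "initiator" or "invitee"
    session["background_layer"] = session["backgrounds"][wanted]
```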
17. The method of any one of claims 9-11, wherein camera parameters carried in the first captured image and the second captured image are the same, the camera parameters comprising at least one of sensitivity, brightness, color, a white balance parameter, or an exposure parameter.
18. The method of any one of claims 9-11, wherein, before the sending of the snap shot image to the first device and the at least one second device, the method further comprises:
performing beautification processing on the snap shot image, wherein the beautification processing comprises at least one of skin smoothing, face slimming, acne removal, eye enlargement, makeup, reshaping, whitening, wrinkle removal, hair styling, or dark circle removal.
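Claim 18 lists beautification operations without fixing their implementation; as one hypothetical example, "skin smoothing" could blend a small mean blur back into the image only inside a face mask, as sketched below.

```python
# Hypothetical skin-smoothing step: blend a blurred copy back into the image
# only inside the face mask.  A 3x3 mean blur stands in for whatever filter a
# real beautification pipeline would use.
import numpy as np


def smooth_skin(image: np.ndarray, face_mask: np.ndarray, strength: float = 0.6) -> np.ndarray:
    """Return the image with the masked region softened by a small mean blur."""
    padded = np.pad(image.astype(np.float32), ((1, 1), (1, 1), (0, 0)), mode="edge")
    blurred = sum(
        padded[1 + dy : 1 + dy + image.shape[0], 1 + dx : 1 + dx + image.shape[1]]
        for dy in (-1, 0, 1)
        for dx in (-1, 0, 1)
    ) / 9.0
    out = image.astype(np.float32)
    out[face_mask] = (1 - strength) * out[face_mask] + strength * blurred[face_mask]
    return out.astype(np.uint8)
```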
19. An electronic device, comprising:
at least one depth camera, wherein the depth camera is configured to acquire a depth image;
one or more processors;
a memory;
wherein the memory has stored therein one or more computer programs, the one or more computer programs comprising instructions which, when executed by the electronic device, cause the electronic device to perform the method for photographing according to any one of claims 1-8.
20. A server, comprising:
one or more processors;
a memory;
wherein the memory has stored therein one or more computer programs, the one or more computer programs comprising instructions which, when executed by the server, cause the server to perform the method for photographing according to any one of claims 9-18.
21. A computer-readable storage medium having instructions stored therein that, when read and executed by one or more processors, cause the method for photographing according to any one of claims 1-8, or the method for photographing according to any one of claims 9-18, to be performed.
CN202311472969.4A 2023-11-07 2023-11-07 Method for photographing, electronic equipment and storage medium Pending CN117221712A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311472969.4A CN117221712A (en) 2023-11-07 2023-11-07 Method for photographing, electronic equipment and storage medium


Publications (1)

Publication Number Publication Date
CN117221712A 2023-12-12

Family

ID=89039280

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311472969.4A Pending CN117221712A (en) 2023-11-07 2023-11-07 Method for photographing, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN117221712A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111416938A (en) * 2020-03-27 2020-07-14 咪咕文化科技有限公司 Augmented reality close-shooting method and device and computer readable storage medium
CN114640805A (en) * 2020-11-30 2022-06-17 华为技术有限公司 Clapping method, electronic equipment and server
CN115529413A (en) * 2022-08-26 2022-12-27 华为技术有限公司 Shooting method and related device


Similar Documents

Publication Publication Date Title
CN113727012B (en) Shooting method and terminal
CN111866404B (en) Video editing method and electronic equipment
CN113261271A (en) Shooting method and electronic equipment
CN112887583A (en) Shooting method and electronic equipment
CN114710640A (en) Video call method, device and terminal based on virtual image
CN113741681B (en) Image correction method and electronic equipment
WO2023284715A1 (en) Object reconstruction method and related device
CN113709354A (en) Shooting method and electronic equipment
CN113850726A (en) Image transformation method and device
CN115918108A (en) Function switching entry determining method and electronic equipment
CN113170037A (en) Method for shooting long exposure image and electronic equipment
WO2022143921A1 (en) Image reconstruction method, and related apparatus and system
CN113727018B (en) Shooting method and equipment
CN115147451A (en) Target tracking method and device thereof
CN115150542A (en) Video anti-shake method and related equipment
CN116703995B (en) Video blurring processing method and device
CN115686182B (en) Processing method of augmented reality video and electronic equipment
CN117221712A (en) Method for photographing, electronic equipment and storage medium
CN117425065A (en) Shooting method and related equipment
CN114697516A (en) Three-dimensional model reconstruction method, device and storage medium
CN115147492A (en) Image processing method and related equipment
CN114764745A (en) Image reconstruction method and related device
CN114860178A (en) Screen projection method and electronic equipment
CN114915722B (en) Method and device for processing video
WO2023024036A1 (en) Method and apparatus for reconstructing three-dimensional model of person

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination