CN111866434A - Video co-shooting method, video editing device and electronic equipment

Info

Publication number: CN111866434A
Application number: CN202010574700.7A
Authority: CN (China)
Prior art keywords: video, participant, initiator, input, snap
Legal status: Pending
Other languages: Chinese (zh)
Inventors: 卢昳, 江运柱, 杨怀渊, 林晓晴, 顾辉
Current Assignee: Alibaba China Co Ltd
Original Assignee: Alibaba China Co Ltd
Application filed by Alibaba China Co Ltd; priority to CN202010574700.7A

Classifications

    • H ELECTRICITY; H04 ELECTRIC COMMUNICATION TECHNIQUE; H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 7/142: Constructional details of the terminal equipment, e.g. arrangements of the camera and the display (under H04N 7/00 Television systems; H04N 7/14 Systems for two-way working; H04N 7/141 between two video terminals, e.g. videophone)
    • H04N 21/4334: Recording operations (under H04N 21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]; H04N 21/40 Client devices; H04N 21/43 Processing of content or additional data; H04N 21/433 Content storage operation)
    • H04N 21/4788: Supplemental services communicating with other users, e.g. chatting (under H04N 21/47 End-user applications; H04N 21/478 Supplemental services, e.g. displaying phone caller identification, shopping application)
    • H04N 5/262: Studio circuits, e.g. for mixing, switching-over, change of character of image, other special effects (under H04N 5/00 Details of television systems; H04N 5/222 Studio circuitry; Studio devices; Studio equipment)

Abstract

Embodiments of the present disclosure relate to the technical field of video processing, and in particular to a video co-shooting method, a video editing method and apparatus, an electronic device, and a storage medium. In at least one embodiment of the present disclosure, a remote video connection is established between the user initiating the video co-shooting (the initiator) and the users participating in it (the participants). When the initiator starts a video recording operation, the initiator device and each participant device respond to the operation and record their own videos, so that the initiator video and the participant videos can be synthesized into a co-shot video. The co-shot video can then be segmented synchronously and its constituent videos selected individually, enabling director-style multi-video editing. This satisfies the video co-shooting needs of remote users and improves the efficiency of multi-video editing.

Description

Video co-shooting method, video editing device and electronic equipment
Technical Field
Embodiments of the present disclosure relate to the technical field of video processing, and in particular to a video co-shooting method, a video editing method and apparatus, an electronic device, and a non-transitory computer-readable storage medium.
Background
Video is a way of storing dynamic images, and videos can be shot in various ways. For example, a user may start the video shooting function of an electronic device (e.g., a smartphone), taking a selfie with the front camera or shooting the surroundings with the rear camera. As another example, the user may start the screen recording function of the electronic device to record the dynamic images shown on its screen.
Currently, if multiple users wish to co-shoot a video, common approaches include: the users gather offline and use one electronic device to shoot the dynamic images of all of them in the same scene; or each user records a selfie video separately, and the selfie videos are then combined into a co-shot video.
However, gathering offline is not feasible for remote users, and combining separately recorded selfie videos into a co-shot video cannot satisfy the need for communication and interaction among the users.
Disclosure of Invention
To solve at least one of the above problems, at least one embodiment of the present disclosure provides a video co-shooting method, a video editing method, an apparatus, an electronic device, and a non-transitory computer-readable storage medium.
In a first aspect, an embodiment of the present disclosure provides a video co-shooting method applied to an initiator device, the method including:
receiving a first input for video co-shooting;
establishing a video connection with at least one participant device in response to the first input;
after the video connection is established, receiving a second input for recording a video;
in response to the second input, recording an initiator video and sending recording start information to the at least one participant device, so that the at least one participant device records a participant video after receiving the recording start information;
acquiring the participant video, and synthesizing a co-shot video based on the participant video and the initiator video.
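As a rough illustration of the first aspect, the following Python-style sketch walks through the initiator-side flow; every helper name (establish_video_connections, record_local, and so on) is hypothetical and merely stands in for the platform's actual signaling and capture APIs.

    # Illustrative sketch of the initiator-side flow; all helpers are hypothetical.
    def run_initiator(ui, signaling, recorder, compositor):
        ui.wait_for_input("start_co_shoot")                     # first input: initiate co-shooting
        participants = signaling.establish_video_connections()  # connect to >= 1 participant device
        ui.wait_for_input("record_video")                       # second input: start recording
        signaling.broadcast({"type": "recording_start"})        # participant devices record on receipt
        initiator_video = recorder.record_local()               # local camera + microphone only
        participant_videos = [signaling.fetch_video(p) for p in participants]
        return compositor.compose(initiator_video, participant_videos)  # synthesize the co-shot video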
In a second aspect, an embodiment of the present disclosure further provides a video co-shooting method applied to a participant device, the method including:
receiving invitation information for video co-shooting, the invitation information being generated by an initiator device;
detecting a first input for the invitation information;
when the first input indicates acceptance of the invitation, sending acceptance information to the initiator device and establishing a video connection with the initiator device;
after the video connection is established, receiving recording start information and recording a participant video, the recording start information being generated by the initiator device;
sending the participant video to the initiator device, so that the initiator device synthesizes a co-shot video based on the participant video and an initiator video.
In a third aspect, an embodiment of the present disclosure further provides a video editing method, the method including:
acquiring a first co-shot video, the first co-shot video being synthesized based on an initiator video recorded by an initiator device and a participant video recorded by at least one participant device, where the initiator device establishes a video connection with the at least one participant device, and the at least one participant device records the participant video after receiving recording start information sent by the initiator device;
synchronously segmenting the multiple videos included in the first co-shot video;
individually selecting each video included in the first co-shot video;
generating a second co-shot video based on the result of the synchronized segmentation and the result of the individual selection.
In a fourth aspect, an embodiment of the present disclosure further provides a client device configured to initiate video co-shooting, the client device including:
a multi-person connection subunit configured to: receive a first input for video co-shooting; and establish a video connection with at least one participant device in response to the first input;
a video recording subunit configured to: after the video connection is established, receive a second input for recording a video; and in response to the second input, record an initiator video and send recording start information to the at least one participant device, so that the at least one participant device records a participant video after receiving the recording start information;
a video interaction subunit configured to: acquire the participant video;
a video composition subunit configured to: synthesize a co-shot video based on the participant video and the initiator video.
In a fifth aspect, an embodiment of the present disclosure further provides a client device configured to participate in video co-shooting, the client device including:
a multi-person connection subunit configured to: receive invitation information for video co-shooting, the invitation information being generated by an initiator device; detect a first input for the invitation information; and, when the first input indicates acceptance of the invitation, send acceptance information to the initiator device and establish a video connection with the initiator device;
a video recording subunit configured to: after the video connection is established, receive recording start information and record a participant video, the recording start information being generated by the initiator device;
a video interaction subunit configured to: send the participant video to the initiator device, so that the initiator device synthesizes a co-shot video based on the participant video and an initiator video.
In a sixth aspect, an embodiment of the present disclosure further provides a client device, the client device including:
an acquisition unit configured to acquire a first co-shot video, the first co-shot video being synthesized based on an initiator video recorded by an initiator device and a participant video recorded by at least one participant device, where the initiator device establishes a video connection with the at least one participant device, and the at least one participant device records the participant video after receiving recording start information sent by the initiator device;
a segmentation unit configured to synchronously segment the multiple videos included in the first co-shot video;
a selection unit configured to individually select each video included in the first co-shot video;
a generation unit configured to generate a second co-shot video based on the result of the synchronized segmentation and the result of the individual selection.
In a seventh aspect, an embodiment of the present disclosure further provides an electronic device, including: a processor and a memory; the processor is adapted to perform the steps of the method according to the first aspect by calling a program or instructions stored by the memory.
In an eighth aspect, an embodiment of the present disclosure further provides an electronic device, including: a processor and a memory; the processor is adapted to perform the steps of the method according to the second aspect by calling a program or instructions stored in the memory.
In a ninth aspect, an embodiment of the present disclosure further provides an electronic device, including: a processor and a memory; the processor is adapted to perform the steps of the method according to the third aspect by calling a program or instructions stored in the memory.
In a tenth aspect, the present disclosure also proposes a non-transitory computer-readable storage medium for storing a program or instructions for causing a computer to perform the steps of the method according to the first aspect.
In an eleventh aspect, the disclosed embodiments also propose a non-transitory computer-readable storage medium storing a program or instructions for causing a computer to perform the steps of the method according to the second aspect.
In a twelfth aspect, embodiments of the present disclosure also provide a non-transitory computer-readable storage medium for storing a program or instructions, the program or instructions causing a computer to perform the steps of the method according to the third aspect.
Accordingly, in at least one embodiment of the present disclosure, a remote video connection is established between the user initiating the video co-shooting (the initiator) and the users participating in it (the participants). When the initiator starts a video recording operation, the initiator device and each participant device respond to the operation and record their own videos; the initiator device, a participant device, or the server device can then acquire the initiator video and the participant videos and synthesize them into a co-shot video, meeting the video co-shooting needs of remote users and their need to communicate with each other while co-shooting. In addition, the co-shot video can be segmented synchronously and its constituent videos selected individually, enabling director-style multi-video editing and improving the efficiency of multi-video editing.
Drawings
To illustrate the technical solutions of the embodiments of the present disclosure more clearly, the drawings used in the description of the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the present disclosure; those skilled in the art can derive other drawings from them without creative effort.
FIG. 1 is a diagram of an exemplary application scenario provided by an embodiment of the present disclosure;
FIG. 2 is an exemplary block diagram of a client device provided by an embodiment of the present disclosure;
FIG. 3 is an exemplary block diagram of a work publishing module provided by an embodiment of the present disclosure;
FIG. 4 is an exemplary block diagram of a cloud shooting unit provided by an embodiment of the present disclosure;
FIG. 5 is an exemplary block diagram of an electronic device provided by an embodiment of the present disclosure;
FIG. 6 is an exemplary interaction diagram of a multi-person connection provided by an embodiment of the present disclosure;
FIG. 7 is an exemplary interaction diagram of video recording and composition provided by an embodiment of the present disclosure;
FIG. 8 is a schematic diagram of a graphical user interface for publishing a work provided by an embodiment of the present disclosure;
FIG. 9 is a schematic diagram of a graphical user interface of a multi-person connection provided by an embodiment of the present disclosure;
FIG. 10 is a schematic flowchart of a video co-shooting method provided by an embodiment of the present disclosure;
FIG. 11 is a schematic flowchart of another video co-shooting method provided by an embodiment of the present disclosure;
FIG. 12 is a schematic flowchart of a video editing method provided by an embodiment of the present disclosure;
FIG. 13 is a schematic diagram of a graphical user interface for downloading and editing a co-shot video provided by an embodiment of the present disclosure;
FIG. 14 is a schematic diagram of a first-level page of a video editor provided by an embodiment of the present disclosure;
FIG. 15 is a schematic diagram of a second-level page of a video editor provided by an embodiment of the present disclosure;
FIG. 16 is a schematic diagram of a third-level page of a video editor provided by an embodiment of the present disclosure;
FIGS. 17-19 are diagrams of different states of a third-level page provided by embodiments of the present disclosure.
Detailed Description
So that the above objects, features and advantages of the present disclosure can be understood more clearly, the present disclosure is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the described embodiments are only some, not all, of the embodiments of the present disclosure; they are merely illustrative and are not intended to limit the disclosure. All other embodiments derived by those of ordinary skill in the art from the described embodiments fall within the scope of the present disclosure.
It is noted that, in this document, relational terms such as "first" and "second," and the like, may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions.
With the development of new media and the Internet, webcasts, short videos and the like have become part of people's life, work and entertainment, shortening the distance between people: communication can take place remotely over the network, without gathering offline. Against this background, the demand for cloud shooting (which can be understood as remotely co-shooting videos over a network) is growing.
However, video co-shooting today mostly requires the users to gather offline first, which cannot meet the demand for cloud shooting. Some approaches combine videos shot separately by different users into a co-shot video, but they cannot satisfy the need for communication and interaction among the users. A video co-shooting scheme suited to cloud shooting and its interaction requirements is therefore highly desirable.
Embodiments of the present disclosure provide a video co-shooting method, an apparatus, an electronic device and a storage medium suited to cloud shooting. A remote video connection is established between the user initiating the video co-shooting (the initiator) and the users participating in it (the participants). When the initiator starts a video recording operation, the initiator device and the participant devices respond to the operation and record their own videos; the initiator device, a participant device, or the server device can then acquire the initiator video and the participant videos and synthesize them into a co-shot video, which can be segmented synchronously and selected from individually. This enables director-style multi-video editing, meets the video co-shooting needs of remote users and their need to communicate while co-shooting, and improves the efficiency of multi-video editing.
The video co-shooting scheme provided by the embodiments of the present disclosure can be applied in any scenario with network connectivity, such as creating co-shot short videos, video co-shooting during webcasts, remote online work/entertainment, and multi-user online video calls/conferences. It should be understood that these application scenarios are only examples or embodiments of the present disclosure, and those of ordinary skill in the art can apply the disclosure to other similar scenarios without creative effort.
Fig. 1 is an exemplary application scenario provided in an embodiment of the present disclosure. As shown in fig. 1, a plurality of client devices 11 are included in the scenario, and the plurality of client devices 11 are communicatively connected to a server device 12 through a network.
The client device 11 is a device used by a user, and can be connected to a network to provide various networking information, online video services and the like for the user. In some embodiments, the client device 11 may be any type of electronic device, such as a mobile terminal such as a smart phone, a tablet computer, a smart wearable device (smart watch, smart glasses, etc.), and a fixed terminal such as a desktop computer, a smart television, etc.
In some embodiments, the client device 11 is provided with an image sensor and an audio sensor. The image sensor, such as the front or rear camera of a smartphone, collects image data. The audio sensor collects audio data, such as the voice of the device's user; it is typically a microphone, but any other sensor capable of collecting audio data may be used.
The network may include, but is not limited to, 4G networks, 5G networks, fiber broadband, Wi-Fi, and other low-latency, high-bandwidth communication networks that ensure the fluency of the video.
The server device 12 may be a server or a server group, and the server group may be centralized or distributed. Whether a server or a server group, it is in essence also an electronic device (or a cluster of electronic devices), but with more concentrated computing and management capabilities than the client device 11. In some embodiments, because the server device 12 has stronger computing capabilities, it may take over some video co-shooting functions that could otherwise be implemented by the client device 11 (except those that must run on the client device 11, such as recording the video), for example synthesizing one video from multiple videos or editing a video; the client device 11 then only needs to send a function request message to the server device 12, which returns the result after completing the corresponding function. In some embodiments, the server device 12 serves as a bridge for information interaction between different client devices 11: rather than communicating directly with each other, one client device 11 first sends information to the server device 12, which then forwards it to another client device 11.
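As a sketch of this "transfer station" role, the minimal relay below forwards each message from one client to all others; the in-memory registry and callback interface are assumptions for illustration, not the actual server implementation.

    # Minimal relay sketch: clients never talk to each other directly; the
    # server forwards every message to the other registered clients.
    class RelayServer:
        def __init__(self):
            self.clients = {}  # client_id -> send callback

        def register(self, client_id, send_fn):
            self.clients[client_id] = send_fn

        def forward(self, sender_id, message):
            for client_id, send_fn in self.clients.items():
                if client_id != sender_id:
                    send_fn({"from": sender_id, **message})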
In the scenario shown in fig. 1, to implement the video co-shooting scheme of the embodiments of the present disclosure, the following adjustments need to be made to the client device 11 and the server device 12:
First, the client device 11 and the server device 12 need to be able to install software and have a hardware configuration that supports running it. Second, the client device 11 needs to install video co-shooting client software (an electronic device with this software installed is called a client device), and the server device 12 needs to install video co-shooting server software (an electronic device with this software installed is called a server device).
The video co-shooting client software and server software are the software basis for implementing the video co-shooting scheme of the embodiments of the present disclosure; the electronic devices and the network are its hardware basis.
With these adjustments in place, the video co-shooting scheme of the embodiments of the present disclosure can be implemented.
Any one of the client devices 11 may initiate a video co-shooting, and the other client devices 11 may participate in it. The client device 11 initiating the video co-shooting is regarded as the initiator device, and the client devices 11 participating in it are regarded as participant devices. Any client device 11 may either initiate or participate in a video co-shooting: when initiating, it is an initiator device; when participating, it is a participant device.
The client device 11 initiating the video co-shooting establishes a video connection with the participating client devices 11 through the network, similar to a video conference, while the server device 12 provides the video connection service and the video co-shooting service.
After the video connection is established, the user may start a video recording operation on the initiator device (for example, click a video recording control); the client device 11 initiating the video co-shooting then sends a video recording instruction, which the server device 12 forwards to each participating client device 11. The initiating client device 11 and the participating client devices 11 thus record their videos separately, and because they remain in a video connection state, the users can communicate and interact with each other while recording. In some embodiments, while recording, the client device 11 retains the audio of the device user captured by the audio sensor (microphone) but does not record the audio played by the device itself, such as the audio of a played video, the audio of the video connection (e.g., the video conference), and/or live audio during a webcast.
In some embodiments, the video shot by the initiator device and the videos shot by the participant devices may be uploaded to the server device 12, which synthesizes a co-shot video: the picture uploaded by each device is displayed in its own sub-picture of a split screen, and the audio corresponding to each video is merged into one audio track through operations such as audio mixing and echo cancellation; that is, the multiple videos share one audio track, which avoids desynchronization between audio and picture. In some embodiments, both the initiator device and the participant devices may instead download the videos through the server device 12 and synthesize the co-shot video themselves, meeting the cloud shooting requirement and the users' need to communicate while co-shooting.
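One plausible way to realize the split-screen picture and the shared audio track is ffmpeg's xstack and amix filters; the Python sketch below composites four recordings into a 2x2 grid with a single mixed track. The file names are placeholders, and this is an illustrative sketch rather than the patent's actual synthesis method.

    import subprocess

    # Sketch: 2x2 split screen (xstack) plus one shared audio track (amix).
    inputs = ["initiator.mp4", "p1.mp4", "p2.mp4", "p3.mp4"]  # placeholder names
    cmd = ["ffmpeg"]
    for f in inputs:
        cmd += ["-i", f]
    cmd += [
        "-filter_complex",
        "[0:v][1:v][2:v][3:v]xstack=inputs=4:layout=0_0|w0_0|0_h0|w0_h0[v];"
        "[0:a][1:a][2:a][3:a]amix=inputs=4[a]",  # all pictures share one mixed track
        "-map", "[v]", "-map", "[a]",
        "co_shot.mp4",
    ]
    subprocess.run(cmd, check=True)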
Fig. 2 is an exemplary block diagram of a client device 20 according to an embodiment of the present disclosure. In some embodiments, the client device 20 may be implemented as the client device 11 in fig. 1, or as the part of the client device 11 used to initiate or participate in a video co-shooting.
As shown in fig. 2, the client device 20 may be divided into a plurality of modules, which may include, for example: a GUI (Graphical User Interface) generation module 21, a work publishing module 22, and other modules usable for video co-shooting.
The GUI generation module 21 is used to generate the graphical user interfaces related to video co-shooting. In some embodiments, when the user opens the video co-shooting client software installed on the client device 20, the GUI generation module 21 generates a main interface that includes content (e.g., short videos, vlogs, etc.), a work publishing control, and other function controls. In some embodiments, when the user triggers the work publishing control, the GUI generation module 21 pops up a work publishing sub-interface in the main interface. In some embodiments, if the work publishing control is located in a sub-interface, the GUI generation module 21 pops up the work publishing sub-interface in that sub-interface after the user triggers the control. In some embodiments, the work publishing sub-interface includes a video co-shooting control and controls for other publishing modes.
The work publishing module 22 is used to publish video works. In some embodiments, the work publishing module 22 provides multiple publishing modes, including at least video co-shooting. In some embodiments, the publishing modes provided by the work publishing module 22 may also include direct shooting and video editing.
In some embodiments, fig. 8 shows a schematic diagram of a graphical user interface for publishing a work. In fig. 8, reference numeral 80 denotes a graphical user interface with a work publishing control; when the user clicks the control, the GUI generation module 21 pops up a work publishing sub-interface 81 in the graphical user interface 80. The sub-interface 81 displays the fixed text "publish work" and includes three controls: direct shot 811, clip video 812, and cloud shot 813, where cloud shot 813 can be understood as the video co-shooting control. When the user clicks cloud shot 813, the GUI generation module 21 pops up a cloud shot interface 82 in the graphical user interface 80, which includes two controls: initiate 821 and participate 822. When the user clicks initiate 821, the client device 20 is the initiator device; when the user clicks participate 822, the client device 20 is a participant device.
In some embodiments, the division of modules in the client device 20 is only a logical functional division; other divisions are possible in actual implementations. For example, the GUI generation module 21 and the work publishing module 22 may be implemented as one module, or either may be divided into multiple sub-modules. It will be appreciated that the modules or sub-modules can be implemented in electronic hardware or in a combination of computer software and electronic hardware; whether such functionality is implemented as hardware or software depends upon the particular application and design constraints. Skilled artisans may implement the described functionality in varying ways for each particular application.
Fig. 3 is an exemplary block diagram of a work publishing module 30 provided by an embodiment of the present disclosure. In some embodiments, the work publishing module 30 may be implemented as the work publishing module 22 in fig. 2 or as part of it.
As shown in fig. 3, the work publishing module 30 may include, but is not limited to, the following units: a direct shooting unit 31, a clip video unit 32, and a cloud shooting unit 33.
The direct shooting unit 31 is configured to detect that a direct shooting control is triggered, for example, a user clicks the direct shooting control, acquire image data acquired by an image sensor and audio data acquired by an audio sensor, and record a video based on the image data and the audio data. In some embodiments, the direct capture unit 31 records video based on the image data and the audio data in a manner common in the art.
The clip video unit 32 is configured to detect that a clip video control is triggered, for example, a user clicks the clip video control, and then present a graphical user interface for clipping a video, where the graphical user interface includes a plurality of functional controls for clipping a video, for example, an add special effect control, a render control, a size modification control, a video selection control, and the like. After clicking the video selection control, the user can select the stored video to clip, the selected video is displayed in the graphical user interface, and the user can click other function controls for clipping the video to clip the video.
The cloud shooting unit 33 is configured to detect that the cloud shooting control is triggered (for example, the user clicks it) and then perform video co-shooting. In some embodiments, the cloud shooting unit 33 determines the current identity state: after detecting that the initiate control is triggered, it determines that the current identity state is initiator; after detecting that the participate control is triggered, it determines that the current identity state is participant. In some embodiments, the cloud shooting unit 33 completes the video co-shooting according to the current identity state: as initiator, it uses the initiator's video co-shooting mode; as participant, it uses the participant's video co-shooting mode.
In some embodiments, the division of units in the work publishing module 30 is only a logical functional division; other divisions are possible in actual implementations. For example, at least two of the direct shooting unit 31, the clip video unit 32 and the cloud shooting unit 33 may be implemented as one unit, or any of them may be divided into multiple sub-units. It will be understood that the units or sub-units may be implemented in electronic hardware or in a combination of computer software and electronic hardware; whether such functionality is implemented as hardware or software depends upon the particular application and design constraints. Skilled artisans may implement the described functionality in varying ways for each particular application.
Fig. 4 is an exemplary block diagram of a cloud shooting unit 40 according to an embodiment of the present disclosure. In some embodiments, the cloud shooting unit 40 may be implemented as the cloud shooting unit 33 in fig. 3 or as part of it.
As shown in fig. 4, the cloud shooting unit 40 may include, but is not limited to, the following sub-units: a multi-person connection subunit 41, a video recording subunit 42, a video interaction subunit 43 and a video composition subunit 44.
The client device 11 as an initiator device
When the current identity state is initiator, the cloud shooting unit 40 completes the video co-shooting in the initiator's mode; the functions of its sub-units are then as follows:
The multi-person connection subunit 41 may receive a first input for video co-shooting. The first input indicates that the user initiates the video co-shooting, for example by clicking the cloud shooting control and then the initiate control; taking fig. 8 as an example, the user clicks cloud shot 813 and then initiate 821, and these two click operations are the first input.
In some embodiments, the multi-person connection subunit 41 may establish a video connection with at least one participant device in response to the first input. In some embodiments, it generates, in response to the first input, a graphical user interface for establishing the video connection, which includes a plurality of areas for displaying the video picture of the initiator device or of a participant device. The video picture of the initiator device is the dynamic image formed from image data collected by the initiator device's image sensor; the video picture of a participant device is the dynamic image formed from image data collected by that participant device's image sensor.
For example, fig. 9 shows a schematic diagram of a graphical user interface of a multi-person connection. In fig. 9, the graphical user interface 90 includes a plurality of areas: area 91, area 92, area 93 and area 94, where area 91 shows the video picture of the initiator 911, area 92 shows the video picture of a participant 922 who has established a connection, and areas 93 and 94 are empty. The graphical user interface 90 also includes an invitation control, which the initiator clicks to invite other participants to join the video co-shooting. In some embodiments, the invitation control may be a separate control, such as invite friend 95 in fig. 9. In some embodiments, an invitation control may be provided in each empty area, such as the plus signs in areas 93 and 94. In some embodiments, the plus signs and invite friend 95 may both be present.
In some embodiments, the multi-person connection subunit 41 may receive a third input for inviting a participant. The third input represents the initiator inviting another participant to join the video co-shooting: for example, the user clicks the invitation control and selects a participant to invite. Taking fig. 9 as an example, the user clicks a plus sign or clicks invite friend 95 and then selects (clicks) a friend from the friend list; these two click operations are the third input. This invitation mode can be understood as private-message sharing inside the client (inside the video co-shooting client software). In some embodiments, the friend list (or follow list) is invoked during in-client private-message sharing, and the online state of each friend is displayed in it. If a friend is online when the invitation is sent, a popup window pops up on the friend's terminal to remind the friend to join the cloud shoot; if the friend is offline, the friend is notified by private message. The user (initiator) can cancel an invitation. When the friend joins the cloud shoot, the private-message state becomes joined; when the friend declines, it becomes declined. The popup window and the strong reminder last only a preset time, for example 10 seconds; if there is no response within 10 seconds, the private-message state becomes cancelled.
In some embodiments, other invitation modes may be used to invite participants to join the video co-shooting, such as WeChat sharing or copying a password link, both of which are essentially password-sharing means. For example, after a WeChat user (an invitee) copies the shared content on WeChat (i.e., the link, text and/or picture inviting them to join the cloud shoot) and opens the video co-shooting client software, the client scans for a password; if one is found, it is presented in a popup, and the WeChat user can join the cloud shoot by clicking it.
In some embodiments, the multi-person connection subunit 41 sends invitation information to the corresponding participant device in response to the third input. In some embodiments, the invitation information includes identification information of the video co-shooting, which represents its unique identifier and differs for each video co-shooting; for example, "32BF" shown in fig. 9 is identification information, which can also be understood as the room number (or live-room number) of the co-shooting. In some embodiments, the identification information is generated in response to the first input, or generated together with the graphical user interface 90. In some embodiments, acceptance information is fed back by a participant device after the participant confirms joining the video co-shooting, and the multi-person connection subunit 41 displays the video picture of that participant device in a blank (free) area in response to the acceptance information. For example, in fig. 9, the video picture of a newly joined participant would be displayed in area 93. In some embodiments, the video positions of the participants follow the order of joining, each joining participant corresponding to a successful live-room connection; for example, in fig. 9, the first joined participant is shown in area 92, the second in area 93, and the third in area 94. When the room is full and an invitee clicks to join the cloud shoot, a prompt such as "you are late" or "the room is full" is displayed.
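The invitation and acceptance exchange could be carried by small structured messages such as the hypothetical JSON below; the field names and values are assumptions for illustration, with room_id standing in for the identification information (e.g. "32BF") and expires_at reflecting the 10-second reminder window.

    import json, time

    # Hypothetical invitation/acceptance payloads; all field names are illustrative.
    invitation = {
        "type": "co_shoot_invite",
        "room_id": "32BF",               # identification information of this co-shooting
        "from": "initiator_user_id",
        "expires_at": time.time() + 10,  # reminder lasts ~10 s, then state -> cancelled
    }
    acceptance = {
        "type": "co_shoot_accept",
        "room_id": "32BF",
        "from": "participant_user_id",
    }
    print(json.dumps(invitation))
    print(json.dumps(acceptance))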
The video recording subunit 42 is configured to receive a second input for recording a video after the video connection is established. The second input indicates that the user triggered the video recording operation, for example by clicking a video recording control; taking fig. 9 as an example, the user clicks record video 96, and this click operation is the second input. In some embodiments, once the initiator clicks record video 96, no new participant can join the session until the co-shooting ends. In some embodiments, the video recording subunit 42, in response to the second input, records the initiator video and sends recording start information to the at least one participant device, so that the at least one participant device records the participant video after receiving it. In some embodiments, the video recording subunit 42 sends the recording start information to a server device (e.g., the server device 12 in fig. 1), which forwards it to the at least one participant device. In some embodiments, when the initiator exits the video co-shooting, the co-shooting is interrupted and the mic connection ends; if recording is in progress, it ends immediately and a prompt is given. If only a participant exits, the co-shooting is not interrupted and continues as long as more than one participant remains in the live room. In some embodiments, the initiator may mute other participants but cannot unmute them; participants can unmute themselves.
In some embodiments, to ensure that the initiator video and the participant videos start recording at the same time, the video recording subunit 42 may, in response to the second input, send the recording start information to the participant devices and then perform a countdown, which may be displayed in the graphical user interface 90, recording the initiator video once the countdown ends. Correspondingly, each participant device counts down after receiving the recording start information and records the participant video once the countdown ends; that is, recording of the participant video is executed automatically by the participant device, without any operation by its user. Note that the initiator device and the participant devices may synchronize their countdowns using well-known techniques.
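One common way to achieve such a simultaneous start, shown in the sketch below, is to carry a shared start timestamp in the recording start information and have every device (with clocks synchronized, e.g. over NTP) count down to it; the recorder interface is hypothetical.

    import time

    # Sketch: each device counts down to the same start timestamp carried in
    # the recording start information, so all recordings begin together.
    def start_recording_at(start_ts, recorder):
        while True:
            remaining = start_ts - time.time()
            if remaining <= 0:
                break
            print(f"recording in {remaining:.0f}s")  # countdown shown in the GUI
            time.sleep(min(remaining, 0.2))
        recorder.record_local()  # hypothetical local capture call

    # Initiator side: pick a start time a few seconds out and send it to everyone.
    shared_start = time.time() + 3.0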
In some embodiments, the video recording subunit 42 may acquire image data captured by an image sensor and audio data captured by an audio sensor, and record the initiator video based on that image data and audio data. The video recording subunit 42, the image sensor and the audio sensor belong to the same client device; that is, the video recording subunit 42 records only the image data and audio data collected by the device's own sensors, and the recording definition can reach 720P, 1080P or higher. Because network speed is limited, the participants' video pictures transmitted over the network (e.g., the video picture of participant 922 in fig. 9) are compressed to 360P or 480P; the video recording subunit 42 therefore does not record the video shown on the device's screen, i.e., does not record the other participants' pictures displayed in the graphical user interface 90, avoiding unclear recordings. In some embodiments, the video recording subunit 42 records the initiator video with the same aspect ratio as the areas in fig. 9; for example, if the areas in fig. 9 have a 3:4 aspect ratio, the initiator video is also 3:4.
The video interaction subunit 43 is configured to obtain a participant video. In some embodiments, the video interaction subunit 43 obtains the participant video from the server device, and the participant video is recorded by the participant device and uploaded to the server device. For example, the video interaction subunit 43 sends a request message for acquiring a video to the server device, and the server device receives the request message and then issues the participant video, which belongs to an active acquisition mode. In some embodiments, the video interaction subunit 43 may receive the participant video sent by the server device, where this mode belongs to a passive acquisition mode, and the server device issues the participant video to the initiator device as long as the participant video is acquired by the server device. In some embodiments, the video interaction subunit 43 may acquire the participant video in a manner of combining an active acquisition manner and a passive acquisition manner. In some embodiments, the video interaction subunit 43 may further send the initiator video to the server device after the video recording subunit 42 records the initiator video, so that other participant devices may download the initiator video from the server device and perform video composition together with the participant video.
The video composition subunit 44 is configured to synthesize a co-shot video based on the participant videos (of which there may be several) and the initiator video. In some embodiments, the dynamic pictures in the co-shot video include the dynamic picture of the initiator video and the dynamic picture of at least one participant video, and all of them share a common soundtrack. In some embodiments, the video composition subunit 44 may use audio mixing and echo cancellation operations to merge the audio into that common track, avoiding desynchronization between picture and sound; well-known techniques may be used for both operations.
The client device 11 as a participant device
When the current identity state is participant, the cloud shooting unit 40 completes the video co-shooting in the participant's mode; the functions of its sub-units are then as follows:
The multi-person connection subunit 41 may receive invitation information for a video co-shooting, the invitation information being generated by the initiator device. In some embodiments, the invitation information is sent by the initiator device or by a first participant device; that is, any participant device may forward the invitation information to other participant devices after receiving it. In some embodiments, any participant can share the link of the video co-shooting, for example by private message inside the client, by WeChat sharing, or by copying a password link. In some embodiments, the multi-person connection subunit 41 may send the invitation information to a second participant device to invite it to join the video co-shooting. In some embodiments, the invitation information may include the identification information of the video co-shooting, and a user can actively enter the identification information to join the corresponding video co-shooting.
In some embodiments, the multi-person connection subunit 41 detects a first input for the invitation information. The first input indicates that the user accepted the invitation, for example by clicking the accept-invitation control; this click operation is the first input. In some embodiments, when the first input indicates acceptance, the multi-person connection subunit 41 sends acceptance information to the initiator device and establishes a video connection with it.
In some embodiments, the multi-person connection subunit 41 receives a second input for the identification information. The second input indicates that the user actively enters the identification information: for example, in fig. 8, after clicking cloud shot 813, the user enters the identification information in the popped-up cloud shot interface 82 (no input box is shown in interface 82, but one may be provided in an actual implementation) and clicks participate 822. In some embodiments, the multi-person connection subunit 41 compares the second input with the identification information in the invitation information and, if they match, sends acceptance information to the initiator device and establishes a video connection with it.
In some embodiments, after sending the acceptance information to the initiator device, the multi-person connection subunit 41 generates a graphical user interface for establishing the video connection, which includes a plurality of areas for displaying the video picture of the initiator device or of a participant device. The video picture of the initiator device is the dynamic image formed from image data collected by the initiator device's image sensor; the video picture of a participant device is the dynamic image formed from image data collected by that participant device's image sensor. In some embodiments, this graphical user interface looks like fig. 9, except that, since this embodiment targets a participant device, record video 96 does not exist, or is replaced by an exit co-shooting control that lets the user leave the co-shooting after clicking it. In some embodiments, the multi-person connection subunit 41 displays the video picture of the local participant device in a blank (free) area, for example area 93.
The video recording subunit 42 receives recording start information, generated by the initiator device, after the video connection is established. In some embodiments, the recording start information is sent by the initiator device to a server device, which forwards it. In some embodiments, the video recording subunit 42 records the participant video after receiving the recording start information. In some embodiments, to ensure that the initiator video and the participant video start recording simultaneously, the video recording subunit 42 performs a countdown after receiving the recording start information and records the participant video once the countdown ends; that is, recording of the participant video is executed automatically by the video recording subunit 42, without any operation by the participant device's user. Note that the initiator device and the participant devices may synchronize their countdowns using well-known techniques.
In some embodiments, the video recording subunit 42 may acquire image data captured by an image sensor and audio data captured by an audio sensor, and record the participant video based on that image data and audio data. The video recording subunit 42, the image sensor and the audio sensor belong to the same client device; that is, the video recording subunit 42 records only the image data and audio data collected by the device's own sensors, and the recording definition can reach 720P, 1080P or higher. Because network speed is limited, the video pictures of the initiator and the other participants transmitted over the network are compressed to 360P or 480P and have low definition; the video recording subunit 42 therefore does not record the video shown on the device's screen, i.e., does not record the pictures of the initiator (911) and the other participants (participant 922) displayed in the graphical user interface 90, avoiding unclear recordings.
The video interaction subunit 43 is configured to send the generated participant video to the initiator device, so that the initiator device synthesizes a co-shot video based on the participant video and the initiator video. In some embodiments, the video interaction subunit 43 sends the generated participant video to the server device, so that the server device sends the participant video to the initiator device or other participant devices. In some embodiments, the video interaction subunit 43 may obtain the originator video. In some embodiments, the video interaction subunit 43 may obtain videos generated by other participant devices. It will be appreciated that the manner in which the video interaction subunit 43 obtains the initiator video or the video of the other participant devices may be an active obtaining manner, a passive obtaining manner, or a combination thereof.
The video composition subunit 44 may synthesize a co-shot video based on the acquired initiator video and the generated participant video, or based on the acquired initiator video and all participant videos. In some embodiments, the dynamic pictures in the co-shot video generated by the video composition subunit 44 include the dynamic picture of the initiator video and the dynamic picture of at least one participant video, all sharing a common soundtrack. In some embodiments, the video composition subunit 44 may use audio mixing and echo cancellation operations to merge the audio into that common track, avoiding desynchronization between picture and sound; well-known techniques may be used for both operations.
In some embodiments, the division of sub-units in the cloud shooting unit 40 is only a logical functional division; other divisions are possible in actual implementations. For example, at least two of the multi-person connection subunit 41, the video recording subunit 42, the video interaction subunit 43 and the video composition subunit 44 may be implemented as one unit, or any of them may be divided into multiple sub-units. It will be understood that the sub-units can be implemented in electronic hardware or in a combination of computer software and electronic hardware; whether such functionality is implemented as hardware or software depends upon the particular application and design constraints. Skilled artisans may implement the described functionality in varying ways for each particular application.
The client device 11 editing a co-shot video
In some embodiments, the client device 11 shown in fig. 1 is capable of editing videos, such as co-shot videos. The client device 11 may include an acquisition module, a segmentation module, a selection module and a generation module.
The acquisition module is used to acquire a first co-shot video, for example from the server device 12. The first co-shot video can be understood as the co-shot video to be edited. In some embodiments, the first co-shot video is synthesized based on an initiator video recorded by an initiator device and a participant video recorded by at least one participant device, with the multiple videos sharing a soundtrack; the initiator device establishes a video connection with the at least one participant device, and the at least one participant device records the participant video after receiving the recording start information sent by the initiator device. The synthesis of the first co-shot video is described in the foregoing embodiments and is not repeated here. In some embodiments, the client device 11 may provide the user with a control for downloading the co-shot video: as shown in fig. 13, a download co-shot video control 131 is displayed on a page presented by the client device 11 (e.g., a Vlog assistant page), and in response to an input on it (e.g., the user clicks it), the acquisition module acquires the first co-shot video from the server device 12.
The segmentation module is used for synchronously segmenting the multiple videos included in the first co-shot video. For example, if the first co-shot video comprises 4 videos, the segmentation module can segment them all at the same time without segmenting each video separately, improving the efficiency of the segmentation operation. In some embodiments, the client device 11 may provide the user with a video clipping control; as shown in fig. 13, a clip co-shot video control 132 is displayed on a page presented by the client device 11 (e.g., the Vlog assistant page), and in response to an input on the clip co-shot video control 132, for example the user clicking it, the client device 11 presents the primary page of the video clip. The primary page, as shown in fig. 14, includes a preview area 141 and an operation area 142. The preview area 141 is used to display the picture composition of the first co-shot video; the first co-shot video in fig. 14 includes 4 videos, so there are 4 video pictures. The operation area 142 is used to display a thumbnail 143 of the first co-shot video and a primary taskbar 144.
In some embodiments, a video control is included in the primary taskbar 144. The video control is used to trigger clipping of the first co-shot video. In some embodiments, audio controls, text controls, picture-in-picture controls, and the like for editing video may also be included in the primary taskbar. When the user clicks the audio control, audio such as music and sound effects can be added to the first co-shot video; when the user clicks the text control, text such as subtitles can be added; when the user clicks the picture-in-picture control, visual information such as animated effects and pictures can be added.
In some embodiments, the segmentation module presents the secondary page in response to an input on the video control in the primary taskbar 144, for example the user clicking the video control. In some embodiments, the segmentation module presents the secondary page in response to a selection operation on the thumbnail of the first co-shot video, for example the user clicking that thumbnail. The secondary page, as shown in fig. 15, includes a preview area 151 and an operation area 152. The preview area 151 is used to display the picture composition of the first co-shot video. The operation area 152 is used to display a thumbnail 153 of the first co-shot video and a secondary taskbar 154. In some embodiments, the operation area 152 may also display a thumbnail 155 of a normal video, i.e., a video with only one shot; the normal video and the first co-shot video are peer entities in the clip. When the user clicks the thumbnail 155 of the normal video, the normal video can be edited.
In some embodiments, a segmentation control is included in the secondary taskbar 154. In response to an input on the segmentation control, for example the user clicking it, the segmentation module synchronously segments the multiple videos included in the first co-shot video; for example, the white vertical line in fig. 15 is a split line, and as can be seen the split line synchronously segments the whole first co-shot video rather than segmenting each constituent video individually. In some embodiments, a selection control (e.g., a cloud shoot selection control), a volume control, and a delete control are also included in the secondary taskbar 154. By clicking the cloud shoot selection control, the user can individually select each video included in the first co-shot video. In some embodiments, for the multiple segments of the first co-shot video, the user may click the thumbnail of any one segment (which selects that segment) and then click the cloud shoot selection control to make a separate selection of each video within that segment. The user may set the volume level by clicking the volume control. By clicking the delete control, the user may delete videos that are not to be edited, such as the first co-shot video or the normal video.
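The synchronized split just described can be sketched as follows. This is a hedged illustration only: the data structures and names (`CoShotVideo`, `Segment`, `split_synchronously`) are not from the patent, which only specifies that one split point applies to every constituent video at once rather than to each video individually.

```python
from dataclasses import dataclass, field

@dataclass
class Segment:
    start: float  # seconds
    end: float

@dataclass
class CoShotVideo:
    track_count: int                     # e.g. 4 constituent videos
    segments: list = field(default_factory=list)

def split_synchronously(video: CoShotVideo, at: float) -> None:
    """Split every track of the co-shot video at the same timestamp."""
    new_segments = []
    for seg in video.segments:
        if seg.start < at < seg.end:
            new_segments += [Segment(seg.start, at), Segment(at, seg.end)]
        else:
            new_segments.append(seg)
    video.segments = new_segments        # one split, applied to all tracks equally

video = CoShotVideo(track_count=4, segments=[Segment(0.0, 12.0)])
split_synchronously(video, at=5.0)       # both halves still contain all 4 tracks
print(video.segments)                    # [Segment(0.0, 5.0), Segment(5.0, 12.0)]
```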
The selection module is used for individually selecting each video included in the first co-shot video. In some embodiments, the selection module responds to an input on the cloud shoot selection control in the secondary taskbar 154, for example the user clicking it, by presenting the tertiary page. The tertiary page, shown in fig. 16, includes a preview area 161 and an operation area 162. The preview area 161 is used to display the picture composition of the first co-shot video. The operation area 162 is used to display a thumbnail of each video included in the first co-shot video and to set each thumbnail to a selectable state. The selection module responds to a selection operation on a thumbnail, for example the user clicking the thumbnail of any video, by setting the corresponding video to a selected state.
For example, in fig. 16 the operation area 162 displays thumbnails of 4 videos side by side; the 4 videos together constitute the split segment of the first co-shot video, and all 4 are in the selected state. In fig. 16, the segment carries a slider control, a slidable control generated on the segment by the segmentation module in response to a selection operation on the segment (e.g., the user clicking the segment's thumbnail). The selection module responds to an input on the slider control, for example the user dragging it, by adjusting the duration of the segment.
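A minimal sketch of the slider behaviour, under the assumption that dragging the slider moves the segment's end point and the result is clamped to the valid range; the function name and signature are illustrative, not from the patent.

```python
def adjust_segment_duration(start: float, new_end: float, video_length: float) -> tuple:
    """Dragging the slider moves the segment's end point, clamped to a valid range."""
    return (start, max(start, min(new_end, video_length)))

print(adjust_segment_duration(0.0, new_end=7.5, video_length=12.0))   # (0.0, 7.5)
print(adjust_segment_duration(0.0, new_end=15.0, video_length=12.0))  # clamped to (0.0, 12.0)
```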
Figs. 17 to 19 show different states of the tertiary page. In fig. 17, the user has shortened the duration of the segment by dragging the slider control and has selected 3 videos. In fig. 18, the user has selected 2 videos. In fig. 19, the user has selected only 1 video.
The generation module is used for generating a second co-shot video based on the result of the synchronized segmentation by the segmentation module and the result of the individual selection by the selection module. For example, in fig. 17, the user clicks the publish control in the preview area 171 to publish the second co-shot video.
In some embodiments, the generation module reconstructs the picture displayed in the preview area of the tertiary page in response to the selection operation on a thumbnail. In some embodiments, the generation module displays the selected video pictures in the preview area of the tertiary page, with the pictures arranged in the same order as they were selected; clicking thumbnails in the operation area thus provides guided switching of the preview-area view layout and controls the split-screen pictures shown during playback of the co-shot video. For example, in fig. 17 the user selects 3 videos, and the generation module displays the 3 corresponding pictures in the preview area 171 in the selection order. In fig. 18 the user selects 2 videos, and the generation module displays the 2 corresponding pictures in the preview area 181 in the selection order. In fig. 19 the user selects 1 video, and the generation module displays the 1 corresponding picture in the preview area 191. In some embodiments, the generation module adjusts the position and size of the displayed pictures in the preview area of the tertiary page according to the number of videos selected by the user. For example, in fig. 17 the user selects 3 videos and the generation module displays the 3rd selected video picture centered; in fig. 18 the user selects 2 videos and the generation module adjusts their pictures to a height-adapted, vertically centered layout; in fig. 19 the user selects 1 video and the generation module fills the preview area of the tertiary page with that video's picture.
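The layout logic above can be sketched as follows, assuming the preview area lays pictures out in selection order and repositions/resizes them when the selection count changes. The concrete geometry (centering the 3rd picture, vertical centering for 2, full-bleed for 1, a grid for 4) follows the figures loosely and is illustrative only.

```python
def layout_preview(selected: list, width: int, height: int) -> list:
    """Return (video_id, x, y, w, h) rectangles for up to 4 selected videos."""
    n = len(selected)
    if n == 1:                               # fig. 19: one video fills the preview area
        return [(selected[0], 0, 0, width, height)]
    w, h = width // 2, height // 2
    if n == 2:                               # fig. 18: side by side, vertically centered
        return [(vid, i * w, (height - h) // 2, w, h) for i, vid in enumerate(selected)]
    if n == 3:                               # fig. 17: two on top, the 3rd centered below
        return [(selected[0], 0, 0, w, h), (selected[1], w, 0, w, h),
                (selected[2], (width - w) // 2, h, w, h)]
    # 4 videos: a 2 x 2 grid, as in the four-picture preview of fig. 14
    return [(vid, (i % 2) * w, (i // 2) * h, w, h) for i, vid in enumerate(selected)]

print(layout_preview(["a", "b", "c"], 1080, 1920))
```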
In some embodiments, the operation area of the tertiary page is further configured to display a tertiary taskbar comprising a crop picture control and a rotate picture control. For example, the operation areas 162 to 192 in figs. 16 to 19 display the tertiary taskbars 163 to 193 respectively, and each of them includes a crop picture control and a rotate picture control. In some embodiments, the generation module performs a crop operation on the corresponding video picture in the preview area of the tertiary page in response to an input on the crop picture control, for example the user clicking it and cropping the video picture in the preview area. In some embodiments, the generation module rotates the corresponding video picture in the preview area of the tertiary page in response to an input on the rotate picture control, for example the user clicking it and rotating the video picture in the preview area.
In this embodiment, the segmentation and selection of the multiple videos in the co-shot video are split into two steps: synchronized segmentation of the videos on the secondary page, and individual selection of the videos on the tertiary page. In some embodiments, the videos may be segmented before being selected, or selected before being segmented. The secondary page has a control to return to the primary page, such as the left-most control in the secondary taskbar 154 in FIG. 15. The tertiary page has a control to return to the secondary page, such as the left-most control in the tertiary taskbar 163 in FIG. 16.
In some embodiments, the division of each module in the client device 11 is only one logical function division, and there may be another division manner in actual implementation, for example, at least two of the obtaining module, the dividing module, the selecting module, and the generating module may be implemented as one module; the acquisition module, the segmentation module, the selection module or the generation module may also be divided into a plurality of units. It will be understood that the various modules or units can be implemented in electronic hardware, or a combination of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application.
Fig. 5 is a schematic structural diagram of an electronic device 50 according to an embodiment of the present disclosure. In some embodiments, the electronic device 50 may be implemented as the client device 11 of fig. 1. In some embodiments, if the user initiates the video co-shoot through the electronic device 50, the electronic device 50 is the initiator device. In some embodiments, if the user participates in the video co-shoot through the electronic device 50, the electronic device 50 is a participant device.
As shown in fig. 5, the electronic device includes: at least one image sensor 51, at least one audio sensor 52, at least one processor 53, at least one memory 54, and at least one communication interface 55. The various components in the electronic device are coupled together by a bus system 56. The communication interface 55 is used for information transmission with external devices. Understandably, the bus system 56 is used to enable connective communication between these components. In addition to a data bus, the bus system 56 includes a power bus, a control bus, and a status signal bus. For clarity of illustration, however, the various buses are all labeled as the bus system 56 in fig. 5.
In some embodiments, the image sensor 51 is used to capture image data and may be, for example, a camera. The audio sensor 52 is used to capture audio data, such as the voice of the device's user, and may be, for example, a microphone or another sensor capable of capturing audio data.
It will be appreciated that the memory 54 in the present embodiment can be either volatile memory or nonvolatile memory, or can include both volatile and nonvolatile memory.
In some embodiments, memory 54 stores elements, executable units or data structures, or a subset thereof, or an expanded set thereof as follows: an operating system and an application program.
The operating system includes various system programs, such as a framework layer, a core library layer, and a driver layer, and is used for implementing various basic services and processing hardware-based tasks. The application programs, including various applications such as a Media Player and a Browser, are used to implement various application services. A program implementing the video co-shooting method (or the video clipping method) provided by the embodiments of the present disclosure (which may be understood as video co-shooting client software) may be included in the application programs.
In the embodiment of the present disclosure, the processor 53 is configured to execute the steps of the embodiments of the video co-shooting method (or the video clipping method) provided by the embodiments of the present disclosure by calling a program or instructions stored in the memory 54, specifically a program or instructions stored in an application program.
The video co-shooting method (or the video clipping method) provided by the embodiments of the present disclosure may be applied to the processor 53 or implemented by the processor 53. The processor 53 may be an integrated circuit chip having signal processing capabilities. In implementation, the steps of the above method may be performed by integrated logic circuits of hardware in the processor 53 or by instructions in the form of software. The processor 53 may be a general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
The steps of the video co-shooting method (or the video clipping method) provided by the embodiments of the present disclosure may be directly implemented by a hardware decoding processor, or implemented by a combination of hardware and software units in a decoding processor. The software units may be located in RAM, flash memory, ROM, PROM, EPROM, registers, or other storage media well known in the art. The storage medium is located in the memory 54, and the processor 53 reads the information in the memory 54 and performs the steps of the method in combination with its hardware.
Fig. 6 is an exemplary interaction diagram of a multi-person connection according to an embodiment of the disclosure. In fig. 6, after the user clicks cloud shoot 813 in fig. 8 and then clicks initiate 821, the initiator device determines to perform the cloud shoot and generates the connection interface (fig. 9).
The initiator device receives an invitation input 1, where the invitation input 1 is used to invite the participant device 1 to join the cloud shoot; for example, the user clicks the plus sign in area 93 of fig. 9 or clicks invite friends 95 and selects a friend to join the cloud shoot. The video picture displayed in area 92 of fig. 9 indicates that a remote video connection has already been established with the initiator device.
The initiator device sends invitation information 1 to the server device. The server device issues the invitation information 1 to the participant device 1. After the user of the participant device 1 accepts the invitation, the participant device 1 transmits acceptance information 1 to the server device. The server device sends the acceptance information 1 to the initiator device. At this point, the initiator device has established a remote video connection with the participant device 1. The initiator device displays the video picture of the participant device 1 in the connection interface, for example in area 93 of fig. 9.
The initiator device receives an invitation input 2, where the invitation input 2 is used to invite the participant device 2 to join the cloud shoot; for example, the user clicks the plus sign in area 94 of fig. 9 or clicks invite friends 95 and selects a friend to join the cloud shoot.
The initiator device sends invitation information 2 to the server device. The server device issues the invitation information 2 to the participant device 2. After the user of the participant device 2 accepts the invitation, the participant device 2 sends acceptance information 2 to the server device. The server device sends the acceptance information 2 to the initiator device. At this point, the initiator device has established a remote video connection with the participant device 2. The initiator device displays the video picture of the participant device 2 in the connection interface, for example in area 94 of fig. 9.
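A hedged sketch of the relay pattern in fig. 6: the server device merely forwards invitation and acceptance messages between initiator and participants. The message fields and the in-memory inboxes are assumptions for illustration; a real deployment would use persistent connections or push notifications.

```python
import queue

class RelayServer:
    """Stands in for the server device: one inbox per registered device."""
    def __init__(self):
        self.inboxes = {}

    def register(self, device_id: str) -> None:
        self.inboxes[device_id] = queue.Queue()

    def send(self, to_device: str, message: dict) -> None:
        self.inboxes[to_device].put(message)    # issue/deliver the message

server = RelayServer()
for dev in ("initiator", "participant1"):
    server.register(dev)

# Invitation input 1 on the initiator becomes invitation information 1.
server.send("participant1", {"type": "invite", "session": "coshoot-001", "from": "initiator"})
invite = server.inboxes["participant1"].get()

# Participant 1 accepts; the server relays acceptance information 1 back.
server.send("initiator", {"type": "accept", "session": invite["session"], "from": "participant1"})
print(server.inboxes["initiator"].get())  # the remote video connection can now be established
```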
Fig. 7 is an exemplary interaction diagram of video recording and composition provided by an embodiment of the disclosure. In fig. 7, the user of a participant device clicks record video 96, and the participant device determines to record video.
The participant device sends a recording instruction (i.e., recording start information) to the server device. The server device transmits the recording instruction to all participant devices. The initiator device and all participant devices begin recording video at the same time.
After the initiator device and all the participant devices finish recording their videos, each uploads its video to the server device. The server device then distributes all the videos to the initiator device and the participant devices. It should be noted that the server device does not send a video back to the device that uploaded it; for example, after the initiator device uploads the initiator video, the server device does not send the initiator video back to the initiator device.
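The distribution rule above (every video goes to every device except its uploader) can be sketched as follows; the data structures are illustrative only.

```python
def distribute_videos(uploads: dict) -> dict:
    """Map each device to the videos it should receive, keyed by uploader."""
    deliveries = {dev: {} for dev in uploads}
    for uploader, video in uploads.items():
        for receiver in uploads:
            if receiver != uploader:          # never echo a video back to its uploader
                deliveries[receiver][uploader] = video
    return deliveries

uploads = {"initiator": b"vid-i", "participant1": b"vid-p1", "participant2": b"vid-p2"}
out = distribute_videos(uploads)
print(sorted(out["initiator"]))   # ['participant1', 'participant2'] -- no 'initiator'
```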
Fig. 10 is an exemplary flowchart of a video co-shooting method in accordance with an embodiment of the present disclosure. The execution subject of the method is the initiator device, and for convenience of description, the following embodiments describe the flow of the video co-shooting method with the initiator device as the execution subject.
In step 101, the initiator device receives a first input of a video co-shoot. The first input indicates that the user initiates the video co-shoot; for example, the user clicks the cloud shoot control and then the initiation control. Taking fig. 8 as an example, after clicking cloud shoot 813 the user clicks initiate 821, and these two click operations constitute the first input.
In step 102, the initiator device establishes a video connection with at least one participant device in response to the first input.
In some embodiments, the initiator device generates a graphical user interface for establishing a video connection in response to the first input; a third input inviting a participant may then be received, whereupon invitation information is sent to the corresponding participant device in response to the third input. After the participant confirms joining the video co-shoot, the participant device feeds back acceptance information. The initiator device responds to the acceptance information by displaying the video picture of the participant device in the graphical user interface. At this point, the remote video connection between the initiator device and the invited participant device is complete.
In some embodiments, the graphical user interface includes a plurality of regions for displaying the video of the initiator device or the video of a participant device, the video of a newly joined participant being displayed in a blank (free) region. In some embodiments, an invitation control is also included in the graphical user interface, and the initiator invites other participants to join the video co-shoot by clicking the invitation control. In some embodiments, the invitation control may be provided separately, or in each of the regions, or both. In some embodiments, the third input inviting a participant represents the initiator inviting other participants to join the video co-shoot. In some embodiments, the invitation information includes identification information of the video co-shoot, where the identification information represents a unique identifier of the video co-shoot and differs for each co-shoot session.
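A sketch of the invitation information, assuming the unique per-session identifier is generated with a UUID; the patent only requires that the identifier differ for each co-shoot session, so the generation scheme and field names here are assumptions.

```python
import uuid

def new_session_id() -> str:
    """Unique identifier of one co-shoot session; differs for every session."""
    return str(uuid.uuid4())

def make_invitation(session_id: str, initiator_id: str, participant_id: str) -> dict:
    return {"type": "invite", "session_id": session_id,
            "from": initiator_id, "to": participant_id}

sid = new_session_id()
inv1 = make_invitation(sid, "initiator", "participant1")
inv2 = make_invitation(sid, "initiator", "participant2")
assert inv1["session_id"] == inv2["session_id"]  # same co-shoot session, same identifier
assert new_session_id() != sid                   # a new session gets a new identifier
```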
In step 103, the initiator device receives a second input to record video after the video connection is established. The second input indicates that the user has triggered the video recording operation.
In step 104, the initiator device records the initiator video in response to the second input, and sends recording start information to the at least one participant device, so that the at least one participant device records the participant video after receiving the recording start information. In some embodiments, the initiator device sends recording start information to the server device, so that the server device issues the recording start information to the at least one participant device.
In some embodiments, in order to ensure that the initiator video and the participant videos start recording at the same time, the initiator device counts down after sending the recording start information to the participant devices in response to the second input, and records the initiator video after the countdown ends. Accordingly, a participant device counts down after receiving the recording start information and records the participant video after the countdown ends; that is, recording of the participant video is executed automatically by the participant device without any operation by its user. It should be noted that the initiator device and the participant devices may synchronize their countdowns, and the synchronization may use well-known techniques.
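A hedged sketch of the countdown synchronization: rather than starting on receipt of the message (network delay would skew the devices), every device counts down to a shared start instant and only then begins recording. Clock synchronization itself is the well-known technique the patent defers to (e.g., NTP) and is simply assumed here; all names are illustrative.

```python
import time
from typing import Callable

COUNTDOWN_SECONDS = 3

def recording_start_info() -> dict:
    """Built by the initiator when the second input is received."""
    return {"type": "start_recording", "start_at": time.time() + COUNTDOWN_SECONDS}

def wait_and_record(info: dict, record_fn: Callable[[], None]) -> None:
    """Run on every device (initiator and participants) on receipt of the message."""
    delay = info["start_at"] - time.time()
    if delay > 0:
        time.sleep(delay)   # the countdown; no user action is needed
    record_fn()             # all devices start at (nearly) the same instant

wait_and_record(recording_start_info(), lambda: print("recording started"))
```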
In some embodiments, the initiator device may acquire image data collected by the image sensor and audio data collected by the audio sensor, and record the initiator video based on the image data and the audio data, rather than recording the screen of the initiator device.
In step 105, the initiator device acquires the participant videos and synthesizes a co-shot video based on the participant videos and the initiator video. There may be multiple participant videos. In some embodiments, the initiator device obtains a participant video from the server device, the participant video having been recorded by the participant device and uploaded to the server device; for example, the initiator device sends a request message for obtaining the video to the server device, and the server device issues the participant video after receiving the request message. This is the active acquisition mode. In some embodiments, the initiator device may receive the participant video sent by the server device; this is the passive acquisition mode, in which the server device issues the participant video to the initiator device as soon as it has received it. In some embodiments, the initiator device may acquire the participant videos using a combination of the active and passive acquisition modes.
In some embodiments, the initiator device may also send the initiator video to the server device after recording the initiator video, so that other participant devices may download the initiator video from the server device and perform video composition with the participant video.
In some embodiments, the dynamic pictures in the co-shot video include a dynamic picture of the initiator video and a dynamic picture of at least one participant video, and the dynamic pictures in the co-shot video share a common audio track. In some embodiments, the initiator device may use a mixing operation and an echo cancellation operation to produce the common audio track, avoiding audio-video desynchronization. The mixing operation and the echo cancellation operation may use well-known techniques.
In some embodiments, composition operations that the initiator device may perform include, but are not limited to, segmentation, selection, cropping, and rotation. In some embodiments, the segmentation is performed simultaneously on the initiator video and all participant videos, not separately on each video, and includes both video segmentation and audio segmentation. In some embodiments, the initiator device adjusts the order of the initiator video and all participant videos according to the order in which the user selected them, marking selected videos as selected and unselected videos as unselected. In some embodiments, the selection is performed after the segmentation is complete. In some embodiments, after detecting that the user clicks a certain video to play, the initiator device enables the crop and rotation operations so that the user can crop or rotate the video picture.
Fig. 11 is an exemplary flowchart of a video co-shooting method in accordance with an embodiment of the present disclosure. The execution subject of the method is a participant device, and for convenience of description, the following embodiments describe the flow of the video co-shooting method with the participant device as the execution subject.
In step 111, the participant device receives invitation information for a video co-shoot, the invitation information being generated by the initiator device. In some embodiments, the invitation information is sent by the initiator device or by a first participant device; that is, any participant device may forward the invitation information to other participant devices after receiving it. In some embodiments, the participant device may send the invitation information to a second participant device to invite it to join the video co-shoot. In some embodiments, the invitation information may include identification information of the video co-shoot. The user can also actively input the identification information and thereby join the co-shoot session corresponding to it.
In step 112, the participant device detects a first input for the invitation information. The first input indicates that the user accepts the invitation; for example, the user clicks the accept-invitation control, and this click operation is the first input. In some embodiments, the participant device receives a second input for the identification information, indicating that the user actively entered the identification information.
In step 113, the participant device sends acceptance information to the initiator device based on the first input being an acceptance of the invitation, and establishes a video connection with the initiator device. In some embodiments, the participant device compares the second input with the identification information in the invitation information, and if they match, sends acceptance information to the initiator device and establishes a video connection with the initiator device.
In some embodiments, after sending the acceptance information to the initiator device, the participant device generates a graphical user interface for establishing the video connection and then displays its own video picture in the graphical user interface.
In some embodiments, the graphical user interface includes a plurality of regions for displaying a video frame of the initiator device or a video frame of the participant device. The video picture of the initiator device is a dynamic image formed by image data collected by an image sensor of the initiator device; the video frame of the participant device is a dynamic image formed by image data collected by an image sensor of the participant device. In some embodiments, the video pictures of the present device are displayed in a blank (free) area.
In step 114, after establishing the video connection, the participant device receives recording start information generated by the initiator device and records a participant video. In some embodiments, the recording start information is sent to a server device by the initiator device, and is issued by the server device. In some embodiments, in order to ensure that the initiator video and the participant video start to be recorded simultaneously, the participant device performs countdown after receiving the recording start information, and records the participant video after the countdown is finished, that is, the recording of the participant video is performed automatically by the participant device without any operation by the user. It should be noted that the initiator device and the participant device may synchronize for countdown, and the synchronization may be performed by using a well-known technique.
In some embodiments, a participant device acquires image data collected by its image sensor and audio data collected by its audio sensor, and records the participant video based on the image data and the audio data, rather than recording the screen of the participant device.
In step 115, the participant device sends the participant video to the initiator device to cause the initiator device to compose a co-shot video based on the participant video and the initiator video. In some embodiments, the participant device sends the generated participant video to the server device to cause the server device to send the participant video to the initiator device or other participant devices.
In some embodiments, the participant device may obtain the initiator video. In some embodiments, participant devices may obtain videos generated by other participant devices. It will be appreciated that the manner in which the participant device obtains the initiator video or the video of the other participant devices may be an active acquisition manner, a passive acquisition manner, or a combination of both.
In some embodiments, the participant device may synthesize a co-shot video based on the acquired initiator video and the generated participant video. In some embodiments, the participant device may synthesize a co-shot video based on the acquired initiator video and all participant videos.
In some embodiments, the dynamic pictures in the co-shot video generated by the participant device include a dynamic picture of the initiator video and a dynamic picture of at least one participant video, and the dynamic pictures in the co-shot video share a common audio track. In some embodiments, the participant device may use a mixing operation and an echo cancellation operation to produce the common audio track, avoiding audio-video desynchronization. The mixing operation and the echo cancellation operation may use well-known techniques.
Fig. 12 is an exemplary flowchart of a method for video clipping provided by an embodiment of the present disclosure. The execution subject of the method is the client device, and for convenience of description, the flow of the method for video clipping is described with the client device as the execution subject in the following embodiments.
In step 121, the client device acquires a first co-shot video, for example from a server device. The first co-shot video can be understood as a co-shot video to be clipped. The first co-shot video is synthesized based on an initiator video recorded by an initiator device and a participant video recorded by at least one participant device, and the multiple videos share an audio track; the initiator device establishes a video connection with the at least one participant device, and the at least one participant device records the participant video after receiving the recording start information sent by the initiator device. For the synthesis of the first co-shot video, reference may be made to the foregoing embodiments, which are not repeated here. In some embodiments, the client device may provide the user with a control for downloading the co-shot video; as shown in fig. 13, a download co-shot video control 131 is displayed on a page presented by the client device (e.g., a Vlog assistant page), and the first co-shot video is acquired from the server device in response to an input on the download co-shot video control 131, for example the user clicking it.
In step 122, the client device synchronously segments the multiple videos included in the first co-shot video. For example, if the first co-shot video includes 4 videos, the client device can segment them all at the same time without segmenting each video separately, improving the efficiency of the segmentation operation. In some embodiments, the client device may provide the user with a video clipping control; as shown in fig. 13, a clip co-shot video control 132 is displayed on a page presented by the client device (e.g., the Vlog assistant page), and in response to an input on the clip co-shot video control 132, for example the user clicking it, the client device presents the primary page of the video clip. The primary page, as shown in fig. 14, includes a preview area 141 and an operation area 142. The preview area 141 is used to display the picture composition of the first co-shot video; the first co-shot video in fig. 14 includes 4 videos, so there are 4 video pictures. The operation area 142 is used to display a thumbnail 143 of the first co-shot video and a primary taskbar 144.
In some embodiments, a video control is included in the primary taskbar 144. The video control is used to trigger clipping of the first co-shot video. In some embodiments, audio controls, text controls, picture-in-picture controls, and the like for editing video may also be included in the primary taskbar. When the user clicks the audio control, audio such as music and sound effects can be added to the first co-shot video; when the user clicks the text control, text such as subtitles can be added; when the user clicks the picture-in-picture control, visual information such as animated effects and pictures can be added.
In some embodiments, the client device presents the secondary page in response to an input on the video control in the primary taskbar 144, for example the user clicking the video control. In some embodiments, the client device presents the secondary page in response to a selection operation on the thumbnail of the first co-shot video, for example the user clicking that thumbnail. The secondary page, as shown in fig. 15, includes a preview area 151 and an operation area 152. The preview area 151 is used to display the picture composition of the first co-shot video. The operation area 152 is used to display a thumbnail 153 of the first co-shot video and a secondary taskbar 154. In some embodiments, the operation area 152 may also display a thumbnail 155 of a normal video, i.e., a video with only one shot; the normal video and the first co-shot video are peer entities in the clip. When the user clicks the thumbnail 155 of the normal video, the normal video can be edited.
In some embodiments, a segmentation control is included in the secondary taskbar 154. In response to an input on the segmentation control, for example the user clicking it, the client device synchronously segments the multiple videos included in the first co-shot video; for example, the white vertical line in fig. 15 is a split line, and as can be seen the split line synchronously segments the whole first co-shot video rather than segmenting each constituent video individually. In some embodiments, a selection control (e.g., a cloud shoot selection control), a volume control, and a delete control are also included in the secondary taskbar 154. By clicking the cloud shoot selection control, the user can individually select each video included in the first co-shot video. In some embodiments, for the multiple segments of the first co-shot video, the user may click the thumbnail of any one segment (which selects that segment) and then click the cloud shoot selection control to make a separate selection of each video within that segment. The user may set the volume level by clicking the volume control. By clicking the delete control, the user may delete videos that are not to be edited, such as the first co-shot video or the normal video.
In step 123, the client device individually selects each video included in the first co-shot video. In some embodiments, the client device responds to an input on the cloud shoot selection control in the secondary taskbar 154, for example the user clicking it, by presenting the tertiary page. The tertiary page, shown in fig. 16, includes a preview area 161 and an operation area 162. The preview area 161 is used to display the picture composition of the first co-shot video. The operation area 162 is used to display a thumbnail of each video included in the first co-shot video and to set each thumbnail to a selectable state. The client device responds to a selection operation on a thumbnail, for example the user clicking the thumbnail of any video, by setting the corresponding video to a selected state.
For example, in fig. 16 the operation area 162 displays thumbnails of 4 videos side by side; the 4 videos together constitute the split segment of the first co-shot video, and all 4 are in the selected state. In fig. 16, the segment carries a slider control, a slidable control generated on the segment by the client device in response to a selection operation on the segment (e.g., the user clicking the segment's thumbnail). The client device responds to an input on the slider control, for example the user dragging it, by adjusting the duration of the segment.
In step 124, the client device generates a second co-shot video based on the results of the synchronized segmentation and of the individual selection. For example, in fig. 17, the user clicks the publish control in the preview area 171 to publish the second co-shot video.
In some embodiments, the client device reconstructs the picture displayed in the preview area of the tertiary page in response to the selection operation on a thumbnail. In some embodiments, the client device displays the selected video pictures in the preview area of the tertiary page, with the pictures arranged in the same order as they were selected; clicking thumbnails in the operation area thus provides guided switching of the preview-area view layout and controls the split-screen pictures shown during playback of the co-shot video. For example, in fig. 17 the user selects 3 videos, and the client device displays the 3 corresponding pictures in the preview area 171 in the selection order. In fig. 18 the user selects 2 videos, and the client device displays the 2 corresponding pictures in the preview area 181 in the selection order. In fig. 19 the user selects 1 video, and the client device displays the 1 corresponding picture in the preview area 191. In some embodiments, the client device adjusts the position and size of the displayed pictures in the preview area of the tertiary page according to the number of videos selected by the user. For example, in fig. 17 the user selects 3 videos and the client device displays the 3rd selected video picture centered; in fig. 18 the user selects 2 videos and the client device adjusts their pictures to a height-adapted, vertically centered layout; in fig. 19 the user selects 1 video and the client device fills the preview area of the tertiary page with that video's picture.
In some embodiments, the operation area of the tertiary page is further configured to display a tertiary taskbar comprising a crop picture control and a rotate picture control. For example, the operation areas 162 to 192 in figs. 16 to 19 display the tertiary taskbars 163 to 193 respectively, and each of them includes a crop picture control and a rotate picture control. In some embodiments, the client device performs a crop operation on the corresponding video picture in the preview area of the tertiary page in response to an input on the crop picture control, for example the user clicking it and cropping the video picture in the preview area. In some embodiments, the client device rotates the corresponding video picture in the preview area of the tertiary page in response to an input on the rotate picture control, for example the user clicking it and rotating the video picture in the preview area.
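An illustrative sketch of the crop and rotate operations on a preview picture, using a numpy array to stand in for a video frame (height x width x channels). A real implementation would apply the same transform to every frame of the selected video; the function and parameter names are assumptions, not from the patent.

```python
import numpy as np

def crop_frame(frame: np.ndarray, x: int, y: int, w: int, h: int) -> np.ndarray:
    """Crop the rectangle (x, y, w, h) out of the frame, as the crop picture control does."""
    return frame[y:y + h, x:x + w]

def rotate_frame(frame: np.ndarray, quarter_turns: int = 1) -> np.ndarray:
    """Rotate the frame by 90 degrees per quarter turn, as the rotate picture control does."""
    return np.rot90(frame, k=quarter_turns)

frame = np.zeros((1920, 1080, 3), dtype=np.uint8)     # a portrait frame
print(crop_frame(frame, 0, 420, 1080, 1080).shape)    # (1080, 1080, 3): a square crop
print(rotate_frame(frame).shape)                      # (1080, 1920, 3): now landscape
```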
With the video clipping method disclosed by this embodiment, the segmentation and selection of the multiple videos in the co-shot video are split into two steps: synchronized segmentation of the videos on the secondary page, and individual selection of the videos on the tertiary page. In some embodiments, the videos may be segmented before being selected, or selected before being segmented. The secondary page has a control to return to the primary page, such as the left-most control in the secondary taskbar 154 in FIG. 15. The tertiary page has a control to return to the secondary page, such as the left-most control in the tertiary taskbar 163 in FIG. 16.
It is noted that, for simplicity of description, the foregoing method embodiments are described as a series of acts or combination of acts, but those skilled in the art will appreciate that the disclosed embodiments are not limited by the order of acts described, as some steps may occur in other orders or concurrently with other steps in accordance with the disclosed embodiments. In addition, those skilled in the art can appreciate that the embodiments described in the specification all belong to alternative embodiments.
Embodiments of the present disclosure also provide a non-transitory computer-readable storage medium storing a program or instructions, where the program or instructions cause a computer to perform the steps of the embodiments of the video co-shooting method (or the video clipping method); to avoid repetition, those steps are not described again here.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
Those skilled in the art will appreciate that while some embodiments described herein include some features that are included in other embodiments and not others, combinations of features of different embodiments are meant to be within the scope of the disclosure and to form further embodiments.
Those skilled in the art will appreciate that the description of each embodiment has a respective emphasis, and reference may be made to the related description of other embodiments for those parts of an embodiment that are not described in detail.
Although the embodiments of the present disclosure have been described in conjunction with the accompanying drawings, those skilled in the art may make various modifications and variations without departing from the spirit and scope of the present disclosure, and such modifications and variations fall within the scope defined by the appended claims.

Claims (34)

1. A method of video co-shooting, applied to an initiator device, the method comprising:
receiving a first input of a video co-shoot;
establishing a video connection with at least one participant device in response to the first input;
after the video connection is established, receiving a second input for recording video;
responding to the second input, recording an initiator video, and sending recording start information to the at least one participant device, so that the at least one participant device records the participant video after receiving the recording start information;
wherein the initiator video and the participant video are used to synthesize a co-shot video.
2. The method of claim 1, wherein establishing a video connection with at least one participant device in response to the first input comprises:
responding to the first input, generating a graphical user interface for establishing video connection, wherein the graphical user interface comprises a plurality of areas, and the areas are used for displaying a video picture of initiator equipment or a video picture of participant equipment;
receiving a third input inviting the participant;
sending invitation information to the corresponding participant device in response to the third input;
and responding to the acceptance information sent by the corresponding participant device, and displaying a video picture of the corresponding participant device in the area.
3. The method according to claim 2, wherein the invitation information includes identification information of the video co-shoot;
the identification information is information generated in response to the first input, or information generated while the graphical user interface is generated.
4. The method of claim 1, wherein the initiator device is provided with an image sensor and an audio sensor;
Accordingly, the recording of the initiator video includes:
acquiring image data acquired by the image sensor and audio data acquired by the audio sensor; and recording the video of the initiator based on the image data and the audio data.
5. The method of claim 1, wherein said sending recording start information to said at least one participant device comprises:
and sending recording start information to server equipment so that the server equipment sends the recording start information to the at least one participant equipment.
6. The method of claim 1, wherein the dynamic pictures in the co-shot video comprise a dynamic picture of the initiator video and a dynamic picture of at least one participant video, and wherein the dynamic pictures in the co-shot video share a common audio track.
7. The method of claim 1, wherein the co-shot video is synthesized by the initiator device or by a server device;
the initiator device synthesizing the co-shot video includes: acquiring a participant video recorded by the at least one participant device, and synthesizing the co-shot video based on the participant video and the initiator video;
the server device synthesizing the co-shot video includes: acquiring the initiator video uploaded by the initiator device and the participant video uploaded by the at least one participant device, and synthesizing the co-shot video based on the participant video and the initiator video.
8. A method of video co-shooting for use with a participant device, the method comprising:
receiving invitation information of a video co-shoot, wherein the invitation information is generated by an initiator device;
detecting a first input for the invitation information;
based on the first input being an acceptance of the invitation, sending acceptance information to the initiator device, and establishing a video connection with the initiator device;
after the video connection is established, receiving recording start information and recording a participant video, wherein the recording start information is generated by the initiator device;
wherein the initiator video and the participant video are used to synthesize a co-shot video.
9. The method of claim 8, wherein the invitation information is sent by the initiator device or by a first participant device; accordingly, the method further comprises:
sending the invitation information to a second participant device.
10. The method according to claim 8, wherein the invitation information includes identification information of the video co-shoot;
correspondingly, after receiving the invitation information of the video co-shoot, the method further comprises:
receiving a second input for the identification information;
comparing the second input with the identification information in the invitation information;
and if the comparison is consistent, sending acceptance information to the initiator equipment, and establishing video connection with the initiator equipment.
11. The method according to claim 8 or 10, wherein the sending acceptance information to the initiator device and establishing a video connection with the initiator device comprises:
after the acceptance information is sent to the initiator device, generating a graphical user interface for establishing the video connection, wherein the graphical user interface comprises a plurality of areas, and the areas are used for displaying video pictures of the initiator device or video pictures of participant devices;
displaying a video picture of the participant device in the area.
12. The method of claim 8, wherein the recording start information is sent to a server device by the initiator device and is issued by the server device.
13. The method of claim 8, wherein the participant device is provided with an image sensor and an audio sensor;
accordingly, the recording of the participant video includes:
acquiring image data acquired by the image sensor and audio data acquired by the audio sensor; recording a participant video based on the image data and the audio data.
14. The method of claim 8, wherein the dynamic pictures in the co-shot video comprise a dynamic picture of the initiator video and a dynamic picture of at least one participant video, and wherein the dynamic pictures in the co-shot video share a common audio track.
15. The method of claim 8, wherein the co-shot video is synthesized by the participant device or by a server device;
the participant device synthesizing the co-shot video includes: acquiring the initiator video recorded by the initiator device, and synthesizing the co-shot video based on the participant video and the initiator video;
the server device synthesizing the co-shot video includes: acquiring the initiator video uploaded by the initiator device and the participant video uploaded by the at least one participant device, and synthesizing the co-shot video based on the participant video and the initiator video.
16. A method of video clipping, the method comprising:
acquiring a first co-shot video, wherein the first co-shot video is synthesized based on an initiator video recorded by an initiator device and a participant video recorded by at least one participant device; the initiator device establishes a video connection with the at least one participant device, and the at least one participant device records the participant video after receiving the recording start information sent by the initiator device;
synchronously segmenting a plurality of videos included in the first co-shot video;
individually selecting each video included in the first co-shot video;
generating a second co-shot video based on a result of the synchronized segmentation and a result of the individual selection.
17. The method of claim 16, wherein after the acquiring of the first co-shot video, the method further comprises:
responding to an input of the video clip, and displaying a primary page; the primary page comprises a preview area and an operation area, the preview area is used for displaying the picture composition of the first co-shot video, and the operation area is used for displaying a thumbnail of the first co-shot video and a primary taskbar; the primary taskbar comprises a video control; and the video control is used for triggering clipping of the first co-shot video.
18. The method of claim 17, wherein the synchronously segmenting the plurality of videos included in the first co-shot video comprises:
responding to an input of the video control, and displaying a secondary page; the secondary page comprises a preview area and an operation area, the preview area is used for displaying the picture composition of the first co-shot video, and the operation area is used for displaying a thumbnail of the first co-shot video and a secondary taskbar; the secondary taskbar comprises a segmentation control;
and responding to an input of the segmentation control, and synchronously segmenting the plurality of videos included in the first co-shot video.
19. The method of claim 18, wherein after the synchronous segmentation, the method further comprises:
responding to a selection operation on a split segment, and generating a slider control on the split segment;
and responding to an input of the slider control, and adjusting the duration of the split segment.
20. The method of claim 18, wherein the secondary taskbar further comprises a selection control;
the individually selecting each video included in the first snap video comprises:
responding to the input of the selection control, and displaying a three-level page; the three-level page comprises a preview area and an operation area, the preview area is used for displaying the picture composition of the first co-shooting video, and the operation area is used for displaying the thumbnail of each video included in the first co-shooting video and setting the thumbnail of each video to be in a selectable state;
And responding to the selection operation of the thumbnail, and setting the video corresponding to the selection operation to be in a selected state.
21. The method of claim 20, further comprising: responding to the selection operation on the thumbnail, and reconstructing the picture displayed in the preview area of the tertiary page.
22. The method according to claim 21, wherein the reconstructing of the picture displayed in the preview area of the tertiary page comprises:
displaying the selected video pictures in the preview area of the tertiary page, wherein the arrangement order of the video pictures is the same as the selection order.
23. The method according to claim 21, wherein the reconstructing of the picture displayed in the preview area of the tertiary page comprises:
adjusting the position and size of the pictures displayed in the preview area of the tertiary page based on the number of selected thumbnails.
24. The method according to claim 20, wherein the operation area of the tertiary page is further configured to display a tertiary taskbar, wherein the tertiary taskbar comprises a crop picture control and a rotate picture control;
and responding to an input of the crop picture control or an input of the rotate picture control, and adjusting the picture displayed in the preview area of the tertiary page.
25. The method of claim 24, wherein adjusting the picture displayed in the preview area of the third-level page comprises:
in response to an input on the crop control, cropping the corresponding video picture in the preview area of the third-level page;
and in response to an input on the rotate control, rotating the corresponding video picture in the preview area of the third-level page.
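An illustrative model of the claim-24/25 crop and rotate operations, using hypothetical Rect and Picture types; actual rendering is outside the scope of this sketch:

```kotlin
data class Rect(val x: Int, val y: Int, val w: Int, val h: Int)
data class Picture(val crop: Rect, val rotationDeg: Int)

// An input on the crop control narrows the visible region of the video picture.
fun cropPicture(p: Picture, newCrop: Rect) = p.copy(crop = newCrop)

// An input on the rotate control turns the picture in 90-degree steps.
fun rotatePicture(p: Picture) = p.copy(rotationDeg = (p.rotationDeg + 90) % 360)

fun main() {
    val p = Picture(Rect(0, 0, 1080, 1920), 0)
    println(rotatePicture(cropPicture(p, Rect(0, 240, 1080, 1440))))
}
```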
26. A client device, wherein the client device is configured to initiate video co-shooting, the client device comprising:
a multi-person establishing subunit configured to: receive a first input for video co-shooting; and establish a video connection with at least one participant device in response to the first input;
a video recording subunit configured to: after the video connection is established, receive a second input for recording a video; and in response to the second input, record an initiator video and send recording start information to the at least one participant device, so that the at least one participant device records a participant video after receiving the recording start information;
wherein the initiator video and the participant video are used to compose a co-shot video.
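A sketch of how the claim-26 initiator subunits might cooperate, under the assumption of a hypothetical ParticipantChannel transport; none of these names come from the disclosure:

```kotlin
interface ParticipantChannel {
    fun connect(): Boolean
    fun sendRecordingStart(timestampMs: Long)
}

class InitiatorClient(private val participants: List<ParticipantChannel>) {
    // First input: establish a video connection with every invited participant.
    fun onCoShootInput(): Boolean = participants.all { it.connect() }

    // Second input: broadcast the recording start information so each
    // participant device begins recording; the local initiator-video
    // recording would start here as well (omitted).
    fun onRecordInput(nowMs: Long) = participants.forEach { it.sendRecordingStart(nowMs) }
}

fun main() {
    val fake = object : ParticipantChannel {
        override fun connect() = true
        override fun sendRecordingStart(timestampMs: Long) = println("start @ $timestampMs ms")
    }
    val initiator = InitiatorClient(listOf(fake))
    if (initiator.onCoShootInput()) initiator.onRecordInput(0)
}
```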
27. A client device, wherein the client device is configured to participate in video co-shooting, the client device comprising:
a multi-person establishing subunit configured to: receive invitation information for video co-shooting, the invitation information being generated by an initiator device; detect a first input for the invitation information; and when the first input indicates acceptance of the invitation, send acceptance information to the initiator device and establish a video connection with the initiator device;
a video recording subunit configured to: after the video connection is established, receive recording start information and record a participant video, the recording start information being generated by the initiator device;
wherein the initiator video and the participant video are used to compose a co-shot video.
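The participant side of claim 27, sketched under the same hedged assumptions (InitiatorEndpoint is a hypothetical interface): the participant records only after the initiator's start information arrives, which keeps both recordings aligned to one start event.

```kotlin
interface InitiatorEndpoint {
    fun sendAcceptance()
    fun connect()
}

class ParticipantClient(private val initiator: InitiatorEndpoint) {
    var recording = false
        private set

    // First input on the invitation: send acceptance, then connect back.
    fun onInvitationInput(accepted: Boolean) {
        if (accepted) {
            initiator.sendAcceptance()
            initiator.connect()
        }
    }

    // Participant-video recording starts only on the initiator's signal.
    fun onRecordingStart() { recording = true }
}

fun main() {
    val endpoint = object : InitiatorEndpoint {
        override fun sendAcceptance() = println("acceptance sent")
        override fun connect() = println("video connection established")
    }
    val client = ParticipantClient(endpoint)
    client.onInvitationInput(accepted = true)
    client.onRecordingStart()
    println("recording = ${client.recording}")
}
```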
28. A client device, the client device comprising:
an acquisition unit configured to acquire a first co-shot video, the first co-shot video being composed based on an initiator video recorded by an initiator device and a participant video recorded by at least one participant device, wherein the initiator device establishes a video connection with the at least one participant device, and the at least one participant device records the participant video after receiving recording start information sent by the initiator device;
a dividing unit configured to synchronously segment the plurality of videos included in the first co-shot video;
a selection unit configured to individually select each video included in the first co-shot video;
and a generating unit configured to generate a second co-shot video based on a result of the synchronous segmentation and a result of the individual selection.
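Finally, an assumed end-to-end sketch of the claim-28 editing pipeline: synchronous segmentation cuts every track at the same timestamp, and individual selection then decides which tracks enter the second co-shot video (Clip, CoShotVideo, and generateSecond are illustrative names only).

```kotlin
data class Clip(val startMs: Long, val endMs: Long)
data class CoShotVideo(val tracks: List<List<Clip>>)

fun generateSecond(first: CoShotVideo, cutMs: Long, selectedTracks: Set<Int>): CoShotVideo {
    // Dividing unit: cut every track at the same timestamp.
    val segmented = first.tracks.map { track ->
        track.flatMap { c ->
            if (cutMs > c.startMs && cutMs < c.endMs)
                listOf(c.copy(endMs = cutMs), c.copy(startMs = cutMs))
            else listOf(c)
        }
    }
    // Selection unit + generating unit: keep only the selected tracks.
    return CoShotVideo(segmented.filterIndexed { i, _ -> i in selectedTracks })
}

fun main() {
    val first = CoShotVideo(listOf(listOf(Clip(0, 8_000)), listOf(Clip(0, 8_000))))
    // Cut both tracks at 3 s and keep only track 0 in the second co-shot video.
    println(generateSecond(first, 3_000, setOf(0)))
}
```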
29. An electronic device, comprising: a processor and a memory;
wherein the processor is configured to perform the steps of the method of any one of claims 1 to 7 by calling a program or instructions stored in the memory.
30. An electronic device, comprising: a processor and a memory;
wherein the processor is configured to perform the steps of the method of any one of claims 8 to 15 by calling a program or instructions stored in the memory.
31. An electronic device, comprising: a processor and a memory;
wherein the processor is configured to perform the steps of the method of any one of claims 16 to 25 by calling a program or instructions stored in the memory.
32. A non-transitory computer-readable storage medium storing a program or instructions for causing a computer to perform the steps of the method according to any one of claims 1 to 7.
33. A non-transitory computer-readable storage medium storing a program or instructions for causing a computer to perform the steps of the method according to any one of claims 8 to 15.
34. A non-transitory computer-readable storage medium storing a program or instructions for causing a computer to perform the steps of the method according to any one of claims 16 to 25.
CN202010574700.7A 2020-06-22 2020-06-22 Video co-shooting method, video editing device and electronic equipment Pending CN111866434A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010574700.7A CN111866434A (en) 2020-06-22 2020-06-22 Video co-shooting method, video editing device and electronic equipment

Publications (1)

Publication Number Publication Date
CN111866434A (en) 2020-10-30

Family

ID=72987877

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010574700.7A Pending CN111866434A (en) 2020-06-22 2020-06-22 Video co-shooting method, video editing device and electronic equipment

Country Status (1)

Country Link
CN (1) CN111866434A (en)



Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103716537A (en) * 2013-12-18 2014-04-09 宇龙计算机通信科技(深圳)有限公司 Photograph synthesizing method and terminal
CN104125412A (en) * 2014-06-16 2014-10-29 联想(北京)有限公司 Information processing method and electronic equipment
CN104199841A (en) * 2014-08-06 2014-12-10 武汉图歌信息技术有限责任公司 Video editing method for generating animation through pictures and splicing and composing animation and video clips
CN107734257A (en) * 2017-10-25 2018-02-23 北京玩拍世界科技有限公司 One population shoots the video image pickup method and device
CN109005352A (en) * 2018-09-05 2018-12-14 传线网络科技(上海)有限公司 It is in step with the method and device of video
CN110166652A (en) * 2019-05-28 2019-08-23 成都依能科技股份有限公司 Multi-track audio-visual synchronization edit methods
CN110336968A (en) * 2019-07-17 2019-10-15 广州酷狗计算机科技有限公司 Video recording method, device, terminal device and storage medium
CN110475086A (en) * 2019-07-23 2019-11-19 咪咕动漫有限公司 Video recording method and system, server and terminal

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Baidu Jingyan (百度经验): "How to cut multiple videos in Pr at the same time? What is the shortcut key?", Baidu Jingyan *

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112533061A (en) * 2020-11-30 2021-03-19 北京意匠文枢科技有限公司 Method and equipment for collaboratively shooting and editing video
WO2022170918A1 (en) * 2021-02-09 2022-08-18 华为技术有限公司 Multi-person-capturing method and electronic device
WO2022194064A1 (en) * 2021-03-15 2022-09-22 北京字跳网络技术有限公司 Interaction method and apparatus in live streaming room, and device and storage medium
CN113473224A (en) * 2021-06-29 2021-10-01 北京达佳互联信息技术有限公司 Video processing method and device, electronic equipment and computer readable storage medium
CN113473224B (en) * 2021-06-29 2023-05-23 北京达佳互联信息技术有限公司 Video processing method, video processing device, electronic equipment and computer readable storage medium
CN113542889A (en) * 2021-07-09 2021-10-22 珠海云迈网络科技有限公司 List video playing method and device, computer equipment and storage medium thereof
CN113946254A (en) * 2021-11-01 2022-01-18 北京字跳网络技术有限公司 Content display method, device, equipment and medium
WO2023072238A1 (en) * 2021-11-01 2023-05-04 北京字跳网络技术有限公司 Content display method and apparatus, and device and medium
CN113946254B (en) * 2021-11-01 2023-10-20 北京字跳网络技术有限公司 Content display method, device, equipment and medium
CN114401368A (en) * 2022-01-24 2022-04-26 北京卡路里信息技术有限公司 Method and device for processing co-shooting video
CN114401368B (en) * 2022-01-24 2024-05-03 杭州卡路里体育有限公司 Processing method and device for simultaneous video

Similar Documents

Publication Publication Date Title
CN111866434A (en) Video co-shooting method, video editing device and electronic equipment
US10299004B2 (en) Method and system for sourcing and editing live video
US9349414B1 (en) System and method for simultaneous capture of two video streams
US9003303B2 (en) Production scripting in an online event
EP2569940B1 (en) System for novel interactions with participants in videoconference meetings
WO2013043207A1 (en) Event management/production for an online event
WO2018214746A1 (en) Video conference realization method, device and system, and computer storage medium
CN112235530B (en) Method and device for realizing teleconference, electronic device and storage medium
EP3131257B1 (en) Program, information processing apparatus, and information processing system for use in an electronic conference system
CN114025189B (en) Virtual object generation method, device, equipment and storage medium
CN112866619B (en) Teleconference control method and device, electronic equipment and storage medium
WO2014075413A1 (en) Method and device for determining terminal to be shared and system
US20140047025A1 (en) Event Management/Production for an Online Event
US20230283888A1 (en) Processing method and electronic device
US20230370505A1 (en) System and method for configuring video watch parties with gesture-specific telemojis
US11405587B1 (en) System and method for interactive video conferencing
CN111107301A (en) Video conference platform and communication method based on video conference platform
JP2024500956A (en) Systems and methods for expanded views in online meetings
CN111901537B (en) Broadcast television interactive program production mode based on cloud platform
CN109040647A (en) Media information synthetic method, device, equipment and storage medium
US11838338B2 (en) Method and device for conference control and conference participation, server, terminal, and storage medium
JP7456162B2 (en) Programs, communication methods, communication terminals and communication systems
KR20170085781A (en) System for providing and booking virtual reality video based on wire and wireless communication network
US20240040067A1 (en) Method of broadcasting real-time on-line competitions and apparatus therefor
US11792355B1 (en) Using synchronous recording by front and back smartphone cameras for creating immersive video content and for video conferencing

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20201030