WO2022170837A1 - Video processing method and apparatus - Google Patents

Video processing method and apparatus

Info

Publication number
WO2022170837A1
Authority
WO
WIPO (PCT)
Prior art keywords
video
character
electronic device
action
target
Prior art date
Application number
PCT/CN2021/136393
Other languages
French (fr)
Chinese (zh)
Inventor
陈兰昊
孟庆吉
徐世坤
于飞
陈中领
Original Assignee
Huawei Technologies Co., Ltd. (华为技术有限公司)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co., Ltd.
Publication of WO2022170837A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 23/00 Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N 23/80 Camera processing pipelines; Components thereof
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M 1/00 Substation equipment, e.g. for use by subscribers
    • H04M 1/72 Mobile telephones; Cordless telephones, i.e. devices for establishing wireless links to base stations without route selection
    • H04M 1/724 User interfaces specially adapted for cordless or mobile telephones
    • H04M 1/72403 User interfaces specially adapted for cordless or mobile telephones with means for local support of applications that increase the functionality
    • H04M 1/7243 User interfaces specially adapted for cordless or mobile telephones with means for local support of applications that increase the functionality with interactive means for internal management of messages
    • H04M 1/72439 User interfaces specially adapted for cordless or mobile telephones with means for local support of applications that increase the functionality with interactive means for internal management of messages for image or video messaging
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N 21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N 21/44 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 23/00 Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N 23/60 Control of cameras or camera modules
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N 21/47 End-user applications
    • H04N 21/478 Supplemental services, e.g. displaying phone caller identification, shopping application
    • H04N 21/4788 Supplemental services, e.g. displaying phone caller identification, shopping application communicating with other users, e.g. chatting

Definitions

  • the present application relates to the field of electronic equipment, and more particularly, to a method and apparatus for processing video.
  • Multiple users located in the same venue can shoot together with one or more camera devices (electronic devices with cameras) to obtain a co-shot video containing the appearances of the multiple users.
  • Alternatively, a single user can co-shoot with existing video material to obtain a co-production video that includes both the user and the video material.
  • the actions of multiple users or characters are often difficult to coordinate, which may result in a poorly coordinated co-shot video and may require the user to perform additional post-processing on it.
  • the present application provides a method and apparatus for processing video, with the purpose of improving the action matching degree of multiple users and reducing the post-processing amount of video by users.
  • In a first aspect, a method for processing video is provided, including:
  • the first electronic device acquires a first video, where the first video is a video of a first character;
  • the first electronic device acquires a first action file corresponding to the first video, where the first action file corresponds to the action of the first character;
  • the first electronic device obtains a second action file corresponding to a second video, the second video is a video of a second character, and the second action file corresponds to an action of the second character;
  • the first electronic device generates a target video according to the first video, the first action file, and the second action file, where the target video includes a first character image of the first character, the actions of the first character in the target video are different from the actions of the first character in the first video, and the actions of the first character in the target video correspond to the actions of the second character in the second video.
  • the solution provided by the present application can extract information on the actions of the first character in the video, and modify the actions of the first character according to the actions of the second character, so that the actions of the first character can be closer to the actions of the second character. This helps to reduce the post-processing workload of the user for the video, and further helps to improve the user experience of shooting, producing, and processing the video.
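The extract-and-adjust flow described above can be sketched in a few lines. Everything below is an illustrative assumption: the patent specifies neither a data format for action files nor an adjustment algorithm, so the per-frame limb-angle representation, the `adjust_actions` name, and the blending weight are hypothetical.

```python
from typing import List

# Hypothetical representation: an "action file" is a list of frames, each
# frame a list of limb angles in degrees.
ActionFile = List[List[float]]

def adjust_actions(first: ActionFile, second: ActionFile,
                   weight: float = 0.5) -> ActionFile:
    """Pull the first character's per-frame limb angles toward the
    second character's, producing the target action file."""
    target = []
    for f_angles, s_angles in zip(first, second):
        target.append([(1 - weight) * f + weight * s
                       for f, s in zip(f_angles, s_angles)])
    return target

first_file = [[30.0, 90.0], [40.0, 95.0]]     # first character, 2 frames
second_file = [[50.0, 100.0], [60.0, 105.0]]  # second character, 2 frames
target_file = adjust_actions(first_file, second_file)
```

With `weight=0.5` every target angle lies halfway between the two characters' angles, so the first character's actions move closer to the second character's without copying them outright.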
  • before the first electronic device acquires the first video, the method further includes:
  • the first electronic device establishes a video call connection between the first electronic device and a second electronic device, where the first electronic device is the electronic device of the first character, and the second electronic device is the electronic device of the second character;
  • the first electronic device obtains the first video, including:
  • the first electronic device acquires the first video during a video call
  • the method also includes:
  • the first electronic device acquires the second video from the second electronic device through the video call connection.
  • the first character and the second character can interact through a video call to make a new video.
  • This helps broaden the applicable scenarios of video calls, since a video call can then also serve as a way of producing videos.
  • the video data obtained during the video call can also be used to create a new video, which improves the interaction efficiency between devices.
  • the video call also makes it easier for the first character and the second character to work out the details of their interaction, which helps improve the accuracy of the actions performed by the first character and further reduces the user's post-processing workload for the video.
  • the second video may be the same video as the first video.
  • the first video may be a video of the first character and the second character.
  • the first video and the second video correspond to the same time period during the video call
  • the target video further includes a second character image of the second character, where the actions of the second character in the target video correspond to the actions of the second character in the second video.
  • the target video includes the second character, which enables co-production between the first character and the second character and helps increase the flexibility of the video.
  • the method further includes:
  • the first electronic device acquires a third video, and the third video is a video of a third character
  • the first electronic device acquires a third action file corresponding to the third video, and the third action file corresponds to the action of the third character;
  • the first electronic device generates a target video according to the first video, the first action file, and the second action file, including:
  • the first electronic device generates the target video according to the first video, the third video, the first action file, the second action file, and the third action file, where the target video further includes a third character image of the third character, the actions of the third character in the target video are different from the actions of the third character in the third video, and the actions of the third character in the target video correspond to the actions of the second character in the second video.
  • the third character and the first character can both perform actions with reference to the same second character. Without video processing, it is difficult to coordinate the movements of the third character with those of the first character; to make the movements sufficiently coordinated, the first character and the third character would need to rehearse many times in advance, which increases the difficulty of co-producing a video.
  • the solution provided by this application can extract the action files of multiple characters and adjust the actions of those characters in a unified manner based on sample actions, which helps increase the coordination of the characters' actions and reduce the user's post-processing workload for the video.
  • the target video further includes a second character image of the second character, and the actions of the second character in the target video correspond to the actions of the second character in the second video.
  • the target video includes the second character, which enables co-production between the first character and the second character and helps increase the flexibility of the video.
  • the first character image and the second character image belong to the same frame of the target video.
  • the actions of the two characters can be relatively similar, which helps improve the temporal coordination between the actions of the first character and the second character; for example, the swing speeds of the two characters' movements can be more similar.
  • the second video is a video of the second character and the fourth character, and the method further includes:
  • the first electronic device acquires a third video, and the third video is a video of a third character
  • the first electronic device acquires a third action file corresponding to the third video, and the third action file corresponds to the action of the third character;
  • the first electronic device acquires a fourth action file, where the fourth action file corresponds to the action of the fourth character in the second video;
  • the first electronic device generates a target video according to the first video, the first action file, and the second action file, including:
  • the first electronic device generates the target video according to the first video, the third video, the first action file, the second action file, the third action file, and the fourth action file, where the target video further includes a third character image of the third character, the actions of the third character in the target video are different from the actions of the third character in the third video, and the actions of the third character in the target video correspond to the actions of the fourth character in the second video.
  • the third character and the first character can perform actions by following the two characters in the same video, which helps improve the action cooperation between the third character and the first character. Without video processing, the correlation between the actions of the third character and those of the first character may be relatively weak, and it may be relatively difficult for the first character and the third character to jointly complete an action. Without the solution provided by the present application, the first character and the third character would need to rehearse many times in advance, which increases the difficulty of co-producing a video.
  • the solution provided by this application can extract the action files of multiple characters and adjust their actions based on the sample actions of two characters, which helps increase the cooperation of the multiple characters' actions and reduce the user's post-processing workload for the video.
  • the target video further includes a second character image of the second character and a fourth character image of the fourth character, where the actions of the second character in the target video correspond to the actions of the second character in the second video, and the actions of the fourth character in the target video correspond to the actions of the fourth character in the second video.
  • the target video includes the second character and the fourth character, and the first character, the second character, the third character, and the fourth character can be co-produced, which is beneficial to increase the flexibility of the video.
  • the first character image, the second character image, the third character image, and the fourth character image belong to the same frame of the target video.
  • the actions of the first character, the second character, the third character, and the fourth character can be relatively similar, which helps improve the temporal coordination of the four characters' actions; for example, the swing speeds of their movements can be more similar.
  • before the first electronic device acquires the first video, the method further includes:
  • the first electronic device establishes a video call connection between the first electronic device and the second electronic device, where the first electronic device is the electronic device of the first character, and the second electronic device is the electronic device of the third character;
  • the first electronic device obtains the first video, including:
  • the first electronic device acquires the first video during a video call
  • the first electronic device acquires the third video, including:
  • the first electronic device acquires a third video from the second electronic device through the video call connection.
  • the first character and the third character can interact through a video call to make a new video. This helps broaden the applicable scenarios of video calls, since a video call can then also serve as a way of producing videos.
  • the video data obtained during the video call can also be used to create a new video, which improves the interaction efficiency between devices.
  • the video call also makes it easier for the first character and the third character to work out the details of their interaction, which helps improve the accuracy of the actions performed by the first character and the third character and further reduces the user's post-processing workload for the video.
  • the first video and the third video correspond to the same time period during the video call.
  • the first character can perform actions synchronously with the third character, which is beneficial to improve the coordination in timing between the actions of the first character and the actions of the third character.
  • establishing, by the first electronic device, a video call connection between the first electronic device and the second electronic device includes:
  • the first electronic device establishes a video call connection between the first electronic device and the second electronic device through a photographing application or a video calling application.
  • the shooting application can invoke user controls from applications other than itself, so that it can initiate a co-shooting request to other users.
  • multiple applications (including a shooting application) of the electronic device can be made to run cooperatively, so as to realize the co-shot of multiple users.
  • Video calling apps can run in tandem with other apps to achieve co-production of multiple users. Therefore, in addition to the function of video calling, the video calling application may also have the function of generating video.
  • the second video is a video stored locally or in the cloud.
  • the first electronic device may modify the action of the first character in the first video according to the existing video.
  • the existing video can also be reused, which is beneficial to improve the flexibility of video processing.
  • the first electronic device obtains a second action file corresponding to the second video, including:
  • the first electronic device acquires the second action file from the second electronic device.
  • the first electronic device may obtain only the action-related information of the second video rather than the second video itself, which helps reduce the amount of information transmitted between the first electronic device and the second electronic device, thereby improving video processing efficiency and communication efficiency.
  • the actions of the first character in the target video correspond to the actions of the second character in the second video in the following sense: the action file corresponding to the first character image is a first target action file; the matching degree between the first action file and the second action file is a first matching degree; the matching degree between the first target action file and the second action file is a second matching degree; and the second matching degree is greater than the first matching degree.
  • the method provided by the present application is beneficial to improve the similarity of the actions of two characters on the basis of the original video, and is beneficial to make the processed video have relatively high action coordination.
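One way to make the "matching degree" concrete is shown below. The metric (an inverse of the mean absolute limb-angle difference), the function name, and the sample values are all assumptions, since the patent does not define how the matching degree is computed; the only property the claim requires is that the adjusted action file match the second action file more closely than the original one does.

```python
# Illustrative matching degree between two action files, where each file
# is a list of frames and each frame a list of limb angles in degrees.
def matching_degree(a, b):
    diffs = [abs(x - y)
             for fa, fb in zip(a, b)
             for x, y in zip(fa, fb)]
    # Higher value means more similar actions; 1.0 means identical.
    return 1.0 / (1.0 + sum(diffs) / len(diffs))

first_file = [[30.0, 80.0]]    # first action file (original first character)
second_file = [[50.0, 100.0]]  # second action file (sample actions)
target_file = [[45.0, 95.0]]   # first target action file, after adjustment

first_degree = matching_degree(first_file, second_file)    # first matching degree
second_degree = matching_degree(target_file, second_file)  # second matching degree
assert second_degree > first_degree  # the claim's required ordering
```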
  • the first electronic device obtains a first action file corresponding to the first video, including:
  • the first electronic device determines the first action sub-file according to at least two of the following: a first head pixel, a first neck pixel, a first torso pixel, a first left upper forelimb pixel, a first left upper hindlimb pixel, a first left lower forelimb pixel, a first left lower hindlimb pixel, a first right upper forelimb pixel, a first right upper hindlimb pixel, a first right lower forelimb pixel, a first right lower hindlimb pixel, a first left-hand pixel, and a first right-hand pixel.
  • the solution of the present application can divide each area of a person's body, so as to extract the relevant information of each part of the body and obtain the action information of the person.
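As a hedged illustration of dividing a character's body into the regions above and deriving action information from them, the following sketch computes limb angles from hypothetical keypoints. The keypoint names, coordinates, and the `limb_angle` helper are not from the patent; a real system would obtain the pixel points from a pose-estimation model.

```python
import math

# Hypothetical keypoints for one frame, in image coordinates (x, y),
# named after the body regions listed above. Values are illustrative.
keypoints = {
    "head": (120, 40), "neck": (120, 70), "torso": (120, 130),
    "left_upper_forelimb": (95, 80), "left_lower_forelimb": (80, 110),
    "right_upper_forelimb": (145, 80), "right_lower_forelimb": (160, 110),
    "left_hand": (70, 130), "right_hand": (170, 130),
}

def limb_angle(p1, p2):
    """Angle of the segment p1 -> p2 in degrees, measured from the image
    vertical axis, signed left/right."""
    dx, dy = p2[0] - p1[0], p2[1] - p1[1]
    return math.degrees(math.atan2(dx, dy))

# An action sub-file derived from at least two of the listed pixel points:
action_subfile = {
    "left_arm": limb_angle(keypoints["left_upper_forelimb"],
                           keypoints["left_lower_forelimb"]),
    "right_arm": limb_angle(keypoints["right_upper_forelimb"],
                            keypoints["right_lower_forelimb"]),
}
```

Repeating this per frame yields the per-character action file that the later limb-angle comparisons operate on.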
  • the first action subfile includes at least one of the following limb angles:
  • the solution of the present application can target the differences in particular body parts of the two characters to determine the similarities and differences in their actions.
  • the first action file corresponds to a first limb angle, the second action file corresponds to a second limb angle, and the target action file corresponds to a third limb angle, where the difference between the first limb angle and the second limb angle is smaller than a preset angle, and the third limb angle is between the first limb angle and the second limb angle.
  • the solution of the present application can adjust the movement of a character by adjusting the angle of a certain limb, so that the movements of multiple characters can be more coordinated.
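The limb-angle rule above (a third angle chosen between the first and second angles when they differ by less than a preset angle) might look like the sketch below; the `preset_angle` and `blend` values are assumptions, not taken from the patent.

```python
# Hedged sketch of the limb-angle adjustment rule.
def target_limb_angle(first_angle: float, second_angle: float,
                      preset_angle: float = 45.0,
                      blend: float = 0.5) -> float:
    if abs(first_angle - second_angle) >= preset_angle:
        # Angles differ too much: keep the first character's angle as is.
        return first_angle
    # Otherwise pick a third angle between the first and second angles.
    return first_angle + blend * (second_angle - first_angle)

third_angle = target_limb_angle(30.0, 60.0)  # lies between 30 and 60
kept_angle = target_limb_angle(10.0, 170.0)  # difference too large: unchanged
```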
  • the first video includes a first subframe and a second subframe, the second video includes a third subframe and a fourth subframe, and the target video includes a fifth subframe and a sixth subframe; the first subframe, the third subframe, and the fifth subframe correspond to each other, and the second subframe, the fourth subframe, and the sixth subframe correspond to each other; the time difference between the first subframe and the second subframe is a first time difference, the time difference between the third subframe and the fourth subframe is a second time difference, and the time difference between the fifth subframe and the sixth subframe is a third time difference, where the third time difference is between the first time difference and the second time difference.
  • the solution of the present application can adjust the time difference between multiple actions, which is beneficial to make the actions of multiple characters more similar within a period of time.
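The subframe-timing rule above can be sketched as follows; the function name, the blend factor, and the example timestamps are assumptions made for illustration.

```python
# The target video's time difference between corresponding subframes is
# chosen between the time differences observed in the first and second
# videos. Timestamps are in seconds.
def third_time_difference(t_first_sub1: float, t_first_sub2: float,
                          t_second_sub3: float, t_second_sub4: float,
                          blend: float = 0.5) -> float:
    d1 = t_first_sub2 - t_first_sub1    # first time difference
    d2 = t_second_sub4 - t_second_sub3  # second time difference
    return d1 + blend * (d2 - d1)       # third time difference, between d1 and d2

# First video: subframes at 0.0 s and 1.0 s; second video: 0.0 s and 0.6 s.
d3 = third_time_difference(0.0, 1.0, 0.0, 0.6)  # between 0.6 and 1.0
```

Speeding up or slowing down the first character's motion toward the sample's pacing in this way is one plausible reading of how the actions become "more similar within a period of time".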
  • the target video includes a first image area and a second image area
  • the first image area includes pixels corresponding to the first character
  • the second image area includes pixels corresponding to the second person.
  • the target video includes the actions of both characters, which helps the user observe the corrected actions of the first character more intuitively, as well as the relatively high action coordination between the first character and the second character.
  • the first image area includes pixels corresponding to any one of the following: a first background image, a second background image, or a target gallery image, where the first background image includes pixels corresponding to the scene where the first character is located, the second background image includes pixels corresponding to the scene where the second character is located, and the target gallery image is an image stored on the first electronic device.
  • the second image area includes pixels corresponding to any one of the following: a first background image, a second background image, or a target gallery image, where the first background image includes pixels corresponding to the scene where the first character is located, the second background image includes pixels corresponding to the scene where the second character is located, and the target gallery image is an image stored on the first electronic device.
  • the target video can flexibly use any one of the first video, the second video, or a gallery image as its background. If the first image area and the second image area use the same background, they can be regarded as being in the same background or the same scene, which helps increase the correlation and fusion between the first image area and the second image area.
  • the first character image and the second character image can be assigned to different areas on the user interface, which can be more suitable for scenarios that need to distinguish the character images relatively clearly.
  • the co-shot video further includes a background image area, and the background image area is the background of the first image area and the second image area, and the The background image area includes pixels corresponding to any one of the following: a first background image, a second background image, and a target gallery image, the first background image includes pixels corresponding to the scene where the first character is located, and the The second background image includes pixels corresponding to the scene where the second character is located, and the target gallery image is an image stored on the first electronic device.
  • the background image area can flexibly use any one of the first video, the second video or the gallery image as the background of the target video.
  • the first image area and the second image area can be regarded as being in the same background or the same scene, which helps increase the correlation and fusion between them. This can be more suitable for scenarios that do not need to distinguish user images by area, such as group co-shot scenes.
  • an electronic device comprising: a processor, a memory and a transceiver, the memory is used for storing a computer program, and the processor is used for executing the computer program stored in the memory;
  • the processor is configured to obtain a first video, where the first video is a video of a first character
  • the processor is further configured to acquire a first action file corresponding to the first video, where the first action file corresponds to the action of the first character;
  • the processor is further configured to acquire a second action file corresponding to a second video, where the second video is a video of a second character, and the second action file corresponds to an action of the second character;
  • the processor is further configured to generate a target video according to the first video, the first action file, and the second action file, where the target video includes a first character image of the first character, the actions of the first character in the target video are different from the actions of the first character in the first video, and the actions of the first character in the target video correspond to the actions of the second character in the second video.
  • before the processor acquires the first video, the processor is further configured to:
  • the processor is specifically configured to acquire the first video during a video call
  • the processor is further configured to acquire the second video from the second electronic device through the video call connection.
  • the first video and the second video correspond to the same time period during the video call
  • the target video further includes a second character image of the second character, where the actions of the second character in the target video correspond to the actions of the second character in the second video.
  • the processor is further configured to:
  • the third video is a video of a third character
  • the processor is specifically configured to generate the target video according to the first video, the third video, the first action file, the second action file, and the third action file, where the target video further includes a third character image of the third character, the actions of the third character in the target video are different from the actions of the third character in the third video, and the actions of the third character in the target video correspond to the actions of the second character in the second video.
  • the target video further includes a second character image of the second character, and the actions of the second character in the target video correspond to the actions of the second character in the second video.
  • the first character image and the second character image belong to the same frame of the target video.
  • the second video is a video of the second character and the fourth character
  • the processor is further configured to:
  • the third video is a video of a third character
  • the fourth action file corresponds to the action of the fourth character in the second video
  • the processor is specifically configured to generate the target video according to the first video, the third video, the first action file, the second action file, the third action file, and the fourth action file, where the target video further includes a third character image of the third character, the actions of the third character in the target video are different from the actions of the third character in the third video, and the actions of the third character in the target video correspond to the actions of the fourth character in the second video.
  • the target video further includes a second character image of the second character and a fourth character image of the fourth character, where the actions of the second character in the target video correspond to the actions of the second character in the second video, and the actions of the fourth character in the target video correspond to the actions of the fourth character in the second video.
  • the first character image, the second character image, the third character image, and the fourth character image belong to the same frame of the target video.
  • before the processor acquires the first video, the processor is further configured to:
  • the processor is specifically configured to acquire the first video during a video call
  • the processor is specifically configured to acquire a third video from the second electronic device through the video call connection.
  • the first video and the third video correspond to the same time period during the video call.
  • the processor is specifically configured to establish a video call connection between the electronic device and the second electronic device through a photographing application or a video calling application.
  • the second video is a video stored locally or in the cloud.
  • the processor is specifically configured to acquire the second action file from the second electronic device.
  • the actions of the first character in the target video correspond to the actions of the second character in the second video in the following sense: the action file corresponding to the first character image is a first target action file; the matching degree between the first action file and the second action file is a first matching degree; the matching degree between the first target action file and the second action file is a second matching degree; and the second matching degree is greater than the first matching degree.
  • the processor is specifically configured to determine the first action sub-file according to at least two of the following: a first head pixel, a first neck pixel, a first torso pixel, a first left upper forelimb pixel, a first left upper hindlimb pixel, a first left lower forelimb pixel, a first left lower hindlimb pixel, a first right upper forelimb pixel, a first right upper hindlimb pixel, a first right lower forelimb pixel, a first right lower hindlimb pixel, a first left-hand pixel, and a first right-hand pixel.
  • the first action subfile includes at least one of the following limb angles:
  • the first action file corresponds to a first limb angle, the second action file corresponds to a second limb angle, and the target action file corresponds to a third limb angle, where the difference between the first limb angle and the second limb angle is smaller than a preset angle, and the third limb angle is between the first limb angle and the second limb angle.
  • the first video includes a first subframe and a second subframe
  • the second video includes a third subframe and a fourth subframe
  • the target video includes a fifth subframe and a sixth subframe; the first subframe, the third subframe, and the fifth subframe correspond to each other
  • the second subframe, the fourth subframe, and the sixth subframe correspond to each other
  • the time difference between the first subframe and the second subframe is a first time difference
  • the time difference between the third subframe and the fourth subframe is a second time difference
  • the time difference between the fifth subframe and the sixth subframe is a third time difference
  • the third time difference is between the first time difference and the second time difference.
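The relationship between the three time differences can be sketched as a simple linear interpolation. The weight and the millisecond values below are illustrative assumptions only:

```python
def target_time_difference(first_td, second_td, weight=0.5):
    """Illustrative: pick a third time difference between the first and
    second time differences by linear interpolation."""
    return (1 - weight) * first_td + weight * second_td

# First time difference (between the first and second subframes) and
# second time difference (between the third and fourth subframes), in ms.
first_td = 33.0
second_td = 50.0

# Third time difference (between the fifth and sixth subframes of the target video).
third_td = target_time_difference(first_td, second_td)
assert min(first_td, second_td) <= third_td <= max(first_td, second_td)
```

Choosing the third time difference between the two originals lets the target video's frame timing mediate between the pacing of the first video and that of the second video.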
  • a computer storage medium, including computer instructions, which, when the computer instructions are executed on an electronic device, cause the electronic device to perform the method in any one of the possible implementations of the first aspect above.
  • a computer program product which, when the computer program product runs on a computer, causes the computer to execute the method described in any one of the possible implementations of the first aspect above.
  • FIG. 1 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.
  • FIG. 2 is a software structural block diagram of an electronic device provided by an embodiment of the present application.
  • FIG. 3 is a schematic structural diagram of a user interface provided by an embodiment of the present application.
  • FIG. 4 is a schematic structural diagram of a user interface provided by an embodiment of the present application.
  • FIG. 5 is a schematic structural diagram of a user interface provided by an embodiment of the present application.
  • FIG. 6 is a schematic structural diagram of a user interface provided by an embodiment of the present application.
  • FIG. 7 is a schematic structural diagram of an extraction action file provided by an embodiment of the present application.
  • FIG. 8 is a schematic structural diagram of a video processing method provided by an embodiment of the present application.
  • FIG. 9 is a schematic structural diagram of a user interface provided by an embodiment of the present application.
  • FIG. 10 is a schematic structural diagram of a user interface provided by an embodiment of the present application.
  • FIG. 11 is a schematic structural diagram of a user interface provided by an embodiment of the present application.
  • FIG. 12 is a schematic structural diagram of a user interface provided by an embodiment of the present application.
  • FIG. 13 is a schematic structural diagram of a user interface provided by an embodiment of the present application.
  • FIG. 14 is a schematic structural diagram of a user interface provided by an embodiment of the present application.
  • FIG. 15 is a schematic structural diagram of a user interface provided by an embodiment of the present application.
  • FIG. 16 is a schematic flowchart of a method for processing video provided by an embodiment of the present application.
  • FIG. 17 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.
  • references in this specification to "one embodiment” or “some embodiments” and the like mean that a particular feature, structure, or characteristic described in connection with the embodiment is included in one or more embodiments of the present application.
  • appearances of the phrases "in one embodiment," "in some embodiments," "in other embodiments," etc. in various places in this specification do not necessarily all refer to the same embodiment, but mean "one or more but not all embodiments" unless specifically emphasized otherwise.
  • the terms "comprising", "including", "having" and their variants mean "including but not limited to" unless specifically emphasized otherwise.
  • the electronic device may be a portable electronic device that also includes other functions such as personal digital assistant and/or music player functions, such as a mobile phone, a tablet computer, or a wearable electronic device with wireless communication capabilities (e.g., a smart watch), etc.
  • such portable electronic devices include, but are not limited to, portable electronic devices equipped with various operating systems.
  • the above-mentioned portable electronic device may also be other portable electronic devices, such as a laptop computer (Laptop) or the like. It should also be understood that, in some other embodiments, the above-mentioned electronic device may not be a portable electronic device, but a desktop computer.
  • FIG. 1 shows a schematic structural diagram of an electronic device 100 .
  • the electronic device 100 may include a processor 110, an external memory interface 120, an internal memory 121, a universal serial bus (USB) interface 130, a charge management module 140, a power management module 141, a battery 142, an antenna 1, an antenna 2 , the mobile communication module 150, the wireless communication module 160, the audio module 170, the speaker 170A, the receiver 170B, the microphone 170C, the headphone jack 170D, the button 190, the camera 193, the display screen 194, and the subscriber identification module (SIM) card Interface 195, etc.
  • the structures illustrated in the embodiments of the present application do not constitute a specific limitation on the electronic device 100 .
  • the electronic device 100 may include more or fewer components than shown, or combine some components, or split some components, or use a different arrangement of components.
  • the illustrated components may be implemented in hardware, software, or a combination of software and hardware.
  • the processor 110 may include one or more processing units, for example, the processor 110 may include an application processor (application processor, AP), a modem processor, a graphics processor (graphics processing unit, GPU), an image signal processor (image signal processor, ISP), controller, video codec, digital signal processor (digital signal processor, DSP), baseband processor, and/or neural-network processing unit (neural-network processing unit, NPU), etc. Wherein, different processing units may be independent components, or may be integrated in one or more processors.
  • the electronic device 101 may also include one or more processors 110 .
  • the controller can generate an operation control signal according to the instruction operation code and the timing signal, and complete the control of fetching and executing instructions.
  • a memory may also be provided in the processor 110 for storing instructions and data.
  • the memory in the processor 110 may be a cache memory. This memory may hold instructions or data that have just been used or recycled by the processor 110 . If the processor 110 needs to use the instruction or data again, it can be called directly from the memory. In this way, repeated access is avoided, and the waiting time of the processor 110 is reduced, thereby improving the efficiency of the electronic device 101 in processing data or executing instructions.
  • the processor 110 may include one or more interfaces.
  • the interface may include an inter-integrated circuit (I2C) interface, an inter-integrated circuit sound (I2S) interface, a pulse code modulation (PCM) interface, a universal asynchronous receiver/transmitter (UART) interface, a mobile industry processor interface (MIPI), a general-purpose input/output (GPIO) interface, a SIM card interface, and/or a USB interface, etc.
  • the USB interface 130 is an interface that conforms to the USB standard specification, and may specifically be a Mini USB interface, a Micro USB interface, a USB Type C interface, and the like.
  • the USB interface 130 can be used to connect a charger to charge the electronic device 101, and can also be used to transmit data between the electronic device 101 and peripheral devices.
  • the USB interface 130 can also be used to connect an earphone, and play audio through the earphone.
  • the interface connection relationship between the modules illustrated in the embodiments of the present application is only a schematic illustration, and does not constitute a structural limitation of the electronic device 100 .
  • the electronic device 100 may also adopt different interface connection manners in the foregoing embodiments, or a combination of multiple interface connection manners.
  • the charging management module 140 is used to receive charging input from the charger.
  • the charger may be a wireless charger or a wired charger.
  • the charging management module 140 may receive charging input from the wired charger through the USB interface 130.
  • the charging management module 140 may receive wireless charging input through a wireless charging coil of the electronic device 100 . While the charging management module 140 charges the battery 142 , it can also supply power to the electronic device through the power management module 141 .
  • the power management module 141 is used for connecting the battery 142 , the charging management module 140 and the processor 110 .
  • the power management module 141 receives input from the battery 142 and/or the charging management module 140 and supplies power to the processor 110 , the internal memory 121 , the external memory, the display screen 194 , the camera 193 , and the wireless communication module 160 .
  • the power management module 141 can also be used to monitor parameters such as battery capacity, battery cycle times, battery health status (leakage, impedance).
  • the power management module 141 may also be provided in the processor 110 .
  • the power management module 141 and the charging management module 140 may also be provided in the same device.
  • the wireless communication function of the electronic device 100 may be implemented by the antenna 1, the antenna 2, the mobile communication module 150, the wireless communication module 160, the modulation and demodulation processor, the baseband processor, and the like.
  • Antenna 1 and Antenna 2 are used to transmit and receive electromagnetic wave signals.
  • Each antenna in electronic device 100 may be used to cover a single or multiple communication frequency bands. Different antennas can also be reused to improve antenna utilization.
  • the antenna 1 can be multiplexed as a diversity antenna of the wireless local area network. In other embodiments, the antenna may be used in conjunction with a tuning switch.
  • the mobile communication module 150 may provide wireless communication solutions including 2G/3G/4G/5G etc. applied on the electronic device 100 .
  • the mobile communication module 150 may include at least one filter, switch, power amplifier, low noise amplifier (LNA) and the like.
  • the mobile communication module 150 can receive electromagnetic waves from the antenna 1, filter and amplify the received electromagnetic waves, and transmit them to the modulation and demodulation processor for demodulation.
  • the mobile communication module 150 can also amplify the signal modulated by the modulation and demodulation processor, and then turn it into an electromagnetic wave for radiation through the antenna 1 .
  • at least part of the functional modules of the mobile communication module 150 may be provided in the processor 110 .
  • at least part of the functional modules of the mobile communication module 150 may be provided in the same device as at least part of the modules of the processor 110 .
  • the wireless communication module 160 can provide wireless communication solutions applied on the electronic device 100, including wireless local area network (WLAN) (such as wireless fidelity (Wi-Fi) network), Bluetooth (BT), global navigation satellite system (GNSS), frequency modulation (FM), near field communication (NFC), and infrared (IR) technology.
  • the wireless communication module 160 may be one or more devices integrating at least one communication processing module.
  • the wireless communication module 160 receives electromagnetic waves via the antenna 2 , frequency modulates and filters the electromagnetic wave signals, and sends the processed signals to the processor 110 .
  • the wireless communication module 160 can also receive the signal to be sent from the processor 110 , perform frequency modulation on it, amplify it, and convert it into electromagnetic waves for radiation through the antenna 2 .
  • the electronic device 100 implements a display function through a GPU, a display screen 194, an application processor, and the like.
  • the GPU is a microprocessor for image processing, and is connected to the display screen 194 and the application processor.
  • the GPU is used to perform mathematical and geometric calculations for graphics rendering.
  • Processor 110 may include one or more GPUs that execute program instructions to generate or alter display information.
  • Display screen 194 is used to display images, videos, and the like.
  • Display screen 194 includes a display panel.
  • the display panel can be a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), a MiniLED, a MicroLED, a Micro-OLED, a quantum dot light-emitting diode (QLED), and so on.
  • electronic device 100 may include one or more display screens 194 .
  • the display screen 194 of the electronic device 100 may be a flexible screen.
  • the flexible screen has attracted much attention due to its unique characteristics and great potential.
  • flexible screens have the characteristics of strong flexibility and bendability, which can provide users with new interactive methods based on the bendable characteristics, and can meet more needs of users for electronic devices.
  • the foldable display screen on the electronic device can be switched between a small screen in a folded state and a large screen in an unfolded state at any time. Therefore, users are using the split-screen function more and more frequently on electronic devices equipped with foldable displays.
  • the electronic device 100 may implement a shooting function through an ISP, a camera 193, a video codec, a GPU, a display screen 194, an application processor, and the like.
  • the ISP is used to process the data fed back by the camera 193 .
  • when the shutter is opened, light is transmitted through the lens to the camera photosensitive element, which converts the optical signal into an electrical signal and transmits the electrical signal to the ISP for processing, where it is converted into an image visible to the naked eye.
  • ISP can also perform algorithm optimization on image noise, brightness, and skin tone.
  • ISP can also optimize the exposure, color temperature and other parameters of the shooting scene.
  • the ISP may be provided in the camera 193 .
  • Camera 193 is used to capture still images or video.
  • the object is projected through the lens to generate an optical image onto the photosensitive element.
  • the photosensitive element may be a charge coupled device (CCD) or a complementary metal-oxide-semiconductor (CMOS) phototransistor.
  • the photosensitive element converts the optical signal into an electrical signal, and then transmits the electrical signal to the ISP to convert it into a digital image signal.
  • the ISP outputs the digital image signal to the DSP for processing.
  • DSP converts digital image signals into standard RGB, YUV and other formats of image signals.
  • the electronic device 100 may include one or more cameras 193 .
  • the digital signal processor is used to process digital signals; in addition to digital image signals, it can also process other digital signals. For example, when the electronic device 100 selects a frequency point, the digital signal processor is used to perform Fourier transform on the frequency point energy.
  • Video codecs are used to compress or decompress digital video.
  • the electronic device 100 may support one or more video codecs.
  • the electronic device 100 can play or record videos of various encoding formats, such as: Moving Picture Experts Group (moving picture experts group, MPEG) 1, MPEG2, MPEG3, MPEG4 and so on.
  • the NPU is a neural-network (NN) computing processor.
  • Applications such as intelligent cognition of the electronic device 100 can be implemented through the NPU, such as image recognition, face recognition, speech recognition, text understanding, and the like.
  • the external memory interface 120 can be used to connect an external memory card, such as a Micro SD card, to expand the storage capacity of the electronic device 100 .
  • the external memory card communicates with the processor 110 through the external memory interface 120 to realize the data storage function, for example, to save files such as music and videos in the external memory card.
  • Internal memory 121 may be used to store one or more computer programs including instructions.
  • the processor 110 may execute the above-mentioned instructions stored in the internal memory 121, thereby causing the electronic device 101 to execute the method for off-screen display, various applications and data processing provided in some embodiments of the present application.
  • the internal memory 121 may include a storage program area and a storage data area.
  • the storage program area may store the operating system, and may also store one or more applications (such as Gallery, Contacts, etc.) and the like.
  • the storage data area may store data (such as photos, contacts, etc.) created during the use of the electronic device 101 and the like.
  • the internal memory 121 may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic disk storage components, flash memory components, universal flash storage (UFS), and the like.
  • the processor 110 may cause the electronic device 101 to execute the instructions provided in the embodiments of the present application by executing the instructions stored in the internal memory 121 and/or the instructions stored in the memory provided in the processor 110 .
  • the electronic device 100 may implement audio functions, such as music playback and recording, through an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, an earphone interface 170D, an application processor, and the like.
  • the keys 190 include a power-on key, a volume key, and the like. The keys 190 may be mechanical keys or touch keys.
  • the electronic device 100 may receive key inputs and generate key signal inputs related to user settings and function control of the electronic device 100 .
  • FIG. 2 is a block diagram of the software structure of the electronic device 100 according to the embodiment of the present application.
  • the layered architecture divides the software into several layers, and each layer has a clear role and division of labor. Layers communicate with each other through software interfaces.
  • the Android system is divided into four layers, which are, from top to bottom, an application layer, an application framework layer, an Android runtime (Android runtime) and a system library, and a kernel layer.
  • the application layer can include a series of application packages.
  • the application package can include applications such as gallery, camera, Changlian, map, and navigation.
  • the application framework layer provides an application programming interface (application programming interface, API) and a programming framework for applications in the application layer.
  • the application framework layer includes some predefined functions.
  • the application framework layer may include window managers, content providers, view systems, telephony managers, resource managers, notification managers, and the like.
  • a window manager is used to manage window programs.
  • the window manager can get the size of the display screen, determine whether there is a status bar, lock the screen, take screenshots, etc.
  • Content providers are used to store and retrieve data and make these data accessible to applications.
  • the data may include video, images, audio, calls made and received, browsing history and bookmarks, phone book, etc.
  • the view system includes visual controls, such as controls for displaying text, controls for displaying pictures, and so on. View systems can be used to build applications.
  • a display interface can consist of one or more views.
  • the display interface including the short message notification icon may include a view for displaying text and a view for displaying pictures.
  • the phone manager is used to provide the communication function of the electronic device 100 .
  • for example, the management of call status (including connecting, hanging up, etc.).
  • the resource manager provides various resources for the application, such as localization strings, icons, pictures, layout files, video files and so on.
  • the notification manager enables applications to display notification information in the status bar, which can be used to convey notification-type messages, and can disappear automatically after a brief pause without user interaction. For example, the notification manager is used to notify download completion, message reminders, etc.
  • the notification manager can also display notifications in the status bar at the top of the system in the form of graphs or scroll bar text, such as notifications of applications running in the background, and notifications on the screen in the form of dialog windows. For example, text information is prompted in the status bar, a prompt sound is issued, the electronic device vibrates, and the indicator light flashes.
  • Android Runtime includes core libraries and a virtual machine. Android runtime is responsible for scheduling and management of the Android system.
  • the core library consists of two parts: one part is the function libraries that the Java language needs to call, and the other part is the core library of Android.
  • the application layer and the application framework layer run in virtual machines.
  • the virtual machine executes the java files of the application layer and the application framework layer as binary files.
  • the virtual machine is used to perform functions such as object lifecycle management, stack management, thread management, safety and exception management, and garbage collection.
  • a system library can include multiple functional modules. For example: surface manager (surface manager), media library (media library), 3D graphics processing library (eg: OpenGL ES), 2D graphics engine (eg: SGL), etc.
  • the Surface Manager is used to manage the display subsystem and provides a fusion of 2D and 3D layers for multiple applications.
  • the media library supports playback and recording of a variety of commonly used audio and video formats, as well as still image files.
  • the media library can support a variety of audio and video encoding formats, such as: MPEG4, H.264, MP3, AAC, AMR, JPG, PNG, etc.
  • the 3D graphics processing library is used to implement 3D graphics drawing, image rendering, compositing, and layer processing.
  • 2D graphics engine is a drawing engine for 2D drawing.
  • the kernel layer is the layer between hardware and software.
  • the kernel layer contains at least display drivers, camera drivers, audio drivers, and sensor drivers.
  • the solutions provided by the embodiments of the present application can be applied to co-shooting scenarios, for example, a scenario in which a user co-shoots with a material, and a scenario in which a user co-shoots with another user.
  • the user-with-user co-shooting scenario may also include a remote multi-user co-shooting scenario.
  • the remote multi-user co-shooting scenario may refer to a scenario in which at least two users are unable, or find it difficult, to complete a co-shot at the same time with the same camera device.
  • User A can take a selfie through an electronic device A with a camera function to obtain a selfie video A; user B can take a selfie through an electronic device B with a camera function to obtain a selfie video B.
  • By synthesizing video A and video B, a co-shot video of user A and user B can be obtained.
  • the selfie video A and the selfie video B can be obtained by asynchronous shooting.
  • the visual coordination of user A's and user B's synchronized actions may be poor.
  • the distance between user A and electronic device A may be quite different from the distance between user B and electronic device B, so the outline size of user A in selfie video A and the outline size of user B in selfie video B are quite different.
  • user A and user B perform similar actions, but user A's action is relatively fast and the action is relatively large, while the action of user B is relatively slow and small. Therefore, the matching degree of video A and video B may be relatively poor; correspondingly, the coordination of co-produced videos may be relatively poor.
  • the user needs to perform post-processing with a large workload on the co-shot video.
  • User A can make a video call with user B through the electronic device A with the camera function, and obtain a co-shot video that includes both user A and user B by recording the screen.
  • the clarity of the co-shot video obtained through screen recording is usually relatively poor.
  • the maximum resolution of the co-shot video usually depends on the display resolution of the electronic device A.
  • multiple rehearsals may be required for multiple users to perform actions with high similarity. Therefore, the coordination of co-production videos may be relatively poor.
  • In order to achieve relatively high coordination of the co-shot video, the user needs to perform post-processing with a large workload on the co-shot video.
  • User A and User B may be located in the same scene. User A and User B can perform similar actions, and take a selfie through the electronic device A with a camera function to obtain a co-shot video.
  • user A's actions may be relatively poorly coordinated with user B's actions.
  • user A and user B perform similar actions, but the action of user A is relatively fast and the action is relatively large, while the action of user B is relatively slow and the action is relatively small. Therefore, the coordination of co-production videos may be relatively poor.
  • In order to achieve relatively high coordination of the co-shot video, the user needs to perform post-processing with a large workload on the co-shot video.
  • the user A can observe the video material.
  • the video footage contains a series of actions of Person C.
  • the user A imitates the action of the character C in the video material, and records the action made by the user A through the electronic device A with the camera function, so as to obtain the video A.
  • By synthesizing video A and the video material, a co-shot video can be obtained.
  • Embodiments of the present application provide a new method for video processing, which aims to reduce the post-processing workload of the user for the video, thereby helping to improve the user experience of shooting, producing, and processing the video.
  • FIG. 3 is a schematic diagram of a user interface 300 provided by an embodiment of the present application.
  • the user interface 300 may be displayed on the first electronic device.
  • the user interface 300 may be an interface of a camera application, or an interface of other applications having a photographing function. That is to say, a camera application or other applications having a photographing function may be carried on the first electronic device.
  • the first electronic device may display the user interface 300 in response to operations made by the first user on the applications.
  • the first user may open the camera application by clicking on the icon of the camera application, and then the first electronic device may display the user interface 300 .
  • the camera application can call the camera 193 shown in FIG. 1 to capture the scene around the first electronic device.
  • the camera application may call the front camera of the first electronic device to take a selfie image of the first user, and display the selfie image on the user interface 300 .
  • the user interface 300 may include a plurality of function controls 310 (the function controls 310 may be presented on the user interface 300 in the form of tabs), and the plurality of function controls 310 may respectively correspond to a plurality of camera functions of the camera application.
  • the multiple camera functions may include, for example, a portrait function, a photographing function, a video recording function, a video co-shooting function, a professional function, etc.
  • the multiple function controls 310 may include a portrait function control, a photographing function control, a video recording function control, a video co-shooting function control, and a professional function control.
  • the first electronic device can switch the current camera function to a function for completing video co-shooting, such as the "video co-shooting" function shown in FIG. 3. It should be understood that, in other possible examples, the camera application may include other camera functions for completing video co-shooting. The embodiments of the present application are described below by taking the video co-shooting function as an example.
  • the user interface 300 may include, for example, at least one of the following controls: a user co-shooting control 320 , a material co-shooting control 330 , and a gallery co-shooting control 340 .
  • the first electronic device may synthesize the captured video with other files, thereby completing the co-shooting.
  • the user co-shooting control 320 can be used to select or invite the second user to make a video call, so as to complete the synchronization and co-shooting of the first user and the second user.
  • the first electronic device may display a plurality of user controls corresponding to a plurality of users on the user interface 300, and the plurality of users may include the second user.
  • the first electronic device may initiate a video call to the second electronic device, wherein the second electronic device may be the electronic device used by the second user .
  • the second user may receive the video call invitation from the first user through the second electronic device.
  • the second electronic device may display an interface for the video call invitation, and the interface may include controls for answering the video call.
  • a video call connection can be established between the first electronic device and the second electronic device.
  • the first electronic device can obtain the first video by shooting, and the second electronic device can obtain the second video by shooting.
  • the first electronic device may acquire the second video through the video call connection.
• the second electronic device may acquire the first video through the video call connection.
  • the electronic device can obtain one or more processed videos according to the first video and the second video by using the video processing method provided in the embodiment of the present application.
• the material co-shooting control 330 may be used to select a material from the cloud, so as to complete the co-shooting between the first user and the material.
  • the material may refer to files that can reflect actions, such as videos and action templates stored in the cloud.
  • the cloud may refer to a cloud server, a cloud storage device, and the like.
  • the material may be a second video containing a target person (eg, a second user).
  • the target person may be, for example, a person known or familiar to the first user, such as a friend, family member, celebrity, etc., or a stranger, or a cartoon image with character characteristics.
• a material can be understood as a kind of action template. The material can contain multiple frames of action images of the target person.
• the first electronic device may acquire the material from the cloud server in response to the user's operation on the material co-shooting control 330.
  • the first user may shoot the first video through the first electronic device.
  • the first electronic device may capture a first video including the first user.
  • the first electronic device can obtain one or more processed videos by using the video processing method provided in the embodiment of the present application according to the first video and material.
  • the first electronic device may crop the first video according to the outline of the first user in the first video to obtain a sub-video of the first person.
  • the first person sub-video may include an image of the first user and not include a background image in the first video.
• the first electronic device may synthesize the first character sub-video, the material, and background multimedia data into a first target video, wherein the material may not include a background image corresponding to the target character, and the background multimedia data may serve as the background of the first character sub-video and the material.
  • the background multimedia data may, for example, come from other files than the first video and material.
  • the first electronic device may crop the first video according to the outline of the first user in the first video to obtain a sub-video of the first person.
  • the first electronic device may synthesize the first character sub-video and material into a first target video, wherein the material may include a background image corresponding to the target character, so that the background image in the material may serve as the background of the first character sub-video.
  • the first electronic device may synthesize the first video and material into the first target video.
  • the material may not include a background image corresponding to the target person.
• the background image in the first video can serve as the background for the material.
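The compositing alternatives above can be sketched as mask-based layering. The following is an illustrative sketch only (function and parameter names are assumptions, not from the original disclosure), representing one frame of each source as a numpy array with boolean person masks:

```python
import numpy as np

def compose_target_frame(background, material_frame, material_mask,
                         person_frame, person_mask):
    """Layer one frame of the first target video: paste the target person
    from the material, then the first user's person sub-frame, over a
    shared background (e.g. background multimedia data)."""
    out = background.copy()
    out[material_mask] = material_frame[material_mask]  # target person layer
    out[person_mask] = person_frame[person_mask]        # first user layer on top
    return out
```

When the material keeps its own background, `background` would simply be the material frame itself; when the first video keeps its background, the first video frame plays that role.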
• user a can shoot a selfie video A through electronic device a.
  • the electronic device a can crop the video A according to the outline of the user a in the video A to obtain the user sub-video and the background sub-video.
  • the user sub-video may include the image of user a and not include the background image; the background sub-video may include the background image and not include the image of user a.
  • the subframe A may include a plurality of pixel points A, and the plurality of pixel points A may include a plurality of pixel points a corresponding to the outline of the user a.
• the multiple pixel points a′ located within the multiple pixel points a in subframe A may form a subframe a′ of the user sub-video, and may form the image of user a; the multiple pixel points a″ in subframe A other than the pixel points a′ may form a subframe a″ of the background sub-video, and may form the background image.
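The pixel-level split described above can be sketched with a boolean mask. This is a minimal illustration (names are assumptions), where `True` marks the pixel points a′ inside the outline of user a:

```python
import numpy as np

def split_subframe(subframe_a, user_mask):
    """Split subframe A into the user subframe a' (image of user a) and
    the background subframe a'' (all other pixel points of subframe A)."""
    mask3 = user_mask[..., None]                        # broadcast mask over channels
    subframe_a_prime = np.where(mask3, subframe_a, 0)   # pixels a' -> user sub-video
    subframe_a_double = np.where(mask3, 0, subframe_a)  # pixels a'' -> background sub-video
    return subframe_a_prime, subframe_a_double
```

Because the two subframes are complementary, adding them back together reconstructs the original subframe A.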
  • the gallery co-shooting control 340 may be used to select a gallery video from a local gallery to complete the first user's co-shooting with the gallery video.
  • the gallery video may be understood as a video stored locally on the first electronic device.
  • the gallery video is a second video containing a target person (eg, a second user).
  • the first user may shoot the first video through the first electronic device.
  • the first electronic device may capture a first video including the first user.
  • the first electronic device can obtain one or more processed videos according to the first video and the gallery video by using the video processing method provided in the embodiment of the present application.
  • the first electronic device may crop the first video according to the outline of the first user in the first video to obtain a sub-video of the first person.
  • the first person sub-video may include an image of the first user and not include a background image in the first video.
• the first electronic device may synthesize the first character sub-video, the gallery video and the background multimedia data into a first target video, wherein the gallery video may not include a background image corresponding to the target character, and the background multimedia data may serve as the background of the first character sub-video and the gallery video.
• the background multimedia data may, for example, come from other files than the first video and the gallery video.
  • the first electronic device may crop the first video according to the outline of the first user in the first video to obtain a sub-video of the first person.
  • the first electronic device may synthesize the first person sub-video and the gallery video into a first target video, wherein the gallery video may include a background image corresponding to the target person, so that the background image may serve as the background of the first person sub-video.
  • the first electronic device may synthesize the first video and the gallery video into the first target video.
  • the gallery video may not include a background image corresponding to the target person.
  • the background image in the first video can serve as the background for the gallery video.
  • the user interface 300 may further include a gallery control 350 .
  • the first electronic device may jump to the gallery application to view the photographed or stored multimedia data.
  • the first electronic device may display a user interface 400 as shown in FIG. 4 .
  • the user interface 400 may include a first interface area 460 and a second interface area 470 .
  • the first interface region 460 and the second interface region 470 may not cross each other.
  • the first interface area 460 and the second interface area 470 may be located anywhere on the user interface 400 .
  • the second interface area 470 may be located above the user interface 400 , for example, and the first interface area 460 may be located below the user interface 400 , for example.
  • the first user can observe the second interface area 470 of the user interface 400 , so as to understand and be familiar with the actions of the target person in the second interface area 470 .
• the second interface area 470 may display the video call content of the second user, in which case the target person may be the second user; in another possible example, the second interface area 470 may display the material, in which case the target person may be the target person in the material; in another example, the second interface area 470 may display the gallery video, in which case the target person may be the target person in the gallery video.
  • the video resources displayed in the second interface area 470 are collectively referred to as the second video.
  • the second video may be any of the following: video call data received from a second electronic device during a video call, where the second electronic device is an electronic device used by the second user; material; gallery video.
  • the second video may include a second character sub-video, or the second character sub-video may be extracted from the second video. That is, the second video may include subframes corresponding to the target person.
• the first electronic device can display a second character image 471 in the second interface area 470, and then can play the picture of the second character sub-video. That is, the second interface area 470 may include the second character image 471.
  • the second interface area 470 may include pixel points corresponding to the target person.
  • the first electronic device may directly play the second video within the second interface area 470 .
  • the second interface area 470 may include a second character image 471 and a second background image 472 , and the second background image 472 may serve as a background of the second character image 471 . That is, the first electronic device may not perform video cropping or video extraction on the second video.
  • the first user can imitate the target person to make a series of actions, which are recorded by the first electronic device.
• if the second video is a video call video of the second user, the first user can imitate the second user.
• if the second video is a material, the first user can imitate the target person in the material.
• if the second video is a gallery video, the first user can imitate the target person in the gallery video.
  • user interface 400 may include recording controls 410 .
  • the first electronic device may capture a first video.
  • the first user can preview the shooting effect of the first video through the first interface area 460 shown in FIG. 4 .
• the first video may include the first character sub-video, or the first character sub-video may be extracted from the first video. That is, the first video may include subframes corresponding to the first user.
  • the first electronic device may display the first character image 461 in the first interface area 460, and then may play a picture of the sub-video of the first character. That is, the first interface area 460 may include the first character image 461 .
  • the first interface area 460 may include pixel points corresponding to the first user.
  • the first electronic device may directly play the first video in the first interface area 460 .
  • the first interface area 460 may include a first character image 461 and a first background image 462 , and the first background image 462 may serve as a background of the first character image 461 . That is, the first electronic device may not perform video cropping or video extraction on the first video.
  • the electronic device may synthesize the first video and the second video to obtain, for example, the first target video shown in FIG. 5 and FIG. 6 .
  • the first target video may include a first image area 560 corresponding to the first video or the first user, and a second image area 570 corresponding to the second video or the target person.
• the first image area 560 may correspond to the first interface area 460, and the second image area 570 may correspond to the second interface area 470, so that the preview video during the shooting process and the synthesized video may have relatively high unity.
  • two users can imitate one or more characters in the same video to perform actions through a video call.
  • the two users can communicate details of the imitation through a video call.
• the first electronic device may send a video call invitation to the second electronic device according to the operation of the first user on the user co-shooting control 320 as shown in FIG. 3.
  • the user using the second electronic device may be the third user.
  • a video call connection can be established between the first electronic device and the second electronic device.
• the first user and the third user may, for example, select the second video through the material co-shooting control 330 or the gallery co-shooting control 340 shown in FIG. 3.
  • the second video may be a video about the target person.
  • the second video may show the action of the target person.
  • Both the first electronic device and the third electronic device may display one or more of the first interface area, the second interface area, and the third interface area on the user interface.
  • the first interface area may display the content of the video call of the first user; the second interface area may display the second video; and the third interface area may display the content of the video call of the third user.
• the first user and the third user can imitate the actions in the second video to generate the first video and the third video during the video call. Since the first user and the third user imitate the actions of the same target character in the same video, the first video and the third video can be processed respectively with reference to the action form of the target character in the second video, to obtain a target video that includes the first user and the third user.
  • the first user and the third user can imitate the action of the target person at the same time, or they can imitate the action of the target person in different time periods successively.
• the first electronic device may send a video call invitation to the second electronic device according to the operation of the first user on the user co-shooting control 320 as shown in FIG. 3.
  • the user using the second electronic device may be the third user.
  • a video call connection can be established between the first electronic device and the second electronic device.
• the first user and the third user may, for example, select the second video through the material co-shooting control 330 or the gallery co-shooting control 340 shown in FIG. 3.
  • the second video may be a video about the first target person and the second target person.
  • the first target person and the second target person may cooperate to complete a series of actions in the second video.
  • Both the first electronic device and the third electronic device may display one or more of the first interface area, the second interface area, and the third interface area on the user interface.
  • the first interface area may display the content of the video call of the first user; the second interface area may display the second video; and the third interface area may display the content of the video call of the third user.
  • the first user can imitate the action of the first target person in the second video, and generate the first video through a video call connection.
• the third user can imitate the action of the second target person in the second video, and generate a third video through the video call connection. Since the first user and the third user imitate the actions of the first target character and the second target character in the same video, the first video and the third video can be processed to obtain a target video including the first user and the third user.
• the period during which the first user imitates the action of the first target person may substantially overlap with the period during which the third user imitates the action of the second target person; alternatively, the period during which the first user imitates the action of the first target person may not overlap with the period during which the third user imitates the action of the second target person.
  • User interface 400 may include, for example, a split screen switch control 420 .
  • the first interface area 460 and the second interface area 470 may be, for example, two regular display areas. That is, the outline of the first interface area 460 may not match (or correspond to) the outline of the first user, and the outline of the second interface area 470 may not match (or correspond to) the outline of the target person.
  • the area of the first interface region 460 and the area of the second interface region 470 may correspond to a fixed ratio (eg, 1:1, 1:1.5, etc.).
  • the split screen switch control 420 is currently in an on state. Both the first interface area 460 and the second interface area 470 may be rectangular in shape.
  • the first image area 560 and the second image area 570 of the first target video may be two regular display areas.
  • the outline of the first image area 560 may not match (or not correspond to) the outline of the first user, and the outline of the second image area 570 may not match (or not correspond to) the outline of the target person.
  • the area of the first image area 560 and the area of the second image area 570 may correspond to a fixed ratio (eg, 1:1, 1:1.5, etc.).
  • the shapes of the first image area 560 and the second image area 570 may both be rectangles. That is, both the first image area 560 and the second image area 570 may include a background image.
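A fixed width ratio such as 1:1 or 1:1.5 between the two rectangular areas can be turned into concrete regions with simple arithmetic. The sketch below is illustrative only (the function name and the side-by-side layout are assumptions):

```python
def split_regions(canvas_w, canvas_h, ratio=(1.0, 1.5)):
    """Return two side-by-side rectangles (x, y, w, h) whose widths follow
    the given fixed ratio, e.g. for image areas 560 and 570."""
    w1 = round(canvas_w * ratio[0] / (ratio[0] + ratio[1]))
    return (0, 0, w1, canvas_h), (w1, 0, canvas_w - w1, canvas_h)
```

For a 1000-pixel-wide frame and a 1:1.5 ratio, this yields a 400-pixel-wide first area and a 600-pixel-wide second area.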
  • the outline of the first interface area 460 may match (or correspond to) the outline of the first user
• the outline of the second interface area 470 may match (or correspond to), for example, the outline of the second user. That is, the first interface area 460 may not include the first background image 462 of the first video as shown in FIG. 4; the second interface area 470 may not include the second background image 472 in the second video as shown in FIG. 4.
  • the contour of the first image area 560 of the first target video may match (or correspond to) the contour of the first user
• the contour of the second image area 570 of the first target video may match (or correspond to) the contour of the second user. That is, the first image area 560 may not include the first background image 462 of the first video as shown in FIG. 4; the second image area 570 may not include the second background image 472 in the second video as shown in FIG. 4.
  • the first target video may include a first background image area 580 .
  • the pixel points of the first background image area 580 may be default values.
  • the pixel points of the first background image area 580 may also correspond to any one of the first background image 462, the second background image 472, and the target gallery image.
  • the target gallery image may be a subframe of the gallery video.
  • a certain subframe of the first target video may correspond to a target gallery image, and multiple subframes of the co-shot video may correspond one-to-one with multiple subframes of the video where the target gallery image is located.
  • the first user can indicate to the first electronic device that the background of the first target video corresponds to the target gallery image by acting on the user interface.
• the first electronic device may, in response to an instruction from the user, determine that the pixels of the first background image area 580 of the first target video correspond to the target gallery image, so that the first target video may not include pixels corresponding to the first background image 462 and the second background image 472.
  • the electronic device may display the first character image 461 or the second character image 471 preferentially.
  • the first person image 461 may be overlaid on the second person image 471
  • the second person image 471 may be overlaid on the first person image 461 .
• the user can adjust the display sizes of the first character image 461 and the second character image 471 by operating on the user interface 400, and thereby adjust the size ratio of the image of the first user to the image of the target person in the first target video.
  • the user interface 400 may include a background removal switch control 430 .
• the electronic device may not remove the background of the first video or the background of the second video, that is, it may display the background image of the first video and the background image of the second video on the user interface 400.
  • the background removal switch control 430 may currently be in an off state.
  • the first interface area 460 may display a first character image 461 and a first background image 462 .
  • the first background image 462 may be the background image of the first user.
  • the first background image 462 may be obtained by photographing the scene where the first user is located. That is, the first interface area 460 may include pixels corresponding to the first user and pixels corresponding to the scene where the first user is located.
  • the second interface area 470 may display a second character image 471 and a second background image 472 .
  • the second background image 472 may be a background image of the target person. That is, the second interface area 470 may include pixels corresponding to the target character and pixels corresponding to the scene where the target character is located.
• the electronic device may, for example, remove the background of the first video and/or the background of the second video.
  • the first electronic device may display the background image of the first video in both the first interface area 460 and the second interface area 470; for another example, the first electronic device may display the background image in the first interface area 460 and the second interface area 470 Both display the background image of the second video; for another example, the first electronic device may display other background images except the background image of the first video and the background of the second video in the first interface area 460 and the second interface area 470.
  • the first electronic device may display the background image of the first video on the user interface 400, but not the background image of the second video; for another example, the first electronic device may display the background of the second video on the user interface 400 image, the background image of the first video is not displayed; for another example, the first electronic device may display other background images on the user interface 400 except for the background image of the first video and the background of the second video.
• background images other than the background image of the first video and the background image of the second video may be, for example, the target gallery image.
• the first image area 560 in the first target video may include the first person image 461 and the first background image 462, and the second image area 570 may include the second person image 471 and the first background image 462, with the first background image 462 serving as the background of the second person image 471; alternatively, the first image area 560 may include the first person image 461 and the second background image 472, and the second image area 570 may include the second person image 471 and the second background image 472, with the second background image 472 serving as the background of the first person image 461; alternatively, the first image area 560 may include the first person image 461 and the target gallery image, and the second image area 570 may include the second person image 471 and the target gallery image, with the target gallery image serving as the background of the first person image 461 and the second person image 471.
• alternatively, the first image area 560 in the first target video may include the first person image 461 and not include the first background image 462, and the second image area 570 may include the second person image 471 and not include the second background image 472; in this case, the first background image area 580 in the first target video may include any of the following: the first background image 462, the second background image 472, or the target gallery image, and the first background image area 580 may serve as the background of the first image area 560 and the second image area 570.
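The enumerated background choices can be summarized as a small selection table. The sketch below (all names are illustrative, not the patent's identifiers) maps a user's choice to the images filling image areas 560/570 and background image area 580 in the background-removed case:

```python
def plan_background_removed_layout(bg_choice):
    """For the background-removed alternative: both person images keep no
    background of their own, and area 580 supplies one shared background.
    bg_choice is one of 'first', 'second', 'gallery'."""
    backgrounds = {
        "first": "first_background_462",
        "second": "second_background_472",
        "gallery": "target_gallery_image",
    }
    return {
        "area_560": "first_person_461",      # first user, background removed
        "area_570": "second_person_471",     # target person, background removed
        "area_580": backgrounds[bg_choice],  # shared background behind both areas
    }
```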
• the first electronic device may, in response to an operation performed by the first user on the user interface 400, determine the background image displayed within the first interface area 460, the second interface area 470, or the user interface 400.
  • the background removal switch control 430 may be in an on state.
  • the user interface 400 may include a beauty switch control 440 .
• the electronic device can perform portrait beautification for the first person image 461 and/or the second person image 471. That is, the electronic device may display the first person image 461 and/or the second person image 471 after portrait beautification on the user interface 400; and, in the synthesized first target video, the person image in the first image area 560 and/or the person image in the second image area 570 may be an image after beautification processing.
• the electronic device may not perform portrait beautification for the first person image 461 and the second person image 471. That is to say, the electronic device can display the first person image 461 and the second person image 471 on the user interface 400 according to the original image of the first user and the original image of the target person; the first person image 461 and the second person image 471 can be unbeautified images.
  • the person image in the first image area 560 can be obtained from the original image of the first user
  • the person image in the second image area 570 can be obtained from the original image of the target person.
• the person image in the first image area 560 and the person image in the second image area 570 may be images without beauty treatment.
  • the user interface 400 may further include a filter switch control 450 .
• the electronic device may perform filter beautification for the image of the first video and/or the image of the second video. That is, the electronic device may display the image of the first video and/or the image of the second video after filter beautification on the user interface 400; and, in the synthesized first target video, the image within the first image area 560 and/or the image within the second image area 570 may be a filtered image.
• the electronic device may not perform filter beautification for the first person image 461 and the second person image 471. That is to say, the electronic device can display unfiltered images on the user interface 400 according to the original image of the first video and the original image of the second video; in the synthesized first target video, the image in the first image area 560 can be obtained from the original image of the first video, and the image in the second image area 570 can be obtained from the original image of the second video; that is, the first target video may not include filtered images.
• the first electronic device may process the first video and the second video by using the video processing method provided in the embodiments of the present application, to obtain the first target video shown in FIG. 5 and FIG. 6.
  • the electronic device may simultaneously perform the shooting of the first video and the processing of the first video.
  • This embodiment of the present application may not limit the specific sequence of steps for processing video. The following describes the video processing method provided by the embodiment of the present application with reference to FIG. 7 and FIG. 8 .
  • the first electronic device may extract the first action file according to the first video.
  • the action file can indicate the relative positions of the multiple limbs of the character on the image in multiple frames of a video, thereby reflecting the action information of the character in the video.
  • the first video may include or be extracted to obtain a sub-video of the first person.
  • the first person sub-video may include a plurality of first sub-frames.
  • the first action file may include first action subfiles corresponding to the plurality of first subframes one-to-one. Each first subframe may contain one action of the first user. As shown in FIG. 7, 710 shows a first subframe A of the first person sub-video.
  • the action performed by the first user in the first subframe A may be the first action A.
  • 711 shows the first action sub-file A corresponding to the first action A.
• the first electronic device may, for example, determine the first action sub-file A according to the positional relationship or coordinates between at least two of the following: the first head pixel point, the first neck pixel point, the first torso pixel point, the first upper left forelimb pixel point, the first upper left hindlimb pixel point, the first lower left forelimb pixel point, the first lower left hindlimb pixel point, the first upper right forelimb pixel point, the first upper right hindlimb pixel point, the first lower right forelimb pixel point, the first lower right hindlimb pixel point, the first left-hand pixel point, and the first right-hand pixel point.
  • the first head pixel point may be a pixel point corresponding to the head of the first user.
  • the first neck pixel point may be a pixel point corresponding to the neck of the first user.
  • the first torso pixel point may be a pixel point corresponding to the torso of the first user.
  • the first upper left forelimb pixel point may be a pixel point corresponding to the upper left forelimb of the first user.
  • the pixel point of the first upper left hind limb may be a pixel point corresponding to the upper left hind limb of the first user.
  • the first lower left forelimb pixel point may be a pixel point corresponding to the lower left forelimb of the first user.
  • the pixel point of the first lower left hind limb may be a pixel point corresponding to the lower left hind limb of the first user.
  • the pixel point of the first upper right forelimb may be a pixel point corresponding to the upper right forelimb of the first user.
  • the pixel point of the first upper right hind limb may be a pixel point corresponding to the upper right hind limb of the first user.
  • the first lower right forelimb pixel point may be a pixel point corresponding to the first user's lower right forelimb.
  • the pixel point of the first lower right hind limb may be a pixel point corresponding to the lower right hind limb of the first user.
  • the first left-hand pixel point may be a pixel point corresponding to the left hand of the first user.
  • the first right-hand pixel point may be a pixel point corresponding to the right hand of the first user.
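The thirteen pixel-point groups above can be represented as a keypoint map per subframe. The following sketches the data shape of an action sub-file (the labels are paraphrased for illustration, not the patent's own identifiers):

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# one label per body-part pixel group listed above
BODY_PARTS = [
    "head", "neck", "torso",
    "upper_left_forelimb", "upper_left_hindlimb",
    "lower_left_forelimb", "lower_left_hindlimb",
    "upper_right_forelimb", "upper_right_hindlimb",
    "lower_right_forelimb", "lower_right_hindlimb",
    "left_hand", "right_hand",
]

@dataclass
class ActionSubfile:
    """Action data for one subframe: pixel points (x, y) grouped by body
    part; an action is described by positional relationships between at
    least two of these groups."""
    pixels: Dict[str, List[Tuple[int, int]]] = field(default_factory=dict)
```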
  • the first action sub-file may be data reflecting or indicating or describing or corresponding to the first action.
  • a plurality of pixel points can be approximately fitted into line segments.
• the types of line segments may include, for example, one or more of the following: head line segment, neck line segment, torso line segment, upper left forelimb line segment, upper left hindlimb line segment, lower left forelimb line segment, lower left hindlimb line segment, upper right forelimb line segment, upper right hindlimb line segment, lower right forelimb line segment, lower right hindlimb line segment, left-hand line segment, and right-hand line segment.
  • the action sub-file may include, for example, data of line segments fitted by pixel points.
  • the positional relationship between the two types of pixel points can correspond to information such as the angle and distance between the two fitted line segments.
  • type 1 pixels can be fitted to line segment 1
  • type 2 pixels can be fitted to line segment 2.
  • the length of line segment 1 can reflect the relative number of type 1 pixels; the length of line segment 2 can reflect the relative number of type 2 pixels.
  • the positional relationship between the type 1 pixel point and the type 2 pixel point may correspond to information such as the angle and distance between the line segment 1 and the line segment 2.
  • a negative angle value may mean that the limb is inclined to the left;
  • a positive angle value may mean that the limb is inclined to the right. The larger the absolute value of the angle, the more tilted the limb can be considered to be.
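The fitting and sign convention described above can be illustrated with a short sketch. This is not the patent's implementation; the point representation, the PCA-style fit, and the convention that a limb angle is measured relative to the fitted torso segment are all assumptions made for illustration:

```python
import math

def fit_segment(points):
    """Fit a line segment to a set of (x, y) pixel points.
    Returns the centroid and the principal-axis direction of the
    point cloud, computed from its 2x2 covariance matrix."""
    n = len(points)
    cx = sum(p[0] for p in points) / n
    cy = sum(p[1] for p in points) / n
    sxx = sum((p[0] - cx) ** 2 for p in points) / n
    syy = sum((p[1] - cy) ** 2 for p in points) / n
    sxy = sum((p[0] - cx) * (p[1] - cy) for p in points) / n
    # Angle of the principal axis of the covariance matrix.
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    return (cx, cy), theta

def signed_limb_angle(limb_points, torso_points):
    """Signed angle (degrees) of a limb segment relative to the
    torso segment: negative may mean inclined to the left,
    positive inclined to the right; a larger absolute value means
    a more tilted limb."""
    _, limb_theta = fit_segment(limb_points)
    _, torso_theta = fit_segment(torso_points)
    deg = math.degrees(limb_theta - torso_theta)
    # Normalize into (-180, 180].
    while deg <= -180:
        deg += 360
    while deg > 180:
        deg -= 360
    return deg
```

For example, a horizontal limb measured against a vertical torso yields roughly -90° under this convention.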
  • the first action sub-file A may reflect that the first action A of the first user may include raising the right upper limb.
• the relative positional relationship between the first right upper hindlimb pixel points and the first torso pixel points, as well as the relative positional relationship between the first right upper forelimb pixel points and the first right upper hindlimb pixel points, can reflect that in the first action A, the lifting angle of the first user's right upper hindlimb is the first right upper hindlimb angle, and the lifting angle of the first user's right upper forelimb is the first right upper forelimb angle.
• as shown in FIG. 7, the first right upper hindlimb angle may be, for example, about 85°, and the first right upper forelimb angle may be, for example, about -10°. In other examples, the first right upper forelimb angle may be determined according to the first right upper forelimb pixel points and the first torso pixel points. In this case, the first right upper forelimb angle may be, for example, about 75°.
  • the first action sub-file A may reflect that the first action A of the first user may include raising the upper left limb.
• the relative positional relationship between the first left upper hindlimb pixel points and the first torso pixel points, as well as the relative positional relationship between the first left upper forelimb pixel points and the first left upper hindlimb pixel points, can reflect that in the first action A, the lifting angle of the first user's left upper hindlimb is the first left upper hindlimb angle, and the lifting angle of the first user's left upper forelimb is the first left upper forelimb angle.
• as shown in FIG. 7, the first left upper hindlimb angle may be, for example, slightly less than -90°, and the first left upper forelimb angle may be, for example, about -45°.
• in other examples, the first left upper forelimb angle may be determined according to the first left upper forelimb pixel points and the first torso pixel points. In this case, the first left upper forelimb angle may be, for example, about -135°.
  • the first action sub-file A may reflect that the first action A of the first user may include raising the right lower limb.
• the relative positional relationship between the first right lower hindlimb pixel points and the first torso pixel points, as well as the relative positional relationship between the first right lower forelimb pixel points and the first right lower hindlimb pixel points, can reflect that in the first action A, the lifting angle of the first user's right lower hindlimb is the first right lower hindlimb angle, and the lifting angle of the first user's right lower forelimb is the first right lower forelimb angle.
  • the first right lower hind limb angle may be, for example, about 60°, and the first right lower fore limb angle may be, for example, about 0°.
  • the first right lower forelimb angle may be determined according to the first right lower forelimb pixel point and the first torso pixel point. In this case, the first right lower forelimb angle may be, for example, about 60°.
  • the first action sub-file A may reflect that the first action A of the first user may include not raising the left lower limb.
• the relative positional relationship between the first left lower hindlimb pixel points and the first torso pixel points, as well as the relative positional relationship between the first left lower forelimb pixel points and the first left lower hindlimb pixel points, can reflect that in the first action A, the lifting angle of the first user's left lower hindlimb is the first left lower hindlimb angle, and the lifting angle of the first user's left lower forelimb is the first left lower forelimb angle.
• as shown in FIG. 7, the first left lower hindlimb angle may be, for example, about -5°, and the first left lower forelimb angle may be, for example, about 5°.
  • the first left lower forelimb angle may be determined according to the first left lower forelimb pixel point and the first torso pixel point. In this case, the first left lower forelimb angle may be, for example, about 0°.
  • the first action sub-file A may reflect that the first action A of the first user may include twisting the neck.
  • the relative positional relationship between the first neck pixel point and the first torso pixel point can reflect that, in the first action A, the angle of the neck twist is the first neck angle.
  • the first neck angle may be, for example, about 5°.
• the first action sub-file A may reflect that the first action A of the first user may include twisting the head.
  • the relative positional relationship between the first head pixel point and the first neck pixel point can reflect that, in the first action A, the angle of the head twist is the first head angle.
  • the first head angle may be about 15°, for example.
  • the first head angle may be determined according to the first head pixels and the first torso pixels. In this case, the first head angle may be about 20°, for example.
  • the first action sub-file A may reflect that the first action A of the first user may include not leaning the torso.
• the relative positional relationship between the first torso pixel points and the mid-perpendicular line (the mid-perpendicular line may be perpendicular to the horizon) may reflect that the angle at which the torso is inclined in the first action A is the first torso angle.
  • the first torso angle may be about 0°, for example.
  • the first action sub-file A may reflect the first left-hand angle and/or the first right-hand angle.
  • the first left hand angle may reflect the angle between the first left hand and the first upper left forelimb.
  • the first left-hand angle can be obtained, for example, according to the first left-hand pixel point and the first left upper forelimb pixel point.
  • the first left hand angle may reflect the angle between the first left hand and the first torso. It can be obtained from the first left-hand pixel and the first torso pixel.
  • the first right hand angle may reflect the angle between the first right hand and the first right upper forelimb.
  • the first right hand angle can be obtained, for example, according to the first right hand pixel point and the first right upper forelimb pixel point.
  • the first right hand angle may reflect the angle between the first right hand and the first torso. It can be obtained from the first right hand pixel and the first torso pixel.
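The first action sub-file described in the bullets above is, in effect, a per-subframe record of joint angles. A minimal sketch of such a record, with hypothetical field names (the patent does not specify a storage format), might look like:

```python
from dataclasses import dataclass

@dataclass
class ActionSubFile:
    """Hypothetical per-subframe record of the limb angles (in
    degrees) that an action sub-file may reflect; one instance
    corresponds to one subframe of a person sub-video."""
    right_upper_hindlimb: float
    right_upper_forelimb: float
    left_upper_hindlimb: float
    left_upper_forelimb: float
    right_lower_hindlimb: float
    right_lower_forelimb: float
    left_lower_hindlimb: float
    left_lower_forelimb: float
    neck: float
    head: float
    torso: float

# Example values taken from the first action sub-file A in the text.
first_action_A = ActionSubFile(
    right_upper_hindlimb=85, right_upper_forelimb=-10,
    left_upper_hindlimb=-90, left_upper_forelimb=-45,
    right_lower_hindlimb=60, right_lower_forelimb=0,
    left_lower_hindlimb=-5, left_lower_forelimb=5,
    neck=5, head=15, torso=0)
```

One such record per subframe yields the action file for the whole person sub-video.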
  • the first electronic device may extract the second action file according to the second video.
  • the second video may include or be extracted to obtain a sub-video of the second person.
  • the second person sub-video may include a plurality of second sub-frames.
  • the second action file may include second action subfiles corresponding to the plurality of second subframes one-to-one.
  • Each second subframe may contain an action of the target character.
• in FIG. 7, 720 shows a second subframe a of the second person sub-video.
  • the action made by the target person in the second subframe a may be the second action a.
  • 721 shows the second action sub-file a corresponding to the second action a.
• the second electronic device may determine the second action sub-file a according to the positional relationship or coordinates between at least two of the following: the second head pixel points, the second neck pixel points, the second torso pixel points, the second left upper forelimb pixel points, the second left upper hindlimb pixel points, the second left lower forelimb pixel points, the second left lower hindlimb pixel points, the second right upper forelimb pixel points, the second right upper hindlimb pixel points, the second right lower forelimb pixel points, the second right lower hindlimb pixel points, the second left-hand pixel points, and the second right-hand pixel points.
  • the second head pixel point may be a pixel point corresponding to the head of the target person.
  • the second neck pixel point may be a pixel point corresponding to the neck of the target person.
  • the second torso pixel point may be a pixel point corresponding to the torso of the target person.
  • the second upper left forelimb pixel point may be a pixel point corresponding to the upper left forelimb of the target person.
  • the second upper left hind limb pixel point may be a pixel point corresponding to the left upper hind limb of the target person.
  • the second left lower forelimb pixel point may be a pixel point corresponding to the left lower forelimb of the target person.
  • the second lower left hind limb pixel point may be a pixel point corresponding to the left lower hind limb of the target person.
  • the second upper right forelimb pixel point may be a pixel point corresponding to the upper right forelimb of the target person.
  • the second upper right hindlimb pixel point may be a pixel point corresponding to the upper right hindlimb of the target person.
  • the second right lower forelimb pixel point may be a pixel point corresponding to the target person's lower right forelimb.
  • the second lower right hindlimb pixel point may be a pixel point corresponding to the lower right hindlimb of the target person.
  • the second left-hand pixel point may be a pixel point corresponding to the left hand of the target person.
  • the second right-hand pixel point may be a pixel point corresponding to the right hand of the target person.
  • the second action subfile may be data reflecting or indicating or describing or corresponding to the second action.
  • the positional relationship, quantity relationship, etc. between different types of pixels can reflect the action direction, action angle, action range, etc. of the target person.
  • the second action sub-file a may reflect that the second action a of the target person may include raising the right upper limb.
• the relative positional relationship between the second right upper hindlimb pixel points and the second torso pixel points, as well as the relative positional relationship between the second right upper forelimb pixel points and the second right upper hindlimb pixel points, can reflect that in the second action a, the lifting angle of the target person's right upper hindlimb is the second right upper hindlimb angle, and the lifting angle of the target person's right upper forelimb is the second right upper forelimb angle.
• as shown in FIG. 7, the second right upper hindlimb angle may be, for example, about 60°, and the second right upper forelimb angle may be, for example, about 30°. In other examples, the second right upper forelimb angle may be determined according to the second right upper forelimb pixel points and the second torso pixel points. In this case, the second right upper forelimb angle may be, for example, about 90°.
  • the second action subfile a may reflect that the second action a of the target person may include raising the upper left limb.
• the relative positional relationship between the second left upper hindlimb pixel points and the second torso pixel points, as well as the relative positional relationship between the second left upper forelimb pixel points and the second left upper hindlimb pixel points, can reflect that in the second action a, the lifting angle of the target person's left upper hindlimb is the second left upper hindlimb angle, and the lifting angle of the target person's left upper forelimb is the second left upper forelimb angle.
  • the angle of the second upper left hind limb may be, for example, slightly less than -135°, and the angle of the second upper left fore limb may be, for example, about -15°.
  • the second upper left forelimb angle may be determined according to the second upper left forelimb pixel point and the second torso pixel point. In this case, the second upper left forelimb angle may be, for example, about -150°.
  • the second action sub-file a may reflect that the second action a of the target person may include raising the right lower limb.
• the relative positional relationship between the second right lower hindlimb pixel points and the second torso pixel points, as well as the relative positional relationship between the second right lower forelimb pixel points and the second right lower hindlimb pixel points, can reflect that in the second action a, the lifting angle of the target person's right lower hindlimb is the second right lower hindlimb angle, and the lifting angle of the target person's right lower forelimb is the second right lower forelimb angle.
  • the second right lower hind limb angle may be, for example, about 60°, and the second right lower front limb angle may be, for example, about 0°.
  • the second lower right forelimb angle may be determined according to the second lower right forelimb pixel point and the second torso pixel point. In this case, the second right lower forelimb angle may be, for example, about 60°.
  • the second action subfile a may reflect that the second action a of the target person may include not raising the left lower limb.
• the relative positional relationship between the second left lower hindlimb pixel points and the second torso pixel points, as well as the relative positional relationship between the second left lower forelimb pixel points and the second left lower hindlimb pixel points, can reflect that in the second action a, the raising angle of the target person's left lower hindlimb is the second left lower hindlimb angle, and the raising angle of the target person's left lower forelimb is the second left lower forelimb angle.
• as shown in FIG. 7, the second left lower hindlimb angle may be, for example, about 0°, and the second left lower forelimb angle may be, for example, about 0°.
  • the second left lower forelimb angle may be determined according to the second left lower forelimb pixel point and the second torso pixel point. In this case, the second left lower forelimb angle may be approximately 0°, for example.
  • the second action sub-file a may reflect that the second action a of the target person may include twisting the neck.
  • the relative positional relationship between the second neck pixel point and the second torso pixel point may reflect that, in the second action a, the angle of the neck twist is the second neck angle.
  • the second neck angle may be, for example, about 30°.
• the second action sub-file a may reflect that the second action a of the target person may include twisting the head.
  • the relative positional relationship between the second head pixel point and the second neck pixel point can reflect that, in the second action a, the angle of the head twist is the second head angle.
  • the second head angle may be about 0°, for example.
  • the second head angle may be determined from the second head pixels and the second torso pixels. In this case, the second head angle may be about 30°, for example.
  • the second action sub-file a may reflect that the second action a of the target person may include not leaning the torso.
  • the relative positional relationship between the second torso pixel point and the mid-perpendicular line (the mid-perpendicular line may be perpendicular to the horizon) may reflect that the angle at which the torso is inclined in the second action a is the second torso angle.
  • the second torso angle may be, for example, about -5°.
  • the second action sub-file a may reflect the second left-hand angle and/or the second right-hand angle.
  • the second left hand angle may reflect the angle between the second left hand and the second upper left forelimb.
  • the second left-hand angle can be obtained according to the second left-hand pixel point and the second left upper forelimb pixel point.
  • the second left hand angle may reflect the angle between the second left hand and the second torso. It can be obtained from the second left hand pixel and the second torso pixel.
  • the second right hand angle may reflect the angle between the second right hand and the second upper right forelimb.
  • the second right-hand angle can be obtained according to the second right-hand pixel point and the second right upper forelimb pixel point.
  • the second right hand angle may reflect the angle between the second right hand and the second torso. It can be obtained from the second right hand pixel and the second torso pixel.
• the first right upper forelimb angle may be, for example, about -10°;
• the second right upper forelimb angle may be, for example, about 30°.
  • the first right upper hindlimb angle may be, for example, about 85°
  • the second right upper hindlimb angle may be, for example, about 60°.
  • the first upper left forelimb angle may be, for example, about -45°
  • the second upper left forelimb angle may be, for example, about -15°.
  • the first upper left hind limb angle may be, for example, slightly less than -90°, and the second upper left hind limb angle may be, for example, about -135°.
  • the first right lower forelimb angle may be, for example, about 0°, and the second right lower forelimb angle may be, for example, about 0°.
  • the first right lower hindlimb angle may be, for example, about 60°, and the second right lower hindlimb angle may be, for example, about 60°.
  • the first left lower forelimb angle may be, for example, about 5°, and the second left lower forelimb angle may be, for example, about 0°.
  • the first left lower hindlimb angle may be, for example, about -5°, and the second left lower hindlimb angle may be, for example, about 0°.
  • the first neck angle may be about 5°, for example.
  • the second neck angle may be about 30°, for example.
  • the first head angle may be approximately 15°, for example.
  • the second head angle may be approximately 30°, for example.
  • the first torso angle may be approximately 0°, for example.
  • the second torso angle may be about -5°, for example.
• the first right upper forelimb angle may be different from the second right upper forelimb angle; the first right upper hindlimb angle may be different from the second right upper hindlimb angle; the first left upper forelimb angle may be different from the second left upper forelimb angle; the first left upper hindlimb angle may be different from the second left upper hindlimb angle; the first left lower forelimb angle may be different from the second left lower forelimb angle; the first left lower hindlimb angle may be different from the second left lower hindlimb angle; the first neck angle may be different from the second neck angle; the first head angle may be different from the second head angle; the first torso angle may be different from the second torso angle.
• the first electronic device can adjust the pixels in the first subframe A according to the second action sub-file a, and the first action sub-file A can be adjusted accordingly, so that the action sub-file corresponding to the processed first video and the second action sub-file a can be as similar or corresponding as possible.
• the first electronic device may adjust the pixels in the first subframe so that: the first right upper forelimb angle can be equal or approximately equal to the second right upper forelimb angle; the first right upper hindlimb angle can be equal or approximately equal to the second right upper hindlimb angle; the first left upper forelimb angle can be equal or approximately equal to the second left upper forelimb angle; the first left upper hindlimb angle can be equal or approximately equal to the second left upper hindlimb angle; the first left lower forelimb angle can be equal or approximately equal to the second left lower forelimb angle; the first left lower hindlimb angle can be equal or approximately equal to the second left lower hindlimb angle; the first neck angle can be equal or approximately equal to the second neck angle; the first head angle can be equal or approximately equal to the second head angle; the first torso angle can be equal or approximately equal to the second torso angle. Furthermore, the action in the processed first subframe A may be as similar or corresponding to the second action a as possible.
  • the first electronic device may output the processed first subframe, as shown by 730 in FIG. 7 . It can be seen that the adjusted action of the first user may be more similar to the action of the target character. That is to say, through the method shown in FIG. 7 , the action angle, action direction, action range, etc. of the user in the first video can be adjusted, thereby helping to improve the action matching degree between the first user and the target person.
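The adjustment step described above can be pictured as computing, for each joint, the rotation needed to bring the first user's angle to the target character's angle, then rotating that limb's pixels about the joint. A hedged sketch, using example angles from the text and hypothetical helper names:

```python
import math

def angle_deltas(first, second):
    """Per-joint rotation (degrees) that would bring the first
    user's angles to the target character's angles."""
    return {joint: second[joint] - first[joint] for joint in first}

def rotate_about(point, pivot, deg):
    """Rotate one pixel coordinate about a joint pivot; applying
    this to all of a limb's pixels realizes the per-joint delta."""
    r = math.radians(deg)
    dx, dy = point[0] - pivot[0], point[1] - pivot[1]
    return (pivot[0] + dx * math.cos(r) - dy * math.sin(r),
            pivot[1] + dx * math.sin(r) + dy * math.cos(r))

# Example angles from the text: first action A vs. second action a.
first_A = {"right_upper_hindlimb": 85, "right_upper_forelimb": -10,
           "neck": 5, "head": 15, "torso": 0}
second_a = {"right_upper_hindlimb": 60, "right_upper_forelimb": 30,
            "neck": 30, "head": 0, "torso": -5}
deltas = angle_deltas(first_A, second_a)
# e.g. the right upper hindlimb would rotate by -25 degrees
```

After each limb's pixels are rotated by its delta, the recomputed angles match the target's, which is the equality condition listed in the bullets above.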
• the first electronic device may compare the size of the first user sub-video with the size of the second user sub-video according to the first action sub-file A and the second action sub-file a. For example, according to at least one of the head pixel points, neck pixel points, torso pixel points, left upper limb pixel points, left lower limb pixel points, right upper limb pixel points, right lower limb pixel points, left-hand pixel points, and right-hand pixel points, one or more line segments can be fitted. The first electronic device may determine the size of a user sub-video according to the lengths of the one or more line segments.
  • the size of the first user sub-video may be relatively small, and the size of the second user sub-video may be relatively large.
  • the first electronic device may adjust the pixels in the first user sub-video so that the size of the first user sub-video and the size of the second user sub-video can relatively match. That is, with the method shown in FIG. 7 , the screen ratio occupied by the first user sub-video can be adjusted, which is beneficial to improve the size matching degree between the first user sub-video and the second user sub-video.
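The size comparison via fitted line-segment lengths might be sketched as follows; the bounding-diagonal length measure and the summed-length ratio are illustrative assumptions, not the patent's method:

```python
def segment_length(points):
    """Approximate length of a fitted segment as the diagonal of
    the points' bounding box; adequate for a size comparison."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return ((max(xs) - min(xs)) ** 2 + (max(ys) - min(ys)) ** 2) ** 0.5

def size_scale(first_segments, second_segments):
    """Ratio by which the first user sub-video could be scaled so
    that the total length of its fitted segments matches that of
    the second user sub-video."""
    a = sum(segment_length(s) for s in first_segments)
    b = sum(segment_length(s) for s in second_segments)
    return b / a
```

A ratio above 1 would mean the first user sub-video is relatively small and should be enlarged, matching the scenario described in the text.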
  • the multiple first subframes of the first person sub-video may further include a first subframe B, as shown by 810 in FIG. 8 .
  • the action performed by the first user in the first subframe B may be the first action B.
  • the first subframe A and the first subframe B are two different subframes in the first person sub-video.
  • the first electronic device may determine the first action subfile B according to the first subframe B in combination with the method shown in FIG. 7 , as shown in 811 in FIG. 8 .
  • the multiple second subframes of the second person sub-video may further include a second subframe b, as shown in 820 in FIG. 8 .
• the action performed by the target person in the second subframe b may be the second action b.
  • the second subframe a and the second subframe b are two different subframes in the second person sub-video.
  • the first electronic device may obtain the second action subfile b according to the second subframe b or from the cloud server in combination with the method shown in FIG. 7 , as shown in 821 in FIG. 8 .
• the first subframe A (shown as 710 in FIG. 8 ) may correspond to the second subframe a (shown as 720 in FIG. 8 ), and the first subframe B may correspond to the second subframe b. That is to say, the first action sub-file A (shown as 711 in FIG. 8 ) may have a relatively high similarity with the second action sub-file a (shown as 721 in FIG. 8 ), and the first action sub-file B may have a relatively high similarity with the second action sub-file b.
• the time difference between the first subframe A and the first subframe B may be T, and the time difference between the second subframe a and the second subframe b may be t. That is, in the first video, relative to the second video, the first user may make the first action A and transition to the first action B relatively quickly or relatively slowly. As shown in FIG. 8 , T may be greater than t, that is, the action of the first user may be relatively slow.
• the first electronic device may adjust the subframes between the first subframe A and the first subframe B according to the first subframe A, the first subframe B, the second subframe a, and the second subframe b, so that the first subframe B may be moved closer to or farther from the first subframe A, thereby adjusting the time difference between the first subframe A and the first subframe B.
• a subframe at a time difference t from the first subframe A may be denoted the first subframe C (as shown by the dotted rectangle in FIG. 8 ).
• the first electronic device can adjust the subframes between the first subframe A and the first subframe B, so that the first subframe B can be moved to the position originally occupied by the first subframe C.
• the time for the target person to make the second action a and transition to the second action b is relatively short, that is to say, the target person's action is relatively fast; the first electronic device can reduce the time difference between the first subframe A and the first subframe B, so that the action of the first user can be accelerated, as shown by the dashed arrow in FIG. 8 .
  • the first electronic device can process the first video, which is beneficial to improve the speed similarity between the actions of the first user and the actions of the target person.
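The retiming described above (compressing the span T between subframe A and subframe B toward the target span t) can be sketched as a resampling of frame indices. The linear resampling below is one plausible realization, not necessarily the patent's:

```python
def retime_indices(n_frames, T, t):
    """Resample n_frames source frames spanning time T onto a span
    of time t by picking evenly spaced source indices. If t < T,
    fewer frames are kept and the first user's action is sped up,
    as with the dashed arrow in FIG. 8; if t > T, indices repeat
    and the action is slowed down."""
    out_count = max(2, round(n_frames * t / T))
    step = (n_frames - 1) / (out_count - 1)
    return [round(i * step) for i in range(out_count)]
```

For example, halving the span (t = T/2) keeps roughly every other frame while preserving the first and last subframes.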
  • the processed first video may be the first target video.
  • the electronic device may further process the first video to obtain the first target video, as shown in the user interface 900 in FIG. 9 .
  • the first target video may be obtained by processing the first video and the second video through the methods shown in FIG. 7 and FIG. 8 .
• the first target video includes, for example, the first image area 560 shown in FIG. 5 and FIG. 6 , that is, includes pixels corresponding to the first interface area 460 shown in FIG. 4 . However, the first target video shown in FIG. 9 may not include the second image area 570 shown in FIG. 5 and FIG. 6 , that is, may not include pixels corresponding to the second interface area 470 shown in FIG. 4 .
  • the video processing method provided by the embodiment of the present application can be applied to the optimization of action coordination in a co-shot scene.
• the method for processing video provided by the embodiment of the present application may also be applied to scenarios other than the co-shooting scenario, for example, performing action optimization on a single video.
• the range of motion of the first user, the screen ratio occupied by the first user, the speed of the first user's motion, etc. in the first video can be adjusted, thereby reducing the amount of post-processing the first user needs to perform on the first video.
  • the electronic device may retrieve the first target video shown in FIG. 5 , FIG. 6 or FIG. 9 , so that the user can watch the first target video.
• the electronic device may perform post-adjustment on the first target video. For example, the playback speed of the first image area, the playback speed of the second image area, the beautification of the first image area, the beautification of the second image area, the size of the first image area, and the size of the second image area can all be adjusted.
• the first electronic device may adjust the similarity between the first action sub-file and the second action sub-file, so that the first user can flexibly specify the adjustment range of the action. For example, if the first user does not want the first video to be identical to the second video, the first user can use the action optimization control 550 to reduce the degree to which the first video or the first target video is processed or optimized; if the first user wants the first video and the second video to be as identical as possible, the first user can use the action optimization control 550 to increase the degree to which the first video or the first target video is processed or optimized.
• the degree to which the first video is processed or optimized may be 0.6 to 0.7 by default, for example. In this way, it is not only beneficial to improve the action matching degree between the first user's action and the target character's action, but also to retain the characteristics of the first user.
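The default degree of 0.6 to 0.7 suggests a blend between the user's own angle and the target's angle. One plausible model (an assumption, since the patent does not define the degree mathematically) is linear interpolation per joint:

```python
def blend_angle(user_angle, target_angle, degree=0.65):
    """Blend the first user's joint angle toward the target
    character's angle: degree 0 keeps the user's action unchanged,
    degree 1 copies the target exactly. A default in the 0.6-0.7
    range improves matching while keeping some of the first user's
    own characteristics."""
    return user_angle + degree * (target_angle - user_angle)
```

Raising or lowering `degree` via the action optimization control 550 would then directly control how strongly the first video is pulled toward the second.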
  • multiple users can imitate the same footage or the same gallery video.
  • the electronic device may synthesize the videos of the multiple users into a co-shot video, so as to improve the coordination of the co-shot video including the multiple users.
  • the third user can imitate the target person in the second video to make a series of actions, and shoot the third video through the third electronic device.
  • the target character may be the same character as the character imitated by the first user above.
  • the third electronic device may display the user interface 1000 as shown in FIG. 10 .
  • the user interface 1000 may include a third interface area 1060 and a fourth interface area 1070 .
  • the third electronic device may display a third character image 1061 and a third background image 1062 in the third interface area 1060, wherein the third character image 1061 may include pixels corresponding to the third user, and the third The background image 1062 may include pixel points corresponding to the scene where the third user is located.
  • the third interface area 1060 may be used to preview the shooting effect of the third user.
• the third electronic device may display the second character image 471 in the fourth interface area 1070, and then may play the picture of the second person sub-video.
  • the fourth interface area 1070 may be used to prompt the action of the target character.
  • the user interface may include recording controls 1010 .
  • the third electronic device may capture a third video.
  • the third video may include or be extracted to obtain a third user sub-video and a third background sub-video.
  • the third electronic device may determine, according to the third user sub-video, a plurality of third action sub-files, the plurality of third action sub-files There may be one-to-one correspondence with multiple third subframes of the third user sub-video.
  • the third action subfile may be used to reflect the actions of the third user in the third subframe.
  • the mirror action of the third user may have a higher degree of matching with the action of the target character, and the third electronic device may determine the plurality of third actions according to the mirror video of the third video. sub file.
  • the third electronic device can adjust the pixels of the third video according to the third action sub-file and the second action sub-file corresponding to the third action sub-file, so as to realize the processing of the third video, which can be beneficial to improve the Action similarity between the third user and the target character.
• the captured movements of the third user may be similar to but slightly different from those of the target person; as shown in FIG. 11 , after the third video is processed, the movements of the third user can have a higher similarity to those of the target person.
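Where the mirror video mentioned earlier is used, action sub-files for the flipped frame can be derived by swapping left/right joints and negating the signed angles. This mirroring rule is an illustrative assumption, not a rule stated in the text:

```python
def mirror_action(angles):
    """Mirror an action sub-file for a horizontally flipped frame:
    left and right joints swap, and each signed angle (negative =
    inclined left, positive = inclined right) changes sign."""
    mirrored = {}
    for name, value in angles.items():
        if name.startswith("left"):
            mirrored["right" + name[4:]] = -value
        elif name.startswith("right"):
            mirrored["left" + name[5:]] = -value
        else:
            # Neck, head, and torso keep their names; their tilt
            # direction still flips under the mirror.
            mirrored[name] = -value
    return mirrored
```

Comparing the mirrored sub-file against the target's sub-file would let the device decide whether the third user is imitating the target directly or as in a mirror.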
• the third electronic device may synthesize the second video and the third video into a second target video, where the second video and/or the third video may be processed videos.
  • the second target video may belong to a video that has undergone split-screen processing and has undergone background removal processing.
  • the second target video may include a third image area 1160 and a fourth image area 1170 .
  • the third image area 1160 may include pixels corresponding to the third user sub-video and pixels corresponding to the second background sub-video.
  • the fourth image area 1170 may include pixels corresponding to the second user sub-video and pixels corresponding to the second background sub-video. That is, the background image in the second video can serve as the background of the third user sub-video.
  • the second target video may not belong to a split-screen video, or may belong to a video that has undergone split-screen processing but has not undergone background removal processing.
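The split-screen and background-removal compositing described in the surrounding paragraphs can be sketched as follows; the row-of-pixels representation, the `user_mask`, and both function names are illustrative stand-ins for real image arrays and segmentation output, not part of the embodiment:

```python
def split_screen(frame_left, frame_right):
    # Side-by-side split-screen: concatenate each pixel row.
    # Frames are equal-height lists of pixel rows.
    assert len(frame_left) == len(frame_right)
    return [l + r for l, r in zip(frame_left, frame_right)]

def replace_background(frame, user_mask, background):
    # Background removal: keep a pixel where the user mask is truthy,
    # otherwise substitute the pixel from the replacement background
    # (e.g. the second background sub-video).
    return [[p if m else b for p, m, b in zip(pr, mr, br)]
            for pr, mr, br in zip(frame, user_mask, background)]

# 2x2 toy frames: user pixels 'U' over original background 'x',
# substitute background 'B' taken from the second video.
user3 = [["U", "x"], ["x", "U"]]
mask3 = [[1, 0], [0, 1]]
bg2 = [["B", "B"], ["B", "B"]]
third_area = replace_background(user3, mask3, bg2)
combined = split_screen(third_area, bg2)
```

This mirrors the third image area 1160: the third user's pixels composited over the second background sub-video, then placed next to another image area in the split-screen target video.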
  • the first electronic device or the third electronic device may synthesize the first video and the third video into a third target video, where both the first video and the third video may be processed videos.
  • the third target video shown in FIG. 12 may belong to a video that has undergone split-screen processing but has not undergone background removal processing.
  • the third target video may include a fifth image area 1260 and a sixth image area 1270 .
  • the fifth image area 1260 may include pixels corresponding to the first user sub-video shown in FIG. 5 and pixels corresponding to the first background sub-video.
  • the sixth image area 1270 may include pixels corresponding to the third user sub-video shown in FIG. 10 and pixels corresponding to the third background sub-video.
  • the third target video may belong to a video that has undergone split-screen processing and background removal processing.
  • the third target video shown in FIG. 13 may belong to a video that has not undergone split-screen processing but has undergone background removal processing.
  • the third target video may include a fifth image area 1260, a sixth image area 1270, and a second background image area 1380.
  • the fifth image area 1260 may include pixels corresponding to the first user sub-video, and may not include pixels corresponding to the first background sub-video.
  • the sixth image area 1270 may include pixels corresponding to the third user sub-video, and may not include pixels corresponding to the third background sub-video.
  • the second background image area 1380 may include pixel points corresponding to the target gallery image. In other examples, the second background image area 1380 may include pixel points corresponding to the first background image shown in FIG. 4 or the third background image shown in FIG. 10 .
• the first user may imitate the first target person in the second video to make a series of actions, and shoot the first video through the first electronic device; the third user may imitate the second target person in the second video to make a series of actions, and shoot the third video through the third electronic device.
  • the first target person and the second target person may be two different people in the second video.
  • the first electronic device, the third electronic device, or other electronic devices may process the first video and the third video to obtain a target video including the first user and the third user.
  • the electronic device may acquire the first action file, the second action file, the third action file, and the fourth action file.
  • the first action file can be obtained by extracting action information of the first user in the first video.
  • the second action file can be obtained by extracting the action information of the first target person in the second video.
  • the third action file can be obtained by extracting action information of the third user in the third video.
  • the fourth action file can be obtained by extracting action information of the second target person in the second video.
• the electronic device may compare the first action sub-file with the second action sub-file and process the first video to obtain a first target video, where the actions of the first user in the first target video may differ slightly from the actions of the first user in the first video, and may be more similar to the actions of the first target person in the second video.
• the electronic device may compare the third action sub-file with the fourth action sub-file and process the third video to obtain a third target video, where the actions of the third user in the third target video may differ slightly from the actions of the third user in the third video, and may be more similar to the actions of the second target person in the second video.
  • the electronic device may synthesize the first target video and the third target video into a new video, and the new video may show actions of the first user in the first target video and actions of the third user in the third target video.
  • the new video may also include data in the second video.
  • the new video may also show the actions of the first target person and the second target person in the second video.
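The two-person flow above (four action files, two processed videos, one synthesized video) can be sketched as a pipeline; `retarget` and `compose` are assumed placeholder callables for the pose-guided processing and video synthesis steps, whose implementations the embodiment leaves open:

```python
def make_duet_video(first_video, third_video,
                    first_actions, second_actions,
                    third_actions, fourth_actions,
                    retarget, compose):
    # retarget(video, own_actions, target_actions): process a video so
    # the user's actions move toward the target person's actions.
    # compose(a, b): synthesize the two processed videos into one.
    v1 = retarget(first_video, first_actions, second_actions)
    v3 = retarget(third_video, third_actions, fourth_actions)
    return compose(v1, v3)

# Toy stand-ins: videos are lists of frame labels, retarget tags each
# frame, compose pairs the two videos frame by frame.
retarget = lambda video, own, target: [f + "*" for f in video]
compose = lambda v1, v3: list(zip(v1, v3))
duet = make_duet_video(["f1", "f2"], ["g1", "g2"],
                       None, None, None, None, retarget, compose)
```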
  • FIG. 14 is a schematic diagram of another user interface 1400 provided by an embodiment of the present application.
  • the user interface 1400 may be displayed on the first electronic device.
• the user interface 1400 may be an interface of the Changlian application, or an interface of another application having a video call function. That is to say, the first electronic device can carry the Changlian application or other applications with a video call function.
  • the first electronic device may display the user interface 1400 in response to operations performed by the first user corresponding to the applications.
  • the first user can open the Changlian application by clicking on the icon of the Changlian application, and then the first electronic device can display the user interface 1400 .
  • the user interface 1400 may include a plurality of user controls 1410 in a one-to-one correspondence with a plurality of users.
  • the plurality of users may include a second user.
  • the first electronic device may display the contact information of the second user.
  • the contact information of the second user may include at least one of the following: the name of the second user, the contact information of the second user, the call record of the second user, and the like.
  • the user interface may include a user search control 1420.
  • the first user may invite the second user to make a video call through the user search control.
• the first electronic device can obtain the relevant information of the second user (for example, part or all of the name of the second user, the initials of the second user's name, part or all of the video call number of the second user, etc.).
  • the first electronic device may determine user records of the second user from multiple user records stored in the first electronic device according to the relevant information of the second user, and the multiple user records may be in one-to-one correspondence with the multiple users. Furthermore, the first electronic device can quickly display the user control of the second user on the user interface.
  • the user interface may include common user controls 1412. As shown in FIG. 14 , the second user may belong to frequently used contacts, and the user interface may include a frequently used user control 1411 corresponding to the second user.
  • the first electronic device may count the user with the most times of co-shooting as user A, and display a common user control A on the user interface, where the common user control A may be a control corresponding to user A.
  • the first electronic device may count the user with the most video calls as user B, and display a common user control B on the user interface, where the common user control B may be a control corresponding to user B.
  • multiple users may be arranged in alphabetical order, for example.
  • the user interface may include letter controls.
  • the first electronic device may switch user controls displayed on the user interface.
• the user interface may include Changlian video controls 1430. As shown in FIG. 14, the user interface may include a plurality of Changlian video controls 1430 corresponding one-to-one with the plurality of users.
  • the first user may invite the second user to make a video call through the first electronic device.
• the first electronic device may initiate a video call to the second electronic device, where the second electronic device may be the electronic device used by the second user.
  • the second user may receive the video call invitation from the first user through the second electronic device.
  • the second electronic device may display an interface for the video call invitation, and the interface may include controls for answering the video call.
  • a video call connection can be established between the first electronic device and the second electronic device.
• after the first electronic device establishes a video call connection with the second electronic device, the first electronic device can obtain the first video by shooting, and the second electronic device can obtain the second video by shooting; the first electronic device can obtain the second video through the video call connection, and the second electronic device can obtain the first video through the video call connection.
  • the first user may invite the second user to take a photo remotely during the video call.
  • the second user may invite the first user to take a photo remotely during the video call.
  • the first electronic device and the second electronic device may display a user interface 1500 as shown in FIG. 15 .
• the user interface 1500 may be a preparation interface for a remote co-shoot.
• the user interface shown in FIG. 14 may further include a remote co-shooting control 1440.
• the user interface 1400 may include a plurality of remote co-shooting controls 1440 that correspond one-to-one with the plurality of users.
  • the first user may invite the second user to complete the remote co-shooting through a video call through the remote co-shooting control 1440 .
• in response to an operation (such as a click operation) performed by the first user on the remote co-shooting control 1440, the first electronic device can initiate a video call to the second electronic device and send indication information to the second electronic device, where the indication information is used to invite the second user to take a photo together, and the second electronic device may be an electronic device used by the second user.
  • the second user may receive the remote co-shooting invitation from the first user through the second electronic device.
  • the second electronic device may display an interface for the remote co-shot invitation, which interface may include controls for answering the video call.
  • a video call connection can be established between the first electronic device and the second electronic device, and both the first electronic device and the second electronic device can display as shown in Figure 15 user interface 1500.
  • the user interface 1500 may include a first interface area 1560 and a second interface area 1570.
• the first interface area 1560 may display part or all of the images currently captured by the first electronic device, and the second interface area 1570 may display part or all of the images currently captured by the second electronic device.
• the first interface area 1560 and the second interface area 1570 may not overlap each other.
  • the first interface area 1560 and the second interface area 1570 may be located anywhere on the user interface 1500 .
• the first interface area 1560 may be located above the user interface 1500, and the second interface area 1570 may be located below the user interface 1500. That is, some or all of the images captured by the first electronic device and some or all of the images captured by the second electronic device may be displayed on the user interface 1500 at the same time.
• the user can observe the user interface 1500 to preview the co-shooting effect of the first user and the second user.
• the first interface area 1560 may include the first character image 1561, and the second interface area 1570 may include the second character image 1571. That is, the first interface area 1560 may include pixels corresponding to the first user, and the second interface area 1570 may include pixels corresponding to the second user. It should be understood that in other examples, the first electronic device and/or the second electronic device may use a rear-facing camera to capture an image containing the user.
• the user interface 1500 may also include controls for adjusting the co-shooting effect.
• the user interface 1500 may include a split screen switch control 1520, a background removal switch control 1530, and a beautification switch control 1540. These controls allow the user to adjust the co-shooting effect before or during the co-shoot.
  • the split screen switch control 1520 may have the function of the split screen switch control 420 described above
• the background removal switch control 1530 may have the function of the background removal switch control 430 described above
  • the beautification switch control 1540 may have the functions of the beauty switch control 440 and/or the filter switch control 450 described above, which will not be described in detail here.
  • User interface 1500 may include recording controls 1510 .
• the electronic device can synthesize the first video shot by the first electronic device and the second video shot by the second electronic device to obtain the first target video as shown in FIG. 5 and FIG. 6. That is to say, in the examples shown in FIG. 14 and FIG. 15, the user can obtain a co-shot video with relatively high coordination through the Changlian application and the video processing method provided by the embodiments of the present application.
  • the first user can open the Changlian application by clicking the icon of the Changlian application.
  • the first electronic device may display a plurality of user controls on the user interface.
  • the plurality of users may include a third user.
  • the first electronic device may initiate a video call to the second electronic device used by the third user, and invite the third user to make a video call.
  • the third user may receive the video call invitation from the first user through the second electronic device. After that, a video call connection can be established between the first electronic device and the second electronic device.
• the first electronic device can obtain the first video by shooting, and the first video may be a video of the first user; the second electronic device can obtain the third video by shooting, and the third video may be a video of the third user; the first electronic device may obtain the third video through the video call connection, and the second electronic device may obtain the first video through the video call connection.
  • a first action file can be obtained, and the first action file can indicate the action of the first user in the first video.
  • a third action file can be obtained, and the third action file can indicate the actions of the third user in the third video.
  • the first user can invite the third user to take a photo remotely during the video call.
  • the third user may invite the first user to take a photo remotely during the video call.
  • the first electronic device and the second electronic device may display a preparation interface for remote co-op.
  • the material co-shot control 330 and/or the gallery co-shot control 340 as shown in FIG. 3 may be displayed on the remote co-shot preparation interface.
  • One of the first user and the third user can select the second video through the material co-shot control 330 or the gallery co-shot control 340 .
  • the second video may be a video of the first target person, showing the actions of the first target person.
  • the first user can imitate the action of the first target person in the second video
  • the third user can imitate the action of the first target person in the second video.
  • the period during which the first user imitates the action may be the same or different from the period during which the third user imitates the action.
• One of the first electronic device and the second electronic device may process the first video and the third video according to the acquired first video, third video, first action file, third action file, and the second action file corresponding to the second video, to obtain the target video.
  • the second action file may correspond to the action of the first target person in the second video.
• the target video may include an image of the first user and an image of the third user; the actions of the first user in the target video may be different from the actions of the first user in the first video, and the actions of the first user in the target video may correspond to the actions of the first target person in the second video; the actions of the third user in the target video may be different from the actions of the third user in the third video, and the actions of the third user in the target video may correspond to the actions of the first target person in the second video.
  • the target video may further include an image of the first target person in the second video.
  • the second video may be a video of the first target person and the second target person, showing actions of the first target person and actions of the second target person.
  • the first user can imitate the action of the first target person in the second video
  • the third user can imitate the action of the second target person in the second video.
  • the period during which the first user imitates the action may be the same or different from the period during which the third user imitates the action.
• One of the first electronic device and the second electronic device may process the first video and the third video according to the first video, the third video, the first action file, the third action file, the second action file corresponding to the second video, and the fourth action file corresponding to the second video, to obtain the target video.
  • the second action file may correspond to the action of the first target person in the second video.
  • the fourth action file may correspond to the action of the second target person in the second video.
• the target video may include an image of the first user and an image of the third user; the actions of the first user in the target video may be different from the actions of the first user in the first video, and the actions of the first user in the target video may correspond to the actions of the first target person in the second video; the actions of the third user in the target video may be different from the actions of the third user in the third video, and the actions of the third user in the target video may correspond to the actions of the second target person in the second video.
  • the target video may further include an image of the first target person in the second video, and an image of the second target person in the second video.
  • This embodiment of the present application also provides a method 1600 for processing video, and the method 1600 may be implemented in an electronic device (eg, a mobile phone, a tablet computer, etc.) as shown in FIG. 1 and FIG. 2 .
  • the method 1600 may include the following steps:
  • the first electronic device acquires a first video, where the first video is a video of a first character.
  • the first electronic device acquires a first action file corresponding to the first video, where the first action file corresponds to the action of the first character.
  • the first action file may refer to the example shown in the first action sub-file 711 in FIG. 7 .
  • the first action file may refer to the example shown in the first action sub-file 811 in FIG. 8 .
  • the first electronic device acquires a second action file corresponding to a second video, where the second video is a video of a second character, and the second action file corresponds to the action of the second character.
  • the second action file may refer to the example shown in the second action sub-file 721 in FIG. 7 .
  • the second action file may refer to the example shown in the second action sub-file 821 in FIG. 8 .
• the first electronic device generates a target video according to the first video, the first action file, and the second action file, where the target video includes a first character image of the first character, the actions of the first character in the target video are different from the actions of the first character in the first video, and the actions of the first character in the target video correspond to the actions of the second character in the second video.
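The generation step of method 1600 can be sketched frame by frame; `adjust_frame` is an assumed callable standing in for the unspecified pixel-adjustment step, and the toy action files below hold one limb angle per frame:

```python
def generate_target_video(first_video, first_action_file,
                          second_action_file, adjust_frame):
    # Walk the first video and both action files in lockstep; adjust
    # each frame so the first character's action moves toward the
    # second character's action in the corresponding frame.
    return [adjust_frame(frame, a1, a2)
            for frame, a1, a2 in zip(first_video,
                                     first_action_file,
                                     second_action_file)]

# Toy frames and per-frame action sub-files (single limb angle each);
# the adjustment rule is an assumed 80% blend toward the target angle.
adjust = lambda frame, a1, a2: (frame, a1 + 0.8 * (a2 - a1))
out = generate_target_video(["f1", "f2"], [40.0, 50.0], [60.0, 70.0], adjust)
```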
• for the first character image in the target video, reference may be made to the examples shown in the first character image 461 in FIG. 5, FIG. 6, and FIG. 9.
• for the first character image in the target video, reference may be made to the example shown in the first character image 1561 in FIG. 15.
• the method further includes: the first electronic device establishes a video call connection between the first electronic device and the second electronic device.
  • the electronic device is the electronic device of the first character
  • the second electronic device is the electronic device of the second character;
• acquiring the first video by the first electronic device includes: the first electronic device acquires the first video during the video call; the method further includes: the first electronic device acquires the second video from the second electronic device through the video call connection.
  • the first video and the second video correspond to the same time period during the video call
• the target video further includes a second character image of the second character, and the actions of the second character in the target video correspond to the actions of the second character in the second video.
• a certain frame of the target video may be shown with reference to the user interface shown in FIG. 15. That is to say, the first character and the second character can perform similar actions synchronously, and the first video is processed according to the actions of the second character to make the actions of the first character and the second character in the target video more coordinated.
• the method further includes: acquiring, by the first electronic device, a third video, where the third video is a video of a third character; and acquiring, by the first electronic device, a third action file corresponding to the third video, where the third action file corresponds to the action of the third character.
• the first electronic device generates a target video according to the first video, the first action file, and the second action file, including: the first electronic device generates the target video according to the first video, the third video, the first action file, the second action file, and the third action file, where the target video further includes a third character image of the third character, the actions of the third character in the target video are different from the actions of the third character in the third video, and the actions of the third character in the target video correspond to the actions of the second character in the second video.
  • the third video may refer to the example shown in the third interface area 1060 of FIG. 10 .
• for the image of the third person in the third video, reference may be made to the example shown in the third character image 1061 in FIG. 10.
• the target video may refer to the example shown in the third image area 1160 in FIG. 11, or the example shown in the third image area 1160 and the fourth image area 1170 in FIG. 11, or the example shown in the fifth image area 1260 in FIG. 12, or the example shown in the fifth image area 1260 and the sixth image area 1270 in FIG. 12, or the example shown in the user interface 1300 of FIG. 13.
  • the target video further includes a second character image of the second character, and actions of the second character in the target video correspond to actions of the second character in the second video.
  • the first person image and the second person image belong to the same frame image in the target video.
  • the second video is a video of the second character and the fourth character
• the method further includes: the first electronic device acquires a third video, where the third video is a video of a third character; the first electronic device acquires a third action file corresponding to the third video, where the third action file corresponds to the action of the third character; and the first electronic device acquires a fourth action file, where the fourth action file corresponds to the action of the fourth character in the second video; the first electronic device generates a target video according to the first video, the first action file, and the second action file, including: the first electronic device generates the target video according to the first video, the third video, the first action file, the second action file, the third action file, and the fourth action file, where the target video further includes a third character image of the third character, the actions of the third character in the target video are different from the actions of the third character in the third video, and the actions of the third character in the target video correspond to the actions of the fourth character in the second video.
• the target video further includes a second character image of the second character and a fourth character image of the fourth character; the actions of the second character in the target video correspond to the actions of the second character in the second video, and the actions of the fourth character in the target video correspond to the actions of the fourth character in the second video.
  • the first person image, the second person image, the third person image, and the fourth person image belong to the same frame of images in the target video.
• the method further includes: the first electronic device establishes a video call connection between the first electronic device and the second electronic device.
  • the electronic device is the electronic device of the first character
  • the second electronic device is the electronic device of the third character
• the acquisition of the first video by the first electronic device includes: the first electronic device acquires the first video during the video call; and the acquisition of the third video by the first electronic device includes: the first electronic device acquires the third video from the second electronic device through the video call connection.
  • the first video and the third video correspond to the same time period during the video call.
• establishing, by the first electronic device, a video call connection between the first electronic device and the second electronic device includes: the first electronic device establishes the video call connection between the first electronic device and the second electronic device through a photographing application or a video calling application.
  • the video call application may refer to the examples shown in FIG. 14 to FIG. 15 .
  • the second video is a video stored locally or in the cloud.
  • acquiring, by the first electronic device, a second action file corresponding to the second video includes: acquiring, by the first electronic device, the second action file from the second electronic device.
• the action of the first character in the target video corresponding to the action of the second character in the second video includes: the action file corresponding to the first character image is a first target action file, the matching degree between the first action file and the second action file is a first matching degree, the matching degree between the first target action file and the second action file is a second matching degree, and the second matching degree is greater than the first matching degree.
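The embodiment does not define how the matching degree is computed; one plausible metric, assumed here, scores two action files higher when their corresponding limb angles (in degrees) are closer:

```python
def matching_degree(action_a, action_b):
    # Assumed metric: inverse of (1 + mean absolute angle difference).
    # Identical action files score 1.0; larger differences score lower.
    diffs = [abs(a - b) for a, b in zip(action_a, action_b)]
    return 1.0 / (1.0 + sum(diffs) / len(diffs))

user = [40.0, 95.0, 170.0]      # first action file (limb angles)
target = [60.0, 90.0, 180.0]    # second action file
adjusted = [56.0, 91.0, 178.0]  # first target action file after processing

m1 = matching_degree(user, target)       # first matching degree
m2 = matching_degree(adjusted, target)   # second matching degree
```

With this metric, the processed pose's matching degree (m2) exceeds the captured pose's (m1), satisfying the condition stated above.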
• obtaining, by the first electronic device, a first action file corresponding to the first video includes: the first electronic device determines the first action sub-file according to at least two of the following: a first head pixel point, a first neck pixel point, a first torso pixel point, a first left upper forelimb pixel point, a first left upper hindlimb pixel point, a first left lower forelimb pixel point, a first left lower hindlimb pixel point, a first right upper forelimb pixel point, a first right upper hindlimb pixel point, a first right lower forelimb pixel point, a first right lower hindlimb pixel point, a first left hand pixel point, and a first right hand pixel point.
  • the first action sub-file may refer to the example shown in the first action sub-file 711 in FIG. 7 .
• the first action sub-file includes at least one of the following limb angles: a first head angle, a first neck angle, a first trunk angle, a first left upper forelimb angle, a first left upper hindlimb angle, a first left lower forelimb angle, a first left lower hindlimb angle, a first right upper forelimb angle, a first right upper hindlimb angle, a first right lower forelimb angle, a first right lower hindlimb angle, a first left hand angle, and a first right hand angle.
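A limb angle can be derived from two of the body pixel points listed above; the sketch below measures the segment's angle from the horizontal axis using standard mathematical orientation (in image coordinates, where y grows downward, the sign would flip), which is one assumed convention among several the embodiment would permit:

```python
import math

def limb_angle(joint_a, joint_b):
    # Angle, in degrees, of the limb segment from joint_a to joint_b.
    # Keypoints are (x, y) coordinates such as the elbow and wrist
    # pixel points of a lower forelimb.
    dx = joint_b[0] - joint_a[0]
    dy = joint_b[1] - joint_a[1]
    return math.degrees(math.atan2(dy, dx))

# e.g. a forearm from elbow (100, 200) to wrist (150, 150)
angle = limb_angle((100, 200), (150, 150))
```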
  • the first action file corresponds to a first limb angle
  • the second action file corresponds to a second limb angle
  • the target action file corresponds to a third limb angle
• the difference between the first limb angle and the second limb angle is smaller than a preset angle
  • the third limb angle is between the first limb angle and the second limb angle.
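One plausible reading of the two conditions above (the difference between the first and second limb angles is below a preset angle; the third limb angle lies between them) is the following sketch, where the midpoint rule and the default preset angle are assumptions, not taken from the embodiment:

```python
def target_limb_angle(first_angle, second_angle, preset_angle=30.0):
    # When the two angles already differ by less than the preset angle,
    # take the midpoint, which lies strictly between them; otherwise
    # keep the first character's original angle unchanged.
    if abs(first_angle - second_angle) < preset_angle:
        return (first_angle + second_angle) / 2.0
    return first_angle

close = target_limb_angle(40.0, 60.0)   # within the preset angle
far = target_limb_angle(0.0, 90.0)      # outside the preset angle
```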
  • the first video includes a first subframe and a second subframe
  • the second video includes a third subframe and a fourth subframe
  • the target video includes a fifth subframe and a sixth subframe
  • the first subframe, the third subframe, and the fifth subframe correspond to each other
  • the second subframe, the fourth subframe, and the sixth subframe correspond to each other
• the time difference between the first subframe and the second subframe is the first time difference
  • the time difference between the third subframe and the fourth subframe is the second time difference
• the time difference between the fifth subframe and the sixth subframe is a third time difference
  • the third time difference is between the first time difference and the second time difference.
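The condition that the third time difference lies between the first and second time differences is satisfied by any convex combination of the two; `weight` is an assumed parameter, and the time differences are taken here in seconds:

```python
def target_time_difference(first_diff, second_diff, weight=0.5):
    # A convex combination always lies between the two input time
    # differences, so the generated subframes are paced between the
    # first character's timing and the second character's timing.
    return (1.0 - weight) * first_diff + weight * second_diff

# First user's frames 0.40 s apart, second character's 0.50 s apart.
t3 = target_time_difference(0.40, 0.50)
```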
  • the target video includes a first image area and a second image area
  • the first image area includes pixels corresponding to the first character
  • the second image area includes pixels corresponding to the second character.
  • for the first image area, reference may be made to the examples shown as the first image area 560 in FIG. 5, FIG. 6, and FIG. 9.
  • for the first character image in the target video, reference may be made to the example shown as the first interface area 1560 in FIG. 15.
  • the first image area includes pixels corresponding to any of the following: a first background image, a second background image, and a target gallery image
  • the first background image includes pixels corresponding to the scene where the first character is located
  • the second background image includes pixel points corresponding to the scene where the second character is located
  • the target gallery image is an image stored on the first electronic device.
  • the first image area includes a first background image
  • the first image area may be, for example, the first image area 560 shown in FIG. 5 .
  • the first image area includes a second background image
  • the first image area may be, for example, the third image area 1160 shown in FIG. 11 .
  • the second image area includes pixels corresponding to any one of the following: a first background image, a second background image, and a target gallery image
  • the first background image includes pixels corresponding to the scene where the first character is located
  • the second background image includes pixel points corresponding to the scene where the second character is located
  • the target gallery image is an image stored on the first electronic device.
  • the second image area includes the first background image, and the second image area may be, for example, the second image area 570 shown in FIG. 5 .
  • the second image area includes a second background image
  • the second image area may be, for example, the fourth image area 1170 shown in FIG. 11 .
  • the co-shot video further includes a background image area, the background image area is the background of the first image area and the second image area, and the background image area includes pixels corresponding to any one of the following: a first background image, a second background image, and a target gallery image
  • the first background image includes pixels corresponding to the scene where the first character is located
  • the second background image includes pixels corresponding to the scene where the second character is located
  • the target gallery image is an image stored on the first electronic device.
  • the background image area includes a target gallery image
  • the background image area may be, for example, the first background image area 580 shown in FIG. 6 or the second background image area 1380 shown in FIG. 13 .
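Composing a target frame from the character image areas and a chosen background can be sketched as below. Frames are modeled as plain 2D lists of pixel values; the sizes, positions, and rectangular character areas are all assumptions for illustration.

```python
def compose_frame(background, areas):
    """Return a copy of `background` with each (area, x, y) pasted onto it."""
    frame = [row[:] for row in background]
    for area, x, y in areas:
        for dy, area_row in enumerate(area):
            for dx, pixel in enumerate(area_row):
                frame[y + dy][x + dx] = pixel
    return frame

W, H = 16, 9
background = [[0] * W for _ in range(H)]   # e.g. a target gallery image
first_area = [[1] * 4 for _ in range(3)]   # pixels of the first character
second_area = [[2] * 4 for _ in range(3)]  # pixels of the second character
target = compose_frame(background,
                       [(first_area, 1, 2), (second_area, 10, 2)])
print(target[3][2], target[3][11], target[0][0])  # 1 2 0
```

The same routine covers every combination listed above: the background can come from the first video's scene, the second video's scene, or a stored gallery image, and each character's image area is pasted on top.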
  • the electronic device includes corresponding hardware and/or software modules for executing each function.
  • the present application can be implemented in the form of hardware, or in the form of a combination of hardware and computer software, in conjunction with the algorithm steps of the examples described in the embodiments disclosed herein. Whether a function is performed by hardware or by computer software driving hardware depends on the particular application and the design constraints of the technical solution. Those skilled in the art may use different methods to implement the described functionality for each particular application, but such implementations should not be considered beyond the scope of this application.
  • the electronic device can be divided into functional modules according to the above method examples.
  • each functional module can be divided corresponding to each function, or two or more functions can be integrated into one processing module.
  • the above-mentioned integrated modules can be implemented in the form of hardware. It should be noted that the division of modules in this embodiment is schematic and is only a logical function division; there may be other division manners in actual implementation.
  • FIG. 17 shows a possible schematic diagram of the composition of the electronic device 1700 involved in the above embodiment.
  • the electronic device 1700 may include an obtaining unit 1701 and a processing unit 1702.
  • the obtaining unit 1701 may be configured to obtain a first video, where the first video is a video of a first character.
  • the obtaining unit 1701 may also be configured to obtain a first action file corresponding to the first video, where the first action file corresponds to the action of the first character.
  • the obtaining unit 1701 may also be configured to obtain a second action file corresponding to a second video, where the second video is a video of a second character, and the second action file corresponds to the action of the second character.
  • the processing unit 1702 may be configured to generate a target video according to the first video, the first action file, and the second action file, where the target video includes a first character image of the first character, the actions of the first character in the target video are different from the actions of the first character in the first video, and the actions of the first character in the target video correspond to the actions of the second character in the second video.
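The unit split described above can be sketched as a pair of classes. The unit names follow FIG. 17; the method bodies are placeholders standing in for the application's actual algorithm, and all field names are assumptions.

```python
class ObtainingUnit:
    """Unit 1701: obtains videos and their corresponding action files."""
    def obtain_video(self, source):
        return {"source": source, "frames": []}

    def obtain_action_file(self, video):
        return {"video": video["source"], "angles_per_frame": []}

class ProcessingUnit:
    """Unit 1702: generates the target video from the inputs."""
    def generate_target_video(self, first_video, first_actions, second_actions):
        # Placeholder: a real implementation would correct the first
        # character's limb angles toward those in second_actions.
        return {"based_on": first_video["source"],
                "corrected_with": second_actions["video"]}

class ElectronicDevice1700:
    def __init__(self):
        self.obtaining_unit = ObtainingUnit()    # unit 1701
        self.processing_unit = ProcessingUnit()  # unit 1702

device = ElectronicDevice1700()
first_video = device.obtaining_unit.obtain_video("front_camera")
first_actions = device.obtaining_unit.obtain_action_file(first_video)
second_actions = {"video": "sample_clip", "angles_per_frame": []}
target = device.processing_unit.generate_target_video(
    first_video, first_actions, second_actions)
print(target["corrected_with"])  # sample_clip
```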
  • the electronic device may include a processing module, a memory module and a communication module.
  • the processing module may be used to control and manage the actions of the electronic device, for example, may be used to support the electronic device to perform the steps performed by the above units.
  • the storage module may be used to store the program codes, data, and the like of the electronic device.
  • the communication module can be used to support the communication between the electronic device and other devices.
  • the processing module may be a processor or a controller. It may implement or execute the various exemplary logical blocks, modules and circuits described in connection with this disclosure.
  • the processor may also be a combination that implements computing functions, such as a combination of one or more microprocessors, a combination of digital signal processing (DSP) and a microprocessor, and the like.
  • the storage module may be a memory.
  • the communication module may be a transceiver.
  • the communication module may specifically be a device that interacts with other electronic devices, such as a radio frequency circuit, a Bluetooth chip, and a Wi-Fi chip.
  • the electronic device involved in this embodiment may be a device having the structure shown in FIG. 1 .
  • This embodiment also provides a computer storage medium, where computer instructions are stored in the computer storage medium; when the computer instructions run on the electronic device, the electronic device is caused to execute the above related method steps to implement the video processing method in the above embodiments.
  • This embodiment also provides a computer program product which, when run on a computer, causes the computer to execute the above related steps to implement the video processing method in the above embodiments.
  • the embodiments of the present application also provide an apparatus, which may specifically be a chip, a component, or a module; the apparatus may include a processor and a memory connected to each other, where the memory is used to store computer-executable instructions, and when the apparatus runs, the processor can execute the computer-executable instructions stored in the memory, so that the chip executes the video processing method in the foregoing method embodiments.
  • the electronic device, computer storage medium, computer program product, or chip provided in this embodiment is used to execute the corresponding method provided above; therefore, for the beneficial effects that can be achieved, reference may be made to the beneficial effects of the corresponding method provided above, which will not be repeated here.
  • the disclosed system, apparatus and method may be implemented in other manners.
  • the apparatus embodiments described above are only illustrative.
  • the division of the units is only a logical function division. In actual implementation, there may be other division methods.
  • multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
  • the mutual coupling, direct coupling, or communication connection shown or discussed may be implemented through some interfaces, and the indirect coupling or communication connection between apparatuses or units may be in electrical, mechanical, or other forms.
  • the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purposes of the solutions of this embodiment.
  • each functional unit in each embodiment of the present application may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit.
  • the functions, if implemented in the form of software functional units and sold or used as independent products, may be stored in a computer-readable storage medium.
  • the technical solution of the present application, in essence, or the part that contributes to the prior art, or all or part of the technical solution, can be embodied in the form of a software product.
  • the computer software product is stored in a storage medium and includes several instructions used to cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the methods described in the embodiments of the present application.
  • the aforementioned storage medium includes: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disc, or other media that can store program code.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Business, Economics & Management (AREA)
  • General Business, Economics & Management (AREA)
  • Human Computer Interaction (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Processing Or Creating Images (AREA)
  • Image Analysis (AREA)

Abstract

Provided in the present application are a video processing method and an electronic device. By extracting action information from a video, the actions of a plurality of persons can be compared with one another. An action of a first person is corrected on the basis of an action of a second person, and a new video is generated; the new video may comprise an image of the first person and display the corrected action of the first person. In the new video, the action of the first person may be more similar to the action of the second person. The aim of the present application is therefore to improve the degree of action matching or action coordination between a plurality of users and to reduce the amount of post-processing a user performs on videos.

Description

Method and Apparatus for Processing Video
This application claims priority to the Chinese patent application No. 202110178452.9, entitled "Method and Apparatus for Processing Video", filed with the China Patent Office on February 9, 2021, the entire contents of which are incorporated herein by reference.
This application claims priority to the Chinese patent application No. 202110529002.X, entitled "Method and Apparatus for Processing Video", filed with the China Patent Office on May 14, 2021, the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the field of electronic devices and, more particularly, to a method and apparatus for processing video.
Background
Multiple users located in the same venue can co-shoot through one or more camera devices (electronic devices with cameras) to obtain a co-shot video containing the appearances of the multiple users. A single user can also co-shoot with video material to obtain a co-shot video containing the user and the video material. However, without many rehearsals, the co-shot actions of multiple users or multiple characters are usually difficult to coordinate, which may result in a mediocre or unsatisfactory co-shooting effect. This may also require the user to perform additional post-processing on the co-shot video.
Summary
The present application provides a method and apparatus for processing video, with the purpose of improving the action matching degree of multiple users and reducing the amount of post-processing the user performs on videos.
In a first aspect, a method for processing video is provided, including:
obtaining, by a first electronic device, a first video, where the first video is a video of a first character;
obtaining, by the first electronic device, a first action file corresponding to the first video, where the first action file corresponds to an action of the first character;
obtaining, by the first electronic device, a second action file corresponding to a second video, where the second video is a video of a second character, and the second action file corresponds to an action of the second character;
generating, by the first electronic device, a target video according to the first video, the first action file, and the second action file, where the target video includes a first character image of the first character, the action of the first character in the target video is different from the action of the first character in the first video, and the action of the first character in the target video corresponds to the action of the second character in the second video.
The solution provided by the present application can extract information about the action of the first character in a video and correct the action of the first character according to the action of the second character, so that the action of the first character is closer to the action of the second character. This helps to reduce the user's post-processing workload on the video and thus to improve the user experience of shooting, producing, and processing videos.
With reference to the first aspect, in some implementations of the first aspect, before the first electronic device obtains the first video, the method further includes:
establishing, by the first electronic device, a video call connection between the first electronic device and a second electronic device, where the first electronic device is the electronic device of the first character, and the second electronic device is the electronic device of the second character;
obtaining, by the first electronic device, the first video includes:
obtaining, by the first electronic device, the first video during the video call;
the method further includes:
obtaining, by the first electronic device, the second video from the second electronic device through the video call connection.
The first character and the second character can exchange videos by means of a video call to produce a new video. This helps to increase the scenarios in which video calls can be applied, and a video call can also have the function of producing a video. The video data obtained during a video call can also be used to produce a new video, which improves the interaction efficiency between devices. In addition, a video call makes it convenient for the first character and the second character to exchange action details, which helps to improve the accuracy of the actions made by the first character and thus to further reduce the user's post-processing workload on the video.
Optionally, the second video may be the same video as the first video. The first video may be a video of the first character and the second character.
With reference to the first aspect, in some implementations of the first aspect, the first video and the second video correspond to the same time period during the video call, the target video further includes a second character image of the second character, and the action of the second character in the target video corresponds to the action of the second character in the second video.
Since the target video includes the second character, a co-shot of the first character and the second character can be realized, which helps to increase the flexibility of the video.
With reference to the first aspect, in some implementations of the first aspect, the method further includes:
obtaining, by the first electronic device, a third video, where the third video is a video of a third character;
obtaining, by the first electronic device, a third action file corresponding to the third video, where the third action file corresponds to an action of the third character;
generating, by the first electronic device, the target video according to the first video, the first action file, and the second action file includes:
generating, by the first electronic device, the target video according to the first video, the third video, the first action file, the second action file, and the third action file, where the target video further includes a third character image of the third character, the action of the third character in the target video is different from the action of the third character in the third video, and the action of the third character in the target video corresponds to the action of the second character in the second video.
The third character and the first character can perform actions against the same second character. Without any video processing, it is difficult for the action of the third character and the action of the first character to be sufficiently coordinated; to make the actions sufficiently coordinated, the first character and the second character would need to rehearse many times in advance, which increases the difficulty of co-shooting a video. The solution provided by the present application can extract the action files of multiple characters and adjust the actions of the multiple characters in a unified manner based on a sample action, which helps to increase the action coordination of the multiple characters and to reduce the user's post-processing workload on the video.
With reference to the first aspect, in some implementations of the first aspect, the target video further includes a second character image of the second character, and the action of the second character in the target video corresponds to the action of the second character in the second video.
Since the target video includes the second character, a co-shot of the first character and the second character can be realized, which helps to increase the flexibility of the video.
With reference to the first aspect, in some implementations of the first aspect, the first character image and the second character image belong to the same frame of the target video.
In the same frame, the actions of the two characters can be relatively similar, which helps to improve the coordination of the actions of the first character and the second character in terms of timing; for example, the swing speeds of the actions of the first character and the second character can be closer.
With reference to the first aspect, in some implementations of the first aspect, the second video is a video of the second character and a fourth character, and the method further includes:
obtaining, by the first electronic device, a third video, where the third video is a video of a third character;
obtaining, by the first electronic device, a third action file corresponding to the third video, where the third action file corresponds to an action of the third character;
obtaining, by the first electronic device, a fourth action file, where the fourth action file corresponds to an action of the fourth character in the second video;
generating, by the first electronic device, the target video according to the first video, the first action file, and the second action file includes:
generating, by the first electronic device, the target video according to the first video, the third video, the first action file, the second action file, the third action file, and the fourth action file, where the target video further includes a third character image of the third character, the action of the third character in the target video is different from the action of the third character in the third video, and the action of the third character in the target video corresponds to the action of the fourth character in the second video.
The third character and the first character can perform actions against two characters in the same video, which helps to improve the action cooperation between the third character and the first character. Without any video processing, the association between the action of the third character and the action of the first character may be relatively weak, and it may be relatively difficult for the first character and the third character to jointly complete a sequence of actions. Without the solution provided by the present application, the first character and the second character would need to rehearse many times in advance, which increases the difficulty of co-shooting a video. The solution provided by the present application can extract the action files of multiple characters and separately adjust the actions of the multiple characters based on the sample actions of two characters, which helps to increase the action cooperation of the multiple characters and to reduce the user's post-processing workload on the video.
With reference to the first aspect, in some implementations of the first aspect, the target video further includes a second character image of the second character and a fourth character image of the fourth character, the action of the second character in the target video corresponds to the action of the second character in the second video, and the action of the fourth character in the target video corresponds to the action of the fourth character in the second video.
Since the target video includes the second character and the fourth character, a co-shot of the first, second, third, and fourth characters can be realized, which helps to increase the flexibility of the video.
With reference to the first aspect, in some implementations of the first aspect, the first character image, the second character image, the third character image, and the fourth character image belong to the same frame of the target video.
In the same frame, the actions of the first, second, third, and fourth characters can be relatively similar, which helps to improve the coordination of their actions in terms of timing; for example, the swing speeds of the actions of the first, second, third, and fourth characters can be closer.
With reference to the first aspect, in some implementations of the first aspect, before the first electronic device obtains the first video, the method further includes:
establishing, by the first electronic device, a video call connection between the first electronic device and a second electronic device, where the first electronic device is the electronic device of the first character, and the second electronic device is the electronic device of the third character;
obtaining, by the first electronic device, the first video includes:
obtaining, by the first electronic device, the first video during the video call;
obtaining, by the first electronic device, the third video includes:
obtaining, by the first electronic device, the third video from the second electronic device through the video call connection.
The first character and the third character can exchange videos by means of a video call to produce a new video. This helps to increase the scenarios in which video calls can be applied, and a video call can also have the function of producing a video. The video data obtained during a video call can also be used to produce a new video, which improves the interaction efficiency between devices. In addition, a video call makes it convenient for the first character and the third character to exchange action details, which helps to improve the accuracy of the actions made by the first character and the third character and thus to further reduce the user's post-processing workload on the video.
With reference to the first aspect, in some implementations of the first aspect, the first video and the third video correspond to the same time period during the video call.
The first character can perform actions in synchronization with the third character, which helps to improve the coordination in timing between the actions of the first character and the actions of the third character.
With reference to the first aspect, in some implementations of the first aspect, establishing, by the first electronic device, the video call connection between the first electronic device and the second electronic device includes:
establishing, by the first electronic device, the video call connection between the first electronic device and the second electronic device through a shooting application or a video call application.
The shooting application can invoke user controls from applications other than the shooting application, so that a co-shot request can be initiated to another user. In addition, through the co-shot control, multiple applications of the electronic device (including the shooting application) can run cooperatively to realize a co-shot of multiple users.
The video call application can run cooperatively with other applications to realize a co-shot of multiple users. Thus, in addition to the video call function, the video call application can also have the function of generating a video.
With reference to the first aspect, in some implementations of the first aspect, the second video is a video stored locally or in the cloud.
The first electronic device can correct the action of the first character in the first video according to an existing video, and the existing video can continue to be reused, which helps to improve the flexibility of video processing.
结合第一方面,在第一方面的某些实现方式中,所述第一电子设备获取与第二视频对应的第二动作文件,包括:With reference to the first aspect, in some implementations of the first aspect, the first electronic device obtains a second action file corresponding to the second video, including:
所述第一电子设备从第二电子设备获取所述第二动作文件。The first electronic device acquires the second action file from the second electronic device.
第一电子设备可以不获取第二视频,而仅获取第二视频中与动作相关的信息,这有利于减少第一电子设备与第二电子设备之间的信息传输量,进而有利于提高视频处理效率和通信效率。The first electronic device may not obtain the second video, but only obtain the information related to the action in the second video, which is beneficial to reduce the amount of information transmission between the first electronic device and the second electronic device, thereby improving the video processing. Efficiency and communication efficiency.
结合第一方面,在第一方面的某些实现方式中,所述目标视频中所述第一人物的动作与所述第二视频中所述第二人物的动作对应,包括:With reference to the first aspect, in some implementations of the first aspect, the action of the first character in the target video corresponds to the action of the second character in the second video, including:
与所述第一人物图像对应的动作文件为第一目标动作文件,所述第一动作文件与所述第二动作文件之间的匹配度为第一匹配度,所述第一目标动作文件与所述第二动作文件之间的匹配度为第二匹配度,所述第二匹配度大于所述第一匹配度。The action file corresponding to the first character image is a first target action file, the matching degree between the first action file and the second action file is a first matching degree, the matching degree between the first target action file and the second action file is a second matching degree, and the second matching degree is greater than the first matching degree.
本申请提供的方法有利于在原有视频的基础上,提高两个人物在动作上的相似度,有利于使处理后的视频具有相对较高的动作协调性。The method provided by the present application helps improve the similarity between the actions of the two characters on the basis of the original video, so that the processed video has relatively high action coordination.
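As an illustrative sketch only (the scoring formula, angle values, and names below are assumptions for demonstration, not part of the claimed method), the matching-degree relationship described above can be expressed as follows: an action file is treated as a list of limb angles, and the first target action file scores a higher matching degree against the second action file than the original first action file does.

```python
# Hypothetical representation: an "action file" as a list of limb angles (degrees).
def matching_degree(angles_a, angles_b):
    """Matching degree in [0, 1]; 1.0 means identical limb angles."""
    diffs = [abs(a - b) for a, b in zip(angles_a, angles_b)]
    return 1.0 - sum(diffs) / (180.0 * len(diffs))

first_action  = [30.0, 10.0, 0.0, 45.0]   # first character, first video
second_action = [60.0, 12.0, 0.0, 50.0]   # second character, second video
# First target action file: the first character's angles nudged toward the second's.
target_action = [(a + b) / 2 for a, b in zip(first_action, second_action)]

m1 = matching_degree(first_action, second_action)   # first matching degree
m2 = matching_degree(target_action, second_action)  # second matching degree
assert m2 > m1  # the corrected action matches the second character better
```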
结合第一方面,在第一方面的某些实现方式中,所述第一电子设备获取与所述第一视频对应的第一动作文件,包括:With reference to the first aspect, in some implementations of the first aspect, the first electronic device obtains a first action file corresponding to the first video, including:
所述第一电子设备根据以下至少两项,确定所述第一动作子文件:第一头部像素点、第一颈部像素点、第一躯干像素点、第一左上前肢像素点、第一左上后肢像素点、第一左下前肢像素点、第一左下后肢像素点、第一右上前肢像素点、第一右上后肢像素点、第一右下前肢像素点、第一右下后肢像素点、第一左手像素点、第一右手像素点。The first electronic device determines the first action sub-file according to at least two of the following: a first head pixel point, a first neck pixel point, a first torso pixel point, a first left upper forelimb pixel point, a first left upper hindlimb pixel point, a first left lower forelimb pixel point, a first left lower hindlimb pixel point, a first right upper forelimb pixel point, a first right upper hindlimb pixel point, a first right lower forelimb pixel point, a first right lower hindlimb pixel point, a first left hand pixel point, and a first right hand pixel point.
本申请的方案可以对人物的身体的各个部位按区划分,以便于提取身体各个部位的相关信息,得到人物的动作信息。The solution of the present application can divide the character's body into regions by body part, so as to extract the relevant information of each part of the body and obtain the action information of the character.
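A minimal sketch of such a region-based division follows; the region names, pixel coordinates, and neck-relative normalization are illustrative assumptions, not part of the claims.

```python
# Hypothetical per-region key pixel points for one character in one frame.
body_pixels = {
    "head":  (120, 40),
    "neck":  (120, 70),
    "torso": (120, 140),
    "left_upper_forelimb": (90, 80),
    "left_hand": (60, 120),
}

def action_subfile(pixels):
    """Build a minimal action sub-file from at least two body-region pixel points."""
    assert len(pixels) >= 2, "need at least two body-region pixel points"
    # Store positions relative to the neck so the action is translation-invariant.
    ox, oy = pixels.get("neck", next(iter(pixels.values())))
    return {region: (x - ox, y - oy) for region, (x, y) in pixels.items()}

sub = action_subfile(body_pixels)
```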
结合第一方面,在第一方面的某些实现方式中,所述第一动作子文件包括以下至少一项肢体角度:With reference to the first aspect, in some implementations of the first aspect, the first action subfile includes at least one of the following limb angles:
第一头部角度、第一颈部角度、第一躯干角度、第一左上前肢角度、第一左上后肢角度、第一左下前肢角度、第一左下后肢角度、第一右上前肢角度、第一右上后肢角度、第一右下前肢角度、第一右下后肢角度、第一左手角度、第一右手角度。A first head angle, a first neck angle, a first torso angle, a first left upper forelimb angle, a first left upper hindlimb angle, a first left lower forelimb angle, a first left lower hindlimb angle, a first right upper forelimb angle, a first right upper hindlimb angle, a first right lower forelimb angle, a first right lower hindlimb angle, a first left hand angle, and a first right hand angle.
本申请的方案可以针对两个人物某些部位的区别,以确认两个人物在动作上的相同之处和不同之处。The solution of the present application can focus on the differences between certain body parts of the two characters, so as to identify the similarities and differences in their actions.
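One way such a limb angle could be derived from two region pixel points is sketched below; the joint coordinates and the vertical-axis reference are illustrative assumptions.

```python
import math

def limb_angle(joint_a, joint_b):
    """Angle (degrees) of the limb segment from joint_a to joint_b,
    measured against the vertical (downward) axis of the image."""
    dx = joint_b[0] - joint_a[0]
    dy = joint_b[1] - joint_a[1]
    return math.degrees(math.atan2(dx, dy))

# e.g. a first left upper forelimb angle: shoulder -> elbow (hypothetical pixels)
shoulder, elbow = (100, 80), (70, 110)
angle = limb_angle(shoulder, elbow)  # negative: the limb leans to the left
```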
结合第一方面,在第一方面的某些实现方式中,所述第一动作文件对应第一肢体角度,所述第二动作文件对应第二肢体角度,所述目标动作文件对应第三肢体角度,所述第一肢体角度与所述第二肢体角度的差值小于预设角度,所述第三肢体角度介于所述第一肢体角度与所述第二肢体角度之间。With reference to the first aspect, in some implementations of the first aspect, the first action file corresponds to a first limb angle, the second action file corresponds to a second limb angle, and the target action file corresponds to a third limb angle; the difference between the first limb angle and the second limb angle is smaller than a preset angle, and the third limb angle is between the first limb angle and the second limb angle.
本申请的方案可以通过调整某个肢体的角度,以调整人物的动作,进而使多个人物的动作可以更加协调。The solution of the present application can adjust a character's action by adjusting the angle of a certain limb, so that the actions of multiple characters can be more coordinated.
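The angle correction described here can be sketched as a bounded interpolation; the preset angle, weight, and function name are illustrative assumptions.

```python
def corrected_limb_angle(first, second, preset=30.0, weight=0.5):
    """Return a third limb angle between `first` and `second`; the correction
    is applied only when the two angles differ by less than `preset` degrees."""
    if abs(first - second) >= preset:
        return first  # difference too large: keep the original action
    return first + weight * (second - first)

assert corrected_limb_angle(40.0, 60.0) == 50.0  # between the two angles
assert corrected_limb_angle(40.0, 90.0) == 40.0  # beyond the preset: unchanged
```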
结合第一方面,在第一方面的某些实现方式中,所述第一视频包括第一子帧、第二子帧,所述第二视频包括第三子帧、第四子帧,所述目标视频包括第五子帧、第六子帧,所述第一子帧、所述第三子帧、所述第五子帧相互对应,所述第二子帧、所述第四子帧、所述第六子帧相互对应,所述第一子帧与所述第二子帧之间的时间差为第一时间差,所述第三子帧与所述第四子帧之间的时间差为第二时间差,所述第五子帧与所述第六子帧的时间差为第三时间差,所述第三时间差介于所述第一时间差与所述第二时间差之间。With reference to the first aspect, in some implementations of the first aspect, the first video includes a first subframe and a second subframe, the second video includes a third subframe and a fourth subframe, and the target video includes a fifth subframe and a sixth subframe; the first subframe, the third subframe, and the fifth subframe correspond to one another, and the second subframe, the fourth subframe, and the sixth subframe correspond to one another; the time difference between the first subframe and the second subframe is a first time difference, the time difference between the third subframe and the fourth subframe is a second time difference, and the time difference between the fifth subframe and the sixth subframe is a third time difference, the third time difference being between the first time difference and the second time difference.
本申请的方案可以调整多个动作之间的时间差,有利于使多个人物在一段时间内的动作快慢可以更加相似。The solution of the present application can adjust the time difference between multiple actions, which helps make the pace of multiple characters' actions more similar over a period of time.
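The timing adjustment can be sketched as rescaling subframe timestamps so that the third time difference falls between the first and the second; the timestamps, intervals, and names are illustrative assumptions.

```python
def retime(frames, target_interval):
    """frames: list of (timestamp_seconds, frame_id), sorted by time.
    Rescale so the span from first to last frame equals target_interval."""
    t0 = frames[0][0]
    scale = target_interval / (frames[-1][0] - t0)
    return [(t0 + (t - t0) * scale, fid) for t, fid in frames]

first_time_diff  = 0.8  # between subframes 1 and 2 (first video)
second_time_diff = 0.4  # between subframes 3 and 4 (second video)
third_time_diff = (first_time_diff + second_time_diff) / 2  # between the two

new_frames = retime([(0.0, "subframe5"), (0.8, "subframe6")], third_time_diff)
assert min(first_time_diff, second_time_diff) < third_time_diff < max(first_time_diff, second_time_diff)
```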
结合第一方面,在第一方面的某些实现方式中,所述目标视频包括第一图像区域、第二图像区域,所述第一图像区域包括与所述第一人物对应的像素点,所述第二图像区域包括与所述第二人物对应的像素点。With reference to the first aspect, in some implementations of the first aspect, the target video includes a first image area and a second image area; the first image area includes pixels corresponding to the first character, and the second image area includes pixels corresponding to the second character.
目标视频包括两个人物的动作,有利于用户更加直观的观察到第一人物的修正后的动作,以及第一人物和第二人物之间相对较高的动作协调性。The target video includes the actions of both characters, which helps the user observe more intuitively the corrected actions of the first character, as well as the relatively high action coordination between the first character and the second character.
结合第一方面,在第一方面的某些实现方式中,所述第一图像区域包括与以下任一项对应的像素点:第一背景图像、第二背景图像、目标图库图像,所述第一背景图像包括与所述第一人物所在场景对应的像素点,所述第二背景图像包括与所述第二人物所在场景对应的像素点,所述目标图库图像为存储在所述第一电子设备上的图像。With reference to the first aspect, in some implementations of the first aspect, the first image area includes pixels corresponding to any one of the following: a first background image, a second background image, or a target gallery image; the first background image includes pixels corresponding to the scene where the first character is located, the second background image includes pixels corresponding to the scene where the second character is located, and the target gallery image is an image stored on the first electronic device.
结合第一方面,在第一方面的某些实现方式中,所述第二图像区域包括与以下任一项对应的像素点:第一背景图像、第二背景图像、目标图库图像,所述第一背景图像包括与所述第一人物所在场景对应的像素点,所述第二背景图像包括与所述第二人物所在场景对应的像素点,所述目标图库图像为存储在所述第一电子设备上的图像。With reference to the first aspect, in some implementations of the first aspect, the second image area includes pixels corresponding to any one of the following: a first background image, a second background image, or a target gallery image; the first background image includes pixels corresponding to the scene where the first character is located, the second background image includes pixels corresponding to the scene where the second character is located, and the target gallery image is an image stored on the first electronic device.
目标视频可以灵活采用第一视频、第二视频或图库图像中的任一个作为目标视频的背景。如果第一图像区域和第二图像区域采用相同的背景,第一图像区域和第二图像区域可以被视为处于同一背景或同一场景下,进而有利于增加第一图像区域和第二图像区域之间的关联性和融合性。第一人物图像和第二人物图像可以在用户界面上被归属于不同的区域,这可以更适应于需要相对明显地区分人物图像的场景,例如因人物身份不同而不适于将多个人物的图像混合起来的场景。The target video can flexibly use any one of the first video, the second video, or a gallery image as its background. If the first image area and the second image area use the same background, they can be regarded as being in the same background or the same scene, which helps to increase the correlation and fusion between the first image area and the second image area. The first character image and the second character image can also be assigned to different areas of the user interface, which suits scenes where character images need to be distinguished relatively clearly, for example, scenes where it is inappropriate to mix the images of multiple characters because the characters have different identities.
结合第一方面,在第一方面的某些实现方式中,所述合拍视频还包括背景图像区域,所述背景图像区域为所述第一图像区域、所述第二图像区域的背景,所述背景图像区域包括与以下任一项对应的像素点:第一背景图像、第二背景图像、目标图库图像,所述第一背景图像包括与所述第一人物所在场景对应的像素点,所述第二背景图像包括与所述第二人物所在场景对应的像素点,所述目标图库图像为存储在所述第一电子设备上的图像。With reference to the first aspect, in some implementations of the first aspect, the co-shot video further includes a background image area, the background image area being the background of the first image area and the second image area; the background image area includes pixels corresponding to any one of the following: a first background image, a second background image, or a target gallery image; the first background image includes pixels corresponding to the scene where the first character is located, the second background image includes pixels corresponding to the scene where the second character is located, and the target gallery image is an image stored on the first electronic device.
背景图像区域可以灵活采用第一视频、第二视频或图库图像中的任一个作为目标视频的背景。第一图像区域和第二图像区域可以被视为处于同一背景或同一场景下,进而有利于增加第一图像区域和第二图像区域之间的关联性和融合性。这可以更适应于不需要明显区分用户图像的场景,例如群体合拍场景。The background image area can flexibly use any one of the first video, the second video, or a gallery image as the background of the target video. The first image area and the second image area can then be regarded as being in the same background or the same scene, which helps to increase the correlation and fusion between them. This suits scenes that do not require the user images to be clearly distinguished, such as group co-shot scenes.
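The composition of a target frame from a chosen background plus person pixel layers can be sketched as follows; the grid representation, mask format, and names are illustrative assumptions.

```python
def compose(background, person_layers):
    """background: 2D grid of pixel values. person_layers: list of
    (mask, pixels) pairs; mask[y][x] == 1 keeps that person's pixel."""
    out = [row[:] for row in background]
    for mask, pixels in person_layers:
        for y, mask_row in enumerate(mask):
            for x, keep in enumerate(mask_row):
                if keep:
                    out[y][x] = pixels[y][x]
    return out

# The background may come from the first video, the second video, or a gallery image.
background = [[0, 0], [0, 0]]
first_person  = ([[1, 0], [0, 0]], [[7, 7], [7, 7]])  # (mask, pixel values)
second_person = ([[0, 0], [0, 1]], [[9, 9], [9, 9]])
frame = compose(background, [first_person, second_person])
```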
第二方面,提供了一种电子设备,包括:处理器、存储器和收发器,所述存储器用于存储计算机程序,所述处理器用于执行所述存储器中存储的计算机程序;其中,In a second aspect, an electronic device is provided, comprising: a processor, a memory and a transceiver, the memory is used for storing a computer program, and the processor is used for executing the computer program stored in the memory; wherein,
所述处理器用于,获取第一视频,所述第一视频为第一人物的视频;The processor is configured to obtain a first video, where the first video is a video of a first character;
所述处理器还用于,获取与所述第一视频对应的第一动作文件,所述第一动作文件与所述第一人物的动作对应;The processor is further configured to acquire a first action file corresponding to the first video, where the first action file corresponds to the action of the first character;
所述处理器还用于,获取与第二视频对应的第二动作文件,所述第二视频为第二人物的视频,所述第二动作文件与所述第二人物的动作对应;The processor is further configured to acquire a second action file corresponding to a second video, where the second video is a video of a second character, and the second action file corresponds to an action of the second character;
所述处理器还用于,根据所述第一视频、所述第一动作文件、所述第二动作文件,生成目标视频,所述目标视频包括所述第一人物的第一人物图像,所述目标视频中所述第一人物的动作与所述第一视频中所述第一人物的动作不同,所述目标视频中所述第一人物的动作与所述第二视频中所述第二人物的动作对应。The processor is further configured to generate a target video according to the first video, the first action file, and the second action file, where the target video includes a first character image of the first character; the actions of the first character in the target video are different from the actions of the first character in the first video, and the actions of the first character in the target video correspond to the actions of the second character in the second video.
结合第二方面,在第二方面的某些实现方式中,在所述处理器获取第一视频之前,所述处理器还用于:With reference to the second aspect, in some implementations of the second aspect, before the processor acquires the first video, the processor is further configured to:
建立所述电子设备与第二电子设备的视频通话连接,所述电子设备为所述第一人物的电子设备,所述第二电子设备为所述第二人物的电子设备;establishing a video call connection between the electronic device and a second electronic device, where the electronic device is the electronic device of the first character, and the second electronic device is the electronic device of the second character;
所述处理器具体用于,在视频通话过程中,获取所述第一视频;The processor is specifically configured to acquire the first video during a video call;
所述处理器还用于,通过所述视频通话连接,从所述第二电子设备获取所述第二视频。The processor is further configured to acquire the second video from the second electronic device through the video call connection.
结合第二方面,在第二方面的某些实现方式中,所述第一视频、所述第二视频对应所述视频通话过程中的相同时段,所述目标视频还包括所述第二人物的第二人物图像,所述目标视频中所述第二人物的动作与所述第二视频中所述第二人物的动作对应。With reference to the second aspect, in some implementations of the second aspect, the first video and the second video correspond to the same time period during the video call, and the target video further includes a second character image of the second character, where the actions of the second character in the target video correspond to the actions of the second character in the second video.
结合第二方面,在第二方面的某些实现方式中,所述处理器还用于:In conjunction with the second aspect, in some implementations of the second aspect, the processor is further configured to:
获取第三视频,所述第三视频为第三人物的视频;obtaining a third video, where the third video is a video of a third character;
获取与所述第三视频对应的第三动作文件,所述第三动作文件与所述第三人物的动作对应;acquiring a third action file corresponding to the third video, where the third action file corresponds to the action of the third character;
所述处理器具体用于,根据所述第一视频、所述第三视频、所述第一动作文件、所述第二动作文件、所述第三动作文件,生成所述目标视频,所述目标视频还包括所述第三人物的第三人物图像,所述目标视频中所述第三人物的动作与所述第三视频中所述第三人物的动作不同,所述目标视频中所述第三人物的动作与所述第二视频中所述第二人物的动作对应。The processor is specifically configured to generate the target video according to the first video, the third video, the first action file, the second action file, and the third action file; the target video further includes a third character image of the third character, the actions of the third character in the target video are different from the actions of the third character in the third video, and the actions of the third character in the target video correspond to the actions of the second character in the second video.
结合第二方面,在第二方面的某些实现方式中,所述目标视频还包括所述第二人物的第二人物图像,所述目标视频中所述第二人物的动作与所述第二视频中所述第二人物的动作对应。With reference to the second aspect, in some implementations of the second aspect, the target video further includes a second character image of the second character, and the actions of the second character in the target video correspond to the actions of the second character in the second video.
结合第二方面,在第二方面的某些实现方式中,所述第一人物图像、所述第二人物图像属于所述目标视频中的同一帧图像。With reference to the second aspect, in some implementations of the second aspect, the first character image and the second character image belong to the same frame of image in the target video.
结合第二方面,在第二方面的某些实现方式中,所述第二视频为所述第二人物和所述第四人物的视频,所述处理器还用于:With reference to the second aspect, in some implementations of the second aspect, the second video is a video of the second character and the fourth character, and the processor is further configured to:
获取第三视频,所述第三视频为第三人物的视频;obtaining a third video, where the third video is a video of a third character;
获取与所述第三视频对应的第三动作文件,所述第三动作文件与所述第三人物的动作对应;acquiring a third action file corresponding to the third video, where the third action file corresponds to the action of the third character;
获取第四动作文件,所述第四动作文件与所述第二视频中的所述第四人物的动作对应;acquiring a fourth action file, the fourth action file corresponds to the action of the fourth character in the second video;
所述处理器具体用于,根据所述第一视频、所述第三视频、所述第一动作文件、所述第二动作文件、所述第三动作文件、所述第四动作文件,生成所述目标视频,所述目标视频还包括所述第三人物的第三人物图像,所述目标视频中所述第三人物的动作与所述第三视频中所述第三人物的动作不同,所述目标视频中所述第三人物的动作与所述第二视频中所述第四人物的动作对应。The processor is specifically configured to generate the target video according to the first video, the third video, the first action file, the second action file, the third action file, and the fourth action file; the target video further includes a third character image of the third character, the actions of the third character in the target video are different from the actions of the third character in the third video, and the actions of the third character in the target video correspond to the actions of the fourth character in the second video.
结合第二方面,在第二方面的某些实现方式中,所述目标视频还包括所述第二人物的第二人物图像和所述第四人物的第四人物图像,所述目标视频中所述第二人物的动作与所述第二视频中所述第二人物的动作对应,所述目标视频中所述第四人物的动作与所述第二视频中所述第四人物的动作对应。With reference to the second aspect, in some implementations of the second aspect, the target video further includes a second character image of the second character and a fourth character image of the fourth character; the actions of the second character in the target video correspond to the actions of the second character in the second video, and the actions of the fourth character in the target video correspond to the actions of the fourth character in the second video.
结合第二方面,在第二方面的某些实现方式中,所述第一人物图像、所述第二人物图像、所述第三人物图像、所述第四人物图像属于所述目标视频中的同一帧图像。With reference to the second aspect, in some implementations of the second aspect, the first character image, the second character image, the third character image, and the fourth character image belong to the same frame of image in the target video.
结合第二方面,在第二方面的某些实现方式中,在所述处理器获取第一视频之前,所述处理器还用于:With reference to the second aspect, in some implementations of the second aspect, before the processor acquires the first video, the processor is further configured to:
建立所述电子设备与第二电子设备的视频通话连接,所述电子设备为所述第一人物的电子设备,所述第二电子设备为第三人物的电子设备;establishing a video call connection between the electronic device and a second electronic device, where the electronic device is the electronic device of the first character, and the second electronic device is the electronic device of the third character;
所述处理器具体用于,在视频通话过程中,获取所述第一视频;The processor is specifically configured to acquire the first video during a video call;
所述处理器具体用于,通过所述视频通话连接,从所述第二电子设备获取第三视频。The processor is specifically configured to acquire a third video from the second electronic device through the video call connection.
结合第二方面,在第二方面的某些实现方式中,所述第一视频、所述第三视频对应所述视频通话过程中的相同时段。With reference to the second aspect, in some implementations of the second aspect, the first video and the third video correspond to the same time period during the video call.
结合第二方面,在第二方面的某些实现方式中,所述处理器具体用于,通过拍摄应用或视频通话应用,建立所述电子设备与第二电子设备的视频通话连接。With reference to the second aspect, in some implementations of the second aspect, the processor is specifically configured to establish a video call connection between the electronic device and the second electronic device through a photographing application or a video calling application.
结合第二方面,在第二方面的某些实现方式中,所述第二视频为本地或云端存储的视频。With reference to the second aspect, in some implementations of the second aspect, the second video is a video stored locally or in the cloud.
结合第二方面,在第二方面的某些实现方式中,所述处理器具体用于,从第二电子设备获取所述第二动作文件。With reference to the second aspect, in some implementations of the second aspect, the processor is specifically configured to acquire the second action file from the second electronic device.
结合第二方面,在第二方面的某些实现方式中,所述目标视频中所述第一人物的动作与所述第二视频中所述第二人物的动作对应,包括:With reference to the second aspect, in some implementations of the second aspect, the action of the first character in the target video corresponds to the action of the second character in the second video, including:
与所述第一人物图像对应的动作文件为第一目标动作文件,所述第一动作文件与所述第二动作文件之间的匹配度为第一匹配度,所述第一目标动作文件与所述第二动作文件之间的匹配度为第二匹配度,所述第二匹配度大于所述第一匹配度。The action file corresponding to the first character image is a first target action file, the matching degree between the first action file and the second action file is a first matching degree, the matching degree between the first target action file and the second action file is a second matching degree, and the second matching degree is greater than the first matching degree.
结合第二方面,在第二方面的某些实现方式中,所述处理器具体用于,根据以下至少两项,确定所述第一动作子文件:第一头部像素点、第一颈部像素点、第一躯干像素点、第一左上前肢像素点、第一左上后肢像素点、第一左下前肢像素点、第一左下后肢像素点、第一右上前肢像素点、第一右上后肢像素点、第一右下前肢像素点、第一右下后肢像素点、第一左手像素点、第一右手像素点。With reference to the second aspect, in some implementations of the second aspect, the processor is specifically configured to determine the first action sub-file according to at least two of the following: a first head pixel point, a first neck pixel point, a first torso pixel point, a first left upper forelimb pixel point, a first left upper hindlimb pixel point, a first left lower forelimb pixel point, a first left lower hindlimb pixel point, a first right upper forelimb pixel point, a first right upper hindlimb pixel point, a first right lower forelimb pixel point, a first right lower hindlimb pixel point, a first left hand pixel point, and a first right hand pixel point.
结合第二方面,在第二方面的某些实现方式中,所述第一动作子文件包括以下至少一项肢体角度:With reference to the second aspect, in some implementations of the second aspect, the first action subfile includes at least one of the following limb angles:
第一头部角度、第一颈部角度、第一躯干角度、第一左上前肢角度、第一左上后肢角度、第一左下前肢角度、第一左下后肢角度、第一右上前肢角度、第一右上后肢角度、第一右下前肢角度、第一右下后肢角度、第一左手角度、第一右手角度。A first head angle, a first neck angle, a first torso angle, a first left upper forelimb angle, a first left upper hindlimb angle, a first left lower forelimb angle, a first left lower hindlimb angle, a first right upper forelimb angle, a first right upper hindlimb angle, a first right lower forelimb angle, a first right lower hindlimb angle, a first left hand angle, and a first right hand angle.
结合第二方面,在第二方面的某些实现方式中,所述第一动作文件对应第一肢体角度,所述第二动作文件对应第二肢体角度,所述目标动作文件对应第三肢体角度,所述第一肢体角度与所述第二肢体角度的差值小于预设角度,所述第三肢体角度介于所述第一肢体角度与所述第二肢体角度之间。With reference to the second aspect, in some implementations of the second aspect, the first action file corresponds to a first limb angle, the second action file corresponds to a second limb angle, and the target action file corresponds to a third limb angle; the difference between the first limb angle and the second limb angle is smaller than a preset angle, and the third limb angle is between the first limb angle and the second limb angle.
结合第二方面,在第二方面的某些实现方式中,所述第一视频包括第一子帧、第二子帧,所述第二视频包括第三子帧、第四子帧,所述目标视频包括第五子帧、第六子帧,所述第一子帧、所述第三子帧、所述第五子帧相互对应,所述第二子帧、所述第四子帧、所述第六子帧相互对应,所述第一子帧与所述第二子帧之间的时间差为第一时间差,所述第三子帧与所述第四子帧之间的时间差为第二时间差,所述第五子帧与所述第六子帧的时间差为第三时间差,所述第三时间差介于所述第一时间差与所述第二时间差之间。With reference to the second aspect, in some implementations of the second aspect, the first video includes a first subframe and a second subframe, the second video includes a third subframe and a fourth subframe, and the target video includes a fifth subframe and a sixth subframe; the first subframe, the third subframe, and the fifth subframe correspond to one another, and the second subframe, the fourth subframe, and the sixth subframe correspond to one another; the time difference between the first subframe and the second subframe is a first time difference, the time difference between the third subframe and the fourth subframe is a second time difference, and the time difference between the fifth subframe and the sixth subframe is a third time difference, the third time difference being between the first time difference and the second time difference.
第三方面,提供了一种计算机存储介质,包括计算机指令,当所述计算机指令在电子设备上运行时,使得所述电子设备执行上述第一方面的任一项可能的实现方式中所述的方法。In a third aspect, a computer storage medium is provided, including computer instructions that, when run on an electronic device, cause the electronic device to perform the method described in any one of the possible implementations of the first aspect above.
第四方面,提供了一种计算机程序产品,当所述计算机程序产品在计算机上运行时,使得所述计算机执行上述第一方面的任一项可能的实现方式中所述的方法。In a fourth aspect, a computer program product is provided, which, when the computer program product runs on a computer, causes the computer to execute the method described in any one of the possible implementations of the first aspect above.
附图说明Description of drawings
图1是本申请实施例提供的一种电子设备的示意性结构图。FIG. 1 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.
图2是本申请实施例提供的一种电子设备的软件结构框图。FIG. 2 is a software structural block diagram of an electronic device provided by an embodiment of the present application.
图3是本申请实施例提供的一种用户界面的示意性结构图。FIG. 3 is a schematic structural diagram of a user interface provided by an embodiment of the present application.
图4是本申请实施例提供的一种用户界面的示意性结构图。FIG. 4 is a schematic structural diagram of a user interface provided by an embodiment of the present application.
图5是本申请实施例提供的一种用户界面的示意性结构图。FIG. 5 is a schematic structural diagram of a user interface provided by an embodiment of the present application.
图6是本申请实施例提供的一种用户界面的示意性结构图。FIG. 6 is a schematic structural diagram of a user interface provided by an embodiment of the present application.
图7是本申请实施例提供的一种提取动作文件的示意性结构图。FIG. 7 is a schematic structural diagram of an extraction action file provided by an embodiment of the present application.
图8是本申请实施例提供的一种处理视频的示意性结构图。FIG. 8 is a schematic structural diagram of processing a video provided by an embodiment of the present application.
图9是本申请实施例提供的一种用户界面的示意性结构图。FIG. 9 is a schematic structural diagram of a user interface provided by an embodiment of the present application.
图10是本申请实施例提供的一种用户界面的示意性结构图。FIG. 10 is a schematic structural diagram of a user interface provided by an embodiment of the present application.
图11是本申请实施例提供的一种用户界面的示意性结构图。FIG. 11 is a schematic structural diagram of a user interface provided by an embodiment of the present application.
图12是本申请实施例提供的一种用户界面的示意性结构图。FIG. 12 is a schematic structural diagram of a user interface provided by an embodiment of the present application.
图13是本申请实施例提供的一种用户界面的示意性结构图。FIG. 13 is a schematic structural diagram of a user interface provided by an embodiment of the present application.
图14是本申请实施例提供的一种用户界面的示意性结构图。FIG. 14 is a schematic structural diagram of a user interface provided by an embodiment of the present application.
图15是本申请实施例提供的一种用户界面的示意性结构图。FIG. 15 is a schematic structural diagram of a user interface provided by an embodiment of the present application.
图16是本申请实施例提供的一种处理视频的方法的示意性流程图。FIG. 16 is a schematic flowchart of a method for processing video provided by an embodiment of the present application.
图17是本申请实施例提供的一种电子设备的示意性结构图。FIG. 17 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.
具体实施方式Detailed ways
下面将结合附图,对本申请中的技术方案进行描述。The technical solutions in the present application will be described below with reference to the accompanying drawings.
以下实施例中所使用的术语只是为了描述特定实施例的目的,而并非旨在作为对本申请的限制。如在本申请的说明书和所附权利要求书中所使用的那样,单数表达形式“一个”、“一种”、“所述”、“上述”、“该”和“这一”旨在也包括例如“一个或多个”这种表达形式,除非其上下文中明确地有相反指示。还应当理解,在本申请以下各实施例中,“至少一个”、“一个或多个”是指一个、两个或两个以上。术语“和/或”,用于描述关联对象的关联关系,表示可以存在三种关系;例如,A和/或B,可以表示:单独存在A,同时存在A和B,单独存在B的情况,其中A、B可以是单数或者复数。字符“/”一般表示前后关联对象是一种“或”的关系。The terms used in the following embodiments are for the purpose of describing particular embodiments only, and are not intended to limit the present application. As used in the specification of this application and the appended claims, the singular expressions "a", "an", "the", "above", "said", and "this" are intended to also include expressions such as "one or more", unless the context clearly indicates otherwise. It should also be understood that, in the following embodiments of the present application, "at least one" and "one or more" refer to one, two, or more than two. The term "and/or" describes the association relationship of associated objects and indicates that three relationships may exist; for example, "A and/or B" may indicate the cases where only A exists, both A and B exist, and only B exists, where A and B may be singular or plural. The character "/" generally indicates an "or" relationship between the associated objects.
在本说明书中描述的参考“一个实施例”或“一些实施例”等意味着在本申请的一个或多个实施例中包括结合该实施例描述的特定特征、结构或特点。由此,在本说明书中的不同之处出现的语句“在一个实施例中”、“在一些实施例中”、“在其他一些实施例中”、“在另外一些实施例中”等不是必然都参考相同的实施例,而是意味着“一个或多个但不是所有的实施例”,除非是以其他方式另外特别强调。术语“包括”、“包含”、“具有”及它们的变形都意味着“包括但不限于”,除非是以其他方式另外特别强调。References in this specification to "one embodiment" or "some embodiments" and the like mean that a particular feature, structure, or characteristic described in connection with the embodiment is included in one or more embodiments of the present application. Thus, the phrases "in one embodiment", "in some embodiments", "in some other embodiments", "in still other embodiments", and the like appearing in various places in this specification do not necessarily all refer to the same embodiment, but rather mean "one or more but not all embodiments", unless specifically emphasized otherwise. The terms "including", "comprising", "having", and their variants all mean "including but not limited to", unless specifically emphasized otherwise.
以下介绍了本申请实施例提供的电子设备、用于这样的电子设备的用户界面、和用于使用这样的电子设备的实施例。在一些实施例中,电子设备可以是还包含其它功能诸如个人数字助理和/或音乐播放器功能的便携式电子设备,诸如手机、平板电脑、具备无线通讯功能的可穿戴电子设备(如智能手表)等。便携式电子设备的示例性实施例包括但不限于搭载
Figure PCTCN2021136393-appb-000001
或者其它操作系统的便携式电子设备。上述便携式电子设备也可以是其它便携式电子设备,诸如膝上型计算机(Laptop)等。还应当理解的是,在其他一些实施例中,上述电子设备也可以不是便携式电子设备,而是台式计算机。The following describes the electronic device provided by the embodiments of the present application, a user interface for such an electronic device, and embodiments for using such an electronic device. In some embodiments, the electronic device may be a portable electronic device that also includes other functions such as personal digital assistant and/or music player functions, such as a mobile phone, a tablet computer, or a wearable electronic device with a wireless communication function (e.g., a smart watch). Exemplary embodiments of portable electronic devices include, but are not limited to, portable electronic devices running
Figure PCTCN2021136393-appb-000001
or other operating systems. The above-mentioned portable electronic device may also be another portable electronic device, such as a laptop computer. It should also be understood that, in some other embodiments, the above-mentioned electronic device may not be a portable electronic device, but a desktop computer.
示例性的,图1示出了电子设备100的结构示意图。电子设备100可以包括处理器110,外部存储器接口120,内部存储器121,通用串行总线(universal serial bus,USB)接口130,充电管理模块140,电源管理模块141,电池142,天线1,天线2,移动通信模块150,无线通信模块160,音频模块170,扬声器170A,受话器170B,麦克风170C,耳机接口170D,按键190,摄像头193,显示屏194,以及用户标识模块(subscriber identification module,SIM)卡接口195等。Exemplarily, FIG. 1 shows a schematic structural diagram of an electronic device 100. The electronic device 100 may include a processor 110, an external memory interface 120, an internal memory 121, a universal serial bus (USB) interface 130, a charge management module 140, a power management module 141, a battery 142, an antenna 1, an antenna 2, a mobile communication module 150, a wireless communication module 160, an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, a headphone jack 170D, a button 190, a camera 193, a display screen 194, a subscriber identification module (SIM) card interface 195, and the like.
可以理解的是,本申请实施例示意的结构并不构成对电子设备100的具体限定。在本申请另一些实施例中,电子设备100可以包括比图示更多或更少的部件,或者组合某些部件,或者拆分某些部件,或者不同的部件布置。图示的部件可以以硬件,软件或软件和硬件的组合实现。It can be understood that the structures illustrated in the embodiments of the present application do not constitute a specific limitation on the electronic device 100 . In other embodiments of the present application, the electronic device 100 may include more or less components than shown, or combine some components, or separate some components, or arrange different components. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.
处理器110可以包括一个或多个处理单元,例如:处理器110可以包括应用处理器(application processor,AP),调制解调处理器,图形处理器(graphics processing unit,GPU),图像信号处理器(image signal processor,ISP),控制器,视频编解码器,数字信号处理器(digital signal processor,DSP),基带处理器,和/或神经网络处理器(neural-network processing unit,NPU)等。其中,不同的处理单元可以是独立的部件,也可以集成在一个或多个处理器中。在一些实施例中,电子设备101也可以包括一个或多个处理器110。其中,控制器可以根据指令操作码和时序信号,产生操作控制信号,完成取指令和执行指令的控制。在其他一些实施例中,处理器110中还可以设置存储器,用于存储指令和数据。示例性地,处理器110中的存储器可以为高速缓冲存储器。该存储器可以保存处理器110刚用过或循环使用的指令或数据。如果处理器110需要再次使用该指令或数据,可从所述存储器中直接调用。这样就避免了重复存取,减少了处理器110的等待时间,因而提高了电子设备101处理数据或执行指令的效率。The processor 110 may include one or more processing units, for example, the processor 110 may include an application processor (application processor, AP), a modem processor, a graphics processor (graphics processing unit, GPU), an image signal processor (image signal processor, ISP), controller, video codec, digital signal processor (digital signal processor, DSP), baseband processor, and/or neural-network processing unit (neural-network processing unit, NPU), etc. Wherein, different processing units may be independent components, or may be integrated in one or more processors. In some embodiments, the electronic device 101 may also include one or more processors 110 . The controller can generate an operation control signal according to the instruction operation code and the timing signal, and complete the control of fetching and executing instructions. In some other embodiments, a memory may also be provided in the processor 110 for storing instructions and data. Illustratively, the memory in the processor 110 may be a cache memory. This memory may hold instructions or data that have just been used or recycled by the processor 110 . If the processor 110 needs to use the instruction or data again, it can be called directly from the memory. In this way, repeated access is avoided, and the waiting time of the processor 110 is reduced, thereby improving the efficiency of the electronic device 101 in processing data or executing instructions.
在一些实施例中,处理器110可以包括一个或多个接口。接口可以包括集成电路间(inter-integrated circuit,I2C)接口,集成电路间音频(inter-integrated circuit sound,I2S)接口,脉冲编码调制(pulse code modulation,PCM)接口,通用异步收发传输器(universal asynchronous receiver/transmitter,UART)接口,移动产业处理器接口(mobile industry processor interface,MIPI),通用输入输出(general-purpose input/output,GPIO)接口,SIM卡接口,和/或USB接口等。其中,USB接口130是符合USB标准规范的接口,具体可以是Mini USB接口,Micro USB接口,USB Type C接口等。USB接口130可以用于连接充电器为电子设备101充电,也可以用于电子设备101与外围设备之间传输数据。该USB接口130也可以用于连接耳机,通过耳机播放音频。In some embodiments, the processor 110 may include one or more interfaces. The interface may include an inter-integrated circuit (I2C) interface, an inter-integrated circuit sound (I2S) interface, a pulse code modulation (PCM) interface, a universal asynchronous receiver/transmitter (UART) interface, a mobile industry processor interface (MIPI), a general-purpose input/output (GPIO) interface, a SIM card interface, and/or a USB interface, etc. Among them, the USB interface 130 is an interface that conforms to the USB standard specification, and may specifically be a Mini USB interface, a Micro USB interface, a USB Type C interface, and the like. The USB interface 130 can be used to connect a charger to charge the electronic device 101, and can also be used to transmit data between the electronic device 101 and peripheral devices. The USB interface 130 can also be used to connect an earphone, and play audio through the earphone.
可以理解的是,本申请实施例示意的各模块间的接口连接关系,只是示意性说明,并不构成对电子设备100的结构限定。在本申请另一些实施例中,电子设备100也可以采用上述实施例中不同的接口连接方式,或多种接口连接方式的组合。It can be understood that the interface connection relationship between the modules illustrated in the embodiments of the present application is only a schematic illustration, and does not constitute a structural limitation of the electronic device 100 . In other embodiments of the present application, the electronic device 100 may also adopt different interface connection manners in the foregoing embodiments, or a combination of multiple interface connection manners.
充电管理模块140用于从充电器接收充电输入。其中,充电器可以是无线充电器,也可以是有线充电器。在一些有线充电的实施例中,充电管理模块140可以通过USB接口 130接收有线充电器的充电输入。在一些无线充电的实施例中,充电管理模块140可以通过电子设备100的无线充电线圈接收无线充电输入。充电管理模块140为电池142充电的同时,还可以通过电源管理模块141为电子设备供电。The charging management module 140 is used to receive charging input from the charger. The charger may be a wireless charger or a wired charger. In some wired charging embodiments, the charging management module 140 may receive charging input from the wired charger through the USB interface 130. In some wireless charging embodiments, the charging management module 140 may receive wireless charging input through a wireless charging coil of the electronic device 100 . While the charging management module 140 charges the battery 142 , it can also supply power to the electronic device through the power management module 141 .
电源管理模块141用于连接电池142,充电管理模块140与处理器110。电源管理模块141接收电池142和/或充电管理模块140的输入,为处理器110,内部存储器121,外部存储器,显示屏194,摄像头193,和无线通信模块160等供电。电源管理模块141还可以用于监测电池容量,电池循环次数,电池健康状态(漏电,阻抗)等参数。在其他一些实施例中,电源管理模块141也可以设置于处理器110中。在另一些实施例中,电源管理模块141和充电管理模块140也可以设置于同一个器件中。The power management module 141 is used for connecting the battery 142 , the charging management module 140 and the processor 110 . The power management module 141 receives input from the battery 142 and/or the charging management module 140 and supplies power to the processor 110 , the internal memory 121 , the external memory, the display screen 194 , the camera 193 , and the wireless communication module 160 . The power management module 141 can also be used to monitor parameters such as battery capacity, battery cycle times, battery health status (leakage, impedance). In some other embodiments, the power management module 141 may also be provided in the processor 110 . In other embodiments, the power management module 141 and the charging management module 140 may also be provided in the same device.
电子设备100的无线通信功能可以通过天线1,天线2,移动通信模块150,无线通信模块160,调制解调处理器以及基带处理器等实现。The wireless communication function of the electronic device 100 may be implemented by the antenna 1, the antenna 2, the mobile communication module 150, the wireless communication module 160, the modulation and demodulation processor, the baseband processor, and the like.
天线1和天线2用于发射和接收电磁波信号。电子设备100中的每个天线可用于覆盖单个或多个通信频带。不同的天线还可以复用,以提高天线的利用率。例如:可以将天线1复用为无线局域网的分集天线。在另外一些实施例中,天线可以和调谐开关结合使用。 Antenna 1 and Antenna 2 are used to transmit and receive electromagnetic wave signals. Each antenna in electronic device 100 may be used to cover a single or multiple communication frequency bands. Different antennas can also be reused to improve antenna utilization. For example, the antenna 1 can be multiplexed as a diversity antenna of the wireless local area network. In other embodiments, the antenna may be used in conjunction with a tuning switch.
移动通信模块150可以提供应用在电子设备100上的包括2G/3G/4G/5G等无线通信的解决方案。移动通信模块150可以包括至少一个滤波器,开关,功率放大器,低噪声放大器(low noise amplifier,LNA)等。移动通信模块150可以由天线1接收电磁波,并对接收的电磁波进行滤波,放大等处理,传送至调制解调处理器进行解调。移动通信模块150还可以对经调制解调处理器调制后的信号放大,经天线1转为电磁波辐射出去。在一些实施例中,移动通信模块150的至少部分功能模块可以被设置于处理器110中。在一些实施例中,移动通信模块150的至少部分功能模块可以与处理器110的至少部分模块被设置在同一个器件中。The mobile communication module 150 may provide wireless communication solutions including 2G/3G/4G/5G etc. applied on the electronic device 100 . The mobile communication module 150 may include at least one filter, switch, power amplifier, low noise amplifier (LNA) and the like. The mobile communication module 150 can receive electromagnetic waves from the antenna 1, filter and amplify the received electromagnetic waves, and transmit them to the modulation and demodulation processor for demodulation. The mobile communication module 150 can also amplify the signal modulated by the modulation and demodulation processor, and then turn it into an electromagnetic wave for radiation through the antenna 1 . In some embodiments, at least part of the functional modules of the mobile communication module 150 may be provided in the processor 110 . In some embodiments, at least part of the functional modules of the mobile communication module 150 may be provided in the same device as at least part of the modules of the processor 110 .
无线通信模块160可以提供应用在电子设备100上的包括无线局域网(wireless local area networks,WLAN)(如无线保真(wireless fidelity,Wi-Fi)网络),蓝牙(bluetooth,BT),全球导航卫星系统(global navigation satellite system,GNSS),调频(frequency modulation,FM),近距离无线通信技术(near field communication,NFC),红外技术(infrared,IR)等无线通信的解决方案。无线通信模块160可以是集成至少一个通信处理模块的一个或多个器件。无线通信模块160经由天线2接收电磁波,将电磁波信号调频以及滤波处理,将处理后的信号发送到处理器110。无线通信模块160还可以从处理器110接收待发送的信号,对其进行调频,放大,经天线2转为电磁波辐射出去。The wireless communication module 160 can provide solutions for wireless communication applied on the electronic device 100, including wireless local area networks (WLAN) (such as wireless fidelity (Wi-Fi) networks), Bluetooth (BT), global navigation satellite system (GNSS), frequency modulation (FM), near field communication (NFC), infrared (IR), and the like. The wireless communication module 160 may be one or more devices integrating at least one communication processing module. The wireless communication module 160 receives electromagnetic waves via the antenna 2, frequency-modulates and filters the electromagnetic wave signals, and sends the processed signals to the processor 110. The wireless communication module 160 can also receive the signal to be sent from the processor 110, perform frequency modulation on it, amplify it, and convert it into electromagnetic waves for radiation through the antenna 2.
电子设备100通过GPU,显示屏194,以及应用处理器等实现显示功能。GPU为图像处理的微处理器,连接显示屏194和应用处理器。GPU用于执行数学和几何计算,用于图形渲染。处理器110可包括一个或多个GPU,其执行程序指令以生成或改变显示信息。The electronic device 100 implements a display function through a GPU, a display screen 194, an application processor, and the like. The GPU is a microprocessor for image processing, and is connected to the display screen 194 and the application processor. The GPU is used to perform mathematical and geometric calculations for graphics rendering. Processor 110 may include one or more GPUs that execute program instructions to generate or alter display information.
显示屏194用于显示图像,视频等。显示屏194包括显示面板。显示面板可以采用液晶显示屏(liquid crystal display,LCD),有机发光二极管(organic light-emitting diode,OLED),有源矩阵有机发光二极体或主动矩阵有机发光二极体(active-matrix organic light emitting diode,AMOLED),柔性发光二极管(flex light-emitting diode,FLED),Miniled,MicroLed,Micro-oLed,量子点发光二极管(quantum dot light emitting diodes,QLED)等。在一些实施例中,电子设备100可以包括1个或多个显示屏194。Display screen 194 is used to display images, videos, and the like. Display screen 194 includes a display panel. The display panel may be a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), a MiniLED, a MicroLED, a Micro-OLED, quantum dot light-emitting diodes (QLED), and so on. In some embodiments, electronic device 100 may include one or more display screens 194.
电子设备100的显示屏194可以是一种柔性屏,目前,柔性屏以其独特的特性和巨大的潜力而备受关注。柔性屏相对于传统屏幕而言,具有柔韧性强和可弯曲的特点,可以给用户提供基于可弯折特性的新交互方式,可以满足用户对于电子设备的更多需求。对于配置有可折叠显示屏的电子设备而言,电子设备上的可折叠显示屏可以随时在折叠形态下的小屏和展开形态下大屏之间切换。因此,用户在配置有可折叠显示屏的电子设备上使用分屏功能,也越来越频繁。The display screen 194 of the electronic device 100 may be a flexible screen. Currently, the flexible screen has attracted much attention due to its unique characteristics and great potential. Compared with traditional screens, flexible screens have the characteristics of strong flexibility and bendability, which can provide users with new interactive methods based on the bendable characteristics, and can meet more needs of users for electronic devices. For an electronic device equipped with a foldable display screen, the foldable display screen on the electronic device can be switched between a small screen in a folded state and a large screen in an unfolded state at any time. Therefore, users are using the split-screen function more and more frequently on electronic devices equipped with foldable displays.
电子设备100可以通过ISP,摄像头193,视频编解码器,GPU,显示屏194以及应用处理器等实现拍摄功能。The electronic device 100 may implement a shooting function through an ISP, a camera 193, a video codec, a GPU, a display screen 194, an application processor, and the like.
ISP用于处理摄像头193反馈的数据。例如,拍照时,打开快门,光线通过镜头被传递到摄像头感光元件上,光信号转换为电信号,摄像头感光元件将所述电信号传递给ISP处理,转化为肉眼可见的图像。ISP还可以对图像的噪点,亮度,肤色进行算法优化。ISP还可以对拍摄场景的曝光,色温等参数优化。在一些实施例中,ISP可以设置在摄像头193中。The ISP is used to process the data fed back by the camera 193 . For example, when taking a photo, the shutter is opened, the light is transmitted to the camera photosensitive element through the lens, the light signal is converted into an electrical signal, and the camera photosensitive element transmits the electrical signal to the ISP for processing, and converts it into an image visible to the naked eye. ISP can also perform algorithm optimization on image noise, brightness, and skin tone. ISP can also optimize the exposure, color temperature and other parameters of the shooting scene. In some embodiments, the ISP may be provided in the camera 193 .
摄像头193用于捕获静态图像或视频。物体通过镜头生成光学图像投射到感光元件。感光元件可以是电荷耦合器件(charge coupled device,CCD)或互补金属氧化物半导体(complementary metal-oxide-semiconductor,CMOS)光电晶体管。感光元件把光信号转换成电信号,之后将电信号传递给ISP转换成数字图像信号。ISP将数字图像信号输出到DSP加工处理。DSP将数字图像信号转换成标准的RGB,YUV等格式的图像信号。在一些实施例中,电子设备100可以包括1个或多个摄像头193。Camera 193 is used to capture still images or video. The object is projected through the lens to generate an optical image onto the photosensitive element. The photosensitive element may be a charge coupled device (CCD) or a complementary metal-oxide-semiconductor (CMOS) phototransistor. The photosensitive element converts the optical signal into an electrical signal, and then transmits the electrical signal to the ISP to convert it into a digital image signal. The ISP outputs the digital image signal to the DSP for processing. DSP converts digital image signals into standard RGB, YUV and other formats of image signals. In some embodiments, the electronic device 100 may include one or more cameras 193 .
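The DSP's conversion of the ISP's digital image signal into standard RGB or YUV image signals can be illustrated with the widely used full-range BT.601 mapping. The sketch below is generic reference arithmetic for one pixel, not the conversion actually implemented by the device's DSP:

```python
def yuv_to_rgb(y, u, v):
    """Convert one full-range BT.601 YUV sample (each component 0-255) to RGB.
    Illustrative only; a real DSP may use different coefficients or ranges."""
    r = y + 1.402 * (v - 128)
    g = y - 0.344136 * (u - 128) - 0.714136 * (v - 128)
    b = y + 1.772 * (u - 128)
    clamp = lambda c: max(0, min(255, int(round(c))))
    return clamp(r), clamp(g), clamp(b)

# A neutral chroma sample (U = V = 128) leaves only luminance, giving grey.
grey = yuv_to_rgb(128, 128, 128)
```

Because neutral chroma reduces to pure luminance, the Y plane alone is enough to drive a greyscale preview, which is one reason YUV is convenient between the ISP and the DSP.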
数字信号处理器用于处理数字信号,除了可以处理数字图像信号,还可以处理其他数字信号。例如,当电子设备100在频点选择时,数字信号处理器用于对频点能量进行傅里叶变换等。A digital signal processor is used to process digital signals, in addition to processing digital image signals, it can also process other digital signals. For example, when the electronic device 100 selects a frequency point, the digital signal processor is used to perform Fourier transform on the frequency point energy and so on.
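The Fourier-transform step mentioned above can be sketched with a naive discrete Fourier transform. The helper `bin_energy` below is a hypothetical illustration of computing the energy at one frequency point; it is not the device's DSP code:

```python
import cmath
import math

def bin_energy(signal, k):
    """Energy at DFT bin k of a real-valued signal, via a naive DFT.
    Illustrative sketch of a "frequency-point energy" computation."""
    n = len(signal)
    coeff = sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                for t, x in enumerate(signal))
    return abs(coeff) ** 2

# A pure tone completing one cycle over 8 samples concentrates its energy
# in bin 1; neighbouring bins stay (numerically) empty.
tone = [math.cos(2 * math.pi * t / 8) for t in range(8)]
energies = [bin_energy(tone, k) for k in range(5)]
```

Selecting the bin with the largest energy is then a matter of `max(range(len(energies)), key=energies.__getitem__)`.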
视频编解码器用于对数字视频压缩或解压缩。电子设备100可以支持一种或多种视频编解码器。这样,电子设备100可以播放或录制多种编码格式的视频,例如:动态图像专家组(moving picture experts group,MPEG)1,MPEG2,MPEG3,MPEG4等。Video codecs are used to compress or decompress digital video. The electronic device 100 may support one or more video codecs. In this way, the electronic device 100 can play or record videos of various encoding formats, such as: Moving Picture Experts Group (moving picture experts group, MPEG) 1, MPEG2, MPEG3, MPEG4 and so on.
NPU为神经网络(neural-network,NN)计算处理器,通过借鉴生物神经网络结构,例如借鉴人脑神经元之间传递模式,对输入信息快速处理,还可以不断的自学习。通过NPU可以实现电子设备100的智能认知等应用,例如:图像识别,人脸识别,语音识别,文本理解等。The NPU is a neural-network (NN) computing processor. By drawing on the structure of biological neural networks, such as the transfer mode between neurons in the human brain, it can quickly process the input information, and can continuously learn by itself. Applications such as intelligent cognition of the electronic device 100 can be implemented through the NPU, such as image recognition, face recognition, speech recognition, text understanding, and the like.
外部存储器接口120可以用于连接外部存储卡,例如Micro SD卡,实现扩展电子设备100的存储能力。外部存储卡通过外部存储器接口120与处理器110通信,实现数据存储功能。例如将音乐,视频等文件保存在外部存储卡中。The external memory interface 120 can be used to connect an external memory card, such as a Micro SD card, to expand the storage capacity of the electronic device 100. The external memory card communicates with the processor 110 through the external memory interface 120 to realize the data storage function. For example, files such as music and videos can be saved in the external memory card.
内部存储器121可以用于存储一个或多个计算机程序,该一个或多个计算机程序包括指令。处理器110可以通过运行存储在内部存储器121的上述指令,从而使得电子设备101执行本申请一些实施例中所提供的灭屏显示的方法,以及各种应用以及数据处理等。内部存储器121可以包括存储程序区和存储数据区。其中,存储程序区可存储操作系统;该存储程序区还可以存储一个或多个应用(比如图库、联系人等)等。存储数据区可存储电子设备101使用过程中所创建的数据(比如照片,联系人等)等。此外,内部存储器121可以包括高速随机存取存储器,还可以包括非易失性存储器,例如一个或多个磁盘存储部件,闪存部件,通用闪存存储器(universal flash storage,UFS)等。在一些实施例中,处理器110可以通过运行存储在内部存储器121的指令,和/或存储在设置于处理器110中的存储器的指令,来使得电子设备101执行本申请实施例中所提供的灭屏显示的方法,以及其他应用及数据处理。电子设备100可以通过音频模块170,扬声器170A,受话器170B,麦克风170C,耳机接口170D,以及应用处理器等实现音频功能。例如音乐播放,录音等。Internal memory 121 may be used to store one or more computer programs including instructions. The processor 110 may execute the above-mentioned instructions stored in the internal memory 121, thereby causing the electronic device 101 to execute the method for off-screen display provided in some embodiments of the present application, as well as various applications, data processing, and the like. The internal memory 121 may include a storage program area and a storage data area. The storage program area may store the operating system; it may also store one or more applications (such as gallery, contacts, etc.). The storage data area may store data (such as photos, contacts, etc.) created during the use of the electronic device 101. In addition, the internal memory 121 may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic disk storage components, flash memory components, universal flash storage (UFS), and the like. In some embodiments, the processor 110 may execute the instructions stored in the internal memory 121 and/or the instructions stored in the memory provided in the processor 110, so as to cause the electronic device 101 to execute the method for off-screen display provided in the embodiments of the present application, as well as other applications and data processing. The electronic device 100 may implement audio functions, such as music playback and recording, through the audio module 170, the speaker 170A, the receiver 170B, the microphone 170C, the earphone interface 170D, the application processor, and the like.
按键190包括开机键,音量键等。按键190可以是机械按键。也可以是触摸式按键。电子设备100可以接收按键输入,产生与电子设备100的用户设置以及功能控制有关的键信号输入。The keys 190 include a power-on key, a volume key, and the like. Keys 190 may be mechanical keys. It can also be a touch key. The electronic device 100 may receive key inputs and generate key signal inputs related to user settings and function control of the electronic device 100 .
图2是本申请实施例的电子设备100的软件结构框图。分层架构将软件分成若干个层,每一层都有清晰的角色和分工。层与层之间通过软件接口通信。在一些实施例中,将Android系统分为四层,从上至下分别为应用程序层,应用程序框架层,安卓运行时(Android runtime)和系统库,以及内核层。应用程序层可以包括一系列应用程序包。FIG. 2 is a block diagram of the software structure of the electronic device 100 according to the embodiment of the present application. The layered architecture divides the software into several layers, and each layer has a clear role and division of labor. Layers communicate with each other through software interfaces. In some embodiments, the Android system is divided into four layers, which are, from top to bottom, an application layer, an application framework layer, an Android runtime (Android runtime) and a system library, and a kernel layer. The application layer can include a series of application packages.
如图2所示,应用程序包可以包括图库,相机,畅连,地图,导航等应用程序。As shown in Figure 2, the application package can include applications such as gallery, camera, Changlian, map, and navigation.
应用程序框架层为应用程序层的应用程序提供应用编程接口(application programming interface,API)和编程框架。应用程序框架层包括一些预先定义的函数。The application framework layer provides an application programming interface (application programming interface, API) and a programming framework for applications in the application layer. The application framework layer includes some predefined functions.
如图2所示,应用程序框架层可以包括窗口管理器,内容提供器,视图系统,电话管理器,资源管理器,通知管理器等。As shown in Figure 2, the application framework layer may include window managers, content providers, view systems, telephony managers, resource managers, notification managers, and the like.
窗口管理器用于管理窗口程序。窗口管理器可以获取显示屏大小,判断是否有状态栏,锁定屏幕,截取屏幕等。A window manager is used to manage window programs. The window manager can get the size of the display screen, determine whether there is a status bar, lock the screen, take screenshots, etc.
内容提供器用来存放和获取数据,并使这些数据可以被应用程序访问。所述数据可以包括视频,图像,音频,拨打和接听的电话,浏览历史和书签,电话簿等。Content providers are used to store and retrieve data and make these data accessible to applications. The data may include video, images, audio, calls made and received, browsing history and bookmarks, phone book, etc.
视图系统包括可视控件,例如显示文字的控件,显示图片的控件等。视图系统可用于构建应用程序。显示界面可以由一个或多个视图组成的。例如,包括短信通知图标的显示界面,可以包括显示文字的视图以及显示图片的视图。The view system includes visual controls, such as controls for displaying text, controls for displaying pictures, and so on. View systems can be used to build applications. A display interface can consist of one or more views. For example, the display interface including the short message notification icon may include a view for displaying text and a view for displaying pictures.
电话管理器用于提供电子设备100的通信功能。例如通话状态的管理(包括接通,挂断等)。The phone manager is used to provide the communication function of the electronic device 100 . For example, the management of call status (including connecting, hanging up, etc.).
资源管理器为应用程序提供各种资源,比如本地化字符串,图标,图片,布局文件,视频文件等等。The resource manager provides various resources for the application, such as localization strings, icons, pictures, layout files, video files and so on.
通知管理器使应用程序可以在状态栏中显示通知信息,可以用于传达告知类型的消息,可以短暂停留后自动消失,无需用户交互。比如通知管理器被用于告知下载完成,消息提醒等。通知管理器还可以是以图表或者滚动条文本形式出现在系统顶部状态栏的通知,例如后台运行的应用程序的通知,还可以是以对话窗口形式出现在屏幕上的通知。例如在状态栏提示文本信息,发出提示音,电子设备振动,指示灯闪烁等。The notification manager enables applications to display notification information in the status bar, which can be used to convey notification-type messages, and can disappear automatically after a brief pause without user interaction. For example, the notification manager is used to notify download completion, message reminders, etc. The notification manager can also display notifications in the status bar at the top of the system in the form of graphs or scroll bar text, such as notifications of applications running in the background, and notifications on the screen in the form of dialog windows. For example, text information is prompted in the status bar, a prompt sound is issued, the electronic device vibrates, and the indicator light flashes.
Android Runtime包括核心库和虚拟机。Android runtime负责安卓系统的调度和管理。Android Runtime includes core libraries and a virtual machine. Android runtime is responsible for scheduling and management of the Android system.
核心库包含两部分:一部分是java语言需要调用的功能函数,另一部分是安卓的核心库。The core library consists of two parts: one part is the functions that the Java language needs to call, and the other part is the core libraries of Android.
应用程序层和应用程序框架层运行在虚拟机中。虚拟机将应用程序层和应用程序框架 层的java文件执行为二进制文件。虚拟机用于执行对象生命周期的管理,堆栈管理,线程管理,安全和异常的管理,以及垃圾回收等功能。The application layer and the application framework layer run in virtual machines. The virtual machine executes the java files of the application layer and the application framework layer as binary files. The virtual machine is used to perform functions such as object lifecycle management, stack management, thread management, safety and exception management, and garbage collection.
系统库可以包括多个功能模块。例如:表面管理器(surface manager),媒体库(media libraries),三维图形处理库(例如:OpenGL ES),2D图形引擎(例如:SGL)等。A system library can include multiple functional modules. For example: surface manager (surface manager), media library (media library), 3D graphics processing library (eg: OpenGL ES), 2D graphics engine (eg: SGL), etc.
表面管理器用于对显示子系统进行管理,并且为多个应用程序提供了2D和3D图层的融合。The Surface Manager is used to manage the display subsystem and provides a fusion of 2D and 3D layers for multiple applications.
媒体库支持多种常用的音频,视频格式回放和录制,以及静态图像文件等。媒体库可以支持多种音视频编码格式,例如:MPEG4,H.264,MP3,AAC,AMR,JPG,PNG等。The media library supports playback and recording of a variety of commonly used audio and video formats, as well as still image files. The media library can support a variety of audio and video encoding formats, such as: MPEG4, H.264, MP3, AAC, AMR, JPG, PNG, etc.
三维图形处理库用于实现三维图形绘图,图像渲染,合成,和图层处理等。The 3D graphics processing library is used to implement 3D graphics drawing, image rendering, compositing, and layer processing.
2D图形引擎是2D绘图的绘图引擎。2D graphics engine is a drawing engine for 2D drawing.
内核层是硬件和软件之间的层。内核层至少包含显示驱动,摄像头驱动,音频驱动,传感器驱动。The kernel layer is the layer between hardware and software. The kernel layer contains at least display drivers, camera drivers, audio drivers, and sensor drivers.
本申请实施例提供的方案可以应用于合拍场景,例如用户与素材合拍、用户与用户合拍的场景。用户和用户合拍的场景还可以包括远程多用户合拍场景。远程多用户合拍场景可以指,至少两个用户无法或很难在同一时间、通过同一摄像设备完成合拍。下面描述合拍场景的一些可能的示例。The solutions provided by the embodiments of the present application can be applied to co-production scenarios, for example, scenarios in which a user is in a co-production with a material, and a user is in a co-production with a user. The user and the user co-production scene may also include a remote multi-user co-production scene. The remote multi-user co-production scenario may refer to the inability or difficulty of at least two users to complete co-production at the same time with the same camera device. Some possible examples of co-production scenarios are described below.
示例一Example 1
用户A可以通过带有摄像功能的电子设备A进行自拍,得到自拍视频A;用户B可以通过带有摄像头的电子设备B进行自拍。通过将视频A与视频B合成,可以得到用户A与用户B的合拍视频。自拍视频A与自拍视频B可以通过异步拍摄的方式得到。User A can take a selfie through an electronic device A with a camera function to obtain a selfie video A; user B can take a selfie through an electronic device B with a camera. By synthesizing video A and video B, a co-production video of user A and user B can be obtained. The selfie video A and the selfie video B can be obtained by asynchronous shooting.
在这一示例中,用户A与用户B的合拍动作在视觉上的协调性可能较差。例如,用户A与电子设备A的距离可能和用户B与电子设备B的距离相差较大,因此自拍视频A中用户A的轮廓大小与自拍视频B中的用户B的轮廓大小相差较大。又如,用户A与用户B做类似的动作,但是用户A的动作相对较快,幅度较大,用户B的动作相对较慢,幅度较小。因此,视频A与视频B的匹配度可能相对较差;相应地,合拍视频的协调性可能相对较差。为了使合拍视频达到相对较高的协调性,用户需要对合拍视频进行工作量较大的后期处理。In this example, the visual coordination of user A's and user B's synchronized actions may be poor. For example, the distance between user A and electronic device A may be quite different from the distance between user B and electronic device B, so the outline size of user A in selfie video A and the outline size of user B in selfie video B are quite different. For another example, user A and user B perform similar actions, but user A's action is relatively fast and the action is relatively large, while the action of user B is relatively slow and small. Therefore, the matching degree of video A and video B may be relatively poor; correspondingly, the coordination of co-produced videos may be relatively poor. In order to achieve relatively high coordination of the co-shot video, the user needs to perform post-processing with a large workload on the co-shot video.
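One simple way to reduce the contour-size mismatch described above is to rescale one frame so that the two silhouettes have equal height. The helper below is a hypothetical sketch of such a normalization; the bounding-box format and the function name are assumptions for illustration, not the claimed method:

```python
def match_silhouette_scale(box_a, box_b):
    """Given (x, y, w, h) person bounding boxes detected in selfie video A
    and selfie video B, return the uniform scale factor to apply to frame B
    so that the two silhouettes end up with equal pixel height.
    Hypothetical sketch; not the application's actual algorithm."""
    _, _, _, h_a = box_a
    _, _, _, h_b = box_b
    if h_b <= 0:
        raise ValueError("silhouette height must be positive")
    return h_a / h_b
```

For instance, if user A's silhouette is 400 px tall and user B's is 200 px tall, frame B would be enlarged by a factor of 2 before compositing, so the two figures appear comparable in size.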
示例二Example 2
用户A可以通过带有摄像功能的电子设备A,与用户B视频通话,并通过录屏的方式获取既包含用户A又包含用户B的合拍视频。User A can make a video call with user B through the electronic device A with the camera function, and obtain a co-shot video that includes both user A and user B by recording the screen.
然而,通过录屏得到的合拍视频的清晰度通常相对较差。合拍视频的最大分辨率通常取决于电子设备A的显示分辨率。并且,即使用户A与用户B沟通协商清楚了诸多拍摄细节,但是多个用户做出相似度高的动作可能需要多次排练。因此合拍视频的协调性可能相对较差。为了使合拍视频达到相对较高的协调性,用户需要对合拍视频进行工作量较大的后期处理。However, the clarity of the co-shot video obtained through screen recording is usually relatively poor. The maximum resolution of the co-shot video usually depends on the display resolution of the electronic device A. Moreover, even if user A and user B communicated and negotiated many shooting details, multiple rehearsals may be required for multiple users to perform actions with high similarity. Therefore, the coordination of co-production videos may be relatively poor. In order to achieve relatively high coordination of the co-shot video, the user needs to perform post-processing with a large workload on the co-shot video.
示例三Example three
用户A与用户B可以位于同一场景内。用户A、用户B可以做类似的动作,并通过带有摄像功能的电子设备A进行自拍,可以得到合拍视频。User A and User B may be located in the same scene. User A and User B can perform similar actions, and take a selfie through the electronic device A with a camera function to obtain a co-shot video.
在这一示例中,用户A的动作与用户B的动作的协调性可能相对较差。例如,用户A 与用户B做类似的动作,但是用户A的动作相对较快,幅度较大,用户B的动作相对较慢,幅度较小。因此合拍视频的协调性可能相对较差。为了使合拍视频达到相对较高的协调性,用户需要对合拍视频进行工作量较大的后期处理。In this example, user A's actions may be relatively poorly coordinated with user B's actions. For example, user A and user B perform similar actions, but the action of user A is relatively fast and the action is relatively large, while the action of user B is relatively slow and the action is relatively small. Therefore, the coordination of co-production videos may be relatively poor. In order to achieve relatively high coordination of the co-shot video, the user needs to perform post-processing with a large workload on the co-shot video.
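The speed and amplitude mismatch described in these examples is the classic problem that dynamic time warping (DTW) addresses: two motion traces sampled per frame can be compared even when one action is performed more slowly. The sketch below is generic textbook DTW over 1-D traces (e.g. a joint angle per frame), offered only as an illustration of how a "matching degree" between two actions could be measured; it is not the embodiment's algorithm:

```python
def dtw_distance(seq_a, seq_b):
    """Dynamic time warping distance between two 1-D motion traces.
    A distance of 0 means the traces follow the same shape, possibly at
    different speeds. Generic illustration, not the claimed method."""
    inf = float("inf")
    n, m = len(seq_a), len(seq_b)
    # d[i][j]: best cost of aligning the first i samples of A with the
    # first j samples of B.
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(seq_a[i - 1] - seq_b[j - 1])
            d[i][j] = cost + min(d[i - 1][j],      # A advances
                                 d[i][j - 1],      # B advances
                                 d[i - 1][j - 1])  # both advance
    return d[n][m]
```

Under this measure, a movement performed at half speed still aligns perfectly with its faster counterpart, so a low DTW distance could flag two takes as well matched despite different pacing.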
示例四Example four
用户A可以观察视频素材。该视频素材包含人物C的一系列动作。用户A模仿人物C在视频素材中的动作,并通过带有摄像功能的电子设备A记录用户A做出的动作,以得到视频A。可选的,通过合成视频A与视频素材,可以得到合拍视频。User A can observe the video material. The video footage contains a series of actions of Person C. The user A imitates the action of the character C in the video material, and records the action made by the user A through the electronic device A with the camera function, so as to obtain the video A. Optionally, by synthesizing the video A and the video material, a co-shot video can be obtained.
即使用户A反复观看、模仿人物C在视频素材中的动作,但是用户A可能需要反复排练,才有可能做出与人物C的动作相似度高的动作。因此,视频A与视频素材的匹配度可能相对较差。相应地,合拍视频的协调性可能相对较差。为了使合拍视频达到相对较高的协调性,用户需要对合拍视频进行工作量较大的后期处理。Even if user A repeatedly watches and imitates the actions of character C in the video material, user A may need to rehearse repeatedly to make actions that are highly similar to the actions of character C. Therefore, the match of Video A with the video material may be relatively poor. Correspondingly, the coordination of co-production videos may be relatively poor. In order to achieve relatively high coordination of the co-shot video, the user needs to perform post-processing with a large workload on the co-shot video.
本申请实施例提供一种新的处理视频的方法,目的包括减少用户对视频的后期处理工作量,进而有利于提升拍摄、制作、处理视频的用户体验感。Embodiments of the present application provide a new method for video processing, which aims to reduce the post-processing workload of the user for the video, thereby helping to improve the user experience of shooting, producing, and processing the video.
图3是本申请实施例提供的一种用户界面300的示意图。该用户界面300可以显示在第一电子设备上。该用户界面300可以为相机应用的界面,或者其他具有拍摄功能的应用的界面。也就是说,第一电子设备上可以承载相机应用或其他具有拍摄功能的应用。响应第一用户对这些应用做出的操作,第一电子设备可以显示该用户界面300。FIG. 3 is a schematic diagram of a user interface 300 provided by an embodiment of the present application. The user interface 300 may be displayed on the first electronic device. The user interface 300 may be an interface of a camera application, or an interface of other applications having a photographing function. That is to say, a camera application or other applications having a photographing function may be carried on the first electronic device. The first electronic device may display the user interface 300 in response to operations made by the first user on the applications.
例如,第一用户可以通过点击相机应用的图标,打开相机应用,进而第一电子设备可以显示用户界面300。相机应用可以调用图1所示的摄像头193,拍摄第一电子设备周围的景象。例如,相机应用可以调用第一电子设备的前置摄像头,以拍摄第一用户的自拍图像,并在用户界面300上显示该自拍图像。For example, the first user may open the camera application by clicking on the icon of the camera application, and then the first electronic device may display the user interface 300 . The camera application can call the camera 193 shown in FIG. 1 to capture the scene around the first electronic device. For example, the camera application may call the front camera of the first electronic device to take a selfie image of the first user, and display the selfie image on the user interface 300 .
用户界面300可以包括多个功能控件310(该功能控件310可以以选项卡的形式呈现在用户界面300上),该多个功能控件310可以分别与相机应用的多个相机功能一一对应。如图3所示,该多个相机功能例如可以包括人像功能、拍照功能、录像功能、视频合拍功能、专业功能等,多个功能控件310可以包括人像功能控件、拍照功能控件、录像功能控件、视频合拍功能控件、专业功能控件。The user interface 300 may include a plurality of function controls 310 (the function controls 310 may be presented on the user interface 300 in the form of tabs), and the plurality of function controls 310 may respectively correspond one-to-one to a plurality of camera functions of the camera application. As shown in FIG. 3, the multiple camera functions may include, for example, a portrait function, a photographing function, a video recording function, a video co-shooting function, a professional function, and the like, and the multiple function controls 310 may include a portrait function control, a photographing function control, a video recording function control, a video co-shooting function control, and a professional function control.
第一电子设备可以响应第一用户作用在用户界面300上的操作(如滑动操作),将当前相机功能切换至用于完成视频合拍的功能,例如图3所示的“视频合拍”功能。应理解,在其他可能的示例中,相机应用可以包括其他相机功能以用于完成视频合拍。本申请实施例下面以视频合拍功能为例进行阐述。In response to an operation (such as a slide operation) performed by the first user on the user interface 300, the first electronic device can switch the current camera function to a function for completing video co-shooting, such as the "video co-shooting" function shown in FIG. 3. It should be understood that, in other possible examples, the camera application may include other camera functions for completing video co-shooting. The embodiments of the present application are described below by taking the video co-shooting function as an example.
在当前相机功能为视频合拍功能的情况下,用户界面300例如可以包括以下至少一种控件:用户合拍控件320、素材合拍控件330、图库合拍控件340。响应第一用户作用在任一控件的操作,第一电子设备可以将拍摄视频与其他文件合成,进而完成合拍。In the case that the current camera function is a video co-shooting function, the user interface 300 may include, for example, at least one of the following controls: a user co-shooting control 320 , a material co-shooting control 330 , and a gallery co-shooting control 340 . In response to an operation performed by the first user on any of the controls, the first electronic device may synthesize the captured video with other files, thereby completing the co-shooting.
用户合拍控件320可以用于选择或邀请第二用户视频通话,以完成第一用户与第二用户的同步合拍。The user co-shooting control 320 can be used to select or invite the second user to make a video call, so as to complete the synchronization and co-shooting of the first user and the second user.
例如,响应第一用户作用在用户合拍控件320上的操作(如点击操作),第一电子设备可以在用户界面300上显示与多个用户一一对应的多个用户控件,该多个用户可以包括第二用户。响应第一用户作用在该第二用户的用户控件的操作(如点击操作),第一电子设备可以向第二电子设备发起视频通话,其中,第二电子设备可以为第二用户使用的电子设备。相应地,第二用户可以通过第二电子设备接收到第一用户的视频通话邀请。第二电子设备可以显示视频通话邀请的界面,该界面可以包括视频通话接听控件。响应第二用户作用在视频通话接听控件上的操作,第一电子设备与第二电子设备之间可以建立视频通话连接。在第一电子设备与第二电子设备建立视频通话连接后,第一电子设备通过拍摄可以得到第一视频,第二电子设备通过拍摄可以得到第二视频。第一电子设备可以通过该视频通话连接,获取该第二视频。第二电子设备可以通过视频通话连接,获取该第一视频。电子设备可以根据第一视频、第二视频,通过本申请实施例提供的处理视频的方法,可以得到一种或多种处理后的视频。For example, in response to an operation (such as a tap operation) performed by the first user on the user co-shooting control 320, the first electronic device may display, on the user interface 300, a plurality of user controls in one-to-one correspondence with a plurality of users, and the plurality of users may include the second user. In response to an operation (such as a tap operation) performed by the first user on the user control of the second user, the first electronic device may initiate a video call to the second electronic device, where the second electronic device may be the electronic device used by the second user. Correspondingly, the second user may receive the video call invitation of the first user through the second electronic device. The second electronic device may display an interface for the video call invitation, and the interface may include a video call answering control. In response to an operation performed by the second user on the video call answering control, a video call connection may be established between the first electronic device and the second electronic device. After the video call connection is established, the first electronic device may obtain the first video by shooting, and the second electronic device may obtain the second video by shooting. The first electronic device may acquire the second video through the video call connection, and the second electronic device may acquire the first video through the video call connection. The electronic device may then obtain one or more processed videos from the first video and the second video by using the video processing method provided in the embodiments of the present application.
素材合拍控件330可以用于从云端选择素材,以完成第一用户与素材的合拍。素材可以指云端存储的视频、动作模板等可以反映动作的文件。云端例如可以指云端服务器、云端存储设备等。The material co-shot control 330 may be used to select a material from the cloud, so as to complete the co-production between the first user and the material. The material may refer to files that can reflect actions, such as videos and action templates stored in the cloud. For example, the cloud may refer to a cloud server, a cloud storage device, and the like.
例如,素材可以为包含目标人物(如第二用户)的第二视频。在本申请中,目标人物例如可以是第一用户认识或熟悉的人,如朋友、家人、名人等,也可以是陌生人,还可以是具有人物特征的卡通形象。在某些示例中,素材可以理解为一种动作模板。素材可以包含目标人物的多帧动作图像。结合图3所示的示例,第一电子设备可以响应用户作用在素材合拍控件的操作,从云端服务器获取素材。For example, the material may be a second video containing a target person (for example, the second user). In this application, the target person may be, for example, a person known or familiar to the first user, such as a friend, a family member, or a celebrity, or may be a stranger, or may be a cartoon image with character features. In some examples, the material may be understood as a kind of action template. The material may contain multiple frames of action images of the target person. With reference to the example shown in FIG. 3 , the first electronic device may acquire the material from the cloud server in response to the user's operation on the material co-shooting control.
第一用户可以通过第一电子设备拍摄第一视频。第一电子设备可以拍摄包含第一用户的第一视频。第一电子设备可以根据第一视频、素材,通过本申请实施例提供的处理视频的方法,可以得到一种或多种处理后的视频。The first user may shoot the first video through the first electronic device. The first electronic device may capture a first video including the first user. The first electronic device can obtain one or more processed videos by using the video processing method provided in the embodiment of the present application according to the first video and material.
在一个示例中,第一电子设备可以根据第一视频中第一用户的轮廓,裁剪第一视频,得到第一人物子视频。第一人物子视频可以包含第一用户的图像,且不包含第一视频中的背景图像。第一电子设备可以将第一人物子视频、素材和背景多媒体数据合成为第一目标视频,其中素材可以不包括与目标人物对应的背景图像,背景多媒体数据可以充当第一人物子视频、素材的背景。背景多媒体数据例如可以来自除第一视频、素材以外的其他文件。In an example, the first electronic device may crop the first video according to the outline of the first user in the first video to obtain a first person sub-video. The first person sub-video may include an image of the first user and may not include a background image in the first video. The first electronic device may synthesize the first person sub-video, the material, and background multimedia data into a first target video, where the material may not include a background image corresponding to the target person, and the background multimedia data may serve as the background of the first person sub-video and the material. The background multimedia data may, for example, come from files other than the first video and the material.
在另一个示例中,第一电子设备可以根据第一视频中第一用户的轮廓,裁剪第一视频,得到第一人物子视频。第一电子设备可以将第一人物子视频、素材合成为第一目标视频,其中素材可以包括与目标人物对应的背景图像,从而该素材中的背景图像可以充当第一人物子视频的背景。In another example, the first electronic device may crop the first video according to the outline of the first user in the first video to obtain a sub-video of the first person. The first electronic device may synthesize the first character sub-video and material into a first target video, wherein the material may include a background image corresponding to the target character, so that the background image in the material may serve as the background of the first character sub-video.
在又一个示例中,第一电子设备可以将第一视频、素材合成为第一目标视频。其中素材可以不包括与目标人物对应的背景图像。第一视频中的背景图像可以充当素材的背景。In yet another example, the first electronic device may synthesize the first video and material into the first target video. The material may not include a background image corresponding to the target person. The background image in the first video can serve as the background for the footage.
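As an illustrative aid only (not part of the disclosed embodiments), the three compositing examples above can be sketched in Python. The list-of-lists frame representation, the use of `None` for pixels removed by person segmentation, and the `composite` helper are assumptions of this sketch.

```python
# Illustrative sketch: compositing a person layer over a chosen background.
# A "frame" is a 2D grid of pixel values; None marks pixels removed by
# person segmentation (i.e. pixels outside the person's outline).

def composite(person_layer, background_frame):
    """Overlay person pixels onto a background frame of the same size."""
    return [
        [p if p is not None else background_frame[r][c]
         for c, p in enumerate(row)]
        for r, row in enumerate(person_layer)
    ]

# Person layer cropped from the first video: only the user's pixels remain.
person = [[None, 'U', None],
          [None, 'U', 'U']]
# The background frame may come from separate background multimedia data,
# from the material's own background, or from the first video's background,
# matching the three examples in the text.
background = [['B', 'B', 'B'],
              ['B', 'B', 'B']]

result = composite(person, background)
# result: [['B', 'U', 'B'], ['B', 'U', 'U']]
```

Which frame is passed as `background` selects among the three examples above; the person layer itself is unchanged in all three cases.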
下面以一个例子阐述用户图像(或用户像素点、用户图像块)与背景图像(或背景像素点、背景图像块)之间的关系。An example is given below to illustrate the relationship between the user image (or user pixels, user image blocks) and the background image (or background pixels, background image blocks).
例如,用户a可以通过电子设备a自拍一段视频。在电子设备a拍摄的视频A包含用户a的情况下,电子设备a可以根据视频A中用户a的轮廓,裁剪视频A,得到用户子视频和背景子视频。其中,用户子视频可以包含用户a的图像,且不包含背景图像;背景子视频可以包含背景图像,且不包含用户a的图像。For example, user a can take a selfie of a video through electronic device a. In the case that the video A shot by the electronic device a includes the user a, the electronic device a can crop the video A according to the outline of the user a in the video A to obtain the user sub-video and the background sub-video. Wherein, the user sub-video may include the image of user a and not include the background image; the background sub-video may include the background image and not include the image of user a.
下面以视频A的一个子帧A为例进行详细说明。子帧A可以包括多个像素点A,该多个像素点A可以包括与用户a的轮廓对应的多个像素点a。子帧A中位于该多个像素点a以内的多个像素点a'可以形成用户子视频的一个子帧a',且可以形成该用户a的图像;子帧A中位于该多个像素点a以外的多个像素点a"可以形成背景子视频的一个子帧a",且可以形成该背景图像。The following takes a subframe A of video A as an example for detailed description. The subframe A may include a plurality of pixels A, and the plurality of pixels A may include a plurality of pixels a corresponding to the outline of user a. The plurality of pixels a' located within the plurality of pixels a in the subframe A may form a subframe a' of the user sub-video, and may form the image of user a; the plurality of pixels a" located outside the plurality of pixels a in the subframe A may form a subframe a" of the background sub-video, and may form the background image.
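For illustration only, the pixel classification just described can be sketched as follows; the boolean contour mask and the `split_frame` helper are assumptions of this sketch, not details of the disclosure.

```python
# Illustrative sketch: splitting one frame into a user sub-frame and a
# background sub-frame using a contour mask. True marks pixels inside the
# user's outline (the pixels a' above); False marks pixels outside it
# (the pixels a''). None marks pixels removed from a sub-frame.

def split_frame(frame, inside_mask):
    user_sub = [[p if inside else None
                 for p, inside in zip(row, mrow)]
                for row, mrow in zip(frame, inside_mask)]
    bg_sub = [[None if inside else p
               for p, inside in zip(row, mrow)]
              for row, mrow in zip(frame, inside_mask)]
    return user_sub, bg_sub

frame = [[1, 2, 3],
         [4, 5, 6]]
mask = [[False, True, False],
        [False, True, True]]

user_sub, bg_sub = split_frame(frame, mask)
# user_sub keeps only pixels inside the outline; bg_sub keeps the rest.
```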
图库合拍控件340可以用于从本地图库中选择图库视频,以完成第一用户与图库视频的合拍。图库视频可以理解为存储在第一电子设备本地的视频。The gallery co-shooting control 340 may be used to select a gallery video from a local gallery to complete the first user's co-shooting with the gallery video. The gallery video may be understood as a video stored locally on the first electronic device.
例如,图库视频为包含目标人物(如第二用户)的第二视频。第一用户可以通过第一电子设备拍摄第一视频。第一电子设备可以拍摄包含第一用户的第一视频。第一电子设备可以根据第一视频、图库视频,通过本申请实施例提供的处理视频的方法,可以得到一种或多种处理后的视频。For example, the gallery video is a second video containing a target person (eg, a second user). The first user may shoot the first video through the first electronic device. The first electronic device may capture a first video including the first user. The first electronic device can obtain one or more processed videos according to the first video and the gallery video by using the video processing method provided in the embodiment of the present application.
在一个示例中,第一电子设备可以根据第一视频中第一用户的轮廓,裁剪第一视频,得到第一人物子视频。第一人物子视频可以包含第一用户的图像,且不包含第一视频中的背景图像。第一电子设备可以将第一人物子视频、图库视频和背景多媒体数据合成为第一目标视频,其中图库视频可以不包括与目标人物对应的背景图像,背景多媒体数据可以充当第一人物子视频、图库视频的背景。背景多媒体数据例如可以来自除第一视频、素材以外的其他文件。In an example, the first electronic device may crop the first video according to the outline of the first user in the first video to obtain a first person sub-video. The first person sub-video may include an image of the first user and may not include a background image in the first video. The first electronic device may synthesize the first person sub-video, the gallery video, and background multimedia data into a first target video, where the gallery video may not include a background image corresponding to the target person, and the background multimedia data may serve as the background of the first person sub-video and the gallery video. The background multimedia data may, for example, come from files other than the first video and the material.
在另一个示例中,第一电子设备可以根据第一视频中第一用户的轮廓,裁剪第一视频,得到第一人物子视频。第一电子设备可以将第一人物子视频、图库视频合成为第一目标视频,其中图库视频可以包括与目标人物对应的背景图像,从而该背景图像可以充当第一人物子视频的背景。In another example, the first electronic device may crop the first video according to the outline of the first user in the first video to obtain a sub-video of the first person. The first electronic device may synthesize the first person sub-video and the gallery video into a first target video, wherein the gallery video may include a background image corresponding to the target person, so that the background image may serve as the background of the first person sub-video.
在又一个示例中,第一电子设备可以将第一视频、图库视频合成为第一目标视频。其中图库视频可以不包括与目标人物对应的背景图像。第一视频中的背景图像可以充当图库视频的背景。In yet another example, the first electronic device may synthesize the first video and the gallery video into the first target video. The gallery video may not include a background image corresponding to the target person. The background image in the first video can serve as the background for the gallery video.
可选的,用户界面300还可以包括图库控件350。响应第一用户作用在图库控件350的操作,第一电子设备可以跳转至图库应用,以查看已拍摄或已存储的多媒体数据。Optionally, the user interface 300 may further include a gallery control 350 . In response to the first user's operation on the gallery control 350, the first electronic device may jump to the gallery application to view the photographed or stored multimedia data.
响应第一用户作用在上述用户合拍控件320、素材合拍控件330、图库合拍控件340中的任一项,第一电子设备可以显示如图4所示的用户界面400。用户界面400可以包括第一界面区域460、第二界面区域470。第一界面区域460与第二界面区域470之间可以互不交叉。第一界面区域460、第二界面区域470可以位于用户界面400上的任意位置。如图4所示,第二界面区域470例如可以位于用户界面400的上方,第一界面区域460例如可以位于用户界面400的下方。In response to the first user acting on any one of the user snap control 320 , the material snap control 330 , and the gallery snap control 340 , the first electronic device may display a user interface 400 as shown in FIG. 4 . The user interface 400 may include a first interface area 460 and a second interface area 470 . The first interface region 460 and the second interface region 470 may not cross each other. The first interface area 460 and the second interface area 470 may be located anywhere on the user interface 400 . As shown in FIG. 4 , the second interface area 470 may be located above the user interface 400 , for example, and the first interface area 460 may be located below the user interface 400 , for example.
第一用户可以观察用户界面400的第二界面区域470,进而可以了解、熟悉第二界面区域470中的目标人物的动作。结合图3,在一个可能的示例中,第二界面区域470例如可以显示第二用户的视频通话内容,在此情况下,目标人物可以为第二用户;在另一个可能的示例中,第二界面区域470例如可以显示素材,在此情况下,目标人物可以为素材中的目标人物;在又一个示例中,第二界面区域470例如可以显示图库视频,在此情况下,目标人物可以为图库视频中的目标人物。The first user can observe the second interface area 470 of the user interface 400, so as to understand and become familiar with the actions of the target person in the second interface area 470. With reference to FIG. 3 , in a possible example, the second interface area 470 may display the video call content of the second user, in which case the target person may be the second user; in another possible example, the second interface area 470 may display the material, in which case the target person may be the target person in the material; in yet another example, the second interface area 470 may display the gallery video, in which case the target person may be the target person in the gallery video.
以下为便于描述,将第二界面区域470显示的视频资源统称为第二视频。其中,第二视频可以为以下任一种:在视频通话过程中从第二电子设备接收到的视频通话数据,第二电子设备为第二用户使用的电子设备;素材;图库视频。In the following, for the convenience of description, the video resources displayed in the second interface area 470 are collectively referred to as the second video. Wherein, the second video may be any of the following: video call data received from a second electronic device during a video call, where the second electronic device is an electronic device used by the second user; material; gallery video.
第二视频可以包括第二人物子视频,或者,从第二视频中可以提取出第二人物子视频。也就是说,第二视频可以包括与目标人物对应的子帧。如图4所示,第一电子设备可以在第二界面区域470内显示第二人物图像471,进而可以播放第二人物子视频的画面。也就是说,第二界面区域470可以包括第二人物图像471。第二界面区域470可以包括与目标人物对应的像素点。The second video may include a second person sub-video, or the second person sub-video may be extracted from the second video. That is, the second video may include subframes corresponding to the target person. As shown in FIG. 4 , the first electronic device may display a second person image 471 in the second interface area 470, and may thus play the picture of the second person sub-video. That is, the second interface area 470 may include the second person image 471. The second interface area 470 may include pixels corresponding to the target person.
在其他示例中,第一电子设备可以直接在第二界面区域470内播放第二视频。该第二界面区域470可以包括第二人物图像471和第二背景图像472,第二背景图像472可以充当第二人物图像471的背景。也就是说,第一电子设备可以不对第二视频进行视频裁剪或视频提取。In other examples, the first electronic device may directly play the second video in the second interface area 470. The second interface area 470 may include a second person image 471 and a second background image 472, and the second background image 472 may serve as the background of the second person image 471. That is, the first electronic device may not perform video cropping or video extraction on the second video.
第一用户可以模仿目标人物做出一系列动作,并通过第一电子设备拍摄记录下来。如果第二视频为第二用户的视频通话视频,则第一用户可以模仿第二用户。如果第二视频为素材,则第一用户可以模仿素材中的目标人物。如果第二视频为图库视频,则第一用户可以模仿图库视频中的目标人物。如图4所示,用户界面400可以包括录制控件410。响应第一用户作用在该录制控件410上的操作,第一电子设备可以拍摄得到第一视频。The first user can imitate the target person to make a series of actions, which are recorded by the first electronic device. If the second video is a video call video of the second user, the first user can imitate the second user. If the second video is a material, the first user can imitate the target person in the material. If the second video is a gallery video, the first user can imitate the target person in the gallery video. As shown in FIG. 4 , user interface 400 may include recording controls 410 . In response to an operation performed by the first user on the recording control 410, the first electronic device may capture a first video.
第一用户可以在拍摄第一视频的过程中,通过图4所示的第一界面区域460,预览第一视频的拍摄效果。During the process of shooting the first video, the first user can preview the shooting effect of the first video through the first interface area 460 shown in FIG. 4 .
在一个示例中,第一视频可以包括第一人物子视频,或者,从第一视频中可以提取出第一人物子视频。也就是说,第一视频可以包括与第一用户对应的子帧。第一电子设备可以在第一界面区域460内显示第一人物图像461,进而可以播放第一人物子视频的画面。也就是说,第一界面区域460可以包括第一人物图像461。第一界面区域460可以包括与第一用户对应的像素点。In an example, the first video may include a first person sub-video, or the first person sub-video may be extracted from the first video. That is, the first video may include subframes corresponding to the first user. The first electronic device may display a first person image 461 in the first interface area 460, and may thus play the picture of the first person sub-video. That is, the first interface area 460 may include the first person image 461. The first interface area 460 may include pixels corresponding to the first user.
在其他示例中,第一电子设备可以直接在第一界面区域460内播放第一视频。该第一界面区域460可以包括第一人物图像461和第一背景图像462,第一背景图像462可以充当第一人物图像461的背景。也就是说,第一电子设备可以不对第一视频进行视频裁剪或视频提取。In other examples, the first electronic device may directly play the first video in the first interface area 460. The first interface area 460 may include a first person image 461 and a first background image 462, and the first background image 462 may serve as the background of the first person image 461. That is, the first electronic device may not perform video cropping or video extraction on the first video.
可选的,响应用户作用在录制控件410上的操作,电子设备可以合成第一视频、第二视频,得到例如图5、图6所示的第一目标视频。该第一目标视频可以包括与第一视频或第一用户对应的第一图像区域560,以及与第二视频或目标人物对应的第二图像区域570。第一图像区域560可以对应第一界面区域460,第二图像区域570可以对应第二界面区域470,使得拍摄过程中的预览视频与合成后的视频可以具有相对高的统一性。Optionally, in response to an operation performed by the user on the recording control 410, the electronic device may synthesize the first video and the second video to obtain, for example, the first target video shown in FIG. 5 and FIG. 6 . The first target video may include a first image area 560 corresponding to the first video or the first user, and a second image area 570 corresponding to the second video or the target person. The first image area 560 may correspond to the first interface area 460, and the second image area 570 may correspond to the second interface area 470, so that the preview video during the shooting process and the synthesized video may have relatively high unity.
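A minimal sketch (assumed layout, not mandated by the disclosure) of how the target frame can mirror the two interface areas during preview, with the area corresponding to the second video stacked above the area corresponding to the first video as in FIG. 4 :

```python
# Illustrative only: assemble a target frame whose upper rows correspond to
# the second interface/image area (second video) and whose lower rows
# correspond to the first interface/image area (first video).

def stack_frames(upper_frame, lower_frame):
    """Concatenate two equally wide frames vertically (upper on top)."""
    assert len(upper_frame[0]) == len(lower_frame[0]), "frame widths must match"
    return upper_frame + lower_frame

second_frame = [['S', 'S'], ['S', 'S']]   # second image area 570
first_frame = [['F', 'F'], ['F', 'F']]    # first image area 560
target_frame = stack_frames(second_frame, first_frame)
# target_frame has the second video's rows above the first video's rows.
```

Keeping the stacking order the same in preview and in the synthesized video is one way to achieve the "relatively high unity" mentioned above.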
在又一种可能的场景下,两个用户可以通过视频通话,模仿同一个视频中的一个或多个人物做动作。两个用户可以通过视频通话交流模仿细节。In yet another possible scenario, two users can imitate one or more characters in the same video to perform actions through a video call. The two users can communicate details of the imitation through a video call.
例如,第一电子设备可以响应第一用户作用在如图3所示的用户合拍控件320的操作,向第二电子设备发出视频通话邀请,使用第二电子设备的用户可以是第三用户。之后,第一电子设备、第二电子设备之间可以建立视频通话连接。第一用户、第三用户例如可以通过图3所示的素材合拍控件330或图库合拍控件340,选择第二视频。第二视频可以是有关目标人物的视频。第二视频可以展示目标人物的动作。第一电子设备、第二电子设备均可以在用户界面上显示第一界面区域、第二界面区域、第三界面区域中的一个或多个。其中,第一界面区域可以显示第一用户视频通话的内容;第二界面区域可以显示第二视频;第三界面区域可以显示第三用户视频通话的内容。第一用户、第三用户可以模仿该第二视频中的动作,在视频通话过程中生成第一视频和第三视频。由于第一用户、第三用户模仿同一视频中同一目标人物的动作,因此后续可以参照第二视频中该目标人物的动作形态,分别对第一视频、第三视频进行处理,得到包含第一用户、第三用户的目标视频。For example, the first electronic device may, in response to the first user's operation on the user co-shooting control 320 shown in FIG. 3 , send a video call invitation to the second electronic device, and the user of the second electronic device may be the third user. After that, a video call connection may be established between the first electronic device and the second electronic device. The first user and the third user may select the second video, for example, through the material co-shooting control 330 or the gallery co-shooting control 340 shown in FIG. 3 . The second video may be a video about the target person and may show the target person's actions. Both the first electronic device and the second electronic device may display one or more of a first interface area, a second interface area, and a third interface area on the user interface. The first interface area may display the content of the first user's video call; the second interface area may display the second video; the third interface area may display the content of the third user's video call. The first user and the third user may imitate the actions in the second video, and generate the first video and the third video during the video call. Since the first user and the third user imitate the actions of the same target person in the same video, the first video and the third video can subsequently be processed with reference to the action form of that target person in the second video, respectively, to obtain target videos including the first user and the third user.
可选的,在视频通话过程中,第一用户、第三用户可以同时模仿目标人物的动作,也可以先后在不同的时段模仿目标人物的动作。Optionally, during the video call, the first user and the third user may imitate the action of the target person at the same time, or may imitate the action of the target person successively in different periods.
例如,第一电子设备可以响应第一用户作用在如图3所示的用户合拍控件320的操作,向第二电子设备发出视频通话邀请,使用第二电子设备的用户可以是第三用户。之后,第一电子设备、第二电子设备之间可以建立视频通话连接。第一用户、第三用户例如可以通过图3所示的素材合拍控件330或图库合拍控件340,选择第二视频。第二视频可以是有关第一目标人物、第二目标人物的视频。第一目标人物、第二目标人物例如可以在第二视频中合作完成一系列动作。第一电子设备、第二电子设备均可以在用户界面上显示第一界面区域、第二界面区域、第三界面区域中的一个或多个。其中,第一界面区域可以显示第一用户视频通话的内容;第二界面区域可以显示第二视频;第三界面区域可以显示第三用户视频通话的内容。第一用户可以模仿该第二视频中第一目标人物的动作,并通过视频通话连接生成第一视频。第三用户可以模仿该第二视频中第二目标人物的动作,并通过视频通话连接生成第三视频。由于第一用户、第三用户模仿同一视频中第一目标人物、第二目标人物的动作,因此后续可以参照第二视频中第一目标人物、第二目标人物的动作形态,分别对第一视频、第三视频进行处理,得到包含第一用户、第三用户的目标视频。For example, the first electronic device may, in response to the first user's operation on the user co-shooting control 320 shown in FIG. 3 , send a video call invitation to the second electronic device, and the user of the second electronic device may be the third user. After that, a video call connection may be established between the first electronic device and the second electronic device. The first user and the third user may select the second video, for example, through the material co-shooting control 330 or the gallery co-shooting control 340 shown in FIG. 3 . The second video may be a video about a first target person and a second target person. For example, the first target person and the second target person may cooperate to complete a series of actions in the second video. Both the first electronic device and the second electronic device may display one or more of a first interface area, a second interface area, and a third interface area on the user interface. The first interface area may display the content of the first user's video call; the second interface area may display the second video; the third interface area may display the content of the third user's video call. The first user may imitate the actions of the first target person in the second video, and generate the first video through the video call connection. The third user may imitate the actions of the second target person in the second video, and generate the third video through the video call connection. Since the first user and the third user imitate the actions of the first target person and the second target person in the same video, the first video and the third video can subsequently be processed with reference to the action forms of the first target person and the second target person in the second video, respectively, to obtain a target video including the first user and the third user.
可选的,在视频通话过程中,第一用户模仿第一目标人物的动作的时段,可以与第三用户模仿第二目标人物的动作的时段大致重叠,或者,第一用户模仿第一目标人物的动作的时段,可以与第三用户模仿第二目标人物的动作的时段互不交叉。Optionally, during the video call, the period in which the first user imitates the action of the first target person may substantially overlap with the period in which the third user imitates the action of the second target person; alternatively, the two periods may not overlap with each other.
下面介绍图4所示的用户界面400可能包含的操作控件。Operation controls that may be included in the user interface 400 shown in FIG. 4 are described below.
用户界面400例如可以包括分屏开关控件420。 User interface 400 may include, for example, a split screen switch control 420 .
如图4的用户界面400所示,在分屏开关控件420处于开启状态的情况下,第一界面区域460和第二界面区域470例如可以为两个规则的显示区域。也就是说,第一界面区域460的轮廓与第一用户的轮廓可以不匹配(或不对应),且第二界面区域470的轮廓与目标人物的轮廓可以不匹配(或不对应)。第一界面区域460的面积和第二界面区域470的面积例如可以对应固定比例(如1:1、1:1.5等)。在图4所示的示例中,分屏开关控件420当前处于开启状态。第一界面区域460和第二界面区域470的形状均可以为矩形。As shown in the user interface 400 in FIG. 4 , when the split-screen switch control 420 is in an on state, the first interface area 460 and the second interface area 470 may be, for example, two regular display areas. That is, the outline of the first interface area 460 may not match (or correspond to) the outline of the first user, and the outline of the second interface area 470 may not match (or correspond to) the outline of the target person. The area of the first interface area 460 and the area of the second interface area 470 may, for example, correspond to a fixed ratio (such as 1:1, 1:1.5, etc.). In the example shown in FIG. 4 , the split-screen switch control 420 is currently in the on state. Both the first interface area 460 and the second interface area 470 may be rectangular in shape.
相应地,在图5所示的用户界面500中,第一目标视频的第一图像区域560和第二图像区域570可以为两个规则的显示区域。第一图像区域560的轮廓与第一用户的轮廓可以不匹配(或不对应),且第二图像区域570的轮廓与目标人物的轮廓可以不匹配(或不对应)。第一图像区域560的面积和第二图像区域570的面积例如可以对应固定比例(如1:1、1:1.5等)。结合图5所示的示例,第一图像区域560和第二图像区域570的形状均可以为矩形。也就是说,第一图像区域560、第二图像区域570均可以包括背景图像。Correspondingly, in the user interface 500 shown in FIG. 5 , the first image area 560 and the second image area 570 of the first target video may be two regular display areas. The outline of the first image area 560 may not match (or not correspond to) the outline of the first user, and the outline of the second image area 570 may not match (or not correspond to) the outline of the target person. For example, the area of the first image area 560 and the area of the second image area 570 may correspond to a fixed ratio (eg, 1:1, 1:1.5, etc.). With reference to the example shown in FIG. 5 , the shapes of the first image area 560 and the second image area 570 may both be rectangles. That is, both the first image area 560 and the second image area 570 may include a background image.
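The fixed area ratio mentioned above can be illustrated with a small sketch; the `split_by_ratio` helper and its rounding behavior are assumptions of this sketch, not details of the disclosure.

```python
# Illustrative only: divide the target frame's height (or width) between
# the two image areas by a fixed ratio, e.g. 1:1 or 1:1.5 as in the text.

def split_by_ratio(total, ratio_a, ratio_b):
    """Return the two extents whose sizes follow ratio_a:ratio_b."""
    part_a = round(total * ratio_a / (ratio_a + ratio_b))
    return part_a, total - part_a

# A 1000-pixel-tall frame split 1:1.5 between the two image areas.
heights = split_by_ratio(1000, 1, 1.5)
# heights: (400, 600)
```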
在其他示例中,在分屏开关控件420处于关闭状态的情况下,第一界面区域460的轮廓例如可以与第一用户的轮廓匹配(或对应),且第二界面区域470的轮廓例如可以与第二用户的轮廓匹配(或对应)。也就是说,第一界面区域460可以不包括如图4所示的第一视频的第一背景图像462;第二界面区域470可以不包括如图4所示的第二视频中的第二背景图像472。In other examples, when the split-screen switch control 420 is in an off state, the outline of the first interface area 460 may, for example, match (or correspond to) the outline of the first user, and the outline of the second interface area 470 may, for example, match (or correspond to) the outline of the second user. That is, the first interface area 460 may not include the first background image 462 of the first video as shown in FIG. 4 , and the second interface area 470 may not include the second background image 472 in the second video as shown in FIG. 4 .
相应地,在图6所示的示例中,该第一目标视频的第一图像区域560的轮廓可以与第一用户的轮廓匹配(或对应),且该第一目标视频的第二图像区域570的轮廓可以与第二用户的轮廓匹配(或对应)。也就是说,第一图像区域560可以不包括如图4所示的第一视频的第一背景图像462;第二图像区域570可以不包括如图4所示的第二视频中的第二背景图像472。Correspondingly, in the example shown in FIG. 6 , the outline of the first image area 560 of the first target video may match (or correspond to) the outline of the first user, and the outline of the second image area 570 of the first target video may match (or correspond to) the outline of the second user. That is, the first image area 560 may not include the first background image 462 of the first video as shown in FIG. 4 , and the second image area 570 may not include the second background image 472 in the second video as shown in FIG. 4 .
可选的,第一目标视频可以包括第一背景图像区域580。第一背景图像区域580的像素点例如可以是默认值。第一背景图像区域580的像素点还可以与第一背景图像462、第二背景图像472、目标图库图像中的任一个对应。在一些示例中,目标图库图像可以为图库视频的一个子帧。例如,第一目标视频的某个子帧可以与目标图库图像对应,该合拍视频的多个子帧可以与该目标图库图像所在视频的多个子帧一一对应。Optionally, the first target video may include a first background image area 580 . For example, the pixel points of the first background image area 580 may be default values. The pixel points of the first background image area 580 may also correspond to any one of the first background image 462, the second background image 472, and the target gallery image. In some examples, the target gallery image may be a subframe of the gallery video. For example, a certain subframe of the first target video may correspond to a target gallery image, and multiple subframes of the co-shot video may correspond one-to-one with multiple subframes of the video where the target gallery image is located.
如图6的用户界面600所示,第一用户可以通过作用在用户界面上的操作,向第一电子设备指示第一目标视频的背景对应目标图库图像。第一电子设备可以响应来自用户的指示,确定第一目标视频的第一背景图像区域580的像素点对应目标图库图像,使得如图6所示的第一目标视频可以不包括与图4所示的第一背景图像462、第二背景图像472对应的像素点。As shown in the user interface 600 of FIG. 6 , the first user may, by an operation performed on the user interface, indicate to the first electronic device that the background of the first target video corresponds to the target gallery image. The first electronic device may, in response to the indication from the user, determine that the pixels of the first background image area 580 of the first target video correspond to the target gallery image, so that the first target video shown in FIG. 6 may not include pixels corresponding to the first background image 462 and the second background image 472 shown in FIG. 4 .
可选的,当第一人物图像461与第二人物图像471在用户界面400上可能存在显示冲突时,电子设备可以优先显示第一人物图像461或第二人物图像471。换句话说,第一人物图像461可以覆盖在第二人物图像471上,或者第二人物图像471可以覆盖在第一人物图像461上。Optionally, when there may be a display conflict between the first character image 461 and the second character image 471 on the user interface 400 , the electronic device may display the first character image 461 or the second character image 471 preferentially. In other words, the first person image 461 may be overlaid on the second person image 471 , or the second person image 471 may be overlaid on the first person image 461 .
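For illustration only, the priority rule above can be sketched as follows; representing transparent pixels with `None` and the `overlay` helper are assumptions of this sketch.

```python
# Illustrative sketch: resolving a display conflict between two person
# layers by giving one layer priority. None marks transparent pixels; the
# priority layer's pixels win wherever the two layers overlap.

def overlay(priority_layer, other_layer):
    return [
        [p if p is not None else q
         for p, q in zip(prow, qrow)]
        for prow, qrow in zip(priority_layer, other_layer)
    ]

first_person = [['F', 'F', None]]   # first person image 461
second_person = [[None, 'S', 'S']]  # second person image 471

# Giving the first person image priority: it covers the second person
# image in the overlapping middle pixel.
merged = overlay(first_person, second_person)
# merged: [['F', 'F', 'S']]
```

Swapping the argument order models the opposite choice, in which the second person image 471 covers the first person image 461.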
可选的,为了减少用户后期处理视频的工作量,用户可以通过作用在用户界面400上的操作,调整第一人物图像461和第二人物图像471的显示大小,进而可以调整第一用户的图像和目标人物的图像在第一目标视频中的尺寸比例。Optionally, in order to reduce the user's workload of post-processing the video, the user may adjust the display sizes of the first person image 461 and the second person image 471 by operations performed on the user interface 400, and may thus adjust the size ratio of the first user's image and the target person's image in the first target video.
可选的,如图4所示,用户界面400可以包括背景去除开关控件430。Optionally, as shown in FIG. 4 , the user interface 400 may include a background removal switch control 430 .
在背景去除开关控件430处于关闭状态的情况下,电子设备可以不扣除第一视频的背景和第二视频的背景,即在用户界面400上显示第一视频的背景图像和第二视频的背景图像。When the background removal switch control 430 is in an off state, the electronic device may not remove the background of the first video or the background of the second video, that is, it displays the background image of the first video and the background image of the second video on the user interface 400.
在图4所示的示例中,背景去除开关控件430当前可以处于关闭状态。第一界面区域460可以显示有第一人物图像461和第一背景图像462。第一背景图像462可以为第一用户的背景图像。第一背景图像462可以通过拍摄第一用户所在场景得到。也就是说,第一界面区域460可以包括与第一用户对应的像素点,以及与第一用户所在场景对应的像素点。第二界面区域470可以显示有第二人物图像471和第二背景图像472。第二背景图像472可以为目标人物的背景图像。也就是说,第二界面区域470可以包括与目标人物对应的像素点,以及与目标人物所在场景对应的像素点。In the example shown in FIG. 4, the background removal switch control 430 may currently be in an off state. The first interface area 460 may display a first character image 461 and a first background image 462 . The first background image 462 may be the background image of the first user. The first background image 462 may be obtained by photographing the scene where the first user is located. That is, the first interface area 460 may include pixels corresponding to the first user and pixels corresponding to the scene where the first user is located. The second interface area 470 may display a second character image 471 and a second background image 472 . The second background image 472 may be a background image of the target person. That is, the second interface area 470 may include pixels corresponding to the target character and pixels corresponding to the scene where the target character is located.
在背景去除开关控件430处于开启状态的情况下,电子设备例如可以扣除第一视频的背景和/或第二视频的背景。例如,第一电子设备可以在第一界面区域460、第二界面区域470内均显示第一视频的背景图像;又如,第一电子设备可以在第一界面区域460、第二界面区域470内均显示第二视频的背景图像;又如,第一电子设备可以在第一界面区域460、第二界面区域470内均显示除第一视频的背景图像、第二视频的背景以外的其他背景图像;又如,第一电子设备可以在用户界面400上显示第一视频的背景图像,不显示第二视频的背景图像;又如,第一电子设备可以在用户界面400上显示第二视频的背景图像,不显示第一视频的背景图像;又如,第一电子设备可以在用户界面400上显示除第一视频的背景图像、第二视频的背景以外的其他背景图像。除第一视频的背景图像、第二视频的背景以外的其他背景图像例如可以是目标图库图像。When the background removal switch control 430 is in an on state, the electronic device may, for example, remove the background of the first video and/or the background of the second video. For example, the first electronic device may display the background image of the first video in both the first interface area 460 and the second interface area 470; for another example, the first electronic device may display the background image of the second video in both the first interface area 460 and the second interface area 470; for another example, the first electronic device may display, in both the first interface area 460 and the second interface area 470, a background image other than the background image of the first video and the background of the second video; for another example, the first electronic device may display the background image of the first video on the user interface 400 without displaying the background image of the second video; for another example, the first electronic device may display the background image of the second video on the user interface 400 without displaying the background image of the first video; for another example, the first electronic device may display, on the user interface 400, a background image other than the background image of the first video and the background of the second video. A background image other than the background image of the first video and the background of the second video may be, for example, the target gallery image.
相应地,第一目标视频中的第一图像区域560可以包括第一人物图像461、第一背景图像462,第一目标视频中的第二图像区域570可以包括第二人物图像471、第一背景图像462,第一背景图像462用于充当第二人物图像471的背景;或者,第一目标视频中的第一图像区域560可以包括第一人物图像461、第二背景图像472,第一目标视频中的第二图像区域570可以包括第二人物图像471、第二背景图像472,第二背景图像472用于充当第一人物图像461的背景;或者,第一目标视频中的第一图像区域560可以包括第一人物图像461、目标图库图像,第一目标视频中的第二图像区域570可以包括第二人物图像471、目标图库图像,目标图库图像用于充当第一人物图像461、第二人物图像471的背景;或者,第一目标视频中的第一图像区域560可以包括第一人物图像461,且不包括第一背景图像462,第一目标视频中的第二图像区域570可以包括第二人物图像471,且不包括第二背景图像472;第一目标视频中的第一背景图像区域580可以包括以下任一种:第一背景图像462,第二背景图像472,目标图库图像,第一背景图像区域580可以用于充当第一图像区域560、第二图像区域570的背景。Correspondingly, the first image area 560 in the first target video may include the first person image 461 and the first background image 462, and the second image area 570 in the first target video may include the second person image 471 and the first background image 462, where the first background image 462 serves as the background of the second person image 471; alternatively, the first image area 560 in the first target video may include the first person image 461 and the second background image 472, and the second image area 570 in the first target video may include the second person image 471 and the second background image 472, where the second background image 472 serves as the background of the first person image 461; alternatively, the first image area 560 in the first target video may include the first person image 461 and the target gallery image, and the second image area 570 in the first target video may include the second person image 471 and the target gallery image, where the target gallery image serves as the background of the first person image 461 and the second person image 471; alternatively, the first image area 560 in the first target video may include the first person image 461 without the first background image 462, and the second image area 570 in the first target video may include the second person image 471 without the second background image 472. The first background image area 580 in the first target video may include any one of the following: the first background image 462, the second background image 472, or the target gallery image, and the first background image area 580 may serve as the background of the first image area 560 and the second image area 570.
可选的,在背景去除开关控件430处于开启状态的情况下,第一电子设备可以响应第一用户作用在用户界面400上的操作,确定在第一界面区域460、第二界面区域470或用户界面400内显示的背景图像。Optionally, when the background removal switch control 430 is in an on state, the first electronic device may, in response to an operation performed by the first user on the user interface 400, determine the background image to be displayed in the first interface area 460, the second interface area 470, or the user interface 400.
可选的,在分屏开关控件420处于关闭状态的情况下,背景去除开关控件430可以处于开启状态。Optionally, when the split screen switch control 420 is in an off state, the background removal switch control 430 may be in an on state.
可选的,如图4所示,用户界面400可以包括美颜开关控件440。Optionally, as shown in FIG. 4 , the user interface 400 may include a beauty switch control 440 .
在美颜开关控件440处于开启状态的情况下,电子设备可以针对第一人物图像461和/或第二人物图像471进行人像美化。也就是说,电子设备可以在用户界面400上显示经人像美化后的第一人物图像461和/或第二人物图像471;在合成后的第一目标视频中,第一图像区域560内的人物图像和/或第二图像区域570内的人物图像可以是经过美颜处理后的图像。When the beauty switch control 440 is in an on state, the electronic device can perform portrait beautification on the first person image 461 and/or the second person image 471. That is, the electronic device may display the beautified first person image 461 and/or the beautified second person image 471 on the user interface 400; in the synthesized first target video, the person image in the first image area 560 and/or the person image in the second image area 570 may be an image after beautification processing.
在美颜开关控件440处于关闭状态的情况下,电子设备可以不针对第一人物图像461和第二人物图像471进行人像美化。也就是说,电子设备可以根据第一用户的原始图像和目标人物的原始图像,在用户界面400上显示第一人物图像461和第二人物图像471,第一人物图像461、第二人物图像471可以是未经美颜处理的图像。在合成后的第一目标视频中,第一图像区域560内的人物图像可以根据第一用户的原始图像得到,第二图像区域570内的人物图像可以根据目标人物的原始图像得到,即第一图像区域560内的人物图像、第二图像区域570内的人物图像可以是未经美颜处理的图像。When the beauty switch control 440 is in an off state, the electronic device may not perform portrait beautification on the first person image 461 and the second person image 471. That is, the electronic device may display the first person image 461 and the second person image 471 on the user interface 400 according to the original image of the first user and the original image of the target person, and the first person image 461 and the second person image 471 may be images without beautification processing. In the synthesized first target video, the person image in the first image area 560 may be obtained from the original image of the first user, and the person image in the second image area 570 may be obtained from the original image of the target person, that is, the person image in the first image area 560 and the person image in the second image area 570 may be images without beautification processing.
可选的,如图4所示,用户界面400还可以包括滤镜开关控件450。Optionally, as shown in FIG. 4, the user interface 400 may further include a filter switch control 450.
在滤镜开关控件450处于开启状态的情况下,电子设备可以针对第一视频的图像和/或第二视频的图像进行滤镜美化。也就是说,电子设备可以在用户界面400上显示经滤镜美化后的第一视频的图像和/或第二视频的图像;并且,在合成后的第一目标视频中,第一图像区域560内的图像和/或第二图像区域570内的图像可以是经过滤镜处理后的图像。When the filter switch control 450 is in an on state, the electronic device may perform filter beautification on the image of the first video and/or the image of the second video. That is, the electronic device may display, on the user interface 400, the filter-beautified image of the first video and/or the filter-beautified image of the second video; and, in the synthesized first target video, the image in the first image area 560 and/or the image in the second image area 570 may be a filtered image.
在滤镜开关控件450处于关闭状态的情况下,电子设备可以不针对第一视频的图像和第二视频的图像进行滤镜美化。也就是说,电子设备可以根据第一视频的原始图像和第二视频的原始图像,在用户界面400内显示未经滤镜处理的图像;在合成后的第一目标视频中,第一图像区域560内的图像可以根据第一视频的原始图像得到,第二图像区域570内的图像可以根据第二视频的原始图像得到,即第一目标视频可以不包括经过滤镜处理的图像。When the filter switch control 450 is in an off state, the electronic device may not perform filter beautification on the images of the first video and the second video. That is, the electronic device may display unfiltered images in the user interface 400 according to the original images of the first video and the original images of the second video; in the synthesized first target video, the image in the first image area 560 may be obtained from the original image of the first video, and the image in the second image area 570 may be obtained from the original image of the second video, that is, the first target video may not include filtered images.
在一个示例中,在第一视频拍摄结束后,第一电子设备可以根据第一视频和第二视频,通过本申请实施例提供的处理视频的方法,对第一视频或第一目标视频进行处理,得到如图5、图6所示的第一目标视频。在另一个示例中,电子设备可以同时进行第一视频的拍摄和对第一视频的处理。本申请实施例可以不限定处理视频的具体步骤顺序。下面结合图7、图8,阐述本申请实施例提供的处理视频的方法。In one example, after the shooting of the first video is completed, the first electronic device may process the first video or the first target video according to the first video and the second video by using the video processing method provided in the embodiments of the present application, to obtain the first target video shown in FIG. 5 and FIG. 6. In another example, the electronic device may perform the shooting of the first video and the processing of the first video at the same time. The embodiments of the present application do not limit the specific order of the steps for processing the video. The following describes the video processing method provided by the embodiments of the present application with reference to FIG. 7 and FIG. 8.
第一电子设备可以根据第一视频,提取第一动作文件。动作文件可以指示在一段视频的多个帧中,人物的多个肢体在图像上的相对位置,进而反映该人物在视频中的动作信息。根据前文可知,第一视频可以包括或提取得到第一人物子视频。第一人物子视频可以包括多个第一子帧。第一动作文件可以包括与多个第一子帧一一对应的第一动作子文件。每个第一子帧可以包含第一用户的一个动作。如图7所示,710示出了第一人物子视频的一个第一子帧A。第一用户在该第一子帧A中做出的动作可以为第一动作A。711示出了与该第一动作A对应的第一动作子文件A。The first electronic device may extract the first action file according to the first video. The action file can indicate the relative positions of the multiple limbs of the character on the image in multiple frames of a video, thereby reflecting the action information of the character in the video. It can be known from the foregoing that the first video may include or be extracted to obtain a sub-video of the first person. The first person sub-video may include a plurality of first sub-frames. The first action file may include first action subfiles corresponding to the plurality of first subframes one-to-one. Each first subframe may contain one action of the first user. As shown in FIG. 7, 710 shows a first subframe A of the first person sub-video. The action performed by the first user in the first subframe A may be the first action A. 711 shows the first action sub-file A corresponding to the first action A.
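The frame-to-subfile relationship described above (a first action sub-file per first sub-frame, each holding the limb data for that frame) can be pictured as a simple container structure. The following is only an illustrative sketch; the class and field names are assumptions for illustration, not terms from this application.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Point = Tuple[float, float]  # (x, y) image coordinates

@dataclass
class ActionSubfile:
    """Pose of one character in one sub-frame: one fitted line
    segment (a pair of endpoints) per body part."""
    segments: Dict[str, Tuple[Point, Point]] = field(default_factory=dict)

@dataclass
class ActionFile:
    """One ActionSubfile per sub-frame of the character sub-video,
    in frame order."""
    subfiles: List[ActionSubfile] = field(default_factory=list)

# A first action sub-file A for one frame, with only two parts shown:
sub_a = ActionSubfile(segments={
    "torso": ((50.0, 60.0), (50.0, 120.0)),
    "head": ((50.0, 20.0), (52.0, 40.0)),
})
action_file = ActionFile(subfiles=[sub_a])
```

With this layout, the one-to-one correspondence between sub-frames and action sub-files is just the list index of `subfiles`.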
第一电子设备例如可以根据以下至少两项之间的位置关系或坐标,确定第一动作子文件A:第一头部像素点、第一颈部像素点、第一躯干像素点、第一左上前肢像素点、第一左上后肢像素点、第一左下前肢像素点、第一左下后肢像素点、第一右上前肢像素点、第一右上后肢像素点、第一右下前肢像素点、第一右下后肢像素点、第一左手像素点、第一右手像素点。The first electronic device may, for example, determine the first action sub-file A according to the positional relationship or coordinates between at least two of the following: the first head pixel point, the first neck pixel point, the first torso pixel point, the first left upper forelimb pixel point, the first left upper hindlimb pixel point, the first left lower forelimb pixel point, the first left lower hindlimb pixel point, the first right upper forelimb pixel point, the first right upper hindlimb pixel point, the first right lower forelimb pixel point, the first right lower hindlimb pixel point, the first left-hand pixel point, and the first right-hand pixel point.
其中,第一头部像素点可以为与第一用户的头部对应的像素点。第一颈部像素点可以为与第一用户的颈部对应的像素点。第一躯干像素点可以为与第一用户的躯干对应的像素点。第一左上前肢像素点可以为与第一用户的左上前肢对应的像素点。第一左上后肢像素点可以为与第一用户的左上后肢对应的像素点。第一左下前肢像素点可以为与第一用户的左下前肢对应的像素点。第一左下后肢像素点可以为与第一用户的左下后肢对应的像素点。第一右上前肢像素点可以为与第一用户的右上前肢对应的像素点。第一右上后肢像素点可以为与第一用户的右上后肢对应的像素点。第一右下前肢像素点可以为与第一用户的右下前肢对应的像素点。第一右下后肢像素点可以为与第一用户的右下后肢对应的像素点。第一左手像素点可以为与第一用户的左手对应的像素点。第一右手像素点可以为与第一用户的右手对应的像素点。第一动作子文件可以为反映或指示或描述或对应第一动作的数据。The first head pixel point may be a pixel point corresponding to the head of the first user. The first neck pixel point may be a pixel point corresponding to the neck of the first user. The first torso pixel point may be a pixel point corresponding to the torso of the first user. The first left upper forelimb pixel point may be a pixel point corresponding to the left upper forelimb of the first user. The first left upper hindlimb pixel point may be a pixel point corresponding to the left upper hindlimb of the first user. The first left lower forelimb pixel point may be a pixel point corresponding to the left lower forelimb of the first user. The first left lower hindlimb pixel point may be a pixel point corresponding to the left lower hindlimb of the first user. The first right upper forelimb pixel point may be a pixel point corresponding to the right upper forelimb of the first user. The first right upper hindlimb pixel point may be a pixel point corresponding to the right upper hindlimb of the first user. The first right lower forelimb pixel point may be a pixel point corresponding to the right lower forelimb of the first user. The first right lower hindlimb pixel point may be a pixel point corresponding to the right lower hindlimb of the first user. The first left-hand pixel point may be a pixel point corresponding to the left hand of the first user. The first right-hand pixel point may be a pixel point corresponding to the right hand of the first user. The first action sub-file may be data reflecting, indicating, describing, or corresponding to the first action.
在一个示例中,如图7所示,多个像素点可以被近似拟合成线段。线段的类型例如可以包括以下一种或多种:头部线段、颈部线段、躯干线段、左上前肢线段、左上后肢线段、左下前肢线段、左下后肢线段、右上前肢线段、右上后肢线段、右下前肢线段、右下后肢线段、左手线段、右手线段。动作子文件例如可以包括由像素点拟合得到的线段的数据。In one example, as shown in FIG. 7, a plurality of pixel points may be approximately fitted into line segments. The types of line segments may include, for example, one or more of the following: a head line segment, a neck line segment, a torso line segment, a left upper forelimb line segment, a left upper hindlimb line segment, a left lower forelimb line segment, a left lower hindlimb line segment, a right upper forelimb line segment, a right upper hindlimb line segment, a right lower forelimb line segment, a right lower hindlimb line segment, a left-hand line segment, and a right-hand line segment. The action sub-file may include, for example, data of the line segments fitted from the pixel points.
两种类型像素点之间的位置关系,可以对应两种拟合线段之间的角度、距离等信息。例如,类型1像素点可以拟合为线段1,类型2像素点可以拟合为线段2。线段1的长度可以反映类型1像素点的相对数量;线段2的长度可以反映类型2像素点的相对数量。类型1像素点与类型2像素点之间的位置关系,可以对应线段1与线段2之间的角度、距离等信息。在本申请中,角度值为负,可以指该肢体向左倾斜;角度值为正,可以指该肢体向右倾斜。角度的绝对值越大,可以认为该肢体倾斜的程度越大。The positional relationship between the two types of pixel points can correspond to information such as the angle and distance between the two fitted line segments. For example, type 1 pixels can be fitted to line segment 1, and type 2 pixels can be fitted to line segment 2. The length of line segment 1 can reflect the relative number of type 1 pixels; the length of line segment 2 can reflect the relative number of type 2 pixels. The positional relationship between the type 1 pixel point and the type 2 pixel point may correspond to information such as the angle and distance between the line segment 1 and the line segment 2. In this application, a negative angle value may mean that the limb is inclined to the left; a positive angle value may mean that the limb is inclined to the right. The larger the absolute value of the angle, the more tilted the limb can be considered to be.
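The fitting of pixel points into line segments, and the signed-angle convention just described, can be sketched as follows. This is only one possible implementation: PCA for the segment fit, an image coordinate system in which y grows downward, and angles measured from the downward vertical (negative = tilted left, positive = tilted right) are all assumptions for illustration.

```python
import numpy as np

def fit_segment(pixels):
    """Fit a line segment to a set of (x, y) pixel coordinates using PCA:
    the segment runs along the principal axis of the point cloud."""
    pts = np.asarray(pixels, dtype=float)
    center = pts.mean(axis=0)
    # principal direction = first right singular vector of the centered cloud
    _, _, vt = np.linalg.svd(pts - center)
    direction = vt[0]
    # project points onto the direction to find the segment endpoints
    t = (pts - center) @ direction
    return center + t.min() * direction, center + t.max() * direction

def segment_angle(p_start, p_end):
    """Signed angle (degrees) of a segment measured from the downward
    vertical; negative means tilted left, positive means tilted right,
    matching the sign convention described above."""
    dx = p_end[0] - p_start[0]
    dy = p_end[1] - p_start[1]  # image y grows downward
    return float(np.degrees(np.arctan2(dx, dy)))

# Pixels lying on a vertical strip fit to a vertical segment (angle ~ 0°).
torso_pixels = [(50, y) for y in range(60, 121)]
a, b = fit_segment(torso_pixels)
endpoints = {tuple(map(round, a)), tuple(map(round, b))}
angle_vertical = segment_angle((50, 60), (50, 120))
angle_diag = segment_angle((0, 0), (10, 10))
```

The angle between two limbs (e.g. line segment 1 and line segment 2) is then simply the difference of their `segment_angle` values, and the segment length reflects the relative number of pixel points of that type.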
在图7所示的示例中,第一动作子文件A可以反映,第一用户的第一动作A可以包括举起右上肢。第一右上后肢像素点与第一躯干像素点之间的相对位置关系,以及第一右上前肢像素点与第一右上后肢像素点之间的相对位置关系,可以反映,在第一动作A中,第一用户的右上后肢的举起角度为第一右上后肢角度,第一用户的右上前肢的举起角度为第一右上前肢角度。如图7所示,第一右上后肢角度例如可以约为85°,第一右上前肢角度例如可以约为-10°。在其他示例中,第一右上前肢角度可以根据第一右上前肢像素点和第一躯干像素点确定。在此情况下,第一右上前肢角度例如可以约为75°。In the example shown in FIG. 7, the first action sub-file A may reflect that the first action A of the first user may include raising the right upper limb. The relative positional relationship between the first right upper hindlimb pixel point and the first torso pixel point, and the relative positional relationship between the first right upper forelimb pixel point and the first right upper hindlimb pixel point, may reflect that, in the first action A, the lifting angle of the first user's right upper hindlimb is the first right upper hindlimb angle, and the lifting angle of the first user's right upper forelimb is the first right upper forelimb angle. As shown in FIG. 7, the first right upper hindlimb angle may be, for example, about 85°, and the first right upper forelimb angle may be, for example, about -10°. In other examples, the first right upper forelimb angle may be determined according to the first right upper forelimb pixel point and the first torso pixel point. In this case, the first right upper forelimb angle may be, for example, about 75°.
在图7所示的示例中,第一动作子文件A可以反映,第一用户的第一动作A可以包括举起左上肢。第一左上后肢像素点与第一躯干像素点之间的相对位置关系,以及第一左上前肢像素点与第一左上后肢像素点之间的相对位置关系,可以反映,在第一动作A中,第一用户的左上后肢的举起角度为第一左上后肢角度,第一用户的左上前肢的举起角度为第一左上前肢角度。如图7所示,第一左上后肢角度例如可以略小于-90°,第一左上前肢角度例如可以约为-45°。在其他示例中,第一左上前肢角度可以根据第一左上前肢像素点和第一躯干像素点确定。在此情况下,第一左上前肢角度例如可以约为-135°。In the example shown in FIG. 7, the first action sub-file A may reflect that the first action A of the first user may include raising the left upper limb. The relative positional relationship between the first left upper hindlimb pixel point and the first torso pixel point, and the relative positional relationship between the first left upper forelimb pixel point and the first left upper hindlimb pixel point, may reflect that, in the first action A, the lifting angle of the first user's left upper hindlimb is the first left upper hindlimb angle, and the lifting angle of the first user's left upper forelimb is the first left upper forelimb angle. As shown in FIG. 7, the first left upper hindlimb angle may be, for example, slightly less than -90°, and the first left upper forelimb angle may be, for example, about -45°. In other examples, the first left upper forelimb angle may be determined according to the first left upper forelimb pixel point and the first torso pixel point. In this case, the first left upper forelimb angle may be, for example, about -135°.
在图7所示的示例中,第一动作子文件A可以反映,第一用户的第一动作A可以包括举起右下肢。第一右下后肢像素点与第一躯干像素点之间的相对位置关系,以及第一右下前肢像素点与第一右下后肢像素点之间的相对位置关系,可以反映,在第一动作A中,第一用户的右下后肢的举起角度为第一右下后肢角度,第一用户的右下前肢的举起角度为第一右下前肢角度。如图7所示,第一右下后肢角度例如可以约为60°,第一右下前肢角度例如可以约为0°。在其他示例中,第一右下前肢角度可以根据第一右下前肢像素点和第一躯干像素点确定。在此情况下,第一右下前肢角度例如可以约为60°。In the example shown in FIG. 7, the first action sub-file A may reflect that the first action A of the first user may include raising the right lower limb. The relative positional relationship between the first right lower hindlimb pixel point and the first torso pixel point, and the relative positional relationship between the first right lower forelimb pixel point and the first right lower hindlimb pixel point, may reflect that, in the first action A, the lifting angle of the first user's right lower hindlimb is the first right lower hindlimb angle, and the lifting angle of the first user's right lower forelimb is the first right lower forelimb angle. As shown in FIG. 7, the first right lower hindlimb angle may be, for example, about 60°, and the first right lower forelimb angle may be, for example, about 0°. In other examples, the first right lower forelimb angle may be determined according to the first right lower forelimb pixel point and the first torso pixel point. In this case, the first right lower forelimb angle may be, for example, about 60°.
在图7所示的示例中,第一动作子文件A可以反映,第一用户的第一动作A可以包括不举起左下肢。第一左下后肢像素点与第一躯干像素点之间的相对位置关系,以及第一左下前肢像素点与第一左下后肢像素点之间的相对位置关系,可以反映,在第一动作A中,第一用户的左下后肢的举起角度为第一左下后肢角度,第一用户的左下前肢的举起角度为第一左下前肢角度。如图7所示,第一左下后肢角度例如可以约为-5°,第一左下前肢角度例如可以约为5°。在其他示例中,第一左下前肢角度可以根据第一左下前肢像素点和第一躯干像素点确定。在此情况下,第一左下前肢角度例如可以约为0°。In the example shown in FIG. 7, the first action sub-file A may reflect that the first action A of the first user may include not raising the left lower limb. The relative positional relationship between the first left lower hindlimb pixel point and the first torso pixel point, and the relative positional relationship between the first left lower forelimb pixel point and the first left lower hindlimb pixel point, may reflect that, in the first action A, the lifting angle of the first user's left lower hindlimb is the first left lower hindlimb angle, and the lifting angle of the first user's left lower forelimb is the first left lower forelimb angle. As shown in FIG. 7, the first left lower hindlimb angle may be, for example, about -5°, and the first left lower forelimb angle may be, for example, about 5°. In other examples, the first left lower forelimb angle may be determined according to the first left lower forelimb pixel point and the first torso pixel point. In this case, the first left lower forelimb angle may be, for example, about 0°.
在图7所示的示例中,第一动作子文件A可以反映,第一用户的第一动作A可以包括歪扭颈部。第一颈部像素点与第一躯干像素点之间的相对位置关系可以反映,在第一动作A中,颈部歪扭的角度为第一颈部角度。如图7所示,第一颈部角度例如可以约为5°。In the example shown in FIG. 7, the first action sub-file A may reflect that the first action A of the first user may include twisting the neck. The relative positional relationship between the first neck pixel point and the first torso pixel point may reflect that, in the first action A, the twist angle of the neck is the first neck angle. As shown in FIG. 7, the first neck angle may be, for example, about 5°.
在图7所示的示例中,第一动作子文件A可以反映,第一用户的第一动作A可以包括歪扭头部。第一头部像素点与第一颈部像素点之间的相对位置关系可以反映,在第一动作A中,头部歪扭的角度为第一头部角度。如图7所示,第一头部角度例如可以约为15°。在其他示例中,第一头部角度可以根据第一头部像素点和第一躯干像素点确定。在此情况下,第一头部角度例如可以约为20°。In the example shown in FIG. 7, the first action sub-file A may reflect that the first action A of the first user may include twisting the head. The relative positional relationship between the first head pixel point and the first neck pixel point may reflect that, in the first action A, the twist angle of the head is the first head angle. As shown in FIG. 7, the first head angle may be, for example, about 15°. In other examples, the first head angle may be determined according to the first head pixel point and the first torso pixel point. In this case, the first head angle may be, for example, about 20°.
在图7所示的示例中,第一动作子文件A可以反映,第一用户的第一动作A可以包括不倾斜躯干。第一躯干像素点与中垂线(中垂线可以相对于地平线垂直)之间的相对位置关系可以反映,在第一动作A中躯干倾斜的角度为第一躯干角度。如图7所示,第一躯干角度例如可以约为0°。In the example shown in FIG. 7 , the first action sub-file A may reflect that the first action A of the first user may include not leaning the torso. The relative positional relationship between the first torso pixel point and the mid-vertical line (the mid-vertical line may be perpendicular to the horizon) may reflect that the angle at which the torso is inclined in the first action A is the first torso angle. As shown in FIG. 7 , the first torso angle may be about 0°, for example.
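The passages above give each forelimb angle in two conventions: relative to the adjacent hindlimb segment, or relative to the torso. The two differ exactly by the hindlimb's own angle, which is easy to check against the example values. A trivial illustrative sketch (the function name is an assumption, not a term from this application):

```python
def forelimb_angle_vs_torso(hindlimb_vs_torso, forelimb_vs_hindlimb):
    # The forelimb angle relative to the torso equals the hindlimb angle
    # relative to the torso plus the forelimb angle relative to the hindlimb.
    return hindlimb_vs_torso + forelimb_vs_hindlimb

# Example values from the description of FIG. 7 (taking the left upper
# hindlimb angle as -90°):
right_upper = forelimb_angle_vs_torso(85, -10)   # right upper limb -> 75
left_upper = forelimb_angle_vs_torso(-90, -45)   # left upper limb -> -135
right_lower = forelimb_angle_vs_torso(60, 0)     # right lower limb -> 60
```

The same additive relationship holds for the head angle: a head angle of 15° relative to the neck plus a neck angle of 5° relative to the torso gives the 20° head angle measured against the torso.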
可选的,第一动作子文件A可以反映,第一左手角度和/或第一右手角度。Optionally, the first action sub-file A may reflect the first left-hand angle and/or the first right-hand angle.
例如,第一左手角度可以反映第一左手与第一左上前肢之间的角度。第一左手角度例如可以根据第一左手像素点与第一左上前肢像素点得到。For example, the first left hand angle may reflect the angle between the first left hand and the first upper left forelimb. The first left-hand angle can be obtained, for example, according to the first left-hand pixel point and the first left upper forelimb pixel point.
又如,第一左手角度可以反映第一左手与第一躯干之间的角度。第一左手角度例如可以根据第一左手像素点与第一躯干像素点得到。For another example, the first left-hand angle may reflect the angle between the first left hand and the first torso. The first left-hand angle may be obtained, for example, from the first left-hand pixel point and the first torso pixel point.
例如,第一右手角度可以反映第一右手与第一右上前肢之间的角度。第一右手角度例如可以根据第一右手像素点与第一右上前肢像素点得到。For example, the first right hand angle may reflect the angle between the first right hand and the first right upper forelimb. The first right hand angle can be obtained, for example, according to the first right hand pixel point and the first right upper forelimb pixel point.
又如,第一右手角度可以反映第一右手与第一躯干之间的角度。第一右手角度例如可以根据第一右手像素点与第一躯干像素点得到。For another example, the first right-hand angle may reflect the angle between the first right hand and the first torso. The first right-hand angle may be obtained, for example, from the first right-hand pixel point and the first torso pixel point.
应理解,本申请实施例通过图7所示的示例,阐述第一用户的一种可能的动作。本申请实施例无意于限定第一动作的具体内容。It should be understood that, in this embodiment of the present application, a possible action of the first user is described by using the example shown in FIG. 7 . The embodiments of the present application are not intended to limit the specific content of the first action.
第一电子设备可以根据第二视频,提取第二动作文件。根据前文可知,第二视频可以包括或提取得到第二人物子视频。第二人物子视频可以包括多个第二子帧。第二动作文件可以包括与多个第二子帧一一对应的第二动作子文件。每个第二子帧可以包含目标人物的一个动作。如图7所示,720示出了第二人物子视频的一个第二子帧a。目标人物在该第二子帧a中做出的动作可以为第二动作a。721示出了与该第二动作a对应的第二动作子文件a。The first electronic device may extract the second action file according to the second video. It can be known from the foregoing that the second video may include or be extracted to obtain a sub-video of the second person. The second person sub-video may include a plurality of second sub-frames. The second action file may include second action subfiles corresponding to the plurality of second subframes one-to-one. Each second subframe may contain an action of the target character. As shown in FIG. 7, 720 shows a second subframe a of the second person sub-video. The action made by the target person in the second subframe a may be the second action a. 721 shows the second action sub-file a corresponding to the second action a.
第二电子设备例如可以根据以下至少两项之间的位置关系或坐标,确定第二动作子文件a:第二头部像素点、第二颈部像素点、第二躯干像素点、第二左上前肢像素点、第二左上后肢像素点、第二左下前肢像素点、第二左下后肢像素点、第二右上前肢像素点、第二右上后肢像素点、第二右下前肢像素点、第二右下后肢像素点、第二左手像素点、第二右手像素点。The second electronic device may, for example, determine the second action sub-file a according to the positional relationship or coordinates between at least two of the following: the second head pixel point, the second neck pixel point, the second torso pixel point, the second left upper forelimb pixel point, the second left upper hindlimb pixel point, the second left lower forelimb pixel point, the second left lower hindlimb pixel point, the second right upper forelimb pixel point, the second right upper hindlimb pixel point, the second right lower forelimb pixel point, the second right lower hindlimb pixel point, the second left-hand pixel point, and the second right-hand pixel point.
其中,第二头部像素点可以为与目标人物的头部对应的像素点。第二颈部像素点可以为与目标人物的颈部对应的像素点。第二躯干像素点可以为与目标人物的躯干对应的像素点。第二左上前肢像素点可以为与目标人物的左上前肢对应的像素点。第二左上后肢像素点可以为与目标人物的左上后肢对应的像素点。第二左下前肢像素点可以为与目标人物的左下前肢对应的像素点。第二左下后肢像素点可以为与目标人物的左下后肢对应的像素点。第二右上前肢像素点可以为与目标人物的右上前肢对应的像素点。第二右上后肢像素点可以为与目标人物的右上后肢对应的像素点。第二右下前肢像素点可以为与目标人物的右下前肢对应的像素点。第二右下后肢像素点可以为与目标人物的右下后肢对应的像素点。第二左手像素点可以为与目标人物的左手对应的像素点。第二右手像素点可以为与目标人物的右手对应的像素点。第二动作子文件可以为反映或指示或描述或对应第二动作的数据。如上所述,不同类型的像素点之间的位置关系、数量关系等,可以反映目标人物的动作方向、动作角度、动作幅度等。The second head pixel point may be a pixel point corresponding to the head of the target person. The second neck pixel point may be a pixel point corresponding to the neck of the target person. The second torso pixel point may be a pixel point corresponding to the torso of the target person. The second left upper forelimb pixel point may be a pixel point corresponding to the left upper forelimb of the target person. The second left upper hindlimb pixel point may be a pixel point corresponding to the left upper hindlimb of the target person. The second left lower forelimb pixel point may be a pixel point corresponding to the left lower forelimb of the target person. The second left lower hindlimb pixel point may be a pixel point corresponding to the left lower hindlimb of the target person. The second right upper forelimb pixel point may be a pixel point corresponding to the right upper forelimb of the target person. The second right upper hindlimb pixel point may be a pixel point corresponding to the right upper hindlimb of the target person. The second right lower forelimb pixel point may be a pixel point corresponding to the right lower forelimb of the target person. The second right lower hindlimb pixel point may be a pixel point corresponding to the right lower hindlimb of the target person. The second left-hand pixel point may be a pixel point corresponding to the left hand of the target person. The second right-hand pixel point may be a pixel point corresponding to the right hand of the target person. The second action sub-file may be data reflecting, indicating, describing, or corresponding to the second action. As described above, the positional relationship, quantity relationship, and the like between different types of pixel points may reflect the action direction, action angle, action range, and the like of the target person.
在图7所示的示例中,第二动作子文件a可以反映,目标人物的第二动作a可以包括举起右上肢。第二右上后肢像素点与第二躯干像素点之间的相对位置关系,以及第二右上前肢像素点与第二右上后肢像素点之间的相对位置关系,可以反映,在第二动作a中,目标人物的右上后肢的举起角度为第二右上后肢角度,目标人物的右上前肢的举起角度为第二右上前肢角度。如图7所示,第二右上后肢角度例如可以约为60°,第二右上前肢角度例如可以约为30°。在其他示例中,第二右上前肢角度可以根据第二右上前肢像素点和第二躯干像素点确定。在此情况下,第二右上前肢角度例如可以约为90°。In the example shown in FIG. 7, the second action sub-file a may reflect that the second action a of the target person may include raising the right upper limb. The relative positional relationship between the second right upper hindlimb pixel point and the second torso pixel point, and the relative positional relationship between the second right upper forelimb pixel point and the second right upper hindlimb pixel point, may reflect that, in the second action a, the lifting angle of the target person's right upper hindlimb is the second right upper hindlimb angle, and the lifting angle of the target person's right upper forelimb is the second right upper forelimb angle. As shown in FIG. 7, the second right upper hindlimb angle may be, for example, about 60°, and the second right upper forelimb angle may be, for example, about 30°. In other examples, the second right upper forelimb angle may be determined according to the second right upper forelimb pixel point and the second torso pixel point. In this case, the second right upper forelimb angle may be, for example, about 90°.
在图7所示的示例中,第二动作子文件a可以反映,目标人物的第二动作a可以包括举起左上肢。第二左上后肢像素点与第二躯干像素点之间的相对位置关系,以及第二左上前肢像素点与第二左上后肢像素点之间的相对位置关系,可以反映,在第二动作a中,目标人物的左上后肢的举起角度为第二左上后肢角度,目标人物的左上前肢的举起角度为第二左上前肢角度。如图7所示,第二左上后肢角度例如可以略小于-135°,第二左上前肢角度例如可以约为-15°。在其他示例中,第二左上前肢角度可以根据第二左上前肢像素点和第二躯干像素点确定。在此情况下,第二左上前肢角度例如可以约为-150°。In the example shown in FIG. 7, the second action sub-file a may reflect that the second action a of the target person may include raising the left upper limb. The relative positional relationship between the second left upper hindlimb pixel point and the second torso pixel point, and the relative positional relationship between the second left upper forelimb pixel point and the second left upper hindlimb pixel point, may reflect that, in the second action a, the lifting angle of the target person's left upper hindlimb is the second left upper hindlimb angle, and the lifting angle of the target person's left upper forelimb is the second left upper forelimb angle. As shown in FIG. 7, the second left upper hindlimb angle may be, for example, slightly less than -135°, and the second left upper forelimb angle may be, for example, about -15°. In other examples, the second left upper forelimb angle may be determined according to the second left upper forelimb pixel point and the second torso pixel point. In this case, the second left upper forelimb angle may be, for example, about -150°.
在图7所示的示例中,第二动作子文件a可以反映,目标人物的第二动作a可以包括举起右下肢。第二右下后肢像素点与第二躯干像素点之间的相对位置关系,以及第二右下前肢像素点与第二右下后肢像素点之间的相对位置关系,可以反映,在第二动作a中,目标人物的右下后肢的举起角度为第二右下后肢角度,目标人物的右下前肢的举起角度为第二右下前肢角度。如图7所示,第二右下后肢角度例如可以约为60°,第二右下前肢角度例如可以约为0°。在其他示例中,第二右下前肢角度可以根据第二右下前肢像素点和第二躯干像素点确定。在此情况下,第二右下前肢角度例如可以约为60°。In the example shown in FIG. 7, the second action sub-file a may reflect that the second action a of the target person may include raising the right lower limb. The relative positional relationship between the second right lower hindlimb pixel point and the second torso pixel point, and the relative positional relationship between the second right lower forelimb pixel point and the second right lower hindlimb pixel point, may reflect that, in the second action a, the lifting angle of the target person's right lower hindlimb is the second right lower hindlimb angle, and the lifting angle of the target person's right lower forelimb is the second right lower forelimb angle. As shown in FIG. 7, the second right lower hindlimb angle may be, for example, about 60°, and the second right lower forelimb angle may be, for example, about 0°. In other examples, the second right lower forelimb angle may be determined according to the second right lower forelimb pixel point and the second torso pixel point. In this case, the second right lower forelimb angle may be, for example, about 60°.
在图7所示的示例中,第二动作子文件a可以反映,目标人物的第二动作a可以包括不举起左下肢。第二左下后肢像素点与第二躯干像素点之间的相对位置关系,以及第二左下前肢像素点与第二左下后肢像素点之间的相对位置关系,可以反映,在第二动作a中,目标人物的左下后肢的举起角度为第二左下后肢角度,目标人物的左下前肢的举起角度为第二左下前肢角度。如图7所示,第二左下后肢角度例如可以约为0°,第二左下前肢角度例如可以约为0°。在其他示例中,第二左下前肢角度可以根据第二左下前肢像素点和第二躯干像素点确定。在此情况下,第二左下前肢角度例如可以约为0°。In the example shown in FIG. 7, the second action sub-file a may reflect that the second action a of the target person may include not raising the left lower limb. The relative positional relationship between the second left lower hindlimb pixel point and the second torso pixel point, and the relative positional relationship between the second left lower forelimb pixel point and the second left lower hindlimb pixel point, may reflect that, in the second action a, the lifting angle of the target person's left lower hindlimb is the second left lower hindlimb angle, and the lifting angle of the target person's left lower forelimb is the second left lower forelimb angle. As shown in FIG. 7, the second left lower hindlimb angle may be, for example, about 0°, and the second left lower forelimb angle may be, for example, about 0°. In other examples, the second left lower forelimb angle may be determined according to the second left lower forelimb pixel point and the second torso pixel point. In this case, the second left lower forelimb angle may be, for example, about 0°.
In the example shown in FIG. 7 , the second action sub-file a may reflect that the second action a of the target person includes tilting the neck. The relative position of the second neck pixels with respect to the second torso pixels may indicate that, in the second action a, the neck is tilted by the second neck angle. As shown in FIG. 7 , the second neck angle may be approximately 30°, for example.
In the example shown in FIG. 7 , the second action sub-file a may reflect that the second action a of the target person includes tilting the head. The relative position of the second head pixels with respect to the second neck pixels may indicate that, in the second action a, the head is tilted by the second head angle. As shown in FIG. 7 , the second head angle may be approximately 0°, for example. In other examples, the second head angle may instead be determined from the second head pixels and the second torso pixels; in that case, the second head angle may be approximately 30°, for example.
In the example shown in FIG. 7 , the second action sub-file a may reflect that the second action a of the target person includes not leaning the torso. The relative position of the second torso pixels with respect to the vertical mid-line (a line perpendicular to the horizon) may indicate that, in the second action a, the torso leans by the second torso angle. As shown in FIG. 7 , the second torso angle may be approximately -5°, for example.
Optionally, the second action sub-file a may reflect a second left-hand angle and/or a second right-hand angle.
For example, the second left-hand angle may reflect the angle between the second left hand and the second left upper forelimb, and may be obtained from the second left-hand pixels and the second left upper forelimb pixels.
As another example, the second left-hand angle may reflect the angle between the second left hand and the second torso, and may be obtained from the second left-hand pixels and the second torso pixels.
For example, the second right-hand angle may reflect the angle between the second right hand and the second right upper forelimb, and may be obtained from the second right-hand pixels and the second right upper forelimb pixels.
As another example, the second right-hand angle may reflect the angle between the second right hand and the second torso, and may be obtained from the second right-hand pixels and the second torso pixels.
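The limb and hand angles above are all derived from the relative positions of keypoint pixels. The embodiments do not prescribe a formula; as a minimal illustrative sketch (function and variable names are assumptions), one could compute the signed angle of the segment joining two keypoints against a downward reference direction:

```python
import math

def limb_angle(joint_a, joint_b, reference=(0.0, 1.0)):
    """Signed angle (degrees) between the segment joint_a -> joint_b and a
    reference direction. Default reference is straight down in image
    coordinates (y grows downward), so a hanging limb measures 0 degrees."""
    vx, vy = joint_b[0] - joint_a[0], joint_b[1] - joint_a[1]
    rx, ry = reference
    # atan2 of the 2D cross product and dot product yields the signed angle
    return math.degrees(math.atan2(vx * ry - vy * rx, vx * rx + vy * ry))

# e.g. a hand angle from the elbow->wrist segment (hypothetical pixel coords)
elbow, wrist = (100.0, 200.0), (100.0, 260.0)
print(round(limb_angle(elbow, wrist), 1))   # hanging straight down: 0.0
```

A limb raised to the horizontal would measure 90° under this convention, matching the kind of per-joint angles (60°, 30°, -5°, ...) listed in the embodiments.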
As the example of FIG. 7 shows, the first user's action and the target person's action are fairly similar, but differences remain: the first right upper forelimb angle may be approximately -10°, while the second right upper forelimb angle may be approximately 30°. The first right upper hind limb angle may be approximately 85°, while the second may be approximately 60°. The first left upper forelimb angle may be approximately -45°, while the second may be approximately -15°. The first left upper hind limb angle may be slightly less than -90°, while the second may be approximately -135°. The first right lower forelimb angle may be approximately 0°, while the second may be approximately 0°. The first right lower hind limb angle may be approximately 60°, while the second may be approximately 60°. The first left lower forelimb angle may be approximately 5°, while the second may be approximately 0°. The first left lower hind limb angle may be approximately -5°, while the second may be approximately 0°. The first neck angle may be approximately 5°, while the second may be approximately 30°. The first head angle may be approximately 15°, while the second may be approximately 30°. The first torso angle may be approximately 0°, while the second may be approximately -5°.
Comparing the first action A with the second action a, it can at least be concluded that: the first right upper forelimb angle may differ from the second right upper forelimb angle; the first right upper hind limb angle may differ from the second right upper hind limb angle; the first left upper forelimb angle may differ from the second left upper forelimb angle; the first left upper hind limb angle may differ from the second left upper hind limb angle; the first left lower forelimb angle may differ from the second left lower forelimb angle; the first left lower hind limb angle may differ from the second left lower hind limb angle; the first neck angle may differ from the second neck angle; the first head angle may differ from the second head angle; and the first torso angle may differ from the second torso angle.
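If an action sub-file is represented, purely for illustration, as a mapping from joint names to angles (the embodiments store pixel relationships rather than a dictionary), the comparison above amounts to collecting the joints whose angles differ:

```python
def diff_actions(first, second, tolerance=1.0):
    """Return the joints whose angles differ between two action sub-files
    by more than `tolerance` degrees."""
    return {joint: (first[joint], second[joint])
            for joint in first
            if joint in second and abs(first[joint] - second[joint]) > tolerance}

# angles from the FIG. 7 example (joint names are hypothetical)
action_A = {"right_upper_forelimb": -10.0, "right_upper_hindlimb": 85.0,
            "right_lower_hindlimb": 60.0, "neck": 5.0, "head": 15.0, "torso": 0.0}
action_a = {"right_upper_forelimb": 30.0, "right_upper_hindlimb": 60.0,
            "right_lower_hindlimb": 60.0, "neck": 30.0, "head": 30.0, "torso": -5.0}
print(diff_actions(action_A, action_a))
```

The right lower hind limb (60° in both actions) is absent from the result, consistent with the comparison above, which lists only the joints that differ.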
The first electronic device may adjust the pixels in the first subframe according to the second action sub-file a, and the first action sub-file A may be adjusted accordingly, so that the action sub-file corresponding to the processed first video is as similar to, or corresponds as closely as possible to, the second action sub-file a.
For example, in the example shown in FIG. 7 , the first electronic device may adjust the pixels in the first subframe so that: the first right upper forelimb angle is equal or approximately equal to the second right upper forelimb angle; the first right upper hind limb angle is equal or approximately equal to the second right upper hind limb angle; the first left upper forelimb angle is equal or approximately equal to the second left upper forelimb angle; the first left upper hind limb angle is equal or approximately equal to the second left upper hind limb angle; the first left lower forelimb angle is equal or approximately equal to the second left lower forelimb angle; the first left lower hind limb angle is equal or approximately equal to the second left lower hind limb angle; the first neck angle is equal or approximately equal to the second neck angle; the first head angle is equal or approximately equal to the second head angle; and the first torso angle is equal or approximately equal to the second torso angle. As a result, the action in the processed first subframe may be as similar as possible to, or correspond as closely as possible to, the second action a in the second video.
The first electronic device may output the processed first subframe, as shown at 730 in FIG. 7 . As can be seen, the adjusted action of the first user is more similar to the action of the target person. In other words, the method shown in FIG. 7 can adjust the action angle, action direction, action amplitude, and so on of the user in the first video, which helps improve the action matching degree between the first user and the target person.
The first electronic device may compare the size of the first user sub-video with the playback size of the second user sub-video according to the first action sub-file A and the second action sub-file a. For example, one or more line segments may be fitted from at least one of the head pixels, neck pixels, torso pixels, left upper limb pixels, left lower limb pixels, right upper limb pixels, right lower limb pixels, left-hand pixels, and right-hand pixels. The first electronic device may determine the size of a user sub-video from the lengths of the one or more fitted segments.
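One way such a size comparison could be realized, assuming a hypothetical keypoint representation (the segment list and names below are illustrative, not part of the embodiments), is to sum the lengths of the fitted skeleton segments and use the ratio of the two totals as a scale factor:

```python
import math

# ordered keypoint pairs forming the fitted skeleton segments (hypothetical)
SEGMENTS = [("head", "neck"), ("neck", "torso"),
            ("torso", "left_lower_limb"), ("torso", "right_lower_limb")]

def subject_size(keypoints):
    """Approximate on-screen size of a person as the total skeleton length."""
    total = 0.0
    for a, b in SEGMENTS:
        (ax, ay), (bx, by) = keypoints[a], keypoints[b]
        total += math.hypot(bx - ax, by - ay)
    return total

def scale_factor(first_kp, second_kp):
    """Factor by which the first sub-video would be scaled to match the second."""
    return subject_size(second_kp) / subject_size(first_kp)

small = {"head": (0.0, 0.0), "neck": (0.0, 10.0), "torso": (0.0, 30.0),
         "left_lower_limb": (-5.0, 60.0), "right_lower_limb": (5.0, 60.0)}
large = {k: (2 * x, 2 * y) for k, (x, y) in small.items()}
print(scale_factor(small, large))   # the smaller subject must be scaled up 2x
```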
In the example shown in FIG. 7 , the size of the first user sub-video may be relatively small and the size of the second user sub-video relatively large. The first electronic device may adjust the pixels in the first user sub-video so that the size of the first user sub-video matches the size of the second user sub-video. In other words, the method shown in FIG. 7 can adjust the proportion of the frame occupied by the first user sub-video, which helps improve the size matching degree between the first user sub-video and the second user sub-video.
The multiple first subframes of the first person sub-video may further include a first subframe B, as shown at 810 in FIG. 8 . The action performed by the first user in the first subframe B may be a first action B. The first subframe A and the first subframe B are two different subframes of the first person sub-video. Using the method shown in FIG. 7 , the first electronic device may determine a first action sub-file B from the first subframe B, as shown at 811 in FIG. 8 .
The multiple second subframes of the second person sub-video may further include a second subframe b, as shown at 820 in FIG. 8 . The action performed by the second user in the second subframe b may be a second action b. The second subframe a and the second subframe b are two different subframes of the second person sub-video. Using the method shown in FIG. 7 , the first electronic device may obtain a second action sub-file b from the second subframe b, or from a cloud server, as shown at 821 in FIG. 8 .
The first subframe A (shown at 710 in FIG. 8 ) may correspond to the second subframe a (shown at 720 in FIG. 8 ), and the first subframe B may correspond to the second subframe b. That is, the first action sub-file A (shown at 711 in FIG. 8 ) may have a relatively high similarity to the second action sub-file a (shown at 721 in FIG. 8 ), and the first action sub-file B may have a relatively high similarity to the second action sub-file b.
In one example, the time difference between the first subframe A and the first subframe B may be T, and the time difference between the second subframe a and the second subframe b may be t. That is, compared with the second video, the first user in the first video may perform the first action A and transition to the first action B relatively quickly or relatively slowly. As shown in FIG. 8 , T may be greater than t, i.e., the first user's action may be relatively slow.
The first electronic device may adjust the subframes between the first subframe A and the first subframe B according to the first subframe A, the first subframe B, the second subframe a, and the second subframe b, so that the first subframe B moves closer to or farther from the first subframe A, thereby adjusting the time difference between the first subframe A and the first subframe B. For example, in the first video, the subframe at a distance t from the first subframe A is a first subframe C (shown by the dashed rectangle in FIG. 8 ). The first electronic device may adjust the subframes between the first subframe A and the first subframe B so that the first subframe B is moved to the position that the first subframe C occupied before the adjustment.
In the example shown in FIG. 8 , the target person performs the second action a and transitions to the second action b in a relatively short time; that is, the target person's action is relatively fast. The first electronic device may therefore shorten the time difference between the first subframe A and the first subframe B, speeding up the first user's action, as shown by the dashed arrow in FIG. 8 .
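The retiming described above can be sketched as resampling the subframes between subframe A and subframe B so that a span originally lasting T plays back in time t. This is a simplification for illustration only; a real implementation might also interpolate between frames rather than merely dropping them:

```python
def retime(frames, T, t):
    """Resample the subframes between subframe A and subframe B so that a
    span of duration T plays back in duration t (t < T drops frames)."""
    if t >= T:                      # target pace is slower or equal: keep all
        return list(frames)
    n_out = max(1, round(len(frames) * t / T))
    step = len(frames) / n_out
    return [frames[min(len(frames) - 1, int(i * step))] for i in range(n_out)]

clip = list(range(12))              # 12 subframes between A and B
print(retime(clip, T=4.0, t=2.0))   # halving the duration keeps every other frame
```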
With the method shown in FIG. 8 provided by this embodiment of the present application, the first electronic device can process the first video so as to improve the speed similarity between the first user's action and the target person's action. The processed first video may be the first target video.
In other examples, in response to the user's operation on the recording control 410 shown in FIG. 4 , the electronic device may also process the first video to obtain the first target video, as shown in the user interface 900 in FIG. 9 . The first target video may be obtained by processing the first video and the second video using the methods shown in FIG. 7 and FIG. 8 . The first target video may include, for example, the first image area 560 shown in FIG. 5 and FIG. 6 , i.e., the pixels corresponding to the first interface area 460 shown in FIG. 4 . In other words, the first target video shown in FIG. 9 may not include the second image area 570 shown in FIG. 5 and FIG. 6 , i.e., it may not include the pixels corresponding to the second interface area 470 shown in FIG. 4 .
As shown in FIG. 5 and FIG. 6 , the video processing method provided by this embodiment of the present application can be applied to optimizing action coordination in a co-shooting scenario. As shown in FIG. 8 , the method can also be applied to scenarios other than co-shooting, for example optimizing the actions in a single video. With the video processing method provided by this embodiment, the first user's action amplitude, on-screen size, action speed, and so on in the first video can be adjusted, which reduces the amount of post-processing the first user must perform on the first video.
In response to a user operation on a gallery application, the electronic device may retrieve the first target video shown in FIG. 5 , FIG. 6 , or FIG. 9 , so that the user can watch the first target video. In response to a user operation on the gallery application, the electronic device may also make post-adjustments to the first target video, for example adjusting the playback speed of the first image area, the playback speed of the second image area, the size of the first image area, or the size of the second image area, or beautifying the first image area or the second image area.
In one possible example, in response to the first user operating the action optimization control 550 shown in FIG. 5 , FIG. 6 , and FIG. 9 , the first electronic device may adjust the similarity between the first action sub-file and the second action sub-file, so that the first user can flexibly specify how strongly the action is adjusted. For example, if the first user does not want the first video to be exactly the same as the second video, the first user can use the action optimization control 550 to reduce the degree to which the first video or the first target video is processed or optimized; if the first user wants the first video to match the second video as closely as possible, the first user can use the action optimization control 550 to increase that degree. The degree to which the first video is processed or optimized may default to 0.6 to 0.7, for example. This helps improve the action matching degree between the first user and the target person while still retaining the first user's own characteristics.
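The adjustable optimization degree can be modelled, for illustration, as a blend weight between the user's own joint angle and the target person's joint angle. The embodiments do not specify a formula; in this sketch the default degree of 0.6-0.7 maps to the hypothetical `strength` parameter:

```python
def optimized_angle(user_angle, target_angle, strength=0.65):
    """Blend the user's joint angle toward the target person's angle.
    strength=0 keeps the user's action; strength=1 copies the target."""
    return (1.0 - strength) * user_angle + strength * target_angle

# at the default strength the result lies between the two angles,
# preserving some of the user's own style
print(optimized_angle(-10.0, 30.0))
print(optimized_angle(-10.0, 30.0, 1.0))   # full optimization copies the target
```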
In some scenarios, multiple users may imitate the same footage or the same gallery video. The electronic device can synthesize the videos of the multiple users into a co-shot video, improving the coordination of a co-shot video that contains multiple users.
Referring to the examples shown in FIG. 3 to FIG. 9 , a third user may imitate the target person in the second video to perform a series of actions and shoot a third video with a third electronic device. The target person may be the same person that the first user imitated above. As shown in FIG. 10 , the third electronic device may display a user interface 1000, which may include a third interface area 1060 and a fourth interface area 1070.
As shown in FIG. 10 , the third electronic device may display a third person image 1061 and a third background image 1062 in the third interface area 1060, where the third person image 1061 may include pixels corresponding to the third user and the third background image 1062 may include pixels corresponding to the scene the third user is in. The third interface area 1060 may be used to preview the third user's shot. The third electronic device may display the second person image 471 in the fourth interface area 1070, and may thus play the frames of the second person sub-video. The fourth interface area 1070 may be used to prompt the target person's action.
As shown in FIG. 10 , the user interface may include a recording control 1010. In response to the third user's operation on the recording control 1010, the third electronic device may shoot a third video. The third video may include, or may be processed to extract, a third user sub-video and a third background sub-video.
In one possible example, using the video processing methods shown in FIG. 7 and FIG. 8 , the third electronic device may determine multiple third action sub-files from the third user sub-video, in one-to-one correspondence with multiple third subframes of the third user sub-video. A third action sub-file may be used to reflect the third user's action in the corresponding third subframe. Optionally, in the example shown in FIG. 10 , the mirrored action of the third user may match the target person's action more closely, and the third electronic device may determine the multiple third action sub-files from the mirrored video of the third video.
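The mirroring mentioned above can also be applied to an action file directly, as sketched below: each keypoint's x-coordinate is reflected about the frame width and left/right joint labels are swapped. The keypoint representation and names here are hypothetical, chosen only to illustrate the idea:

```python
def mirror_action(keypoints, frame_width):
    """Mirror an action file horizontally: reflect x and swap left/right."""
    mirrored = {}
    for name, (x, y) in keypoints.items():
        if name.startswith("left_"):
            name = "right_" + name[len("left_"):]
        elif name.startswith("right_"):
            name = "left_" + name[len("right_"):]
        mirrored[name] = (frame_width - 1 - x, y)
    return mirrored

pose = {"left_hand": (100, 300), "right_hand": (500, 310), "head": (320, 50)}
print(mirror_action(pose, frame_width=640))
```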
The third electronic device may adjust the pixels of the third video according to a third action sub-file and the second action sub-file corresponding to that third action sub-file, thereby processing the third video, which helps improve the action similarity between the third user and the target person. As shown in FIG. 10 , the captured action of the third user is similar to, but slightly different from, the target person's action; as shown in FIG. 11 , after the third video is processed, the third user's action and the target person's action may have a higher similarity.
In one example, the third electronic device may synthesize the second video and the third video into a second target video, where the second target video and/or the third video may be processed videos. In the example shown in FIG. 11 , the second target video may be a video that has undergone split-screen processing and background removal. The second target video may include a third image area 1160 and a fourth image area 1170. The third image area 1160 may include pixels corresponding to the third user sub-video and pixels corresponding to the second background sub-video. The fourth image area 1170 may include pixels corresponding to the second user sub-video and pixels corresponding to the second background sub-video. That is, the background image of the second video may serve as the background of the third user sub-video.
In other examples, the second target video may not be a split-screen video, or may be a video that has undergone split-screen processing but not background removal.
Because both the first user and the third user imitate the target person's action, synthesizing the processed first video and third video into a third target video helps produce a co-shot video with relatively high coordination. As shown in FIG. 12 , the first electronic device or the third electronic device may synthesize the first video and the third video into a third target video, where both the first video and the third video may be processed videos.
In one possible example, the third target video shown in FIG. 12 may be a video that has undergone split-screen processing but not background removal. The third target video may include a fifth image area 1260 and a sixth image area 1270. The fifth image area 1260 may include pixels corresponding to the first user sub-video shown in FIG. 5 and pixels corresponding to the first background sub-video. The sixth image area 1270 may include pixels corresponding to the third user sub-video shown in FIG. 10 and pixels corresponding to the third background sub-video. In other examples, the third target video may be a video that has undergone both split-screen processing and background removal.
In another example, the third target video shown in FIG. 13 may be a video that has not undergone split-screen processing but has undergone background removal. The third target video may include a fifth image area 1260, a sixth image area 1270, and a second background image area 1380. The fifth image area 1260 may include pixels corresponding to the first user sub-video and may exclude pixels corresponding to the first background sub-video. The sixth image area 1270 may include pixels corresponding to the third user sub-video and may exclude pixels corresponding to the third background sub-video. The second background image area 1380 may include pixels corresponding to a target gallery image. In other examples, the second background image area 1380 may include pixels corresponding to the first background image shown in FIG. 4 or the third background image shown in FIG. 10 .
Combining the examples described above, in other possible applications the first user may imitate a first target person in the second video to perform a series of actions and shoot the first video with the first electronic device, while the third user may imitate a second target person in the second video to perform a series of actions and shoot the third video with the third electronic device. The first target person and the second target person may be two different people in the second video. Referring to the examples described above, the first electronic device, the third electronic device, or another electronic device may process the first video and the third video to obtain a target video containing both the first user and the third user.
In one example, the electronic device may obtain a first action file, a second action file, a third action file, and a fourth action file. The first action file may be obtained by extracting the first user's action information from the first video. The second action file may be obtained by extracting the first target person's action information from the second video. The third action file may be obtained by extracting the third user's action information from the third video. The fourth action file may be obtained by extracting the second target person's action information from the second video.
The electronic device may compare the first action sub-files with the second action sub-files and process the first video to obtain the first target video. The first user's action in the first target video may differ slightly from the first user's action in the first video, and may be more similar to the first target person's action in the second video.
The electronic device may compare the third action sub-files with the fourth action sub-files and process the third video to obtain the third target video. The third user's action in the third target video may differ slightly from the third user's action in the third video, and may be more similar to the second target person's action in the second video.
The electronic device may synthesize the first target video and the third target video into a new video that shows the first user's action from the first target video and the third user's action from the third target video. Optionally, the new video may also include data from the second video; for example, the new video may also show the actions of the first target person and the second target person in the second video.
In other examples, the electronic device may directly generate a complete video from the first video, the third video, the first action file, the second action file, the third action file, and the fourth action file, skipping the steps of generating the first target video and the third target video.
FIG. 14 is a schematic diagram of another user interface 1400 provided by an embodiment of the present application. The user interface 1400 may be displayed on the first electronic device. The user interface 1400 may be an interface of the Changlian application, or an interface of another application with a video-call function. That is, the first electronic device may host the Changlian application or another application with a video-call function. In response to the first user's operations corresponding to these applications, the first electronic device may display the user interface 1400.
例如,第一用户可以通过点击畅连应用的图标,打开畅连应用,进而第一电子设备可以显示用户界面1400。For example, the first user can open the Changlian application by clicking on the icon of the Changlian application, and then the first electronic device can display the user interface 1400 .
用户界面1400可以包括与多个用户一一对应的多个用户控件1410。该多个用户可以包括第二用户。响应第一用户对第二用户控件1410的操作(如点击操作),第一电子设备可以显示该第二用户的联系信息。第二用户的联系信息可以包括以下至少一项:第二用户的姓名、第二用户的联系方式、第二用户的通话记录等。The user interface 1400 may include a plurality of user controls 1410 in a one-to-one correspondence with a plurality of users. The plurality of users may include a second user. In response to an operation (eg, a click operation) of the second user control 1410 by the first user, the first electronic device may display the contact information of the second user. The contact information of the second user may include at least one of the following: the name of the second user, the contact information of the second user, the call record of the second user, and the like.
如图14所示，用户界面可以包括用户搜索控件1420。在一个示例中，第一用户可以通过该用户搜索控件，邀请第二用户进行视频通话。响应第一用户作用在用户搜索控件的操作（如点击操作），以及后续一系列操作（如文字输入、语音输入、扫描二维码等），第一电子设备可以获取第二用户的相关信息（如第二用户的部分或全部姓名、第二用户的姓名的首字母、第二用户的部分或全部视频通话号码等）。第一电子设备可以根据该第二用户的相关信息，从第一电子设备存储的多个用户记录中确定第二用户的用户记录，该多个用户记录可以与该多个用户一一对应。进而，第一电子设备可以快速在用户界面上显示该第二用户的用户控件。As shown in FIG. 14, the user interface may include a user search control 1420. In one example, the first user may invite the second user to make a video call through the user search control. In response to the first user's operation on the user search control (such as a click operation) and a series of subsequent operations (such as text input, voice input, scanning a QR code, etc.), the first electronic device may obtain relevant information of the second user (such as part or all of the second user's name, the initials of the second user's name, part or all of the second user's video call number, etc.). According to the relevant information of the second user, the first electronic device may determine the user record of the second user from a plurality of user records stored in the first electronic device, where the plurality of user records may be in one-to-one correspondence with the plurality of users. The first electronic device can then quickly display the user control of the second user on the user interface.
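The record lookup described above (matching a partial name, initials, or a call number fragment against stored user records) might look like the following sketch; the `UserRecord` fields and the sample data are illustrative assumptions, not part of the document:

```python
from dataclasses import dataclass

@dataclass
class UserRecord:
    name: str
    initials: str
    video_call_number: str

def find_user_records(records, query):
    """Return records whose name, initials, or call number contain the query."""
    q = query.strip().lower()
    return [
        r for r in records
        if q in r.name.lower()
        or q in r.initials.lower()
        or q in r.video_call_number
    ]

records = [
    UserRecord("Li Hua", "LH", "13800000001"),
    UserRecord("Wang Fang", "WF", "13800000002"),
]
# A partial query (initials here) is enough to surface the matching control.
assert find_user_records(records, "LH")[0].name == "Li Hua"
```

In practice the device would rank matches rather than return them all, but the filter above captures the "partial name / initials / partial number" behaviour the paragraph describes.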
可选的,用户界面可以包括常用用户控件1412。如图14所示,第二用户可以属于常用联系人,用户界面可以包括与第二用户对应的常用用户控件1411。Optionally, the user interface may include common user controls 1412. As shown in FIG. 14 , the second user may belong to frequently used contacts, and the user interface may include a frequently used user control 1411 corresponding to the second user.
在一个示例中,第一电子设备可以统计合拍次数最多的用户为用户A,并在用户界面上显示常用用户控件A,该常用用户控件A可以为与用户A对应的控件。在另一个示例中,第一电子设备可以统计视频通话次数最多的用户为用户B,并在用户界面上显示常用用户控件B,该常用用户控件B可以为与用户B对应的控件。In one example, the first electronic device may count the user with the most times of co-shooting as user A, and display a common user control A on the user interface, where the common user control A may be a control corresponding to user A. In another example, the first electronic device may count the user with the most video calls as user B, and display a common user control B on the user interface, where the common user control B may be a control corresponding to user B.
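The statistics in these two examples (finding the user with the most co-shoots, or the most video calls) reduce to a frequency count. The sketch below assumes the device keeps a simple per-event log of user identifiers, which the document does not specify:

```python
from collections import Counter

def most_frequent_user(event_log):
    """event_log: one user id per event (a co-shoot or a video call)."""
    counts = Counter(event_log)
    user, _count = counts.most_common(1)[0]
    return user

# Hypothetical co-shoot log: user A appears most often, so the device
# would surface user A's frequently-used control on the interface.
co_shoot_log = ["userA", "userB", "userA", "userC", "userA"]
assert most_frequent_user(co_shoot_log) == "userA"
```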
可选的,为便于第一用户快速搜索第二用户,多个用户例如可以按照字母的顺序排列。Optionally, in order to facilitate the first user to quickly search for the second user, multiple users may be arranged in alphabetical order, for example.
可选的,用户界面可以包括字母控件。响应第一用户作用在该字母控件上的操作,第一电子设备可以切换在用户界面上显示的用户控件。Optionally, the user interface may include letter controls. In response to an operation performed by the first user on the letter control, the first electronic device may switch user controls displayed on the user interface.
用户界面可以包括畅连视频控件1430。如图14所示，用户界面可以包括与多个用户一一对应的多个畅连视频控件1430。The user interface may include a Changlian video control 1430. As shown in FIG. 14, the user interface may include a plurality of Changlian video controls 1430 in one-to-one correspondence with a plurality of users.
第一用户可以通过第一电子设备，邀请第二用户进行视频通话。响应第一用户作用在与第二用户对应的畅连视频控件1430的操作（如点击操作），第一电子设备可以向第二电子设备发起视频通话，其中，第二电子设备可以为第二用户使用的电子设备。相应地，第二用户可以通过第二电子设备接收到第一用户的视频通话邀请。第二电子设备可以显示视频通话邀请的界面，该界面可以包括视频通话接听控件。响应第二用户作用在视频通话接听控件上的操作，第一电子设备与第二电子设备之间可以建立视频通话连接。The first user may invite the second user to make a video call through the first electronic device. In response to an operation (such as a click operation) performed by the first user on the Changlian video control 1430 corresponding to the second user, the first electronic device may initiate a video call to the second electronic device, where the second electronic device may be an electronic device used by the second user. Correspondingly, the second user may receive the first user's video call invitation through the second electronic device. The second electronic device may display an interface for the video call invitation, and the interface may include a video call answering control. In response to an operation performed by the second user on the video call answering control, a video call connection can be established between the first electronic device and the second electronic device.
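The invite/answer handshake in this paragraph can be sketched as a tiny state machine. This is a minimal illustration; the class name and state names (`idle`, `ringing`, `connected`) are assumptions, not part of the document:

```python
class VideoCallSession:
    """Hypothetical sketch of the invite/answer flow between two devices."""

    def __init__(self):
        self.state = "idle"

    def invite(self):
        # First device initiates the video call to the second device.
        if self.state != "idle":
            raise RuntimeError("call already in progress")
        self.state = "ringing"  # second device now shows the answering control

    def answer(self):
        # Second user taps the video call answering control.
        if self.state != "ringing":
            raise RuntimeError("no incoming call to answer")
        self.state = "connected"  # video call connection established

session = VideoCallSession()
session.invite()
session.answer()
assert session.state == "connected"
```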
在第一电子设备与第二电子设备建立视频通话连接后，第一电子设备通过拍摄可以得到第一视频，第二电子设备通过拍摄可以得到第二视频；并且，第一电子设备可以通过该视频通话连接，获取该第二视频，第二电子设备可以通过视频通话连接，获取该第一视频。After the video call connection is established between the first electronic device and the second electronic device, the first electronic device may obtain the first video by shooting, and the second electronic device may obtain the second video by shooting. Moreover, the first electronic device may obtain the second video through the video call connection, and the second electronic device may obtain the first video through the video call connection.
在一个示例中，第一用户可以在视频通话过程中邀请第二用户远程合拍。在其他示例中，第二用户可以在视频通话过程中邀请第一用户远程合拍。在远程合拍被第一用户、第二用户均授权后，第一电子设备、第二电子设备可以显示如图15所示的用户界面1500。用户界面1500可以是远程合拍的准备界面。In one example, the first user may invite the second user to co-shoot remotely during the video call. In other examples, the second user may invite the first user to co-shoot remotely during the video call. After the remote co-shooting is authorized by both the first user and the second user, the first electronic device and the second electronic device may display the user interface 1500 shown in FIG. 15. The user interface 1500 may be a preparation interface for remote co-shooting.
可选的，图14所示的用户界面还可以包括远程合拍控件1440。如图14所示，用户界面1400可以包括与多个用户一一对应的多个远程合拍控件1440。第一用户可以通过远程合拍控件1440，邀请第二用户通过视频通话完成远程合拍。结合图14所示，响应第一用户作用在该远程合拍控件1440的操作（如点击操作），第一电子设备可以向第二电子设备发起视频通话，并向第二电子设备发送指示信息，该指示信息用于邀请第二用户合拍，其中，第二电子设备可以为第二用户使用的电子设备。相应地，第二用户可以通过第二电子设备接收到第一用户的远程合拍邀请。第二电子设备可以显示远程合拍邀请的界面，该界面可以包括视频通话接听控件。响应第二用户作用在视频通话接听控件上的操作，第一电子设备与第二电子设备之间可以建立视频通话连接，并且，第一电子设备、第二电子设备均可以显示如图15所示的用户界面1500。Optionally, the user interface shown in FIG. 14 may further include a remote co-shooting control 1440. As shown in FIG. 14, the user interface 1400 may include a plurality of remote co-shooting controls 1440 in one-to-one correspondence with a plurality of users. Through the remote co-shooting control 1440, the first user may invite the second user to complete a remote co-shoot through a video call. With reference to FIG. 14, in response to an operation (such as a click operation) performed by the first user on the remote co-shooting control 1440, the first electronic device may initiate a video call to the second electronic device and send indication information to the second electronic device, where the indication information is used to invite the second user to co-shoot, and the second electronic device may be an electronic device used by the second user. Correspondingly, the second user may receive the first user's remote co-shooting invitation through the second electronic device. The second electronic device may display an interface for the remote co-shooting invitation, and the interface may include a video call answering control. In response to an operation performed by the second user on the video call answering control, a video call connection can be established between the first electronic device and the second electronic device, and both the first electronic device and the second electronic device may display the user interface 1500 shown in FIG. 15.
如图15所示，用户界面1500可以包括第一界面区域1560、第二界面区域1570，第一界面区域1560可以显示第一电子设备当前拍摄到的部分或全部图像，第二界面区域1570可以显示第二电子设备当前拍摄到的部分或全部图像。第一界面区域1560与第二界面区域1570之间可以互不交叉。第一界面区域1560、第二界面区域1570可以位于用户界面1500上的任意位置。如图15所示，第一界面区域1560可以位于用户界面1500的上方，第二界面区域1570可以位于用户界面1500的下方。也就是说，第一电子设备拍摄到的部分或全部图像与第二电子设备拍摄到的部分或全部图像可以同时显示在用户界面1500上。As shown in FIG. 15, the user interface 1500 may include a first interface area 1560 and a second interface area 1570. The first interface area 1560 may display part or all of the images currently captured by the first electronic device, and the second interface area 1570 may display part or all of the images currently captured by the second electronic device. The first interface area 1560 and the second interface area 1570 may not overlap each other. The first interface area 1560 and the second interface area 1570 may be located anywhere on the user interface 1500. As shown in FIG. 15, the first interface area 1560 may be located in the upper part of the user interface 1500, and the second interface area 1570 may be located in the lower part. That is, part or all of the images captured by the first electronic device and part or all of the images captured by the second electronic device may be displayed on the user interface 1500 at the same time.
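The upper/lower layout of the two interface areas can be sketched as a vertical stack of the two devices' current frames. This is a minimal illustration assuming same-size RGB frames; a real compositor would also handle scaling, cropping, and the other layouts the document allows:

```python
import numpy as np

def compose_split_screen(frame_a, frame_b):
    """Stack device A's frame (upper area) above device B's frame (lower area),
    cropping both to the shared height/width so the areas do not overlap."""
    h = min(frame_a.shape[0], frame_b.shape[0])
    w = min(frame_a.shape[1], frame_b.shape[1])
    return np.vstack([frame_a[:h, :w], frame_b[:h, :w]])

a = np.zeros((480, 360, 3), dtype=np.uint8)      # first device's preview
b = np.full((480, 360, 3), 255, dtype=np.uint8)  # second device's preview
canvas = compose_split_screen(a, b)
assert canvas.shape == (960, 360, 3)  # both previews shown at the same time
```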
用户可以观察用户界面1500，进而可以预览第一用户与第二用户的合拍效果。例如，如图15所示，在第一用户使用第一电子设备的前置摄像头自拍，且第二用户使用第二电子设备的前置摄像头自拍的情况下，第一界面区域1560可以包括第一人物图像1561，第二界面区域1570可以包括第二人物图像1571。也就是说，第一界面区域1560可以包括与第一用户对应的像素点，第二界面区域1570可以包括与第二用户对应的像素点。应理解，在其他示例中，第一电子设备和/或第二电子设备可以采用后置摄像头拍摄包含用户的图像。The user may observe the user interface 1500 to preview the co-shooting effect of the first user and the second user. For example, as shown in FIG. 15, when the first user takes a selfie with the front camera of the first electronic device and the second user takes a selfie with the front camera of the second electronic device, the first interface area 1560 may include a first character image 1561, and the second interface area 1570 may include a second character image 1571. That is, the first interface area 1560 may include pixels corresponding to the first user, and the second interface area 1570 may include pixels corresponding to the second user. It should be understood that in other examples, the first electronic device and/or the second electronic device may use a rear camera to capture an image containing the user.
用户界面1500还可以包括用于调整合拍效果的控件。如图15所示，用户界面1500可以包括分屏开关控件1520、背景去除开关控件1530、美化开关控件1540。在合拍开始之前或在合拍的过程中，用户可以通过这些控件调整合拍效果。可选的，参照图4所示的实施例，分屏开关控件1520可以具有上文所述的分屏开关控件420的功能，背景去除开关控件1530可以具有上文所述的背景去除开关控件430的功能，美化开关控件1540可以具有上文所述的美颜开关控件440和/或滤镜开关控件450的功能，在此就不再详细赘述。The user interface 1500 may further include controls for adjusting the co-shooting effect. As shown in FIG. 15, the user interface 1500 may include a split-screen switch control 1520, a background removal switch control 1530, and a beautification switch control 1540. Before the co-shooting starts or during the co-shooting, the user may adjust the co-shooting effect through these controls. Optionally, referring to the embodiment shown in FIG. 4, the split-screen switch control 1520 may have the function of the split-screen switch control 420 described above, the background removal switch control 1530 may have the function of the background removal switch control 430 described above, and the beautification switch control 1540 may have the function of the beauty switch control 440 and/or the filter switch control 450 described above, which will not be described in detail here.
用户界面1500可以包括录制控件1510。响应用户作用在该录制控件1510上的操作，电子设备可以合成第一电子设备拍摄的第一视频以及第二电子设备拍摄的第二视频，得到如图5、图6所示的第一目标视频。也就是说，在图14、图15所示的示例中，用户可以通过畅连应用以及本申请实施例提供的处理视频的方法，得到协调性相对较高的合拍视频。The user interface 1500 may include a recording control 1510. In response to a user operation on the recording control 1510, the electronic device may synthesize the first video shot by the first electronic device and the second video shot by the second electronic device to obtain the first target video shown in FIG. 5 and FIG. 6. That is to say, in the examples shown in FIG. 14 and FIG. 15, the user can obtain a co-shot video with relatively high coordination through the Changlian application and the video processing method provided by the embodiments of this application.
结合上文所述的示例,下面阐述一种新的应用场景。Combined with the examples described above, a new application scenario is described below.
第一用户可以通过点击畅连应用的图标，打开畅连应用。第一电子设备可以在用户界面上显示与多个用户一一对应的多个用户控件。该多个用户可以包括第三用户。响应第一用户对第三用户的控件的操作（如点击操作），第一电子设备可以向第三用户使用的第二电子设备发起视频通话，邀请第三用户进行视频通话。相应地，第三用户可以通过第二电子设备接收到第一用户的视频通话邀请。之后，第一电子设备与第二电子设备之间可以建立视频通话连接。The first user may open the Changlian application by clicking its icon. The first electronic device may display, on the user interface, a plurality of user controls in one-to-one correspondence with a plurality of users. The plurality of users may include a third user. In response to an operation (such as a click operation) by the first user on the third user's control, the first electronic device may initiate a video call to the second electronic device used by the third user, inviting the third user to make a video call. Correspondingly, the third user may receive the first user's video call invitation through the second electronic device. After that, a video call connection may be established between the first electronic device and the second electronic device.
在第一电子设备与第二电子设备建立视频通话连接后，第一电子设备通过拍摄可以得到第一视频，该第一视频可以是第一用户的视频；第二电子设备通过拍摄可以得到第三视频，该第三视频可以是第三用户的视频；并且，第一电子设备可以通过该视频通话连接，获取该第三视频，第二电子设备可以通过视频通话连接，获取该第一视频。对第一视频的动作信息提取，可以得到第一动作文件，第一动作文件可以指示第一用户在第一视频中的动作。对第三视频的动作信息提取，可以得到第三动作文件，第三动作文件可以指示第三用户在第三视频中的动作。After the video call connection is established between the first electronic device and the second electronic device, the first electronic device may obtain the first video by shooting, where the first video may be a video of the first user; the second electronic device may obtain the third video by shooting, where the third video may be a video of the third user. Moreover, the first electronic device may obtain the third video through the video call connection, and the second electronic device may obtain the first video through the video call connection. By extracting action information from the first video, a first action file can be obtained, which may indicate the actions of the first user in the first video. By extracting action information from the third video, a third action file can be obtained, which may indicate the actions of the third user in the third video.
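The action-information extraction described above can be sketched as building a per-frame list of pose keypoints. Here `estimate_pose` is a hypothetical callback (e.g. wrapping a skeleton-detection model); neither the callback nor the action-file layout is specified by the document:

```python
def extract_action_file(frames, estimate_pose):
    """Build an action file: one keypoint set per video frame.

    `estimate_pose(frame)` is assumed to return a list of (x, y)
    body keypoints for the person in the frame.
    """
    return [
        {"frame_index": i, "keypoints": estimate_pose(frame)}
        for i, frame in enumerate(frames)
    ]

# Toy stand-in pose estimator: a single dummy keypoint per frame.
fake_pose = lambda frame: [(0.5, 0.5)]
action_file = extract_action_file(["frame0", "frame1"], fake_pose)
assert len(action_file) == 2 and action_file[0]["frame_index"] == 0
```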
第一用户可以在视频通话过程中邀请第三用户远程合拍。或者，第三用户可以在视频通话过程中邀请第一用户远程合拍。在远程合拍被第一用户、第三用户均授权后，第一电子设备、第二电子设备可以显示远程合拍的准备界面。该远程合拍的准备界面上可以显示有如图3所示的素材合拍控件330和/或图库合拍控件340。第一用户和第三用户中的一个可以通过素材合拍控件330或图库合拍控件340选择第二视频。The first user may invite the third user to co-shoot remotely during the video call. Alternatively, the third user may invite the first user to co-shoot remotely during the video call. After the remote co-shooting is authorized by both the first user and the third user, the first electronic device and the second electronic device may display a preparation interface for remote co-shooting. The material co-shooting control 330 and/or the gallery co-shooting control 340 shown in FIG. 3 may be displayed on the preparation interface. One of the first user and the third user may select the second video through the material co-shooting control 330 or the gallery co-shooting control 340.
在一个示例中,第二视频可以是第一目标人物的视频,展示了第一目标人物的动作。在视频通话过程中,第一用户可以模仿第二视频中的第一目标人物的动作,且第三用户可以模仿第二视频中第一目标人物的动作。第一用户模仿动作的时段可以与第三用户模仿动作的时段相同或不同。In one example, the second video may be a video of the first target person, showing the actions of the first target person. During the video call, the first user can imitate the action of the first target person in the second video, and the third user can imitate the action of the first target person in the second video. The period during which the first user imitates the action may be the same or different from the period during which the third user imitates the action.
第一电子设备、第二电子设备中的一个电子设备可以根据获取到的第一视频、第三视频、第一动作文件、第三动作文件，以及与第二视频对应的第二动作文件，对第一视频、第三视频进行处理，得到目标视频。第二动作文件可以与第二视频中第一目标人物的动作对应。One of the first electronic device and the second electronic device may process the first video and the third video according to the acquired first video, third video, first action file, third action file, and the second action file corresponding to the second video, to obtain a target video. The second action file may correspond to the actions of the first target person in the second video.
该目标视频可以包括第一用户的图像和第三用户的图像；其中第一用户在该目标视频中的动作可以与第一视频中第一用户的动作不同，且第一用户在该目标视频中的动作可以与第二视频中第一目标人物的动作对应；第三用户在该目标视频中的动作可以与第三视频中第三用户的动作不同，且第三用户在该目标视频中的动作可以与第二视频中第一目标人物的动作对应。The target video may include an image of the first user and an image of the third user. The actions of the first user in the target video may be different from the actions of the first user in the first video, and may correspond to the actions of the first target person in the second video; the actions of the third user in the target video may be different from the actions of the third user in the third video, and may also correspond to the actions of the first target person in the second video.
可选的,目标视频还可以包括第二视频中第一目标人物的图像。Optionally, the target video may further include an image of the first target person in the second video.
在另一个示例中,第二视频可以是第一目标人物和第二目标人物的视频,展示了第一目标人物的动作以及第二目标人物的动作。在视频通话过程中,第一用户可以模仿第二视频中的第一目标人物的动作,且第三用户可以模仿第二视频中第二目标人物的动作。第一用户模仿动作的时段可以与第三用户模仿动作的时段相同或不同。In another example, the second video may be a video of the first target person and the second target person, showing actions of the first target person and actions of the second target person. During the video call, the first user can imitate the action of the first target person in the second video, and the third user can imitate the action of the second target person in the second video. The period during which the first user imitates the action may be the same or different from the period during which the third user imitates the action.
第一电子设备、第二电子设备中的一个电子设备可以根据获取到的第一视频、第三视频、第一动作文件、第三动作文件，以及与第二视频对应的第二动作文件、第四动作文件，对第一视频、第三视频进行处理，得到目标视频。第二动作文件可以与第二视频中第一目标人物的动作对应。第四动作文件可以与第二视频中第二目标人物的动作对应。One of the first electronic device and the second electronic device may process the first video and the third video according to the acquired first video, third video, first action file, third action file, and the second action file and fourth action file corresponding to the second video, to obtain a target video. The second action file may correspond to the actions of the first target person in the second video. The fourth action file may correspond to the actions of the second target person in the second video.
该目标视频可以包括第一用户的图像和第三用户的图像；其中第一用户在该目标视频中的动作可以与第一视频中第一用户的动作不同，且第一用户在该目标视频中的动作可以与第二视频中第一目标人物的动作对应；第三用户在该目标视频中的动作可以与第三视频中第三用户的动作不同，且第三用户在该目标视频中的动作可以与第二视频中第二目标人物的动作对应。The target video may include an image of the first user and an image of the third user. The actions of the first user in the target video may be different from the actions of the first user in the first video, and may correspond to the actions of the first target person in the second video; the actions of the third user in the target video may be different from the actions of the third user in the third video, and may correspond to the actions of the second target person in the second video.
可选的,目标视频还可以包括第二视频中第一目标人物的图像,以及第二视频第二目标人物的图像。Optionally, the target video may further include an image of the first target person in the second video, and an image of the second target person in the second video.
本申请实施例还提供了一种处理视频的方法1600，该方法1600可以在如图1、图2所示的电子设备（例如手机、平板电脑等）中实现。如图16所示，该方法1600可以包括以下步骤：This embodiment of the present application further provides a video processing method 1600, which may be implemented in an electronic device (such as a mobile phone or a tablet computer) as shown in FIG. 1 and FIG. 2. As shown in FIG. 16, the method 1600 may include the following steps:
1601,第一电子设备获取第一视频,所述第一视频为第一人物的视频。1601. The first electronic device acquires a first video, where the first video is a video of a first character.
示例性的,第一人物以及第一视频可以参照图4至图6的第一人物图像461所示的示例。Exemplarily, for the first character and the first video, reference may be made to the examples shown in the first character image 461 in FIGS. 4 to 6 .
示例性的,第一人物以及第一视频可以参照图15的第一人物图像1561所示的示例。Exemplarily, for the first character and the first video, reference may be made to the example shown in the first character image 1561 in FIG. 15 .
1602,所述第一电子设备获取与所述第一视频对应的第一动作文件,所述第一动作文件与所述第一人物的动作对应。1602. The first electronic device acquires a first action file corresponding to the first video, where the first action file corresponds to the action of the first character.
示例性的,第一动作文件可以参照图7中的第一动作子文件711所示的示例。Exemplarily, the first action file may refer to the example shown in the first action sub-file 711 in FIG. 7 .
示例性的,第一动作文件可以参照图8中的第一动作子文件811所示的示例。Exemplarily, the first action file may refer to the example shown in the first action sub-file 811 in FIG. 8 .
1603,所述第一电子设备获取与第二视频对应的第二动作文件,所述第二视频为第二人物的视频,所述第二动作文件与所述第二人物的动作对应。1603. The first electronic device acquires a second action file corresponding to a second video, where the second video is a video of a second character, and the second action file corresponds to the action of the second character.
示例性的,第二动作文件可以参照图7中的第二动作子文件721所示的示例。Exemplarily, the second action file may refer to the example shown in the second action sub-file 721 in FIG. 7 .
示例性的,第二动作文件可以参照图8中的第二动作子文件821所示的示例。Exemplarily, the second action file may refer to the example shown in the second action sub-file 821 in FIG. 8 .
1604，所述第一电子设备根据所述第一视频、所述第一动作文件、所述第二动作文件，生成目标视频，所述目标视频包括所述第一人物的第一人物图像，所述目标视频中所述第一人物的动作与所述第一视频中所述第一人物的动作不同，所述目标视频中所述第一人物的动作与所述第二视频中所述第二人物的动作对应。1604. The first electronic device generates a target video according to the first video, the first action file, and the second action file, where the target video includes a first character image of the first character, the actions of the first character in the target video are different from the actions of the first character in the first video, and the actions of the first character in the target video correspond to the actions of the second character in the second video.
示例性的,目标视频中的第一人物图像可以参照图5、图6、图9中的第一人物图像461所示的示例。示例性的,目标视频中的第一人物图像可以参照图15中的第一人物图像1561所示的示例。Exemplarily, for the first person image in the target video, reference may be made to the examples shown in the first person image 461 in FIG. 5 , FIG. 6 , and FIG. 9 . Exemplarily, for the first character image in the target video, reference may be made to the example shown in the first character image 1561 in FIG. 15 .
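Steps 1601 to 1604 can be condensed into the following per-frame sketch. The `retarget_frame` callback stands in for a hypothetical motion-retargeting routine (adjusting the first character's pose toward the reference pose from the second action file); the document does not specify its implementation:

```python
def generate_target_video(first_video, first_actions, second_actions, retarget_frame):
    """Step 1604 sketch: drive each frame of the first video toward the
    corresponding pose in the second action file, keeping the first
    character's appearance from the original frame."""
    target_video = []
    for frame, own_pose, ref_pose in zip(first_video, first_actions, second_actions):
        # own_pose comes from the first action file (step 1602),
        # ref_pose from the second action file (step 1603).
        target_video.append(retarget_frame(frame, own_pose, ref_pose))
    return target_video

# Toy usage with a stand-in retargeting function.
frames = ["frame0", "frame1"]
first_actions = ["pose_a0", "pose_a1"]
second_actions = ["pose_b0", "pose_b1"]
result = generate_target_video(
    frames, first_actions, second_actions,
    retarget_frame=lambda frame, own, ref: (frame, ref),
)
assert result == [("frame0", "pose_b0"), ("frame1", "pose_b1")]
```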
可选的，在所述第一电子设备获取第一视频之前，所述方法还包括：所述第一电子设备建立所述第一电子设备与第二电子设备的视频通话连接，所述第一电子设备为所述第一人物的电子设备，所述第二电子设备为所述第二人物的电子设备；所述第一电子设备获取第一视频，包括：所述第一电子设备在视频通话过程中，获取所述第一视频；所述方法还包括：所述第一电子设备通过所述视频通话连接，从所述第二电子设备获取所述第二视频。Optionally, before the first electronic device acquires the first video, the method further includes: establishing, by the first electronic device, a video call connection between the first electronic device and a second electronic device, where the first electronic device is the electronic device of the first character, and the second electronic device is the electronic device of the second character. The acquiring of the first video by the first electronic device includes: acquiring, by the first electronic device, the first video during the video call. The method further includes: acquiring, by the first electronic device, the second video from the second electronic device through the video call connection.
示例性的,视频通话连接的建立过程可以参照图14至图15所示的示例。Exemplarily, for the process of establishing a video call connection, reference may be made to the examples shown in FIG. 14 to FIG. 15 .
示例性的,第二人物以及第二视频可以参照图15的第二人物图像1571所示的示例。Exemplarily, for the second character and the second video, reference may be made to the example shown in the second character image 1571 in FIG. 15 .
可选的，所述第一视频、所述第二视频对应所述视频通话过程中的相同时段，所述目标视频还包括所述第二人物的第二人物图像，所述目标视频中所述第二人物的动作与所述第二视频中所述第二人物的动作对应。示例性的，目标视频的某个帧可以参照图15所示的用户界面所示。也就是说，第一人物和第二人物可以同步做出相似的动作，进而基于第二人物的动作，对第一视频进行处理，使目标视频中第一人物和第二人物的动作更加协调。Optionally, the first video and the second video correspond to the same time period during the video call, the target video further includes a second character image of the second character, and the actions of the second character in the target video correspond to the actions of the second character in the second video. Exemplarily, a certain frame of the target video may be as shown in the user interface of FIG. 15. That is to say, the first character and the second character may perform similar actions synchronously, and the first video may then be processed based on the actions of the second character, so that the actions of the first character and the second character in the target video are more coordinated.
示例性的,目标视频中的第二人物图像可以参照图15中的第二人物图像1571所示的示例。Exemplarily, for the second person image in the target video, reference may be made to the example shown in the second person image 1571 in FIG. 15 .
可选的，所述方法还包括：所述第一电子设备获取第三视频，所述第三视频为第三人物的视频；所述第一电子设备获取与所述第三视频对应的第三动作文件，所述第三动作文件与所述第三人物的动作对应；所述第一电子设备根据所述第一视频、所述第一动作文件、所述第二动作文件，生成目标视频，包括：所述第一电子设备根据所述第一视频、所述第三视频、所述第一动作文件、所述第二动作文件、所述第三动作文件，生成所述目标视频，所述目标视频还包括所述第三人物的第三人物图像，所述目标视频中所述第三人物的动作与所述第三视频中所述第三人物的动作不同，所述目标视频中所述第三人物的动作与所述第二视频中所述第二人物的动作对应。Optionally, the method further includes: acquiring, by the first electronic device, a third video, where the third video is a video of a third character; and acquiring, by the first electronic device, a third action file corresponding to the third video, where the third action file corresponds to the actions of the third character. The generating, by the first electronic device, of the target video according to the first video, the first action file, and the second action file includes: generating, by the first electronic device, the target video according to the first video, the third video, the first action file, the second action file, and the third action file, where the target video further includes a third character image of the third character, the actions of the third character in the target video are different from the actions of the third character in the third video, and the actions of the third character in the target video correspond to the actions of the second character in the second video.
示例性的，第三视频可以参照图10的第三界面区域1060所示的示例。第三视频中第三人物的图像可以参照图10的第三人物图像1061所示的示例。Exemplarily, the third video may refer to the example shown in the third interface area 1060 of FIG. 10. For the image of the third character in the third video, refer to the example shown in the third character image 1061 of FIG. 10.
示例性的，目标视频可以参照图11的第三图像区域1160所示的示例，或者参照图11的第三图像区域1160、第四图像区域1170所示的示例，或者参照图12的第三图像区域1260所示的示例，或者参照图12的第三图像区域1260、第四图像区域1270所示的示例，或者参照图13的用户界面1300所示的示例。Exemplarily, the target video may refer to the example shown in the third image area 1160 of FIG. 11, or the example shown in the third image area 1160 and the fourth image area 1170 of FIG. 11, or the example shown in the third image area 1260 of FIG. 12, or the example shown in the third image area 1260 and the fourth image area 1270 of FIG. 12, or the example shown in the user interface 1300 of FIG. 13.
可选的,所述目标视频还包括所述第二人物的第二人物图像,所述目标视频中所述第二人物的动作与所述第二视频中所述第二人物的动作对应。Optionally, the target video further includes a second character image of the second character, and actions of the second character in the target video correspond to actions of the second character in the second video.
可选的,所述第一人物图像、所述第二人物图像属于所述目标视频中的同一帧图像。Optionally, the first person image and the second person image belong to the same frame image in the target video.
可选的，所述第二视频为所述第二人物和所述第四人物的视频，所述方法还包括：所述第一电子设备获取第三视频，所述第三视频为第三人物的视频；所述第一电子设备获取与所述第三视频对应的第三动作文件，所述第三动作文件与所述第三人物的动作对应；所述第一电子设备获取第四动作文件，所述第四动作文件与所述第二视频中的所述第四人物的动作对应；所述第一电子设备根据所述第一视频、所述第一动作文件、所述第二动作文件，生成目标视频，包括：所述第一电子设备根据所述第一视频、所述第三视频、所述第一动作文件、所述第二动作文件、所述第三动作文件、所述第四动作文件，生成所述目标视频，所述目标视频还包括所述第三人物的第三人物图像，所述目标视频中所述第三人物的动作与所述第三视频中所述第三人物的动作不同，所述目标视频中所述第三人物的动作与所述第二视频中所述第四人物的动作对应。Optionally, the second video is a video of the second character and a fourth character, and the method further includes: acquiring, by the first electronic device, a third video, where the third video is a video of a third character; acquiring, by the first electronic device, a third action file corresponding to the third video, where the third action file corresponds to the actions of the third character; and acquiring, by the first electronic device, a fourth action file, where the fourth action file corresponds to the actions of the fourth character in the second video. The generating, by the first electronic device, of the target video according to the first video, the first action file, and the second action file includes: generating, by the first electronic device, the target video according to the first video, the third video, the first action file, the second action file, the third action file, and the fourth action file, where the target video further includes a third character image of the third character, the actions of the third character in the target video are different from the actions of the third character in the third video, and the actions of the third character in the target video correspond to the actions of the fourth character in the second video.
可选的，所述目标视频还包括所述第二人物的第二人物图像和所述第四人物的第四人物图像，所述目标视频中所述第二人物的动作与所述第二视频中所述第二人物的动作对应，所述目标视频中所述第四人物的动作与所述第二视频中所述第四人物的动作对应。Optionally, the target video further includes a second character image of the second character and a fourth character image of the fourth character, where the actions of the second character in the target video correspond to the actions of the second character in the second video, and the actions of the fourth character in the target video correspond to the actions of the fourth character in the second video.
可选的,所述第一人物图像、所述第二人物图像、所述第三人物图像、所述第四人物图像属于所述目标视频中的同一帧图像。Optionally, the first person image, the second person image, the third person image, and the fourth person image belong to the same frame of images in the target video.
可选的，在所述第一电子设备获取第一视频之前，所述方法还包括：所述第一电子设备建立所述第一电子设备与第二电子设备的视频通话连接，所述第一电子设备为所述第一人物的电子设备，所述第二电子设备为第三人物的电子设备；所述第一电子设备获取第一视频，包括：所述第一电子设备在视频通话过程中，获取所述第一视频；所述第一电子设备获取第三视频，包括：所述第一电子设备通过所述视频通话连接，从所述第二电子设备获取第三视频。Optionally, before the first electronic device acquires the first video, the method further includes: establishing, by the first electronic device, a video call connection between the first electronic device and a second electronic device, where the first electronic device is the electronic device of the first character, and the second electronic device is the electronic device of a third character. The acquiring of the first video by the first electronic device includes: acquiring, by the first electronic device, the first video during the video call. The acquiring of the third video by the first electronic device includes: acquiring, by the first electronic device, the third video from the second electronic device through the video call connection.
示例性的,第一电子设备与第二电子设备的视频通话的过程可以参照图14至图15所示的示例。Exemplarily, for the process of the video call between the first electronic device and the second electronic device, reference may be made to the examples shown in FIG. 14 to FIG. 15 .
可选的,所述第一视频、所述第三视频对应所述视频通话过程中的相同时段。Optionally, the first video and the third video correspond to the same time period during the video call.
可选的，所述第一电子设备建立所述第一电子设备与第二电子设备的视频通话连接，包括：所述第一电子设备通过拍摄应用或视频通话应用，建立所述第一电子设备与第二电子设备的视频通话连接。Optionally, the establishing, by the first electronic device, of a video call connection between the first electronic device and a second electronic device includes: the first electronic device establishes the video call connection between the first electronic device and the second electronic device through a photographing application or a video call application.
示例性的,拍摄应用可以参照图3所示的示例。Exemplarily, for a photographing application, reference may be made to the example shown in FIG. 3 .
示例性的,视频通话应用可以参照图14至图15所示的示例。Exemplarily, the video call application may refer to the examples shown in FIG. 14 to FIG. 15 .
可选的,所述第二视频为本地或云端存储的视频。Optionally, the second video is a video stored locally or in the cloud.
示例性的,本地存储的视频可以参照图3所示的图库合拍控件340所示的示例。Exemplarily, for the locally stored video, reference may be made to the example shown in the gallery co-shot control 340 shown in FIG. 3 .
示例性的,云端存储的视频可以参照图3所示的素材合拍控件330所示的示例。Exemplarily, for the video stored in the cloud, reference may be made to the example shown in the material co-shot control 330 shown in FIG. 3 .
可选的,所述第一电子设备获取与第二视频对应的第二动作文件,包括:所述第一电子设备从第二电子设备获取所述第二动作文件。Optionally, acquiring, by the first electronic device, a second action file corresponding to the second video includes: acquiring, by the first electronic device, the second action file from the second electronic device.
可选的，所述目标视频中所述第一人物的动作与所述第二视频中所述第二人物的动作对应，包括：与所述第一人物图像对应的动作文件为第一目标动作文件，所述第一动作文件与所述第二动作文件之间的匹配度为第一匹配度，所述第一目标动作文件与所述第二动作文件之间的匹配度为第二匹配度，所述第二匹配度大于所述第一匹配度。Optionally, that the action of the first character in the target video corresponds to the action of the second character in the second video includes: the action file corresponding to the first character image is a first target action file, the matching degree between the first action file and the second action file is a first matching degree, the matching degree between the first target action file and the second action file is a second matching degree, and the second matching degree is greater than the first matching degree.
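As an illustration of the matching-degree relation described above, the sketch below scores two action files (lists of limb angles) and checks that the retargeted action matches the second action file more closely than the original one does. The scoring formula and the sample angle values are assumptions for illustration only; the text does not fix a particular matching metric.

```python
def matching_degree(action_a, action_b):
    # Matching degree between two action files, each a list of limb
    # angles in degrees. Mean absolute angular difference mapped to
    # [0, 1]; this concrete metric is an illustrative assumption.
    assert len(action_a) == len(action_b)
    mean_diff = sum(abs(a - b) for a, b in zip(action_a, action_b)) / len(action_a)
    return max(0.0, 1.0 - mean_diff / 180.0)

first_action = [10.0, 25.0, 80.0]    # first action file (first video)
second_action = [30.0, 45.0, 60.0]   # second action file (second video)
first_target = [28.0, 43.0, 62.0]    # first target action file (target video)

first_matching = matching_degree(first_action, second_action)    # first matching degree
second_matching = matching_degree(first_target, second_action)   # second matching degree
assert second_matching > first_matching
```

Under this assumed metric, the second matching degree exceeding the first is exactly the condition stated in the paragraph above.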
可选的，所述第一电子设备获取与所述第一视频对应的第一动作文件，包括：所述第一电子设备根据以下至少两项，确定所述第一动作子文件：第一头部像素点、第一颈部像素点、第一躯干像素点、第一左上前肢像素点、第一左上后肢像素点、第一左下前肢像素点、第一左下后肢像素点、第一右上前肢像素点、第一右上后肢像素点、第一右下前肢像素点、第一右下后肢像素点、第一左手像素点、第一右手像素点。Optionally, the acquiring, by the first electronic device, of a first action file corresponding to the first video includes: the first electronic device determines the first action sub-file according to at least two of the following: a first head pixel point, a first neck pixel point, a first torso pixel point, a first left upper forelimb pixel point, a first left upper hindlimb pixel point, a first left lower forelimb pixel point, a first left lower hindlimb pixel point, a first right upper forelimb pixel point, a first right upper hindlimb pixel point, a first right lower forelimb pixel point, a first right lower hindlimb pixel point, a first left hand pixel point, and a first right hand pixel point.
示例性的,第一动作子文件可以参照图7中的第一动作子文件711所示的示例。Exemplarily, the first action sub-file may refer to the example shown in the first action sub-file 711 in FIG. 7 .
可选的，所述第一动作子文件包括以下至少一项肢体角度：第一头部角度、第一颈部角度、第一躯干角度、第一左上前肢角度、第一左上后肢角度、第一左下前肢角度、第一左下后肢角度、第一右上前肢角度、第一右上后肢角度、第一右下前肢角度、第一右下后肢角度、第一左手角度、第一右手角度。Optionally, the first action sub-file includes at least one of the following limb angles: a first head angle, a first neck angle, a first torso angle, a first left upper forelimb angle, a first left upper hindlimb angle, a first left lower forelimb angle, a first left lower hindlimb angle, a first right upper forelimb angle, a first right upper hindlimb angle, a first right lower forelimb angle, a first right lower hindlimb angle, a first left hand angle, and a first right hand angle.
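A limb angle such as those listed above can be derived from two of the pixel points of the preceding paragraph. The sketch below measures each segment's angle against the horizontal image axis; the two-point formula, the joint names, and the sample coordinates are assumptions, since the text does not specify how an angle is computed.

```python
import math

def limb_angle(point_a, point_b):
    # Angle, in degrees, of the segment from point_a to point_b
    # relative to the horizontal image axis; points are (x, y)
    # pixel coordinates.
    dx = point_b[0] - point_a[0]
    dy = point_b[1] - point_a[1]
    return math.degrees(math.atan2(dy, dx))

# Hypothetical pixel points for one frame of the first video.
first_neck = (100, 80)
first_left_shoulder = (80, 100)
first_left_elbow = (70, 140)

first_action_subfile = {
    "first_neck_angle": limb_angle(first_neck, first_left_shoulder),
    "first_left_upper_forelimb_angle": limb_angle(first_left_shoulder, first_left_elbow),
}
```

Any pair of adjacent pixel points from the list above could be fed to `limb_angle` in the same way to fill in the remaining entries of the sub-file.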
可选的，所述第一动作文件对应第一肢体角度，所述第二动作文件对应第二肢体角度，所述目标动作文件对应第三肢体角度，所述第一肢体角度与所述第二肢体角度的差值小于预设角度，所述第三肢体角度介于所述第一肢体角度与所述第二肢体角度之间。Optionally, the first action file corresponds to a first limb angle, the second action file corresponds to a second limb angle, and the target action file corresponds to a third limb angle, where the difference between the first limb angle and the second limb angle is smaller than a preset angle, and the third limb angle is between the first limb angle and the second limb angle.
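The relation above (a third limb angle lying between the first and the second, applied when the two differ by less than a preset angle) can be realized by interpolation. The linear 50/50 blend and the preset value below are illustrative assumptions, not choices fixed by the text:

```python
def third_limb_angle(first_angle, second_angle, weight=0.5, preset_angle=45.0):
    # Returns a target-video limb angle lying between the first and
    # second limb angles; applied only when their difference is below
    # the preset angle. The linear blend with weight=0.5 is an
    # assumed choice.
    if abs(first_angle - second_angle) >= preset_angle:
        raise ValueError("limb angle difference not below the preset angle")
    return first_angle + weight * (second_angle - first_angle)

blended = third_limb_angle(20.0, 40.0)
assert min(20.0, 40.0) <= blended <= max(20.0, 40.0)
```

Any `weight` in (0, 1) satisfies the "between" constraint; the midpoint merely splits the difference evenly.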
可选的，所述第一视频包括第一子帧、第二子帧，所述第二视频包括第三子帧、第四子帧，所述目标视频包括第五子帧、第六子帧，所述第一子帧、所述第三子帧、所述第五子帧相互对应，所述第二子帧、所述第四子帧、所述第六子帧相互对应，所述第一子帧与所述第二子帧之间的时间差为第一时间差，所述第三子帧与所述第四子帧之间的时间差为第二时间差，所述第五子帧与所述第六子帧的时间差为第三时间差，所述第三时间差介于所述第一时间差与所述第二时间差之间。Optionally, the first video includes a first subframe and a second subframe, the second video includes a third subframe and a fourth subframe, and the target video includes a fifth subframe and a sixth subframe, where the first subframe, the third subframe, and the fifth subframe correspond to one another, and the second subframe, the fourth subframe, and the sixth subframe correspond to one another; the time difference between the first subframe and the second subframe is a first time difference, the time difference between the third subframe and the fourth subframe is a second time difference, the time difference between the fifth subframe and the sixth subframe is a third time difference, and the third time difference is between the first time difference and the second time difference.
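The subframe timing relation can be read the same way: the third time difference of the target video is chosen between the two source time differences. A sketch, with the midpoint as an assumed interpolation weight and the timestamps as sample values:

```python
def third_time_difference(t1, t2, t3, t4, weight=0.5):
    # t1/t2: timestamps (seconds) of the first and second subframes
    # (first video); t3/t4: timestamps of the third and fourth
    # subframes (second video). Returns a time difference for the
    # fifth and sixth subframes of the target video that lies between
    # the two source differences; weight=0.5 is an assumption.
    first_diff = t2 - t1    # first time difference
    second_diff = t4 - t3   # second time difference
    return first_diff + weight * (second_diff - first_diff)

target_diff = third_time_difference(0.0, 0.40, 0.0, 0.60)
low, high = sorted((0.40, 0.60))
assert low <= target_diff <= high
```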
可选的，所述目标视频包括第一图像区域、第二图像区域，所述第一图像区域包括与所述第一人物对应的像素点，所述第二图像区域包括与所述第二人物对应的像素点。Optionally, the target video includes a first image area and a second image area, the first image area includes pixel points corresponding to the first character, and the second image area includes pixel points corresponding to the second character.
示例性的,第一图像区域可以参照图5、图6、图9中的第一图像区域560所示的示例。示例性的,目标视频中的第一人物图像可以参照图15中的第一界面区域1560所示的示例。Exemplarily, for the first image area, reference may be made to the examples shown in the first image area 560 in FIG. 5 , FIG. 6 , and FIG. 9 . Exemplarily, for the first character image in the target video, reference may be made to the example shown in the first interface area 1560 in FIG. 15 .
可选的，所述第一图像区域包括与以下任一项对应的像素点：第一背景图像、第二背景图像、目标图库图像，所述第一背景图像包括与所述第一人物所在场景对应的像素点，所述第二背景图像包括与所述第二人物所在场景对应的像素点，所述目标图库图像为存储在所述第一电子设备上的图像。Optionally, the first image area includes pixel points corresponding to any one of the following: a first background image, a second background image, or a target gallery image, where the first background image includes pixel points corresponding to the scene where the first character is located, the second background image includes pixel points corresponding to the scene where the second character is located, and the target gallery image is an image stored on the first electronic device.
示例性的,所述第一图像区域包括第一背景图像,第一图像区域例如可以是图5所示的第一图像区域560。Exemplarily, the first image area includes a first background image, and the first image area may be, for example, the first image area 560 shown in FIG. 5 .
示例性的,所述第一图像区域包括第二背景图像,第一图像区域例如可以是图11所示的第三图像区域1160。Exemplarily, the first image area includes a second background image, and the first image area may be, for example, the third image area 1160 shown in FIG. 11 .
可选的，所述第二图像区域包括与以下任一项对应的像素点：第一背景图像、第二背景图像、目标图库图像，所述第一背景图像包括与所述第一人物所在场景对应的像素点，所述第二背景图像包括与所述第二人物所在场景对应的像素点，所述目标图库图像为存储在所述第一电子设备上的图像。Optionally, the second image area includes pixel points corresponding to any one of the following: a first background image, a second background image, or a target gallery image, where the first background image includes pixel points corresponding to the scene where the first character is located, the second background image includes pixel points corresponding to the scene where the second character is located, and the target gallery image is an image stored on the first electronic device.
示例性的,所述第二图像区域包括第一背景图像,第二图像区域例如可以是图5所示的第二图像区域570。Exemplarily, the second image area includes the first background image, and the second image area may be, for example, the second image area 570 shown in FIG. 5 .
示例性的,所述第二图像区域包括第二背景图像,第二图像区域例如可以是图11所示的第四图像区域1170。Exemplarily, the second image area includes a second background image, and the second image area may be, for example, the fourth image area 1170 shown in FIG. 11 .
可选的，所述合拍视频还包括背景图像区域，所述背景图像区域为所述第一图像区域、所述第二图像区域的背景，所述背景图像区域包括与以下任一项对应的像素点：第一背景图像、第二背景图像、目标图库图像，所述第一背景图像包括与所述第一人物所在场景对应的像素点，所述第二背景图像包括与所述第二人物所在场景对应的像素点，所述目标图库图像为存储在所述第一电子设备上的图像。Optionally, the co-shot video further includes a background image area, the background image area is the background of the first image area and the second image area, and the background image area includes pixel points corresponding to any one of the following: a first background image, a second background image, or a target gallery image, where the first background image includes pixel points corresponding to the scene where the first character is located, the second background image includes pixel points corresponding to the scene where the second character is located, and the target gallery image is an image stored on the first electronic device.
示例性的,背景图像区域包括目标图库图像,背景图像区域例如可以是图6所示的第一背景图像区域580,或图13所示的第二背景图像区域1380。Exemplarily, the background image area includes a target gallery image, and the background image area may be, for example, the first background image area 580 shown in FIG. 6 or the second background image area 1380 shown in FIG. 13 .
可以理解的是，电子设备为了实现上述功能，其包含了执行各个功能相应的硬件和/或软件模块。结合本文中所公开的实施例描述的各示例的算法步骤，本申请能够以硬件或硬件和计算机软件的结合形式来实现。某个功能究竟以硬件还是计算机软件驱动硬件的方式来执行，取决于技术方案的特定应用和设计约束条件。本领域技术人员可以结合实施例对每个特定的应用来使用不同方法来实现所描述的功能，但是这种实现不应认为超出本申请的范围。It can be understood that, to implement the above functions, the electronic device includes corresponding hardware and/or software modules for performing each function. In combination with the algorithm steps of the examples described in the embodiments disclosed herein, the present application can be implemented in the form of hardware or a combination of hardware and computer software. Whether a function is performed by hardware or by computer software driving hardware depends on the specific application and design constraints of the technical solution. A person skilled in the art may use different methods to implement the described functions for each particular application in combination with the embodiments, but such implementations should not be considered beyond the scope of the present application.
本实施例可以根据上述方法示例对电子设备进行功能模块的划分,例如,可以对应各个功能划分各个功能模块,也可以将两个或两个以上的功能集成在一个处理模块中。上述集成的模块可以采用硬件的形式实现。需要说明的是,本实施例中对模块的划分是示意性的,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式。In this embodiment, the electronic device can be divided into functional modules according to the above method examples. For example, each functional module can be divided corresponding to each function, or two or more functions can be integrated into one processing module. The above-mentioned integrated modules can be implemented in the form of hardware. It should be noted that, the division of modules in this embodiment is schematic, and is only a logical function division, and there may be other division manners in actual implementation.
在采用对应各个功能划分各个功能模块的情况下，图17示出了上述实施例中涉及的电子设备1700的一种可能的组成示意图，如图17所示，该电子设备1700可以包括：获取单元1701、处理单元1702。In the case where each functional module is divided according to each function, FIG. 17 shows a possible schematic diagram of the composition of the electronic device 1700 involved in the above embodiment. As shown in FIG. 17 , the electronic device 1700 may include: an acquisition unit 1701 and a processing unit 1702.
获取单元1701可以用于获取第一视频,所述第一视频为第一人物的视频。The obtaining unit 1701 may be configured to obtain a first video, where the first video is a video of a first character.
获取单元1701还可以用于获取与所述第一视频对应的第一动作文件,所述第一动作文件与所述第一人物的动作对应。The obtaining unit 1701 may also be configured to obtain a first action file corresponding to the first video, where the first action file corresponds to the action of the first character.
获取单元1701还可以用于获取与第二视频对应的第二动作文件,所述第二视频为第二人物的视频,所述第二动作文件与所述第二人物的动作对应。The obtaining unit 1701 may also be configured to obtain a second action file corresponding to a second video, where the second video is a video of a second character, and the second action file corresponds to the action of the second character.
处理单元1702可以用于根据所述第一视频、所述第一动作文件、所述第二动作文件，生成目标视频，所述目标视频包括所述第一人物的第一人物图像，所述目标视频中所述第一人物的动作与所述第一视频中所述第一人物的动作不同，所述目标视频中所述第一人物的动作与所述第二视频中所述第二人物的动作对应。The processing unit 1702 may be configured to generate a target video according to the first video, the first action file, and the second action file, where the target video includes a first character image of the first character, the action of the first character in the target video is different from the action of the first character in the first video, and the action of the first character in the target video corresponds to the action of the second character in the second video.
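The generation step performed by the processing unit 1702 can be pictured per frame: the first character keeps its appearance from the first video but is driven by the poses of the second action file. The frame representation and the `repose` stand-in below are assumptions; the actual image transformation is not specified by the text.

```python
def repose(frame, target_pose):
    # Stand-in for the image transformation: here a frame is a dict
    # carrying the character identity and its pose, so reposing
    # simply replaces the pose while keeping the character.
    new_frame = dict(frame)
    new_frame["pose"] = target_pose
    return new_frame

def generate_target_video(first_video, second_action_file):
    # The first character's frames, re-posed frame by frame to
    # follow the second action file.
    return [repose(frame, pose)
            for frame, pose in zip(first_video, second_action_file)]

first_video = [{"character": "first", "pose": [10.0]},
               {"character": "first", "pose": [20.0]}]
second_action_file = [[50.0], [60.0]]

target_video = generate_target_video(first_video, second_action_file)
assert [f["pose"] for f in target_video] == [[50.0], [60.0]]
assert all(f["character"] == "first" for f in target_video)
```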
需要说明的是,上述方法实施例涉及的各步骤的所有相关内容均可以援引到对应功能模块的功能描述,在此不再赘述。It should be noted that, all relevant contents of the steps involved in the above method embodiments can be cited in the functional description of the corresponding functional module, which will not be repeated here.
在采用集成的单元的情况下，电子设备可以包括处理模块、存储模块和通信模块。其中，处理模块可以用于对电子设备的动作进行控制管理，例如，可以用于支持电子设备执行上述各个单元执行的步骤。存储模块可以用于支持电子设备执行存储程序代码和数据等。通信模块，可以用于支持电子设备与其他设备的通信。Where an integrated unit is employed, the electronic device may include a processing module, a storage module, and a communication module. The processing module may be used to control and manage the actions of the electronic device, for example, to support the electronic device in performing the steps performed by the above units. The storage module may be used to support the electronic device in storing program code, data, and the like. The communication module may be used to support communication between the electronic device and other devices.
其中,处理模块可以是处理器或控制器。其可以实现或执行结合本申请公开内容所描述的各种示例性的逻辑方框,模块和电路。处理器也可以是实现计算功能的组合,例如包含一个或多个微处理器组合,数字信号处理(digital signal processing,DSP)和微处理器的组合等等。存储模块可以是存储器。通信模块可以是收发器。通信模块具体可以为射频电路、蓝牙芯片、Wi-Fi芯片等与其他电子设备交互的设备。The processing module may be a processor or a controller. It may implement or execute the various exemplary logical blocks, modules and circuits described in connection with this disclosure. The processor may also be a combination that implements computing functions, such as a combination of one or more microprocessors, a combination of digital signal processing (DSP) and a microprocessor, and the like. The storage module may be a memory. The communication module may be a transceiver. The communication module may specifically be a device that interacts with other electronic devices, such as a radio frequency circuit, a Bluetooth chip, and a Wi-Fi chip.
在一个实施例中,当处理模块为处理器,存储模块为存储器时,本实施例所涉及的电子设备可以为具有图1所示结构的设备。In one embodiment, when the processing module is a processor and the storage module is a memory, the electronic device involved in this embodiment may be a device having the structure shown in FIG. 1 .
本实施例还提供一种计算机存储介质，该计算机存储介质中存储有计算机指令，当该计算机指令在电子设备上运行时，使得电子设备执行上述相关方法步骤实现上述实施例中的处理视频的方法。This embodiment further provides a computer storage medium, where the computer storage medium stores computer instructions, and when the computer instructions are run on an electronic device, the electronic device is caused to perform the above related method steps to implement the method for processing a video in the foregoing embodiments.
本实施例还提供了一种计算机程序产品,当该计算机程序产品在计算机上运行时,使得计算机执行上述相关步骤,以实现上述实施例中的处理视频的方法。This embodiment also provides a computer program product, which, when the computer program product runs on a computer, causes the computer to execute the above-mentioned relevant steps, so as to realize the method for processing a video in the above-mentioned embodiment.
另外，本申请的实施例还提供一种装置，这个装置具体可以是芯片，组件或模块，该装置可包括相连的处理器和存储器；其中，存储器用于存储计算机执行指令，当装置运行时，处理器可执行存储器存储的计算机执行指令，以使芯片执行上述各方法实施例中的处理视频的方法。In addition, the embodiments of the present application further provide an apparatus, which may specifically be a chip, a component, or a module, and the apparatus may include a processor and a memory that are connected, where the memory is configured to store computer-executable instructions, and when the apparatus runs, the processor may execute the computer-executable instructions stored in the memory, so that the chip performs the method for processing a video in the foregoing method embodiments.
其中，本实施例提供的电子设备、计算机存储介质、计算机程序产品或芯片均用于执行上文所提供的对应的方法，因此，其所能达到的有益效果可参考上文所提供的对应的方法中的有益效果，此处不再赘述。The electronic device, computer storage medium, computer program product, or chip provided in this embodiment is used to perform the corresponding method provided above. Therefore, for the beneficial effects that can be achieved, reference may be made to the beneficial effects of the corresponding method provided above, and details are not repeated here.
本领域普通技术人员可以意识到,结合本文中所公开的实施例描述的各示例的单元及算法步骤,能够以电子硬件、或者计算机软件和电子硬件的结合来实现。这些功能究竟以硬件还是软件方式来执行,取决于技术方案的特定应用和设计约束条件。专业技术人员可以对每个特定的应用来使用不同方法来实现所描述的功能,但是这种实现不应认为超出本申请的范围。Those of ordinary skill in the art can realize that the units and algorithm steps of each example described in conjunction with the embodiments disclosed herein can be implemented in electronic hardware, or a combination of computer software and electronic hardware. Whether these functions are performed in hardware or software depends on the specific application and design constraints of the technical solution. Skilled artisans may implement the described functionality using different methods for each particular application, but such implementations should not be considered beyond the scope of this application.
所属领域的技术人员可以清楚地了解到,为描述的方便和简洁,上述描述的系统、装置和单元的具体工作过程,可以参考前述方法实施例中的对应过程,在此不再赘述。Those skilled in the art can clearly understand that, for the convenience and brevity of description, the specific working process of the above-described systems, devices and units may refer to the corresponding processes in the foregoing method embodiments, which will not be repeated here.
在本申请所提供的几个实施例中，应该理解到，所揭露的系统、装置和方法，可以通过其它的方式实现。例如，以上所描述的装置实施例仅仅是示意性的，例如，所述单元的划分，仅仅为一种逻辑功能划分，实际实现时可以有另外的划分方式，例如多个单元或组件可以结合或者可以集成到另一个系统，或一些特征可以忽略，或不执行。另一点，所显示或讨论的相互之间的耦合或直接耦合或通信连接可以是通过一些接口，装置或单元的间接耦合或通信连接，可以是电性，机械或其它的形式。In the several embodiments provided in this application, it should be understood that the disclosed system, apparatus, and method may be implemented in other manners. For example, the apparatus embodiments described above are only illustrative; for example, the division of the units is only a logical function division, and in actual implementation there may be other division manners: multiple units or components may be combined or integrated into another system, or some features may be ignored or not performed. In addition, the shown or discussed mutual couplings or direct couplings or communication connections may be implemented through some interfaces, and the indirect couplings or communication connections between apparatuses or units may be in electrical, mechanical, or other forms.
所述作为分离部件说明的单元可以是或者也可以不是物理上分开的,作为单元显示的部件可以是或者也可以不是物理单元,即可以位于一个地方,或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或者全部单元来实现本实施例方案的目的。The units described as separate components may or may not be physically separated, and components displayed as units may or may not be physical units, that is, may be located in one place, or may be distributed to multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution in this embodiment.
另外,在本申请各个实施例中的各功能单元可以集成在一个处理单元中,也可以是各个单元单独物理存在,也可以两个或两个以上单元集成在一个单元中。In addition, each functional unit in each embodiment of the present application may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit.
所述功能如果以软件功能单元的形式实现并作为独立的产品销售或使用时，可以存储在一个计算机可读取存储介质中。基于这样的理解，本申请的技术方案本质上或者说对现有技术做出贡献的部分或者该技术方案的部分可以以软件产品的形式体现出来，该计算机软件产品存储在一个存储介质中，包括若干指令用以使得一台计算机设备（可以是个人计算机，服务器，或者网络设备等）执行本申请各个实施例所述方法的全部或部分步骤。而前述的存储介质包括：U盘、移动硬盘、只读存储器（Read-Only Memory，ROM）、随机存取存储器（Random Access Memory，RAM）、磁碟或者光盘等各种可以存储程序代码的介质。If the functions are implemented in the form of software functional units and sold or used as independent products, they may be stored in a computer-readable storage medium. Based on this understanding, the technical solutions of the present application essentially, or the part contributing to the prior art, or a part of the technical solutions, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
以上所述，仅为本申请的具体实施方式，但本申请的保护范围并不局限于此，任何熟悉本技术领域的技术人员在本申请揭露的技术范围内，可轻易想到变化或替换，都应涵盖在本申请的保护范围之内。因此，本申请的保护范围应以所述权利要求的保护范围为准。The above are only specific implementations of the present application, but the protection scope of the present application is not limited thereto. Any variation or replacement readily conceivable by a person skilled in the art within the technical scope disclosed in the present application shall fall within the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (40)

  1. 一种处理视频的方法,其特征在于,包括:A method for processing video, comprising:
    第一电子设备获取第一视频,所述第一视频为第一人物的视频;The first electronic device acquires a first video, where the first video is a video of a first character;
    所述第一电子设备获取与所述第一视频对应的第一动作文件,所述第一动作文件与所述第一人物的动作对应;obtaining, by the first electronic device, a first action file corresponding to the first video, where the first action file corresponds to an action of the first character;
    所述第一电子设备获取与第二视频对应的第二动作文件,所述第二视频为第二人物的视频,所述第二动作文件与所述第二人物的动作对应;The first electronic device obtains a second action file corresponding to a second video, the second video is a video of a second character, and the second action file corresponds to an action of the second character;
    所述第一电子设备根据所述第一视频、所述第一动作文件、所述第二动作文件，生成目标视频，所述目标视频包括所述第一人物的第一人物图像，所述目标视频中所述第一人物的动作与所述第一视频中所述第一人物的动作不同，所述目标视频中所述第一人物的动作与所述第二视频中所述第二人物的动作对应。the first electronic device generates a target video according to the first video, the first action file, and the second action file, where the target video includes a first character image of the first character, the action of the first character in the target video is different from the action of the first character in the first video, and the action of the first character in the target video corresponds to the action of the second character in the second video.
  2. 根据权利要求1所述的方法,其特征在于,在所述第一电子设备获取第一视频之前,所述方法还包括:The method according to claim 1, wherein before the first electronic device acquires the first video, the method further comprises:
    所述第一电子设备建立所述第一电子设备与第二电子设备的视频通话连接，所述第一电子设备为所述第一人物的电子设备，所述第二电子设备为所述第二人物的电子设备；the first electronic device establishes a video call connection between the first electronic device and a second electronic device, where the first electronic device is the electronic device of the first character, and the second electronic device is the electronic device of the second character;
    所述第一电子设备获取第一视频,包括:The first electronic device obtains the first video, including:
    所述第一电子设备在视频通话过程中,获取所述第一视频;The first electronic device acquires the first video during a video call;
    所述方法还包括:The method also includes:
    所述第一电子设备通过所述视频通话连接,从所述第二电子设备获取所述第二视频。The first electronic device acquires the second video from the second electronic device through the video call connection.
  3. 根据权利要求2所述的方法，其特征在于，所述第一视频、所述第二视频对应所述视频通话过程中的相同时段，所述目标视频还包括所述第二人物的第二人物图像，所述目标视频中所述第二人物的动作与所述第二视频中所述第二人物的动作对应。The method according to claim 2, wherein the first video and the second video correspond to the same time period during the video call, the target video further includes a second character image of the second character, and the action of the second character in the target video corresponds to the action of the second character in the second video.
  4. 根据权利要求1所述的方法,其特征在于,所述方法还包括:The method according to claim 1, wherein the method further comprises:
    所述第一电子设备获取第三视频,所述第三视频为第三人物的视频;The first electronic device acquires a third video, and the third video is a video of a third character;
    所述第一电子设备获取与所述第三视频对应的第三动作文件,所述第三动作文件与所述第三人物的动作对应;The first electronic device acquires a third action file corresponding to the third video, and the third action file corresponds to the action of the third character;
    所述第一电子设备根据所述第一视频、所述第一动作文件、所述第二动作文件,生成目标视频,包括:The first electronic device generates a target video according to the first video, the first action file, and the second action file, including:
    所述第一电子设备根据所述第一视频、所述第三视频、所述第一动作文件、所述第二动作文件、所述第三动作文件，生成所述目标视频，所述目标视频还包括所述第三人物的第三人物图像，所述目标视频中所述第三人物的动作与所述第三视频中所述第三人物的动作不同，所述目标视频中所述第三人物的动作与所述第二视频中所述第二人物的动作对应。the first electronic device generates the target video according to the first video, the third video, the first action file, the second action file, and the third action file, where the target video further includes a third character image of the third character, the action of the third character in the target video is different from the action of the third character in the third video, and the action of the third character in the target video corresponds to the action of the second character in the second video.
  5. 根据权利要求2至4中任一项所述的方法,其特征在于,所述目标视频还包括所述第二人物的第二人物图像,所述目标视频中所述第二人物的动作与所述第二视频中所述第二人物的动作对应。The method according to any one of claims 2 to 4, wherein the target video further comprises a second character image of the second character, and the movement of the second character in the target video is the same as the corresponding to the actions of the second character in the second video.
  6. 根据权利要求5所述的方法,其特征在于,所述第一人物图像、所述第二人物图 像属于所述目标视频中的同一帧图像。The method according to claim 5, wherein the first person image and the second person image belong to the same frame image in the target video.
  7. 根据权利要求1所述的方法，其特征在于，所述第二视频为所述第二人物和所述第四人物的视频，所述方法还包括：The method according to claim 1, wherein the second video is a video of the second character and a fourth character, and the method further comprises:
    所述第一电子设备获取第三视频,所述第三视频为第三人物的视频;The first electronic device acquires a third video, and the third video is a video of a third character;
    所述第一电子设备获取与所述第三视频对应的第三动作文件,所述第三动作文件与所述第三人物的动作对应;The first electronic device acquires a third action file corresponding to the third video, and the third action file corresponds to the action of the third character;
    所述第一电子设备获取第四动作文件,所述第四动作文件与所述第二视频中的所述第四人物的动作对应;The first electronic device acquires a fourth action file, where the fourth action file corresponds to the action of the fourth character in the second video;
    所述第一电子设备根据所述第一视频、所述第一动作文件、所述第二动作文件,生成目标视频,包括:The first electronic device generates a target video according to the first video, the first action file, and the second action file, including:
    所述第一电子设备根据所述第一视频、所述第三视频、所述第一动作文件、所述第二动作文件、所述第三动作文件、所述第四动作文件，生成所述目标视频，所述目标视频还包括所述第三人物的第三人物图像，所述目标视频中所述第三人物的动作与所述第三视频中所述第三人物的动作不同，所述目标视频中所述第三人物的动作与所述第二视频中所述第四人物的动作对应。the first electronic device generates the target video according to the first video, the third video, the first action file, the second action file, the third action file, and the fourth action file, where the target video further includes a third character image of the third character, the action of the third character in the target video is different from the action of the third character in the third video, and the action of the third character in the target video corresponds to the action of the fourth character in the second video.
  8. 根据权利要求7所述的方法，其特征在于，所述目标视频还包括所述第二人物的第二人物图像和所述第四人物的第四人物图像，所述目标视频中所述第二人物的动作与所述第二视频中所述第二人物的动作对应，所述目标视频中所述第四人物的动作与所述第二视频中所述第四人物的动作对应。The method according to claim 7, wherein the target video further includes a second character image of the second character and a fourth character image of the fourth character, the action of the second character in the target video corresponds to the action of the second character in the second video, and the action of the fourth character in the target video corresponds to the action of the fourth character in the second video.
  9. 根据权利要求8所述的方法，其特征在于，所述第一人物图像、所述第二人物图像、所述第三人物图像、所述第四人物图像属于所述目标视频中的同一帧图像。The method according to claim 8, wherein the first character image, the second character image, the third character image, and the fourth character image belong to the same image frame in the target video.
  10. 根据权利要求4、7-9中任一项所述的方法,其特征在于,在所述第一电子设备获取第一视频之前,所述方法还包括:The method according to any one of claims 4, 7-9, wherein before the first electronic device acquires the first video, the method further comprises:
    所述第一电子设备建立所述第一电子设备与第二电子设备的视频通话连接，所述第一电子设备为所述第一人物的电子设备，所述第二电子设备为第三人物的电子设备；the first electronic device establishes a video call connection between the first electronic device and a second electronic device, where the first electronic device is the electronic device of the first character, and the second electronic device is the electronic device of the third character;
    所述第一电子设备获取第一视频,包括:The first electronic device obtains the first video, including:
    所述第一电子设备在视频通话过程中,获取所述第一视频;The first electronic device acquires the first video during a video call;
    所述第一电子设备获取第三视频,包括:The first electronic device acquires the third video, including:
    所述第一电子设备通过所述视频通话连接,从所述第二电子设备获取第三视频。The first electronic device acquires a third video from the second electronic device through the video call connection.
  11. 根据权利要求10所述的方法,其特征在于,所述第一视频、所述第三视频对应所述视频通话过程中的相同时段。The method according to claim 10, wherein the first video and the third video correspond to the same time period during the video call.
  12. 根据权利要求2、3、10、11中任一项所述的方法，其特征在于，所述第一电子设备建立所述第一电子设备与第二电子设备的视频通话连接，包括：The method according to any one of claims 2, 3, 10, and 11, wherein the establishing, by the first electronic device, of a video call connection between the first electronic device and a second electronic device comprises:
    所述第一电子设备通过拍摄应用或视频通话应用,建立所述第一电子设备与第二电子设备的视频通话连接。The first electronic device establishes a video call connection between the first electronic device and the second electronic device through a photographing application or a video calling application.
  13. 根据权利要求1所述的方法,其特征在于,所述第二视频为本地或云端存储的视频。The method according to claim 1, wherein the second video is a video stored locally or in the cloud.
  14. 根据权利要求1至13中任一项所述的方法,其特征在于,所述第一电子设备获取与第二视频对应的第二动作文件,包括:The method according to any one of claims 1 to 13, wherein the obtaining, by the first electronic device, a second action file corresponding to the second video comprises:
    所述第一电子设备从第二电子设备获取所述第二动作文件。The first electronic device acquires the second action file from the second electronic device.
  15. 根据权利要求1至14中任一项所述的方法,其特征在于,所述目标视频中所述第一人物的动作与所述第二视频中所述第二人物的动作对应,包括:The method according to any one of claims 1 to 14, wherein the action of the first character in the target video corresponds to the action of the second character in the second video, comprising:
    与所述第一人物图像对应的动作文件为第一目标动作文件，所述第一动作文件与所述第二动作文件之间的匹配度为第一匹配度，所述第一目标动作文件与所述第二动作文件之间的匹配度为第二匹配度，所述第二匹配度大于所述第一匹配度。the action file corresponding to the first character image is a first target action file, the matching degree between the first action file and the second action file is a first matching degree, the matching degree between the first target action file and the second action file is a second matching degree, and the second matching degree is greater than the first matching degree.
  16. 根据权利要求1至15中任一项所述的方法,其特征在于,所述第一电子设备获取与所述第一视频对应的第一动作文件,包括:The method according to any one of claims 1 to 15, wherein the obtaining, by the first electronic device, a first action file corresponding to the first video comprises:
    所述第一电子设备根据以下至少两项，确定所述第一动作子文件：第一头部像素点、第一颈部像素点、第一躯干像素点、第一左上前肢像素点、第一左上后肢像素点、第一左下前肢像素点、第一左下后肢像素点、第一右上前肢像素点、第一右上后肢像素点、第一右下前肢像素点、第一右下后肢像素点、第一左手像素点、第一右手像素点。the first electronic device determines the first action sub-file according to at least two of the following: a first head pixel point, a first neck pixel point, a first torso pixel point, a first left upper forelimb pixel point, a first left upper hindlimb pixel point, a first left lower forelimb pixel point, a first left lower hindlimb pixel point, a first right upper forelimb pixel point, a first right upper hindlimb pixel point, a first right lower forelimb pixel point, a first right lower hindlimb pixel point, a first left hand pixel point, and a first right hand pixel point.
  17. The method according to any one of claims 1 to 16, wherein the first action sub-file comprises at least one of the following limb angles:
    a first head angle, a first neck angle, a first torso angle, a first left upper forelimb angle, a first left upper hindlimb angle, a first left lower forelimb angle, a first left lower hindlimb angle, a first right upper forelimb angle, a first right upper hindlimb angle, a first right lower forelimb angle, a first right lower hindlimb angle, a first left hand angle, and a first right hand angle.
  18. The method according to any one of claims 1 to 17, wherein the first action file corresponds to a first limb angle, the second action file corresponds to a second limb angle, the target action file corresponds to a third limb angle, a difference between the first limb angle and the second limb angle is less than a preset angle, and the third limb angle is between the first limb angle and the second limb angle.
  19. The method according to any one of claims 1 to 18, wherein the first video comprises a first subframe and a second subframe, the second video comprises a third subframe and a fourth subframe, and the target video comprises a fifth subframe and a sixth subframe; the first subframe, the third subframe, and the fifth subframe correspond to one another, and the second subframe, the fourth subframe, and the sixth subframe correspond to one another; and a time difference between the first subframe and the second subframe is a first time difference, a time difference between the third subframe and the fourth subframe is a second time difference, a time difference between the fifth subframe and the sixth subframe is a third time difference, and the third time difference is between the first time difference and the second time difference.
  20. An electronic device, comprising:
    a processor, a memory, and a transceiver, wherein the memory is configured to store a computer program, and the processor is configured to execute the computer program stored in the memory, wherein:
    the processor is configured to acquire a first video, wherein the first video is a video of a first character;
    the processor is further configured to acquire a first action file corresponding to the first video, wherein the first action file corresponds to an action of the first character;
    the processor is further configured to acquire a second action file corresponding to a second video, wherein the second video is a video of a second character, and the second action file corresponds to an action of the second character; and
    the processor is further configured to generate a target video according to the first video, the first action file, and the second action file, wherein the target video comprises a first character image of the first character, the action of the first character in the target video is different from the action of the first character in the first video, and the action of the first character in the target video corresponds to the action of the second character in the second video.
  21. The electronic device according to claim 20, wherein before the processor acquires the first video, the processor is further configured to:
    establish a video call connection between the electronic device and a second electronic device, wherein the electronic device is an electronic device of the first character, and the second electronic device is an electronic device of the second character;
    the processor is specifically configured to acquire the first video during the video call; and
    the processor is further configured to acquire the second video from the second electronic device through the video call connection.
  22. The electronic device according to claim 21, wherein the first video and the second video correspond to a same time period during the video call, the target video further comprises a second character image of the second character, and the action of the second character in the target video corresponds to the action of the second character in the second video.
  23. The electronic device according to claim 20, wherein the processor is further configured to:
    acquire a third video, wherein the third video is a video of a third character; and
    acquire a third action file corresponding to the third video, wherein the third action file corresponds to an action of the third character; and
    the processor is specifically configured to generate the target video according to the first video, the third video, the first action file, the second action file, and the third action file, wherein the target video further comprises a third character image of the third character, the action of the third character in the target video is different from the action of the third character in the third video, and the action of the third character in the target video corresponds to the action of the second character in the second video.
  24. The electronic device according to any one of claims 21 to 23, wherein the target video further comprises a second character image of the second character, and the action of the second character in the target video corresponds to the action of the second character in the second video.
  25. The electronic device according to claim 24, wherein the first character image and the second character image belong to a same frame of the target video.
  26. The electronic device according to claim 20, wherein the second video is a video of the second character and a fourth character, and the processor is further configured to:
    acquire a third video, wherein the third video is a video of a third character;
    acquire a third action file corresponding to the third video, wherein the third action file corresponds to an action of the third character; and
    acquire a fourth action file, wherein the fourth action file corresponds to an action of the fourth character in the second video; and
    the processor is specifically configured to generate the target video according to the first video, the third video, the first action file, the second action file, the third action file, and the fourth action file, wherein the target video further comprises a third character image of the third character, the action of the third character in the target video is different from the action of the third character in the third video, and the action of the third character in the target video corresponds to the action of the fourth character in the second video.
  27. The electronic device according to claim 26, wherein the target video further comprises a second character image of the second character and a fourth character image of the fourth character, the action of the second character in the target video corresponds to the action of the second character in the second video, and the action of the fourth character in the target video corresponds to the action of the fourth character in the second video.
  28. The electronic device according to claim 27, wherein the first character image, the second character image, the third character image, and the fourth character image belong to a same frame of the target video.
  29. The electronic device according to any one of claims 23 and 26 to 28, wherein before the processor acquires the first video, the processor is further configured to:
    establish a video call connection between the electronic device and a second electronic device, wherein the electronic device is an electronic device of the first character, and the second electronic device is an electronic device of the third character;
    the processor is specifically configured to acquire the first video during the video call; and
    the processor is specifically configured to acquire the third video from the second electronic device through the video call connection.
  30. The electronic device according to claim 29, wherein the first video and the third video correspond to a same time period during the video call.
  31. The electronic device according to any one of claims 21, 22, 29, and 30, wherein
    the processor is specifically configured to establish the video call connection between the electronic device and the second electronic device through a photographing application or a video call application.
  32. The electronic device according to claim 20, wherein the second video is a video stored locally or in a cloud.
  33. The electronic device according to any one of claims 20 to 32, wherein
    the processor is specifically configured to acquire the second action file from a second electronic device.
  34. The electronic device according to any one of claims 20 to 33, wherein the correspondence between the action of the first character in the target video and the action of the second character in the second video comprises:
    an action file corresponding to the first character image is a first target action file, a matching degree between the first action file and the second action file is a first matching degree, a matching degree between the first target action file and the second action file is a second matching degree, and the second matching degree is greater than the first matching degree.
  35. The electronic device according to any one of claims 20 to 34, wherein
    the processor is specifically configured to determine the first action sub-file according to at least two of the following: a first head pixel point, a first neck pixel point, a first torso pixel point, a first left upper forelimb pixel point, a first left upper hindlimb pixel point, a first left lower forelimb pixel point, a first left lower hindlimb pixel point, a first right upper forelimb pixel point, a first right upper hindlimb pixel point, a first right lower forelimb pixel point, a first right lower hindlimb pixel point, a first left hand pixel point, and a first right hand pixel point.
  36. The electronic device according to any one of claims 20 to 35, wherein the first action sub-file comprises at least one of the following limb angles:
    a first head angle, a first neck angle, a first torso angle, a first left upper forelimb angle, a first left upper hindlimb angle, a first left lower forelimb angle, a first left lower hindlimb angle, a first right upper forelimb angle, a first right upper hindlimb angle, a first right lower forelimb angle, a first right lower hindlimb angle, a first left hand angle, and a first right hand angle.
  37. The electronic device according to any one of claims 20 to 36, wherein the first action file corresponds to a first limb angle, the second action file corresponds to a second limb angle, the target action file corresponds to a third limb angle, a difference between the first limb angle and the second limb angle is less than a preset angle, and the third limb angle is between the first limb angle and the second limb angle.
  38. The electronic device according to any one of claims 20 to 37, wherein the first video comprises a first subframe and a second subframe, the second video comprises a third subframe and a fourth subframe, and the target video comprises a fifth subframe and a sixth subframe; the first subframe, the third subframe, and the fifth subframe correspond to one another, and the second subframe, the fourth subframe, and the sixth subframe correspond to one another; and a time difference between the first subframe and the second subframe is a first time difference, a time difference between the third subframe and the fourth subframe is a second time difference, a time difference between the fifth subframe and the sixth subframe is a third time difference, and the third time difference is between the first time difference and the second time difference.
  39. A computer storage medium, comprising computer instructions that, when run on an electronic device, cause the electronic device to perform the method according to any one of claims 1 to 19.
  40. A computer program product that, when run on a computer, causes the computer to perform the method according to any one of claims 1 to 19.
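The "between" relationships recited in claims 18 and 19 (mirrored in claims 37 and 38) and the best-match selection of claims 15 and 34 can be illustrated with a short sketch. This is not part of the patent disclosure: the function names, the 0.5 blend weight, the 45-degree preset threshold, and the use of a negative mean absolute angle difference as the "matching degree" are all illustrative assumptions — the claims only require that the third limb angle lie between the first and second limb angles, that the third time difference lie between the first and second time differences, and that the selected target action file have a higher matching degree than the first action file.

```python
PRESET_ANGLE = 45.0  # assumed value for the "preset angle" of claim 18


def target_limb_angle(first_angle, second_angle, weight=0.5):
    """Return a third limb angle between the first and second limb angles.

    Claim 18 only requires the result to lie between the two inputs when
    their difference is below the preset angle; a linear blend satisfies
    that requirement.
    """
    if abs(first_angle - second_angle) >= PRESET_ANGLE:
        raise ValueError("angle difference is not below the preset angle")
    return first_angle + weight * (second_angle - first_angle)


def target_time_difference(first_td, second_td, weight=0.5):
    """Return a third time difference between the first and second ones
    (claim 19): the target video's frame spacing is blended from the
    frame spacings of the two source videos."""
    return first_td + weight * (second_td - first_td)


def best_matching_action_file(candidates, second_action_file):
    """Sketch of claims 15/34: pick, from candidate action files (tuples
    of limb angles), the one whose matching degree with the second action
    file is highest. The matching degree here is assumed to be the
    negative mean absolute angle difference."""
    def matching_degree(a, b):
        return -sum(abs(x - y) for x, y in zip(a, b)) / len(a)
    return max(candidates, key=lambda c: matching_degree(c, second_action_file))
```

For example, blending a 30-degree and a 60-degree elbow angle with the default weight yields a 45-degree target angle, and a candidate frame whose angles nearly match the second action file is preferred over one that diverges widely.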
PCT/CN2021/136393 2021-02-09 2021-12-08 Video processing method and apparatus WO2022170837A1 (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
CN202110178452.9 2021-02-09
CN202110178452 2021-02-09
CN202110529002.X 2021-05-14
CN202110529002.XA CN114915722B (en) 2021-02-09 2021-05-14 Method and device for processing video

Publications (1)

Publication Number Publication Date
WO2022170837A1 true WO2022170837A1 (en) 2022-08-18

Family

ID=82761282

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/136393 WO2022170837A1 (en) 2021-02-09 2021-12-08 Video processing method and apparatus

Country Status (2)

Country Link
CN (1) CN114915722B (en)
WO (1) WO2022170837A1 (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106303251A (en) * 2016-08-26 2017-01-04 Shenzhen Gionee Communication Equipment Co., Ltd. Photographing method and terminal
CN109982130A (en) * 2019-04-11 2019-07-05 Beijing ByteDance Network Technology Co., Ltd. Video shooting method and apparatus, electronic device, and storage medium
CN110475086A (en) * 2019-07-23 2019-11-19 MIGU Comic Co., Ltd. Video recording method and system, server, and terminal
CN110602396A (en) * 2019-09-11 2019-12-20 Tencent Technology (Shenzhen) Company Limited Intelligent group-photo method and apparatus, electronic device, and storage medium
WO2020215722A1 (en) * 2019-04-26 2020-10-29 Beijing Qianren Technology Co., Ltd. Method and device for video processing, electronic device, and computer-readable storage medium
US20200396419A1 (en) * 2018-07-10 2020-12-17 Tencent Technology (Shenzhen) Company Limited Method and apparatus for generating video file, and storage medium

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108259810A (en) * 2018-03-29 2018-07-06 Shanghai Zhangmen Science and Technology Co., Ltd. Video call method, device, and computer storage medium
CN111447379B (en) * 2019-01-17 2022-08-23 Baidu Online Network Technology (Beijing) Co., Ltd. Method and apparatus for generating information
CN110198428A (en) * 2019-05-29 2019-09-03 Vivo Mobile Communication Co., Ltd. Multimedia file generation method and first terminal
CN110490897A (en) * 2019-07-30 2019-11-22 Vivo Mobile Communication Co., Ltd. Method for generating an imitation video, and electronic device
CN111405361B (en) * 2020-03-27 2022-06-14 MIGU Culture Technology Co., Ltd. Video acquisition method, electronic device, and computer-readable storage medium
CN111726536B (en) * 2020-07-03 2024-01-05 Tencent Technology (Shenzhen) Company Limited Video generation method and apparatus, storage medium, and computer device
CN112287848A (en) * 2020-10-30 2021-01-29 Tencent Technology (Shenzhen) Company Limited Live-streaming-based image processing method and apparatus, electronic device, and storage medium


Also Published As

Publication number Publication date
CN114915722B (en) 2023-08-22
CN114915722A (en) 2022-08-16

Similar Documents

Publication Publication Date Title
WO2020125410A1 (en) Image processing method and electronic device
WO2021027476A1 (en) Method for voice controlling apparatus, and electronic apparatus
CN112558825A (en) Information processing method and electronic equipment
CN114579075B (en) Data processing method and related device
CN112262563B (en) Image processing method and electronic device
WO2020093988A1 (en) Image processing method and electronic device
US20210358523A1 (en) Image processing method and image processing apparatus
CN113687803A (en) Screen projection method, screen projection source end, screen projection destination end, screen projection system and storage medium
WO2022007862A1 (en) Image processing method, system, electronic device and computer readable storage medium
CN113099146B (en) Video generation method and device and related equipment
CN112527174B (en) Information processing method and electronic equipment
CN114115769A (en) Display method and electronic equipment
CN112527222A (en) Information processing method and electronic equipment
CN114040242A (en) Screen projection method and electronic equipment
WO2022007707A1 (en) Home device control method, terminal device, and computer-readable storage medium
CN113965694A (en) Video recording method, electronic device and computer readable storage medium
CN115048012A (en) Data processing method and related device
CN115689963A (en) Image processing method and electronic equipment
WO2022252649A1 (en) Video processing method and electronic device
WO2021254113A1 (en) Control method for three-dimensional interface and terminal
CN114827696B (en) Method for synchronously playing audio and video data of cross-equipment and electronic equipment
WO2023001043A1 (en) Content display method, electronic device and system
WO2023045597A1 (en) Cross-device transfer control method and apparatus for large-screen service
WO2022170837A1 (en) Video processing method and apparatus
WO2022170918A1 (en) Multi-person-capturing method and electronic device

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21925484

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21925484

Country of ref document: EP

Kind code of ref document: A1