CN113115108A - Video processing method and computing device - Google Patents

Video processing method and computing device Download PDF

Info

Publication number
CN113115108A
Authority
CN
China
Prior art keywords
video
frame
square dance
user
user image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110411421.3A
Other languages
Chinese (zh)
Inventor
周杉
曲晓奎
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Qingdao Hisense Media Network Technology Co Ltd
Juhaokan Technology Co Ltd
Original Assignee
Qingdao Hisense Media Network Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Qingdao Hisense Media Network Technology Co Ltd
Priority to CN202110411421.3A
Publication of CN113115108A
Legal status: Pending

Links

Images

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/41 Structure of client; Structure of client peripherals
    • H04N21/422 Input-only peripherals, i.e. input devices connected to specially adapted client devices, e.g. global positioning system [GPS]
    • H04N21/4223 Cameras
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/44008 Processing of video elementary streams involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
    • H04N21/44016 Processing of video elementary streams involving splicing one content stream with another content stream, e.g. for substituting a video clip
    • H04N21/47 End-user applications
    • H04N21/475 End-user interface for inputting end-user data, e.g. personal identification number [PIN], preference data
    • H04N21/478 Supplemental services, e.g. displaying phone caller identification, shopping application
    • H04N21/4788 Supplemental services communicating with other users, e.g. chatting

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • General Engineering & Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Controls And Circuits For Display Device (AREA)
  • Image Processing (AREA)
  • Studio Circuits (AREA)

Abstract

The application discloses a video processing method and a computing device. A first user image is extracted from a received first video frame of a first square dance video stream uploaded by a first user terminal; a second user image is extracted from a received second video frame of a second square dance video stream uploaded by a second user terminal. A first synthesized frame and a second synthesized frame are synthesized from the first user image and the second user image; the first synthesized frame is sent to the first user terminal without being sent to the second user terminal, and the second synthesized frame is sent to the second user terminal without being sent to the first user terminal.

Description

Video processing method and computing device
The present application is a divisional application of Chinese patent application No. 201811563673.2, entitled "A video processing method and television", filed on December 20, 2018.
Technical Field
The present application relates to the technical field of video image stitching, and in particular to a video processing method and a computing device.
Background
The interactive capability brought by cameras enriches television content with new services and experiences. How different users can interact in various ways through a smart television is a current demand: it enhances the immersive experience and draws people closer together. For example, when the weather outside is poor and a user exercises at home, natural interaction with others may need to be realized through the smart television, but the prior art cannot yet achieve this.
Disclosure of Invention
The embodiments of the present application provide a video processing method and a television, which are used to realize video synthesis across multiple user terminals and, through this synthesis, to meet the social interaction needs of different users on different terminals.
The video processing method provided by the embodiment of the application comprises the following steps:
receiving a first square dance video shot by the camera of the current user terminal, and extracting a first user image from the first square dance video; receiving a second square dance video shot by the camera of another user terminal, and extracting a second user image from the second square dance video, wherein the second square dance and the first square dance are the same square dance;
and generating a synthesized square dance video by using the first user image and the second user image, and outputting the synthesized square dance video to the current user terminal, wherein the synthesized square dance video is synchronously synthesized according to the square dance music.
With this method, a first square dance video shot by the camera of the current user terminal is received, and a first user image is extracted from it; a second square dance video shot by the camera of another user terminal is received, and a second user image is extracted from it, the second square dance being the same square dance as the first; a synthesized square dance video is generated using the first user image and the second user image and output to the current user terminal, synchronously synthesized according to the square dance music. Video synthesis across multiple user terminals is thereby realized, and the social interaction needs of different users on different terminals are met through video synthesis.
Optionally, the synthesized square dance video is output to the other user terminal.
Optionally, the composite square dance videos output to different user terminals are the same or different.
Optionally, the positions of the users in the video images in the synthesized square dance videos output to different user terminals are different.
Optionally, the first user image is located in a central region of an image in the composite square dance video output to the current user terminal.
With this method, the first user image is located in the central region of the image in the synthesized square dance video output to the current user terminal, so that each user appears in the central region on his or her own user terminal.
Optionally, the second user image is located in an adjacent region of a central region of an image in the composite square dance video output to the current user terminal.
Optionally, the background images in the composite square dance video output to different user terminals are different.
Optionally, the background image in the synthesized square dance video output to different user terminals is determined according to user instructions. With this method, the user can select the background image according to the dance or personal preference.
Optionally, in any frame image of the synthesized square dance video, the frame position of the first user image in the first square dance video precedes the frame position of the second user image in the second square dance video.
With this method, in any frame of the synthesized square dance video, the frame position of the first user image in the first square dance video precedes that of the second user image in the second square dance video, so that on each user terminal the lead dancer in the central region is one beat ahead of the other dancers.
Correspondingly, the embodiment of the application provides a television, including:
the receiving unit is used for receiving a first square dance video shot by the camera of the current user terminal and extracting a first user image from the first square dance video; and for receiving a second square dance video shot by the camera of another user terminal and extracting a second user image from the second square dance video, wherein the second square dance and the first square dance are the same square dance;
and the processing unit is used for generating a synthesized square dance video by using the first user image and the second user image and outputting the synthesized square dance video to the current user terminal, wherein the synthesized square dance video is synchronously synthesized according to square dance music.
An embodiment of the present application further provides a computing device, including:
a memory for storing program instructions;
and the processor is used for calling the program instructions stored in the memory and executing any one of the methods provided by the embodiment of the application according to the obtained program.
Another embodiment of the present application provides a computer storage medium having stored thereon computer-executable instructions for causing a computer to perform any one of the methods described above.
Drawings
In order to illustrate the technical solutions in the embodiments of the present application more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present application, and other drawings can be obtained by those skilled in the art from these drawings without creative effort.
Fig. 1 is a schematic view of a complete television set provided in an embodiment of the present application;
fig. 2 is a schematic diagram of a video processing method according to an embodiment of the present application;
fig. 3 is a schematic view of a multi-person interaction scene provided in an embodiment of the present application;
fig. 4 is a schematic diagram of user 1 at the lead dancer position according to an embodiment of the present application;
fig. 5 is a schematic diagram of a user 1 selecting a video background according to an embodiment of the present application;
fig. 6 is a schematic diagram of a television according to an embodiment of the present application;
fig. 7 is a schematic diagram of a television according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
Various embodiments of the present application will be described in detail below with reference to the accompanying drawings. It should be noted that the display sequence of the embodiment of the present application only represents the sequence of the embodiment, and does not represent the merits of the technical solutions provided by the embodiments.
Referring to fig. 1, an overall schematic view of the television provided in the embodiment of the present application: a square dance video is playing on the screen, and the camera on the television shoots the user. From the videos shot of different users, the television end synthesizes the videos of the different users dancing the same square dance and outputs the synthesized square dance video to the different user terminals.
Referring to fig. 2, a video processing method provided in an embodiment of the present application includes:
s101, receiving a first square dance video shot by a camera of a current user terminal, and extracting a first user image in the first square dance video; receiving a second square dance video shot by other user terminal cameras, and extracting a second user image in the second square dance video, wherein the second square dance and the first square dance are the same square dance;
for example, the user image is a user contour image after removing a background in the video.
S102, generating a synthesized square dance video by using the first user image and the second user image, and outputting the synthesized square dance video to the current user terminal, wherein the synthesized square dance video is synchronously synthesized according to square dance music.
For example, the user terminal is a television; the synthesized square dance video is a multi-person video in which a plurality of users dance the same dance together.
Optionally, the synthesized square dance video is output to the other user terminal.
Optionally, the composite square dance videos output to different user terminals are the same or different.
For example, the dance video output to the television of family A is the same as, or different from, the dance video output to the television of family B; user 1 is located in the central area of family A's television screen (i.e., the lead dancer position) and user 2 is located in the central area of family B's television screen.
Optionally, the positions of the users in the video images in the synthesized square dance videos output to different user terminals are different.
For example, user 1 in the video output to the television of family a is located in the central region of the screen, and user 1 in the video output to the television of family B is located in the adjacent region of the central region of the screen.
Optionally, the first user image is located in a central region of an image in the composite square dance video output to the current user terminal.
For example, on user 1's television, user 1 is located in the center area of the image in the video of the television; on user 2's television, user 2 is located in the central region of the image in the video of the television.
Optionally, the second user image is located in an adjacent region of a central region of an image in the composite square dance video output to the current user terminal.
For example, on user 1's television, other dancers are located in the neighborhood of user 1.
Optionally, the background images in the composite square dance video output to different user terminals are different.
For example, the background image in the video output to the television of family A is a lakeside, and the background image in the video output to the television of family B is People's Square.
Optionally, the background image in the synthesized square dance video output to different user terminals is determined according to user instructions.
For example, on the television of user 1, user 1 selects West Lake as the background image of the video, so the background image in the video output to user 1's television is West Lake; on the television of user 2, user 2 selects People's Square as the background image, so the background image in the video output to user 2's television is People's Square.
Optionally, in any frame image in the synthesized square dance video, the frame position of the first user image in the first square dance video is prior to the frame position of the second user image in the second square dance video.
For example, on user 1's television, user 1's dance movements are one beat faster than the movements of the other dancers. A video is composed of frame images, and because the different users dance the same dance, corresponding frames of the videos shot at different terminals show the same movement; therefore, if the 60th image frame of the first square dance video is combined with the 1st image frame of the second square dance video, the dance movements of user 1 in the synthesized square dance video are one beat ahead of the other dancers' movements.
As shown in fig. 3, a schematic view of the multi-person interactive scene provided in this embodiment of the present application, a camera shoots the dancer in front of each television. For example, user 1 is at home A, user 2 is at home B, user 3 is at home C, and all three users have chosen to dance the same dance, so the camera of home A's television shoots user 1, the camera of home B's television shoots user 2, and the camera of home C's television shoots user 3.
The shot videos are spliced together so that the users feel they are dancing together interactively. The specific implementation is as follows:
a key control method (key control is a picture of two video signal input sources, a basic switching mode in the switching process) is adopted, different part parameters (such as brightness and chroma) in one video signal are utilized, high/low binary key control signals are formed through processing (the high/low binary key control signals are obtained by dividing one path of video to form a foreground image signal and a background image signal), a color background is removed through calculation to obtain a video with a transparent background, and then the video with the transparent background is cut to obtain the outline range of a dancer in the video.
The contour data in the videos are then spliced: the dance areas of the different dancers are calculated and stitched into the same video picture.
In the synthesized video, the lead dancer differs between families: on the television of family A, user 1 is displayed at the lead dancer position, with the other dancers at other positions of the video picture; on the television of family B, user 2 is displayed at the lead dancer position, with the other dancers at other positions of the video picture; on the television of family C, user 3 is displayed at the lead dancer position, with the other dancers at other positions of the video picture.
As shown in fig. 4, a schematic diagram of user 1 at the lead dancer position provided in the embodiment of the present application, and taking user 1 at the lead dancer position on the television of family A as an example, the specific implementation is as follows:
After the dancers' contour signals are acquired, the width and height of each dancer's contour signal are recorded in turn. For example, the width of user 1's contour signal is signalWidth1 and its height is signalHeight1; the width of the second dancer's contour signal is signalWidth2 and its height is signalHeight2; the screen width of the television is screenWidth and its height is screenHeight. User 1's contour signal is positioned at the center of the television screen by setting the x-axis and y-axis coordinates of the contour signal:
the x-axis coordinate x0 of user 1's contour signal is
x0 = (screenWidth - signalWidth1) / 2
and the y-axis coordinate y0 is
y0 = -(screenHeight - signalHeight1) / 2
(values are positive rightward from the top-left corner and negative downward, depending on the screen coordinate axes). Coordinates x0 and y0 are fed into the stitching control processor for stitching, so that user 1's contour signal is stitched to the center of the video.
The second dancer is spliced next. For example, the second dancer is to be placed to the left of user 1 (from the perspective of the viewer watching the television), with an edge gap of space between the second dancer and user 1 (space is the distance between the second dancer's rightmost edge and user 1's leftmost edge) and a center-point gap of space1 (space1 is the distance between the second dancer's center point and user 1's center point on the y-axis). The x-axis coordinate of the second dancer's contour signal is therefore x1 = x0 - space - signalWidth2, and the y-axis coordinate is y1 = y0 + space1. Coordinates x1 and y1 are transmitted to the stitching control processor, so that the second dancer's contour signal is stitched to the left of user 1.
By analogy, the x-axis and y-axis coordinate values of the third dancer, the fourth dancer, and so on are calculated in turn and transmitted to the stitching control processor in sequence for stitching, as sketched below.
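For illustration, the coordinate arithmetic above can be sketched as follows; the variable names follow the text, the y-axis sign follows the convention noted above, and the stitching control processor is treated as an assumed external interface:

```python
# Illustrative sketch of the placement arithmetic; names follow the text
# (signalWidth1 -> sig_w1, etc.).
def lead_dancer_coords(screen_w, screen_h, sig_w1, sig_h1):
    """Center the lead dancer's contour signal on the screen."""
    x0 = (screen_w - sig_w1) / 2
    y0 = -(screen_h - sig_h1) / 2    # sign depends on the screen's y-axis direction
    return x0, y0

def second_dancer_coords(x0, y0, sig_w2, space, space1):
    """Place the second dancer to the left of the lead dancer (viewer's view)."""
    x1 = x0 - space - sig_w2         # rightmost edge sits `space` left of the lead dancer
    y1 = y0 + space1                 # center points differ by `space1` on the y-axis
    return x1, y1

# Example: x0, y0 = lead_dancer_coords(1920, 1080, 300, 700)  -> (810.0, -190.0)
#          x1, y1 = second_dancer_coords(x0, y0, 280, 40, 0)  -> (490.0, -190.0)
```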
In the synthesized video, each family's lead dancer is one beat faster than the other dancers: on the television of family A, user 1 is displayed at the lead dancer position, the other dancers are at other positions of the video picture, and user 1 is one beat faster than the other dancers; on the television of family B, user 2 is displayed at the lead dancer position, the other dancers are at other positions of the video picture, and user 2 is one beat faster than the other dancers; on the television of family C, user 3 is displayed at the lead dancer position, the other dancers are at other positions of the video picture, and user 3 is one beat faster than the other dancers.
Taking user 1 at the lead dancer position on the television of family A as an example, the specific implementation is as follows:
for example, 3 paths of signals are transmitted to the server, the first path of signal is a signal of the user 1, when the 3 paths of signals are transmitted to the splicing control processor, for the first path of signal, each frame image of the video is taken to be spliced, and for the second path of signal and the third path of signal, only the first frame image of the video is taken to be transmitted to the splicing control processor within a time range of 2 seconds (not limited to 2 seconds, but also within other time ranges), and is spliced with the first path of signal, and the spliced composite frame image is returned after splicing; after 2 seconds, each frame image of the second path of signal and the third path of signal image is spliced with the first path of signal.
In addition, the dancers of the 3 signals need not start dancing at the same time. The second signal may be pre-stored on the server, and its dancer may have danced ahead of the first signal's dancer by some amount of time, such as minutes, hours, or days.
As shown in fig. 5, a schematic diagram of user 1 selecting a video background provided in the embodiment of the present application: in the synthesized video, the user may select a scene (i.e., the background of the synthesized video) according to the dance or personal preference, and the selected scene replaces the real scene.
for example, if the user 1 selects a west lakeside, the background of dancing on the television of the family a is the west lakeside; user 2 selects the people square, and the background of dancing on the television of family B is the people square. The specific implementation method comprises the following steps:
and by adopting a keying method, removing the default background image in the synthesized video to obtain a video picture with a transparent background, and combining the background image selected by the user with the video picture with the transparent background through keying, so that the background in the video on the television is the background selected by the user.
Accordingly, referring to fig. 6, an embodiment of the present application provides a television, including:
the receiving unit 11 is used for receiving a first square dance video shot by the camera of the current user terminal and extracting a first user image from the first square dance video; and for receiving a second square dance video shot by the camera of another user terminal and extracting a second user image from the second square dance video, wherein the second square dance and the first square dance are the same square dance;
and the processing unit 12 is configured to generate a synthesized video of the square dance by using the first user image and the second user image, and output the synthesized video of the square dance to the current user terminal, wherein the synthesized video of the square dance is synchronously synthesized according to the square dance music.
Referring to fig. 7, an embodiment of the present application further provides a television, including:
the processor 600, for reading the program in the memory 610, executes the following processes:
receiving a first square dance video shot by the camera of the current user terminal, and extracting a first user image from the first square dance video; receiving a second square dance video shot by the camera of another user terminal, and extracting a second user image from the second square dance video, wherein the second square dance and the first square dance are the same square dance;
and generating a synthesized square dance video by using the first user image and the second user image, and outputting the synthesized square dance video to the current user terminal, wherein the synthesized square dance video is synchronously synthesized according to square dance music.
Through this television, a first square dance video shot by the camera of the current user terminal is received and a first user image is extracted from it; a second square dance video shot by the camera of another user terminal is received and a second user image is extracted from it, the second square dance being the same square dance as the first; a synthesized square dance video is generated using the first user image and the second user image and output to the current user terminal, synchronously synthesized according to the square dance music. Video synthesis across multiple user terminals is thereby realized, and the social interaction needs of different users on different terminals are met through video synthesis.
Optionally, the synthesized square dance video is output to the other user terminal.
Optionally, the composite square dance videos output to different user terminals are the same or different.
Optionally, the positions of the users in the video images in the synthesized square dance videos output to different user terminals are different.
Optionally, the first user image is located in a central region of an image in the composite square dance video output to the current user terminal.
Optionally, the second user image is located in an adjacent region of a central region of an image in the composite square dance video output to the current user terminal.
Optionally, the background images in the composite square dance video output to different user terminals are different.
Optionally, the background image in the synthesized square dance video output to different user terminals is determined according to user instructions. With this method, the user can select the background image according to the dance or personal preference.
Optionally, in any frame image of the synthesized square dance video, the frame position of the first user image in the first square dance video precedes the frame position of the second user image in the second square dance video.
In fig. 7, the bus architecture may include any number of interconnected buses and bridges, linking together various circuits, including one or more processors represented by the processor and memories represented by the memory. The bus architecture may also link together various other circuits, such as peripherals, voltage regulators, and power management circuits, which are well known in the art and therefore not described further herein. The bus interface provides an interface.
The embodiment of the application provides a display terminal, which may be specifically a desktop computer, a portable computer, a smart phone, a tablet computer, a Personal Digital Assistant (PDA), and the like. The Display terminal may include a Central Processing Unit (CPU), a memory, an input/output device, etc., the input device may include a keyboard, a mouse, a touch screen, etc., and the output device may include a Display device, such as a Liquid Crystal Display (LCD), a Cathode Ray Tube (CRT), etc.
For different display terminals, the user interface 620 may optionally be an interface capable of interfacing with a desired device, including but not limited to a keypad, display, speaker, microphone, joystick, etc.
The processor is responsible for managing the bus architecture and the usual processing, and the memory may store data used by the processor in performing operations.
Alternatively, the processor may be a CPU (central processing unit), an ASIC (Application Specific Integrated Circuit), an FPGA (Field Programmable Gate Array), or a CPLD (Complex Programmable Logic Device).
The memory may include Read Only Memory (ROM) and Random Access Memory (RAM), and provides the processor with program instructions and data stored in the memory. In the embodiments of the present application, the memory may be used for storing a program of any one of the methods provided by the embodiments of the present application.
The processor is used for executing any one of the methods provided by the embodiment of the application according to the obtained program instructions by calling the program instructions stored in the memory.
Embodiments of the present application provide a computer storage medium for storing computer program instructions for an apparatus provided in the embodiments of the present application, which includes a program for executing any one of the methods provided in the embodiments of the present application.
The computer storage media may be any available media or data storage device that can be accessed by a computer, including, but not limited to, magnetic memory (e.g., floppy disks, hard disks, magnetic tape, magneto-optical disks (MOs), etc.), optical memory (e.g., CDs, DVDs, BDs, HVDs, etc.), and semiconductor memory (e.g., ROMs, EPROMs, EEPROMs, non-volatile memory (NAND FLASH), Solid State Disks (SSDs)), etc.
In summary, the video processing method and the television set provided by the embodiment of the application realize video synthesis of multiple user terminals, and meet the socialized interaction requirements of different users through different terminals through video synthesis.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present application without departing from the spirit and scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims of the present application and their equivalents, the present application is intended to include such modifications and variations as well.

Claims (8)

1. A method of video processing, the method comprising:
extracting a first user image from a received first video frame of a first square dance video stream uploaded by a first user terminal; extracting a second user image from a received second video frame of a second square dance video stream uploaded by a second user terminal, wherein the second square dance video stream and the first square dance video stream use the same square dance music, and the time difference between the moment of the second video frame in the second square dance video stream and the moment of the first video frame in the first square dance video stream is within a preset time range;
synthesizing a first synthesized frame and a second synthesized frame according to the first user image and the second user image, wherein the relative position of the first user image in the first synthesized frame is different from the relative position of the first user image in the second synthesized frame, and the relative position of the second user image in the first synthesized frame is different from the relative position of the second user image in the second synthesized frame;
and sending the first synthesized frame to the first user terminal without sending the first synthesized frame to the second user terminal, and sending the second synthesized frame to the second user terminal without sending the second synthesized frame to the first user terminal.
2. The method of claim 1,
in the first synthesized frame, the first user image is located in the central region of the first synthesized frame, and the second user image is located in an adjacent region of the central region of the first synthesized frame;
in the second synthesized frame, the second user image is located in the central region of the second synthesized frame, and the first user image is located in an adjacent region of the central region of the second synthesized frame.
3. The method of claim 1,
the moment, in the first square dance video stream, of the first video frame used to synthesize the first synthesized frame is earlier than the moment, in the second square dance video stream, of the second video frame used to synthesize the first synthesized frame;
the moment, in the first square dance video stream, of the first video frame used to synthesize the second synthesized frame is later than the moment, in the second square dance video stream, of the second video frame used to synthesize the second synthesized frame.
4. The method of claim 1,
in the first synthesized frame, the motion of the first user image precedes that of the second user image;
in the second synthesized frame, the motion of the second user image precedes that of the first user image.
5. A method of video processing, the method comprising:
extracting a first user image from a first video frame of a first square dance video stream of a first user terminal;
extracting a second user image from a second video frame of a second square dance video stream of a second user terminal, wherein the second square dance video stream and the first square dance video stream use the same square dance music, and the time difference between the moment of the second video frame in the second square dance video stream and the moment of the first video frame in the first square dance video stream is within a preset time range;
and synthesizing a first synthesized frame according to the first user image and the second user image, wherein the first user image is located at the lead dancer position in the first synthesized frame, the second user image is located at a position other than the lead dancer position in the first synthesized frame, and the first synthesized frame is used for display at the first user terminal and is not used for display at the second user terminal.
6. A method of video processing, the method comprising:
extracting a first user image from a first video frame of a first square dance video stream of a first user terminal;
extracting a second user image from a second video frame of a second square dance video stream of a second user terminal, wherein the second square dance video stream and the first square dance video stream use the same square dance music, and the moment of the first video frame in the first square dance video stream is earlier than the moment of the second video frame in the second square dance video stream;
and synthesizing a first synthesized frame according to the first user image and the second user image, wherein the first synthesized frame is used for display at the first user terminal and is not used for display at the second user terminal.
7. A method of video processing, the method comprising:
extracting a first user image from a first video frame of a first square dance video stream of a first user terminal;
extracting a second user image from a second video frame of a second square dance video stream of a second user terminal, wherein the second square dance video stream and the first square dance video stream use the same square dance music, and the moment of the first video frame in the first square dance video stream is earlier than the moment of the second video frame in the second square dance video stream;
and synthesizing a first synthesized frame according to the first user image and the second user image, wherein the first user image is located at the lead dancer position in the first synthesized frame, the second user image is located at a position other than the lead dancer position in the first synthesized frame, and the first synthesized frame is used for display at the first user terminal and is not used for display at the second user terminal.
8. A computing device, comprising:
a memory for storing program instructions;
a processor for calling the program instructions stored in the memory and executing the method of any one of claims 1 to 7 according to the obtained program.
CN202110411421.3A 2018-12-20 2018-12-20 Video processing method and computing device Pending CN113115108A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110411421.3A CN113115108A (en) 2018-12-20 2018-12-20 Video processing method and computing device

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201811563673.2A CN109743625A (en) 2018-12-20 2018-12-20 A kind of method for processing video frequency and television set
CN202110411421.3A CN113115108A (en) 2018-12-20 2018-12-20 Video processing method and computing device

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
CN201811563673.2A Division CN109743625A (en) 2018-12-20 2018-12-20 A kind of method for processing video frequency and television set

Publications (1)

Publication Number Publication Date
CN113115108A true CN113115108A (en) 2021-07-13

Family

ID=66360697

Family Applications (2)

Application Number Title Priority Date Filing Date
CN202110411421.3A Pending CN113115108A (en) 2018-12-20 2018-12-20 Video processing method and computing device
CN201811563673.2A Pending CN109743625A (en) 2018-12-20 2018-12-20 A kind of method for processing video frequency and television set

Family Applications After (1)

Application Number Title Priority Date Filing Date
CN201811563673.2A Pending CN109743625A (en) 2018-12-20 2018-12-20 A kind of method for processing video frequency and television set

Country Status (2)

Country Link
CN (2) CN113115108A (en)
WO (1) WO2020125009A1 (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113115108A (en) * 2018-12-20 2021-07-13 聚好看科技股份有限公司 Video processing method and computing device
CN110266968B (en) * 2019-05-17 2022-01-25 小糖互联(北京)网络科技有限公司 Method and device for making dancing video
CN112423015B (en) * 2020-11-20 2023-03-03 广州欢网科技有限责任公司 Cloud dancing method and system

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1288635A (en) * 1998-01-16 2001-03-21 洛桑联邦综合工科学校 Method and system for combining video sequences with spacio-temporal alignment
US20140298392A1 (en) * 2013-03-26 2014-10-02 Sony Corporation Image processing device, image processing method, and computer program
CN104469441A (en) * 2014-11-21 2015-03-25 天津思博科科技发展有限公司 Group dancing device realized through intelligent terminal and internet technology
CN106210599A (en) * 2015-04-30 2016-12-07 中兴通讯股份有限公司 A kind of many picture adjusting methods, device and multipoint control unit
CN106231368A (en) * 2015-12-30 2016-12-14 深圳超多维科技有限公司 Main broadcaster's class interaction platform stage property rendering method and device, client
CN106534954A (en) * 2016-12-19 2017-03-22 广州虎牙信息科技有限公司 Information interaction method and device based on live broadcast video streams and terminal device
CN107682656A (en) * 2017-09-11 2018-02-09 广东欧珀移动通信有限公司 Background image processing method, electronic equipment and computer-readable recording medium

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7843510B1 (en) * 1998-01-16 2010-11-30 Ecole Polytechnique Federale De Lausanne Method and system for combining video sequences with spatio-temporal alignment
CN102158755A (en) * 2010-09-02 2011-08-17 青岛海信传媒网络技术有限公司 Karaoke supporting method for set-top box, set-top box, server and system
CN106162221A (en) * 2015-03-23 2016-11-23 阿里巴巴集团控股有限公司 The synthetic method of live video, Apparatus and system
CN113115108A (en) * 2018-12-20 2021-07-13 聚好看科技股份有限公司 Video processing method and computing device

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1288635A (en) * 1998-01-16 2001-03-21 洛桑联邦综合工科学校 Method and system for combining video sequences with spacio-temporal alignment
US20140298392A1 (en) * 2013-03-26 2014-10-02 Sony Corporation Image processing device, image processing method, and computer program
CN104469441A (en) * 2014-11-21 2015-03-25 天津思博科科技发展有限公司 Group dancing device realized through intelligent terminal and internet technology
CN106210599A (en) * 2015-04-30 2016-12-07 中兴通讯股份有限公司 A kind of many picture adjusting methods, device and multipoint control unit
CN106231368A (en) * 2015-12-30 2016-12-14 深圳超多维科技有限公司 Main broadcaster's class interaction platform stage property rendering method and device, client
CN106534954A (en) * 2016-12-19 2017-03-22 广州虎牙信息科技有限公司 Information interaction method and device based on live broadcast video streams and terminal device
CN107682656A (en) * 2017-09-11 2018-02-09 广东欧珀移动通信有限公司 Background image processing method, electronic equipment and computer-readable recording medium

Also Published As

Publication number Publication date
WO2020125009A1 (en) 2020-06-25
CN109743625A (en) 2019-05-10

Similar Documents

Publication Publication Date Title
CN109064390B (en) Image processing method, image processing device and mobile terminal
US9485493B2 (en) Method and system for displaying multi-viewpoint images and non-transitory computer readable storage medium thereof
CN110636354A (en) Display device
CN110070496B (en) Method and device for generating image special effect and hardware device
CN111757175A (en) Video processing method and device
US11842425B2 (en) Interaction method and apparatus, and electronic device and computer-readable storage medium
CN113115108A (en) Video processing method and computing device
CN111970532A (en) Video playing method, device and equipment
TW201438463A (en) Techniques for adding interactive features to videos
CN111327940A (en) Video playing method, device and system
CN113473207B (en) Live broadcast method and device, storage medium and electronic equipment
CN103997687A (en) Techniques for adding interactive features to videos
CN113014801B (en) Video recording method, video recording device, electronic equipment and medium
CN110505406A (en) Background-blurring method, device, storage medium and terminal
CN114615513A (en) Video data generation method and device, electronic equipment and storage medium
CN113342248A (en) Live broadcast display method and device, storage medium and electronic equipment
CN114697703B (en) Video data generation method and device, electronic equipment and storage medium
CN112714305A (en) Presentation method, presentation device, presentation equipment and computer-readable storage medium
CN112906553B (en) Image processing method, apparatus, device and medium
CN114741559A (en) Method, apparatus and storage medium for determining video cover
WO2024104333A1 (en) Cast picture processing method and apparatus, electronic device, and storage medium
CN114445600A (en) Method, device and equipment for displaying special effect prop and storage medium
CN113379866A (en) Wallpaper setting method and device
CN111667313A (en) Advertisement display method and device, client device and storage medium
KR102593043B1 (en) Augmented reality-based display methods, devices, storage media, and program products

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination