CN112653851A - Video processing method and device and electronic equipment - Google Patents


Info

Publication number
CN112653851A
CN112653851A (application CN202011534957.6A)
Authority
CN
China
Prior art keywords
video data
video
background image
frame
video frame
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011534957.6A
Other languages
Chinese (zh)
Inventor
李国林
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Vivo Mobile Communication Co Ltd
Original Assignee
Vivo Mobile Communication Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Vivo Mobile Communication Co Ltd filed Critical Vivo Mobile Communication Co Ltd
Priority to CN202011534957.6A priority Critical patent/CN112653851A/en
Publication of CN112653851A publication Critical patent/CN112653851A/en
Pending legal-status Critical Current

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N5/00Details of television systems
    • H04N5/222Studio circuitry; Studio devices; Studio equipment
    • H04N5/262Studio circuits, e.g. for mixing, switching-over, change of character of image, other special effects ; Cameras specially adapted for the electronic generation of special effects
    • H04N5/272Means for inserting a foreground image in a background image, i.e. inlay, outlay
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L65/00Network arrangements, protocols or services for supporting real-time applications in data packet communication
    • H04L65/80Responding to QoS
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/21Server components or server architectures
    • H04N21/218Source of audio or video content, e.g. local disk arrays
    • H04N21/2187Live feed
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/234Processing of video elementary streams, e.g. splicing of video streams, manipulating MPEG-4 scene graphs
    • H04N21/23424Processing of video elementary streams, e.g. splicing of video streams, manipulating MPEG-4 scene graphs involving splicing one content stream with another content stream, e.g. for inserting or substituting an advertisement
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs
    • H04N21/44016Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs involving splicing one content stream with another content stream, e.g. for substituting a video clip

Abstract

The application discloses a video processing method and device and electronic equipment, belonging to the field of communication technology. The video processing method comprises the following steps: acquiring first video data; extracting a target foreground image from at least some of the video frames in the first video data, and generating second video data based on the target foreground image, wherein the second video data comprises second video frames representing the target foreground image; and sending the second video data to an opposite end so that the opposite end generates third video data based on the second video data, wherein each video frame of the third video data is synthesized from a second video frame and a background image extracted from complete video data. The disclosed video processing method effectively reduces the size of the data actually uploaded, thereby lowering the demand on the uplink network and occupying fewer network resources; high-bit-rate uploading can still be achieved when the network is unstable, which helps improve picture clarity and fluency at the playing end.

Description

Video processing method and device and electronic equipment
Technical Field
The application belongs to the technical field of communication, and particularly relates to a video processing method and device and electronic equipment.
Background
Uploading video occupies considerable network resources. In the process of implementing the present application, the inventor found at least the following problems in the prior art: stable and fast video uploading places high demands on network conditions, and when the network environment is poor, uploading takes a long time; in a live video scene, when the network environment of the broadcast end is poor, the clarity of the picture played at the playing end (the audience side) decreases, and phenomena such as stuttering and blurring may occur.
Summary of the application
The embodiment of the application aims to provide a video processing method, which can solve the problem that video uploading is difficult when a network environment is poor.
In order to solve the technical problem, the present application is implemented as follows:
in a first aspect, an embodiment of the present application provides a video processing method, where the method includes:
acquiring first video data;
extracting a target foreground image of at least part of video frames in the first video data, and generating second video data based on the target foreground image, wherein the second video data comprises a second video frame for representing the target foreground image;
sending the second video data to an opposite end so that the opposite end generates third video data based on the second video data, wherein a video frame of the third video data is synthesized from the second video frame and a background image extracted from complete video data, and the video frames of the complete video data comprise a foreground image and a background image.
In a second aspect, an embodiment of the present application provides another video processing method, including:
receiving second video data, wherein the second video data comprises a second video frame used for representing a target foreground image;
and synthesizing the second video frame in the second video data and a background image extracted from the complete video data into a video frame of third video data to obtain the third video data, wherein the video frame of the complete video data comprises a foreground image and a background image.
In a third aspect, an embodiment of the present application provides a video processing apparatus, including:
the first acquisition module is used for acquiring first video data;
the first processing module is used for extracting a target foreground image of at least part of video frames in the first video data and generating second video data based on the target foreground image, wherein the second video data comprises a second video frame used for representing the target foreground image;
a first sending module, configured to send the second video data to an opposite end, so that the opposite end generates third video data based on the second video data, where a video frame of the third video data is synthesized by the second video frame and a background image extracted from complete video data, and the video frame of the complete video data includes a foreground image and a background image.
In a fourth aspect, an embodiment of the present application provides another video processing apparatus, including:
a first receiving module, configured to receive second video data, where the second video data includes a second video frame used for representing a target foreground image;
the first synthesis module is configured to synthesize the second video frame in the second video data and a background image extracted from complete video data into a video frame of third video data to obtain the third video data, where the video frame of the complete video data includes a foreground image and a background image.
In a fifth aspect, the present application provides an electronic device, which includes a processor, a memory, and a program or instructions stored on the memory and executable on the processor, and when executed by the processor, the program or instructions implement the steps of the method according to the first aspect or the second aspect.
In a sixth aspect, embodiments of the present application provide a readable storage medium, on which a program or instructions are stored, which when executed by a processor implement the steps of the method according to the first or second aspect.
In a seventh aspect, an embodiment of the present application provides a chip, where the chip includes a processor and a communication interface, where the communication interface is coupled to the processor, and the processor is configured to execute a program or instructions to implement the method according to the first aspect or the second aspect.
In the embodiments of the application, the target foreground image is extracted from the first video data and the generated second video data is uploaded, which effectively reduces the size of the data actually uploaded, lowers the demand on the uplink network, and occupies fewer network resources. By synthesizing the uploaded second video data with the background image, video data with essentially unchanged picture quality can be provided to the playing end. In addition, in a live broadcast scene, high-bit-rate uploading can still be achieved when the network is unstable, which helps improve picture clarity and fluency at the playing end.
Drawings
Fig. 1 is a flowchart of a video processing method provided in an embodiment of the present application;
fig. 2 is a detailed flowchart of a video processing method according to an embodiment of the present application;
fig. 3 is a second detailed flowchart of a video processing method according to an embodiment of the present application;
fig. 4 is a second flowchart of a video processing method according to an embodiment of the present application;
fig. 5 is a third detailed flowchart of a video processing method according to an embodiment of the present application;
fig. 6 is one of the structural diagrams of a video processing apparatus according to an embodiment of the present application;
fig. 7 is a second structural diagram of a video processing apparatus according to an embodiment of the present application;
fig. 8 is a schematic structural diagram of an electronic device provided in an embodiment of the present application;
fig. 9 is a second hardware schematic diagram of an electronic device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some, but not all, embodiments of the present application. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The terms "first", "second", and the like in the description and claims of the present application are used to distinguish between similar objects and not necessarily to describe a particular order or sequence. It is to be understood that the terms so used are interchangeable under appropriate circumstances, so that the embodiments of the application can be implemented in orders other than those illustrated or described herein. In addition, "and/or" in the specification and claims means at least one of the connected objects, and the character "/" generally indicates that the related objects before and after it are in an "or" relationship.
In the prior art, every frame of a video must be uploaded, and when the frame rate and bit rate of the video are high, considerable network resources are occupied. In other words, stable and fast video uploading places high demands on network conditions.
In a live broadcast scene, the broadcast end needs to push live pictures to the audio/video server in real time for the playing end to download. When the network environment of the broadcast end is poor, the bit rate that can be uploaded per unit time decreases because of the adaptive bit rate used in live broadcasting, so the clarity of the picture played at the playing end decreases, phenomena such as stuttering and blurring occur, and the live broadcast effect and the commercial income of the live broadcast room are directly affected.
The video processing method, the video processing apparatus, the electronic device, and the readable storage medium provided in the embodiments of the present application are described in detail below with reference to the accompanying drawings through specific embodiments and application scenarios thereof.
The embodiment of the application provides a video processing method; the execution subject of the video processing method may be a broadcast end.
As shown in fig. 1, the video processing method includes: step 110, step 120 and step 130.
Step 110, obtaining first video data.
It is to be understood that the first video data comprises a plurality of video frames, each video frame of the first video data comprising a target foreground image and a background image other than the target foreground image.
In some embodiments, the first video data may be generated from video captured in real time by a video capture device at the broadcast end;
alternatively, the first video data may be video data generated from a video stored locally on the terminal;
alternatively, the first video data may be video data stored locally on the terminal;
alternatively, the first video data may be video data generated from a video stored in the cloud;
alternatively, the first video data may be video data stored in the cloud.
And 120, extracting a target foreground image of at least part of video frames in the first video data, and generating second video data based on the target foreground image.
The target foreground image may be a human image or an animal image.
For example, for a singing-and-dancing live broadcast, the target foreground image may be the anchor's portrait.
For an animal live broadcast, the target foreground image may be the corresponding animal image.
The first video data includes a plurality of video frames; the target foreground image may be extracted from each frame individually, or from only some of the frames.
For the first video data, the number of target foreground images may be equal to the number of video frames on which extraction is performed.
Correspondingly, second video data may be generated based on the extracted target foreground image. The second video data includes second video frames representing the target foreground image; since the background in each second video frame has been processed, the second video data occupies less space than the corresponding first video data.
This step can be implemented as follows:
dividing N video frames in the first video data into an original background image and a target foreground image, wherein N is a positive integer and is not more than the number of the video frames in the first video data;
and processing the original background image to obtain an updated image comprising the target foreground image and the processed original background image, and generating second video data, wherein the space occupied by the updated image is smaller than that of the corresponding video frame.
It can be understood that the original background image in the video frame is processed to obtain the updated image corresponding to the video frame, and thus, the space occupied by the second video data including the updated image is smaller than that occupied by the original first video data.
In practical implementation, the target foreground image may be segmented from the video frame by an image recognition segmentation technique.
For example, for a live broadcast scene, an AI portrait recognition and segmentation technology can be used to segment the anchor's portrait to obtain the target foreground image, and color blocks are then used to fill the background of the target foreground image, obtaining an updated image of the same size as the original video frame. The color blocks may be a single color or multiple colors, such as grey-white.
Or, after the video frame is divided into the original background image and the target foreground image, the original background image is directly replaced by other backgrounds.
All of the above processing modes must achieve the goal that the updated image occupies less space than the corresponding video frame, and the boundary between the target foreground image and the new background in the updated image must be clear, so that the target foreground image is easy to extract and synthesize later.
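As a concrete illustration of the segmentation-and-fill step above, here is a minimal Python sketch. It assumes a binary foreground mask is already available (for example from a portrait segmentation model); the nested-list image format, the mask representation, and the grey fill color are illustrative simplifications, not part of the patent.

```python
# Simplified sketch: replace the original background with a solid color
# block so the "updated image" compresses far better than the original
# frame. Images are nested lists of RGB tuples; a real implementation
# would use a segmentation network plus numpy/OpenCV arrays instead.

FILL_COLOR = (128, 128, 128)  # illustrative grey "color block"

def make_updated_image(frame, foreground_mask):
    """Keep foreground pixels, fill background pixels with FILL_COLOR.

    frame: list of rows of (r, g, b) tuples.
    foreground_mask: same shape, True where the pixel belongs to the
    target foreground image (e.g. the anchor's portrait).
    """
    return [
        [pixel if is_fg else FILL_COLOR
         for pixel, is_fg in zip(row, mask_row)]
        for row, mask_row in zip(frame, foreground_mask)
    ]
```

Because the filled background is a single flat color with a clean boundary, the updated image both compresses well and remains easy to separate again at the opposite end.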
N may be equal to the number of video frames in the first video data, or N may be less than the number of video frames in the first video data.
When N is less than the number of video frames in the first video data, video frames may be selected at regular intervals in time order for extracting the target foreground image. In this way, the continuity of the resulting second video data is better.
Specifically, the value of N is positively correlated with the uplink rate.
In other words, the lower the current uplink rate, the smaller the value of N, so the fewer frames the generated second video data contains, which facilitates uploading; the higher the current uplink rate, the larger the value of N, so the more frames the generated second video data contains, the higher the frame rate of the finally synthesized third video data, and the smoother the video.
Of course, N may also take a fixed value, such as a preset value.
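The positive correlation between N and the uplink rate could look like the following sketch. The rate thresholds and fractions are invented for illustration; the patent states only that N is positively correlated with the uplink rate, or may be a fixed preset value.

```python
def choose_frame_count(uplink_rate_mbps, total_frames, fixed_n=None):
    """Pick N, the number of frames to process, rising with uplink rate.

    The thresholds and fractions below are illustrative assumptions.
    fixed_n models the "preset value" alternative from the text.
    """
    if fixed_n is not None:
        # N can never exceed the number of frames actually available.
        return min(fixed_n, total_frames)
    if uplink_rate_mbps < 2:
        fraction = 0.25   # weak uplink: keep few frames, ease uploading
    elif uplink_rate_mbps < 8:
        fraction = 0.5
    else:
        fraction = 1.0    # strong uplink: keep all frames for smoothness
    return max(1, int(total_frames * fraction))
```

The selected N frames would then be spaced at regular intervals in time order, as described above, to preserve the continuity of the second video data.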
And step 130, sending the second video data to the opposite terminal so that the opposite terminal generates third video data based on the second video data.
It should be noted that the second video data is used to synthesize the third video data with the background image extracted from the complete video data.
The opposite terminal may be a server, such as a network side device; or the opposite end may be an electronic device of the receiving end, such as a playing end. The following embodiments are described with the peer as the server.
The complete video data is the video data to be uploaded that was acquired at a certain moment or during a certain period; its images include the real background image, and the third video data containing the real background image can be obtained by synthesizing this background image with the second video data.
In other words, the third video data can be obtained by synthesizing the real background image with each second video frame in the second video data.
When the real background image is synthesized with the second video frame, the real background image needs to replace the background in the second video frame, so that the real background image is synthesized with the target foreground image.
According to the video processing method provided by the embodiment of the application, the target foreground image is extracted from the first video data and the generated second video data is uploaded, which effectively reduces the size of the data actually uploaded, lowers the demand on the uplink network, and occupies fewer network resources. In a live broadcast scene, high-bit-rate uploading can still be achieved when the network is unstable, which helps improve picture clarity and fluency at the playing end.
In some embodiments, the step 120 of extracting a target foreground image of at least a portion of the video frames in the first video data includes:
and under the condition that the uplink rate is lower than the target value and the time length from the last time of sending the complete video data to the opposite terminal is less than the target time length, extracting the target foreground image of at least part of the video frames in the first video data.
In other words, the target foreground image of at least some of the video frames in the first video data is extracted when the uplink rate is lower than the target value and the time elapsed since complete video data was last uploaded is less than the target duration.
And generating second video data in a mode of extracting a target foreground image under the condition that the uplink rate is lower than the target value and the time length from the last time of uploading the complete video data is less than the target time length, so that the video uploading requirement under the condition of unstable network is met.
Particularly in a live broadcast scene, the broadcast end needs to push live pictures to the network side device (which may be an audio/video server) in real time. When the broadcast end has a weak network or an unstable uplink, reducing the size of the actually uploaded video data reduces the dependence on the uplink network while leaving the picture quality at the playing end essentially unaffected, improving user experience.
It should be noted that the target value may be determined based on at least one of the following factors:
a11, size of the first video data;
a12, the maximum code rate supported by the network side equipment;
a13, the resolution supported by the playout end, etc.
In a specific embodiment, the target value may be set to 8Mb/s-15Mb/s, such as 10 Mb/s.
In some embodiments, the video processing method may further include the following step: uploading the newly acquired complete video data when the uplink rate is lower than the target value and the time elapsed since complete video data was last uploaded is not less than the target duration.
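The conditions above can be sketched as a small decision function. The parameter names and return values are illustrative assumptions; the patent describes only the conditions, not an API.

```python
def decide_upload_mode(uplink_rate, target_rate,
                       seconds_since_full, target_interval):
    """Decide whether to upload complete video data or foreground-only
    (second) video data, mirroring the conditions described above.

    uplink_rate / target_rate: current and target uplink rates (Mb/s).
    seconds_since_full / target_interval: time since complete video data
    was last uploaded, and the target duration.
    """
    if uplink_rate >= target_rate:
        # Uplink is strong enough: upload complete video data directly.
        return "complete"
    if seconds_since_full >= target_interval:
        # A complete upload is overdue: refresh the real background.
        return "complete"
    # Weak uplink and a recent complete upload: send foreground only.
    return "foreground_only"
```

Transmitting complete data whenever the interval elapses keeps the background at the opposite end fresh even while the uplink stays weak.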
In this video processing method, for the acquired first video data, complete video data is uploaded at intervals; in other words, video data formed from images with the original foreground and original background is uploaded directly, without extracting the target foreground image and generating second video data, while at least some of the other video frames are uploaded by extracting the target foreground image and generating second video data.
It can be understood that, compared with the originally acquired first video data, the third video data synthesized from the second video data and the background image still differs somewhat in fluency and clarity; transmitting complete video data at intervals reduces the impact on the viewing experience of users at the playing end as much as possible.
The number of video frames in the new complete video data is positively correlated with the uplink rate.
In other words, the lower the current uplink rate, the smaller the number of video frames in the new complete video data, which reduces the upload load on the network and makes uploading easier; the higher the current uplink rate, the larger the number of video frames in the new complete video data, so the proportion of original images in the whole video is higher and the viewing effect is better.
Of course, from the perspective of saving network resources, complete video data may never be uploaded; that is, for all collected video data, at least some of the video frames are selected, the target foreground image is extracted, and the second video data is generated and then sent to the opposite end.
The following specifically describes the embodiments of the present application from three different implementation perspectives.
Firstly, the terminal extracts a background image and uploads the background image to network side equipment.
In this case, the video processing method may further include the steps of:
extracting a background image of at least one video frame in the first video data;
the second video data further comprises a background image, and the sending of the second video data to the opposite end to enable the opposite end to generate third video data based on the second video data comprises:
and sending the second video data to the opposite terminal so that the opposite terminal synthesizes the second video frame and the background image into a video frame of the third video data.
It can be understood that, for the first video data, in addition to extracting the target foreground image and generating second video data, the background image of at least one video frame of the first video data is also extracted. The resulting second video data therefore includes both the target foreground image and the background image; for example, the second video data may include 20 second video frames and 1 background image, and is still smaller than the first video data.
When the network side device receives the second video data, it can synthesize the second video frames and the background image to obtain the third video data, which is provided for the playing end to pull.
In this embodiment, the complete video data is the first video data.
In this way, computational resources of the network side device may be saved.
As shown in fig. 2, in this case, the specific implementation flow of the video processing method is as follows:
step 210, the broadcast end processes the acquired first video data into second video data and sends the second video data to the network side device, wherein the second video data comprises second video frames and a background image, the second video frames representing the target foreground image;
step 220, the network side device synthesizes the second video frame and the background image in the second video data into third video data, and sends the third video data to the playing end.
And secondly, the network side equipment extracts the background image from the second video data.
In this case, the background image used for synthesizing the third video data together with the second video frame is extracted by the network side device, and a stronger processing capability of the network side device can be effectively utilized.
The second video data includes a target video frame from the first video data together with the second video frames; each second video frame represents the target foreground image, and the target video frame is one video frame of the first video data; for example, a frame located in the middle of the time sequence may be selected, or the first or last frame may be chosen directly.
Generating second video data based on the target foreground image includes generating a second video frame based on the target foreground image, and generating the second video data based on the second video frame and the target video frame.
For example, the second video data may include 20 second video frames and 1 target video frame, and the second video data is also smaller than the first video data.
Sending the second video data to the opposite terminal so that the opposite terminal generates third video data based on the second video data, including:
and sending the second video data to the opposite terminal so that the opposite terminal synthesizes the background image extracted from the target video frame and the second video frame into a video frame of the third video data.
When the network side device receives the second video data, it can extract the background image from the target video frame and then synthesize the second video frames with the background image to obtain the third video data, which is provided for the playing end to pull.
In this embodiment, the complete video data is the target video frame determined from the first video data and stored in the second video data.
Therefore, the network side equipment can conveniently and rapidly locate the background image corresponding to the current second video data.
As shown in fig. 2, in this case, the specific implementation flow of the video processing method is as follows:
step 210, the broadcast end processes the acquired first video data into second video data and sends the second video data to the network side device, wherein the second video data comprises second video frames and a target video frame, the second video frames representing the target foreground image and the target video frame being one frame of the first video data;
step 220, the network side device extracts a background image from the target video frame, synthesizes the second video frame and the background image in the second video data into third video data, and sends the third video data to the playing end.
And thirdly, extracting the background image from other video data by the network side equipment.
In this case, the background image used to synthesize the third video data together with the second video data is extracted by the network side device, and the stronger processing capability of the network side device can be effectively utilized.
The second video data does not include a background image, and each video frame of the second video data is a second video frame used for representing a target foreground image.
Sending the second video data to the opposite terminal so that the opposite terminal generates third video data based on the second video data, including:
Sending the second video data to the opposite terminal, so that the opposite terminal extracts the background image from the complete video data received last time and synthesizes the second video frame and the background image into a video frame of the third video data.
The background image may be extracted from the complete image of the previous frame, so that the third video data obtained by synthesis has better consistency with the previous video data.
In this embodiment, the complete video data is the complete video data that was last received by the peer.
As shown in fig. 3, in this case, the specific implementation flow of the video processing method is as follows:
step 310, the broadcasting end processes the acquired first video data into second video data and sends the second video data to the network side equipment;
and step 320, the network side equipment synthesizes the second video data and the latest complete video data into third video data and sends the third video data to the playing end.
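The caching behaviour implied by this flow can be sketched as follows: the network side device keeps the most recent complete frame and uses it as the background for every incoming second video frame. Here frames are modeled as flat pixel lists with `None` marking the background pixels stripped by the broadcast end; the class name, method names, and frame model are all hypothetical simplifications of the embodiment.

```python
class CompositingServer:
    """Caches the latest complete frame; composites incoming second frames."""

    def __init__(self):
        self.latest_background = None

    def on_video_data(self, frames, is_complete):
        if is_complete:
            # Keep the newest complete frame as the background source.
            self.latest_background = frames[-1]
            return frames  # complete video data is forwarded unchanged
        if self.latest_background is None:
            return frames  # no background cached yet; forward as-is
        return [self._composite(f) for f in frames]

    def _composite(self, second_frame):
        # None marks a background pixel stripped by the broadcast end.
        return [bg if px is None else px
                for px, bg in zip(second_frame, self.latest_background)]
```

Because the cache always holds the last complete frame received, the synthesized third video data stays consistent with the most recent real background, as the text above notes.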
The embodiment of the application also provides another video processing method, and the video processing method can be used for network side equipment, such as an audio and video server.
As shown in fig. 4, the video processing method includes: step 410 and step 420.
And step 410, receiving second video data.
In a live scene, the second video data may be sent by a broadcast end; the second video data is generated by the broadcast end based on the acquired first video data and includes a second video frame used for representing a target foreground image.
Step 420, synthesizing the second video frame in the second video data and the background image extracted from the complete video data into a video frame of third video data to obtain the third video data, where the video frame of the complete video data includes a foreground image and a background image.
It is understood that the target foreground image in the second video data is extracted, and the target foreground image and the background image are synthesized into the third video data.
The complete video data is to-be-uploaded video data acquired at a certain moment or in a certain time period. An image in the complete video data includes a real background image, so third video data including the real background image can be obtained by synthesizing the background image with the second video data.
In other words, the third video data can be obtained by synthesizing the real background image with each second video frame in the second video data.
When the real background image is synthesized with the second video frame, the real background image needs to replace the background in the second video frame, so that the real background image is synthesized with the target foreground image.
Alternatively, the video processing method includes the following steps: receiving video data; and in a case that the video data is the second video data, synthesizing the background image and the second video data into third video data, where the background image is extracted from the complete video data.
In a live scene, the video data may be sent by a broadcast end, and the video data may be complete video data or second video data generated by the broadcast end based on the acquired first video data, where the second video data includes a second video frame used for representing a target foreground image.
Specifically, in the case where a background image of an image in the received video data is a processed background image, the video data is determined to be the second video data.
In actual implementation, the broadcast end fills the background of the target foreground image with a pure color. If the network side device detects that an image in the received video data is filled with a pure color, it can determine that the video data is the second video data, and each video frame in the received video data needs to be synthesized with the background image to obtain the third video data.
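A minimal sketch of this pure-color detection is shown below. The patent only says "a pure color" without naming one, so the chroma-green fill colour and the 30% pixel-share threshold here are assumptions chosen for illustration.

```python
import numpy as np

# Assumed fill colour agreed between broadcast end and server (chroma green);
# the embodiment only says "a pure color" without naming one.
FILL_COLOR = (0, 255, 0)

def is_second_video_frame(frame, min_ratio=0.3):
    """Heuristic: treat a frame as a 'second video frame' when a large
    share of its pixels exactly matches the agreed solid fill colour."""
    match = np.all(frame == np.array(FILL_COLOR, dtype=frame.dtype), axis=-1)
    return float(match.mean()) >= min_ratio
```

In practice lossy video coding would perturb the fill colour, so a tolerance band (or an explicit flag in the stream metadata) would be more robust than exact matching.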
According to the video processing method provided in the embodiments of the present application, when incomplete second video data is received, the second video data is synthesized with the background image in the complete video data, so that third video data whose effect far exceeds that of the second video data can be provided to the playing end, which helps improve the picture definition and fluency at the playing end.
The following specifically describes the embodiments of the present application from three different implementation perspectives.
Firstly, the terminal extracts a background image and uploads the background image to network side equipment.
In this case, the second video data includes a background image and a second video frame.
In the video processing method, the synthesizing of a second video frame in the second video data and a background image extracted from complete video data into a video frame of third video data includes:
and synthesizing the second video frame and the background image in the second video data into a video frame of the third video data.
It can be understood that, for the first video data, the broadcast end not only extracts the target foreground image and generates the second video data, but also extracts a background image of at least one video frame of the first video data. After the first video data is processed, the obtained second video data includes both the target foreground image and the background image; for example, the second video data may include 20 second video frames and 1 background image, and the second video data is still smaller than the first video data.
When receiving the second video data including the background image, the network side device can synthesize the second video frame with the background image to obtain the third video data, which is provided for the playing end to pull.
In this embodiment, the complete video data is the first video data.
In this way, computational resources of the network side device may be saved.
As shown in fig. 2, in this case, the specific implementation flow of the video processing method is as follows:
step 210, the broadcast end processes the acquired first video data into second video data and sends the second video data to the network side device, where the second video data includes a second video frame and a background image, and the second video frame is used for representing a target foreground image;
step 220, the network side device synthesizes the second video frame and the background image in the second video data into third video data, and sends the third video data to the playing end.
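The per-frame synthesis in step 220 can be sketched as a mask-based paste: every pixel of the second video frame that carries the solid fill colour is replaced by the corresponding pixel of the real background image. The fill colour below is an assumption (the patent names none), and both frames are assumed to share the same resolution.

```python
import numpy as np

FILL_COLOR = (0, 255, 0)  # assumed pure-colour fill; the embodiment names no colour

def synthesize_third_frame(second_frame, background):
    """Build a third-video-data frame: keep the portrait pixels of the
    second video frame and take every solid-fill pixel from the real
    background image (both HxWx3 uint8, same size)."""
    fill = np.array(FILL_COLOR, dtype=second_frame.dtype)
    bg_region = np.all(second_frame == fill, axis=-1)
    out = second_frame.copy()
    out[bg_region] = background[bg_region]
    return out
```

This matches the requirement stated later that the real background image must replace the background of the second video frame rather than being overlaid on top of the portrait.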
And secondly, the network side equipment extracts the background image from the second video data.
In this case, the background image used for synthesizing the third video data together with the second video frame is extracted by the network side device, and a stronger processing capability of the network side device can be effectively utilized.
The second video data includes a target video frame in the first video data and a second video frame; the second video frame is used for representing a target foreground image, and the target video frame is a video frame in the first video data. For example, a video frame located in the middle in time sequence can be selected, or the first or last video frame can be selected directly.
For example, the second video data may include 20 second video frames and 1 target video frame, and the second video data is also smaller than the first video data.
Synthesizing a second video frame in the second video data and a background image extracted from the complete video data into a video frame of third video data, comprising:
extracting a background image from a target video frame;
and synthesizing the second video frame and the background image into a video frame of the third video data.
When the network side device receives the second video data, it can extract the background image from the target video frame and then synthesize the second video frame with the background image to obtain the third video data, which is provided for the playing end to pull.
In this embodiment, the complete video data is the target video frame that is determined from the first video data and carried in the second video data.
In this way, the network side device can quickly locate the background image corresponding to the current second video data.
As shown in fig. 2, in this case, the specific implementation flow of the video processing method is as follows:
step 210, the broadcast end processes the acquired first video data into second video data and sends the second video data to the network side device, where the second video data includes a second video frame and a target video frame, the second video frame is used for representing a target foreground image, and the target video frame is one frame of the first video data;
step 220, the network side device extracts a background image from the target video frame, synthesizes the second video frame and the background image in the second video data into third video data, and sends the third video data to the playing end.
Thirdly, the network side device extracts the background image.
In this case, the background image used to synthesize the third video data together with the second video data is extracted by the network side device, and the stronger processing capability of the network side device can be effectively utilized.
Before receiving the second video data in step 410, the method further includes: receiving complete video data;
synthesizing a second video frame in the second video data and a background image extracted from the complete video data into a video frame of third video data, comprising:
extracting a background image from the latest received complete video data;
and synthesizing the second video frame and the background image into a video frame of the third video data.
The second video data does not include a background image, and each video frame of the second video data is a second video frame used for representing a target foreground image.
The background image may be extracted from the complete image of the previous frame, so that the third video data obtained by synthesis has better consistency with the previous video data.
In this embodiment, the complete video data is the complete video data that was last received by the peer.
As shown in fig. 3, in this case, the specific implementation flow of the video processing method is as follows:
step 310, the broadcasting end processes the acquired first video data into second video data and sends the second video data to the network side equipment;
and step 320, the network side equipment synthesizes the second video data and the latest complete video data into third video data and sends the third video data to the playing end.
It should be noted that, in the video processing method provided in the embodiments of the present application, the execution body may be a video processing apparatus, or a control module in the video processing apparatus for executing the loaded video processing method. In the embodiments of the present application, the case where a video processing apparatus executes the loaded video processing method and a network side device extracts the background image is taken as an example to illustrate the video processing method provided in the embodiments of the present application.
As shown in fig. 5, the video processing method includes:
the video processing device of the anchor end receives the input of starting the video processing device, starts the APP of the broadcast end and polls and detects the network condition; in actual implementation, the network uplink rate is detected every fixed time (e.g. 1 minute).
During each polling, the video processing apparatus at the anchor end judges whether the network condition is good. If the network condition is good, it returns to the previous step and continues to check the network condition; otherwise, it proceeds to the next step.
In actual execution, when the uplink rate of the anchor's network reaches a target value (e.g., 10 Mb/s), the current network quality can be considered high, and the live stream is pushed directly at a fixed high frame rate and high bit rate.
When the anchor's network is unstable, the camera collects the anchor's entire live video data; complete anchor video stream data is transmitted once every 5 seconds, and for each frame of the other video streams, the target foreground image (portrait) is extracted and its background is filled with a pure-color image before the stream is pushed over the uplink network. Otherwise, no such processing is performed.
When the network is unstable, the related art can be regarded as transmitting at an adaptive bit rate, which automatically reduces the definition at the playing end, affects the live broadcast effect, and in turn directly affects the commercial benefit of the live broadcast room.
In this case, the video processing method of the present application uploads video data including a complete picture at a fixed frequency (for example, once every 5 s); for the other video frames, the target foreground image (portrait) is extracted through an AI portrait recognition and segmentation technique and the background is filled with a pure color, finally forming second video data that is far smaller than the original video data.
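The broadcast-end decision described above (a complete keyframe every 5 s, foreground-only frames in between) can be sketched as follows. The segmentation callback `segment_fn`, the fill colour, and the state-dictionary interface are all hypothetical; the 5 s interval comes from the example in the description.

```python
import numpy as np

FILL_COLOR = (0, 255, 0)        # assumed pure-colour fill
KEYFRAME_INTERVAL_S = 5.0       # "once in 5 s" from the description

def process_outgoing_frame(frame, ts, state, segment_fn):
    """Decide, per captured frame, whether to push a complete frame or a
    foreground-only 'second video frame'.  segment_fn(frame) is a
    hypothetical portrait-segmentation callback returning an HxW bool
    mask (True = portrait); state carries the last keyframe timestamp."""
    last = state.get('last_complete_ts')
    if last is None or ts - last >= KEYFRAME_INTERVAL_S:
        state['last_complete_ts'] = ts
        return 'complete', frame          # periodic full picture
    mask = segment_fn(frame)
    out = np.empty_like(frame)
    out[:] = FILL_COLOR                   # fill background with pure colour
    out[mask] = frame[mask]               # keep only the portrait pixels
    return 'second', out
```

Because the solid-fill frames compress far better than full pictures, the resulting second video data is much smaller than the original stream, which is the point of the weak-network path.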
The second video data is sent to an audio and video server through a network; the video data received by the audio and video server may be complete video data or second video data. If the audio and video server detects that an image in the received video data is filled with a pure color, it can determine that the video data is the second video data, which needs to be processed before being sent to the playing end; otherwise, the received video data can be sent to the playing end directly.
When determining that the received video data is the second video data, the audio and video server synthesizes each second video frame in the second video data with the background image to obtain the third video data; that is, the portrait and the background are synthesized together again to generate the third video data.
The frame rate of the third video data is equal to that of the second video data, so that the live broadcast with a high frame rate can be realized in a weak network state.
The playing end pulls the latest video data from the audio and video server to the local device and plays it through the local player.
The embodiments of the present application further provide a video processing apparatus, where the video processing apparatus may be a broadcast end or a control module of the broadcast end, such as a mobile terminal or a computer on the anchor side.
As shown in fig. 6, the video processing apparatus includes: a first obtaining module 610, a first processing module 620 and a first sending module 630.
A first obtaining module 610, configured to obtain first video data;
a first processing module 620, configured to extract a target foreground image of at least a portion of video frames in the first video data, and generate second video data based on the target foreground image, where the second video data includes a second video frame used for representing the target foreground image;
a first sending module 630, configured to send the second video data to an opposite end, so that the opposite end generates third video data based on the second video data, where a video frame of the third video data is synthesized by the second video frame and a background image extracted from complete video data, and the video frame of the complete video data includes a foreground image and a background image.
According to the video processing device provided by the embodiment of the application, the target foreground image is extracted from the first video data, and the generated second video data is uploaded, so that the size of actually uploaded data can be effectively reduced, the requirement on an uplink network is reduced, fewer network resources are occupied, high-bit-rate uploading can still be realized under the condition that the network is unstable in a live broadcast scene, and the improvement of the picture definition and the smoothness of a playing end is facilitated.
In some embodiments, the second video data does not include a background image; the first sending module 630 is further configured to send the second video data to the opposite end, so that the opposite end extracts the background image from the last received complete video data, and synthesizes the second video frame and the background image into a video frame of the third video data.
In some embodiments, the video processing apparatus may further include: the first extraction module is used for extracting a background image of at least one video frame in the first video data; the second video data further comprises a background image,
the first sending module 630 is further configured to send the second video data to the peer, so that the peer synthesizes the second video frame and the background image into a video frame of the third video data.
In some embodiments, the second video data comprises a target video frame and a second video frame in the first video data;
the first sending module 630 is further configured to send the second video data to the peer, so that the peer synthesizes the background image extracted from the target video frame and the second video frame into a video frame of the third video data.
In some embodiments, the first processing module 620 is further configured to segment N video frames in the first video data into an original background image and a target foreground image, where N is a positive integer;
and processing the original background image to obtain an updated image comprising the target foreground image and the processed original background image, and generating second video data, wherein the space occupied by the updated image is smaller than the corresponding video frame in the first video data.
In some embodiments, the first processing module 620 is further configured to extract a target foreground image of at least a portion of the video frames in the first video data when the uplink rate is lower than the target value and the time length from the last time of sending the complete video data to the peer end is less than the target time length.
In some embodiments, the first sending module 630 is further configured to upload newly acquired complete video data when the uplink rate is lower than the target value and the time length from the last time of sending the complete video data to the peer end is not less than the target time length.
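The two conditions in these embodiments (uplink rate vs. target value, and time since the last complete upload vs. target time length) amount to a small three-way decision. The sketch below mirrors that logic; the mode names are invented, and the thresholds are taken from the 10 Mb/s and 5 s examples elsewhere in the description rather than mandated by the claims.

```python
TARGET_UPLINK_MBPS = 10.0   # example target value from the description
TARGET_INTERVAL_S = 5.0     # example target time length

def choose_upload_mode(uplink_mbps, seconds_since_complete):
    """Mirror of the two conditions above; the mode names and the exact
    thresholds are illustrative, not fixed by the embodiments."""
    if uplink_mbps >= TARGET_UPLINK_MBPS:
        return 'full_stream'       # good network: push original frames directly
    if seconds_since_complete >= TARGET_INTERVAL_S:
        return 'complete_frame'    # weak network, keyframe interval elapsed
    return 'foreground_only'       # weak network: send second video frames
```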
The embodiment of the present application further provides another video processing apparatus, where the video processing apparatus may be a network side device or a control module of the network side device, such as an audio and video server.
As shown in fig. 7, the video processing apparatus includes: a first receiving module 710 and a first synthesizing module 720.
A first receiving module 710, configured to receive second video data, where the second video data includes a second video frame for representing a foreground image of a target;
the first synthesizing module 720 is configured to synthesize the second video frame in the second video data and the background image extracted from the complete video data into a video frame of third video data to obtain the third video data, where the video frame of the complete video data includes a foreground image and a background image.
According to the video processing device provided by the embodiment of the application, under the condition that incomplete second video data is received, the second video data and the background image in the complete video data are synthesized, so that third video data with the effect far exceeding that of the second video data can be provided for the playing end, and the improvement of the image definition and the fluency of the playing end is facilitated.
In some embodiments, the first receiving module 710 is further configured to receive complete video data;
a first synthesizing module 720, further configured to extract a background image from the latest received complete video data; and synthesizing the second video frame and the background image into a video frame of the third video data.
In some embodiments, the second video data comprises a background image and a second video frame;
the first synthesizing module 720 is further configured to synthesize the second video frame and the background image in the second video data into a video frame of the third video data.
In some embodiments, the second video data comprises a target video frame and a second video frame in the first video data;
a first synthesizing module 720, further configured to extract a background image from the target video frame;
and synthesizing the second video frame and the background image into a video frame of the third video data.
The video processing apparatus in the embodiment of the present application may be an apparatus, or may be a component, an integrated circuit, or a chip in a terminal. The device can be mobile electronic equipment or non-mobile electronic equipment. By way of example, the mobile electronic device may be a mobile phone, a tablet computer, a notebook computer, a palm top computer, a vehicle-mounted electronic device, a wearable device, an ultra-mobile personal computer (UMPC), a netbook or a Personal Digital Assistant (PDA), and the like, and the non-mobile electronic device may be a server, a Network Attached Storage (NAS), a Personal Computer (PC), a Television (TV), a teller machine or a self-service machine, and the like, and the embodiments of the present application are not particularly limited.
The video processing apparatus in the embodiment of the present application may be an apparatus having an operating system. The operating system may be an Android (Android) operating system, an ios operating system, or other possible operating systems, and embodiments of the present application are not limited specifically.
The video processing apparatus provided in the embodiment of the present application can implement each process implemented by the video processing apparatus in the method embodiments of fig. 1 to fig. 5, and for avoiding repetition, details are not repeated here.
As shown in fig. 8, an electronic device according to an embodiment of the present application is further provided, which includes a processor 820, a memory 810, and a program or an instruction stored in the memory 810 and executable on the processor 820, where the program or the instruction is executed by the processor 820 to implement the processes of the video processing method embodiment, and can achieve the same technical effects, and details are not repeated here to avoid repetition.
It should be noted that the electronic devices in the embodiments of the present application include the mobile electronic devices and the non-mobile electronic devices described above.
Fig. 9 is a schematic diagram of a hardware structure of an electronic device implementing an embodiment of the present application.
The electronic device 900 includes, but is not limited to: a radio frequency unit 901, a network module 902, an audio output unit 903, an input unit 904, a sensor 905, a display unit 906, a user input unit 907, an interface unit 908, a memory 909, and a processor 910.
Those skilled in the art will appreciate that the electronic device 900 may further include a power source (e.g., a battery) for supplying power to various components, and the power source may be logically connected to the processor 910 through a power management system, so as to manage charging, discharging, and power consumption management functions through the power management system. The electronic device structure shown in fig. 9 does not constitute a limitation of the electronic device, and the electronic device may include more or less components than those shown, or combine some components, or arrange different components, and thus, the description is not repeated here.
The input unit 904, which includes a camera in this embodiment of the present application, is configured to acquire first video data, where the first video data includes a plurality of video frames;
a processor 910, configured to extract a target foreground image of at least a portion of video frames in the first video data, and generate second video data based on the target foreground image;
the network module 902 is configured to send the second video data to the peer end, so that the peer end generates third video data based on the second video data, where a video frame of the third video data is synthesized by the second video frame and a background image extracted from complete video data, and the video frame of the complete video data includes a foreground image and a background image.
According to the electronic equipment provided by the embodiment of the application, the target foreground image is extracted from the first video data, and the generated second video data is uploaded, so that the size of actually uploaded data can be effectively reduced, the requirement on an uplink network is reduced, fewer network resources are occupied, high-bit-rate uploading can still be realized under the condition that the network is unstable in a live broadcast scene, and the image definition and the fluency of a playing end can be improved.
In some embodiments, the processor 910 is further configured to extract a target foreground image of at least a part of the video frames in the first video data if the uplink rate is lower than the target value and the time length from the last time of uploading the complete video data is less than the target time length.
In some embodiments, the processor 910 is further configured to segment N video frames in the first video data into an original background image and a target foreground image, where N is a positive integer; and processing the original background image to obtain an updated image comprising the target foreground image and the processed original background image, and generating second video data, wherein the space occupied by the updated image is smaller than the corresponding video frame in the first video data.
Optionally, the value of N is positively correlated with the uplink rate.
In some embodiments, the network module 902 is further configured to upload newly acquired complete video data when the uplink rate is lower than the target value and the time length from the last time of uploading the complete video data is not less than the target time length.
Optionally, the number of video frames in the new complete video data is positively correlated to the uplink rate.
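One way to realize the positive correlation stated here (and for the value of N above) is a clamped linear mapping from the uplink rate to a frame count. The linear form, the 10 Mb/s reference rate, and the 30-frame cap are assumptions for illustration; the embodiments only require that the count grow with the rate.

```python
def frames_for_rate(uplink_mbps, target_rate_mbps=10.0, max_frames=30):
    """One possible monotone mapping: the better the uplink, the more
    frames go into a batch.  The linear form and the constants are
    assumptions; the embodiments only require positive correlation."""
    ratio = max(0.0, min(uplink_mbps / target_rate_mbps, 1.0))
    return max(1, round(ratio * max_frames))
```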
In some embodiments, the processor 910 is further configured to extract a background image of at least one video frame in the first video data.
It should be noted that, in this embodiment, the electronic device 900 may implement each process in the method embodiment in this embodiment and achieve the same beneficial effects, and for avoiding repetition, details are not described here again.
It should be understood that, in the embodiment of the present application, the input Unit 904 may include a Graphics Processing Unit (GPU) 9041 and a microphone 9042, and the Graphics Processing Unit 9041 processes image data of a still picture or a video obtained by an image capturing device (such as a camera) in a video capturing mode or an image capturing mode. The display unit 906 may include a display panel 9061, and the display panel 9061 may be configured in the form of a liquid crystal display, an organic light emitting diode, or the like. The user input unit 907 includes a touch panel 9071 and other input devices 9072. A touch panel 9071 also referred to as a touch screen. The touch panel 9071 may include two parts, a touch detection device and a touch controller. Other input devices 9072 may include, but are not limited to, a physical keyboard, function keys (e.g., volume control keys, switch keys, etc.), a trackball, a mouse, and a joystick, which are not described in detail herein. Memory 909 can be used to store software programs as well as various data including, but not limited to, application programs and operating systems. The processor 910 may integrate an application processor, which primarily handles operating systems, user interfaces, applications, etc., and a modem processor, which primarily handles wireless communications. It is to be appreciated that the modem processor described above may not be integrated into processor 910.
The embodiments of the present application further provide a readable storage medium, where a program or an instruction is stored on the readable storage medium, and when the program or the instruction is executed by a processor, the program or the instruction implements each process of the video processing method embodiment, and can achieve the same technical effect, and in order to avoid repetition, details are not repeated here.
The processor is the processor in the electronic device described in the above embodiment. The readable storage medium includes a computer readable storage medium, such as a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and so on.
The embodiment of the present application further provides a chip, where the chip includes a processor and a communication interface, the communication interface is coupled to the processor, and the processor is configured to run a program or an instruction to implement each process of the above video processing method embodiment, and can achieve the same technical effect, and the details are not repeated here to avoid repetition.
It should be understood that the chips mentioned in the embodiments of the present application may also be referred to as system-on-chip, system-on-chip or system-on-chip, etc.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element. Further, it should be noted that the scope of the methods and apparatus of the embodiments of the present application is not limited to performing the functions in the order illustrated or discussed, but may include performing the functions in a substantially simultaneous manner or in a reverse order based on the functions involved, e.g., the methods described may be performed in an order different than that described, and various steps may be added, omitted, or combined. In addition, features described with reference to certain examples may be combined in other examples.
Through the above description of the embodiments, those skilled in the art will clearly understand that the methods of the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, and certainly also by hardware alone, but in many cases the former is the better implementation. Based on such an understanding, the technical solutions of the present application may be embodied in the form of a software product. The software product is stored in a storage medium (such as a ROM/RAM, a magnetic disk, or an optical disk) and includes instructions for enabling a terminal (such as a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the methods according to the embodiments of the present application.
While the embodiments of the present application have been described with reference to the accompanying drawings, it is to be understood that the invention is not limited to the precise embodiments described above, which are illustrative rather than restrictive; those skilled in the art may make various changes without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (14)

1. A video processing method, comprising:
acquiring first video data;
extracting a target foreground image of at least part of video frames in the first video data, and generating second video data based on the target foreground image, wherein the second video data comprises a second video frame for representing the target foreground image;
and sending the second video data to an opposite end, so that the opposite end generates third video data based on the second video data, wherein a video frame of the third video data is synthesized from the second video frame and a background image extracted from complete video data, and the video frame of the complete video data comprises a foreground image and a background image.
2. The video processing method according to claim 1, wherein the second video data does not include a background image;
the sending the second video data to an opposite end to enable the opposite end to generate third video data based on the second video data comprises:
and sending the second video data to an opposite end, so that the opposite end extracts a background image from the most recently received complete video data and synthesizes the second video frame and the background image into a video frame of third video data.
3. The video processing method of claim 1, further comprising: extracting a background image of at least one video frame in the first video data;
the second video data further includes the background image, and the sending the second video data to an opposite end to enable the opposite end to generate third video data based on the second video data includes:
and sending the second video data to an opposite end so that the opposite end synthesizes the second video frame and the background image into a video frame of third video data.
4. The video processing method according to claim 1, wherein the second video data comprises a target video frame in the first video data and the second video frame;
the sending the second video data to an opposite end to enable the opposite end to generate third video data based on the second video data comprises:
and sending the second video data to an opposite end, so that the opposite end synthesizes the background image extracted from the target video frame and the second video frame into a video frame of third video data.
5. The video processing method according to any of claims 1-4, wherein said extracting a target foreground image of at least a portion of video frames in the first video data and generating second video data based on the target foreground image comprises:
dividing N video frames in the first video data into an original background image and the target foreground image, wherein N is a positive integer;
and processing the original background image to obtain an updated image comprising the target foreground image and the processed original background image, and generating the second video data, wherein the space occupied by the updated image is smaller than that occupied by the corresponding video frame in the first video data.
6. The video processing method according to any of claims 1-4, wherein said extracting a target foreground image of at least a portion of video frames in the first video data comprises:
and in a case that the uplink rate is lower than the target value and the time elapsed since complete video data was last sent to the opposite end is less than the target duration, extracting the target foreground image of at least part of the video frames in the first video data.
7. The video processing method according to any one of claims 1-4, further comprising: in a case that the uplink rate is lower than the target value and the time elapsed since complete video data was last sent to the opposite end is not less than the target duration, uploading newly acquired complete video data.
8. A video processing method, comprising:
receiving second video data, wherein the second video data comprises a second video frame used for representing a target foreground image;
and synthesizing the second video frame in the second video data and a background image extracted from the complete video data into a video frame of third video data to obtain the third video data, wherein the video frame of the complete video data comprises a foreground image and a background image.
9. The video processing method of claim 8, wherein prior to said receiving second video data, said method further comprises: receiving complete video data;
the synthesizing the second video frame in the second video data and the background image extracted from the complete video data into a video frame of third video data includes:
extracting a background image from the most recently received complete video data;
and synthesizing the second video frame and the background image into a video frame of third video data.
10. The video processing method according to claim 8, wherein the second video data comprises a background image and the second video frame;
the synthesizing the second video frame in the second video data and the background image extracted from the complete video data into a video frame of third video data includes:
and synthesizing the second video frame and the background image in the second video data into a video frame of third video data.
11. The video processing method according to claim 8, wherein the second video data comprises the second video frame and a target video frame in the first video data;
the synthesizing the second video frame in the second video data and the background image extracted from the complete video data into a video frame of third video data includes:
extracting a background image from the target video frame;
and synthesizing the second video frame and the background image into a video frame of third video data.
12. A video processing apparatus, comprising:
the first acquisition module is used for acquiring first video data;
the first processing module is used for extracting a target foreground image of at least part of video frames in the first video data and generating second video data based on the target foreground image, wherein the second video data comprises a second video frame used for representing the target foreground image;
a first sending module, configured to send the second video data to an opposite end, so that the opposite end generates third video data based on the second video data, where a video frame of the third video data is synthesized by the second video frame and a background image extracted from complete video data, and the video frame of the complete video data includes a foreground image and a background image.
13. A video processing apparatus, comprising:
a first receiving module, configured to receive second video data, where the second video data includes a second video frame used for representing a target foreground image;
the first synthesis module is configured to synthesize the second video frame in the second video data and a background image extracted from complete video data into a video frame of third video data to obtain the third video data, where the video frame of the complete video data includes a foreground image and a background image.
14. An electronic device comprising a processor, a memory, and a program or instructions stored on the memory and executable on the processor, the program or instructions when executed by the processor implementing the steps of the video processing method according to any one of claims 1-11.
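The sender/receiver scheme of the claims above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names are hypothetical, numpy stands in for the actual video pipeline, and a precomputed boolean mask stands in for whatever foreground segmentation the implementation uses.

```python
import numpy as np

def make_second_frame(frame: np.ndarray, fg_mask: np.ndarray) -> np.ndarray:
    """Sender side (claims 1 and 5): keep only the target foreground and
    zero the background, so the 'second video frame' occupies less space
    than the complete frame once encoded."""
    out = np.zeros_like(frame)
    out[fg_mask] = frame[fg_mask]
    return out

def composite(second_frame: np.ndarray, fg_mask: np.ndarray,
              complete_frame: np.ndarray) -> np.ndarray:
    """Receiver side (claims 8-9): take the background from the most
    recently received complete frame and overlay the foreground onto it
    to form a video frame of the 'third video data'."""
    out = complete_frame.copy()
    out[fg_mask] = second_frame[fg_mask]
    return out

def should_send_foreground_only(uplink_rate: float, target_rate: float,
                                since_last_complete: float,
                                target_duration: float) -> bool:
    """Claims 6-7: when the uplink rate drops below the target value, send
    foreground-only frames, but refresh with newly acquired complete video
    data once the target duration since the last complete frame elapses."""
    return uplink_rate < target_rate and since_last_complete < target_duration
```

Note that this sketch reuses the entire last complete frame as the background; if the foreground has moved since that frame, its stale copy remains visible, which is why claim 9 speaks of extracting a background image from the complete video data rather than reusing it verbatim.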
CN202011534957.6A 2020-12-22 2020-12-22 Video processing method and device and electronic equipment Pending CN112653851A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011534957.6A CN112653851A (en) 2020-12-22 2020-12-22 Video processing method and device and electronic equipment

Publications (1)

Publication Number Publication Date
CN112653851A true CN112653851A (en) 2021-04-13

Family

ID=75359368

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011534957.6A Pending CN112653851A (en) 2020-12-22 2020-12-22 Video processing method and device and electronic equipment

Country Status (1)

Country Link
CN (1) CN112653851A (en)

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1420691A (en) * 2001-11-17 2003-05-28 Lg电子株式会社 Method and system for control of bit rate based on object
CN102006475A (en) * 2010-11-18 2011-04-06 无锡中星微电子有限公司 Video coding and decoding device and method
CN102348116A (en) * 2010-08-03 2012-02-08 株式会社理光 Video processing method, video processing device and video processing system
CN104137146A (en) * 2011-12-29 2014-11-05 派尔高公司 Method and system for video coding with noise filtering of foreground object segmentation
CN105554040A (en) * 2016-02-25 2016-05-04 北京京东尚科信息技术有限公司 Remote video monitoring method and system
CN107135369A (en) * 2017-06-12 2017-09-05 宇龙计算机通信科技(深圳)有限公司 Video is transmitted and display methods, system and terminal
CN108124194A (en) * 2017-12-28 2018-06-05 北京奇艺世纪科技有限公司 A kind of net cast method, apparatus and electronic equipment
CN110213485A (en) * 2019-06-04 2019-09-06 维沃移动通信有限公司 A kind of image processing method and terminal
CN110290425A (en) * 2019-07-29 2019-09-27 腾讯科技(深圳)有限公司 A kind of method for processing video frequency, device and storage medium
US20200380635A1 (en) * 2019-05-31 2020-12-03 Apple Inc. Configuring alpha channel video based on display settings

Similar Documents

Publication Publication Date Title
KR101813196B1 (en) Method, device, program, and recording medium for video communication
US20220014819A1 (en) Video image processing
CN108737882B (en) Image display method, image display device, storage medium and electronic device
CN108124194B (en) Video live broadcast method and device and electronic equipment
US20170171274A1 (en) Method and electronic device for synchronously playing multiple-cameras video
CN107682714B (en) Method and device for acquiring online video screenshot
US20220046291A1 (en) Method and device for generating live streaming video data and method and device for playing live streaming video
CN109302628B (en) Live broadcast-based face processing method, device, equipment and storage medium
CN113067994B (en) Video recording method and electronic equipment
CN114095744B (en) Video live broadcast method and device, electronic equipment and readable storage medium
KR20080082759A (en) System and method for realizing vertual studio via network
CN110267093B (en) Live video pushing method and device, storage medium, terminal and live mirror
CN111147911A (en) Video clipping method and device, electronic equipment and storage medium
CN113163260A (en) Video frame output control method and device and electronic equipment
CN112584189A (en) Live broadcast data processing method, device and system and computer readable storage medium
CN112911318A (en) Live broadcast room background replacement method and device, electronic equipment and storage medium
CN112422798A (en) Photographing method and device, electronic equipment and storage medium
CN106412617B (en) Remote debugging control method and device
CN111050204A (en) Video clipping method and device, electronic equipment and storage medium
CN109413152B (en) Image processing method, image processing device, storage medium and electronic equipment
CN112954201B (en) Shooting control method and device and electronic equipment
CN112565603B (en) Image processing method and device and electronic equipment
CN109862385B (en) Live broadcast method and device, computer readable storage medium and terminal equipment
CN113365130A (en) Live broadcast display method, live broadcast video acquisition method and related devices
CN112653851A (en) Video processing method and device and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination