CN114900717A - Video data transmission method, device, medium and computing equipment - Google Patents

Video data transmission method, device, medium and computing equipment

Info

Publication number
CN114900717A
Authority
CN
China
Prior art keywords
resolution
image
low
super
video frame
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210524716.6A
Other languages
Chinese (zh)
Other versions
CN114900717B (en)
Inventor
陶金亮
阮良
陈功
陈丽
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Netease Zhiqi Technology Co Ltd
Original Assignee
Hangzhou Netease Zhiqi Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Netease Zhiqi Technology Co Ltd filed Critical Hangzhou Netease Zhiqi Technology Co Ltd
Priority to CN202210524716.6A priority Critical patent/CN114900717B/en
Publication of CN114900717A publication Critical patent/CN114900717A/en
Application granted granted Critical
Publication of CN114900717B publication Critical patent/CN114900717B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00 Geometric image transformations in the plane of the image
    • G06T3/40 Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T3/4053 Scaling of whole images or parts thereof, e.g. expanding or contracting based on super-resolution, i.e. the output image resolution being higher than the sensor resolution
    • G06T3/4076 Scaling of whole images or parts thereof, e.g. expanding or contracting based on super-resolution, i.e. the output image resolution being higher than the sensor resolution using the original low-resolution images to iteratively correct the high-resolution images
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23 Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/234 Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs
    • H04N21/2343 Processing of video elementary streams involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements
    • H04N21/234363 Processing of video elementary streams involving reformatting operations by altering the spatial resolution, e.g. for clients with a lower screen resolution
    • H04N21/234381 Processing of video elementary streams involving reformatting operations by altering the temporal resolution, e.g. decreasing the frame rate by frame skipping
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/4402 Processing of video elementary streams involving reformatting operations of video signals for household redistribution, storage or real-time display
    • H04N21/440263 Processing of video elementary streams involving reformatting operations by altering the spatial resolution, e.g. for displaying on a connected PDA
    • H04N21/440281 Processing of video elementary streams involving reformatting operations by altering the temporal resolution, e.g. by frame skipping

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

Embodiments of the disclosure provide a video data transmission method, apparatus, medium and computing device, applied to a sending end. The method comprises the following steps: acquiring a video frame image to be transmitted; determining whether an image processing capability set obtained by negotiation with a receiving end contains a super-resolution capability; if the image processing capability set contains the super-resolution capability, processing the video frame image to reduce the image resolution to obtain a low-resolution image, wherein the image resolution of the low-resolution image matches a processable resolution of the super-resolution capability; and sending the low-resolution image to the receiving end, so that the receiving end performs super-resolution reconstruction on the low-resolution image based on a super-resolution algorithm to obtain a high-resolution image corresponding to the low-resolution image and determines the high-resolution image as the video frame image. The method and apparatus can improve video data transmission efficiency while guaranteeing the transmission quality of the corresponding video.

Description

Video data transmission method, device, medium and computing equipment
Technical Field
The embodiment of the disclosure relates to the technical field of videos, in particular to a video data transmission method, a video data transmission device, a video data transmission medium and a computing device.
Background
This section is intended to provide a background or context to the embodiments of the disclosure recited in the claims. The description herein is not admitted to be prior art by inclusion in this section.
With the development of computer technology and the increasing maturity of digital media technology, video recording APPs, video call APPs and the like have become more and more widely used. A user can record a video through a video recording APP and share it with other users; alternatively, a user can communicate with other users in real time (RTC, Real-Time Communication) in video form through a video call APP. In such cases, the electronic device used by the user generally captures video data and transmits it to the electronic devices used by other users, so that those devices can present the corresponding video content to their users.
In the process of video data transmission, how to ensure the video quality becomes a problem of great concern.
Disclosure of Invention
In this context, embodiments of the present disclosure are intended to provide a video data transmission method, apparatus, medium and computing device.
In a first aspect of the disclosed embodiments, a video data transmission method is provided, which is applied to a sending end; the method comprises the following steps:
acquiring a video frame image to be transmitted;
determining whether an image processing capability set obtained by negotiation with a receiving end contains super-resolution capability or not;
if the image processing capability set contains the super-resolution capability, processing the video frame image to reduce the image resolution to obtain a low-resolution image; wherein an image resolution of the low-resolution image matches a processable resolution of the super-resolution capability;
and sending the low-resolution image to the receiving end so that the receiving end carries out super-resolution reconstruction on the low-resolution image based on a super-resolution algorithm to obtain a high-resolution image corresponding to the low-resolution image, and determining the high-resolution image as the video frame image.
In a second aspect of the disclosed embodiments, a video real-time transmission method is provided, which is applied to a receiving end; the method comprises the following steps:
receiving a low-resolution image sent by a sending end; the low-resolution image is obtained by processing the video frame image to be transmitted to reduce the image resolution under the condition that the sending end determines that the image processing capability set obtained by negotiation with the receiving end contains the super-resolution capability; wherein an image resolution of the low resolution image matches a processable resolution of the super resolution capability;
and performing super-resolution reconstruction on the low-resolution image based on a super-resolution algorithm to obtain a high-resolution image corresponding to the low-resolution image, and determining the high-resolution image as the video frame image.
In a third aspect of the disclosed embodiments, there is provided a video data transmission apparatus, which is applied to a transmitting end; the device comprises:
the acquisition module is used for acquiring a video frame image to be transmitted;
the determining module is used for determining whether the super-resolution capability is contained in the image processing capability set obtained by negotiation with the receiving end;
the processing module is used for performing image resolution reduction processing on the video frame image to obtain a low-resolution image if the image processing capability set contains the super-resolution capability; wherein an image resolution of the low-resolution image matches a processable resolution of the super-resolution capability;
and the sending module is used for sending the low-resolution image to the receiving end so that the receiving end carries out super-resolution reconstruction on the low-resolution image based on a super-resolution algorithm to obtain a high-resolution image corresponding to the low-resolution image and determine the high-resolution image as the video frame image.
In a fourth aspect of the disclosed embodiments, there is provided a video real-time transmission apparatus, which is applied to a receiving end; the device comprises:
the receiving module is used for receiving the low-resolution image sent by the sending end; the low-resolution image is obtained by processing the video frame image to be transmitted to reduce the image resolution under the condition that the sending end determines that the image processing capability set obtained by negotiation with the receiving end contains the super-resolution capability; wherein an image resolution of the low resolution image matches a processable resolution of the super resolution capability;
and the first reconstruction module is used for performing super-resolution reconstruction on the low-resolution image based on a super-resolution algorithm to obtain a high-resolution image corresponding to the low-resolution image and determining the high-resolution image as the video frame image.
In a fifth aspect of the disclosed embodiments, there is provided a medium having stored thereon a computer program that, when executed by a processor, implements any of the video data transmission methods described above.
In a sixth aspect of embodiments of the present disclosure, there is provided a computing device comprising:
a processor;
a memory for storing a processor executable program;
wherein the processor executes the executable program to realize any one of the above video data transmission methods.
According to embodiments of the disclosure, during video data transmission, when the sending end determines that the image processing capability set obtained by negotiation with the receiving end includes a super-resolution capability, it first performs image resolution reduction processing on the video frame image to be transmitted to obtain a low-resolution image whose image resolution matches the processable resolution of the super-resolution capability, and then sends the low-resolution image to the receiving end. The receiving end performs super-resolution reconstruction on the low-resolution image based on a super-resolution algorithm to obtain a high-resolution image corresponding to the low-resolution image, and determines the high-resolution image as the video frame image.
By adopting this approach, on the one hand, the sending end reduces the image resolution of the video frame image to be transmitted and sends the resulting low-resolution image to the receiving end, which reduces the amount of video data transmitted and thus improves transmission efficiency; on the other hand, the receiving end performs super-resolution reconstruction on the received low-resolution image and determines the reconstructed high-resolution image as the video frame image sent by the sending end, which guarantees the image quality of the video frame image obtained by the receiving end and therefore the transmission quality of the corresponding video, improving the user experience.
Drawings
The above and other objects, features and advantages of exemplary embodiments of the present disclosure will become readily apparent from the following detailed description, which proceeds with reference to the accompanying drawings. Several embodiments of the present disclosure are illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings and in which:
fig. 1 schematically shows a schematic diagram of an application scenario of video data transmission according to an embodiment of the present disclosure;
fig. 2 schematically shows a flow chart of a video data transmission method according to an embodiment of the present disclosure;
fig. 3 schematically shows a flow chart of another video data transmission method according to an embodiment of the present disclosure;
FIG. 4 schematically illustrates a flow chart of a method of image processing capability set negotiation, according to an embodiment of the present disclosure;
FIG. 5 schematically illustrates a software architecture diagram for image processing capability set negotiation, according to an embodiment of the present disclosure;
FIG. 6 schematically illustrates a software architecture diagram for video quality control according to an embodiment of the present disclosure;
fig. 7 schematically shows a schematic diagram of a software architecture for super-resolution reconstruction according to an embodiment of the present disclosure;
FIG. 8 schematically illustrates a schematic diagram of a medium according to an embodiment of the present disclosure;
fig. 9 schematically shows a block diagram of a video data transmission apparatus according to an embodiment of the present disclosure;
fig. 10 schematically shows a block diagram of another video data transmission apparatus according to an embodiment of the present disclosure;
FIG. 11 schematically shows a schematic diagram of a computing device in accordance with an embodiment of the present disclosure.
In the drawings, the same or corresponding reference numerals indicate the same or corresponding parts.
Detailed Description
The principles and spirit of the present disclosure will be described below with reference to several exemplary embodiments. It is understood that these embodiments are given solely for the purpose of enabling those skilled in the art to better understand and to practice the present disclosure, and are not intended to limit the scope of the present disclosure in any way. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
As will be appreciated by one of skill in the art, embodiments of the present disclosure may be embodied as a system, apparatus, device, method, or computer program product. Accordingly, the present disclosure may be embodied in the form of: entirely hardware, entirely software (including firmware, resident software, micro-code, etc.), or a combination of hardware and software.
According to an embodiment of the disclosure, a video data transmission method, a video data transmission device, a video data transmission medium and a computing device are provided.
In this document, it is to be understood that any number of elements in the figures are provided by way of illustration and not limitation, and any nomenclature is used for differentiation only and not in any limiting sense.
The principles and spirit of the present disclosure are explained in detail below with reference to several representative embodiments of the present disclosure.
Summary of the Invention
Taking, as an example, two different users (user 1 and user 2) communicating in real time in video form through a video call APP: on the one hand, the electronic device used by user 1 collects video data and sends it to the server corresponding to the video call APP, and the server forwards the video data to the electronic device used by user 2, so that the device used by user 2 displays the corresponding video and user 2 can view the video shot by user 1. In this case, the electronic device used by user 1 is the sending end in the video data transmission process, and the electronic device used by user 2 is the receiving end.
On the other hand, the electronic device used by the user 2 may collect video data and send the video data to the server corresponding to the video call APP, and the server sends the video data to the electronic device used by the user 1, so that the electronic device used by the user 1 displays a video corresponding to the video data to the user 1, so that the user 1 may view the video shot by the user 2. At this time, the electronic device used by the user 2 is a sending end in the video data transmission process, and the electronic device used by the user 1 is a receiving end in the video data transmission process.
In practical application, the video data comprises a series of images which are collected for a shot target in a period of time and are continuous in time sequence; these images are commonly referred to as video frame images.
In the related art, due to the limitation of network bandwidth, video data to be transmitted is usually processed at a transmitting end to reduce the data volume of the video data; for example, the image resolution of each video frame image in the video data is reduced to reduce the number of pixels in each video frame image.
However, reducing an image's resolution degrades its quality. As a result, the quality of the video frame images obtained by the receiving end is usually affected, and the user experience is poor.
In order to guarantee the image quality of video frames during video data transmission, the present disclosure provides a technical solution for video data transmission. In this scheme, when the sending end determines that the image processing capability set obtained by negotiation with the receiving end includes a super-resolution capability, it first performs image resolution reduction processing on the video frame image to be transmitted to obtain a low-resolution image whose image resolution matches the processable resolution of the super-resolution capability, and then sends the low-resolution image to the receiving end. The receiving end performs super-resolution reconstruction on the low-resolution image based on a super-resolution algorithm to obtain a high-resolution image corresponding to the low-resolution image, and determines the high-resolution image as the video frame image.
By adopting this approach, on the one hand, the sending end reduces the image resolution of the video frame image to be transmitted and sends the processed low-resolution image to the receiving end, reducing the amount of video data transmitted and improving transmission efficiency; on the other hand, the receiving end performs super-resolution reconstruction on the received low-resolution image and determines the reconstructed high-resolution image as the video frame image sent by the sending end, ensuring the image quality of the video frame image obtained by the receiving end, the transmission quality of the corresponding video, and the user experience.
Having described the basic principles of the present disclosure, various non-limiting embodiments of the present disclosure are described in detail below.
Application scene overview
Referring to fig. 1, fig. 1 schematically shows a schematic diagram of an application scenario of video data transmission according to an embodiment of the present disclosure.
As shown in fig. 1, in an application scenario of video data transmission, a server and at least one client (e.g., clients 1-N) accessing the server through any type of wired or wireless network may be included.
The server can be deployed on a server comprising an independent physical host or a server cluster consisting of a plurality of independent physical hosts; or, the server may be a server built based on a cloud computing service.
The client may correspond to an APP installed in an electronic device used by the user; the electronic device may be a smart phone, a tablet computer, a notebook computer, a PC (Personal Computer), a PDA (Personal Digital Assistant), a wearable device (e.g., smart glasses, smart watch), a smart in-vehicle device, a game console, or the like.
On one hand, the electronic device may be equipped with shooting hardware for capturing images or videos, such as a built-in camera or an external camera. In this case, the electronic device may invoke the shooting hardware to collect a video of a shooting target and, serving as a sending end in the video data transmission process, send the collected video data through the server to other electronic devices serving as receiving ends.
On the other hand, the electronic device may serve as a receiving end in a video data transmission process, receive video data sent by other electronic devices serving as sending ends in the video data transmission process through the server, and display videos corresponding to the video data to users.
Exemplary method
A method for video data transmission according to an exemplary embodiment of the present disclosure is described below with reference to fig. 2-5 in conjunction with the application scenario of fig. 1. It should be noted that the above application scenarios are merely illustrated for the convenience of understanding the spirit and principles of the present disclosure, and the embodiments of the present disclosure are not limited in this respect. Rather, embodiments of the present disclosure may be applied to any scenario where applicable.
Referring to fig. 2, fig. 2 schematically shows a flow chart of a video data transmission method according to an embodiment of the present disclosure.
The video data transmission method can be applied to a sending end in the video data transmission process; the video data transmission method may include the steps of:
step 201: and acquiring a video frame image to be transmitted.
In this embodiment, the sending end may obtain a video frame image in the video data to be transmitted.
In practical application, the video data to be transmitted may be video data autonomously acquired by the transmitting end, or video data acquired by the transmitting end from other electronic devices. The video data to be transmitted may be real-time video data (for example, video data in real-time communication in the form of video) or may be video data obtained in advance. The present disclosure is not so limited.
Step 202: and determining whether the super-resolution capability is contained in the image processing capability set obtained by negotiation with the receiving end.
In this embodiment, before processing a video frame image in video data to be transmitted, the sending end may first determine whether an image processing capability set obtained by negotiation with the receiving end in the video data transmission process includes a super-resolution capability.
In practical applications, super-resolution refers to improving the resolution of an image by hardware or software means; super-resolution reconstruction is the process of obtaining a high-resolution image from a low-resolution image.
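The capability negotiation relied on in step 202 can be illustrated with a minimal sketch. The disclosure does not specify the signalling mechanism, so the set-intersection model and the capability names below are assumptions made only for illustration:

```python
def negotiate_capabilities(sender_caps, receiver_caps):
    # Hypothetical model: the usable image processing capability set is
    # the intersection of what the two ends each support. The concrete
    # negotiation protocol (e.g. messages exchanged during session setup)
    # is not specified by this disclosure.
    return set(sender_caps) & set(receiver_caps)

# The sending end then checks the negotiated set before deciding
# whether to downscale the video frame image.
negotiated = negotiate_capabilities(
    {"super_resolution", "denoise"},  # assumed sender-side capabilities
    {"super_resolution"},             # assumed receiver-side capabilities
)
has_sr = "super_resolution" in negotiated
```

Under this model, the check in step 202 reduces to a membership test on the negotiated set.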
Step 203: if the image processing capability set contains the super-resolution capability, the video frame image is processed to reduce the image resolution to obtain a low-resolution image; wherein a resolution of the low-resolution image matches a processable resolution of the super-resolution capability.
In this embodiment, if the super-resolution capability is included in the set of image processing capabilities, the receiving end may be considered to have the super-resolution capability, that is, the receiving end may super-resolution reconstruct an image with a low resolution into an image with a high resolution. In this case, the transmitting end may perform image resolution reduction processing on the video frame image to obtain a low-resolution image.
It should be noted that, in order to ensure that the receiving end can perform super-resolution reconstruction on the low-resolution image, the image resolution of the low-resolution image may match the processable resolution of the super-resolution capability.
If the image processing capability set does not include the super-resolution capability, the receiving end is considered not to have the super-resolution capability, that is, it cannot reconstruct a low-resolution image into a high-resolution image. In this case, the transmitting end does not need to perform image resolution reduction processing on the video frame image.
Step 204: and sending the low-resolution image to the receiving end so that the receiving end carries out super-resolution reconstruction on the low-resolution image based on a super-resolution algorithm to obtain a high-resolution image corresponding to the low-resolution image, and determining the high-resolution image as the video frame image.
In this embodiment, when the transmitting side obtains the low-resolution image, the transmitting side may transmit the low-resolution image to the receiving side.
When the receiving end receives the low-resolution image sent by the sending end, the receiving end can perform super-resolution reconstruction on the low-resolution image based on a super-resolution algorithm to obtain a high-resolution image corresponding to the low-resolution image, and the high-resolution image is determined to be the video frame image, so that video display can be performed based on the video frame image.
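The sender-side flow of steps 201 to 204 can be sketched as follows. The `Frame` type, the capability dictionary layout, and the trivial byte-slicing "downscale" are all invented for the sketch; a real sender would resample the decoded pixel buffer with a proper filter before re-encoding:

```python
from dataclasses import dataclass

@dataclass
class Frame:
    width: int
    height: int
    pixels: bytes  # packed pixel data (format-dependent)

def downscale(frame, target_w, target_h):
    # Placeholder for real resampling (e.g. bilinear filtering); here the
    # buffer is simply truncated so the sketch stays self-contained.
    return Frame(target_w, target_h, frame.pixels[: target_w * target_h])

def prepare_for_send(frame, negotiated_caps):
    """Steps 202-203: downscale only when the negotiated capability set
    advertises super-resolution; otherwise send the frame unchanged."""
    sr = negotiated_caps.get("super_resolution")
    if not sr:
        return frame  # receiver cannot reconstruct, keep full resolution
    target_w, target_h = sr["processable_resolution"]
    if frame.width <= target_w and frame.height <= target_h:
        return frame  # already at or below the SR input size
    return downscale(frame, target_w, target_h)
```

For example, with a negotiated processable resolution of 640x360, a 1280x720 frame is downscaled before transmission, while the same frame is sent unchanged when the capability set is empty.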
In correspondence with the video data transmission method as shown in fig. 2, referring to fig. 3, fig. 3 schematically shows a flowchart of another video data transmission method according to an embodiment of the present disclosure.
The video data transmission method can be applied to a receiving end in the video data transmission process; the video data transmission method may include the steps of:
step 301: receiving a low-resolution image sent by a sending end; the low-resolution image is obtained by processing the video frame image to be transmitted to reduce the image resolution under the condition that the sending end determines that the image processing capability set obtained by negotiation with the receiving end contains the super-resolution capability; wherein a resolution of the low resolution image matches a processable resolution of the super resolution capability.
Step 302: performing super-resolution reconstruction on the low-resolution image based on a super-resolution algorithm to obtain a high-resolution image corresponding to the low-resolution image, and determining the high-resolution image as the video frame image.
For the specific implementation of steps 301-302, reference may be made to steps 201-204; details are not repeated here.
According to the embodiments shown in fig. 2 and fig. 3, in the video data transmission process, a sending end may, under the condition that it is determined that an image processing capability set obtained by negotiation with a receiving end includes a super-resolution capability, perform image resolution reduction processing on a video frame image to be transmitted first to obtain a low-resolution image, where an image resolution of the low-resolution image matches a processable resolution of the super-resolution capability, and then may send the low-resolution image to the receiving end, where the receiving end may perform super-resolution reconstruction on the low-resolution image based on a super-resolution algorithm to obtain a high-resolution image corresponding to the low-resolution image, and determine the high-resolution image as the video frame image.
In addition, before the sending end performs image resolution reduction processing on the video frame image to be transmitted to obtain the low-resolution image, the sending end may also adjust the image resolution, the video frame rate and/or the video code rate of the video frame image based on the video quality control strategy.
The following describes the video data transmission method shown in fig. 2 and 3 in terms of image processing capability set negotiation, video quality control, processing for reducing image resolution, and super-resolution reconstruction.
In an embodiment shown, the sending end and the receiving end may obtain the image processing capability set through negotiation in advance, so as to avoid that the negotiation of the image processing capability set is performed again in the video data transmission process, which affects the video data transmission efficiency.
Specifically, the sending end and the receiving end may respectively report their image processing capability sets to a server, and the server determines an image processing capability set that is commonly satisfied by the sending end and the receiving end based on an image processing capability set of the sending end (hereinafter referred to as a first image processing capability set) and an image processing capability set of the receiving end (hereinafter referred to as a second image processing capability set), and issues the determined image processing capability set to the sending end, so that the sending end determines the image processing capability set issued by the server as an image processing capability set negotiated with the receiving end.
In addition, the server may also issue the determined image processing capability set to the receiver, so that when the receiver is used as a new sender to transmit video data to the sender, the image processing capability set issued by the server is directly determined as an image processing capability set negotiated with the sender.
Referring to fig. 4, fig. 4 schematically shows a flowchart of an image processing capability set negotiation method according to an embodiment of the present disclosure.
The image processing capability set negotiation method can be realized through data interaction among the sending terminal, the receiving terminal and the server terminal; the image processing capability set negotiation method may include the steps of:
Step 401: the sending end sends a first join group request to the server end; wherein the first join group request comprises a first image processing capability set of the sending end.
Step 402: the receiving end sends a second join group request to the server end; wherein the second join group request comprises a second image processing capability set of the receiving end.
In this embodiment, the sending end may send a join group request (hereinafter referred to as a first join group request) to the server end, so as to perform video data transmission in a corresponding communication group.
Similarly, the receiving end may send a join group request (hereinafter referred to as a second join group request) to the serving end, so as to perform video data transmission in a corresponding communication group.
Under the condition that the sending end and the receiving end are connected to the same communication group, data interaction can be carried out between the sending end and the receiving end, namely the sending end can transmit video data to the receiving end.
It should be noted that there is no sequence between the above step 401 and the above step 402 in terms of time sequence. That is, the sending end may send the first join group request to the server first, or the receiving end may send the second join group request to the server first.
Step 403: a server determines the set of image processing capabilities of the communication group based on the first set of image processing capabilities and the second set of image processing capabilities.
In this embodiment, the server may determine, based on the first image processing capability set and the second image processing capability set, an image processing capability set of a communication group in which the sending end and the receiving end are located.
In an embodiment shown in the foregoing description, the server may determine an intersection of the first image processing capability set and the second image processing capability set as an image processing capability set of the communication group. That is, the image processing capability in the image processing capability set of the communication group is included in both the first image processing capability set and the second image processing capability set, so that it can be ensured that the image processing capability set is matched with both the transmitting end and the receiving end.
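The intersection-based negotiation described above can be sketched as follows; the capability names and the plain-set representation are illustrative assumptions, not part of the disclosure:

```python
def negotiate_group_capabilities(first_caps, second_caps):
    """Return the image processing capability set of the communication group:
    only capabilities present at BOTH the sending end and the receiving end."""
    return first_caps & second_caps

# Hypothetical capability names, for illustration only.
sender_caps = {"super_resolution", "denoise"}
receiver_caps = {"super_resolution", "sharpen"}
group_caps = negotiate_group_capabilities(sender_caps, receiver_caps)
```

Because the result is an intersection, every capability in the group set is guaranteed to be supported by both ends, which is exactly the matching property the embodiment relies on.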
Step 404: the server end sends the image processing capability set to the sending end.
Step 405: the sending end determines the image processing capability set as the image processing capability set obtained by negotiation with the receiving end.
Step 406: the server end sends the image processing capability set to the receiving end.
Step 407: the receiving end determines the image processing capability set as the image processing capability set obtained by negotiation with the sending end.
In this embodiment, when determining the image processing capability set of the communication group, the server may send the image processing capability set to the sending end, so that the sending end determines the image processing capability set as the image processing capability set negotiated with the receiving end.
In addition, the server may further send the image processing capability set to the receiver, so that the receiver determines the image processing capability set as an image processing capability set negotiated with the sender.
It should be noted that there is no sequence between the step 404 and the step 406 in terms of time sequence. That is, the server may first transmit the image processing capability set to the transmitting end, or the server may first transmit the image processing capability set to the receiving end.
Referring to fig. 5, fig. 5 schematically illustrates a software architecture diagram for image processing capability set negotiation according to an embodiment of the present disclosure.
In practical applications, for the communication group, the server may create the communication group first, and set a default image processing capability set for the communication group.
The client joining the communication group may be the transmitting end or the receiving end.
The server may determine, when receiving a group joining request requesting to join the communication group sent by a first client, whether the group joining request carries an image processing capability set corresponding to the client, and if so, may overwrite the default image processing capability set with the image processing capability set corresponding to the client, that is, update the image processing capability set of the communication group to the image processing capability set corresponding to the client.
When the server receives a join group request from a subsequent client requesting to join the communication group, it may determine whether the request carries an image processing capability set corresponding to that client, and if so, determine whether that set is greater than (i.e., a superset of) the current image processing capability set of the communication group. If the client's set is a superset of the group's current set, the server may leave the group's image processing capability set unchanged and send the current set to the client, so that the client determines the current set as the negotiated image processing capability set. If the client's set is smaller (i.e., does not cover the group's current set), the server may take the intersection of the client's set and the group's current set, update the group's image processing capability set to that intersection, and send the new set to each client in the communication group, so that each client determines the new set as the negotiated image processing capability set.
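The server-side update rule above can be sketched as a small function. Interpreting "greater than" in the scheme as "is a superset of" is an assumption of this sketch:

```python
def on_client_join(group_caps, client_caps):
    """Update the communication group's capability set when a client joins.

    client_caps is None when the join request carries no capability set;
    a client whose set covers the group's set causes no update; otherwise
    the group set shrinks to the intersection (and would be re-broadcast
    to every member of the group)."""
    if client_caps is None:
        return group_caps            # nothing to negotiate against
    if client_caps >= group_caps:    # superset: keep the current group set
        return group_caps
    return group_caps & client_caps  # shrink to the common subset
```

A consequence of this rule is that the group set can only ever shrink as clients join, so every member always supports the current group set.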
The sending end can perform image resolution reduction processing on the video frame image to obtain a low-resolution image under the condition that the super-resolution capability is determined to be included in the image processing capability set obtained by negotiation with the receiving end.
Network environment parameters such as network bandwidth are usually different for different video data transmission processes. In order to enable video data transmission to better meet the requirements of actual network environment parameters, the sending end may further adjust the image resolution, the video frame rate and/or the video code rate of the video frame image based on a video quality control policy before performing image resolution reduction processing on the video frame image to obtain a low-resolution image. For example, in the case of a small network bandwidth, before reducing the image resolution of a video frame image, the image resolution, the video frame rate, and/or the video bitrate of the video frame image may be reduced based on a video quality control policy to reduce the bandwidth occupation during the transmission process; under the condition of larger network bandwidth, the image resolution, the video frame rate and/or the video code rate of the video frame image can be improved based on the video quality control strategy before the image resolution of the video frame image is reduced, so that the network bandwidth is fully utilized.
Referring to fig. 6, fig. 6 schematically shows a software architecture diagram for video quality control according to an embodiment of the present disclosure.
Specifically, in one example, the sending end may use an OveruseFrameDetector module to periodically determine the CPU (Central Processing Unit) usage rate, based on the time consumed to encode historical video frame images, according to a certain time period (hereinafter referred to as the first time period), and compare the determined CPU usage rate with a high usage rate threshold and a low usage rate threshold, respectively.
If the CPU usage rate is greater than the high usage rate threshold, encoding the previous video frame images occupied a large share of CPU resources and placed a heavy load on the CPU. The sending end may therefore first reduce the image resolution and/or the video frame rate of the video frame image to lower the CPU usage rate, and then perform the image resolution reduction processing on the video frame image to obtain the low-resolution image.
If the CPU usage rate is less than the low usage rate threshold, encoding the previous video frame images occupied few CPU resources and the CPU was underutilized. The sending end may therefore first increase the image resolution and/or the video frame rate of the video frame image to raise the CPU usage rate, and then perform the image resolution reduction processing on the video frame image to obtain the low-resolution image.
The specific values of the first time period, the high usage threshold and the low usage threshold may be preset by a technician, or may be default values, which is not limited by the present disclosure.
In another example, the sending end may use a MediaOptimization module to determine, based on a leaky bucket algorithm, whether to drop the video frame image, according to the maximum video code rate, the minimum video code rate, the target video code rate, and the actual video code rate observed when encoding historical video frame images, thereby adjusting the video code rate; the image resolution reduction processing is then performed only on the video frame images that are not dropped, to obtain the low-resolution images.
The maximum video code rate represents the maximum video code rate allowed in the current video data transmission process, the minimum video code rate represents the minimum video code rate allowed in the current video data transmission process, and the target video code rate represents the video code rate required in the current video data transmission process. The specific values of the maximum video bitrate, the minimum video bitrate, and the target video bitrate may be preset by a technician, or may be default values, which is not limited by the present disclosure.
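A generic leaky-bucket frame-drop decision can be sketched as follows. This is a simplification of the MediaOptimization behaviour described above, not its actual internals; all parameter names and the bucket capacity are assumptions:

```python
def should_drop_frame(bucket_bits, frame_bits, target_bps, frame_interval_s,
                      capacity_bits):
    """Leaky-bucket frame-drop decision (illustrative sketch).

    The bucket drains at the target bitrate between frames; a frame whose
    bits would overflow the bucket is dropped. Returns a tuple of
    (drop_frame, new_bucket_level_bits)."""
    level = max(0.0, bucket_bits - target_bps * frame_interval_s)  # drain
    if level + frame_bits > capacity_bits:
        return True, level            # drop the frame; level stays drained
    return False, level + frame_bits  # send the frame; add its bits
```

Dropping oversized frames keeps the actual code rate near the target code rate without re-encoding, which is why this check precedes the image-resolution-reduction step.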
In another example, the sending end may use a QualityScaler module to periodically determine the average quantization parameter (QP) used when encoding historical video frame images, according to a certain time period (hereinafter referred to as the second time period), and compare the determined average quantization parameter with a high quantization parameter threshold and a low quantization parameter threshold, respectively.
In practical application, the smaller the value of the quantization parameter is, the finer the quantization is, the higher the image quality is, and the higher the video code rate is; the larger the value of the quantization parameter is, the coarser the quantization is, the lower the image quality is, and the lower the video code rate is.
The high quantization parameter threshold and the low quantization parameter threshold correspond to a coding method used when a history video frame image is coded. The high quantization parameter threshold and the low quantization parameter threshold corresponding to different encoding schemes are usually different; for example, the high quantization parameter threshold corresponding to the encoding method of the h.264 standard may be 37, and the low quantization parameter threshold corresponding to the encoding method of the h.264 standard may be 24.
If the average quantization parameter is greater than the high quantization parameter threshold, the sending end may first reduce the image resolution and/or the video frame rate of the video frame image, and then perform image resolution reduction processing on the video frame image to obtain a low resolution image.
If the average quantization parameter is smaller than the low quantization parameter threshold, the sending end may first increase the image resolution and/or the video frame rate of the video frame image, and then perform image resolution reduction processing on the video frame image to obtain a low resolution image.
The specific values of the second time period, the high quantization parameter threshold and the low quantization parameter threshold may be preset by a technician, or may be default values, which is not limited by the present disclosure.
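The QP-based decision can be sketched in the same shape as the CPU-based one. The default thresholds here are the H.264 example values given above (37 and 24); other encoding schemes would use different thresholds:

```python
def qp_quality_decision(avg_qp, high_qp_thresh=37, low_qp_thresh=24):
    """Quality decision from the average quantization parameter (QP)
    measured over the second time period. Defaults are the H.264
    example thresholds from the text above."""
    if avg_qp > high_qp_thresh:
        return "downgrade"   # coarse quantization, low quality: reduce load
    if avg_qp < low_qp_thresh:
        return "upgrade"     # fine quantization: headroom to raise quality
    return "keep"
```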
For the above-mentioned video frame image, when the image resolution of the video frame image is reduced, the image resolution of the video frame image may be specifically reduced to 1/4 to 3/5 of the original image resolution, and when the image resolution of the video frame image is increased, the image resolution of the video frame image may be specifically increased to 5/3 to 4 times of the original image resolution.
The specific numerical values adopted when the image resolution is reduced and improved can be selected according to manual experience and actual requirements.
Similarly, when the video frame rate of the video frame image is decreased, the video frame rate of the video frame image may be specifically decreased to 2/3 of the original video frame rate, and when the video frame rate of the video frame image is increased, the video frame rate of the video frame image may be specifically increased to the highest video frame rate allowed at the resolution of the current image.
The specific numerical values adopted when the video frame rate is reduced and improved can be selected according to manual experience and actual requirements.
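The numeric ranges above can be applied as in the following sketch; the particular scale factor chosen within the allowed range is an assumption left to manual experience:

```python
def downscale_resolution(width, height, factor=0.5):
    """Reduce the image resolution to 1/4 to 3/5 of the original,
    per the range stated above; factor=0.5 is an example choice."""
    assert 0.25 <= factor <= 0.6, "disclosure range: 1/4 to 3/5"
    return int(width * factor), int(height * factor)

def downscale_frame_rate(fps):
    """Reduce the video frame rate to 2/3 of the original, per the text."""
    return int(fps * 2 / 3)
```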
The sending end may perform image resolution reduction processing on the video frame image to obtain a low resolution image and send the low resolution image to the receiving end when determining that the image processing capability set negotiated with the receiving end includes the super resolution capability.
It should be noted that, in order to ensure that the receiving end can perform super-resolution reconstruction on the low-resolution image, the image resolution of the low-resolution image may match the processable resolution of the super-resolution capability.
In one embodiment shown, the range of processable resolutions for the super-resolution capability described above can be determined as 360 × 360 to 1280 × 1280, depending on practical requirements and technical limitations.
In practical application, 360 × 360 indicates that the number of pixels in the horizontal direction and the number of pixels in the vertical direction of the image are both 360, and 1280 × 1280 indicates that the number of pixels in the horizontal direction and the number of pixels in the vertical direction of the image are both 1280.
For the low-resolution image to match the processable resolution of the super-resolution capability, the number of pixels in the horizontal direction of the low-resolution image may be in the range of 360 to 1280, and the number of pixels in the vertical direction may likewise be in the range of 360 to 1280.
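The matching check can be sketched directly from the 360 × 360 to 1280 × 1280 range given in the embodiment above:

```python
def matches_sr_capability(width, height, min_px=360, max_px=1280):
    """Check whether a low-resolution image falls within the processable
    resolution range of the super-resolution capability (360x360 to
    1280x1280, per the embodiment above)."""
    return min_px <= width <= max_px and min_px <= height <= max_px
```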
When the receiving end receives the low-resolution image sent by the sending end, the receiving end can perform super-resolution reconstruction on the low-resolution image based on a super-resolution algorithm to obtain a high-resolution image corresponding to the low-resolution image, and the high-resolution image is determined to be the video frame image, so that video display can be performed based on the video frame image.
Referring to fig. 7, fig. 7 schematically shows a schematic diagram of a software architecture of super-resolution reconstruction according to an embodiment of the present disclosure.
In one embodiment, when the receiving end performs the super-resolution reconstruction on the low-resolution image based on the super-resolution algorithm to obtain the high-resolution image corresponding to the low-resolution image, the receiving end may specifically input the low-resolution image into a super-resolution model for performing the super-resolution reconstruction on the image, so that the super-resolution model performs the super-resolution reconstruction on the low-resolution image to obtain the high-resolution image corresponding to the low-resolution image.
In one embodiment shown, the super-resolution model may be a super-resolution model based on a dual attention mechanism. That is, the super-resolution model may perform feature extraction on the input image based on a dual attention mechanism.
In practical applications, the role of an attention mechanism is to screen out a small amount of important information from a large amount of information and focus on that important information, while ignoring the largely unimportant remainder.
The dual attention mechanism may consist of a spatial attention mechanism and a channel attention mechanism.
For an input image, spatial attention may help focus on important information in the image.
For a convolutional layer in the super-resolution model, after an image is input into the convolutional layer for feature extraction, the number of feature maps output by the convolutional layer equals the number of convolution channels of that layer. That is, each convolution channel outputs the feature data obtained by extracting features from the image through that channel. In this case, channel attention may help focus on the convolution channels that output important information.
Performing super-resolution reconstruction through a super-resolution model based on the dual attention mechanism gives the reconstructed image better details and edges.
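A minimal NumPy sketch of the channel-then-spatial gating data flow is shown below. In a real super-resolution model each gate is produced by a small learned sub-network; the fixed sigmoid gates here are an assumption made purely to illustrate how the two attention branches act on a feature tensor:

```python
import numpy as np

def _sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(feat):
    """Gate each channel by a statistic of its own feature map.
    feat: (channels, height, width); one gate weight per channel."""
    gate = _sigmoid(feat.mean(axis=(1, 2)))
    return feat * gate[:, None, None]

def spatial_attention(feat):
    """Gate each spatial position by the channel-pooled response there."""
    gate = _sigmoid(feat.mean(axis=0))
    return feat * gate[None, :, :]

def dual_attention(feat):
    """Apply channel attention, then spatial attention."""
    return spatial_attention(channel_attention(feat))
```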
In one illustrated embodiment, the super-resolution model may include sub-pixel convolution layers for rearranging pixels in the feature maps, so as to fully utilize the feature data output by each convolution channel of the convolutional layers in the super-resolution model.
In the process of camera imaging, the obtained image data is a discretization of the scene: due to the capacity limitation of the photosensitive elements, each pixel on the imaging surface only represents a nearby color. For example, the pixels on two photosensitive elements may be 4.5 um apart; macroscopically they appear connected, but microscopically there are innumerable smaller details between them. The pixels that notionally exist between two actual physical pixels are called "sub-pixels".
Sub-pixels actually exist in the scene, but there is no smaller photosensitive element to detect them, so they can only be approximated in software.
The accuracy of the sub-pixels can be adjusted according to the interpolation performed between two adjacent pixels, for example to one quarter, i.e., each pixel is treated as four pixel points in both the horizontal and vertical directions. Sub-pixel interpolation can thus realize the mapping from a small rectangle to a large rectangle, thereby improving the resolution.
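The pixel rearrangement performed by a sub-pixel convolution layer can be sketched in NumPy: every group of r*r convolution channels is folded into the sub-pixel positions of one higher-resolution output channel.

```python
import numpy as np

def pixel_shuffle(feat, r):
    """Sub-pixel rearrangement: turn a (channels*r*r, H, W) feature tensor
    into (channels, H*r, W*r), so each group of r*r convolution channels
    fills in the sub-pixel positions of one output channel."""
    c_r2, h, w = feat.shape
    c = c_r2 // (r * r)
    feat = feat.reshape(c, r, r, h, w)
    feat = feat.transpose(0, 3, 1, 4, 2)   # (c, h, r, w, r)
    return feat.reshape(c, h * r, w * r)
```

With r = 2, four low-resolution feature maps interleave into one map of twice the width and height, which is how the sub-pixel convolution layer raises resolution without interpolation.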
In an embodiment shown, before the low-resolution image is input to the super-resolution model, the receiving end may first divide the low-resolution image into a plurality of image blocks, and classify the image blocks according to the complexity of their image texture, so as to perform different super-resolution reconstruction on image blocks with high texture complexity and image blocks with low texture complexity.
Specifically, each image block may be input to a texture classification model, so that the texture classification model performs texture classification on each image block. The texture classification model can be obtained by carrying out supervised training based on a plurality of image samples; the plurality of image samples may each be labeled with a label corresponding to a complexity of the texture.
In practical applications, the image may include a photographic subject (also referred to as a detail) such as a person or an object and a background. For a plurality of image blocks obtained by dividing an image, the image texture of the image block containing details is more complex, namely the image block with high texture complexity; the image texture of an image block that does not contain details but only contains the background is simpler, i.e. an image block with low texture complexity.
For an image block with low texture complexity, the effect difference of super-resolution reconstruction is not large based on a complex super-resolution algorithm and a simple super-resolution algorithm. Therefore, the super-resolution reconstruction can be performed on the image block with low texture complexity based on a simple super-resolution algorithm so as to improve the calculation speed on the premise of ensuring the super-resolution reconstruction effect, and the super-resolution reconstruction can be performed on the image block with high texture complexity based on a complex super-resolution algorithm so as to ensure the super-resolution reconstruction effect.
Specifically, the super-resolution reconstruction may be performed on an image block with low texture complexity based on a linear interpolation algorithm, and the image block with high texture complexity is input into the super-resolution model and subjected to the super-resolution reconstruction by the super-resolution model.
That is, after the texture classification model performs texture classification on each image block obtained by dividing the low-resolution image, the super-resolution reconstruction may be performed on each image block whose texture classification result is low texture complexity based on a linear interpolation algorithm, and each image block whose texture classification result is high texture complexity is input to the super-resolution model and subjected to the super-resolution reconstruction by the super-resolution model.
In addition, a parallel mode can be adopted to perform super-resolution reconstruction on a plurality of image blocks with high texture complexity, so that the calculation speed is further improved.
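The block-routing scheme above can be sketched as follows. Block variance is used here as a stand-in for the texture classification model, and nearest-neighbour repetition as a stand-in for the linear interpolation algorithm; `sr_model` is a placeholder for the complex reconstruction path — all three substitutions are assumptions of this sketch:

```python
import numpy as np

def reconstruct_blocks(blocks, r, sr_model, var_thresh=100.0):
    """Route image blocks by texture complexity before upscaling by r.

    Low-complexity blocks take the cheap interpolation path; high-complexity
    blocks go through sr_model (any callable returning an upscaled block)."""
    out = []
    for block in blocks:
        if block.var() < var_thresh:   # low texture complexity: cheap path
            out.append(np.repeat(np.repeat(block, r, axis=0), r, axis=1))
        else:                          # high texture complexity: model path
            out.append(sr_model(block))
    return out
```

Because each block is processed independently, the high-complexity blocks can also be dispatched to the model in parallel, as the text notes.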
In an illustrated embodiment, in order to ensure the joint calculation effect of the texture classification model and the super-resolution model, the texture classification model and the super-resolution model may be trained in advance until a joint loss function converges.
The joint loss function may be a weighted sum of the loss function corresponding to the texture classification model (hereinafter referred to as the first loss function) and the loss function corresponding to the super-resolution model (hereinafter referred to as the second loss function), as represented by the following equation:
L = w1 × L1 + w2 × L2
where L denotes a joint loss function, L1 denotes a first loss function, w1 denotes a weight corresponding to the first loss function, L2 denotes a second loss function, and w2 denotes a weight corresponding to the second loss function.
In practical application, the super-resolution model may first be trained based on the second loss function; then, with the learning rate of the super-resolution model reduced, the texture classification model is trained based on the joint loss function; finally, the texture classification model and the super-resolution model are jointly optimized based on the joint loss function until it converges.
According to the embodiment of the disclosure, in a video data transmission process, a sending end may, under a condition that it is determined that an image processing capability set obtained by negotiation with a receiving end includes a super-resolution capability, perform image resolution reduction processing on a video frame image to be transmitted first to obtain a low-resolution image, where an image resolution of the low-resolution image matches a processable resolution of the super-resolution capability, and then may send the low-resolution image to the receiving end, where the receiving end may perform super-resolution reconstruction on the low-resolution image based on a super-resolution algorithm to obtain a high-resolution image corresponding to the low-resolution image, and determine the high-resolution image as the video frame image.
By adopting the mode, on one hand, the sending end can reduce the image resolution of the video frame image to be transmitted and send the processed low-resolution image to the receiving end, so that the data volume of the video data transmitted in the video data transmission process can be reduced, and the video data transmission efficiency is improved; on the other hand, the receiving end can perform super-resolution reconstruction on the received low-resolution image, and the reconstructed high-resolution image is determined to be the video frame image sent by the sending end, so that the image quality of the video frame image acquired by the receiving end can be ensured, the transmission quality of the corresponding video is ensured, and the user experience is improved.
Correspondingly, the receiving end can divide the received low-resolution image into a plurality of image blocks, super-resolution reconstruction is carried out on the image blocks with lower texture complexity based on a linear interpolation algorithm, super-resolution reconstruction is carried out on the image blocks with higher texture complexity by a super-resolution model, and the super-resolution model can be a super-resolution model which is based on a double attention mechanism and comprises a sub-pixel convolution layer, so that the super-resolution reconstruction effect is improved.
In addition, before the sending end performs image resolution reduction processing on the image to be transmitted to obtain a low-resolution image, the sending end may also adjust the image resolution, the video frame rate and/or the video code rate of the video frame image based on a video quality control strategy to meet the requirements of network environment parameters such as network bandwidth and the like in the video data transmission process.
Exemplary Medium
Having described the method of the exemplary embodiment of the present disclosure, the medium for video data transmission of the exemplary embodiment of the present disclosure will be described next with reference to fig. 8.
In the present exemplary embodiment, the above-described method may be implemented by a program product, such as a portable compact disc read only memory (CD-ROM) and including program code, and may be executed on a device, such as a personal computer. However, the program product of the present disclosure is not limited thereto, and in this document, a readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium.
A readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
A readable signal medium may include a propagated data signal with readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Program code for carrying out operations of the present disclosure may be written in any combination of one or more programming languages, including an object-oriented programming language such as Java or C++ as well as conventional procedural programming languages such as the C language. The program code may execute entirely on the user computing device, partly on the user computing device and partly on a remote computing device, or entirely on the remote computing device or server. In situations involving remote computing devices, the remote computing device may be connected to the user computing device through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computing device (for example, through the internet using an internet service provider).
Exemplary devices
Having described the media of the exemplary embodiments of the present disclosure, an apparatus for video data transmission of the exemplary embodiments of the present disclosure will next be described with reference to fig. 9.
For the implementation of the functions and roles of each module in the apparatus below, refer to the implementation of the corresponding steps in the foregoing method, which is not repeated here. Because the apparatus embodiments substantially correspond to the method embodiments, the description of the method embodiments may be consulted for the relevant details.
Referring to fig. 9, fig. 9 schematically illustrates a video data transmission apparatus according to an embodiment of the present disclosure.
The video data transmission device can be applied to a sending end and comprises:
an obtaining module 901, configured to obtain a video frame image to be transmitted;
a determining module 902, configured to determine whether an image processing capability set obtained by negotiation with a receiving end includes a super-resolution capability;
a processing module 903, configured to perform, if the image processing capability set includes a super-resolution capability, processing to reduce the image resolution on the video frame image, so as to obtain a low-resolution image; wherein an image resolution of the low resolution image matches a processable resolution of the super resolution capability;
a sending module 904, configured to send the low-resolution image to the receiving end, so that the receiving end performs super-resolution reconstruction on the low-resolution image based on a super-resolution algorithm to obtain a high-resolution image corresponding to the low-resolution image, and determines the high-resolution image as the video frame image.
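The flow through modules 901-904 can be illustrated with a minimal sketch; the capability name `"super_resolution"`, the nearest-neighbor resampler, and the `send` callback are hypothetical stand-ins for the negotiated capability set, a real codec scaler, and the transport layer:

```python
def downscale_nearest(frame, out_w, out_h):
    """Reduce image resolution by nearest-neighbor sampling.

    `frame` is a row-major list of rows of pixel values; a real sending
    end would use the codec's scaler rather than this toy resampler.
    """
    in_h, in_w = len(frame), len(frame[0])
    return [[frame[y * in_h // out_h][x * in_w // out_w]
             for x in range(out_w)]
            for y in range(out_h)]

def send_video_frame(frame, capability_set, sr_w, sr_h, send):
    """Sketch of modules 901-904: downscale only when the negotiated
    capability set contains super-resolution, otherwise send as-is."""
    if "super_resolution" in capability_set:
        # The low-resolution image must match the processable
        # resolution of the receiver's super-resolution capability.
        low = downscale_nearest(frame, sr_w, sr_h)
        return send(low)
    return send(frame)
```

When the capability set lacks super-resolution, the original frame is transmitted unchanged, matching the conditional branch in the determining module.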
Optionally, the set of image processing capabilities is obtained by negotiating with the receiving end in the following manner:
sending a first group-joining request to a server to perform video data transmission in a corresponding communication group, wherein the first group-joining request includes a first image processing capability set of the sending end, so that the server determines the image processing capability set of the communication group based on the first image processing capability set and a second image processing capability set carried in a second group-joining request sent by the receiving end, and returns the image processing capability set to the sending end and the receiving end;
and receiving the image processing capability set returned by the server, and determining it as the image processing capability set obtained by negotiation with the receiving end.
Optionally, the set of image processing capabilities of the communication group is an intersection of the first set of image processing capabilities and the second set of image processing capabilities.
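Under this intersection rule, the server-side determination reduces to a set intersection; the capability names below are invented for illustration:

```python
def negotiate_group_capabilities(first_set, second_set):
    """Server-side sketch: the communication group's image processing
    capability set is the intersection of the capability sets carried in
    the sender's and receiver's group-joining requests, so only features
    supported by both ends are enabled for the group."""
    return set(first_set) & set(second_set)
```

The returned set is what the server would send back to both the sending end and the receiving end.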
Optionally, the apparatus further comprises:
an adjusting module 905, configured to adjust the image resolution, the video frame rate, and/or the video bit rate of the video frame image based on a video quality control strategy before the image resolution reduction is performed on the video frame image to obtain a low-resolution image.
Optionally, the adjusting module 905 is specifically configured to:
periodically determining, on a first time period, the CPU utilization based on the time consumed to encode historical video frame images, and comparing the CPU utilization with a high utilization threshold and a low utilization threshold, respectively;
if the CPU utilization is greater than the high utilization threshold, reducing the image resolution and/or the video frame rate of the video frame image;
and if the CPU utilization is less than the low utilization threshold, increasing the image resolution and/or the video frame rate of the video frame image.
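This first-period strategy can be sketched as follows; the threshold values and the utilization estimate (encode time divided by frame interval) are illustrative assumptions, not values from the disclosure:

```python
def cpu_based_adjustment(encode_times_ms, frame_interval_ms,
                         high_threshold=0.85, low_threshold=0.5):
    """Estimate CPU utilization from the time spent encoding recent
    historical frames, then compare it with the two thresholds to decide
    how to adjust image resolution and/or video frame rate."""
    cpu_utilization = sum(encode_times_ms) / (len(encode_times_ms) * frame_interval_ms)
    if cpu_utilization > high_threshold:
        return "reduce_resolution_or_framerate"   # encoder overloaded
    if cpu_utilization < low_threshold:
        return "increase_resolution_or_framerate" # headroom available
    return "keep"
```

A real controller would run this on the first time period and feed the decision back into the capture pipeline.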
Optionally, the adjusting module 905 is specifically configured to:
determining, based on a leaky bucket algorithm, whether to drop the video frame image according to the maximum video bit rate, the minimum video bit rate, the target video bit rate, and the actual bit rate at which historical video frame images were encoded, so as to adjust the video bit rate of the video frame image.
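A minimal leaky-bucket frame dropper consistent with this description might look as follows; the bucket depth, drain rate, and timing values are illustrative assumptions:

```python
class LeakyBucketFrameDropper:
    """Encoded bits drain from the bucket at the target bit rate; when a
    frame would overflow the bucket, it is dropped, pulling the actual
    bit rate back toward the target."""

    def __init__(self, target_bps, bucket_bits):
        self.target_bps = target_bps  # drain rate = target video bit rate
        self.capacity = bucket_bits   # maximum tolerated burst, in bits
        self.level = 0.0
        self.last_t = 0.0

    def should_drop(self, frame_bits, now_s):
        # Drain the bucket for the elapsed interval, then test whether
        # admitting this frame would overflow it.
        self.level = max(0.0, self.level - (now_s - self.last_t) * self.target_bps)
        self.last_t = now_s
        if self.level + frame_bits > self.capacity:
            return True   # drop: sending would exceed the allowed burst
        self.level += frame_bits
        return False      # keep the frame
```

With a 1 Mbps target and a 500 kbit bucket, a burst of 300 kbit frames every 100 ms overruns the bucket on the third frame, which is dropped.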
Optionally, the adjusting module 905 is specifically configured to:
periodically determining, on a second time period, the average quantization parameter at which historical video frame images were encoded, and comparing the average quantization parameter with a high quantization parameter threshold and a low quantization parameter threshold, respectively; wherein the high quantization parameter threshold and the low quantization parameter threshold correspond to the encoding mode;
if the average quantization parameter is greater than the high quantization parameter threshold, reducing the image resolution and/or the video frame rate of the video frame image;
and if the average quantization parameter is less than the low quantization parameter threshold, increasing the image resolution and/or the video frame rate of the video frame image.
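The second-period strategy can be sketched similarly; the per-mode threshold table and the mode names are illustrative assumptions, not codec-specified values:

```python
# Illustrative QP thresholds keyed by encoding mode; real values depend
# on the codec and encoder implementation and are assumptions here.
QP_THRESHOLDS = {
    "software_h264": {"high": 38, "low": 24},
    "hardware_h264": {"high": 42, "low": 28},
}

def qp_based_adjustment(recent_qps, mode):
    """A high average QP means the encoder is starved for bits, so the
    resolution or frame rate should come down; a low average QP leaves
    headroom to raise them."""
    thresholds = QP_THRESHOLDS[mode]
    avg_qp = sum(recent_qps) / len(recent_qps)
    if avg_qp > thresholds["high"]:
        return "reduce_resolution_or_framerate"
    if avg_qp < thresholds["low"]:
        return "increase_resolution_or_framerate"
    return "keep"
```

Keying the thresholds by encoding mode mirrors the statement that the high and low quantization parameter thresholds correspond to the coding mode.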
Optionally, reducing the image resolution of the video frame image comprises:
reducing the image resolution of the video frame image to between 1/4 and 3/5 of the original image resolution;
increasing an image resolution of the video frame image, comprising:
increasing the image resolution of the video frame image to between 5/3 and 4 times the original image resolution.
Optionally, reducing the video frame rate of the video frame image comprises:
reducing the video frame rate of the video frame image to 2/3 of the original video frame rate;
increasing a video frame rate of the video frame image, comprising:
and increasing the video frame rate of the video frame image to the highest video frame rate allowed under the resolution of the current image.
Optionally, the processable resolution of the super-resolution capability ranges from 360 × 360 to 1280 × 1280.
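Combining this processable range with the 1/4 to 3/5 reduction band described earlier, a sending end might select the low resolution as in the sketch below; the linear search over candidate scale factors is an illustrative strategy, not the disclosed selection rule:

```python
def choose_low_resolution(width, height,
                          min_side=360, max_side=1280,
                          min_scale=0.25, max_scale=0.6):
    """Pick a reduced resolution whose scale factor stays within the
    1/4 .. 3/5 band and whose sides fall inside the super-resolution
    capability's processable range (360-1280)."""
    for percent in range(int(min_scale * 100), int(max_scale * 100) + 1):
        s = percent / 100
        w, h = round(width * s), round(height * s)
        if min_side <= w <= max_side and min_side <= h <= max_side:
            return w, h
    return None  # no admissible low resolution; send at original size
```

For a 1920 x 1080 source this yields a resolution near the smallest admissible scale; for a 640 x 360 source no scale in the band reaches the 360-pixel floor, so the frame would be sent unreduced.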
Referring to fig. 10, fig. 10 schematically illustrates another video data transmission apparatus according to an embodiment of the present disclosure.
The video data transmission device can be applied to a receiving end and comprises:
a receiving module 1001, configured to receive a low-resolution image sent by a sending end; wherein the low-resolution image is obtained by the sending end performing image resolution reduction on the video frame image to be transmitted after determining that the image processing capability set obtained by negotiation with the receiving end contains the super-resolution capability; and wherein the image resolution of the low-resolution image matches the processable resolution of the super-resolution capability;
the first reconstruction module 1002 is configured to perform super-resolution reconstruction on the low-resolution image based on a super-resolution algorithm, obtain a high-resolution image corresponding to the low-resolution image, and determine the high-resolution image as the video frame image.
Optionally, the image processing capability set is obtained by negotiating with the sending end in the following manner:
sending a second group-joining request to a server to perform video data transmission in a corresponding communication group, wherein the second group-joining request includes a second image processing capability set of the receiving end, so that the server determines the image processing capability set of the communication group based on the second image processing capability set and a first image processing capability set carried in a first group-joining request sent by the sending end, and returns the image processing capability set to the sending end and the receiving end;
and receiving the image processing capability set returned by the server, and determining it as the image processing capability set obtained by negotiation with the sending end.
Optionally, the set of image processing capabilities of the communication group is an intersection of the first set of image processing capabilities and the second set of image processing capabilities.
Optionally, the first reconstruction module 1002 is specifically configured to:
inputting the low-resolution image into a super-resolution model, so that the super-resolution model performs super-resolution reconstruction on the low-resolution image to obtain a high-resolution image corresponding to the low-resolution image; wherein the super-resolution model is used for performing super-resolution reconstruction on images.
Optionally, the super-resolution model is a super-resolution model based on a dual attention mechanism.
Optionally, the super-resolution model comprises a sub-pixel convolution layer for rearranging pixels in the feature image.
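The pixel rearrangement performed by a sub-pixel convolution (pixel-shuffle) layer can be shown in isolation; the nested lists below stand in for the r² feature maps produced by the preceding convolution, a toy sketch rather than the model's actual tensors:

```python
def pixel_shuffle(channels, r):
    """Rearrange r*r low-resolution feature maps (each H x W) into one
    (H*r) x (W*r) image: the value at feature map (dy*r + dx), position
    (y, x), moves to output position (y*r + dy, x*r + dx)."""
    h, w = len(channels[0]), len(channels[0][0])
    out = [[0] * (w * r) for _ in range(h * r)]
    for idx, fmap in enumerate(channels):
        dy, dx = divmod(idx, r)
        for y in range(h):
            for x in range(w):
                out[y * r + dy][x * r + dx] = fmap[y][x]
    return out
```

This is why a sub-pixel layer upscales without interpolation: all high-resolution pixel values are already present, spread across channels.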
Optionally, the apparatus further comprises:
a segmentation module 1003, configured to divide the low-resolution image into a plurality of image blocks before the low-resolution image is input into the super-resolution model;
a classification module 1004, configured to input each image block into a texture classification model, so that the texture classification model performs texture classification on the image block; wherein the texture classification model is obtained through supervised training on image samples, and the image samples are labeled with labels corresponding to their texture complexity;
the first reconstruction module 1002 is specifically configured to:
inputting each image block whose texture classification result is high texture complexity into the super-resolution model.
Optionally, the apparatus further comprises:
a second reconstruction module 1005, configured to perform super-resolution reconstruction, based on a linear interpolation algorithm, on each image block whose texture classification result is low texture complexity.
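The tile splitting and routing performed by modules 1003 through 1005 can be sketched as follows; the variance-based complexity score stands in for the trained texture classification model, and the block size and threshold are illustrative assumptions:

```python
def split_into_blocks(image, block):
    """Divide a low-resolution image (nested lists) into block x block
    tiles; image sides are assumed multiples of `block` for brevity."""
    h, w = len(image), len(image[0])
    return [[row[x:x + block] for row in image[y:y + block]]
            for y in range(0, h, block)
            for x in range(0, w, block)]

def texture_complexity(tile):
    """Variance stand-in for the trained texture classification model:
    flat tiles score low, detailed tiles score high."""
    pixels = [p for row in tile for p in row]
    mean = sum(pixels) / len(pixels)
    return sum((p - mean) ** 2 for p in pixels) / len(pixels)

def route_blocks(image, block, threshold, sr_model, interpolate):
    """Send high-complexity tiles through the super-resolution model and
    low-complexity tiles through cheap linear interpolation."""
    high, low = [], []
    for tile in split_into_blocks(image, block):
        (high if texture_complexity(tile) > threshold else low).append(tile)
    return [sr_model(t) for t in high], [interpolate(t) for t in low]
```

Routing only detailed tiles to the expensive model is what lets the receiver trade reconstruction quality against compute per tile.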
Optionally, the apparatus further comprises:
a training module 1006, configured to jointly train the texture classification model and the super-resolution model until a joint loss function converges; wherein the joint loss function is a weighted sum of a first loss function and a second loss function, the first loss function being the loss function corresponding to the texture classification model and the second loss function being the loss function corresponding to the super-resolution model.
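The weighted-sum joint loss can be sketched in one line; the 0.3/0.7 weights are illustrative assumptions, not values from the disclosure:

```python
def joint_loss(classification_loss, sr_loss, w_cls=0.3, w_sr=0.7):
    """Joint objective for training the texture classification model and
    the super-resolution model together: a weighted sum of the first
    (classification) loss and the second (reconstruction) loss."""
    return w_cls * classification_loss + w_sr * sr_loss
```

Training would minimize this combined value until it converges, so gradients from both tasks shape the shared pipeline.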
Exemplary computing device
Having described the methods, media, and apparatus of the exemplary embodiments of the present disclosure, a computing device for video data transmission of the exemplary embodiments of the present disclosure is described next with reference to fig. 11.
The computing device 1100 shown in fig. 11 is only one example and should not impose any limitations on the functionality or scope of use of embodiments of the disclosure.
As shown in fig. 11, computing device 1100 is embodied in the form of a general purpose computing device. Components of computing device 1100 may include, but are not limited to: the at least one processing unit 1101, the at least one storage unit 1102, and a bus 1103 connecting different system components (including the processing unit 1101 and the storage unit 1102).
The bus 1103 includes a data bus, a control bus, and an address bus.
The storage unit 1102 may include readable media in the form of volatile memory, such as Random Access Memory (RAM)11021 and/or cache memory 11022, and may further include readable media in the form of non-volatile memory, such as Read Only Memory (ROM) 11023.
The storage unit 1102 may also include a program/utility 11025 having a set (at least one) of program modules 11024, such program modules 11024 including, but not limited to: an operating system, one or more application programs, other program modules, and program data; each of these examples, or some combination thereof, may include an implementation of a network environment.
Computing device 1100 can also communicate with one or more external devices 1104 (e.g., keyboard, pointing device, etc.).
Such communication may occur via input/output (I/O) interfaces 1105. Moreover, the computing device 1100 may also communicate with one or more networks (e.g., a local area network (LAN), a wide area network (WAN), and/or a public network such as the internet) via the network adapter 1106. As shown in fig. 11, the network adapter 1106 communicates with the other modules of the computing device 1100 over the bus 1103. It should be understood that although not shown in the figures, other hardware and/or software modules may be used in conjunction with the computing device 1100, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems.
It should be noted that although in the above detailed description several units/modules or sub-units/modules of the video data transmission apparatus are mentioned, such a division is merely exemplary and not mandatory. Indeed, the features and functionality of two or more of the units/modules described above may be embodied in one unit/module, in accordance with embodiments of the present disclosure. Conversely, the features and functions of one unit/module described above may be further divided into embodiments by a plurality of units/modules.
Further, while the operations of the disclosed methods are depicted in the drawings in a particular order, this does not require or imply that these operations must be performed in this particular order, or that all of the illustrated operations must be performed, to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps combined into one step execution, and/or one step broken down into multiple step executions.
While the spirit and principles of the present disclosure have been described with reference to several particular embodiments, it is to be understood that the present disclosure is not limited to the particular embodiments disclosed, nor does the division into aspects imply that features in those aspects cannot be combined to advantage; such division is for convenience of description only. The disclosure is intended to cover various modifications and equivalent arrangements included within the spirit and scope of the appended claims.

Claims (10)

1. A video data transmission method is applied to a sending end; the method comprises the following steps:
acquiring a video frame image to be transmitted;
determining whether an image processing capability set obtained by negotiation with a receiving end contains super-resolution capability or not;
if the image processing capacity set comprises super-resolution capacity, processing the video frame image to reduce the image resolution to obtain a low-resolution image; wherein an image resolution of the low resolution image matches a processable resolution of the super resolution capability;
and sending the low-resolution image to the receiving end so that the receiving end carries out super-resolution reconstruction on the low-resolution image based on a super-resolution algorithm to obtain a high-resolution image corresponding to the low-resolution image, and determining the high-resolution image as the video frame image.
2. The method of claim 1, wherein the set of image processing capabilities is obtained by negotiating with the receiving end in the following way:
sending a first group-joining request to a server to perform video data transmission in a corresponding communication group, wherein the first group-joining request includes a first image processing capability set of the sending end, so that the server determines the image processing capability set of the communication group based on the first image processing capability set and a second image processing capability set carried in a second group-joining request sent by the receiving end, and returns the image processing capability set to the sending end and the receiving end;
and receiving the image processing capability set returned by the server, and determining it as the image processing capability set obtained by negotiation with the receiving end.
3. The method of claim 1, prior to subjecting the video frame image to image resolution reduction processing to obtain a low resolution image, the method further comprising:
adjusting the image resolution, the video frame rate and/or the video bit rate of the video frame image based on a video quality control strategy.
4. A video real-time transmission method is applied to a receiving end; the method comprises the following steps:
receiving a low-resolution image sent by a sending end; wherein the low-resolution image is obtained by the sending end performing image resolution reduction on the video frame image to be transmitted after determining that the image processing capability set obtained by negotiation with the receiving end contains the super-resolution capability; and wherein the image resolution of the low-resolution image matches the processable resolution of the super-resolution capability;
and performing super-resolution reconstruction on the low-resolution image based on a super-resolution algorithm to obtain a high-resolution image corresponding to the low-resolution image, and determining the high-resolution image as the video frame image.
5. The method of claim 4, wherein the super-resolution reconstruction of the low-resolution image based on the super-resolution algorithm to obtain the high-resolution image corresponding to the low-resolution image comprises:
inputting the low-resolution image into a super-resolution model, so that the super-resolution model performs super-resolution reconstruction on the low-resolution image to obtain a high-resolution image corresponding to the low-resolution image; wherein the super-resolution model is used for performing super-resolution reconstruction on images.
6. The method of claim 5, wherein before inputting the low-resolution image into the super-resolution model, the method further comprises:
dividing the low-resolution image into a plurality of image blocks;
inputting each image block into a texture classification model, so that the texture classification model performs texture classification on the image block; wherein the texture classification model is obtained through supervised training on image samples, and the image samples are labeled with labels corresponding to their texture complexity;
the inputting the low-resolution image into the hyper-segmentation model comprises:
and inputting each image block with the texture classification result of high texture complexity into the hyper-segmentation model.
7. A video data transmission device is applied to a sending end; the device comprises:
the acquisition module is used for acquiring a video frame image to be transmitted;
the determining module is used for determining whether the super-resolution capability is contained in the image processing capability set obtained by negotiation with the receiving end;
the processing module is used for carrying out image resolution reduction processing on the video frame image to obtain a low-resolution image if the image processing capacity set comprises super-resolution capacity; wherein an image resolution of the low resolution image matches a processable resolution of the super resolution capability;
and the sending module is used for sending the low-resolution image to the receiving end so that the receiving end carries out super-resolution reconstruction on the low-resolution image based on a super-resolution algorithm to obtain a high-resolution image corresponding to the low-resolution image and determine the high-resolution image as the video frame image.
8. A video real-time transmission device is applied to a receiving end; the device comprises:
the receiving module is used for receiving a low-resolution image sent by a sending end; wherein the low-resolution image is obtained by the sending end performing image resolution reduction on the video frame image to be transmitted after determining that the image processing capability set obtained by negotiation with the receiving end contains the super-resolution capability; and wherein the image resolution of the low-resolution image matches the processable resolution of the super-resolution capability;
and the first reconstruction module is used for performing super-resolution reconstruction on the low-resolution image based on a super-resolution algorithm to obtain a high-resolution image corresponding to the low-resolution image and determining the high-resolution image as the video frame image.
9. A medium having stored thereon a computer program which, when executed by a processor, carries out the method of any one of claims 1-3 or 4-6.
10. A computing device, comprising:
a processor;
a memory for storing a processor executable program;
wherein the processor implements the method of any of claims 1-3 or 4-6 by running the executable program.
CN202210524716.6A 2022-05-13 2022-05-13 Video data transmission method, device, medium and computing equipment Active CN114900717B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210524716.6A CN114900717B (en) 2022-05-13 2022-05-13 Video data transmission method, device, medium and computing equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210524716.6A CN114900717B (en) 2022-05-13 2022-05-13 Video data transmission method, device, medium and computing equipment

Publications (2)

Publication Number Publication Date
CN114900717A (en) 2022-08-12
CN114900717B CN114900717B (en) 2023-09-26

Family

ID=82720902

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210524716.6A Active CN114900717B (en) 2022-05-13 2022-05-13 Video data transmission method, device, medium and computing equipment

Country Status (1)

Country Link
CN (1) CN114900717B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115955573A (en) * 2023-03-15 2023-04-11 广州思涵信息科技有限公司 Real-time remote synchronous projection method for two-dimensional image

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150324952A1 (en) * 2012-06-28 2015-11-12 Alcatel Lucent Method and system for generating a high-resolution video stream
WO2017036092A1 (en) * 2015-09-06 2017-03-09 京东方科技集团股份有限公司 Super-resolution method and system, server, user equipment, and method therefor
CN111147893A (en) * 2018-11-02 2020-05-12 华为技术有限公司 Video self-adaption method, related equipment and storage medium
CN111970513A (en) * 2020-08-14 2020-11-20 成都数字天空科技有限公司 Image processing method and device, electronic equipment and storage medium
KR102271371B1 (en) * 2020-12-24 2021-06-30 전남대학교산학협력단 Super-Resolution Streaming Video Delivery System Based-on Mobile Edge Computing for Network Traffic Reduction
CN113115067A (en) * 2021-04-19 2021-07-13 脸萌有限公司 Live broadcast system, video processing method and related device
WO2021196860A1 (en) * 2020-03-28 2021-10-07 华为技术有限公司 Video transmission method and system, and related device and storage medium
CN113596576A (en) * 2021-07-21 2021-11-02 杭州网易智企科技有限公司 Video super-resolution method and device
CN114363649A (en) * 2021-12-31 2022-04-15 北京字节跳动网络技术有限公司 Video processing method, device, equipment and storage medium


Also Published As

Publication number Publication date
CN114900717B (en) 2023-09-26


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant