CN114900717B - Video data transmission method, device, medium and computing equipment - Google Patents

Video data transmission method, device, medium and computing equipment

Info

Publication number
CN114900717B
CN114900717B (application CN202210524716.6A)
Authority
CN
China
Prior art keywords
image
resolution
video frame
low
super
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210524716.6A
Other languages
Chinese (zh)
Other versions
CN114900717A (en)
Inventor
陶金亮
阮良
陈功
陈丽
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Netease Zhiqi Technology Co Ltd
Original Assignee
Hangzhou Netease Zhiqi Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Netease Zhiqi Technology Co Ltd filed Critical Hangzhou Netease Zhiqi Technology Co Ltd
Priority to CN202210524716.6A priority Critical patent/CN114900717B/en
Publication of CN114900717A publication Critical patent/CN114900717A/en
Application granted granted Critical
Publication of CN114900717B publication Critical patent/CN114900717B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00 Geometric image transformation in the plane of the image
    • G06T3/40 Scaling the whole image or part thereof
    • G06T3/4053 Super resolution, i.e. output image resolution higher than sensor resolution
    • G06T3/4076 Super resolution by iteratively correcting the provisional high resolution image using the original low-resolution image
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23 Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/234 Processing of video elementary streams, e.g. splicing of video streams, manipulating MPEG-4 scene graphs
    • H04N21/2343 Reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements
    • H04N21/234363 Reformatting by altering the spatial resolution, e.g. for clients with a lower screen resolution
    • H04N21/234381 Reformatting by altering the temporal resolution, e.g. decreasing the frame rate by frame skipping
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations; Client middleware
    • H04N21/44 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream
    • H04N21/4402 Reformatting operations of video signals for household redistribution, storage or real-time display
    • H04N21/440263 Reformatting by altering the spatial resolution, e.g. for displaying on a connected PDA
    • H04N21/440281 Reformatting by altering the temporal resolution, e.g. by frame skipping

Abstract

Embodiments of the present disclosure provide a video data transmission method, apparatus, medium, and computing device, applied to a transmitting end. The method includes: acquiring a video frame image to be transmitted; determining whether the image processing capability set obtained through negotiation with the receiving end contains a super-resolution capability; if so, reducing the image resolution of the video frame image to obtain a low-resolution image whose resolution matches the processable resolution of the super-resolution capability; and sending the low-resolution image to the receiving end, so that the receiving end performs super-resolution reconstruction on it based on a super-resolution algorithm to obtain the corresponding high-resolution image and determines the high-resolution image as the video frame image. The method and apparatus improve video data transmission efficiency while ensuring the transmission quality of the corresponding video.

Description

Video data transmission method, device, medium and computing equipment
Technical Field
Embodiments of the present disclosure relate to the field of video technology, and more particularly, to a video data transmission method, apparatus, medium, and computing device.
Background
This section is intended to provide a background or context to the embodiments of the disclosure recited in the claims. The description herein is not admitted to be prior art by inclusion in this section.
With the development of computer technology and the gradual maturation of digital media technology, applications such as video recording APPs and video call APPs are becoming increasingly widespread. A user can record a video through a video recording APP and share it with other users; alternatively, a user can communicate with other users in real time in video form (Real-Time Communication, RTC) through a video call APP. In either case, the electronic device used by one user typically needs to acquire video data and transmit it to the electronic device used by another user, so that the latter device can display the corresponding video content to its user.
In the video data transmission process, how to ensure the video quality is also a concern.
Disclosure of Invention
In this context, embodiments of the present disclosure desire to provide a video data transmission method, apparatus, medium, and computing device.
In a first aspect of embodiments of the present disclosure, a video data transmission method is provided, where the method is applied to a transmitting end; the method comprises the following steps:
acquiring a video frame image to be transmitted;
determining whether the image processing capability set obtained through negotiation with the receiving end contains a super-resolution capability;
if the image processing capability set contains super-resolution capability, processing the video frame image to reduce the image resolution to obtain a low-resolution image; wherein an image resolution of the low resolution image matches a processable resolution of the super resolution capability;
and sending the low-resolution image to the receiving end, so that the receiving end carries out super-resolution reconstruction on the low-resolution image based on a super-resolution algorithm to obtain a high-resolution image corresponding to the low-resolution image, and determining the high-resolution image as the video frame image.
In a second aspect of the embodiments of the present disclosure, a video real-time transmission method is provided, where the method is applied to a receiving end; the method comprises the following steps:
Receiving a low-resolution image sent by a sending end; the low-resolution image is obtained by the sending end through processing the image resolution reduction of the video frame image to be transmitted under the condition that the image processing capability set obtained through negotiation with the receiving end contains super-resolution capability; wherein an image resolution of the low resolution image matches a processable resolution of the super resolution capability;
and performing super-resolution reconstruction on the low-resolution image based on a super-resolution algorithm to obtain a high-resolution image corresponding to the low-resolution image, and determining the high-resolution image as the video frame image.
In a third aspect of the embodiments of the present disclosure, there is provided a video data transmission apparatus, the apparatus being applied to a transmitting end; the device comprises:
the acquisition module is used for acquiring a video frame image to be transmitted;
the determining module is used for determining whether the image processing capability set obtained through negotiation with the receiving end contains a super-resolution capability;
the processing module is used for carrying out image resolution reduction processing on the video frame image to obtain a low-resolution image if the image processing capability set contains super-resolution capability; wherein an image resolution of the low resolution image matches a processable resolution of the super resolution capability;
And the sending module is used for sending the low-resolution image to the receiving end so that the receiving end carries out super-resolution reconstruction on the low-resolution image based on a super-resolution algorithm to obtain a high-resolution image corresponding to the low-resolution image, and determining the high-resolution image as the video frame image.
In a fourth aspect of the embodiments of the present disclosure, there is provided a video real-time transmission apparatus, the apparatus being applied to a receiving end; the device comprises:
the receiving module is used for receiving the low-resolution image sent by the sending end; the low-resolution image is obtained by the sending end through processing the image resolution reduction of the video frame image to be transmitted under the condition that the image processing capability set obtained through negotiation with the receiving end contains super-resolution capability; wherein an image resolution of the low resolution image matches a processable resolution of the super resolution capability;
the first reconstruction module is used for carrying out super-resolution reconstruction on the low-resolution image based on a super-resolution algorithm to obtain a high-resolution image corresponding to the low-resolution image, and determining the high-resolution image as the video frame image.
In a fifth aspect of embodiments of the present disclosure, there is provided a medium having stored thereon a computer program which, when executed by a processor, implements any of the video data transmission methods described above.
In a sixth aspect of embodiments of the present disclosure, there is provided a computing device comprising:
a processor;
a memory for storing a processor executable program;
wherein the processor implements any of the video data transmission methods described above by running the executable program.
According to the embodiments of the present disclosure, in the video data transmission process, when the transmitting end determines that the image processing capability set obtained through negotiation with the receiving end includes a super-resolution capability, it first reduces the image resolution of the video frame image to be transmitted, obtaining a low-resolution image whose resolution matches the processable resolution of that capability. The low-resolution image is then sent to the receiving end, which performs super-resolution reconstruction on it based on a super-resolution algorithm to obtain the corresponding high-resolution image and determines that high-resolution image as the video frame image.
In this way, on the one hand, the transmitting end reduces the image resolution of the video frame image to be transmitted and sends the resulting low-resolution image to the receiving end, which reduces the amount of video data transmitted and thereby improves transmission efficiency; on the other hand, the receiving end reconstructs the received low-resolution image by super-resolution and determines the reconstructed high-resolution image as the video frame image sent by the transmitting end, which preserves the image quality of the video frame image obtained at the receiving end, ensures the transmission quality of the corresponding video, and improves user experience.
Drawings
The above, as well as additional purposes, features, and advantages of exemplary embodiments of the present disclosure will become readily apparent from the following detailed description when read in conjunction with the accompanying drawings. Several embodiments of the present disclosure are illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings, in which:
fig. 1 schematically illustrates a schematic diagram of an application scenario of video data transmission according to an embodiment of the present disclosure;
fig. 2 schematically illustrates a flowchart of a video data transmission method according to an embodiment of the present disclosure;
Fig. 3 schematically illustrates a flowchart of another video data transmission method according to an embodiment of the present disclosure;
FIG. 4 schematically illustrates a flow chart of an image processing capability set negotiation method according to an embodiment of the present disclosure;
FIG. 5 schematically illustrates a software architecture diagram for image processing capability set negotiation according to an embodiment of the present disclosure;
FIG. 6 schematically illustrates a software architecture diagram for video quality control according to an embodiment of the present disclosure;
FIG. 7 schematically illustrates a schematic diagram of a software architecture for super-resolution reconstruction according to an embodiment of the present disclosure;
FIG. 8 schematically illustrates a schematic diagram of a medium according to an embodiment of the present disclosure;
fig. 9 schematically illustrates a block diagram of a video data transmission apparatus according to an embodiment of the present disclosure;
fig. 10 schematically illustrates a block diagram of another video data transmission apparatus according to an embodiment of the present disclosure;
fig. 11 schematically illustrates a schematic diagram of a computing device according to an embodiment of the disclosure.
In the drawings, the same or corresponding reference numerals indicate the same or corresponding parts.
Detailed Description
The principles and spirit of the present disclosure will be described below with reference to several exemplary embodiments. It should be understood that these embodiments are presented merely to enable one skilled in the art to better understand and practice the present disclosure and are not intended to limit the scope of the present disclosure in any way. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
Those skilled in the art will appreciate that embodiments of the present disclosure may be implemented as a system, apparatus, device, method, or computer program product. Accordingly, the present disclosure may be embodied in the following forms: entirely hardware, entirely software (including firmware, resident software, micro-code, etc.), or a combination of hardware and software.
According to an embodiment of the disclosure, a video data transmission method, a device, a medium and a computing device are provided.
In this document, it should be understood that any number of elements in the drawings is for illustration and not limitation, and that any naming is used only for distinction and not for any limitation.
The principles and spirit of the present disclosure are explained in detail below with reference to several representative embodiments thereof.
Summary of The Invention
Taking two different users (assumed to be user 1 and user 2) to perform real-time communication in a video form through a video call APP as an example, on one hand, an electronic device used by user 1 may collect video data and send the video data to a service end corresponding to the video call APP, and the service end sends the video data to an electronic device used by user 2, so that the electronic device used by user 2 may display a video corresponding to the video data to user 2, so that user 2 may view a video shot by user 1. At this time, the electronic device used by the user 1 is a transmitting end in the video data transmission process, and the electronic device used by the user 2 is a receiving end in the video data transmission process.
On the other hand, the electronic device used by the user 2 collects video data and sends the video data to the service end corresponding to the video call APP, and the service end sends the video data to the electronic device used by the user 1 so that the electronic device used by the user 1 can display the video corresponding to the video data to the user 1, so that the user 1 can view the video shot by the user 2. At this time, the electronic device used by the user 2 is a transmitting end in the video data transmission process, and the electronic device used by the user 1 is a receiving end in the video data transmission process.
In practical applications, the video data comprises a series of images acquired for a photographed object that are sequential over a period of time; these images are commonly referred to as video frame images.
In the related art, due to limitation of network bandwidth, video data to be transmitted is generally processed at a transmitting end so as to reduce the data volume of the video data; for example, the image resolution of each video frame image in the video data is reduced to reduce the number of pixels in each video frame image.
However, reducing the image resolution of an image degrades its image quality. As a result, the image quality of the video frames corresponding to the video data acquired by the receiving end is generally degraded, and the user experience is poor.
In order to ensure video frame image quality during video data transmission, the present disclosure provides a technical scheme for video data transmission. In the transmission process, when the transmitting end determines that the image processing capability set obtained through negotiation with the receiving end includes a super-resolution capability, it first reduces the image resolution of the video frame image to be transmitted, obtaining a low-resolution image whose resolution matches the processable resolution of that capability. The low-resolution image is then sent to the receiving end, which performs super-resolution reconstruction on it based on a super-resolution algorithm to obtain the corresponding high-resolution image and determines that high-resolution image as the video frame image.
In this way, on the one hand, the transmitting end reduces the image resolution of the video frame image to be transmitted and sends the resulting low-resolution image to the receiving end, which reduces the amount of video data transmitted and thereby improves transmission efficiency; on the other hand, the receiving end reconstructs the received low-resolution image by super-resolution and determines the reconstructed high-resolution image as the video frame image sent by the transmitting end, which preserves the image quality of the video frame image obtained at the receiving end, ensures the transmission quality of the corresponding video, and improves user experience.
Having described the basic principles of the present disclosure, various non-limiting embodiments of the present disclosure are specifically described below.
Application scene overview
Referring to fig. 1, fig. 1 schematically illustrates a schematic diagram of an application scenario of video data transmission according to an embodiment of the present disclosure.
As shown in fig. 1, in an application scenario of video data transmission, a server may be included, and at least one client (e.g., clients 1-N) accessing the server through any type of wired or wireless network.
The server side may be deployed on an independent physical host or on a server cluster composed of multiple independent physical hosts; alternatively, it may be built on a cloud computing service.
The client may correspond to an APP installed in the electronic device used by a user; the electronic device may be a smartphone, a tablet computer, a notebook computer, a PC (Personal Computer), a PDA (Personal Digital Assistant), a wearable device (for example, smart glasses or a smart watch), an intelligent vehicle-mounted device, a game console, etc.
On the one hand, the electronic device may be equipped with shooting hardware, such as an embedded or external camera, for capturing images or video. In this case, the electronic device may invoke the shooting hardware to capture video of a shooting target and, acting as the transmitting end in the video data transmission process, send the captured video data through the server to other electronic devices acting as receiving ends.
On the other hand, the electronic device may be used as a receiving end in the video data transmission process, and receive video data sent by other electronic devices as sending ends in the video data transmission process through the service end, and display the video corresponding to the video data to the user.
Exemplary method
A method for video data transmission according to an exemplary embodiment of the present disclosure is described below with reference to fig. 2 to 5 in conjunction with the application scenario of fig. 1. It should be noted that the above application scenario is only shown for the convenience of understanding the spirit and principles of the present disclosure, and the embodiments of the present disclosure are not limited in any way in this respect. Rather, embodiments of the present disclosure may be applied to any scenario where applicable.
Referring to fig. 2, fig. 2 schematically shows a flowchart of a video data transmission method according to an embodiment of the present disclosure.
The video data transmission method can be applied to a transmitting end in the video data transmission process; the video data transmission method may include the steps of:
step 201: and acquiring a video frame image to be transmitted.
In this embodiment, the transmitting end may acquire a video frame image in the video data to be transmitted.
In practical applications, the video data to be transmitted may be video data autonomously acquired by the transmitting end, or may be video data acquired by the transmitting end from other electronic devices. The video data to be transmitted may be real-time video data (e.g., video data in real-time communication in the form of video) or may be video data obtained in advance. The present disclosure is not limited in this regard.
Step 202: and determining whether the image processing capability set obtained through negotiation with the receiving end contains super-resolution capability.
In this embodiment, before processing a video frame image in video data to be transmitted, the sending end may first determine whether an image processing capability set obtained by negotiating with a receiving end in a video data transmission process includes super resolution capability.
In practical applications, super resolution refers to increasing the image resolution of an image through hardware or software methods; super-resolution reconstruction is the process of obtaining a high-resolution image from a low-resolution image.
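The patent does not mandate a particular super-resolution algorithm. As a minimal illustration of the reconstruction concept, the sketch below enlarges a low-resolution pixel grid by nearest-neighbour replication as a stand-in for a learned super-resolution model; the function name and the list-of-lists image representation are illustrative assumptions, not part of the patent.

```python
def upscale_nearest(pixels, scale):
    """Naive stand-in for super-resolution reconstruction: enlarge a
    2-D grid of pixel values by an integer scale factor using
    nearest-neighbour replication. A real deployment would run a
    learned SR model here instead."""
    out = []
    for row in pixels:
        # Widen each row: repeat every pixel `scale` times horizontally.
        wide = [p for p in row for _ in range(scale)]
        # Repeat the widened row `scale` times vertically.
        out.extend([list(wide) for _ in range(scale)])
    return out

low = [[10, 20],
       [30, 40]]
high = upscale_nearest(low, 2)
# Each source pixel becomes a 2x2 block in the 4x4 output grid.
```

A learned model would infer plausible high-frequency detail instead of merely replicating pixels, which is why reconstruction quality exceeds what plain interpolation achieves.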
Step 203: if the image processing capability set contains super-resolution capability, processing the video frame image to reduce the image resolution to obtain a low-resolution image; wherein the resolution of the low resolution image matches the processable resolution of the super resolution capability.
In this embodiment, if the set of image processing capabilities includes super-resolution capabilities, the receiving end may be considered to have super-resolution capabilities, that is, the receiving end may reconstruct a low-resolution image super-resolution into a high-resolution image. In this case, the transmitting terminal may perform processing for reducing the image resolution on the video frame image to obtain a low-resolution image.
In order to ensure that the receiving end can reconstruct the super-resolution of the low-resolution image, the image resolution of the low-resolution image can be matched with the processable resolution of the super-resolution capability.
If the image processing capability set does not include super-resolution capability, the receiving end may not have super-resolution capability, that is, the receiving end cannot reconstruct the super-resolution of the low-resolution image into the high-resolution image. In this case, the transmitting end does not need to perform processing for reducing the resolution of the video frame image.
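The sender-side decision in steps 202-203 can be sketched as follows. This is an illustrative sketch under stated assumptions: the capability name, function name, and nested-list frame representation are hypothetical, the frame is assumed to be at least as large as the processable resolution, and downscaling is done by simple integer subsampling rather than a production-quality resampler.

```python
def prepare_frame(frame, capability_set, processable_resolution):
    """If the negotiated capability set includes super-resolution,
    downscale the frame to the processable resolution of that
    capability; otherwise transmit the frame unchanged."""
    if "super_resolution" not in capability_set:
        return frame  # receiver cannot reconstruct, so send as-is
    target_h, target_w = processable_resolution
    step_h = len(frame) // target_h      # integer subsampling factors
    step_w = len(frame[0]) // target_w
    # Keep every step-th row and column, truncated to the target size.
    return [row[::step_w][:target_w] for row in frame[::step_h][:target_h]]

frame = [[1,  2,  3,  4],
         [5,  6,  7,  8],
         [9, 10, 11, 12],
         [13, 14, 15, 16]]
low = prepare_frame(frame, {"super_resolution"}, (2, 2))
```

Matching the low-resolution output to the capability's processable resolution is what guarantees the receiver's reconstruction model can consume it.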
Step 204: and sending the low-resolution image to the receiving end, so that the receiving end carries out super-resolution reconstruction on the low-resolution image based on a super-resolution algorithm to obtain a high-resolution image corresponding to the low-resolution image, and determining the high-resolution image as the video frame image.
In this embodiment, when the transmitting terminal obtains the low resolution image, the transmitting terminal may transmit the low resolution image to the receiving terminal.
When the receiving end receives the low-resolution image sent by the sending end, the receiving end can reconstruct the low-resolution image in a super-resolution mode based on a super-resolution algorithm to obtain a high-resolution image corresponding to the low-resolution image, and the high-resolution image is determined to be the video frame image, so that video display can be performed based on the video frame image.
Accordingly, with reference to fig. 3, fig. 3 schematically shows a flowchart of another video data transmission method according to an embodiment of the present disclosure, in accordance with the video data transmission method shown in fig. 2.
The video data transmission method can be applied to a receiving end in the video data transmission process; the video data transmission method may include the steps of:
step 301: receiving a low-resolution image sent by a sending end; the low-resolution image is obtained by the sending end through processing the image resolution reduction of the video frame image to be transmitted under the condition that the image processing capability set obtained through negotiation with the receiving end contains super-resolution capability; wherein the resolution of the low resolution image matches the processable resolution of the super resolution capability.
Step 302: perform super-resolution reconstruction on the low-resolution image based on a super-resolution algorithm to obtain a high-resolution image corresponding to the low-resolution image, and determine the high-resolution image as the video frame image.
Specific implementations of steps 301-302 may refer to steps 201-204, and are not described herein.
According to the embodiment shown in fig. 2 and fig. 3, in the video data transmission process, the transmitting end may perform the process of reducing the image resolution on the video frame image to be transmitted to obtain a low-resolution image, where the image resolution of the low-resolution image matches the processable resolution of the super-resolution capability, and then may send the low-resolution image to the receiving end, and the receiving end may perform the super-resolution reconstruction on the low-resolution image based on the super-resolution algorithm to obtain a high-resolution image corresponding to the low-resolution image, and determine the high-resolution image as the video frame image.
In addition, before the transmitting end performs the process of reducing the image resolution of the video frame image to be transmitted to obtain the low-resolution image, the transmitting end can adjust the image resolution, the video frame rate and/or the video code rate of the video frame image based on the video quality control strategy.
The following describes a video data transmission method as shown in fig. 2 and 3 in terms of negotiation from the set of image processing capabilities, video quality control, processing to reduce image resolution, super resolution reconstruction.
In an embodiment, the sending end and the receiving end may obtain the set of image processing capabilities through negotiation in advance, so as to avoid influencing the video data transmission efficiency by negotiating the set of image processing capabilities in the video data transmission process.
Specifically, the sending end and the receiving end may report the image processing capability sets thereof to the server end, respectively, and the server end determines the image processing capability set jointly satisfied by the sending end and the receiving end based on the image processing capability set of the sending end (hereinafter referred to as a first image processing capability set) and the image processing capability set of the receiving end (hereinafter referred to as a second image processing capability set), and issues the determined image processing capability set to the sending end, so that the sending end determines the image processing capability set issued by the server end as the image processing capability set obtained by negotiating with the receiving end.
In addition, the server may further issue the determined set of image processing capabilities to the receiving end, so that when the receiving end is used as a new sending end to transmit video data to the sending end, the set of image processing capabilities issued by the server is directly determined as the set of image processing capabilities obtained by negotiating with the sending end.
Referring to fig. 4, fig. 4 schematically illustrates a flowchart of an image processing capability set negotiation method according to an embodiment of the present disclosure.
The image processing capability set negotiation method can be realized through data interaction among the sending end, the receiving end and the server; the image processing capability set negotiation method may include the steps of:
step 401: the sending terminal sends a first group joining request to the server; wherein the first group joining request includes a first set of image processing capabilities of the sender.
Step 402: the receiving end sends a second group joining request to the server; wherein the second group joining request includes a second set of image processing capabilities of the receiving end.
In this embodiment, the sending end may send a group joining request (hereinafter referred to as a first group joining request) to the service end, so as to perform video data transmission in the corresponding communication group.
Similarly, the receiving end may send a group joining request (hereinafter referred to as a second group joining request) to the server end, so as to perform video data transmission in the corresponding communication group.
And under the condition that the sending end and the receiving end access the same communication group, the sending end and the receiving end can perform data interaction, namely the sending end can transmit video data to the receiving end.
It should be noted that steps 401 and 402 are not limited to any particular temporal order. That is, the transmitting end may first send the first group joining request to the server, or the receiving end may first send the second group joining request to the server.
Step 403: the server determines the set of image processing capabilities of the communication group based on the first set of image processing capabilities and the second set of image processing capabilities.
In this embodiment, the server may determine the set of image processing capabilities of the communication group in which the transmitting end and the receiving end are located based on the first set of image processing capabilities and the second set of image processing capabilities.
In one embodiment, the server may determine an intersection of the first set of image processing capabilities and the second set of image processing capabilities as the set of image processing capabilities of the communication group. That is, the image processing capability in the image processing capability set of the communication group is included in both the first image processing capability set and the second image processing capability set, so that the image processing capability set can be ensured to be matched with both the transmitting end and the receiving end.
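The intersection rule in this embodiment can be sketched in a few lines of Python; the capability names used here are hypothetical placeholders, not values from the disclosure.

```python
def negotiate_capabilities(sender_caps, receiver_caps):
    """Determine the communication group's image processing capability set
    as the intersection of the two ends' sets, so that every negotiated
    capability is supported by both the sending end and the receiving end."""
    return sender_caps & receiver_caps

# Hypothetical capability names, for illustration only.
group_caps = negotiate_capabilities(
    {"super_resolution", "denoise", "hdr"},
    {"super_resolution", "denoise"},
)
```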
Step 404: the server sends the image processing capability set to the sending end.
Step 405: the sending end determines the image processing capability set as the image processing capability set obtained by negotiation with the receiving end.
Step 406: the server sends the image processing capability set to the receiving end.
Step 407: the receiving end determines the image processing capability set as the image processing capability set obtained by negotiation with the sending end.
In this embodiment, when the server determines the set of image processing capabilities of the communication group, the server may send the set of image processing capabilities to the sender, so that the sender may determine the set of image processing capabilities as the set of image processing capabilities obtained by negotiating with the receiver.
In addition, the server may further send the set of image processing capabilities to the receiving end, so that the receiving end determines the set of image processing capabilities as the set of image processing capabilities obtained by negotiating with the sending end.
It should be noted that steps 404 and 406 are not limited to any particular temporal order. That is, the server may first transmit the set of image processing capabilities to the transmitting end, or may first transmit it to the receiving end.
Referring to fig. 5, fig. 5 schematically illustrates a software architecture diagram for image processing capability set negotiation according to an embodiment of the present disclosure.
In practical applications, for the communication group, the server may first create the communication group, and set a default set of image processing capabilities for the communication group.
The client joining the communication group may be the transmitting end or the receiving end.
When the server receives a request of joining the communication group, which is sent by the first client, the server may determine whether the request of joining the communication group carries an image processing capability set corresponding to the client, and if so, the server may use the image processing capability set corresponding to the client to cover the default image processing capability set, that is, update the image processing capability set of the communication group to the image processing capability set corresponding to the client.
When the server receives a group joining request from a subsequent client requesting to join the communication group, it may first determine whether the group joining request carries an image processing capability set corresponding to that client, and if so, further determine whether that set is a superset of the communication group's current image processing capability set. If the client's set is a superset of the current set, the server may send the current set to the client without updating the communication group's set, so that the client determines the current set as the negotiated image processing capability set. Otherwise, the server may take the intersection of the client's set and the communication group's current set, update the communication group's set to that intersection, and send the new set to each client in the communication group, so that each client determines the new set as the image processing capability set obtained by negotiation.
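A minimal Python sketch of this server-side update rule, under the assumption that a "greater" capability set means a superset of the group's current set (broadcasting the updated set to group members is omitted):

```python
def update_group_capabilities(group_caps, client_caps):
    """Server-side rule applied when a client joins the communication group.
    Returns the group's (possibly shrunken) capability set."""
    if client_caps is None:
        # Join request carries no capability set: keep the group's set.
        return group_caps
    if client_caps >= group_caps:
        # Client supports everything the group does: no update needed;
        # the server just sends the current set to this client.
        return group_caps
    # Otherwise shrink to the intersection and (not shown here) push the
    # new set to every client in the group.
    return group_caps & client_caps
```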
The transmitting end may perform processing for reducing the image resolution on the video frame image to obtain a low-resolution image when determining that the image processing capability set obtained by negotiating with the receiving end includes super-resolution capability.
Network environment parameters such as network bandwidth are typically different for different video data transmission processes. In order to better meet the requirements of actual network environment parameters for video data transmission, the transmitting end may adjust the image resolution, the video frame rate and/or the video code rate of the video frame image based on a video quality control policy before performing image resolution reduction processing on the video frame image to obtain a low resolution image. For example, in the case of a smaller network bandwidth, the image resolution, the video frame rate and/or the video code rate of the video frame image may be reduced based on the video quality control policy before the image resolution of the video frame image is reduced, so as to reduce bandwidth occupation in the transmission process; under the condition of large network bandwidth, the image resolution, the video frame rate and/or the video code rate of the video frame image can be improved based on the video quality control strategy before the image resolution of the video frame image is reduced, so that the network bandwidth is fully utilized.
Referring to fig. 6, fig. 6 schematically illustrates a software architecture diagram for video quality control according to an embodiment of the present disclosure.
Specifically, in one example, the sending end may, through an OveruseFrameDetector module, periodically determine the CPU (Central Processing Unit) usage based on the encoding time consumed in encoding historical video frame images according to a certain time period (hereinafter referred to as a first time period), and compare the determined CPU usage with a high usage threshold and a low usage threshold, respectively.
If the CPU usage is greater than the high usage threshold, it indicates that encoding the previous video frame images occupied too many CPU resources and overloaded the CPU; the transmitting end may therefore first reduce the image resolution and/or the video frame rate of the video frame image to lower the CPU usage, and then perform the processing of reducing the image resolution on the video frame image to obtain a low-resolution image.
If the CPU usage is smaller than the low usage threshold, it indicates that encoding the previous video frame images occupied few CPU resources and left the CPU underutilized; the transmitting end may therefore first increase the image resolution and/or the video frame rate of the video frame image to raise the CPU usage, and then perform the processing of reducing the image resolution on the video frame image to obtain a low-resolution image.
The specific values of the first time period, the high usage threshold value, and the low usage threshold value may be preset by a technician, or may be default values, which are not limited in this disclosure.
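The two-threshold decision can be sketched as follows; the threshold values and scaling factors here are illustrative assumptions, not figures from the disclosure:

```python
def adapt_for_cpu(cpu_usage, resolution, frame_rate,
                  high_threshold=0.85, low_threshold=0.50):
    """Scale resolution and/or frame rate based on periodically measured
    CPU usage (derived from historical per-frame encoding times)."""
    width, height = resolution
    if cpu_usage > high_threshold:
        # Encoding overloads the CPU: step resolution and frame rate down.
        return (width * 3 // 4, height * 3 // 4), max(1, frame_rate * 2 // 3)
    if cpu_usage < low_threshold:
        # CPU is underutilized: step resolution and frame rate up.
        return (width * 4 // 3, height * 4 // 3), frame_rate + 5
    return resolution, frame_rate  # usage within bounds: no change
```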
In another example, the sending end may, through a MediaOptimization module and based on a leaky bucket algorithm, determine whether to perform frame-dropping on the video frame image according to the maximum video code rate, the minimum video code rate, the target video code rate, and the actual video code rate used when encoding historical video frame images, so as to adjust the video code rate; the processing of reducing the image resolution is then performed on the video frame image to obtain a low-resolution image only when frame-dropping is not required.
The maximum video code rate represents the maximum video code rate allowed in the current video data transmission process, the minimum video code rate represents the minimum video code rate allowed in the current video data transmission process, and the target video code rate represents the video code rate required in the current video data transmission process. The specific values of the maximum video rate, the minimum video rate and the target video rate may be preset by a technician, or may be default values, which are not limited in the present disclosure.
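A minimal leaky-bucket sketch of this frame-drop decision: the drain rate plays the role of the target video code rate, and a frame is dropped when adding it would push the backlog past an allowed maximum. All parameters here are illustrative assumptions, not values from the disclosure.

```python
class LeakyBucket:
    """Frame-drop decision sketch: encoded frame sizes fill the bucket,
    the target video code rate drains it, and a frame is dropped when
    adding it would overflow the allowed backlog."""

    def __init__(self, target_bps, max_backlog_bits):
        self.target_bps = target_bps
        self.max_backlog_bits = max_backlog_bits
        self.level = 0.0  # bits currently queued

    def submit_frame(self, frame_bits, elapsed_s):
        # Drain at the target code rate for the elapsed interval.
        self.level = max(0.0, self.level - self.target_bps * elapsed_s)
        if self.level + frame_bits > self.max_backlog_bits:
            return False  # drop this frame
        self.level += frame_bits
        return True       # send this frame
```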
In another example, the transmitting end may, through a QualityScaler module, periodically determine the average quantization parameter (Quantization Parameter, QP) used when encoding historical video frame images according to a certain time period (hereinafter referred to as a second time period), and compare the determined average quantization parameter with a high quantization parameter threshold and a low quantization parameter threshold, respectively.
In practical application, the smaller the value of the quantization parameter is, the finer the quantization is, the higher the image quality is, and the higher the video code rate is; the larger the value of the quantization parameter, the coarser the quantization, the lower the image quality and the lower the video rate.
The high quantization parameter threshold and the low quantization parameter threshold correspond to an encoding scheme used when encoding a historical video frame image. The high quantization parameter threshold and the low quantization parameter threshold corresponding to different coding modes are also generally different; for example, the high quantization parameter threshold corresponding to the h.264 standard coding scheme may be 37, and the low quantization parameter threshold corresponding to the h.264 standard coding scheme may be 24.
If the average quantization parameter is greater than the high quantization parameter threshold, the transmitting end may first reduce the image resolution and/or the video frame rate of the video frame image, and then perform the image resolution reduction process on the video frame image to obtain a low resolution image.
If the average quantization parameter is smaller than the low quantization parameter threshold, the transmitting end may first increase the image resolution and/or the video frame rate of the video frame image, and then perform the process of reducing the image resolution on the video frame image to obtain a low resolution image.
The specific values of the second time period, the high quantization parameter threshold value, and the low quantization parameter threshold value may be preset by a technician, or may be default values, which are not limited in the present disclosure.
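The same two-threshold pattern applies to the QP-based decision; the 37/24 thresholds are the H.264 values given above, while the scaling factors are illustrative assumptions:

```python
def adapt_for_qp(avg_qp, resolution, high_qp=37, low_qp=24):
    """Adjust image resolution based on the average quantization parameter
    of recently encoded frames (H.264 thresholds from this embodiment)."""
    width, height = resolution
    if avg_qp > high_qp:
        # Quantization too coarse (poor quality): reduce the load first.
        return (width * 3 // 4, height * 3 // 4)
    if avg_qp < low_qp:
        # Quantization very fine: there is headroom to raise resolution.
        return (width * 4 // 3, height * 4 // 3)
    return resolution
```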
For the above-mentioned video frame image, when the image resolution of the video frame image is reduced, the image resolution of the video frame image may be reduced to 1/4 to 3/5 of the original image resolution, and when the image resolution of the video frame image is increased, the image resolution of the video frame image may be increased to 5/3 to 4 times the original image resolution.
The specific numerical values adopted in the process of reducing and improving the image resolution can be selected according to manual experience and actual requirements.
Similarly, when the video frame rate of the video frame image is reduced, the video frame rate of the video frame image may be reduced to 2/3 of the original video frame rate, and when the video frame rate of the video frame image is increased, the video frame rate of the video frame image may be increased to the highest video frame rate allowed at the current image resolution.
The specific values adopted in the process of reducing and improving the video frame rate can be selected according to manual experience and actual requirements.
The transmitting end may perform processing for reducing the image resolution on the video frame image to obtain a low-resolution image, and transmit the low-resolution image to the receiving end when determining that the image processing capability set obtained by negotiating with the receiving end includes super-resolution capability.
In order to ensure that the receiving end can perform super-resolution reconstruction on the low-resolution image, the image resolution of the low-resolution image may match the processable resolution of the super-resolution capability.
In one embodiment shown, the range of processable resolutions of the above super resolution capability may be determined to be 360 x 360 to 1280 x 1280 according to actual needs and technical limitations.
In practical application, 360×360 indicates that the number of pixels in the horizontal direction and the vertical direction of the image is 360, and 1280×1280 indicates that the number of pixels in the horizontal direction and the vertical direction of the image is 1280.
For a processable resolution matching the super resolution capability, the number of pixels in the horizontal direction of the low resolution image may be in the range of 360-1280, and the number of pixels in the vertical direction of the low resolution image may be in the range of 360-1280.
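Whether a candidate low-resolution size matches the processable range can be checked directly; this sketch just encodes the 360-1280 range stated above:

```python
SR_MIN, SR_MAX = 360, 1280  # processable range from this embodiment

def matches_sr_capability(width, height):
    """True if both dimensions fall within the super-resolution
    capability's processable range, so the receiving end can
    reconstruct the image."""
    return SR_MIN <= width <= SR_MAX and SR_MIN <= height <= SR_MAX
```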
When the receiving end receives the low-resolution image sent by the sending end, the receiving end may perform super-resolution reconstruction on the low-resolution image based on a super-resolution algorithm to obtain a high-resolution image corresponding to the low-resolution image, and determine the high-resolution image as the video frame image, so that video display can be performed based on the video frame image.
Referring to fig. 7, fig. 7 schematically illustrates a schematic diagram of a software architecture for super-resolution reconstruction according to an embodiment of the present disclosure.
In the illustrated embodiment, when the receiving end performs super-resolution reconstruction on the low-resolution image based on a super-resolution algorithm, the low-resolution image may specifically be input into a super-resolution model for super-resolution reconstruction of images, so that the model reconstructs the low-resolution image and outputs the corresponding high-resolution image.
In one embodiment shown, the above super-resolution model may be a super-resolution model based on a dual-attention mechanism. That is, the model may perform feature extraction on the input image based on a dual-attention mechanism.
In practical applications, an attention mechanism screens out a small amount of important information from a large amount of information, focuses on that important information, and ignores most of the unimportant information.
The dual attention mechanism described above may combine a spatial attention mechanism and a channel attention mechanism.
For an input image, spatial attention may help focus on the important regions of the image.
For a convolution layer in the super-resolution model, after the image is input into the convolution layer for feature extraction, the number of feature maps output by the convolution layer equals the number of convolution channels of that layer. That is, each convolution channel outputs the feature data obtained by extracting features from the image through that channel. In this case, channel attention may help focus on the convolution channels that output important information.
By performing super-resolution reconstruction with a super-resolution model based on the dual-attention mechanism, the reconstructed image can have better details and edges.
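As a toy illustration of the channel branch only (no learned weights and no spatial branch, unlike the model described here), each channel can be summarized by global average pooling and rescaled by a sigmoid gate, so channels carrying stronger responses are emphasized:

```python
import math

def channel_attention(feature_maps):
    """Toy channel attention: pool each channel to a scalar, gate it with
    a sigmoid, and rescale the channel by the resulting weight. A real
    dual-attention block learns these weights and adds spatial attention."""
    weights = []
    for channel in feature_maps:
        mean = sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        weights.append(1.0 / (1.0 + math.exp(-mean)))  # sigmoid gate
    return [[[v * w for v in row] for row in channel]
            for channel, w in zip(feature_maps, weights)]
```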
In one embodiment shown, the super-resolution model may include a sub-pixel convolution layer for rearranging pixels in the feature maps, so as to make use of the feature data output by each convolution channel of the convolution layers in the model.
During camera imaging, the obtained image data is a discretized version of the scene: because of the capacity limitations of the photosensitive elements, each pixel on the imaging surface only represents the color in its vicinity. For example, adjacent pixels on two photosensitive elements may be 4.5 um apart; macroscopically they appear connected, but microscopically countless finer points lie between them. These points between two actual physical pixels are referred to as "sub-pixels".
Sub-pixels exist in the physical scene, but cannot be captured for lack of smaller photosensitive elements, so they can only be approximated in software.
Sub-pixel accuracy can be adjusted through interpolation between two adjacent pixels, for example to quarter-pixel accuracy, i.e., each pixel is subdivided into four in both the lateral and longitudinal directions. In this way, a mapping from a small rectangle to a large rectangle can be realized through sub-pixel interpolation, thereby improving the resolution.
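The rearrangement performed by a sub-pixel convolution layer (often called pixel shuffle) can be sketched in pure Python: r*r low-resolution channels of size H x W are interleaved into one (H*r) x (W*r) output, each channel supplying one sub-pixel position.

```python
def pixel_shuffle(channels, r):
    """Rearrange r*r channels of an H x W feature map into a single
    (H*r) x (W*r) image: output pixel (y, x) is taken from the channel
    that owns sub-pixel position (y % r, x % r)."""
    h, w = len(channels[0]), len(channels[0][0])
    out = [[0] * (w * r) for _ in range(h * r)]
    for y in range(h * r):
        for x in range(w * r):
            out[y][x] = channels[(y % r) * r + (x % r)][y // r][x // r]
    return out
```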
In one embodiment, the receiving end may divide the low resolution image into a plurality of image blocks before inputting the low resolution image into the super-resolution model, and classify the plurality of image blocks according to the complexity of the image texture, so as to reconstruct the image blocks with high texture complexity and the image blocks with low texture complexity with different super resolutions.
Specifically, each image block may be input into a texture classification model, so that the texture classification model performs texture classification on it. The texture classification model may be obtained through supervised training based on a plurality of image samples, each labeled with a label corresponding to its texture complexity.
In practical applications, the image may include a shooting target (also referred to as detail) such as a person or an object and a background. For a plurality of image blocks obtained by dividing an image, the image texture of the image block containing details is complex, namely the image block with high texture complexity; the image texture of the image block containing no detail but only the background is simpler, i.e. the image block with low texture complexity.
For image blocks with low texture complexity, there is little difference between the results of super-resolution reconstruction based on a complex super-resolution algorithm and based on a simple one. Therefore, super-resolution reconstruction can be performed on the image blocks with low texture complexity based on a simple super-resolution algorithm, improving the calculation speed while preserving the reconstruction effect, and on the image blocks with high texture complexity based on a complex super-resolution algorithm, to guarantee the reconstruction effect.
Specifically, super-resolution reconstruction may be performed on the image blocks with low texture complexity based on a linear interpolation algorithm, while the image blocks with high texture complexity are input into the super-resolution model for super-resolution reconstruction.
That is, after the texture classification model classifies the image blocks obtained by dividing the low-resolution image, super-resolution reconstruction may be performed, based on a linear interpolation algorithm, on the image blocks classified as having low texture complexity, while the image blocks classified as having high texture complexity are input into the super-resolution model for super-resolution reconstruction.
In addition, a parallel mode can be adopted to reconstruct super-resolution of a plurality of image blocks with high texture complexity so as to further improve the calculation speed.
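The per-block routing described above can be sketched as follows; `classify`, `simple_upscale`, and `sr_model` are caller-supplied stand-ins for the texture classification model, a linear-interpolation upscaler, and the super-resolution model:

```python
def reconstruct_blocks(blocks, classify, simple_upscale, sr_model):
    """Route each image block by its texture classification: blocks with
    high texture complexity go through the (costly) super-resolution
    model, the rest through fast linear interpolation."""
    results = []
    for block in blocks:
        if classify(block) == "high":
            results.append(sr_model(block))
        else:
            results.append(simple_upscale(block))
    return results
```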
In one embodiment, to ensure the joint effect of the texture classification model and the super-resolution model, the two models may be trained in advance until a joint loss function converges.
The joint loss function may be a weighted sum of the loss function corresponding to the texture classification model (hereinafter referred to as a first loss function) and the loss function corresponding to the super-resolution model (hereinafter referred to as a second loss function), as shown in the following equation:
L=w1×L1+w2×L2
Where L represents a joint loss function, L1 represents a first loss function, w1 represents a weight corresponding to the first loss function, L2 represents a second loss function, and w2 represents a weight corresponding to the second loss function.
In practical application, the super-resolution model may first be trained based on the second loss function; its learning rate is then reduced and the texture classification model is trained based on the joint loss function; finally, the texture classification model and the super-resolution model are jointly optimized based on the joint loss function until it converges.
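The joint loss above is a plain weighted sum; a minimal sketch (the weights here are illustrative defaults, not values from the disclosure):

```python
def joint_loss(l1, l2, w1=0.5, w2=0.5):
    """L = w1 * L1 + w2 * L2: weighted sum of the texture classification
    loss L1 and the super-resolution loss L2."""
    return w1 * l1 + w2 * l2
```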
According to the embodiment of the disclosure, in the video data transmission process, a transmitting end may firstly perform processing for reducing image resolution on a video frame image to be transmitted to obtain a low-resolution image under the condition that it is determined that an image processing capability set obtained by negotiation with a receiving end includes super-resolution capability, the image resolution of the low-resolution image is matched with the processable resolution of the super-resolution capability, the low-resolution image may be subsequently transmitted to the receiving end, and the receiving end may perform super-resolution reconstruction on the low-resolution image based on a super-resolution algorithm to obtain a high-resolution image corresponding to the low-resolution image, and determine the high-resolution image as the video frame image.
By adopting the mode, on one hand, the sending end can process the video frame image to be transmitted to reduce the image resolution and send the processed low-resolution image to the receiving end, so that the data volume of video data transmitted in the video data transmission process can be reduced, and the video data transmission efficiency is improved; on the other hand, the receiving end can reconstruct the received low-resolution image in a super-resolution mode, and the high-resolution image obtained by reconstruction is determined to be the video frame image sent by the sending end, so that the image quality of the video frame image obtained by the receiving end can be ensured, the transmission quality of corresponding videos is ensured, and the user experience is improved.
Correspondingly, the receiving end can divide the received low-resolution image into a plurality of image blocks, super-resolution reconstruction is carried out on the image blocks with lower texture complexity based on a linear interpolation algorithm, super-resolution reconstruction is carried out on the image blocks with higher texture complexity by a super-resolution model, and the super-resolution model can be a super-resolution model which is based on a dual-attention mechanism and comprises a sub-pixel convolution layer so as to improve the effect of super-resolution reconstruction.
In addition, before the transmitting end performs the processing of reducing the image resolution of the video frame image to be transmitted to obtain the low-resolution image, the transmitting end may adjust the image resolution, the video frame rate and/or the video code rate of the video frame image based on the video quality control strategy, so as to meet the requirements of network environment parameters such as network bandwidth in the video data transmission process.
Exemplary Medium
Having described the method of the exemplary embodiments of the present disclosure, next, a medium for video data transmission of the exemplary embodiments of the present disclosure will be described with reference to fig. 8.
In the present exemplary embodiment, the above-described method may be implemented by a program product, such as a portable compact disc read only memory (CD-ROM) and including program code, and may be run on a device, such as a personal computer. However, the program product of the present disclosure is not limited thereto, and in this document, a readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium.
The readable storage medium can be, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium would include the following: an electrical connection having one or more wires, a portable disk, a hard disk, random Access Memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The readable signal medium may include a data signal propagated in baseband or as part of a carrier wave with readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Program code for carrying out operations of the present disclosure may be written in any combination of one or more programming languages, including an object-oriented programming language such as Java or C++ and conventional procedural programming languages, such as the C programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's computing device, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user's computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (e.g., connected via the Internet using an Internet service provider).
Exemplary apparatus
Having described the medium of the exemplary embodiments of the present disclosure, next, an apparatus for video data transmission of the exemplary embodiments of the present disclosure will be described with reference to fig. 9.
For the implementation of the functions and roles of each module in the following apparatus, reference may be made to the implementation of the corresponding steps in the above method, which is not repeated here. Since the apparatus embodiments substantially correspond to the method embodiments, the description of the method embodiments may be consulted for the relevant details.
Referring to fig. 9, fig. 9 schematically illustrates a video data transmission apparatus according to an embodiment of the present disclosure.
The video data transmission device can be applied to a transmitting end, and comprises:
an acquisition module 901, configured to acquire a video frame image to be transmitted;
a determining module 902, configured to determine whether the set of image processing capabilities obtained by negotiation with the receiving end includes super resolution capabilities;
a processing module 903, configured to, if the image processing capability set includes super resolution capability, perform processing for reducing image resolution on the video frame image, so as to obtain a low resolution image; wherein an image resolution of the low resolution image matches a processable resolution of the super resolution capability;
a sending module 904, configured to send the low resolution image to the receiving end, so that the receiving end performs super resolution reconstruction on the low resolution image based on a super resolution algorithm, obtains a high resolution image corresponding to the low resolution image, and determines the high resolution image as the video frame image.
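The flow implemented by modules 901 to 904 can be sketched as follows. This is an illustrative sketch only: the function names, the fixed 2x factor, and the use of simple block-average downscaling are assumptions, not part of this disclosure.

```python
# Illustrative sketch of modules 901-904 on the sending end (hypothetical
# names; the fixed 2x factor and block-average downscaling are assumptions).
import numpy as np

def downscale(frame: np.ndarray, factor: int = 2) -> np.ndarray:
    """Reduce resolution by averaging factor x factor pixel blocks."""
    h = frame.shape[0] - frame.shape[0] % factor
    w = frame.shape[1] - frame.shape[1] % factor
    blocks = frame[:h, :w].reshape(h // factor, factor, w // factor, factor, -1)
    return blocks.mean(axis=(1, 3)).astype(frame.dtype)

def prepare_for_send(frame: np.ndarray, negotiated_caps: set) -> np.ndarray:
    # Downscale only when the negotiated capability set contains
    # super-resolution, i.e. the receiver can reconstruct the detail.
    if "super_resolution" in negotiated_caps:
        return downscale(frame, factor=2)
    return frame
```

The low-resolution frame is then handed to the sending module; the receiving end is expected to restore the original resolution by super-resolution reconstruction.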
Optionally, the set of image processing capabilities is obtained by negotiating with the receiving end in the following manner:
transmitting a first group joining request to a server to transmit video data in a corresponding communication group, wherein the first group joining request comprises a first image processing capability set of the transmitting end, so that the server determines the image processing capability set of the communication group based on the first image processing capability set and a second image processing capability set in a second group joining request transmitted by the receiving end, and returns the image processing capability set to the transmitting end and the receiving end;
and receiving the image processing capability set returned by the server, and determining the image processing capability set as an image processing capability set obtained by negotiation with the receiving end.
Optionally, the set of image processing capabilities of the communication group is an intersection of the first set of image processing capabilities and the second set of image processing capabilities.
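Since the group's capability set is the intersection of the two declared sets, the server-side negotiation reduces to a set intersection. A minimal sketch (the capability names are hypothetical):

```python
# Hypothetical server-side negotiation: the communication group's capability
# set is the intersection of the sets declared in each group joining request.
def negotiate_group_capabilities(first_caps: set, second_caps: set) -> set:
    # Only capabilities supported by both the sending end and the
    # receiving end survive into the group's capability set.
    return first_caps & second_caps

sender_caps = {"super_resolution", "denoise"}
receiver_caps = {"super_resolution", "hdr"}
group_caps = negotiate_group_capabilities(sender_caps, receiver_caps)
assert group_caps == {"super_resolution"}
```

The server would return `group_caps` to both ends, after which each end checks it for the super-resolution capability as described above.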
Optionally, the apparatus further comprises:
the adjusting module 905 is configured to adjust, based on a video quality control policy, an image resolution, a video frame rate, and/or a video code rate of the video frame image before performing the processing of reducing the image resolution on the video frame image to obtain a low resolution image.
Optionally, the adjustment module 905 is specifically configured to:
periodically determining, according to a first time period, the CPU usage based on the encoding time consumed when encoding historical video frame images, and comparing the CPU usage with a high usage threshold and a low usage threshold respectively;
if the CPU usage is greater than the high usage threshold, reducing the image resolution and/or video frame rate of the video frame image;
and if the CPU usage is less than the low usage threshold, increasing the image resolution and/or video frame rate of the video frame image.
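The compare-and-adjust loop above can be sketched as follows; the threshold values and step size are illustrative assumptions, as the disclosure specifies only the structure of the comparison.

```python
# Hypothetical sketch of the CPU-usage control loop. Thresholds and the
# adjustment step are assumed values, not taken from this disclosure.
def adjust_for_cpu(cpu_usage: float, resolution_scale: float,
                   high_threshold: float = 0.85,
                   low_threshold: float = 0.50,
                   step: float = 0.1) -> float:
    """Return a new resolution scale in (0, 1] based on measured CPU usage."""
    if cpu_usage > high_threshold:
        return max(0.25, resolution_scale - step)   # overloaded: scale down
    if cpu_usage < low_threshold:
        return min(1.0, resolution_scale + step)    # headroom: scale up
    return resolution_scale                         # within band: hold
```

The same structure applies when the adjusted quantity is the video frame rate instead of (or in addition to) the image resolution.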
Optionally, the adjustment module 905 is specifically configured to:
based on a leaky bucket algorithm, determining whether to perform frame loss processing on the video frame image according to a maximum video code rate, a minimum video code rate, a target video code rate and an actual video code rate when the historical video frame image is encoded so as to adjust the video code rate of the video frame image.
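The disclosure names a leaky bucket driven by the maximum, minimum, target, and actual video code rates without giving its exact update rule. The sketch below shows one conventional leaky-bucket frame-drop decision under assumed parameters: the bucket drains at the target code rate, and its capacity is tied to the maximum code rate.

```python
# A conventional leaky-bucket sketch of the frame-drop decision. The exact
# update rule and parameter choices here are assumptions.
class LeakyBucket:
    def __init__(self, target_bps: float, max_bps: float):
        self.drain_rate = target_bps      # bits drained per second
        self.capacity = max_bps           # bucket depth: ~1 s at the max rate
        self.level = 0.0                  # bits currently queued
        self.last_time = 0.0

    def admit(self, frame_bits: float, now: float) -> bool:
        """Return True to encode/send the frame, False to drop it."""
        elapsed = now - self.last_time
        self.level = max(0.0, self.level - elapsed * self.drain_rate)
        self.last_time = now
        if self.level + frame_bits > self.capacity:
            return False                  # would overflow: drop the frame
        self.level += frame_bits
        return True
```

With a stream whose actual code rate exceeds the target, the bucket fills and periodically forces frame drops, pulling the effective video code rate back toward the target.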
Optionally, the adjustment module 905 is specifically configured to:
periodically determining an average quantization parameter when encoding the historical video frame image according to a second time period, and comparing the average quantization parameter with a high quantization parameter threshold and a low quantization parameter threshold respectively; wherein the high quantization parameter threshold and the low quantization parameter threshold correspond to a coding mode;
if the average quantization parameter is greater than the high quantization parameter threshold, reducing the image resolution and/or video frame rate of the video frame image;
and if the average quantization parameter is smaller than the low quantization parameter threshold, increasing the image resolution and/or video frame rate of the video frame image.
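A sketch of the quantization-parameter loop above. As stated, the thresholds correspond to the coding mode; the modes and threshold values below are illustrative assumptions only.

```python
# Hypothetical sketch of the QP control loop; the coding modes and
# threshold values are assumed, not taken from this disclosure.
QP_THRESHOLDS = {                 # coding mode -> (low_qp, high_qp)
    "screen_share": (28, 40),
    "camera":       (24, 36),
}

def adjust_for_qp(avg_qp: float, mode: str) -> str:
    low_qp, high_qp = QP_THRESHOLDS[mode]
    if avg_qp > high_qp:
        return "decrease_resolution_or_frame_rate"  # quantization too coarse
    if avg_qp < low_qp:
        return "increase_resolution_or_frame_rate"  # quality headroom exists
    return "hold"
```

A high average QP means the encoder is quantizing coarsely to meet its rate budget, so lowering the resolution or frame rate frees bits per frame; a low average QP indicates spare capacity.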
Optionally, reducing the image resolution of the video frame image includes:
reducing the image resolution of the video frame image to 1/4 to 3/5 of the original image resolution;
increasing the image resolution of the video frame image, comprising:
and increasing the image resolution of the video frame image to 5/3 to 4 times the original image resolution.
Optionally, reducing the video frame rate of the video frame image includes:
reducing the video frame rate of the video frame image to 2/3 of the original video frame rate;
Increasing a video frame rate of the video frame image, comprising:
and increasing the video frame rate of the video frame image to the highest video frame rate allowed under the current image resolution.
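The stated adjustment ranges can be written out directly. Note that the upscale range (5/3 to 4 times) is the reciprocal of the downscale range (1/4 to 3/5), so a reduction step followed by the matching increase step restores the original resolution.

```python
# The adjustment ranges stated above, as constants. Only the ranges
# themselves come from the document; the helper is illustrative.
DOWNSCALE_MIN, DOWNSCALE_MAX = 1 / 4, 3 / 5
UPSCALE_MIN, UPSCALE_MAX = 5 / 3, 4.0
FRAME_RATE_REDUCTION = 2 / 3      # frame rate is reduced to 2/3 of original

# The upscale range mirrors the downscale range exactly.
assert abs(DOWNSCALE_MAX * UPSCALE_MIN - 1.0) < 1e-12
assert abs(DOWNSCALE_MIN * UPSCALE_MAX - 1.0) < 1e-12

def reduced_frame_rate(fps: float) -> float:
    return fps * 2 / 3            # e.g. 30 fps -> 20 fps
```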
Optionally, the super-resolution capability has a processable resolution in the range of 360×360 to 1280×1280.
Referring to fig. 10, fig. 10 schematically illustrates another video data transmission apparatus according to an embodiment of the present disclosure.
The video data transmission device can be applied to a receiving end, and comprises:
a receiving module 1001, configured to receive a low resolution image sent by a sending end; the low-resolution image is obtained by the sending end through processing the image resolution reduction of the video frame image to be transmitted under the condition that the image processing capability set obtained through negotiation with the receiving end contains super-resolution capability; wherein an image resolution of the low resolution image matches a processable resolution of the super resolution capability;
a first reconstruction module 1002, configured to perform super-resolution reconstruction on the low-resolution image based on a super-resolution algorithm, obtain a high-resolution image corresponding to the low-resolution image, and determine the high-resolution image as the video frame image.
Optionally, the set of image processing capabilities is obtained by negotiating with the sender in the following manner:
transmitting a second group joining request to a server to transmit video data in a corresponding communication group, wherein the second group joining request comprises a second image processing capability set of the receiving end, so that the server determines the image processing capability set of the communication group based on the second image processing capability set and a first image processing capability set in a first group joining request transmitted by the transmitting end, and returns the image processing capability set to the transmitting end and the receiving end;
and receiving the image processing capability set returned by the server, and determining the image processing capability set as an image processing capability set obtained by negotiation with the sender.
Optionally, the set of image processing capabilities of the communication group is an intersection of the first set of image processing capabilities and the second set of image processing capabilities.
Optionally, the first reconstruction module 1002 is specifically configured to:
inputting the low-resolution image into a super-resolution model, so that the super-resolution model performs super-resolution reconstruction on the low-resolution image to obtain a high-resolution image corresponding to the low-resolution image; the super-resolution model is used for performing super-resolution reconstruction on images.
Optionally, the super-resolution model is a super-resolution model based on a dual-attention mechanism.
Optionally, the super-resolution model comprises a sub-pixel convolution layer for rearranging pixels in a feature image.
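A sub-pixel convolution layer is commonly realized as a "pixel shuffle" that rearranges r² learned feature channels into an r-times-larger spatial grid. The NumPy sketch below shows only that rearrangement, not the trained model itself.

```python
# Pixel-shuffle rearrangement used by sub-pixel convolution layers:
# (C*r*r, H, W) feature maps -> (C, H*r, W*r) image, where each output
# pixel (h*r+i, w*r+j) is taken from channel c*r*r + i*r + j at (h, w).
import numpy as np

def pixel_shuffle(x: np.ndarray, r: int) -> np.ndarray:
    c, h, w = x.shape
    assert c % (r * r) == 0
    out_c = c // (r * r)
    x = x.reshape(out_c, r, r, h, w)       # split channels into (C, i, j)
    x = x.transpose(0, 3, 1, 4, 2)         # -> (C, H, i, W, j)
    return x.reshape(out_c, h * r, w * r)  # interleave into the larger grid
```

In a super-resolution network this layer sits at the output: the convolutional body produces r² channels per output channel at low resolution, and the shuffle produces the high-resolution image without any interpolation.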
Optionally, the apparatus further comprises:
a segmentation module 1003, configured to segment the low resolution image into a plurality of image blocks before inputting the low resolution image into the super-resolution model;
a classification module 1004, configured to input each image block into a texture classification model, so that the texture classification model performs texture classification on the image block; the texture classification model is obtained by performing supervised training based on an image sample; the image sample is marked with a label corresponding to the texture complexity;
the first reconstruction module 1002 is specifically configured to:
and inputting each image block whose texture classification result is high texture complexity into the super-resolution model.
Optionally, the apparatus further comprises:
the second modeling block 1005 is configured to reconstruct each image block with low texture complexity based on the linear interpolation algorithm.
Optionally, the apparatus further comprises:
a training module 1006, configured to perform joint training on the texture classification model and the super-resolution model until a joint loss function converges; wherein the joint loss function is a weighted sum of a first loss function and a second loss function; the first loss function is a loss function corresponding to the texture classification model; the second loss function is a loss function corresponding to the super-resolution model.
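The joint objective described above is a weighted sum of the two losses. The disclosure does not specify the weights; `alpha` below is an illustrative assumption.

```python
# Joint loss as described: a weighted sum of the texture classification
# loss and the super-resolution loss. The weight alpha is assumed.
def joint_loss(classification_loss: float, sr_loss: float,
               alpha: float = 0.5) -> float:
    return alpha * classification_loss + (1.0 - alpha) * sr_loss
```

Training both models against this single objective lets the classifier learn a routing that the super-resolution model can exploit, rather than optimizing each model in isolation.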
Exemplary computing device
Having described the methods, media, and apparatus of exemplary embodiments of the present disclosure, a computing device for video data transmission of exemplary embodiments of the present disclosure is next described with reference to fig. 11.
The computing device 1100 shown in fig. 11 is merely an example and should not be taken as limiting the functionality and scope of use of embodiments of the present disclosure.
As shown in fig. 11, computing device 1100 is in the form of a general purpose computing device. Components of computing device 1100 may include, but are not limited to: at least one processing unit 1101, at least one storage unit 1102, and a bus 1103 that connects the various system components (including the processing unit 1101 and the storage unit 1102).
The bus 1103 includes a data bus, a control bus, and an address bus.
The storage unit 1102 may include readable media in the form of volatile memory, such as Random Access Memory (RAM) 11021 and/or cache memory 11022, and may further include readable media in the form of nonvolatile memory, such as Read Only Memory (ROM) 11023.
The storage unit 1102 may also include a program/utility 11025 having a set (at least one) of program modules 11024, such program modules 11024 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each or some combination of which may include an implementation of a network environment.
Computing device 1100 can also communicate with one or more external devices 1104 (e.g., keyboard, pointing device, etc.).
Such communication may occur through an input/output (I/O) interface 1105. Moreover, computing device 1100 may also communicate with one or more networks such as a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network, such as the Internet, through a network adapter 1106. As shown in fig. 11, network adapter 1106 communicates with other modules of computing device 1100 over bus 1103. It should be appreciated that although not shown, other hardware and/or software modules may be used in connection with computing device 1100, including, but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, data backup storage systems, and the like.
It should be noted that although several units/modules or sub-units/modules of the video data transmission device are mentioned in the above detailed description, such a division is only exemplary and not mandatory. Indeed, the features and functionality of two or more units/modules described above may be embodied in one unit/module in accordance with embodiments of the present disclosure. Conversely, the features and functions of one unit/module described above may be further divided so as to be embodied by a plurality of units/modules.
Furthermore, although the operations of the methods of the present disclosure are depicted in the drawings in a particular order, this does not require or suggest that these operations must be performed in that particular order, or that all of the illustrated operations must be performed, to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps may be combined into one step, and/or one step may be decomposed into multiple steps.
While the spirit and principles of the present disclosure have been described with reference to several particular embodiments, it is to be understood that the disclosure is not limited to the particular embodiments disclosed, nor does the division into aspects imply that features in those aspects cannot be combined to advantage; this division is adopted for convenience of description only. The disclosure is intended to cover the various modifications and equivalent arrangements included within the spirit and scope of the appended claims.

Claims (36)

1. A video data transmission method is applied to a transmitting end; the method comprises the following steps:
acquiring a video frame image to be transmitted;
determining whether the image processing capability set obtained through negotiation with the receiving end contains super-resolution capability or not;
if the image processing capability set contains super-resolution capability, processing the video frame image to reduce the image resolution to obtain a low-resolution image; wherein an image resolution of the low resolution image matches a processable resolution of the super resolution capability;
Transmitting the low-resolution image to the receiving end, so that the receiving end carries out super-resolution reconstruction on the low-resolution image based on a super-resolution algorithm to obtain a high-resolution image corresponding to the low-resolution image, and determining the high-resolution image as the video frame image;
wherein the set of image processing capabilities is obtained by negotiating with the receiving end by:
transmitting a first group joining request to a server to transmit video data in a corresponding communication group, wherein the first group joining request comprises a first image processing capability set of the transmitting end, so that the server determines the image processing capability set of the communication group based on the first image processing capability set and a second image processing capability set in a second group joining request transmitted by the receiving end, and returns the image processing capability set to the transmitting end and the receiving end;
and receiving the image processing capability set returned by the server, and determining the image processing capability set as an image processing capability set obtained by negotiation with the receiving end.
2. The method of claim 1, the set of image processing capabilities of the communication group being an intersection of the first set of image processing capabilities and the second set of image processing capabilities.
3. The method of claim 1, further comprising, prior to subjecting the video frame image to the image resolution reduction process, obtaining a low resolution image:
and adjusting the image resolution, the video frame rate and/or the video code rate of the video frame image based on a video quality control strategy.
4. The method of claim 3, the adjusting the image resolution, video frame rate, and/or video code rate of the video frame images based on a video quality control policy, comprising:
periodically determining, according to a first time period, the CPU usage based on the encoding time consumed when encoding historical video frame images, and comparing the CPU usage with a high usage threshold and a low usage threshold respectively;
if the CPU usage is greater than the high usage threshold, reducing the image resolution and/or video frame rate of the video frame image;
and if the CPU usage is less than the low usage threshold, increasing the image resolution and/or video frame rate of the video frame image.
5. The method of claim 3, the adjusting the image resolution, video frame rate, and/or video code rate of the video frame images based on a video quality control policy, comprising:
based on a leaky bucket algorithm, determining whether to perform frame loss processing on the video frame image according to a maximum video code rate, a minimum video code rate, a target video code rate and an actual video code rate when the historical video frame image is encoded so as to adjust the video code rate of the video frame image.
6. The method of claim 3, the adjusting the image resolution, video frame rate, and/or video code rate of the video frame images based on a video quality control policy, comprising:
periodically determining an average quantization parameter when encoding the historical video frame image according to a second time period, and comparing the average quantization parameter with a high quantization parameter threshold and a low quantization parameter threshold respectively; wherein the high quantization parameter threshold and the low quantization parameter threshold correspond to a coding mode;
if the average quantization parameter is greater than the high quantization parameter threshold, reducing the image resolution and/or video frame rate of the video frame image;
And if the average quantization parameter is smaller than the low quantization parameter threshold, increasing the image resolution and/or video frame rate of the video frame image.
7. The method of claim 4 or 6, reducing an image resolution of the video frame image, comprising:
reducing the image resolution of the video frame image to 1/4 to 3/5 of the original image resolution;
increasing the image resolution of the video frame image, comprising:
and increasing the image resolution of the video frame image to 5/3 to 4 times the original image resolution.
8. The method of claim 4 or 6, reducing a video frame rate of the video frame image, comprising:
reducing the video frame rate of the video frame image to 2/3 of the original video frame rate;
increasing a video frame rate of the video frame image, comprising:
and increasing the video frame rate of the video frame image to the highest video frame rate allowed under the current image resolution.
9. The method of claim 1, the super-resolution capability having a processable resolution in the range of 360 x 360 to 1280 x 1280.
10. A real-time video transmission method is applied to a receiving end; the method comprises the following steps:
receiving a low-resolution image sent by a sending end; the low-resolution image is obtained by the sending end through processing the image resolution reduction of the video frame image to be transmitted under the condition that the image processing capability set obtained through negotiation with the receiving end contains super-resolution capability; wherein an image resolution of the low resolution image matches a processable resolution of the super resolution capability;
Performing super-resolution reconstruction on the low-resolution image based on a super-resolution algorithm to obtain a high-resolution image corresponding to the low-resolution image, and determining the high-resolution image as the video frame image;
wherein the set of image processing capabilities is obtained by negotiating with the sender in the following manner:
transmitting a second group joining request to a server to transmit video data in a corresponding communication group, wherein the second group joining request comprises a second image processing capability set of the receiving end, so that the server determines the image processing capability set of the communication group based on the second image processing capability set and a first image processing capability set in a first group joining request transmitted by the transmitting end, and returns the image processing capability set to the transmitting end and the receiving end;
and receiving the image processing capability set returned by the server, and determining the image processing capability set as an image processing capability set obtained by negotiation with the sender.
11. The method of claim 10, the set of image processing capabilities of the communication group being an intersection of the first set of image processing capabilities and the second set of image processing capabilities.
12. The method according to claim 10, wherein the performing super-resolution reconstruction on the low-resolution image based on the super-resolution algorithm to obtain a high-resolution image corresponding to the low-resolution image includes:
inputting the low-resolution image into a super-resolution model, so that the super-resolution model carries out super-resolution reconstruction on the low-resolution image to obtain a high-resolution image corresponding to the low-resolution image; the super-resolution model is used for reconstructing the image in super-resolution.
13. The method of claim 12, the super-resolution model being a super-resolution model based on a dual-attention mechanism.
14. The method of claim 12, the super-resolution model comprising a sub-pixel convolution layer for rearranging pixels in a feature image.
15. The method of claim 12, prior to inputting the low resolution image into the super-resolution model, the method further comprising:
dividing the low resolution image into a plurality of image blocks;
inputting each image block into a texture classification model so that the texture classification model classifies the image block in texture; the texture classification model is obtained by performing supervised training based on an image sample; the image sample is marked with a label corresponding to the texture complexity;
The inputting the low resolution image into the super-resolution model includes:
and inputting each image block whose texture classification result is high texture complexity into the super-resolution model.
16. The method of claim 15, the method further comprising:
and (3) carrying out super-resolution reconstruction on each image block with low texture complexity as a texture classification result based on a linear interpolation algorithm.
17. The method of claim 15, the method further comprising:
performing joint training on the texture classification model and the super-resolution model until a joint loss function converges; wherein the joint loss function is a weighted sum of a first loss function and a second loss function; the first loss function is a loss function corresponding to the texture classification model; the second loss function is a loss function corresponding to the super-resolution model.
18. A video data transmission apparatus, the apparatus being applied to a transmitting end; the device comprises:
the acquisition module is used for acquiring a video frame image to be transmitted;
the determining module is used for determining whether the image processing capability set obtained through negotiation with the receiving end contains super-resolution capability or not;
the processing module is used for carrying out image resolution reduction processing on the video frame image to obtain a low-resolution image if the image processing capability set contains super-resolution capability; wherein an image resolution of the low resolution image matches a processable resolution of the super resolution capability;
the sending module is used for sending the low-resolution image to the receiving end, so that the receiving end performs super-resolution reconstruction on the low-resolution image based on a super-resolution algorithm to obtain a high-resolution image corresponding to the low-resolution image, and determines the high-resolution image as the video frame image;
wherein the set of image processing capabilities is obtained by negotiating with the receiving end by:
transmitting a first group joining request to a server to transmit video data in a corresponding communication group, wherein the first group joining request comprises a first image processing capability set of the transmitting end, so that the server determines the image processing capability set of the communication group based on the first image processing capability set and a second image processing capability set in a second group joining request transmitted by the receiving end, and returns the image processing capability set to the transmitting end and the receiving end;
and receiving the image processing capability set returned by the server, and determining the image processing capability set as an image processing capability set obtained by negotiation with the receiving end.
19. The device of claim 18, the set of image processing capabilities of the communication group being an intersection of the first set of image processing capabilities and the second set of image processing capabilities.
20. The apparatus of claim 18, the apparatus further comprising:
and the adjusting module is used for adjusting the image resolution, the video frame rate and/or the video code rate of the video frame image based on a video quality control strategy before the image resolution of the video frame image is reduced to obtain a low-resolution image.
21. The apparatus of claim 20, the adjustment module being specifically configured to:
periodically determining, according to a first time period, the CPU usage based on the encoding time consumed when encoding historical video frame images, and comparing the CPU usage with a high usage threshold and a low usage threshold respectively;
if the CPU usage is greater than the high usage threshold, reducing the image resolution and/or video frame rate of the video frame image;
and if the CPU usage is less than the low usage threshold, increasing the image resolution and/or video frame rate of the video frame image.
22. The apparatus of claim 20, the adjustment module being specifically configured to:
based on a leaky bucket algorithm, determining whether to perform frame loss processing on the video frame image according to a maximum video code rate, a minimum video code rate, a target video code rate and an actual video code rate when the historical video frame image is encoded so as to adjust the video code rate of the video frame image.
23. The apparatus of claim 20, the adjustment module being specifically configured to:
periodically determining an average quantization parameter when encoding the historical video frame image according to a second time period, and comparing the average quantization parameter with a high quantization parameter threshold and a low quantization parameter threshold respectively; wherein the high quantization parameter threshold and the low quantization parameter threshold correspond to a coding mode;
if the average quantization parameter is greater than the high quantization parameter threshold, reducing the image resolution and/or video frame rate of the video frame image;
and if the average quantization parameter is smaller than the low quantization parameter threshold, increasing the image resolution and/or video frame rate of the video frame image.
24. The apparatus of claim 21 or 23, reducing an image resolution of the video frame image, comprising:
reducing the image resolution of the video frame image to 1/4 to 3/5 of the original image resolution;
increasing the image resolution of the video frame image, comprising:
and increasing the image resolution of the video frame image to 5/3 to 4 times the original image resolution.
25. The apparatus of claim 21 or 23, reducing a video frame rate of the video frame image, comprising:
reducing the video frame rate of the video frame image to 2/3 of the original video frame rate;
increasing a video frame rate of the video frame image, comprising:
and increasing the video frame rate of the video frame image to the highest video frame rate allowed under the current image resolution.
26. The device of claim 18, the super-resolution capability having a processable resolution in a range of 360 x 360 to 1280 x 1280.
27. A real-time video transmission device is applied to a receiving end; the device comprises:
the receiving module is used for receiving the low-resolution image sent by the sending end; the low-resolution image is obtained by the sending end through processing the image resolution reduction of the video frame image to be transmitted under the condition that the image processing capability set obtained through negotiation with the receiving end contains super-resolution capability; wherein an image resolution of the low resolution image matches a processable resolution of the super resolution capability;
the first reconstruction module is used for performing super-resolution reconstruction on the low-resolution image based on a super-resolution algorithm to obtain a high-resolution image corresponding to the low-resolution image, and determining the high-resolution image as the video frame image;
wherein the set of image processing capabilities is obtained by negotiating with the sender in the following manner:
transmitting a second group joining request to a server to transmit video data in a corresponding communication group, wherein the second group joining request comprises a second image processing capability set of the receiving end, so that the server determines the image processing capability set of the communication group based on the second image processing capability set and a first image processing capability set in a first group joining request transmitted by the transmitting end, and returns the image processing capability set to the transmitting end and the receiving end;
and receiving the image processing capability set returned by the server, and determining the image processing capability set as an image processing capability set obtained by negotiation with the sender.
28. The device of claim 27, wherein the image processing capability set of the communication group is the intersection of the first image processing capability set and the second image processing capability set.
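Claims 27-28 describe the negotiation as the server intersecting the two ends' capability sets, so that only capabilities both ends support are used for the group. A minimal Python sketch, assuming capability sets are modeled as sets of strings; the function name `negotiate_capability_set` and the capability labels are illustrative, not taken from the patent:

```python
def negotiate_capability_set(first_caps, second_caps):
    """Server-side negotiation: the communication group's capability set is
    the intersection of the sender's and the receiver's capability sets."""
    return first_caps & second_caps


# Hypothetical capability labels carried in the two join-group requests.
sender_caps = {"super_resolution", "denoise", "hdr"}
receiver_caps = {"super_resolution", "denoise"}

group_caps = negotiate_capability_set(sender_caps, receiver_caps)
# The sender may downscale before transmitting only if the negotiated
# set contains the super-resolution capability.
use_sr = "super_resolution" in group_caps
```

The intersection guarantees the sender never downscales a frame that the receiver cannot reconstruct.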
29. The device of claim 27, wherein the first reconstruction module is specifically configured to:
input the low-resolution image into a super-resolution model, so that the super-resolution model performs super-resolution reconstruction on the low-resolution image to obtain a high-resolution image corresponding to the low-resolution image; the super-resolution model is a model for performing super-resolution reconstruction on images.
30. The device of claim 29, wherein the super-resolution model is a super-resolution model based on a dual-attention mechanism.
31. The device of claim 29, wherein the super-resolution model comprises a sub-pixel convolution layer for rearranging pixels in a feature image.
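The sub-pixel convolution layer of claim 31 rearranges pixels from r² low-resolution feature channels into a single plane that is r times larger in each dimension. A dependency-free Python sketch of that rearrangement, with nested lists standing in for tensors (real implementations operate on batched feature maps; the function name is an assumption):

```python
def pixel_shuffle(channels, r):
    """Rearrange r*r feature maps (each H x W, nested lists) into one
    (H*r) x (W*r) map. Output pixel (i, j) is taken from channel
    (i % r) * r + (j % r) at position (i // r, j // r) -- the standard
    sub-pixel rearrangement used after a sub-pixel convolution."""
    h = len(channels[0])
    w = len(channels[0][0])
    out = [[0] * (w * r) for _ in range(h * r)]
    for i in range(h * r):
        for j in range(w * r):
            c = (i % r) * r + (j % r)
            out[i][j] = channels[c][i // r][j // r]
    return out


# Four 1x1 feature maps interleave into one 2x2 high-resolution patch.
patch = pixel_shuffle([[[1]], [[2]], [[3]], [[4]]], 2)
```

Because the rearrangement is a pure permutation of pixels, the upscaling itself adds no multiply-accumulate cost; all learning happens in the convolution that produces the r² channels.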
32. The device of claim 29, further comprising:
a segmentation module, configured to segment the low-resolution image into a plurality of image blocks before the low-resolution image is input into the super-resolution model;
a classification module, configured to input each image block into a texture classification model so that the texture classification model performs texture classification on the image block; the texture classification model is obtained through supervised training on image samples, each labeled with its texture complexity;
wherein the first reconstruction module is specifically configured to:
input into the super-resolution model each image block whose texture classification result is high texture complexity.
33. The device of claim 32, further comprising:
a second reconstruction module, configured to perform super-resolution reconstruction, based on a linear interpolation algorithm, on each image block whose texture classification result is low texture complexity.
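Claims 32-33 route image blocks by texture complexity: high-complexity blocks go through the super-resolution model, while low-complexity blocks take a cheap interpolation path. A sketch of that interpolation path, assuming bilinear interpolation (a common choice of linear interpolation; the claims do not fix the exact variant, and the function name is illustrative):

```python
def upscale_block_bilinear(block, scale=2):
    """Upscale a 2-D block (nested lists of numbers) by bilinear
    interpolation -- the lightweight path reserved for blocks the
    texture classifier marks as low complexity."""
    h, w = len(block), len(block[0])
    out = [[0.0] * (w * scale) for _ in range(h * scale)]
    for i in range(h * scale):
        for j in range(w * scale):
            # Map the output coordinate back into the source grid.
            y = min(i / scale, h - 1)
            x = min(j / scale, w - 1)
            y0, x0 = int(y), int(x)
            y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
            dy, dx = y - y0, x - x0
            out[i][j] = (block[y0][x0] * (1 - dy) * (1 - dx)
                         + block[y0][x1] * (1 - dy) * dx
                         + block[y1][x0] * dy * (1 - dx)
                         + block[y1][x1] * dy * dx)
    return out
```

Smooth, low-texture regions lose little quality under interpolation, so spending the neural model's compute only on high-texture blocks trades negligible fidelity for a large reduction in per-frame cost.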
34. The device of claim 32, further comprising:
a training module, configured to jointly train the texture classification model and the super-resolution model until a joint loss function converges; wherein the joint loss function is a weighted sum of a first loss function and a second loss function, the first loss function being the loss function corresponding to the texture classification model, and the second loss function being the loss function corresponding to the super-resolution model.
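The joint objective of claim 34 is simply a weighted sum of the two models' losses. A sketch with illustrative weights `alpha` and `beta`; the claims specify a weighted sum but not the weight values or names:

```python
def joint_loss(classification_loss, sr_loss, alpha=0.5, beta=0.5):
    """Joint training objective: a weighted sum of the texture-classification
    loss (first loss function) and the super-resolution loss (second loss
    function). alpha and beta are assumed hyperparameters."""
    return alpha * classification_loss + beta * sr_loss
```

Training both models against one combined loss lets the classifier learn block boundaries that the super-resolution model handles well, instead of optimizing each model in isolation.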
35. A medium having stored thereon a computer program which, when executed by a processor, implements the method of any of claims 1-9 or 10-17.
36. A computing device, comprising:
a processor;
a memory for storing a program executable by the processor;
wherein the processor is configured to implement the method of any one of claims 1-9 or 10-17 by running the executable program.
CN202210524716.6A 2022-05-13 2022-05-13 Video data transmission method, device, medium and computing equipment Active CN114900717B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210524716.6A CN114900717B (en) 2022-05-13 2022-05-13 Video data transmission method, device, medium and computing equipment

Publications (2)

Publication Number Publication Date
CN114900717A CN114900717A (en) 2022-08-12
CN114900717B true CN114900717B (en) 2023-09-26

Family

ID=82720902

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210524716.6A Active CN114900717B (en) 2022-05-13 2022-05-13 Video data transmission method, device, medium and computing equipment

Country Status (1)

Country Link
CN (1) CN114900717B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115955573A (en) * 2023-03-15 2023-04-11 广州思涵信息科技有限公司 Real-time remote synchronous projection method for two-dimensional image

Citations (8)

Publication number Priority date Publication date Assignee Title
WO2017036092A1 (en) * 2015-09-06 2017-03-09 京东方科技集团股份有限公司 Super-resolution method and system, server, user equipment, and method therefor
CN111147893A (en) * 2018-11-02 2020-05-12 华为技术有限公司 Video self-adaption method, related equipment and storage medium
CN111970513A (en) * 2020-08-14 2020-11-20 成都数字天空科技有限公司 Image processing method and device, electronic equipment and storage medium
KR102271371B1 (en) * 2020-12-24 2021-06-30 전남대학교산학협력단 Super-Resolution Streaming Video Delivery System Based-on Mobile Edge Computing for Network Traffic Reduction
CN113115067A (en) * 2021-04-19 2021-07-13 脸萌有限公司 Live broadcast system, video processing method and related device
WO2021196860A1 (en) * 2020-03-28 2021-10-07 华为技术有限公司 Video transmission method and system, and related device and storage medium
CN113596576A (en) * 2021-07-21 2021-11-02 杭州网易智企科技有限公司 Video super-resolution method and device
CN114363649A (en) * 2021-12-31 2022-04-15 北京字节跳动网络技术有限公司 Video processing method, device, equipment and storage medium

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
EP2680221B1 (en) * 2012-06-28 2016-05-11 Alcatel Lucent Method and system for generating a high-resolution video stream



Similar Documents

Publication Publication Date Title
US20200329233A1 (en) Hyperdata Compression: Accelerating Encoding for Improved Communication, Distribution & Delivery of Personalized Content
CN112954398B (en) Encoding method, decoding method, device, storage medium and electronic equipment
KR20060047631A (en) Adaptive compression of multi-level images
US20200090069A1 (en) Machine learning based video compression
CN112950471A (en) Video super-resolution processing method and device, super-resolution reconstruction model and medium
CN111970509B (en) Video image processing method, device and system
KR102472971B1 (en) Method, system, and computer program to optimize video encoding using artificial intelligence model
CN113192147A (en) Method, system, storage medium, computer device and application for significance compression
CN114900717B (en) Video data transmission method, device, medium and computing equipment
US11798254B2 (en) Bandwidth limited context based adaptive acquisition of video frames and events for user defined tasks
CN113225590B (en) Video super-resolution enhancement method and device, computer equipment and storage medium
WO2022062369A1 (en) Point cloud encoding and decoding method and system, and point cloud encoder and point cloud decoder
WO2022000298A1 (en) Reinforcement learning based rate control
CN115984675B (en) System and method for realizing multipath video decoding and AI intelligent analysis
CN115205117B (en) Image reconstruction method and device, computer storage medium and electronic equipment
WO2023124461A1 (en) Video coding/decoding method and apparatus for machine vision task, device, and medium
CN117440160A (en) Video processing method, device, equipment and storage medium
Hofbauer et al. Preprocessor rate control for adaptive multi-view live video streaming using a single encoder
CN115908594A (en) Image feature compression method, image feature decompression device, storage medium and electronic equipment
Dai et al. Visual saliency guided perceptual adaptive quantization based on HEVC intra-coding for planetary images
EP2819013B1 (en) Automated adaption of a Codec
CN114066914A (en) Image processing method and related equipment
CN111885378B (en) Multimedia data encoding method, apparatus, device and medium
CN116567194B (en) Virtual image synthesis method, device, equipment and storage medium
Qin et al. Content adaptive downsampling for low bitrate video coding

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant